Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode (advanced mode)

Version 18

    Note: This post applies to versions older than 3.6.5000; it can also be used to explain advanced mode usage.

    For a simple configuration on version 3.6.5000 and above, see Recommended Network Configuration Examples for RoCE Deployment.

     

    This post provides a configuration example of lossless RoCE for MLNX-OS switches in DSCP-based QoS mode.

    For other configuration modes, see Getting Started with RoCE Configuration.

     

    Overview

    This solution offers the following network setup:

     

    Configuration

    1. Enable ECN for RoCE traffic over traffic class 3.

    Traffic over DSCP 26 is mapped to traffic class 3 by default.

    switch (config) # interface ethernet 1/1-1/32 traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500

    Note: If TCP traffic runs over another traffic class, it is recommended to configure ECN on that traffic class as well.
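
    To give a feel for what the two thresholds in the command above control, here is a minimal Python sketch. It assumes a RED-style linear ramp between minimum-absolute and maximum-absolute, which this post does not state, and the function name is hypothetical. The threshold values are taken as-is from the command, in whatever unit the CLI uses.

```python
def ecn_mark_probability(queue_depth, min_threshold=150, max_threshold=1500):
    """Illustrative marking probability for a given queue depth.

    Assumption: a linear ramp from 0 at min_threshold to 1 at
    max_threshold. The switch's actual marking curve may differ.
    """
    if queue_depth <= min_threshold:
        return 0.0  # below the minimum: never mark
    if queue_depth >= max_threshold:
        return 1.0  # at or above the maximum: always mark
    return (queue_depth - min_threshold) / (max_threshold - min_threshold)


if __name__ == "__main__":
    for depth in (100, 150, 825, 1500, 2000):
        print(depth, ecn_mark_probability(depth))
```

    Under this assumed model, a smaller gap between the two thresholds makes marking ramp up more abruptly as the queue fills.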

     

    2. Configure the buffer pools: allocate pool 0 for lossy traffic and pool 1 for lossless RoCE traffic.

    Note: In this example, the shared space is split equally between the RoCE pool and non-RoCE pool. If the network has a different ratio of RoCE/non-RoCE traffic, the shared space can be divided accordingly.

    switch (config) # advanced buffer management force // required for version 3.6.5000 and above

    switch (config) # pool ePool1 size 16777000 type dynamic

    switch (config) # pool ePool0 size 5242880 type dynamic

    switch (config) # pool iPool1 size 5242880 type dynamic

    switch (config) # pool iPool0 size 5242880 type dynamic
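
    If the RoCE/non-RoCE ratio differs from this example, the ingress pool sizes can be derived with simple arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the total is assumed to be the sum of the two ingress pool sizes shown above (5242880 bytes each, that is 5 MiB per pool).

```python
def split_shared_space(total_bytes, roce_fraction):
    """Split a byte budget between the lossless (RoCE) and lossy pools."""
    if not 0.0 <= roce_fraction <= 1.0:
        raise ValueError("roce_fraction must be between 0 and 1")
    lossless = int(total_bytes * roce_fraction)
    lossy = total_bytes - lossless  # remainder goes to the lossy pool
    return lossless, lossy


def ingress_pool_commands(total_bytes, roce_fraction):
    """Format the two ingress pool commands in the form used in this post."""
    lossless, lossy = split_shared_space(total_bytes, roce_fraction)
    return [
        f"pool iPool1 size {lossless} type dynamic",  # lossless RoCE pool
        f"pool iPool0 size {lossy} type dynamic",     # lossy pool
    ]


if __name__ == "__main__":
    # Assumed total: the two 5 MiB ingress pools from the example.
    ingress_total = 5242880 + 5242880
    for line in ingress_pool_commands(ingress_total, 0.5):
        print(line)
```

    With a fraction of 0.5 this reproduces the ingress pool sizes used in the example.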

     

    3. Bind the interfaces to switch priorities: bind switch priorities 3 and 6 to ingress priority groups (PGs) 3 and 6, respectively.

    • Traffic over DSCP 26 is mapped to switch-priority 3 by default.
    • Traffic over DSCP 48 is mapped to switch-priority 6 by default.

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg6 bind switch-priority 6

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg3 bind switch-priority 3
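
    The two defaults quoted in this step are consistent with taking the three most significant bits of the 6-bit DSCP value. The Python sketch below assumes that this rule holds for every DSCP value, which this post does not state; the function name is hypothetical.

```python
def default_switch_priority(dscp):
    """Assumed default DSCP-to-switch-priority mapping (DSCP // 8)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp >> 3  # keep the top three of the six DSCP bits


if __name__ == "__main__":
    print(default_switch_priority(26))  # RoCE traffic in this post
    print(default_switch_priority(48))  # CNP traffic in this post
```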

     

    4. Map the ingress and egress port buffers to pools: allocate a buffer to priority 3 and map it to the lossless pool, and allocate a buffer to priority 6 and map it to the lossy pool.

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg3 map pool iPool1 type lossless reserved 67538 xoff 18432 xon 18432 shared alpha 2

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg6 map pool iPool0 type lossy reserved 10240 shared alpha 8

    switch (config) # interface ethernet 1/1-1/32 egress-buffer ePort.tc3 map pool ePool1 reserved 1500 shared alpha inf
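
    The "shared alpha" values in the commands above can be read through the commonly described dynamic-threshold model, in which a buffer may use up to alpha times the currently free space of its pool. That model is an assumption here, not something this post states, and the function below is a hypothetical illustration only.

```python
import math


def dynamic_threshold(alpha, pool_size, pool_in_use):
    """Shared space one buffer may occupy under the assumed model.

    Assumption: limit = alpha * free pool space. An alpha of infinity
    is treated as "no dynamic limit" (only the pool size bounds it).
    """
    if math.isinf(alpha):
        return math.inf
    free = max(pool_size - pool_in_use, 0)
    return alpha * free


if __name__ == "__main__":
    pool = 5242880  # ingress pool size from this post, in bytes
    # Example occupancy of 4 MiB is made up for illustration.
    print(dynamic_threshold(2, pool, 4 * 1024 * 1024))
    print(dynamic_threshold(8, pool, 4 * 1024 * 1024))
    print(dynamic_threshold(math.inf, pool, 4 * 1024 * 1024))
```

    Under this model the limit shrinks as the pool fills, so a larger alpha lets a single buffer claim more of the remaining space.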

     

    5. Set strict priority for CNPs on traffic class 6.

    Traffic over DSCP 48 is mapped to switch-priority 6 by default.

    Note: In this example, weighted round-robin scheduling with equal weights is used between RoCE and non-RoCE traffic. This matches the switch defaults, so no additional commands are required. If the network has a different ratio of RoCE/non-RoCE traffic, the round-robin weights can be set accordingly.

    switch (config) # interface ethernet 1/1-1/32 traffic-class 6 dcb ets strict

     

    6. Set trust mode L3 (DSCP).

    switch (config) # interface ethernet 1/1-1/32 qos trust L3

     

    7. Enable receive PFC on priority 3 on all ports.

    switch (config) # dcb priority-flow-control enable force

    switch (config) # dcb priority-flow-control priority 3 enable

    switch (config) # interface ethernet 1/1-1/32 dcb priority-flow-control mode on force

     

    8. [Optional] Enable DCBX LLDP.

    Note: This is required if the adapter card relies on the switch's LLDP configuration to set the PFC priority. See Lossless RoCE Configuration for Linux Drivers in DSCP-Based QoS Mode.

    switch (config) # lldp
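
    For a port range other than 1/1-1/32, the commands from steps 1 to 8 can be collected with a small script. The Python sketch below is a convenience illustration, not an official tool: the function name and parameters are hypothetical, and the command strings are copied from this post without the "switch (config) #" prompt and without the inline "//" remark.

```python
def build_roce_config(ports="1/1-1/32", enable_lldp=False):
    """Return the post's commands, in order, for the given port range."""
    iface = f"interface ethernet {ports}"
    commands = [
        # Step 1: ECN on traffic class 3
        f"{iface} traffic-class 3 congestion-control ecn "
        "minimum-absolute 150 maximum-absolute 1500",
        # Step 2: buffer pools (first line needed on 3.6.5000 and above)
        "advanced buffer management force",
        "pool ePool1 size 16777000 type dynamic",
        "pool ePool0 size 5242880 type dynamic",
        "pool iPool1 size 5242880 type dynamic",
        "pool iPool0 size 5242880 type dynamic",
        # Step 3: bind switch priorities to ingress priority groups
        f"{iface} ingress-buffer iPort.pg6 bind switch-priority 6",
        f"{iface} ingress-buffer iPort.pg3 bind switch-priority 3",
        # Step 4: map buffers to pools
        f"{iface} ingress-buffer iPort.pg3 map pool iPool1 type lossless "
        "reserved 67538 xoff 18432 xon 18432 shared alpha 2",
        f"{iface} ingress-buffer iPort.pg6 map pool iPool0 type lossy "
        "reserved 10240 shared alpha 8",
        f"{iface} egress-buffer ePort.tc3 map pool ePool1 "
        "reserved 1500 shared alpha inf",
        # Step 5: strict priority for CNPs
        f"{iface} traffic-class 6 dcb ets strict",
        # Step 6: trust DSCP
        f"{iface} qos trust L3",
        # Step 7: PFC on priority 3
        "dcb priority-flow-control enable force",
        "dcb priority-flow-control priority 3 enable",
        f"{iface} dcb priority-flow-control mode on force",
    ]
    if enable_lldp:
        commands.append("lldp")  # Step 8 (optional)
    return commands


if __name__ == "__main__":
    for command in build_roce_config("1/1-1/4", enable_lldp=True):
        print(command)
```

    The printed lines are meant to be reviewed and pasted in configuration mode; the script itself does not connect to a switch.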