Lossless RoCE Configuration for MLNX-OS Switches in DSCP-Based QoS Mode

Version 13

    This post provides a configuration example of lossless RoCE for MLNX-OS switches in DSCP-based QoS mode.

    For other configuration modes, see Getting Started with RoCE Configuration.

     


    Overview

    This solution offers the following network setup:

     

    Configuration

    1. Enable ECN for RoCE traffic over traffic class 3.

    Traffic over DSCP 26 is mapped to traffic class 3 by default.

    switch (config) # interface ethernet 1/1-1/32 traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500

    Note: If TCP traffic runs over another traffic class, it is recommended to configure ECN on that traffic class as well.
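
    The effect of the two thresholds in the command above can be illustrated with a short sketch. It assumes a RED-style linear marking curve between the minimum and maximum thresholds, that the thresholds are queue depths in KB, and that the marking probability reaches 100% at the maximum. None of these details are stated in this post, and the function name is hypothetical, so treat it as an illustration rather than the switch's actual algorithm.

```python
def ecn_mark_probability(queue_kb, min_kb=150, max_kb=1500, max_prob=1.0):
    """Assumed RED-style curve: no marking below min, linear ramp up to max."""
    if queue_kb <= min_kb:
        return 0.0          # queue is short: packets pass unmarked
    if queue_kb >= max_kb:
        return 1.0          # queue is at or above max: every ECN-capable packet is marked
    # Linear interpolation between the two thresholds (assumption).
    return max_prob * (queue_kb - min_kb) / (max_kb - min_kb)


if __name__ == "__main__":
    for depth in (100, 825, 1500):
        print(depth, ecn_mark_probability(depth))
```

    Under these assumptions, a lower minimum threshold makes senders back off earlier, at the cost of some throughput.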

     

    2. Configure the buffer pools: allocate pool 0 for lossy traffic and pool 1 for lossless RoCE traffic.

    Note: In this example, the shared space is split equally between the RoCE pool and non-RoCE pool. If the network has a different ratio of RoCE/non-RoCE traffic, the shared space can be divided accordingly.

    switch (config) # pool ePool1 direction egress-mc size 16777000 type dynamic

    switch (config) # pool ePool0 direction egress size 5242880 type dynamic

     

     

    switch (config) # pool iPool1 direction ingress size 5242880 type dynamic

    switch (config) # pool iPool0 direction ingress size 5242880 type dynamic
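
    If the RoCE/non-RoCE ratio differs from the equal split used here, the pool sizes can be derived with simple arithmetic. The sketch below is a minimal example; the total of 10485760 bytes is only inferred by adding the two 5242880-byte ingress pools above, and the helper name is hypothetical. The shared space actually available depends on the switch model.

```python
def split_shared_space(total_bytes, roce_fraction):
    """Return (pool0_lossy_bytes, pool1_lossless_bytes) for a given RoCE share."""
    if not 0.0 <= roce_fraction <= 1.0:
        raise ValueError("roce_fraction must be between 0 and 1")
    roce_bytes = int(total_bytes * roce_fraction)
    # Whatever is not given to the RoCE pool goes to the lossy pool.
    return total_bytes - roce_bytes, roce_bytes


if __name__ == "__main__":
    total = 2 * 5242880  # assumption: sum of iPool0 and iPool1 in this example
    print(split_shared_space(total, 0.5))
    print(split_shared_space(total, 0.75))
```

    The resulting values would replace the size arguments of the corresponding pool commands.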

     

    3. Bind switch priorities to ingress priority groups (PGs) on the interfaces: switch priority 3 to PG 3, and switch priority 6 to PG 6.

    • Traffic over DSCP 26 is mapped to switch-priority 3 by default.
    • Traffic over DSCP 48 is mapped to switch-priority 6 by default.

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg6 bind switch-priority 6

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg3 bind switch-priority 3
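
    The two default mappings quoted in this step are consistent with taking the three most significant bits of the 6-bit DSCP value. The sketch below assumes that rule holds for every DSCP value, which this post does not state, so verify the mapping on the switch before relying on it. The function name is hypothetical.

```python
def default_switch_priority(dscp):
    """Assumed default: switch-priority is the top 3 bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp >> 3


if __name__ == "__main__":
    for dscp in (26, 48):
        print(f"DSCP {dscp} -> switch-priority {default_switch_priority(dscp)}")
```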

     

    4. Map the ingress and egress port buffers to pools: allocate a buffer for priority 3 and map it to the lossless pool, and allocate a buffer for priority 6 and map it to the lossy pool.

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg3 map pool iPool1 type lossless reserved 67538 xoff 18432 xon 18432 shared alpha 2

    switch (config) # interface ethernet 1/1-1/32 ingress-buffer iPort.pg6 map pool iPool0 type lossy reserved 10240 shared alpha 8

    switch (config) # interface ethernet 1/1-1/32 egress-buffer ePort.tc3 map pool ePool1 reserved 1500 shared alpha inf
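
    The shared alpha values in the commands above control how much of a pool a single buffer may consume. The sketch below assumes the classic dynamic-threshold model, in which a buffer may keep growing while its occupancy is below alpha times the free space left in the pool. The exact admission logic of the switch is not described in this post, and both function names are hypothetical.

```python
import math


def may_admit(queue_bytes, alpha, pool_size, pool_used):
    """Assumed rule: admit while queue occupancy < alpha * free pool space."""
    free = pool_size - pool_used
    if free <= 0:
        return False  # pool exhausted: nothing more can be admitted
    return queue_bytes < alpha * free


def steady_state_fraction(alpha):
    """Pool share taken by one congested queue: q = alpha * (B - q)."""
    if math.isinf(alpha):
        return 1.0  # alpha inf: the queue may take the whole pool
    return alpha / (1.0 + alpha)


if __name__ == "__main__":
    for a in (2, 8, math.inf):
        print(a, steady_state_fraction(a))
```

    Under this model, a larger alpha lets a single congested buffer take a larger share of the pool, and alpha inf removes the limit entirely.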

     

    5. Set strict priority for CNPs (congestion notification packets) on traffic class 6.

    Traffic over DSCP 48 is mapped to switch-priority 6 by default.

    Note: In this example, equal-weight round-robin scheduling is used between RoCE and non-RoCE traffic. This is the switch default, so no additional commands are required. If the network has a different ratio of RoCE/non-RoCE traffic, the round-robin weights can be set accordingly.

    switch (config) # interface ethernet 1/1-1/32 traffic-class 6 dcb ets strict

     

    6. Set trust mode L3 (DSCP).

    switch (config) # interface ethernet 1/1-1/32 qos trust L3
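
    With L3 trust, classification is based on the DSCP field in the IP header. As a minimal sketch of where that value lives: DSCP occupies the upper six bits of the IPv4 ToS (or IPv6 Traffic Class) byte, and the lower two bits carry the ECN codepoint. The function name is hypothetical.

```python
def split_tos(tos):
    """Split an 8-bit ToS / Traffic Class byte into (dscp, ecn)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte")
    dscp = tos >> 2      # upper 6 bits
    ecn = tos & 0b11     # lower 2 bits: 0b10 = ECT(0), 0b01 = ECT(1), 0b11 = CE
    return dscp, ecn


if __name__ == "__main__":
    print(split_tos(0x68))  # ToS byte carrying DSCP 26
    print(split_tos(0xC0))  # ToS byte carrying DSCP 48
```

    This is useful when reading packet captures, where the ToS byte is often shown as a whole rather than as separate DSCP and ECN fields.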

     

    7. Enable receive PFC on priority 3 on all ports.

    switch (config) # dcb priority-flow-control enable force

    switch (config) # dcb priority-flow-control priority 3 enable

    switch (config) # interface ethernet 1/1-1/32 dcb priority-flow-control mode on force

     

    8. [Optional] Enable DCBX LLDP.

    Note: This is required if the adapter card relies on the switch's LLDP configuration to set the PFC priority. See Lossless RoCE Configuration for Linux Drivers in DSCP-Based QoS Mode.

    switch (config) # lldp