Lossless RoCE Configuration for Linux Drivers in DSCP-Based QoS Mode

Version 17

    This post provides a Linux configuration example for enabling lossless RoCE traffic based on L3 priority (DSCP), for use when the switch is configured with Trust L3.

    For other RoCE Profile solutions, see Getting Started with RoCE Configuration.

     

    Overview

    This solution involves a simple network setup and basic configuration on the adapter.

    • RDMA traffic runs on L3 priority 3 (DSCP), which is enabled with DCQCN (ECN). RDMA traffic should therefore use ToS 106 (DSCP 26).
    • The configuration example uses DSCP 48 for CNP traffic.
    • VLANs are not mandatory in this solution.
    • PFC is enabled.
    • MLNX_OFED 4.1 or above should be installed on the server.
    • Switches should have ECN enabled on priority 3. For an example, see Lossless RoCE Configuration for MLNX-OS switch in DSCP-based QoS mode.
    • Network switches are assumed to be configured with "Trust L3", meaning that traffic is classified into the appropriate buffer/pool/queue according to its DSCP value. For more information, see Understanding QoS Classification (Trust) on Spectrum Switches.
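
    The relationship between the ToS, DSCP and priority values used in this post can be checked with a little shell arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the DSCP-to-priority rule (priority = DSCP / 8) is assumed to be the default mapping, which may have been changed on a given adapter or switch.

```shell
#!/bin/bash
# The ToS byte holds the DSCP value in its upper 6 bits and the ECN field
# in its lower 2 bits.

# dscp_to_tos DSCP [ECN]: build the ToS byte (ECN defaults to 0).
dscp_to_tos() {
    echo $(( ($1 << 2) | ${2:-0} ))
}

# tos_to_dscp TOS: extract the DSCP field from a ToS byte.
tos_to_dscp() {
    echo $(( $1 >> 2 ))
}

# tos_to_ecn TOS: extract the 2-bit ECN field from a ToS byte.
tos_to_ecn() {
    echo $(( $1 & 3 ))
}

# dscp_to_prio DSCP: ASSUMPTION - default mapping, priority = DSCP / 8.
dscp_to_prio() {
    echo $(( $1 >> 3 ))
}

dscp_to_tos 26 2    # RoCE traffic: DSCP 26 with ECN bits 10 -> prints 106
dscp_to_prio 26     # prints 3
dscp_to_prio 48     # CNP traffic -> prints 6
```

    Under these assumptions, ToS 106 carries DSCP 26 with the ECN-capable bits set, and DSCP 26 falls on priority 3, which is why the same priority is used for DCQCN and PFC in the steps below.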

     

    Some of the configuration steps below can be applied either permanently (non-volatile) or temporarily (volatile):

    • Permanent (non-volatile) configuration - requires a device reset (mlxfwreset) or a host reboot to take effect.
    • Temporary (volatile) configuration - takes effect immediately, but is lost at the next boot.

     

     

    Configuration

    1. Enable DCQCN on priority 3 (RoCE traffic).

    Non-volatile:

    # mlxconfig -d /dev/mst/<mst-device> -y s ROCE_CC_PRIO_MASK_P1=8 ROCE_CC_PRIO_MASK_P2=8

    OR

    Volatile:

    # echo 1 > /sys/class/net/<interface>/ecn/roce_np/enable/3

    # echo 1 > /sys/class/net/<interface>/ecn/roce_rp/enable/3
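
    The value 8 passed to ROCE_CC_PRIO_MASK_P1/P2 above is consistent with a bitmask in which bit N stands for priority N (bit 3 = 8). Assuming that interpretation, a minimal sketch for converting between priorities and mask values follows; the helper names are hypothetical.

```shell
#!/bin/bash
# prio_mask PRIO...: OR together one bit per listed priority (0-7).
prio_mask() {
    local mask=0 p
    for p in "$@"; do
        mask=$(( mask | (1 << p) ))
    done
    echo "$mask"
}

# mask_prios MASK: list the priorities whose bit is set in MASK.
mask_prios() {
    local mask=$1 p out=""
    for p in 0 1 2 3 4 5 6 7; do
        if (( (mask >> p) & 1 )); then
            out="$out${out:+ }$p"
        fi
    done
    echo "$out"
}

prio_mask 3      # prints 8, the value used in this step
mask_prios 8     # prints 3
```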

     

    2. Set trust DSCP mode.

    # echo dscp > /sys/class/net/<interface>/qos/trust

     

    For more information, see HowTo Configure Trust Mode on Mellanox Adapters.

     

    3. Set CNP DSCP to 48.

    Non-volatile:

    # mlxconfig -d /dev/mst/<mst-device> -y s CNP_DSCP_P1=48 CNP_DSCP_P2=48

    OR

    Volatile:

    # echo 48 > /sys/class/net/<interface>/ecn/roce_np/cnp_dscp

     

    [Optional] Enable ECN for TCP traffic.

    # sysctl -w net.ipv4.tcp_ecn=1

    net.ipv4.tcp_ecn = 1

    Note: This setting does not persist across reboots.
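
    If the TCP ECN setting should survive a reboot, one common approach on distributions that read /etc/sysctl.d is a drop-in file. The sketch below is not part of the original procedure; the helper name and the file name 90-tcp-ecn.conf are hypothetical.

```shell
#!/bin/bash
# persist_tcp_ecn [DIR]: write a sysctl drop-in that enables TCP ECN at boot.
# DIR defaults to /etc/sysctl.d; the file name is a hypothetical choice.
persist_tcp_ecn() {
    local dir="${1:-/etc/sysctl.d}"
    printf 'net.ipv4.tcp_ecn = 1\n' > "$dir/90-tcp-ecn.conf"
}

# Usage (as root):
#   persist_tcp_ecn
#   sysctl --system    # reload all sysctl configuration files
```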

     

    4. Set the RoCE mode to v2 using RDMA-CM.

    # cma_roce_mode -d mlx5_0 -p 1 -m 2

    For more details, see HowTo Set the Default RoCE Mode When Using RDMA CM.

     

    5. Set RoCE DSCP to 26 (ToS 106) using RDMA-CM.

    # cma_roce_tos -d mlx5_0 -t 106

    For more information, see HowTo Set Egress ToS/DSCP on RDMA-CM QPs.

     

    6. Activate PFC on priority 3.

    Using mlnx_qos tool (non-volatile):

    # mlnx_qos -i <interface> --pfc 0,0,0,1,0,0,0,0

    For more information, see HowTo Configure PFC on ConnectX-4.
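
    The --pfc argument above is a list of eight flags, one per priority from 0 to 7, with a 1 in the position of each priority on which PFC is enabled. A small sketch for building that list from priority numbers (the helper name pfc_list is hypothetical):

```shell
#!/bin/bash
# pfc_list PRIO...: print eight comma-separated flags (priorities 0-7),
# with 1 for each listed priority and 0 for the rest.
pfc_list() {
    local flags=(0 0 0 0 0 0 0 0) p
    for p in "$@"; do
        flags[$p]=1
    done
    local IFS=,
    echo "${flags[*]}"
}

pfc_list 3    # prints 0,0,0,1,0,0,0,0

# Usage:
#   mlnx_qos -i <interface> --pfc "$(pfc_list 3)"
```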

     

    OR

     

    Using LLDP DCBX, with the PFC configuration taken from the switch (non-volatile):

    # mlxconfig -d /dev/mst/mt4115_pciconf0 -y s LLDP_NB_DCBX_P1=TRUE LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2

    Note: This requires LLDP service to be enabled in the switch.

     

     

    OR

     

    Using LLDP DCBX, with the PFC configuration taken from the switch (volatile):

    # service lldpad start

    # lldptool -T -i <interface_name> -V PFC -c willing=yes

    Note: This requires LLDP service to be enabled in the switch.

     

    For more information, see HowTo Auto-Config PFC and ETS on ConnectX-4 via LLDP DCBX.
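
    For convenience, the host-side commands from this post (sysfs variants for DCQCN and CNP DSCP, and mlnx_qos for PFC) can be collected into one helper. This is only a sketch: the function names, the interface name eth2 and the DRY_RUN switch are hypothetical, and the commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# run CMD: print the command when DRY_RUN=1 (the default), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$1"
    else
        eval "$1"
    fi
}

# roce_host_setup IFACE RDMA_DEV: apply the commands from this post for
# priority 3, DSCP 26 (ToS 106) for RoCE and DSCP 48 for CNP.
roce_host_setup() {
    local iface="$1" dev="$2"
    run "echo 1 > /sys/class/net/$iface/ecn/roce_np/enable/3"
    run "echo 1 > /sys/class/net/$iface/ecn/roce_rp/enable/3"
    run "echo dscp > /sys/class/net/$iface/qos/trust"
    run "echo 48 > /sys/class/net/$iface/ecn/roce_np/cnp_dscp"
    run "cma_roce_mode -d $dev -p 1 -m 2"
    run "cma_roce_tos -d $dev -t 106"
    run "mlnx_qos -i $iface --pfc 0,0,0,1,0,0,0,0"
}

# Dry run: prints the seven commands for a hypothetical interface eth2.
roce_host_setup eth2 mlx5_0
```

    Running with DRY_RUN=0 as root would execute the commands instead of printing them; reviewing the dry-run output first is the safer order.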