RoCE Configuration for Mellanox Adapters (Profile 5)

Version 10

    The following post shows a configuration example for Linux that enables lossless RoCE traffic based on L3 priority (DSCP), for use when the switch is configured with trust L3. This method is described as Profile 5.

    For other RoCE profile solutions, see Getting Started with RoCE Configuration. The configuration example uses DSCP 26 (ToS 105) for RoCE traffic and DSCP 48 for CNP traffic.

     


    Overview

    This solution assumes a simple network and a basic configuration on the adapter.

    • L3 priority 3 (DSCP) is enabled with DC-QCN (ECN) for the RDMA traffic. Therefore, RDMA traffic should run with ToS 105 (DSCP 26).
    • The configuration example uses DSCP 48 for CNP traffic.
    • VLANs are not mandatory in this solution.
    • PFC is enabled.
    • MLNX_OFED 4.1 or above should be installed on the server.
    • ECN should be enabled on priority 3 on the switches. For an example, see RoCE Configuration for Spectrum installed with MLNX-OS (Profile 5).
    • The network switches are assumed to be configured with "Trust L3", which means that traffic is classified into the right buffer/pool/queue according to its DSCP value. For more info, see Understanding QoS Classification (Trust) on Spectrum Switches.
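
    The ToS and DSCP values quoted above are related by the layout of the IP ToS byte: the upper six bits carry the DSCP and the lower two bits carry the ECN field. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of any Mellanox tool.

```shell
# Split a ToS byte into its DSCP (upper 6 bits) and ECN (lower 2 bits) fields.
tos_to_dscp() { echo $(( $1 >> 2 )); }
tos_to_ecn()  { echo $(( $1 & 3 )); }

# Build a ToS byte from a DSCP value ($1) and an ECN codepoint 0-3 ($2).
dscp_to_tos() { echo $(( ($1 << 2) | ($2 & 3) )); }

tos_to_dscp 105    # prints 26
tos_to_ecn 105     # prints 1
dscp_to_tos 26 1   # prints 105
```

    In other words, ToS 105 is DSCP 26 with the ECN field set to 1, which marks the packets as ECN-capable.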

     

    Some of the configuration steps below can be applied either permanently or temporarily (kept only until the next boot). You can choose either way.

    For a permanent configuration, a device reset (mlxfwreset) or host reboot is required after running mlxconfig.

     

    Configuration

    1. Set DSCP (L3) as trust mode for the NIC

    # echo dscp > /sys/class/net/<interface>/qos/trust

     

    For more info, see HowTo Configure Trust Mode on Mellanox Adapters.
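
    As a rough sketch, the step above can be wrapped in a small helper that writes the trust mode and reads the file back. The function name and the SYSFS_ROOT variable are hypothetical additions (the latter only allows a dry run against a scratch directory), and reading the value back from the same file is an assumption about the driver's sysfs interface.

```shell
# Hypothetical helper: set the trust mode of an interface to DSCP and read it back.
# SYSFS_ROOT is an assumption added for dry runs; it defaults to /sys.
set_trust_dscp() {
    iface="$1"
    trust_file="${SYSFS_ROOT:-/sys}/class/net/${iface}/qos/trust"
    if [ ! -e "$trust_file" ]; then
        echo "no trust file for ${iface}: ${trust_file}" >&2
        return 1
    fi
    echo dscp > "$trust_file" || return 1
    # Show what the file reports after the write.
    cat "$trust_file"
}
```

    On a real host this would be called as root, for example set_trust_dscp <interface>.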

     

    2. Enable DCQCN on priority 3, which is used for RoCE traffic.

    Option 1: Non-Volatile (Firmware configuration)

    # mlxconfig -d /dev/mst/<mst-device> -y s ROCE_CC_PRIO_MASK_P1=8 ROCE_CC_PRIO_MASK_P2=8

     

    Option 2: Volatile (driver configuration)

    # echo 1 > /sys/class/net/<interface>/ecn/roce_np/enable/3

    # echo 1 > /sys/class/net/<interface>/ecn/roce_rp/enable/3
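
    The value 8 used for ROCE_CC_PRIO_MASK_P1/P2 in Option 1 is consistent with a per-priority bitmask in which bit 3 selects priority 3 (1 << 3 = 8). Assuming that interpretation, a mask for any set of priorities could be computed as in the sketch below; prio_mask is an illustrative name, not an existing tool.

```shell
# Compute a bitmask with one bit set for each listed priority (0-7).
prio_mask() {
    mask=0
    for prio in "$@"; do
        mask=$(( mask | (1 << prio) ))
    done
    echo "$mask"
}

prio_mask 3    # prints 8
```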

     

    2. Set the CNP DSCP to 48

    Option 1: Non-Volatile (Firmware configuration)

    # mlxconfig -d /dev/mst/<mst-device> -y s CNP_DSCP_P1=48 CNP_DSCP_P2=48

     

    Option 2: Volatile (driver configuration)

    # echo 48 > /sys/class/net/<interface>/ecn/roce_np/cnp_dscp
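
    On servers with more than one RoCE interface, the volatile settings for DCQCN and CNP have to be repeated per interface. The sketch below only prints the commands so they can be reviewed before being run as root; the function name and the interface names eth0 and eth1 are placeholders.

```shell
# Print (do not run) the volatile DCQCN and CNP commands for each interface.
# Arguments: <priority> <cnp_dscp> <interface>...
print_roce_ecn_cmds() {
    prio="$1"
    cnp_dscp="$2"
    shift 2
    for iface in "$@"; do
        base="/sys/class/net/${iface}/ecn"
        echo "echo 1 > ${base}/roce_np/enable/${prio}"
        echo "echo 1 > ${base}/roce_rp/enable/${prio}"
        echo "echo ${cnp_dscp} > ${base}/roce_np/cnp_dscp"
    done
}

print_roce_ecn_cmds 3 48 eth0 eth1
```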

     

    3. (Optional) Enable ECN for TCP traffic:

    # sysctl -w net.ipv4.tcp_ecn=1

    net.ipv4.tcp_ecn = 1

     

    Note: This setting is not persistent across reboots.
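
    One common way to keep the setting across reboots, on distributions that read drop-in files from /etc/sysctl.d, is sketched below. The function name and the file name 90-tcp-ecn.conf are hypothetical choices; the optional directory argument exists only so the helper can be tried against a scratch directory.

```shell
# Write a sysctl drop-in that enables TCP ECN.
# $1: target directory (defaults to /etc/sysctl.d).
# The file name 90-tcp-ecn.conf is a hypothetical choice.
persist_tcp_ecn() {
    dir="${1:-/etc/sysctl.d}"
    printf 'net.ipv4.tcp_ecn = 1\n' > "${dir}/90-tcp-ecn.conf"
}

# As root: persist_tcp_ecn && sysctl --system
```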

     

    4. Set the RoCE mode to V2 for RDMA CM traffic.

    # cma_roce_mode -d mlx5_0 -p 1 -m 2

    For more details, see HowTo Set the Default RoCE Mode When Using RDMA CM.

     

    5. Set the default ToS to 105 (DSCP 26).

    # cma_roce_tos -d mlx5_0 -t 105

    For more information, see HowTo Set Egress ToS/DSCP on RDMA-CM QPs.

     

    6. Activate PFC on priority 3.

    Option 1: Using the mlnx_qos tool (Non-Volatile)

    # mlnx_qos -i <interface> --pfc 0,0,0,1,0,0,0,0

     

    See also, HowTo Configure PFC on ConnectX-4
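
    The --pfc argument in Option 1 is a comma-separated list with one 0/1 flag per priority, 0 through 7, so priority 3 is the fourth entry. A small sketch that builds this list from priority numbers follows; pfc_list is an illustrative name.

```shell
# Build the 8-entry --pfc list, with 1 for each priority given as an argument.
pfc_list() {
    out=""
    for p in 0 1 2 3 4 5 6 7; do
        bit=0
        for on in "$@"; do
            if [ "$on" -eq "$p" ]; then
                bit=1
            fi
        done
        # Append with a comma separator after the first entry.
        out="${out:+$out,}$bit"
    done
    echo "$out"
}

pfc_list 3    # prints 0,0,0,1,0,0,0,0
```

    The result can then be passed on, for example mlnx_qos -i <interface> --pfc "$(pfc_list 3)".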

     

    Option 2: Using LLDP DCBX, getting the configuration from the switch (Firmware configuration)

    # mlxconfig -d /dev/mst/mt4115_pciconf0 -y s LLDP_NB_DCBX_P1=TRUE LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2

     

    See also, HowTo Auto-Config PFC and ETS on ConnectX-4 via LLDP DCBX

     

    Option 3: Using LLDP DCBX, getting the configuration from the switch (Driver/OS configuration):

    # service lldpad start

    # lldptool -T -i <interface_name> -V PFC enabled=3