RoCE Configuration on Mellanox Adapters (PCP-Based Lossless Traffic)

Version 21

    This post provides a configuration example for Mellanox adapters with MLNX_OFED installed, running RoCE over a lossless network in PCP-based QoS mode.



    • Mellanox adapters and switches also support DSCP-based QoS and flow control. That mode is simpler to configure, does not require VLANs, and keeps QoS across routers, because the DSCP marking is carried in the IP header rather than in the VLAN tag.
    • QoS parameters are set when a QP is created. When working with RDMA-CM, QoS parameters can also be set for QPs created through RDMA-CM.
    • Some of the configuration steps below can be applied either permanently (kept across reboots) or temporarily (lost at the next reboot).

           For permanent configuration, a device reset (mlxfwreset) or a host reboot is required after running mlxconfig.



    Step 1 - Set QoS parameters

    Map socket priority (sk_prio) 2 to SL 3, i.e. VLAN priority (PCP) 3 (Note: This command is nonpersistent)

    # vconfig set_egress_map <vlan-interface> 2 3
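    In PCP-based mode the priority travels in the 3-bit PCP field of the 802.1Q VLAN tag, which is why this mode needs a VLAN interface. As an illustration only, the sketch below computes the 16-bit Tag Control Information (TCI) field for priority 3 on VLAN 100 (as in the ens2f0.100 example interface). The helper name vlan_tci is hypothetical and not part of any Mellanox tool.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): build the 16-bit 802.1Q TCI.
# Layout: PCP (3 bits) | DEI (1 bit) | VID (12 bits)
vlan_tci() {
  pcp=$1; vid=$2; dei=${3:-0}
  printf '0x%04X\n' $(( (pcp << 13) | (dei << 12) | (vid & 0xFFF) ))
}

# Priority 3 (the PFC-enabled priority in this post) on VLAN 100
vlan_tci 3 100   # prints 0x6064
```

    The upper three bits (0x6000 = 3 << 13) carry the priority; the lower twelve bits (0x064) are the VLAN ID.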

    [Optional] Set ToS to 106 (DSCP 26) for ALL RoCE traffic (Note: This command is nonpersistent)

    # echo 106 > /sys/class/infiniband/<mlx-device>/tc/1/traffic_class
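    The ToS byte holds the 6-bit DSCP in its upper bits and the 2-bit ECN field in its lower bits, so ToS 106 is DSCP 26 shifted left by two (104) plus ECN value 2, which marks packets as ECN-capable (ECT(0)). A minimal sketch of that split; the helper name tos_split is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: split an IPv4 ToS byte into DSCP (upper 6 bits)
# and ECN (lower 2 bits).
tos_split() {
  echo "dscp=$(( $1 >> 2 )) ecn=$(( $1 & 3 ))"
}

tos_split 106   # prints dscp=26 ecn=2
```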

    [Optional] Set the RDMA-CM ToS to 106 (DSCP 26) (Note: This command is nonpersistent)

    # cma_roce_tos -d <mlx-device> -t 106
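    The sk_prio value in the egress map and the ToS value appear to line up. Assuming MLNX_OFED follows the upstream Linux kernel here, RDMA-CM derives the SL of a RoCE QP from its ToS with rt_tos2priority() (the ip_tos2prio table in net/ipv4/route.c) and then the VLAN egress map (iboe_tos_to_sl() in include/rdma/ib_addr.h). Under that assumption ToS 106 gives sk_prio 2, which the egress map from Step 1 turns into SL 3. The sketch below re-implements only the table lookup; tos_to_skprio is a hypothetical name.

```shell
#!/bin/sh
# Illustrative re-implementation of the kernel's rt_tos2priority():
#   ip_tos2prio[IPTOS_TOS(tos) >> 1], with IPTOS_TOS(tos) = tos & 0x1E.
# Results are TC_PRIO_* values: BESTEFFORT=0, BULK=2,
# INTERACTIVE_BULK=4, INTERACTIVE=6.
tos_to_skprio() {
  idx=$(( ($1 & 0x1E) >> 1 ))
  case $idx in
    0|1|2|3)    echo 0 ;;   # TC_PRIO_BESTEFFORT
    4|5|6|7)    echo 2 ;;   # TC_PRIO_BULK
    8|9|10|11)  echo 6 ;;   # TC_PRIO_INTERACTIVE
    *)          echo 4 ;;   # 12-15: TC_PRIO_INTERACTIVE_BULK
  esac
}

tos_to_skprio 106   # prints 2, the sk_prio mapped to SL 3 in Step 1
```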

    [Optional] Enable ECN for TCP traffic (Note: This command is nonpersistent)

    # sysctl -w net.ipv4.tcp_ecn=1


    Step 2 - Enable PFC on the RoCE priority

    Activate PFC on priority 3

         Method 1 - Using the mlnx_qos tool (Note: This command is nonpersistent):

    # mlnx_qos -i <interface> --pfc 0,0,0,1,0,0,0,0
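    The --pfc argument is a comma-separated list of eight 0/1 flags, one per priority 0 to 7; setting only position 3 to 1 enables PFC on priority 3 alone. A small sketch with a hypothetical helper that builds the list for any single priority:

```shell
#!/bin/sh
# Hypothetical helper: build the 8-entry --pfc list for mlnx_qos,
# enabling PFC (1) on one priority and disabling it (0) on the others.
pfc_list() {
  list=""
  for p in 0 1 2 3 4 5 6 7; do
    if [ "$p" -eq "$1" ]; then v=1; else v=0; fi
    list="${list:+$list,}$v"
  done
  echo "$list"
}

pfc_list 3   # prints 0,0,0,1,0,0,0,0
# Equivalent to the command above (run as root):
#   mlnx_qos -i <interface> --pfc "$(pfc_list 3)"
```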

         Method 2 - Using LLDP DCBX, with PFC configured on the switch (Note: This requires the LLDP service to be enabled on the switch; this mlxconfig setting is persistent)

    # mlxconfig -d <mst-device> -y s LLDP_NB_DCBX_P1=TRUE LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2
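    The mlxconfig command above sets the same three LLDP/DCBX parameters once per port, with the port number as the _P1/_P2 suffix. A sketch with a hypothetical helper that generates this parameter list for any set of ports; the values are copied from the command above, not derived.

```shell
#!/bin/sh
# Hypothetical helper: emit the LLDP/DCBX mlxconfig parameters used in
# this post for each given port number.
lldp_dcbx_params() {
  out=""
  for port in "$@"; do
    out="${out:+$out }LLDP_NB_DCBX_P${port}=TRUE LLDP_NB_TX_MODE_P${port}=2 LLDP_NB_RX_MODE_P${port}=2"
  done
  echo "$out"
}

lldp_dcbx_params 1 2
# Usage (run as root; persistent, applied after mlxfwreset or a reboot):
#   mlxconfig -d <mst-device> -y s $(lldp_dcbx_params 1 2)
```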



    <interface> refers to the parent interface (for example, ens2f0)

    <vlan-interface> refers to the VLAN interface (for example, ens2f0.100)

    <mst-device> refers to the MST device (for example, /dev/mst/mt4115_pciconf0)

    <mlx-device> refers to the RDMA device (for example, mlx5_0)
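    To review the nonpersistent part of the configuration before applying it, the placeholders can be filled in with the example values above and the commands printed instead of executed. A minimal dry-run sketch; the variable and function names are hypothetical, and the printed commands still need to be run as root afterwards.

```shell
#!/bin/sh
# Dry-run sketch: print the nonpersistent commands from Steps 1 and 2
# with the example placeholder values. Nothing is executed.
IFACE=ens2f0            # <interface>
VLAN_IFACE=ens2f0.100   # <vlan-interface>
MLX_DEV=mlx5_0          # <mlx-device>

roce_pcp_cmds() {
  echo "vconfig set_egress_map $VLAN_IFACE 2 3"
  echo "echo 106 > /sys/class/infiniband/$MLX_DEV/tc/1/traffic_class"
  echo "cma_roce_tos -d $MLX_DEV -t 106"
  echo "sysctl -w net.ipv4.tcp_ecn=1"
  echo "mlnx_qos -i $IFACE --pfc 0,0,0,1,0,0,0,0"
}

roce_pcp_cmds
```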