Lossless RoCE Configuration for Spectrum-based Cumulus Switches in DSCP-Based QoS Mode (Ver. 3.6)

Version 6

    This post provides a configuration example for lossless RoCE on Spectrum-based Cumulus switches in DSCP-based QoS mode.

    Overview

    This solution utilizes the following network setup:

     

    Configuration

    Step 1 - Hosts configuration

    On the NIC, map DSCP 26 to priority 3 and enable PFC (lossless) on priority 3.

    Example for Linux with mlnx_qos:

    mlnx_qos -i <interface> --pfc 0,0,0,1,0,0,0,0 --dscp2prio=set,26,3
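    The --pfc argument is a list of eight flags, one per priority, starting at priority 0, so priority 3 is the fourth position. The following is a minimal sketch of how such a list can be built; pfc_bitmap is a hypothetical helper for illustration and is not part of mlnx_qos.

```python
def pfc_bitmap(lossless_priorities, num_priorities=8):
    """Return the comma-separated PFC enable list, priority 0 first.

    Hypothetical helper: it only formats the string passed to --pfc.
    """
    enabled = set(lossless_priorities)
    out_of_range = sorted(p for p in enabled if not 0 <= p < num_priorities)
    if out_of_range:
        raise ValueError(f"priority out of range: {out_of_range}")
    return ",".join("1" if p in enabled else "0" for p in range(num_priorities))


if __name__ == "__main__":
    # Priority 3 lossless, all other priorities lossy.
    print(pfc_bitmap([3]))
```

    The printed string can be pasted after --pfc in the command above.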

    Note: All other host settings remain unchanged. All RoCE traffic should be sent with tclass=106, which corresponds to DSCP=26.
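    The relation between the two values in the note is plain bit arithmetic: the traffic class (ToS) byte carries the DSCP in its upper six bits and the ECN field in its lower two bits. A short sketch, assuming the ECN field is set to ECT(0) (binary 10), which is what makes 106 rather than 104:

```python
ECT0 = 0b10  # ECN-capable transport, codepoint ECT(0)


def dscp_to_tclass(dscp, ecn=ECT0):
    """Combine a 6-bit DSCP and a 2-bit ECN field into the 8-bit ToS/tclass byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in 0..3")
    return (dscp << 2) | ecn


def tclass_to_dscp(tclass):
    """Extract the DSCP (upper six bits) from a ToS/tclass byte."""
    return tclass >> 2


if __name__ == "__main__":
    print(dscp_to_tclass(26))   # 106
    print(tclass_to_dscp(106))  # 26
```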

     

    Step 2 - Switch configuration

    1. Enable ECN marking for priority 3 (RED drop remains disabled, see red_enable below)

    ## File: /etc/cumulus/datapath/traffic.conf

    ecn_red.port_group_list = [ecn_red_port_group]

    ecn_red.ecn_red_port_group.cos_list = [3]

    ecn_red.ecn_red_port_group.port_set = swp1-swp32

    ecn_red.ecn_red_port_group.ecn_enable = true

    ecn_red.ecn_red_port_group.red_enable = false

    ecn_red.ecn_red_port_group.min_threshold_bytes = 153600

    ecn_red.ecn_red_port_group.max_threshold_bytes = 1536000

    ecn_red.ecn_red_port_group.probability = 100
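    To give a feel for what the two thresholds and the probability mean, the sketch below models the usual RED-style marking ramp: no marking below the minimum threshold, a linear increase up to the configured probability at the maximum threshold, and marking of every packet above it. This is an assumption about the marking curve for illustration, not a description of the switch ASIC's exact behavior; the function name is hypothetical.

```python
def ecn_mark_probability(queue_bytes,
                         min_threshold_bytes=153600,
                         max_threshold_bytes=1536000,
                         probability=100):
    """Return the assumed ECN marking probability (percent) for a queue depth."""
    if queue_bytes <= min_threshold_bytes:
        return 0.0
    if queue_bytes >= max_threshold_bytes:
        return 100.0
    span = max_threshold_bytes - min_threshold_bytes
    return probability * (queue_bytes - min_threshold_bytes) / span


if __name__ == "__main__":
    for depth in (100000, 153600, 844800, 1536000):
        print(depth, ecn_mark_probability(depth))
```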

     

    2. Set the trust mode to DSCP and map DSCP values to CoS

    ## File: /etc/cumulus/datapath/traffic.conf

    traffic.packet_priority_source_set = [dscp]

    traffic.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]  # all other DSCP values (everything except 26 and 48)

    traffic.cos_1.priority_source.dscp = []

    traffic.cos_2.priority_source.dscp = [48]  # for CNPs

    traffic.cos_3.priority_source.dscp = [26]  # for RoCE

    traffic.cos_4.priority_source.dscp = []

    traffic.cos_5.priority_source.dscp = []

    traffic.cos_6.priority_source.dscp = []

    traffic.cos_7.priority_source.dscp = []
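    Because the DSCP lists are long and edited by hand, it is easy to drop or duplicate a value. The sketch below rebuilds the mapping above in Python and checks that each of the 64 DSCP values is assigned to exactly one CoS. The dictionary and function names are hypothetical and only mirror the lines shown in this step.

```python
# Mirrors the traffic.cos_N.priority_source.dscp lines above.
COS_TO_DSCP = {
    0: [d for d in range(64) if d not in (26, 48)],  # everything else
    1: [],
    2: [48],  # CNPs
    3: [26],  # RoCE
    4: [],
    5: [],
    6: [],
    7: [],
}


def build_dscp_to_cos(cos_to_dscp):
    """Invert the CoS -> DSCP lists and reject duplicates or gaps."""
    dscp_to_cos = {}
    for cos, dscps in cos_to_dscp.items():
        for dscp in dscps:
            if dscp in dscp_to_cos:
                raise ValueError(f"DSCP {dscp} is mapped more than once")
            dscp_to_cos[dscp] = cos
    missing = sorted(set(range(64)) - set(dscp_to_cos))
    if missing:
        raise ValueError(f"unmapped DSCP values: {missing}")
    return dscp_to_cos


if __name__ == "__main__":
    mapping = build_dscp_to_cos(COS_TO_DSCP)
    print(mapping[26], mapping[48], mapping[0])  # 3 2 0
```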

     

    3. Map switch priorities (CoS) to priority groups

    ## File: /etc/cumulus/datapath/traffic.conf

    traffic.priority_group_list = [control, service, bulk]

    priority_group.control.cos_list = [2]

    priority_group.service.cos_list = [3]

    priority_group.bulk.cos_list = [0,1,4,5,6,7]
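    The grouping can be sanity-checked the same way: every CoS value from 0 to 7 should belong to exactly one priority group. A minimal sketch follows, with hypothetical names that copy the cos_list values from this step.

```python
# Mirrors the priority_group.<name>.cos_list lines above.
PRIORITY_GROUPS = {
    "control": [2],
    "service": [3],
    "bulk": [0, 1, 4, 5, 6, 7],
}


def group_for_cos(cos, groups=PRIORITY_GROUPS):
    """Return the single priority group that contains the given CoS."""
    matches = [name for name, cos_list in groups.items() if cos in cos_list]
    if len(matches) != 1:
        raise ValueError(f"CoS {cos} belongs to {len(matches)} groups")
    return matches[0]


if __name__ == "__main__":
    for cos in range(8):
        print(cos, group_for_cos(cos))
```

    With the DSCP mapping from the previous step, RoCE traffic (DSCP 26, CoS 3) lands in the service group and CNPs (DSCP 48, CoS 2) in the control group.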

     

    4. Enable PFC for priority 3

    ## File: /etc/cumulus/datapath/traffic.conf

    pfc.port_group_list = [pfc_port_group]

    pfc.pfc_port_group.cos_list = [3]

    pfc.pfc_port_group.port_set = swp1-swp32

    pfc.pfc_port_group.port_buffer_bytes = 70000

    pfc.pfc_port_group.xoff_size = 18000

    pfc.pfc_port_group.xon_delta = 0

    pfc.pfc_port_group.tx_enable = true

    pfc.pfc_port_group.rx_enable = true
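    The interplay of xoff_size and xon_delta can be pictured as a simple hysteresis. The sketch below assumes that a pause (XOFF) is sent once buffer occupancy reaches xoff_size and that a resume (XON) is sent once occupancy falls below xoff_size minus xon_delta. That reading of the parameters is an assumption made for illustration, and the class is hypothetical; it does not model the switch's actual buffer accounting.

```python
class PfcHysteresis:
    """Toy model of XOFF/XON signalling for one lossless priority."""

    def __init__(self, xoff_size=18000, xon_delta=0):
        self.xoff_threshold = xoff_size
        self.xon_threshold = xoff_size - xon_delta
        self.paused = False

    def update(self, occupancy_bytes):
        """Return "XOFF", "XON", or None for a new occupancy sample."""
        if not self.paused and occupancy_bytes >= self.xoff_threshold:
            self.paused = True
            return "XOFF"
        if self.paused and occupancy_bytes < self.xon_threshold:
            self.paused = False
            return "XON"
        return None


if __name__ == "__main__":
    pfc = PfcHysteresis()
    for sample in (10000, 18000, 20000, 17999, 5000):
        print(sample, pfc.update(sample))
```

    With xon_delta = 0 as configured above, the resume threshold equals the pause threshold, so the sender is released as soon as occupancy drops below 18000 bytes.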

    5. Configure weighted round-robin (to set the traffic group scheduling weight)

    ## File: /etc/cumulus/datapath/traffic.conf

    scheduling.algorithm = dwrr

    priority_group.control.weight = 0

    priority_group.service.weight = 16

    priority_group.bulk.weight = 16
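    Under DWRR, each weighted group receives bandwidth in proportion to its weight when all groups are backlogged. The sketch below computes those shares for the weights above. It treats weight 0 as "not part of the weighted round-robin" (commonly strict priority), which is an assumption here; the function name is hypothetical.

```python
# Mirrors the priority_group.<name>.weight lines above.
WEIGHTS = {"control": 0, "service": 16, "bulk": 16}


def dwrr_shares(weights):
    """Return each group's share of the weighted bandwidth.

    Groups with weight 0 get None: they are assumed to be scheduled
    outside the weighted round-robin (for example, strict priority).
    """
    total = sum(w for w in weights.values() if w > 0)
    if total == 0:
        raise ValueError("at least one group needs a non-zero weight")
    return {name: (w / total if w > 0 else None) for name, w in weights.items()}


if __name__ == "__main__":
    print(dwrr_shares(WEIGHTS))
```

    With equal weights of 16, the service (RoCE) and bulk groups each get half of the bandwidth left over after control traffic.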

    6. Assign group IDs, create buffer pools

    ## File: /usr/lib/python2.7/dist-packages/cumulus/__chip_config/mlx/datapath.conf

    priority_group.control.id = 0

    priority_group.service.id = 0

    priority_group.bulk.id = 0

    priority_group.control.service_pool = 0

    priority_group.service.service_pool = 0

    priority_group.bulk.service_pool = 0

    flow_control.service_pool = 1

    ingress_service_pool.0.percent = 50.0  # all priority groups

    ingress_service_pool.0.mode = 1 # dynamic buffering

    ingress_service_pool.1.percent = 50.0 # all lossless traffic

    ingress_service_pool.1.mode = 1 # dynamic buffering

    egress_service_pool.0.percent = 50.0 # all lossy priority groups, UC and MC

    egress_service_pool.0.mode = 1

    egress_service_pool.1.percent = 100.0 # all lossless priority groups

    egress_service_pool.1.mode = 1
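    When editing datapath.conf by hand, it can help to read the settings back and confirm which pool each group points to. The following is a minimal sketch of a parser for "key = value # comment" lines; it is a hypothetical helper, not a Cumulus tool, and the embedded fragment simply repeats some of the lines from this step.

```python
FRAGMENT = """\
priority_group.control.service_pool = 0
priority_group.service.service_pool = 0
priority_group.bulk.service_pool = 0
flow_control.service_pool = 1
ingress_service_pool.0.percent = 50.0  # all priority groups
ingress_service_pool.1.percent = 50.0 # all lossless traffic
"""


def parse_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    conf = parse_conf(FRAGMENT)
    for key in sorted(conf):
        print(key, "=", conf[key])
```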

    7. Configure alpha values

    Note: the values are set to maximize performance and shouldn't be altered without consulting Mellanox

    ## File: /usr/lib/python2.7/dist-packages/cumulus/__chip_config/mlx/datapath.conf

    For ingress buffer:

    priority_group.control.ingress_buffer.dynamic_quota = 11
    priority_group.service.ingress_buffer.dynamic_quota = 11
    priority_group.bulk.ingress_buffer.dynamic_quota = 11
    flow_control.ingress_buffer.dynamic_quota = 9 # RoCE related configuration

    For Unicast egress buffer:

    priority_group.bulk.egress_buffer.uc.sp_dynamic_quota = 11

    priority_group.service.egress_buffer.uc.sp_dynamic_quota = 11

    priority_group.control.egress_buffer.uc.sp_dynamic_quota = 11

    For Multicast egress buffer:

    priority_group.bulk.egress_buffer.mc.sp_dynamic_quota    = 9
    priority_group.service.egress_buffer.mc.sp_dynamic_quota = 9
    priority_group.control.egress_buffer.mc.sp_dynamic_quota = 9
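    As background for why these values matter: in a dynamic-threshold shared buffer, the amount a single queue may occupy is typically alpha times the currently free space in the pool, so a larger alpha lets one queue take a larger share. The sketch below shows that general relation only. How a dynamic_quota setting translates into an alpha is not stated in this post, so the alpha values used here are purely illustrative assumptions.

```python
def dynamic_threshold(alpha, pool_size_bytes, pool_used_bytes):
    """Return the per-queue occupancy limit under a dynamic-threshold scheme.

    limit = alpha * (free space left in the shared pool)
    """
    free_bytes = max(pool_size_bytes - pool_used_bytes, 0)
    return alpha * free_bytes


if __name__ == "__main__":
    pool_size = 1000000  # illustrative pool size in bytes
    for alpha in (0.5, 1, 2):  # illustrative alphas, not the configured quotas
        print(alpha, dynamic_threshold(alpha, pool_size, 400000))
```

    As the pool fills up, the free space shrinks and every queue's limit shrinks with it, which is what keeps one congested queue from starving the others.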

    8. Apply the configuration

    systemctl restart switchd.service