HowTo Configure QoS on Mellanox Switches (SwitchX)

Version 18

    This post shows how to configure QoS on Mellanox switches (SwitchX IC).


    The example assumes that there are two types of traffic in your network:

    • RoCE  (Lossless L2 traffic)
    • TCP (Lossy L2 traffic)


    This post assumes that you already know how to configure your Ethernet network to run with PFC.
    If not, refer to the following posts:


    The setup used here is based on the setup example in the following post:


    PFC is enabled on priority 3, which is used only for the RoCE application, whereas TCP traffic is sent over priority 0.



    • 4x Hosts
    • 4x ConnectX-3, MLNX_OFED 2.1, RH6.4 (or later)
    • 1x SX1036 switch system (or any other Mellanox Ethernet switch), MLNX-OS 3.3.4304 (or later)



    1. RoCE Network, VLAN100 (lossless) -
    2. TCP Network, VLAN200 (lossy) -


    Host Functions

    1. 2x Application servers (connected to VLAN100 and VLAN200)
    2. 1x Web server (Connected to VLAN200)
    3. 1x Storage back-end server (Connected to VLAN100)


    Network Flows

    1. Web -  App-1 (TCP) on VLAN200
    2. Web -  App-2 (TCP) on VLAN200
    3. App-1 -  Storage (RoCE) on VLAN100
    4. App-2 -  Storage (RoCE) on VLAN100




    Switch Configuration - PFC

    Create the VLANs and set the switch ports to hybrid (or trunk) mode. Run:

    switch (config) # vlan 100

    switch (config vlan 100) # exit

    switch (config) # vlan 200

    switch (config vlan 200) # exit

    switch (config) # interface ethernet 1/1-1/4 switchport mode hybrid

    switch (config) # interface ethernet 1/1 switchport hybrid allowed-vlan all

    switch (config) # interface ethernet 1/2 switchport hybrid allowed-vlan all

    switch (config) # interface ethernet 1/3 switchport hybrid allowed-vlan all

    switch (config) # interface ethernet 1/4 switchport hybrid allowed-vlan all
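    The per-port lines above all follow one pattern, so for a larger port count they can be generated instead of typed by hand. The following is a minimal Python sketch that only prints the same commands used above. The function name vlan_hybrid_commands and its parameters are hypothetical and not part of MLNX-OS, and the sketch assumes that the single-port form of the "switchport mode hybrid" command is accepted in the same way as the range form shown above.

```python
def vlan_hybrid_commands(vlans, ports, slot=1):
    """Return the CLI lines from this section for the given VLANs and ports.

    vlans: VLAN IDs to create, e.g. [100, 200]
    ports: port numbers on the given slot, e.g. [1, 2, 3, 4]
    """
    lines = []
    # Create each VLAN and leave its configuration context again.
    for vlan in vlans:
        lines.append(f"vlan {vlan}")
        lines.append("exit")
    # Put every port in hybrid mode and allow all VLANs on it.
    for port in ports:
        lines.append(f"interface ethernet {slot}/{port} switchport mode hybrid")
        lines.append(
            f"interface ethernet {slot}/{port} switchport hybrid allowed-vlan all"
        )
    return lines


if __name__ == "__main__":
    for line in vlan_hybrid_commands([100, 200], [1, 2, 3, 4]):
        print(line)
```

    Review the printed lines before pasting them into a "switch (config) #" session.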



    Enable PFC. Run:

    switch (config) # dcb priority-flow-control enable

    switch (config) # dcb priority-flow-control priority 3 enable
    switch (config) # interface ethernet 1/1-1/4 dcb priority-flow-control mode on force


    Verify PFC configuration. Run:

    switch (config) # show dcb priority-flow-control

    PFC enabled

    Priority Enabled List   :3

    Priority Disabled List  :0 1 2 4 5 6 7

    TC     Lossless

    ---    ----------

    0           N

    1           Y

    2           Y

    3           N

    Interface      PFC admin        PFC oper

    ------------  --------------   -------------


    1/1            On               Enabled

    1/2            On               Enabled

    1/3            On               Enabled

    1/4            On               Enabled

    switch (config) #


    Switch Configuration - QoS:

    QoS configuration is composed of three steps:

    1. Set egress scheduling mode
    2. Map priority to Traffic Class (TC)
    3. Set the bandwidth limit (weight) per TC


    Egress Scheduling:

    The egress scheduling mode can be configured to weighted round robin (WRR) or strict priority (SP). WRR mode is enabled by default.


    To change the egress scheduling mode to SP, run:

    switch (config) # no dcb ets enable



    Priority mapping to TC:

    There are 8 priorities (0-7) that can be mapped to 4 TCs. The default mapping of priorities to TCs is as follows:

    • Priority 0,1 mapped to TC 0
    • Priority 2,3 mapped to TC 1
    • Priority 4,5 mapped to TC 2
    • Priority 6,7 mapped to TC 3


    Note: TC0 and TC3 are lossy TCs, while TC1 and TC2 can be either lossless or lossy, depending on the PFC configuration. It is possible, but not recommended, to map PFC-enabled priorities (lossless traffic) to TC0 or TC3.
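    The default mapping listed above pairs consecutive priorities, so it can be written as an integer division by two. The following Python sketch encodes that mapping together with the rule from the note, and flags PFC-enabled priorities that would land on a lossy-only TC. The names default_tc, LOSSLESS_CAPABLE_TCS and lossy_mapped_pfc_priorities are hypothetical helpers for illustration, not switch commands.

```python
# Per the note above: TC0 and TC3 are always lossy, TC1 and TC2 can be lossless.
LOSSLESS_CAPABLE_TCS = {1, 2}


def default_tc(priority):
    """Default priority-to-TC mapping: 0,1 -> 0; 2,3 -> 1; 4,5 -> 2; 6,7 -> 3."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in the range 0-7")
    return priority // 2


def lossy_mapped_pfc_priorities(pfc_priorities, mapping=default_tc):
    """Return the PFC-enabled priorities that map to a lossy-only TC."""
    return [p for p in pfc_priorities if mapping(p) not in LOSSLESS_CAPABLE_TCS]


if __name__ == "__main__":
    # Priority 3 (RoCE in this setup) maps to TC1, which can be lossless.
    print(default_tc(3), lossy_mapped_pfc_priorities([3]))
```

    With the setup in this post, priority 3 maps to TC 1, so the check returns an empty list. A PFC-enabled priority 6, for example, would be reported because it maps to TC 3.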

    It is possible to change the priority mapping, for example if you wish to dedicate a TC to a single priority or to map several priorities to more than one TC.

    To configure different mapping, run the following command per ingress interface:

    switch (config interface ethernet 1/1) # vlan map-priority 3 traffic-class 1

    switch (config interface ethernet 1/1) #


    Set bandwidth limit per TC:

    By default, each TC is allotted 25% of the total bandwidth. However, it is possible to change the bandwidth limit (bucket) per TC.


    To change the bandwidth limit per TC, run:


    switch (config) # dcb ets tc bandwidth 20 20 30 30

    switch (config) # show dcb ets


    ETS enabled


    TC Bandwidth


    0  20%

    1  20%

    2  30%

    3  30%

    Number of Traffic Class: 4


    switch (config) #



    Note: If the scheduling mode is SP and not WRR (ETS), this configuration has no effect.
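    Before applying a new split, it can help to sanity-check the numbers. The following Python sketch builds the "dcb ets tc bandwidth" line from four percentages. The function name ets_bandwidth_command is hypothetical, and the check that the four values add up to 100 is an assumption based on the examples in this post (20 20 30 30), not a documented rule quoted here.

```python
def ets_bandwidth_command(weights):
    """Build the 'dcb ets tc bandwidth' line for four per-TC percentages."""
    if len(weights) != 4:
        raise ValueError("expected one percentage for each of the 4 TCs")
    if any(not isinstance(w, int) or w < 0 for w in weights):
        raise ValueError("percentages must be non-negative integers")
    # Assumption: the per-TC shares must add up to 100%.
    if sum(weights) != 100:
        raise ValueError(f"percentages sum to {sum(weights)}, expected 100")
    return "dcb ets tc bandwidth " + " ".join(str(w) for w in weights)


if __name__ == "__main__":
    print(ets_bandwidth_command([20, 20, 30, 30]))
```

    The returned string is meant to be entered at the "switch (config) #" prompt while ETS (WRR) is enabled.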



    QoS needs may differ from one application to another. You may want to experiment with the configuration parameters and see their effect on your network.


    For example, to give one application higher bandwidth than the other:

    1. Configure each application to run on a different priority in the network (with or without PFC enabled, depending on whether the application is lossless or lossy)
    2. Work in WRR mode (ETS must be enabled - this is the case by default)
    3. Map each priority to a different TC. For example, map priority 0 to TC 0 and priority 5 to TC 2 (this is the mapping by default)
    4. Change the bandwidth limit per TC to reflect the applications' needs (you may want to test different options here). For example, configure TCs 0,1,3 with 1% bandwidth and TC 2 with 97%.

    switch (config) # dcb ets tc bandwidth 1 1 97 1



    Here is a screenshot example of such a test on 10GbE links.
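    The expected split for the 1 1 97 1 example can also be estimated by hand. The Python sketch below does the arithmetic for a 10GbE link, assuming that all four TCs have traffic queued at the same time and that the ETS weights then act as proportional shares of the link. This is an estimate, not a measured result, and the function name share_gbps is hypothetical.

```python
def share_gbps(weights, link_gbps=10.0):
    """Per-TC share of the link when every TC has traffic queued.

    Assumption: under full congestion the link is divided in proportion
    to the configured weights. A TC that is idle does not use its share.
    """
    total = sum(weights)
    return [link_gbps * w / total for w in weights]


if __name__ == "__main__":
    for tc, gbps in enumerate(share_gbps([1, 1, 97, 1])):
        print(f"TC {tc}: {gbps:.1f} Gb/s")
```

    Under these assumptions, TC 2 gets about 9.7 Gb/s and each of the other TCs about 0.1 Gb/s. Actual throughput depends on the traffic offered to each TC.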




    Server Configuration
    For server configuration, refer to HowTo Run RoCE and TCP over L2 enabled with PFC