This post shows how to configure QoS on Mellanox switches (SwitchX IC).
For this example, we assume there are two types of traffic in your network:
- RoCE (Lossless L2 traffic)
- TCP (Lossy L2 traffic)
This post assumes you already know how to configure your Ethernet network to run with PFC.
If not, refer to the following posts:
- HowTo Run RoCE over L2 Enabled with PFC
- HowTo Run RoCE and TCP over L2 Enabled with PFC
- For more information about the subject, refer to http://www.mellanox.com/related-docs/prod_software/RoCE_with_Priority_Flow_Control_Application_Guide.pdf
- End-to-End QoS Configuration for Mellanox Switches and Adapters
The setup example here is based on the setup example of the following post:
Priority 3 is PFC-enabled and used only for the RoCE application, whereas TCP traffic is sent over priority 0.
- 4x Hosts
- 4x ConnectX-3, MLNX_OFED 2.1, RH6.4 (or later)
- 1x SX1036 switch system (or any other Mellanox Ethernet switch), MLNX-OS 3.3.4304 (or later)
- RoCE Network, VLAN100 (lossless) - 126.96.36.199
- TCP Network, VLAN200 (lossy) - 188.8.131.52
- 2x Application servers (connected to VLAN100 and VLAN200)
- 1x Web server (Connected to VLAN200)
- 1x Storage back-end server (Connected to VLAN100)
- Web - App-1 (TCP) on VLAN200
- Web - App-2 (TCP) on VLAN200
- App-1 - Storage (RoCE) on VLAN100
- App-2 - Storage (RoCE) on VLAN100
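To make the example setup concrete, the flows above can be written down as data. This is a minimal Python sketch. It assumes, as stated above, that VLAN 100 traffic is sent with priority 3, which is the only PFC-enabled priority, and that VLAN 200 traffic is sent with priority 0. The names `Flow`, `VLAN_PRIORITY`, and `is_lossless` are hypothetical and exist only for illustration. The VLAN-to-priority marking itself is done on the servers, as described in the server configuration posts.

```python
from dataclasses import dataclass

# Assumption from the setup description: RoCE on VLAN 100 uses priority 3,
# TCP on VLAN 200 uses priority 0, and only priority 3 has PFC enabled.
VLAN_PRIORITY = {100: 3, 200: 0}
PFC_PRIORITIES = {3}


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    protocol: str  # "RoCE" or "TCP"
    vlan: int


FLOWS = [
    Flow("Web", "App-1", "TCP", 200),
    Flow("Web", "App-2", "TCP", 200),
    Flow("App-1", "Storage", "RoCE", 100),
    Flow("App-2", "Storage", "RoCE", 100),
]


def is_lossless(flow: Flow) -> bool:
    """A flow is lossless only if its VLAN's priority has PFC enabled."""
    return VLAN_PRIORITY[flow.vlan] in PFC_PRIORITIES


if __name__ == "__main__":
    for f in FLOWS:
        kind = "lossless" if is_lossless(f) else "lossy"
        print(f"{f.src} - {f.dst} ({f.protocol}) VLAN{f.vlan}, "
              f"priority {VLAN_PRIORITY[f.vlan]}: {kind}")
```

In this setup, only the RoCE flows to the storage back-end end up on a lossless priority.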
Switch Configuration - PFC
Create the VLANs and set the switch ports to hybrid (or trunk) mode. Run:
switch (config) # vlan 100
switch (config vlan 100) # exit
switch (config) # vlan 200
switch (config vlan 200) # exit
switch (config) # interface ethernet 1/1-1/4 switchport mode hybrid
switch (config) # interface ethernet 1/1 switchport hybrid allowed-vlan all
switch (config) # interface ethernet 1/2 switchport hybrid allowed-vlan all
switch (config) # interface ethernet 1/3 switchport hybrid allowed-vlan all
switch (config) # interface ethernet 1/4 switchport hybrid allowed-vlan all
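With more ports or VLANs, it can help to generate the command sequence above rather than typing it by hand. The following Python sketch only renders the same MLNX-OS commands as text and does not connect to the switch. The function name `vlan_hybrid_commands` and its parameters are hypothetical. The sketch also assumes that all ports are on slot 1, as in this example, and it omits the `switch (config) #` prompt.

```python
def vlan_hybrid_commands(vlans, first_port, last_port, slot=1):
    """Render the VLAN and hybrid-port commands shown above as plain text."""
    cmds = []
    for vlan in vlans:
        cmds.append(f"vlan {vlan}")
        cmds.append("exit")  # leave the "config vlan N" context
    port_range = f"{slot}/{first_port}-{slot}/{last_port}"
    cmds.append(f"interface ethernet {port_range} switchport mode hybrid")
    # allowed-vlan is set per port, one command per interface
    for port in range(first_port, last_port + 1):
        cmds.append(
            f"interface ethernet {slot}/{port} switchport hybrid allowed-vlan all")
    return cmds


if __name__ == "__main__":
    for line in vlan_hybrid_commands([100, 200], 1, 4):
        print(line)
```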
Enable PFC. Run:
switch (config) # dcb priority-flow-control enable
switch (config) # dcb priority-flow-control priority 3 enable
switch (config) # interface ethernet 1/1-1/4 dcb priority-flow-control mode on force
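The PFC commands above can be rendered in the same way for any set of lossless priorities. `pfc_commands` is a hypothetical helper that only builds text, and its range check reflects the eight priorities (0-7) that the switch supports.

```python
def pfc_commands(priorities, first_port, last_port, slot=1):
    """Render the global and per-port PFC commands shown above."""
    cmds = ["dcb priority-flow-control enable"]
    for priority in sorted(priorities):
        if not 0 <= priority <= 7:
            raise ValueError(f"priority {priority} is outside 0-7")
        cmds.append(f"dcb priority-flow-control priority {priority} enable")
    ports = f"{slot}/{first_port}-{slot}/{last_port}"
    cmds.append(
        f"interface ethernet {ports} dcb priority-flow-control mode on force")
    return cmds


if __name__ == "__main__":
    print("\n".join(pfc_commands({3}, 1, 4)))
```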
Verify PFC configuration. Run:
switch (config)# show dcb priority-flow-control
Priority Enabled List :3
Priority Disabled List :0 1 2 4 5 6 7
Interface PFC admin PFC oper
------------ -------------- -------------
1/1 On Enabled
1/2 On Enabled
1/3 On Enabled
1/4 On Enabled
switch (config) #
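If you collect this output from many switches, a small parser can flag ports where PFC is not operational. This Python sketch assumes the layout shown above: a `Priority Enabled List :` line followed by `interface admin oper` rows. Other MLNX-OS versions may format the output differently. The sketch derives the disabled list as the complement of the enabled list, so it does not depend on how that line is wrapped. `parse_pfc_show` is a hypothetical name.

```python
import re

# Sample output, based on the verification step above.
SAMPLE = """\
Priority Enabled List :3
Priority Disabled List :0 1 2 4 5 6 7
Interface PFC admin PFC oper
------------ -------------- -------------
1/1 On Enabled
1/2 On Enabled
1/3 On Enabled
1/4 On Enabled
"""

ENABLED_RE = re.compile(r"Priority Enabled List\s*:\s*([\d\s]*)$")
ROW_RE = re.compile(r"(\d+/\d+)\s+(\S+)\s+(\S+)$")  # e.g. "1/1 On Enabled"


def parse_pfc_show(text):
    """Return (PFC-enabled priorities, {interface: (admin, oper)})."""
    enabled = set()
    interfaces = {}
    for line in text.splitlines():
        line = line.strip()
        m = ENABLED_RE.match(line)
        if m:
            enabled = {int(p) for p in m.group(1).split()}
            continue
        m = ROW_RE.match(line)
        if m:
            interfaces[m.group(1)] = (m.group(2), m.group(3))
    return enabled, interfaces


if __name__ == "__main__":
    enabled, interfaces = parse_pfc_show(SAMPLE)
    disabled = sorted(set(range(8)) - enabled)
    not_ready = [i for i, (_, oper) in interfaces.items() if oper != "Enabled"]
    print("lossless:", sorted(enabled), "lossy:", disabled)
    print("interfaces without PFC operational:", not_ready or "none")
```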
Switch Configuration - QoS
QoS configuration consists of three steps:
- Set egress scheduling mode
- Map priority to Traffic Class (TC)
- Set bandwidth limit (weight) per TC
The egress scheduling mode can be set to either weighted round robin (WRR), which is provided by ETS, or strict priority (SP). WRR (ETS) mode is enabled by default.
To change the egress scheduling mode to SP, run:
switch (config)# no dcb ets enable
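The difference between the two modes can be illustrated with a toy model of one congested egress port. This Python sketch is a simplification and not the SwitchX scheduler: it counts packets rather than bytes and uses integer weights per round. It also assumes that under SP a higher-numbered TC is served first. The function names are hypothetical.

```python
from collections import deque


def strict_priority(queues, slots):
    """Each slot, send one packet from the highest-numbered non-empty TC.

    Assumption: a higher TC number means higher priority under SP.
    """
    sent = {tc: 0 for tc in queues}
    for _ in range(slots):
        for tc in sorted(queues, reverse=True):
            if queues[tc]:
                queues[tc].popleft()
                sent[tc] += 1
                break
    return sent


def weighted_round_robin(queues, weights, slots):
    """Each round, TC n may send up to weights[n] packets."""
    if any(weights[tc] <= 0 for tc in queues):
        raise ValueError("weights must be positive")
    sent = {tc: 0 for tc in queues}
    remaining = slots
    while remaining > 0 and any(queues.values()):
        for tc in sorted(queues):
            for _ in range(weights[tc]):
                if remaining == 0 or not queues[tc]:
                    break
                queues[tc].popleft()
                sent[tc] += 1
                remaining -= 1
    return sent


def backlog(n=100):
    """Two TCs, each with n packets waiting."""
    return {0: deque(range(n)), 1: deque(range(n))}


if __name__ == "__main__":
    print("SP: ", strict_priority(backlog(), 20))
    print("WRR:", weighted_round_robin(backlog(), {0: 1, 1: 3}, 20))
```

Under SP, TC 0 is starved as long as TC 1 has traffic. Under WRR with weights 1:3, TC 0 still gets a quarter of the transmission slots.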
Priority mapping to TC:
There are 8 priorities (0-7) that can be mapped to 4 TCs. The default mapping of priorities to TCs is as follows:
- Priority 0,1 mapped to TC 0
- Priority 2,3 mapped to TC 1
- Priority 4,5 mapped to TC 2
- Priority 6,7 mapped to TC 3
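The default mapping above follows a simple rule: priorities 2n and 2n+1 share TC n. This Python sketch encodes that rule and inverts it to list the priorities in each TC. The names are hypothetical.

```python
from collections import defaultdict

# Default mapping described above: priorities 2n and 2n+1 share TC n.
DEFAULT_TC_OF = {priority: priority // 2 for priority in range(8)}


def priorities_per_tc(tc_of):
    """Invert a priority -> TC mapping into TC -> sorted list of priorities."""
    groups = defaultdict(list)
    for priority, tc in sorted(tc_of.items()):
        groups[tc].append(priority)
    return dict(groups)


if __name__ == "__main__":
    for tc, prios in sorted(priorities_per_tc(DEFAULT_TC_OF).items()):
        print(f"TC {tc}: priorities {prios}")
```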
Note: TC0 and TC3 are lossy TCs, while TC1 and TC2 can be either lossless or lossy, depending on the PFC configuration. It is possible, but not recommended, to map PFC-enabled priorities (lossless traffic) to TC0 or TC3.
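The recommendation in the note can be checked mechanically before you apply a new mapping. This Python sketch assumes the default mapping and treats only TC1 and TC2 as lossless-capable, as the note states. `check_pfc_mapping` is a hypothetical helper.

```python
# Per the note above: only TC1 and TC2 can be lossless.
LOSSLESS_CAPABLE_TCS = {1, 2}
DEFAULT_TC_OF = {priority: priority // 2 for priority in range(8)}


def check_pfc_mapping(pfc_priorities, tc_of=DEFAULT_TC_OF):
    """Return a warning for each PFC-enabled priority mapped to a lossy-only TC."""
    warnings = []
    for priority in sorted(pfc_priorities):
        tc = tc_of[priority]
        if tc not in LOSSLESS_CAPABLE_TCS:
            warnings.append(
                f"priority {priority} has PFC enabled but maps to lossy TC{tc}")
    return warnings


if __name__ == "__main__":
    print(check_pfc_mapping({3}))     # the setup in this post
    print(check_pfc_mapping({3, 6}))  # priority 6 would land in TC3
```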
You can change the priority mapping, for example to dedicate a TC to a single priority or to group priorities into TCs differently.
To configure a different mapping, run the following command on each ingress interface:
switch (config interface ethernet 1/1) # vlan map-priority 3 traffic-class 1
switch (config interface ethernet 1/1) #
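Because the mapping is configured per ingress interface, you can think of each interface as having the default table with a few overrides applied on top. This Python sketch models one interface that way. It assumes that an override changes only the listed priority and that all other priorities keep their default TC. The example of moving priority 4 to TC 0 so that TC 2 carries only priority 5 is hypothetical. The helper names are also hypothetical, and the rendered command follows the syntax shown above.

```python
DEFAULT_TC_OF = {priority: priority // 2 for priority in range(8)}


def effective_mapping(overrides):
    """Apply per-interface overrides (priority -> TC) on top of the default."""
    mapping = dict(DEFAULT_TC_OF)
    for priority, tc in overrides.items():
        if not 0 <= priority <= 7 or not 0 <= tc <= 3:
            raise ValueError(f"invalid mapping {priority} -> {tc}")
        mapping[priority] = tc
    return mapping


def map_priority_commands(overrides):
    """Render the per-interface commands, in the syntax shown above."""
    return [f"vlan map-priority {p} traffic-class {tc}"
            for p, tc in sorted(overrides.items())]


if __name__ == "__main__":
    # Hypothetical example: move priority 4 to TC 0 so that priority 5
    # is the only priority left in TC 2.
    overrides = {4: 0}
    mapping = effective_mapping(overrides)
    print([p for p, tc in sorted(mapping.items()) if tc == 2])
    print(map_priority_commands(overrides))
```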
Set bandwidth limit per TC:
By default, each TC is allocated 25% of the total bandwidth. However, you can change the bandwidth limit (bucket) per TC.
To change the bandwidth limit per TC, run:
switch (config)# dcb ets tc bandwidth 20 20 30 30
switch (config) # show dcb ets
Number of Traffic Class: 4
switch (config) #
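The values in `dcb ets tc bandwidth` are given in TC order (TC0 to TC3), as the `1 1 97 1` example later in this post confirms. This Python sketch converts them into the bandwidth each TC is guaranteed when all TCs are congested. It assumes a 40 Gb/s port and that the four percentages must sum to 100, as they do in both examples here. `ets_guarantees` is a hypothetical name.

```python
def ets_guarantees(percentages, link_gbps):
    """Bandwidth (Gb/s) each TC is guaranteed when all TCs are congested."""
    if len(percentages) != 4:
        raise ValueError("expected one percentage per TC (TC0-TC3)")
    if sum(percentages) != 100:
        raise ValueError(f"percentages sum to {sum(percentages)}, expected 100")
    return [link_gbps * pct / 100 for pct in percentages]


if __name__ == "__main__":
    # "dcb ets tc bandwidth 20 20 30 30" on an assumed 40 Gb/s port
    for tc, gbps in enumerate(ets_guarantees([20, 20, 30, 30], 40)):
        print(f"TC {tc}: {gbps:.1f} Gb/s")
```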
Note: If the scheduling mode is SP and not WRR (ETS), this configuration has no effect.
QoS needs differ from one application to another, so you may want to experiment with the configuration parameters and observe their effect on your network.
For example, to give one application higher bandwidth than the other:
- Configure each application to run on a different priority in the network (with or without PFC enabled, depending on whether the application is lossless or lossy)
- Work in WRR mode (ETS must be enabled, which is the default)
- Map each priority to a different TC. For example, map priority 0 to TC 0 and priority 5 to TC 2 (this is the default mapping)
- Change the bandwidth limit per TC to reflect the applications' needs (you may want to test different options here). For example, configure TCs 0, 1, and 3 with 1% of the bandwidth each and TC 2 with 97%.
switch (config)# dcb ets tc bandwidth 1 1 97 1
Here is a screenshot example of such a test on 10GbE links.
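A 1/1/97/1 split does not necessarily cap TC 0 at 1% when TC 2 is idle. ETS, as defined in IEEE 802.1Qaz, is work-conserving, so bandwidth that one TC does not use is shared among the others. Assuming the switch behaves that way, this weighted max-min ("water-filling") sketch estimates per-TC throughput on a 10 Gb/s link for a given set of demands. `ets_allocate` is a hypothetical name, and the demand figures are illustrative, not measurements.

```python
def ets_allocate(weights, demands, link_gbps):
    """Weighted max-min share of link_gbps among TCs with the given demands."""
    if any(w <= 0 for w in weights):
        raise ValueError("this sketch assumes every weight is positive")
    alloc = [0.0] * len(weights)
    active = {tc for tc, d in enumerate(demands) if d > 0}
    capacity = float(link_gbps)
    while active and capacity > 1e-12:
        total_weight = sum(weights[tc] for tc in active)
        share = {tc: capacity * weights[tc] / total_weight for tc in active}
        # TCs whose remaining demand fits in their share are fully served.
        satisfied = {tc for tc in active if demands[tc] - alloc[tc] <= share[tc]}
        if not satisfied:
            # Every active TC wants more than its share: split by weight.
            for tc in active:
                alloc[tc] += share[tc]
            break
        for tc in satisfied:
            capacity -= demands[tc] - alloc[tc]
            alloc[tc] = float(demands[tc])
        active -= satisfied
    return alloc


if __name__ == "__main__":
    weights = [1, 1, 97, 1]  # dcb ets tc bandwidth 1 1 97 1
    # TC 0 (priority 0) and TC 2 (priority 5) both try to send at line rate.
    print(ets_allocate(weights, [10, 0, 10, 0], 10))
    # TC 2 is idle, so TC 0 can use the whole link.
    print(ets_allocate(weights, [10, 0, 0, 0], 10))
```

When both TCs are saturated, TC 2 gets about 9.9 Gb/s and TC 0 about 0.1 Gb/s. When TC 2 stops sending, TC 0 can use the full link.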
For server configuration, refer to HowTo Run RoCE and TCP over L2 Enabled with PFC.