This is an archived document. Please refer to the more recent knowledge base articles on Getting Started with RoCE Configuration.
This post shows how to configure RoCE v2 end to end, starting with ConnectX-3 Pro adapters, over Mellanox SwitchX-based switches configured with L3 (OSPF).
- RoCE v2 Considerations
- HowTo Configure OSPF on Mellanox Switches (Running-Config)
- HowTo Run RoCE over L2 Enabled with PFC
The network in this setup consists of four Mellanox switches, L3 enabled and configured with OSPF.
1. The running config of this setup can be found in the HowTo Configure OSPF on Mellanox Switches (Running-Config) post.
2. In addition, it is recommended to enable PFC on all router ports for the lossless priority (e.g. 3) used for the RoCE application:
switch (config) # dcb priority-flow-control enable
switch (config) # dcb priority-flow-control priority 3 enable
switch (config) # interface ethernet 1/1-1/36 dcb priority-flow-control mode on force
For additional information about PFC configuration refer to HowTo Run RoCE over L2 Enabled with PFC.
3. By default, the router performs DSCP-to-PCP (L2 priority) mapping using a fixed mapping. To map the PCP of one network to the PCP of the other network instead (preserving the priority), run the following command on all switches:
switch (config) # qos map dscp-to-pcp preserve-pcp
Note: This command is applicable only for Mellanox switches based on SwitchX IC.
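To make the default "fixed mapping" more concrete, the sketch below assumes the common convention of deriving the 3-bit PCP from the three most significant bits of the 6-bit DSCP value. This is an assumption, not the documented SwitchX table, and dscp_to_pcp is a hypothetical helper. Under that assumption, traffic marked with DSCP 24 lands on PCP 3, the lossless priority used in step 2.

```shell
# Hypothetical helper: fixed DSCP -> PCP mapping, ASSUMING PCP = DSCP >> 3
# (the three most significant bits of the 6-bit DSCP field).
dscp_to_pcp() {
  dscp=$1
  # Accept only non-negative integers
  case $dscp in
    ''|*[!0-9]*) echo "not a number: $dscp" >&2; return 1 ;;
  esac
  # DSCP is a 6-bit field, so only 0..63 is valid
  if [ "$dscp" -gt 63 ]; then
    echo "DSCP out of range: $dscp" >&2
    return 1
  fi
  echo $(( dscp >> 3 ))
}

# Print a few entries of the table under this assumption
for d in 0 8 16 24 32 40 46 48 56; do
  printf 'DSCP %2d -> PCP %d\n' "$d" "$(dscp_to_pcp "$d")"
done
```

With "qos map dscp-to-pcp preserve-pcp", this derivation is no longer used: the PCP received on one side is carried over to the other.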
4. The switches sx01 and sx02 in the example above perform ECMP (multi-path) load sharing. The default load-sharing hash function is based on the source IP and UDP/TCP port, the destination IP and UDP/TCP port, and the traffic class (the "all" option in the CLI).
(Optional) To change the load-sharing function, use the ip load-sharing switch command:
sx01 (config) # ip load-sharing ?
source-ip-port source ip and TCP/UDP port
destination-ip-port destination ip and TCP/UDP port
source-destination-ip-port source & destination ip and TCP/UDP port
traffic-class traffic class
all all options
sx01 (config) # show ip load-sharing
Load sharing: all
sx01 (config) #
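The idea behind hash-based load sharing can be sketched as follows: the header fields selected by ip load-sharing are hashed, and the hash modulo the number of equal-cost next hops selects the path, so every packet of a given flow follows the same path. The sketch uses cksum (the POSIX CRC tool) purely as a stand-in; the real SwitchX hash function is not described in this post, and ecmp_pick_path and the example addresses and ports are hypothetical.

```shell
# Hypothetical sketch of ECMP path selection: hash the selected header
# fields and take the result modulo the number of equal-cost next hops.
# cksum (POSIX CRC) stands in for the switch's real, unspecified hash.
ecmp_pick_path() {
  # $1 = number of next hops, remaining args = fields chosen by the hash mode
  n_paths=$1
  shift
  if [ "$n_paths" -lt 1 ]; then
    echo "need at least one path" >&2
    return 1
  fi
  # Join the fields with a separator and keep only the CRC value
  hash=$(printf '%s|' "$@" | cksum | cut -d ' ' -f 1)
  echo $(( hash % n_paths ))
}

# "source-destination-ip-port" style input: src IP, src port, dst IP, dst port
# (documentation addresses and arbitrary ports, not the ones from this setup)
ecmp_pick_path 2 192.0.2.10 50000 198.51.100.20 50001
```

Because the same fields always give the same hash, a single RoCE flow is not split across paths; only different flows are spread.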
Server Configuration (Linux)
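Set RoCE mode 2 (RoCE v2) for the mlx4_core driver by adding the following line to a modprobe configuration file under /etc/modprobe.d/ (module parameters take effect when the driver is reloaded):
options mlx4_core roce_mode=2
Alternatively, to also set the rr_proto parameter, use for example:
options mlx4_core roce_mode=2 rr_proto=23456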
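Writing that options line can be scripted. A minimal sketch, assuming the file name /etc/modprobe.d/mlx4_core.conf and assuming the MLNX_OFED openibd service is used to reload the driver (both assumptions here); write_mlx4_roce_opts is a hypothetical helper, and the file path is a parameter so it can be tried on a scratch file first.

```shell
# Hypothetical helper: write the mlx4_core RoCE options line.
# $1 = config file (e.g. /etc/modprobe.d/mlx4_core.conf, ASSUMED name)
# $2 = roce_mode value (2 = RoCE v2 in this post)
# $3 = optional rr_proto value (e.g. 23456, as in the example above)
write_mlx4_roce_opts() {
  conf=$1 mode=$2 proto=$3
  line="options mlx4_core roce_mode=$mode"
  if [ -n "$proto" ]; then
    line="$line rr_proto=$proto"
  fi
  # Overwrites the file: any other lines already in it are lost
  printf '%s\n' "$line" > "$conf"
}

# Example (as root), then reload the driver; openibd is ASSUMED here:
#   write_mlx4_roce_opts /etc/modprobe.d/mlx4_core.conf 2
#   /etc/init.d/openibd restart
```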
Configure the IP address of the RoCE port and a static route to the remote subnet on each server:
# ifconfig eth2 22.214.171.124/24 up ; route add -net 126.96.36.199 gw 188.8.131.52
# ifconfig eth2 184.108.40.206/24 up ; route add -net 220.127.116.11 gw 18.104.22.168
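The same address-and-route step can be written with iproute2 instead of ifconfig/route. This is a sketch: setup_roce_ip is a hypothetical helper, the addresses are RFC 5737 documentation values rather than the ones from this setup, and setting RUN=echo gives a dry run that only prints the commands.

```shell
# Sketch: configure the RoCE port and a route to the remote subnet with iproute2.
# All addresses are hypothetical (RFC 5737 documentation ranges).
# Set RUN=echo to print the commands instead of executing them.
setup_roce_ip() {
  ifname=$1   # RoCE netdev, e.g. eth2
  addr=$2     # local address with prefix, e.g. 192.0.2.10/24
  remote=$3   # remote subnet, e.g. 198.51.100.0/24
  gw=$4       # next-hop router on the local subnet, e.g. 192.0.2.1
  $RUN ip addr add "$addr" dev "$ifname" &&
  $RUN ip link set dev "$ifname" up &&
  $RUN ip route add "$remote" via "$gw" dev "$ifname"
}

# Dry run; drop RUN=echo (and run as root) to apply
RUN=echo setup_roce_ip eth2 192.0.2.10/24 198.51.100.0/24 192.0.2.1
```

The gateway is the directly connected router port (running OSPF), which forwards the RoCE v2 UDP/IP packets toward the remote subnet.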
5. Configure QoS on the server. QoS configuration is needed to map the priority to a traffic class (TC), as with RoCE v1. Refer to End-to-End QoS Configuration for Mellanox Switches (SwitchX) and Adapters for more details.
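Part of the server-side QoS setup is enabling PFC on the adapter for the same lossless priority configured on the switches (priority 3 here). MLNX_OFED's mlnx_qos tool accepts PFC as a comma-separated list of eight 0/1 flags, one per priority; the sketch below builds that list. pfc_vector is a hypothetical helper, and the interface name and mlnx_qos invocation are assumptions to check against the End-to-End QoS post and the installed MLNX_OFED version.

```shell
# Hypothetical helper: build the 8-entry PFC enable vector (priorities 0..7)
# with a 1 for each lossless priority passed as an argument.
pfc_vector() {
  vec=""
  for prio in 0 1 2 3 4 5 6 7; do
    flag=0
    for lossless in "$@"; do
      [ "$lossless" = "$prio" ] && flag=1
    done
    # Append with a comma separator after the first entry
    vec="${vec:+$vec,}$flag"
  done
  echo "$vec"
}

# Priority 3 matches the switch-side "dcb priority-flow-control priority 3 enable".
# ASSUMED usage (interface name hypothetical):
#   mlnx_qos -i eth2 --pfc "$(pfc_vector 3)"
pfc_vector 3
```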
6. To work with RDMA_CM-based applications, set the default RoCE mode for the device through configfs by running the following commands:
# mount -t configfs none /sys/kernel/config
# cd /sys/kernel/config/rdma_cm
# mkdir mlx4_0
# cd mlx4_0
# echo RoCE V2 > default_roce_mode
# cd ..
# rmdir mlx4_0
Note: The possible values for the default_roce_mode parameter are "IB/RoCE V1" and "RoCE V2".
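The configfs steps above can be wrapped in a small function that validates the mode string before writing it. This is a sketch: set_default_roce_mode and the RDMA_CM_CONFIGFS override are hypothetical (the override only exists so the logic can be tried outside configfs), and it assumes configfs is already mounted, as done by the mount command above.

```shell
# Hypothetical wrapper for the rdma_cm configfs steps shown above.
# Assumes configfs is mounted (mount -t configfs none /sys/kernel/config).
set_default_roce_mode() {
  dev=$1    # RDMA device, e.g. mlx4_0
  mode=$2   # "IB/RoCE V1" or "RoCE V2"
  root=${RDMA_CM_CONFIGFS:-/sys/kernel/config/rdma_cm}

  # Reject anything other than the two documented values
  case $mode in
    "IB/RoCE V1"|"RoCE V2") ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac

  # Creating the device directory exposes its default_roce_mode attribute
  mkdir "$root/$dev" || return 1
  echo "$mode" > "$root/$dev/default_roce_mode"
  rc=$?
  # Remove the directory again, as in the manual steps; errors are ignored
  # because the write result is what is reported
  rmdir "$root/$dev" 2>/dev/null
  return $rc
}

# Usage (as root): set_default_roce_mode mlx4_0 "RoCE V2"
```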
Verification
1. Check the current RoCE mode (2 means RoCE v2):
# cat /sys/module/mlx4_core/parameters/roce_mode
2
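2. A basic verification test is to run one of the perftest benchmarks with "-R", which connects the QPs through RDMA_CM, so the default RoCE mode set in step 6 applies. Start the server side first (no address), then run the client with the server's IP address:
# ib_write_bw -R -d mlx4_0 -i 1 --report_gbits -D 10
# ib_write_bw -R -d mlx4_0 -i 1 --report_gbits 22.214.171.124 -D 10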
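The two verification steps can be combined so the bandwidth test only starts once the driver reports RoCE mode 2. A sketch, assuming device mlx4_0 and port 1 as in the commands above; check_roce_v2, run_write_bw, the ROCE_MODE_FILE override and the RUN=echo dry-run switch are hypothetical. Call run_write_bw with no argument on the server and with the server's IP address on the client.

```shell
# Hypothetical sketch of the verification steps above.
check_roce_v2() {
  f=${ROCE_MODE_FILE:-/sys/module/mlx4_core/parameters/roce_mode}
  mode=$(cat "$f") || return 1
  if [ "$mode" != 2 ]; then
    echo "roce_mode is '$mode', expected 2 (RoCE v2)" >&2
    return 1
  fi
}

# $1 = server IP (client side only); RUN=echo prints instead of running
run_write_bw() {
  check_roce_v2 || return 1
  # -R: connect through RDMA_CM; -d/-i: device and port; -D 10: run for 10 s
  $RUN ib_write_bw -R -d mlx4_0 -i 1 --report_gbits -D 10 ${1:+"$1"}
}

# Server: run_write_bw            Client: run_write_bw <server-ip>
```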