HowTo Configure RoCE on ConnectX-4

Version 10

    This post explains how to enable RoCE (V1 or V2) on ConnectX-4 (mlx5 driver).

    This is a basic post, intended for readers who are new to RoCE configuration but have some background in Mellanox adapters and RoCE technology.

     

    Overview

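    mlx5 supports both RoCE v1 and RoCE v2 per net device (interface). Each interface gets a separate GID table entry (GID index) per RoCE version; as the example below shows, these entries can carry the same GID address.

    The user selects the RoCE version by choosing which GID index to use when running RoCE traffic.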

     

    Note: As with the ConnectX-3 adapter, a lossless network is required. Configure either global pause flow control or PFC on the network for smooth operation.
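
    As a rough illustration of checking the global pause option on a host, the sketch below parses the output of "ethtool -a" and succeeds only when both RX and TX pause are reported as on. The function name pause_enabled is hypothetical, the output layout is assumed to be the usual "RX: on" / "TX: on" lines, and PFC is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -a <interface>` output on stdin and
# exits 0 only if both the "RX:" and "TX:" pause lines report "on".
pause_enabled() {
    awk '$1 == "RX:" { rx = $2 }
         $1 == "TX:" { tx = $2 }
         END { exit !(rx == "on" && tx == "on") }'
}

# Example (run as root on a host with the adapter installed):
#   ethtool -a ens785f0 | pause_enabled && echo "global pause is on"
# Global pause can typically be enabled with:
#   ethtool -A ens785f0 rx on tx on
```

    The switch (if any) needs matching flow control settings; this check only looks at the local interface.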

     

    Mapping between Interface, GID and RoCE version

    We will start with an example:

    • Two servers connected back to back using ConnectX-4 adapters.
    • MLNX_OFED is installed.

     

    In this example, the interface name is ens785f0.

    # ibdev2netdev

    ...

    mlx5_0 port 1 ==> ens785f0 (Up)

     

    In most cases, the GID indexes to use by default are 0 and 1.

     

    1. Check which interface each GID index is mapped to:

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/0

    ens785f0

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/1

    ens785f0

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/2

    ens785f0

     

    In this example, GIDs 0, 1, and 2 are mapped to our interface, ens785f0.
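
    Rather than reading each index by hand, the check above can be looped. This is a minimal sketch, not part of MLNX_OFED: the function name gid_indexes_for_netdev and the optional sysfs root argument (handy for trying it on a copied directory tree) are assumptions.

```shell
#!/bin/sh
# Usage: gid_indexes_for_netdev <ib_dev> <port> <netdev> [sysfs_root]
# Prints, one per line, the GID indexes whose ndevs entry names <netdev>.
# sysfs_root defaults to /sys/class/infiniband (the override is an assumption
# of this sketch, not something the driver provides).
gid_indexes_for_netdev() (
    root=${4:-/sys/class/infiniband}
    dir="$root/$1/ports/$2/gid_attrs/ndevs"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue
        # Unpopulated GID entries may fail on read; skip them quietly.
        ndev=$(cat "$f" 2>/dev/null) || continue
        if [ "$ndev" = "$3" ]; then
            echo "${f##*/}"
        fi
    done | sort -n
)

# Example: gid_indexes_for_netdev mlx5_0 1 ens785f0
```

    On the host from this example, the call shown in the last comment would be expected to list 0, 1, and 2.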

     

    2. Check the RoCE type for GIDs 0, 1, and 2:

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/0

    IB/RoCE V1

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/1

    RoCE V2

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/2

    RoCE V1.5

    In this example, GID 0 is mapped to RoCE v1 (or IB), GID 1 to RoCE v2, and GID 2 to RoCE v1.5.
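
    The same check can be run across every GID entry of a port. The sketch below is only illustrative; the function name gid_types and the optional sysfs root argument are assumptions, and entries that cannot be read are simply skipped.

```shell
#!/bin/sh
# Usage: gid_types <ib_dev> <port> [sysfs_root]
# Prints "<index><TAB><type>" for each readable entry under gid_attrs/types.
gid_types() (
    root=${3:-/sys/class/infiniband}
    dir="$root/$1/ports/$2/gid_attrs/types"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue
        rtype=$(cat "$f" 2>/dev/null) || continue
        if [ -n "$rtype" ]; then
            printf '%s\t%s\n' "${f##*/}" "$rtype"
        fi
    done | sort -n
)

# Example: gid_types mlx5_0 1
```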

     

    3. Get the GID address.

    # cat /sys/class/infiniband/mlx5_0/ports/1/gids/0

    fe80:0000:0000:0000:e61d:2dff:fef2:a488

    # cat /sys/class/infiniband/mlx5_0/ports/1/gids/1

    fe80:0000:0000:0000:e61d:2dff:fef2:a488

    # cat /sys/class/infiniband/mlx5_0/ports/1/gids/2

    fe80:0000:0000:0000:e61d:2dff:fef2:a488

     

    Interface   GID Index   RoCE version   GID Address
    ens785f0    0           RoCEv1         fe80:0000:0000:0000:e61d:2dff:fef2:a488
    ens785f0    1           RoCEv2         fe80:0000:0000:0000:e61d:2dff:fef2:a488
    ens785f0    2           RoCEv1.5       fe80:0000:0000:0000:e61d:2dff:fef2:a488

     

    Show Pretty GIDs

    Once you understand how the mapping works, see Understanding show_gids Script.

    This short script prints the same mapping in a readable table.
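
    To give a feel for what such a table involves, here is a minimal sketch that joins the three sysfs locations used above. It is not the show_gids script itself; the function name gid_table, the column layout, and the optional sysfs root argument are all assumptions made for illustration.

```shell
#!/bin/sh
# Usage: gid_table <ib_dev> <port> [sysfs_root]
# Prints one row per populated GID entry: interface, index, type, address.
gid_table() (
    root=${3:-/sys/class/infiniband}
    port="$root/$1/ports/$2"
    printf '%-14s %-5s %-12s %s\n' INTERFACE INDEX TYPE GID
    for idx in $(ls "$port/gid_attrs/ndevs" 2>/dev/null | sort -n); do
        # Entries without a net device are treated as unpopulated.
        ndev=$(cat "$port/gid_attrs/ndevs/$idx" 2>/dev/null) || continue
        [ -n "$ndev" ] || continue
        rtype=$(cat "$port/gid_attrs/types/$idx" 2>/dev/null) || rtype=unknown
        gid=$(cat "$port/gids/$idx" 2>/dev/null) || gid=unknown
        printf '%-14s %-5s %-12s %s\n' "$ndev" "$idx" "$rtype" "$gid"
    done
)

# Example: gid_table mlx5_0 1
```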

     

    Testing RoCE over ConnectX-4 using Perftest tools

     

    We will start with the following setup:

    • Two servers connected back to back using ConnectX-4 adapters.
    • MLNX_OFED is installed.

     

    As an example, assign the following IP addresses to the two interfaces, one per host (the interface is named ens785f0 on both):

     

    • 9.9.9.5/24
    • 9.9.9.6/24

     

    Testing RoCE v1

    On the host that acts as the server, run the ib_send_bw server with "-x 0" to use GID index 0 (see the table above).

     

    Note: "-x 0" is the default parameter.

     

    # ib_send_bw -x 0

     

    ************************************

    * Waiting for client to connect... *

    ************************************

     

    On the other host, which acts as the client, use the same flag "-x 0" for RoCE v1.

    # ib_send_bw 9.9.9.5 --report_gbits -F -x 0

    ---------------------------------------------------------------------------------------
                        Send BW Test
    Dual-port       : OFF          Device         : mlx5_1
    Number of qps   : 1            Transport type : IB
    Connection type : RC           Using SRQ      : OFF
    TX depth        : 128
    CQ Moderation   : 100
    Mtu             : 4096[B]
    Link type       : Ethernet
    Gid index       : 0
    Max inline data : 0[B]
    rdma_cm QPs     : OFF
    Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
    local address: LID 0000 QPN 0x01a6 PSN 0x8c023c
    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:93
    remote address: LID 0000 QPN 0x0202 PSN 0x3d7628
    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:137
    ---------------------------------------------------------------------------------------
    #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
    65536      1000           95.13              95.12                0.181429
    ---------------------------------------------------------------------------------------
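
    The GID lines in the output above are printed as 16 decimal bytes, while the sysfs gids files show eight hexadecimal groups. To compare the two, a small conversion helps. The sketch below is illustrative only; the function name perftest_gid_to_hex is hypothetical.

```shell
#!/bin/sh
# Usage: perftest_gid_to_hex <gid>
# Converts 16 colon-separated decimal bytes (as printed by ib_send_bw) into
# eight colon-separated 4-digit hex groups (as shown under .../ports/1/gids/).
# Returns non-zero if the input does not have exactly 16 fields.
perftest_gid_to_hex() {
    echo "$1" | awk -F: '
        NF != 16 { exit 1 }
        {
            out = ""
            for (i = 1; i <= 16; i += 2)
                out = out sprintf("%s%02x%02x", (i > 1 ? ":" : ""), $i, $(i + 1))
            print out
        }'
}

# Example:
#   perftest_gid_to_hex 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:93
```

    For the local GID in the output above, this yields fe80:0000:0000:0000:e61d:2dff:fef2:a45d, which can then be matched against the gids entries on that host.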

     

     

    Testing RoCE v2

    On the host that acts as the server, run the ib_send_bw server with the "-x 1" flag to use GID index 1 for RoCE v2 (see the table above).

     

    # ib_send_bw -x 1

     

    ************************************

    * Waiting for client to connect... *

    ************************************

     

    On the other host, which acts as the client, use the same flag "-x 1" for RoCE v2.

    # ib_send_bw 9.9.9.5 --report_gbits -F -x 1

    ---------------------------------------------------------------------------------------
                        Send BW Test
    Dual-port       : OFF          Device         : mlx5_1
    Number of qps   : 1            Transport type : IB
    Connection type : RC           Using SRQ      : OFF
    TX depth        : 128
    CQ Moderation   : 100
    Mtu             : 4096[B]
    Link type       : Ethernet
    Gid index       : 1
    Max inline data : 0[B]
    rdma_cm QPs     : OFF
    Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
    local address: LID 0000 QPN 0x01a6 PSN 0x8c023c
    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:93
    remote address: LID 0000 QPN 0x0202 PSN 0x3d7628
    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:137
    ---------------------------------------------------------------------------------------
    #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
    65536      1000           95.13              95.12                0.181429
    ---------------------------------------------------------------------------------------
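
    The results row can be sanity-checked with simple arithmetic: the message rate should be roughly the average bandwidth divided by the message size in bits. The sketch below assumes that Gb/sec means 10^9 bits per second; the function name msg_rate_mpps is hypothetical.

```shell
#!/bin/sh
# Usage: msg_rate_mpps <bw_gbps> <msg_bytes>
# Prints the message rate in millions of messages per second, assuming
# 1 Gb = 10^9 bits: rate = (bw * 10^9) / (bytes * 8) / 10^6.
msg_rate_mpps() {
    awk -v bw="$1" -v size="$2" \
        'BEGIN { printf "%.6f\n", bw * 1e9 / (size * 8) / 1e6 }'
}

# Example: msg_rate_mpps 95.12 65536
```

    With the rounded average of 95.12 Gb/sec and 65536-byte messages this gives 0.181427, which agrees with the reported 0.181429 to within the rounding of the bandwidth figure.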

     

     

    Testing RoCE over a VLAN Interface

     

    We will start with the same setup as above:

    • Two servers connected back to back using ConnectX-4 adapters.
    • MLNX_OFED is installed.

     

    As an example, assign the following IP addresses to the two interfaces, one per host (the interface is named ens785f0 on both):

    • 9.9.9.5/24
    • 9.9.9.6/24

     

    Configure VLAN Interface

    For examples, see Configure 802.1Q VLAN Tagging Using the Command Line.

    1. Create the VLAN interface (VLAN 100 over ens785f0).

    # ip link add link ens785f0 name ens785f0.100 type vlan id 100

     

    2. Set an IP address for the interface.

     

    • Set for the first host:
    # ifconfig ens785f0.100 99.99.99.5/24 up

     

    • Set for the second host:
    # ifconfig ens785f0.100 99.99.99.6/24 up

    3. Verify that the hosts can ping each other over the VLAN interface.
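
    The steps above can also be expressed entirely with iproute2, for hosts where ifconfig is not installed. The sketch below only prints the commands so they can be reviewed first; the function name vlan_setup_cmds is hypothetical.

```shell
#!/bin/sh
# Usage: vlan_setup_cmds <parent_if> <vlan_id> <address/prefix>
# Prints the iproute2 commands that create the VLAN interface, assign its
# address, and bring it up. Nothing is executed by this function.
vlan_setup_cmds() (
    parent=$1
    vid=$2
    cidr=$3
    vlan_if="$parent.$vid"
    echo "ip link add link $parent name $vlan_if type vlan id $vid"
    echo "ip addr add $cidr dev $vlan_if"
    echo "ip link set dev $vlan_if up"
)

# Example (first host); pipe to sh as root to apply:
#   vlan_setup_cmds ens785f0 100 99.99.99.5/24 | sh
```

    Connectivity can then be checked from the first host with, for example, "ping -c 3 -I ens785f0.100 99.99.99.6".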

     

    Find the GID Index and the RoCE Version

    1. Find the GID indexes mapped to the VLAN interface (in this example, 6, 7, and 8).

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/6

    ens785f0.100

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/7

    ens785f0.100

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/8

    ens785f0.100

     

    2. Find the RoCE version per GID

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/6

    IB/RoCE V1

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/7

    RoCE V2

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/8

    RoCE V1.5

     

    Server 1:

    Interface      GID Index   RoCE version
    ens785f0.100   6           RoCEv1
    ens785f0.100   7           RoCEv2
    ens785f0.100   8           RoCEv1.5

     

     

    Perform the same on the second server.

    Note: When VLANs are configured on two different servers, the GID indexes may differ between them. For example, in this case Server 1 (above) uses GID 7 for RoCE v2, while Server 2 uses GID 8.

     

    Server 2:

    Interface      GID Index   RoCE version
    ens785f0.100   7           RoCEv1
    ens785f0.100   8           RoCEv2
    ens785f0.100   9           RoCEv1.5
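
    Because the index for a given RoCE version can differ per server, it may be safer to look it up on each host than to hard-code it. The following is a minimal sketch under a few assumptions: the function name roce_gid_index and the optional sysfs root argument are hypothetical, the type string is compared case-insensitively in case its capitalization varies between driver versions, and only the first matching index is returned.

```shell
#!/bin/sh
# Usage: roce_gid_index <ib_dev> <port> <netdev> <type> [sysfs_root]
# Prints the first GID index that belongs to <netdev> and has the given
# type string (for example "RoCE V2"). Returns non-zero if none is found.
roce_gid_index() (
    root=${5:-/sys/class/infiniband}
    attrs="$root/$1/ports/$2/gid_attrs"
    want=$(printf '%s' "$4" | tr '[:upper:]' '[:lower:]')
    for idx in $(ls "$attrs/ndevs" 2>/dev/null | sort -n); do
        ndev=$(cat "$attrs/ndevs/$idx" 2>/dev/null) || continue
        [ "$ndev" = "$3" ] || continue
        have=$(cat "$attrs/types/$idx" 2>/dev/null | tr '[:upper:]' '[:lower:]')
        if [ "$have" = "$want" ]; then
            echo "$idx"
            exit 0    # leaves the subshell that forms the function body
        fi
    done
    exit 1
)

# Example, run on each server separately:
#   idx=$(roce_gid_index mlx5_0 1 ens785f0.100 "RoCE V2") && ib_send_bw -x "$idx"
```

    With the tables above, the lookup would return 7 on Server 1 and 8 on Server 2; the client additionally passes the server address and its own index, as in the earlier tests.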