HowTo Configure RoCE on ConnectX-4

Version 11

    This post explains how to enable RoCE (V1 or V2) on ConnectX-4 (mlx5 driver).

    This post is basic and meant for beginners who have some background in Mellanox adapters and RoCE technology.

     


     


     

    Overview

    mlx5 supports both RoCE v1 and RoCE v2 per net device (interface). Each interface has a separate GID table entry (GID index) per protocol; the GID address itself can be the same across entries.

    The user chooses which GID index to use when running RoCE traffic.

     

    Note: A lossless network is required here, as with the ConnectX-3 adapter. Configure either global pause flow control or PFC on the network for smooth operation.

     

    Mapping between Interface, GID and RoCE version

    We will start with an example:

    • Two servers connected back to back using ConnectX-4 adapters.
    • MLNX_OFED is installed.

     

    In this example, the interface name is ens785f0.

    # ibdev2netdev

    ...

    mlx5_0 port 1 ==> ens785f0 (Up)

     

    In most cases, the default GID indexes to use are 0 (RoCE v1) and 1 (RoCE v2).

     

    1. Check which interface each GID index is mapped to:

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/0

    ens785f0

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/1

    ens785f0

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/2

    ens785f0

     

    In this example, GID indexes 0, 1, and 2 are all mapped to our interface, ens785f0.

     

    2. Check the RoCE type for GID indexes 0, 1, and 2:

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/0

    IB/RoCE V1

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/1

    RoCE V2

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/2

    RoCE V1.5

    In this example, GID 0 is mapped to RoCE v1 (or IB), GID 1 to RoCE v2, and GID 2 to RoCE v1.5.

     

    3. Get the GID address.

    # cat /sys/class/infiniband/mlx5_0/ports/1/gids/0

    fe80:0000:0000:0000:e61d:2dff:fef2:a488

    # cat /sys/class/infiniband/mlx5_0/ports/1/gids/1

    fe80:0000:0000:0000:e61d:2dff:fef2:a488

    # cat /sys/class/infiniband/mlx5_0/ports/1/gids/2

    fe80:0000:0000:0000:e61d:2dff:fef2:a488

     

    Interface   GID Index   RoCE version   GID Address
    ens785f0    0           RoCE v1        fe80:0000:0000:0000:e61d:2dff:fef2:a488
    ens785f0    1           RoCE v2        fe80:0000:0000:0000:e61d:2dff:fef2:a488
    ens785f0    2           RoCE v1.5      fe80:0000:0000:0000:e61d:2dff:fef2:a488

     

    Show Pretty GIDs

    Once you understand how this mapping works, see Understanding show_gids Script.

    show_gids is a short script that prints this mapping in a table view.
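The three manual steps above (interface, RoCE type, GID address) can be combined in one loop. The following is a minimal sketch of that idea, not the actual show_gids script. The `SYSFS_IB` variable and the `list_gids` function name are assumptions introduced here for illustration, and the sketch assumes that unused GID table entries read back as an all-zero GID.

```shell
#!/bin/sh
# Sketch: print interface, GID index, RoCE type and GID address for one port.
# SYSFS_IB is a hypothetical override so the sysfs root can be changed for testing.
SYSFS_IB=${SYSFS_IB:-/sys/class/infiniband}

list_gids() {
    dev=$1
    port=$2
    base="$SYSFS_IB/$dev/ports/$port"
    zero_gid="0000:0000:0000:0000:0000:0000:0000:0000"

    printf '%-16s %-5s %-12s %s\n' "INTERFACE" "INDEX" "TYPE" "GID"
    idx=0
    # Walk the GID table in numeric order until no further index exists.
    while [ -e "$base/gids/$idx" ]; do
        gid=$(cat "$base/gids/$idx" 2>/dev/null)
        # Assumption: unused entries hold an all-zero GID; skip them.
        if [ -n "$gid" ] && [ "$gid" != "$zero_gid" ]; then
            ndev=$(cat "$base/gid_attrs/ndevs/$idx" 2>/dev/null)
            rtype=$(cat "$base/gid_attrs/types/$idx" 2>/dev/null)
            printf '%-16s %-5s %-12s %s\n' "$ndev" "$idx" "$rtype" "$gid"
        fi
        idx=$((idx + 1))
    done
}

# Defaults match the device and port used in this post.
list_gids "${1:-mlx5_0}" "${2:-1}"
```

Saved under a name of your choice, the script takes the device and port as optional arguments, for example `sh list_gids.sh mlx5_0 1` (the file name is hypothetical).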

     

    Testing RoCE over ConnectX-4 Using the perftest Tools

     

    We will start with the following setup:

    • Two servers connected back to back using ConnectX-4 adapters.
    • MLNX_OFED is installed.

     

    As an example, assign the following IP addresses to the two interfaces, one per host (the interface name is ens785f0 on both):

     

    • 9.9.9.5/24
    • 9.9.9.6/24

     

    Testing RoCE v1

    On the host that acts as the server, run ib_send_bw with "-x 0" to use GID index 0 (see the table above).

     

    Note: "-x 0" is the default parameter.

     

    # ib_send_bw -x 0

     

    ************************************

    * Waiting for client to connect... *

    ************************************

     

    On the other host, which acts as the client, use the same flag "-x 0" for RoCE v1.

    # ib_send_bw 9.9.9.5 --report_gbits -F -x 0

    ---------------------------------------------------------------------------------------

                        Send BW Test

    Dual-port       : OFF          Device         : mlx5_1

    Number of qps   : 1            Transport type : IB

    Connection type : RC           Using SRQ      : OFF

    TX depth        : 128

    CQ Moderation   : 100

    Mtu             : 4096[B]

    Link type       : Ethernet

    Gid index       : 0

    Max inline data : 0[B]

    rdma_cm QPs : OFF

    Data ex. method : Ethernet

    ---------------------------------------------------------------------------------------

    local address: LID 0000 QPN 0x01a6 PSN 0x8c023c

    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:93

    remote address: LID 0000 QPN 0x0202 PSN 0x3d7628

    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:137

    ---------------------------------------------------------------------------------------

    #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]

    65536      1000             95.13              95.12     0.181429

    ---------------------------------------------------------------------------------------
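The sysfs files show a GID as eight hexadecimal groups, while the ib_send_bw output above prints it as sixteen decimal bytes, which makes the two hard to compare by eye. A small sketch of the conversion follows. The function name `gid_to_decimal_bytes` is an assumption for illustration, and the two-digit zero padding is inferred from the "00" bytes in the output above rather than from the perftest source.

```shell
#!/bin/sh
# Sketch: convert a sysfs-style GID (8 groups of 4 hex digits) into the
# colon-separated decimal-byte form seen in the ib_send_bw output.
gid_to_decimal_bytes() {
    gid=$1
    result=""
    old_ifs=$IFS
    IFS=:
    # Split the GID into its hex groups (function-local positional parameters).
    set -- $gid
    IFS=$old_ifs
    for group in "$@"; do
        hi=${group%??}   # first two hex digits of the group
        lo=${group#??}   # last two hex digits of the group
        bytes=$(printf '%02d:%02d' "$((0x$hi))" "$((0x$lo))")
        result="${result}${result:+:}${bytes}"
    done
    printf '%s\n' "$result"
}

# GID address taken from the sysfs output earlier in this post.
gid_to_decimal_bytes fe80:0000:0000:0000:e61d:2dff:fef2:a488
# prints 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:136
```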

     

     

    Testing RoCE v2

    On the host that acts as the server, run ib_send_bw with the "-x 1" flag to use GID index 1 for RoCE v2 (see the table above).

     

    # ib_send_bw -x 1

     

    ************************************

    * Waiting for client to connect... *

    ************************************

     

    On the other host, which acts as the client, use the same flag "-x 1" for RoCE v2.

    # ib_send_bw 9.9.9.5 --report_gbits -F -x 1

    ---------------------------------------------------------------------------------------

                        Send BW Test

    Dual-port       : OFF          Device         : mlx5_1

    Number of qps   : 1            Transport type : IB

    Connection type : RC           Using SRQ      : OFF

    TX depth        : 128

    CQ Moderation   : 100

    Mtu             : 4096[B]

    Link type       : Ethernet

    Gid index       : 1

    Max inline data : 0[B]

    rdma_cm QPs : OFF

    Data ex. method : Ethernet

    ---------------------------------------------------------------------------------------

    local address: LID 0000 QPN 0x01a6 PSN 0x8c023c

    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:93

    remote address: LID 0000 QPN 0x0202 PSN 0x3d7628

    GID: 254:128:00:00:00:00:00:00:230:29:45:255:254:242:164:137

    ---------------------------------------------------------------------------------------

    #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]

    65536      1000             95.13              95.12     0.181429

    ---------------------------------------------------------------------------------------
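To compare the RoCE v1 and RoCE v2 runs without reading the full report, the average bandwidth can be pulled out of the result row. The sketch below assumes the five-column result layout shown above (#bytes, #iterations, BW peak, BW average, MsgRate); `bw_average` is a hypothetical helper name, and the here-document simply replays the result lines from this post.

```shell
#!/bin/sh
# Sketch: extract the "BW average" column from ib_send_bw output on stdin.
bw_average() {
    # Result rows start with a numeric message size and have exactly five fields.
    awk '$1 ~ /^[0-9]+$/ && NF == 5 { print $4 }'
}

# Replay of the result lines shown above; prints 95.12
bw_average <<'EOF'
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      1000             95.13              95.12     0.181429
EOF
```

On a live system the same helper could be fed from the client directly, for example `ib_send_bw 9.9.9.5 --report_gbits -F -x 1 | bw_average`. If the test reports several message sizes, one value is printed per result row.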

     

     

    Testing RoCE over a VLAN Interface

     

    We will start with the same setup as above:

    • Two servers connected back to back using ConnectX-4 adapters.
    • MLNX_OFED is installed.

     

    As an example, assign the following IP addresses to the two interfaces, one per host (the interface name is ens785f0 on both):

    • 9.9.9.5/24
    • 9.9.9.6/24

     

    Configure VLAN Interface

    For more examples, see Configure 802.1Q VLAN Tagging Using the Command Line.

    1. Create the VLAN interface (VLAN 100 over ens785f0).

    # ip link add link ens785f0 name ens785f0.100 type vlan id 100

     

    2. Set an IP address for the interface.

     

    • Set for the first host:
    # ifconfig ens785f0.100 99.99.99.5/24 up

     

    • Set for the second host:
    # ifconfig ens785f0.100 99.99.99.6/24 up

    3. Verify that the two hosts can ping each other over the VLAN interface addresses.
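The steps above use ifconfig, which many current distributions no longer install by default. As a sketch, the same VLAN setup can be expressed with iproute2 commands only. The `run` helper, the `setup_vlan` function, and the `DRY_RUN` variable are assumptions added for illustration: with `DRY_RUN=1` (the default here) the commands are only printed, so the sketch can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch: create a VLAN interface and assign an address using iproute2 only.
# DRY_RUN=1 prints the commands; set DRY_RUN=0 and run as root to apply them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_vlan() {
    parent=$1   # parent interface, e.g. ens785f0
    vid=$2      # VLAN ID, e.g. 100
    addr=$3     # address with prefix length, e.g. 99.99.99.5/24
    vlan_if="$parent.$vid"

    run ip link add link "$parent" name "$vlan_if" type vlan id "$vid"
    run ip addr add "$addr" dev "$vlan_if"
    run ip link set dev "$vlan_if" up
}

# First host in this example; use 99.99.99.6/24 on the second host.
setup_vlan ens785f0 100 99.99.99.5/24
```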

     

    Find the GID Index and the RoCE Version

    1. Find the GID indexes mapped to the VLAN interface (in this example, 6, 7, and 8).

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/6

    ens785f0.100

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/7

    ens785f0.100

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/ndevs/8

    ens785f0.100

     

    2. Find the RoCE version for each GID index:

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/6

    IB/RoCE V1

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/7

    RoCE V2

    # cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/8

    RoCE V1.5

     

    Server 1:

    Interface      GID Index   RoCE version
    ens785f0.100   6           RoCE v1
    ens785f0.100   7           RoCE v2
    ens785f0.100   8           RoCE v1.5

     

     

    Repeat the same steps on the second server.

    Note: When VLANs are configured on two different servers, the GID indexes may not match. In this example, RoCE v2 uses GID index 7 on Server 1 (above) and GID index 8 on Server 2 (below).

     

    Server 2:

    Interface      GID Index   RoCE version
    ens785f0.100   7           RoCE v1
    ens785f0.100   8           RoCE v2
    ens785f0.100   9           RoCE v1.5
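Because the GID index for a given interface and RoCE version can differ between servers, it can help to look the index up on each host instead of hard-coding it. The following is a minimal sketch under the same sysfs layout used throughout this post. `find_gid_index` and `SYSFS_IB` are hypothetical names, and the function returns only the first matching entry (an interface with several addresses may have more than one entry per RoCE type).

```shell
#!/bin/sh
# Sketch: find the first GID index that matches a net device and RoCE type.
# SYSFS_IB is a hypothetical override so the sysfs root can be changed for testing.
SYSFS_IB=${SYSFS_IB:-/sys/class/infiniband}

find_gid_index() {
    dev=$1         # RDMA device, e.g. mlx5_0
    port=$2        # port number, e.g. 1
    netdev=$3      # net device, e.g. ens785f0.100
    want_type=$4   # type string as shown in gid_attrs/types, e.g. "RoCE V2"
    base="$SYSFS_IB/$dev/ports/$port"

    idx=0
    while [ -e "$base/gids/$idx" ]; do
        ndev=$(cat "$base/gid_attrs/ndevs/$idx" 2>/dev/null)
        rtype=$(cat "$base/gid_attrs/types/$idx" 2>/dev/null)
        if [ "$ndev" = "$netdev" ] && [ "$rtype" = "$want_type" ]; then
            printf '%s\n' "$idx"
            return 0
        fi
        idx=$((idx + 1))
    done
    return 1
}

if idx=$(find_gid_index mlx5_0 1 ens785f0.100 "RoCE V2"); then
    echo "RoCE v2 GID index for ens785f0.100: $idx"
    # The index could then be passed to the test tool, e.g.: ib_send_bw -x "$idx"
else
    echo "no matching GID entry found" >&2
fi
```

Running this on both servers gives each side its own index for the "-x" flag, which covers the case in the note above where the same VLAN maps to different indexes.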