HowTo Configure Soft-RoCE


    Soft-RoCE is a software implementation of RoCE that allows RoCE to run on any Ethernet network adapter, whether or not it offers hardware acceleration.

    Soft-RoCE is released as part of the upstream kernel 4.8 (or above). Either MLNX_OFED 4.0 or the upstream driver can be used; if you install MLNX_OFED 4.0, you automatically get the Soft-RoCE kernel module and user space libraries.

    This post demonstrates how to install and set up the upstream Soft-RoCE (aka RXE) implementation. It is meant for IT managers and developers who wish to test RDMA in software over any third-party adapter.

     


    Overview

    Soft-RoCE is a software implementation of the RDMA transport. It was developed as a GitHub community project, with primary contributions from IBM, Mellanox, and System Fabric Works, and has since been accepted into the upstream kernel. Soft-RoCE retains the transport characteristics of RoCE, providing a complete RDMA stack implementation over any NIC.

     

    How Soft-RoCE works: The Soft-RoCE driver implements the InfiniBand RDMA transport over the Linux network stack. It enables a system with a standard Ethernet adapter to interoperate with a hardware RoCE adapter, or with another system running Soft-RoCE.

    Soft-RoCE emulates a Mellanox mlx4 hardware RoCE adapter: it provides the librxe user space library (the counterpart of the libmlx4 user space library) and the rdma_rxe kernel module (the counterpart of the mlx4_ib kernel module).
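    Because the entire RDMA transport runs over the regular network stack, Soft-RoCE traffic appears on the wire as ordinary UDP (RoCEv2 uses UDP destination port 4791). As a rough illustration, once the setup described below is complete, you can watch the RDMA traffic with a standard packet capture tool (the interface name here is just an example):

    # tcpdump -i enp5s0 -nn udp dst port 4791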

     

    How to test Soft-RoCE today: Follow the configuration procedure below and/or the GitHub wiki pages.

     

    Architecture

    [Figure: Soft-RoCE architecture diagram (soft-roce.PNG)]

     

    Setup

        • Two servers running CentOS 7 or Ubuntu 14.04, connected back-to-back or via an Ethernet switch.

     

    Note: MLNX_OFED (or any other OFED driver) should not be installed on the servers. If OFED is installed, remove it prior to the test.
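    If you are not sure whether MLNX_OFED is present, one way to check and remove it might look like the following (this assumes the standard MLNX_OFED uninstall script; other OFED distributions have their own removal procedure):

    # ofed_info -s

    # /usr/sbin/ofed_uninstall.sh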

     

    Installation

    To get Soft-RoCE capabilities, you need to install the kernel and user space libraries on both servers:

     

    Kernel Installation

     

    1. Clone the kernel (4.9.0). Run:

    # cd /usr/src

    # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux
    Cloning into 'linux'...

    remote: Counting objects: 5548596, done.

    remote: Compressing objects: 100% (498423/498423), done.

    remote: Total 5548596 (delta 403806), reused 0 (delta 0)

    Receiving objects: 100% (5548596/5548596), 1.03 GiB | 292.00 KiB/s, done.

    Resolving deltas: 100% (4638969/4638969), done.

    Checking out files: 100% (56233/56233), done.
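    The clone above leaves you on the repository's default branch. Since this post targets kernel 4.9.0, you may want to switch to the matching tag first (the tag name v4.9 is assumed here; run this from inside the cloned linux directory, e.g. right after the cd in the next step):

    # git checkout -b rxe-4.9 v4.9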

     

     

     

    2. Configure the kernel. Run:

    # cd linux

    # cp /boot/config-`uname -r` .config

    # make menuconfig

    ...

     

    This window will open:

    [Screenshot: kernel menuconfig main window (1.jpg)]

     

    3. To enable Soft-RoCE in menuconfig, perform the following steps (a non-interactive alternative is shown after the list):

     

    3.1 Press “/”. This opens a search text field.

    3.2 Type “rxe” and select “OK”.

    3.3 Press "1" to jump to “Software RDMA over Ethernet (ROCE) driver”.

    3.4 Press "Space" to enable “Software RDMA over Ethernet (ROCE) driver”. You should see “<M>” before it.

    3.5 Go to “Save” using the right arrow key and press "Enter".

    3.6 Press "Enter" to save the configuration as the “.config” file.

    3.7 Press "Enter" to close the save dialog.

    3.8 Go to “Exit” using the right arrow key and press "Enter" repeatedly until you leave the configuration menu.
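    Alternatively, the same option can be set without menuconfig by using the kernel's scripts/config helper (the relevant config symbol is RDMA_RXE); afterwards, grep RDMA_RXE .config should report CONFIG_RDMA_RXE=m. A sketch, run from the kernel source tree:

    # ./scripts/config --module RDMA_RXE

    # make olddefconfig

    # grep RDMA_RXE .config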

     

    4. Compile. Run:

    # make -j $(nproc)

    ...

    # make -j $(nproc) modules_install

    ...

    # make -j $(nproc) install

    ...

     

    Note: If the output contains the message "grubby fatal error: unable to find a suitable template", you need to edit the /boot/grub/grub.conf file and manually add the new kernel's details.

     

    Here is an example of the grub.conf file:

    # Mellanox Technologies Multiboot GRUB Configuration

    default 1

    timeout 10

     

     

    title CentOSreleasex86_64-3.10.0-123.el7.x86_64

    root (hd0,0)

    kernel /vmlinuz-CentOSreleasex86_64-3.10.0-123.el7.x86_64 root=/dev/sda2 console=tty0 console=ttyS0,115200n8 rhgb

    initrd /initramfs-CentOSreleasex86_64-3.10.0-123.el7.x86_64.img

     

    title CentOSreleasex86_64-4.9.0-rxe

    root (hd0,0)

    kernel /vmlinuz-4.9.0 root=/dev/sda2 console=tty0 console=ttyS0,115200n8 rhgb

    initrd /initramfs-4.9.0.img
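    Note that CentOS 7 normally boots with GRUB2 rather than legacy GRUB. If that is the case on your servers, regenerating the GRUB2 configuration is usually enough to pick up the newly installed kernel (paths and the default entry index may differ on your system):

    # grub2-mkconfig -o /boot/grub2/grub.cfg

    # grub2-set-default 0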

     

    5. Reboot the server and log in again.

     

    6. Make sure that the new kernel and the rdma_rxe module were installed. Run:

    # uname -r

    4.9.0

     

    # modinfo rdma_rxe

    filename:       /lib/modules/4.9.0/kernel/drivers/infiniband/sw/rxe/rdma_rxe.ko

    version:        0.2

    license:        Dual BSD/GPL

    description:    Soft RDMA transport

    author:         Bob Pearson, Frank Zago, John Groves, Kamal Heib

    srcversion:     D0E124BDAB1CF1F18E7DC56

    depends:        ib_core,ip6_udp_tunnel,udp_tunnel

    vermagic:       4.9.0 SMP mod_unload modversions

    parm:           add:Create RXE device over network interface

    parm:           remove:Remove RXE device over network interface
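    The add and remove module parameters listed above are what the rxe_cfg script (described below) uses under the hood. In principle, an RXE device can also be coupled to a network interface directly through sysfs; a minimal sketch, assuming the enp5s0 interface used later in this post:

    # modprobe rdma_rxe

    # echo enp5s0 > /sys/module/rdma_rxe/parameters/add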

     

    User Space Libraries Installation

    The user space libraries supporting Soft-RoCE are not yet available as distribution packages, so you have to get the sources, then build and install them manually.
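    Building rdma-core requires the usual development toolchain plus a few libraries. On CentOS 7, installing the prerequisites can look roughly like this (package names follow the rdma-core build documentation and may vary between distributions and versions):

    # yum install cmake gcc libnl3-devel libudev-devel make pkgconfig valgrind-devel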

     

    1. Clone the user space libraries. Run:

    # git clone https://github.com/linux-rdma/rdma-core.git

    Cloning into 'rdma-core'...

    remote: Counting objects: 16117, done.

    remote: Compressing objects: 100% (90/90), done.

    remote: Total 16117 (delta 56), reused 0 (delta 0), pack-reused 16027

    Receiving objects: 100% (16117/16117), 4.65 MiB | 908.00 KiB/s, done.

    Resolving deltas: 100% (11195/11195), done.

    Checking out files: 100% (519/519), done.

     

    2. Compile and install user space libraries. Run:

    # cd rdma-core

    # bash build.sh

    ...

    ...

    # cd build ; sudo make install

    ...

     

    Setup RXE

    The rxe_cfg script is responsible for starting, stopping, and configuring RXE.

    It loads and removes the RXE kernel module (rdma_rxe) and couples it with a preferred Ethernet interface.

     

    The rxe_cfg script is located in rdma-core/providers/rxe (the current build does not add it to the PATH).
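    For convenience, you can either call the script with its full path or add that directory to the PATH of the current shell. For example (the /usr/src prefix assumes rdma-core was cloned under /usr/src, as the kernel was above; adjust to your clone location):

    # export PATH=$PATH:/usr/src/rdma-core/providers/rxe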

     

    1. Get status. Run:

    # rxe_cfg status

    rxe modules not loaded

    Name    Link  Driver  Speed  NMTU  IPv4_addr  RDEV  RMTU

    enp5s0  yes   e1000e

    enp6s0  no    e1000e

     

    We can see that there are two potential Ethernet interfaces that an RXE device can be coupled with: enp5s0 and enp6s0.

     

    Note: The RXE kernel module is not yet loaded at this point.

     

    In this case e1000e is an Intel driver and enp5s0 is the 1GbE port on the server.

     

    To view other options for this command, run:

    # rxe_cfg help

      Usage:

        rxe_cfg [options] start|stop|status|persistent

        rxe_cfg debug on|off|<num>

        rxe_cfg [-n] add <ndev>

        rxe_cfg [-n] remove <ndev>|<rdev>

     

        <ndev> = network device e.g. eth3

        <rdev> = rdma device e.g. rxe1

     

      Options:

        -h: print this usage information

        -n: do not make the configuration action persistent

        -v: print additional debug output

        -l: show status for interfaces with link up

        -p <num>: (start command only) - set ethertype

     

    2. Start RXE (note: super-user permissions are required). Run:

    # sudo rxe_cfg start

      Name    Link  Driver  Speed  NMTU  IPv4_addr  RDEV  RMTU

      enp5s0  yes   e1000e

      enp6s0  no    e1000e

     

    To verify that the RXE kernel module is loaded, run:

    # lsmod |grep rdma_rxe

    rdma_rxe              114688  0

    ip6_udp_tunnel         16384  1 rdma_rxe

    udp_tunnel             16384  1 rdma_rxe

    ib_core               208896  9 rdma_rxe,ib_cm,rdma_cm,ib_umad,ib_uverbs,ib_ipoib,iw_cm,ib_ucm,rdma_ucm

     

    3. Create a new RXE device/interface by coupling it with an Ethernet interface. In this example, we use enp5s0. Run:

    # sudo rxe_cfg add enp5s0
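    To undo this later, the matching remove command from the usage above accepts either the network device (enp5s0) or the rdma device (rxe0), for example:

    # sudo rxe_cfg remove rxe0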

     

    4. Check the status of rxe_cfg and make sure that rxe0 was added under RDEV (rxe device). You can also verify this with the ibv_devices command.

    # rxe_cfg

      Name    Link  Driver  Speed  NMTU  IPv4_addr  RDEV  RMTU

      enp5s0  yes   e1000e                          rxe0

      enp6s0  no    e1000e

     

    # ibv_devices

        device                 node GUID

        ------              ----------------

        rxe0                022590fffe8321de
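    Before running traffic, it can also be worth confirming that the port of the new device is up. ibv_devinfo (part of the same libibverbs utilities as ibv_devices) prints the port attributes; for rxe0 the port state should be PORT_ACTIVE and the link layer Ethernet:

    # ibv_devinfo -d rxe0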

     

    5. Test connectivity.

     

    - On the server:

    # ibv_rc_pingpong -d rxe0 -g 0

     

    - On the client:

    # ibv_rc_pingpong -d rxe0 -g 0 <server_management_ip>
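    If the pingpong completes successfully on both sides, basic RC traffic is working over Soft-RoCE. As an optional additional check, rping (one of the librdmacm example utilities shipped with rdma-core) exercises the RDMA connection manager path as well; a sketch, with <server_ip> standing for the server's IP address:

    - On the server:

    # rping -s -a <server_ip> -v

    - On the client:

    # rping -c -a <server_ip> -v -C 10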