HowTo Configure NFS over RDMA (RoCE)

Version 20

    This post shows how to configure NFS over RDMA (RoCE). The example uses CentOS 7; the procedure is similar on other Linux distributions.

     

    References (General)

     

     

     

    References (Performance)

    See the Oracle OFA Dev Workshop presentations, given in March 2015. The second one shows that RDMA on InfiniBand achieves better read and write performance than TCP/IP (via IPoIB). For large I/O, Oracle found that RDMA read and write throughput were both much better than with TCP over IPoIB. For small random I/O, RDMA still provided a large benefit for reads but only a small benefit for writes.

    The CohortFS folks are planning to put NFSoRDMA into the Ganesha NFS server starting with Ganesha 2.3. Some background on Ganesha:

     

     

     

     

     

    Setup

    Two servers equipped with ConnectX-3 adapters (40GbE) connected via Mellanox Ethernet SX1036 switch.


    The setup can also be tested over 56GbE; the 40-56GbE switch upgrade license is now free.

     

    Prerequisites

    1. In this example, CentOS 7 was installed on both servers along with MLNX_OFED 2.4.

     

    OS: CentOS 7

    Driver: MLNX_OFED 2.4

     

    2. Configure IP addresses on the ConnectX-3 adapter port connected to the switch.

    For example, configure Host A as follows, and Host B similarly (with IP address 19.19.19.8):

    Host A # ifconfig ens1 19.19.19.7/24 up

     

    3. Verify MLNX_OFED version.

    Host A # ofed_info -s

    MLNX_OFED_LINUX-2.4-1.0.0:

     

    4. Verify IP connectivity between the servers.

    Host A # ping 19.19.19.8

    PING 19.19.19.8 (19.19.19.8) 56(84) bytes of data.

    64 bytes from 19.19.19.8: icmp_seq=1 ttl=64 time=0.099 ms

    5. Make sure RDMA is running properly between the servers. Refer to HowTo Enable, Verify and Troubleshoot RDMA for examples.

     

    6. Verify that NFS without RDMA works correctly (mount the NFS directory from the server and check basic read/write operations).

     

    Procedure

    1. Create a directory to be mounted via NFS.

    Host A # mkdir /root/my_directory

     

    2. On the server side (Host A), add this line to the /etc/exports file:

    /root/my_directory *(rw,async,insecure,no_root_squash)

    More information on the /etc/exports parameters can be found in the exports(5) man page.
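
When scripting this step, it can help to add the export line only if it is not already present, so that repeated runs do not create duplicates. The following is a minimal sketch, not part of the original procedure; the function name add_export and its optional file argument are hypothetical helpers introduced here for illustration.

```shell
# Append an export line to an exports file only if it is not already there.
# $1: export line, $2: exports file (defaults to /etc/exports)
add_export() {
    line=$1
    file=${2:-/etc/exports}
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage on Host A (as root):
#   add_export '/root/my_directory *(rw,async,insecure,no_root_squash)'
```

The optional second argument makes it possible to try the helper against a scratch file before touching /etc/exports.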

     

    3. Load the RDMA transport module on the server.

     

    Host A # modprobe svcrdma

     

    4. Start the NFS service.

    Host A # service nfs start

    Redirecting to /bin/systemctl start  nfs.service

    Host A # service nfs status

    Redirecting to /bin/systemctl status  nfs.service

    nfs-server.service - NFS Server

       Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled)

       Active: active (exited) since Thu 2015-02-26 02:49:15 IST; 3 weeks 6 days ago

      Process: 3099 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS $RPCNFSDCOUNT (code=exited, status=0/SUCCESS)

      Process: 3096 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)

      Process: 3093 ExecStartPre=/usr/libexec/nfs-utils/scripts/nfs-server.preconfig (code=exited, status=0/SUCCESS)

    Main PID: 3099 (code=exited, status=0/SUCCESS)

       CGroup: /system.slice/nfs-server.service

     

     

    Mar 13 20:47:52 HostA systemd[1]: Started NFS Server.

    Mar 17 01:10:21 HostA systemd[1]: Started NFS Server.

    Mar 25 18:20:07 HostA systemd[1]: Started NFS Server.

    Mar 25 20:34:09 HostA systemd[1]: Started NFS Server.

    HostA #     

     

    5. Instruct the server to listen on the RDMA transport port.

    Note: 20049 is the default port (assigned by IANA), but any other port can be used as long as the NFS client is configured to use the same one.

    Host A # echo rdma 20049 > /proc/fs/nfsd/portlist

    Host A # cat /proc/fs/nfsd/portlist

    rdma 20049

    udp 2049

    tcp 2049
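
The check above can also be scripted. The sketch below assumes the portlist format shown in this step (one "transport port" pair per line); the function name nfsd_listens_on and its optional file argument are hypothetical and only meant as an illustration.

```shell
# Return 0 if the given transport/port pair is listed in the nfsd portlist.
# $1: transport (default rdma), $2: port (default 20049),
# $3: portlist file (default /proc/fs/nfsd/portlist)
nfsd_listens_on() {
    transport=${1:-rdma}
    port=${2:-20049}
    portlist=${3:-/proc/fs/nfsd/portlist}
    # -x requires the whole line to match, so "tcp 2049" never matches "rdma 20049"
    grep -qxF -- "$transport $port" "$portlist"
}

# Usage on Host A:
#   nfsd_listens_on rdma 20049 && echo "RDMA listener is configured"
```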

     

    6. Load the RDMA transport module on the client (Host B).

    Host B # modprobe xprtrdma
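
To confirm that a transport module (svcrdma on the server, xprtrdma on the client) is actually loaded, /proc/modules can be inspected. This is a minimal sketch; module_loaded is a hypothetical helper name, and on other kernel versions the module may be listed under a different name, so treat the module names as an assumption tied to the setup in this post.

```shell
# Return 0 if a module name appears in the first column of /proc/modules.
# $1: module name, $2: modules file (defaults to /proc/modules)
module_loaded() {
    mod=$1
    modules=${2:-/proc/modules}
    # Exit status 0 when a line whose first field equals the name was seen
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules"
}

# Usage on Host B:
#   module_loaded xprtrdma || echo "xprtrdma is not loaded"
```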

     

    7. Mount the directory (created in step 1) on a local directory on the client side using the rdma protocol. The local mount point (/mnt/my_directory in this example) must already exist.

    Host B # mount -o rdma,port=20049 19.19.19.7:/root/my_directory /mnt/my_directory

     

    8. Check the mount parameters:

    Host B # mount | grep my_directory

    19.19.19.7:/root/my_directory on /mnt/my_directory type nfs (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=rdma,port=20049,timeo=600,retrans=2,sec=sys,mountaddr=19.19.19.7,mountvers=3,mountproto=tcp,local_lock=none,addr=19.19.19.7)
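
The field to look for in this output is proto=rdma (mountproto=tcp refers only to the MOUNT protocol and is expected). As an illustration, the value can be extracted with a small filter; mount_proto is a hypothetical helper name, and the pattern assumes the option layout shown in the output above.

```shell
# Print the value of the proto= option from `mount` output read on stdin.
# The [(,] prefix keeps "mountproto=" from being matched by mistake.
mount_proto() {
    sed -n 's/.*[(,]proto=\([^,)]*\).*/\1/p'
}

# Usage on Host B:
#   mount | grep my_directory | mount_proto
```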

     

    9. Verify access to the remote directory:

    Host B # touch /mnt/my_directory/1.txt

    Host A # ls /root/my_directory

    1.txt
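
A slightly stronger sanity check than touch and ls is to write a token and read it back. The sketch below is an illustration only; verify_rw is a hypothetical helper, and because the read may be served from the client cache it confirms basic access rather than end-to-end data integrity.

```shell
# Write a token into one directory and read it back from another.
# $1: directory to write into, $2: directory to read from (defaults to $1)
verify_rw() {
    write_dir=$1
    read_dir=${2:-$1}
    token="nfs-rdma-check-$$"
    name=".rw_check_$$"
    printf '%s\n' "$token" > "$write_dir/$name" || return 1
    [ "$(cat "$read_dir/$name" 2>/dev/null)" = "$token" ]
    status=$?
    # Remove the scratch file regardless of the result
    rm -f "$write_dir/$name"
    return $status
}

# Usage on Host B:
#   verify_rw /mnt/my_directory && echo "read/write OK"
```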

     

    10. To check the throughput and latency of NFS over RDMA, fio (or any similar tool) can be used. For example:

    # fio --rw=randread --bs=64k --numjobs=4 --iodepth=8 --runtime=30 --time_based --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --exitall --name task1 --filename=/mnt/my_directory/1.txt --size=10000000

    ...

     

    11. When testing NFS, after changing the /etc/exports configuration file you need to re-export it with the command "exportfs -a" (the exportfs command maintains the current table of exports for the NFS server).

    # exportfs -a

    More information about exportfs can be found in the exportfs(8) man page.

     

    12. Wireshark support for NFSoRDMA was added in development version 1.99.7 and will be released as part of Wireshark 2.0.

     

    Version 1.99.7 can be downloaded from https://www.wireshark.org/download/automated/win64/