10 Replies Latest reply on Jun 12, 2017 2:15 PM by drewmorin

    NFS over RoCE Ubuntu 16.04 with latest OFED

    rbabchis

      I'm having trouble with NFS over RoCE on Ubuntu 16.04 using the latest OFED (MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu16.04-x86_64.tgz)

       

      Works with Inbox drivers (mostly) but not so much with latest OFED

      I managed to get NFS working over RoCE by following the docs on this site, using the Inbox drivers for Ubuntu 16.04. I was having a few minor issues, and I know the Ubuntu drivers are out of date, so I installed the latest OFED/mlx4 drivers as recommended on this site. All went as planned: IP functionality is all there, and the RDMA tools/tests all work. The newest mlx4 driver is confirmed loaded and everything seems to work great, except for one thing.

       

      Now I have a problem: the svcrdma and xprtrdma modules won't load, so there is no RDMA support for NFS. I get the following errors. I have a feeling this can be resolved somehow, perhaps by recompiling kernel modules, but that is over my head at the moment. Or maybe I just messed something up (crossing fingers)? Can anyone help?

      NFS server:

      # modprobe svcrdma
      modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

      dmesg errors:

      [105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0)
      [105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq
      [105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22)
      [105699.697069] rpcrdma: disagrees about version of symbol rdma_resolve_addr
      [105699.697071] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
      [105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0)
      [105699.697213] rpcrdma: disagrees about version of symbol ib_dereg_mr
      [105699.697215] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
      [105699.697224] rpcrdma: disagrees about version of symbol ib_query_qp
      [105699.697226] rpcrdma: Unknown symbol ib_query_qp (err -22)
      [105699.697236] rpcrdma: disagrees about version of symbol rdma_disconnect
      [105699.697238] rpcrdma: Unknown symbol rdma_disconnect (err -22)
      [105699.697245] rpcrdma: disagrees about version of symbol ib_alloc_fmr
      [105699.697247] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
      [105699.697294] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
      [105699.697295] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
      [105699.697301] rpcrdma: disagrees about version of symbol rdma_resolve_route
      [105699.697303] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
      [105699.697398] rpcrdma: disagrees about version of symbol rdma_bind_addr
      [105699.697400] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
      [105699.697441] rpcrdma: disagrees about version of symbol rdma_create_qp
      [105699.697443] rpcrdma: Unknown symbol rdma_create_qp (err -22)
      [105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
      [105699.697487] rpcrdma: disagrees about version of symbol ib_destroy_cq
      [105699.697489] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
      [105699.697494] rpcrdma: disagrees about version of symbol rdma_create_id
      [105699.697496] rpcrdma: Unknown symbol rdma_create_id (err -22)
      [105699.697582] rpcrdma: disagrees about version of symbol rdma_listen
      [105699.697584] rpcrdma: Unknown symbol rdma_listen (err -22)
      [105699.697587] rpcrdma: disagrees about version of symbol rdma_destroy_qp
      [105699.697589] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
      [105699.697597] rpcrdma: disagrees about version of symbol ib_query_device
      [105699.697599] rpcrdma: Unknown symbol ib_query_device (err -22)
      [105699.697606] rpcrdma: disagrees about version of symbol ib_get_dma_mr
      [105699.697607] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
      [105699.697617] rpcrdma: disagrees about version of symbol ib_alloc_pd
      [105699.697618] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
      [105699.697673] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
      [105699.697734] rpcrdma: disagrees about version of symbol rdma_connect
      [105699.697736] rpcrdma: Unknown symbol rdma_connect (err -22)
      [105699.697769] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
      [105699.697842] rpcrdma: disagrees about version of symbol rdma_destroy_id
      [105699.697844] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
      [105699.697872] rpcrdma: disagrees about version of symbol rdma_accept
      [105699.697874] rpcrdma: Unknown symbol rdma_accept (err -22)
      [105699.697882] rpcrdma: disagrees about version of symbol ib_destroy_qp
      [105699.697883] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
      [105699.697964] rpcrdma: disagrees about version of symbol ib_dealloc_pd
      [105699.697965] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)

       

      NFS client:

      # modprobe xprtrdma
      modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

      dmesg errors:

      [106055.692454] rpcrdma: Unknown symbol rdma_event_msg (err 0)
      [106055.692480] rpcrdma: disagrees about version of symbol ib_create_cq
      [106055.692481] rpcrdma: Unknown symbol ib_create_cq (err -22)
      [106055.692484] rpcrdma: disagrees about version of symbol rdma_resolve_addr
      [106055.692485] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
      [106055.692520] rpcrdma: Unknown symbol ib_event_msg (err 0)
      [106055.692529] rpcrdma: disagrees about version of symbol ib_dereg_mr
      [106055.692530] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
      [106055.692532] rpcrdma: disagrees about version of symbol ib_query_qp
      [106055.692533] rpcrdma: Unknown symbol ib_query_qp (err -22)
      [106055.692536] rpcrdma: disagrees about version of symbol rdma_disconnect
      [106055.692536] rpcrdma: Unknown symbol rdma_disconnect (err -22)
      [106055.692538] rpcrdma: disagrees about version of symbol ib_alloc_fmr
      [106055.692539] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
      [106055.692552] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
      [106055.692553] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
      [106055.692554] rpcrdma: disagrees about version of symbol rdma_resolve_route
      [106055.692555] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
      [106055.692565] rpcrdma: disagrees about version of symbol rdma_bind_addr
      [106055.692565] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
      [106055.692573] rpcrdma: disagrees about version of symbol rdma_create_qp
      [106055.692574] rpcrdma: Unknown symbol rdma_create_qp (err -22)
      [106055.692583] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
      [106055.692585] rpcrdma: disagrees about version of symbol ib_destroy_cq
      [106055.692585] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
      [106055.692587] rpcrdma: disagrees about version of symbol rdma_create_id
      [106055.692587] rpcrdma: Unknown symbol rdma_create_id (err -22)
      [106055.692613] rpcrdma: disagrees about version of symbol rdma_listen
      [106055.692614] rpcrdma: Unknown symbol rdma_listen (err -22)
      [106055.692615] rpcrdma: disagrees about version of symbol rdma_destroy_qp
      [106055.692615] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
      [106055.692617] rpcrdma: disagrees about version of symbol ib_query_device
      [106055.692618] rpcrdma: Unknown symbol ib_query_device (err -22)
      [106055.692619] rpcrdma: disagrees about version of symbol ib_get_dma_mr
      [106055.692620] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
      [106055.692622] rpcrdma: disagrees about version of symbol ib_alloc_pd
      [106055.692623] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
      [106055.692638] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
      [106055.692657] rpcrdma: disagrees about version of symbol rdma_connect
      [106055.692658] rpcrdma: Unknown symbol rdma_connect (err -22)
      [106055.692668] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
      [106055.692690] rpcrdma: disagrees about version of symbol rdma_destroy_id
      [106055.692690] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
      [106055.692698] rpcrdma: disagrees about version of symbol rdma_accept
      [106055.692699] rpcrdma: Unknown symbol rdma_accept (err -22)
      [106055.692701] rpcrdma: disagrees about version of symbol ib_destroy_qp
      [106055.692701] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
      [106055.692724] rpcrdma: disagrees about version of symbol ib_dealloc_pd
      [106055.692725] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
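
The two kinds of message in the dmesg output above mean different things. "disagrees about version of symbol" is printed when the symbol checksum recorded in rpcrdma does not match the one exported by the currently loaded RDMA modules, which suggests rpcrdma was built against a different RDMA stack. "Unknown symbol ... (err 0)" means the symbol is not exported at all. A minimal Python sketch, for illustration only, that sorts a dmesg excerpt into those two groups; the function name `classify_symbols` is made up for this example.

```python
import re

# Patterns for the two module-loader messages seen in the dmesg output above.
MISMATCH_RE = re.compile(r"(\w+): disagrees about version of symbol (\w+)")
UNKNOWN_RE = re.compile(r"(\w+): Unknown symbol (\w+) \(err (-?\d+)\)")


def classify_symbols(dmesg_text):
    """Return (version_mismatch, missing) sets of symbol names."""
    mismatch, missing = set(), set()
    for line in dmesg_text.splitlines():
        m = MISMATCH_RE.search(line)
        if m:
            mismatch.add(m.group(2))
            continue
        m = UNKNOWN_RE.search(line)
        if m and int(m.group(3)) == 0:
            # err 0: the symbol was not found among the exported symbols.
            missing.add(m.group(2))
    return mismatch, missing


# A few lines copied from the server log above.
SAMPLE = """\
[105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0)
[105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq
[105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22)
[105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0)
"""

mismatch, missing = classify_symbols(SAMPLE)
print("version mismatch:", sorted(mismatch))
print("not exported:", sorted(missing))
```

Run against the full logs above, the "not exported" group would be rdma_event_msg, ib_event_msg, ib_map_mr_sg, ib_alloc_mr and ib_wc_status_msg; every other symbol listed is a version mismatch.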

        • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
          alkx

          Hi Ryan, could you check if MOFED-3.4 has the same behaviour?

            • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
              rbabchis

              Nice, a new version. Yes, I will try it today and report back. Thanks.

                • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                  rage@mellanox.com

                  Hi Ryan,

                   

                  What is the status on trying the new driver? Are you still seeing errors?

                   

                  ~Rage

                    • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                      rbabchis

                      Sorry, I realized I had upgraded one of the systems I was testing with to Ubuntu 16.10, and that's not supported (it could be!). I'll have to figure something out when I have time. I don't mind going back to 16.04, but the process of setting things back up again is a downer.

                       

                       

                      Ryan

                        • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                          rage@mellanox.com

                          Not a problem. Whenever you get time, update us on this issue. I'm sure the community would like to know...

                           

                          ~Rage

                            • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                              rbabchis

                              Same thing with 3.4. It's those svcrdma and xprtrdma modules... I don't understand why this was overlooked. I wonder if those modules, which I believe come with Ubuntu, aren't being updated/replaced along with the rest of the modules from Mellanox.

                               

                              On the server side (the client is pretty much the same):

                               

                              root@igor:~# uname -a
                              Linux igor 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

                               

                              root@igor:~# lsb_release -a
                              No LSB modules are available.
                              Distributor ID: Ubuntu
                              Description:    Ubuntu 16.04.1 LTS
                              Release:        16.04
                              Codename:       xenial

                              root@igor:~# modprobe svcrdma
                              modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

                              root@igor:~# echo rdma 20049 > /proc/fs/nfsd/portlist
                              -su: echo: write error: Protocol not supported

                              root@igor:~# dmesg

                              [537309.544424] mlx4_en: enp1s0: Close port called
                              [537312.268446] Compat-mlnx-ofed backport release: 2ed8a21
                              [537312.268449] Backport based on mlnx_ofed/mlnx_rdma.git 2ed8a21
                              [537312.268450] compat.git: mlnx_ofed/mlnx_rdma.git
                              [537312.281715] mlx4_core: Mellanox ConnectX core driver v3.4-1.0.0 (25 Sep 2016)
                              [537312.281761] mlx4_core: Initializing 0000:01:00.0
                              [537314.046353] mlx4_core 0000:01:00.0: DMFS high rate mode not supported
                              [537314.046525] mlx4_core: device is working in RoCE mode: Roce V1
                              [537314.046527] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
                              [537314.046528] mlx4_core: UD QP Gid type is: V1
                              [537314.945058] mlx4_core 0000:01:00.0: PCIe link speed is 5.0GT/s, device supports 5.0GT/s
                              [537314.945061] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
                              [537314.970245] pps_core: LinuxPPS API ver. 1 registered
                              [537314.970248] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
                              [537314.972474] PTP clock support registered
                              [537314.984817] mlx4_en: Mellanox ConnectX HCA Ethernet driver v3.4-1.0.0 (25 Sep 2016)
                              [537314.984933] mlx4_en 0000:01:00.0: Activating port:1
                              [537314.985004] mlx4_en: 0000:01:00.0: Port 1: enabling only PFC DCB ops
                              [537314.985006] mlx4_en: 0000:01:00.0: Port 1: Failed to query disable_32_14_4_e field for QCN
                              [537314.988534] mlx4_en: 0000:01:00.0: Port 1: Using 64 TX rings
                              [537314.988537] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
                              [537314.988540] mlx4_en: 0000:01:00.0: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
                              [537314.988921] mlx4_en: 0000:01:00.0: Port 1: Initializing port
                              [537314.998296] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.4-1.0.0 (25 Sep 2016)
                              [537314.998572] mlx4_core 0000:01:00.0: mlx4_ib_add: allocated counter index 1 for port 1
                              [537315.025711] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
                              [537315.337917] mlx4_en: enp1s0:   frag:0 - size:1536 prefix:0 stride:1536
                              [537315.338140] mlx4_en: enp1s0:   frag:1 - size:4096 prefix:1536 stride:4096
                              [537315.338335] mlx4_en: enp1s0:   frag:2 - size:3390 prefix:5632 stride:3392
                              [537315.381364] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
                              [537317.232116] mlx4_en: enp1s0: Link Up
                              [537317.232198] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
                              [537317.287122] mlx4_en: enp1s0: Link Down
                              [537317.392025] mlx4_en: enp1s0: Link Up

                              [537447.634106] rpcrdma: Unknown symbol rdma_event_msg (err 0)
                              [537447.634210] rpcrdma: disagrees about version of symbol ib_create_cq
                              [537447.634214] rpcrdma: Unknown symbol ib_create_cq (err -22)
                              [537447.634228] rpcrdma: disagrees about version of symbol rdma_resolve_addr
                              [537447.634231] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
                              [537447.634406] rpcrdma: Unknown symbol ib_event_msg (err 0)
                              [537447.634450] rpcrdma: disagrees about version of symbol ib_dereg_mr
                              [537447.634452] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
                              [537447.634466] rpcrdma: disagrees about version of symbol ib_query_qp
                              [537447.634469] rpcrdma: Unknown symbol ib_query_qp (err -22)
                              [537447.634484] rpcrdma: disagrees about version of symbol rdma_disconnect
                              [537447.634487] rpcrdma: Unknown symbol rdma_disconnect (err -22)
                              [537447.634497] rpcrdma: disagrees about version of symbol ib_alloc_fmr
                              [537447.634500] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
                              [537447.634565] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
                              [537447.634567] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
                              [537447.634576] rpcrdma: disagrees about version of symbol rdma_resolve_route
                              [537447.634578] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
                              [537447.634621] rpcrdma: disagrees about version of symbol rdma_bind_addr
                              [537447.634624] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
                              [537447.634663] rpcrdma: disagrees about version of symbol rdma_create_qp
                              [537447.634666] rpcrdma: Unknown symbol rdma_create_qp (err -22)
                              [537447.634756] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
                              [537447.634771] rpcrdma: disagrees about version of symbol ib_destroy_cq
                              [537447.634775] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
                              [537447.634789] rpcrdma: disagrees about version of symbol rdma_create_id
                              [537447.634812] rpcrdma: Unknown symbol rdma_create_id (err -22)
                              [537447.634949] rpcrdma: disagrees about version of symbol rdma_listen
                              [537447.634953] rpcrdma: Unknown symbol rdma_listen (err -22)
                              [537447.634958] rpcrdma: disagrees about version of symbol rdma_destroy_qp
                              [537447.634963] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
                              [537447.634976] rpcrdma: disagrees about version of symbol ib_query_device
                              [537447.634978] rpcrdma: Unknown symbol ib_query_device (err -22)
                              [537447.634988] rpcrdma: disagrees about version of symbol ib_get_dma_mr
                              [537447.634991] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
                              [537447.635005] rpcrdma: disagrees about version of symbol ib_alloc_pd
                              [537447.635007] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
                              [537447.635090] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
                              [537447.635176] rpcrdma: disagrees about version of symbol rdma_connect
                              [537447.635179] rpcrdma: Unknown symbol rdma_connect (err -22)
                              [537447.635232] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
                              [537447.635336] rpcrdma: disagrees about version of symbol rdma_destroy_id
                              [537447.635339] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
                              [537447.635379] rpcrdma: disagrees about version of symbol rdma_accept
                              [537447.635382] rpcrdma: Unknown symbol rdma_accept (err -22)
                              [537447.635393] rpcrdma: disagrees about version of symbol ib_destroy_qp
                              [537447.635396] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
                              [537447.635512] rpcrdma: disagrees about version of symbol ib_dealloc_pd
                              [537447.635515] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
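
One way to test that guess is to look at where each module file lives. A rough Python sketch, assuming that the distribution's own modules sit under the `kernel/` subdirectory of `/lib/modules/<release>/` and that out-of-tree packages install somewhere else (for example `updates/` or `extra/`). That layout is an assumption, not something confirmed in this thread, and the helper names and the second example path are made up.

```python
import subprocess


def module_path(name):
    """Ask modinfo for the file that modprobe would load for `name`."""
    result = subprocess.run(
        ["modinfo", "-F", "filename", name],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout.strip()


def module_origin(path):
    """Guess the origin of a module from its path.

    Assumption: in-tree modules live in /lib/modules/<release>/kernel/.
    """
    parts = path.split("/")
    # parts looks like ['', 'lib', 'modules', '<release>', 'kernel', ...]
    if len(parts) > 4 and parts[1:3] == ["lib", "modules"] and parts[4] == "kernel":
        return "distro"
    return "out-of-tree"


def mixed_origins(paths):
    """True if the given module paths do not all share one origin."""
    return len({module_origin(p) for p in paths.values()}) > 1


# Hypothetical paths for illustration; real values come from module_path().
example = {
    "rpcrdma": "/lib/modules/4.4.0-47-generic/kernel/net/sunrpc/xprtrdma/rpcrdma.ko",
    "ib_core": "/lib/modules/4.4.0-47-generic/updates/dkms/ib_core.ko",
}
for name, path in sorted(example.items()):
    print(name, "->", module_origin(path))
print("mixed origins:", mixed_origins(example))
```

On a real system the dictionary would be built with calls such as `module_path("rpcrdma")`, `module_path("ib_core")` and `module_path("rdma_cm")`. A mixed result would be consistent with rpcrdma still being the Ubuntu build while the RDMA core comes from the OFED package.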

                    • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                      lorenzo.ivaldi@unige.it

                      I have the same problem now. Any suggestions?

                       

                      Thanks in advance

                      • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                        drewmorin

                        Having this exact same issue. Fresh install of Ubuntu Server 16.04 LTS. RDMA is up and working. When I go to load the RDMA transport modules, per the guide "HowTo Configure NFS over RDMA (RoCE)", I get the same error as above.

                        Would like to know if anyone has found a solution.

                        Thanks.