11 Replies Latest reply on Sep 30, 2017 4:30 AM by boliniak

    NFS over RoCE Ubuntu 16.04 with latest OFED

    rbabchis

      I'm having trouble with NFS over RoCE on Ubuntu 16.04 using the latest OFED (MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu16.04-x86_64.tgz).

      Works with inbox drivers (mostly) but not so much with the latest OFED

      I managed to get NFS working with RoCE by following the docs on this site, using the inbox drivers for Ubuntu 16.04. I was having some minor issues, and I know the Ubuntu packages are out of date, so I installed the latest OFED/mlx4 drivers, as recommended on this site. Everything went as planned: IP functionality is all there, and the RDMA tools and tests all work. I have confirmed that the newest mlx4 driver is loaded, and everything seems to work great, except for one thing.
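One way to double-check which mlx4 driver is actually loaded (as opposed to merely installed) is to read the version attribute the kernel exposes for a loaded module under /sys/module. This is a sketch under assumptions: it relies on the module declaring a version (the OFED build of mlx4_core announces v3.4-1.0.0 in the dmesg output later in this thread), and the function name and the SYSFS_MODULE_DIR override are hypothetical, added only so the function can be tested.

```shell
#!/usr/bin/env bash
# Print the version of a *loaded* kernel module, read from sysfs.
# Assumption: the module declares a version, so the kernel exposes
# /sys/module/<name>/version while it is loaded.
# SYSFS_MODULE_DIR is a hypothetical override used only for testing.
loaded_module_version() {
    local name="$1"
    local dir="${SYSFS_MODULE_DIR:-/sys/module}/$name"

    # /sys/module/<name> exists while the module is loaded
    # (and for some built-in modules).
    if [ ! -d "$dir" ]; then
        echo "$name: not loaded" >&2
        return 1
    fi

    if [ -r "$dir/version" ]; then
        cat "$dir/version"
    else
        echo "$name: loaded, no version attribute"
    fi
}

# Example: loaded_module_version mlx4_core
```

Comparing the result with `modinfo -F version mlx4_core`, which reports the version of the module file on disk, can reveal an older module that is still in memory.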

      Now I have a problem: the svcrdma and xprtrdma modules won't load, so NFS has no RDMA support. I get the errors below. I have a feeling this can be resolved somehow, perhaps by recompiling kernel modules, but that is over my head at the moment. Or maybe I just messed something up (fingers crossed)? Can anyone help?

      NFS server:

      # modprobe svcrdma
      modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

      dmesg errors:

      [105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0)
      [105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq
      [105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22)
      [105699.697069] rpcrdma: disagrees about version of symbol rdma_resolve_addr
      [105699.697071] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
      [105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0)
      [105699.697213] rpcrdma: disagrees about version of symbol ib_dereg_mr
      [105699.697215] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
      [105699.697224] rpcrdma: disagrees about version of symbol ib_query_qp
      [105699.697226] rpcrdma: Unknown symbol ib_query_qp (err -22)
      [105699.697236] rpcrdma: disagrees about version of symbol rdma_disconnect
      [105699.697238] rpcrdma: Unknown symbol rdma_disconnect (err -22)
      [105699.697245] rpcrdma: disagrees about version of symbol ib_alloc_fmr
      [105699.697247] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
      [105699.697294] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
      [105699.697295] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
      [105699.697301] rpcrdma: disagrees about version of symbol rdma_resolve_route
      [105699.697303] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
      [105699.697398] rpcrdma: disagrees about version of symbol rdma_bind_addr
      [105699.697400] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
      [105699.697441] rpcrdma: disagrees about version of symbol rdma_create_qp
      [105699.697443] rpcrdma: Unknown symbol rdma_create_qp (err -22)
      [105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
      [105699.697487] rpcrdma: disagrees about version of symbol ib_destroy_cq
      [105699.697489] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
      [105699.697494] rpcrdma: disagrees about version of symbol rdma_create_id
      [105699.697496] rpcrdma: Unknown symbol rdma_create_id (err -22)
      [105699.697582] rpcrdma: disagrees about version of symbol rdma_listen
      [105699.697584] rpcrdma: Unknown symbol rdma_listen (err -22)
      [105699.697587] rpcrdma: disagrees about version of symbol rdma_destroy_qp
      [105699.697589] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
      [105699.697597] rpcrdma: disagrees about version of symbol ib_query_device
      [105699.697599] rpcrdma: Unknown symbol ib_query_device (err -22)
      [105699.697606] rpcrdma: disagrees about version of symbol ib_get_dma_mr
      [105699.697607] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
      [105699.697617] rpcrdma: disagrees about version of symbol ib_alloc_pd
      [105699.697618] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
      [105699.697673] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
      [105699.697734] rpcrdma: disagrees about version of symbol rdma_connect
      [105699.697736] rpcrdma: Unknown symbol rdma_connect (err -22)
      [105699.697769] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
      [105699.697842] rpcrdma: disagrees about version of symbol rdma_destroy_id
      [105699.697844] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
      [105699.697872] rpcrdma: disagrees about version of symbol rdma_accept
      [105699.697874] rpcrdma: Unknown symbol rdma_accept (err -22)
      [105699.697882] rpcrdma: disagrees about version of symbol ib_destroy_qp
      [105699.697883] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
      [105699.697964] rpcrdma: disagrees about version of symbol ib_dealloc_pd
      [105699.697965] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
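The dmesg output mixes two different failures, which is easier to see once they are separated. My reading of the kernel messages: "disagrees about version of symbol X" followed by "Unknown symbol X (err -22)" (EINVAL) means X is exported, but with a different modversion CRC than the one rpcrdma was built against, while "Unknown symbol X (err 0)" means no loaded module exports X at all. The awk-based sketch below (the function name is hypothetical) splits a log into those two groups.

```shell
#!/usr/bin/env bash
# Split module-load symbol errors from kernel log text on stdin into:
#   crc-mismatch <sym>  symbol exists, but its version CRC differs
#   missing <sym>       "Unknown symbol <sym> (err 0)": nothing exports it
# "Unknown symbol ... (err -22)" lines are skipped: they repeat a mismatch.
summarize_symbol_errors() {
    awk '
        /disagrees about version of symbol/ { crc[$NF] = 1 }
        /Unknown symbol/ && index($0, "(err 0)") > 0 {
            for (i = 1; i < NF; i++)
                if ($i == "symbol") { missing[$(i + 1)] = 1; break }
        }
        END {
            for (s in crc) print "crc-mismatch " s
            for (s in missing) print "missing " s
        }
    ' | LC_ALL=C sort
}

# Example: dmesg | summarize_symbol_errors
```

Applied to the server log above, it would report 22 symbols with CRC mismatches and 5 missing ones (rdma_event_msg, ib_event_msg, ib_map_mr_sg, ib_alloc_mr, ib_wc_status_msg). If that reading is right, the inbox rpcrdma was built against a different ib_core/rdma_cm than the one OFED installed.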

      NFS client:

      # modprobe xprtrdma         
      modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

      dmesg errors:

      [106055.692454] rpcrdma: Unknown symbol rdma_event_msg (err 0)
      [106055.692480] rpcrdma: disagrees about version of symbol ib_create_cq
      [106055.692481] rpcrdma: Unknown symbol ib_create_cq (err -22)
      [106055.692484] rpcrdma: disagrees about version of symbol rdma_resolve_addr
      [106055.692485] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
      [106055.692520] rpcrdma: Unknown symbol ib_event_msg (err 0)
      [106055.692529] rpcrdma: disagrees about version of symbol ib_dereg_mr
      [106055.692530] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
      [106055.692532] rpcrdma: disagrees about version of symbol ib_query_qp
      [106055.692533] rpcrdma: Unknown symbol ib_query_qp (err -22)
      [106055.692536] rpcrdma: disagrees about version of symbol rdma_disconnect
      [106055.692536] rpcrdma: Unknown symbol rdma_disconnect (err -22)
      [106055.692538] rpcrdma: disagrees about version of symbol ib_alloc_fmr
      [106055.692539] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
      [106055.692552] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
      [106055.692553] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
      [106055.692554] rpcrdma: disagrees about version of symbol rdma_resolve_route
      [106055.692555] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
      [106055.692565] rpcrdma: disagrees about version of symbol rdma_bind_addr
      [106055.692565] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
      [106055.692573] rpcrdma: disagrees about version of symbol rdma_create_qp
      [106055.692574] rpcrdma: Unknown symbol rdma_create_qp (err -22)
      [106055.692583] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
      [106055.692585] rpcrdma: disagrees about version of symbol ib_destroy_cq
      [106055.692585] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
      [106055.692587] rpcrdma: disagrees about version of symbol rdma_create_id
      [106055.692587] rpcrdma: Unknown symbol rdma_create_id (err -22)
      [106055.692613] rpcrdma: disagrees about version of symbol rdma_listen
      [106055.692614] rpcrdma: Unknown symbol rdma_listen (err -22)
      [106055.692615] rpcrdma: disagrees about version of symbol rdma_destroy_qp
      [106055.692615] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
      [106055.692617] rpcrdma: disagrees about version of symbol ib_query_device
      [106055.692618] rpcrdma: Unknown symbol ib_query_device (err -22)
      [106055.692619] rpcrdma: disagrees about version of symbol ib_get_dma_mr
      [106055.692620] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
      [106055.692622] rpcrdma: disagrees about version of symbol ib_alloc_pd
      [106055.692623] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
      [106055.692638] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
      [106055.692657] rpcrdma: disagrees about version of symbol rdma_connect
      [106055.692658] rpcrdma: Unknown symbol rdma_connect (err -22)
      [106055.692668] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
      [106055.692690] rpcrdma: disagrees about version of symbol rdma_destroy_id
      [106055.692690] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
      [106055.692698] rpcrdma: disagrees about version of symbol rdma_accept
      [106055.692699] rpcrdma: Unknown symbol rdma_accept (err -22)
      [106055.692701] rpcrdma: disagrees about version of symbol ib_destroy_qp
      [106055.692701] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
      [106055.692724] rpcrdma: disagrees about version of symbol ib_dealloc_pd
      [106055.692725] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
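Both machines fail on 'rpcrdma' even though the commands ask for svcrdma and xprtrdma, so on this 4.4 kernel the two names appear to be aliases for a single rpcrdma module. A useful check is therefore where modprobe resolves rpcrdma and the RDMA core modules from. The sketch below classifies a module path as in-tree or out-of-tree; it assumes (this is not stated in the thread) that distribution modules live under /lib/modules/<kver>/kernel/ while DKMS/OFED builds are installed under updates/ or extra/. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Classify a kernel module file path by where it was installed.
# Assumption: in-tree (distribution) modules live under
# /lib/modules/<kver>/kernel/, while out-of-tree builds such as DKMS or
# OFED packages are installed under updates/ or extra/.
classify_module_path() {
    case "$1" in
        /lib/modules/*/updates/* | /lib/modules/*/extra/*)
            echo "out-of-tree" ;;
        /lib/modules/*/kernel/*)
            echo "in-tree" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example (modinfo -n prints the module file name):
#   for m in rpcrdma ib_core rdma_cm; do
#       printf '%s: %s\n' "$m" "$(classify_module_path "$(modinfo -n "$m")")"
#   done
```

If rpcrdma resolves in-tree while ib_core and rdma_cm resolve out-of-tree, that would be consistent with the "disagrees about version of symbol" errors: the inbox module was built against a different RDMA core than the OFED one now loaded.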

        • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
          alkx

          Hi Ryan, could you check if MOFED-3.4 has the same behaviour?

            • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
              rbabchis

              Nice, a new version. Yes, I will try it today and report back. Thanks.

                • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                  rage@mellanox.com

                  Hi Ryan,

                  What is the status on trying the new driver? Are you still seeing errors?

                  ~Rage

                    • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                      rbabchis

                      Sorry, I realized I had upgraded one of the systems I was testing with to Ubuntu 16.10, and that's not supported (it could be!). I'll have to figure something out when I have time. I don't mind going back to 16.04, but setting everything back up again is a downer.

                      Ryan

                        • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                          rage@mellanox.com

                          Not a problem. Whenever you get time, update us on this issue; I'm sure the community would like to know.

                          ~Rage

                            • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                              rbabchis

                              Same thing. It's those svcrdma and xprtrdma modules. I don't understand why this was overlooked. I wonder whether those modules, which I believe come with Ubuntu, aren't being updated or replaced along with the rest of the modules from Mellanox.

                              On the server side (the client is pretty much the same):

                              root@igor:~# uname -a
                              Linux igor 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

                              root@igor:~# lsb_release -a
                              No LSB modules are available.
                              Distributor ID: Ubuntu
                              Description:    Ubuntu 16.04.1 LTS
                              Release:        16.04
                              Codename:       xenial

                              root@igor:~# modprobe svcrdma                        
                              modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

                              root@igor:~# echo rdma 20049 > /proc/fs/nfsd/portlist
                              -su: echo: write error: Protocol not supported

                              root@igor:~# dmesg

                              [537309.544424] mlx4_en: enp1s0: Close port called
                              [537312.268446] Compat-mlnx-ofed backport release: 2ed8a21
                              [537312.268449] Backport based on mlnx_ofed/mlnx_rdma.git 2ed8a21
                              [537312.268450] compat.git: mlnx_ofed/mlnx_rdma.git
                              [537312.281715] mlx4_core: Mellanox ConnectX core driver v3.4-1.0.0 (25 Sep 2016)
                              [537312.281761] mlx4_core: Initializing 0000:01:00.0
                              [537314.046353] mlx4_core 0000:01:00.0: DMFS high rate mode not supported
                              [537314.046525] mlx4_core: device is working in RoCE mode: Roce V1                                                                                                    
                              [537314.046527] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
                              [537314.046528] mlx4_core: UD QP Gid type is: V1
                              [537314.945058] mlx4_core 0000:01:00.0: PCIe link speed is 5.0GT/s, device supports 5.0GT/s
                              [537314.945061] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
                              [537314.970245] pps_core: LinuxPPS API ver. 1 registered
                              [537314.970248] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
                              [537314.972474] PTP clock support registered
                              [537314.984817] mlx4_en: Mellanox ConnectX HCA Ethernet driver v3.4-1.0.0 (25 Sep 2016)
                              [537314.984933] mlx4_en 0000:01:00.0: Activating port:1
                              [537314.985004] mlx4_en: 0000:01:00.0: Port 1: enabling only PFC DCB ops
                              [537314.985006] mlx4_en: 0000:01:00.0: Port 1: Failed to query disable_32_14_4_e field for QCN
                              [537314.988534] mlx4_en: 0000:01:00.0: Port 1: Using 64 TX rings
                              [537314.988537] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
                              [537314.988540] mlx4_en: 0000:01:00.0: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
                              [537314.988921] mlx4_en: 0000:01:00.0: Port 1: Initializing port
                              [537314.998296] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.4-1.0.0 (25 Sep 2016)
                              [537314.998572] mlx4_core 0000:01:00.0: mlx4_ib_add: allocated counter index 1 for port 1
                              [537315.025711] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
                              [537315.337917] mlx4_en: enp1s0:   frag:0 - size:1536 prefix:0 stride:1536
                              [537315.338140] mlx4_en: enp1s0:   frag:1 - size:4096 prefix:1536 stride:4096
                              [537315.338335] mlx4_en: enp1s0:   frag:2 - size:3390 prefix:5632 stride:3392
                              [537315.381364] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
                              [537317.232116] mlx4_en: enp1s0: Link Up
                              [537317.232198] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
                              [537317.287122] mlx4_en: enp1s0: Link Down
                              [537317.392025] mlx4_en: enp1s0: Link Up

                              [537447.634106] rpcrdma: Unknown symbol rdma_event_msg (err 0)
                              [537447.634210] rpcrdma: disagrees about version of symbol ib_create_cq
                              [537447.634214] rpcrdma: Unknown symbol ib_create_cq (err -22)
                              [537447.634228] rpcrdma: disagrees about version of symbol rdma_resolve_addr
                              [537447.634231] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
                              [537447.634406] rpcrdma: Unknown symbol ib_event_msg (err 0)
                              [537447.634450] rpcrdma: disagrees about version of symbol ib_dereg_mr
                              [537447.634452] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
                              [537447.634466] rpcrdma: disagrees about version of symbol ib_query_qp
                              [537447.634469] rpcrdma: Unknown symbol ib_query_qp (err -22)
                              [537447.634484] rpcrdma: disagrees about version of symbol rdma_disconnect
                              [537447.634487] rpcrdma: Unknown symbol rdma_disconnect (err -22)
                              [537447.634497] rpcrdma: disagrees about version of symbol ib_alloc_fmr
                              [537447.634500] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
                              [537447.634565] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
                              [537447.634567] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
                              [537447.634576] rpcrdma: disagrees about version of symbol rdma_resolve_route
                              [537447.634578] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
                              [537447.634621] rpcrdma: disagrees about version of symbol rdma_bind_addr
                              [537447.634624] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
                              [537447.634663] rpcrdma: disagrees about version of symbol rdma_create_qp
                              [537447.634666] rpcrdma: Unknown symbol rdma_create_qp (err -22)
                              [537447.634756] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
                              [537447.634771] rpcrdma: disagrees about version of symbol ib_destroy_cq
                              [537447.634775] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
                              [537447.634789] rpcrdma: disagrees about version of symbol rdma_create_id
                              [537447.634812] rpcrdma: Unknown symbol rdma_create_id (err -22)
                              [537447.634949] rpcrdma: disagrees about version of symbol rdma_listen
                              [537447.634953] rpcrdma: Unknown symbol rdma_listen (err -22)
                              [537447.634958] rpcrdma: disagrees about version of symbol rdma_destroy_qp
                              [537447.634963] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
                              [537447.634976] rpcrdma: disagrees about version of symbol ib_query_device
                              [537447.634978] rpcrdma: Unknown symbol ib_query_device (err -22)
                              [537447.634988] rpcrdma: disagrees about version of symbol ib_get_dma_mr
                              [537447.634991] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
                              [537447.635005] rpcrdma: disagrees about version of symbol ib_alloc_pd
                              [537447.635007] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
                              [537447.635090] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
                              [537447.635176] rpcrdma: disagrees about version of symbol rdma_connect
                              [537447.635179] rpcrdma: Unknown symbol rdma_connect (err -22)
                              [537447.635232] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
                              [537447.635336] rpcrdma: disagrees about version of symbol rdma_destroy_id
                              [537447.635339] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
                              [537447.635379] rpcrdma: disagrees about version of symbol rdma_accept
                              [537447.635382] rpcrdma: Unknown symbol rdma_accept (err -22)
                              [537447.635393] rpcrdma: disagrees about version of symbol ib_destroy_qp
                              [537447.635396] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
                              [537447.635512] rpcrdma: disagrees about version of symbol ib_dealloc_pd
                              [537447.635515] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
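The "Protocol not supported" error from writing to /proc/fs/nfsd/portlist follows from the failed modprobe: nfsd cannot create an rdma listener while the RDMA server transport is not registered. A small guard makes that ordering explicit. This is a sketch: it assumes the transport shows up as /sys/module/rpcrdma once loaded (as the modprobe output above suggests), and the function name plus the SYSFS_MODULE_DIR and NFSD_PORTLIST overrides are hypothetical, used only for testing.

```shell
#!/usr/bin/env bash
# Add an RDMA listener to nfsd only if the RDMA transport module is loaded.
# Assumptions: the transport is the single rpcrdma module (svcrdma is an
# alias for it on this kernel) and nfsd is already running.
# SYSFS_MODULE_DIR and NFSD_PORTLIST are hypothetical test overrides.
enable_nfsd_rdma_listener() {
    local port="${1:-20049}"
    local modules="${SYSFS_MODULE_DIR:-/sys/module}"
    local portlist="${NFSD_PORTLIST:-/proc/fs/nfsd/portlist}"

    if [ ! -d "$modules/rpcrdma" ]; then
        echo "rpcrdma is not loaded; run 'modprobe svcrdma' first" >&2
        return 1
    fi

    # Same write as in the post: "<transport> <port>" into portlist.
    echo "rdma $port" > "$portlist"
}
```

With the module problem described in this thread, the function stops at the first check instead of surfacing the less obvious write error.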

                    • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                      lorenzo.ivaldi@unige.it

                      I have the same problem now. Any suggestions?

                      Thanks in advance.

                      • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                        drewmorin

                        I'm having this exact same issue on a fresh install of Ubuntu Server 16.04 LTS. RDMA is up and working, but when I load the RDMA transport modules as described in the thread "HowTo Configure NFS over RDMA (RoCE)", I get the same error as above.

                        I'd like to know if anyone has found a solution.

                        Thanks.
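For anyone following the HowTo, the client side comes down to loading xprtrdma and mounting with the rdma transport option on port 20049. The sketch below only prints the mount command, and only after checking that rpcrdma is loaded, so the module problem discussed here is reported before mount fails with a less obvious error. Server name, export path and mount point are placeholders; the function name and the SYSFS_MODULE_DIR override are hypothetical.

```shell
#!/usr/bin/env bash
# Print (do not run) the NFS-over-RDMA mount command for a client,
# after checking that the RDMA transport module is loaded.
# Assumption: xprtrdma is an alias for rpcrdma on this kernel.
# SYSFS_MODULE_DIR is a hypothetical test override.
nfs_rdma_mount_cmd() {
    local server="$1" export_path="$2" mountpoint="$3" port="${4:-20049}"

    if [ ! -d "${SYSFS_MODULE_DIR:-/sys/module}/rpcrdma" ]; then
        echo "rpcrdma is not loaded; run 'modprobe xprtrdma' first" >&2
        return 1
    fi

    printf 'mount -t nfs -o rdma,port=%s %s:%s %s\n' \
        "$port" "$server" "$export_path" "$mountpoint"
}
```

Printing instead of executing keeps the check side-effect free; the printed line can be run by hand once the module loads.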

                        • Re: NFS over RoCE Ubuntu 16.04 with latest OFED
                          boliniak

                          I had the same problem in a CentOS 7 environment with OFED 3.3.x.x. I know that Mellanox does not support NFS over RDMA with OFED 3.4 onwards, but 3.3 isn't working in my situation either. Maybe Mellanox support can tell us more about it?

                          BR

                          Adam