
    Documentation for using Linux, NFS, and RDMA?

    billbroadley

      I've read the Linux kernel IPoIB and NFS-RDMA documentation.  It doesn't seem particularly current, given that it still refers to 2.6 kernels.

       

      I'm looking for documentation on doing NFS-RDMA with Linux kernel 3.13 (which is used by Ubuntu 14.04 LTS) and Mellanox FDR cards and a switch.

       

      I've got RDMA and IPoIB working.  So ping, ibv_srq_pingpong, udaddy, rdma_server, ib_send_bw, rping, and ucmatose all show client <-> server connectivity.

       

      Even the mount works:

      root@c2-31:/# mount -o rdma,port=20049 10.18.3.34:/export/1 /mnt/ib

      root@c2-31:/#
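
      For reference, the prep before that mount was roughly the following, adapted from the old nfs-rdma.txt (module names and the nfsd portlist path may have changed by 3.13/Ubuntu 14.04, so treat it as a sketch rather than the exact recipe):

      server# modprobe svcrdma

      server# service nfs-kernel-server restart

      server# echo rdma 20049 > /proc/fs/nfsd/portlist

      client# modprobe xprtrdma   (followed by the mount above)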

       

      I can cd around, but whenever I try to actually read a file I get:

      root@c2-31:/mnt/ib/bill# ls -alh testfile

      -rw-r--r-- 1 root root 1.0G Feb 20 19:31 testfile

      root@c2-31:/mnt/ib/bill# cat testfile

      cat: testfile: Input/output error

       

      On both client and server I do have:

      # ulimit -l

      unlimited

      # tail -4 /etc/security/limits.conf

      * soft memlock unlimited

      * hard memlock unlimited

      root soft memlock unlimited

      root hard memlock unlimited
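
      A quick way to double-check that the limit really applies to a given process is something like:

      # grep "Max locked memory" /proc/$$/limits

      which should report unlimited for both the soft and hard values.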

       

      My goal is to get this working and then write a Puppet module to manage the client and server setup for NFS/RDMA.

        • Re: Documentation for using Linux, NFS, and RDMA?

          Try running with NFSv3.
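
          Something like:

          # mount -o rdma,port=20049,vers=3 10.18.3.34:/export/1 /mnt/ib

          (same server/export as your working mount above; vers=3 just forces NFSv3 on that mount)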

           

          If there are still problems, try running "rpcdebug -m rpc -s all" on both client and server and post the dmesg output from each.
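
          For example, on the client (and similarly on the server; the nfs flags and log file name below are just suggestions):

          # rpcdebug -m rpc -s all

          # rpcdebug -m nfs -s all

          # dmesg -c > /dev/null        (clear the ring buffer)

          # cat /mnt/ib/bill/testfile        (reproduce the error)

          # dmesg > rpc-debug.log

          # rpcdebug -m rpc -c all        (and -m nfs -c all, to turn debugging back off afterwards)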

            • Re: Documentation for using Linux, NFS, and RDMA?
              billbroadley

              RDMA bandwidth testing works:

              server# ib_send_bw -d mlx4_0 -i 1 -F -b

              ------------------------------------------------------------------

                                  Send Bidirectional BW Test

              Connection type : RC

              Inline data is used up to 400 bytes message

                local address:  LID 0x02, QPN 0x01c6, PSN 0xef4fed

                remote address: LID 0x04, QPN 0x004c, PSN 0x98d931

              Mtu : 2048

              ------------------------------------------------------------------

              #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec] 

              ------------------------------------------------------------------

               

              client:

              # ib_send_bw -d mlx4_0 -i 1 -F  10.18.3.31 -b

              ------------------------------------------------------------------

                                  Send Bidirectional BW Test

              Connection type : RC

              Inline data is used up to 400 bytes message

                local address:  LID 0x04, QPN 0x004c, PSN 0x98d931

                remote address: LID 0x02, QPN 0x01c6, PSN 0xef4fed

              Mtu : 2048

              ------------------------------------------------------------------

              #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec] 

              Conflicting CPU frequency values detected: 1200.000000 != 2101.000000

              Test integrity may be harmed !

              Conflicting CPU frequency values detected: 1200.000000 != 2101.000000

              Test integrity may be harmed !

              Warning: measured timestamp frequency 2099.8 differs from nominal 1200 MHz

                65536        1000            11457.31               11456.84

              ------------------------------------------------------------------

              root@nas-2-1:~#

               

              Does the lack of results from the server side look like a problem?

               

              On the client I did:

              # mount -o rdma,port=20049 10.18.3.58:/export/1 /mnt/ib

               

              I can cd around, and cat/touch small files:

              # cat .gnuplot-wxt

              raise=1

              persist=0

              ctrl=0

              rendering=2

              hinting=100

              #

               

              But not anything large:

              root@c2-31:/mnt/ib# md5sum testfile

              md5sum: testfile: Input/output error
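
              To narrow down where it breaks, I can probably bisect on size with something like this (rough sketch, sizes arbitrary):

              # for sz in 4K 16K 64K 256K 1M; do dd if=/dev/zero of=t.$sz bs=$sz count=1 2>/dev/null; cat t.$sz > /dev/null && echo "$sz ok" || echo "$sz fails"; done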

               

              I've not tried NFSv3 yet; does NFSv4 have a known problem with RDMA?  The same pair of machines has NFSv4 working over GigE.  I tried your command and got:

              # dmesg -c

              # cat testfile

              cat: testfile: Input/output error

              # rpcdebug -m rpc -s all

              # dmesg

              [242031.688205] RPC:       looking up Generic cred

              [242031.688213] RPC:       looking up Generic cred

              [242031.688216] RPC:       looking up Generic cred

              [242031.688222] RPC:       new task initialized, procpid 12386

              [242031.688224] RPC:       allocated task ffff880845c33000

              [242031.688236] RPC:   110 __rpc_execute flags=0x81

              [242031.688241] RPC:   110 call_start nfs4 proc OPEN (async)

              [242031.688243] RPC:   110 call_reserve (status 0)

              [242031.688246] RPC:   110 reserved req ffff880853255600 xid dfed959b

              [242031.688249] RPC:   110 call_reserveresult (status 0)

              [242031.688251] RPC:   110 call_refresh (status 0)

              [242031.688253] RPC:   110 refreshing UNIX cred ffff881050ef20c0

              [242031.688255] RPC:   110 call_refreshresult (status 0)

              [242031.688257] RPC:   110 call_allocate (status 0)

              [242031.688263] RPC:       xprt_rdma_allocate: size 6344 too large for buffer[1024]: prog 100003 vers 4 proc 1

              [242031.688269] RPC:       xprt_rdma_allocate: size 6344, request 0xffff88082b81c000

              [242031.688271] RPC:   110 call_bind (status 0)

              [242031.688273] RPC:   110 call_connect xprt ffff880846752000 is connected

              [242031.688275] RPC:   110 call_transmit (status 0)

              [242031.688277] RPC:   110 xprt_prepare_transmit

              [242031.688279] RPC:   110 xprt_cwnd_limited cong = 0 cwnd = 4096

              [242031.688281] RPC:   110 rpc_xdr_encode (status 0)

              [242031.688283] RPC:   110 marshaling UNIX cred ffff881050ef20c0

              [242031.688286] RPC:   110 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data

              [242031.688290] RPC:   110 xprt_transmit(220)

              [242031.688293] RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88082b81d83c len 220 hdrlen 220

              [242031.688297] RPC:       rpcrdma_register_frmr_external: Using frmr ffff880845ccda90 to map 1 segments

              [242031.688301] RPC:       rpcrdma_create_chunks: reply chunk elem 3124@0x82b81e3f4:0xe006730f (last)

              [242031.688305] RPC:       rpcrdma_marshal_req: reply chunk: hdrlen 48 rpclen 220 padlen 0 headerp 0xffff88082b81d100 base 0xffff88082b81d760 lkey 0x8000

              [242031.688309] RPC:   110 xmit complete

              [242031.688311] RPC:   110 sleep_on(queue "xprt_pending" time 4355352568)

              [242031.688313] RPC:   110 added to queue ffff880846752258 "xprt_pending"

              [242031.688315] RPC:   110 setting alarm for 60000 ms

              [242031.688318] RPC:       wake_up_first(ffff880846752190 "xprt_sending")

              [242031.688333] RPC:       rpcrdma_event_process: event rep ffff880845ccda90 status 0 opcode 8 length 4294936584

              [242031.688340] RPC:       rpcrdma_event_process: event rep           (null) status 0 opcode 0 length 4294936584

              [242031.688473] RPC:       rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 48

              [242031.688479] RPC:       rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082b81c000

              [242031.688479]                    RPC request 0xffff880853255600 xid 0x9b95eddf

              [242031.688484] RPC:       rpcrdma_count_chunks: chunk 308@0x82b81e3f4:0xe006730f

              [242031.688486] RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 308)

              [242031.688489] RPC:   110 xid dfed959b complete (308 bytes received)

              [242031.688492] RPC:   110 __rpc_wake_up_task (now 4355352568)

              [242031.688493] RPC:   110 disabling timer

              [242031.688496] RPC:   110 removed from queue ffff880846752258 "xprt_pending"

              [242031.688500] RPC:       __rpc_wake_up_task done

              [242031.688544] RPC:   110 __rpc_execute flags=0x881

              [242031.688547] RPC:   110 call_status (status 308)

              [242031.688549] RPC:   110 call_decode (status 308)

              [242031.688551] RPC:   110 validating UNIX cred ffff881050ef20c0

              [242031.688554] RPC:   110 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data

              [242031.688560] RPC:   110 call_decode result 0

              [242031.688563] RPC:       wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")

              [242031.688566] RPC:   110 return 0, status 0

              [242031.688568] RPC:   110 release task

              [242031.688570] RPC:       wake_up_first(ffff880846752190 "xprt_sending")

              [242031.688573] RPC:       xprt_rdma_free: called on 0xffff88085436a000

              [242031.688580] RPC:       rpcrdma_event_process: event rep ffff880845ccda90 status 0 opcode 7 length 4294936584

              [242031.688586] RPC:   110 release request ffff880853255600

              [242031.688588] RPC:       wake_up_first(ffff880846752320 "xprt_backlog")

              [242031.688590] RPC:       rpc_release_client(ffff881052517200)

              [242031.688604] RPC:   110 freeing task

              [242031.688657] RPC:       new task initialized, procpid 12386

              [242031.688660] RPC:       allocated task ffff88085145fb38

              [242031.688672] RPC:   111 __rpc_execute flags=0x1

              [242031.688678] RPC:   111 call_start nfs4 proc READ (async)

              [242031.688681] RPC:   111 call_reserve (status 0)

              [242031.688686] RPC:   111 reserved req ffff880853255600 xid e0ed959b

              [242031.688689] RPC:   111 call_reserveresult (status 0)

              [242031.688692] RPC:   111 call_refresh (status 0)

              [242031.688695] RPC:   111 refreshing UNIX cred ffff881050ef20c0

              [242031.688697] RPC:   111 call_refreshresult (status 0)

              [242031.688699] RPC:   111 call_allocate (status 0)

              [242031.688705] RPC:       xprt_rdma_allocate: size 684, request 0xffff88082bea8000

              [242031.688707] RPC:   111 call_bind (status 0)

              [242031.688709] RPC:   111 call_connect xprt ffff880846752000 is connected

              [242031.688711] RPC:   111 call_transmit (status 0)

              [242031.688713] RPC:   111 xprt_prepare_transmit

              [242031.688715] RPC:   111 xprt_cwnd_limited cong = 0 cwnd = 4096

              [242031.688717] RPC:   111 rpc_xdr_encode (status 0)

              [242031.688719] RPC:   111 marshaling UNIX cred ffff881050ef20c0

              [242031.688721] RPC:   111 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data

              [242031.688724] RPC:   111 xprt_transmit(152)

              [242031.688727] RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88082bea97f8 len 152 hdrlen 152

              [242031.688731] RPC:       rpcrdma_register_frmr_external: Using frmr ffff880845ccd040 to map 1 segments

              [242031.688734] RPC:       rpcrdma_create_chunks: write chunk elem 4096@0x8460d0000:0xe006310f (more)

              [242031.688737] RPC:       rpcrdma_register_frmr_external: Using frmr ffff880845ccda68 to map 1 segments

              [242031.688739] RPC:       rpcrdma_event_process: event rep ffff880845ccd040 status 0 opcode 8 length 4294936584

              [242031.688744] RPC:       rpcrdma_create_chunks: write chunk elem 152@0x82bea9974:0xe006720d (last)

              [242031.688747] RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 152 padlen 0 headerp 0xffff88082bea9100 base 0xffff88082bea9760 lkey 0x8000

              [242031.688749] RPC:       rpcrdma_event_process: event rep ffff880845ccda68 status 0 opcode 8 length 4294936584

              [242031.688753] RPC:   111 xmit complete

              [242031.688755] RPC:   111 sleep_on(queue "xprt_pending" time 4355352568)

              [242031.688757] RPC:   111 added to queue ffff880846752258 "xprt_pending"

              [242031.688759] RPC:   111 setting alarm for 60000 ms

              [242031.688762] RPC:       wake_up_first(ffff880846752190 "xprt_sending")

              [242031.688876] RPC:       rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 112

              [242031.688881] RPC:       rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082bea8000

              [242031.688881]                    RPC request 0xffff880853255600 xid 0x9b95ede0

              [242031.688884] RPC:       rpcrdma_count_chunks: chunk 4096@0x8460d0000:0xe006310f

              [242031.688887] RPC:       rpcrdma_inline_fixup: srcp 0xffff88085436a094 len 60 hdrlen 60

              [242031.688890] RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 4156)

              [242031.688892] RPC:   111 xid e0ed959b complete (4156 bytes received)

              [242031.688894] RPC:   111 __rpc_wake_up_task (now 4355352568)

              [242031.688896] RPC:   111 disabling timer

              [242031.688898] RPC:   111 removed from queue ffff880846752258 "xprt_pending"

              [242031.688900] RPC:       __rpc_wake_up_task done

              [242031.688905] RPC:   111 __rpc_execute flags=0x801

              [242031.688907] RPC:   111 call_status (status 4156)

              [242031.688908] RPC:   111 call_decode (status 4156)

              [242031.688910] RPC:   111 validating UNIX cred ffff881050ef20c0

              [242031.688913] RPC:   111 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data

              [242031.688915] RPC:   111 call_decode result 0

              [242031.688918] RPC:       wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")

              [242031.688921] RPC:   111 return 0, status 0

              [242031.688922] RPC:   111 release task

              [242031.688924] RPC:       wake_up_first(ffff880846752190 "xprt_sending")

              [242031.688927] RPC:       xprt_rdma_free: called on 0xffff88085436a000

              [242031.688934] RPC:       rpcrdma_event_process: event rep ffff880845ccd040 status 0 opcode 7 length 4294936584

              [242031.688937] RPC:       rpcrdma_event_process: event rep ffff880845ccda68 status 0 opcode 7 length 4294936584

              [242031.688941] RPC:   111 release request ffff880853255600

              [242031.688943] RPC:       wake_up_first(ffff880846752320 "xprt_backlog")

              [242031.688945] RPC:       rpc_release_client(ffff881052517200)

              [242031.689159] RPC:       new task initialized, procpid 12386

              [242031.689162] RPC:       allocated task ffff880845c32c00

              [242031.689173] RPC:   112 __rpc_execute flags=0x81

              [242031.689177] RPC:   112 call_start nfs4 proc CLOSE (async)

              [242031.689179] RPC:   112 call_reserve (status 0)

              [242031.689182] RPC:   112 reserved req ffff880853255600 xid e1ed959b

              [242031.689184] RPC:   112 call_reserveresult (status 0)

              [242031.689185] RPC:   112 call_refresh (status 0)

              [242031.689188] RPC:   112 refreshing UNIX cred ffff881050ef20c0

              [242031.689190] RPC:   112 call_refreshresult (status 0)

              [242031.689192] RPC:   112 call_allocate (status 0)

              [242031.689196] RPC:       xprt_rdma_allocate: size 3244 too large for buffer[1024]: prog 100003 vers 4 proc 1

              [242031.689201] RPC:       xprt_rdma_allocate: size 3244, request 0xffff88082b81c000

              [242031.689203] RPC:   112 call_bind (status 0)

              [242031.689205] RPC:   112 call_connect xprt ffff880846752000 is connected

              [242031.689207] RPC:   112 call_transmit (status 0)

              [242031.689208] RPC:   112 xprt_prepare_transmit

              [242031.689210] RPC:   112 xprt_cwnd_limited cong = 0 cwnd = 4096

              [242031.689213] RPC:   112 rpc_xdr_encode (status 0)

              [242031.689214] RPC:   112 marshaling UNIX cred ffff881050ef20c0

              [242031.689217] RPC:   112 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data

              [242031.689220] RPC:   112 xprt_transmit(160)

              [242031.689223] RPC:       rpcrdma_inline_pullup: pad 0 destp 0xffff88082b81d800 len 160 hdrlen 160

              [242031.689226] RPC:       rpcrdma_register_frmr_external: Using frmr ffff880845ccc5f0 to map 1 segments

              [242031.689229] RPC:       rpcrdma_create_chunks: reply chunk elem 2760@0x82b81d944:0xe005ef0f (last)

              [242031.689233] RPC:       rpcrdma_marshal_req: reply chunk: hdrlen 48 rpclen 160 padlen 0 headerp 0xffff88082b81d100 base 0xffff88082b81d760 lkey 0x8000

              [242031.689235] RPC:   112 xmit complete

              [242031.689238] RPC:   112 sleep_on(queue "xprt_pending" time 4355352568)

              [242031.689240] RPC:   112 added to queue ffff880846752258 "xprt_pending"

              [242031.689241] RPC:   112 setting alarm for 60000 ms

              [242031.689244] RPC:       wake_up_first(ffff880846752190 "xprt_sending")

              [242031.689262] RPC:       rpcrdma_event_process: event rep ffff880845ccc5f0 status 0 opcode 8 length 4294936584

              [242031.689417] RPC:       rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 48

              [242031.689424] RPC:       rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082b81c000

              [242031.689424]                    RPC request 0xffff880853255600 xid 0x9b95ede1

              [242031.689428] RPC:       rpcrdma_count_chunks: chunk 132@0x82b81d944:0xe005ef0f

              [242031.689431] RPC:       rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 132)

              [242031.689433] RPC:   112 xid e1ed959b complete (132 bytes received)

              [242031.689436] RPC:   112 __rpc_wake_up_task (now 4355352568)

              [242031.689438] RPC:   112 disabling timer

              [242031.689440] RPC:   112 removed from queue ffff880846752258 "xprt_pending"

              [242031.689445] RPC:       __rpc_wake_up_task done

              [242031.689452] RPC:   112 __rpc_execute flags=0x881

              [242031.689454] RPC:   112 call_status (status 132)

              [242031.689456] RPC:   112 call_decode (status 132)

              [242031.689459] RPC:   112 validating UNIX cred ffff881050ef20c0

              [242031.689461] RPC:   112 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data

              [242031.689467] RPC:   112 call_decode result 0

              [242031.689470] RPC:       wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")

              [242031.689474] RPC:   112 return 0, status 0

              [242031.689476] RPC:   112 release task

              [242031.689478] RPC:       wake_up_first(ffff880846752190 "xprt_sending")

              [242031.689481] RPC:       xprt_rdma_free: called on 0xffff88085436a000

              [242031.689489] RPC:       rpcrdma_event_process: event rep ffff880845ccc5f0 status 0 opcode 7 length 4294936584

              [242031.689493] RPC:   112 release request ffff880853255600

              [242031.689495] RPC:       wake_up_first(ffff880846752320 "xprt_backlog")

              [242031.689497] RPC:       rpc_release_client(ffff881052517200)

              [242031.689523] RPC:   112 freeing task

               

              Here's the mount:

              # mount | grep rdma

              10.18.3.58:/export/1 on /mnt/ib type nfs (rw,rdma,port=20049,vers=4,addr=10.18.3.58,clientaddr=10.18.3.31)