
    Issues with SRP and unexplainable "ibping" behaviour.

      Hello everyone,

       

      I'm chasing a bit of assistance in troubleshooting SRP over an InfiniBand setup which I have at home. Essentially, I'm not seeing the disk I/O performance I was expecting between my SRP initiator and target, and I want to work out where the problem lies. I wanted to start at the InfiniBand infrastructure and work up from there: if I can verify that the InfiniBand layer is set up correctly and performing as it should, I can move on to troubleshooting the additional technologies and protocols involved.

       

      Some basic information first:

       

      SRP Target: Oracle Solaris v11.1 server with ZFS pools backing the LUs (Logical Units).

      SRP Initiator: VMware ESXi v5.5.

       

      Mellanox MHGH28-XTC (MT25418) cards are being used in both of the InfiniBand hosts above, directly connected to each other with a CX4 cable.
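
      Since the MHGH28-XTC is a DDR card, both ports should have negotiated 4X at 5.0 Gbps (the "Rate: 20" reported by ibstat below). If iblinkinfo happens to be included in the diagnostics bundle on either host (an assumption on my part), it shows the negotiated width and speed of every link in one place:

      STORAGE-SERVER:/# iblinkinfo
      # illustrative only; the useful part is the "( 4X 5.0 Gbps Active/ LinkUp )"
      # annotation on the link between the two HCAs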

       

      Now, to the best of my knowledge the drivers, VIBs and configuration have all been done correctly, and I'm at the point where my ESXi v5.5 host can actually see the LU, mount it and store data on it. At this stage it seems to be purely a performance issue, which is what I'm trying to resolve.
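
      Before digging into SRP or ZFS, I'd also like to measure the raw RDMA throughput of the link. If the perftest tools (ib_send_bw and friends) can be installed on both ends, which is admittedly a big assumption on the ESXi side, a back-to-back run would tell me whether the fabric itself can reach DDR speeds:

      STORAGE-SERVER:/# ib_send_bw
      # the server side just waits; on the client, point it at the target's
      # IPoIB address (192.168.10.1 below is only a placeholder)
      /opt/opensm/bin # ./ib_send_bw 192.168.10.1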

       

      Some CLI outputs below:

       

      STORAGE-SERVER

      STORAGE-SERVER:/# ibstat

      CA 'mlx4_0'

              CA type: 0

              Number of ports: 2

              Firmware version: 2.9.1000

              Hardware version: 160

              Node GUID: 0x001a4bffff0c6214

              System image GUID: 0x001a4bffff0c6217

              Port 1:

                      State: Active

                      Physical state: LinkUp

                      Rate: 20

                      Base lid: 2

                      LMC: 0

                      SM lid: 1

                      Capability mask: 0x00000038

                      Port GUID: 0x001a4bffff0c6215

                      Link layer: IB

              Port 2:

                      State: Down

                      Physical state: Polling

                      Rate: 10

                      Base lid: 0

                      LMC: 0

                      SM lid: 0

                      Capability mask: 0x00000038

                      Port GUID: 0x001a4bffff0c6216

                      Link layer: IB

       

      VM-HYPER:

      /opt/opensm/bin # ./ibstat

      CA 'mlx4_0'

              CA type: MT25418

              Number of ports: 2

              Firmware version: 2.7.0

              Hardware version: a0

              Node GUID: 0x001a4bffff0cb178

              System image GUID: 0x001a4bffff0cb17b

              Port 1:

                      State: Active

                      Physical state: LinkUp

                      Rate: 20

                      Base lid: 1

                      LMC: 0

                      SM lid: 1

                      Capability mask: 0x0251086a

                      Port GUID: 0x001a4bffff0cb179

                      Link layer: InfiniBand

              Port 2:

                      State: Down

                      Physical state: Polling

                      Rate: 8

                      Base lid: 0

                      LMC: 0

                      SM lid: 0

                      Capability mask: 0x0251086a

                      Port GUID: 0x001a4bffff0cb17a

                      Link layer: InfiniBand


      As far as I'm aware, the LIDs assigned in the above outputs indicate that the SM (Subnet Manager) is working.
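
      For what it's worth, sminfo should be able to confirm this directly by querying the master SM, assuming it shipped alongside the other diagnostics:

      /opt/opensm/bin # ./sminfo
      # expecting a single line identifying the master SM, along the lines of:
      # sminfo: sm lid 1 sm guid 0x..., activity count N priority 0 state 3 SMINFO_MASTER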

       

      From the SRP target, I can see the other Infiniband host:

       

      STORAGE-SERVER:/# ibhosts

      Ca      : 0x001a4bffff0cb178 ports 2 "****************** HCA-1"

      Ca      : 0x001a4bffff0c6214 ports 2 "MT25408 ConnectX Mellanox Technologies"

       

      I thought I'd start by using the "ibping" utility to verify InfiniBand connectivity. This is where I got some really strange results:

       

      Firstly, I could not get the ibping daemon running on the SRP initiator (ESXi) at all. The command would execute, but then just return to the shell:

       

      /opt/opensm/bin # ./ibping -S

      /opt/opensm/bin #
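
      Assuming the busybox tools on ESXi behave normally here, it's worth checking whether the daemon detached into the background or simply died:

      /opt/opensm/bin # ps | grep ibping
      # no output here would mean the daemon exited immediately rather than detaching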

       

      So I switched to running the ibping daemon on the SRP target (Oracle Solaris) instead, which seemed to work as it should: it appeared to be waiting for pings to come through. Great! Going back to the SRP initiator, I then ran the ibping utility against the LID of the SRP target, but it was unsuccessful:

       

      /opt/opensm/bin # ./ibping -L 2

      ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable

      ibwarn: [3502756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)

      ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable

      ibwarn: [3502756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)

      ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable

      ...

      ..

      .

      ---  (Lid 2) ibping statistics ---

      10 packets transmitted, 0 received, 100% packet loss, time 9360 ms

      rtt min/avg/max = 0.000/0.000/0.000 ms

       

      OK, let's try the Port GUID of the SRP target instead of the LID:

       

      /opt/opensm/bin # ./ibping -G 0x001a4bffff0c6215

      ibwarn: [3504924] _do_madrpc: recv failed: Resource temporarily unavailable

      ibwarn: [3504924] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)

      ibwarn: [3504924] ib_path_query_via: sa call path_query failed

      ./ibping: iberror: failed: can't resolve destination port 0x001a4bffff0c6215
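
      Interestingly, pinging by GUID requires a path record lookup against the SA (Subnet Administrator), which pinging by LID does not, so the "sa call path_query failed" line suggests SA queries from the ESXi side are failing too. If saquery is available in the same directory (another assumption), dumping the SA's node records would exercise that in isolation:

      /opt/opensm/bin # ./saquery NR
      # a healthy SA should return a NodeRecord for both HCAs; a timeout here
      # would point at the same MAD receive problem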

       

      I restarted the ibping daemon on the SRP target with one level of debugging and re-ran the pings from the client (SRP initiator). I can see that the pings are actually reaching the SRP target and that a reply is being sent:

       

      STORAGE-SERVER:/# ibping -S -d

      ibdebug: [11188] ibping_serv: starting to serve...

      ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

      ibwarn: [11188] mad_respond_via: dest Lid 1

      ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

      ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

      ibwarn: [11188] mad_respond_via: dest Lid 1

      ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

      ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

      ibwarn: [11188] mad_respond_via: dest Lid 1

      ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

      ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

      ibwarn: [11188] mad_respond_via: dest Lid 1

      ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

      ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER

      ibwarn: [11188] mad_respond_via: dest Lid 1

      ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000

       

      The strangest observation is yet to come, however. If I run ibping on the client with two levels of debugging, a few replies actually get through and appear in the final statistics output when ibping is terminated (in my experience this does not happen with a single level of debugging):

       

      /opt/opensm/bin # ./ibping -L -dd 2

      ...

      ..

      .

      ibdebug: [3508744] ibping: Ping..

      ibwarn: [3508744] ib_vendor_call_via: route Lid 2 data 0x3ffcebc7aa0

      ibwarn: [3508744] ib_vendor_call_via: class 0x132 method 0x1 attr 0x0 mod 0x0 datasz 216 off 40 res_ex 1

      ibwarn: [3508744] mad_rpc_rmpp: rmpp (nil) data 0x3ffcebc7aa0

      ibwarn: [3508744] umad_set_addr: umad 0x3ffcebc7570 dlid 2 dqp 1 sl 0, qkey 80010000

      ibwarn: [3508744] _do_madrpc: >>> sending: len 256 pktsz 320

      send buf

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0001 8001 0000 0002 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0132 0101 0000 0000 0000 0000 4343 c235

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 1405 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      ibwarn: [3508744] umad_send: fd 3 agentid 1 umad 0x3ffcebc7570 timeout 1000

      ibwarn: [3508744] umad_recv: fd 3 umad 0x3ffcebc7170 timeout 1000

      ibwarn: [3508744] umad_recv: mad received by agent 1 length 320

      ibwarn: [3508744] _do_madrpc: rcv buf:

      rcv buf

      0132 0181 0000 0000 0000 00ac 4343 c234

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 1405 6763 2d73 746f 7261

      6765 312e 6461 726b 7265 616c 6d2e 696e

      7465 726e 616c 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      ibwarn: [3508744] umad_recv: fd 3 umad 0x3ffcebc7170 timeout 1000

      ibwarn: [3508744] umad_recv: mad received by agent 1 length 320

      ibwarn: [3508744] _do_madrpc: rcv buf:

      rcv buf

      0132 0181 0000 0000 0000 00ac 4343 c235

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 1405 6763 2d73 746f 7261

      6765 312e 6461 726b 7265 616c 6d2e 696e

      7465 726e 616c 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      ibwarn: [3508744] mad_rpc_rmpp: data offs 40 sz 216

      rmpp mad data

      6763 2d73 746f 7261 6765 312e 6461 726b

      7265 616c 6d2e 696e 7465 726e 616c 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000 0000 0000 0000 0000

      0000 0000 0000 0000

      Pong from STORAGE-SERVER (Lid 2): time 7.394 ms

      ibdebug: [3508744] report: out due signal 2

       

      --- STORAGE-SERVER (Lid 2) ibping statistics ---

      10 packets transmitted, 3 received, 70% packet loss, time 9556 ms

      rtt min/avg/max = 7.394/12.335/15.344 ms
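
      Since replies only seem to get through when the extra debug printing slows the client down, this smells like a receive/timing problem on the ESXi side rather than a fabric problem. One thing I intend to try is raising the MAD timeout (-t takes milliseconds) while capping the packet count (-c):

      /opt/opensm/bin # ./ibping -L -t 5000 -c 10 2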

       

       

      I'm stumped. Anyone have any ideas on what is going on or how to troubleshoot further?

        • Re: Issues with SRP and unexplainable "ibping" behaviour.

          Actually, looking at the level 2 debugs a bit further, it seems that the replies are indeed making their way back to the ibping client (ESXi); you can see this in the receive buffers and the hex dumps. However, the following message seems to indicate that something is amiss on the ESXi server:

           

          ibwarn: [3511788] _do_madrpc: recv failed: Resource temporarily unavailable

           

          On a side note, I'm seeing a lot of references to the word "mad" in all the debugging information. I wonder if someone is hinting at something.

          • Re: Issues with SRP and unexplainable "ibping" behaviour.

            And some additional information on the Mellanox VIBs installed on the ESXi 5.5 Server:


            ~ # esxcli software vib list | egrep Mellanox

            net-ib-cm       1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-ib-core     1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-ib-ipoib    1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-ib-mad      1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-ib-sa       1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-ib-umad     1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-mlx4-core   1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            net-mlx4-ib     1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

            scsi-ib-srp     1.8.2.0-1OEM.500.0.0.472560 Mellanox PartnerSupported  2013-12-22

             

            ~ # esxcli software vib list | egrep opensm

            ib-opensm       3.3.15  Intel      VMwareAccepted    2013-12-22
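
            To double-check that the kernel modules behind those VIBs actually loaded (the names in the grep pattern are my guess at what to look for):

            ~ # vmkload_mod -l | grep -E 'mlx4|ib'
            # lists loaded vmkernel modules; mlx4_ib or ib_umad missing here would explain a lot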

            • Re: Issues with SRP and unexplainable "ibping" behaviour.
              inbusiness

              Hi!

               

              What firmware are your Mellanox MHGH28-XTC (MT25418) cards running?

               

              I think you need to use firmware 2.9.1200.
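
              You can query and update it with mstflint from the Mellanox firmware tools, if you can get them onto the host. A sketch (the PCI address and image file name below are placeholders; on the Solaris box you would use its native fwflash utility instead):

              # query the current firmware level
              ~ # mstflint -d 0000:03:00.0 query
              # burn a newer image, then reboot for it to take effect
              ~ # mstflint -d 0000:03:00.0 -i fw-25408-2_9_1200.bin burn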