5 Replies Latest reply on Sep 14, 2018 6:17 AM by alkx

    Can't ibping Lid or GUID but can ping by ip

    brian.daniel

      We are using an SB7790 unmanaged switch connected to:

      1. VMware (6.5) server with OpenSM running on a guest CentOS VM (7.5) - Mellanox ConnectX-4
      2. Server with Ubuntu (16.04.5 LTS) - Mellanox ConnectX-4
      3. Everything is up to date

       

      Successful items:

      • OpenSM is running (active) on the CentOS VM
      • ibstat shows all interfaces as Active with LinkUp
      • ibnetdiscover finds all connected interfaces
      • We can ping by ip to and from each server

       

      Unsuccessful item:

      • Not able to ibping across switch

       

      We're not sure what we might be missing.

       

      Can't find many resources to do more troubleshooting. Anyone that could help would be greatly appreciated!

       

      Thanks

      Brian

        • Re: Can't ibping Lid or GUID but can ping by ip
          alkx

          Are there any error messages? Does ibtracert work (# ibtracert <src lid> <dst lid>)?

            • Re: Can't ibping Lid or GUID but can ping by ip
              brian.daniel

              ibtracert works

               

              We actually do have connectivity, but we can only ibping the GUID that OpenSM is bound to. We can't ibping the other GUIDs.

                • Re: Can't ibping Lid or GUID but can ping by ip
                  alkx

                  Hi Brian,

                  When using virtualization, a GRH (Global Routing Header) must be present in the packet. For ibping, the --dgid <GID> parameter needs to be used (see man ibping).

                  To get GIDs, on the server run 'show_gids' and use the output on the client side

                  Server

                  # show_gids
                  DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
                  ---     ----    -----   ---                                     ------------    ---     ---
                  mlx5_1  1       0       fe80:0000:0000:0000:248a:0703:009c:01a7                 v1

                   

                  Client

                  #ibping --dgid  fe80:0000:0000:0000:248a:0703:009c:01a7 18
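The server/client steps above can be sketched as a small filter. This is only an illustration: `ibping_cmds` is a hypothetical helper name, and it assumes the `show_gids` column layout shown above (GID in the fourth column, fully expanded to 39 characters). It prints the `ibping --dgid` command line for each GID row and skips the header, separator, and `n_gids_found` lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Mellanox package).
# Reads show_gids-style output on stdin and prints one
# "ibping --dgid <GID> <LID>" line per GID row.
# $1 is the destination LID to use on the client side.
ibping_cmds() {
    awk -v lid="$1" '
        # A GID row has a fully expanded GID (8 groups of 4 hex digits,
        # 39 characters including colons) in column 4.
        $4 ~ /^[0-9a-fA-F:]+$/ && length($4) == 39 {
            print "ibping --dgid " $4 " " lid
        }
    '
}
```

On the server, something like `show_gids | ibping_cmds 18` would then print the commands to run on the client, with 18 standing in for the destination LID.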

                   

                  If you would like to check RDMA connectivity between VMs, use the utilities from the perftest package (ib_read_bw, ib_write_bw, etc.) with the -R parameter.
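As a rough sketch of the perftest suggestion, the helper below only prints the pair of `ib_write_bw` commands to run; it does not run them. The function name `perftest_cmds`, the device names, and the IP address are placeholders, not values from this thread. It assumes `-d` selects the IB device and `-R` connects through rdma_cm, so the client needs an IP address that reaches the server.

```shell
#!/bin/sh
# Hypothetical helper: print the server and client command lines for an
# ib_write_bw test over rdma_cm (-R). Nothing is executed here.
# $1 = server IB device, $2 = client IB device, $3 = server IP address
perftest_cmds() {
    server_dev=$1
    client_dev=$2
    server_ip=$3
    # The server side is started first and waits for the client.
    echo "server: ib_write_bw -d $server_dev -R"
    # The client side connects to the server's IP address.
    echo "client: ib_write_bw -d $client_dev -R $server_ip"
}

# Example with placeholder values (192.0.2.10 is a documentation address).
perftest_cmds mlx5_0 mlx5_0 192.0.2.10
```

The same pattern should apply to ib_read_bw; only the tool name changes.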

                    • Re: Can't ibping Lid or GUID but can ping by ip
                      brian.daniel

                      Thank you for responding quickly.

                       

                      I am able to ibping the GID on the first device but not the one on the second:

                       

                      SERVER:

                      -----------------------

                      # show_gids
                      DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
                      ---     ----    -----   ---                                     ------------    ---     ---
                      mlx5_0  1       0       fe80:0000:0000:0000:248a:0703:0014:f9ac                 v1
                      mlx5_1  1       0       fe80:0000:0000:0000:248a:0703:0014:f850                 v1
                      n_gids_found=2

                       

                       

                      CLIENT:

                      name@server:/etc/infiniband$ ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f9ac 8

                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.109 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.095 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.139 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.174 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.159 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.190 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.169 ms
                      Pong from centos-dgx1.brane.systems.(none) (Lid 8 Gid fe80::248a:703:14:f9ac): time 0.163 ms

                      ^Z[6]   Killed                  ibping 8

                      [7]   Killed                  ibping -S

                       

                       

                      [8]+  Stopped                 ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f9ac 8

                      name@server:/etc/infiniband$ ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f850 8

                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)
                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)
                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)
                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)
                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)
                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)
                      ibwarn: [47999] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 8 Gid fe80::248a:703:14:f850)

                      ^Z

                      [9]+  Stopped                 ibping --dgid fe80:0000:0000:0000:248a:0703:0014:f850 8

                       

                       

                      How can I ibping the other GIDs?

                       

                       

                       

                      Thanks

                      Brian