HowTo Find the Path Between Two LIDs in InfiniBand Network using ibdiagpath

Version 2

    This post provides a short procedure for finding the path between two adapters (identified by their LIDs) on an InfiniBand network using the ibdiagpath tool.

    This post is for beginners.

     

    References

    • MLNX_OFED User Manual
    • ibdiagpath man page

     

    Overview

    The ibdiagpath tool traces a path between two end-points and provides information about the nodes and switches traversed along the path. It runs device-specific health queries against the devices along the path. The way ibdiagpath operates depends on the addressing mode used on the command line.

     

    Command Options:

    1. If directed route addressing is used (with the --dr_path flag), the local node is the source node and the route to the destination port is known (example: ibdiagpath --dr_path 0,1).

    2. If LID-route addressing is used (with the --src_lid and --dest_lid arguments), the source and destination ports of the route are specified by their LIDs. The source LID does not have to be on the host the tool runs on; it can be located on any host. In this case, the actual path is calculated as follows:

         a. From the local port to the source port.

         b. From the source port (LID) to the destination port (LID). This part of the path is determined by Subnet Management Linear Forwarding Table queries to the switch nodes along the path.

     

    Example: ibdiagpath --src_lid 1 --dest_lid 28

     

    For further information, please refer to the tool’s help (-h) or man pages.
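
    The LID-route lookup described above can be pictured as a walk over the switches' linear forwarding tables. The following is a minimal Python sketch of that idea, not the way ibdiagpath is implemented. The names LFT, LINKS and trace_lid_route are hypothetical, the cabling is taken from the example fabric used later in this post, and the forwarding table entries are assumed values consistent with that cabling.

```python
# Hypothetical model of LID routing (illustration only, not an ibdiagpath API).
# Each switch maps a destination LID to an output port (its linear forwarding
# table), and every (node LID, port) pair is cabled to a peer (node LID, port).

# Switch LID -> {destination LID: output port}. ASSUMED values, chosen to be
# consistent with the cabling of the example fabric.
LFT = {
    174: {1: 8, 27: 2, 28: 6, 30: 3, 2: 1, 3: 1},
    268: {2: 34, 3: 35, 1: 3, 27: 3, 28: 3, 30: 3},
}

# Cables of the example fabric: ((node LID, port), (peer LID, peer port)).
_CABLES = [
    ((1, 1), (174, 8)),
    ((27, 1), (174, 2)),
    ((28, 1), (174, 6)),
    ((30, 1), (174, 3)),
    ((174, 1), (268, 3)),
    ((2, 1), (268, 34)),
    ((3, 2), (268, 35)),
]

# Make the cable list usable in both directions.
LINKS = {}
for end_a, end_b in _CABLES:
    LINKS[end_a] = end_b
    LINKS[end_b] = end_a


def trace_lid_route(src_lid, dest_lid, src_port=1, max_hops=64):
    """Return the hops as (from_lid, from_port, to_lid, to_port) tuples."""
    hops = []
    node, port = src_lid, src_port
    for _ in range(max_hops):
        peer, peer_port = LINKS[(node, port)]
        hops.append((node, port, peer, peer_port))
        if peer == dest_lid:
            return hops
        if peer not in LFT:
            raise ValueError("reached LID %d, which is not a switch" % peer)
        # Ask the switch which port it uses to forward towards dest_lid.
        node, port = peer, LFT[peer][dest_lid]
    raise RuntimeError("hop limit exceeded; possible routing loop")


for hop in trace_lid_route(1, 28):
    print("lid=%d port=%d -> lid=%d port=%d" % hop)
```

    For source LID 1 and destination LID 28 the sketch yields two hops: from LID 1 port 1 to the switch (LID 174) port 8, and from the switch port 6 to LID 28 port 1.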

     

    Example

    1. Run ibnetdiscover to find the network elements (HCAs and switches).

    In this example, we select two hosts: LID 1 (r-ufm96 HCA-1) and LID 28 (r-ufm100 HCA-2). Both appear in the output below as ports of the switch switch-de779e (LID 174).

    #  ibnetdiscover

    src/query_smp.c:195; umad (DR path slid 0; dlid 0; 0,1,1,31 Attr 0x11:0) bad status 110; Connection timed out

    #

    # Topology file: generated on Mon Jul 25 11:30:26 2016

    #

    # Initiated from node 0002c9030004e938 port 0002c9030004e939

     

    vendid=0x2c9

    devid=0xcb20

    sysimgguid=0xe41d2d030003e470

    switchguid=0xe41d2d030003e470(e41d2d030003e470)

    Switch  36 "S-e41d2d030003e470"         # "SwitchIB Mellanox Technologies" base port 0 lid 268 lmc 0

    [1]     "S-e41d2d030003e470"[2]         # "SwitchIB Mellanox Technologies" lid 268 4xEDR

    [2]     "S-e41d2d030003e470"[1]         # "SwitchIB Mellanox Technologies" lid 268 4xEDR

    [3]     "S-f4521403005764b0"[1]         # "MF0;switch-de779e:SX6012/U1" lid 174 4xFDR

    [19]    "S-e41d2d030003e470"[21]                # "SwitchIB Mellanox Technologies" lid 268 4xEDR

    [21]    "S-e41d2d030003e470"[19]                # "SwitchIB Mellanox Technologies" lid 268 4xEDR

    [29]    "S-e41d2d030003e470"[30]                # "SwitchIB Mellanox Technologies" lid 268 4xEDR

    [30]    "S-e41d2d030003e470"[29]                # "SwitchIB Mellanox Technologies" lid 268 4xEDR

    [34]    "H-e41d2d030061f957"[1](e41d2d030061f957)               # "r-ufm216 HCA-2" lid 2 4xEDR

    [35]    "H-0002c903003421b0"[2](2c903003421b2)          # "r-ufm111 HCA-1" lid 3 4xFDR

     

    vendid=0x2c9

    devid=0xc738

    sysimgguid=0xf4521403005764b0

    switchguid=0xf4521403005764b0(f4521403005764b0)

    Switch  12 "S-f4521403005764b0"         # "MF0;switch-de779e:SX6012/U1" enhanced port 0 lid 174 lmc 0

    [1]     "S-e41d2d030003e470"[3]         # "SwitchIB Mellanox Technologies" lid 268 4xFDR

    [2]     "H-0002c9030004e938"[1](2c9030004e939)          # "r-ufm101 HCA-1" lid 27 4xQDR

    [3]     "H-0002c9030006ba5a"[1](2c9030006ba5b)          # "r-ufm101 HCA-2" lid 30 4xQDR

    [6]     "H-0002c90300337140"[1](2c90300337141)          # "r-ufm100 HCA-2" lid 28 4xFDR

    [8]     "H-e41d2d03005cf1f8"[1](e41d2d03005cf1f8)      # "r-ufm96 HCA-1" lid 1 4xSDR

     

    vendid=0x2c9

    devid=0x1003

    sysimgguid=0x2c903003421b3

    caguid=0x2c903003421b0

    Ca      2 "H-0002c903003421b0"          # "r-ufm111 HCA-1"

    [2](2c903003421b2)      "S-e41d2d030003e470"[35]                # lid 3 lmc 0 "SwitchIB Mellanox Technologies" lid 268 4xFDR

     

    vendid=0x2c9

    devid=0x1013

    sysimgguid=0xe41d2d030061f956

    caguid=0xe41d2d030061f957

    Ca      1 "H-e41d2d030061f957"          # "r-ufm216 HCA-2"

    [1](e41d2d030061f957)   "S-e41d2d030003e470"[34]                # lid 2 lmc 0 "SwitchIB Mellanox Technologies" lid 268 4xEDR

     

    vendid=0x2c9

    devid=0x673c

    sysimgguid=0x2c9030006ba5d

    caguid=0x2c9030006ba5a

    Ca      2 "H-0002c9030006ba5a"          # "r-ufm101 HCA-2"

    [1](2c9030006ba5b)      "S-f4521403005764b0"[3]         # lid 30 lmc 0 "MF0;switch-de779e:SX6012/U1" lid 174 4xQDR

     

    vendid=0x2c9

    devid=0x1003

    sysimgguid=0x2c90300337143

    caguid=0x2c90300337140

    Ca      2 "H-0002c90300337140"          # "r-ufm100 HCA-2"

    [1](2c90300337141)      "S-f4521403005764b0"[6]         # lid 28 lmc 0 "MF0;switch-de779e:SX6012/U1" lid 174 4xFDR

     

    vendid=0x2c9

    devid=0x1013

    sysimgguid=0xe41d2d03005cf1f8

    caguid=0xe41d2d03005cf1f8

    Ca      1 "H-e41d2d03005cf1f8"          # "r-ufm96 HCA-1"

    [1](e41d2d03005cf1f8)   "S-f4521403005764b0"[8]         # lid 1 lmc 0 "MF0;switch-de779e:SX6012/U1" lid 174 4xSDR

     

    vendid=0x2c9

    devid=0x673c

    sysimgguid=0x2c9030004e93b

    caguid=0x2c9030004e938

    Ca      1 "H-0002c9030004e938"          # "r-ufm101 HCA-1"

    [1](2c9030004e939)      "S-f4521403005764b0"[2]         # lid 27 lmc 0 "MF0;switch-de779e:SX6012/U1" lid 174 4xQDR
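
    On a larger fabric it can be tedious to look up LIDs in this output by eye. The following is a minimal Python sketch that extracts the HCA ports listed under each switch and indexes them by LID. The function name hca_ports_by_lid is hypothetical, and the regular expressions assume the line layout shown in the output above; other ibnetdiscover versions may format lines differently.

```python
import re

# Matches a switch header line, e.g.
# Switch  12 "S-f452..."  # "MF0;switch-de779e:SX6012/U1" enhanced port 0 lid 174 lmc 0
SWITCH_RE = re.compile(
    r'^Switch\s+\d+\s+"(S-[0-9a-f]+)"\s+#\s+"([^"]*)".*\blid\s+(\d+)'
)

# Matches a switch port line whose peer is an HCA, e.g.
# [6]  "H-0002c90300337140"[1](2c90300337141)  # "r-ufm100 HCA-2" lid 28 4xFDR
PORT_RE = re.compile(
    r'^\[(\d+)\]\s+"H-[0-9a-f]+"\[(\d+)\]\([0-9a-f]+\)'
    r'\s+#\s+"([^"]*)"\s+lid\s+(\d+)\s+(\S+)'
)


def hca_ports_by_lid(text):
    """Map each HCA LID to the switch and switch port it is cabled to."""
    result = {}
    switch = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SWITCH_RE.match(line)
        if m:
            switch = {"name": m.group(2), "lid": int(m.group(3))}
            continue
        m = PORT_RE.match(line)
        if m and switch is not None:
            result[int(m.group(4))] = {
                "node": m.group(3),
                "switch_lid": switch["lid"],
                "switch_port": int(m.group(1)),
                "speed": m.group(5),
            }
    return result


# A few lines copied from the ibnetdiscover output above.
SAMPLE = '''
Switch  12 "S-f4521403005764b0"         # "MF0;switch-de779e:SX6012/U1" enhanced port 0 lid 174 lmc 0
[1]     "S-e41d2d030003e470"[3]         # "SwitchIB Mellanox Technologies" lid 268 4xFDR
[2]     "H-0002c9030004e938"[1](2c9030004e939)          # "r-ufm101 HCA-1" lid 27 4xQDR
[6]     "H-0002c90300337140"[1](2c90300337141)          # "r-ufm100 HCA-2" lid 28 4xFDR
[8]     "H-e41d2d03005cf1f8"[1](e41d2d03005cf1f8)      # "r-ufm96 HCA-1" lid 1 4xSDR
'''

ports = hca_ports_by_lid(SAMPLE)
for lid in (1, 28):
    print(lid, ports[lid])
```

    In practice the text could come from a saved copy of the ibnetdiscover output rather than an embedded sample.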

     

     

    2. Run ibdiagpath to find the path between source LID 1 and destination LID 28. Neither LID is located on the machine where we run the tool.

    The output shows the path from the host where we run ibdiagpath (LID 27) to the source host (LID 1), followed by the path from the source host (LID 1) to the destination host (LID 28). In this example, the source host is connected to the switch on port 8 and the destination host is connected to the switch on port 6; the switch LID is 174.

    # ibdiagpath --src_lid 1 --dest_lid 28

    ----------

    Load Plugins from:

    /usr/local/share/ibdiagnet2.1.1/plugins/

    (You can specify more paths to be looked in with "IBDIAGNET_PLUGINS_PATH" env variable)

     

    Plugin Name                                   Result     Comment

    libibdiagnet_cable_diag_plugin-2.1.1          Succeeded  Plugin loaded

    libibdiagnet_phy_diag_plugin-2.1.1            Succeeded  Plugin loaded

     

    ---------------------------------------------

    Discovery

    -I- ----------------------------------------------

    -I- Traversing the path from local to source

    -I- ----------------------------------------------

    -I- From: lid=27 port guid=0x0002c9030004e939 dev=26428 r-ufm101/U1 Port=1

    -I- To : lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=2

    -I- From: lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=8

    -I- To : lid=01 port guid=0xe41d2d03005cf1f8 dev=4115 r-ufm96/U1 Port=1

    -I- ----------------------------------------------

    -I- Traversing the path from source to destination

    -I- ----------------------------------------------

    -I- From: lid=01 port guid=0xe41d2d03005cf1f8 dev=4115 r-ufm96/U1 Port=1

    -I- To : lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=8

    -I- From: lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=6

    -I- To : lid=28 port guid=0x0002c90300337141 dev=4099 r-ufm100/U2 Port=1

     

     

     

     

    -I- Fabric Discover finished successfully

     

     

    -I- Discovered 3 nodes (1 Switches & 2 CA-s).

     

     

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- VS Capability GMP finished successfully

     

     

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- VS Capability SMP finished successfully

     

     

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- VS ExtendedPortInfo finished successfully

     

     

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- Port Info Extended finished successfully

     

     

    -I- Duplicated GUIDs detection finished successfully

     

     

    -I- Duplicated Node Description detection finished successfully

     

     

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- Switch Info retrieving finished successfully

     

     

    ---------------------------------------------

    Links Check

    -I- Links Check finished successfully

     

     

    ---------------------------------------------

    Port Counters

    -I- Retrieving PMClassPortInfo ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- Ports counters retrieving finished successfully

     

     

    -I- Going to sleep for 1 seconds until next counters sample

    -I- Time left to sleep ... 1 seconds.

     

     

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -I- Ports counters retrieving (second time) finished successfully

     

     

    -I- Ports counters value Check finished successfully

     

     

    -I- Ports counters Difference Check (during run) finished successfully

     

     

    ---------------------------------------------

    Speed / Width checks

    -I- Link Speed Check (Compare to supported link speed)

    -E- Links Speed Check finished with errors

    -E- Link: switch-de779e/U1/P8<-->r-ufm96/U1/P1 - Unexpected actual link speed 2.5 (enable_speed1="2.5 or 5 or 10 or 14 or FDR10", enable_speed2="2.5 or 5 or 10 or 14 or 25" therefore final speed should be 14)

     

     

    -I- Link Width Check (Compare to supported link width)

    -I- Links Width Check finished successfully

     

     

    ---------------------------------------------

    Temperature Sensing

    -I- Retrieving ... 3/3 nodes (1/1 Switches & 2/2 CA-s) retrieved.

    -W- Temperature Sensing finished with errors

    -W- r-ufm96/U1 - No response for MAD SMPTempSensingGet

     

     

    ---------------------------------------------

    Summary

    -I- Stage                     Warnings   Errors     Comment

    -I- Discovery                 0          0

    -I- Links Check               0          0

    -I- Port Counters             0          0

    -I- Speed / Width checks      0          1

    -I- Temperature Sensing       1          0

     

     

    -I- You can find detailed errors/warnings in: /var/tmp/ibdiagpath/ibdiagnet2.log

     

     

     

     

    -I- ibdiagnet database file   : /var/tmp/ibdiagpath/ibdiagnet2.db_csv

    -I- LST file                  : /var/tmp/ibdiagpath/ibdiagnet2.lst

    -I- Network dump file         : /var/tmp/ibdiagpath/ibdiagnet2.net_dump

    -I- Ports Counters file       : /var/tmp/ibdiagpath/ibdiagnet2.pm
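
    The hop information in the Discovery stage can also be extracted programmatically. The following is a minimal Python sketch that pairs the "From"/"To" lines of each traversal section into hops. The function name parse_hops is hypothetical, and the regular expressions assume the line layout shown in the output above, which may differ between ibdiagpath versions.

```python
import re

# Section header, e.g. "-I- Traversing the path from source to destination"
SECTION_RE = re.compile(r'^-I-\s+Traversing the path from (.+)$')

# Hop endpoint, e.g.
# -I- From: lid=27 port guid=0x0002c9030004e939 dev=26428 r-ufm101/U1 Port=1
HOP_RE = re.compile(
    r'^-I-\s+(From|To)\s*:\s+lid=(\d+)\s+port guid=0x[0-9a-f]+'
    r'\s+dev=\d+\s+(\S+)\s+Port=(\d+)'
)


def parse_hops(text):
    """Return {section name: [((lid, node, port), (lid, node, port)), ...]}."""
    sections = {}
    current = None   # hop list of the section being read
    pending = None   # "From" endpoint waiting for its "To" line
    for raw in text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            current = sections.setdefault(m.group(1), [])
            pending = None
            continue
        m = HOP_RE.match(line)
        if not m or current is None:
            continue
        endpoint = (int(m.group(2)), m.group(3), int(m.group(4)))
        if m.group(1) == "From":
            pending = endpoint
        elif pending is not None:
            current.append((pending, endpoint))
            pending = None
    return sections


# Lines copied from the ibdiagpath output above.
SAMPLE = '''
-I- Traversing the path from local to source
-I- ----------------------------------------------
-I- From: lid=27 port guid=0x0002c9030004e939 dev=26428 r-ufm101/U1 Port=1
-I- To : lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=2
-I- From: lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=8
-I- To : lid=01 port guid=0xe41d2d03005cf1f8 dev=4115 r-ufm96/U1 Port=1
-I- ----------------------------------------------
-I- Traversing the path from source to destination
-I- ----------------------------------------------
-I- From: lid=01 port guid=0xe41d2d03005cf1f8 dev=4115 r-ufm96/U1 Port=1
-I- To : lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=8
-I- From: lid=174 port guid=0xf4521403005764b0 dev=51000 switch-de779e/U1 Port=6
-I- To : lid=28 port guid=0x0002c90300337141 dev=4099 r-ufm100/U2 Port=1
'''

hops = parse_hops(SAMPLE)
for section, pairs in hops.items():
    print(section)
    for src, dst in pairs:
        print("  ", src, "->", dst)
```

    Each hop is a pair of (LID, node name, port) endpoints, so the switch ports used by the source and destination hosts (8 and 6 in this example) can be read directly from the "source to destination" section.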