14 Replies Latest reply on Jul 1, 2016 6:43 AM by thildemar

    SM LID is not configured warning

    thildemar

      Hi All,

       

      I have a pair of 4036 switches connected to a few server nodes.  Each server node has a dual-port ConnectX-2 card with one link to each switch.  The switches also have two links to each other.  I am able to send data across the network and all seems OK, but running ibchecknet on either switch throws some warnings:

       

      #warn: Lid is not configured lid 6 port 2

      #warn: SM Lid is not configured

      Port check lid 6 port 2:  FAILED

      #warn: Lid is not configured lid 6 port 34

      #warn: SM Lid is not configured

      Port check lid 6 port 34:  FAILED

      #warn: Lid is not configured lid 6 port 35

      #warn: SM Lid is not configured

      Port check lid 6 port 35:  FAILED

      #warn: Lid is not configured lid 1 port 1

      #warn: SM Lid is not configured

      Port check lid 1 port 1:  FAILED

       

       

      ibnetdiscover does show all devices as expected so I am unsure what this warning is all about.  Any advice?

        • Re: SM LID is not configured warning
          halr

          Hi Adam,

           

          The scripts are old and deprecated.

           

          There are no LIDs or SM LIDs for the SM to configure on switch external ports (lid 6 ports 2, 34, and 35) so those warnings are meaningless.

           

          Is LID 1 port 1 an HCA in some node? That might be of concern. What is it connected to? Is there an SM running on the subnet to which it is attached?

           

          -- Hal

            • Re: SM LID is not configured warning
              thildemar

              Switches have two links between each other (ports 1-1 and 2-2).  Each server has one link to each switch (Ports 34-36).

              Here is the output of net discover:

               

              4036-SW1(utilities)# ibnetdiscover

              #

              # Topology file: generated on Thu Jun 30 10:32:28 2016

              #

              # Initiated from node 0008f10500203b28 port 0008f10500203b28

               

               

              vendid=0x8f1

              devid=0x5a5a

              sysimgguid=0x8f10500109553

              switchguid=0x8f10500109552(8f10500109552)

              Switch  36 "S-0008f10500109552"        # "Mellanox 4036 # 4036-SW2" enhanced port 0 lid 6 lmc 0

              [1]    "S-0008f10500203b28"[1]        # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

              [2]    "S-0008f10500203b28"[2]        # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

              [34]    "H-0002c903004e445a"[1](2c903004e445b)          # "IGA-S2D1" lid 2 4xQDR

              [35]    "H-0008f104039a3c1c"[2](8f104039a3c1e)          # "IGA-S2D2" lid 5 4xQDR

              [36]    "H-0008f104039a4e3c"[2](8f104039a4e3e)          # "IGA-S2D3" lid 8 4xQDR

               

               

              vendid=0x8f1

              devid=0x5a5a

              sysimgguid=0x8f10500203b29

              switchguid=0x8f10500203b28(8f10500203b28)

              Switch  36 "S-0008f10500203b28"        # "Mellanox 4036 # 4036-SW1" enhanced port 0 lid 1 lmc 0

              [1]    "S-0008f10500109552"[1]        # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

              [2]    "S-0008f10500109552"[2]        # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

              [34]    "H-0002c903004e445a"[2](2c903004e445c)          # "IGA-S2D1" lid 3 4xQDR

              [35]    "H-0008f104039a3c1c"[1](8f104039a3c1d)          # "IGA-S2D2" lid 4 4xQDR

              [36]    "H-0008f104039a4e3c"[1](8f104039a4e3d)          # "IGA-S2D3" lid 7 4xQDR

               

               

              vendid=0x2c9

              devid=0x673c

              sysimgguid=0x8f104039a4e3f

              caguid=0x8f104039a4e3c

              Ca      2 "H-0008f104039a4e3c"          # "IGA-S2D3"

              [1](8f104039a4e3d)      "S-0008f10500203b28"[36]                # lid 7 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

              [2](8f104039a4e3e)      "S-0008f10500109552"[36]                # lid 8 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

               

               

              vendid=0x2c9

              devid=0x673c

              sysimgguid=0x8f104039a3c1f

              caguid=0x8f104039a3c1c

              Ca      2 "H-0008f104039a3c1c"          # "IGA-S2D2"

              [1](8f104039a3c1d)      "S-0008f10500203b28"[35]                # lid 4 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

              [2](8f104039a3c1e)      "S-0008f10500109552"[35]                # lid 5 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

               

               

              vendid=0x2c9

              devid=0x673c

              sysimgguid=0x2c903004e445d

              caguid=0x2c903004e445a

              Ca      2 "H-0002c903004e445a"          # "IGA-S2D1"

              [1](2c903004e445b)      "S-0008f10500109552"[34]                # lid 2 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR

              [2](2c903004e445c)      "S-0008f10500203b28"[34]                # lid 3 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR

              4036-SW1(utilities)#

               

              Ping and data transfer work, but transfer speed is only about 2gb/sec, which seems very low.  Still trying to figure out if I have a switch/fabric issue or if the slow speed is just configuration.  Servers are Windows using the latest supported Mellanox driver for the ConnectX-2 cards (4.80) and firmware (2.10.720).  RDMA appears to be detected fine on the hosts, so at least some of the config seems to be correct =/

                • Re: SM LID is not configured warning
                  halr

                  So LID 1 is the other switch, which means this is also a false warning from that script.
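
To make that reasoning concrete: only switch port 0 carries a LID, so a "Lid is not configured" warning against any other port of a switch LID can be ignored. Below is a minimal Python sketch of that filter. The regexes are assumptions written against the output pasted in this thread, and `switch_lids` and `classify_warnings` are hypothetical helper names, not part of infiniband-diags.

```python
import re

# Assumed line formats, taken from the ibnetdiscover / ibchecknet output in this thread.
SWITCH_RE = re.compile(r'^Switch\s+\d+\s+"S-[0-9a-f]+".*\bport 0 lid (\d+)')
WARN_RE = re.compile(r"#warn: Lid is not configured lid (\d+) port (\d+)")

TOPOLOGY = """
Switch  36 "S-0008f10500109552"        # "Mellanox 4036 # 4036-SW2" enhanced port 0 lid 6 lmc 0
Switch  36 "S-0008f10500203b28"        # "Mellanox 4036 # 4036-SW1" enhanced port 0 lid 1 lmc 0
"""

WARNINGS = """
#warn: Lid is not configured lid 6 port 2
#warn: Lid is not configured lid 6 port 34
#warn: Lid is not configured lid 6 port 35
#warn: Lid is not configured lid 1 port 1
"""


def switch_lids(ibnetdiscover_text):
    """Collect the LIDs that ibnetdiscover reports for switch port 0."""
    lids = set()
    for line in ibnetdiscover_text.splitlines():
        m = SWITCH_RE.match(line.strip())
        if m:
            lids.add(int(m.group(1)))
    return lids


def classify_warnings(ibchecknet_text, sw_lids):
    """Split warnings into (ignorable, suspicious) lists of (lid, port)."""
    ignorable, suspicious = [], []
    for m in WARN_RE.finditer(ibchecknet_text):
        lid, port = int(m.group(1)), int(m.group(2))
        # External switch ports (port > 0) have no LID of their own.
        if lid in sw_lids and port != 0:
            ignorable.append((lid, port))
        else:
            suspicious.append((lid, port))
    return ignorable, suspicious


ignorable, suspicious = classify_warnings(WARNINGS, switch_lids(TOPOLOGY))
print("ignorable:", ignorable)
print("suspicious:", suspicious)
```

With the two switch LIDs from this fabric (1 and 6), all four warnings land in the ignorable list. A warning against an HCA LID would end up in the suspicious list instead.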

                   

                  ibnetdiscover says 4xQDR for all your links so this looks right. QDR is 10 Gbps per lane (signaling rate) derated to 8 Gbps per lane (max data rate), so a 4x link is 40 Gbps signaling and 32 Gbps data. What app are you running to determine 2Gbps throughput? Switch/fabric looks OK to me unless there are errors being encountered. Try ibqueryerrors and see what it says.
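
The 10 Gbps and 8 Gbps figures are per lane, and a 4x link aggregates four lanes. A quick sketch of the arithmetic, assuming the 8b/10b encoding that SDR, DDR and QDR links use; `link_rates` is a hypothetical helper name:

```python
# Per-lane signaling rates in Gbit/s for the 8b/10b-encoded InfiniBand speeds.
LANE_SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def link_rates(width, speed):
    """Return (signaling, data) rates in Gbit/s for a link such as 4x QDR."""
    signaling = width * LANE_SIGNALING_GBPS[speed]
    data = signaling * 8 / 10  # 8b/10b: 8 payload bits per 10 bits on the wire
    return signaling, data


signaling, data = link_rates(4, "QDR")
print(f"4x QDR: {signaling:.0f} Gbit/s signaling, {data:.0f} Gbit/s data")
# prints "4x QDR: 40 Gbit/s signaling, 32 Gbit/s data"
```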

                • Re: SM LID is not configured warning
                  thildemar

                  Sorry, you mentioned those scripts are deprecated.  Is there a newer set I should be using to test this?

                    • Re: SM LID is not configured warning
                      halr

                      Those scripts were deprecated back in April 2011.

                       

                      Try ibqueryerrors

                       

                      I don't know if this exists in Windows environment though.

                        • Re: SM LID is not configured warning
                          thildemar

                          Running ntttcp.exe per http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf

                          Throughput(MB/s) = 1531.362

                          Results are similar with file copy or other network test tools.

                           

                          ibqueryerrors does exist, but it seems there is no -f switch on Windows.  This is the base output:

                           

                          PS C:\> ibqueryerrors

                          Errors for "IGA-S2D3"

                             GUID 0x8f104039a4e3d port 1: [PortXmitWait == 1]

                             GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]

                          Errors for 0x8f10500203b28 "Mellanox 4036 # 4036-SW1"

                             GUID 0x8f10500203b28 port ALL: [PortXmitWait == 86266712]

                             GUID 0x8f10500203b28 port 1: [PortXmitWait == 33430055]

                             GUID 0x8f10500203b28 port 34: [PortXmitWait == 52836657]

                          Errors for 0x8f10500109552 "Mellanox 4036 # 4036-SW2"

                             GUID 0x8f10500109552 port ALL: [PortXmitWait == 2344169726]

                             GUID 0x8f10500109552 port 0: [PortXmitWait == 261]

                             GUID 0x8f10500109552 port 34: [PortXmitWait == 2344169465]

                          Errors for "IGA-S2D1"

                             GUID 0x2c903004e445b port 1: [PortXmitWait == 59]

                           

                           

                          ## Summary: 5 nodes checked, 4 bad nodes found

                          ##          80 ports checked, 9 ports have errors beyond threshold
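
To find the worst port quickly, the output above can be parsed into per-port counters. This is a minimal sketch: the regexes assume the line format shown in this thread, and `parse_xmit_wait` is a hypothetical helper, not an infiniband-diags API.

```python
import re

# Assumed line formats, copied from the ibqueryerrors output above.
NODE_RE = re.compile(r'^Errors for (?:0x[0-9a-f]+ )?"(.+)"$')
PORT_RE = re.compile(r"^GUID (0x[0-9a-f]+) port (\d+|ALL):")
WAIT_RE = re.compile(r"\[PortXmitWait == (\d+)\]")

SAMPLE = """
Errors for "IGA-S2D3"
   GUID 0x8f104039a4e3d port 1: [PortXmitWait == 1]
   GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]
Errors for 0x8f10500203b28 "Mellanox 4036 # 4036-SW1"
   GUID 0x8f10500203b28 port ALL: [PortXmitWait == 86266712]
   GUID 0x8f10500203b28 port 1: [PortXmitWait == 33430055]
   GUID 0x8f10500203b28 port 34: [PortXmitWait == 52836657]
Errors for 0x8f10500109552 "Mellanox 4036 # 4036-SW2"
   GUID 0x8f10500109552 port ALL: [PortXmitWait == 2344169726]
   GUID 0x8f10500109552 port 0: [PortXmitWait == 261]
   GUID 0x8f10500109552 port 34: [PortXmitWait == 2344169465]
Errors for "IGA-S2D1"
   GUID 0x2c903004e445b port 1: [PortXmitWait == 59]
"""


def parse_xmit_wait(text):
    """Return ({(node, port): count}, {node: 'port ALL' total})."""
    per_port, totals = {}, {}
    node = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NODE_RE.match(line)
        if m:
            node = m.group(1)
            continue
        port = PORT_RE.match(line)
        wait = WAIT_RE.search(line)
        if port and wait:
            if port.group(2) == "ALL":
                totals[node] = int(wait.group(1))
            else:
                per_port[(node, int(port.group(2)))] = int(wait.group(1))
    return per_port, totals


per_port, totals = parse_xmit_wait(SAMPLE)
worst = max(per_port, key=per_port.get)
print("highest PortXmitWait:", worst, per_port[worst])
```

On this output the highest counter is port 34 of 4036-SW2, and for both switches the per-port values add up to the "port ALL" line.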

                            • Re: SM LID is not configured warning
                              halr

                              Are you running a single NTttcp receiver/sender pair between servers, or are there several running concurrently?

                               

                              Is PortXmitWait increasing? It is indicative of congestion. Perhaps some machine is slow (receiving). Are the results the same regardless of which server you use?

                                • Re: SM LID is not configured warning
                                  thildemar

                                  Hi Hal,

                                  Thanks for the help on this.

                                  I am running between two servers with the following:

                                  ntttcp.exe -r -a 16 -t 15 -m 16,*,10.1.4.11

                                  ntttcp.exe -s -a 16 -t 15 -m 16,*,10.1.4.11

                                   

                                  Which two servers does not seem to make a difference (all three are identical anyway).  PortXmitWait increases quite a bit on the send side of things (both node and switch ports) when running these tests.
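
One way to put numbers on "increases quite a bit" is to snapshot the counters before and after a test run and diff them. This is only a sketch: `xmit_wait_deltas` is a hypothetical helper, the "before" values are from the ibqueryerrors output earlier in this thread, and the "after" values are made up purely for illustration. It also ignores counter wrap-around and counters being cleared between snapshots.

```python
def xmit_wait_deltas(before, after):
    """Return {(node, port): increase} for ports whose counter grew."""
    deltas = {}
    for key, new in after.items():
        old = before.get(key, 0)
        if new > old:
            deltas[key] = new - old
    return deltas


# "before" values come from the ibqueryerrors output posted earlier.
before = {
    ("IGA-S2D3", 2): 541647202,
    ("4036-SW2", 34): 2344169465,
    ("IGA-S2D1", 1): 59,
}
# HYPOTHETICAL "after" values, invented for this example only.
after = {
    ("IGA-S2D3", 2): 541647202 + 1_000_000,
    ("4036-SW2", 34): 2344169465 + 5_000_000,
    ("IGA-S2D1", 1): 59,
}

for key, grew in sorted(xmit_wait_deltas(before, after).items()):
    print(key, "+%d" % grew)
```

Ports whose counter did not move are left out, so what remains is the set of ports that were waiting on credits during the test.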

                                    • Re: SM LID is not configured warning
                                      halr

                                      PortXmitWait on sending side means some link is slow (I suspect sending side is faster than receiving side).

                                       

                                      I'm not familiar with Windows performance tuning.

                                       

                                      Is ntttcp throughput in bytes/sec or bits/sec? I was assuming bits/sec, but from the ntttcp posts I just looked at it might be bytes/sec.

                                        • Re: SM LID is not configured warning
                                          thildemar

                                          Here is the full output; throughput is in megabytes/sec.  For comparison, a normal 1 GbE connection usually tests at around 112 MB/s.  These have been in the 1500-2000 range.

                                          Copyright Version 5.31

                                          Network activity progressing...

                                           

                                          Thread  Time(s) Throughput(KB/s) Avg B / Compl

                                          ======  ======= ================ =============

                                               0   15.005        61078.307     65536.000

                                               1   15.005       147897.368     65536.000

                                               2   15.005        31699.300     65536.000

                                               3   15.005        60741.353     65536.000

                                               4   15.005       117652.516     65536.000

                                               5   15.005        24687.238     65536.000

                                               6   15.005        95541.486     65536.000

                                               7   15.005        73238.520     65536.000

                                               8   15.005       142587.138     65536.000

                                               9   15.130        87679.577     65536.000

                                              10   15.005       153727.957     65536.000

                                              11   15.005       147705.432     65536.000

                                              12   15.005        67552.949     65536.000

                                              13   15.005        73229.990     65536.000

                                              14   15.005       142220.327     65536.000

                                              15   15.005       133327.291     65536.000

                                           

                                           

                                          #####  Totals:  #####

                                           

                                           

                                             Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)

                                          ================ =========== ============== ================

                                              22878.187500      15.010       3913.049         1524.196

                                           

                                           

                                          Throughput(Buffers/s) Cycles/Byte       Buffers

                                          ===================== =========== =============

                                                      24387.142       0.821    366051.000

                                           

                                           

                                          DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)

                                          ============= ============= =============== ==============

                                              43232.911         1.078       87842.039          0.530

                                           

                                           

                                          Packets Sent Packets Received Retransmits Errors Avg. CPU %

                                          ============ ================ =========== ====== ==========

                                               6130646           699305        4427      0      1.955
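
To settle the bytes-versus-bits question, the totals above can be cross-checked and converted. This is a sketch using only the numbers in this output; the 32 Gbit/s figure is the 4x QDR data rate after 8b/10b encoding, and the variable names are mine.

```python
MIB = 1024 * 1024  # the "MEG" in this output turns out to be 2**20 bytes

# Values copied from the ntttcp totals above.
buffers = 366051
buffer_bytes = 65536          # "Avg B / Compl"
realtime_s = 15.010
reported_meg = 22878.1875
reported_mb_per_s = 1524.196

total_bytes = buffers * buffer_bytes
meg = total_bytes / MIB                    # matches Bytes(MEG) exactly
mb_per_s = meg / realtime_s                # matches Throughput(MB/s)
gbit_per_s = total_bytes * 8 / realtime_s / 1e9

qdr_data_gbps = 32.0                       # 4x QDR data rate
print(f"Bytes(MEG) recomputed: {meg}")
print(f"Throughput: {mb_per_s:.3f} MB/s = {gbit_per_s:.2f} Gbit/s")
print(f"Share of 4x QDR data rate: {gbit_per_s / qdr_data_gbps:.0%}")
```

So the test moved roughly 12.8 Gbit/s, about 40% of what a 4x QDR link can carry, which is well above the 2 Gbit/s the earlier "2gb/sec" figure suggested.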