7 Replies Latest reply on Sep 17, 2018 6:03 PM by seisc7

    Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!

    seisc7

      Need help, I'm running out of ideas!

      I have a Dell M1000e blade chassis with Mellanox M3601Q 40 Gb/s InfiniBand switches in I/O slot B1C1, connecting to the midplane on C1. The blades are PowerEdge M910s with the J05YT ConnectX-3 mezzanine card installed. I have installed the latest MLNX OFED, 4.4. The OS is based on CentOS 7.4 within a Rocks Manzanita cluster. Since these are blades, the connection is via the midplane. The switch lights are steady and look good.

       

      Following prior posts, I ran ibhosts, ibstat, lspci | grep Mell, lspci -Qvvs 07:00.0, ifconfig -a, hca_self_test.ofed, and mstflint -d 07:00.0 q. As best I can tell, my port is stuck in Down/Initializing and I have a subnet manager issue. I cannot get the port to go Active or an IP address to show. Can you please help me diagnose this? I'll post the relevant output below; let me know what else is required.

       

      Thank you much!

       

      [root@headnode /]# hca_self_test.ofed

      ---- Performing Adapter Device Self Test ----

      Number of CAs Detected ................. 2

      PCI Device Check ....................... PASS

      Kernel Arch ............................ x86_64

      Host Driver Version .................... MLNX_OFED_LINUX-4.4-2.0.7.0 (OFED-4.4-2.0.7): 3.10.0-693.el7.x86_64

      Host Driver RPM Check .................. PASS

      Firmware on CA #0 HCA .................. v2.10.2132

      Firmware on CA #1 HCA .................. v2.10.2132

      Host Driver Initialization ............. PASS

      Number of CA Ports Active .............. 0

      Port State of Port #1 on CA #0 (HCA)..... DOWN (InfiniBand)

      Port State of Port #2 on CA #0 (HCA)..... DOWN (InfiniBand)

      Port State of Port #1 on CA #1 (HCA)..... INIT (InfiniBand)

      Port State of Port #2 on CA #1 (HCA)..... DOWN (InfiniBand)

      Error Counter Check on CA #0 (HCA)...... FAIL

          REASON: found errors in the following counters

            Errors in /sys/class/infiniband/mlx4_0/ports/1/counters

               link_error_recovery: 93

               symbol_error: 65535

      Error Counter Check on CA #1 (HCA)...... PASS

      Kernel Syslog Check .................... PASS

      Node GUID on CA #0 (HCA) ............... 00:02:c9:03:00:f9:2e:80

      Node GUID on CA #1 (HCA) ............... 00:02:c9:03:00:f9:32:f0

      ------------------ DONE ---------------------
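The self-test above reports symbol_error at 65535 on mlx4_0 port 1. That value is the maximum of a 16-bit counter, so it most likely means the counter has saturated rather than that exactly 65535 errors occurred. A minimal Python sketch of reading the same sysfs counters and flagging that case follows. The function names and the list of watched counters are illustrative assumptions; the path is the one shown in the self-test output.

```python
import os

# Assumption: symbol_error is a 16-bit counter, so 65535 means "pegged at max".
SATURATED_16BIT = 0xFFFF


def read_counters(counters_dir):
    """Return {name: value} for each integer-valued file in counters_dir."""
    values = {}
    for name in sorted(os.listdir(counters_dir)):
        try:
            with open(os.path.join(counters_dir, name)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Skip unreadable or non-numeric entries instead of failing.
            continue
    return values


def flag_counters(values, watch=("symbol_error", "link_error_recovery")):
    """Describe watched counters that are non-zero, noting saturation."""
    notes = []
    for name in watch:
        value = values.get(name, 0)
        if value == 0:
            continue
        if value >= SATURATED_16BIT:
            notes.append(f"{name}: {value} (saturated, real count unknown)")
        else:
            notes.append(f"{name}: {value}")
    return notes


if __name__ == "__main__":
    # Path taken from the self-test output; adjust device and port as needed.
    path = "/sys/class/infiniband/mlx4_0/ports/1/counters"
    if os.path.isdir(path):
        for note in flag_counters(read_counters(path)):
            print(note)
```

Since that port is also reported as DOWN, the counters may simply reflect a port with no usable link rather than a fault on the port that reached INIT.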

       

      [root@headnode /]# ibhosts

      Ca    : 0x0002c90300f92e80 ports 2 "headnode HCA-1"

       

      [root@headnode /]# ibstat

      CA 'mlx4_0'

          CA type: MT4099

          Number of ports: 2

          Firmware version: 2.10.2132

          Hardware version: 0

          Node GUID: 0x0002c90300f92e80

          System image GUID: 0x0002c90300f92e83

          Port 1:

              State: Down

              Physical state: Polling

              Rate: 10

              Base lid: 0

              LMC: 0

              SM lid: 0

              Capability mask: 0x02514868

              Port GUID: 0x0002c90300f92e81

              Link layer: InfiniBand

          Port 2:

              State: Down

              Physical state: Polling

              Rate: 10

              Base lid: 0

              LMC: 0

              SM lid: 0

              Capability mask: 0x02514868

              Port GUID: 0x0002c90300f92e82

              Link layer: InfiniBand

      CA 'mlx4_1'

          CA type: MT4099

          Number of ports: 2

          Firmware version: 2.10.2132

          Hardware version: 0

          Node GUID: 0x0002c90300f932f0

          System image GUID: 0x0002c90300f932f3

          Port 1:

              State: Initializing

              Physical state: LinkUp

              Rate: 40

              Base lid: 0

              LMC: 0

              SM lid: 0

              Capability mask: 0x02514868

              Port GUID: 0x0002c90300f932f1

              Link layer: InfiniBand

          Port 2:

              State: Down

              Physical state: Polling

              Rate: 10

              Base lid: 0

              LMC: 0

              SM lid: 0

              Capability mask: 0x02514868

              Port GUID: 0x0002c90300f932f2

              Link layer: InfiniBand

      [root@headnode /]#
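Reading the ibstat output above: three ports are Down with physical state Polling (no link partner detected), while mlx4_1 port 1 is LinkUp but Initializing with Base lid 0 and SM lid 0, which is what a port looks like when the link is up but no subnet manager has configured it yet. The following Python sketch parses pasted ibstat text and applies that reading. The helper names and the wording of the diagnoses are assumptions for illustration, not part of any OFED tool.

```python
import re


def parse_ibstat(text):
    """Parse ibstat text into {ca_name: {port_number: {field: value}}}."""
    cas = {}
    ca = port = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '([^']+)'", line)
        if m:
            ca = cas.setdefault(m.group(1), {})
            port = None  # CA-level fields follow until the next "Port N:" line
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = ca.setdefault(int(m.group(1)), {})
            continue
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return cas


def diagnose(port):
    """Return a short, hedged reading of one port's state fields."""
    state = port.get("State")
    phys = port.get("Physical state")
    if state == "Active":
        return "active"
    if state == "Initializing" and phys == "LinkUp":
        return "link up, waiting for a subnet manager"
    if state == "Down" and phys == "Polling":
        return "no physical link detected"
    return f"unclassified ({state}/{phys})"
```

Usage would be along the lines of `for ca, ports in parse_ibstat(text).items(): ...`, printing `diagnose(fields)` for each port.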

       

      [root@headnode /]# mstflint -d 05:00.0 q

      Image type:            FS2

      FW Version:            2.10.2132

      Device ID:             4099

      Description:           Node             Port1            Port2            Sys image

      GUIDs:                 0002c90300f92e80 0002c90300f92e81 0002c90300f92e82 0002c90300f92e83

      MACs:                                       000000000000     000000000000

      VSD:                  

      PSID:                  DEL0A10210018

       

      [root@headnode /]# lspci -Qvvs 05:00.0

      05:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]

          Subsystem: Mellanox Technologies ConnectX-3 IB QDR Dual Port Mezzanine Card

          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

          Latency: 0, Cache Line Size: 64 bytes

          Interrupt: pin A routed to IRQ 34

          Region 0: Memory at fb100000 (64-bit, non-prefetchable) [size=1M]

          Region 2: Memory at f4800000 (64-bit, prefetchable) [size=8M]

          Capabilities: [40] Power Management version 3

              Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)

              Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

          Capabilities: [48] Vital Product Data

              Product Name: DELL ConnectX-3 Mezz

              Read-only fields:

                  [PN] Part number: 0J05YT              

                  [EC] Engineering changes: A00

                  [SN] Serial number: IL0J05YT7403125S000Q

                  [V0] Vendor specific: DDR/QDR SFF mezz

                  [RV] Reserved: checksum good, 0 byte(s) reserved

              Read/write fields:

                  [V1] Vendor specific: N/A  

                  [YA] Asset tag: N/A                        

                  [RW] Read-write area: 107 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 252 byte(s) free

              End

          Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-

              Vector table: BAR=0 offset=0007c000

              PBA: BAR=0 offset=0007d000

          Capabilities: [60] Express (v2) Endpoint, MSI 00

              DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited

                  ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 116.000W

              DevCtl:    Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+

                  RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

                  MaxPayload 256 bytes, MaxReadReq 512 bytes

              DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-

              LnkCap:    Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited

                  ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+

              LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+

                  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

              LnkSta:    Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

              DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

              DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled

              LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-

                   Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

                   Compliance De-emphasis: -6dB

              LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-

                   EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

          Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

              ARICap:    MFVC- ACS-, Next Function: 0

              ARICtl:    MFVC- ACS-, Function Group: 0

          Capabilities: [148 v1] Device Serial Number 00-02-c9-03-00-f9-2e-80

          Capabilities: [154 v2] Advanced Error Reporting

              UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

              UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

              UESvrt:    DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-

              CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

              CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

              AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

          Capabilities: [18c v1] #19

          Kernel driver in use: mlx4_core

          Kernel modules: mlx4_core

       

      [root@headnode ~]# sminfo -p 1

      ibwarn: [8670] _do_madrpc: recv failed: Connection timed out

      ibwarn: [8670] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0)

      sminfo: iberror: failed: query

      [root@headnode ~]# sminfo -p 2

      ibwarn: [8684] _do_madrpc: recv failed: Connection timed out

      ibwarn: [8684] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0)

      sminfo: iberror: failed: query

      [root@headnode ~]#

       

      Opensm

      ******************************************************************

      ****************** ERRORS DURING INITIALIZATION ******************

      ******************************************************************

      Sep 12 11:18:51 735239 [A3E15700] 0x01 -> state_mgr_check_tbl_consistency: ERR 3322: lid 1 is wrongly assigned to port 0x0002c90300f92e81 ('headnode HCA-1' port 1) in port_lid_tbl

      Sep 12 11:18:51 735367 [A3E15700] 0x02 -> state_mgr_check_tbl_consistency: Clearing Lid for port 0x0002c90300f92e81

      Sep 12 11:18:51 735375 [A3E15700] 0x01 -> state_mgr_check_tbl_consistency: ERR 3322: lid 3 is wrongly assigned to port 0x0002c90300f932f1 ('headnode HCA-2' port 1) in port_lid_tbl

      Sep 12 11:18:51 735392 [A3E15700] 0x02 -> state_mgr_check_tbl_consistency: Clearing Lid for port 0x0002c90300f932f1

      Sep 12 11:18:51 735430 [A3E15700] 0x01 -> osm_ucast_port_is_zero_lid: ERR 3A04: Port 0x2c90300f932f1 (headnode HCA-2 port 1) has LID 0. An initialization error occurred. Ignoring port

      Sep 12 11:18:51 735449 [A3E15700] 0x01 -> osm_ucast_port_is_zero_lid: ERR 3A04: Port 0x2c90300f92e81 (headnode HCA-1 port 1) has LID 0. An initialization error occurred. Ignoring port

      Sep 12 11:18:51 735462 [A3E15700] 0x01 -> osm_ucast_port_is_zero_lid: ERR 3A04: Port 0x2c90300f92e81 (headnode HCA-1 port 1) has LID 0. An initialization error occurred. Ignoring port

      Sep 12 11:18:51 735468 [A3E15700] 0x01 -> osm_ucast_port_is_zero_lid: ERR 3A04: Port 0x2c90300f932f1 (headnode HCA-2 port 1) has LID 0. An initialization error occurred. Ignoring port

      Sep 12 11:18:51 735480 [A3E15700] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches

      Sep 12 11:18:51 740351 [A3E15700] 0x80 -> Errors during initialization

      Sep 12 11:18:51 740385 [A3E15700] 0x01 -> do_sweep:
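The opensm log above repeats two error codes (3322 and 3A04) against the same two port GUIDs, printed once with leading zeros and once without. A small Python sketch that tallies error codes per port from pasted log text can make that pattern easier to see. The regular expressions are assumptions based only on the lines shown here, not on a documented opensm log format.

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this thread (assumption).
ERR_RE = re.compile(r"ERR ([0-9A-Fa-f]{4}):")
PORT_RE = re.compile(r"[Pp]ort 0x([0-9A-Fa-f]+)")


def summarize_opensm_errors(log_text):
    """Count (error_code, port_guid) pairs found in opensm log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        err = ERR_RE.search(line)
        if not err:
            continue
        port = PORT_RE.search(line)
        # GUIDs appear with and without leading zeros; normalize to 16 hex digits.
        guid = f"{int(port.group(1), 16):016x}" if port else "unknown"
        counts[(err.group(1).upper(), guid)] += 1
    return counts


if __name__ == "__main__":
    import sys

    for (code, guid), n in sorted(summarize_opensm_errors(sys.stdin.read()).items()):
        print(f"ERR {code} port {guid}: {n}")
```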

       

       

      [root@headnode ~]# nmcli connection show ib0

      connection.id:                          ib0

      connection.uuid:                        65aec7ac-2335-44aa-b9c2-0945379d8111

      connection.stable-id:                   --

      connection.interface-name:              ib0

      connection.type:                        infiniband

      connection.autoconnect:                 yes

      connection.autoconnect-priority:        0

      connection.autoconnect-retries:         -1 (default)

      connection.timestamp:                   0

      connection.read-only:                   no

      connection.permissions:                 --

      connection.zone:                        --

      connection.master:                      --

      connection.slave-type:                  --

      connection.autoconnect-slaves:          -1 (default)

      connection.secondaries:                 --

      connection.gateway-ping-timeout:        0

      connection.metered:                     unknown

      connection.lldp:                        -1 (default)

      ipv4.method:                            auto

      ipv4.dns:                               --

      ipv4.dns-search:                        --

      ipv4.dns-options:                       (default)

      ipv4.dns-priority:                      0

      ipv4.addresses:                         --

      ipv4.gateway:                           --

      ipv4.routes:                            --

      ipv4.route-metric:                      -1

      ipv4.ignore-auto-routes:                no

      ipv4.ignore-auto-dns:                   no

      ipv4.dhcp-client-id:                    --

      ipv4.dhcp-timeout:                      0

      ipv4.dhcp-send-hostname:                yes

      ipv4.dhcp-hostname:                     --

      ipv4.dhcp-fqdn:                         --

      ipv4.never-default:                     yes

      ipv4.may-fail:                          yes

      ipv4.dad-timeout:                       -1 (default)

      ipv6.method:                            link-local

      ipv6.dns:                               --

      ipv6.dns-search:                        --

      ipv6.dns-options:                       (default)

      ipv6.dns-priority:                      0

      ipv6.addresses:                         --

      ipv6.gateway:                           --

      ipv6.routes:                            --

      ipv6.route-metric:                      -1

      ipv6.ignore-auto-routes:                no

      ipv6.ignore-auto-dns:                   no

      ipv6.never-default:                     no

      ipv6.may-fail:                          yes

      ipv6.ip6-privacy:                       0 (disabled)

      ipv6.addr-gen-mode:                     stable-privacy

      ipv6.dhcp-send-hostname:                yes

      ipv6.dhcp-hostname:                     --

      ipv6.token:                             --

      infiniband.mac-address:                 80:00:02:08:FE:80:00:00:00:00:00:00:00:02:C9:03:00:F9:32:F1

      infiniband.mtu:                         auto

      infiniband.transport-mode:              connected

      infiniband.p-key:                       default

      infiniband.parent:                      --

      proxy.method:                           none

      proxy.browser-only:                     no

      proxy.pac-url:                          --

      proxy.pac-script:                       --
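The infiniband.mac-address value in the nmcli output above is a 20-octet IPoIB hardware address: one flags octet, a 3-octet queue pair number, then a 16-octet GID made of an 8-octet subnet prefix and the 8-octet port GUID. A short Python sketch of decoding it follows, to confirm which HCA port ib0 is bound to. The function name and the returned dictionary keys are illustrative assumptions.

```python
def decode_ipoib_address(addr):
    """Split a colon-separated 20-octet IPoIB address into its fields."""
    octets = bytes(int(part, 16) for part in addr.split(":"))
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return {
        "flags": octets[0],
        "qpn": int.from_bytes(octets[1:4], "big"),
        "subnet_prefix": octets[4:12].hex(),
        "port_guid": octets[12:20].hex(),
    }


if __name__ == "__main__":
    info = decode_ipoib_address(
        "80:00:02:08:FE:80:00:00:00:00:00:00:00:02:C9:03:00:F9:32:F1"
    )
    print(info["port_guid"])
```

For the address shown, the decoded port GUID is 0002c90300f932f1, which matches the Port GUID of mlx4_1 port 1 in the ibstat output, the one port with a physical link.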

        • Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!
          alkx

          Check whether the latest firmware for your device, 2.36.5000, helps. 2.10.XXXX is extremely outdated. http://www.mellanox.com/page/firmware_table_dell_archive

          As an additional step, after installing the 2.36 firmware, check whether using MOFED-4.0 (or even 3.4, or the inbox InfiniBand package) makes the issue go away. Also try explicitly specifying the GUID in the opensm.conf configuration file or on the command line (opensm --guid <GUID>).

            • Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!
              seisc7

              Thank you, I appreciate the help! I'll work on this today and report back. I'm using the older M3601Q InfiniBand QDR switch rather than the newer M4001Q. I do know the standard firmware update failed when I was installing OFED, and I had to do a force install.

                • Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!
                  alkx

                  Hi Joel,

                  When running an OEM HCA, using Mellanox firmware is not supported, and Mellanox OFED has no firmware images for OEM cards. Hopefully you didn't burn it; that is why I linked to the Dell archive.

                    • Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!
                      seisc7

                      Makes sense, thank you. I have downloaded everything I might need. The PSID DEL0A10210018 doesn't have a match; since the switch is older, I picked a couple to try. The firmware link for the M3601Q switch also leads to the newest OEM link you shared.

                       

                      I'll give it all a good shot and report back over the weekend. Hoping this does the magic!

                        • Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!
                          seisc7

                          Well, it was one busy weekend of troubleshooting and a lot of work. I may have solved a few issues, but it is not perfect yet!

                           

                          The OEM updates (I tried a few) would not work because of a PSID mismatch; if there is a workaround, please let me know. I'm not able to find any firmware online for the PSID of the M3601Q switch.

                          [root@headnode Infini Switch firmware]# ls

                          fw-sx-9_2_8000-0269NG_B1.bin

                          [root@headnode Infini Switch firmware]# lspci | grep Mellanox

                          07:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]

                          [root@headnode Infini Switch firmware]# mstflint -d 07:00.0 -i fw-sx-9_2_8000-0269NG_B1.bin b

                              Current FW version on flash:  2.10.2132

                              New FW version:              9.2.8000

                          -E- PSID mismatch. The PSID on flash (DEL0A10210018) differs from the PSID in the given image (DEL09E0210003).

                          [root@headnode Infini Switch firmware]#
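mstflint refused the burn above because the PSID on flash (DEL0A10210018) differs from the PSID in the image (DEL09E0210003). The same comparison can be done on captured text before attempting a burn, as in this minimal Python sketch. The helper names are hypothetical, and it assumes the query output contains a `PSID:` line like the one in the mstflint query earlier in the thread.

```python
import re

# Assumption: the query output has a line of the form "PSID:   <value>".
PSID_RE = re.compile(r"^\s*PSID:\s*(\S+)", re.MULTILINE)


def psid_from_query(query_text):
    """Extract the PSID from captured mstflint query output, or None."""
    match = PSID_RE.search(query_text)
    return match.group(1) if match else None


def check_psid(flash_query_text, image_psid):
    """Return (ok, message) describing whether the image PSID matches flash."""
    flash_psid = psid_from_query(flash_query_text)
    if flash_psid is None:
        return False, "no PSID found in query output"
    if flash_psid != image_psid:
        return False, f"PSID mismatch: flash {flash_psid}, image {image_psid}"
    return True, f"PSID match: {flash_psid}"
```

A mismatch here is a signal to look for a different image rather than to override the check, since the PSID identifies the specific board configuration the firmware was built for.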

                           

                          I tried forcing the GUID on the command line as suggested, since I don't have an opensm.conf file anywhere.

                           

                          Then I went ahead and uninstalled Mellanox OFED and started over with OpenFabrics OFED. There were a few missing-dependency errors (cmake, libnl3-devel, numactl-devel, devel-grind); after getting those RPMs and their dependencies sorted out, it did install. The port GUID was recognized and InfiniBand is active. DHCP didn't work, so I set the address manually; it may not be perfect yet. The lingering issues are OFED related: I can't get opensm to start automatically, so it has to be started with /etc/init.d/opensmd start. After starting it, ibv_devinfo and nmcli connection show give:

                          [root@headnode ~]# ibv_devinfo

                          hca_id:    mlx4_0

                              transport:            InfiniBand (0)

                              fw_ver:                2.10.2132

                              node_guid:            0002:c903:00f9:32f0

                              sys_image_guid:            0002:c903:00f9:32f3

                              vendor_id:            0x02c9

                              vendor_part_id:            4099

                              hw_ver:                0x0

                              board_id:            DEL0A10210018

                              phys_port_cnt:            2

                                  port:    1

                                      state:            PORT_ACTIVE (4)

                                      max_mtu:        4096 (5)

                                      active_mtu:        4096 (5)

                                      sm_lid:            1

                                      port_lid:        1

                                      port_lmc:        0x00

                                      link_layer:        InfiniBand

                           

                                  port:    2

                                      state:            PORT_DOWN (1)

                                      max_mtu:        4096 (5)

                                      active_mtu:        4096 (5)

                                      sm_lid:            0

                                      port_lid:        0

                                      port_lmc:        0x00

                                      link_layer:        InfiniBand

                           

                          [root@headnode ~]# nmcli connection show

                          NAME                UUID                                  TYPE            DEVICE

                          Wired connection 2  a40b3b41-66e7-3d87-a77c-e79ccd002698  802-3-ethernet  em1   

                          Wired connection 3  7b5a96ce-3df4-3534-8a35-b430f3f1e3e5  802-3-ethernet  em2   

                          ib0                 b4fdfa83-45ba-4904-a8ec-377234b898ee  infiniband      ib0   

                          virbr0              d36acaba-3663-4199-ae03-0b2a39aa75df  bridge          virbr0

                          Bridge em1          1dad842d-1912-ef5a-a43a-bc238fb267e7  bridge          --    

                          Bridge em2          0578038a-64e9-a2fd-0a28-e4cd0b553930  bridge          --    

                          System ib0          2ab4abde-b8a5-6cbc-19b1-2bfb193e4e89  infiniband      --    

                          System pem1         c19149d5-4e53-4636-b52a-81d213a8a3cb  802-3-ethernet  --    

                          System pem2         7379072d-ea75-335e-2486-0afa3cd10c77  802-3-ethernet  --    

                          Wired connection 1  d4070b38-e850-4a48-83a7-223ecca993f7  802-3-ethernet  --    

                          ib0                 4e22b1f1-3e0c-4b84-b0d9-85b0755728ac  infiniband      --    

                          ib0                 152321c5-8ba1-4865-9eca-5a18a889ffb7  infiniband      --    

                          ib1                 9fd439a6-da5e-4928-9265-47a636b3aaea  infiniband      --  

                           

                          #ifconfig -a ib0

                          ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520

                                  inet 10.1.27.7  netmask 255.0.0.0  broadcast 10.1.77.77

                                  inet6 fe80::202:c903:f9:32f1  prefixlen 64  scopeid 0x20<link>

                          Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).

                                  infiniband 80:00:02:08:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)

                                  RX packets 0  bytes 0 (0.0 B)

                                  RX errors 0  dropped 0  overruns 0  frame 0

                                  TX packets 289  bytes 19652 (19.1 KiB)

                                  TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
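In the ifconfig output above, the manually configured broadcast address (10.1.77.77) does not agree with the address and netmask: for 10.1.27.7 with netmask 255.0.0.0 the broadcast address would normally be 10.255.255.255. A quick check with Python's standard ipaddress module is sketched below; the function name is an illustrative assumption.

```python
import ipaddress


def expected_broadcast(address, netmask):
    """Return the broadcast address implied by an IPv4 address and netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(iface.network.broadcast_address)


if __name__ == "__main__":
    configured = "10.1.77.77"  # value shown by ifconfig in this thread
    expected = expected_broadcast("10.1.27.7", "255.0.0.0")
    if configured != expected:
        print(f"broadcast {configured} does not match expected {expected}")
```

This is only a consistency check on the manual settings; it does not by itself explain the zero RX packet count.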

                           

                          Next steps: I'm waiting on two Dell flash SD cards for the CMC so I can get all drivers updated on the chassis and nodes. Updating through UEFI is a lot slower, and some drivers are too big anyway. Hopefully the I/O update will help! After that, I may do a fresh install of Rocks Cluster 7 (Manzanita) and try earlier versions of Mellanox OFED such as 4.1 or 3.xx. I can come back to OFED as well.

                           

                          Persisting issues: the OFED commands ibstat, ibhosts, etc. do not work, perhaps because of a failure on the OFED side. ib0 still shows a hardware error, perhaps a firmware issue. The HCA test command does not work, but things seem fine since the port is active. I also have a separate issue with the Rocks Cluster command "insert-ethers" not responding when connecting the switch and compute nodes, hence the reinstall.

                           

                          Sorry, this seems like a mess; thank you for your time! I know I'll get around it one way or another, and I may even have to buy a newer M4001 switch that has current drivers. I wonder if Mellanox would share archived M3601Q firmware?