9 Replies Latest reply on Mar 12, 2016 1:51 PM by weijia

    ConnectX-4 CX456A does not work with opensm

    weijia

      I have two servers, each with a ConnectX-4 VPI 100Gb NIC installed (model: CX456A, two ports). The two ports are connected back to back using two copper cables. I have no problem when the two ports are set to Ethernet mode; the performance is quite close to 100Gb/s. To try InfiniBand mode, I switched port one to InfiniBand mode and restarted the servers.

       

      ibv_devinfo shows the following:

      ...

      hca_id: mlx5_0

              transport:                      InfiniBand (0)

              fw_ver:                         12.14.2036

              node_guid:                      7cfe:9003:0032:797a

              sys_image_guid:                 7cfe:9003:0032:797a

              vendor_id:                      0x02c9

              vendor_part_id:                 4115

              hw_ver:                         0x0

              board_id:                       MT_2190110032

              phys_port_cnt:                  1

              Device ports:

                      port:   1

                              state:                  PORT_DOWN (1)

                              max_mtu:                4096 (5)

                              active_mtu:             4096 (5)

                              sm_lid:                 0

                              port_lid:               65535

                              port_lmc:               0x00

                              link_layer:             InfiniBand

      ...

      Then I started the opensm daemon (service opensmd start) on one of the servers, but opensm seems to have a problem setting the LID of my card:

       

      Mar 09 15:06:48 031794 [1D22700] 0x03 -> OpenSM 4.6.1.MLNX20160112.774e977

      Mar 09 15:06:48 031842 [1D22700] 0x80 -> OpenSM 4.6.1.MLNX20160112.774e977

      Mar 09 15:06:48 032470 [1D22700] 0x02 -> osm_vendor_init: 1000 pending umads specified

      Mar 09 15:06:48 032516 [1D22700] 0x02 -> osm_vendor_init: 1000 pending umads specified

      Mar 09 15:06:48 051285 [1D22700] 0x80 -> Entering DISCOVERING state

      Mar 09 15:06:48 051416 [1D22700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x7cfe90030032797a

      Mar 09 15:06:48 086916 [1D22700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0x7cfe90030032797a

      Mar 09 15:06:48 121806 [1D22700] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0x7cfe90030032797a

      Mar 09 15:06:48 121939 [1D22700] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0x7cfe90030032797a

      Mar 09 15:06:48 122094 [1D22700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x7cfe90030032797a

      Mar 09 15:06:48 123326 [FF0F1700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:06:48 123690 [EE6D0700] 0x80 -> SM port is down

      Mar 09 15:06:58 052236 [FE0EF700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:07:08 052293 [FC0EB700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:07:18 052465 [FB8EA700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:07:28 052535 [F88E4700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:07:38 052566 [FF8F2700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:07:48 052771 [FE8F0700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:07:58 052805 [FC8EC700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0

      Mar 09 15:08:08 125373 [1D22700] 0x80 -> Exiting SM

       

      I tried this several times and the result is always the same. I googled around but couldn't find any useful information. Could you please give me a hint about what else I should do to find the cause?

      Thank you so much!
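
      A log like the one above can be condensed before reading it. The following is a minimal Python sketch, not part of the original post: it counts repeated `ERR` entries and flags the "SM port is down" message. The sample text is copied from the log above; the regular expression and the `summarize` helper are illustrative assumptions about the line layout, not an OpenSM interface.

```python
import re
from collections import Counter

# A few lines copied from the opensm log quoted in this thread.
SAMPLE_LOG = """\
Mar 09 15:06:48 122094 [1D22700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x7cfe90030032797a
Mar 09 15:06:48 123326 [FF0F1700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0
Mar 09 15:06:48 123690 [EE6D0700] 0x80 -> SM port is down
Mar 09 15:06:58 052236 [FE0EF700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0
"""

# Assumed layout: "... -> <function>: ERR <4 hex digits>: <message>"
ERR_RE = re.compile(r"->\s*(\w+): ERR ([0-9A-Fa-f]{4}):")


def summarize(log_text):
    """Return (Counter of (function, error code), whether the SM port was reported down)."""
    errors = Counter()
    port_down = False
    for line in log_text.splitlines():
        match = ERR_RE.search(line)
        if match:
            errors[(match.group(1), match.group(2))] += 1
        if "SM port is down" in line:
            port_down = True
    return errors, port_down


if __name__ == "__main__":
    errors, port_down = summarize(SAMPLE_LOG)
    for (func, code), count in errors.items():
        print(f"{func} ERR {code}: {count} occurrence(s)")
    print("SM port reported down:", port_down)
```

      In this log the single repeated error sits next to an "SM port is down" line, which suggests looking at the port state before looking at the subnet manager itself.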

        • Re: ConnectX-4 CX456A does not work with opensm
          weijia

          I also tried it with an SB7700 IB switch. The configuration shows that the subnet manager is enabled:

          =================================================================

          SB7700-IB-100Gb [standalone: master] (config) # show ib sm subnet-prefix

          FE:80:00:00:00:00:00:00

          SB7700-IB-100Gb [standalone: master] (config) # show ib sm sweep-interval

          10 seconds

          SB7700-IB-100Gb [standalone: master] (config) # show ib sm sweep-on-trap

          enable

          SB7700-IB-100Gb [standalone: master] (config) # show ib sm

          enable

          =================================================================

          However, it didn't detect the connections on ports 1 and 3:

           

          =================================================================

          SB7700-IB-100Gb [standalone: master] (config) # show interface ib status

           

           

          Interface      Description                                Speed                   Current line rate   Logical port state   Physical port state

          ---------      -----------                                ---------               -----------------   ------------------   -------------------

          IB1/1                                                     -                       -                   Down                 Polling

          IB1/2                                                     -                       -                   Down                 Polling

          IB1/3                                                     -                       -                   Down                 Polling

          IB1/4                                                     -                       -                   Down                 Polling

          IB1/5                                                     -                       -                   Down                 Polling

          IB1/6                                                     -                       -                   Down                 Polling

          IB1/7                                                     -                       -                   Down                 Polling

          IB1/8                                                     -                       -                   Down                 Polling

          IB1/9                                                     -                       -                   Down                 Polling

          IB1/10                                                    -                       -                   Down                 Polling

          IB1/11                                                    -                       -                   Down                 Polling

          IB1/12                                                    -                       -                   Down                 Polling

          IB1/13                                                    -                       -                   Down                 Polling

          IB1/14                                                    -                       -                   Down                 Polling

          IB1/15                                                    -                       -                   Down                 Polling

          IB1/16                                                    -                       -                   Down                 Polling

          IB1/17                                                    -                       -                   Down                 Polling

          IB1/18                                                    -                       -                   Down                 Polling

          IB1/19                                                    -                       -                   Down                 Polling

          IB1/20                                                    -                       -                   Down                 Polling

          IB1/21                                                    -                       -                   Down                 Polling

          IB1/22                                                    -                       -                   Down                 Polling

          IB1/23                                                    -                       -                   Down                 Polling

          IB1/24                                                    -                       -                   Down                 Polling

          IB1/25                                                    -                       -                   Down                 Polling

          IB1/26                                                    -                       -                   Down                 Polling

          IB1/27                                                    -                       -                   Down                 Polling

          IB1/28                                                    -                       -                   Down                 Polling

          IB1/29                                                    -                       -                   Down                 Polling

          IB1/30                                                    -                       -                   Down                 Polling

          IB1/31                                                    -                       -                   Down                 Polling

          IB1/32                                                    -                       -                   Down                 Polling

          IB1/33                                                    -                       -                   Down                 Polling

          IB1/34                                                    -                       -                   Down                 Polling

          IB1/35                                                    -                       -                   Down                 Polling

          IB1/36                                                    -                       -                   Down                 Polling

          ===========================================================================

            • Re: ConnectX-4 CX456A does not work with opensm
              sophie

              Hi Weijia,

               

              Can you please provide from the switch the following outputs:

               

              >show interface ib 1/1 transceiver

              >show interface ib 1/2 transceiver

              >show images

               

              Can you also change the second port to IB, do a loopback test, and check whether the link comes online?

              If so, try to do a back to back test between the servers using port 2 this time as IB.

               

              Thank you,

              Sophie.

                • Re: ConnectX-4 CX456A does not work with opensm
                  weijia

                  Thank you Sophie, here is the result I get:

                  SB7700-IB-100Gb [standalone: master] # show interface ib 1/1 transceiver

                  IB1/1 state:

                          Unknown cable.

                          identifier              : (0x11)

                          cable/ module type     : -

                          infiniband speeds      : -

                          vendor                 : -

                          cable length           : -

                          part number            : -

                          revision               : -

                          serial number          : -

                   

                  SB7700-IB-100Gb [standalone: master] # show interface ib 1/2 transceiver

                  IB1/2 state:

                          Cable is not present.

                          identifier             : -

                          cable/ module type     : -

                          infiniband speeds      : -

                          vendor                 : -

                          cable length           : -

                          part number            : -

                          revision               : -

                          serial number          : -

                   

                  SB7700-IB-100Gb [standalone: master] # show interface ib 1/3 transceiver

                  IB1/3 state:

                          Unknown cable.

                          identifier              : (0x11)

                          cable/ module type     : -

                          infiniband speeds      : -

                          vendor                 : -

                          cable length           : -

                          part number            : -

                          revision               : -

                          serial number          : -

                  ============================================================

                  My cables are connected to SB7700 ports 1 and 3. Port 2 is empty.

                   

                  I also tried a back-to-back loop connection with both ports configured in IB mode. The link doesn't come up either:

                   

                  ...

                  CA 'mlx5_0'

                          CA type: MT4115

                          Number of ports: 1

                          Firmware version: 12.14.2036

                          Hardware version: 0

                          Node GUID: 0x7cfe90030032797a

                          System image GUID: 0x7cfe90030032797a

                          Port 1:

                                  State: Down

                                  Physical state: Disabled

                                  Rate: 10

                                  Base lid: 65535

                                  LMC: 0

                                  SM lid: 0

                                  Capability mask: 0x2651e84a

                                  Port GUID: 0x7cfe90030032797a

                                  Link layer: InfiniBand

                  CA 'mlx5_1'

                          CA type: MT4115

                          Number of ports: 1

                          Firmware version: 12.14.2036

                          Hardware version: 0

                          Node GUID: 0x7cfe90030032797b

                          System image GUID: 0x7cfe90030032797a

                          Port 1:

                                  State: Down

                                  Physical state: Disabled

                                  Rate: 10

                                  Base lid: 65535

                                  LMC: 0

                                  SM lid: 0

                                  Capability mask: 0x2651e848

                                  Port GUID: 0x7cfe90030032797b

                                  Link layer: InfiniBand

                  ...

                   

                  It seems that the SB7700 switch complains about the cable model; I'm using MCP1600. Should I use a different cable for IB?

                  • Re: ConnectX-4 CX456A does not work with opensm
                    weijia

                    Sophie, what we have are MCP1600-C002 cables:

                    MCP1600-C002 Mellanox® Passive Copper cable, ETH 100GbE, 100Gb/s, QSFP, LSZH, 2m

                    Actually we need MCP1600-E002 cables to support both IB and ETH. Is that correct?

                    MCP1600-E002 Mellanox® Passive Copper cable, VPI, up to 100Gb/s, QSFP, LSZH, 2m

                  • Re: ConnectX-4 CX456A does not work with opensm
                    eddie.notz

                    my 2c,

                     

                    The issue is not with the subnet manager. The issue is that the physical link between the two servers (in the b2b setup) or between the servers and the switch (in the switch setup) is not coming up. The subnet manager is responsible for the logical side of things, but the physical links have to be up first.
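
                    The physical-before-logical ordering can be checked mechanically. Below is a rough Python sketch, assuming text in the format pasted earlier in this thread (CA / Port / "State" / "Physical state" lines); `parse_ports` and `diagnose` are hypothetical helper names, and the sample is abbreviated from that output. It only separates "physical link not up" from "link up but no subnet manager yet"; it does not say why a link is down.

```python
import re

# Abbreviated from the per-port output quoted earlier in this thread.
SAMPLE = """\
CA 'mlx5_0'
        CA type: MT4115
        Port 1:
                State: Down
                Physical state: Disabled
                Port GUID: 0x7cfe90030032797a
                Link layer: InfiniBand
CA 'mlx5_1'
        CA type: MT4115
        Port 1:
                State: Down
                Physical state: Disabled
                Port GUID: 0x7cfe90030032797b
                Link layer: InfiniBand
"""


def parse_ports(text):
    """Collect one dict per port with its CA name, port number and key/value fields."""
    ports = []
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"CA '(.+)'$", line)
        if match:
            ca = match.group(1)
            current = None  # CA-level fields that follow are not port fields
            continue
        match = re.match(r"Port (\d+):$", line)
        if match:
            current = {"ca": ca, "port": int(match.group(1))}
            ports.append(current)
            continue
        if current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return ports


def diagnose(port):
    """Check the physical layer first, then the logical (subnet manager) layer."""
    if port.get("Physical state") != "LinkUp":
        return "physical link not up: check cable, port type and peer"
    if port.get("State") != "Active":
        return "physical link up, waiting for a subnet manager"
    return "ok"


if __name__ == "__main__":
    for port in parse_ports(SAMPLE):
        print(port["ca"], "port", port["port"], "->", diagnose(port))
```

                    For both ports in the pasted output the check stops at the first branch, which is consistent with the point that opensm has nothing to work with until the links come up.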