2 Replies Latest reply on Jul 28, 2017 1:16 AM by c.monty

    HowTo configure ConnectX-3 MT27500 direct connection w/o switch

    c.monty

      Hello!

       

      I have 2 servers, each equipped with identical ConnectX-3 MT27500 cards:

      inxi -Nx

      Resuming in non X mode: glxinfo not found. For package install advice run: inxi --recommends

      Network:   Card-1: Mellanox MT27500 Family [ConnectX-3] driver: mlx4_core v: 3.4-2.0.0 bus-ID: 0b:00.0

                 Card-2: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe

                 driver: tg3 v: 3.137 bus-ID: 1b:00.0

                 Card-3: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe

                 driver: tg3 v: 3.137 bus-ID: 1b:00.1

                 Card-4: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe

                 driver: tg3 v: 3.137 bus-ID: 1b:00.2

                 Card-5: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe

                 driver: tg3 v: 3.137 bus-ID: 1b:00.3

                 Card-6: Mellanox MT27500 Family [ConnectX-3] driver: mlx4_core v: 3.4-2.0.0 bus-ID: 86:00.0

       

      The cards are dual-port; one card is configured in ib mode and the other in eth mode:

      connectx_port_config -s

      --------------------------------

      Port configuration for PCI device: 0000:0b:00.0 is:

      ib

      ib

      --------------------------------

      --------------------------------

      Port configuration for PCI device: 0000:86:00.0 is:

      eth

      eth

      --------------------------------

       

      The servers are directly connected w/o IB switch.

      iblinkinfo does not show the 40Gbps connection:

      CA: ld4464 HCA-1:

            0x248a070300dc2871      0    1[  ] ==( 4X          10.0 Gbps Initialize/  LinkUp)==>       0    1[  ] "ld4465 HCA-1" ( Could be 14.0625 Gbps)

      CA: ld4465 HCA-1:

            0x248a070300dc25c1      0    1[  ] ==( 4X          10.0 Gbps Initialize/  LinkUp)==>       0    1[  ] "ld4464 HCA-1" ( Could be 14.0625 Gbps)

       

      Is it feasible to have a direct IB connection with 2 cards?

      If yes, what is the required configuration?

      How can I utilize 2nd IB port to enhance performance?

       

      THX

        • Re: HowTo configure ConnectX-3 MT27500 direct connection w/o switch
          cmm

          Hi!

          Q. Is it feasible to have a direct IB connection with 2 cards?
          A. Yes.

           

          Q. If yes, what is the required configuration?

          A. One of the servers needs to run OpenSM. Both servers must have the InfiniBand adapter drivers installed, and the connected adapter ports must use the IB protocol. The cable must be an InfiniBand cable that supports the required adapter port speed. The link will then come up, and each port connected to the cable will get a LID from the Subnet Manager (OpenSM). If you additionally configure IP interfaces on the adapter ports (in the same IP subnet), you will be able to ping between the servers.
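One way to check whether the Subnet Manager has done its part is the logical port state in iblinkinfo. Initialize generally means the physical link is up but no Subnet Manager has activated the port yet, while Active means the port is usable. Note also that iblinkinfo reports the per-lane speed, so 4X at 10.0 Gbps is a 40 Gbps link. The following is a minimal sketch that assumes the line layout shown in the first post; `link_summary` is a hypothetical helper name, not part of any InfiniBand tool.

```shell
#!/bin/sh
# Read iblinkinfo output on stdin and print "<state> <aggregate Gbps>"
# for every link line. Assumes lines shaped like the ones in the post:
#   ... ==( 4X   10.0 Gbps Initialize/  LinkUp)==> ...
link_summary() {
    awk '
    /==\(/ {
        s = $0
        sub(/^.*==\([ \t]*/, "", s)      # drop everything up to "==("
        sub(/\)==>.*$/, "", s)           # drop everything from ")==>"
        split(s, f, /[ \t]+/)            # f[1]=4X f[2]=10.0 f[3]=Gbps f[4]=Initialize/
        width = f[1]; sub(/X$/, "", width)
        state = f[4]; sub(/\/$/, "", state)
        printf "%s %g\n", state, width * f[2]
    }'
}

# Sample line copied from the post.
sample='0x248a070300dc2871      0    1[  ] ==( 4X          10.0 Gbps Initialize/  LinkUp)==>       0    1[  ] "ld4465 HCA-1" ( Could be 14.0625 Gbps)'
printf '%s\n' "$sample" | link_summary    # prints: Initialize 40
```

On a live system the input would come from `iblinkinfo | link_summary`. Links that are down may be printed in a different layout, which this sketch does not handle.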

           

          Q. How can I utilize 2nd IB port to enhance performance?
          A. You would need an InfiniBand switch (between the servers) to utilize both ports on each adapter.
          The OpenSM process will work on port 1 of the server it's running on. It will detect (via port 1) all of the other server ports connected to the switch and will assign LID addresses to each port. Then, link will come UP, assuming you use proper InfiniBand cables. If IP addresses are added to each IB interface, you could PING between all IB server ports connected to this switch.

           

          NOTE: Other than a management IP address/mask, InfiniBand switches require no configuration. Any additional configuration, such as changing the hostname or port speeds, is optional.

            • Re: HowTo configure ConnectX-3 MT27500 direct connection w/o switch
              c.monty

              Hi Colin,

              thanks for your reply.

              However, I didn't manage to set up an appropriate configuration. Therefore I would like to start from scratch.

              As said, there are 2 identical servers with identical NICs.

              I have installed the InfiniBand subnet manager (OpenSM) on both servers. The service is running in Master state on ld4465 and in Standby state on ld4464; I think this is the expected behavior.

              ld4465:~ # systemctl status opensmd.service

              ● opensmd.service - LSB: Manage OpenSM

                 Loaded: loaded (/etc/init.d/opensmd; bad; vendor preset: disabled)

                 Active: active (running) since Thu 2017-07-27 17:57:46 CEST; 15h ago

                   Docs: man:systemd-sysv-generator(8)

                Process: 8291 ExecStart=/etc/init.d/opensmd start (code=exited, status=0/SUCCESS)

                  Tasks: 122 (limit: 512)

                 CGroup: /system.slice/opensmd.service

                         └─8370 /usr/sbin/opensm --daemon --pidfile /var/run/opensm.pid

               

              Jul 27 17:57:46 ld4465 opensmd[8291]: /etc/sysconfig/opensm: line 184: port_profile_switch_nodes:...ound

              Jul 27 17:57:46 ld4465 opensmd[8291]: /etc/sysconfig/opensm: line 187: syntax error near unexpect...ull'

              Jul 27 17:57:46 ld4465 opensmd[8291]: /etc/sysconfig/opensm: line 187: `port_prof_ignore_file (null)'

              Jul 27 17:57:46 ld4465 OpenSM[8359]:  Loading Cached Option:guid = 0x248a070300dc25c1

              Jul 27 17:57:46 ld4465 opensmd[8291]: Starting opensm: done..done

              Jul 27 17:57:46 ld4465 OpenSM[8370]: /var/log/opensm.log log file opened

              Jul 27 17:57:46 ld4465 systemd[1]: Started LSB: Manage OpenSM.

              Jul 27 17:57:46 ld4465 OpenSM[8370]: OpenSM 4.8.1.MLNX20170118.1a8ad26

              Jul 27 17:57:46 ld4465 OpenSM[8370]: Entering DISCOVERING state

              Jul 27 17:57:46 ld4465 OpenSM[8370]: Entering MASTER state

               

              ld4464:~ # systemctl status -l opensm.service

              ● opensmd.service - LSB: Manage OpenSM

                 Loaded: loaded (/etc/init.d/opensmd; bad; vendor preset: disabled)

                 Active: active (running) since Fri 2017-07-28 10:10:06 CEST; 3s ago

                   Docs: man:systemd-sysv-generator(8)

                Process: 41211 ExecStop=/etc/init.d/opensmd stop (code=exited, status=0/SUCCESS)

                Process: 41260 ExecStart=/etc/init.d/opensmd start (code=exited, status=0/SUCCESS)

                  Tasks: 122 (limit: 512)

                 CGroup: /system.slice/opensmd.service

                         └─41311 /usr/sbin/opensm --daemon --pidfile /var/run/opensm.pid

               

              Jul 28 10:10:06 ld4464 opensmd[41260]: /etc/sysconfig/opensm: line 184: port_profile_switch_nodes: command not found

              Jul 28 10:10:06 ld4464 opensmd[41260]: /etc/sysconfig/opensm: line 187: syntax error near unexpected token `null'

              Jul 28 10:10:06 ld4464 opensmd[41260]: /etc/sysconfig/opensm: line 187: `port_prof_ignore_file (null)'

              Jul 28 10:10:06 ld4464 OpenSM[41309]:  Loading Cached Option:guid = 0x248a070300dc2871

              Jul 28 10:10:06 ld4464 opensmd[41260]: Starting opensm: done..done

              Jul 28 10:10:06 ld4464 OpenSM[41311]: /var/log/opensm.log log file opened

              Jul 28 10:10:06 ld4464 OpenSM[41311]: OpenSM 4.8.1.MLNX20170118.1a8ad26

              Jul 28 10:10:06 ld4464 systemd[1]: Started LSB: Manage OpenSM.

              Jul 28 10:10:06 ld4464 OpenSM[41311]: Entering DISCOVERING state

              Jul 28 10:10:06 ld4464 OpenSM[41311]: Entering STANDBY state

               

               

              The relevant lines in /etc/sysconfig/opensm that are reported as errors are:

              # ROUTING OPTIONS

              # If TRUE count switches as link subscriptions

              port_profile_switch_nodes FALSE

              # Name of file with port guids to be ignored by port profiling

              port_prof_ignore_file (null)
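The "command not found" and "syntax error near unexpected token" messages in the log above are shell error messages. That suggests the init script reads /etc/sysconfig/opensm as a shell file, while the quoted lines use the `key value` form of an OpenSM options file. The following is a minimal sketch for spotting such lines, under the assumption that the file is meant to hold only comments and NAME=value assignments (worth confirming against the init script). `non_shell_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Print, with line numbers, every line of file "$1" that is neither blank,
# a comment, nor a NAME=value shell assignment.
non_shell_lines() {
    grep -nvE '^[[:space:]]*(#.*)?$|^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' "$1"
}

# Sample input: the five lines quoted in the post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# ROUTING OPTIONS
# If TRUE count switches as link subscriptions
port_profile_switch_nodes FALSE
# Name of file with port guids to be ignored by port profiling
port_prof_ignore_file (null)
EOF
non_shell_lines "$sample"
# prints:
# 3:port_profile_switch_nodes FALSE
# 5:port_prof_ignore_file (null)
rm -f "$sample"
```

Run against the real file, `non_shell_lines /etc/sysconfig/opensm` would be expected to report the lines at 184 and 187 named in the log.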

               

               

              Next, here is the current firmware and port information. For simplicity I show ld4465 only (everything is identical on ld4464).

              ld4465:~ # connectx_port_config -s

              --------------------------------

              Port configuration for PCI device: 0000:0b:00.0 is:

              ib

              eth

              --------------------------------

              --------------------------------

              Port configuration for PCI device: 0000:86:00.0 is:

              ib

              eth

              --------------------------------

              ld4465:~ # mstflint -d 0b:00.0 q

              Image type:          FS2

              FW Version:          2.40.5000

              FW Release Date:     27.10.2016

              Product Version:     02.40.50.00

              Rom Info:            type=UEFI version=14.11.31

                                   type=PXE version=3.4.746 devid=4099

              Device ID:           4099

              Description:         Node             Port1            Port2            Sys image

              GUIDs:               248a070300dc25c0 248a070300dc25c1 248a070300dc25c2 248a070300dc25c3

              MACs:                                     248a07dc25c1     248a07dc25c2

              VSD:

              PSID:                IBM1090111019

              ld4465:~ # mstflint -d 86:00.0 q

              Image type:          FS2

              FW Version:          2.40.5000

              FW Release Date:     27.10.2016

              Product Version:     02.40.50.00

              Rom Info:            type=UEFI version=14.11.31

                                   type=PXE version=3.4.746 devid=4099

              Device ID:           4099

              Description:         Node             Port1            Port2            Sys image

              GUIDs:               248a070300dc2810 248a070300dc2811 248a070300dc2812 248a070300dc2813

              MACs:                                     248a07dc2811     248a07dc2812

              VSD:

              PSID:                IBM1090111019

               

              In this article I found information that a second instance of the subnet manager should be running for cards with two ports.

              ld4465:~ # opensm -o

              -------------------------------------------------

              OpenSM 4.8.1.MLNX20170118.1a8ad26

              Reading Cached Option File: /etc/opensm/opensm.conf

              Loading Cached Option:guid = 0x248a070300dc25c1

              Command Line Arguments:

              Run Once

              Log File: /var/log/opensm.log

              -------------------------------------------------

              OpenSM 4.8.1.MLNX20170118.1a8ad26

               

              Entering DISCOVERING state

               

              Error from osm_opensm_bind (0x2A)

              Perhaps another instance of OpenSM is already running

              Exiting SM
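The bind error is consistent with this one-shot instance loading the same cached guid (0x248a070300dc25c1) that the running opensmd service already uses. A second instance would have to bind to a different port, which opensm selects with its `--guid` option. Each directly cabled port pair forms its own subnet and needs its own subnet manager. The following is a minimal sketch that extracts the Port1 GUID from `mstflint ... q` output and only prints the resulting command without running it. It assumes the GUIDs line layout shown above; `port1_guid` and `opensm_cmd_for` are hypothetical helper names.

```shell
#!/bin/sh
# Read "mstflint -d <dev> q" output on stdin and print the Port1 GUID
# (third field of the "GUIDs:" line) with a 0x prefix.
port1_guid() {
    awk '$1 == "GUIDs:" { print "0x" $3 }'
}

# Build, but do not execute, an opensm command line bound to the given GUID.
opensm_cmd_for() {
    printf 'opensm --daemon --guid %s\n' "$1"
}

# Sample input: the GUIDs line of card 86:00.0 from the post.
sample='GUIDs:               248a070300dc2810 248a070300dc2811 248a070300dc2812 248a070300dc2813'
guid=$(printf '%s\n' "$sample" | port1_guid)
opensm_cmd_for "$guid"    # prints: opensm --daemon --guid 0x248a070300dc2811
```

Whether a second instance also needs its own log file or pid file depends on the installation and is not covered here.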

               

               

              So, I would first like to understand the correct configuration for OpenSM, and then continue with the IB configuration.

               

              THX