HowTo Configure IB Routers

Version 29

    The reader should be familiar with IB router architecture and functionality.

     

    Terminology

    • SM: Subnet Manager
    • openSM: InfiniBand-compliant subnet manager and administration
    • openMPI: Open Message Passing Interface
    • SRQ: Shared Receive Queue
    • Per Peer QP: Per Peer Queue Pair (QP)
    • DLID: Destination LID

     

    Overview

    For an overview of IB router architecture and functionality, see IB Router Architecture and Functionality.

     

    Setup Example

    A basic setup example includes two nodes connected to a Switch-IB.

    The setup plan includes two subnets (InfiniBand-default and InfiniBand-1) configured on the Switch-IB.

     

    Figure 1 - Simple Switch Setup

     

    Prerequisites

    • Install MLNX-OFED driver 3.3, or later, which supports router functionality
    • Install MLNX-OS 3.6.1002, or later, which supports IB routers on the Switch-IB

    Configuration

     

    Switch Configuration

    Before you start, make sure you have an SB7780 system and verify that your system has IB router capabilities.

    Note: The default configuration of SB7780 is:

    switch (config) # show system profile

    Profile:             ib

    Number of SWIDs:     2

    Adaptive Routing:    no

    IB Routing:          yes

     

    Note: With the IB multi-SWID profile shown above, the show interfaces ib status command lists the subnet to which each interface is mapped. For example, all mapped ports could be up (assuming an SM is running) as follows:

    switch (config) # show interfaces ib status

     

    Interface      Description    IB Subnet            Speed           Current line rate   Logical port state   Physical port state  

    ---------      -----------    ------------------   ---------       -----------------   ------------------   -------------------  

    IB1/1                         infiniband-default   edr             100.0 Gbps          Active               LinkUp               

    IB1/2                         infiniband-1         edr             100.0 Gbps          Active               LinkUp               

    ...

    IB1/28                        -                    -               -                   -                    -

    IB1/29                        -                    -               -                   -                    -

    IB1/30                        -                    -               -                   -                    -

    ...

     

    1. If needed, set the system profile to support the IB router with two subnets.

    Note: Use the num-of-swids parameter, which specifies the number of subnets. For IB router functionality you must have at least two subnets.

    This command deletes all the configuration and reloads the switch.

    switch (config)# system profile ib ib-router num-of-swids 2

     

    Warning! The switch configuration is going to be deleted and the system will be reloaded.

    Type 'yes' to confirm profile change: yes

    Note: The maximum number of subnets supported is 6. Each subnet should have at least one running SM (with a different subnet prefix).

     

    After a few minutes, log in to the switch again and check the system profile:

    switch (config) # show system profile

    Profile:             ib

    Number of SWIDs:     2

    Adaptive Routing:    no

    IB Routing:          yes
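
    The profile check above can also be scripted. The following is a minimal Python sketch, assuming the output of show system profile has been captured as text (for example over SSH). The function names parse_system_profile and supports_ib_routing are illustrative and are not part of MLNX-OS.

```python
def parse_system_profile(output):
    """Parse 'Key: value' lines from captured `show system profile` output."""
    profile = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            profile[key.strip()] = value.strip()
    return profile


def supports_ib_routing(profile):
    """True when IB routing is on and at least two SWIDs (subnets) exist."""
    return (
        profile.get("IB Routing", "").lower() == "yes"
        and int(profile.get("Number of SWIDs", "0")) >= 2
    )


# Sample text copied from the output shown above.
sample = """
Profile:             ib
Number of SWIDs:     2
Adaptive Routing:    no
IB Routing:          yes
"""

print(supports_ib_routing(parse_system_profile(sample)))
```

    A result of False would suggest that the system profile still needs to be changed as described in step 1.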

     

    Note: The show interfaces ib status command displays data only for the first two interfaces, as these are the only interfaces mapped to subnets by default.

    Interface      Description    IB Subnet            Speed           Current line rate   Logical port state   Physical port state 

    ---------      -----------    ------------------   ---------       -----------------   ------------------   ------------------- 

    IB1/1                         infiniband-default   edr             100.0 Gbps          Active               LinkUp              

    IB1/2                         infiniband-1         edr             100.0 Gbps          Active               LinkUp              

    ...       

     

    2. Map additional interfaces to the available subnets, and unmap the interfaces that are mapped by default if they are not required.

    Note: The subnet names are as follows:

    - The first subnet is the infiniband-default subnet.

    - The second subnet is the infiniband-1 subnet.

    You must map the ports to different subnets in order to operate the IB router function.

    switch (config) # interface ib 1/28 switchport access subnet infiniband-default

    switch (config) # interface ib 1/30 switchport access subnet infiniband-1 force

    switch (config) # no interface ib 1/1 switchport access subnet force

    switch (config) # no interface ib 1/2 switchport access subnet force

    Display the IB status again with the show interfaces ib status command.

    switch (config) # show interfaces ib status

     

    Interface      Description    IB Subnet            Speed           Current line rate   Logical port state   Physical port state  

    ---------      -----------    ------------------   ---------       -----------------   ------------------   -------------------  

    IB1/1                         -                    -               -                   -                    -

    IB1/2                         -                    -               -                   -                    -

    ...

    IB1/28                        infiniband-default   edr             100.0 Gbps          Active               LinkUp               

    IB1/29                        -                    -               -                   -                    -

    IB1/30                        infiniband-1         edr             100.0 Gbps          Initialize           LinkUp 

    ...
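
    To confirm that ports are mapped to at least two different subnets, the status table can be grouped by subnet. This is a rough Python sketch that works on captured output; it assumes the Description column is empty (as in the output above), so the subnet name is the second whitespace-separated field. The function name ports_by_subnet is hypothetical.

```python
from collections import defaultdict


def ports_by_subnet(output):
    """Group interface names by IB subnet; unmapped ports ('-') are skipped.

    Assumes the Description column is empty, so the subnet name is the
    second whitespace-separated field of each interface row.
    """
    subnets = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("IB") and fields[1] != "-":
            subnets[fields[1]].append(fields[0])
    return dict(subnets)


# Rows copied from the output shown above.
sample = """
Interface      Description    IB Subnet            Speed           Current line rate   Logical port state   Physical port state
---------      -----------    ------------------   ---------       -----------------   ------------------   -------------------
IB1/1                         -                    -               -                   -                    -
IB1/28                        infiniband-default   edr             100.0 Gbps          Active               LinkUp
IB1/30                        infiniband-1         edr             100.0 Gbps          Initialize           LinkUp
"""

mapping = ports_by_subnet(sample)
print(mapping)
print("router-capable mapping:", len(mapping) >= 2)
```

    If descriptions are configured on the interfaces, the field positions shift and the parsing would need to be adapted.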

     

    3. Enable the IB router on the switch.

    switch (config) # ib router
    switch (config) # no ib router shutdown

     

    Then verify that the configuration settings were made correctly:

    switch (config) # show ib router

    Routing state: enabled

     

    4. Enable the subnets (router ports will be UP).

    switch (config) # no interface ib-subnet infiniband-1 shutdown

    switch (config) # no interface ib-subnet infiniband-default shutdown

     

    5. Verify that the subnet prefixes are configured correctly (in this case, FE:C0:00:00:00:00:00:01 and FE:C0:00:00:00:00:00:02).

    In addition, check the router port LID. This LID is per subnet and should be used as the DLID (destination LID) when sending traffic from one subnet to the other.

    switch (config) #   show ib router

    Routing state: enabled

     

    IB subnet               Routing enabled

      infiniband-default      enabled  

      infiniband-1            enabled

     

    switch (config) # show interfaces ib-subnet

    infiniband-default state:

      GUID                  : E4:1D:2D:03:00:02:7D:48

      Alias GID             : N/A

      LID                   : 7

      Subnet prefix         : FE:C0:00:00:00:00:00:01

      Physical state        : LinkUp

      Logical state         : Active

      L3 interface state    : Up

     

    infiniband-1 state:

      GUID                  : E4:1D:2D:03:00:02:7D:49

      Alias GID             : N/A

      LID                   : 2

      Subnet prefix         : FE:C0:00:00:00:00:00:02

      Physical state        : LinkUp

      Logical state         : Active

      L3 interface state    : Up

     

    switch (config) # show interfaces ib-subnet brief

     

    IB subnet              LID      Subnet prefix             L3 I/f state       

    -----------------------------------------------------------------------

    infiniband-default     7        FE:C0:00:00:00:00:00:01   Up                 

    infiniband-1           2        FE:C0:00:00:00:00:00:02   Up                 

     

    sb7700-9 [standalone: master] (config) #
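
    Because the router port LID of each subnet is needed later as the DLID, it can be convenient to extract it from the brief table. The following Python sketch parses captured output of show interfaces ib-subnet brief; the function name router_lids and the four-column row layout are assumptions based on the output above.

```python
import re

# One data row: subnet name, LID, 8-byte subnet prefix, L3 interface state.
ROW = re.compile(
    r"^\s*(?P<subnet>\S+)\s+(?P<lid>\d+)\s+"
    r"(?P<prefix>(?:[0-9A-Fa-f]{2}:){7}[0-9A-Fa-f]{2})\s+(?P<state>\S+)\s*$"
)


def router_lids(output):
    """Map subnet name -> (router port LID, subnet prefix, L3 state)."""
    table = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            table[match["subnet"]] = (
                int(match["lid"]),
                match["prefix"],
                match["state"],
            )
    return table


# Rows copied from the output shown above.
sample = """
IB subnet              LID      Subnet prefix             L3 I/f state
-----------------------------------------------------------------------
infiniband-default     7        FE:C0:00:00:00:00:00:01   Up
infiniband-1           2        FE:C0:00:00:00:00:00:02   Up
"""

for subnet, (lid, prefix, state) in router_lids(sample).items():
    print(subnet, "-> DLID", lid, "prefix", prefix, "state", state)
```

    Header and separator lines do not match the row pattern and are ignored.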

     

    You can also run ibnetdiscover on each host to view the LID of the router port.

    For example, when running on Server-2, the router port LID is 2.

    # ibnetdiscover

    #

    # Topology file: generated on Tue Apr 26 15:26:41 2016

    #

    # Initiated from node 7cfe9003005d7e52 port 7cfe9003005d7e52

     

    vendid=0x2c9

    devid=0xcb20

    sysimgguid=0xe41d2d0300027d40

    switchguid=0xe41d2d0300027d41(e41d2d0300027d41)

    Switch 37 "S-e41d2d0300027d41" # "MF0;sb7700-9:MSB7700/U1" enhanced port 0 lid 1 lmc 0

    [30] "H-7cfe9003005d7e52"[1](7cfe9003005d7e52) # "jupiter002 HCA-2" lid 139 4xEDR

    [37] "R-e41d2d0300027d48"[2](e41d2d0300027d49) # "MF0;sb7700-9:MSB7700/RT" lid 2 4xFDR

     

    vendid=0x2c9

    devid=0x1013

    sysimgguid=0x7cfe9003005d7e52

    caguid=0x7cfe9003005d7e52

    Ca 1 "H-7cfe9003005d7e52" # "jupiter002 HCA-2"

    [1](7cfe9003005d7e52) "S-e41d2d0300027d41"[30] # lid 139 lmc 0 "MF0;sb7700-9:MSB7700/U1" lid 1 4xEDR

     

    vendid=0x2c9

    devid=0xc839

    sysimgguid=0xe41d2d0300027d40

    rtguid=0xe41d2d0300027d48

    Rt 2 "R-e41d2d0300027d48" # "MF0;sb7700-9:MSB7700/RT"

    [2](e41d2d0300027d49) "S-e41d2d0300027d41"[37] # lid 2 lmc 0 "MF0;sb7700-9:MSB7700/U1" lid 1 4xFDR
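
    The router port LID can also be picked out of the ibnetdiscover output programmatically. The Python sketch below looks for switch port lines whose peer name starts with "R-", as in the topology above. The regular expression is derived only from this sample and may need adjusting for other ibnetdiscover versions; router_port_lids is a hypothetical helper name.

```python
import re

# A switch port line whose peer is a router, for example:
# [37] "R-e41d2d0300027d48"[2](e41d2d0300027d49) # "..." lid 2 4xFDR
ROUTER_PORT = re.compile(
    r'^\s*\[(?P<switch_port>\d+)\]\s+"R-(?P<router_guid>[0-9a-f]+)"'
    r'\[(?P<router_port>\d+)\]\((?P<port_guid>[0-9a-f]+)\)\s+#.*?\blid (?P<lid>\d+)\b'
)


def router_port_lids(topology):
    """Return the LIDs of router ports that appear on switch port lines."""
    lids = []
    for line in topology.splitlines():
        match = ROUTER_PORT.match(line)
        if match:
            lids.append(int(match["lid"]))
    return lids


# Lines copied from the Server-2 topology shown above.
sample = '''
Switch 37 "S-e41d2d0300027d41" # "MF0;sb7700-9:MSB7700/U1" enhanced port 0 lid 1 lmc 0
[30] "H-7cfe9003005d7e52"[1](7cfe9003005d7e52) # "jupiter002 HCA-2" lid 139 4xEDR
[37] "R-e41d2d0300027d48"[2](e41d2d0300027d49) # "MF0;sb7700-9:MSB7700/RT" lid 2 4xFDR
'''

print(router_port_lids(sample))
```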

     

    SM Parameters and Configuration

    In each IB subnet, the host responsible for the SM must run OpenSM with a subnet prefix that differs from the prefixes of the other subnets. In this example there is only one host per subnet, so each host runs OpenSM with its own subnet prefix.

     

    opensm.conf Parameters

     

    The parameters used for configuring routers are listed below. For simplicity, they should be identical on all subnets except for subnet_prefix, which must be set to a unique value in the contiguous range of site-local GID prefixes (0xfec0000000000000 to 0xfec000000000001f).

     

    1. rtr_aguid_enable

    Default: 0

    This parameter controls the alias GUID (AGUID) assignment for IB routing and PathRecord handling.

    Possible values:

    0: Router Mode Alias GUIDs is disabled.

    1: SM will configure Router Mode Alias GUIDs. This is the mode that should be used when IB routing is desired.

    2: SM will clear Router Mode Alias GUIDs.

     

    2. subnet_prefix

    Default: 0xfe80000000000000

    The subnet prefix is the identifier of each subnet.

     

    See also LRH and GRH InfiniBand Headers.

     

    Possible Values:

    0xfe80000000000000, 0xfec0000000000000-0xfec000000000001f

     

    To enable routing, use a contiguous range of up to 32 subnets: 0xfec0000000000000 to 0xfec000000000001f.
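
    As an illustration of the range above, the following Python sketch derives a subnet_prefix value from a subnet number (0 to 31) and checks whether a given prefix falls inside the routable range. The helper names subnet_prefix and is_routable_prefix are hypothetical and are not part of OpenSM.

```python
SITE_LOCAL_BASE = 0xFEC0000000000000  # first prefix of the range listed above
MAX_SUBNETS = 32                      # 0x00 through 0x1f


def subnet_prefix(subnet_number):
    """Return the subnet_prefix string for subnet number 0..31."""
    if not 0 <= subnet_number < MAX_SUBNETS:
        raise ValueError("subnet number must be in the range 0-31")
    return "0x%016x" % (SITE_LOCAL_BASE + subnet_number)


def is_routable_prefix(prefix):
    """True if the prefix lies in the contiguous range used for routing."""
    value = int(prefix, 16)
    return SITE_LOCAL_BASE <= value < SITE_LOCAL_BASE + MAX_SUBNETS


print(subnet_prefix(1))                          # 0xfec0000000000001
print(is_routable_prefix("0xfe80000000000000"))  # False (the default prefix)
```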

     

    3. rtr_pr_flow_label

    Default: 0

    Inter-subnet PathRecord FlowLabel. This value is used for all path records that cross from this subnet to the others.

     

    4. rtr_pr_tclass

    Default: 0

    Inter-subnet PathRecord traffic class. This value is used for all path records that cross from this subnet to the others.

     

    5. rtr_pr_sl

    Default: 0

    Inter-subnet PathRecord SL. This value is used for all path records that cross from this subnet to the others.

     

    6. rtr_pr_mtu

    Default: 4

    Inter-subnet PathRecord MTU. The default, 4, limits the MTU to <=2K. This value is used for all path records that cross from this subnet to the others.

     

    7. rtr_pr_rate

    Default: 16

    Inter-subnet PathRecord rate. The default, 16, limits the rate to <=EDR (100 Gb/s). This value is used for all path records that cross from this subnet to the others.

    Possible Values:

    Derived from the specification.

    #define IB_PATH_RECORD_RATE_2_5_GBS             2

    #define IB_PATH_RECORD_RATE_10_GBS              3

    #define IB_PATH_RECORD_RATE_30_GBS              4

    #define IB_PATH_RECORD_RATE_5_GBS               5

    #define IB_PATH_RECORD_RATE_20_GBS              6

    #define IB_PATH_RECORD_RATE_40_GBS              7

    #define IB_PATH_RECORD_RATE_60_GBS              8

    #define IB_PATH_RECORD_RATE_80_GBS              9

    #define IB_PATH_RECORD_RATE_120_GBS             10

    #define IB_PATH_RECORD_RATE_14_GBS              11

    #define IB_PATH_RECORD_RATE_56_GBS              12

    #define IB_PATH_RECORD_RATE_112_GBS             13

    #define IB_PATH_RECORD_RATE_168_GBS             14

    #define IB_PATH_RECORD_RATE_25_GBS              15

    #define IB_PATH_RECORD_RATE_100_GBS             16  (default)

    #define IB_PATH_RECORD_RATE_200_GBS             17

    #define IB_PATH_RECORD_RATE_300_GBS             18
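
    Note that the rate codes are not ordered by speed (for example, code 5 is 5 Gb/s while code 4 is 30 Gb/s). The Python sketch below restates the table above as a dictionary and selects the code of the fastest listed rate that does not exceed a given link speed. The helper name rate_code_for is hypothetical.

```python
# Rate code -> Gb/s, transcribed from the #define list above.
PATH_RECORD_RATE_GBPS = {
    2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120,
    11: 14, 12: 56, 13: 112, 14: 168, 15: 25, 16: 100, 17: 200, 18: 300,
}


def rate_code_for(link_gbps):
    """Return the code of the fastest listed rate not exceeding link_gbps."""
    candidates = [
        (gbps, code)
        for code, gbps in PATH_RECORD_RATE_GBPS.items()
        if gbps <= link_gbps
    ]
    if not candidates:
        raise ValueError("no listed rate is that slow")
    return max(candidates)[1]


print(rate_code_for(100))  # 16
```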

     

    SM Configuration

     

    1. Generate the opensm.conf file (on each host).

    # opensm -c opensm.conf

     

    2. Open the opensm.conf file and set the subnet_prefix parameter to a different value on each host.

    For example:

    Server-1

    # Subnet prefix used on this subnet

    subnet_prefix 0xfec0000000000001

    Server-2

    # Subnet prefix used on this subnet

    subnet_prefix 0xfec0000000000002

     

    3. Enable the router port alias GUID by setting rtr_aguid_enable 1 in the opensm.conf file on both servers.

    # Enable router alias guid configuration

    # Values are

    #    0: Router Mode Alias GUIDs is disabled.

    #    1: Configure Router Mode Alias GUIDs.

    #    2: Clear Router Mode Alias GUIDs.

    rtr_aguid_enable 1
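
    Steps 2 and 3 can be applied to the generated file with a short script instead of editing it by hand. This is a minimal Python sketch that operates on the configuration text; it assumes each option appears as a "name value" line, as in the fragments above. The function names set_option and configure_for_routing are illustrative.

```python
import re


def set_option(conf_text, name, value):
    """Replace the first active 'name value' line, or append one if missing."""
    pattern = re.compile(r"^%s[ \t]+\S+.*$" % re.escape(name), re.MULTILINE)
    line = "%s %s" % (name, value)
    if pattern.search(conf_text):
        # Comment lines start with '#', so they are never matched here.
        return pattern.sub(lambda match: line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + line + "\n"


def configure_for_routing(conf_text, subnet_number):
    """Set a routable subnet_prefix and enable router alias GUIDs."""
    prefix = "0x%016x" % (0xFEC0000000000000 + subnet_number)
    conf_text = set_option(conf_text, "subnet_prefix", prefix)
    return set_option(conf_text, "rtr_aguid_enable", "1")


# Hypothetical excerpt of a freshly generated opensm.conf.
generated = (
    "# Subnet prefix used on this subnet\n"
    "subnet_prefix 0xfe80000000000000\n"
)

print(configure_for_routing(generated, 2))
```

    In practice the text would be read from and written back to the opensm.conf file of each host, using a different subnet number per subnet.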

     

    4. Run OpenSM on each host. Use the -F flag to run the SM with the updated configuration file for that host.

    For example, for Server-1:

    # opensm -F opensm1.conf &

    [1] 13550

    [root@jupiter001 ophir]# -------------------------------------------------

    OpenSM 4.7.0.MLNX20160413.dbbefc2

    Config file is `opensm1.conf`:

    Reading Cached Option File: opensm1.conf

    Loading Cached Option:subnet_prefix = 0xfec0000000000001

    Loading Cached Option:max_op_vls = 5

    Loading Cached Option:fdr10 = 1

    Loading Cached Option:sm_priority = 7

    Loading Cached Option:consolidate_ipv6_snm_req = FALSE

    Loading Cached Option:rtr_aguid_enable = 1

    Command Line Arguments:

    Log File: /var/log/opensm.log

    -------------------------------------------------

    OpenSM 4.7.0.MLNX20160413.dbbefc2

     

     

    Using default GUID 0x7cfe9003005d7e4a

    Entering DISCOVERING state

     

     

    Entering MASTER state

     

    5. Run the same for Server-2, and make sure that you see a different subnet_prefix parameter in the output.

    # opensm -F opensm2.conf &

    [1] 12923

    [root@jupiter002 ophir]# -------------------------------------------------

    OpenSM 4.7.0.MLNX20160413.dbbefc2

    Config file is `opensm2.conf`:

    Reading Cached Option File: opensm2.conf

    Loading Cached Option:subnet_prefix = 0xfec0000000000002

    Loading Cached Option:max_op_vls = 5

    Loading Cached Option:fdr10 = 1

    Loading Cached Option:sm_priority = 7

    Loading Cached Option:consolidate_ipv6_snm_req = FALSE

    Loading Cached Option:rtr_aguid_enable = 1

    Command Line Arguments:

    Log File: /var/log/opensm.log

    -------------------------------------------------

    OpenSM 4.7.0.MLNX20160413.dbbefc2

     

     

    Using default GUID 0x7cfe9003005d7e52

    Entering DISCOVERING state

     

     

    Entering MASTER state

     

    IP to GID Resolution

    Note: This section was updated starting from OFED 3.4

    When using AF_INET (IPv4-based applications) rather than AF_IB (InfiniBand-based applications), the user must run several scripts to map IP addresses to GIDs.

    Follow this procedure:

    1. Create a shared directory for all servers in the network (over the management network). All servers must be able to reach this shared location.

     

    2. On one of the hosts (per subnet), run ib2ib_setup. Specify the subnet number using -s and the range of IPoIB addresses using -n. This example uses one IP subnet range; more IP ranges can be added.

    Note: It is also possible to supply all IPs in a file using -f.

    This command creates three files:

    • guid2lid
    • ip2gid.db
    • hosts

     

    # ssh server-1  (--> jupiter001)

     

    # ib2ib_setup -d ib0 -s 2 -n 12.0.3.1/24

    -I- Using  mlx5_0  Port 1

    -I- Generating IPs from given subnets

    -I- Discovering IPs on Subnet

    -I- Total  7  IPs found on subnet

    -I- files created :ip2gid.db, guid2lid, hosts

    Completed successfully

     

    #

     

    Note: The script supports only subnets with a /24 mask. To use a different mask, use the -f flag to read the IPs from a file.

     

    Use ib2ib_setup -h to learn more about the command options:

    #ib2ib_setup -h

    Usage: ib2ib_setup [options]

     

    Options:

      --version             show program's version number and exit

      -h, --help            show this help message and exit

      -n ADDRESS, --network=ADDRESS

                            network/Mask to scan for IPs. Format is  A.B.C.D/24.

                            Example :11.130.1.1/24,11.130.2.1/24

      -s SM, --sm=SM        subnet number .Unique number for ib subnet.(0-31)

      -d DEV, --device=DEV  device name. Example: ib0

      -f FILE_IPS, --file=FILE_IPS

                            text file which hold IPs.(IP per line)

     

    For additional examples, refer to ib2ib_setup.txt attached to this post.

     

    3. Copy the guid2lid file to the location used by the SM (per subnet) and restart SM to deploy it in the IB fabric.

    The default location is /var/cache/opensm/guid2lid

    # cp /1/guid2lid /var/cache/opensm/guid2lid

     

    4. Run ib_acme -A -O on each server on the network.

    # ib_acme -A -O

     

    5. Merge all ip2gid.db files to one single file.
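
    One way to merge the files is sketched below in Python. It treats every line as an opaque record and drops duplicates while keeping the original order; the actual field layout of ip2gid.db is not assumed, and the entries in the example are placeholders rather than real data. The function name merge_db_files is hypothetical.

```python
def merge_db_files(contents):
    """Merge several ip2gid.db texts, keeping the first copy of each line."""
    seen = set()
    merged = []
    for text in contents:
        for line in text.splitlines():
            entry = " ".join(line.split())  # normalise runs of whitespace
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return "\n".join(merged) + "\n"


# Placeholder records standing in for the per-subnet files.
subnet_a = "12.0.3.1 gid-a\n12.0.3.2 gid-b\n"
subnet_b = "12.0.3.2   gid-b\n12.0.1.1 gid-c\n"

print(merge_db_files([subnet_a, subnet_b]), end="")
```

    The per-subnet files would typically be read from the shared directory created in step 1, and the merged text written to a single ip2gid.db file.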

     

    6. Copy the merged ip2gid.db file that you just created to /etc/rdma, where it is used by ibacm (on each server).

    # cp ip2gid.db /etc/rdma

     

    7. Edit /etc/rdma/ibacm_opts.cfg to include the path to ip2gid.db. Update it as follows:

    addr_preload acm_hosts

    addr_data_file /etc/rdma/ip2gid.db

    route_timeout 0

    addr_timeout -1

     

    Note: Setting addr_timeout to -1 prevents ibacm from timing out the IP-to-GID mapping. More information about these parameters is available in the file.

     

    8. Run ibacm on each server in the network.

    # ibacm

     

    9. Copy the hosts file to /etc/hosts on each server in the network.

    # cp ~/hosts /etc/hosts

     

    Note: Not all applications use this file.

     

    10. Change the device MAC address of the IPoIB device to be based on the alias GID and not the GUID.

    For example:

    # echo fec0:0000:0000:0003:0014:0500:0000:0001 > /sys/class/net/ib0/set_mac

     

    where the value written is the alias GID assigned by the SM to that node, for example fe:c0:00:00:00:00:00:02:00:14:05:00:00:00:00:01 in byte notation.
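
    The alias GID appears in two notations in this step: sixteen colon-separated bytes, and eight 16-bit groups as written to set_mac. The Python sketch below converts between the two, as an aid for preparing the value to write. The function names are illustrative, and the GID used is the one quoted in byte notation above.

```python
def gid_bytes_to_groups(gid):
    """'fe:c0:...' (16 colon-separated bytes) -> 'fec0:0000:...' (8 groups)."""
    octets = gid.lower().split(":")
    if len(octets) != 16 or any(len(octet) != 2 for octet in octets):
        raise ValueError("expected 16 two-digit hex bytes")
    int("".join(octets), 16)  # raises ValueError on non-hex input
    return ":".join(octets[i] + octets[i + 1] for i in range(0, 16, 2))


def gid_groups_to_bytes(gid):
    """'fec0:0000:...' (8 four-digit groups) -> 'fe:c0:...' (16 bytes)."""
    groups = gid.lower().split(":")
    if len(groups) != 8 or any(len(group) != 4 for group in groups):
        raise ValueError("expected 8 four-digit hex groups")
    int("".join(groups), 16)  # raises ValueError on non-hex input
    return ":".join(part for g in groups for part in (g[:2], g[2:]))


alias_gid = "fe:c0:00:00:00:00:00:02:00:14:05:00:00:00:00:01"
print(gid_bytes_to_groups(alias_gid))
# fec0:0000:0000:0002:0014:0500:0000:0001
```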

     

    11. Add routes to the relevant hosts using the "ip route add" command.

    # ifconfig ib0 12.0.3.1/24 --> set the IP for ib0

    # ip route add 12.0.1.0/24 via 12.3.0.250 --> add a route to hosts with 12.0.1.xxx IPs

    # ip route add 12.0.2.0/24 via 12.3.0.250 --> add a route to hosts with 12.0.2.xxx IPs

     

    Performance Tests

     

    perftest Package

    Use any tool from the perftest package, such as ib_send_lat or ib_send_bw, to run performance benchmarks between two nodes.

    Use the router port LID as the destination LID (DLID).

     

    NOTE: ConnectX-3 (and Pro) do not support the case where the path from client to server uses a different router than the path from server to client.

    This is because they implement IBTA spec 1.2.1 and perform an SLID check on incoming traffic. The same compliance statement was modified in IBTA spec 1.3 to require ignoring the SLID check when a GRH is present.

     

    For example:

    1. Run the following on the switch:

    # show interfaces ib-subnet brief

     

    IB subnet              LID      Subnet prefix             L3 I/f state       

    -----------------------------------------------------------------------

    infiniband-default     7        FE:C0:00:00:00:00:00:01   Up                 

    infiniband-1           2        FE:C0:00:00:00:00:00:02   Up      

     

    2. Run ib_write_bw as follows:

     

    For the Server: On the host connected to the infiniband-default subnet with prefix FE:C0:00:00:00:00:00:01, use the router dlid 7.

    # ib_write_bw -F -d mlx5_0 --dlid 7 -x 1 -a

     

    ************************************

    * Waiting for client to connect... *

    ************************************

     

    For the Client: On the host connected to the infiniband-1 subnet with prefix FE:C0:00:00:00:00:00:02, use the router dlid 2.

    # ib_write_bw -F -d mlx5_0 --dlid 2 -x 1 -a jupiter001

    ---------------------------------------------------------------------------------------

                        RDMA_Write BW Test

    Dual-port       : OFF Device         : mlx5_0

    Number of qps   : 1 Transport type : IB

    Connection type : RC Using SRQ      : OFF

    TX depth        : 128

    CQ Moderation   : 100

    Mtu             : 4096[B]

    Link type       : IB

    Gid index       : 1

    Max inline data : 0[B]

    rdma_cm QPs : OFF

    Data ex. method : Ethernet

    ---------------------------------------------------------------------------------------

    local address: LID 0x8b QPN 0x009b PSN 0xaaffee RKey 0x0453bb VAddr 0x007ffff8800000

    GID: 254:192:00:00:00:00:00:02:00:20:05:00:00:00:00:139

    remote address: LID 0x02 QPN 0x0094 PSN 0x59e02e RKey 0x04d4a1 VAddr 0x007ffff8800000

    GID: 254:192:00:00:00:00:00:01:00:20:05:00:00:00:00:151

    ---------------------------------------------------------------------------------------

    #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]

    2          5000             19.71              18.39     9.639919

    4          5000             39.87              39.83     10.442406

    8          5000             80.04              79.73     10.450260

    16         5000             158.30             149.56   9.801458

    32         5000             318.95             306.95   10.058216

    64         5000             637.91             613.08   10.044668

    128        5000             1271.07            1268.80   10.394038

    256        5000             2514.11            2323.42   9.516736

    512        5000             4884.56            4314.60   8.836308

    1024       5000             9117.84            8832.64   9.044622

    2048       5000             11373.60            11188.83   5.728679

    4096       5000             11541.57            11485.01   2.940163

    8192       5000             11615.08            11611.30   1.486246

    16384      5000             11658.40            11655.88   0.745976

    32768      5000             11673.95            11672.98   0.373535

    65536      5000             11695.78            11695.78   0.187132

    131072     5000             11706.73            11705.19   0.093642

    262144     5000             11705.76            11705.33   0.046821

    524288     5000             11707.61            11706.79   0.023414

    1048576    5000             11708.01            11704.91   0.011705

    2097152    5000             11706.59            11704.21   0.005852

    4194304    5000             11707.32            11705.89   0.002926

    8388608    5000             11708.10            11703.05   0.001463

    ---------------------------------------------------------------------------------------
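
    The GID lines in the output above appear to be printed as decimal bytes (254:192 corresponds to 0xfe:0xc0), which makes them hard to compare with the subnet prefixes shown on the switch. The following Python sketch converts such a string to the usual hex notation, under the assumption that every field is one byte in decimal. The function name perftest_gid_to_hex is hypothetical.

```python
def perftest_gid_to_hex(gid):
    """Convert a GID printed as 16 decimal bytes into colon-separated hex."""
    parts = gid.split(":")
    if len(parts) != 16:
        raise ValueError("expected 16 colon-separated bytes")
    values = [int(part, 10) for part in parts]
    if any(not 0 <= value <= 255 for value in values):
        raise ValueError("each field must be a byte (0-255)")
    return ":".join("%02x" % value for value in values)


# Local GID copied from the client output above.
local_gid = "254:192:00:00:00:00:00:02:00:20:05:00:00:00:00:139"
hex_gid = perftest_gid_to_hex(local_gid)
print(hex_gid)       # fe:c0:00:00:00:00:00:02:00:14:05:00:00:00:00:8b
print(hex_gid[:23])  # first 8 bytes: the subnet prefix
```

    For the local address in this run, the first eight bytes match the infiniband-1 subnet prefix, and the last byte (0x8b) equals the local LID reported on the line above it.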

     

     

     

    MPI Testing

    1. Get OpenMPI as follows:

    # git clone https://github.com/open-mpi/ompi-release

    ...

    # cd ompi-release

     

    2. Build OpenMPI.

     

    Note: Before you start, make sure you are using the following recommended versions of the tools:

     

    • GNU Autoconf: 2.65
    • GNU Automake: 1.12.2
    • GNU Libtool: 2.2.6b

    a. Once you have built the autotools, add them to the PATH:

     

    # ls ~/autotools/

    autoconf-2.69         automake-1.15         bin      lib            libtool-2.4.6.tar.gz

    autoconf-2.69.tar.gz  automake-1.15.tar.gz  include  libtool-2.4.6  share

     

    # export PATH=/home/mellanox/autotools/bin:${PATH}

    b. Build ompi with the following command line. In order to make IB routing work, you must configure ompi with --enable-openib-rdmacm-ibaddr.

    # ./autogen.pl && ./configure --prefix=$PWD/install --enable-openib-rdmacm-ibaddr --disable-debug --with-platform=contrib/platform/mellanox/optimized --without-mpi-param-check --with-verbs --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default && make -j 16 && make install

    ...

     

    === Patching PGI compiler version numbers in ltmain.sh

    === Patching configure for Libtool PGI 10 fortran compiler name

    === Patching configure for Libtool PGI version number regexps

    === Patching configure for Sun Studio Fortran version strings ()

    === Patching configure for Sun Studio Fortran version strings (_F77)

    === Patching configure for Sun Studio Fortran version strings (_FC)

    === Patching configure for IBM xlf libtool bug

    === Patching configure for libtool.m4 bug

    Running: cp configure.patched configure

     

     

    ================================================

    Open MPI autogen: completed successfully.  w00t!

    ================================================

     

    3. Run MPI with the IB Router.

    Use one of the standard benchmarks, for example:

     

    Note: Because the hosts are located on separate subnets, you need to set btl_openib_allow_different_subnets to 1.

    Use GID index 1 (btl_openib_gid_index 1), which corresponds to the GID set up by the SM configuration.

     

    In order to use rdmacm, you must set up a per-peer QP as the first QP (not all QPs can be SRQ).

    In some branches of ompi, the default is to use only SRQ. In this case, add -mca btl_openib_receive_queues P,65536,256,192,128 to the command line.

    In the current v1.10 branch, the default configuration should work with IB routing without any changes.

     

    # ompi-release/install/bin/mpirun -np 2 --display-map --map-by node -H jupiter001,jupiter002  -mca pml ob1 -mca btl self,sm,openib --mca btl_openib_cpc_include rdmacm -mca btl_openib_if_include mlx5_0:1 -mca btl_openib_gid_index 1 -mca btl_openib_allow_different_subnets 1  ./IMB/src/IMB-MPI1 pingpong

     

    Note: By default, openib RoCE support is enabled while IB router support is disabled. Therefore, starting from MLNX_OFED 3.3, OpenMPI must be recompiled in order to use it with the IB router.

    See also   https://www.open-mpi.org/faq/?category=openfabrics#ib-router

     

    Performance:

    For testing purposes, use the osu_bw benchmark running on two nodes. Better performance was reached when setting btl_openib_receive_queues as follows:

     

    -mca btl_openib_receive_queues P,65536,256,192,128:S,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64

     

    instead of:

     

    -mca btl_openib_receive_queues P,65536,256,192,128
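
    Since rdmacm requires a per-peer QP as the first queue, a quick check of a btl_openib_receive_queues value before launching a job can help. The Python sketch below splits the value into queues and verifies that the first one is of type P. It assumes the queue types P (per-peer), S (shared receive queue) and X (XRC) and does not interpret the numeric fields; the function names are hypothetical.

```python
def parse_receive_queues(spec):
    """Split a btl_openib_receive_queues value into (type, [numbers]) tuples."""
    queues = []
    for item in spec.split(":"):
        kind, *numbers = item.split(",")
        if kind not in ("P", "S", "X"):
            raise ValueError("unknown queue type: %r" % kind)
        queues.append((kind, [int(number) for number in numbers]))
    return queues


def first_queue_is_per_peer(spec):
    """True when the first queue is a per-peer (P) queue."""
    return parse_receive_queues(spec)[0][0] == "P"


# The two values quoted above.
tuned = (
    "P,65536,256,192,128:S,128,256,192,128:S,2048,1024,1008,64:"
    "S,12288,1024,1008,64:S,65536,1024,1008,64"
)
minimal = "P,65536,256,192,128"

print(first_queue_is_per_peer(tuned), len(parse_receive_queues(tuned)))
print(first_queue_is_per_peer(minimal))
```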

     


    NOTE: ConnectX-3 (and Pro) do not support the case where the path from client to server uses a different router than the path from server to client.

    This is because they implement IBTA spec version 1.2.1 and perform an SLID check on incoming traffic. The same compliance statement was modified in IBTA spec version 1.3 to require ignoring the SLID check when a GRH is present. The impact on MPI implementations that do not rely on librdmacm is that they must either use Connect-IB, ConnectX-4, or newer devices, or make sure that the connection manager they use guarantees that the same router is used in both directions of the RC QP.

     

    Troubleshooting

    1. Unless you are using ib_send_lat (or any other tool that uses libibverbs directly), you do not need to know the DLID of the router. If you do use such tools, you can use saquery with the DGID and SGID to see which DLID the SM returns.

    This is important when there is more than one router per subnet.

     

    2. If a server is added to (or removed from) the network, run the first script, ib2ib_guids, on the affected subnet, and then run the second script, ib2ib_setup. This produces new ip2gid.db and guid2lid files, and a new file for the DHCP server, for the affected subnet. Then:

    1) Copy ip2gid.db to each host in the fabric (not just the affected subnet) and restart ibacm.

    2) Add the dhcp.db file to the DHCP server and restart it.

    3) Copy the opensm file (guid2lid) to the host that runs opensm and restart opensm (this should be done on the affected IB subnet).