HowTo Configure Accelio with Reconnect Enabled over a Bond Interface


    This post shows how to bond (LAG) two InfiniBand interfaces and use the bond to run an Accelio client and server with the reconnect feature enabled.

    The reconnect flag allows the RDMA connection to reconnect when one of the bond's slave interfaces fails.

     


    Setup

     

    For this setup you will need a server with ConnectX-3 adapter ports connected to an InfiniBand switch, and another server that acts as the client on the far-end side.
    Refer to HowTo Create Linux bond (LAG) Interface over InfiniBand network for an example.

     

    • Make sure your machine runs a Red Hat distribution. Red Hat Enterprise Linux Server release 6.5 (Santiago) was used in this procedure.
    • Make sure you have the latest Mellanox OFED installed. The latest MLNX_OFED package can be downloaded from the Mellanox website. MLNX_OFED_LINUX-2.4-1.0.0 was used here.

     

    Configuration

    1. Follow HowTo Create Linux bond (LAG) Interface over InfiniBand network to create a bond over the IPoIB interfaces, as sketched below.
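
    As a rough reference, the resulting configuration files on the server could look like the sketch below. This is only an illustration: it assumes RHEL 6 style network-scripts, uses 11.11.11.3 / 255.255.255.0 as an example address, and mirrors the bond parameters (active-backup, miimon/updelay/downdelay 100, fail_over_mac active, primary ib0) that appear in the /proc/net/bonding/bond0 output later in this post. The linked HowTo remains the authoritative procedure.

    /etc/sysconfig/network-scripts/ifcfg-bond0:

    DEVICE=bond0
    TYPE=Bonding
    IPADDR=11.11.11.3
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no
    BONDING_OPTS="mode=active-backup miimon=100 updelay=100 downdelay=100 fail_over_mac=active primary=ib0"

    /etc/sysconfig/network-scripts/ifcfg-ib0 (and similarly ifcfg-ib1):

    DEVICE=ib0
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    USERCTL=no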

     

    2. Download the Accelio source, then build and install the latest master branch:

    # mkdir /tmp/xio

    # cd /tmp/xio

    # git clone git://github.com/accelio/accelio.git accelio.git

    ...

     

    # cd accelio.git

     

    # ./autogen.sh

    configure.ac:13: installing './compile'

    configure.ac:13: installing './config.guess'

    configure.ac:13: installing './config.sub'

    configure.ac:7: installing './install-sh'

    configure.ac:7: installing './missing'

    benchmarks/usr/xio_perftest/Makefile.am: installing './depcomp'

     

     

    # ./configure --prefix=/opt/accelio

    ...

     

    # make && make install

    ...

     

    3.  Enable the reconnect flag and disable keepalive in your testing environment.

    You will need to enable Accelio's reconnect feature in your tests.

    Make sure the reconnect flag is enabled in the main function of both the client and the server.

    The xio_set_opt function is used to set Accelio options, and XIO_OPTNAME_ENABLE_RECONNECT is the predefined boolean option that enables reconnect:

    int reconnect = 1;

    xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_ENABLE_RECONNECT, &reconnect, sizeof(reconnect));

     

     

    You will also need to disable Accelio's keepalive feature in your tests, since it has the opposite effect of reconnect: it disconnects the session when the link goes down.

    Make sure the keepalive flag is disabled in the main function of both the client and the server.

    The same xio_set_opt function is used, with XIO_OPTNAME_ENABLE_KEEPALIVE as the predefined boolean option:

    int keepalive = 0;

    xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_ENABLE_KEEPALIVE, &keepalive, sizeof(keepalive));
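
    Putting the two options together, the initialization of a test program (client or server) might look roughly like the sketch below. It only shows where the option calls fit relative to xio_init()/xio_shutdown(); the context, session and connection setup of your actual test is omitted.

    #include <libxio.h>

    int main(int argc, char *argv[])
    {
            int reconnect = 1;
            int keepalive = 0;

            /* initialize the Accelio library before setting options */
            xio_init();

            /* enable reconnect and disable keepalive for the whole library */
            xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_ENABLE_RECONNECT,
                        &reconnect, sizeof(reconnect));
            xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_ENABLE_KEEPALIVE,
                        &keepalive, sizeof(keepalive));

            /* ... create the xio context, session and connection(s),
             * run the event loop, clean up ... */

            xio_shutdown();
            return 0;
    }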

     

     

    4. Verify that Accelio is working

    The following test measures Accelio performance with one-way messages.

    On the server:

    #  tests/usr/hello_test_ow/run_ow_server.sh 11.11.11.1 1234 0 4096 rdma

    =============================================

    Server Address         : 11.11.11.3

    Server Port            : 1234

    Transport              : rdma

    Header Length          : 0

    Data Length            : 4096

    CPU Affinity           : 1

    Finite run             : 0

    =============================================

    listen to rdma://11.11.11.3:1234

     

    On the client:

    # cd accelio.git

    # tests/usr/hello_test_ow/run_ow_client.sh 11.11.11.1 1234 0 4096 rdma

    =============================================

    Server Address         : 11.11.11.3

    Server Port            : 1234

    Transport              : rdma

    Header Length          : 0

    Data Length            : 4096

    Connection Index       : 0

    CPU Affinity           : 1

    Finite run             : 0

    =============================================

    shmget rdma pool sz:2097152 failed (errno=12 Cannot allocate memory)

    **** starting ...

    **** [0x79d160] session established

    session event: connection established. reason: Success

    transactions per second: 387137, bandwidth: TX 1512.25 MB/s, length: TX: 4096 B

    transactions per second: 387349, bandwidth: TX 1513.08 MB/s, length: TX: 4096 B

    transactions per second: 387364, bandwidth: TX 1513.14 MB/s, length: TX: 4096 B

    ...

     

    At this point, Accelio is running on both machines.

     

    Verification

    Now that your test is utilizing reconnect, you can fail the primary interface of the bond and see Accelio reconnect on the backup interface.

    Here you will learn how to fail the interface while Accelio is running.

     

    Note: Bringing an interface down with ifdown (or up with ifup) does not behave the same as a cable plug-out.

    To verify the bonding feature, we will shut down the switch port connected to the server, which emulates a cable plug-out.

     

    Get LID And Port

    You will need to identify the LID of the switch in the fabric and the switch port to which the server is connected.

    To determine the LID of the switch, run:

    # ibswitches

    ...

    Switch  : 0x0008f1050010009e ports 36 "Mellanox 4036 # 4036-009E" enhanced port 0 lid 4 lmc 0

    The port number is the physical port number in the switch to which your NIC is connected.
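
    If you are not sure which switch port your server is cabled to, one option (assuming the infiniband-diags utilities installed with MLNX_OFED) is to list the fabric links with iblinkinfo and filter by your server's node description, for example:

    # iblinkinfo | grep -i <server-hostname>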

     

    Fail A Link

    Use the following command to fail the active interface and make the backup interface active:

    ibportstate <lid> <portnum> disable

     

    For example (assuming the port used in the switch is 21 and the switch LID is 4):

    # ibportstate 4 21 disable

     

    Check Link

    Watching the following command will show you the currently active link.

    Use the following command to check that there were link failures:

    # cat /proc/net/bonding/bond0

    Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

     

    Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)

    Primary Slave: ib0 (primary_reselect always)

    Currently Active Slave: ib1

    MII Status: up

    MII Polling Interval (ms): 100

    Up Delay (ms): 100

    Down Delay (ms): 100

     

    Slave Interface: ib1

    MII Status: up

    Speed: 40000 Mbps

    Duplex: full

    Link Failure Count: 0

    Permanent HW addr: a0:04:03:00:fe:80

    Slave queue ID: 0

     

    Slave Interface: ib0

    MII Status: down

    Speed: 40000 Mbps

    Duplex: full

    Link Failure Count: 254

    Permanent HW addr: a0:04:02:20:fe:80

    Slave queue ID: 0

    Use the following command to see that the backup interface became active and is transmitting (TX) packets:

    ifconfig
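
    For example, to check that the TX counter of the slave that took over (ib1 in the output above) keeps growing while the Accelio test is running:

    # ifconfig ib1 | grep "TX packets"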

     

    Recover A Link

    Use the following command to recover the failed interface and make it the active interface again:

    ibportstate <lid> <portnum> enable

     

    For example:

    ibportstate 4 21 enable
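
    Since ib0 is the primary slave and primary_reselect is set to always, ib0 should become the active slave again once its link is back up. You can confirm this by re-reading the bond status, for example:

    # grep "Currently Active Slave" /proc/net/bonding/bond0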

     

     

    Read More

    For further reading about bonding, RDMA, and Mellanox products, see Bonding Considerations for RDMA Applications.