How to Create a Docker Container with RDMA Accelerated Applications Over 100Gb InfiniBand Network

Version 11

    This document demonstrates how to deploy RDMA accelerated applications in Docker containers over a Mellanox end-to-end 100 Gb/s Infiniband (IB) solution.

    It describes how to install Docker CE 17.06 on physical servers running Ubuntu 16.04.2 LTS.

    It also shows how to install and update the Mellanox software and hardware components on the host and in the Docker container.

    References

     

     

    Setup Overview

     

    Equipment

     

     

    Server Logical Design

     

    Server Wiring

    In our reference setup, we wire the first port to the InfiniBand switch and leave the second port unused.

     

     

     


    Network Configuration

    Our installation setup uses two servers.

    Each server is connected to the SB7700 switch by a 100Gb IB copper cable. The switch port connectivity in our case is as follows:

    • Ports 1-2: connected to the host servers

    The server names and network configuration are provided below.

    Server type   Server name   Internal network (NIC: IP)   External network (NIC: IP)
    Server 01     clx-mld-41    ib0: 12.12.12.41             eno1: From DHCP (reserved)
    Server 02     clx-mld-42    ib0: 12.12.12.42             eno1: From DHCP (reserved)


    Deployment Guide


    Prerequisites

    Update Ubuntu Software Packages

    To update/upgrade Ubuntu software packages, run the commands below.

    $ sudo apt-get update            # Fetches the list of available updates
    $ sudo apt-get upgrade -y        # Strictly upgrades the current packages


    Enable the Subnet Manager (SM) on the IB Switch

    Refer to the MLNX-OS User Manual to become familiar with the switch software (located at support.mellanox.com).
    Before you start using the Mellanox switch, we recommend that you upgrade the switch to the latest MLNX-OS version.

    There are three options for where to run the SM:

    1. Enable the SM on one of the managed switches. This is a quick and convenient operation that makes Infiniband 'plug & play'.
    2. Run /etc/init.d/opensmd on one or more servers. Running the SM on a server is recommended when the fabric has 648 nodes or more.
    3. Use a dedicated Unified Fabric Management (UFM®) Appliance server. UFM offers much more than the SM. UFM needs more compute power than the existing switches have, but does not require an expensive server. The dedicated server does represent an additional cost.

    We'll explain options 1 and 2 only.

     

    Option 1: Configuring the SM on a Switch (MLNX-OS®, all Mellanox switch systems)
    To enable the SM on one of the managed switches, follow the steps below.

    1. Log in to the switch and enter config mode:
      Mellanox MLNX-OS Switch Management

      switch login: admin
      Password:
      Last login: Wed Aug 12 23:39:01 on ttyS0

      Mellanox Switch

      switch [standalone: master] > enable
      switch [standalone: master] # conf t
      switch [standalone: master] (config)#
    2. Run the command:
      switch [standalone: master] (config)# ib sm
      switch [standalone: master] (config)#
    3. Check if the SM is running. Run:

      switch [standalone: master] (config)# show ib sm
      enable
      switch [standalone: master] (config)#

    To save the configuration (permanently), run:

    switch (config) # configuration write

     

     

    Option 2: Configuring the SM on a Server (Skip this procedure if you enable SM on switch)

    To start OpenSM on a server, run opensm from the command line on your management node:

    # opensm

    Or:

    Start OpenSM automatically on the head node by editing the /etc/opensm/opensm.conf file.

    Create a configuration file by running:

    # opensm --create-config /etc/opensm/opensm.conf

    Edit the /etc/opensm/opensm.conf file and add the following line:

    onboot=yes

    Upon initial installation, OpenSM is configured and running with a default routing algorithm. When running a multi-tier fat-tree cluster, it is recommended to change the following option to select the most efficient routing algorithm, delivering the highest performance:

    --routing_engine=updn

    For full details on other configurable attributes of OpenSM, see the “OpenSM – Subnet Manager” chapter of the Mellanox OFED for Linux User Manual.

     

     

    Installing Mellanox OFED for Ubuntu on a Host

    This chapter describes how to install and test the Mellanox OFED for Linux package on a single host machine with a Mellanox ConnectX®-5 adapter card installed. For more information, see the Mellanox OFED for Linux User Manual.

     

    Downloading Mellanox OFED

    1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed.
      # lspci -v | grep Mellanox
      The following example shows a system with an installed Mellanox HCA:
    2. Download the installation image that matches your OS to your host.
      The image name has the format
      MLNX_OFED_LINUX-<ver>-<OS label><CPUarch>.iso (the installation steps in this guide use the .tgz archive of the same name). You can download it from:
      http://www.mellanox.com > Products > Software > InfiniBand/VPI Drivers > Mellanox OFED Linux (MLNX_OFED) > Download.

    3. Use the MD5SUM utility to confirm the downloaded file’s integrity. Run the following command and compare the result to the value provided on the download page.

       

      # md5sum MLNX_OFED_LINUX-<ver>-<OS label>.tgz
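    The checksum comparison in step 3 can also be scripted. The following is a minimal sketch, not part of MLNX_OFED: verify_md5 is a hypothetical helper name, and the expected value is whatever the download page lists for your archive.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeeds only if the MD5 checksum of FILE equals
# EXPECTED_MD5. The comparison is case-insensitive.
verify_md5() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # md5sum prints lowercase hex, so normalize the expected value too.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # md5sum prints "<checksum>  <filename>"; keep only the checksum.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

    For example: verify_md5 MLNX_OFED_LINUX-<ver>-<OS label>.tgz <value from the download page>. A non-zero exit status means the archive should be downloaded again.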

       

    Installing Mellanox OFED

    MLNX_OFED is installed by running the mlnxofedinstall script. The installation script performs the following:

    • Discovers the currently installed kernel
    • Uninstalls any software stacks that are part of the standard operating system distribution or another vendor's commercial stack
    • Installs the MLNX_OFED_LINUX binary RPMs (if they are available for the current kernel)
    • Identifies the currently installed InfiniBand and Ethernet network adapters and automatically upgrades the firmware

    The installation script removes all previously installed Mellanox OFED packages and re-installs from scratch. You will be prompted to acknowledge the deletion of the old packages.

    1. Log into the installation machine as root.
    2. Copy the downloaded tgz to /tmp and extract it:
      # cd /tmp
      # tar -xzvf MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64.tgz
      # cd MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64/
    3. Run the installation script.
      # ./mlnxofedinstall
    4. After the installation finishes successfully, restart the driver and reboot.

      # /etc/init.d/openibd restart

      # reboot

      By default, both ConnectX®-5 VPI ports are initialized as Infiniband ports.

    5. Disable the unused second port on the device (optional).
      Identify the PCI IDs of your NIC ports:

      # lspci | grep Mellanox

      05:00.0 Infiniband controller: Mellanox Technologies Device 1019

      05:00.1 Infiniband controller: Mellanox Technologies Device 1019

      Disable the second port:
      # echo 0000:05:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
    6. Check that the ports are in Infiniband mode.
      # ibv_devinfo

    7. If the output shows a link layer other than Infiniband, change the port type to Infiniband.
      ConnectX®-5 ports can be individually configured to work as Infiniband or Ethernet ports.
      Use the mlxconfig tool after the driver is loaded.
      * LINK_TYPE_P1=1 selects Infiniband mode for port 1.
      a. Start mst and list the device names:
      # mst start
      # mst status

      b. Change the mode of port 1 to Infiniband (add LINK_TYPE_P2=1 to the same command to change the second port as well):

      # mlxconfig -d /dev/mst/mt4121_pciconf0 s LINK_TYPE_P1=1
      # reboot

      After the reboot, port 1 is in IB mode. After each reboot, you need to disable the second port again.
      c. Query the Infiniband devices and print the information about them that is available from userspace:

       

      # ibv_devinfo

       

    8. Run the ibdev2netdev utility to see all the associations between the network devices and the IB devices/ports, then assign an IP address to the ib0 interface.

      # ibdev2netdev

      # ifconfig ib0 12.12.12.41 netmask 255.255.255.0

    9. To make the address persistent, add the ib0 lines below to the /etc/network/interfaces file, after the existing eno1 lines:

      # vim /etc/network/interfaces

      auto eno1

      iface eno1 inet dhcp

      The new lines:
      auto ib0
      iface ib0 inet static
      address 12.12.12.41
      netmask 255.255.255.0
      Example:
      # vim /etc/network/interfaces

      auto eno1
      iface eno1 inet dhcp

      auto ib0
      iface ib0 inet static
      address 12.12.12.41
      netmask 255.255.255.0
    10. Check that the network configuration is set correctly.
      # ifconfig -a
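
    The port checks in steps 6 and 7 can be automated by parsing the ibv_devinfo output. The sketch below assumes the usual ibv_devinfo layout, in which each "port:" section lists a "state:" line followed by a "link_layer:" line; ib_port_ready is a hypothetical helper name.

```shell
#!/bin/sh
# ib_port_ready: reads `ibv_devinfo` output on stdin and succeeds only if at
# least one port reports both state PORT_ACTIVE and link_layer InfiniBand.
# ASSUMPTION: within a port section, "state:" appears before "link_layer:".
ib_port_ready() {
    awk '
        $1 == "port:"       { active = 0 }                      # new port section
        $1 == "state:"      { active = ($2 == "PORT_ACTIVE") }  # remember port state
        $1 == "link_layer:" { if (active && $2 == "InfiniBand") found = 1 }
        END                 { exit(found ? 0 : 1) }
    '
}
```

    For example: ibv_devinfo | ib_port_ready && echo "IB port is up". A port that stays in a non-active state usually indicates that no Subnet Manager is running on the fabric.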

       

    Installing and Configuring Docker

    Uninstall old versions

    To uninstall old versions, we recommend running the following command:

    $ sudo apt-get remove docker docker-engine docker.io

    It’s OK if apt-get reports that none of these packages are installed.

    The contents of /var/lib/docker/, including images, containers, volumes, and networks, are preserved.

     

    Install Docker CE

    For Ubuntu 16.04 and higher, the Linux kernel includes support for OverlayFS, and Docker CE will use the overlay2 storage driver by default.

     

    Install using the repository

    Before you install Docker CE for the first time on a new host machine, you need to set up the Docker repository. Afterward, you can install and update Docker from the repository.

     

    Set Up the repository

    1. Update the apt package index:

      $ sudo apt-get update
    2. Install packages to allow apt to use a repository over HTTPS:

      $ sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
    3. Add Docker’s official GPG key:

      $ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

      Verify that the key fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88.

      $ sudo apt-key fingerprint 0EBFCD88
      pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
      uid                  Docker Release (CE deb) <docker@docker.com>
      sub   4096R/F273FCD8 2017-02-22
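
    The fingerprint check can be done without comparing the strings by eye. This is a minimal sketch that assumes apt-key prints the "Key fingerprint =" line in the format shown above; fingerprint_matches and DOCKER_FPR are hypothetical names.

```shell
#!/bin/sh
# Fingerprint quoted in this guide for Docker's repository key.
DOCKER_FPR="9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"

# fingerprint_matches EXPECTED
# Reads `apt-key fingerprint` output on stdin and succeeds if any
# "Key fingerprint =" line equals EXPECTED once all spaces are removed.
fingerprint_matches() {
    want=$(printf '%s' "$1" | tr -d ' ')
    sed -n 's/^.*Key fingerprint = //p' | tr -d ' ' | grep -qxF "$want"
}
```

    For example: sudo apt-key fingerprint 0EBFCD88 | fingerprint_matches "$DOCKER_FPR" && echo "key verified".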

     

    Install Docker CE

    Add the stable repository, then install the latest version of Docker CE. Any existing installation of Docker is replaced.

    $ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu  $(lsb_release -cs) stable"
    $ sudo apt-get update
    $ sudo apt-get install docker-ce

     

     

    Customize the docker0 bridge

    The recommended way to configure the Docker daemon is to use the daemon.json file, which is located in /etc/docker/ on Linux. If the file does not exist, create it. You can specify one or more of the following settings to configure the default bridge network:

    {     
         "bip": "172.16.41.1/24",
         "fixed-cidr": "172.16.41.0/24",
         "mtu": 1500,
         "dns": ["8.8.8.8","8.8.4.4"]
    }

     

    The same options are presented as flags to dockerd, with an explanation for each:

    • --bip=CIDR: supply a specific IP address and netmask for the docker0 bridge, using standard CIDR notation. For example: 172.16.41.1/24.
    • --fixed-cidr=CIDR: restrict the IP range from the docker0 subnet, using standard CIDR notation. For example: 172.16.41.0/24.
    • --mtu=BYTES: override the maximum packet length on docker0. For example: 1500.
    • --dns=[]: The DNS servers to use. For example: --dns=8.8.8.8,8.8.4.4.

     

    Restart Docker after making changes to the daemon.json file.

    $ sudo /etc/init.d/docker restart
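
    Each host needs its own bridge subnet, so the daemon.json file differs per host. The sketch below generates it from a host number. It assumes this guide's addressing scheme, in which the host with IB address 12.12.12.<ID> gives its containers the subnet 172.16.<ID>.0/24; make_daemon_json is a hypothetical helper name.

```shell
#!/bin/sh
# make_daemon_json HOST_ID
# Prints a daemon.json for the default bridge.
# ASSUMPTION: host <HOST_ID> (IB address 12.12.12.<HOST_ID>) uses the
# container subnet 172.16.<HOST_ID>.0/24, as in this guide's examples.
make_daemon_json() {
    id=$1
    cat <<EOF
{
    "bip": "172.16.${id}.1/24",
    "fixed-cidr": "172.16.${id}.0/24",
    "mtu": 1500,
    "dns": ["8.8.8.8", "8.8.4.4"]
}
EOF
}
```

    For example, on the first server: make_daemon_json 41 | sudo tee /etc/docker/daemon.json, followed by the Docker restart shown above.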

    Enable communication with the outside world

    Check IP forwarding in the kernel:

    $ sysctl net.ipv4.conf.all.forwarding

    net.ipv4.conf.all.forwarding = 1

    If forwarding is disabled, the output is:

    net.ipv4.conf.all.forwarding = 0

    In that case, enable it and check again:

    $ sudo sysctl net.ipv4.conf.all.forwarding=1
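
    The same check can be scripted by reading the procfs entry behind this sysctl. This is a minimal sketch; forwarding_enabled is a hypothetical helper name, and the optional file argument exists only so the function can be exercised against a test file.

```shell
#!/bin/sh
# forwarding_enabled [FILE]
# Succeeds if the forwarding switch reads 1. FILE defaults to the procfs
# entry behind the net.ipv4.conf.all.forwarding sysctl.
forwarding_enabled() {
    f=${1:-/proc/sys/net/ipv4/conf/all/forwarding}
    [ "$(cat "$f" 2>/dev/null)" = "1" ]
}
```

    For example: forwarding_enabled || sudo sysctl -w net.ipv4.conf.all.forwarding=1. A value set this way does not survive a reboot; the usual way to persist it is a net.ipv4.conf.all.forwarding=1 line in /etc/sysctl.conf.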

     

    On Linux hosts, for security reasons, Docker configures iptables rules that prevent containers from forwarding traffic that originates outside the host machine. Docker sets the default policy of the FORWARD chain to DROP.

    To override this default behavior you can manually change the default policy:

    $ sudo iptables -P FORWARD ACCEPT

     

    Add IP route with specific subnet

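    On the first host, add a route to the container network of the second host. On the second host, add the mirror route to the container network of the first host.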

    $ sudo ip route add 172.16.42.0/24 via 12.12.12.42  
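
    With more than two hosts, every host needs one such route per peer. The sketch below prints the commands for a given host. It assumes this guide's addressing scheme (host <ID> has IB address 12.12.12.<ID> and container subnet 172.16.<ID>.0/24); peer_routes is a hypothetical helper name.

```shell
#!/bin/sh
# peer_routes LOCAL_ID ID...
# Prints one `ip route add` command per peer host and skips the local host.
# ASSUMPTION: host <ID> has IB address 12.12.12.<ID> and container subnet
# 172.16.<ID>.0/24, as in this guide's examples.
peer_routes() {
    local_id=$1
    shift
    for id in "$@"; do
        [ "$id" = "$local_id" ] && continue
        echo "ip route add 172.16.${id}.0/24 via 12.12.12.${id}"
    done
}
```

    For example, peer_routes 41 41 42 prints the single command shown above (without sudo), and peer_routes 42 41 42 prints the mirror route for the second host. The commands are only printed, so they can be reviewed before being run as root.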

     

    A quick check

    Give your environment a quick test run to make sure you’re all set up:

    $ docker run hello-world

    Create or pull a base image and run a container

     

    Option 1

    Pull the image from Docker Hub and run a Docker container in privileged mode:

    $ sudo docker run -it --privileged --name=mnlx-verbs-prvlg mellanox/mofed4-1-1:latest bash

     

    Option 2

    Pull the image from Docker Hub and run a Docker container in non-privileged mode, passing only the required capability and device:

    $ sudo docker run -it --cap-add=IPC_LOCK --device=/dev/infiniband/uverbs1 --name=mnlx-verbs-nonprvlg mellanox/mofed4-1-1:latest bash
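
    The command above passes a single uverbs device to the container. If the container should see every uverbs device on the host, the --device options can be generated from the contents of /dev/infiniband. This is a minimal sketch; ib_device_flags is a hypothetical helper name, and it assumes device paths without spaces.

```shell
#!/bin/sh
# ib_device_flags [DIR]
# Prints one --device=<path> option per uverbs* node found in DIR
# (default /dev/infiniband), separated by spaces, on a single line.
ib_device_flags() {
    dir=${1:-/dev/infiniband}
    flags=""
    for dev in "$dir"/uverbs*; do
        # Skip the unexpanded pattern when no device node exists.
        [ -e "$dev" ] || continue
        flags="$flags --device=$dev"
    done
    # Drop the leading space left by the first append.
    echo "${flags# }"
}
```

    For example: sudo docker run -it --cap-add=IPC_LOCK $(ib_device_flags) --name=mnlx-verbs-nonprvlg mellanox/mofed4-1-1:latest bash. The command substitution is left unquoted on purpose so that each option becomes a separate argument.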

     

    Option 3

    Docker can build images automatically by reading the instructions from a Dockerfile.

    A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

     

    Dockerfile

    1. Create an empty directory.
    2. Change directories (cd) into the new directory, create a file called Dockerfile, copy-and-paste the following content into that file, and save it.
      Take note of the comments that explain each statement in your new Dockerfile.

    # Use an official Ubuntu 16.04 as a parent image
    FROM ubuntu:16.04

    MAINTAINER YOUR NAME <your@real.mail>

    # Set the working directory to /
    WORKDIR /

    # Pick up some MOFED dependencies
    RUN apt-get update && apt-get install -y --no-install-recommends \
            wget \
            net-tools \
            ethtool \
            perl \
            lsb-release \
            iproute2 \
            pciutils \
            libnl-route-3-200 \
            kmod \
            libnuma1 \
            lsof \
            linux-headers-4.4.0-92-generic \
            python-libxml2 && \
            rm -rf /var/lib/apt/lists/*

    # Download and install Mellanox OFED 4.1.1 for Ubuntu 16.04
    RUN wget http://content.mellanox.com/ofed/MLNX_OFED-4.1-1.0.2.0/MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64.tgz && \
            tar -xzvf MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64.tgz && \
            MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --all -q && \
            cd .. && \
            rm -rf MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64 && \
            rm -rf *.tgz

     

    Build the Docker image and run a container

    1. Now run the build command. This creates a Docker image, which we’re going to tag using -t so it has a friendly name.
      $ docker build -t myofed411image .
    2. Where is your built image? It’s in your machine’s local Docker image registry:
      $ docker images
    3. Run a Docker container in non-privileged mode from the locally built image:
      $ docker run -it --cap-add=IPC_LOCK --device=/dev/infiniband/uverbs1 --name=my-verbs-nonprvlg myofed411image bash

    Benchmark

     

    Check the MOFED version and the uverbs device inside the container:

    # ofed_info -s
    MLNX_OFED_LINUX-4.1-1.0.2.0:
    # ls /dev/infiniband/uverbs1

     

    Run a bandwidth stress test over IB in the container:

    Server

    ib_write_bw -a -d mlx5_1 &

    Client

    ib_write_bw -a -F $server_IP -d mlx5_1 --report_gbits

    In this way you can run a bandwidth stress test over IB between containers.
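
    To compare runs, the result table printed by ib_write_bw can be reduced to a single number. The sketch below assumes that each result row has the columns #bytes, #iterations, BW peak, BW average and MsgRate, in that order; check this against the header your perftest version prints. max_avg_bw is a hypothetical helper name.

```shell
#!/bin/sh
# max_avg_bw: reads ib_write_bw output on stdin and prints the largest value
# found in the fourth column (BW average) of the numeric result rows.
# ASSUMPTION: result rows are "#bytes #iterations BW-peak BW-average MsgRate".
max_avg_bw() {
    awk '
        # Result rows start with the message size, a plain integer.
        $1 ~ /^[0-9]+$/ && NF >= 4 { if ($4 + 0 > max) max = $4 + 0 }
        END { print max + 0 }
    '
}
```

    For example, on the client: ib_write_bw -a -F $server_IP -d mlx5_1 --report_gbits | max_avg_bw. With --report_gbits the printed value is in Gb/s.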

     

    Done!