HowTo Configure VMA Inside Docker Container (with MLNX OFED)

    This post describes how to create and run docker containers with Mellanox's Messaging Accelerator (VMA), using RHEL7.1, Docker 1.9 and Mellanox OFED 3.1-1.0.3.

     

    References

    Setup

    This post was written for MLNX_OFED 3.1 and VMA 7.0.7

    When testing, make sure you use the latest MLNX_OFED and latest VMA version.

     

    OFED Installation (on the host)

    1. Download MLNX_OFED from the MLNX_OFED Download page.

     

    2. Extract the MLNX_OFED package and install the driver.

    # tar -zxvf MLNX_OFED_LINUX-3.1-1.0.3-rhel7.1-x86_64.tgz

    # cd MLNX_OFED_LINUX-3.1-1.0.3-rhel7.1-x86_64

    # ./mlnxofedinstall

    Docker Installation

    Based on Docker Installation on Red Hat Enterprise Linux

     

    1. Create the Docker repository. Add /etc/yum.repos.d/docker.repo file with the following content:

    [dockerrepo]
    name=Docker Repository
    baseurl=https://yum.dockerproject.org/repo/main/centos/7
    enabled=1
    gpgcheck=1
    gpgkey=https://yum.dockerproject.org/gpg
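
    The repository file above can also be written from a script. The following is a minimal sketch, assuming the same repository settings as shown; write_docker_repo is a hypothetical helper name, not part of Docker or yum.

```shell
#!/bin/sh
# Sketch: write the Docker yum repository definition to a file.
# write_docker_repo is a hypothetical helper name; the repository
# contents are copied from the step above.
#   $1 = output file (defaults to /etc/yum.repos.d/docker.repo)
write_docker_repo() {
    repo_file="${1:-/etc/yum.repos.d/docker.repo}"
    # Quoted heredoc delimiter: the content is written literally
    cat > "$repo_file" <<'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
}
```

    Calling write_docker_repo as root without arguments writes the default path; passing another path lets you inspect the result first.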

     

    2. Install Docker.

    # yum install docker-engine

     

    3. Start the Docker service and enable the service to run on startup.

    # systemctl start docker

    # systemctl enable docker

     

    4. Verify installation was completed successfully.

    # docker run hello-world

     

    Create a Docker Container Image (with VMA)

    1. Download the Docker RHEL7 image and verify.

    # docker pull rhel7

    # docker images

     

    Note: If the docker pull rhel7 command fails because the image is not found, pull the image using its full registry path. For example:

    # docker pull registry.access.redhat.com/rhel7/rhel

     

    There are several ways to share files between the host and the container. The method below mounts a directory from the host inside the container. Other methods can be used as well.

     

    2. Create a directory to be mounted inside the container (and sub-directories for the repositories files).

    # mkdir /tmp/mnt

    # mkdir /tmp/mnt/repofiles

    # mkdir /tmp/mnt/rpm-gpg
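
    The three mkdir calls can be collapsed into one that is safe to repeat. This is a minimal sketch; prepare_mount_dir is a hypothetical helper name, and the default root is the /tmp/mnt directory used in this post.

```shell
#!/bin/sh
# Sketch: create the shared directory tree in one call.
# prepare_mount_dir is a hypothetical helper name.
#   $1 = root directory to share with the container (defaults to /tmp/mnt)
prepare_mount_dir() {
    root="${1:-/tmp/mnt}"
    # -p creates missing parents and does not fail if a directory exists
    mkdir -p "$root/repofiles" "$root/rpm-gpg"
}
```

    Because of -p, running prepare_mount_dir a second time leaves the existing directories untouched.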

     

    3. Copy the host repositories files to the new directories.

        If your container repositories are already configured correctly, skip this step and set up only the Mellanox OFED repository (steps 5 and 8).

    # cp /etc/yum.repos.d/* /tmp/mnt/repofiles/

    # cp /etc/pki/rpm-gpg/RPM-GPG-KEY-* /tmp/mnt/rpm-gpg/

     

    4. Extract MLNX_OFED to the directory created above.

    # tar -zxvf MLNX_OFED_LINUX-3.1-1.0.3-rhel7.1-x86_64.tgz -C /tmp/mnt

     

    5. Download Mellanox Technologies GPG-KEY to the directory that will be mounted on the container.

    # cd /tmp/mnt/rpm-gpg/

    # wget http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox

     

    6. Run the container and mount the directory from the host inside the container.

    # docker run -t -i -v /tmp/mnt:/tmp/mnt rhel7 /bin/bash

        The -v option mounts the host /tmp/mnt directory to /tmp/mnt inside the container.

     

     

    Note: If you pulled the image with the following command in step 1, the image name is registry.access.redhat.com/rhel7/rhel (not rhel7). Use that name in the docker run command above.

    # docker pull registry.access.redhat.com/rhel7/rhel

     

    7. Copy the repo files from the host to the container (inside the container).

    (container)# cp /tmp/mnt/repofiles/* /etc/yum.repos.d/

    (container)# cp /tmp/mnt/rpm-gpg/* /etc/pki/rpm-gpg/

     

    8. Create the MLNX_OFED repository (inside the container).

        Create a yum repository configuration file called /etc/yum.repos.d/mlnx_ofed.repo with the following content:

    [mlnx_ofed]

    name=MLNX_OFED Repository

    baseurl=file:///tmp/mnt/MLNX_OFED_LINUX-3.1-1.0.3-rhel7.1-x86_64/RPMS

    enabled=1

    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Mellanox

    gpgcheck=1

     

    Note: Make sure that baseurl points to the MLNX_OFED folder you actually extracted in step 4, for example baseurl=file:///tmp/mnt/MLNX_OFED_LINUX-3.1-1.0.3-rhel7.1-x86_64/RPMS
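
    To avoid a mismatch between the extracted folder name and baseurl, the repo file can be generated from the folder path. This is a minimal sketch, assuming the folder is given as an absolute path and contains an RPMS sub-directory; write_mlnx_repo is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: generate mlnx_ofed.repo for a given extracted MLNX_OFED directory.
# write_mlnx_repo is a hypothetical helper name.
#   $1 = absolute path of the extracted MLNX_OFED directory (must contain RPMS/)
#   $2 = output repo file (defaults to /etc/yum.repos.d/mlnx_ofed.repo)
write_mlnx_repo() {
    ofed_dir="$1"
    repo_file="${2:-/etc/yum.repos.d/mlnx_ofed.repo}"
    # Refuse to write a repo file that points at a non-existent folder
    if [ ! -d "$ofed_dir/RPMS" ]; then
        echo "error: $ofed_dir/RPMS not found" >&2
        return 1
    fi
    cat > "$repo_file" <<EOF
[mlnx_ofed]
name=MLNX_OFED Repository
baseurl=file://$ofed_dir/RPMS
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Mellanox
gpgcheck=1
EOF
}
```

    For example, inside the container: write_mlnx_repo /tmp/mnt/MLNX_OFED_LINUX-3.1-1.0.3-rhel7.1-x86_64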

     

    9. Install the required packages from the MLNX_OFED repo (inside the container)

    (container)# yum install libmlx4 libibverbs libvma librdmacm libmlx5

        libnl is also needed, but usually installed by default.

     

    Note: In the current MLNX_OFED, sockperf is automatically installed with VMA. In future versions of MLNX_OFED you will also need to run yum install sockperf.

     

           If InfiniBand tools such as ibstat are needed, also install infiniband-diags:

    (container)# yum install infiniband-diags

    Note: Installing infiniband-diags pulls in many dependencies and results in a larger image.

     

    10. Save the container image.

     

    Note: Unless the MLNX_OFED directory is mounted to the container every time it starts, it is suggested to remove at least the MLNX_OFED repo file (/etc/yum.repos.d/mlnx_ofed.repo) before saving the image, to avoid yum warnings.

     

    a. Exit the container (using exit command from the container).

    (container)# exit

     

    b. Get the container ID (on the host) and save the new image with the new name.

    # docker ps -l
    CONTAINER ID    IMAGE         COMMAND         CREATED             STATUS                   PORTS               NAMES
    7ff87f6229d3    rhel7         "/bin/bash"     14 minutes ago      Exited (0) 5 seconds ago                     lonely_swartz

    # docker commit -m="Added VMA Eth" -a="user1" 7ff87f6229d3 rhel7.1-vma
    304d338bf0b8087e01ff34bf7fd668921fa045415a28f0306e5e051ed75dff8d

     

    c. Verify the new image was saved.

    # docker images

    REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE

    rhel7.1-vma         latest              304d338bf0b8        44 seconds ago      245.3 MB

    hello-world         latest              0a6ba66e537a        4 weeks ago         960 B

    rhel7               latest              82ad5fa11820        9 weeks ago         158.3 MB
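
    Steps b and c can be combined so that the container ID does not have to be copied by hand. This is a minimal sketch, assuming the container you just exited is the most recently created one on the host; commit_latest is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: commit the most recently created container as a new image.
# commit_latest is a hypothetical helper name.
#   $1 = new image name, $2 = commit message, $3 = author
commit_latest() {
    image="$1"
    message="$2"
    author="$3"
    # -l selects the latest created container, -q prints only its ID
    cid=$(docker ps -l -q)
    if [ -z "$cid" ]; then
        echo "error: no container found" >&2
        return 1
    fi
    docker commit -m "$message" -a "$author" "$cid" "$image"
}
```

    For example: commit_latest rhel7.1-vma "Added VMA Eth" user1. If other containers were created in the meantime, pass the ID to docker commit explicitly as in step b.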

     

    ulimit Considerations

    VMA requires a much higher max locked memory limit (ulimit -l) than the default. A container does not inherit the ulimits from the host (unless running in privileged mode), and changing the ulimit values within the container is not allowed. Therefore, the container needs to run with the --ulimit parameter.

    Another option is to set a default ulimit value for the Docker daemon, which containers inherit (running a container with --ulimit overrides the daemon's --default-ulimit).

     

    Run new image with --ulimit parameter

    # docker run -t -i --net=host --ulimit memlock=-1 --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm rhel7.1-vma /bin/bash

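    Since the docker run command does not accept the value unlimited, use the equivalent --ulimit memlock=-1.

    The --net=host option makes the container use the host's network stack, so the host's network interfaces are visible inside the container. The --device parameters are explained in the Verification section.

    For details on these options, see the Docker run command reference.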

     

     

    Note: Make sure to use the correct uverbs device in --device=/dev/infiniband/uverbs0. Alternatively, you can share all InfiniBand devices using --device=/dev/infiniband
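
    When the host has several uverbs devices, the --device options can be built from the contents of /dev/infiniband instead of typed one by one. This is a minimal sketch; device_args is a hypothetical helper name, and it assumes the device paths contain no spaces.

```shell
#!/bin/sh
# Sketch: print one --device option per uverbs node, plus rdma_cm.
# device_args is a hypothetical helper name.
#   $1 = device directory (defaults to /dev/infiniband)
device_args() {
    dev_dir="${1:-/dev/infiniband}"
    args=""
    for node in "$dev_dir"/uverbs* "$dev_dir"/rdma_cm; do
        # Skip unmatched glob patterns and missing nodes
        [ -e "$node" ] || continue
        # Separate options with a single space, no leading space
        args="$args${args:+ }--device=$node"
    done
    printf '%s\n' "$args"
}
```

    The output is meant to be substituted unquoted into the run command, for example: docker run -t -i --net=host --ulimit memlock=-1 $(device_args) rhel7.1-vma /bin/bash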

     

    Verify the new ulimit value is set to unlimited (inside the container)

    (container)# ulimit -l

    unlimited
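
    The same check can be scripted, for example at the top of a container entry script, so that a missing --ulimit is reported before VMA is loaded. This is a minimal sketch; memlock_is_unlimited is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether the locked-memory limit is unlimited.
# memlock_is_unlimited is a hypothetical helper name.
#   $1 = limit value to check (defaults to the current shell's ulimit -l)
memlock_is_unlimited() {
    limit="${1:-$(ulimit -l)}"
    [ "$limit" = "unlimited" ]
}

if memlock_is_unlimited; then
    echo "memlock: unlimited"
else
    echo "memlock: limited to $(ulimit -l); start the container with --ulimit memlock=-1"
fi
```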

     

    Custom Docker ulimit Daemon Options (Optional)

    Based on Control and configure Docker with systemd

    1. Create a file in the /etc/systemd/system/docker.service.d directory with the following content (create the directory if needed). The file name must end in .conf for systemd to read it.

    [Service]
    EnvironmentFile=-/etc/sysconfig/docker
    EnvironmentFile=-/etc/sysconfig/docker-storage
    EnvironmentFile=-/etc/sysconfig/docker-network
    ExecStart=
    ExecStart=/usr/bin/docker daemon -H fd:// $OPTIONS \
                         $DOCKER_STORAGE_OPTIONS \
                         $DOCKER_NETWORK_OPTIONS \
                         $BLOCK_REGISTRY \
                         $INSECURE_REGISTRY

    Some of the options above might not be used, but it is good practice to set them for future use.

     

    Note: This configuration does not affect VMA and is therefore optional.

     

    2. Create the file /etc/sysconfig/docker with the following content:

    OPTIONS="--default-ulimit memlock=-1"

    The value of OPTIONS is passed to the Docker daemon as run parameters.

     

    3. Restart Docker service and verify.

    # systemctl restart docker

    # systemctl show docker | grep EnvironmentFile

    EnvironmentFile=/etc/sysconfig/docker (ignore_errors=yes)

    EnvironmentFile=/etc/sysconfig/docker-storage (ignore_errors=yes)

    EnvironmentFile=/etc/sysconfig/docker-network (ignore_errors=yes)
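
    To confirm that the daemon default actually reaches containers, start a container without --ulimit and print its limit. This is a minimal sketch, assuming the rhel7.1-vma image saved earlier; container_memlock is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report the locked-memory limit a container gets by default.
# container_memlock is a hypothetical helper name.
#   $1 = image to start (defaults to the image saved earlier)
container_memlock() {
    image="${1:-rhel7.1-vma}"
    # No --ulimit here on purpose: the value comes from the daemon default.
    # --rm removes the temporary container when the command exits.
    docker run --rm "$image" /bin/bash -c 'ulimit -l'
}
```

    With --default-ulimit memlock=-1 in effect, container_memlock should print unlimited.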

     

    Verification

    Verify that VMA is working within the container:

    1. Run the sockperf server on a different server (we also used VMA here, but this is not required).

    # LD_PRELOAD=libvma.so sockperf server

     

    2. Run the Docker container.

    # docker run -t -i --net=host --ulimit memlock=-1 --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm rhel7.1-vma /bin/bash

    This command gives the container access to the InfiniBand device uverbs0 and the RDMA-CM device (rdma_cm), both of which VMA requires. The --ulimit option is needed only if the limit is not set as a daemon option.

     

    3. Run the sockperf client with the VMA library (inside the container).

    (container)# LD_PRELOAD=libvma.so sockperf ping-pong -i 192.168.70.8 -t 5

     

    VMA INFO   : ---------------------------------------------------------------------------

    VMA INFO   : VMA_VERSION: 7.0.7-0 Release built on 2015-09-08-13:24:18

    VMA INFO   : Cmd Line: sockperf ping-pong -i 192.168.70.8 -t 5

    VMA INFO   : OFED Version: MLNX_OFED_LINUX-3.1-1.0.3:

    VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]

    VMA INFO   : ---------------------------------------------------------------------------

    sockperf: == version #2.5.254 ==

    sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

     

    [ 0] IP = 192.168.70.8    PORT = 11111 # UDP

    sockperf: Warmup stage (sending a few dummy messages)...

    sockperf: Starting test...

    sockperf: Test end (interrupted by timer)

    sockperf: Test ended

    sockperf: [Total Run] RunTime=5.100 sec; SentMessages=2037574; ReceivedMessages=2037573

    sockperf: ========= Printing statistics for Server No: 0

    sockperf: [Valid Duration] RunTime=5.000 sec; SentMessages=2001378; ReceivedMessages=2001378

    sockperf: ====> avg-lat=  1.229 (std-dev=0.678)

    sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0

    sockperf: Summary: Latency is 1.229 usec

    sockperf: Total 2001378 observations; each percentile contains 20013.78 observations

    sockperf: ---> <MAX> observation =   17.133

    sockperf: ---> percentile  99.99 =    3.959

    sockperf: ---> percentile  99.90 =    3.079

    sockperf: ---> percentile  99.50 =    2.735

    sockperf: ---> percentile  99.00 =    2.484

    sockperf: ---> percentile  95.00 =    1.742

    sockperf: ---> percentile  90.00 =    1.328

    sockperf: ---> percentile  75.00 =    1.200

    sockperf: ---> percentile  50.00 =    1.156

    sockperf: ---> percentile  25.00 =    1.129

    sockperf: ---> <MIN> observation =    1.058

    Note: This is not a fully optimized VMA run, just a sanity check. VMA has many optimization options, but they are outside the scope of this post.

    Note: The results above are similar to the sockperf results over VMA when the same command is run outside the container.
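
    To compare runs inside and outside the container, the average latency can be pulled out of the sockperf output. This is a minimal sketch, assuming the avg-lat line has the format shown above and that sockperf writes its statistics to standard output (add 2>&1 before the pipe otherwise); avg_latency is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the average latency (usec) from sockperf output on stdin.
# avg_latency is a hypothetical helper name.
avg_latency() {
    # Print only the number that follows "avg-lat=", ignoring other lines
    sed -n 's/.*avg-lat=[[:space:]]*\([0-9.][0-9.]*\).*/\1/p'
}
```

    For example: LD_PRELOAD=libvma.so sockperf ping-pong -i 192.168.70.8 -t 5 | avg_latency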