Reference Deployment Guide for RDMA over Ethernet (RoCE) accelerated Caffe2 with an NVIDIA GPU Card over Mellanox 100 GbE Network

Version 3

    In this document we demonstrate a distributed deployment procedure for RDMA-accelerated Caffe2 over a Mellanox end-to-end 100 Gb/s Ethernet solution.

    This document describes the process of building Caffe2 from source on Ubuntu 16.04.2 LTS across four physical servers.

    We will show how to install and update the NVIDIA drivers, the NVIDIA CUDA Toolkit, the NVIDIA CUDA® Deep Neural Network library (cuDNN), and the Mellanox software and hardware components.







    What is Caffe2?

    Caffe2 is a deep learning framework that provides an easy and straightforward way for you to experiment with deep learning and leverage community contributions of new models and algorithms. You can bring your creations to scale using the power of GPUs in the cloud, or to the masses on mobile, with Caffe2's cross-platform libraries. Caffe2 supports CUDA 8.0 and cuDNN 6.0 (registration required). In this guide we build Caffe2 from source, following the instructions on the Caffe2 website. In order to use Caffe2 with GPU support, you must have an NVIDIA GPU with a minimum compute capability of 3.0.


    Mellanox’s Machine Learning

    Mellanox solutions accelerate many of the world's leading artificial intelligence and machine learning platforms and a wide range of applications, ranging from security, finance, and image and voice recognition to self-driving cars and smart cities. Mellanox solutions enable companies and organizations such as Baidu, NVIDIA, Facebook, PayPal and more to leverage machine learning platforms to enhance their competitive advantage.

    In this post we will show how to build a highly efficient machine learning cluster enhanced by native RDMA over a 100 Gb/s network.


    Setup Overview

    Before you start, make sure you are familiar with distributed training in Caffe2; see the following link for more information.
    In the distributed Caffe2 configuration described in this guide, we use the following hardware specification.




    This document does not cover the servers' storage aspect. You should configure the servers with the storage components appropriate to your use case (data set size).

    Setup Logical Design

    Server Wiring

    If you have a dual-port NIC, you should disable one port.
    Due to certain limitations in the current Caffe2 version, you may face issues if both ports are enabled.

    In our reference setup, we wire the first port to the switch and disable the second port.

    We cover this procedure later, in the Installing Mellanox OFED section.


    Server Block Diagram


    Network Configuration

    Each server is connected to the SN2700 switch by a 100 Gb Ethernet copper cable. The switch port connectivity in our case is as follows:

    • Ports 1-4 – connected to the node servers

    The server names and network configuration are provided below.

    Server type      Server name   Internal network   External network
    Node Server 01   clx-mld-41    enp5s0f0:          From DHCP (reserved)
    Node Server 02   clx-mld-42    enp5s0f0:          From DHCP (reserved)
    Node Server 03   clx-mld-43    enp5s0f0:          From DHCP (reserved)
    Node Server 04   clx-mld-44    enp5s0f0:          From DHCP (reserved)
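To make the node names resolvable from every server, you can generate matching /etc/hosts entries. A minimal sketch: the 192.168.1.4x addresses below are hypothetical placeholders; substitute the internal addresses reserved on your DHCP server.

```shell
# Generate /etc/hosts entries for the four node servers.
# NOTE: the 192.168.1.4x addresses are hypothetical placeholders.
hosts=$(for i in 1 2 3 4; do
  printf '192.168.1.4%s  clx-mld-4%s\n' "$i" "$i"
done)
echo "$hosts"
```

Append the generated lines to /etc/hosts on each node once you have filled in the real addresses.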

    Deployment Guide


    Required Software

    Prior to installing Caffe2, the following software must be installed.


    Disable the Nouveau Kernel Driver


    Prior to installing the latest NVIDIA drivers and CUDA on Ubuntu 16.04, the Nouveau kernel driver must be disabled. To disable it, follow the procedure below.


    1. Check that the Nouveau kernel driver is loaded.
      $ lsmod | grep nouveau
    2. Remove all NVIDIA packages.

      Skip this step if your system is freshly installed.
      $ sudo apt-get remove nvidia* && sudo apt autoremove
    3. Install the packages below for the build kernel.

      $ sudo apt-get install dkms build-essential linux-headers-generic
    4. Block and disable the Nouveau kernel driver.
      $ sudo vim /etc/modprobe.d/blacklist.conf
    5. Insert the following lines into the blacklist.conf file.
      blacklist nouveau
      blacklist lbm-nouveau
      options nouveau modeset=0
      alias nouveau off
      alias lbm-nouveau off
    6. Disable the Nouveau kernel module and update the initramfs image.  (Although the nouveau-kms.conf file may not exist, it will not affect this step).
      $ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
      $ sudo update-initramfs -u
    7. Reboot
      $ sudo reboot
    8. Check that the Nouveau kernel driver is not loaded (the command should print nothing).
      $ lsmod |grep nouveau
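Steps 4-5 can also be scripted non-interactively with a heredoc. The sketch below writes to a temporary file so it can be tried safely; on the real system the target would be /etc/modprobe.d/blacklist.conf.

```shell
# Write the Nouveau blacklist entries (to a temp file here, for safety;
# replace "$conf" with /etc/modprobe.d/blacklist.conf on the real system).
conf=$(mktemp)
cat >> "$conf" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
grep -c '^blacklist' "$conf"   # -> 2
```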


    Install General Dependencies

    1. Update the package index.
      $ sudo apt-get update
    2. To install Caffe2, you must install the following packages:
      • python-dev: enables adding C extensions to Python
      • python-pip: enables installing and managing Python packages

    To install these packages for Python 2.7, run:

    $ sudo apt-get install -y --no-install-recommends build-essential cmake git libgoogle-glog-dev libprotobuf-dev protobuf-compiler python-dev python-pip
    $ sudo pip install numpy protobuf

    Install Optional Dependencies

    1. To install the optional dependencies, run the commands below:
      $ sudo apt-get install -y --no-install-recommends libgflags-dev
      $ sudo apt-get install -y --no-install-recommends libgtest-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libopenmpi-dev libsnappy-dev openmpi-bin openmpi-doc python-pydot
      $ sudo pip install flask future graphviz hypothesis jupyter matplotlib pydot python-nvd3 pyyaml requests scikit-image scipy setuptools six tornado


    Update Ubuntu Software Packages

    To update/upgrade Ubuntu software packages, run the commands below.

    $ sudo apt-get update            # Fetches the list of available updates
    $ sudo apt-get upgrade -y        # Strictly upgrades the current packages


    Install the NVIDIA Drivers


    The 367 (or later) NVIDIA drivers must be installed. You can install them using Ubuntu's built-in Additional Drivers tool after updating the driver packages, or manually as described below.

    1. Go to the NVIDIA website.
    2. Download the latest version of the driver. The example below uses a Linux 64-bit driver (NVIDIA-Linux-x86_64-375.51).
    3. Exit the GUI (as the drivers for graphic devices are running at a low level).


      $ sudo service lightdm stop


    4. Set the RunLevel to 3 with the program init.
      $ sudo init 3
    5. Once the download completes, install the driver package:
      $ sudo dpkg -i nvidia-driver-local-repo-ubuntu1604_375.51-1_amd64.deb
      $ sudo apt-get update
      $ sudo apt-get install cuda-drivers
      During the run, you will be asked to confirm several prompts, such as a pre-install script failure warning, missing 32-bit libraries, and more.
    6. Once the drivers are installed, restart your computer.
      $ sudo reboot


    Verify the Installation

    Make sure the NVIDIA driver works correctly with the installed GPU card.

    $ lsmod |grep nvidia



    Run the nvidia-debugdump utility to collect internal GPU information.

    $ nvidia-debugdump -l

    Run the nvidia-smi utility to check the NVIDIA System Management Interface.

    $ nvidia-smi

    Enable the Subnet Manager (SM) on the IB Switch

    There are three options to select the best place to locate the SM:

    1. Enable the SM on one of the managed switches. This is a very convenient and quick operation, and makes InfiniBand essentially 'plug & play'.
    2. Run /etc/init.d/opensmd on one or more servers. It is recommended to run the SM on a server when there are 648 nodes or more.
    3. Use a dedicated Unified Fabric Management (UFM®) Appliance server. UFM offers much more than the SM and needs more compute power than the switches have, but it does not require an expensive server. It does, however, represent the additional cost of a dedicated server.

    We'll explain options 1 and 2 only.

    Option 1: Configuring the SM on a switch running MLNX-OS® (all Mellanox switch systems)
    To enable the SM on one of the managed switches, follow the steps below.

    1. Log in to the switch and enter configuration mode:
      Mellanox MLNX-OS Switch Management

      switch login: admin
      Last login: Wed Aug 12 23:39:01 on ttyS0

      Mellanox Switch

      switch [standalone: master] > enable
      switch [standalone: master] # conf t
      switch [standalone: master] (config)#
    2. Run the command:
      switch [standalone: master] (config)#ib sm
      switch [standalone: master] (config)#
    3. Check if the SM is running. Run:

      switch [standalone: master] (config)#show ib sm
      switch [standalone: master] (config)#

    To save the configuration (permanently), run:

    switch (config) # configuration write



    Option 2: Configuring the SM on a Server (skip this procedure if you enabled the SM on the switch)

    To start up OpenSM on a server, simply run opensm from the command line on your management node by typing:

    # opensm


    To start OpenSM automatically on the head node, edit the /etc/opensm/opensm.conf file.

    Create a configuration file by running:

    # opensm --config /etc/opensm/opensm.conf

    Edit /etc/opensm/opensm.conf file with the following line:


    Upon initial installation, OpenSM is configured and running with a default routing algorithm. When running a multi-tier fat-tree cluster, it is recommended to change the following options to create the most efficient routing algorithm delivering the highest performance:


    For full details on other configurable attributes of OpenSM, see the “OpenSM – Subnet Manager” chapter of the Mellanox OFED for Linux User Manual.
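As an illustration of such tuning (an assumption based on common fat-tree deployments, not taken from this guide; verify the exact option names and values against your MLNX_OFED manual), the relevant opensm.conf entry typically looks like:

```
# Use the fat-tree routing engine instead of the default min-hop
routing_engine ftree
```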


    Installing Mellanox OFED for Ubuntu

    This chapter describes how to install and test the Mellanox OFED for Linux package on a single host machine with Mellanox ConnectX®-5 adapter card installed. For more information click on Mellanox OFED for Linux User Manual.


    Downloading Mellanox OFED

    1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed.
      # lspci -v | grep Mellanox
      The following example shows a system with an installed Mellanox HCA:
    2. Download the image according to your OS to your host.
      The image name has the format MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso. You can download it from the Mellanox website: Products > Software > InfiniBand/VPI Drivers > Mellanox OFED Linux (MLNX_OFED) > Download.

    3. Use the MD5SUM utility to confirm the downloaded file’s integrity. Run the following command and compare the result to the value provided on the download page.


      # md5sum MLNX_OFED_LINUX-<ver>-<OS label>.tgz
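The comparison against the published checksum can be scripted. A minimal sketch: the "expected" value below is the MD5 of empty input, used only so the example is self-contained; on the real system, paste the checksum from the Mellanox download page and point md5sum at the downloaded file.

```shell
# Compare a computed MD5 against an expected value.
# "expected" here is the MD5 of empty input, purely for illustration.
expected="d41d8cd98f00b204e9800998ecf8427e"
actual=$(printf '' | md5sum | awk '{print $1}')
if [ "$actual" = "$expected" ]; then echo MATCH; else echo MISMATCH; fi
```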


    Installing Mellanox OFED

    MLNX_OFED is installed by running the mlnxofedinstall script. The installation script performs the following:

    • Discovers the currently installed kernel
    • Uninstalls any software stacks that are part of the standard operating system distribution or another vendor's commercial stack
    • Installs the MLNX_OFED_LINUX binary RPMs (if they are available for the current kernel)
    • Identifies the currently installed InfiniBand and Ethernet network adapters and automatically upgrades the firmware

    The installation script removes all previously installed Mellanox OFED packages and re-installs from scratch. You will be prompted to acknowledge the deletion of the old packages.

    1. Log into the installation machine as root.
    2. Copy the downloaded package to /tmp.
    3. Extract the package and enter the resulting directory.

      # cd /tmp

      # tar -xzvf MLNX_OFED_LINUX-4.2-

      # cd MLNX_OFED_LINUX-4.2-

    4. Run the installation script.
      # ./mlnxofedinstall --all --force
    5. Restart the openibd driver.
    6. Reboot after the installation has finished successfully.

      # /etc/init.d/openibd restart

      # reboot

      By default, both ConnectX®-5 VPI ports are initialized as InfiniBand ports.

    7. Disable the unused second port on the device.
      Identify the PCI IDs of your NIC ports:

      # lspci | grep Mellanox

      05:00.0 Infiniband controller: Mellanox Technologies Device 1019

      05:00.1 Infiniband controller: Mellanox Technologies Device 1019

      Disable the second port:
      # echo 0000:05:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
    8. Check whether the port mode is InfiniBand or Ethernet.
      # ibv_devinfo

    9. If the ports are reported in InfiniBand mode, you need to change the port type to Ethernet.

      ConnectX®-5 ports can be individually configured to work as InfiniBand or Ethernet ports. Change the mode to Ethernet using the mlxconfig script after the driver is loaded.
      * LINK_TYPE_P1=2 selects Ethernet mode (LINK_TYPE_P1=1 is InfiniBand mode).
      a. Start mst and list the port names:
      # mst start
      # mst status

      b. Change the mode of the first port to Ethernet:

      # mlxconfig -d /dev/mst/mt4121_pciconf0 s LINK_TYPE_P1=2
      # Port 1 set to Ethernet mode
      # reboot

      After each reboot, you need to disable the second port again.
      c. Query the InfiniBand devices and print the information available from userspace:


      # ibv_devinfo


    10. Run the ibdev2netdev utility to see all the associations between the Ethernet devices and the IB devices/ports.

      # ibdev2netdev

      # ifconfig enp5s0f0 netmask

    11. Add the lines below to the /etc/network/interfaces file, after the existing eno1 stanza:

      # vim /etc/network/interfaces

      auto eno1
      iface eno1 inet dhcp

      The new lines:

      auto enp5s0f0
      iface enp5s0f0 inet static
    12. Check that the network configuration is set correctly.
      # ifconfig -a
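Because the port unbind from step 7 does not persist across reboots, one option is to run it from a boot script. This is a sketch, not part of the original procedure: it writes to a temporary file for safety, whereas on the real system the target would be /etc/rc.local, with the PCI address matching your lspci output.

```shell
# Create a boot script that unbinds the second port at startup.
# Writing to a temp file here; on the real system use /etc/rc.local.
rc=$(mktemp)
cat > "$rc" <<'EOF'
#!/bin/sh
# Unbind the unused second NIC port (adjust the PCI address to your system)
echo 0000:05:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
exit 0
EOF
chmod +x "$rc"
head -1 "$rc"
```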


    Install NVIDIA CUDA Toolkit 8.0 and cuDNN


    Pre-installation Actions

    The following actions must be taken before installing the CUDA Toolkit and Driver on Linux:

    • Verify the system has a CUDA-capable GPU
    • Verify the system is running a supported version of Linux
    • Verify the system has gcc installed
    • Verify the system has the correct kernel headers and development packages installed
    • Download the NVIDIA CUDA Toolkit
    • Handle conflicting installation methods

    You can override the install-time prerequisite checks by running the installer with the "--override" flag. Remember that the prerequisites will still be required to use the NVIDIA CUDA Toolkit.


    Verify You Have a CUDA-Capable GPU

    To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

    $ lspci | grep -i nvidia

    If the command does not list your device, update the PCI hardware database that Linux maintains by entering "update-pciids" (generally found in /sbin) at the command line, and rerun the previous lspci command.

    If your graphics card is from NVIDIA and is listed there, your GPU is CUDA-capable.

    The Release Notes for the CUDA Toolkit also contain a list of supported products.


    Verify You Have a Supported Linux Version

    The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes.

    To determine which distribution and release number you are running, type the following at the command line:

    $ uname -m && cat /etc/*release

    You should see output similar to the following, modified for your particular system:



    x86_64
    Ubuntu 16.04.2 LTS

    The x86_64 line indicates you are running on a 64-bit system. The remainder gives information about your distribution.


    Verify the System Has a gcc Compiler Installed

    The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly.
    To verify the version of gcc installed on your system, type the following on the command line:

    $ gcc --version

    You should see output similar to the following, modified for your particular system:

    gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

    If an error message is displayed, you need to install the “development tools” from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.


    Verify the System has the Correct Kernel Headers and Development Packages Installed

    The CUDA Driver requires the kernel headers and development packages for the running kernel version to be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

    While the Runfile installation performs no package validation, the RPM and DEB installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

    The version of the kernel your system is running can be found by running the following command:

    $ uname -r

    This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers. This command will be used multiple times below to specify the version of the packages to install. Note that below are the common-case scenarios for kernel usage. More advanced cases, such as custom kernel branches, should ensure that their kernel headers and sources match the kernel build they are running.

    The kernel headers and development packages for the currently running kernel can be installed with:

    $ sudo apt-get install linux-headers-$(uname -r)
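A quick way to confirm the match is to derive the package name from the running kernel and check it against the package database. A sketch for Debian/Ubuntu; the dpkg check is skipped when dpkg is unavailable:

```shell
# Derive the headers package name for the running kernel.
pkg="linux-headers-$(uname -r)"
echo "$pkg"
# If dpkg is available, check whether that exact package is installed.
if command -v dpkg >/dev/null 2>&1; then
  dpkg -s "$pkg" >/dev/null 2>&1 && echo "installed" || echo "not installed"
fi
```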


    Installation Process

    1. Download the base installation .run file from the NVIDIA CUDA website.
    2. Create an account if you do not already have one, and log in (an account is also required to download cuDNN).
    3. Choose Linux > x86_64 > Ubuntu > 16.04 > runfile (local) and download the base installer and the patch.
      Make sure you answer yes when asked about creating a symbolic link to your CUDA directory.
      $ cd /root # or the directory to which you downloaded the file
      $ sudo sh --override # hold 's' to skip reading the EULA
    4. Install CUDA into /usr/local/cuda, answering the installer prompts as follows.
      Do you accept the previously read EULA?
      accept/decline/quit: accept
      Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
      (y)es/(n)o/(q)uit: N
      Install the CUDA 8.0 Toolkit?
      (y)es/(n)o/(q)uit: Y
      Enter Toolkit Location[ default is /usr/local/cuda-8.0 ]: Enter
      Do you want to install a symbolic link at /usr/local/cuda?
      (y)es/(n)o/(q)uit: Y
      Install the CUDA 8.0 Samples?
      (y)es/(n)o/(q)uit: Y
      Enter CUDA Samples Location[ default is /root ]: Enter
      Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...

    To install cuDNN, download cuDNN v6.0 for CUDA 8.0 from the NVIDIA website and extract it into /usr/local/cuda via:

    $ tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz

    $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

    Post-installation Actions


    Mandatory Actions

    Some actions must be taken after the installation before the CUDA Toolkit and Driver can be used.


       Environment Setup
    • The “PATH” variable needs to include /usr/local/cuda-8.0/bin
    • In addition, when using the .run file installation method, the “LD_LIBRARY_PATH” variable needs to contain /usr/local/cuda-8.0/lib64.
    • Update your bash file.
      $ vim ~/.bashrc
      This will open your bash file in a text editor which you will scroll to the bottom and add these lines:
      export CUDA_HOME=/usr/local/cuda-8.0
      export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
      export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    • Once you save and close the text file, you can return to your original terminal and type this command to reload your .bashrc file.
      $ source ~/.bashrc
    • Check that the paths have been properly modified.

      $ echo $CUDA_HOME
      $ echo $PATH
      $ echo $LD_LIBRARY_PATH

    • Set the “LD_LIBRARY_PATH” and “CUDA_HOME” environment variables. Consider adding the commands below to your ~/.bash_profile. These assume your CUDA installation is in /usr/local/cuda-8.0.
      $ vim ~/.bash_profile
      This will open your file in a text editor which you will scroll to the bottom and add these lines:
      export CUDA_HOME=/usr/local/cuda-8.0
      export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
      export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
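The variable setup above can be sanity-checked with a small snippet. In this sketch we export the variables into the current shell first, so it is self-contained; it assumes the default /usr/local/cuda-8.0 prefix.

```shell
# Export the CUDA paths, then verify PATH actually contains the bin dir.
export CUDA_HOME=/usr/local/cuda-8.0
export PATH=$CUDA_HOME/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
case ":$PATH:" in
  *":$CUDA_HOME/bin:"*) echo "PATH ok" ;;
  *)                    echo "PATH missing CUDA" ;;
esac
```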


    Other actions are recommended to verify the integrity of the installation.

    • Install Writable Samples
      In order to modify, compile, and run the samples, the samples must be installed with “write” permissions. A convenience installation script is provided:
      $ ~
      This script is installed with the cuda-samples-8-0 package. The cuda-samples-8-0 package installs only a read-only copy in /usr/local/cuda-8.0/samples.
    • Verify the Installation
      Before continuing, it is important to verify that the CUDA Toolkit can find and communicate correctly with the CUDA-capable hardware. To do this, you need to compile and run some of the included sample programs.
      Ensure the PATH and, if using the .run file installation method, the LD_LIBRARY_PATH variables are set correctly. See the section Mandatory Actions.
    • Verify the Driver Version
      * If you installed the driver, verify that the correct version of it is loaded.
      * If you did not install the driver, or are using an operating system where the driver is not loaded via a kernel module, such as L4T, skip this step.
      When the driver is loaded, the driver version can be found by executing the following command.
      $ cat /proc/driver/nvidia/version
      You should see output similar to the following:
      NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.20  Tue Nov 15
      16:49:10 PST 2016
      GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
    • Compiling the Examples
      The version of the CUDA Toolkit can be checked by running "nvcc -V" in a terminal window. The "nvcc" command runs the compiler driver that compiles CUDA programs. It calls the "gcc" compiler for C code and the NVIDIA PTX compiler for the CUDA code.
      $ nvcc -V
      You should see output similar to the following:
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2016 NVIDIA Corporation
      Built on Tue_Jan_10_13:22:03_CST_2017
      Cuda compilation tools, release 8.0, V8.0.61
      The NVIDIA CUDA Toolkit includes sample programs in source form. You should compile them by changing to ~/NVIDIA_CUDA-8.0_Samples and typing make. The resulting binaries will be placed under ~/NVIDIA_CUDA-8.0_Samples/bin.
      $ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/
      $ make
      $ cd ~/NVIDIA_CUDA-8.0_Samples

      $ ./bin/x86_64/linux/release/deviceQuery
    • Running the Binaries
      After the compilation, find and run deviceQuery under ~/NVIDIA_CUDA-8.0_Samples. If the CUDA software is installed and configured correctly, the output should look similar to the below.

      The exact appearance and the output lines might be different on your system. The important outcomes are that a device was found (the first highlighted line), that the device matches the one on your system (the second highlighted line), and that the test passed (the final highlighted line).
      If a CUDA-capable device and the CUDA Driver are installed but the deviceQuery reports that no CUDA-capable devices are present, this likely means that the /dev/nvidia* files are missing or have the wrong permissions.
      Running the bandwidthTest program ensures that the system and the CUDA-capable device are able to communicate correctly. Its output is shown below.

      $ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest/

      $ make

      $ cd ~/NVIDIA_CUDA-8.0_Samples

      $ ./bin/x86_64/linux/release/bandwidthTest

    Note that the measurements for your CUDA-capable device description will vary from system to system. The important point is that you obtain measurements, and that the second-to-last line confirms that all necessary tests passed.
    Should the tests not pass, make sure you have a CUDA-capable NVIDIA GPU on your system and make sure it is properly installed.
    If you run into difficulties with the link step (such as libraries not being found), consult the Linux Release Notes found in the doc folder in the CUDA Samples directory.


    Installing Caffe2


    Clone the Caffe2 Repository

    To clone the latest Caffe2 repository, issue the following commands:

    $ cd ~
    $ git clone --recursive

    The preceding git clone command creates a subdirectory called "caffe2". After cloning, you may optionally check out a specific commit (such as a stable version) by invoking the following commands:

    $ cd caffe2
    $ git checkout 91f63a2361fb8671e103a8d5601adec8354299b5    # the desired commit (stable version)

    $ git submodule sync --recursive

    $ git submodule update --init --recursive

    For ibverbs support, change USE_IBVERBS_DEFAULT to ON in caffe2/third_party/gloo/CMakeLists.txt:

    $ vim third_party/gloo/CMakeLists.txt

    # Option defaults (so they can be overwritten before declaring the option)
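Instead of editing the file by hand, the default can be flipped with sed. The sketch below demonstrates this on a stand-in temporary file, since the exact form of the option line in your checkout may differ; inspect it first, then point the sed command at third_party/gloo/CMakeLists.txt.

```shell
# Demonstrate the sed edit on a stand-in file; on the real tree, run it
# against third_party/gloo/CMakeLists.txt after confirming the line format.
f=$(mktemp)
echo 'set(USE_IBVERBS_DEFAULT OFF)' > "$f"   # hypothetical form of the line
sed -i 's/USE_IBVERBS_DEFAULT OFF/USE_IBVERBS_DEFAULT ON/' "$f"
cat "$f"
```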




    Build Caffe2

    $ mkdir build
    $ cd build
    $ cmake .. -DUSE_IBVERBS=1
    $ make -j 400

    Validate Caffe2 Installation

    To validate the Caffe2 installation run the following commands:

    $ python -m caffe2.python.operator_test.relu_op_test


    Distributed Caffe2 run - sample

    To run distributed Caffe2, we use HPC-X or Open MPI. Please see here how to install HPC-X.

    We use a custom imagenet_cars_boats dataset in our runs.

    The mpirun command here is only used to launch the processes across the cluster; in fact, you can start each process manually on each node.

    Then run the MPI command line. It is long, but easy to understand.


    $ bs=64 if=mlx5_0 tr=ibverbs traindata="/cfdata/imagenet_cars_boats_train" testdata="/cfdata/imagenet_cars_boats_val" filestore="/tmp"; \
      mpirun \
        -x PYTHONPATH=/caffe2/build -host -n 1 python --train_data $traindata --test_data $testdata --num_gpus 4 --batch_size $bs --num_shards=4 --shard_id=0 --run_id=1234 --file_store_path $filestore --distributed_transport=$tr --distributed_interface=$if : \
        -x PYTHONPATH=/root/caffe2.pz/build -host -n 1 python --train_data $traindata --test_data $testdata --num_gpus 4 --batch_size $bs --num_shards=4 --shard_id=1 --run_id=1234 --file_store_path $filestore --distributed_transport=$tr --distributed_interface=$if : \
        -x PYTHONPATH=/caffe2/build -host -n 1 python --train_data $traindata --test_data $testdata --num_gpus 4 --batch_size $bs --num_shards=4 --shard_id=2 --run_id=1234 --file_store_path $filestore --distributed_transport=$tr --distributed_interface=$if : \
        -x PYTHONPATH=/caffe2/build -host -n 1 python --train_data $traindata --test_data $testdata --num_gpus 4 --batch_size $bs --num_shards=4 --shard_id=3 --run_id=1234 --file_store_path $filestore --distributed_transport=$tr --distributed_interface=$if
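To see the structure of that command line, note that it is four nearly identical ranks differing only in shard_id (and, on one node, PYTHONPATH). The sketch below just prints the per-shard trainer arguments; the trainer script name is elided in the original command, so a "<trainer>.py" placeholder stands in for it.

```shell
# Print the per-shard argument sets used by the mpirun command above.
# "<trainer>.py" is a placeholder; the actual script name is not shown
# in the original command line.
bs=64; ifc=mlx5_0; tr=ibverbs
cmds=$(for shard in 0 1 2 3; do
  echo "python <trainer>.py --num_gpus 4 --batch_size $bs --num_shards=4 --shard_id=$shard --run_id=1234 --distributed_transport=$tr --distributed_interface=$ifc"
done)
echo "$cmds"
```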


    Done!