Reference Deployment Guide for RDMA over Ethernet (RoCE) accelerated TensorFlow 1.3 GA with an NVIDIA GPU Card over Mellanox 100 GbE Network

Version 22

    In this document, we demonstrate a distributed deployment of RoCE-accelerated TensorFlow over a Mellanox end-to-end 100 Gb/s Ethernet solution.

    This document describes the process of building TensorFlow from source for Ubuntu 16.04.2 LTS on five physical servers (4 Worker nodes and 1 dedicated Parameter Server).

    We will show how to update and install the NVIDIA drivers, the NVIDIA CUDA Toolkit, and the NVIDIA CUDA® Deep Neural Network library (cuDNN), and how to install Bazel and the Mellanox software and hardware components.


    This document is preliminary and subject to change.





    What is TensorFlow?

    TensorFlow is an open source software library developed by the Google Brain team for conducting machine learning and deep neural network research. The library performs numerical computation using data flow graphs, where the nodes in the graph represent mathematical operations and the graph edges represent the multidimensional data arrays (tensors) communicated between the nodes. TensorFlow supports CUDA 8.0 and cuDNN 6.0 (cuDNN requires registration); in this guide we install TensorFlow by building it from source. In order to use TensorFlow with GPU support, you must have an NVIDIA GPU with a minimum compute capability of 3.0.


    Mellanox’s Machine Learning

    Mellanox solutions accelerate many of the world’s leading artificial intelligence and machine learning platforms and a wide range of applications, ranging from security, finance, and image and voice recognition to self-driving cars and smart cities. Mellanox solutions enable companies and organizations such as Baidu, NVIDIA, Facebook, PayPal and more to leverage machine learning platforms to enhance their competitive advantage.

    In this post we will show how to build a highly efficient machine learning cluster, accelerated by RoCE, over a 100 Gb/s Ethernet network.


    Setup Overview

    Before you start, familiarize yourself with the distributed TensorFlow architecture; see the Glossary in Distributed TensorFlow for more information.

    In the distributed TensorFlow configuration described in this guide, we are using the following hardware specification.



    You can remove the dedicated Parameter Server (PS) from the setup if you plan to run the PS and a Worker on the same servers.


    This document does not cover the servers’ storage aspect. You should configure the servers with the storage components appropriate to your use case (data set size).



    Setup Logical Design




    Server Wiring


    If you have a dual-port NIC, you should disable one of the ports.

    Due to limitations in the current TensorFlow version, you may face issues if both ports are enabled.

    In our reference setup we wire the 1st port to the Ethernet switch and disable the 2nd port.

    We cover the procedure later, in the Installing Mellanox OFED section.

    Server Block Diagram

    Network Configuration

    Each server is connected to the SN2700 switch by a 100GbE copper cable.

    The switch port connectivity in our case is as follows:

    • 1st–4th ports – connected to the Worker Servers
    • 5th port – connected to the dedicated Parameter Server


    The server names and network configuration are provided below.

    Server type      | Server name | IP and NICs (internal network) | IP and NICs (external network)
    -----------------|-------------|--------------------------------|-------------------------------
    Parameter Server |             |                                | eno1: From DHCP (reserved)
    Worker Server 01 |             |                                | eno1: From DHCP (reserved)
    Worker Server 02 |             |                                | eno1: From DHCP (reserved)
    Worker Server 03 |             |                                | eno1: From DHCP (reserved)
    Worker Server 04 |             |                                | eno1: From DHCP (reserved)



    Deployment Guide




    Required Software

    Prior to installing TensorFlow, the following software must be installed.


    Disable a Nouveau kernel Driver

    Skip this procedure if you are installing the driver without GPU support (i.e., for the dedicated Parameter Server).

    Prior to installing the latest NVIDIA drivers and CUDA on Ubuntu 16.04, the Nouveau kernel driver must be disabled. To disable it, follow the procedure below.


    1. Check that the Nouveau kernel driver is loaded.
      $ lsmod |grep nouv
    2. Remove all NVIDIA packages.
      Skip this step if your system is freshly installed.
      $ sudo apt-get remove nvidia* && sudo apt autoremove
    3. Install the packages below, which are required for building kernel modules.
      $ sudo apt-get install dkms build-essential linux-headers-generic
    4. Block and disable the Nouveau kernel driver.
      $ sudo vim /etc/modprobe.d/blacklist.conf
    5. Insert the following lines into the blacklist.conf file.
      blacklist nouveau
      blacklist lbm-nouveau
      options nouveau modeset=0
      alias nouveau off
      alias lbm-nouveau off
    6. Disable the Nouveau kernel module and update the initramfs image.  (Although the nouveau-kms.conf file may not exist, it will not affect this step).

      $ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

      $ sudo update-initramfs -u

    7. Reboot
      $ sudo reboot
    8. Check that the Nouveau kernel driver is not loaded.
      $ lsmod |grep nouveau

    Install General Dependencies

    1. To install the general dependencies, run the commands below (or paste each line separately).
      $ sudo apt-get install openjdk-8-jdk git build-essential python-virtualenv swig python-wheel libcupti-dev
    2. To install TensorFlow, you must install the following packages:
      • Numpy: A numerical processing package that TensorFlow requires
      • dev: Enables adding extensions to Python
      • pip: Enables installing and managing of certain Python packages
      • wheel: Enables management of Python compressed packages in the wheel (.whl) format

    To install these packages for Python 2.7, run:

    $ sudo apt-get install python-numpy python-dev python-pip python-wheel


    Update Ubuntu Software Packages

    To update and upgrade the Ubuntu software packages, run the commands below.

    $ sudo apt-get update            # Fetches the list of available updates

    $ sudo apt-get upgrade -y        # Upgrades the currently installed packages


    Install the NVIDIA Drivers

    Skip this procedure if you are installing the driver without GPU support (for the dedicated Parameter Server).

    The 367 (or later) NVIDIA drivers must be installed. To install them, you can use Ubuntu’s built-in Additional Drivers tool after updating the driver packages.

    1. Go to the NVIDIA website.
    2. Download the latest version of the driver. The example below uses a Linux 64-bit driver (NVIDIA-Linux-x86_64-375.51).
    3. Exit the GUI (the drivers for graphics devices run at a low level).
      $ sudo service lightdm stop
    4. Set the runlevel to 3 with the init program.
      $ sudo init 3
    5. Once the download completes, follow the steps listed below.
      $ sudo dpkg -i nvidia-driver-local-repo-ubuntu1604_375.51-1_amd64.deb
      $ sudo apt-get update
      $ sudo apt-get install cuda-drivers
      During the run, you will be asked to confirm several prompts (for example, continuing despite a pre-install script failure, or proceeding without 32-bit libraries).
    6. Once the driver is installed, restart your computer.
      $ sudo reboot


    Verify the Installation

    Make sure the NVIDIA driver can work correctly with the installed GPU card.

    $ lsmod |grep nvidia



    Run the nvidia-debugdump utility to collect internal GPU information.

    $ nvidia-debugdump -l

    Run the nvidia-smi utility to check the NVIDIA System Management Interface.

    $ nvidia-smi


    Network Switch Configuration


    Refer to the MLNX-OS User Manual to become familiar with the switch software.
    Before starting to use the Mellanox switch, we recommend that you upgrade the switch to the latest MLNX-OS version.


    Flow control is required when running RoCE.
    On the Mellanox switch, run the following commands to enable flow control on the switch ports (all ports connected to nodes in this example):

    switch (config) # interface ethernet 1/1-1/5 flowcontrol receive on force
    switch (config) # interface ethernet 1/1-1/5 flowcontrol send on force

    To save the configuration (permanently), run:

    switch (config) # configuration write


    Installing Mellanox OFED for Ubuntu

    This chapter describes how to install and test the Mellanox OFED for Linux package on a single host machine with Mellanox ConnectX®-5 adapter card installed. For more information click on Mellanox OFED for Linux User Manual.


    Downloading Mellanox OFED

    1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed.
      # lspci -v | grep Mellanox
      The following example shows a system with an installed Mellanox HCA:
    2. Download the ISO image according to your OS to your host.
      The image’s name has the following format:
      MLNX_OFED_LINUX-<ver>-<OS label><CPUarch>.iso. You can download it from the Mellanox website: Products > Software > InfiniBand/VPI Drivers > Mellanox OFED Linux (MLNX_OFED) > Download.
    3. Use the MD5SUM utility to confirm the downloaded file’s integrity. Run the following command and compare the result to the value provided on the download page.
      # md5sum MLNX_OFED_LINUX-<ver>-<OS label>.iso

    Installing Mellanox OFED

    MLNX_OFED is installed by running the mlnxofedinstall script. The installation script performs the following:

    • Discovers the currently installed kernel
    • Uninstalls any software stacks that are part of the standard operating system distribution or another vendor's commercial stack
    • Installs the MLNX_OFED_LINUX binary RPMs (if they are available for the current kernel)
    • Identifies the currently installed InfiniBand and Ethernet network adapters and automatically upgrades the firmware

    The installation script removes all previously installed Mellanox OFED packages and re-installs from scratch. You will be prompted to acknowledge the deletion of the old packages.

    1. Log into the installation machine as root.
    2. Copy the downloaded ISO to /root
    3. Mount the ISO image on your machine.
      # mkdir /mnt/iso
      # mount -o loop /root/MLNX_OFED_LINUX-4.0- /mnt/iso
      # cd /mnt/iso
    4. Run the installation script.
      # ./mlnxofedinstall
    5. Reboot after the installation finishes successfully.

      # /etc/init.d/openibd restart
      # reboot

      By default, both ConnectX®-5 VPI ports are initialized as InfiniBand ports.

    6. Disable the unused 2nd port on the device.
      Identify the PCI IDs of your NIC ports:
      # lspci | grep Mellanox
      05:00.0 Infiniband controller: Mellanox Technologies Device 1019
      05:00.1 Infiniband controller: Mellanox Technologies Device 1019
      Disable the 2nd port:
      # echo 0000:05:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
    7. Check the ports’ current mode.
      # ibv_devinfo
    8. If the ports are shown in InfiniBand mode, you need to change the port type to Ethernet.

      ConnectX®-5 ports can be individually configured to work as InfiniBand or Ethernet ports.
      Change the mode to Ethernet using the mlxconfig script after the driver is loaded.
      * LINK_TYPE_P1=2 is Ethernet mode

      a. Start mst and list the port names:

      # mst start
      # mst status

      b. Change the mode of the 1st port to Ethernet:

      # mlxconfig -d /dev/mst/mt4121_pciconf0 s LINK_TYPE_P1=2
      # Port 1 set to Ethernet mode
      # reboot


      After each reboot you need to disable the 2nd port again.
      c. Query the InfiniBand devices and print the information about them that is available from userspace.
      # ibv_devinfo
    9. Run the ibdev2netdev utility to see all the associations between the Ethernet devices and the IB devices/ports.
      # ibdev2netdev
      # ifconfig ib0 netmask
    10. Add the lines below to the /etc/network/interfaces file, after the existing eno1 lines:
      # vim /etc/network/interfaces

      The existing lines:
      auto eno1
      iface eno1 inet dhcp

      The new lines:
      auto ib0
      iface ib0 inet static
      address 12.12.12.xx
    11. Check that the network configuration is set correctly.
      # ifconfig -a

    Install the NVIDIA CUDA Toolkit 8.0 and cuDNN


    Skip this procedure if you are installing without GPU support (for the dedicated Parameter Server).


    Pre-installation Actions

    The following actions must be taken before installing the CUDA Toolkit and driver on Linux:

    • Verify the system has a CUDA-capable GPU
    • Verify the system is running a supported version of Linux
    • Verify the system has gcc installed
    • Verify the system has the correct kernel headers and development packages installed
    • Download the NVIDIA CUDA Toolkit
    • Handle conflicting installation methods


    You can override the install-time prerequisite checks by running the installer with the --override flag. Remember that the prerequisites are still required to use the NVIDIA CUDA Toolkit.


    Verify You Have a CUDA-Capable GPU

    To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:


    $ lspci | grep -i nvidia


    If you do not see any output, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line, then rerun the previous lspci command.

    If your graphics card is from NVIDIA and it is listed there, your GPU is CUDA-capable.

    The Release Notes for the CUDA Toolkit also contain a list of supported products.


    Verify You Have a Supported Linux Version

    The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes.

    To determine which distribution and release number you are running, type the following at the command line:

    $ uname -m && cat /etc/*release


    You should see output similar to the following, modified for your particular system:

    • x86_64
    • Ubuntu 16.04.2 LTS

    The x86_64 line indicates you are running on a 64-bit system. The remainder gives information about your distribution.


    Verify the System Has a gcc Compiler Installed

    The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly.

    To verify the version of gcc installed on your system, type the following on the command line:

    $ gcc --version


    You should see output similar to the following, modified for your particular system:

    gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

    If an error message is displayed, you need to install the “development tools” from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.


    Verify the System has the Correct Kernel Headers and Development Packages Installed

    The CUDA driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

    While the Runfile installation performs no package validation, the RPM and DEB installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

    The version of the kernel your system is running can be found by running the following command:

    $ uname -r


    This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers. This command will be used multiple times below to specify the version of the packages to install. Note that below are the common-case scenarios for kernel usage. More advanced cases, such as custom kernel branches, should ensure that their kernel headers and sources match the kernel build they are running.

    The kernel headers and development packages for the currently running kernel can be installed with:


    $ sudo apt-get install linux-headers-$(uname -r)


    Installation Process


    1. Download the base installation .run file from NVIDIA CUDA website.


    2. Create an account if you do not already have one, and log in (an account is also required to download cuDNN).


    3. Choose Linux > x86_64 > Ubuntu > 16.04 > runfile (local) and download the base installer and the patch.



    Make sure you select "yes" when asked about creating a symbolic link to your CUDA directory.


    $ cd ~/root # or the directory to which you downloaded the file

    $ sudo sh cuda_<ver>_linux.run --override # hold "s" to skip reading the EULA


    4.    Install CUDA into: /usr/local/cuda.

    Do you accept the previously read EULA?

    accept/decline/quit: accept

    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?

    (y)es/(n)o/(q)uit: N

    Install the CUDA 8.0 Toolkit?

    (y)es/(n)o/(q)uit: Y

    Enter Toolkit Location[ default is /usr/local/cuda-8.0 ]: Enter

    Do you want to install a symbolic link at /usr/local/cuda?

    (y)es/(n)o/(q)uit: Y

    Install the CUDA 8.0 Samples?

    (y)es/(n)o/(q)uit: Y

    Enter CUDA Samples Location

    [ default is /root ]: Enter

    Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...


    To install cuDNN, download cuDNN v6.0 for CUDA 8.0 from the NVIDIA website and extract it into /usr/local/cuda via:

    $ tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz

    $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include

    $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64

    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*


    Post-installation Actions


    Mandatory Actions

    Some actions must be taken after the installation before the CUDA Toolkit and Driver can be used.


       Environment Setup
    • The “PATH” variable needs to include /usr/local/cuda-8.0/bin
    • In addition, when using the .run file installation method, the “LD_LIBRARY_PATH” variable needs to contain /usr/local/cuda-8.0/lib64.
    • Update your bash file.
      $ vim ~/.bashrc
    • This opens your .bashrc file in a text editor. Scroll to the bottom and add these lines:
      export CUDA_HOME=/usr/local/cuda-8.0
      export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
      export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    • Once you save and close the file, return to your original terminal and run the following command to reload your .bashrc file.
      $ source ~/.bashrc
    • Check that the paths have been properly modified.

      $ echo $CUDA_HOME
      $ echo $PATH
      $ echo $LD_LIBRARY_PATH

    • Set the “LD_LIBRARY_PATH” and “CUDA_HOME” environment variables. Consider adding the commands below to your ~/.bash_profile. These assume your CUDA installation is in /usr/local/cuda-8.0.
      $ vim ~/.bash_profile
      This opens the file in a text editor. Scroll to the bottom and add these lines:
      export CUDA_HOME=/usr/local/cuda-8.0
      export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
      export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

    Other actions are recommended to verify the integrity of the installation.

            This script is installed with the cuda-samples-8-0 package. The cuda-samples-8-0 package installs only a read-only copy in /usr/local/cuda-8.0/samples.

    • Verify the Installation: Before continuing, it is important to verify that the CUDA Toolkit can find and communicate correctly with the CUDA-capable hardware. To do this, you need to compile and run some of the included sample programs.
      Ensure the PATH and, if using the .run file installation method, the LD_LIBRARY_PATH variables are set correctly. See the Mandatory Actions section.
    • Verify the Driver Version:
      * If you installed the driver, verify that the correct version of it is loaded.
      * If you did not install the driver, or are using an operating system where the driver is not loaded via a kernel module, such as L4T, skip this step.
      When the driver is loaded, the driver version can be found by executing the following command.
      $ cat /proc/driver/nvidia/version
      You should see output similar to the following:
      NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.20  Tue Nov 15
      16:49:10 PST 2016
      GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
    • Compiling the Examples:
      The version of the CUDA Toolkit can be checked by running "nvcc -V" in a terminal window. The "nvcc" command runs the compiler driver that compiles the CUDA programs. It calls the "gcc" compiler for C code and the NVIDIA PTX compiler for the CUDA code.
      $ nvcc -V
      You should see output similar to the following:
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2016 NVIDIA Corporation
      Built on Tue_Jan_10_13:22:03_CST_2017
      Cuda compilation tools, release 8.0, V8.0.61
      The NVIDIA CUDA Toolkit includes sample programs in the source form. You should compile them by changing to ~/NVIDIA_CUDA-8.0_Samples and typing make. The resulting binaries will be placed under ~/NVIDIA_CUDA-8.0_Samples/bin.
      $ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/
      $ make
      $ cd ~/NVIDIA_CUDA-8.0_Samples

      $ ./bin/x86_64/linux/release/deviceQuery
    • Running the Binaries:
      After the compilation, find and run deviceQuery under ~/NVIDIA_CUDA-8.0_Samples. If the CUDA software is installed and configured correctly, the output should look similar to the below.

      The exact appearance and the output lines might be different on your system. The important outcomes are that a device was found (the first highlighted line), that the device matches the one on your system (the second highlighted line), and that the test passed (the final highlighted line).
      If a CUDA-capable device and the CUDA Driver are installed but the deviceQuery reports that no CUDA-capable devices are present, this likely means that the /dev/nvidia* files are missing or have the wrong permissions.
      Running the bandwidthTest program ensures that the system and the CUDA-capable device are able to communicate correctly. Its output is shown below.

      $ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest/
      $ make
      $ cd ~/NVIDIA_CUDA-8.0_Samples
      $ ./bin/x86_64/linux/release/bandwidthTest

    Note that the measurements for your CUDA-capable device description will vary from system to system. The important point is that you obtain measurements, and that the second-to-last line confirms that all necessary tests passed.
    Should the tests not pass, make sure you have a CUDA-capable NVIDIA GPU on your system and make sure it is properly installed.
    If you run into difficulties with the link step (such as libraries not being found), consult the Linux Release Notes found in the doc folder in the CUDA Samples directory.


    Install TensorFlow

    Clone the TensorFlow Repository

    To clone the latest TensorFlow repository, issue the following command:

    $ cd ~
    $ git clone

    The preceding git clone command creates a subdirectory called “tensorflow”. After cloning, you may optionally build a specific branch (such as a release branch) by invoking the following commands:

    $ cd tensorflow
    $ git checkout r1.3          # where r1.3 is the desired branch (the default is master)

    Install the Bazel Tool

    Bazel is a build tool from Google. For further information, please see the Bazel documentation.

    1. Download and install JDK 8, which will be used to compile Bazel from source.
      $ sudo add-apt-repository ppa:webupd8team/java
      $ sudo apt-get update
      $ sudo apt-get install oracle-java8-installer
    2. Install the Bazel tool.
      $ echo "deb [arch=amd64] stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
      $ curl | sudo apt-key add -
      $ sudo apt-get update
      $ sudo apt-get install bazel
      $ sudo apt-get upgrade bazel
    Known issue: Bazel 0.5.3 currently has a problem with TensorFlow r1.3. Please install Bazel 0.5.2 instead.

    Install Bazel Version 0.5.2


    Uninstall bazel

    $ sudo apt-get purge bazel

    The binary installers are on Bazel's GitHub releases page.

    The installer contains the Bazel binary and the required JDK. Some additional libraries must also be installed for Bazel to work.


    Install required packages

    $ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip


    Download Bazel

    Go to Bazel's GitHub releases page.

    Download the binary installer. This installer contains the Bazel binary and the required JDK, and can be used even if a JDK is already installed.

    $ sudo wget

    Note that an installer without the embedded JDK 8 also exists. Only use that installer if you already have JDK 8 installed.


    Run the installer

    $ sudo chmod +x bazel-0.5.2-installer-linux-x86_64.sh

    $ sudo ./bazel-0.5.2-installer-linux-x86_64.sh --user

    The --user flag installs Bazel to the $HOME/bin directory on your system and sets the .bazelrc path to $HOME/.bazelrc. Use the --help flag to see additional installation options.


    Set up your environment

    If you ran the Bazel installer with the --user flag as above, the Bazel executable is installed in your $HOME/bin directory. It's a good idea to add this directory to your default paths, as follows:

    $ export PATH="$PATH:$HOME/bin"

    You can also add this command to your ~/.bashrc file.


    Install TensorFlow using the configure Script

    The root of the source tree contains a bash script named “configure”. This script asks you to identify the pathname of all relevant TensorFlow dependencies and specify other build configuration options such as compiler flags.

    Run the configure script prior to creating the pip package and installing TensorFlow.

    To build TensorFlow with GPU, the “configure” script needs to know the version numbers of CUDA and cuDNN. If several versions of CUDA or cuDNN are installed on your system, explicitly select the desired version instead of relying on the system default.

    $ cd ~/tensorflow            # cd to the top-level directory created
    $ ./configure

    If you receive the following error message:

    locale.Error: unsupported locale setting


    1. Run the following command:
      export LANGUAGE=en_US.UTF-8
      export LANG=en_US.UTF-8
      export LC_ALL=en_US.UTF-8
      For further information see:
    2. Or, edit the locale file /etc/default/locale to:
      LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
    3. Restart the computer.

    Here is an example execution of the "configure" script. In this example we configure the installation with GPU and CUDA library support.
      Please specify the location of python. [Default is /usr/bin/python]: Enter

      Please specify optimization flags to use during compilation when bazel
      option "--config=opt" is specified [Default is -march=native]: Enter

      Do you wish to use jemalloc as the malloc implementation? [Y/n] Y
      jemalloc enabled

      Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
      No Google Cloud Platform support will be enabled for TensorFlow

      Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
      No Hadoop File System support will be enabled for TensorFlow

      Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
      No XLA JIT support will be enabled for TensorFlow

      Do you wish to build TensorFlow with VERBS support? [y/N] Y
      VERBS support will be enabled for TensorFlow

      Found possible Python library paths:  /usr/local/lib/python2.7/dist-packages  /usr/lib/python2.7/dist-packages
      Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]  Enter
      Using python library path: /usr/local/lib/python2.7/dist-packages

      Do you wish to build TensorFlow with OpenCL support? [y/N] N
      No OpenCL support will be enabled for TensorFlow

      Do you wish to build TensorFlow with CUDA support? [y/N] Y(Y For a Worker, N for a dedicated Parameter Server)
      CUDA support will be enabled for TensorFlow

      Do you want to use clang as CUDA compiler? [y/N] N
      nvcc will be used as CUDA compiler

      Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:  Enter

      Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0

      Please specify the location where the CUDA 8.0 toolkit is installed. [Default is /usr/local/cuda]:  Enter

      Please specify the cuDNN version you want to use. [Leave empty to use system default]: 6

      Please specify the location where the cuDNN 6 library is installed. [Default is /usr/local/cuda]: Enter

      Please specify a list of comma-separated Cuda compute capabilities you want to build with.
      You can find the compute capability of your device at:
      Please note that each additional compute capability significantly increases
      your build time and binary size. [Default is: "3.5,5.2"]: 6.0 (Tesla P100)
      Extracting Bazel installation.............
      INFO: Starting clean (this may take a while). Consider using --async if the
      clean takes more than several minutes.
      Configuration finished

    Pip Installation

    Pip installation installs TensorFlow on your machine and may upgrade previously installed Python packages. Note that this may impact existing Python programs on your machine.
    Pip is a package management system used to install and manage software packages written in Python. Pre-built pip packages for TensorFlow on Linux are available, but in this guide we build our own.
    This installation requires the code from GitHub. You can either take the most recent master branch (lots of new commits) or the latest release branch (more stable, but still updated every few days). Here, we use the r1.3 release branch.


    Build the Pip Package

    To build a pip package for TensorFlow with CPU-only support (For a dedicated Parameter Server), invoke the following command:

    $ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

    To build a pip package for TensorFlow with GPU support (For a Worker Server), invoke the following command:

    $ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

    The bazel build command builds a script named build_pip_package. Running this script as follows will build a .whl file within the /tmp/tensorflow_pkg directory:

    $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg


    Install the Pip Package

    You will need Pip version 8.1 or later for the following commands to work on Linux.

    $ pip install

    The filename of the .whl file depends on your platform and the version you built. For example, the following commands install the pip packages built above:

    • For a dedicated Parameter Server (sample)
      $ sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0rc2-cp27-cp27mu-linux_x86_64.whl
    • For Worker Server (sample)
      $ sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0rc2-cp27-cp27mu-linux_x86_64.whl
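    The wheel filename shown above follows the standard `distribution-version-pythontag-abitag-platformtag.whl` naming convention (PEP 427), which is why it varies by platform. A small illustrative sketch (the helper function is ours, not part of pip) pulling the fields apart:

```python
def parse_wheel(name):
    """Split a .whl filename into its PEP 427 naming fields."""
    # strip the '.whl' extension, then split the five dash-separated fields
    dist, version, py_tag, abi_tag, platform_tag = name[:-len('.whl')].split('-')
    return {'distribution': dist, 'version': version,
            'python': py_tag, 'abi': abi_tag, 'platform': platform_tag}

# the sample wheel name used in this guide
fields = parse_wheel('tensorflow-1.1.0rc2-cp27-cp27mu-linux_x86_64.whl')
print(fields['python'], fields['platform'])   # cp27 linux_x86_64
```

    So `cp27-cp27mu` tells you the wheel targets CPython 2.7 (wide-unicode ABI), and a wheel built on another platform or Python version will not install.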

    Validate TensorFlow Installation

    To validate the TensorFlow installation:

    1. Close all your terminals and open a new terminal to test.


    2. Change directory (cd) to any directory on your system other than the tensorflow subdirectory from which you invoked the configure command.


    3. Invoke python:

    $ cd /

    $ python

    >>> import tensorflow as tf
    >>> hello = tf.constant('Hello, TensorFlow!')
    >>> sess = tf.Session()
    >>> print(sess.run(hello))
    Hello, TensorFlow!
    >>> a = tf.constant(10)
    >>> b = tf.constant(32)
    >>> print(sess.run(a + b))
    42

    Press CTRL-D to exit.

    Appendix A: TensorFlow Benchmarks and TCP vs. RoCE comparison

    Google published a collection of performance benchmarks that highlight TensorFlow's speed and scalability when training image classification models such as InceptionV3 and ResNet.

    Here we provide our performance benchmark results for InceptionV3 and ResNet-50 over both TCP and RoCE.

    Benchmarks ran using both real and synthetic data. We believe it is important to include real data (ImageNet 2012 DataSet) measurements when benchmarking a platform.

    Testing with synthetic data was done by using a tf.Variable set to the same shape as the data expected by each model for ImageNet.

    This load tests both the underlying hardware and the framework at preparing data for actual training.

    We start with synthetic data to remove disk I/O as a variable and to set a baseline. Real data is then used to verify that the TensorFlow input pipeline and the underlying disk I/O are saturating the compute units.
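    As a concrete illustration of the synthetic-data approach (the 224x224x3 input resolution is an assumption based on the standard ImageNet setup, not taken from the benchmark scripts), a synthetic batch is just random values in the shape the model expects, so no disk read ever enters the measurement:

```python
import random

# Assumed ImageNet input shape: batch of 64 images, 224x224 pixels, 3 channels.
BATCH, HEIGHT, WIDTH, CHANNELS = 64, 224, 224, 3
elements_per_step = BATCH * HEIGHT * WIDTH * CHANNELS

# One synthetic "image": random values standing in for decoded JPEG pixels.
image = [random.random() for _ in range(HEIGHT * WIDTH * CHANNELS)]

print(elements_per_step)   # values fed to the model per step, with zero disk I/O
```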

    Server's hardware and configurations used for TCP and RoCE benchmarks are identical.


    Details for our benchmarks


    • Instance type: See setup overview
    • GPU: 8x NVIDIA® Tesla® P100
    • OS: Ubuntu 16.04 LTS with tests run via Docker
    • CUDA / cuDNN: 8.0 / 6.0
    • TensorFlow GitHub : r1.3
    • Benchmark GitHub hash: b922111
    • Build Command: bazel build -c opt --copt=-march="broadwell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
    • Disk: Local NVMe
    • DataSet: ImageNet 2012
    • Test Date: June 2017


    The batch size and optimizer used for the tests are listed below. A batch size of 64 per GPU was used for each model tested.


    Configuration used for each model.




    The server setup for the runs included 4 worker servers and is explained in the setup overview part of this document.



    Inception V3

    [Three charts, images not recovered, each titled "Training synthetic data"]


    This script was run to generate the above results.


    In order to create results that are as repeatable as possible, each test was run 3 times and then the times were averaged together. GPUs are run in their default state on the given platform. For each test, 10 warm up steps are done and then the next 100 steps are averaged.
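    The averaging scheme described above can be sketched as follows (an illustrative helper assuming one images/sec sample per step, not the benchmark script itself):

```python
def averaged_throughput(step_rates, warmup=10, measured=100):
    """Drop the warm-up steps, then average the next `measured` samples."""
    window = step_rates[warmup:warmup + measured]
    return sum(window) / len(window)

def benchmark_result(runs):
    """Average the per-run throughputs of the repeated (here, 3) runs."""
    per_run = [averaged_throughput(r) for r in runs]
    return sum(per_run) / len(per_run)

# three fake runs of 110 samples each, all at 200 images/sec
runs = [[200.0] * 110 for _ in range(3)]
print(benchmark_result(runs))   # 200.0
```

    Discarding the warm-up window matters because the first steps include one-off costs (graph setup, GPU clock ramp-up, input-pipeline fill) that would otherwise skew the average.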

    Appendix B: Common Installation Issues


    The installation issues that might occur typically depend on the installed operating system. For further information, please see the "Common installation problems" section of the Installing TensorFlow on Linux guide.
    Beyond the errors documented in that guide, the following list specifies additional errors specific to building TensorFlow. Note that we rely on Stack Overflow as the repository for build and installation problems. If you encounter an error message not covered by the guide or the list below, search for it on Stack Overflow. If Stack Overflow does not show the error message, ask a new question there and add the tensorflow tag.

    Error messages (Stack Overflow links not recovered):

    • ImportError: cannot open shared object file: No such file or directory

    • ImportError: libcudnn.6: cannot open shared object file: No such file or directory

    • Invoking `python` or `ipython` generates the following error:
      ImportError: cannot import name pywrap_tensorflow