Building Open MPI with the latest MXM 2 and FCA 2.5


    Preparation for install

     

    • Make sure your cluster nodes have the latest MOFED 2.0-3.0.0 installed (a quick version check is shown after the variable definitions)
    • Choose a shared storage location where you plan to install the Mellanox HPC solution software
    • Define variables pointing to the shared locations that will be used to build the HPC packages

    % export SHARED_NFS=/usr/local/mellanox 

    % export KNEM_DIR=$(find /opt -maxdepth 1 -type d -name "knem*")

    % export FCA_DIR=/opt/mellanox/fca

    % export MPI_HOME=$SHARED_NFS/install/openmpi
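
    A quick way to confirm the MOFED release on a node is ofed_info, which ships with MOFED; this is only a sanity check:

    % ofed_info -s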

     

    Install/Upgrade to Latest MXM
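
    A minimal sketch of a typical upgrade, assuming the new MXM package is delivered as an rpm (placeholder name mxm.rpm) and installs into the default /opt/mellanox/mxm prefix, which is referenced later as $MXM_DIR:

    • On all cluster nodes, remove the old mxm rpm and install the new one

    # rpm -e mxm --nodeps

    # rpm -ihv mxm.rpm

    • Point $MXM_DIR at the MXM install prefix

    % export MXM_DIR=/opt/mellanox/mxm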

     

      

    Install/Upgrade to Latest FCA

     

    • Stop the FCA manager on the dedicated machine before upgrading

    # service fca_managerd stop

    • On all cluster nodes, remove the old fca rpm and install the new one

    # rpm -e fca --nodeps

    # rpm -ihv fca.bin.rpm

    • Start the FCA manager on the dedicated machine connected to the cluster fabric

    # service fca_managerd start
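
    Optionally verify that the manager came up, assuming the fca_managerd init script supports the standard status action:

    # service fca_managerd status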

     

    Mellanox OMPI v1.6.5 install

     

    % wget ftp://bgate.mellanox.com/hpc/ompi/ompi-latest.tar

    % tar xvf ompi-latest.tar

    % rpm2cpio openmpi-1.6.5-*.src.rpm | cpio -id

    % mkdir -p $SHARED_NFS

    % tar zxvf openmpi-1.6.5-0ea4499.tar.gz -C $SHARED_NFS

    % export MPI_DIR=$SHARED_NFS/openmpi-1.6.5-0ea4499

    % cd $MPI_DIR

    % ./configure --with-platform=mellanox/optimized \
        --with-mxm=$MXM_DIR \
        --with-fca=$FCA_DIR \
        --with-knem=$KNEM_DIR \
        --prefix=$MPI_HOME \
        --with-slurm

    % make -j9 all && make -j9 install
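
    A quick sanity check that the MXM and FCA components were actually built in is to grep the component list printed by ompi_info:

    % $MPI_HOME/bin/ompi_info | grep -i -e mxm -e fca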

     

    Run script example

     

    #!/bin/sh -x

    #SBATCH --job-name=compare

    #SBATCH --nodes=10

    #SBATCH --ntasks-per-node=16

    #SBATCH -p cluster

    #SBATCH --time=24:00:00
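
    # Ranks come from SLURM; IMB-MPI1 is the benchmark shipped with the MOFED Open MPI package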

     

    NP=$SLURM_NPROCS

    SHARED_NFS=/usr/local/mellanox

    MPI_HOME=$SHARED_NFS/install/openmpi

    EXE=/usr/mpi/gcc/openmpi-1.6.5/tests/IMB-3.2.4/IMB-MPI1

    EXE_ARGS="-iter 1000 -npmin $NP -mem 0.9 "

     

    common_args="-mca mca_component_show_load_errors 0 --bind-to-core --byslot -display-map "
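
    # FCA collectives enabled (no np threshold, result caching on) vs. disabled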

     

    coll_fca="-mca coll_fca_np 0 -mca coll_fca_enable 1 -mca coll_fca_enable_cache 1"

    coll_nofca="-mca coll_fca_enable 0"
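
    # Transport selection: MXM (cm PML + mxm MTL) vs. plain openib/sm BTLs with MXM excluded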

     

    mpi_args_mxm="$common_args -mca mtl_mxm_np 0 -mca pml cm -mca mtl mxm"

    mpi_args_vanilla="$common_args -mca mtl ^mxm -mca btl self,sm,openib"
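
    # Per-HCA settings: pin MXM, the openib BTL and process mapping to a single device/port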

     

    mlx5_params="-x MXM_RDMA_PORTS=mlx5_0:1 -mca btl_openib_if_include mlx5_0:1 -mca rmaps_base_dist_hca mlx5_0"

    mlx4_params="-x MXM_RDMA_PORTS=mlx4_0:1 -mca btl_openib_if_include mlx4_0:1 -mca rmaps_base_dist_hca mlx4_0"
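
    # IMB over the mlx5 HCA with MXM+FCA and with plain BTLs, then the same over mlx4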

     

    $MPI_HOME/bin/mpirun -np $NP $coll_fca   $mpi_args_mxm $mlx5_params $EXE $EXE_ARGS

    $MPI_HOME/bin/mpirun -np $NP $coll_nofca $mpi_args_vanilla $mlx5_params $EXE $EXE_ARGS

     

    $MPI_HOME/bin/mpirun -np $NP $coll_fca   $mpi_args_mxm $mlx4_params $EXE $EXE_ARGS

    $MPI_HOME/bin/mpirun -np $NP $coll_nofca $mpi_args_vanilla $mlx4_params $EXE $EXE_ARGS
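
    Assuming the script above is saved as run_imb.sh (the name is arbitrary), submit it with:

    % sbatch run_imb.sh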