Getting started with HPC-X MPI and Slurm


    This is a basic post that shows how to run a simple "hello world" program over HPC-X accelerated Open MPI using the Slurm scheduler.

     


    Prerequisites

    This procedure assumes you have the following already up and running (a quick way to verify these is sketched right after the list):

    • An InfiniBand fabric with several servers
    • A Subnet Manager (SM) is running
    • Slurm is installed
    • MLNX_OFED is installed
    • HPC-X was downloaded and set up
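
    An optional sanity check for these prerequisites could look like the following (a minimal sketch; the exact commands available and their output depend on your MLNX_OFED, InfiniBand diagnostics and Slurm installation):

    $ ofed_info -s     # installed MLNX_OFED version
    $ ibstat           # local HCA ports; the relevant port should be Active/LinkUp
    $ sminfo           # queries the fabric SM; fails if no SM is reachable
    $ sinfo            # Slurm partitions and node states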

     

     

    Configuration

    1. Allocate 2 servers out of your cluster using Slurm, as shown below.

    $ squeue
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

     

    $ salloc -N 2 -t 60

    salloc: Granted job allocation 2859

     

    $ squeue            

    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)             

    2859 pheimdall     bash   ophirm  R       0:05      2 heimdall[001-002]
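
    Once the allocation is granted, you can also expand the allocated node list from inside the salloc shell (a small sketch; for the example allocation above this simply prints the two heimdall hosts):

    $ scontrol show hostnames $SLURM_JOB_NODELIST
    heimdall001
    heimdall002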

     

    2. Make sure that the hpc-x gcc module is available and loaded

    $ module avail

    ...  hpcx-1.9/gcc ...

     

    $ module load hpcx-1.9/gcc
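
    To confirm that the module is actually loaded, module list can be used (the exact output format depends on your environment-modules implementation):

    $ module list
    Currently Loaded Modulefiles:
      1) hpcx-1.9/gcc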

     

    3. Check the MPI compiler

    $ which mpicc

    /opt/hpcx-1.9/ompi-v2.x/bin/mpicc

     

    4. Follow MPI Hello World · MPI Tutorial to clone the repository and compile the mpi_hello_world application, as sketched below.
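
    As a rough sketch of that step (the repository URL and directory layout below follow the MPI Tutorial site and may change, so adjust as needed; mpicc comes from the HPC-X module loaded above):

    $ git clone https://github.com/mpitutorial/mpitutorial.git
    $ cd mpitutorial/tutorials/mpi-hello-world/code
    $ mpicc -o mpi_hello_world mpi_hello_world.c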

     

    5. Check which mpirun is being used

    $ which mpirun

    /opt/hpcx-1.9/ompi-v2.x/bin/mpirun

     

    6. Run HPC-X MPI with the following basic flags

     

    • -np: the number of MPI processes to run.
    • --display-map: displays the job map shown below, i.e. which process is mapped to which core.
    • --map-by node: round-robins the processes across the nodes.
    • -x MXM_RDMA_PORTS=mlx5_0:1: exports an MXM parameter that forces the use of this specific device and port.
    • -mca btl_openib_if_include mlx5_0:1: an Open MPI MCA parameter that forces the openib BTL to use this specific device and port.

     

    There are many additional flags, but those are the very basic ones.
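
    If you want to explore the additional parameters, Open MPI can list them itself; for example (a sketch, the exact option set depends on the Open MPI version bundled with your HPC-X):

    $ mpirun --help
    $ ompi_info --param btl openib --level 9    # openib BTL parameters, including btl_openib_if_include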

     

    Here is an example run of the program.

    $ mpirun -np 4 --display-map --map-by node -x MXM_RDMA_PORTS=mlx5_0:1 -mca btl_openib_if_include mlx5_0:1 mpi_hello_world

    Data for JOB [48202,1] offset 0

     

    ========================   JOB MAP   ========================

     

    Data for node: heimdall001     Num slots: 28   Max slots: 0    Num procs: 2

            Process OMPI jobid: [48202,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/././././././././././././.][./././././././././././././.]

            Process OMPI jobid: [48202,1] App: 0 Process rank: 2 Bound: socket 0[core 1[hwt 0]]:[./B/./././././././././././.][./././././././././././././.]

     

     

    Data for node: heimdall002     Num slots: 28   Max slots: 0    Num procs: 2

            Process OMPI jobid: [48202,1] App: 0 Process rank: 1 Bound: socket 0[core 0[hwt 0]]:[B/././././././././././././.][./././././././././././././.]

            Process OMPI jobid: [48202,1] App: 0 Process rank: 3 Bound: socket 0[core 1[hwt 0]]:[./B/./././././././././././.][./././././././././././././.]

     

     

    =============================================================

    Hello world from processor heimdall002, rank 3 out of 4 processors

    Hello world from processor heimdall001, rank 2 out of 4 processors

    Hello world from processor heimdall002, rank 1 out of 4 processors

    Hello world from processor heimdall001, rank 0 out of 4 processors

     

    7. To release the nodes, use scancel:

    $ squeue            

    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)            

    2859 pheimdall     bash   ophirm  R       0:05      2 heimdall[001-002]

     

    $ scancel 2859

    salloc: Job allocation 2859 has been revoked.

    Hangup

     

    $ squeue

    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)