1 Reply Latest reply on Jan 20, 2015 6:15 AM by alkx

    MXM - only up to 2 devices are supported?

    tobias

      Hello,

       

      I recently installed a test system that uses multirail InfiniBand. It consists of two fat nodes, each with four Xeon E5-4650 CPUs. Each CPU has a ConnectX-2 card directly attached. Both nodes are connected to an IS5025 switch. The latest MLNX-OFED is installed and working. Is there a way to force MXM to use all four cards?

      If I run, for example, the osu_alltoall benchmark with

       

      /usr/mpi/gcc/openmpi-1.8.4/bin/mpirun --mca btl self,openib -n 64 --hostfile test /usr/mpi/gcc/openmpi-1.8.4/tests/osu-micro-benchmarks-4.4/osu_alltoall

       

      I get a lot of warnings like this:

      [1421687772.785621] [linux-3e34:12178:0]  ib_dev.c:405  MXM  WARN  Skipping IB device 'mlx4_2' - up to 2 devices are supported
      [1421687772.785640] [linux-3e34:12178:0]  ib_dev.c:405  MXM  WARN  Skipping IB device 'mlx4_1' - up to 2 devices are supported
      [1421687772.785647] [linux-3e34:12178:0]  ib_dev.c:405  MXM  WARN  Skipping IB device 'mlx4_0' - up to 2 devices are supported
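
      To see at a glance which devices MXM reports as skipped, the warnings can be filtered with a few lines of Python. This is only a minimal sketch: the regular expression is my own guess based on the three lines quoted above, and the names skipped_devices and SAMPLE_LOG are made up for illustration, not part of MXM or Open MPI.

```python
import re

# Pattern derived from the warning lines quoted above (an assumption about
# the log format, not an official MXM specification). The device name is
# the text between the single quotes.
SKIP_RE = re.compile(r"MXM\s+WARN\s+Skipping IB device '([^']+)'")


def skipped_devices(log_text):
    """Return the distinct skipped device names, in order of first appearance."""
    seen = []
    for match in SKIP_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# The three warning lines from the run above, copied verbatim.
SAMPLE_LOG = """\
[1421687772.785621] [linux-3e34:12178:0]  ib_dev.c:405  MXM  WARN  Skipping IB device 'mlx4_2' - up to 2 devices are supported
[1421687772.785640] [linux-3e34:12178:0]  ib_dev.c:405  MXM  WARN  Skipping IB device 'mlx4_1' - up to 2 devices are supported
[1421687772.785647] [linux-3e34:12178:0]  ib_dev.c:405  MXM  WARN  Skipping IB device 'mlx4_0' - up to 2 devices are supported
"""

if __name__ == "__main__":
    print(skipped_devices(SAMPLE_LOG))
```

      For the process shown (PID 12178), this lists mlx4_2, mlx4_1 and mlx4_0 as skipped.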

       

      # OSU MPI All-to-All Personalized Exchange Latency Test v4.4
      # Size       Avg Latency(us)
      1                      66.02
      2                      64.99
      4                      66.64
      8                      76.15
      16                     81.32
      32                     88.54
      64                    137.83
      128                   186.70
      256                   294.64
      512                   558.72
      1024                 1287.07
      2048                 2418.95
      4096                 3637.48
      8192                 5647.53
      16384                9947.06
      32768               19036.50
      65536               38769.77
      131072              71470.19
      262144             141088.41
      524288             294086.71
      1048576            600280.88

       

      When I disable MXM via --mca mtl ^mxm, the warnings disappear and the latency drops considerably, by a factor of about 1.8 for 1-byte messages and about 4.8 for 1 MiB messages:

      # Size   Avg Latency(us)
      1                  37.48
      2                  37.12
      4                  38.24
      8                  39.59
      16                 50.07
      32                 47.93
      64                 53.22
      128                77.66
      256               116.47
      512               214.17
      1024              335.79
      2048              594.70
      4096             1045.58
      8192             1334.22
      16384            2972.10
      32768            4990.44
      65536            9215.92
      131072          16271.00
      262144          31121.92
      524288          61814.72
      1048576        124195.41
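
      To put a number on the difference between the two runs, here is a minimal Python sketch that divides the latency with MXM by the latency without it for each message size. The values are copied from the two tables above; the names RESULTS and slowdown_factors are just illustrative.

```python
# Average latencies in microseconds, copied from the two osu_alltoall runs:
# (message size in bytes, latency with MXM, latency with --mca mtl ^mxm)
RESULTS = [
    (1, 66.02, 37.48),
    (2, 64.99, 37.12),
    (4, 66.64, 38.24),
    (8, 76.15, 39.59),
    (16, 81.32, 50.07),
    (32, 88.54, 47.93),
    (64, 137.83, 53.22),
    (128, 186.70, 77.66),
    (256, 294.64, 116.47),
    (512, 558.72, 214.17),
    (1024, 1287.07, 335.79),
    (2048, 2418.95, 594.70),
    (4096, 3637.48, 1045.58),
    (8192, 5647.53, 1334.22),
    (16384, 9947.06, 2972.10),
    (32768, 19036.50, 4990.44),
    (65536, 38769.77, 9215.92),
    (131072, 71470.19, 16271.00),
    (262144, 141088.41, 31121.92),
    (524288, 294086.71, 61814.72),
    (1048576, 600280.88, 124195.41),
]


def slowdown_factors(results):
    """Return {size: latency_with_mxm / latency_without_mxm}."""
    return {size: with_mxm / without_mxm
            for size, with_mxm, without_mxm in results}


if __name__ == "__main__":
    for size, factor in slowdown_factors(RESULTS).items():
        print(f"{size:>8} bytes: {factor:.2f}x slower with MXM")
```

      The run with MXM is slower at every message size, and the gap widens for large messages.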

       

      I would be thankful for any suggestions!

       

      Kind regards,

      Tobias