3 Replies Latest reply on Jan 27, 2017 6:41 AM by alkx

    KNEM errors when running OMPI 2.0.1

    bioinformatica-ibis

      Hi, I am running on my SGE cluster (with Ubuntu 12.04) the following script (using qsub):

      #!/bin/bash 
      #$-cwd 
      #$ -S /bin/bash 
      #$ -V 
      #$ -q normal 
      #$ -pe mpi 40 
      #$ -P Lab219 
      #$ -o output 
      #$ -e error 
      module load PhyML/3.3
      
      mpirun --mca pml yalla -np 40 phyml-mpi -i proteic -b 10 -d aa 
      

       

      where phyml-mpi is the parallel (OMPI) version of the program PhyML. The --mca pml yalla option is passed to use MXM (I have Mellanox OFED).

       

       

      It gives me lots of errors related to KNEM (see error and output files from qsub in the attachments). However, I specified the KNEM directory when installing OMPI.

      /dev/knem does not exist and, when I try to load the module with sudo modprobe knem, I get:

      FATAL: Error inserting knem (/lib/modules/3.13.0-37-generic/updates/dkms/knem.ko): Invalid module format
      

       

      Could anyone give me a hint on this issue? Should I perhaps install KNEM independently from the KNEM website and rebuild OMPI against those KNEM drivers?

      Thanks in advance

        • Re: KNEM errors when running OMPI 2.0.1
          alkx

          That doesn't seem like an OMPI error, as OMPI is a completely userspace package. Something is wrong with the knem module: most likely it wasn't compiled for the currently running kernel. I would suggest reinstalling MOFED and then running 'modprobe knem' to check whether it loads.
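One way to test the "not compiled for the running kernel" theory is to compare the output of `uname -r` with the module's `vermagic` string, as reported by `modinfo -F vermagic knem`. The sketch below assumes the first word of `vermagic` is the kernel release the module was built against. The function names `kernel_matches` and `check_knem` are hypothetical helpers, not part of MOFED or KNEM. A vermagic mismatch is only one common cause of "Invalid module format"; the kernel log (`dmesg | tail`) usually states the exact reason.

```shell
#!/bin/bash
# Sketch: compare the running kernel release with the kernel a module was built for.

# kernel_matches RELEASE VERMAGIC
# Succeeds when the first word of VERMAGIC (the kernel release the module
# was built against) equals RELEASE.
kernel_matches() {
    local release="$1"
    local vermagic="$2"
    local built_for="${vermagic%% *}"   # keep the text before the first space
    [ "$release" = "$built_for" ]
}

# check_knem: hypothetical helper; needs modinfo and an installed knem module.
# Returns 0 on a match, 1 on a mismatch, 2 if modinfo cannot find the module.
check_knem() {
    local release vermagic
    release="$(uname -r)"
    vermagic="$(modinfo -F vermagic knem)" || return 2
    if kernel_matches "$release" "$vermagic"; then
        echo "knem was built for the running kernel ($release)"
    else
        echo "knem was built for '${vermagic%% *}', but the running kernel is '$release'"
        return 1
    fi
}
```

Calling `check_knem` on a compute node prints which of the two cases applies. If the module came in through DKMS, `dkms status` also lists the kernels it has been built for.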

          • Re: KNEM errors when running OMPI 2.0.1
            dwarren

            The error "FATAL: Error inserting knem (/lib/modules/3.13.0-37-generic/updates/dkms/knem.ko): Invalid module format" indicates that the module is not built to match your kernel, even though it is in the correct dkms directory. Also, I would guess that your MPI is built to use knem, as it doesn't complain about it being missing, and nothing tries to load it if it is not. So I would download the latest version from the knem site, build it, and install it. Note: read the section of the instructions about modifying the udev file. This will be necessary unless everyone is in the RDMA group. Knem does make a difference: it allows zero-copy transfers within the system, and doesn't have the security set-up problems of the other zero-copy options.

            Here are instructions:

            KNEM: Fast Intra-Node MPI Communication
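Once the module loads, the udev note above comes down to whether the user running the MPI job can open /dev/knem for reading and writing. A minimal check, assuming the device node is at /dev/knem as in the original post, could look like the sketch below. `knem_device_usable` and `report_knem_device` are hypothetical helper names, and the script only inspects permissions; it does not change any udev rules.

```shell
#!/bin/bash
# Sketch: check that the knem device node exists and is usable by the current user.

# knem_device_usable [PATH]
# Succeeds when PATH (default /dev/knem) is a character device that the
# current user can both read and write.
knem_device_usable() {
    local dev="${1:-/dev/knem}"
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

# report_knem_device [PATH]: print a short diagnosis for the current user.
report_knem_device() {
    local dev="${1:-/dev/knem}"
    if knem_device_usable "$dev"; then
        echo "$dev is usable by $(id -un)"
    else
        echo "$dev is missing or not read/write for $(id -un)"
        ls -l "$dev" 2>/dev/null   # show owner, group and mode if the node exists
        return 1
    fi
}
```

Running `report_knem_device` as the user who submits the qsub job shows whether the udev change (or group membership) is still needed on that node.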