5 Replies Latest reply on Apr 21, 2016 8:59 AM by niverson

    mlnx_add_kernel_support.sh and magic version errors.

    niverson

      I have successfully built and installed OFED 3.2-2.0.0.0 on an Oracle Enterprise Linux 6.7 system. I tested this OFED installation in Ethernet emulation mode with iperf3, with one ConnectX-4 port looped back to the other ConnectX-4 port on the same system. I disabled the internal loopback and configured NAT to force the data over the physical IB cable.

       

      I used the same build and install sequence on a simulator system with a custom kernel based on Oracle Enterprise Linux 7.1. On this system, "mst start" fails.

       

      I used MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz and the mlnx_add_kernel_support.sh completed without issue.

       

      Here is the uname info for the custom kernel the system is running:

      # uname -a

      Linux  4.1.12-32.el7uek-axnp.debug.070000.009400 #1 SMP Fri Jan 29 12:18:44 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

       

       

      The extra modules directory is populated with what seems like the correct directories and files:

      # ls /usr/lib/modules/4.1.12-32.el7uek-axnp.debug.070000.009400/extra

      dmadriver_api_mod.ko  knem             pdgCommon.ko     slbmgr_api_mod.ko

      i40e.ko               ksimod.ko        pdgPm8018.ko     slbmgrmod.ko

      iser                  mlnx-ofa_kernel  pdgQlSan.ko      srp

      kernel-mft            nvdimm.ko        psg_services.ko  tdsmod.ko

       

      The mst modules are present:

      # ls ./usr/lib/modules/4.1.12-32.el7uek-axnp.debug.070000.009400/extra/kernel-mft/

      mst_pciconf.ko  mst_pci.ko

       

       

      However "mst start" fails for "version magic":

      # mst start

      Starting MST (Mellanox Software Tools) driver set

      Loading MST PCI modulemodprobe: ERROR: could not insert 'mst_pci': Exec format error

      - Failure: 1

      Loading MST PCI configuration modulemodprobe: ERROR: could not insert 'mst_pciconf': Exec format error

      - Failure: 1

      Create devices

      mst_pci driver not found

      Unloading MST PCI module (unused) - Success

      Unloading MST PCI configuration module (unused) - Success

       

      From dmesg:

      [ 2264.895260] mst_pci: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

      [ 2265.048972] mst_pciconf: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

       

       

      It looks like the .070000.009400 is being truncated off the "version magic" in the mst_pci and mst_pciconf modules.

      Is there a string length limit for the kernel versioning in mlnx_add_kernel_support.sh?
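
      One way to check this independently of mst is to compare the release embedded in a module's vermagic with the running kernel's release. The following is a minimal sketch only; the helper names vermagic_release and vermagic_matches are made up for illustration, and the two strings are copied from the dmesg output above.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel release embedded in a vermagic string.
# The release is the first space-separated field of the vermagic.
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Hypothetical helper: succeed only if the vermagic release equals the kernel release.
vermagic_matches() {
    [ "$(vermagic_release "$1")" = "$2" ]
}

# Strings taken from the dmesg output above.
built='4.1.12-32.el7uek-axnp.debug SMP mod_unload '
running='4.1.12-32.el7uek-axnp.debug.070000.009400'

if vermagic_matches "$built" "$running"; then
    echo "match"
else
    echo "mismatch: module built for '$(vermagic_release "$built")', kernel is '$running'"
fi

# On the affected system the inputs would come from the module file and the kernel:
#   vermagic_matches "$(modinfo -F vermagic /path/to/mst_pci.ko)" "$(uname -r)"
```

      Running the same comparison on every .ko under the extra directory would show whether only the MFT modules or all of the rebuilt modules carry the shorter release string.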

        • Re: mlnx_add_kernel_support.sh and magic version errors.
          sophie

          Hi Nathan,

           

          Did you download and install MFT package version 4.3.0 for Linux? It is required if you want to use the MFT utilities.

          http://www.mellanox.com/page/mlxup_firmware_tool

          NOTE: Version 4.3.0 is the latest.

           

          Thank you,

          Sophie.

            • Re: mlnx_add_kernel_support.sh and magic version errors.
              niverson

              I didn't download the MFT package. I must have used the MFT supplied in MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz.

               

              It looks like I have mft 4.3:

              # mst version

              mst, mft 4.3.0-25, built on Jan 25 2016, 19:10:21. Git SHA Hash: 7465f26

               

               

              I followed this doc to set up OFED.

               

              https://community.mellanox.com/docs/DOC-2294

               

               

               

              These are the steps I followed. It worked for standard OEL 6.7 with the proper 6.7 tarball.

               

              1. Download ofed 3.2-2.0.0.0 tar ball to /tmp

              http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

                   MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz for OEL 7.1

               

               

              2. Untar MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz

               

                   tar zxvf MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz

                   creates /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64

               

              3. Install other required software packages.

                   yum install rpm-build gcc-gfortran

               

              4. Generate the ofed binaries for your specific kernel from /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.

                   ./mlnx_add_kernel_support.sh --make-tgz --mlnx_ofed /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64 --ofed-sources /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64/src/MLNX_OFED_SRC-3.2-2.0.0.0.tgz --skip-repo -t /tmp

               

                   SCREEN OUTPUT:

                   Note: This program will create MLNX_OFED_LINUX TGZ for oel7.1 under /usr/tmp directory.

                   Do you want to continue?[y/N]:y

                   See log file /tmp/mlnx_ofed_iso.1651.log

               

                   Building OFED RPMS . Please wait...

                   Created /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64-ext.tgz

               

              5. Untar the ofed package that was just created.

                   tar zxvf MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64-ext.tgz

               

              6. Navigate to the directory that holds the package for the specific system and run the install script.

                   cd MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64-ext

                   ./mlnxofedinstall

               

               

                   SCREEN OUTPUT:

                   Logs dir: /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0.3782.logs

                   This program will install the MLNX_OFED_LINUX package on your machine.

                   Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.

                   Do you want to continue?[y/N]:

                   .

                   .

                   .

                   Device (90:00.0):

                       90:00.0 Infiniband controller: Mellanox Technologies MT27700 Family

                       Link Width: x16

                       PCI Link Speed: 8GT/s

               

                   Device (90:00.1):

                       90:00.1 Infiniband controller: Mellanox Technologies MT27700 Family

                       Link Width: x16

                       PCI Link Speed: 8GT/s

               

               

                   Installation finished successfully.

               

                   Preparing... #################################

                   Updating / installing...

                      1:mlnx-fw-updater-3.2-2.0.0.0 #################################

               

               

                   Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

               

                   Attempting to perform Firmware update...

                   Querying Mellanox devices firmware ...

               

                   Device #1:
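
              Step 4 above can be sketched as a small dry-run script that only prints the command instead of running it. The -k and -s options (kernel version and kernel sources) do not appear in the steps above; they are an assumption about what the script accepts, so check ./mlnx_add_kernel_support.sh --help on your release before relying on them. The function name add_kernel_support_cmd and the variable names are made up for the example.

```shell
#!/bin/sh
# Dry-run sketch: print the mlnx_add_kernel_support.sh command for a given kernel.
# Paths are the ones used in the steps above.
ofed_dir=/tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64

# Hypothetical helper; $1 = kernel release, $2 = kernel build tree.
# ASSUMPTION: -k and -s are accepted by the script; verify with --help.
add_kernel_support_cmd() {
    printf '%s\n' "./mlnx_add_kernel_support.sh --make-tgz --mlnx_ofed $ofed_dir --ofed-sources $ofed_dir/src/MLNX_OFED_SRC-3.2-2.0.0.0.tgz --skip-repo -t /tmp -k $1 -s $2"
}

# Print the command for the running kernel and its usual build-tree location.
kver=$(uname -r)
add_kernel_support_cmd "$kver" "/lib/modules/$kver/build"
```

              Passing the kernel release explicitly makes it visible in the log which release string the build was asked to target, which helps when comparing against the vermagic of the resulting modules.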

                • Re: mlnx_add_kernel_support.sh and magic version errors.
                  niverson

                  I'm not sure how this issue was marked with a "correct answer."

                   

                  I downloaded and tried the standalone 4.3.0 MFT utilities and there is no difference in behavior. "mst start" still fails on the "version magic" check.

                   

                   

                  [root@co-sanfs2sim-01 bin]# ./mlxup --query
                  Querying Mellanox devices firmware ...

                  Device #1:
                  ----------

                    Device Type:      ConnectX4
                    Part Number:      MCX456A-ECA_Ax
                    Description:      ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe3.0 x16; ROHS R6
                    PSID:             MT_2190110032
                    PCI Device Name:  0000:90:00.0
                    Base GUID:        7cfe900300726e9a
                    Base MAC:         00007cfe90726e9a
                    Versions:         Current        Available
                       FW             12.14.2036     12.14.2036

                    Status:           Up to date





                  [root@co-sanfs2sim-01 mft-4.3.0-25]# uname -a
                  Linux co-sanfs2sim-01.us.oracle.com 4.1.12-32.el7uek-axnp.debug.070000.009400 #1 SMP Fri Jan 29 12:18:44 PST 2016 x86_64 x86_64 x86_64 GNU/Linux



                  [root@co-sanfs2sim-01 mft-4.3.0-25]# ./install.sh

                  -I- Removing all installed mft packages: mft  kernel-mft
                  -I- Building the MFT kernel binary RPM...
                  -I- Installing the MFT RPMs...
                  Preparing...                          ################################# [100%]
                  Updating / installing...
                     1:kernel-mft-4.3.0-4.1.12_32.el7uek################################# [100%]
                  Preparing...                          ################################# [100%]
                  Updating / installing...
                     1:mft-4.3.0-25                     ################################# [100%]
                  -I- In order to start mst, please run "mst start".

                  [root@co-sanfs2sim-01 mft-4.3.0-25]# mst start
                  Starting MST (Mellanox Software Tools) driver set
                  Loading MST PCI modulemodprobe: ERROR: could not insert 'mst_pci': Exec format error
                  - Failure: 1
                  Loading MST PCI configuration modulemodprobe: ERROR: could not insert 'mst_pciconf': Exec format error
                  - Failure: 1
                  Create devices

                  mst_pci driver not found
                  Unloading MST PCI module (unused) - Success
                  Unloading MST PCI configuration module (unused) - Success




                  from dmesg.
                  [ 5898.964905] mst_pci: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '
                  [ 5899.116921] mst_pciconf: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '
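
                  For context, the release part of a module's vermagic is compiled in from UTS_RELEASE in the kernel build tree the module was built against (include/generated/utsrelease.h), not read from uname -r at build time. Whether that explains this case is not confirmed in this thread, but it is cheap to check. A minimal sketch, assuming the build tree is reachable through /lib/modules/$(uname -r)/build; the function name uts_release is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the UTS_RELEASE value from a utsrelease.h file.
uts_release() {
    sed -n 's/^#define[[:space:]]\{1,\}UTS_RELEASE[[:space:]]\{1,\}"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# ASSUMPTION: the kernel build tree is linked at the usual location.
hdr="/lib/modules/$(uname -r)/build/include/generated/utsrelease.h"

if [ -r "$hdr" ]; then
    built_for=$(uts_release "$hdr")
    if [ "$built_for" = "$(uname -r)" ]; then
        echo "build tree matches running kernel: $built_for"
    else
        echo "build tree is for '$built_for', running kernel is '$(uname -r)'"
    fi
else
    echo "no readable utsrelease.h at $hdr"
fi
```

                  If the header reports the shorter release, every module built against that tree would be expected to carry the shorter vermagic, whichever installer drives the build.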

                  • Re: mlnx_add_kernel_support.sh and magic version errors.
                    niverson

                    I have looked through mlnx_add_kernel_support.sh and mlnxofedinstall to try to get this to install properly. So far I have been unsuccessful.

                     

                    I have successfully installed ofed on the following kernel:

                    4.1.12-32.1.2.el7uek.x86_64

                     

                    This is the kernel that hits the version magic issue, and I have no control over its version number format:

                    4.1.12-32.el7uek-axnp.debug.070000.009400

                     

                    mlx_compat: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

                     

                    It looks like "mlnx_add_kernel_support.sh" has a kernel name format requirement. Can you point me to the part of the script that enforces it? I found the uname portion, but it just reads the kernel version and uses it; it doesn't manipulate it.