13 Replies Latest reply on Aug 28, 2014 10:34 AM by neelpert1

    Error /hca_self_test.ofed: line 165: [: too many arguments

      I am running CentOS 6.5 x86_64 and have installed the Mellanox driver (MLNX_OFED_LINUX-2.2-1.0.1-rhel6.5-x86_64). When I run hca_self_test.ofed, I receive the following output:

       

      /hca_self_test.ofed: line 165: [: too many arguments

      REASON: no RPMs found for currently booted kernel 2.6.32-431.20.3.el6.x86_64

       

      Below is the output of ibstatus

       

      ibstatus

      Infiniband device 'mlx4_0' port 1 status:

      default gid:     fe80:0000:0000:0000:f452:1403:0033:7fc1

      base lid:        0x0

      sm lid:          0x0

      state:           2: INIT

      phys state:      5: LinkUp

      rate:            56 Gb/sec (4X FDR)

      link_layer:      InfiniBand

       

      I can restart the driver successfully, but I now always receive the error above when running hca_self_test.ofed. I have also made sure SELinux is in permissive mode (getenforce responds with Permissive). Is there anything else about the setup I can post that might help?

       

      Thanks

        • Re: Error /hca_self_test.ofed: line 165: [: too many arguments
          ferbs

          Hi,

           

          My guess is that the script can't find RPMs for the exact kernel build 2.6.32-431.20.3. I don't think you need to pay any attention to it at this point. If you are trying to verify your fabric health, your ibstatus output shows the port in INIT because there is no SM (subnet manager) in the fabric. Unless there is a switch that is supposed to manage the network, you can simply start opensm locally: "/etc/init.d/opensmd start".
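The check described in this reply can be sketched in shell. This is only an illustration: the helper name port_state is hypothetical, and the sample text is copied from the ibstatus output posted earlier in the thread rather than read from a live adapter.

```shell
# Print the logical port state name (e.g. INIT, ACTIVE) from ibstatus-style
# text on stdin. The pattern skips the "phys state:" line.
port_state() {
    awk -F': *' '/^[ \t]*state:/ { print $3; exit }'
}

# Sample lines copied from the ibstatus output earlier in the thread.
sample='      base lid:        0x0
      sm lid:          0x0
      state:           2: INIT
      phys state:      5: LinkUp'

if [ "$(printf '%s\n' "$sample" | port_state)" = "INIT" ]; then
    echo "port is in INIT: no subnet manager has configured it yet"
    # On a live host one would start the SM here, as suggested above:
    #   /etc/init.d/opensmd start
fi
```

On a real machine the same function could be fed with `ibstatus | port_state`. Once an SM is running, the state should move to ACTIVE and the base lid and sm lid should become non-zero.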



            • Re: Error /hca_self_test.ofed: line 165: [: too many arguments

              Thank you. So this error will not stop the Mellanox adapter from working once the final configuration is in place?

               

              The following did work as you mentioned 

               

              "/etc/init.d/opensmd start".


              ibstatus now reports the following


              ibstatus

              Infiniband device 'mlx4_0' port 1 status:

                      default gid:     fe80:0000:0000:0000:f452:1403:0033:7fc1

                      base lid:        0x1

                      sm lid:          0x1

                      state:           4: ACTIVE

                      phys state:      5: LinkUp

                      rate:            56 Gb/sec (4X FDR)

                      link_layer:      InfiniBand

               

              Thanks

            • Re: Error /hca_self_test.ofed: line 165: [: too many arguments
              alkx

              Could you run the following commands

              modinfo mlx4_core

              modinfo mlx4_core |grep filename  | awk '{print $NF}'

               

              and provide the output? The last command probably returns a zero-length string, which causes the script to fail.

                • Re: Error /hca_self_test.ofed: line 165: [: too many arguments

                  Output Below

                   

                  modinfo mlx4_core

                  filename:       /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

                  version:        1.1

                  license:        Dual BSD/GPL

                  description:    Mellanox ConnectX HCA low-level driver

                  author:         Roland Dreier

                  srcversion:     9A90DAE92A2E75BF5F67A24

                  alias:          pci:v000015B3d00001010svsdbcsci*

                  alias:          pci:v000015B3d0000100Fsvsdbcsci*

                  alias:          pci:v000015B3d0000100Esvsdbcsci*

                  alias:          pci:v000015B3d0000100Dsvsdbcsci*

                  alias:          pci:v000015B3d0000100Csvsdbcsci*

                  alias:          pci:v000015B3d0000100Bsvsdbcsci*

                  alias:          pci:v000015B3d0000100Asvsdbcsci*

                  alias:          pci:v000015B3d00001009svsdbcsci*

                  alias:          pci:v000015B3d00001008svsdbcsci*

                  alias:          pci:v000015B3d00001007svsdbcsci*

                  alias:          pci:v000015B3d00001006svsdbcsci*

                  alias:          pci:v000015B3d00001005svsdbcsci*

                  alias:          pci:v000015B3d00001004svsdbcsci*

                  alias:          pci:v000015B3d00001003svsdbcsci*

                  alias:          pci:v000015B3d00001002svsdbcsci*

                  alias:          pci:v000015B3d0000676Esvsdbcsci*

                  alias:          pci:v000015B3d00006746svsdbcsci*

                  alias:          pci:v000015B3d00006764svsdbcsci*

                  alias:          pci:v000015B3d0000675Asvsdbcsci*

                  alias:          pci:v000015B3d00006372svsdbcsci*

                  alias:          pci:v000015B3d00006750svsdbcsci*

                  alias:          pci:v000015B3d00006368svsdbcsci*

                  alias:          pci:v000015B3d0000673Csvsdbcsci*

                  alias:          pci:v000015B3d00006732svsdbcsci*

                  alias:          pci:v000015B3d00006354svsdbcsci*

                  alias:          pci:v000015B3d0000634Asvsdbcsci*

                  alias:          pci:v000015B3d00006340svsdbcsci*

                  depends:        compat

                  vermagic:       2.6.32-431.el6.x86_64 SMP mod_unload modversions

                  parm:           set_4k_mtu:(Obsolete) attempt to set 4K MTU to all ConnectX ports (int)

                  parm:           debug_level:Enable debug tracing if > 0 (int)

                  parm:           msi_x:0 - don't use MSI-X, 1 - use MSI-X, >1 - limit number of MSI-X irqs to msi_x (non-SRIOV only) (int)

                  parm:           enable_sys_tune:Tune the cpu's for better performance (default 0) (int)

                  parm:           block_loopback:Block multicast loopback packets if > 0 (default: 1) (int)

                  parm:           num_vfs:Either single value (e.g. '5') or triplet (e.g. '10,11,12') to define uniform num_vfs value for all devices functions.

                                  If a single value is given, this value will be used in order to define  dual port virtual functions are probed.

                                  Alternatively, a string to map device function numbers to their probe_vf values

                                  (e.g. '0000:04:00.0-3,002b:1c:0b.a-13;12;11') could be given.

                                  Hexadecimal digits for the device function (e.g. 002b:1c:0b.a) and decimal for probe_vf value (e.g. 13 or 1;2;3). (string)

                  parm:           log_num_mgm_entry_size:log mgm size, that defines the num of qp per mcg, for example: 10 gives 248.range: 7 <= log_num_mgm_entry_size <= 12. To activate device managed flow steering when available, set to -1 (int)

                  parm:           high_rate_steer:Enable steering mode for higher packet rate (default off) (int)

                  parm:           fast_drop:Enable fast packet drop when no recieve WQEs are posted (int)

                  parm:           enable_64b_cqe_eqe:Enable 64 byte CQEs/EQEs when the the FW supports this if non-zero (default: 1) (int)

                  parm:           log_num_mac:Log2 max number of MACs per ETH port (1-7) (int)

                  parm:           log_num_vlan:(Obsolete) Log2 max number of VLANs per ETH port (0-7) (int)

                  parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (0-7) (default: 0) (int)

                  parm:           port_type_array:Either pair of values (e.g. '1,2') to define uniform port1/port2 types configuration for all devices functions

                                  or a string to map device function numbers to their pair of port types values (e.g. '0000:04:00.0-1;2,002b:1c:0b.a-1;1').

                                  Valid port types: 1-ib, 2-eth, 3-auto, 4-N/A

                                  In case that only one port is available use the N/A port type for port2 (e.g '1,4'). (string)

                  parm:           log_num_qp:log maximum number of QPs per HCA (default: 19) (int)

                  parm:           log_num_srq:log maximum number of SRQs per HCA (default: 16) (int)

                  parm:           log_rdmarc_per_qp:log number of RDMARC buffers per QP (default: 4) (int)

                  parm:           log_num_cq:log maximum number of CQs per HCA (default: 16) (int)

                  parm:           log_num_mcg:log maximum number of multicast groups per HCA (default: 13) (int)

                  parm:           log_num_mpt:log maximum number of memory protection table entries per HCA (default: 19) (int)

                  parm:           log_num_mtt:log maximum number of memory translation table segments per HCA (default: max(20, 2*MTTs for register all of the host memory limited to 30)) (int)

                  parm:           enable_qos:Enable Quality of Service support in the HCA (default: off) (bool)

                  parm:           internal_err_reset:Reset device on internal errors if non-zero (default 0) (int)

                  #

                   

                  Second command

                   

                  modinfo mlx4_core |grep filename  | awk '{print $NF}'

                   

                  /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

                   

                   

                  Thanks
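One detail in the modinfo output above is worth checking: the module file sits under the 2.6.32-431.20.3 tree (in weak-updates), while its vermagic names 2.6.32-431.el6.x86_64. A minimal sketch of that comparison follows. The function name vermagic_kernel is hypothetical, and the values are copied from the thread rather than queried live.

```shell
# Extract the kernel release a module was built for from modinfo-style text.
vermagic_kernel() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Values taken from the thread; on a live host use:
#   running=$(uname -r)
#   built_for=$(modinfo mlx4_core | vermagic_kernel)
running='2.6.32-431.20.3.el6.x86_64'
built_for=$(printf '%s\n' 'vermagic:       2.6.32-431.el6.x86_64 SMP mod_unload modversions' | vermagic_kernel)

if [ "$running" != "$built_for" ]; then
    echo "mlx4_core was built for $built_for but the running kernel is $running"
fi
```

A mismatch like this does not by itself mean the driver is broken, since weak-updates exists to reuse modules across compatible kernels. It does suggest that the kernel was updated after the driver packages were built.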

                    • Re: Error /hca_self_test.ofed: line 165: [: too many arguments
                      alkx

                      and what is the output of

                      rpm -qf /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

                       

                      It should return the RPM name.

                      Could you also check whether the script works if you change this line (163)

                      KER_RPM=`rpm -qf $mlx4_core_ko 2> /dev/null | grep -E "kernel-ib|ofa_kernel"`

                      to this

                      KER_RPM=`rpm -qf $mlx4_core_ko 2> /dev/null | grep -E "kernel-ib\|ofa_kernel"`

                        • Re: Error /hca_self_test.ofed: line 165: [: too many arguments

                          Output follows

                           

                          rpm -qf /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

                          file /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko is not owned by any package

                           

                          Changing the file to the following gives the same result:

                          KER_RPM=`rpm -qf $mlx4_core_ko 2> /dev/null | grep -E "kernel-ib\|ofa_kernel"`

                           

                           

                           

                          Thanks
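One plausible reading of the "too many arguments" message, offered as a sketch rather than a confirmed diagnosis: the "not owned by any package" text contains "ofa_kernel" in the path, so the grep on line 163 would let it through, and KER_RPM would then hold a multi-word sentence. This assumes rpm writes that message to standard output and that line 165 tests $KER_RPM without quotes. The thread never shows line 165, so check_unquoted and check_quoted below are hypothetical stand-ins.

```shell
# The message rpm printed in the post above (note "ofa_kernel" in the path).
msg='file /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko is not owned by any package'

# Same filter as line 163 of the script, fed with that message instead of a live rpm call.
KER_RPM=$(printf '%s\n' "$msg" | grep -E "kernel-ib|ofa_kernel")

# Hypothetical stand-ins for the test on line 165 (not shown in the thread).
check_unquoted() { [ -z $KER_RPM ] 2>/dev/null; }   # word-splits into many arguments
check_quoted()   { [ -z "$KER_RPM" ]; }             # always exactly one argument

rc=0; check_unquoted || rc=$?
echo "unquoted test exit status: $rc"
rc=0; check_quoted || rc=$?
echo "quoted test exit status: $rc"
```

In bash the unquoted form fails with the "[: too many arguments" error (hidden here by the redirect), while the quoted form simply reports that the string is non-empty. If this reading is right, quoting the variable would silence the shell error but not the underlying problem, which is that no installed RPM owns the module for the running kernel.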

                          • Re: Error /hca_self_test.ofed: line 165: [: too many arguments

                            I reran the install as you asked and saw the following. I am not sure whether these warnings are a problem. While I had time, I also ran the install on another machine with the same specs: it worked normally after the initial install and before a yum update, but after a yum update it has the same problem.

                             

                            Device (42:00.0):

                                    42:00.0 Network controller: Mellanox Technologies MT27500 Family

                                    Link Width: 8x

                                    PCI Link Speed: Unknown

                             

                            Device (42:00.0):

                                    42:00.0 Network controller: Mellanox Technologies MT27500 Family

                                    WARNING - device 42:00.0 The MaxReadRequest size is set too low (512 bytes) and will affect performance.

                                    Please consult your server's vendor and if possible change BIOS settings or use setpci to configure MaxReadReq to 4096 bytes.

                                     

                                    # /sbin/setpci -s 42:00.0 68.W

                                    2xxx

                                    Change to 4096 bytes:

                                     

                                    # /sbin/setpci -s 42:00.0 68.W=5xxx

                             

                             

                            Installation finished successfully.

                             

                            Attempting to perform Firmware update...

                            Querying Mellanox devices firmware ...

                             

                            Device #1:
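For readers wondering what the 2xxx and 5xxx values in the installer's setpci hint mean: in the PCIe Device Control register, bits 14:12 encode the maximum read request size as 128 bytes shifted left by that field, so a leading digit of 2 corresponds to 512 bytes and 5 to 4096 bytes. The sketch below only does that arithmetic. The function name mrrs_bytes and the register words 2000 and 5000 are hypothetical placeholders, and offset 68 is taken from the installer message, not verified for other devices.

```shell
# Decode Max_Read_Request_Size (bits 14:12 of the PCIe Device Control register)
# from the 16-bit hex word that "setpci ... 68.W" prints.
mrrs_bytes() {
    echo $(( 128 << ((0x$1 >> 12) & 7) ))
}

# Hypothetical register words; only the leading digit matters here.
mrrs_bytes 2000   # leading digit 2
mrrs_bytes 5000   # leading digit 5
```

When writing the register back, only the leading digit should change. The remaining three digits (the xxx in the installer's message) should be kept exactly as read.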

                      • Re: Error /hca_self_test.ofed: line 165: [: too many arguments
                        neelpert1

                        All:


                        Exactly the same symptoms here when running MLNX OFED 2.1-1.0.6 on RHEL 6.5 x86. No errors at all during mlnxofedinstall, and nothing indicating that any RPMs failed to install.

                         

                        I am going to attempt updating the adapter to the latest firmware and re-check to see if anything is corrected, but I am seeing the same things as the original poster of this thread.

                         

                        Output looks like this (ignore the Spanish - it means "too many arguments.")

                         

                        [root@master ~]# hca_self_test.ofed

                         

                         

                        ---- Performing Adapter Device Self Test ----

                        Number of CAs Detected ................. 1

                        PCI Device Check ....................... PASS

                        /usr/bin/hca_self_test.ofed: línea 165: [: demasiados argumentos

                        Host Driver RPM Check .................. FAIL

                            REASON: no RPMs found for currently booted kernel 2.6.32-431.23.3.el6.x86_64

                        Kernel Arch ............................ x86_64

                        Host Driver Version .................... NA

                        Firmware Check on CA #0 (VPI) .......... NA

                        Host Driver Initialization ............. NA

                        Number of CA Ports Active .............. NA

                        Error Counter Check .................... NA

                        Kernel Syslog Check .................... NA

                        Node GUID on CA #0 (VPI) ............... 00:02:c9:03:00:38:ed:60

                        ------------------ DONE ---------------------

                        • Re: Error /hca_self_test.ofed: line 165: [: too many arguments
                          neelpert1

                          In the end, we found out that this customer - without my knowledge - had actually bumped up their kernel with errata updates via the Red Hat Network.  Once we reconfigured OFED and built a custom ISO, the driver was re-installed and this command now works quite well.

                           

                          It is possible that the original poster of this thread is running something other than the default Red Hat/CentOS kernel (the one provided on the DVD or ISO), and that the Mellanox OFED drivers were not reconfigured and recompiled to support the kernel updates.

                           

                          No symptoms appear while the installation script runs, which makes this more confusing: there are no errors of any kind. The problems begin only once certain features are used.

                           

                          If the kernel has any updates of any kind following Linux ISO or DVD installation, then Mellanox OFED must be reconfigured to support the new kernel and the resulting custom ISO must be mounted and used.