1 Reply Latest reply on Nov 1, 2014 6:03 AM by halr

    message "SUBNET UP" is not found in log files

      I successfully installed MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on a RHEL 6.4 machine with kernel 2.6.32-358.46.2.el6.x86_64.

       

      Then I started two daemons:

      1. /etc/init.d/openibd

      2. /etc/init.d/opensmd

       

      If opensm had set up the subnet correctly, the message "SUBNET UP" should appear in the log files /var/log/opensm.log and /var/log/messages, but it is not found in either.

      The log file /var/log/opensm.log contains:

      Aug 11 02:26:50 235789 [4C8AF700] 0x03 -> OpenSM 4.1.5.MLNX20140424.25abcb5

      OpenSM 4.1.5.MLNX20140424.25abcb5

       

      Aug 11 02:26:50 235869 [4C8AF700] 0x80 -> OpenSM 4.1.5.MLNX20140424.25abcb5

      Using default GUID 0xf4521403002abf01

      Entering DISCOVERING state

       

      Aug 11 02:26:50 242489 [4C8AF700] 0x02 -> osm_vendor_init: 1000 pending umads specified

      Aug 11 02:26:50 264138 [4C8AF700] 0x80 -> Entering DISCOVERING state

      Entering STANDBY state

       

      Aug 11 02:26:50 276494 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xf4521403002abf01

      Aug 11 02:26:50 340494 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0xf4521403002abf01

      Aug 11 02:26:50 340559 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0xf4521403002abf01

      Aug 11 02:26:50 340628 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0xf4521403002abf01

      Aug 11 02:26:50 340700 [4C8AF700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0xf4521403002abf01

      Aug 11 02:26:50 368521 [45AA2700] 0x80 -> Entering STANDBY state

      Aug 11 02:31:50 370767 [43C9F700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC

                              SubnGetResp(SMInfo), attr_mod 0x0, TID 0x12f9

                              Initial path: 0,1,23 Return path: 0,4,1

      Aug 11 02:32:00 370839 [43C9F700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC

                              SubnGetResp(SMInfo), attr_mod 0x0, TID 0x12fa

                              Initial path: 0,1,23 Return path: 0,4,1

      Aug 11 02:32:10 370884 [43C9F700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC

                              SubnGetResp(SMInfo), attr_mod 0x0, TID 0x12fb

                              Initial path: 0,1,23 Return path: 0,4,1

      Entering DISCOVERING state

       

       

      It shows only two states, STANDBY and DISCOVERING. What do I need to do to reach the SUBNET UP state?
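
      To double-check which states a log actually reports, a small script along these lines could be used. It is only a sketch: the function name summarize_opensm_log is made up for illustration, and the two sample lines are copied from the log above. For the real file, the list could be replaced by the lines read from /var/log/opensm.log.

```python
import re

# Matches the state transition records OpenSM writes, e.g. "Entering STANDBY state".
STATE_RE = re.compile(r"Entering (\w+) state")

def summarize_opensm_log(lines):
    """Return (ordered list of states seen, whether "SUBNET UP" appeared)."""
    states = []
    subnet_up = False
    for line in lines:
        if "SUBNET UP" in line:
            subnet_up = True
        match = STATE_RE.search(line)
        # Record a state only when it differs from the previous one.
        if match and (not states or states[-1] != match.group(1)):
            states.append(match.group(1))
    return states, subnet_up

# Sample records taken from the log excerpt in this post.
sample = [
    "Aug 11 02:26:50 264138 [4C8AF700] 0x80 -> Entering DISCOVERING state",
    "Aug 11 02:26:50 368521 [45AA2700] 0x80 -> Entering STANDBY state",
]
states, subnet_up = summarize_opensm_log(sample)
print(states, subnet_up)
```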

       

      Do I need to configure opensm manually? The file /etc/sysconfig/opensm is also missing.

       

      osmtest also fails:

       

      Command Line Arguments

      Done with args

          Flow = All Validations

      Aug 11 03:28:40 177806 [53627700] 0x7f -> Setting log level to: 0x03

      Aug 11 03:28:40 177956 [53627700] 0x02 -> osm_vendor_init: 1000 pending umads specified

      using default guid 0xf4521403002abf01

      Aug 11 03:28:40 220980 [53627700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0xf4521403002abf01

      Aug 11 03:28:40 285077 [53627700] 0x02 -> osmtest_validate_sa_class_port_info:

      -----------------------------

      SA Class Port Info:

      base_ver:1

      class_ver:2

      cap_mask:0x2602

      cap_mask2:0x3E8

      resp_time_val:0x10

      -----------------------------

      Aug 11 03:28:40 285105 [53627700] 0x01 -> osmtest_create_db: ERR 0130: Unable to open inventory file (osmtest.dat)

      Aug 11 03:28:40 285114 [53627700] 0x01 -> osmtest_run: ERR 0145: Database creation failed (IB_ERROR)

      OSMTEST: TEST "All Validations" FAIL

       

      Did something go wrong in my installation?

        • Re: message "SUBNET UP" is not found in log files
          halr

          I don't know if this is still an issue, but here are some comments which may help:

           

          This OpenSM instance went into STANDBY mode, which means there is another SM active on the subnet with either higher priority or the same priority and a lower GUID.
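
          The rule above (higher priority wins, lower GUID breaks a tie) can be sketched roughly as follows. This is an illustration only, not OpenSM's actual code: the SM type and elect_master are made-up names, the second GUID is hypothetical, and priority 0 for both SMs is an assumption. It also ignores the rest of the SM state machine and handover logic.

```python
from typing import NamedTuple

class SM(NamedTuple):
    priority: int  # SM priority (0-15); higher wins
    guid: int      # port GUID; lower wins when priorities are equal

def elect_master(candidates):
    """Pick the SM that should be MASTER: highest priority, then lowest GUID."""
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))

# GUID from the log in the question; priority 0 is assumed, not taken from the log.
local = SM(priority=0, guid=0xF4521403002ABF01)
# Hypothetical second SM with the same priority and a lower GUID.
other = SM(priority=0, guid=0x0002C90300000001)

master = elect_master([local, other])
print("local SM is MASTER" if master == local else "local SM goes to STANDBY")
```

          Under this rule, raising the priority of the local SM above that of the other SM would be one way to make it the master.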

           

          The MAD error status messages (SMInfo with status 0xC) are due to this STANDBY SM polling the MASTER SM, and that node is rejecting the query for some unknown reason. The master SM is at direct route path 0,1,23 from the standby machine, which means out port 1 of the local machine to the next-hop switch and then out port 23 there. I would run smpquery -D nodeinfo 0,1,23 to see what node is there.
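
          As a rough illustration of how the direct route and the status value can be read, here is a small sketch. The helper names are made up. The status decoding assumes the logged value is the common MAD status field in host byte order, with bit 0 as busy, bit 1 as redirect, and bits 2-4 as the invalid field code. Under that assumption, 0xC decodes to "method/attribute combination not supported", which still does not explain why the node answers SMInfo that way.

```python
def describe_direct_route(path):
    """Turn a direct route string such as "0,1,23" into per-hop descriptions."""
    ports = [int(p) for p in path.split(",")]
    hops = []
    # The leading 0 stands for the local node; each later entry is an exit port.
    for hop, port in enumerate(ports[1:]):
        where = "local node" if hop == 0 else f"hop {hop}"
        hops.append(f"{where}: out port {port}")
    return hops

# Assumed meaning of bits 2-4 of the common MAD status field.
INVALID_FIELD_CODES = {
    0: "no invalid fields",
    1: "bad base or class version",
    2: "method not supported",
    3: "method/attribute combination not supported",
    7: "invalid value in attribute or attribute modifier",
}

def decode_mad_status(status):
    """Split a MAD status value into its busy, redirect and invalid-field parts."""
    code = (status >> 2) & 0x7
    return {
        "busy": bool(status & 0x1),
        "redirect": bool(status & 0x2),
        "invalid_field": INVALID_FIELD_CODES.get(code, "reserved"),
    }

print(describe_direct_route("0,1,23"))
print(decode_mad_status(0xC))
```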

           

          Also, I think there is a more recent MLNX OFED OpenSM available now. You might want to try that.

           

          The osmtest failure is due to not having created the inventory file first. That is done with something like osmtest -f c.