7 Replies Latest reply on Jun 22, 2015 6:45 AM by tgale96

    Problem configuring and using IPoIB

      I have IB connecting two nodes. If I run ibhosts I can see both nodes' GUIDs, but when I run ibqueryerrors it reports "2 bad nodes found" and "2 ports have errors beyond thresholds". I have given the nodes the IP addresses 10.0.0.0 and 10.0.0.1 in their ifcfg-ib0 files, and when I try to ping one from the other it times out. Looking in the log, the other node has been assigned a LID and there is no indication of an error, but the messages log does not say SUBNET UP. I'm new to IB and don't really know what the error might be. Has anyone encountered this before, or does anyone know where I can find more info?
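
      One thing that may be worth ruling out, sketched below in Python: the post does not give the netmask used in ifcfg-ib0, but under common masks such as /24 or /8 (an assumption here), 10.0.0.0 is the network address of the subnet rather than a usable host address. The helper name is_usable_host is made up for illustration, and this only checks the addressing; it says nothing about the ibqueryerrors output.

```python
import ipaddress


def is_usable_host(addr: str, prefix_len: int) -> bool:
    """Return True if addr is neither the network nor the broadcast address.

    prefix_len is an ASSUMED netmask length; the original post does not
    say which netmask was configured in ifcfg-ib0.
    """
    iface = ipaddress.ip_interface(f"{addr}/{prefix_len}")
    net = iface.network
    # /31 and /32 networks have no reserved network/broadcast addresses.
    if net.prefixlen >= 31:
        return True
    return iface.ip not in (net.network_address, net.broadcast_address)


if __name__ == "__main__":
    # The two addresses from the post, checked under a hypothetical /24 mask.
    for addr in ("10.0.0.0", "10.0.0.1"):
        print(addr, "usable as host address:", is_usable_host(addr, 24))
```

      If the check reports that 10.0.0.0 is not usable under the netmask actually configured, moving the nodes to something like 10.0.0.1 and 10.0.0.2 would be a cheap thing to try before digging further.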

        • Re: Problem configuring and using IPoIB

          After messing with it again last night, I seem to have IPoIB configured: the MTU dropped to 2044, and the IPs that I assigned to each node in its ifcfg-ib0 show up when I run "ifconfig ib0". I now see the SUBNET UP message in my log, but I still don't have a connection. When I run ping 10.0.0.1 it times out, and when I try ibping -G <GUID> it times out as well.

           

          I ran osmtest and got all of these errors:

          Screen Shot 2015-06-18 at 10.31.43 AM.png

          and even a few more above that I couldn't fit into the screenshot.

           

          My opensm.log on the head node running opensm shows that the subnet was up, but a number of errors showed up once I ran osmtest.

          Screen Shot 2015-06-18 at 10.28.36 AM.png

          I can't find anything in my setup that looks wrong, or anything that indicates where the error is. I am using older HCAs (26428s), and when I ran mlnxofedinstall it said it couldn't update the firmware. Is this potentially a firmware issue?

           

          I installed FlexBoot on both nodes' HCAs as well, if it makes any difference.

           

          Thanks,

          Trevor