52 replies. Latest reply on May 24, 2013 at 8:05 PM by mblanke

    MHGH28-XTC not working

    mblanke

      MHGH28-XTC not working; I've tried installing it in Windows 7 and Server 2008 R2. In both instances I get a Code 10 device error in Device Manager. Further attempts to update the firmware have failed with the error "corrupted device ID 0xffffffff". Can anyone assist, or did I just buy bricked NICs? I will be attempting a Server 2012 install tonight.

       

      Any assistance is appreciated.

        • Re: MHGH28-XTC not working
          yairi

          Hi mblanke,

           

          From my experience with Windows and ConnectX cards, I have learned that running the latest FW revision is a must.

          I am not sure whether your card actually works or whether you have tried updating its firmware, but there is no point in continuing unless the card runs one of the later versions.

          I think the highest version this card can run is 2.9.1000.

          This page has the FW you need. Check the sticker on the card for its revision to determine which of the three available images is right for you.

          Then you need the MFT package, which contains the burning utility. It should be right here.
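The revision matching described above can be sketched in Python. This is a minimal illustration, not a Mellanox tool: the helper names are hypothetical, and only the two image names that appear later in this thread are mapped (A1, and A2/A3). The third image is not named in the thread, so any other revision is rejected rather than guessed.

```python
import re

# Image names as they appear in this thread. The third image mentioned
# above is not named here, so unknown revisions raise instead of guessing.
IMAGES_BY_REVISION = {
    "A1": "fw-25408-2_9_1000-MHGH28-XTC_A1.bin",
    "A2": "fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin",
    "A3": "fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin",
}


def pick_firmware_image(sticker_revision: str) -> str:
    """Return the image file for a board revision read off the sticker (hypothetical helper)."""
    rev = sticker_revision.strip().upper()
    try:
        return IMAGES_BY_REVISION[rev]
    except KeyError:
        raise ValueError(f"no image known for revision {rev!r}") from None


def image_version(filename: str) -> tuple[int, int, int]:
    """Extract the FW version encoded in names like fw-25408-2_9_1000-....bin."""
    m = re.match(r"fw-\d+-(\d+)_(\d+)_(\d+)-", filename)
    if m is None:
        raise ValueError(f"unrecognised image name: {filename}")
    major, minor, sub = (int(g) for g in m.groups())
    return major, minor, sub


if __name__ == "__main__":
    image = pick_firmware_image("a3")
    print(image, image_version(image))
```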

           

          Other thoughts: try a different PCIe slot.

          It doesn't sound like the issue has to do with your Windows version. It definitely sounds like an HCA issue: either the card is bad or it runs an old FW.

          I hope it helps.

          • Re: MHGH28-XTC not working
            mblanke

            Not 100% sure I did this right

             

            C:\Users\Administrator>cd C:\Program Files\Mellanox\WinMFT

            C:\Program Files\Mellanox\WinMFT>mst status
            MST devices:
            ------------
              mt25418_pciconf0
              mt25418_pci_cr0

            C:\Program Files\Mellanox\WinMFT>mlxburn -dev_type 25418 -dev mt25418_pci_cr0 -image fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin
            -E- Read a corrupted device id (0xffff). Probably HW/PCI access problem
            -E- Can not open mt25418_pci_cr0:  MFE_CR_ERROR
            -E- Image burn failed: child process exited abnormally

            C:\Program Files\Mellanox\WinMFT>
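Why the 0xffff reading points at access rather than firmware: by PCI convention, a read that no device answers comes back with every bit set, so 0xffff for a 16-bit ID (or 0xffffffff for a 32-bit read, as in the first post) suggests the tool never reached the card through that access path. A small Python sketch of that check, assuming the expected ID is the 25418 (0x634A) in the mt25418 device name; the function names are hypothetical.

```python
# Assumption: the expected device ID is the number in the MST device name
# (mt25418 / -dev_type 25418), i.e. 25418 == 0x634A.
EXPECTED_DEVICE_ID = 25418


def is_all_ones(value: int, width_bits: int) -> bool:
    """True when every bit of a width_bits-wide read is set (0xffff, 0xffffffff, ...)."""
    return value == (1 << width_bits) - 1


def classify_device_id(device_id: int) -> str:
    """Classify a 16-bit device ID read back from the HCA."""
    if is_all_ones(device_id, 16):
        return "no response"  # nothing answered the read: HW/PCI access problem
    if device_id == EXPECTED_DEVICE_ID:
        return "ok"
    return "unexpected"


if __name__ == "__main__":
    for value in (0xFFFF, 0x634A):
        print(hex(value), classify_device_id(value))
```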

             

             

            Here is a newbie question: should it make a difference at this point whether the card is connected to a switch? (It is not at the moment.)

            Do I really need a switch if I'm going from server A to server B via cable?

              • Re: MHGH28-XTC not working
                yairi

                No, there is no need to connect to a switch in order to burn the FW.

                Please try with the other MST device (mt25418_pciconf0).

                  • Re: MHGH28-XTC not working
                    mblanke

                    Are these boards multi-layered? The cards I bought have some damage to them.

                     

                    [Attached photo: mhgh28-xtc.JPG.jpg]

                     

                    Would this cause any issues?

                      • Re: MHGH28-XTC not working
                        yairi

                        Well, the PCB itself is certainly multi-layered. I can't really say whether the damage you are pointing at would cause a malfunction.

                        Trying to recap things here:

                        1) This is one of the oldest models of this card (but it should still work). I suspect it is running a very old FW that is not compatible with the OS and the driver.

                        2) There is no need for a switch between your two machines, but you will need to run the opensm daemon on one of them (from the command line, in the IB utilities directory).

                        3) Unless you can get this card to run the latest FW, I am not sure we can make any progress.

                        4) Potentially you have bad HW on your hands :-(

                          • Re: MHGH28-XTC not working
                            mblanke

                            So I should make sure opensm is running before trying to update the firmware?

                              • Re: MHGH28-XTC not working
                                yairi

                                No. OpenSM is not a requirement for flashing FW, but keep it in mind for later, once you have a working set of cards.

                                  • Re: MHGH28-XTC not working
                                    mblanke

                                    Thanks, I'll try what you mentioned with the other device (mt25418_pciconf0) this morning, though I have a feeling I'll be returning these to the seller (with difficulty).

                                      • Re: MHGH28-XTC not working
                                        yairi

                                        See if you can get your hands on a newer InfiniBand card model (ConnectX-2 or ConnectX-3) with a matching set of cables.

                                          • Re: MHGH28-XTC not working
                                            mblanke

                                            Ha, thanks. That's a little out of my price range for home use.

                                              • Re: MHGH28-XTC not working
                                                justinclift

                                                Hi Marc-Andre,

                                                 

                                                Did you get these working?

                                                 

                                                I bought several of these cards too (exact same eBay auction ID from atlanticdeals), and they're working fine in RHEL 6.4 and CentOS 6.4.  So, there's definitely no hardware problem with the cards themselves.

                                                 

                                                That being said, it might be worth trying a different PCIe slot on your motherboard when flashing, just in case the slot you're using at the moment is in x4 mode or something instead of x8. I remember having issues a few years ago with some PCIe slots and older-generation Mellanox cards (it took a lot of hair pulling to sort that out :>).

                                                 

                                                Also, are your Linux skills in decent shape (yet)? Just asking because a lot of the software tools for InfiniBand seem more Linux-oriented than Windows-oriented (my impression, anyway).

                                                 

                                                Regards and best wishes,

                                                 

                                                Justin Clift

                                                  • Re: MHGH28-XTC not working
                                                    mblanke

                                                    I tried it on 2 different machines in 6 different PCIe slots with 5 OSes (7, 8, 2008 R2, 2012 and Ubuntu). Sadly, my Ubuntu skills are depressing.

                                                    I was able to update the firmware using the command given to me yesterday. The cards were at 2.6.0; now all three are at 2.9.1000. The command I was using before was giving me a HW error, so I wasn't sure what was going on.

                                                      • Re: MHGH28-XTC not working
                                                        yairi

                                                        Marc,

                                                         

                                                        That's good news: you were able to update the FW. Well done.

                                                        How do things look afterwards?

                                                        Regarding OS support: Windows shouldn't be a problem with any of the versions you mentioned. Mellanox has drivers available for download on the website. In fact, Server 2012 comes with an "inbox" driver (which means you don't need to download anything).

                                                        As for Linux: the driver for Ubuntu (and all other Debian flavors) is not there yet, but it is coming soon. There shouldn't be any problem working with all RH, CentOS and SUSE releases.

                                                        Good luck, my friend.

                                                        • Re: MHGH28-XTC not working
                                                          justinclift

                                                          Yeah, my cards had the same firmware version when they arrived too.

                                                           

                                                          It's good you got the firmware updated, sounds like you're progressing.

                                                           

                                                          Are the cards being detected ok now, and they're now able to communicate between hosts?

                                                           

                                                          On another tangent, which cables did you end up getting?  I went with the el-cheapo option and got the cables here (arrived today), which are working well:

                                                           

                                                            http://www.ebay.co.uk/itm/251200441924

                                                           

                                                          If you do want to start trying out Linux stuff with Infiniband, as yairi mentioned it would be a much better idea for you to go with CentOS instead of Ubuntu (for now anyway).  Someone with really good Ubuntu skills could be ok, but when starting out... not so much.  People here can help with "exactly what to type" instructions for CentOS (not so much for Ubuntu generally).

                                                           


                                                            • Re: MHGH28-XTC not working
                                                              mblanke

                                                              Yeah, there was a lot of progress last night. The actual command that fixed it was:

                                                               

                                                              mlxburn -dev_type 25418 -dev mt25418_pciconf0 -image fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin

                                                               

                                                              I'm waiting for the cables to arrive (it might be a few days still; I will try to borrow some from work). I was able to assign an IP and ping it, so that stack is at least working. Can't wait to get the cables now. I am running a Hyper-V server and will give CentOS a whirl.

                                                                • Re: MHGH28-XTC not working
                                                                  justinclift

                                                                  Cool. 


                                                                  These adapters report all kinds of worrying status information if they don't detect a link (i.e. cables plugged in). They're different from run-of-the-mill GigE cards in that way. So a fair amount of your worry and frustration has probably come from that.

                                                                   

                                                                  But, don't stress, you're on the right track and not far from having things work.

                                                                   

                                                                  With CentOS, if you're not sure, go for CentOS 6.4 x86_64 (i.e. the 64-bit version, not 32-bit).

                                                                  If you run into any trouble, feel free to ask for assistance.

                                                                    • Re: MHGH28-XTC not working
                                                                      mblanke

                                                                      Thanks Justin.

                                                                       

                                                                      I think the most frustrating part was seeing the error in Windows and not being able to update the firmware when everything indicated it was a FW issue.

                                                                      I'm in the process of rebuilding my servers and installing them in rack mounts (part of the man cave) so it might be a little while before I get the time to tinker with CentOS. I'll keep your offer in mind though.

                                                                       

                                                                      Would anyone be able to mark that last post of mine with the mlxburn command as the correct answer?

                                                                        • Re: MHGH28-XTC not working
                                                                          yairi

                                                                          Just to save trouble down the road: RH/CentOS 6.4 is very new, and Mellanox hasn't released an IB driver for this kernel yet. It will probably be released in a few months.

                                                                          You can:

                                                                          - switch over to 6.3, or

                                                                          - continue with 6.4 and work with the inbox driver (which should be fine, I guess).

                                                                          I will mark the post with the mlxburn command as the correct answer.

                                                                           

                                                                          Cheers..

                                                                            • Re: MHGH28-XTC not working
                                                                              justinclift

                                                                              For these cards, the drivers installed by the default "Infiniband Support" yum group in CentOS 6.3 worked and the test boxes have since been upgraded to CentOS 6.4. (still all ok)

                                                                               

                                                                              I haven't done any in depth testing yet (only received cables today), so you could be right.

                                                                                • Re: MHGH28-XTC not working
                                                                                  toddh

                                                                                  Just saw this post. I picked up some HP 448397-B21 cards at one point that were MHGH28-XTC and was getting errors like this. I used the flint command as follows. The -nofs flag burns the image in a non-failsafe manner, which solved my errors.

                                                                                   

                                                                                  flint -d mt25418_pciconf0 -i fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin -nofs -allow_psid_change burn

                                                                                   



                                                                                  You will see the following:

                                                                                      Current FW version on flash:  2.6.0
                                                                                      New FW version:               2.9.1000

                                                                                      You are about to replace current PSID on flash - "HP_09D0000001" with a different PSID - "MT_04A0120002".
                                                                                      Note: It is highly recommended not to change the PSID.

                                                                                  Do you want to continue ? (y/n) [n] : y

                                                                                  Burn process will not be failsafe. No checks will be performed.
                                                                                  ALL flash, including the Invariant Sector will be overwritten.
                                                                                  If this process fails, computer may remain in an inoperable state.


                                                                                  Do you want to continue ? (y/n) [n] : y
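If you want to script this burn, the command can be assembled as an argument list. A minimal Python sketch that uses only the flint options shown in this post; flint_burn_args and burn are hypothetical helper names. Because -nofs skips the failsafe checks, the sketch only prints the command unless execute=True is passed.

```python
import shlex
import subprocess


def flint_burn_args(device: str, image: str, *, nofs: bool = True,
                    allow_psid_change: bool = True) -> list[str]:
    """Build the flint argv used in the post above (hypothetical helper)."""
    args = ["flint", "-d", device, "-i", image]
    if nofs:
        args.append("-nofs")               # non-failsafe burn, as in the post
    if allow_psid_change:
        args.append("-allow_psid_change")  # lets the HP PSID be replaced
    args.append("burn")
    return args


def burn(device: str, image: str, execute: bool = False) -> list[str]:
    """Print the command; only run it when execute=True."""
    args = flint_burn_args(device, image)
    print("Running:" if execute else "Would run:", shlex.join(args))
    if execute:
        # flint asks y/n questions on the console; stdin is inherited here.
        subprocess.run(args, check=True)
    return args


if __name__ == "__main__":
    burn("mt25418_pciconf0", "fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin")
```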

                                                                          • Re: MHGH28-XTC not working
                                                                            justinclift

                                                                            mblanke - Btw, I think you might be the only one able to mark your post with the working mlxburn command as the answer (other than admin persons I guess).

                                                  • Re: MHGH28-XTC not working

                                                    Not wishing to hijack, but I have noticed a bigger issue with this card on a new Windows install, since I have received a few (yep, from eBay).

                                                     

                                                    Note: my card seems to be an A1.

                                                     

                                                    On installing the card on my server, Windows Server 2012 Standard reports a conflicting drivers issue (standard Windows drivers).  I then updated with the Mellanox package but this resulted in the same issue (uninstalling and deleting the old driver made no difference).

                                                     

                                                    I decided to put a fresh install on my server to clear out anything that may be causing issues (was about time and this is a home setup).

                                                     

                                                    On installing Windows Server 2012 Essentials, the server went through all the usual install steps and reboots until just before it would let you log in for the first time, and then it reported 'An install error has occurred' with the only option being to save the log file and shut down. You can see the system sitting behind the install screen but cannot access it. I tried a second time from fresh drives to make sure, and the same thing happened again.

                                                     

                                                    I then installed ESXi 5.1 and installed Win Server 2012 Essentials as a VM on top. Everything was fine until I used VT-d to pass through the Mellanox controller to the VM. After this the VM would not start. It would hang at the Windows loading screen and then just revert to a powered-off state.

                                                     

                                                    I finally pulled the card and installed Win Server 2012 Ess on to the bare server again and it installed without an issue.  Tonight I will be trying to put the card back in and seeing what I can do.

                                                     

                                                    The reason for wanting Win Server is that I need OpenSM running and do not intend to have any Linux environments with direct control of an InfiniBand HCA, as they will all be in VMs running on ESXi (whose drivers also don't have a subnet manager).

                                                     

                                                    The Server with Windows is a HP ML110G7 so pretty standard hardware.  The cards seem to be fine on my ESXi 5.1 servers, as much as I can tell without a subnet manager running.  I suspect the firmware on the card is the problem (fingers crossed) but have not been able to verify yet.

                                                     

                                                    The info on flashing the card in this thread will most likely be very helpful.  Thanks

                                                      • Re: MHGH28-XTC not working
                                                        mblanke

                                                        I tried using Server 2012 and wasn't happy with the driver support. I'm on 2008 R2 and I just finished the install last night. I have the A3 rev. and the firmware is 2.9.1000.

                                                        When you type mst status, what does it give you?


                                                          • Re: MHGH28-XTC not working
                                                            yairi

                                                            Hi Marc,

                                                             

                                                            mst status should provide you with the list of devices available for accessing the card, should you need to burn FW onto it. In your case you don't need that anymore.

                                                            Once you have the driver loading on your machine, you should see it in Device Manager. You should also see a couple of IPoIB network interfaces (one for each port on the card). Configure those with IP addresses just like you would configure any Ethernet interface.

                                                            Don't forget to run the opensm daemon on one of the servers; without it nothing will work.

                                                             

                                                            Cheers..

                                                            Yair Ifergan.

                                                              • Re: MHGH28-XTC not working
                                                                mblanke

                                                                That's how I'm setting it up. I was just trying to help out rimblock, as he seems to be having the same issues I had. rimblock, are you getting error Code 10?

                                                                  • Re: MHGH28-XTC not working

                                                                    Hi,

                                                                     

                                                                    The mst status command responds with:

                                                                    MST devices:
                                                                    ------------
                                                                      mt25418_pciconf0
                                                                      mt25418_pci_cr0

                                                                    C:\Program Files\Mellanox\WinMFT>mlxburn -dev_type 25418 -dev mt25418_pci_cr0 -image fw-25408-2_9_1000-MHGH28-XTC_A1.bin
                                                                    -E- Read a corrupted device id (0xffff). Probably HW/PCI access problem
                                                                    -E- Can not open mt25418_pci_cr0:  MFE_CR_ERROR
                                                                    -E- Image burn failed: child process exited abnormally

                                                                    Same as you had.

                                                                    Trying Todd's flint command:

                                                                    C:\Program Files\Mellanox\WinMFT>flint -d mt25418_pciconf0 -i fw-25408-2_9_1000-MHGH28-XTC_A1.bin -nofs -allow_psid_change burn

                                                                        Current FW version on flash:  2.6.0
                                                                        New FW version:               2.9.1000

                                                                        You are about to replace current PSID on flash - "HP_09D0000001" with a different PSID - "MT_04A0110002".
                                                                        Note: It is highly recommended not to change the PSID.

                                                                    Do you want to continue ? (y/n) [n] : y

                                                                    Burn process will not be failsafe. No checks will be performed.
                                                                    ALL flash, including the Invariant Sector will be overwritten.
                                                                    If this process fails, computer may remain in an inoperable state.

                                                                    Do you want to continue ? (y/n) [n] : y

                                                                    Burning FW image without signatures  - OK
                                                                    Restoring signature                  - OK

                                                                    C:\Program Files\Mellanox\WinMFT>

                                                                    Now I am seeing "Insufficient system resources exist to complete the API." in Device Manager.

                                                                     

                                                                    Trying for a reboot.

                                                                      • Re: MHGH28-XTC not working
                                                                        yairi

                                                                        Check whether the card is in an x8 PCIe slot.

                                                                        • Re: MHGH28-XTC not working
                                                                          yairi

                                                                          Also, look at toddh's post above. He was able to use the flint tool with the -nofs flag and get past this.

                                                                            • Re: MHGH28-XTC not working

                                                                              OpenSM is now running:

                                                                               

                                                                              C:\Program Files\Mellanox\MLNX_VPI\IB\Tools>ibstat
                                                                              CA 'ibv_device0'
                                                                                      CA type:
                                                                                      Number of ports: 2
                                                                                      Firmware version: 0x2000903e8
                                                                                      Hardware version: 0xa0
                                                                                      Node GUID: 0x001635ffffbfaa74
                                                                                      System image GUID: 0x001635ffffbfaa77
                                                                                      Port 1:
                                                                                              State: Active
                                                                                              Physical state: LinkUp
                                                                                              Rate: 20
                                                                                              Base lid: 1
                                                                                              LMC: 0
                                                                                              SM lid: 1
                                                                                              Capability mask: 0x90580000
                                                                                              Port GUID: 0x001635ffffbfaa75
                                                                                      Port 2:
                                                                                              State: Down
                                                                                              Physical state: Polling
                                                                                              Rate: 10
                                                                                              Base lid: 0
                                                                                              LMC: 0
                                                                                              SM lid: 0
                                                                                              Capability mask: 0x90580000
                                                                                              Port GUID: 0x001635ffffbfaa76

                                                                               

                                                                              C:\Program Files\Mellanox\MLNX_VPI\IB\Tools>

                                                                               

                                                                              Only one port is connected to a cable, so this looks about right. I am also going for SRP (iSER etc.), not IPoIB, so a rate of 20 also looks good.
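When checking several hosts, it can help to reduce `ibstat` output to one line per port. The sketch below assumes the text layout shown above ("Port N:" headers followed by "Key: value" lines) and is a convenience script, not a Mellanox tool. The Rate field is printed in Gb/s, as `ibstat` reports it.

```python
import re

def parse_ports(ibstat_text):
    """Return {port_number: {field: value}} from ibstat-style text output."""
    ports = {}
    current = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # "Port 1:" starts a new port section; "Port GUID: ..." does not match.
        header = re.match(r"Port (\d+):$", line)
        if header:
            current = ports.setdefault(int(header.group(1)), {})
            continue
        # Lines before the first port header (CA-level fields) are ignored.
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports

sample = """
Number of ports: 2
Port 1:
    State: Active
    Physical state: LinkUp
    Rate: 20
Port 2:
    State: Down
    Physical state: Polling
    Rate: 10
"""

for num, fields in sorted(parse_ports(sample).items()):
    print(f"port {num}: {fields['State']} / {fields['Physical state']} @ {fields['Rate']} Gb/s")
```

A port showing "Down / Polling" is what an uncabled port (or one with no subnet manager on its link) looks like here.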

                                                                               

                                                                              Now I just need to see whether my Solaris box can see the Windows box.

                                                                               


                                                                                • Re: MHGH28-XTC not working

                                                                                  OK, now I see what 'unhappy with driver support' means.

                                                                                   

                                                                                  There appears to be no SRP driver in the Windows 2012 package. That is a fair-sized blow to my plans. Oh well, back to Windows 2008r2 then.

                                                                                    • Re: MHGH28-XTC not working
                                                                                      toddh

                                                                                      Rim,

                                                                                       

                                                                                      I spoke with Mellanox and confirmed that there is no SRP in the 4.2 driver.   The Mellanox guys have been very responsive and helpful.

                                                                                       

                                                                                      My take is that the 4.x drivers are primarily aimed at Windows Server 2012 to Windows Server 2012 connections: a collaboration between Mellanox and Microsoft to get InfiniBand support into the new OS. The big push, of course, was SMB 3.0. As far as I can tell they have done a good job.

                                                                                       

                                                                                      However, if you want to connect a Server 2012 box to anything besides another Windows box, the drivers lack the supported protocols.

                                                                                       

                                                                                      My sincerest hope is that the next version of the drivers will include SRP and allow use of the full bandwidth and RDMA features with *nix-based boxes.

                                                                                        • Re: MHGH28-XTC not working

                                                                                          Cheers Todd.

                                                                                           

                                                                                          I have finally got SRP working between ESXi 5.1 and Solaris, and have put Windows SBS 2011 Essentials on another box. The drivers for Win 2008r2 do have an SRP miniport, but it is throwing the 'unable to start' error; the Subnet Manager etc. are all starting fine.

                                                                                           

                                                                                          One post I came across suggested that if you have more than 2 targets, the Windows miniport will fail to start. I have 4. I will probably have to remove a couple and try again, but after 3 painful days getting to this stage, I will leave it for another day this coming week.

                                                                                           

                                                                                          I also agree, the Mellanox guys have been very proactive and helpful which is pretty rare and to be applauded.

                                                                                      • Re: MHGH28-XTC not working

                                                                                        Hmm,

                                                                                         

                                                                                        I seem to be hitting a wall again.

                                                                                         

                                                                                        The above-mentioned flashes worked fine for my two Rev A1 cards, but my two Rev A2 cards just sit there taking 90% CPU and appear to be doing nothing. There is no text output after I type the command and hit return.

                                                                                         

                                                                                        mst status reports as you would expect (see posts above), but mlxburn and flint both just sit there with a blank line and 90% CPU.

                                                                                         

                                                                                        flint -d device_name -q reports that it cannot get semaphores (63)

                                                                                         

                                                                                        I have run flint -clear-semaphore -d device_id which seems to complete fine but mlxburn and flint for flashing just sits there doing nothing when run after.

                                                                                         

                                                                                        I will try the exact command Todd used for his A2-A3 card but I suspect it is the same as I have been running already.

                                                                                         

                                                                                        Update: No luck with the command Todd used.

                                                                                         

                                                                                        I also see a lot of errors (IB_Timeouts) in /var/log/opensm.log on the Linux box that is running OpenSM. This one jumps out at me (see attachment for them all):

                                                                                         

                                                                                        Apr 17 21:00:06 008048 [7A680700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120 Timeout while getting attribute 0x15 (PortInfo); Possible mis-set mkey?
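With a log full of these, a quick tally by error code shows whether the mkey timeouts dominate. This is a minimal sketch, assuming every OpenSM error line contains "ERR" followed by a 4-character hex code, as in the line quoted above; only that one real line is used as sample input.

```python
import re
from collections import Counter

# Assumed format: "... ERR <4 hex chars> <message>", as in the quoted opensm.log line.
ERR_RE = re.compile(r"\bERR (?P<code>[0-9A-Fa-f]{4}) (?P<msg>.*)$")

def summarize(lines):
    """Count OpenSM errors by code, plus how many of them mention an mkey."""
    codes = Counter()
    mkey_hints = 0
    for line in lines:
        match = ERR_RE.search(line)
        if match is None:
            continue
        codes[match.group("code")] += 1
        if "mkey" in match.group("msg"):
            mkey_hints += 1
    return codes, mkey_hints

sample = [
    "Apr 17 21:00:06 008048 [7A680700] 0x01 -> sm_mad_ctrl_send_err_cb: "
    "ERR 3120 Timeout while getting attribute 0x15 (PortInfo); Possible mis-set mkey?",
]
# On the OpenSM host, pass open("/var/log/opensm.log") instead of the sample list.
print(summarize(sample))
```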

                                                                                         

                                                                                        Suggestions?

                                                                                          • Re: MHGH28-XTC not working
                                                                                            justinclift

                                                                                            @rimblock - As a curiosity question, are you definitely using the A2/A3 firmware?  It's a different firmware download than the A1:

                                                                                             

                                                                                              http://www.mellanox.com/downloads/firmware/fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin.zip

                                                                                             

                                                                                            Just in case you forgot.

                                                                                              • Re: MHGH28-XTC not working

                                                                                                Yeah, spotted that.

                                                                                                 

                                                                                                What I now have to do is track down which cards are reporting the possible mis-set mkey, and I suspect it will be the A1s that were updated by flint.

                                                                                                 

                                                                                                A mixture of cards and revisions is really not helping.

                                                                                                 

                                                                                                Port 1 - Solaris 11.1 - ConnectX-2 (SAN)

                                                                                                Port 2 - ESXi 5.1: ConnectX A1 (flashed)

                                                                                                Port 3 - ESXi 5.1: ConnectX A2 (unflashed)

                                                                                                Port 4 - CentOS 6.4 (OpenSM): ConnectX A2 (unflashed)

                                                                                                Port 5 - ESXi 5.1: ConnectX A1 (flashed)

                                                                                                 

                                                                                                I have just swapped the port 2 ConnectX A1 card for an A2 card to see if that makes a difference.

                                                                                                  • Re: MHGH28-XTC not working

                                                                                                    OK, all the error messages have gone away after swapping the A1 (flashed) card for the A2 (unflashed) in the machine on port 2. It was also able to pick up the SRP target straight away again.

                                                                                                     

                                                                                                    The port 5 machine is not on so I am guessing that is why there are no errors from that A1 card.  This is the machine with Windows 2012 Ess on it that will not work with SRP.  I have now put Windows in a VM but the underlying ESXi also cannot see the SAN SRP target.  The A1 card would not work at all with Windows unless it was flashed.

                                                                                                     

                                                                                                    Hopefully there is a little tweaking that can be done with one of the firmware tools to correct this issue, but I have no idea what that may be.

                                                                                                      • Re: MHGH28-XTC not working

                                                                                                        The current setup seems to be working and is this...

                                                                                                        Port 1 - Solaris 11.1 - ConnectX-2 (SAN)

                                                                                                        Port 2 - ESXi 5.1: ConnectX A2 (unflashed)

                                                                                                        Port 3 - ESXi 5.1: none

                                                                                                        Port 4 - CentOS 6.4 (OpenSM): ConnectX A1 (flashed)

                                                                                                        Port 5 - ESXi 5.1: ConnectX A2 (unflashed)

                                                                                                         

                                                                                                        So the two core ESXi 5.1 hosts have the A2, the OpenSM box has an A1, and the SAN has a ConnectX-2.

                                                                                                         

                                                                                                        I had to remove the Mellanox ESXi VIB and then reinstall it, as it did not 'see' the A2 card. After the reinstall the card popped up and the targets were available.

                                                                                                         

                                                                                                        It also seems that the A1 will not work with VT-d (passthrough). After assigning the card for passthrough on the ESXi host, rebooting, and then adding the card to the Windows 2012 Ess VM as a PCI device, the VM will not start. It produces a caught error (at least not a PSOD). I have not yet tried with the A2 version.

                                                                                                          • Re: MHGH28-XTC not working

                                                                                                            Ok, it appears that the 2.9 firmware is not playing nice with the ConnectX cards in ESXi servers.

                                                                                                             

                                                                                                            It is resulting in the "IB_Timeout" and "mkey incorrectly set" errors. That explains the results of swapping the cards above. The 2.7 firmware does seem to work, though.
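Since the 2.9 vs 2.7 branch matters here, it helps to read the version straight from `ibstat`, which prints it as a packed hex value (0x2000903e8 in the output earlier in the thread). This sketch assumes the layout major in the upper bits, minor in bits 16-31 and sub-minor in bits 0-15; that layout is an assumption, though it does turn 0x2000903e8 into 2.9.1000, the version recommended at the start of the thread. The helper name is hypothetical.

```python
def decode_fw(value):
    """Split a packed firmware version into (major, minor, sub_minor).

    Assumed layout: major in bits 32 and up, minor in bits 16-31, sub-minor in bits 0-15.
    """
    return value >> 32, (value >> 16) & 0xFFFF, value & 0xFFFF

def is_fw_2_9_or_newer(value):
    """Hypothetical helper: flag cards on the 2.9 branch reported above to misbehave under ESXi."""
    return decode_fw(value)[:2] >= (2, 9)

reported = 0x2000903e8  # "Firmware version" value from the ibstat output
print("%d.%d.%d" % decode_fw(reported))  # 2.9.1000
print(is_fw_2_9_or_newer(reported))
```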

                                                                                                             

                                                                                                            Thanks to Chuckleb (at Serve The Home forums) for the results of his investigation.

                                                                                                             

                                                                                                            RB

                                                                                    • Re: MHGH28-XTC not working
                                                                                      mblanke

                                                                                      yairi rimblock

                                                                                      Sorry for joining in late; I've been preoccupied.

                                                                                      I have the cards "working", as in the firmware and drivers are recognized. I have 2 servers running Win 2008r2 and a Win7 x64 machine.

                                                                                      The DC is configured with OpenSM, and I can use it to connect one of its two ports to either machine; however, the other port on the server doesn't connect. Do I have to run OpenSM on each port?

                                                                                      DC port 1 to RAID port 1

                                                                                      DC port 2 to Win7 port 1

                                                                                      RAID port 2 to Win7 port 2

                                                                                      That is the way I have it wired. The DC ports will only work one at a time: if I disconnect one, the other one comes up.

                                                                                       

                                                                                      Any suggestions?

                                                                                        • Re: MHGH28-XTC not working
                                                                                          justinclift

                                                                                          No worries about the time gap.  We all get super busy at times and prioritise, etc.

                                                                                           

                                                                                          With the problem you're experiencing, a fundamental bit of info is that OpenSM only attaches to one port when it runs. By default that is the first one it finds in a server (this can be overridden in the config file).

                                                                                           

                                                                                          The way to think about it is that OpenSM starts up and locates the first Infiniband port, then explores/maps the network topology by finding whatever it can through that one port.

                                                                                           

                                                                                          The reason I'm emphasising the "through that one port" bit, is to try and highlight that OpenSM won't see or recognise any of the other ports in that same server (unless there's an Infiniband switch in place to let the first port see the other ports).

                                                                                           

                                                                                          One way to get around this is to have all 3 of your nodes cabled from port 1 (on one box) to port 2 (on the next box), then run OpenSM on them all. That way all ports will come up and be active.
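The reasoning above can be checked with a toy model (not OpenSM itself): with no switch, each cable is its own subnet, and its two ports only go Active if a subnet manager is bound to one end. The host names and wiring come from the question; binding each OpenSM instance to port 1 is the default behaviour assumed from the post above.

```python
# A cable is a pair of (host, port) ends. With no switch, every cable is a separate
# subnet, and its ports only go Active if a subnet manager is bound to one of its ends.
def active_links(links, sm_ports):
    """Return the cables that have an SM-bound port on at least one end."""
    return [link for link in links if link[0] in sm_ports or link[1] in sm_ports]

# Current wiring from the question.
current = [
    (("DC", 1), ("RAID", 1)),
    (("DC", 2), ("Win7", 1)),
    (("RAID", 2), ("Win7", 2)),
]
# Suggested ring: port 1 of each box to port 2 of the next.
ring = [
    (("DC", 1), ("RAID", 2)),
    (("RAID", 1), ("Win7", 2)),
    (("Win7", 1), ("DC", 2)),
]
only_dc = {("DC", 1)}                                # one OpenSM on the DC, bound to port 1
every_port1 = {("DC", 1), ("RAID", 1), ("Win7", 1)}  # OpenSM on every box, bound to port 1

print(active_links(current, only_dc))
print(len(active_links(current, every_port1)), "of", len(current), "cables active")
print(len(active_links(ring, every_port1)), "of", len(ring), "cables active")
```

With the current wiring, even running OpenSM everywhere leaves the RAID port 2 to Win7 port 2 cable without a subnet manager; the ring layout gives every cable one.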

                                                                                           

                                                                                          I do this with a 2 node setup (port 1 of each box connecting to port 2 of the other, OpenSM running on both), then I run IPoIB over the top and set up individual IP subnetting for each group of ports so IP connectivity "just works".
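The per-link IP subnetting can be planned mechanically: give each cable (each group of ports) its own subnet and put one address on each end. This is a minimal sketch; the 10.10.0.0/16 range, the /24 size and the host names are hypothetical placeholders, not values from the post.

```python
import ipaddress

def plan_subnets(links, base="10.10.0.0/16", prefix=24):
    """Give every cable its own subnet: first host address on one end, second on the other."""
    subnets = ipaddress.ip_network(base).subnets(new_prefix=prefix)
    plan = {}
    for (end_a, end_b), net in zip(links, subnets):
        hosts = net.hosts()
        plan[end_a] = f"{next(hosts)}/{prefix}"
        plan[end_b] = f"{next(hosts)}/{prefix}"
    return plan

# Two boxes, port 1 of each cabled to port 2 of the other (hypothetical names).
two_node = [
    (("boxA", 1), ("boxB", 2)),
    (("boxB", 1), ("boxA", 2)),
]
for (host, port), addr in plan_subnets(two_node).items():
    print(f"{host} port {port} (IPoIB): {addr}")
```

Keeping each cable in a separate subnet means the OS routing table picks the right IPoIB interface for each peer without extra configuration.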

                                                                                           

                                                                                          I haven't yet tried it with a 3 node setup, but probably will do in a few weeks after I'm back in the UK.

                                                                                           

                                                                                          Does this help?

                                                                                           

                                                                                          (note - edited to improve clarity a bit)