21 Replies Latest reply on Apr 23, 2013 12:13 PM by yairi

    ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

      Hello, I'm trying to use the mlx4_en driver instead of mlx4_ib, hoping to increase bandwidth in a filtering device that uses ConnectX-2 (MT26428) adapters.

       

      I did a fresh install of CentOS-6.2 since the infiniband adapters are running firmware 2.8 and the mlnx drivers that support this version have been tested with RHEL-6.2 according to the release notes. Kernel used is 2.6.32-358.2.1.el6.x86_64.

       

      With the standard centos drivers (mlx4_core 1.1), I saw this in dmesg:

       

      command 0xc failed: fw status = 0x40

       

      And 'modprobe mlx4_en' didn't create any ethernet devices. Modprobing mlx4_ib ib_sa ib_cm ib_umad ib_addr ib_uverbs ib_ipoib ib_ipath resulted in ib0 and ib1 showing up.

       

      I downloaded mlnx_en-1.5.8.3.tgz (mlx4_1.5.7.2) from the download archives and mlx4_en still doesn't create an ethernet device, but the error in dmesg is:

       

      mlx4_core: Mellanox ConnectX core driver v1.0-mlnx_ofed1.5.3 (November 3, 2011)

      ...

      mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.7.2 (Dec 2011)

      ...

      mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

       

      I've compiled and installed all versions of the drivers by

      - extracting the tgz from mellanox

      - rpm2cpio SRPMS/mellanox-mlnx-en-x.y.z.tgz

      - extract, run scripts/mlnx_en_patch.sh; 2 errors:

        - kernel_patches/backport/2.6.32-EL6.2/dma_mapping*.patch does not exist

        - kernel_patches/backport/2.6.32-EL6.2/memtrack*.patch does not exist

      - make -> no errors, .ko files resulted

       

      I've also tried mlnx_en drivers 1.5.9 and 1.5.10 on an Ubuntu Server 12.04 install (kernel 3.2), with the same results. Using the mlx4_ib driver, I could run netperf tests across two servers and the devices were functional.

       

      Is there any other setup step required for using the mlx4_en driver?

        • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
          yairi

          Hi,

           

          There could be a few things going on here. Here is my list, ordered from most to least likely:

          1) Your HCA is configured to work with InfiniBand and not with Eth. You will need to load the Mellanox OFED stack (because the tool needed to flip this HCA back to Eth is part of it), then use the "connectx_port_config" tool to configure both ports in Eth mode.

          2) Use the latest FW available for this card. The one you have (2.8.X) is too old and might give you grief later on.

          Mellanox Technologies: Firmware Download

          3) I recommend using driver version 1.5.9 or 1.5.10 (but get #1 and #2 above done first).

           

          Let me know how things go.

            • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

              Just to add to yairi's note, you can also flip from IB to Eth by setting this module parameter:

              options mlx4_core port_type_array="2,2"

              or write directly to sysfs:

              echo eth > /sys/bus/pci/devices/0000\:20\:00.0/mlx4_port2

              echo eth > /sys/bus/pci/devices/0000\:20\:00.0/mlx4_port1
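For anyone scripting this, here is a minimal sketch of the sysfs route wrapped in a shell function. The function name `set_mlx4_port_type` and the optional sysfs-root argument are hypothetical (the root is only there so the function can be tried against a scratch directory), and the PCI address will differ from machine to machine.

```shell
# Hypothetical helper: write a port type ("eth" or "ib") into the mlx4_portN
# attribute of one PCI device, then print what the attribute now holds.
# $1 = PCI address (e.g. 0000:20:00.0), $2 = port number, $3 = port type,
# $4 = sysfs root (defaults to /sys; only overridden for dry runs).
set_mlx4_port_type() (
    bdf=$1 port=$2 type=$3 root=${4:-/sys}
    case "$type" in
        eth|ib) ;;
        *) echo "unsupported port type: $type" >&2; exit 2 ;;
    esac
    attr="$root/bus/pci/devices/$bdf/mlx4_port$port"
    # Refuse to create the file: a missing attribute means wrong device or port.
    [ -e "$attr" ] || { echo "no such attribute: $attr" >&2; exit 1; }
    printf '%s\n' "$type" > "$attr" || exit 1
    cat "$attr"
)
```

On a real host this would be run as root, for example `set_mlx4_port_type 0000:20:00.0 1 eth`. It only changes the running driver state, so it is not expected to persist across a module reload.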

                • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                  Thanks for the hint. I just did that on both connected servers and then rmmod mlx4_en mlx4_core, then modprobe mlx4_en.

                   

                  No mac-related errors in dmesg this time, with these versions:

                  ConnectX core driver v1.0-mlnx_ofed1.5.3 (November 3, 2011)

                  ConnectX HCA Ethernet driver v1.5.8.3 (June 2012)

                   

                  But ifconfig -a still doesn't show any additional adapters.

                   

                  Firmware is still at 2.8 on both adapters, will try updating them next using mstflint since I can't figure out which flag makes mlxburn accept a simple .bin firmware (that I can get from [1]).

                   

                  [1] Mellanox Technologies: Support

                    • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                      Does this show eth after you reloaded the modules?

                      cat  /sys/bus/pci/devices/0000\:20\:00.0/mlx4_port1

                        • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                          Apparently the port-type gets reset to ib after I rmmod/modprobe mlx4_core:

                           

                          # echo eth > /sys/bus/pci/devices/0000\:06\:00.0/mlx4_port2

                          # echo eth > /sys/bus/pci/devices/0000\:06\:00.0/mlx4_port1

                          # dmesg | tail -n 3

                          mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.8.3 (June 2012)

                          mlx4_en 0000:06:00.0: Activating port:1

                          mlx4_en: 0000:06:00.0: Port 1: Port: 1, invalid mac burned: 0x0, quiting

                           

                          # rmmod mlx4_en mlx4_core

                          # modprobe mlx4_core

                          # modprobe mlx4_en

                          # cat /sys/bus/pci/devices/0000\:06\:00.0/mlx4_port*

                          ib

                          ib

                            • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                              Yes, that's normal. You need to pass that parameter to modprobe:

                              # modprobe mlx4_core port_type_array="2,2"

                              Otherwise it defaults to "ib"

                               

                              You can also set that option in /etc/modprobe.d/*.conf (or something similar for CentOS) so you don't have to specify it every time you use modprobe.
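A minimal sketch of making the option persistent, assuming a modprobe.d-style setup. The helper name `write_mlx4_port_conf` and the default file name `mlx4_core.conf` are hypothetical; the option line itself is the one quoted earlier in this thread.

```shell
# Hypothetical helper: persist the mlx4_core port type in a modprobe.d file.
# $1 = comma-separated port types as used in this thread ("2,2" sets both
#      ports to Ethernet),
# $2 = target file; the default name is an assumption, not a required name.
write_mlx4_port_conf() (
    types=$1 conf=${2:-/etc/modprobe.d/mlx4_core.conf}
    # Accept only digits separated by single commas, e.g. "2,2".
    case "$types" in
        ''|*[!0-9,]*|,*|*,|*,,*) echo "bad port_type_array: $types" >&2; exit 2 ;;
    esac
    printf 'options mlx4_core port_type_array=%s\n' "$types" > "$conf"
)
```

Calling `write_mlx4_port_conf 2,2` as root and then reloading mlx4_core should have the same effect as passing the parameter on the modprobe command line.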

                               

                              But you still get that "invalid mac burned" error, which might be related to firmware.

                                • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                  Just updated firmware to 2.9.1000. 'mstflint' warned that the PSIDs didn't match (originally I had HP_0160000009, it got burned to MT_0D70110009). Same dmesg error about 'invalid mac burned' with this firmware and mlx4_en 1.5.8.3.

                                   

                                  I can backup the HP_0160000009 firmware from another board and re-burn it if needed.

                                  I did pass that port_type_array option to modprobe when testing.

                                    • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                      I looked at the source code, and as odd as it sounds it looks like your adapter does not have a MAC address (or it's set to 0x0)

                                      ofa_kernel-1.5.3/drivers/net/mlx4/mlx4_en.h:

                                          #define ILLEGAL_MAC(addr) (addr == 0xffffffffffffULL || addr == 0x0)

                                       

                                      ofa_kernel-1.5.3/drivers/net/mlx4/en_netdev.c

                                             priv->mac = mdev->dev->caps.def_mac[priv->port];

                                              if (ILLEGAL_MAC(priv->mac)) {

                                                      mlx4_err(mdev, "Port: %d, invalid mac burned: 0x%llx, quiting\n",

                                                               priv->port, priv->mac);
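To make the check concrete, here is a rough shell equivalent of the ILLEGAL_MAC test quoted above. It is only a sketch: the function name `mac_is_illegal` and the accepted input formats are my own assumptions, not something the driver provides.

```shell
# Hypothetical helper mirroring the driver's ILLEGAL_MAC test: succeeds
# (exit 0) when the MAC is all zeros or 0xffffffffffff, i.e. would be rejected.
# Accepts forms such as 0x0, 02c90abcdef0 or 02:c9:0a:bc:de:f0.
mac_is_illegal() (
    # Drop an optional 0x prefix and ':' / '-' separators, then lowercase.
    mac=$(printf '%s' "${1#0x}" | tr -d ':-' | tr 'A-F' 'a-f')
    case "$mac" in
        ''|*[!0-9a-f]*) echo "not a hex MAC: $1" >&2; exit 2 ;;
    esac
    # Strip leading zeros so the comparison behaves like a numeric one.
    mac=$(printf '%s' "$mac" | sed 's/^0*//')
    case "$mac" in
        ''|ffffffffffff) exit 0 ;;   # only zeros, or the all-ones pattern
        *) exit 1 ;;                 # any other value passes the driver check
    esac
)
```

With the value from the error message, `mac_is_illegal 0x0` succeeds, which matches the driver bailing out on this adapter.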

                                       

                                      The serial number reported by lspci should include the mac address, I am curious if that is zero too.

                                      # lspci -vvv -s 06:00.0 | grep Serial

                                          Capabilities: [148] Device Serial Number 00-02-c9-03-00-12-83-44
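If it helps, the serial can be pulled out of the lspci output and checked for all zeros with something like the sketch below. Both function names are hypothetical, and the pattern assumes the "Device Serial Number" line format shown above.

```shell
# Hypothetical helper: read `lspci -vvv` output on stdin and print the first
# Device Serial Number as plain hex digits (prints nothing if none is found).
pci_serial_from_lspci() {
    sed -n '/Device Serial Number/{s/.*Device Serial Number \([0-9A-Fa-f-]*\).*/\1/;s/-//g;p;q;}'
}

# Succeeds when the given serial consists only of zeros.
serial_is_zero() {
    case "$1" in
        ''|*[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

Usage would be along the lines of `serial=$(lspci -vvv -s 06:00.0 | pci_serial_from_lspci)` followed by `serial_is_zero "$serial" && echo "serial is zero too"`.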

                                        • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
                                          yairi

                                          Try updating the FW of the card to the latest. I know that there should be an option in the newer firmware to create the MAC out of the card's number when the FW starts.

                                           

                                          Give it a try.

                                          • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                            Actually, it's pretty easy to set/change the MAC (page 29 on http://www.mellanox.com/pdf/MFT/MFT_user_manual.pdf)

                                             

                                            # mst start

                                             

                                            Query the mac:

                                            # flint -d /dev/mst/mt25418_pci_cr0 -qq q

                                             

                                            Change the mac:

                                            # flint -d /dev/mst/mt25418_pci_cr0 -mac 02c90abcdef0 sg

                                             

                                            Change the GUID:

                                            # flint -d /dev/mst/mt25418_pci_cr0 -guid 0002c9000abcdef0 sg

                                             

                                            # flint -d /dev/mst/mt25418_pci_cr0 -qq q

                                              • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                                Thanks, the flint -mac hints were spot on. I can now see the two eth* devices in ifconfig, but when doing:

                                                machine1$ ifconfig eth0 192.192.168.1.1

                                                machine2$ ifconfig eth0 192.192.168.1.2

                                                this comes up in dmesg:

                                                mlx4_en 0000:06:00.0: Activating port:1

                                                mlx4_en: eth0: Using 16 TX rings

                                                mlx4_en: eth0: Using 16 RX rings

                                                mlx4_en: eth0: Initializing port
                                                ADDRCONF(NETDEV_UP): eth0: link is not ready

                                                And ping doesn't work between the machines.

                                                 

                                                When setting port types to ib and using the ib0 devices, ping works, netperf tests work, etc (I started an opensm service on one machine). I've tried "ifconfig eth0 down" then back up on the machines with no success. I checked with ibdev2netdev that I was using the right eth device (ib0 <==> eth0).

                                                 

                                                I did "yum groupinstall Infiniband\ Support" and then configured things in /etc/rdma but got the same result ("link is not ready").

                                                 

                                                Is there anything I missed ? The MLNX_OFED manual doesn't list anything besides ifconfig under "4.1.9 A detailed example".

                                                  • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                                    Are the ports connected to an Ethernet switch now?

                                                    Can you check the configuration of the switch ports (speed, autonegotiation) and make sure the switch ports are set to "no shutdown"?

                                                    You can also crossover connect two machines, and if that works then the switch might be the problem.

                                                    • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
                                                      yairi

                                                      Sorry for the dumb question, but why does your IPv4 address contain 5 segments?

                                                      Was this a mistake?

                                                      machine1$ ifconfig eth0 192.192.168.1.1

                                                      machine2$ ifconfig eth0 192.192.168.1.2

                                                        • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                                          Because more is better :)

                                                           

                                                          I edited the commands in the reply, trying to make it clear what I did. Ifconfig doesn't accept 5-segment ipv4 addresses (it outputs 'unknown host').

                                                           

                                                          Thanks for the list of things to try w.r.t. the switch. The issue is most likely there, I'll let you know how things progress.

                                                           

                                                          EDIT: I compiled earlier a libsdp.so and started netperf with the lib LD_PRELOAD'ed. The result was about 18Gb/s (at 100%CPU), which was encouraging.

                                                          • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                                            Time for a dumb question from my side: the IB ports go into an "HP 4X QDR InfiniBand Switch Module for c-Class BladeSystem" (part number 489184-B21). Since I haven't seen any mention of Ethernet mode support in the switch's specs, should I even attempt to use mlx4_en? It seems to me that special support would be required in the switch ports for that, and the HP QDR switch is (IPo)IB-only.

                                                             

                                                            In any case, the performance over ipoib went up from 3Gb/s to 11Gb/s curiously after I did this:

                                                            - run a netperf via ipoib -> 3Gb/s (100% CPU)

                                                            - run a netperf via ipoib with SDP -> 18Gb/s (100% CPU)

                                                            - run a netperf via ipoib, no SDP -> 11Gb/s (80% CPU)

                                                             

                                                            so maybe enabling SDP flipped some setting that now gets me 11Gb/s at ~80% CPU. Which is plenty for what we need.

                                                             

                                                            Thanks yairi, Sorin and Justin for your persistence in helping me get this set up.

                                                              • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
                                                                yairi

                                                                The HP blade switch is not a VPI switch and therefore only works with IB. In one of my previous posts I recommended using the connectx_port_config tool to configure the card to the right mode. There are 3 modes that the VPI NIC card can have:

                                                                1) IB only - in this mode it can only work with IB switches

                                                                2) Eth only - in this mode it can only work with Eth switches

                                                                3) Auto sense - in this mode it can work against either (or a VPI switch) and will sense what kind of port it is connected to.

                                                                 

                                                                To work with Eth, you will need an Eth blade. The connectx_port_config tool should set the mode of the card persistently.

                                                                • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
                                                                  justinclift

                                                                  Yeah, sounds like you'd better run the cards with IPoIB instead of native Ethernet mode, else the switches won't talk to them.

                                                                   

                                                                  For the speed difference, I kind of wonder if the SDP change toggled "connected mode" on.

                                                                   

                                                                  An easy way to check is by doing an "ip addr" or "ifconfig" to show your ethernet/IPoIB settings.

                                                                   

                                                                  If the adapters are not in connected mode, their mtu value will generally be around 2044 or similar (with mine anyway).  When connected mode is enabled, the mtu shows up at about 65535 (ie 65k).

                                                                   

                                                                  If that's what the difference turns out to be (no idea :>), then connected mode can be enabled "properly" by changing a setting in /etc/rdma/ ... something.  Not remembering off the top of my head, but you should be able to find it pretty easily.
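As a quick way to check without eyeballing the MTU, the IPoIB driver exposes the current mode in sysfs. Below is a minimal sketch, assuming the interface has a `mode` attribute under /sys/class/net; the function name `ipoib_mode_and_mtu` and the optional sysfs-root argument are hypothetical.

```shell
# Hypothetical helper: print "<mode> <mtu>" for an IPoIB interface, where the
# mode is whatever the driver reports ("datagram" or "connected").
# $1 = interface (e.g. ib0), $2 = sysfs root (defaults to /sys).
ipoib_mode_and_mtu() (
    dir="${2:-/sys}/class/net/$1"
    # Plain Ethernet interfaces have no mode attribute, so treat that as an error.
    [ -r "$dir/mode" ] || { echo "$1: no IPoIB mode attribute" >&2; exit 1; }
    printf '%s %s\n' "$(cat "$dir/mode")" "$(cat "$dir/mtu")"
)
```

Running `ipoib_mode_and_mtu ib0` on both machines before and after the SDP test would show whether the mode actually changed.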

                                          • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
                                            justinclift

                                            Hmmm, it kind of sounds like you're just wanting to run the adapters in native 10 GbE mode instead of in IB mode.

                                             

                                            That's super simple to do if you're using the Infiniband stuff that comes with CentOS. (the "Infiniband Support" yum group.)  It's just a setting you change in one of the /etc/rdma/ conf files.

                                            • Re: ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

                                              The correct answer to "mlx4_en: Port: 1, invalid mac burned: 0x0, quiting"

                                              is to write a MAC address into the firmware: flint -d /dev/mst/mt25418_pci_cr0 -mac 02c90abcdef0 sg

                                               

                                              Installing the latest firmware does not solve that.
