3 Replies Latest reply on Oct 25, 2015 12:04 AM by elurex

    Mellanox ConnectX-3 SR-IOV problem

    mzhang

      Hi, All,

       

      I have spent quite some time searching for a solution. Tutorials and Q&As such as:

      https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/sect-Virtualization_Host_Configuration_and_Guest_Installation_Guide-SR_IOV-How_SR_IOV_Libvirt_Works.html

      https://community.mellanox.com/docs/DOC-1317

      https://community.mellanox.com/docs/DOC-1484

      are all very helpful. However, they mainly focus on how to create Mellanox VFs and pass them to the guest, and stop right there. Unfortunately, although my guest can see the VF as a PCI device, the driver fails to load. Here are some details:

       

      Host: Intel Xeon CPU E5-2620 v3 @ 2.40GHz

               Debian 7

               Mellanox ConnectX-3 dual port

               Mellanox OFED driver v2.4-1.0.0.1

               VT-d and VT-x enabled in BIOS

               intel_iommu=on on the kernel command line
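A quick way to confirm that the IOMMU option actually reached the running kernel is to look for it in /proc/cmdline. The helper below is a minimal sketch: the function name has_kernel_param is hypothetical, and the optional second argument exists only so it can be tried against a copy of the file.

```shell
#!/bin/sh
# Return success if an exact kernel parameter (e.g. intel_iommu=on) appears
# on the running kernel's command line.
# has_kernel_param is a hypothetical helper; the second argument defaults to
# /proc/cmdline and may point at a copy of that file for testing.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Split on whitespace and compare whole words, so that e.g.
    # intel_iommu=off never matches intel_iommu=on.
    for word in $(cat "$cmdline_file"); do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, has_kernel_param intel_iommu=on && echo "IOMMU requested". This only shows that the option was passed, not that the IOMMU was actually initialised; dmesg is still the place to check that.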

      /etc/modprobe.d/mlx4_core.conf:

      options mlx4_core port_type_array=2,2 num_vfs=4,4,0 probe_vf=4,4,0 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1
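Assuming the triplet form of num_vfs described in the MLNX_OFED documentation (VFs on port 1, VFs on port 2, dual-port VFs), num_vfs=4,4,0 asks for 4 + 4 + 0 = 8 VFs in total. That matches the eight [15b3:1004] functions in the lspci listing and "Number of VFs: 8" in the SR-IOV capability. A minimal sketch of that sum; total_vfs is a hypothetical helper name.

```shell
#!/bin/sh
# Sum a comma-separated num_vfs value from mlx4_core.conf.
# Assumed meaning of a triplet: VFs on port 1, VFs on port 2, dual-port VFs.
# total_vfs is a hypothetical helper name.
total_vfs() {
    sum=0
    # Turn "4,4,0" into "4 4 0" and add up the fields.
    for n in $(echo "$1" | tr ',' ' '); do
        sum=$(( sum + n ))
    done
    echo "$sum"
}
```

total_vfs 4,4,0 prints 8; on the host, lspci -d 15b3:1004 | wc -l should report the same count.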

      I can see the virtual functions created on the host via "lspci -nn | grep Mellanox":

      04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]

      04:00.1 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:00.2 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:00.3 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:00.4 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:00.5 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:00.6 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:00.7 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      04:01.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

       

      MSI-X is also enabled on the host's Mellanox card, as shown by "lspci -vv -s 04:00.0":

      04:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

          Subsystem: Mellanox Technologies Device 0049

          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

          Latency: 0, Cache Line Size: 64 bytes

          Interrupt: pin A routed to IRQ 32

          Region 0: Memory at c7200000 (64-bit, non-prefetchable) [size=1M]

          Region 2: Memory at c5000000 (64-bit, prefetchable) [size=8M]

          Expansion ROM at c7100000 [disabled] [size=1M]

          Capabilities: [40] Power Management version 3

              Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)

              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

          Capabilities: [48] Vital Product Data

              Product Name: CX312A - ConnectX-3 SFP+

              Read-only fields:

                  [PN] Part number: MCX312A-XCBT        

                  [EC] Engineering changes: A9

                  [SN] Serial number: MT1445K01104           

                  [V0] Vendor specific: PCIe Gen3 x8   

                  [RV] Reserved: checksum good, 0 byte(s) reserved

              Read/write fields:

                  [V1] Vendor specific: N/A  

                  [YA] Asset tag: N/A                    

                  [RW] Read-write area: 105 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 253 byte(s) free

                  [RW] Read-write area: 252 byte(s) free

              End

          Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-

              Vector table: BAR=0 offset=0007c000

              PBA: BAR=0 offset=0007d000

          Capabilities: [60] Express (v2) Endpoint, MSI 00

              DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited

                  ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

              DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

                  RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

                  MaxPayload 256 bytes, MaxReadReq 512 bytes

              DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

              LnkCap:    Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited

                  ClockPM- Surprise- LLActRep- BwNot-

              LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+

                  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

              LnkSta:    Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

              DevCap2: Completion Timeout: Range ABCD, TimeoutDis+

              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-

              LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB

                   Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

                   Compliance De-emphasis: -6dB

              LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

                   EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

          Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

              ARICap:    MFVC- ACS-, Next Function: 0

              ARICtl:    MFVC- ACS-, Function Group: 0

          Capabilities: [148 v1] Device Serial Number f4-52-14-03-00-94-cc-c0

          Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)

              IOVCap:    Migration-, Interrupt Message Number: 000

              IOVCtl:    Enable+ Migration- Interrupt- MSE+ ARIHierarchy+

              IOVSta:    Migration-

              Initial VFs: 16, Total VFs: 16, Number of VFs: 8, Function Dependency Link: 00

              VF offset: 1, stride: 1, Device ID: 1004

              Supported Page Size: 000007ff, System Page Size: 00000001

              Region 2: Memory at 00000000bd000000 (64-bit, prefetchable)

              VF Migration: offset: 00000000, BIR: 0

          Capabilities: [154 v2] Advanced Error Reporting

              UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

              UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

              UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

              CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

              CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

              AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

          Capabilities: [18c v1] #19

          Kernel driver in use: mlx4_core
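The SR-IOV capability above reports "VF offset: 1, stride: 1", which explains why the eight VFs occupy 04:00.1 through 04:00.7 and then 04:01.0 rather than a nonexistent 04:00.8. Per the PCIe SR-IOV specification, VF n (counting from 1) has routing ID PF_RID + First VF Offset + (n - 1) * VF Stride, and lspci prints a routing ID as bus:device.function. A minimal sketch of that arithmetic; vf_bdf is a hypothetical helper name.

```shell
#!/bin/sh
# Compute the PCI address of VF number n (1-based) from the PF address and the
# "VF offset" / "stride" values in the PF's SR-IOV capability:
#   VF routing ID = PF routing ID + offset + (n - 1) * stride
# vf_bdf is a hypothetical helper; arguments may be decimal or 0x-prefixed hex.
vf_bdf() {
    pf_bus=$1 pf_dev=$2 pf_fn=$3 offset=$4 stride=$5 n=$6
    # Routing ID layout: bus in bits 15..8, device in bits 7..3, function in bits 2..0.
    pf_rid=$(( ($pf_bus << 8) | ($pf_dev << 3) | $pf_fn ))
    rid=$(( pf_rid + $offset + ($n - 1) * $stride ))
    # Print in lspci's bus:device.function notation.
    printf '%02x:%02x.%x\n' $(( rid >> 8 )) $(( (rid >> 3) & 0x1f )) $(( rid & 7 ))
}
```

For this card, vf_bdf 0x04 0 0 1 1 1 prints 04:00.1 and vf_bdf 0x04 0 0 1 1 8 prints 04:01.0, matching the first and last VFs in the lspci listing.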

       

      I use qemu-kvm and libvirt for the guests; here is the interface section of my guest's XML configuration:

          <interface type='network'>
            <mac address='52:54:00:78:06:44'/>
            <source network='default'/>
            <model type='rtl8139'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
          </interface>
          <interface type='hostdev' managed='yes'>
            <mac address='52:54:00:6d:90:02'/>
            <source>
              <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
            </source>
            <vlan>
              <tag id='42'/>
            </vlan>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
          </interface>
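The <source> address of a hostdev interface is just the VF's PCI address split into domain, bus, slot and function. The sketch below (hypothetical helper names) builds that element from an lspci-style address, together with the pci_DDDD_BB_SS_F node-device name libvirt uses, for example with virsh nodedev-detach when a device is not declared managed='yes'.

```shell
#!/bin/sh
# Convert a full PCI address such as 0000:04:00.1 into a libvirt <address> element.
# bdf_to_libvirt_address and bdf_to_nodedev are hypothetical helper names.
bdf_to_libvirt_address() {
    bdf=$1
    domain=${bdf%%:*}   # 0000
    rest=${bdf#*:}      # 04:00.1
    bus=${rest%%:*}     # 04
    rest=${rest#*:}     # 00.1
    slot=${rest%%.*}    # 00
    func=${rest#*.}     # 1
    printf "<address type='pci' domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
        "$domain" "$bus" "$slot" "$func"
}

# libvirt names PCI node devices pci_DDDD_BB_SS_F.
bdf_to_nodedev() {
    echo "pci_$(echo "$1" | tr ':.' '__')"
}
```

bdf_to_libvirt_address 0000:04:00.1 prints the same <address> line used in the <source> element above, and bdf_to_nodedev 0000:04:00.1 prints pci_0000_04_00_1.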
      

       

      I intend to pass the first virtual function (04:00.1) to the guest.

      After starting the guest, I can see the Mellanox device via lspci:

      00:05.0 Ethernet controller [0200]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]

      Next, I installed the Mellanox Ethernet driver from http://www.mellanox.com/page/products_dyn?product_family=27, since the ports are configured as Ethernet ports (port_type_array=2,2) in the mlx4_core.conf file.

      However, after rebooting the guest, dmesg shows:

      mlx4_core: Mellanox ConnectX core driver v2.4-1.0.0.1 (Feb 19 2015)

      mlx4_core: Initializing 0000:00:05.0

      mlx4_core 0000:00:05.0: setting latency timer to 64

      mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode

      mlx4_core 0000:00:05.0: Sending reset

      mlx4_core 0000:00:05.0: Sending vhcr0

      mlx4_core 0000:00:05.0: Requested number of MACs is too much for port 1, reducing to 64.

      mlx4_core 0000:00:05.0: HCA minimum page size:512

      mlx4_core 0000:00:05.0: Timestamping is not supported in slave mode.

        alloc irq_desc for 24 on node -1

        alloc kstat_irqs on node -1

      mlx4_core 0000:00:05.0: irq 24 for MSI/MSI-X

        alloc irq_desc for 25 on node -1

        alloc kstat_irqs on node -1

      mlx4_core 0000:00:05.0: irq 25 for MSI/MSI-X

      mlx4_core 0000:00:05.0: communication channel command 0x31 timed out.

      mlx4_core 0000:00:05.0: mlx4_enter_error_state: device is going to be reset

      mlx4_core 0000:00:05.0: VF is sending reset request to Firmware.

      mlx4_core 0000:00:05.0: VF Reset succeed, unloading VF driver.

      mlx4_core 0000:00:05.0: mlx4_enter_error_state: device was reset successfully

      mlx4_core 0000:00:05.0: mlx4_enter_error_state: end

      mlx4_core 0000:00:05.0: NOP command failed to generate MSI-X interrupt IRQ 24).

      mlx4_core 0000:00:05.0: Trying again without MSI-X.

      mlx4_core 0000:00:05.0: Failed to close slave function.

      mlx4_core: probe of 0000:00:05.0 failed with error -5

      Unloading and reloading mlx4_core via modprobe gives a similar message.

       

      It appears that the driver cannot be loaded correctly in the guest. Please advise, and many thanks in advance!

        • Re: Mellanox ConnectX-3 SR-IOV problem

          I've been able to get a bit further. You need to load the vfio-pci kernel module, unbind the current driver from the VF device, and bind the VF to vfio-pci:

           

          # modprobe -r vfio_iommu_type1

          # modprobe -r vfio

          # modprobe vfio_iommu_type1 allow_unsafe_interrupts=1

          # modprobe vfio-pci

          # echo 0000:08:00.1 > /sys/bus/pci/devices/0000\:08\:00.1/driver/unbind

           

          Get the PCI vendor/device ID:

          # lspci -s 08:00.1 -n

          08:00.1 0280: 15b3:1004

           

          Bind it to vfio-pci:

          #  echo 15b3 1004 > /sys/bus/pci/drivers/vfio-pci/new_id
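The unbind and new_id steps above can be combined into a single helper. This is only a sketch: bind_to_vfio is a hypothetical name, it assumes the vfio-pci module is already loaded, and the optional second argument replaces /sys so the logic can be dry-run against a scratch directory. Note that writing to new_id lets vfio-pci claim every unbound device with that vendor/device pair, not just this one.

```shell
#!/bin/sh
# Rebind a PCI device (e.g. a ConnectX-3 VF) from its current driver to vfio-pci.
# bind_to_vfio is a hypothetical helper; the sysfs root defaults to /sys.
bind_to_vfio() {
    bdf=$1
    sysfs=${2:-/sys}
    dev=$sysfs/bus/pci/devices/$bdf
    # sysfs stores the IDs as e.g. "0x15b3" and "0x1004".
    read -r vendor < "$dev/vendor"
    read -r device < "$dev/device"
    # Unbind from the current driver (e.g. mlx4_core), if one is bound.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # Register the ID pair with vfio-pci (without the 0x prefix), as in "echo 15b3 1004".
    echo "${vendor#0x} ${device#0x}" > "$sysfs/bus/pci/drivers/vfio-pci/new_id"
}
```

On a real host this would be run as root, e.g. bind_to_vfio 0000:08:00.1.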

           

          You can verify the result. Here is the physical function my host OS uses:

          # lspci -s 08:00.0 -k

          08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

                  Subsystem: Hewlett-Packard Company Device 18d6

                  Kernel driver in use: mlx4_core

           

          Here is the VF:

          # lspci -s 08:00.1 -k

          08:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

                  Subsystem: Hewlett-Packard Company Device 61b0

                  Kernel driver in use: vfio-pci
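The "Kernel driver in use" line printed by lspci -k reflects the driver symlink in the device's sysfs directory, so the same check can be scripted. driver_in_use is a hypothetical helper, and the sysfs root parameter is again only there for testing.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI device, or "none".
# driver_in_use is a hypothetical helper; the sysfs root defaults to /sys.
driver_in_use() {
    bdf=$1
    sysfs=${2:-/sys}
    link=$sysfs/bus/pci/devices/$bdf/driver
    if [ -L "$link" ]; then
        # The link points at .../drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

driver_in_use 0000:08:00.1 should print vfio-pci after the rebind, and driver_in_use 0000:08:00.0 should still print mlx4_core.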

           

          My virsh XML is slightly different:

           

          <hostdev mode='subsystem' type='pci' managed='yes'>
            <source>
              <address domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
            </source>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
          </hostdev>

           

           

          In Linux guests everything works as expected. In Windows (2012 R2 x64), I can see the card and the driver is loaded, but the device shows a yellow exclamation mark without any meaningful error.

           

          Does anyone have SR-IOV working with Windows guests?

           

          My host is Ubuntu 14.04.3 (3.19 kernel).

            • Re: Mellanox ConnectX-3 SR-IOV problem
              mzhang

              Hey, Kyle,

               

              I am terribly sorry about the late reply.

               

              After posting my original question and waiting for a month, I still wasn't able to resolve it, so I had to pause the work and disassemble my test bed.

               

              I very much appreciate your input and will definitely use it as a reference if I ever resume the work. I really wish Mellanox offered more support than just selling cards...

               

              Best,

              • Re: Mellanox ConnectX-3 SR-IOV problem
                elurex

                Dear Kyle,

                 

                How did you get the KVM guest to start at all? Mine fails with an IOMMU group error:

                 

                root@vm-ha:~# virsh start kvm-node0

                error: Failed to start domain kvm-node0

                error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -device vfio-pci,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x6: vfio: error, group 4 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.

                qemu-system-x86_64: -device vfio-pci,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x6: vfio: failed to get group 4

                qemu-system-x86_64: -device vfio-pci,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x6: Device initialization failed.

                qemu-system-x86_64: -device vfio-pci,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x6: Device 'vfio-pci' could not be initialized

                 

                root@vm-ha:~# lspci -s 07:00.1 -k

                07:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

                        Subsystem: Mellanox Technologies Device 61b0

                        Kernel driver in use: vfio-pci

                root@vm-ha:~# lspci -s 07:00.2 -k

                07:00.3 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

                        Subsystem: Mellanox Technologies Device 61b0

                        Kernel driver in use: vfio-pci

                root@vm-ha:~# lspci -s 07:00.0 -k

                07:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

                        Subsystem: Mellanox Technologies Device 0024

                        Kernel driver in use: mlx4_core

                 

                I notice that my 07:00.0 and 07:00.1 devices are both in the same IOMMU group.

                 

                I am also on Ubuntu 14.04.3 with kernel 3.19.
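The "group 4 is not viable" error says every device in the VF's IOMMU group must be bound to a vfio driver, which is consistent with the PF 07:00.0 (still on mlx4_core) sharing that group. A sketch for listing each member of a device's group with its current driver, using the iommu_group link that sysfs exposes for each PCI device; iommu_group_members is a hypothetical name and the sysfs root argument is only for testing.

```shell
#!/bin/sh
# List every device in the IOMMU group of the given PCI device,
# followed by the driver currently bound to it ("none" if unbound).
# iommu_group_members is a hypothetical helper; the sysfs root defaults to /sys.
iommu_group_members() {
    bdf=$1
    sysfs=${2:-/sys}
    devices=$sysfs/bus/pci/devices/$bdf/iommu_group/devices
    if [ ! -d "$devices" ]; then
        echo "no IOMMU group found for $bdf" >&2
        return 1
    fi
    for member in "$devices"/*; do
        if [ -L "$member/driver" ]; then
            drv=$(basename "$(readlink "$member/driver")")
        else
            drv=none
        fi
        echo "$(basename "$member") $drv"
    done
}
```

Any line whose driver is not vfio-pci (here, presumably 0000:07:00.0 mlx4_core) identifies a device that keeps the group from being viable.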