HowTo Configure SR-IOV for ConnectX-3 with KVM (Ethernet)


    This post shows how to enable SR-IOV on ConnectX-3 and ConnectX-3 Pro adapter cards in Ethernet mode.

    Setting up a VM via KVM (virt-manager) is out of the scope of this post; refer to the virt-manager documentation.

     

     


    Overview

    SR-IOV configuration includes the following steps:

    1. Enable Virtualization (SR-IOV) in the BIOS (prerequisites)

    2. Enable SR-IOV in the ConnectX-3 firmware

    3. Enable SR-IOV in the MLNX_OFED Driver

    4. Set up the VM

     

    Setup and Prerequisites

    1. Two servers connected via an Ethernet switch

     

    2. KVM is installed on the servers

    # yum install kvm

    # yum install virt-manager libvirt libvirt-python python-virtinst
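     

    After installing, make sure the libvirt daemon is running and starts on boot (RHEL 6 style service commands):

    # service libvirtd start

    # chkconfig libvirtd on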

     

    3. Make sure that SR-IOV is enabled in the BIOS of the specific server. BIOS virtualization options differ between server vendors; see BIOS Performance Tuning Example for configuration examples.
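     

    Before entering the BIOS, you can check from Linux whether the CPU virtualization extensions are already exposed; a non-zero count means VT-x (vmx) or AMD-V (svm) is enabled:

    # egrep -c '(vmx|svm)' /proc/cpuinfo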

     

    4. Make sure that intel_iommu=on and iommu=pt are added to the kernel command line in /boot/grub/grub.conf

    # cat /boot/grub/grub.conf

     

    default=0

    timeout=5

    splashimage=(hd0,0)/grub/splash.xpm.gz

    hiddenmenu

    title Red Hat Enterprise Linux (2.6.32-358.el6.x86_64)

      root (hd0,0)

      kernel /vmlinuz-2.6.32-358.el6.x86_64 ro root=UUID=4f9ed446-05fe-4db5-a079-56738f4ae05f rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM crashkernel=auto rhgb quiet rd_NO_DM rhgb quiet intel_iommu=on iommu=pt

      initrd /initramfs-2.6.32-358.el6.x86_64.img

     

    To learn more about iommu grub parameters refer to Understanding the iommu Linux grub File Configuration.
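     

    On systems that use GRUB2 (e.g. RHEL 7), the same parameters can be added with grubby instead of editing grub.conf directly; a sketch, to be verified against your distribution:

    # grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"

    After rebooting, you can confirm that the IOMMU was initialized:

    # dmesg | grep -e DMAR -e IOMMU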

     

    5. Install the latest MLNX_OFED driver on the server and on the VM.

    # mlnxofedinstall
    ...
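
    To confirm which MLNX_OFED version is installed, ofed_info (shipped with the driver) prints the version string:

    # ofed_info -s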

     

    Configuration

    I. Enable SR-IOV on the Firmware

     

    1. Start MFT (Mellanox Firmware Tools).

    # mst start

    Starting MST (Mellanox Software Tools) driver set

    Loading MST PCI module - Success

    Loading MST PCI configuration module - Success

    Create devices

     

    2. Locate the ConnectX-3 device on the desired PCI slot.

    # mst status

    MST modules:

    ------------

        MST PCI module loaded

        MST PCI configuration module loaded

     

    MST devices:

    ------------

    /dev/mst/mt4099_pciconf0         - PCI configuration cycles access.

                                       domain:bus:dev.fn=0000:03:00.0 addr.reg=88 data.reg=92

                                       Chip revision is: 00

    /dev/mst/mt4099_pci_cr0          - PCI direct access.

                                       domain:bus:dev.fn=0000:03:00.0 bar=0xdfa00000 size=0x100000

                                       Chip revision is: 00

    /dev/mst/mt4113_pciconf0         - PCI configuration cycles access.

                                       domain:bus:dev.fn=0000:81:00.0 addr.reg=88 data.reg=92

                                       Chip revision is: 00

     

    3. Query the status of the device.

    # mlxconfig -d /dev/mst/mt4099_pciconf0 q

     

     

    Device #1:

    ----------

     

     

    Device type:    ConnectX3      

    PCI device:     /dev/mst/mt4099_pciconf0

     

     

    Configurations:                          Current

             SRIOV_EN                        0              

             NUM_OF_VFS                      0              

             LINK_TYPE_P1                    2              

             LINK_TYPE_P2                    2              

             LOG_BAR_SIZE                    0     

     

    4. Enable SR-IOV and set the desired number of VFs.

    • SRIOV_EN=1
    • NUM_OF_VFS=4   ; This is an example with 4 VFs

     

    # mlxconfig -d /dev/mst/mt4099_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=4

     

    Device #1:

    ----------

     

    Device type:    ConnectX3      

    PCI device:     /dev/mst/mt4099_pciconf0

     

     

    Configurations:                          Current         New

             SRIOV_EN                        0               1              

             NUM_OF_VFS                      0               4              

             LINK_TYPE_P1                    2               2              

             LINK_TYPE_P2                    2               2              

             LOG_BAR_SIZE                    0               0              

     

    Apply new Configuration? ? (y/n) [n] : y

    Applying... Done!

    -I- Please reboot machine to load new configurations.

     

    5. Reboot the server.

     

    6. Verify that the new configuration took effect.

    # mst start

    ...

     

    # mlxconfig -d /dev/mst/mt4099_pciconf0 q

     

     

    Device #1:

    ----------

     

    Device type:    ConnectX3      

    PCI device:     /dev/mst/mt4099_pciconf0

     

    Configurations:                          Current

             SRIOV_EN                        1              

             NUM_OF_VFS                      4              

             LINK_TYPE_P1                    2              

             LINK_TYPE_P2                    2              

             LOG_BAR_SIZE                    0 

     

    Note: At this point, the VFs are not yet visible via lspci. They will appear only after SR-IOV is enabled in the MLNX_OFED driver.

     

    # lspci | grep Mellanox

    03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

     

     

    II. Enable SR-IOV on the MLNX_OFED driver

     

    1. Find the device (normally mlx4_0).

        In this case, we have a dual-port ConnectX-3 adapter.

    # ibstat

    CA 'mlx4_0'

      CA type: MT4099

      Number of ports: 2

      Firmware version: 2.33.5000

      Hardware version: 0

      Node GUID: 0x0002c90300302860

      System image GUID: 0x0002c90300302863

      Port 1:

      State: Active

      Physical state: LinkUp

      Rate: 40

      Base lid: 0

      LMC: 0

      SM lid: 0

      Capability mask: 0x0c010000

      Port GUID: 0x0202c9fffe302860

      Link layer: Ethernet

      Port 2:

      State: Down

      Physical state: Disabled

      Rate: 10

      Base lid: 0

      LMC: 0

      SM lid: 0

      Capability mask: 0x0c010000

      Port GUID: 0x0002c90001302861

      Link layer: Ethernet
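     

    To map the mlx4_0 ports to their Ethernet interface names in the hypervisor, you can use the ibdev2netdev script shipped with MLNX_OFED (interface names vary per system):

    # ibdev2netdev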

     

    2. Create (or edit) /etc/modprobe.d/mlx4_core.conf:

    options mlx4_core num_vfs=4 port_type_array=2,2 probe_vf=0

    Module parameters description:

    • num_vfs - the number of VFs required for this server; in this example, 4 VFs.
    • probe_vf - the number of VFs to be probed in the hypervisor. A VF probed in the hypervisor also gets an interface in the hypervisor (i.e. it can be seen using the ifconfig command).
      In this example there are no probed VFs, so running ifconfig will show no new interfaces. If probe_vf were set to 1, for example, we would get two new interfaces in the hypervisor (check ifconfig -a), one per port.
      Probed VFs can be used by the IT administrator to monitor traffic on the hypervisor without having to log into the VM itself.
    • port_type_array - the port type of each port: 1 for InfiniBand, 2 for Ethernet. In this example, both ports are Ethernet.

     

    For dual-port adapters, refer to HowTo Configure SR-IOV VFs on Different ConnectX-3 Ports to set each port differently. Refer to the MLNX_OFED User Manual for additional information.
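     

    For reference, mlx4_core also accepts a per-port triplet for num_vfs (single-port VFs on port 1, single-port VFs on port 2, dual-port VFs); the exact syntax is described in the MLNX_OFED User Manual. A sketch, assuming two single-port VFs per port and no dual-port VFs:

    options mlx4_core num_vfs=2,2,0 port_type_array=2,2 probe_vf=0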

     

    3. Restart the driver

    # /etc/init.d/openibd restart
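     

    After the restart, you can verify that the parameters were actually loaded; mlx4_core typically exposes them under sysfs:

    # cat /sys/module/mlx4_core/parameters/num_vfs

    # cat /sys/module/mlx4_core/parameters/probe_vf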

     

    4. Check that the VFs can be seen via lspci.

    # lspci | grep Mellanox

    03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

    03:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

    03:00.2 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

    03:00.3 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

    03:00.4 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

     

    III. VM Management

     

    Note: It is recommended to install the latest MLNX_OFED on the VM.

     

    1. Add PCI device to the VM.

       In our example, we attach one of the VFs to the VM (e.g. the VF at PCI address 03:00.1).

       Here is an example from virt-manager application.

    (Screenshot: adding the PCI host device to the VM using virt-manager.)
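     

    If you prefer the command line, the same attachment can be done with virsh. A minimal sketch, assuming the VM is named vm1 and the VF sits at PCI address 03:00.1:

    # cat vf.xml
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
    </hostdev>

    # virsh attach-device vm1 vf.xml --config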

     

     

    2. Connect to the VM console and set IP addresses on the new interfaces that were added to the VM.

     

    Note: Two new interfaces were added; each is mapped by SR-IOV to one port of the adapter (e.g. eth2 is mapped to port 1, while eth3 is mapped to port 2).

     

    # ibdev2netdev

    mlx4_0 port 1 ==> eth2 (Up)

    mlx4_0 port 2 ==> eth3 (Down)

     

    Run ifconfig:

     

    # ifconfig

    ...

    eth2      Link encap:Ethernet  HWaddr 82:F5:C0:82:5D:20 

              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

              RX packets:0 errors:0 dropped:0 overruns:0 frame:0

              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

              collisions:0 txqueuelen:1000

              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

     

    eth3      Link encap:Ethernet  HWaddr 56:44:90:46:46:AC 

              inet6 addr: fe80::5444:90ff:fe46:46ac/64 Scope:Link

              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

              RX packets:39 errors:0 dropped:0 overruns:0 frame:0

              TX packets:16 errors:0 dropped:0 overruns:0 carrier:0

              collisions:0 txqueuelen:1000

              RX bytes:2496 (2.4 KiB)  TX bytes:3952 (3.8 KiB)

     

    3. Set an IP address on the desired interface.

    # ifconfig eth2 10.10.10.1/24 up
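     

    To make the address persistent across reboots (RHEL 6 style network scripts; the file name assumes the interface is eth2):

    # cat /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    BOOTPROTO=static
    IPADDR=10.10.10.1
    NETMASK=255.255.255.0
    ONBOOT=yes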

     

    4. Ping another server on the network.

     

    5. Test RDMA using one of the perftest commands (e.g. ib_send_bw) between the VM and another server.

        a. Run on another server on the network (assuming the IP of the 40GbE port is 10.10.10.10/24)

    # ib_send_bw

     

       b. Run on the VM

    # ib_send_bw 10.10.10.10

       c. Verify that RDMA (RoCE) is running.
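     

    As a quick sanity check, ibv_devinfo (from libibverbs) run inside the VM confirms that the VF is seen as an RDMA-capable device and that its port is active:

    # ibv_devinfo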

     

     

     

    Troubleshooting

    1. Probed VFs

    Unlike regular VFs, probed VFs are visible from the hypervisor and can be used for debugging.

     

    If you configure one probed VF in the kernel module options, for example:

    options mlx4_core num_vfs=4 port_type_array=2,2 probe_vf=1

     

    This is what you will get using the virt-manager application:

    (Screenshot: virt-manager showing the probed VF and its hypervisor interface.)

     

    You can see that the VF at PCI address 03:00.1 also has an eth11 interface, unlike the VFs at the other PCI addresses (e.g. 03:00.2), which are not probed.
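     

    Independently of probing, the VFs can also be inspected from the hypervisor through the PF with the ip tool. A sketch, assuming the PF interface in the hypervisor is named eth4 (names vary per system):

    # ip link show eth4

    This prints one "vf N MAC ..." line per VF. A fixed MAC can be assigned to a VF before attaching it to a VM:

    # ip link set eth4 vf 0 mac 52:54:00:12:34:01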