HowTo Configure SR-IOV for Connect-IB/ConnectX-4 with KVM (InfiniBand)


    This post describes how to configure SR-IOV for the Mellanox Connect-IB / ConnectX-4 InfiniBand driver (mlx5).

    The reader should be familiar with InfiniBand network management and terms.

    Setting up a VM via KVM (virt-manager) is out of the scope of this post; refer to the virt-manager documentation.

    The example described here applies to Connect-IB and also to ConnectX-4 working in IB mode, as both use the same mlx5 driver.

     

     


    Overview

    SR-IOV configuration includes the following steps:

    1. Enable Virtualization (SR-IOV) in the BIOS (prerequisites).

    2. Enable SR-IOV in the Connect-IB firmware.

    3. Enable SR-IOV in the MLNX_OFED Driver.

    4. Set up the Virtual Machine (VM).

     

     

    Setup and Prerequisites

    The setup should include:

    1. Two servers connected via an InfiniBand switch

     

    2. An SM (Subnet Manager) running in the network

     

    3. KVM is installed on the servers as follows:

    # yum install kvm

    # yum install virt-manager libvirt libvirt-python python-virtinst
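 

    After the installation, make sure the libvirt daemon is running (a sketch for an init-based system such as the RHEL 6 host used in this example):

    # /etc/init.d/libvirtd start
    # chkconfig libvirtd on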

     

    4. Make sure that SR-IOV is enabled in the BIOS of the specific server. Each server has different BIOS configuration options for virtualization. See BIOS Performance Tuning Example for BIOS configuration examples.
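 

    A quick way to sanity-check virtualization support from the OS (a hint only; the BIOS setting is authoritative and the exact messages vary by platform):

    # grep -cE 'vmx|svm' /proc/cpuinfo
    # dmesg | grep -i -e DMAR -e IOMMU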

     

    5. Make sure that intel_iommu=on and iommu=pt are added to the kernel command line in /boot/grub/grub.conf:

    # cat /boot/grub/grub.conf

     

    default=0

    timeout=5

    splashimage=(hd0,0)/grub/splash.xpm.gz

    hiddenmenu

    title Red Hat Enterprise Linux (2.6.32-358.el6.x86_64)

      root (hd0,0)

      kernel /vmlinuz-2.6.32-358.el6.x86_64 ro root=UUID=4f9ed446-05fe-4db5-a079-56738f4ae05f rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM crashkernel=auto rhgb quiet rd_NO_DM rhgb quiet intel_iommu=on iommu=pt

      initrd /initramfs-2.6.32-358.el6.x86_64.img

     

    To learn more about iommu grub parameters refer to Understanding the iommu Linux grub File Configuration.
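 

    On newer distributions that boot with GRUB2 rather than the legacy grub.conf shown above, the same parameters are typically added via /etc/default/grub (a sketch; file locations and the mkconfig target vary by distribution):

    # sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt /' /etc/default/grub
    # grub2-mkconfig -o /boot/grub2/grub.cfg
    # reboot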

     

    6. Install the latest MLNX_OFED driver on the server and on the VM.

    # mlnxofedinstall
    ...

     

    7. Make sure that OpenSM is enabled with virtualization support. Open the file /etc/opensm/opensm.conf and add:

    virt_enabled 2

     

         Note: This is relevant only for the mlx5 driver, not for mlx4 (ConnectX-3/Pro).

     

         This parameter has the following configuration options:

    • 0: Ignore Virtualization - no virtualization support (default)
    • 1: Disable Virtualization - disable virtualization on all virtualization-supporting ports
    • 2: Enable Virtualization - enable virtualization on all virtualization-supporting ports

     

    The default for this parameter is 0 (ignore virtualization).
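 

    A quick way to set and verify the parameter without opening an editor (a sketch; assumes virt_enabled is not already set in the file):

    # echo "virt_enabled 2" >> /etc/opensm/opensm.conf
    # grep virt_enabled /etc/opensm/opensm.conf
    virt_enabled 2
    # /etc/init.d/opensmd restart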

     

     

    Configuration

    I. Enable SR-IOV on the Firmware

     

    1. Start the Mellanox Firmware Tools (MFT).

    # mst start

    Starting MST (Mellanox Software Tools) driver set

    Loading MST PCI module - Success

    Loading MST PCI configuration module - Success

    Create devices

     

    2. Locate the Connect-IB device on the desired PCI slot.

    # mst status

    MST modules:

    ------------

        MST PCI module loaded

        MST PCI configuration module loaded

     

    MST devices:

    ------------

    /dev/mst/mt4099_pciconf0         - PCI configuration cycles access.

                                       domain:bus:dev.fn=0000:03:00.0 addr.reg=88 data.reg=92

                                       Chip revision is: 00

    /dev/mst/mt4099_pci_cr0          - PCI direct access.

                                       domain:bus:dev.fn=0000:03:00.0 bar=0xdfa00000 size=0x100000

                                       Chip revision is: 00

    /dev/mst/mt4113_pciconf0         - PCI configuration cycles access.

                                       domain:bus:dev.fn=0000:81:00.0 addr.reg=88 data.reg=92

                                       Chip revision is: 00

     

    3. Query the Status of the device.

    # mlxconfig -d /dev/mst/mt4113_pciconf0 q

     

    Device #1:

    ----------

     

    Device type:    ConnectIB      

    PCI device:     /dev/mst/mt4113_pciconf0

     

    Configurations:                          Current

             SRIOV_EN                        0              

             NUM_OF_VFS                      0                

             INT_LOG_MAX_PAYLOAD_SIZE        0              

     

    4. Enable SR-IOV and set the desired number of VFs as follows:

    • SRIOV_EN=1
    • NUM_OF_VFS=4 (this example uses 4 VFs)

     

    Note:

    • Connect-IB: FPP_EN is disabled by default. When FPP_EN is enabled, the OS sees a dual-port device as two single-port devices. On Connect-IB, FPP_EN must be enabled for SR-IOV.
    • ConnectX-4: FPP_EN is always enabled and is not configurable.

     

    # mlxconfig -d /dev/mst/mt4113_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=4 FPP_EN=1

     

    Device #1:

    ----------

     

    Device type:    ConnectIB      

    PCI device:     /dev/mst/mt4113_pciconf0

     

    Configurations:                          Current         New

             SRIOV_EN                        0               1              

             NUM_OF_VFS                      0               4   

             FPP_EN                          0               1           

             INT_LOG_MAX_PAYLOAD_SIZE        0               0              

     

    Apply new Configuration? ? (y/n) [n] : y

    Applying... Done!

    -I- Please reboot machine to load new configurations.

     

     

    5. Reboot the server, or just reset the adapter firmware to apply the changes (faster):

    # mlxfwreset --device /dev/mst/mt4113_pciconf0 reset

     

    Minimal reset level for device, /dev/mst/mt4113_pciconf0:

     

    3: Driver restart and PCI reset

    Continue with reset?[y/N] y

    -I- Stopping Driver                         -Done

    -I- Sending Reset Command To Fw             -Done

    -I- Resetting PCI                           -Done

    -I- Starting Driver                         -Done

    -I- Restarting MST                          -Done

    -I- FW was loaded successfully.


     

    Note: At this point, the VFs are not yet visible via lspci. They will appear only once SR-IOV is enabled in the MLNX_OFED driver.

    # lspci | grep Mellanox

    81:00.0 Network controller [0207]: Mellanox Technologies MT27600 [Connect-IB]
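 

    Re-running the query should now show the new values (expected output, sketched from the set operation above):

    # mlxconfig -d /dev/mst/mt4113_pciconf0 q
    ...
             SRIOV_EN                        1
             NUM_OF_VFS                      4
             FPP_EN                          1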

     

    II. Enable SR-IOV on the MLNX_OFED driver

    1. Locate the device (normally mlx5_0).

        In this case, we have a single-port Connect-IB adapter.

    [root@i-zak-3 ~]# ibstat

    CA 'mlx5_0'

      CA type: MT4113

      Number of ports: 1

      Firmware version: 10.1100.6630

      Hardware version: 0

      Node GUID: 0xf4521403006fa1d0

      System image GUID: 0xf4521403006fa1d0

      Port 1:

      State: Initializing

      Physical state: LinkUp

      Rate: 56

      Base lid: 65535

      LMC: 0

      SM lid: 0

      Capability mask: 0x26516848

      Port GUID: 0xf4521403006fa1d0

      Link layer: InfiniBand

     

    2. Get the current number of VFs on this device.

    # cat /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs

    0

     

    Note: If the command fails, it may imply that the driver was not loaded.

     

    3. Set the desired number of VFs.

    # echo 4 > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs

    # cat /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs

    4

     

    Note: Changing mlx5_num_vfs is not persistent and does not survive a server reboot! See the Troubleshooting section for a way to reapply it at startup.

     

    4. Check the PCI bus:

    # lspci | grep Mellanox

    81:00.0 Network controller [0207]: Mellanox Technologies MT27600 [Connect-IB]

    81:00.1 Network controller [0207]: Mellanox Technologies MT27600 Family [Connect-IB Virtual Function]

    81:00.2 Network controller [0207]: Mellanox Technologies MT27600 Family [Connect-IB Virtual Function]

    81:00.3 Network controller [0207]: Mellanox Technologies MT27600 Family [Connect-IB Virtual Function]

    81:00.4 Network controller [0207]: Mellanox Technologies MT27600 Family [Connect-IB Virtual Function]

     

    At this point you can see 4 VFs and one Physical Function (PF).

     

    5. Set the VF parameters (port and node GUIDs and port admin).

     

    Each VF has the following three parameters:

    1. Port GUID (default 0)

    2. Node GUID (default 0)

    3. Policy (admin state) (default Down)

     

    In this example, the following path is created for each VF:

     

    /sys/class/infiniband/mlx5_0/device/sriov/<VF number>

     

     

    PCI Function    VF number
    81:00.1         0
    81:00.2         1
    81:00.3         2
    81:00.4         3

     

     

    For each VF, run the following commands (a loop that applies them to all four VFs at once is sketched after step d):

    Here is an example for the first VF, under 81:00.1 (VF number 0):

    a. Set the Admin state to "Follow":

    # cat  /sys/class/infiniband/mlx5_0/device/sriov/0/policy

    Down

    # echo Follow > /sys/class/infiniband/mlx5_0/device/sriov/0/policy

    # cat  /sys/class/infiniband/mlx5_0/device/sriov/0/policy

    Follow

     

    b. Set numbers for the port and node GUIDs:

    # echo 11:22:33:44:77:66:77:90 > /sys/class/infiniband/mlx5_0/device/sriov/0/node

    # echo 11:22:33:44:77:66:77:91 > /sys/class/infiniband/mlx5_0/device/sriov/0/port

     

    Note: The echo command alone does not suffice; the new GUIDs take effect only after the VF is unbound and rebound. Reading the attributes back at this point still shows zeros:

    # cat /sys/class/infiniband/mlx5_0/device/sriov/0/node

    00:00:00:00:00:00:00:00

    # cat /sys/class/infiniband/mlx5_0/device/sriov/0/port

    00:00:00:00:00:00:00:00

     

     

    c. Unbind the VF and then re-bind it.

    # echo 0000:81:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind

    # echo 0000:81:00.1 > /sys/bus/pci/drivers/mlx5_core/bind

    # cat /sys/class/infiniband/mlx5_0/device/sriov/0/node

    11:22:33:44:77:66:77:90

    #  cat  /sys/class/infiniband/mlx5_0/device/sriov/0/port

    11:22:33:44:77:66:77:91

     

    d. Verify by running ibstat.

        For the first VF, look for the mlx5_1 device.

        Make sure that the port and node GUIDs are configured as expected.

    # ibstat

    ....

    CA 'mlx5_1'

      CA type: MT4114

      Number of ports: 1

      Firmware version: 10.1100.6630

      Hardware version: 0

      Node GUID: 0x1122334477667790

      System image GUID: 0xf4521403006fa1d0

      Port 1:

      State: Active

      Physical state: LinkUp

      Rate: 56

      Base lid: 65535

      LMC: 0

      SM lid: 0

      Capability mask: 0x26516c48

      Port GUID: 0x1122334477667791

      Link layer: InfiniBand
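 

    Rather than repeating steps a-c by hand for each VF, the same settings can be applied in one shell loop. This is a minimal sketch; the GUID numbering scheme is an illustrative assumption (any unique GUIDs will do):

    for vf in 0 1 2 3; do
        base=/sys/class/infiniband/mlx5_0/device/sriov/$vf
        echo Follow > $base/policy
        # Illustrative GUIDs: node ends in 9<2*vf>, port in 9<2*vf+1>
        echo 11:22:33:44:77:66:77:9$((vf*2))   > $base/node
        echo 11:22:33:44:77:66:77:9$((vf*2+1)) > $base/port
        # Unbind/rebind the matching PCI function so the GUIDs take effect
        echo 0000:81:00.$((vf+1)) > /sys/bus/pci/drivers/mlx5_core/unbind
        echo 0000:81:00.$((vf+1)) > /sys/bus/pci/drivers/mlx5_core/bind
    done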

     

     

    III. VM Management

    1. Add the PCI device to the VM.

        In our example, we will connect the VM to the PCI address 81:00.1.

        Here is an example from the virt-manager application.

     

    [virt-manager screenshot: adding PCI host device 81:00.1 to the VM]
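 

    The same attachment can also be done from the command line with virsh instead of the GUI. This is a sketch; the domain name vm1 and the XML file name are assumptions:

    # cat pci_81_00_1.xml
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x00' function='0x1'/>
      </source>
    </hostdev>

    # virsh attach-device vm1 pci_81_00_1.xml --config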

     

    2. Connect to the VM console and set an IP address on the ib0 interface.

     

    Note: Make sure that the VM has the latest MLNX_OFED.

     

    # ifconfig ib0 10.10.10.1/24 up
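 

    On VMs where the legacy ifconfig tool is not installed, the equivalent iproute2 commands are:

    # ip addr add 10.10.10.1/24 dev ib0
    # ip link set ib0 up
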

    3. Verify that the port is up (on the VM).

       If the port is down, make sure that the SM is running.

    # ibstat

    CA 'mlx5_0'

      CA type: MT4114

      Number of ports: 1

      Firmware version: 10.1100.6630

      Hardware version: 0

      Node GUID: 0x1122334477667790

      System image GUID: 0xf4521403006fa1d0

      Port 1:

      State: Active

      Physical state: LinkUp

      Rate: 56

      Base lid: 2

      LMC: 1

      SM lid: 2

      Capability mask: 0x26516c48

      Port GUID: 0x1122334477667791

      Link layer: InfiniBand

     

    4. Ping another server on the network.

     

    Note: If you are testing RDMA connectivity from the VM with the perftest package (e.g. ib_send_bw), use the -x 0 parameter; the test may fail without it (expected). By default, perftest omits the GRH for performance reasons, but with SR-IOV the GRH is mandatory, so you must use the -x flag to select a GID index.
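 

    For example, a bandwidth test between the VM and a remote server (a sketch; the remote address 10.10.10.2 is an assumption):

    On the remote server:

    # ib_send_bw -x 0

    On the VM:

    # ib_send_bw -x 0 10.10.10.2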

     

    IB Diagnostics

    When using Connect-IB, you cannot access the network diagnostics (SMP) from the VM, since sending and receiving SMP packets is not allowed from a VF for security reasons. SMPs are not restricted by network partitioning and may affect the physical network topology. Moreover, even the SM can be denied access to portions of the network by setting management keys unknown to it.

     

    Note: On ConnectX-3 this capability is supported for IB networks (refer to the MLNX_OFED User Manual and to HowTo Configure SR-IOV for ConnectX-3 with KVM (InfiniBand)). With Connect-IB, by contrast, running ibhosts on one of the nodes in the network will not show the VM HCA in the output.

     

    Troubleshooting

    1. The MLNX_OFED installation script has two flags related to virtualization and Single Root I/O Virtualization (SR-IOV). There is no need to use these flags for a Connect-IB installation (they are relevant to other adapter cards, such as ConnectX-3):

    # ./mlnxofedinstall --enable-sriov --hypervisor

     

    2. If the sysfs command fails, it may imply that the driver is not loaded. In this case, restart the driver:

    # /etc/init.d/openibd restart

     

    3. Make sure an SM is running in the network. To start it on the server, run:

    # /etc/init.d/opensmd restart

    4. The mlx5_num_vfs parameter does not survive a reboot; add the command to a startup script (a sketch follows the example below) or run it manually after each reboot.

     

    For example (4 VFs):

    # echo 4 > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs

    # cat /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs

    4
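 

    One way to automate this is an rc.local-style startup script (a sketch; assumes an executable /etc/rc.d/rc.local, as on RHEL-style systems, and that the mlx5 driver is already loaded at that point in boot):

    # echo 'echo 4 > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs' >> /etc/rc.d/rc.local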

     

    5. When using the perftest package, use -x 0 (or another GID index). By default, perftest omits the GRH for performance reasons, but with SR-IOV the GRH is mandatory, so you must use the -x flag to select a GID index.

     

    6. OpenSM and any other utility that uses SMP MADs (ibnetdiscover, sminfo, iblinkinfo, smpdump, ibqueryerr, ibdiagnet and smpquery) should run on the PF and not on the VFs. In the case of multiple PFs (multi-host), OpenSM should run on Host0.

     

    OpenStack Support

    For OpenStack SR-IOV support, refer to OpenStack SR-IOV Support for ConnectX-4.