HowTo Launch VM over OVS-DPDK-17.08 Using Mellanox ConnectX-4 and ConnectX-5

Version 9

    This post shows how to launch a Virtual Machine (VM) over OVS-DPDK 17.08 using Mellanox ConnectX-4 or ConnectX-5 adapters.

    In this example MLNX_OFED 4.2- is used.





    Prerequisites

    1. Install MLNX_OFED and use the ofed_info command to verify that the installed version matches the release named above.

    OFED download link:

    # ofed_info -s


    2. Check CPU support for 1G hugepages by checking for pdpe1gb flag:

    # cat /proc/cpuinfo | grep pdpe1gb
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology n_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid

    Make sure that the flag list includes the pdpe1gb flag.
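
A plain grep for pdpe1gb also matches any longer flag name that contains the same string. The sketch below wraps the check in a small helper that matches the flag as a whole word on the flags line. The name has_cpu_flag and its optional file argument are illustrative only and are not part of any existing tool.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds when FLAG appears as a whole word on a "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo; it is a parameter only so that
# the helper can be tried against a saved copy of the file.
has_cpu_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Usage on the host:
#   has_cpu_flag pdpe1gb && echo "1G hugepages supported"
```

The helper returns a shell exit status, so it can gate the rest of a setup script.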


    3. Find the NUMA number per PCI slot:

    # mst start
    # mst status -v

    MST modules:
        MST PCI module is not loaded
        MST PCI configuration module is not loaded
    PCI devices:
    DEVICE_TYPE             MST      PCI       RDMA    NET                      NUMA
    ConnectX4LX(rev:0)      NA       07:00.0   mlx5_1  net-ens2f0                0
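
If the MST tools are not at hand, the kernel exposes the same NUMA information in sysfs. This is a minimal sketch, assuming the standard numa_node attribute under /sys/bus/pci/devices; the helper name pci_numa_node and the optional sysfs root argument are made up for this example.

```shell
# pci_numa_node PCI_ADDR [SYSFS_ROOT]
# Prints the NUMA node of a PCI device given its full address
# (domain:bus:device.function). SYSFS_ROOT defaults to /sys.
# A value of -1 means the platform did not report a node.
pci_numa_node() {
    cat "${2:-/sys}/bus/pci/devices/$1/numa_node"
}

# Usage on the host, with the PCI address from the mst output:
#   pci_numa_node 0000:07:00.0
```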


    4. Check QEMU version information. It must be Rev. 2.1 or above.

    # qemu-system-x86_64 --version
    QEMU emulator version 2.7.50 (v2.7.0-456-gffd455a), Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers


    To download a new qemu version refer to the official qemu site:
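
The version check can be scripted instead of read by eye. The sketch below extracts the dotted version from the banner line and compares it field by field against the 2.1 minimum. Both function names are hypothetical helpers written for this post.

```shell
# version_ge A B
# Succeeds when dotted version A is greater than or equal to B.
# Fields are compared numerically; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# qemu_version_of BANNER
# Prints the dotted number that follows the word "version".
qemu_version_of() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

banner='QEMU emulator version 2.7.50 (v2.7.0-456-gffd455a)'
if version_ge "$(qemu_version_of "$banner")" 2.1; then
    echo "QEMU version OK"
fi
```

On a live host the banner would come from the first line of qemu-system-x86_64 --version.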


    5. Install the required packages:

    # apt-get install unzip libnuma-dev python-six




    Configure the grub File and Mount hugepages

    1. Update the grub.conf file.

    Note: Updating grub files is different for each Linux OS distribution. Refer to OS documentation.


    Edit the grub.conf file in the GRUB_CMDLINE_LINUX line as follows:

    "default_hugepagesz=1G hugepagesz=1G hugepages=8"


    The line above defines the hugepage size and count. Most Intel processors support 2 MB and 1 GB hugepages. It is recommended that you leave at least 2G of RAM free for the OS.
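
The hugepages=8 value can be cross-checked against the memory consumers configured later in this post: the guest is launched with 6G (6144 MB) of hugepage-backed memory, and OVS-DPDK reserves 2048 MB through dpdk-socket-mem. The helper below is a hypothetical sketch of that arithmetic; it rounds each consumer up to whole pages.

```shell
# hugepages_needed PAGE_MB SIZE_MB...
# Prints how many pages of PAGE_MB are needed to hold every SIZE_MB
# consumer, rounding each consumer up to a whole number of pages.
hugepages_needed() (
    page_mb=$1; shift
    total=0
    for mb in "$@"; do
        total=$(( total + (mb + page_mb - 1) / page_mb ))
    done
    echo "$total"
)

hugepages_needed 1024 6144 2048   # prints 8
```

A different guest size or socket memory changes the page count accordingly.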


    2. Update grub via the OS grub script.


    3. Reboot the server. The configuration will apply after reboot.


    4. Check that hugepages are loaded correctly after reboot:

    # cat /proc/meminfo | grep Hug

    AnonHugePages:   2314240 kB
    HugePages_Total:       8
    HugePages_Free:        8
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:    1048576 kB

    The output shows eight hugepages free, each at 1G size.
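
The same check can be expressed as a number for use in scripts. A small sketch, assuming the meminfo field names shown above; the function name hugepages_free_mb is illustrative.

```shell
# hugepages_free_mb [MEMINFO_FILE]
# Prints the free hugepage memory in MB, computed as
# HugePages_Free * Hugepagesize (kB) / 1024.
hugepages_free_mb() {
    awk '/^HugePages_Free:/ { free = $2 }
         /^Hugepagesize:/   { kb = $2 }
         END { print free * kb / 1024 }' "${1:-/proc/meminfo}"
}

# Usage on the host:
#   hugepages_free_mb
```

For the output shown above, eight free pages of 1048576 kB each give 8192 MB.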


    5. Mount 1G type hugepages.

    # mkdir -p /dev/hugepages
    # mount -t hugetlbfs -o pagesize=1G none /dev/hugepages


    Note: The mount of hugepages is not persistent. You must mount hugepages after each reboot.
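
Because the mount has to be repeated after each reboot, a boot-time script may want to mount only when the filesystem is missing. The sketch below checks the mount table for a hugetlbfs entry; the helper name hugetlbfs_mounted and the optional mounts-file argument are assumptions made for this example.

```shell
# hugetlbfs_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds when MOUNTPOINT is listed with filesystem type hugetlbfs.
# MOUNTS_FILE defaults to /proc/mounts (fields: device, mount point, type, ...).
hugetlbfs_mounted() {
    awk -v mp="$1" '$2 == mp && $3 == "hugetlbfs" { found = 1 }
                    END { exit !found }' "${2:-/proc/mounts}"
}

# Usage as root, mounting only when needed:
#   hugetlbfs_mounted /dev/hugepages || \
#       mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
```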


    DPDK Configuration

    1. Download and extract DPDK 17.08 package.

    # cd /usr/src/
    # wget
    # unzip


    2. Set DPDK environment variables as follows:

    # export DPDK_DIR=/usr/src/dpdk-17.08
    # cd $DPDK_DIR
    # export DPDK_TARGET=x86_64-native-linuxapp-gcc


    3. Modify compilation settings to support ConnectX-4 and ConnectX-5 interfaces.

    # sed -i 's/\(CONFIG_RTE_LIBRTE_MLX5_PMD=\)n/\1y/g' $DPDK_DIR/config/common_base
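
The sed command exits successfully even when nothing was replaced, so it is worth confirming that the option really ended up enabled. A minimal sketch of an edit-then-verify helper follows; the name enable_config_option is hypothetical, and the temporary-file approach is used here only to avoid depending on sed -i.

```shell
# enable_config_option FILE OPTION
# Rewrites "OPTION=n" to "OPTION=y" in FILE, then succeeds only when
# the file contains the exact line "OPTION=y".
enable_config_option() (
    file=$1
    opt=$2
    tmp=$(mktemp) || exit 1
    sed "s/^\($opt=\)n\$/\1y/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
    grep -qx "$opt=y" "$file"
)

# Usage:
#   enable_config_option "$DPDK_DIR/config/common_base" CONFIG_RTE_LIBRTE_MLX5_PMD
```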


    4. Compile your DPDK code.

    # make -j install T=$DPDK_TARGET DESTDIR=install


    OVS Configuration

    1. Download the OVS version 2.8.1 or later.

    # cd /usr/src/
    # wget

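    2. Extract the OVS sources and set the environment variable that points the OVS build at the compiled DPDK 17.08 tree.

    # export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
    # tar xf openvswitch-2.8.1.tar.gz
    # cd /usr/src/openvswitch-2.8.1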


    3. Compile your code.

    # ./
    # ./configure --with-dpdk=$DPDK_BUILD
    # make -j LDFLAGS=-libverbs
    # make install


    4. Reset OVS environment.

    # pkill -9 ovs

    # rm -rf /usr/local/var/run/openvswitch/

    # rm -rf /usr/local/etc/openvswitch/

    # rm -f /usr/local/etc/openvswitch/conf.db

    # mkdir -p /usr/local/var/run/openvswitch/

    # mkdir -p /usr/local/etc/openvswitch/

    # rm -f /tmp/conf.db


    5. Specify initial Open vSwitch (OVS) database to use:

    # export PATH=$PATH:/usr/local/share/openvswitch/scripts

    # export DB_SOCK=/usr/local/var/run/openvswitch/db.sock

    # ovsdb-tool create /usr/local/etc/openvswitch/conf.db /usr/local/share/openvswitch/vswitch.ovsschema

    # ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach


    6. Configure OVS to support DPDK ports:

    # ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true


    7. Start OVS-DPDK service:

    # ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start

    Note: After every reboot or OVS termination, you need to rebuild the OVS environment by repeating steps 4-7 of this section.
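
Since steps 4-7 have to be repeated after every reboot, they can be collected in one function. The sketch below uses only the commands and paths shown in those steps. The run wrapper and the DRY_RUN switch are hypothetical additions that print the commands instead of executing them, so the sequence can be reviewed first.

```shell
# run CMD...
# Prints the command when DRY_RUN=1 (a switch invented for this sketch),
# otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# restart_ovs_dpdk
# Repeats steps 4-7 of this section in order.
restart_ovs_dpdk() {
    run_dir=/usr/local/var/run/openvswitch
    etc_dir=/usr/local/etc/openvswitch
    share_dir=/usr/local/share/openvswitch
    db_sock=$run_dir/db.sock

    # Step 4: reset the OVS environment.
    run pkill -9 ovs
    run rm -rf "$run_dir" "$etc_dir"
    run mkdir -p "$run_dir" "$etc_dir"
    # Step 5: create the database and start ovsdb-server.
    run ovsdb-tool create "$etc_dir/conf.db" "$share_dir/vswitch.ovsschema"
    run ovsdb-server --remote="punix:$db_sock" \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
    # Step 6: enable DPDK support.
    run ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
    # Step 7: start the OVS-DPDK service.
    run "$share_dir/scripts/ovs-ctl" --no-ovsdb-server --db-sock="$db_sock" start
}

# Review the sequence without touching the system:
#   DRY_RUN=1; restart_ovs_dpdk
```

The function is only defined here; nothing runs until it is called.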


    8. Configure the poll mode driver (PMD) to use 2G of hugepage memory on NUMA node 0.

    # ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="2048,0"

    Note: Find the correct NUMA number according to the prerequisites section. This example shows 2G (=2048) Hugepages and NUMA number 0.
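
The dpdk-socket-mem value is a comma-separated list with one entry per NUMA node, in node order. The sketch below builds that list for an adapter on a given node; the helper name socket_mem and the default of two nodes are assumptions made for this example.

```shell
# socket_mem NODE MB [NODE_COUNT]
# Prints a dpdk-socket-mem list that gives MB to NUMA node NODE and 0
# to every other node. NODE_COUNT defaults to 2.
socket_mem() (
    node=$1
    mb=$2
    count=${3:-2}
    i=0
    out=
    while [ "$i" -lt "$count" ]; do
        if [ "$i" -eq "$node" ]; then v=$mb; else v=0; fi
        out="${out:+$out,}$v"
        i=$(( i + 1 ))
    done
    echo "$out"
)

socket_mem 0 2048   # prints 2048,0
socket_mem 1 2048   # prints 0,2048
```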


    9. Set the core mask to enable several PMDs. In this example cores 1-3 are used (0xe = binary 1110).

    # ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0xe
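
pmd-cpu-mask is a bitmap in which bit n enables core n, counting from zero. The two helpers below are an illustrative sketch of the conversion in both directions; their names are not part of OVS.

```shell
# cores_to_mask CORE...
# Prints the hexadecimal mask with one bit set per listed core.
cores_to_mask() (
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
)

# mask_to_cores MASK
# Prints the zero-based core numbers whose bits are set in MASK.
mask_to_cores() (
    mask=$(( $1 ))
    core=0
    out=
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then out="${out:+$out }$core"; fi
        mask=$(( mask >> 1 ))
        core=$(( core + 1 ))
    done
    echo "$out"
)

cores_to_mask 1 2 3   # prints 0xe
mask_to_cores 0xe     # prints 1 2 3
```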


    11. Create an OVS bridge.

    # ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev


    12. Create a DPDK port (dpdk0) with a single RX queue using the n_rxq=1 option. Insert the PCI slot number, taken from the output of step 3 in the prerequisites section, into dpdk-devargs.

    Note: Before adding the DPDK interface, make sure that it is shut down on the Linux side and is not attached to a Linux bridge.

    To bring the interface down use:

    # ifconfig <interface_name> down


    Now add the DPDK port to the OVS-DPDK bridge:

    # ovs-vsctl add-port br1 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:07:00.0 options:n_rxq_desc=1024 options:n_txq_desc=1024 options:n_rxq=1 other_config:pmd-rxq-affinity="0:1" ofport_request=1

    RX queue 0 is pinned by affinity to core 1.

    RXQ/TXQ descriptors are set to 1024.


    13. Create a vhost-user port toward the guest machine with two RX queues and core affinity:

    # ovs-vsctl add-port br1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser options:n_rxq=2 other_config:pmd-rxq-affinity="0:2,1:3"

    RX queues 0 and 1 are pinned by affinity to cores 2 and 3 respectively on the host machine.



    • Make sure that you have enough free cores for the host and guest PMDs, depending on the number of RX queues configured. For the configuration above, 3 PMD cores are required for the host and 2 cores for the guest (5 cores total).

    • Use the top command to see that two PMD cores are running at 100% CPU usage.

    • If you do not use ofport_request in the OVS control command, OVS assigns the port ID automatically.
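
A pmd-rxq-affinity value is a comma-separated list of queue:core pairs, and the cores it names should also be enabled in pmd-cpu-mask. The sketch below builds the list from a set of cores and checks it against a mask. Both helper names are hypothetical.

```shell
# rxq_affinity CORE...
# Prints a pmd-rxq-affinity list in which queue i is pinned to the
# i-th core given (queues are numbered from 0).
rxq_affinity() (
    q=0
    out=
    for core in "$@"; do
        out="${out:+$out,}$q:$core"
        q=$(( q + 1 ))
    done
    echo "$out"
)

# affinity_in_mask AFFINITY MASK
# Succeeds when every core named in AFFINITY has its bit set in MASK.
affinity_in_mask() (
    mask=$(( $2 ))
    for pair in $(printf '%s' "$1" | tr ',' ' '); do
        core=${pair#*:}
        [ $(( (mask >> core) & 1 )) -eq 1 ] || exit 1
    done
    exit 0
)

rxq_affinity 2 3   # prints 0:2,1:3

# Usage with the values from this section:
#   affinity_in_mask "$(rxq_affinity 2 3)" 0xe && \
#       ovs-vsctl set Interface vhost-user1 other_config:pmd-rxq-affinity="$(rxq_affinity 2 3)"
```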


    14. Set environment variables that hold the port IDs.

    # DPDK0_INDEX=$(echo `ovs-ofctl show br1 | grep dpdk0 | cut -d '(' -f 1`)

    # VHOST_USER1_INDEX=$(echo `ovs-ofctl show br1 | grep vhost-user1 | cut -d '(' -f 1`)

    Verify the assigned IDs:

    # echo $DPDK0_INDEX
    # echo $VHOST_USER1_INDEX
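
The grep used above matches any port whose name merely contains the search string (dpdk0 would also match dpdk01). A stricter sketch follows, which reads the ovs-ofctl show output on standard input and compares the port name exactly. It assumes port lines of the form " 1(dpdk0): addr:..."; the function name ofport_of is made up for this example.

```shell
# ofport_of PORT_NAME
# Reads `ovs-ofctl show <bridge>` output on stdin and prints the
# OpenFlow port number of the port whose name equals PORT_NAME.
ofport_of() {
    awk -v name="$1" -F'[()]' '
        $2 == name { gsub(/[[:space:]]/, "", $1); print $1; exit }'
}

# Usage:
#   DPDK0_INDEX=$(ovs-ofctl show br1 | ofport_of dpdk0)
#   VHOST_USER1_INDEX=$(ovs-ofctl show br1 | ofport_of vhost-user1)
```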


    15. Set OVS flow rules:

    # ovs-ofctl add-flow br1 in_port=$DPDK0_INDEX,action=output:$VHOST_USER1_INDEX
    # ovs-ofctl add-flow br1 in_port=$VHOST_USER1_INDEX,action=output:$DPDK0_INDEX

    Verify that the new flows were inserted:

    # ovs-ofctl dump-flows br1
    NXST_FLOW reply (xid=0x4):
    cookie=0x0, duration=4.903s, table=0, n_packets=8035688, n_bytes=482141280, idle_age=0, in_port=1 actions=output:2
    cookie=0x0, duration=3.622s, table=0, n_packets=0, n_bytes=0, idle_age=3, in_port=2 actions=output:1
    cookie=0x0, duration=353.725s, table=0, n_packets=284039649, n_bytes=17042378940, idle_age=5, priority=0 actions=NORMAL
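
To confirm that traffic really hits a flow, the n_packets counter can be pulled out of the dump and sampled twice. The helper below is an illustrative sketch that assumes the dump-flows line format shown above; the name flow_packets is hypothetical.

```shell
# flow_packets IN_PORT
# Reads `ovs-ofctl dump-flows <bridge>` output on stdin and prints the
# n_packets counter of the first flow that matches in_port=IN_PORT.
flow_packets() {
    awk -v port="$1" '
        {
            pk = ""; ip = ""
            n = split($0, f, /[ ,]+/)
            for (i = 1; i <= n; i++) {
                if (f[i] ~ /^n_packets=/) pk = substr(f[i], 11)
                if (f[i] ~ /^in_port=/)   ip = substr(f[i], 9)
            }
            if (ip == port) { print pk; exit }
        }'
}

# Usage:
#   ovs-ofctl dump-flows br1 | flow_packets "$DPDK0_INDEX"
```

A counter that keeps growing between two samples shows that packets are being forwarded in that direction.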


    VM Configuration

    1. Launch a guest machine.

    # echo 'info cpus' | \
    numactl --cpunodebind 0 --membind 0 -- \
    /usr/src/qemu/x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -name gen-l-vrt-019-006-Ubuntu-15.10 \
    -cpu host -m 6G \
    -realtime mlock=off \
    -smp 8,sockets=8,cores=1,threads=1 \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/gen-l-vrt-019-006-Ubuntu-15.10.monitor,server,nowait \
    -drive file=/images/gen-l-vrt-019-006/gen-l-vrt-019-006.img,if=none,id=drive-ide0-0-0,format=qcow2 \
    -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
    -netdev tap,id=hostnet0,script=no,downscript=no \
    -device e1000,netdev=hostnet0,id=net0,mac=00:50:56:1b:b2:05,bus=pci.0,addr=0x3 \
    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user1 \
    -netdev type=vhost-user,id=iface1,chardev=char1,vhostforce,queues=2 \
    -device virtio-net-pci,netdev=iface1,mac=12:34:00:00:50:2c,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,mrg_rxbuf=off,mq=on,vectors=6 \
    -object memory-backend-file,id=mem,size=6144M,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem \
    -mem-prealloc \
    -monitor stdio \
    > /tmp/qemu_cpu_info.txt &


    • Choose the number of vhost-user queues to match the number of RX queues configured. In this example queues=2.
    • In some environments the external e1000 interface fails to come up. If it fails, add the options script=no,downscript=no to the tap netdev.
      While the guest boots, bring up the interface manually as follows (in this example br0 is the main host bridge to the public network):
      # brctl addif br0 tap0
      # ifconfig tap0 up
    • The shared memory size (size= in the memory-backend-file object) must be equal to the guest memory size set by -m. Also, the amount of memory must not exceed the total free hugepage memory on the host.
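
Two values in the launch command are tied to others and are easy to get wrong when the command is adapted. The vectors=6 setting is consistent with the rule given in the Open vSwitch vhost-user documentation (2 x queues + 2), and size=6144M has to equal -m 6G. The helpers below are a hypothetical sketch of both checks; the rule itself comes from that documentation, not from this post.

```shell
# virtio_vectors QUEUES
# Prints the vectors value for a multiqueue virtio-net device,
# using the rule 2 * QUEUES + 2.
virtio_vectors() {
    echo $(( 2 * $1 + 2 ))
}

# to_mb SIZE
# Converts a size with a G or M suffix to MB (6G -> 6144, 6144M -> 6144).
to_mb() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

virtio_vectors 2   # prints 6
if [ "$(to_mb 6G)" -eq "$(to_mb 6144M)" ]; then
    echo "guest memory and memory backend sizes match"
fi
```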


    2. Set CPU affinity for the eight vCPU threads:

    # a=( $(cat /tmp/qemu_cpu_info.txt  | grep thread_id | cut -d '=' -f 3 | tr -d '\r' ) )
    # taskset -p 0x010  ${a[0]}
    # taskset -p 0x020  ${a[1]}
    # taskset -p 0x040  ${a[2]}
    # taskset -p 0x080  ${a[3]}
    # taskset -p 0x100  ${a[4]}
    # taskset -p 0x200  ${a[5]}
    # taskset -p 0x400  ${a[6]}
    # taskset -p 0x800  ${a[7]}


    Make sure that the cores chosen for affinity do not overlap the host PMD cores (see step 9 of the OVS Configuration section). In this example the eight vCPU threads are pinned to cores 4-11, which leaves the host PMD cores 1-3 untouched.
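
Each taskset mask used above has exactly one bit set, and bit n selects core n, counting from zero. The sketch below generates such masks, tests a mask for overlap with the PMD mask, and prints a pinning plan for a list of thread IDs. All three function names are illustrative, and the thread IDs in the example are placeholders.

```shell
# core_mask CORE
# Prints the single-bit taskset mask for a zero-based core number.
core_mask() {
    printf '0x%03x\n' $(( 1 << $1 ))
}

# masks_overlap MASK_A MASK_B
# Succeeds when the two masks share at least one core.
masks_overlap() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# pin_plan FIRST_CORE TID...
# Prints one taskset command per thread ID, using consecutive cores
# starting at FIRST_CORE. Nothing is executed.
pin_plan() (
    core=$1; shift
    for tid in "$@"; do
        printf 'taskset -p %s %s\n' "$(core_mask "$core")" "$tid"
        core=$(( core + 1 ))
    done
)

core_mask 4        # prints 0x010
pin_plan 4 111 222 # prints two taskset lines for placeholder TIDs 111 and 222
```

Running masks_overlap with the PMD mask 0xe against each generated mask is a quick way to catch a guest thread that would land on a host PMD core.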


    3. Configure guest machine to have 1G hugepages.

    Refer to the "Configure the grub File and Mount hugepages" section above.


    4. Load the guest DPDK driver to use the virtio interface.

    It is assumed that the guest image already includes the compiled DPDK driver.

    For further information on how to compile DPDK-17.08 for guest machine refer to:

    Compiling the DPDK Target from Source — Data Plane Development Kit 18.02.0-rc0 documentation


    5. Find the virtio interface bus number:

    # lspci -nn | grep -i virtio

    Example of command output:

    00:04.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
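
The bind step that follows expects the full PCI address including the domain, while lspci prints it without the domain by default. A small sketch that extracts the address and adds the 0000 domain when it is missing; the function name virtio_net_addrs is hypothetical.

```shell
# virtio_net_addrs
# Reads `lspci -nn` output on stdin and prints the full PCI address
# (domain:bus:device.function) of every virtio network device.
virtio_net_addrs() {
    awk 'tolower($0) ~ /virtio network device/ {
        addr = $1
        # Prefix the default domain when lspci omitted it.
        if (addr !~ /^[0-9a-fA-F]+:[0-9a-fA-F]+:/) addr = "0000:" addr
        print addr
    }'
}

# Usage inside the guest:
#   lspci -nn | virtio_net_addrs
```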


    6. Load the DPDK driver:

    # modprobe uio
    # insmod /usr/src/dpdk-17.08/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko

    7. Bind the DPDK driver to the PCI slot of the virtio interface:

    # /usr/src/dpdk-17.08/tools/ --bind=igb_uio 0000:00:04.0



    1. Run testpmd to loop traffic for single port with UDP RSS.

    The example that follows runs with 2G hugepages and two PMD cores.

    # /usr/src/dpdk-17.08/x86_64-native-linuxapp-gcc/app/testpmd -v -c 0x1f -n 4 -m 2048 -- --burst=64 --rxq=2 --txq=2 --nb-cores=4 -a -i --mbcache=256 --rss-udp --port-topology=chained

    The number of --rxq and --txq queues must be equal to the number of queues defined when launching the guest machine (step 1 of the VM Configuration section).

    Note: UDP RSS takes effect only if you are injecting various source UDP ports.
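
The -c core mask and --nb-cores are related: testpmd keeps one of the enabled cores for its own command line and statistics, so the number of forwarding cores can be at most the number of bits set in the mask minus one. The helpers below sketch that arithmetic; their names are made up for this example.

```shell
# popcount MASK
# Prints the number of bits set in MASK.
popcount() (
    mask=$(( $1 ))
    n=0
    while [ "$mask" -ne 0 ]; do
        n=$(( n + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$n"
)

# max_fwd_cores MASK
# Prints the largest usable --nb-cores value for a given -c mask.
max_fwd_cores() {
    echo $(( $(popcount "$1") - 1 ))
}

max_fwd_cores 0x1f   # prints 4
```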


    Performance Tuning Recommendations

    1. Configure the grub.conf file (in the GRUB_CMDLINE_LINUX line) to isolate and remove interrupts from the PMD CPUs.

    Listing core 0 is not recommended.

    "isolcpus=1-8 nohz_full=1-8 rcu_nocbs=1-8"

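The kernel parameters above take a CPU list such as 1-8. The sketch below expands a list of that form and checks that core 0 is not part of it. It handles only comma-separated numbers and simple ranges, and both function names are hypothetical.

```shell
# expand_cpu_list LIST
# Expands a CPU list such as "1-3,5" into "1 2 3 5".
expand_cpu_list() (
    out=
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        first=${part%-*}
        last=${part#*-}
        i=$first
        while [ "$i" -le "$last" ]; do
            out="${out:+$out }$i"
            i=$(( i + 1 ))
        done
    done
    echo "$out"
)

# includes_core0 LIST
# Succeeds when the expanded list contains core 0.
includes_core0() {
    case " $(expand_cpu_list "$1") " in
        *" 0 "*) return 0 ;;
        *)       return 1 ;;
    esac
}

expand_cpu_list 1-8   # prints 1 2 3 4 5 6 7 8
if includes_core0 1-8; then
    echo "warning: core 0 is in the isolated set"
fi
```
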

    2. Set scaling_governor to performance mode:

    # for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $f; done
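
After setting the governor it is worth confirming that every CPU accepted it. The helper below is a minimal sketch that lists the scaling_governor files whose content differs from the wanted value. The function name and the optional directory argument are assumptions made for this example.

```shell
# non_matching_governors WANT [CPU_DIR]
# Prints the path of every scaling_governor file whose content is not
# WANT. CPU_DIR defaults to /sys/devices/system/cpu.
non_matching_governors() (
    want=$1
    dir=${2:-/sys/devices/system/cpu}
    for f in "$dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        [ "$(cat "$f")" = "$want" ] || echo "$f"
    done
)

# Usage on the host (no output means all CPUs are in performance mode):
#   non_matching_governors performance
```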


    3. Stop the irq balancing service:

    # service irqbalance stop


    4. Disable kernel memory compaction:

    # echo never > /sys/kernel/mm/transparent_hugepage/defrag
    # echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
    # sysctl -w vm.zone_reclaim_mode=0
    # sysctl -w vm.swappiness=0
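
The transparent hugepage files list every allowed value and mark the active one with square brackets, for example "always madvise [never]". The sketch below prints only the active value so that the result of the commands above can be verified; the function name thp_setting is illustrative.

```shell
# thp_setting [FILE]
# Prints the value enclosed in square brackets, which is the active
# setting. FILE defaults to the transparent hugepage "enabled" file.
thp_setting() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Usage on the host:
#   thp_setting
#   thp_setting /sys/kernel/mm/transparent_hugepage/defrag
```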


    5. Inside the VM, disable kernel samepage merging (KSM):

    # echo 0 > /sys/kernel/mm/ksm/run