OVS Acceleration using ASAP2 VXLAN Offload over CentOS 7.5 Standalone Server with ConnectX-5 NIC

Version 32

    ASAP2 technology introduces enhanced cloud networking and NFV performance, among other benefits. Acceleration is achieved by allowing the OVS to offload some of the most intensive packet processing operations to the Mellanox ConnectX NIC hardware, including VXLAN encapsulation/decapsulation and packet flow classification.

     

    This article shows how to quickly set up a test that demonstrates the ASAP2 Direct capability of accelerating high-throughput VXLAN-based traffic by offloading it to the NIC.

     

    Requirements:

    • 1 x Bare Metal server to be used as TRex traffic generator
    • 1 x Bare Metal server to be used as ASAP2-based compute node
    • 2 x ConnectX-5 Ethernet cards


    Part 1: ASAP2 Compute Node Preparation:

    1. BIOS prerequisites:

    • Enable SR-IOV
    • Set power mode to maximal performance
    • Disable Hyper-Threading (HT)
    • Enable VT-d

     

    2. The following releases/packages were used:

    • CentOS: Release 7.5.1804 (inbox kernel 3.10.0-862.el7.x86_64)
    • ConnectX-5 FW: 16.23.1020
    • MFT tools: mft-4.10.0-104-x86_64-rpm
    • OVS: openvswitch-2.9.0-3.el7.x86_64 (installed from centos-openstack-queens repository)
    • qemu-kvm: qemu-kvm-ev-2.10.0-21.el7_5.4.1.x86_64 (installed from centos-openstack-queens repository)
    • iproute: iproute-4.11.0-14.el7.x86_64 (inbox)

     

    3. Download and install the latest MFT tools from: http://www.mellanox.com/page/management_tools

    # ./install.sh

    # yum install pciutils

     

    4. Issue the commands below to identify the ConnectX MST device and verify that the FW version is 16.21.2030 or later.

        If it is older, download the latest OFED package and follow the installation instructions for burning the new FW.

     

    # mst start

    # mst status -v

    MST modules:

    ------------

        MST PCI module is not loaded

        MST PCI configuration module loaded

    PCI devices:

    ------------

    DEVICE_TYPE             MST                           PCI       RDMA            NET                       NUMA

    ConnectX5(rev:0)        /dev/mst/mt4121_pciconf0.1    08:00.1                   net-ens1f1                0

     

    ConnectX5(rev:0)        /dev/mst/mt4121_pciconf0      08:00.0                   net-ens1f0                0

     

    # mlxfwmanager --query -d /dev/mst/mt4121_pciconf0

    Querying Mellanox devices firmware ...

     

    Device #1:

    ----------

     

      Device Type:      ConnectX5

      Part Number:      MCX556A-EDA_Ax

      Description:      ConnectX-5 Ex VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6

      PSID:             MT_0000000009

      PCI Device Name:  /dev/mst/mt4121_pciconf0

      Base MAC:         ec0d9a8cc7be

      Versions:         Current        Available

         FW             16.23.1020     N/A

         PXE            3.5.0504       N/A

         UEFI           14.16.0017     N/A

     

      Status:           No matching image found
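
The minimum-FW check in step 4 can be scripted. The sketch below is only an illustration: fw_at_least is a hypothetical helper (it is not part of MFT), it assumes GNU sort with -V support, and the version strings are typed in by hand from the mlxfwmanager output rather than parsed automatically.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
fw_at_least() {
    current=$1
    minimum=$2
    # sort -V orders version strings field by field (GNU coreutils);
    # if the minimum sorts first (or both are equal), the current FW is new enough.
    lowest=$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n1)
    [ "$lowest" = "$minimum" ]
}

# 16.23.1020 is the FW reported above; 16.21.2030 is the required minimum.
if fw_at_least 16.23.1020 16.21.2030; then
    echo "FW is recent enough"
else
    echo "FW is too old, burn a newer image"
fi
```

With the versions shown in this article the first branch is taken, so no FW burn is needed.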

     

     

    5. Enable SR-IOV on the ConnectX NIC, set the number of VFs to 4, set both ports to Ethernet link type, and reboot the server:

     

    # mlxconfig -d /dev/mst/mt4121_pciconf0 s SRIOV_EN=1

    # mlxconfig -d /dev/mst/mt4121_pciconf0 s NUM_OF_VFS=4

    # mlxconfig -d /dev/mst/mt4121_pciconf0 s LINK_TYPE_P1=2

    # mlxconfig -d /dev/mst/mt4121_pciconf0 s LINK_TYPE_P2=2

    # reboot

     

     

    Part 2: ASAP2 Environment Setup

     

    1. Upload the attached ASAP2_Centos7.5_CX5.tar.gz to the ASAP2 compute server.

        Extract the file and run the first setup script to install the required packages and set boot parameters.

        Reboot the server after executing the script.

     

    # tar -xvzf ASAP2_Centos7.5_CX5.tar.gz

    # cd ASAP2_Centos7.5_CX5

    # ./setup_part1.sh

    # reboot

     

    2. Run the second setup script to configure VFs and networking-related parameters.

        Reboot the server after executing the script.

    # ./setup_part2.sh

    # reboot

    3. Run the third setup script to stop the NetworkManager service and configure OVS, VXLAN and offloading:

    # yum -y install net-tools

    # ./setup_part3.sh

     

    Part 3: Launch guest testpmd VM on the ASAP2 compute node

     

    1. Download a pre-prepared cloud image with the testpmd application into the images directory on the compute node:

    # mkdir -p /opt/images/

    # cd /opt/images

    # wget http://13.74.249.42/images/CentOS7.5_testpmd.qcow2

    2. Spawn a VM using the command below:

    # virt-install --connect=qemu:///system \

      --name=testpmd \

      --disk path=/opt/images/CentOS7.5_testpmd.qcow2,format=qcow2 \

      --ram 8192 \

      --memorybacking hugepages=on,size=1024,unit=M,nodeset=0 \

      --vcpus=5,cpuset=3,4,5,6,7 \

      --check-cpu \

      --cpu host-model,+pdpe1gb,cell0.id=0,cell0.cpus=0,cell0.memory=8388608 \

      --numatune mode=strict,nodeset=0 \

      --nographics --noautoconsole \

      --os-variant=rhel7 \

      --import

    3. Shut down the VM, and collect the bus/slot/function numbers of each VF:

    # virsh shutdown testpmd

    # virshpci=$(lspci | grep nox | grep Virt | awk '{print $1}' | tr '[:.]' '_')

    # for i in ${virshpci}; do echo VF-${i} && virsh nodedev-dumpxml $(virsh nodedev-list | grep ${i}) | grep -e 'bus>' -e 'slot>' -e 'function>'; done

    VF-08_00_2

        <bus>8</bus>

        <slot>0</slot>

        <function>2</function>

    VF-08_00_3

        <bus>8</bus>

        <slot>0</slot>

        <function>3</function>

    4. Use the "virsh edit testpmd" command to add the following to the VM configuration file:

         a. Shared memory access parameter to the <numa> section

    #virsh edit testpmd

     

     

    .

    .

    .

    <numa>

    <cell id='0' cpus='0' memory='8388608' unit='KiB' memAccess='shared'/>

    </numa>

    .

    .

    .

         b. The hostdev bus info of the relevant VF identified in the previous step (VF-08_00_2), under the <devices> section:

    <devices>

    .

    .

    .

        <hostdev mode='subsystem' type='pci' managed='yes'>

          <source>

            <address domain='0x0000'

                     bus='0x08'

                     slot='0x0'

                     function='0x2'/>

          </source>

        </hostdev>

    .

    .

    </devices>
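
The hostdev stanza for a given VF can be generated from its lspci-style address instead of being typed by hand. This is a minimal sketch: vf_hostdev_xml is a hypothetical helper, and it assumes the address is given as bus:slot.function (already hexadecimal, as lspci prints it) on PCI domain 0x0000.

```shell
# Hypothetical helper: print a libvirt <hostdev> stanza for a PCI address
# given in lspci form, e.g. 08:00.2 (bus:slot.function, hexadecimal).
vf_hostdev_xml() {
    addr=$1
    bus=${addr%%:*}      # text before the first ':'  -> 08
    rest=${addr#*:}      # text after the first ':'   -> 00.2
    slot=${rest%%.*}     # text before the '.'        -> 00
    func=${rest#*.}      # text after the '.'         -> 2
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x${bus}' slot='0x${slot}' function='0x${func}'/>
  </source>
</hostdev>
EOF
}

# Stanza for the VF used in this example (VF-08_00_2).
vf_hostdev_xml 08:00.2
```

The printed stanza can then be pasted under the devices section while in "virsh edit testpmd".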

     

    5. Identify the CPU-to-NUMA mapping on the server:

    # lscpu | grep NUMA

    NUMA node(s):          2

    NUMA node0 CPU(s):     0-5,12-17

    NUMA node1 CPU(s):     6-11,18-23

    6. Start the VM and set CPU pinning. Make sure to pin CPUs that are on the same NUMA node.

        In this example, the VM's vCPUs are pinned to host CPUs 12-16, which are on NUMA node0.

    # virsh start testpmd

    Domain testpmd started

    # virsh vcpupin testpmd 0 12; virsh vcpupin testpmd 1 13; virsh vcpupin testpmd 2 14; virsh vcpupin testpmd 3 15; virsh vcpupin testpmd 4 16

    # virsh vcpupin testpmd

    VCPU: CPU Affinity

    ----------------------------------

       0: 12

       1: 13

       2: 14

       3: 15

       4: 16
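
For a larger number of vCPUs, the pinning commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; pin_cmds is a hypothetical helper, and it assumes the vCPUs map to a contiguous range of host CPUs, as in this example.

```shell
# Hypothetical helper: print one "virsh vcpupin" command per vCPU,
# mapping vCPU 0..count-1 to host CPUs first_cpu..first_cpu+count-1.
pin_cmds() {
    domain=$1
    first_cpu=$2
    count=$3
    vcpu=0
    while [ "$vcpu" -lt "$count" ]; do
        echo "virsh vcpupin $domain $vcpu $((first_cpu + vcpu))"
        vcpu=$((vcpu + 1))
    done
}

# Same mapping as above: 5 vCPUs pinned to host CPUs 12-16.
pin_cmds testpmd 12 5
```

Piping the output to sh would execute the commands on the compute node.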

    7. Get console access to the VM and set an IP address on its VF interface:

    # virsh console testpmd

    Connected to domain testpmd

    [root@localhost ~]# ifconfig -a

    ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

            inet6 fe80::5d4:5812:b713:280d  prefixlen 64  scopeid 0x20<link>

            ether e4:11:22:33:44:50  txqueuelen 1000  (Ethernet)

            RX packets 8  bytes 2736 (2.6 KiB)

            RX errors 0  dropped 0  overruns 0  frame 0

            TX packets 49  bytes 8310 (8.1 KiB)

            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

     

    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

            inet 192.168.122.92  netmask 255.255.255.0  broadcast 192.168.122.255

            inet6 fe80::5242:bb65:d9:4b31  prefixlen 64  scopeid 0x20<link>

            ether 52:54:00:a3:ab:e2  txqueuelen 1000  (Ethernet)

            RX packets 123  bytes 9976 (9.7 KiB)

            RX errors 0  dropped 9  overruns 0  frame 0

            TX packets 68  bytes 6039 (5.8 KiB)

            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

     

    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536

            inet 127.0.0.1  netmask 255.0.0.0

            inet6 ::1  prefixlen 128  scopeid 0x10<host>

            loop  txqueuelen 1  (Local Loopback)

            RX packets 68  bytes 5916 (5.7 KiB)

            RX errors 0  dropped 0  overruns 0  frame 0

            TX packets 68  bytes 5916 (5.7 KiB)

            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

     

    [root@localhost ~]# ip addr add 2.2.2.1/24 dev ens8

     

    Part 4: Install and prepare the TRex server

     

    1. The following releases/packages were used:

    • CentOS: Release 7.5.1804 (inbox kernel 3.10.0-862.el7.x86_64)
    • OFED: 4.4-1.0.0.0-rhel7.5
    • ConnectX-5 FW: 16.23.1000
    • TRex: v2.44

     

    2. Install MLNX_OFED version 4.4-1.0.0.0 or later, and use the ofed_info command to verify the installed version.

        Run the OFED installation script with the following parameters:

    # ./mlnxofedinstall --with-mft --with-mstflint --dpdk --upstream-libs

    # ofed_info -s

     

    3. Configure IP interface and VXLAN:

    # pci=$(lspci | grep Mell | head -n1 | awk '{print $1}' | sed 's/\.0$//')

    # pf=$(ls -l /sys/class/net/ | grep $pci | awk '{print $9}' | head -n1)

    # ifconfig $pf 1.1.1.2/24 up

    # ip link add name vxlan0 type vxlan id 98 dev $pf remote 1.1.1.1 dstport 4789

    # ifconfig vxlan0 2.2.2.2/24 up

     

    4. Stop the firewall and check connectivity with the remote testpmd VM over the VXLAN tunnel. If the ping fails, verify that the testpmd VM did not lose its IP address and that the firewall service is stopped:

    # systemctl stop firewalld.service

    # ping 2.2.2.1

     

    5. Download and extract the latest TRex release:

    # mkdir -p /opt/trex

    # cd /opt/trex

    # wget --no-cache http://trex-tgn.cisco.com/trex/release/latest

    # tar -xzvf latest

    6. Run the initialization script: select MAC-based config and use the testpmd VM VF MAC as the destination DUT MAC.

        Note: the links for both ports should be up even when only one port is used, as in our case.

    # cd /<latest ver_name>

    # ./dpdk_setup_ports.py -i

    By default, IP based configuration file will be created. Do you want to use MAC based config? (y/N)y

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    | ID | NUMA |   PCI   |        MAC        |                  Name                   |  Driver   | Linux IF |  Active  |

    +====+======+=========+===================+=========================================+===========+==========+==========+

    | 0  | 0    | 03:00.0 | a0:d3:c1:01:44:80 | NetXtreme BCM5719 Gigabit Ethernet PCIe | tg3       | eno1     |          |

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    | 1  | 0    | 03:00.1 | a0:d3:c1:01:44:81 | NetXtreme BCM5719 Gigabit Ethernet PCIe | tg3       | eno2     | *Active* |

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    | 2  | 0    | 03:00.2 | a0:d3:c1:01:44:82 | NetXtreme BCM5719 Gigabit Ethernet PCIe | tg3       | eno3     |          |

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    | 3  | 0    | 03:00.3 | a0:d3:c1:01:44:83 | NetXtreme BCM5719 Gigabit Ethernet PCIe | tg3       | eno4     |          |

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    | 4  | 0    | 07:00.0 | 24:8a:07:a1:fc:6c | MT28800 Family [ConnectX-5 Ex]          | mlx5_core | ens2f0   |          |

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    | 5  | 0    | 07:00.1 | 24:8a:07:a1:fc:6d | MT28800 Family [ConnectX-5 Ex]          | mlx5_core | ens2f1   |          |

    +----+------+---------+-------------------+-----------------------------------------+-----------+----------+----------+

    Please choose even number of interfaces from the list above, either by ID , PCI or Linux IF

    Stateful will use order of interfaces: Client1 Server1 Client2 Server2 etc. for flows.

    Stateless can be in any order.

    Enter list of interfaces separated by space (for example: 1 3) : 4

    Please specify even number of interfaces

    Enter list of interfaces separated by space (for example: 1 3) : 4 5

     

    For interface 4, assuming loopback to it's dual interface 5.

    Destination MAC is 24:8a:07:a1:fc:6d. Change it to MAC of DUT? (y/N).y

    Please enter new destination MAC of interface 4: e4:11:22:33:44:50

    For interface 5, assuming loopback to it's dual interface 4.

    Destination MAC is 24:8a:07:a1:fc:6c. Change it to MAC of DUT? (y/N).y

    Please enter new destination MAC of interface 5: e4:11:22:33:44:50

    Print preview of generated config? (Y/n)y

    ### Config file generated by dpdk_setup_ports.py ###

     

    - port_limit: 2

      version: 2

      interfaces: ['07:00.0', '07:00.1']

      port_info:

          - dest_mac: e4:11:22:33:44:50

            src_mac:  24:8a:07:a1:fc:6c

          - dest_mac: e4:11:22:33:44:50

            src_mac:  24:8a:07:a1:fc:6d

     

      platform:

          master_thread_id: 0

          latency_thread_id: 10

          dual_if:

            - socket: 0

              threads: [1,2,3,4,5,6,7,8,9,20,21,22,23,24,25,26,27,28,29]

     

     

    Save the config to file? (Y/n)y

    Default filename is /etc/trex_cfg.yaml

    Press ENTER to confirm or enter new file:

    Saved to /etc/trex_cfg.yaml.

     

    Part 5: Configure and generate TRex traffic against the testpmd VM

     

    1. In the testpmd VM:

         a. Configure huge pages in GRUB: add the following parameters to /etc/default/grub, regenerate the GRUB configuration, and reboot:

    # vi /etc/default/grub

     

    GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap nofb nomodeset vga=normal rhgb quiet intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=4"

     

    # grub2-mkconfig -o /boot/grub2/grub.cfg

    # reboot

     

         b. Mount the 1G huge pages after boot:

    # mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
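
Before starting testpmd, it can help to confirm that the huge pages requested on the kernel command line were actually allocated. The following is a small sketch, not part of the setup scripts: hugepage_summary is a hypothetical helper that reads /proc/meminfo-style text on standard input and reports the HugePages_Total and Hugepagesize fields.

```shell
# Hypothetical helper: summarize huge page allocation from
# /proc/meminfo-style text supplied on stdin.
hugepage_summary() {
    awk '/^HugePages_Total:/ {n=$2}
         /^Hugepagesize:/    {kb=$2}
         END {printf "%d pages x %d kB\n", n, kb}'
}

# Run against the live system when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    hugepage_summary < /proc/meminfo
fi
```

With the GRUB parameters from step a (hugepagesz=1G hugepages=4), the summary inside the VM would be expected to show 4 pages of 1048576 kB each.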

     

         c. Start testpmd to accept traffic from TRex over the tunnel and loop it back. Make sure to use the PCI ID of the VF:

    # lspci | grep -i mell

    00:08.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]

     

    # cd $DPDK_DIR

    # ./testpmd -c 0x1f -n 4 -m 1024 -w 00:08.0 -- \

    --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=4 --txq=4 --nb-cores=4 \

    --rss-udp --forward-mode=macswap -a -i --port-topology=chained

     

    Parameter definitions:

    • Core mask (0x1f selects 5 cores): -c 0x1f
    • Number of memory channels: -n 4
    • Amount of hugepage memory to allocate, in MB: -m 1024
    • PCI address of the VF: -w <number>
    • Number of RX/TX queues and forwarding cores: --rxq=4 --txq=4 --nb-cores=4
    • L2 forwarding mode: --forward-mode=macswap
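
The -c value is a hexadecimal bitmask with one bit set per core. As a quick illustration, the hypothetical coremask helper below computes the mask that selects the first N cores (cores 0 to N-1); other core selections would need a different mask.

```shell
# Hypothetical helper: print the hex mask selecting cores 0..N-1.
coremask() {
    printf '0x%x\n' $(( (1 << $1) - 1 ))
}

# 5 cores (one main core plus 4 forwarding cores for --nb-cores=4).
coremask 5
```

For 5 cores this prints 0x1f, the value passed to testpmd above.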

     

    2. On the ASAP2 compute node, set a static ARP entry with the MAC of the TRex server interface:

     

    # arp -s 1.1.1.2 <trex_pf_mac_address>

     

    3. On the TRex server:

         a. Upload the udp_vxlan.py script, which is included in the ASAP2_Centos7.5_CX5.tar.gz file.

         b. Identify the MACs of the following interfaces and set them in the udp_vxlan.py script:

    MAC_A: TRex physical interface (which holds IP 1.1.1.2)

    MAC_B: ASAP2 compute physical interface (which holds IP 1.1.1.1)

    MAC_C: TRex vxlan0 interface (which holds IP 2.2.2.2)

    The MAC of the testpmd VM VF interface was set manually by the setup scripts and is e4:11:22:33:44:50.

     

    pkt = Ether(src="<MAC_A>",dst="<MAC_B>")/IP(src="1.1.1.2",dst="1.1.1.1")/UDP(dport=4789)/VXLAN()/Ether(src="<MAC_C>",dst="e4:11:22:33:44:50")/IP(src="2.2.2.2",dst="2.2.2.1")/UDP(dport=44)

     

         c. In a dedicated "screen" session, start the TRex server. Make sure you are in the directory into which TRex was extracted:

     

    # screen -S trex_run

    # ./t-rex-64 -i -c 14

     

         d. Detach from the TRex "screen" session (Ctrl+a, then d) and start another one for the TRex console:

     

    # screen -S trex_console

    # stty rows 45

    # stty columns 111

    # ./trex-console

      

     

         e. In the TRex console, start traffic using the udp_vxlan.py file. In this case we are generating a packet rate of 50 Mpps:

     

    trex>start -f /home/udp_vxlan.py -m 50mpps -p 0

         f. Run the following command to see real time statistics:

     

    trex>tui

    4. In the testpmd VM, while in the testpmd console, use the following command to see port statistics on the VM side:

     

    testpmd> show port stats all