Getting started with Mellanox ASAP^2


    This post provides the basic steps for configuring Mellanox ASAP^2 and setting up its basic parameters.

    Although the steps themselves are basic, the post is intended for advanced users.

     


    Overview

    Open vSwitch (OVS) allows Virtual Machines (VMs) to communicate with each other and with the outside world. OVS traditionally resides in the hypervisor, and switching is based on 12-tuple matching of flows. This software-based OVS solution is CPU intensive, which affects system performance and prevents the available bandwidth from being fully utilized. Mellanox Accelerated Switching And Packet Processing (ASAP^2) Direct technology offloads OVS by handling the OVS data plane in the NIC hardware (the Mellanox Embedded Switch, or eSwitch) of ConnectX-4 and later adapters, while keeping the OVS control plane unmodified. As a result, we observe significantly higher OVS performance without the associated CPU load. The actions currently supported by ASAP^2 Direct include packet parsing and matching, forward and drop, along with VLAN push/pop and VXLAN encap/decap.
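
    To illustrate the split, the OpenFlow rules installed through the unmodified control plane can be inspected with ovs-ofctl, while the datapath (megaflow) entries that ASAP^2 Direct offloads to the eSwitch are visible with ovs-dpctl. A minimal example, assuming an OVS bridge named br-int (created later in this post by devstack):

    # sudo ovs-ofctl dump-flows br-int
    # sudo ovs-dpctl dump-flows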

     

    Setup

    The basic setup consists of:

    • At least two servers equipped with PCIe Gen3 x16 slots
    • At least two Mellanox ConnectX-4/ConnectX-5 adapter cards
    • One Mellanox Ethernet cable

     

    Prerequisites

    If you plan to run performance tests, it is recommended to tune the BIOS for high performance.

    Refer to Performance Tuning for Mellanox Adapters and see this example: BIOS Performance Tuning Example.

     

    1. Linux Kernel >= 4.13-rc5

    2. Mellanox NIC firmware:

        FW ConnectX-5: >= 16.21.0338

        FW ConnectX-4 Lx: >= 14.21.0338

    3. iproute2 >= 4.12

    4. upstream openvswitch >= 2.8

    5. OpenStack >= Pike

    6. SR-IOV enabled
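
    A quick way to verify these prerequisites on a compute node (the interface name enp3s0f0 is only an example; use your PF name):

    # uname -r
    # ethtool -i enp3s0f0 | grep firmware-version
    # ip -V
    # ovs-vswitchd --version
    # cat /sys/class/net/enp3s0f0/device/sriov_totalvfs

    The last command returns the number of VFs the device can expose; a non-zero value indicates that SR-IOV is available on the NIC.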

     

    Configuration

    Configure ASAP^2 on all compute nodes (hypervisors):

    (see the attached script asap_config.sh; a consolidated sketch of these steps is also shown after the OVS restart step below)

     

    Check Mellanox NIC PCI address

    # lspci |grep Mellanox

     

    Check PF name of the relevant PCI slot

    # ls -l /sys/class/net/

     

    Set Number of VFs

    # echo $NUM_VFS > /sys/class/net/$PF/device/sriov_numvfs
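
    For example, to create 4 VFs on a PF named enp3s0f0 (both values are examples; the maximum supported number of VFs can be read from sriov_totalvfs):

    # cat /sys/class/net/enp3s0f0/device/sriov_totalvfs
    # echo 4 > /sys/class/net/enp3s0f0/device/sriov_numvfs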

     

    Unbind the VF PCI devices from the mlx5_core driver (iterate $i over the VFs). This is required before changing the e-switch mode.

    # echo 0000:${PCI}.$i > /sys/bus/pci/drivers/mlx5_core/unbind

     

    Change the e-switch mode from legacy to switchdev on the PF device. This also creates the VF representor netdevices in the host OS.

    #  sudo devlink dev eswitch set pci/0000:${PCI}.0 mode switchdev
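
    The mode change can be verified with (the output should report "mode switchdev"):

    # sudo devlink dev eswitch show pci/0000:${PCI}.0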

     

    Enable ASAP Offloading

    # sudo ethtool -K $PF hw-tc-offload on
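
    To confirm that the offload was enabled:

    # ethtool -k $PF | grep hw-tc-offload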

     

    Bind the VF PCI devices back to the driver (iterate $i over the VFs)

    # echo 0000:${PCI}.$i > /sys/bus/pci/drivers/mlx5_core/bind

     

    Enable OVS hardware offload and restart OVS

    # sudo systemctl enable openvswitch.service

    # sudo ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

    # sudo systemctl restart openvswitch.service
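
    For reference, here is a minimal sketch of what the attached asap_config.sh might contain, combining the steps above. The PF name, PCI address and VF count are examples and must be adjusted; the sketch assumes the VFs appear as PCI functions .1 to .$NUM_VFS (verify with lspci) and must be run as root:

    #!/bin/bash
    set -e

    PF=enp3s0f0          # example PF netdevice name
    PCI=03:00            # example bus:device of the PF
    NUM_VFS=4            # example number of VFs

    # Create the VFs on the PF
    echo $NUM_VFS > /sys/class/net/$PF/device/sriov_numvfs

    # Unbind the VFs before changing the e-switch mode
    for i in $(seq 1 $NUM_VFS); do
        echo 0000:${PCI}.$i > /sys/bus/pci/drivers/mlx5_core/unbind
    done

    # Move the PF e-switch from legacy to switchdev mode (creates the VF representors)
    devlink dev eswitch set pci/0000:${PCI}.0 mode switchdev

    # Enable TC hardware offload on the PF
    ethtool -K $PF hw-tc-offload on

    # Bind the VFs back
    for i in $(seq 1 $NUM_VFS); do
        echo 0000:${PCI}.$i > /sys/bus/pci/drivers/mlx5_core/bind
    done

    # Enable OVS hardware offload and restart OVS
    systemctl enable openvswitch.service
    ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    systemctl restart openvswitch.service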

     

    The OVS flow idle timeout (max-idle) is given in milliseconds and can be controlled with:

    # ovs-vsctl set Open_vSwitch . other_config:max-idle=30000
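
    The current value can be read back with:

    # ovs-vsctl get Open_vSwitch . other_config:max-idle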

     

    Install OpenStack (devstack)

    # git clone git://git.openstack.org/openstack-dev/devstack
    # cd devstack

     

    Update local.conf on the nodes (examples attached below):

    LIBS_FROM_GIT=os-vif,neutron-lib
    enable_plugin os-vif https://github.com/openstack/os-vif

    Q_PLUGIN=ml2

    Q_AGENT=openvswitch

    Q_ML2_PLUGIN_MECHANISM_DRIVERS=openvswitch

    [[post-config|$NOVA_CONF]]

    [DEFAULT]

    [pci]

    passthrough_whitelist = {"address":"*:${PCI}.*", "physical_network":"default"}
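
    For example, if the PF sits on PCI bus 03:00 (so ${PCI}=03:00), the rendered entry in nova.conf would look like:

    passthrough_whitelist = {"address":"*:03:00.*", "physical_network":"default"}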

     

    Run devstack on all nodes.

    # ./stack.sh

     

     

    Validation

    Below we show how to bring up two VMs manually and check ASAP^2 offloading.

    The two VMs should be located on different compute nodes (hypervisors).

     

    Use a VM image with Mellanox OFED drivers

    # . openrc admin
    # wget http://52.169.200.208/images/fedora_24_ofed_4.0-2.0.0.1.qcow2
    # glance image-create --name mellanox_fedora --visibility public --container-format bare --disk-format qcow2 --file fedora_24_ofed_4.0-2.0.0.1.qcow2

     

    Create two 'direct' ports for VMs on the relevant network

    For example, create a 'direct' port on the 'private' network:
    # neutron port-create --binding:vnic_type=direct --binding-profile '{"capabilities": ["switchdev"]}' private
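
    The port ID printed by the command above is needed when booting the VM; one way to capture it directly into a shell variable (a sketch that parses the table printed by neutron port-create):

    # direct_port_id1=$(neutron port-create --binding:vnic_type=direct --binding-profile '{"capabilities": ["switchdev"]}' private | awk '/ id /{print $4}')
    # echo $direct_port_id1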

     

    Create the first VM with the image and port from the previous steps.

    Note: you can use the --availability-zone parameter to place the VM on the desired hypervisor.
    # nova boot --flavor m1.small --image mellanox_fedora --nic port-id=<direct_port_id1> --availability-zone nova:compute_node_1 vm1

     

    Create the second VM following the steps above

    # neutron port-create --binding:vnic_type=direct private
    # nova boot --flavor m1.small --image mellanox_fedora --nic port-id=<direct_port_id2> --availability-zone nova:compute_node_2 vm2
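
    To confirm that the two VMs were scheduled on different compute nodes:

    # nova show vm1 | grep hypervisor_hostname
    # nova show vm2 | grep hypervisor_hostname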

     

    Connect to the VMs' consoles and ping one VM from the other.

    Mellanox Fedora image credentials are cloud:cloud

    hypervisor_1# vncviewer localhost:5900
    vm_1# ping vm2

     

    Check which representor port is used by VM2

    # ip link show enp3s0f0

    6: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000

        link/ether ec:0d:9a:46:9e:84 brd ff:ff:ff:ff:ff:ff

        vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state enable, trust off, query_rss off

        vf 1 MAC 00:00:00:00:00:00, spoof checking off, link-state enable, trust off, query_rss off

        vf 2 MAC 00:00:00:00:00:00, spoof checking off, link-state enable, trust off, query_rss off

        vf 3 MAC fa:16:3e:b9:b8:ce, vlan 57, spoof checking on, link-state enable, trust off, query_rss off

    VF 3 carries VM2's MAC address (fa:16:3e:b9:b8:ce), so VM2's traffic goes through the corresponding VF representor, eth3 in this setup:

    # ls -l /sys/class/net/ | grep eth

    lrwxrwxrwx 1 root root 0 Sep 11 10:54 eth0 -> ../../devices/virtual/net/eth0

    lrwxrwxrwx 1 root root 0 Sep 11 10:54 eth1 -> ../../devices/virtual/net/eth1

    lrwxrwxrwx 1 root root 0 Sep 11 10:54 eth2 -> ../../devices/virtual/net/eth2

    lrwxrwxrwx 1 root root 0 Sep 11 10:54 eth3 -> ../../devices/virtual/net/eth3



    bash-4.3$ sudo ovs-dpctl show

    system@ovs-system:

            lookups: hit:1684 missed:1465 lost:0

            flows: 0

            masks: hit:8420 total:1 hit/pkt:2.67

            port 0: ovs-system (internal)

            port 1: br-enp3s0f0 (internal)

            port 2: br-int (internal)

            port 3: br-ex (internal)

            port 4: enp3s0f0

            port 5: tapfdc744bb-61 (internal)

            port 6: qr-a7b1e843-4f (internal)

            port 7: qg-79a77e6d-8f (internal)

            port 8: qr-f55e4c5f-f3 (internal)

            port 9: eth3

     

    Connect to the hypervisor that hosts VM2 and run tcpdump on the representor port

    hypervisor_2# tcpdump -nnn -i eth3

    (screenshot of the tcpdump output: ASAP_tcp.jpg)

     

    As you can see, only the first packets of the flow are seen on the representor port.

    All subsequent packets (the ping to VM2 kept running) were offloaded to hardware and therefore do not appear in the tcpdump output.

     

    Observe that the offloaded rules are added to the OVS datapath

    # sudo ovs-dpctl dump-flows

    in_port(9),eth(src=fa:16:3e:b9:b8:ce,dst=fa:16:3e:0a:f4:71),eth_type(0x0800), packets:885, bytes:90270, used:0.400s, actions:push_vlan(vid=57,pcp=0),4

    in_port(4),eth(src=fa:16:3e:0a:f4:71,dst=fa:16:3e:b9:b8:ce),eth_type(0x8100),vlan(vid=57,pcp=0),encap(eth_type(0x0800)), packets:885, bytes:90270, used:0.400s, actions:pop_vlan,9

    recirc_id(0),in_port(8),eth(src=fa:16:3e:53:d8:c0,dst=33:33:00:00:00:01),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:push_vlan(vid=1,pcp=0),2,pop_vlan,push_vlan(vid=57,pcp=0),1,4,pop_vlan,5,6,9
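
    Since ASAP^2 uses TC hardware offload, the same offloaded rules can also be inspected as TC filters on the representor port (eth3 in this example):

    # tc -s filter show dev eth3 ingress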