3 Replies Latest reply on Jun 19, 2014 9:23 AM by thorvald

    ConnectX-3 Pro VXLAN Performance Overhead

    thorvald

      Hi,

       

      I'm testing out ConnectX-3 Pro with VXLAN offload in our lab. In a single-stream iperf test, we get ~34Gbit/s with non-VXLAN transport, but only ~28Gbit/s with VXLAN encapsulation.

       

      In both cases, the bottleneck is the CPU on the receiving side. Looking at a perf dump, the top consumers are:

       

      Without VXLAN:

      +   24.27%            iperf  [kernel.kallsyms]       [k] copy_user_enhanced_fast_string

      +    6.49%            iperf  [kernel.kallsyms]       [k] mlx4_en_process_rx_cq

      +    5.34%            iperf  [kernel.kallsyms]       [k] tcp_gro_receive

      +    3.43%            iperf  [kernel.kallsyms]       [k] dev_gro_receive

      +    3.28%            iperf  [kernel.kallsyms]       [k] mlx4_en_complete_rx_desc

      +    3.05%            iperf  [kernel.kallsyms]       [k] memcpy

      +    2.88%            iperf  [kernel.kallsyms]       [k] inet_gro_receive

       

      With VXLAN:

      +   20.06%            iperf  [kernel.kallsyms]      [k] copy_user_enhanced_fast_string

      +    6.04%            iperf  [kernel.kallsyms]      [k] mlx4_en_process_rx_cq

      +    5.43%            iperf  [kernel.kallsyms]      [k] inet_gro_receive

      +    3.29%            iperf  [kernel.kallsyms]      [k] dev_gro_receive

      +    3.24%            iperf  [kernel.kallsyms]      [k] tcp_gro_receive

      +    3.08%            iperf  [kernel.kallsyms]      [k] skb_gro_receive

      +    3.02%            iperf  [kernel.kallsyms]      [k] memcpy

      +    2.85%            iperf  [kernel.kallsyms]      [k] mlx4_en_complete_rx_desc

       

      This is CentOS 6.5, kernel 3.15.0, firmware 2.31.5050.

       

      We're certainly happy with 28Gbit/s, but I'm wondering if there are plans to improve this to the point that VXLAN adds no additional CPU overhead at all, or if there is any tuning I can do towards the same goal?
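      For reference, this is roughly how I verify that the hardware offload is actually engaged on the receiving side (the exact feature names vary with driver and kernel version, so treat this as a sketch):

      ```shell
      # Check whether the NIC advertises UDP-tunnel (VXLAN) offload features;
      # feature names depend on the driver/kernel version.
      ethtool -k mlx4 | grep -E 'udp_tnl|checksum'

      # Driver and firmware versions, for completeness
      ethtool -i mlx4
      ```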

       

      - Thorvald

        • Re: ConnectX-3 Pro VXLAN Performance Overhead
          ophirmaor

          Hi Thorvald,

          Did you run this test VM to VM or within the hypervisor? I assume VM to VM.

          Is this only one flow (one VM) or more (several VMs on the same host)?

          What CPU are you using? How many cores? How much memory?

          Do you use PCIe Gen3? (I assume you do)

          Do you use MTU=1500?

          If possible, try running 2 or 4 VMs and see how it goes; it should be better.

          The performance looks OK, but you could reach better numbers (close to line rate).
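          Even without multiple VMs, a multi-flow run can be approximated with iperf's parallel-stream option (classic iperf2 flags shown; adjust for your version, and substitute your receiver's address):

          ```shell
          # On the receiving host:
          iperf -s

          # On the sending host: 4 parallel TCP streams for 30 seconds,
          # to spread the receive load across cores
          # (10.224.21.27 is a placeholder for the receiver's address)
          iperf -c 10.224.21.27 -P 4 -t 30
          ```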

          See this post: http://community.mellanox.com/docs/DOC-1456

           

          I added a performance slide and a link to a case study with PlumGrid:

           

          http://www.plumgrid.com/wp-content/uploads/documents/PLUMgrid_Mellanox_WP.pdf 

           

           

          Thanks,

          Ophir.

            • Re: Re: ConnectX-3 Pro VXLAN Performance Overhead
              thorvald

              #!/bin/bash

              set -x

              DEV=mlx4
              NET=21

              # Reset the physical interface and remove any stale VXLAN device
              ip addr flush dev $DEV
              ip link set dev $DEV down
              ip link del vxlan0

              # Bring up the physical interface with jumbo frames
              ip link set dev $DEV mtu 9000
              ip addr add 10.224.$NET.27/24 brd + dev $DEV
              ip link set dev $DEV up
              ip route add 10.224.0.0/12 via 10.224.$NET.1

              # Create the VXLAN device on top of it
              ip link add vxlan0 type vxlan id 17 group 239.1.1.17 dev $DEV
              ip addr add 172.18.1.$NET/24 brd + dev vxlan0
              ip link set dev vxlan0 up

               

              This is run on both machines (with a different NET variable on each), bare metal with no VMs. mlx4 is the ethX device, renamed.

               

              MTU 9000 is a new addition; with that I get ~38Gbit/s doing single-stream TCP testing on the mlx4 device directly, but VXLAN-encapsulated traffic stays at ~24Gbit/s, CPU-bound on a single core.

               

              The performance I am seeing is close to what you show in DOC-1456 for one VM pair. While I can get high aggregate performance by running multiple streams, I could get similar aggregate performance by bonding four 10Gbit/s connections. I'm really hoping to improve our single-stream speeds.
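              One thing that sometimes helps when the receiver is bound on a single core is keeping the NIC's RX interrupt and the iperf process on separate cores, so the softirq work and the userspace copy don't compete. A rough sketch (the IRQ number and core IDs below are placeholders for my setup):

              ```shell
              # Find the mlx4 RX interrupt(s); IRQ 42 below is a placeholder
              grep mlx4 /proc/interrupts

              # Pin the (hypothetical) IRQ 42 to core 0 ...
              echo 1 > /proc/irq/42/smp_affinity

              # ... and run the iperf server on a different core,
              # ideally on the same NUMA node as the NIC
              taskset -c 2 iperf -s
              ```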

            • Re: ConnectX-3 Pro VXLAN Performance Overhead
              ophirmaor

              About PlumGrid:

               

              PlumGrid and Mellanox published a new white paper about creating a better network infrastructure for a large-scale OpenStack cloud by using Mellanox’s ConnectX-3 Pro VXLAN HW offload.

              The PlumGrid VNI (Virtual Network Infrastructure) running over Mellanox switches and ConnectX-3 Pro adapters is a unique offering targeted for large-scale data centers.

              With the ConnectX-3 Pro stateless HW offload, users can achieve:

              - Linear improvement in VM performance until reaching near line rate (36 Gbps with eight VM pairs generating traffic at maximum rates).

              - CPU utilization remains virtually constant on both TX and RX ends, while the throughput grows to 36 Gbps.

              The white paper is available from the PlumGrid website: http://www.plumgrid.com/wp-content/uploads/documents/PLUMgrid_Mellanox_WP.pdf

               

              PlumGrid VNI 3.0 is a software networking product for large-scale OpenStack clouds. It provides a fabric-agnostic, turnkey solution to build a scalable cloud infrastructure and offer advanced, on-demand network services to cloud tenants. To find out more: http://www.plumgrid.com/product/overview/