Did you run this test VM to VM or within the hypervisor? I assume VM to VM.
Is this only one flow (one VM) or more (several VMs on the same host)?
What CPU are you using? How many cores? How much memory?
Do you use PCIe Gen3? (I assume you do)
Do you use MTU=1500?
If possible, try to run 2 or 4 VMs and see how it goes; it should be better.
The performance looks ok, but you could reach better numbers (close to line rate).
See this post: http://community.mellanox.com/docs/DOC-1456
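If spinning up extra VMs is not convenient, a rough way to approximate several flows from a single host is parallel TCP streams, for example with iperf (assuming iperf is installed on both ends; the address below is just a placeholder for the peer):

# On the receiving host: start an iperf TCP server
iperf -s
# On the sending host: 4 parallel TCP streams for 60 seconds
iperf -c <peer-address> -P 4 -t 60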
I added a performance slide and a link to the case study with PlumGrid.
# Tear down any previous configuration ($DEV is the physical interface, here the renamed mlx4 device; $NET differs per host)
ip addr flush dev mlx4
ip link set dev mlx4 down
ip link del vxlan0
# Underlay: jumbo frames, host address and route on the physical device
ip link set dev $DEV mtu 9000
ip addr add 10.224.$NET.27/24 brd + dev $DEV
ip link set dev $DEV up
ip route add 10.224.0.0/12 via 10.224.$NET.1
# Overlay: VXLAN interface with VNI 17 on top of $DEV, plus its address
ip link add vxlan0 type vxlan id 17 group 126.96.36.199 dev $DEV
ip addr add 172.18.1.$NET/24 brd + dev vxlan0
ip link set dev vxlan0 up
This is run on both machines (with a different NET value on each), bare metal with no VMs. mlx4 is the ethX device, renamed.
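For completeness, a quick way to check whether the adapter actually reports VXLAN TX offload as enabled (feature flag names can vary across kernel and driver versions) is something like:

ethtool -i mlx4                  # driver and firmware versions
ethtool -k mlx4 | grep -i tnl    # look for tx-udp_tnl-segmentation: on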
MTU 9000 is a new addition; with that I get ~38 Gbit/s when doing single-stream TCP testing on the mlx4 device, but VXLAN-encapsulated traffic stays at ~24 Gbit/s and is CPU bound on a single core.
The performance I am seeing is close to what you show in DOC-1456 for one VM pair. While I can get high aggregate performance by running multiple streams, I could get similar aggregate performance by bonding four 10 Gbit/s links. What I'm really hoping to improve is our single-stream speed.
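For what it's worth, the single-core bottleneck above is easy to see with per-CPU statistics sampled during a run (mpstat is part of the sysstat package; any per-CPU view works):

# Per-CPU utilization, sampled every second while the test runs
mpstat -P ALL 1
# Soft-interrupt counts per core; with VXLAN encapsulation the load concentrates on one core here
watch -d cat /proc/softirqs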
PlumGrid and Mellanox have published a new white paper on building a better network infrastructure for a large-scale OpenStack cloud using Mellanox’s ConnectX-3 Pro VXLAN HW offload.
The PlumGrid VNI (Virtual Network Infrastructure) running over Mellanox switches and ConnectX-3 Pro adapters is a unique offering targeted at large-scale data centers.
With the ConnectX-3 Pro stateless HW offload, users can achieve:
- Linear improvement in VM performance up to near line-rate performance (36 Gbps with eight VM pairs generating traffic at maximum rates).
- Virtually constant CPU utilization on both the TX and RX ends while throughput grows to 36 Gbps.
The white paper is available on the PlumGrid website: http://www.plumgrid.com/wp-content/uploads/documents/PLUMgrid_Mellanox_WP.pdf
PlumGrid VNI 3.0 is a software networking product for large-scale OpenStack clouds. It provides a fabric-agnostic, turnkey solution for building a scalable cloud infrastructure and offering advanced, on-demand network services to cloud tenants. To find out more, visit http://www.plumgrid.com/product/overview/