This post presents latency and throughput benchmark results for MemCacheD running over a high-speed Mellanox Ethernet network, with and without VMA acceleration, measured with the memaslap benchmark.
The memaslap benchmark is a command-line utility, distributed with the libmemcached library, for load generation and benchmarking of Key-Value databases such as MemCacheD.
In our performance tests we used memaslap to quantify the improvement from running MemCacheD with and without VMA. The results showed a significant advantage for MemCacheD running on top of VMA compared to the native kernel network stack.
MemCacheD without VMA saturates at around ~1M TPS (Transactions Per Second), while with VMA it saturates at around ~2M TPS.
Each test was executed in two ways:
- "No VMA" - the MemCacheD server ran over kernel network sockets, without any acceleration
- "VMA on Server" - server-side acceleration only; the MemCacheD server ran on top of VMA
The results were calculated for a single MemCacheD server and a single memaslap client issuing GET operations with a key and value size of 64 bytes.
The MemCacheD server ran with 7 threads, while memaslap ran with varying numbers of threads and connections to achieve different TPS rates.
memaslap always ran with VMA, in order to reach high rates with a single client.
Setup and Configuration
- Transport: 10 GbE
- Adapter: ConnectX-3 Pro, FW: 2.31.2510
- CPU: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 16 cores, 2 numa nodes
- RAM: 64GB
- OS: Linux RH6.4 2.6.32-358.el6.x86_64
- MLNX_OFED: 2.2
- VMA: 6.6.2
- MemCacheD: 1.4.17 from http://memcached.org/
- libevent (for memcached): 188.8.131.52.9 from http://libevent.org/
- memaslap: 1.0 from libmemcached 1.0.18 library rpm http://pkgs.org/centos-6/centalt-x86_64/libmemcached-1.0.18-1.el6.x86_64.rpm.html
- Memaslap config file (cat ~/.memslap.cnf)
64 64 1
64 64 1
1 0.999 (GET ratio 99.9%, SET ratio 0.1%)
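For reference, the same settings in memaslap's full configuration-file format. The key/value/cmd section headers follow the libmemcached documentation for memaslap (cmd type 0 is SET, type 1 is GET); the "0 0.001" SET line is inferred from the 0.1% SET ratio stated above, so treat this sketch as illustrative rather than a copy of the original file:

```
# memaslap config sketch (assumed section layout per libmemcached docs)
key
64 64 1        # key length: min 64, max 64, proportion 1
value
64 64 1        # value length: min 64, max 64, proportion 1
cmd
0 0.001        # cmd type 0 = SET, 0.1% of operations (inferred)
1 0.999        # cmd type 1 = GET, 99.9% of operations
```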
The following services were stopped:
# service irqbalance stop
# service iptables stop
# service cpuspeed stop
MemCacheD Command Line
Command Line for "No VMA"
# LD_LIBRARY_PATH=/usr/local/lib memcached -m 12000 -l 184.108.40.206 -u root -t 7 -c 10000
Command line for "VMA on the Server"
# LD_PRELOAD=libvma.so VMA_RING_ALLOCATION_LOGIC_TX=31 VMA_RING_ALLOCATION_LOGIC_RX=31 LD_LIBRARY_PATH=/usr/local/lib taskset -c 8-15 memcached -m 12000 -l 220.127.116.11 -u root -t 7 -c 10000
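When running with LD_PRELOAD it is easy to end up testing the kernel stack by mistake, so it can be worth verifying that libvma.so was actually loaded into the server process. The check below is a generic Linux /proc inspection, not part of the original procedure; the memcached pid lookup via pidof is an assumption about the running setup:

```shell
# Sketch: check whether a given shared library is mapped into a process.
# lib_loaded PID PATTERN -> exit 0 if PATTERN appears in the process map.
lib_loaded() {
    grep -q "$2" "/proc/$1/maps"
}

# Assumes a single memcached process is running on this host.
pid=$(pidof memcached 2>/dev/null)
if [ -n "$pid" ] && lib_loaded "$pid" libvma; then
    echo "VMA loaded in memcached (pid $pid)"
else
    echo "VMA not detected"
fi
```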
Memaslap Command Line
# VMA_RING_ALLOCATION_LOGIC_TX=20 VMA_RING_ALLOCATION_LOGIC_RX=20 LD_PRELOAD=libvma.so taskset -c 8-15 memaslap -s 18.104.22.168:11211 -T $1 -c $2 -t 30s -X 64 -S 1s
where $1 (number of threads) and $2 (concurrency to simulate with the load) take the values: (1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (8,16), (8,24), (8,32), (8,64)
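The twelve (threads, concurrency) combinations can be driven from a small wrapper script. The sketch below is a dry run that only prints the memaslap command line for each pair, reusing the server address and flags from the command above; for a real run you would drop the echo and prepend the LD_PRELOAD/VMA environment variables:

```shell
# Sketch: enumerate the (threads, concurrency) pairs used in the tests
# and print the corresponding memaslap command line (dry run only).
sweep() {
    server=$1
    for pair in 1,1 2,2 3,3 4,4 5,5 6,6 7,7 8,8 8,16 8,24 8,32 8,64; do
        T=${pair%,*}    # number of threads (-T)
        C=${pair#*,}    # concurrency to simulate (-c)
        echo "memaslap -s $server -T $T -c $C -t 30s -X 64 -S 1s"
    done
}

sweep 18.104.22.168:11211
```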
1. Latency vs. Transaction Rate (GET & SET operations) [Lower is better]
2. Max Transaction Rate at latencies below 100 usec [Higher is better]
MemCacheD without VMA saturates at around ~1M TPS, while MemCacheD with VMA saturates at around ~2M TPS.