1) Have you validated that the HCA firmware (FW) is aligned with the Mellanox OFED driver? Please consult the Release Notes (RN) of the driver.
2) Check the PCIe generation and width (compare "LnkCap" and "LnkSta"): is the link running at PCIe generation 3.0 and x8 or x16?
# lspci -v | grep -i mel
# lspci -s <bus:device.function> -vvv (e.g. 04:00.0)
3) In regard to tuning, have you used our Performance Tuning Guide for our Network Adapter Card?
4) Are the results similar with iperf version 2.x on both the client and server side, using a parallelism of 4?
(Make sure you use "taskset" to pin iperf to CPUs on the NUMA node closest to where the HCA is installed.)
5) Verify offloads (ethtool -k <interface>)
6) In some Linux distributions, Hardware LRO (HW LRO) must be enabled to reach the
required line-rate performance.
To enable HW LRO:
# ethtool --set-priv-flags <interface> hw_lro on (default: off)
7) If “tx-nocache-copy” is enabled (this is the case for some kernels, e.g. kernel 3.10,
which is the default for RH7.0), it should be disabled.
To disable “tx-nocache-copy”:
# ethtool -K <interface> tx-nocache-copy off
8) Our Performance Tuning Guide contains this information plus other recommended tuning (e.g. power management, NUMA architecture tuning, interrupt moderation tuning).
9) You can also use our "mlnx_tune" utility for automatic tuning and compare the results (see mlnx_tune --help).
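
If it helps, checks 2) and 4) can be scripted roughly as follows. This is only a minimal sketch, not an official tool: the function names check_pcie_link and local_cpus and the SYSFS_ROOT override are hypothetical names made up for this example. It assumes the usual "lspci -vvv" wording for the LnkCap/LnkSta lines and the standard sysfs files numa_node and cpulist.

```shell
#!/bin/sh
# Sketch only. Function names and SYSFS_ROOT are hypothetical, chosen for this example.

# Compare the negotiated PCIe link (LnkSta) with the device capability (LnkCap).
# Reads the "lspci -vvv" text of ONE device on stdin.
# Returns 0 if speed and width match, 1 if the link is degraded, 2 if nothing was found.
check_pcie_link() {
    info=$(cat)
    cap_speed=$(printf '%s\n' "$info" | sed -n 's/.*LnkCap:.*Speed \([^ ,]*\).*/\1/p' | head -n 1)
    cap_width=$(printf '%s\n' "$info" | sed -n 's/.*LnkCap:.*Width \([^ ,]*\).*/\1/p' | head -n 1)
    sta_speed=$(printf '%s\n' "$info" | sed -n 's/.*LnkSta:.*Speed \([^ ,]*\).*/\1/p' | head -n 1)
    sta_width=$(printf '%s\n' "$info" | sed -n 's/.*LnkSta:.*Width \([^ ,]*\).*/\1/p' | head -n 1)

    if [ -z "$cap_speed" ] || [ -z "$cap_width" ] || [ -z "$sta_speed" ] || [ -z "$sta_width" ]; then
        # Typically means the capability list was not readable (try running lspci as root).
        echo "ERROR: LnkCap/LnkSta not found in input"
        return 2
    fi

    echo "LnkCap: $cap_speed $cap_width"
    echo "LnkSta: $sta_speed $sta_width"
    if [ "$cap_speed" = "$sta_speed" ] && [ "$cap_width" = "$sta_width" ]; then
        echo "OK: link runs at full capability"
        return 0
    fi
    echo "WARN: link is below device capability"
    return 1
}

# Print the CPU list of the NUMA node the interface's adapter is attached to.
# $1 = interface name. SYSFS_ROOT (hypothetical) only exists so this can be tested
# against a fake directory tree; it defaults to the real /sys.
local_cpus() {
    root=${SYSFS_ROOT:-/sys}
    node=$(cat "$root/class/net/$1/device/numa_node" 2>/dev/null) || return 2
    # A value of -1 means the kernel reports no NUMA affinity for this device.
    [ "$node" -ge 0 ] 2>/dev/null || return 1
    cat "$root/devices/system/node/node$node/cpulist"
}
```

Example usage, with 04:00.0 and eth2 as placeholders for your own device and interface:
# lspci -s 04:00.0 -vvv | check_pcie_link
# taskset -c "$(local_cpus eth2)" iperf -c <server> -P 4
Note that "WARN" only means LnkSta is lower than LnkCap; you still need to confirm that the slot itself is Gen3 x8 or x16.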