Getting started with Performance Tuning of Mellanox adapters

Version 5

    This post provides a checklist to follow when starting to debug performance issues related to Mellanox adapters.




    Hardware Setup

    1. Configure the BIOS for the highest performance, as described in the server BIOS documentation.


    2. Use the PCIe generation that suits the adapter. In most cases you will need:

    • PCIe Gen3
    • Speed 8GT/s
    • Width x8 or x16 (depends on the speed).


    Note: If available, x16 slots can be useful for x8 adapters, as the adapter benefits from additional buffers allocated by the CPU.


    To learn more about PCIe performance, see Understanding PCIe Configuration for Maximum Performance.
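
    As a quick check of the PCIe requirements listed above, the negotiated link speed and width can be compared with the maximum the device supports. The sketch below is a minimal example, assuming a Linux host that exposes the current_link_speed, max_link_speed, current_link_width and max_link_width attributes in the device's sysfs directory. The function names are illustrative and are not part of any Mellanox tool.

```python
from pathlib import Path


def parse_link_speed(text):
    """Return the speed in GT/s from a sysfs string such as '8.0 GT/s PCIe' or '8 GT/s'.

    Raises ValueError if the kernel reports a non-numeric value (e.g. 'Unknown').
    """
    return float(text.split()[0])


def read_pcie_link(device_dir):
    """Read current and maximum link speed/width from a PCI device's sysfs directory."""
    base = Path(device_dir)
    return {
        "current_speed": parse_link_speed((base / "current_link_speed").read_text()),
        "max_speed": parse_link_speed((base / "max_link_speed").read_text()),
        "current_width": int((base / "current_link_width").read_text()),
        "max_width": int((base / "max_link_width").read_text()),
    }


def link_is_degraded(link):
    """True if the link trained below the device's maximum speed or width."""
    return (
        link["current_speed"] < link["max_speed"]
        or link["current_width"] < link["max_width"]
    )
```

    For a network interface, /sys/class/net/<interface>/device points at the adapter's PCI device directory, so read_pcie_link("/sys/class/net/eth2/device") would read the link of a hypothetical interface named eth2. A degraded link usually means the adapter sits in a slot with a lower generation or fewer lanes than it supports.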


    3. Make sure you understand the motherboard architecture and NUMA configuration of the server.

    Run the relevant application on the CPU cores directly connected to the PCIe bus used by the Mellanox adapter. See Understanding NUMA Node for Performance Benchmarks.
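
    One way to find the cores local to the adapter is to read the NUMA node the kernel reports for the device and then list that node's CPUs. The sketch below assumes a Linux host with the standard sysfs layout (numa_node under the device directory, cpulist under each node directory). The helper names and the sysfs_root parameter are illustrative only.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def local_cpus(iface, sysfs_root="/sys"):
    """Return the CPU ids on the NUMA node that the interface's PCI device is attached to."""
    root = Path(sysfs_root)
    node = int((root / "class/net" / iface / "device/numa_node").read_text())
    if node < 0:
        # -1 means the kernel reports no NUMA affinity for this device,
        # so fall back to the CPUs this process is already allowed to use.
        return sorted(os.sched_getaffinity(0))
    cpulist = root / "devices/system/node" / f"node{node}" / "cpulist"
    return parse_cpulist(cpulist.read_text())
```

    A benchmark process could then pin itself with os.sched_setaffinity(0, local_cpus("eth2")), where eth2 is a hypothetical interface name. Tools such as numactl or taskset achieve the same result from the command line.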


    4. For high performance, use the highest memory speed with the fewest DIMMs, and populate all memory channels for every installed CPU.


    5. Disable system profilers and stop all monitoring tools while running performance benchmarks (such as sysstat, vmstat, iostat, mpstat, dstat, etc.). These tools consume host resources, so running them in parallel with benchmark jobs may affect performance to varying degrees, depending on the traffic type and pattern and on the nature of the benchmark.
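
    Before starting a benchmark, it can help to confirm that none of these tools is still running. The sketch below is a minimal example, assuming a Linux /proc filesystem. The MONITORING_TOOLS set is an assumption: it lists the tools named above plus sar, sadc and pidstat, which ship with the sysstat package, and it should be adjusted to whatever is installed on the host.

```python
from pathlib import Path

# Assumed list of monitoring commands; extend it for the tools used on your hosts.
MONITORING_TOOLS = {"vmstat", "iostat", "mpstat", "dstat", "sar", "sadc", "pidstat"}


def running_process_names(proc_root="/proc"):
    """Return the command names of all running processes, read from /proc/<pid>/comm."""
    names = []
    for comm in Path(proc_root).glob("[0-9]*/comm"):
        try:
            names.append(comm.read_text().strip())
        except OSError:
            continue  # the process exited while we were scanning
    return names


def find_monitoring_tools(names, known=MONITORING_TOOLS):
    """Return the sorted subset of process names that are known monitoring tools."""
    return sorted(set(names) & set(known))
```

    Calling find_monitoring_tools(running_process_names()) returns an empty list when the host is clean. Script-based tools may appear under their interpreter's name rather than their own, so an empty result is a hint and not a guarantee.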


    6. For benchmark testing, make sure that you bypass the local cache (write directly to memory). See HowTo Bypass Local Cache (disable tx-nocache-copy).
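
    The current state of the tx-nocache-copy feature can be read from the output of ethtool -k. The sketch below is a minimal example, assuming ethtool is installed and that its feature listing uses the usual "name: on|off" line format. The function names are illustrative. The sketch only reports the state; which setting to apply is described in the referenced HowTo.

```python
import subprocess


def parse_ethtool_features(output):
    """Parse 'ethtool -k' output into a {feature_name: 'on' | 'off'} dictionary."""
    features = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skips the 'Features for <iface>:' header and blank lines
        # Keep only the first word so suffixes such as '[fixed]' are dropped.
        features[name.strip()] = value.split()[0]
    return features


def tx_nocache_copy_state(iface):
    """Return 'on', 'off', or None if the driver does not list tx-nocache-copy."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_features(result.stdout).get("tx-nocache-copy")
```

    For example, tx_nocache_copy_state("eth2") queries a hypothetical interface named eth2. The feature itself is changed with ethtool -K <interface> tx-nocache-copy on|off, which typically requires root privileges.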


    OS and Driver Configuration

    1. For benchmarking, close unnecessary applications, processes, and services that might consume CPU while the benchmark runs (e.g. sysstat, vmstat, iostat, mpstat and others).


    2. Use mlnx_tune for automatic tuning of the server. For more info, refer to HowTo Tune Your Linux Server for Best Performance Using the mlnx_tune Tool.