How To Tune an AMD Server (EPYC CPU) for Maximum Performance

Version 4

    The following is a setup guide for tuning AMD EPYC CPU based servers to achieve maximum performance from Mellanox NICs.

     


    Verifying System Configuration

    Prior to CPU tuning, we must inspect the NUMA node configuration and verify that our server is actually running an AMD CPU:

    # lscpu

    Architecture:          x86_64

    CPU op-mode(s):        32-bit, 64-bit

    Byte Order:            Little Endian

    ...

    Thread(s) per core:    1

    Core(s) per socket:    32

    Socket(s):             1

    NUMA node(s):          4

    Vendor ID:             AuthenticAMD

    ...

    Model name:            AMD EPYC 7551 32-Core Processor

    ...

    CPU MHz:               1996.203

    ...

    NUMA node0 CPU(s):     0,4,8,12,16,20,24,28

    NUMA node1 CPU(s):     1,5,9,13,17,21,25,29

    NUMA node2 CPU(s):     2,6,10,14,18,22,26,30

    NUMA node3 CPU(s):     3,7,11,15,19,23,27,31

    ...

     

    In the output above we can see that the tested server runs an AMD CPU, model "EPYC 7551 32-Core Processor", with 4 NUMA nodes of 8 cores each.

    Since Hyper-Threading is disabled (one thread per core), the server exposes a total of 32 logical CPUs, one per physical core.
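The per-node CPU lists reported by lscpu can also be read directly from sysfs, which is convenient for scripting. A minimal sketch using only standard Linux sysfs paths (no Mellanox-specific tooling assumed):

```shell
# Print each NUMA node's CPU list from sysfs; the output mirrors the
# "NUMA nodeN CPU(s)" lines of lscpu shown above.
for node in /sys/devices/system/node/node[0-9]*; do
    printf '%s: %s\n' "${node##*/}" "$(cat "$node/cpulist")"
done
```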

     

    To find the Mellanox NIC's local NUMA node, refer to the following How-To: Understanding NUMA Node for Performance Benchmarks.

    In this example, the Mellanox NIC's local node is NUMA node #2, which can be confirmed by running:

    # cat /sys/class/net/eth20/device/numa_node

    2

    For the performance tuning process, we will utilize local CPU cores 2,6,10,14,18,22,26,30.
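The node lookup and the matching CPU list can be combined into a small shell sketch. Here eth20 is this example's interface name (substitute your own), and the fallback to node 0 is an assumption for NICs that expose no NUMA information:

```shell
IFACE="${IFACE:-eth20}"   # eth20 is this example's interface; set IFACE to yours
if [ -r "/sys/class/net/$IFACE/device/numa_node" ]; then
    node=$(cat "/sys/class/net/$IFACE/device/numa_node")
else
    node=0   # assumed fallback when the NIC exposes no NUMA node
fi
# Some NICs report -1 (no NUMA locality); treat that as node 0 here
[ "$node" -ge 0 ] 2>/dev/null || node=0
# The kernel lists each node's CPUs in sysfs
cpus=$(cat "/sys/devices/system/node/node${node}/cpulist")
echo "IRQ affinity target: node ${node}, CPUs ${cpus}"
```

On the example system this prints node 2 with CPUs 2,6,10,14,18,22,26,30, matching the lscpu output above.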

     

    Performance Tuning

    To maximize the NIC's bandwidth, interrupt event processing must be handled only by CPUs local to the NIC. This localizes processing and memory usage, and reduces cross-NUMA interconnect overhead (Infinity Fabric on AMD EPYC, the counterpart of Intel's QPI).

    See What is IRQ Affinity? for more information.

     

    To bind the NIC's interrupt events to the local cores, run:

    # service irqbalance stop

    # set_irq_affinity_cpulist.sh 2,6,10,14,18,22,26,30 eth20
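set_irq_affinity_cpulist.sh ships with the Mellanox driver package; in essence, it writes the given CPU list into each of the NIC's IRQ affinity files. A rough, simplified sketch of that idea (requires root; eth20 and the CPU list are this example's values, and the exact interrupt naming in /proc/interrupts varies by driver):

```shell
# Simplified sketch of what set_irq_affinity_cpulist.sh does:
# find the NIC's IRQ numbers in /proc/interrupts and write the
# desired CPU list into each IRQ's affinity file. Run as root.
for irq in $(awk -F: '/eth20/ {gsub(/ /,"",$1); print $1}' /proc/interrupts); do
    echo 2,6,10,14,18,22,26,30 > "/proc/irq/${irq}/smp_affinity_list"
done
```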

     

    Alternatively, the NIC's interrupt events can be bound to the local cores using the mlnx_tune tool (which is applied to all Mellanox NICs automatically). To do so, run:

    # mlnx_tune -p HIGH_THROUGHPUT

     

    Performance Results

    Below are the expected results, both out of the box (OOB) and with the above tuning, for the following setup:

    • iperf
    • 8 threads
    • TCP window 512KB
    • 8KB message size

     

    Server:

    iperf -s

     

    Client:

    iperf -c 120.7.84.141 -P 8 -t 10 -w 512k

    Expected results:

    MTU      Tuning   Bandwidth
    1500B    OOB      ~50Gb/s
    1500B    Tuned    ~90Gb/s
    9000B    OOB      ~85Gb/s
    9000B    Tuned    ~97.5Gb/s
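The 9000B rows assume jumbo frames are enabled on the interface (and on the switch and peer as well). A quick way to set and verify the MTU, assuming eth20 as the example interface name and root privileges:

```shell
# Enable jumbo frames on the test interface (example name eth20);
# the switch and the remote peer must also be configured for MTU 9000.
ip link set dev eth20 mtu 9000
ip link show dev eth20 | grep -o 'mtu [0-9]*'
```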