HowTo Tune Your Linux Server for Best Performance Using the mlnx_tune Tool

Version 18

    This post describes the mlnx_tune tool. mlnx_tune is designed for both basic and advanced users.

     

     

     


    Prerequisites

    mlnx_tune only affects Mellanox adapters.

    It is installed as part of the MLNX_OFED driver installation, starting with MLNX_OFED Rel. 3.0 GA.

    For upstream, mlnx_en, and inbox driver users, the tool can be obtained from the Mellanox official site:

    1. Navigate to http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
    2. The tool is part of the downloadable "Performance Tuning Script" package, alongside other helpful scripts
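
    Whether the tool came with MLNX_OFED or was downloaded separately, a quick sanity check (assuming the script is in your PATH) is to locate it and print its version:

    # which mlnx_tune

    # mlnx_tune -v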

     

    Tool Description

    mlnx_tune is a static system analysis and tuning tool. It has two main functions: "report" and "tune". The reporting function runs a static analysis of the system. The tuning function is essentially an automated implementation of the Mellanox Performance Tuning Guide guidelines for different scenarios: the tool checks the current system properties and their relevance to performance, and tunes the system for maximum performance according to the selected profile. Depending on the selected profile, mlnx_tune can change interface properties, core assignments for traffic handling, and system services (IRQ Balancer, IP forwarding, firewall, etc.).
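
    Before and after applying a profile, you may want to inspect some of the settings the tool can modify. The commands below are a minimal sketch using standard Linux utilities; the interface name ens785f1 is just an example taken from the report later in this post, so substitute your own interface.

    Check whether the IRQ Balancer service is running:

    # service irqbalance status

    Check whether IP forwarding is enabled:

    # sysctl net.ipv4.ip_forward

    Check the interrupt coalescing (adaptive moderation) settings of an interface:

    # ethtool -c ens785f1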

     

    Release Notes

    Typically, the mlnx_tune versions in MLNX_OFED and on the web are the same. From time to time, important updates are pushed to the web before the next MLNX_OFED GA version is released. The release notes below refer to mlnx_tune versions, not MLNX_OFED versions. If the version installed on your setup is not the latest, check the Mellanox site (see the Prerequisites section) for a newer version.

    Beta:

    0.80:

    Low latency VMA profile added.

    Report according to profile.

    0.66:

    Added details to reports.

    0.59:

    Optimizations to IP forwarding profile.

    Report as default action.

    0.51:

    First working version.

     

    Setup

    Any setup is applicable here, both InfiniBand and Ethernet. In this example, we use a server equipped with ConnectX-3 Pro and ConnectX-4 cards.

    The script comes as part of MLNX_OFED Rel. 3.0 (or newer):

    # ofed_info -s

    MLNX_OFED_LINUX-3.3-1.1.0:

     

    Note: If MLNX_OFED is not installed, you can download the tool directly as part of the Performance Tuning Script package from the link in the Prerequisites section above.

     

    Supported Profiles

    mlnx_tune supports a number of profiles for achieving maximum performance for different traffic scenarios.

    • High Throughput:
      • Optimizes interrupt handling - interrupts are assigned to the cores closest to the device, with no overlap in core assignment between different interfaces.
      • Disables system services that might interfere with the driver's work.
      • Note: For MLNX_OFED Rel. 3.4 and later versions, out-of-the-box behavior should provide high throughput results even if you do not run this profile.
    • IP Forwarding Single Stream:
      • Optimizes interrupt handling - assigns cores to Receive/Transmit Packet Steering (RPS and XPS) such that each core can handle receive or transmit interrupts without collisions. This method supports bidirectional traffic (see the inspection commands after this list).
      • Disables system services that might interfere with the driver's work.
    • IP Forwarding Single Stream 0 Packet Loss:
      • Supports both high throughput and IP Forwarding Single Stream, but also changes the TX queue length to avoid packet drops between the kernel and the driver.
    • IP Forwarding Single Stream Single Port:
      • Supports IP Forwarding Single Stream, but optimizes for a single-port scenario (different XPS/RPS configuration).
    • IP Forwarding Multi Stream Throughput:
      • Optimizes multi-stream packet processing for larger message sizes (1024B-1518B).
      • Assigns only cores from the closest NUMA node to packet processing in order to improve software performance by increasing locality.
      • Disables system services that might interfere with the driver's work.
    • IP Forwarding Multi Stream Packet Rate:
      • Optimizes multi-stream packet processing for smaller message sizes (64B-512B).
      • Assigns all cores to packet processing in order to improve performance by working in parallel as much as possible.
      • Disables system services that might interfere with the driver's work.
    • Low Latency VMA:
      • Optimizes the system for minimal latency over VMA.
      • Assigns all queues to a specific core on the closest NUMA node.
      • Disables system services and removes kernel modules in order to avoid interrupts on the executing core.
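
    The core assignments (RPS/XPS) and TX queue length that the IP forwarding profiles adjust can be inspected through standard sysfs and iproute2 interfaces. This is a minimal sketch; ens785f1 and the rx-0/tx-0 queue indices are illustrative, so adapt them to your interface and queues:

    # cat /sys/class/net/ens785f1/queues/rx-0/rps_cpus

    # cat /sys/class/net/ens785f1/queues/tx-0/xps_cpus

    # ip link show ens785f1 | grep qlen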

     

    Generating a Report

    Running mlnx_tune with the -r option (the default action since version 0.59) queries the system and generates a static report with a list of relevant comments on components that might affect performance.

    The following shows an output example from a server equipped with ConnectX-3 Pro and ConnectX-4 cards:

    # mlnx_tune -r

    2016-01-29 11:45:35,790 INFO Collecting node information

    2016-01-29 11:45:35,790 INFO Collecting OS information

    2016-01-29 11:45:35,793 INFO Collecting CPU information

    2016-01-29 11:45:35,922 INFO Collecting IRQ balancer information

    2016-01-29 11:45:35,952 INFO Collecting firewall information

    2016-01-29 11:45:36,827 INFO Collecting IP forwarding information

    2016-01-29 11:45:36,831 INFO Collecting hyper threading information

    2016-01-29 11:45:36,831 INFO Collecting IOMMU information

    2016-01-29 11:45:36,833 INFO Collecting driver information

    2016-01-29 11:45:37,565 INFO Collecting Mellanox devices information

     

    Mellanox Technologies - System Report

     

    Operation System Status

    CENTOS

    3.10.0-229.11.1.el7.x86_64

     

    CPU Status

    Intel Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz Haswell

    OK: Frequency 3086.132MHz

     

    Hyper Threading Status

    ACTIVE

     

    IRQ Balancer Status

    ACTIVE

     

    Driver Status

    OK: MLNX_OFED_LINUX-3.2-0.1.1.0 (OFED-3.2-0.1.1)

     

    ConnectX-3Pro Device Status on PCI 81:00.0

    FW version 2.33.5100

    OK: PCI Width x8

    OK: PCI Speed 8GT/s

    PCI Max Payload Size 256

    PCI Max Read Request 512

    Local CPUs list [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]

     

    ens817 (Port 1) Status

    Link Type eth

    OK: Link status Up

    MTU 1500

     

    ens817d1 (Port 2) Status

    Link Type eth

    OK: Link status Up

    MTU 1500

     

    ConnectX-4 Device Status on PCI 05:00.0

    FW version 12.14.0138

    OK: PCI Width x16

    OK: PCI Speed 8GT/s

    PCI Max Payload Size 256

    PCI Max Read Request 512

    Local CPUs list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]

     

    ens785f0 (Port 1) Status

    Link Type eth

    Warning: Link status Down >>> Check your port configuration (Physical connection, SM, IP).

    MTU 1500

     

    ConnectX-4 Device Status on PCI 05:00.1

    FW version 12.14.0138

    OK: PCI Width x16

    OK: PCI Speed 8GT/s

    PCI Max Payload Size 256

    PCI Max Read Request 512

    Local CPUs list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]

     

    ens785f1 (Port 1) Status

    Link Type eth

    OK: Link status Up

    MTU 9000

    Note: Use the -c flag to output a colored report.
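
    For example, to generate the same report with colored status indications (the -c flag is only applicable together with the report option, as noted in the help text below):

    # mlnx_tune -r -c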

    Tuning a System

    1. To display all available options, use the -h flag.

    # mlnx_tune -h

    Usage: mlnx_tune [options]

     

     

    Options:

      -h, --help            show this help message and exit

      -d, --debug_info      dump system debug information without setting a

                            profile

      -r, --report          Report HW/SW status and issues without setting a

                            profile

      -c, --colored         Switch using colored/monochromed status reports. Only

                            applicable with --report

      -p PROFILE, --profile=PROFILE

                            Set profile and run it. choose from:

                            ['HIGH_THROUGHPUT',

                            'IP_FORWARDING_MULTI_STREAM_THROUGHPUT',

                            'IP_FORWARDING_MULTI_STREAM_PACKET_RATE',

                            'IP_FORWARDING_SINGLE_STREAM',

                            'IP_FORWARDING_SINGLE_STREAM_0_LOSS',

                            'IP_FORWARDING_SINGLE_STREAM_SINGLE_PORT',

                            'LOW_LATENCY_VMA']

      -q, --verbosity       print debug information to the screen [default False]

      -v, --version         print tool version and exit [default False]

      -i INFO_FILE_PATH, --info_file_path=INFO_FILE_PATH

                            info_file path. [default %s]

     

    2. To tune the server, run mlnx_tune -p <profile>, where <profile> is the appropriate profile selected from the list of supported profiles above.

    # mlnx_tune -p HIGH_THROUGHPUT

    2016-01-29 11:34:17,729 INFO Collecting node information

    2016-01-29 11:34:17,729 INFO Collecting OS information

    2016-01-29 11:34:17,734 INFO Collecting CPU information

    2016-01-29 11:34:17,985 INFO Collecting IRQ balancer information

    2016-01-29 11:34:18,044 INFO Collecting firewall information

    2016-01-29 11:34:19,827 INFO Collecting IP forwarding information

    2016-01-29 11:34:19,835 INFO Collecting hyper threading information

    2016-01-29 11:34:19,835 INFO Collecting IOMMU information

    2016-01-29 11:34:19,839 INFO Collecting driver information

    2016-01-29 11:34:20,841 INFO Collecting Mellanox devices information

    2016-01-29 11:34:26,727 INFO Applying High Throughput profile.

    2016-01-29 11:34:26,786 WARNING Failed to run cmd: /etc/init.d/irqbalance stop

    2016-01-29 11:34:26,786 WARNING Unable to stop irqbalancer

    2016-01-29 11:34:27,184 INFO Some devices' properties might have changed - re-query system information.

    2016-01-29 11:34:27,184 INFO Collecting node information

    2016-01-29 11:34:27,184 INFO Collecting OS information

    2016-01-29 11:34:27,184 INFO Collecting CPU information

    2016-01-29 11:34:27,343 INFO Collecting IRQ balancer information

    2016-01-29 11:34:27,383 INFO Collecting firewall information

    2016-01-29 11:34:28,343 INFO Collecting IP forwarding information

    2016-01-29 11:34:28,347 INFO Collecting hyper threading information

    2016-01-29 11:34:28,348 INFO Collecting IOMMU information

    2016-01-29 11:34:28,349 INFO Collecting driver information

    2016-01-29 11:34:29,089 INFO Collecting Mellanox devices information

    2016-01-29 11:34:32,220 INFO System info file: /tmp/mlnx_tune_160129_113416.log

     

     

    Note: The mlnx_tune tool does not have an 'undo' option, but most of its actions can be reverted by restarting the driver. To make sure all changes are reverted, a reboot might be required.
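
    With MLNX_OFED, restarting the driver typically looks as follows (this assumes the standard openibd service installed by MLNX_OFED; note that restarting the driver interrupts traffic on the adapter):

    # /etc/init.d/openibd restart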

     

    More Options

    1. Print the tool version.

    # mlnx_tune -v

    2016-01-29 11:36:42,165 INFO Version: 0.66

     

    2. Raise the verbosity level.

    This option prints debug information to the screen, so you can see exactly which commands are executed. Note that it floods the screen with debug messages.

    # mlnx_tune -q

    2016-08-31 23:29:21,674 INFO Collecting node information

    2016-08-31 23:29:21,674 INFO Collecting OS information

    2016-08-31 23:29:21,677 INFO Collecting CPU information

    2016-08-31 23:29:21,737 INFO Collecting IRQ Balancer information

    2016-08-31 23:29:21,740 INFO Collecting Firewall information

    2016-08-31 23:29:21,743 INFO Collecting IP table information

    2016-08-31 23:29:21,746 INFO Collecting IPv6 table information

    2016-08-31 23:29:21,749 INFO Collecting IP forwarding information

    2016-08-31 23:29:21,752 INFO Collecting hyper threading information

    2016-08-31 23:29:21,752 INFO Collecting IOMMU information

    2016-08-31 23:29:21,754 INFO Collecting driver information

    2016-08-31 23:29:22,810 INFO Collecting Mellanox devices information

    2016-08-31 23:29:23,647 DEBUG Checking adaptive rx value for interface ens1.

    2016-08-31 23:29:23,649 DEBUG Checking tx-frames value for interface ens1.

    2016-08-31 23:29:23,651 DEBUG Checking rx-frames value for interface ens1.

    ...

     

    Note: All of the information that was gathered will be dumped into a log file under /tmp.
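
    The file name includes a timestamp, as in the /tmp/mlnx_tune_160129_113416.log example above. To locate the most recent log:

    # ls -lt /tmp/mlnx_tune_*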