HowTo Configure aRFS on ConnectX-4

Version 6

    This post describes the procedure used to configure accelerated Receive Flow Steering (aRFS) on ConnectX-4 adapters.

    aRFS support for ConnectX-4 is provided in MLNX_OFED version 3.3 and later.

     


    Overview

    RFS and accelerated RFS (aRFS) are kernel features currently available in most distributions. The aRFS feature requires explicit configuration in order to enable it.

    For RFS, packets are forwarded based on the location of the application consuming the packet.

    Accelerated RFS boosts RFS performance by adding hardware support: with aRFS, the NIC itself steers packets to a CPU that is local to the thread consuming the data, instead of relying on the software steering that plain RFS uses.

    For more information about these features, refer to the kernel documentation shipped with your distribution; see the Receive Flow Steering (RFS) and Accelerated RFS sections there for examples.

    Note: LRO does not need to be disabled for aRFS.

     

    Setup

    In this example, two servers equipped with ConnectX-4 adapters are connected via a switch. Note that the servers could also be connected back to back.

     

    Configuration

    1. By default, RFS and aRFS are compiled into the kernel in most distributions. Kernel support is available in kernel release 2.6.39 and later.

    Check that the following flags are enabled (=y) in the kernel's /boot/config file.

    For example:

    # vim /boot/config-3.10.0-123.el7.x86_64

     

    Check that the following parameters are enabled:

    Note: RPS is required in order for RFS to function properly.

    CONFIG_RPS=y

    CONFIG_RFS_ACCEL=y
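
    If you prefer not to open the config file in an editor, a quick way to check both flags (assuming the config file for the running kernel is present under /boot) is:

    # grep -E "CONFIG_RPS=|CONFIG_RFS_ACCEL=" /boot/config-$(uname -r)

    CONFIG_RPS=y

    CONFIG_RFS_ACCEL=y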

     

    2. Enable the ntuple feature on the driver. First, check the current status of ntuple by running the following command:

    # ethtool -k ens6

    Features for ens6:

    ...

     

    ntuple-filters: off

    ...

     

    Enable the ntuple feature and verify that it is now set to on by running:

    # ethtool -K ens6 ntuple on

    # ethtool -k ens6

    Features for ens6:

    ...

     

    ntuple-filters: on

    ...

     

    3. Disable IRQ balancing (the irqbalance service).

    Note: irqbalance could remain active, but that makes it difficult to verify the feature by looking at the Rx ring counters, because the mapping between ring and CPU core is not guaranteed to be core 0 <-> ring 0, core 1 <-> ring 1, and so on; the mapping can also change dynamically while traffic is running.

    For this test, we disable irqbalance to verify that aRFS is functioning as expected.

    # service irqbalance stop
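
    To confirm that the service is no longer running, you can check its status with the same service syntax used above (the exact output depends on the distribution):

    # service irqbalance status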

     

    4. Ensure that irq affinity is spread across all cores.

    In this example, there are 12 CPU cores (and 12 Rx rings), and the interrupts are spread across the cores (the values shown are CPU bitmasks).

    # show_irq_affinity.sh ens6

    Note: interface name is not in /proc/interrupts, using the pci device IRQs

    ...

    29: 001   # CPU 0

    30: 002   ...

    31: 004

    32: 008

    33: 010

    34: 020

    35: 040

    36: 080

    37: 100

    38: 200

    39: 400

    40: 800  # CPU 11

    For more information, refer to What is IRQ Affinity?
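
    If the interrupts are not spread across the cores as shown above, the set_irq_affinity.sh helper that ships with MLNX_OFED (and is also used by the script at the end of this post) can be used to distribute them, for example:

    # set_irq_affinity.sh ens6

    # show_irq_affinity.sh ens6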

     

    5. Configure the RFS flow table entries (globally and per core).

    Note: The functionality remains disabled until explicitly configured (by default it is 0).

     

    The number of entries in the global flow table is set as follows:

    /proc/sys/net/core/rps_sock_flow_entries

     

    The number of entries in the per-queue flow table is set as follows:

    /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt

     

    For example:

    # echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

    # for f in /sys/class/net/ens6/queues/rx-*/rps_flow_cnt; do echo 32768 > $f; done

     

    For more information about RFS supported values refer to: https://www.kernel.org/doc/Documentation/networking/scaling.txt.
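
    To verify that the values took effect, you can simply read them back, for example:

    # cat /proc/sys/net/core/rps_sock_flow_entries

    32768

    # cat /sys/class/net/ens6/queues/rx-0/rps_flow_cnt

    32768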

     

    Monitoring

    To monitor the traffic that goes to each CPU, run:

    # ethtool -S ens6 | egrep "rx.*pack"

         rx_packets: 0

         rx_vport_error_packets: 0

         rx_vport_unicast_packets: 764481667

         rx_vport_multicast_packets: 27

         rx_vport_broadcast_packets: 1

         rx0_packets: 0

         rx0_lro_packets: 0

         rx1_packets: 0

         rx1_lro_packets: 0

         rx2_packets: 0

         rx2_lro_packets: 0

         rx3_packets: 0

         rx3_lro_packets: 0

         rx4_packets: 0

         rx4_lro_packets: 0

         rx5_packets: 0

         rx5_lro_packets: 0

         rx6_packets: 0

         rx6_lro_packets: 0

         rx7_packets: 0

         rx7_lro_packets: 0

         rx8_packets: 0

         rx8_lro_packets: 0

         rx9_packets: 0

         rx9_lro_packets: 0

         rx10_packets: 0

         rx10_lro_packets: 0

         rx11_packets: 0

         rx11_lro_packets: 0


     

     

    Verification

    In this example we will use two hosts running the netperf application.

     

    1. Start by following the Configuration section (or the script) described above, then disable ntuple:

    # ethtool -K ens6 ntuple off

     

    2. Run netserver on one host, pinned to a specific core. In this example we selected core number 5.

    # taskset -c 5 netserver &
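
    Optionally, you can confirm that netserver is pinned to the selected core. The command below is one way to do this; pgrep -n simply picks the most recently started netserver process, and the reported affinity list should be 5:

    # taskset -cp $(pgrep -n netserver)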

     

    3. Run the netperf client on the other host, for example:

    # netperf -H 11.134.201.5 -l 200 -t TCP_STREAM &

    4. Monitor the ring counters. In most cases, you will see the packets delivered to a ring other than the one matching the netserver core, in this case ring 8 (the rx8_packets counter).

    # watch -n 1 -d "ethtool -S ens6 | egrep rx.*pack"

         rx_packets: 60266184

         rx_vport_error_packets: 0

         rx_vport_unicast_packets: 824749590

         rx_vport_multicast_packets: 27

         rx_vport_broadcast_packets: 1

         rx0_packets: 3

         rx0_lro_packets: 0

         rx1_packets: 6

         rx1_lro_packets: 0

         rx2_packets: 0

         rx2_lro_packets: 0

         rx3_packets: 6

         rx3_lro_packets: 0

         rx4_packets: 0

         rx4_lro_packets: 0

         rx5_packets: 0

         rx5_lro_packets: 0

         rx6_packets: 0

         rx6_lro_packets: 0

         rx7_packets: 0

         rx7_lro_packets: 0

         rx8_packets: 6296748

         rx8_lro_packets: 0

         rx9_packets: 0

         rx9_lro_packets: 0

         rx10_packets: 0

         rx10_lro_packets: 0

         rx11_packets: 0

         rx11_lro_packets: 0

     

    5. Enable ntuple.

    # ethtool -K ens6 ntuple on

     

    6. Re-run netserver on core 5 (on the server).

    # taskset -c 5 netserver &

     

    7. Re-run the netperf client.

    # netperf -H 11.134.201.5 -l 200 -t TCP_STREAM &

     

    8. Monitor the ring counters. In most cases, you will see the packets delivered to the same ring as the core on which netserver was spawned, in this case ring 5 (the rx5_packets counter).

    # watch -n 1 -d "ethtool -S ens6 | egrep rx.*pack"

        ...

     

         rx5_packets: 234532

         rx5_lro_packets: 0
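
    If you want to watch only the two rings of interest while toggling ntuple on and off, you can narrow the egrep filter, for example:

    # watch -n 1 -d "ethtool -S ens6 | egrep 'rx(5|8)_packets'"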

     

    Just Give Me a Script

     

    Here is an example of a script used to configure aRFS on your host. The $1 argument is the interface name.

    Create a file called enable_arfs.sh and copy the following:

    #!/bin/sh

     

    intf=$1

    ethtool -K $intf ntuple on

     

    if [ $? -gt 0 ]; then

            echo "ERROR to enble ntuple"

    #       exit

    fi

     

    echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

    for f in /sys/class/net/$intf/queues/rx-*/rps_flow_cnt; do echo 32768 > $f; done

     

    /usr/sbin/set_irq_affinity.sh $intf

     

    Make the script executable and run it:

    # chmod +x enable_arfs.sh

    # ./enable_arfs.sh ens6

     

    Troubleshooting

    1. This feature is supported in MLNX_OFED version 3.3 and later. When an older version is installed, the driver reports the feature as off [fixed], which means that the feature is not supported by this driver.

    # ethtool -k ens6

    Features for ens6:

    ...

     

    ntuple-filters: off [fixed]
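
    To check which MLNX_OFED version is installed, you can use the ofed_info utility that ships with MLNX_OFED (the -s flag prints a short version string):

    # ofed_info -s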