HowTo Configure Packet Pacing on ConnectX-4

Version 8

    This post describes how to configure packet pacing (traffic shaping) per flow (send queue) on ConnectX-4 and ConnectX-4 Lx adapters.

    This feature is supported in MLNX_OFED version 3.3 and later.

     

    Note: This procedure is relevant only for kernel traffic and not for bypass traffic, such as RDMA over Converged Ethernet (RoCE).

     

    References

    • MLNX_OFED User Manual

     

    Overview

    ConnectX-4 and ConnectX-4 Lx devices allow packet pacing (traffic shaping) for each send queue. Note that:

    • 16 different rates are supported.
    • Up to 512 send queues are supported.
    • The rates can vary from 1 Mbps to line rate in 1 Mbps resolution.
    • Multiple queues can be mapped to the same rate; each queue is still paced independently (see the example after this list).
    • It is possible to configure per CPU and per flow rate limiting in parallel.
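
    For example, two different send queues can be limited to the same rate while still being paced independently. A minimal sketch, assuming the ens2f1 interface name and the tx_maxrate sysfs entry used in the Configuration section below (the queue numbers and the rate are arbitrary examples):

    # echo 500 > /sys/class/net/ens2f1/queues/tx-4/tx_maxrate

    # echo 500 > /sys/class/net/ens2f1/queues/tx-5/tx_maxrate

    In this sketch, tx-4 and tx-5 are each capped at 500 Mbps (not 500 Mbps combined).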

     

    System Requirements

    • MLNX_OFED version 3.3 or later.
    • Linux kernel 4.1 or higher.
    • ConnectX-4 or ConnectX-4 Lx adapter.

     

    Network Considerations

     

    Configuration

     

    Note: This configuration is not persistent (it does not survive driver restart).

     

    There are two modes of operation: rate limit per CPU core and rate limit per flow. These are described below.

     

    Firmware Activation

    Before you start, make sure that the Mellanox Firmware Tools (MFT) service (mst) is started on the host:

    # mst start
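
    If you are not sure which mst device to use in the commands below, you can list the available devices; the path shown (e.g. /dev/mst/mt4115_pciconf0) is what replaces <mst_dev>:

    # mst status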

     

    To activate packet pacing in the firmware:

    #echo "MLNX_RAW_TLV_FILE" > /tmp/mlxconfig_raw.txt

    #echo "0x00000004 0x0000010c 0x00000000 0x00000001" >> /tmp/mlxconfig_raw.txt

    #yes | mlxconfig -d <mst_dev >-f /tmp/mlxconfig_raw.txt set_raw > /dev/null

    #reboot

     

    To deactivate packet pacing, write the same TLV with the last dword set to 0x00000000:

    #echo "MLNX_RAW_TLV_FILE" > /tmp/mlxconfig_raw.txt

    #echo "0x00000004 0x0000010c 0x00000000 0x00000000" >> /tmp/mlxconfig_raw.txt

    #yes | mlxconfig -d <mst_dev >-f /tmp/mlxconfig_raw.txt set_raw > /dev/null

    #reboot

     

    Note: <mst_dev> in the examples above needs to be replaced with the path to the mst device (e.g. /dev/mst/mt4115_pciconf0).

    Note: "mlnxfwreset -d <mst_dev> reset" could be used as well.

     

    Rate Limit per CPU Core

    When XPS (Transmit Packet Steering) is enabled, traffic from CPU core #x is sent using the corresponding send queue.

    By limiting the rate on that queue, we effectively limit the transmit rate of that CPU core.

    For example:

    # echo 300 > /sys/class/net/ens2f1/queues/tx-0/tx_maxrate

    Used in this way, the command limits the transmit rate of core 0 (tx-0) to 300 Mbps.
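
    As a minimal sketch (assuming the standard kernel XPS sysfs interface and the ens2f1 interface from the example above), you can pin CPU core 0 to queue tx-0 and read back the configured limit:

    # echo 1 > /sys/class/net/ens2f1/queues/tx-0/xps_cpus

    # cat /sys/class/net/ens2f1/queues/tx-0/tx_maxrate

    The value written to xps_cpus is a hexadecimal CPU bitmask (0x1 selects core 0), and tx_maxrate reports the limit in Mbps.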

     

    Rate Limit per Flow

    1. The driver allows you to open up to 2048 additional send queues using the following command:

    # ethtool -L ens2f1 other 1200

    In this example, we opened 1200 additional queues.
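
    To confirm how many queues of each type the device currently exposes, you can query the channel configuration with standard ethtool (shown here only as a sanity check):

    # ethtool -l ens2f1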

     

    2. The next step is to create the flow mapping. You can map a given destination IP address and/or destination Layer 4 port to a specific send queue.

    The match precedence is as follows:

    1. IP + L4 Port
    2. IP only
    3. L4 Port only
    4. No match (the flow would be mapped to default queues)

     

    To create flow mapping:

    Configure the destination IP address by writing it in hexadecimal representation to the relevant sysfs entry.

    For example, to map IP address 192.168.1.1 (0xc0a80101) to send queue 310, execute the following command:

    # echo 0xc0a80101 > /sys/class/net/ens2f1/queues/tx-310/flow_map/dst_ip

    To map destination L4 port 3333 (either TCP or UDP) to the same queue, execute:

    # echo 3333 > /sys/class/net/ens2f1/queues/tx-310/flow_map/dst_port

     

    From this point on, all traffic destined to the given IP address and L4 port will be sent using send queue 310. All other traffic will be sent using the default send queues.
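
    To sanity-check that traffic is really leaving through the mapped queue, you can watch the per-queue transmit counters via standard ethtool statistics (the exact counter names depend on the driver version, so treat this as an assumption rather than part of the procedure):

    # ethtool -S ens2f1 | grep tx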

     

    3. Limit the rate of this flow using the following command:

    # echo 100 > /sys/class/net/ens2f1/queues/tx-310/tx_maxrate

    Note: Each queue supports only a single IP+port combination.
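
    Putting the per-flow steps together, the following sketch reuses only the sysfs entries shown above to map two destination IP addresses to two of the additional queues and limit each flow to its own rate (the second IP address, the queue numbers, and the rates are arbitrary examples):

    # echo 0xc0a80101 > /sys/class/net/ens2f1/queues/tx-310/flow_map/dst_ip

    # echo 100 > /sys/class/net/ens2f1/queues/tx-310/tx_maxrate

    # echo 0xc0a80102 > /sys/class/net/ens2f1/queues/tx-311/flow_map/dst_ip

    # echo 200 > /sys/class/net/ens2f1/queues/tx-311/tx_maxrate

    Because each queue supports only a single IP+port combination, every additional rate-limited flow needs its own queue.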

     

    Limitations

    1. Packet pacing is not supported for IPoIB or RoCE transports.