HowTo Auto-Config PFC and ETS on ConnectX-4 via LLDP DCBX

Version 7

    This post describes how to use LLDP DCBX to auto-configure the PFC and ETS features of the ConnectX-4 adapter from the switch configuration.

    This feature is supported starting with MLNX_OFED 3.3 and firmware version 12.16.0190.

    The feature was updated in MLNX_OFED 4.0: the dcbx_handle_by_fw ethtool parameter was removed, and the DCBX mode is now configured via the mlnx_qos tool.

     

    Setup

    For this setup, a server equipped with ConnectX-4 connected to a switch supporting LLDP DCBX, PFC and ETS is used.

    In this example, Mellanox Spectrum switch SN2700 is used.

     

    Overview

    There are two options to configure PFC and ETS on the server:

    • Local Configuration: Configuring each server manually. For example, see: HowTo Configure PFC on ConnectX-4.
    • Remote Configuration: Configuring PFC and ETS on the switch, after which the switch passes the configuration to the server using LLDP DCBX TLVs. There are two ways to implement the remote configuration using ConnectX-4 adapters:
      • Firmware-based: enable DCBX in the adapter firmware, which then runs LLDP itself and applies the remote configuration.
      • Host-based: enable DCBX on the host, which runs an LLDP agent (e.g. lldpad) and applies the remote configuration to the driver/firmware.

     

    Prerequisites

    Before you start, make sure that you have the following:

    • The host is equipped with ConnectX-4
    • MLNX_OFED 3.3 or later is installed on the host
    • The latest MLNX-OS is installed on the switch
    • The link is UP
    • LLDP is enabled on the switch (run the "lldp" command)
    • PFC is enabled on the switch (e.g. on priority 4)


     

    Configuration

     

    Getting started

    1. Start MFT and query the adapter parameters using the mlxconfig tool.

    # mst start

    ..

     

     

    # mst status

    MST modules:

    ------------

        MST PCI module loaded

        MST PCI configuration module loaded

     

    MST devices:

    ------------

    /dev/mst/mt4115_pciconf0         - PCI configuration cycles access.

                                       domain:bus:dev.fn=0000:05:00.0 addr.reg=88 data.reg=92

                                       Chip revision is: 00

     

     

     

    # mlxconfig -d /dev/mst/mt4115_pciconf0 q

     

    Device #1:

    ----------

     

    Device type:    ConnectX4

    PCI device:     /dev/mst/mt4115_pciconf0

     

    Configurations:                              Current

             ...

             LLDP_NB_RX_MODE_P1                  0

             LLDP_NB_TX_MODE_P1                  0

             LLDP_NB_DCBX_P1                     False(0)

             LLDP_NB_RX_MODE_P2                  0

             LLDP_NB_TX_MODE_P2                  0

             LLDP_NB_DCBX_P2                     False(0)

             DCBX_IEEE_P1                        True(1)

             DCBX_CEE_P1                         True(1)

             DCBX_WILLING_P1                     True(1)

             DCBX_IEEE_P2                        True(1)

             DCBX_CEE_P2                         True(1)

             DCBX_WILLING_P2                     True(1)

    Note: Parameter descriptions and possible values can be found in the MFT User Manual at http://www.mellanox.com/page/management_tools
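
    The query output above can also be checked from a script. The following is a minimal sketch, not part of MFT: it assumes the output format shown above, and the helper name parse_mlxconfig_query is hypothetical. It collects the current value of each LLDP_NB_* and DCBX_* parameter.

```python
import re

# Sample lines copied from the query output above (port 1 only).
SAMPLE = """\
         LLDP_NB_RX_MODE_P1                  0
         LLDP_NB_TX_MODE_P1                  0
         LLDP_NB_DCBX_P1                     False(0)
         DCBX_IEEE_P1                        True(1)
         DCBX_CEE_P1                         True(1)
         DCBX_WILLING_P1                     True(1)
"""


def parse_mlxconfig_query(text):
    """Hypothetical helper: map each LLDP/DCBX parameter to its current value."""
    params = {}
    for line in text.splitlines():
        # Parameter name, whitespace, then the first value column ("Current").
        m = re.match(r"\s*((?:LLDP_NB|DCBX)_\w+)\s+(\S+)", line)
        if m:
            params[m.group(1)] = m.group(2)
    return params


params = parse_mlxconfig_query(SAMPLE)
for name, value in params.items():
    print(f"{name} = {value}")
```

    In practice the text would come from running the mlxconfig query command shown above and capturing its output.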

     

    2. Get the LLDP remote configuration on the switch.

     

    Note: At this point, even when LLDP is enabled on the switch, the remote information will remain empty as long as LLDP is not enabled on the server.

    switch (config) #  show lldp interfaces ethernet remote

    No lldp remote information.

     

    3. Get the PFC and ETS configuration using the mlnx_qos tool on the server.

     

    Note: At this point, PFC is disabled on all priorities. ETS is in its default configuration (all traffic goes to TC0, no rate limit, bandwidth split roughly evenly across all 8 TCs).

     

    # mlnx_qos -i ens785f1

    PFC configuration:

            priority    0   1   2   3   4   5   6   7

            enabled     0   0   0   0   0   0   0   0

     

    tc: 0 ratelimit: unlimited, tsa: ets, bw: 12%

             priority:  0

                     skprio: 0

                     skprio: 1

                     skprio: 2 (tos: 8)

                     skprio: 3

                     skprio: 4 (tos: 24)

                     skprio: 5

                     skprio: 6 (tos: 16)

                     skprio: 7

                     skprio: 8

                     skprio: 9

                     skprio: 10

                     skprio: 11

                     skprio: 12

                     skprio: 13

                     skprio: 14

                     skprio: 15

    tc: 1 ratelimit: unlimited, tsa: ets, bw: 12%

             priority:  1

    tc: 2 ratelimit: unlimited, tsa: ets, bw: 12%

             priority:  2

    tc: 3 ratelimit: unlimited, tsa: ets, bw: 12%

             priority:  3

    tc: 4 ratelimit: unlimited, tsa: ets, bw: 13%

             priority:  4

    tc: 5 ratelimit: unlimited, tsa: ets, bw: 13%

             priority:  5

    tc: 6 ratelimit: unlimited, tsa: ets, bw: 13%

             priority:  6

    tc: 7 ratelimit: unlimited, tsa: ets, bw: 13%

             priority:  7
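
    The default state described in the note can be confirmed programmatically. The sketch below assumes the mlnx_qos output format shown above; parse_mlnx_qos is a hypothetical helper, not part of MLNX_OFED. It extracts the PFC "enabled" row and the per-TC bandwidth shares.

```python
import re

# Abbreviated copy of the mlnx_qos output above (skprio lines omitted).
SAMPLE = """\
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   0   0   0   0

tc: 0 ratelimit: unlimited, tsa: ets, bw: 12%
tc: 1 ratelimit: unlimited, tsa: ets, bw: 12%
tc: 2 ratelimit: unlimited, tsa: ets, bw: 12%
tc: 3 ratelimit: unlimited, tsa: ets, bw: 12%
tc: 4 ratelimit: unlimited, tsa: ets, bw: 13%
tc: 5 ratelimit: unlimited, tsa: ets, bw: 13%
tc: 6 ratelimit: unlimited, tsa: ets, bw: 13%
tc: 7 ratelimit: unlimited, tsa: ets, bw: 13%
"""


def parse_mlnx_qos(text):
    """Hypothetical helper: return (PFC flags per priority, {tc: bandwidth %})."""
    pfc = []
    tc_bw = {}
    for line in text.splitlines():
        m = re.match(r"\s*enabled\s+(.*)", line)
        if m:
            pfc = [int(v) for v in m.group(1).split()]
            continue
        m = re.match(r"\s*tc:\s*(\d+)\s.*\bbw:\s*(\d+)%", line)
        if m:
            tc_bw[int(m.group(1))] = int(m.group(2))
    return pfc, tc_bw


pfc, tc_bw = parse_mlnx_qos(SAMPLE)
print("PFC enabled on any priority:", any(pfc))
print("Total ETS bandwidth:", sum(tc_bw.values()))
```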

     

    Firmware Configuration

    The steps below explain the firmware-based remote configuration. One of its advantages is that the firmware handles the configuration regardless of the OS type and of whether the lldpad process is running.

     

    1. Enable LLDP (RX and TX) and DCBX in the firmware.

    The example below applies the settings to both ports. Refer to the MFT User Manual for the parameter descriptions.

    # mlxconfig -d /dev/mst/mt4115_pciconf0 set LLDP_NB_DCBX_P1=TRUE LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2

     

    Device #1:

    ----------

     

    Device type:    ConnectX4

    PCI device:     /dev/mst/mt4115_pciconf0

     

     

    Configurations:                              Current         New

             LLDP_NB_RX_MODE_P1                  0               2

             LLDP_NB_TX_MODE_P1                  0               2

             LLDP_NB_DCBX_P1                     False(0)        True(1)

             LLDP_NB_RX_MODE_P2                  0               2

             LLDP_NB_TX_MODE_P2                  0               2

             LLDP_NB_DCBX_P2                     False(0)        True(1)

     

    Apply new Configuration? ? (y/n) [n] : y

    Applying... Done!

    -I- Please reboot machine to load new configurations.
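
    When several adapters need the same settings, the long set command can be generated rather than typed. This is a sketch only: the function name is hypothetical, and the value 2 for the RX/TX modes is simply the one used in the example above (see the MFT User Manual for its meaning).

```python
def build_lldp_dcbx_set_command(device, ports=(1, 2), mode=2):
    """Hypothetical helper: build the mlxconfig argument list used in step 1."""
    settings = []
    for port in ports:
        settings += [
            f"LLDP_NB_DCBX_P{port}=TRUE",
            f"LLDP_NB_TX_MODE_P{port}={mode}",
            f"LLDP_NB_RX_MODE_P{port}={mode}",
        ]
    return ["mlxconfig", "-d", device, "set"] + settings


cmd = build_lldp_dcbx_set_command("/dev/mst/mt4115_pciconf0")
print(" ".join(cmd))
```

    The returned list can be passed to subprocess.run(); note that mlxconfig asks for confirmation before applying, as seen in the transcript above.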

     

    2. Verify that the DCBX firmware parameters are set to True.

    In the example below, DCBX is enabled in both CEE and IEEE modes (the adapter adjusts to the version supported by the switch) and is willing to accept the switch configuration.

    # mlxconfig -d /dev/mst/mt4115_pciconf0 q

    ...

    DCBX_IEEE_P1                        True(1)

    DCBX_CEE_P1                         True(1)

    DCBX_WILLING_P1                     True(1)

    DCBX_IEEE_P2                        True(1)

    DCBX_CEE_P2                         True(1)

    DCBX_WILLING_P2                     True(1)
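
    This check can be automated per port. The sketch below assumes the query output format shown above and treats all three DCBX_* parameters as required, which matches this example but may not be necessary in every setup; dcbx_params_not_true is a hypothetical name.

```python
REQUIRED = ("DCBX_IEEE", "DCBX_CEE", "DCBX_WILLING")

# Sample lines copied from the query output above.
SAMPLE = """\
DCBX_IEEE_P1                        True(1)
DCBX_CEE_P1                         True(1)
DCBX_WILLING_P1                     True(1)
DCBX_IEEE_P2                        True(1)
DCBX_CEE_P2                         True(1)
DCBX_WILLING_P2                     True(1)
"""


def dcbx_params_not_true(query_text, port):
    """Hypothetical helper: list the required DCBX parameters that are not True."""
    values = {}
    for line in query_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            values[fields[0]] = fields[1]
    return [
        f"{name}_P{port}"
        for name in REQUIRED
        if not values.get(f"{name}_P{port}", "").startswith("True")
    ]


for port in (1, 2):
    print(f"port {port}: not True -> {dcbx_params_not_true(SAMPLE, port)}")
```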

     

    3. Reset the firmware.

    #  mlxfwreset -d /dev/mst/mt4115_pciconf0 --level 3 reset

     

    Requested reset level for device, /dev/mst/mt4115_pciconf0:

     

    3: Driver restart and PCI reset

    Continue with reset?[y/N] y

    -I- Stopping Driver                         -Done

    -I- Sending Reset Command To Fw             -Done

    -I- Resetting PCI                           -Done

    -I- Starting Driver                         -Done

    -I- Restarting MST                          -Done

    -I- FW was loaded successfully.

     

    Driver Configuration

    1. Get the driver configuration. On MLNX_OFED releases earlier than 4.0, use the ethtool private flags; on MLNX_OFED 4.0 and later, use the mlnx_qos tool.

    # ethtool --show-priv-flags ens785f1

    Private flags for eth1:

    rx_cqe_moder       : on

    rx_cqe_compress    : off

    sniffer            : off

    qos_with_dcbx_by_fw: off

     

    # mlnx_qos -i ens785f1 -d get

    DCBX mode: OS controlled

     

    2. Set the driver to allow DCBX to be handled by the firmware. Use mlnx_qos on MLNX_OFED 4.0 and later, or the ethtool private flag on earlier releases.

    # mlnx_qos -i ens785f1 -d fw

    DCBX mode: Firmware controlled

     

    # ethtool --set-priv-flags  ens785f1 qos_with_dcbx_by_fw on

     

    Verify:

    # ethtool --show-priv-flags ens785f1

    Private flags for eth1:

    rx_cqe_moder       : on

    rx_cqe_compress    : off

    sniffer            : off

    qos_with_dcbx_by_fw: on

     

     

    # mlnx_qos -i ens785f1 -d get

    DCBX mode: Firmware controlled
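
    Since the command differs between MLNX_OFED releases, a deployment script may need to pick the right one. The following sketch assumes that the ethtool private flag applies before MLNX_OFED 4.0 and mlnx_qos from 4.0 onward, as stated at the top of this post. The function name and the version-string format are assumptions; the flag name is the one used in the command above.

```python
def dcbx_fw_command(ofed_version, interface):
    """Hypothetical helper: choose the command that hands DCBX to the firmware.

    ofed_version is assumed to look like "4.0-2.0.0.1" or "3.3".
    """
    major, minor = (int(x) for x in ofed_version.split("-")[0].split(".")[:2])
    if (major, minor) >= (4, 0):
        # MLNX_OFED 4.0 and later: DCBX mode is set through mlnx_qos.
        return ["mlnx_qos", "-i", interface, "-d", "fw"]
    # Earlier releases: private flag, as in the ethtool command above.
    return ["ethtool", "--set-priv-flags", interface, "qos_with_dcbx_by_fw", "on"]


print(" ".join(dcbx_fw_command("3.3-1.0.4.0", "ens785f1")))
print(" ".join(dcbx_fw_command("4.0-2.0.0.1", "ens785f1")))
```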

     

    Verification

     

    PFC Verification

    Verify that PFC is configured on the server. In this example, PFC was enabled on priority 4 on the switch, so the adapter should also show PFC enabled on priority 4.

    # mlnx_qos -i ens785f1

    PFC configuration:

            priority    0   1   2   3   4   5   6   7

            enabled     0   0   0   0   1   0   0   0

     

    ...
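
    The comparison between the switch setting and the adapter state can be scripted. This is a minimal sketch assuming the mlnx_qos output format shown above; pfc_enabled_priorities is a hypothetical helper, and the expected set {4} comes from the switch configuration used in this example.

```python
# Sample copied from the mlnx_qos output above.
SAMPLE = """\
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   1   0   0   0
"""


def pfc_enabled_priorities(mlnx_qos_text):
    """Hypothetical helper: return the set of priorities with PFC enabled."""
    for line in mlnx_qos_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "enabled":
            return {prio for prio, flag in enumerate(fields[1:]) if flag == "1"}
    return set()


switch_pfc = {4}  # priorities enabled on the switch in this example
adapter_pfc = pfc_enabled_priorities(SAMPLE)
print("adapter matches switch:", adapter_pfc == switch_pfc)
```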

     

    ETS Verification

    1. Change the ETS configuration on the switch and verify the effect on the adapter.

    For example:

    switch (config) # dcb ets tc bandwidth 50 10 10 10 10 10 0 0

    In this example, TC0 is configured with 50% of the ETS bandwidth, TC1 through TC5 with 10% each, and TC6 and TC7 with 0%.
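
    A quick sanity check on the shares before entering them on the switch can catch typos. The sketch below assumes that the eight values are meant to total 100%, as they do in this example; ets_bandwidth_command is a hypothetical helper that only formats the command string shown above.

```python
def ets_bandwidth_command(percentages):
    """Hypothetical helper: validate 8 TC shares and format the switch command."""
    if len(percentages) != 8:
        raise ValueError("expected one bandwidth value per TC (8 values)")
    if any(p < 0 for p in percentages) or sum(percentages) != 100:
        raise ValueError("bandwidth values must be non-negative and sum to 100")
    return "dcb ets tc bandwidth " + " ".join(str(p) for p in percentages)


print(ets_bandwidth_command([50, 10, 10, 10, 10, 10, 0, 0]))
```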

     

    2. Verify that the adapter is configured accordingly.

    # mlnx_qos -i ens785f1

    PFC configuration:

            priority    0   1   2   3   4   5   6   7

            enabled     0   0   0   0   1   0   0   0

     

    tc: 0 ratelimit: unlimited, tsa: ets, bw: 50%

             priority:  0

                     skprio: 0

                     skprio: 1

                     skprio: 2 (tos: 8)

                     skprio: 3

                     skprio: 4 (tos: 24)

                     skprio: 5

                     skprio: 6 (tos: 16)

                     skprio: 7

                     skprio: 8

                     skprio: 9

                     skprio: 10

                     skprio: 11

                     skprio: 12

                     skprio: 13

                     skprio: 14

                     skprio: 15

    tc: 1 ratelimit: unlimited, tsa: ets, bw: 10%

             priority:  1

    tc: 2 ratelimit: unlimited, tsa: ets, bw: 10%

             priority:  2

    tc: 3 ratelimit: unlimited, tsa: ets, bw: 10%

             priority:  3

    tc: 4 ratelimit: unlimited, tsa: ets, bw: 10%

             priority:  4

    tc: 5 ratelimit: unlimited, tsa: ets, bw: 10%

             priority:  5

    tc: 6 ratelimit: unlimited, tsa: ets, bw: 0%

             priority:  6

    tc: 7 ratelimit: unlimited, tsa: ets, bw: 0%

             priority:  7
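
    To confirm that the adapter picked up the switch values, the per-TC shares can be compared against the list configured on the switch. This sketch assumes the mlnx_qos output format shown above; ets_mismatches is a hypothetical helper, and the sample text is rebuilt from the example values rather than captured from a device.

```python
import re


def ets_mismatches(mlnx_qos_text, expected_bw):
    """Hypothetical helper: return {tc: (expected, actual)} for differing TCs."""
    actual = {}
    for line in mlnx_qos_text.splitlines():
        m = re.match(r"\s*tc:\s*(\d+)\s.*\bbw:\s*(\d+)%", line)
        if m:
            actual[int(m.group(1))] = int(m.group(2))
    return {
        tc: (want, actual.get(tc))
        for tc, want in enumerate(expected_bw)
        if actual.get(tc) != want
    }


SWITCH_BW = [50, 10, 10, 10, 10, 10, 0, 0]  # values set on the switch above
SAMPLE = "\n".join(
    f"tc: {tc} ratelimit: unlimited, tsa: ets, bw: {bw}%"
    for tc, bw in enumerate(SWITCH_BW)
)
print("mismatches:", ets_mismatches(SAMPLE, SWITCH_BW))
```

    An empty result means every TC on the adapter matches the switch configuration.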

     

    RoCE Support

    The mapping between the application priority (skprio) and the L2 priority is not done via LLDP DCBX. It must be configured on each server.

    At this point, running the tc_wrap.py script shows that all skprio values are mapped to L2 priority 0.

    # tc_wrap.py -i ens785f1

    priority  0

            skprio: 0

            skprio: 1

            skprio: 2 (tos: 8)

            skprio: 3

            skprio: 4 (tos: 24)

            skprio: 5

            skprio: 6 (tos: 16)

            skprio: 7

            skprio: 8

            skprio: 9

            skprio: 10

            skprio: 11

            skprio: 12

            skprio: 13

            skprio: 14

            skprio: 15

    priority  1

    priority  2

    priority  3

    priority  4

    priority  5

    priority  6

    priority  7

     

    To map the skprio to an L2 priority, map the relevant skprio values (in this example, all of them) to L2 priority 4, on which PFC was enabled.

    # tc_wrap.py -i ens785f1 -u 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
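
    The 16-value list passed to -u is easy to mistype, so it can be generated. This is a sketch: skprio_map_argument is a hypothetical helper, and it assumes one value per skprio (0 through 15), as in the command above.

```python
def skprio_map_argument(l2_priority, skprio_count=16):
    """Hypothetical helper: map every skprio to the same L2 priority."""
    if not 0 <= l2_priority <= 7:
        raise ValueError("L2 priority must be between 0 and 7")
    return ",".join([str(l2_priority)] * skprio_count)


print("tc_wrap.py -i ens785f1 -u " + skprio_map_argument(4))
```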

     

    For more information, see: HowTo Run RoCE and TCP over L2 Enabled with PFC (2016).