1 Reply Latest reply on Oct 12, 2017 7:15 PM by sophie

    PFC with ConnectX-5

    rsmith

      I'm trying to get RoCE v1 working with ConnectX-5 100G Ethernet adapters.  I have ib_send_bw working with good bandwidth, but things seem to fall apart with OpenMPI jobs with multiple MPI tasks per node, most certainly because I don't have flow control working properly yet.  These adapters use the mlx5 drivers so it doesn't appear that the mlx4_en kernel module options are available (pfctx/pfcrx).

       

      I'm at a loss as to how to make progress.  If I try to configure things manually, mlnx_qos and ethtool each seem to wipe out the effect of the other:

       

      [me@mine]# mlnx_qos -i eth4 -f 1,1,1,1,1,1,1,1

      PFC configuration:

        priority    0   1   2   3   4   5   6   7

        enabled     1   1   1   1   1   1   1   1  

       

      tc: 0 ratelimit: unlimited, tsa: vendor

        priority:  1

      tc: 1 ratelimit: unlimited, tsa: vendor

        priority:  0

      tc: 2 ratelimit: unlimited, tsa: vendor

        priority:  2

      tc: 3 ratelimit: unlimited, tsa: vendor

        priority:  3

      tc: 4 ratelimit: unlimited, tsa: vendor

        priority:  4

      tc: 5 ratelimit: unlimited, tsa: vendor

        priority:  5

      tc: 6 ratelimit: unlimited, tsa: vendor

        priority:  6

      tc: 7 ratelimit: unlimited, tsa: vendor

        priority:  7

      [me@mine]# ethtool -A eth4 rx on

      [me@mine]# ethtool -A eth4 tx on

      [me@mine]# ethtool -a eth4

      Pause parameters for eth4:

      Autonegotiate: off

      RX: on

      TX: on

       

      [me@mine]# mlnx_qos -i eth4

      PFC configuration:

        priority    0   1   2   3   4   5   6   7

        enabled     0   0   0   0   0   0   0   0  

       

      tc: 0 ratelimit: unlimited, tsa: vendor

        priority:  1

      tc: 1 ratelimit: unlimited, tsa: vendor

        priority:  0

      tc: 2 ratelimit: unlimited, tsa: vendor

        priority:  2

      tc: 3 ratelimit: unlimited, tsa: vendor

        priority:  3

      tc: 4 ratelimit: unlimited, tsa: vendor

        priority:  4

      tc: 5 ratelimit: unlimited, tsa: vendor

        priority:  5

      tc: 6 ratelimit: unlimited, tsa: vendor

        priority:  6

      tc: 7 ratelimit: unlimited, tsa: vendor

        priority:  7

      [me@mine]# mlnx_qos -i eth4 -f 1,1,1,1,1,1,1,1

      PFC configuration:

        priority    0   1   2   3   4   5   6   7

        enabled     1   1   1   1   1   1   1   1  

       

      tc: 0 ratelimit: unlimited, tsa: vendor

        priority:  1

      tc: 1 ratelimit: unlimited, tsa: vendor

        priority:  0

      tc: 2 ratelimit: unlimited, tsa: vendor

        priority:  2

      tc: 3 ratelimit: unlimited, tsa: vendor

        priority:  3

      tc: 4 ratelimit: unlimited, tsa: vendor

        priority:  4

      tc: 5 ratelimit: unlimited, tsa: vendor

        priority:  5

      tc: 6 ratelimit: unlimited, tsa: vendor

        priority:  6

      tc: 7 ratelimit: unlimited, tsa: vendor

        priority:  7

      [me@mine]# ethtool -a eth4

      Pause parameters for eth4:

      Autonegotiate: off

      RX: off

      TX: off

        • Re: PFC with ConnectX-5
          sophie

          Hi Ricky,

           

           

          Global Pause is turned on with ethtool -A.

          PFC (Priority Flow Control) is configured with mlnx_qos on the host.

          You have to choose one or the other, not both at the same time.
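To see which of the two modes is currently active, the output of the two tools can be checked mechanically. The following is only a minimal sketch: the helper names `pfc_enabled` and `global_pause_enabled` are hypothetical, and the parsing assumes the output layout shown in the transcript above (an `enabled` row from mlnx_qos, `RX:`/`TX:` lines from `ethtool -a`).

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text and change no settings.

# Succeeds if mlnx_qos output (read from stdin) shows PFC enabled
# for at least one priority in its "enabled" row.
pfc_enabled() {
    awk '$1 == "enabled" { for (i = 2; i <= NF; i++) if ($i == 1) found = 1 }
         END { exit !found }'
}

# Succeeds if `ethtool -a` output (read from stdin) reports
# RX or TX pause as "on".
global_pause_enabled() {
    awk '($1 == "RX:" || $1 == "TX:") && $2 == "on" { found = 1 }
         END { exit !found }'
}
```

A possible use, with the interface name from the original post: `mlnx_qos -i eth4 | pfc_enabled && echo "PFC is on"` and `ethtool -a eth4 | global_pause_enabled && echo "Global Pause is on"`. If both ever report "on" at once, the configuration should be revisited.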

          First, I would recommend making sure the servers are appropriately tuned (basic start-up) according to this community document:

           

           

          Getting started with Performance Tuning of Mellanox adapters

          https://community.mellanox.com/docs/DOC-2490

           

           

          Then I would test again.

           

           

          You can also test with Global Pause (GP) first and then compare the results with PFC.
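For such an A/B comparison it can help to keep the commands for each mode together. The sketch below only prints the commands rather than running them. The function name `fc_commands` is hypothetical, and the commands reuse the `ethtool -A` and `mlnx_qos -f` invocations from the original post. The transcript there suggests the driver already clears one mode when the other is set, so the explicit clearing step may be redundant.

```shell
#!/bin/sh
# Print (do not run) the commands that select one flow-control mode.
# Usage: fc_commands gp|pfc IFACE
fc_commands() {
    mode=$1
    iface=$2
    case "$mode" in
        gp)
            # Global Pause: clear every PFC priority, then enable pause frames.
            echo "mlnx_qos -i $iface -f 0,0,0,0,0,0,0,0"
            echo "ethtool -A $iface rx on tx on"
            ;;
        pfc)
            # PFC: turn Global Pause off, then enable PFC on all priorities
            # (the same mask as in the original post).
            echo "ethtool -A $iface rx off tx off"
            echo "mlnx_qos -i $iface -f 1,1,1,1,1,1,1,1"
            ;;
        *)
            echo "usage: fc_commands gp|pfc IFACE" >&2
            return 1
            ;;
    esac
}
```

Running `fc_commands gp eth4` or `fc_commands pfc eth4` lists the commands to review before executing them as root on each node. The benchmark (for example ib_send_bw or the Open MPI job) can then be repeated under each mode.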

           

           

          To configure PFC properly on ConnectX adapters, consult the document below (it also applies to ConnectX-5):

           

           

          https://community.mellanox.com/docs/DOC-2474

           

           

          Cheers,

          Sophie.