
    PFC with ConnectX-5

    rsmith

  I'm trying to get RoCE v1 working with ConnectX-5 100G Ethernet adapters.  I have ib_send_bw working with good bandwidth, but things fall apart with OpenMPI jobs that run multiple MPI tasks per node, almost certainly because I don't have flow control working properly yet.  These adapters use the mlx5 driver, so the mlx4_en kernel module options (pfctx/pfcrx) don't appear to be available.
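
      A quick way to confirm that (assuming the module is named mlx5_core, with mlx4_en shown for comparison if it's present) is to list the parameters each module actually accepts:

      [me@mine]# modinfo mlx5_core | grep -i parm

      [me@mine]# modinfo mlx4_en | grep -i parm

      As far as I can tell, mlx5 only exposes PFC through runtime tools like mlnx_qos, not through module options.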

       

  I'm at a loss as to how to make progress.  If I try to configure things manually, mlnx_qos and ethtool each seem to wipe out the effect of the other:

       

      [me@mine]# mlnx_qos -i eth4 -f 1,1,1,1,1,1,1,1

      PFC configuration:

        priority    0   1   2   3   4   5   6   7

        enabled     1   1   1   1   1   1   1   1  

       

      tc: 0 ratelimit: unlimited, tsa: vendor

        priority:  1

      tc: 1 ratelimit: unlimited, tsa: vendor

        priority:  0

      tc: 2 ratelimit: unlimited, tsa: vendor

        priority:  2

      tc: 3 ratelimit: unlimited, tsa: vendor

        priority:  3

      tc: 4 ratelimit: unlimited, tsa: vendor

        priority:  4

      tc: 5 ratelimit: unlimited, tsa: vendor

        priority:  5

      tc: 6 ratelimit: unlimited, tsa: vendor

        priority:  6

      tc: 7 ratelimit: unlimited, tsa: vendor

        priority:  7

      [me@mine]# ethtool -A eth4 rx on

      [me@mine]# ethtool -A eth4 tx on

      [me@mine]# ethtool -a eth4

      Pause parameters for eth4:

      Autonegotiate: off

      RX: on

      TX: on

       

      [me@mine]# mlnx_qos -i eth4

      PFC configuration:

        priority    0   1   2   3   4   5   6   7

        enabled     0   0   0   0   0   0   0   0  

       

      tc: 0 ratelimit: unlimited, tsa: vendor

        priority:  1

      tc: 1 ratelimit: unlimited, tsa: vendor

        priority:  0

      tc: 2 ratelimit: unlimited, tsa: vendor

        priority:  2

      tc: 3 ratelimit: unlimited, tsa: vendor

        priority:  3

      tc: 4 ratelimit: unlimited, tsa: vendor

        priority:  4

      tc: 5 ratelimit: unlimited, tsa: vendor

        priority:  5

      tc: 6 ratelimit: unlimited, tsa: vendor

        priority:  6

      tc: 7 ratelimit: unlimited, tsa: vendor

        priority:  7

      [me@mine]# mlnx_qos -i eth4 -f 1,1,1,1,1,1,1,1

      PFC configuration:

        priority    0   1   2   3   4   5   6   7

        enabled     1   1   1   1   1   1   1   1  

       

      tc: 0 ratelimit: unlimited, tsa: vendor

        priority:  1

      tc: 1 ratelimit: unlimited, tsa: vendor

        priority:  0

      tc: 2 ratelimit: unlimited, tsa: vendor

        priority:  2

      tc: 3 ratelimit: unlimited, tsa: vendor

        priority:  3

      tc: 4 ratelimit: unlimited, tsa: vendor

        priority:  4

      tc: 5 ratelimit: unlimited, tsa: vendor

        priority:  5

      tc: 6 ratelimit: unlimited, tsa: vendor

        priority:  6

      tc: 7 ratelimit: unlimited, tsa: vendor

        priority:  7

      [me@mine]# ethtool -a eth4

      Pause parameters for eth4:

      Autonegotiate: off

      RX: off

      TX: off

        • Re: PFC with ConnectX-5
          sophie

          Hi Ricky,

           

           

          Global Pause is turned on using ethtool -A.

          PFC (Priority Flow Control) is configured with mlnx_qos on the host.

          You have to choose one or the other, not both at the same time.
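
          If you go the PFC route, the order matters; a minimal sketch, assuming you carry the RoCE traffic on priority 4 (adjust to your setup):

          # make sure Global Pause is off first
          ethtool -A eth4 rx off tx off

          # then enable PFC only on the priority carrying RoCE
          mlnx_qos -i eth4 -f 0,0,0,0,1,0,0,0

          # verify that both settings stuck
          mlnx_qos -i eth4
          ethtool -a eth4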

          What I would recommend first is to make sure the servers are appropriately tuned (basic startup) according to the Community Doc:

           

           

          Getting started with Performance Tuning of Mellanox adapters

          https://community.mellanox.com/docs/DOC-2490
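
          The quick start there, if I remember correctly, is the mlnx_tune utility, which reports the current tuning state and can apply a profile (the profile name here is from memory):

          # report the current tuning status
          mlnx_tune

          # apply a throughput-oriented profile
          mlnx_tune -p HIGH_THROUGHPUT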

           

           

          Then I would test again.

           

           

          Also, you can first test with GP (Global Pause) and then compare against PFC.
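
          For the Global Pause test, something like this on both nodes, assuming the device enumerates as mlx5_0 (check with ibv_devices):

          # enable Global Pause in both directions
          ethtool -A eth4 rx on tx on

          # rerun the bandwidth test: first on the server, then from the client
          ib_send_bw -d mlx5_0
          ib_send_bw -d mlx5_0 <server>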

           

           

          You can consult the document below to properly configure PFC on ConnectX adapters (applicable to ConnectX-5):

           

           

          https://community.mellanox.com/docs/DOC-2474

           

           

          Cheers,

          Sophie.

            • Re: PFC with ConnectX-5
              rsmith

              Thanks Sophie,

               

              Yes, I've read DOC-2474, and several things are confusing.

               

              First, I never see socket priorities getting mapped to user priorities or traffic classes, as the example outputs from mlnx_qos and tc_wrap.py below show:

               

              # mlnx_qos -i eth2 -f 0,0,0,1,0,0,0,0

              Priority trust mode: pcp

              PFC configuration:

                priority    0   1   2   3   4   5   6   7

                enabled     0   0   0   1   0   0   0   0  

               

              tc: 0 ratelimit: unlimited, tsa: vendor

                priority:  0

              tc: 1 ratelimit: unlimited, tsa: vendor

                priority:  1

              tc: 2 ratelimit: unlimited, tsa: vendor

                priority:  2

              tc: 3 ratelimit: unlimited, tsa: vendor

                priority:  3

              tc: 4 ratelimit: unlimited, tsa: vendor

                priority:  4

              tc: 5 ratelimit: unlimited, tsa: vendor

                priority:  5

              tc: 6 ratelimit: unlimited, tsa: vendor

                priority:  6

              tc: 7 ratelimit: unlimited, tsa: vendor

                priority:  7

               

              # tc_wrap.py -i eth2

              Traffic classes are set to 8

              UP  0

              UP  1

              UP  2

              UP  3

              UP  4

              UP  5

              UP  6

              UP  7

               

              And the section "Set Egress Mapping on Kernel Bypass Traffic (RoCE)" says to use tc_wrap.py to set the RoCE mapping.  But doesn't tc only set mappings for the kernel packet scheduler?  RoCE bypasses the kernel, right?
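
              My (possibly wrong) understanding is that for tagged traffic the skprio-to-PCP mapping ultimately lives on the VLAN device, which iproute2 can also set at creation time; a hypothetical example with VLAN 100:

              # create a VLAN whose egress map sends skb priority 0 out as PCP 4
              ip link add link eth2 name eth2.100 type vlan id 100 egress-qos-map 0:4

              But that still looks like a kernel-path knob, so I don't see how it would apply to kernel-bypass QPs either.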

               

              Regardless, if I try to use tc_wrap.py, I get the following error message and my configuration doesn't appear to take effect:

               

              # tc_wrap.py -i eth2 -u 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4

              skprio2up is available only for RoCE in kernels that don't support set_egress_map

              Traffic classes are set to 8

              UP  0

              UP  1

              UP  2

              UP  3

              UP  4

                skprio: 0

                skprio: 1

                skprio: 2 (tos: 8)

                skprio: 3

                skprio: 4 (tos: 24)

                skprio: 5

                skprio: 6 (tos: 16)

                skprio: 7

                skprio: 8

                skprio: 9

                skprio: 10

                skprio: 11

                skprio: 12

                skprio: 13

                skprio: 14

                skprio: 15

              UP  5

              UP  6

              UP  7

              # tc_wrap.py -i eth2

              Traffic classes are set to 8

              UP  0

              UP  1

              UP  2

              UP  3

              UP  4

              UP  5

              UP  6

              UP  7