10 Replies Latest reply on Mar 28, 2017 5:14 AM by nnikora@veradata.com

    Teaming (or bonding) ports on Connect-X 4 with MLAG.

    nnikora@veradata.com

      Hello, Mellanox Community.

       

      We have two Mellanox SN2100 switches running Cumulus Linux. On these switches we configured Multi-Chassis Link Aggregation (MLAG).

      Dual-connected devices (servers or switches) must use LACP (IEEE 802.3ad).

       

      Every machine in our network has two Mellanox ConnectX-4 NICs and is connected to both switches.

      The network is redundant: it keeps working even if we turn off one switch.

       

      For example, when we configured teaming (2x25 Gbps) on Windows Server 2012 R2, we measured (with iperf) up to 50 Gbps of outgoing throughput, but on CentOS 7.3 we do not get double the outgoing speed.

       

      On CentOS only the inbound speed doubles; the outbound speed is limited to the speed of a single link.

       

      We have tried all of the bond and team types, balancing algorithms, etc.

      What are we doing wrong?

        • Re: Teaming (or bonding) ports on Connect-X 4 with two switches connection for all machines.
          eddie.notz

          Hi Nikita,

           

          I assume you are using multiple threads with iperf (the -P option), right?

           

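          I think the issue is the Linux bonding driver's default transmit hash policy, layer2 (MAC-based). All iperf streams between the same two hosts share one source/destination MAC pair, so they hash to the same slave: the L4 information (the different TCP ports of each stream) is not considered in the hash, and the traffic does not spread.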
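To make the effect concrete, here is a minimal Python sketch of the two formulas as the Linux kernel bonding documentation describes them (layer2: last byte of the source MAC XOR last byte of the destination MAC XOR the packet type ID; layer3+4: ports XOR source IP XOR destination IP, folded with right shifts), each reduced modulo the slave count. It is only an approximation: the driver works on raw header fields in network byte order, so it will not predict which physical port a particular flow uses. The MAC addresses, IPs, and port numbers are hypothetical, chosen to resemble iperf running 20 parallel streams between two hosts.

```python
import ipaddress

# Approximations of the bonding driver's transmit hash policies, following the
# formulas in the kernel's bonding documentation. Byte-order details differ from
# the real driver, so results are illustrative only.


def layer2_hash(src_mac: str, dst_mac: str, ethertype: int = 0x0800) -> int:
    """layer2: last MAC byte of each side XOR packet type ID (0x0800 = IPv4)."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return src_last ^ dst_last ^ ethertype


def layer34_hash(src_ip: str, dst_ip: str, sport: int, dport: int) -> int:
    """layer3+4 for unfragmented TCP/UDP: ports XOR IPs, folded by shifts."""
    h = (sport << 16) | dport  # both 16-bit ports packed into one 32-bit word
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h & 0xFFFFFFFF


def slave_index(hash_value: int, n_slaves: int) -> int:
    """The hash is reduced modulo the number of slaves in the aggregator."""
    return hash_value % n_slaves


if __name__ == "__main__":
    n = 2  # a two-port bond
    src_mac, dst_mac = "aa:bb:cc:00:00:10", "aa:bb:cc:00:00:20"  # hypothetical
    src_ip, dst_ip = "10.0.0.10", "10.0.0.20"                    # hypothetical
    sports = range(40000, 40020)  # 20 streams, as with iperf -P 20
    l2 = [slave_index(layer2_hash(src_mac, dst_mac), n) for _ in sports]
    l34 = [slave_index(layer34_hash(src_ip, dst_ip, sp, 5201), n) for sp in sports]
    print("layer2   streams per slave:", [l2.count(i) for i in range(n)])   # [20, 0]
    print("layer3+4 streams per slave:", [l34.count(i) for i in range(n)])  # [10, 10]
```

With layer2 every stream shares one MAC pair and therefore one slave; with layer3+4 the differing source ports spread the streams. Under any of these policies a single TCP stream still uses only one link.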

           

          How to verify the hash policy:

           

          # cat /proc/net/bonding/bond0      

          Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

           

           

          Bonding Mode: IEEE 802.3ad Dynamic link aggregation

          Transmit Hash Policy: layer2 (0)                               <-------------------------------------

          MII Status: down

          MII Polling Interval (ms): 100

          Up Delay (ms): 0

          Down Delay (ms): 0

           

           

          802.3ad info

          LACP rate: fast

          Min links: 0

          Aggregator selection policy (ad_select): stable

          bond bond0 has no active aggregator

           

           

          <snip>
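When the same check has to be done on many servers, the relevant fields can be pulled out of /proc/net/bonding/bond0 programmatically. A minimal Python sketch, assuming the "Key: value" line layout shown above (that layout is human-oriented output, not a stable interface, and may differ between driver versions):

```python
from pathlib import Path


def parse_bonding(text: str) -> dict:
    """Parse /proc/net/bonding/<bond> text into bond-level fields and a slave list.

    Assumes the "Key: value" layout shown in this thread. The first occurrence of
    a key wins, so nested LACP PDU details do not overwrite earlier values.
    """
    info = {"slaves": []}
    current = None  # dict for the slave section being parsed, if any
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank lines and headings such as "802.3ad info"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            info["slaves"].append(current)
        elif current is None:
            info.setdefault(key, value)  # bond-level fields precede any slave
        else:
            current.setdefault(key, value)


    return info


if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")
    if path.exists():
        bond = parse_bonding(path.read_text())
        print("Transmit Hash Policy:", bond.get("Transmit Hash Policy"))
        for slave in bond["slaves"]:
            print(slave["name"], slave.get("MII Status"), slave.get("Speed"))
    else:
        print(f"{path} not found (bonding not loaded or no bond0)")
```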

           

           

          To change it:

           

          In the bond's ifcfg file, add the xmit_hash_policy parameter:

           

          BONDING_OPTS="mode=802.3ad xmit_hash_policy=layer3+4 <other options>"

           

          After changing it and restarting the bond interface:

           

          [root@l-csi-demo-03 network-scripts]# cat /proc/net/bonding/bond0

          Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

           

           

          Bonding Mode: IEEE 802.3ad Dynamic link aggregation

          Transmit Hash Policy: layer3+4 (1)
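The policy line only shows what is configured. To see whether outbound traffic is actually split across the slaves, one option is to sample each slave's transmit byte counter while iperf is running. A minimal Python sketch, assuming the standard sysfs counter /sys/class/net/IFACE/statistics/tx_bytes; the script name txrate.py and the 5-second interval are hypothetical choices:

```python
import sys
import time
from pathlib import Path


def read_tx_bytes(iface: str) -> int:
    """Cumulative bytes transmitted by one interface, read from sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/tx_bytes").read_text())


def gbps(before: int, after: int, seconds: float) -> float:
    """Convert a byte-counter delta over an interval into Gbit/s."""
    return (after - before) * 8 / seconds / 1e9


def sample_tx(ifaces, seconds: float = 5.0) -> dict:
    """Read each interface's TX counter twice and return its average rate."""
    start = {i: read_tx_bytes(i) for i in ifaces}
    time.sleep(seconds)
    end = {i: read_tx_bytes(i) for i in ifaces}
    return {i: gbps(start[i], end[i], seconds) for i in ifaces}


if __name__ == "__main__":
    ifaces = sys.argv[1:]  # the bond's slave interfaces, passed on the command line
    if ifaces:
        for name, rate in sample_tx(ifaces).items():
            print(f"{name}: {rate:.2f} Gbit/s transmitted")
    else:
        print("usage: python3 txrate.py SLAVE [SLAVE ...]")
```

Run it on the sending host during an iperf test, for example `python3 txrate.py <slave1> <slave2>`. If one slave carries nearly all the traffic, the hash is not spreading the flows; if both carry traffic but their sum still tops out near one link's speed, the bottleneck is probably somewhere other than the bonding hash.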

            • Re: Teaming (or bonding) ports on Connect-X 4 with MLAG.
              nnikora@veradata.com

              Hello, eddie.notz!

              Yes, we ran iperf with many parallel streams (for example, -P 20).

              We set the xmit_hash_policy parameter, but nothing changed. The Linux machine has 2x50 Gbps links but can transmit only up to 50 Gbps.

               

              Here are our configuration files:

               

              /etc/modprobe.d/bonding.conf

              alias bond0 bonding

              options bond0 miimon=80 mode=4 xmit_hash_policy=layer3+4 lacp_rate=1

               

              /etc/sysconfig/network-scripts/ifcfg-bond0

              DEVICE=bond0

              IPADDR=**********

              NETMASK=*************

              GATEWAY=************

              ONBOOT=yes

              BOOTPROTO=none

              USERCTL=no

              MTU=9216

              BONDING_OPTS="mode=802.3ad miimon=80 lacp_rate=1 xmit_hash_policy=layer3+4"

               

              /proc/net/bonding/bond0

              Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

               

              Bonding Mode: IEEE 802.3ad Dynamic link aggregation

              Transmit Hash Policy: layer3+4 (1)

              MII Status: up

              MII Polling Interval (ms): 80

              Up Delay (ms): 0

              Down Delay (ms): 0

               

              802.3ad info

              LACP rate: fast

              Min links: 0

              Aggregator selection policy (ad_select): stable

              System priority: 65535

              System MAC address: ***********

              Active Aggregator Info:

                      Aggregator ID: 15

                      Number of ports: 2

                      Actor Key: 1

                      Partner Key: 21

                      Partner Mac Address: 44:38:39:ff:01:01

               

              Slave Interface: ens4f0

              MII Status: up

              Speed: 50000 Mbps

              Duplex: full

              Link Failure Count: 0

              Permanent HW addr: ************

              Slave queue ID: 0

              Aggregator ID: 15

              Actor Churn State: none

              Partner Churn State: none

              Actor Churned Count: 0

              Partner Churned Count: 0

              details actor lacp pdu:

                  system priority: 65535

                  system mac address: ***************

                  port key: 1

                  port priority: 255

                  port number: 1

                  port state: 63

              details partner lacp pdu:

                  system priority: 65535

                  system mac address: 44:38:39:ff:01:01

                  oper key: 21

                  port priority: 255

                  port number: 1

                  port state: 63

               

              Slave Interface: ens4f1

              MII Status: up

              Speed: 50000 Mbps

              Duplex: full

              Link Failure Count: 0

              Permanent HW addr: *************

              Slave queue ID: 0

              Aggregator ID: 15

              Actor Churn State: none

              Partner Churn State: none

              Actor Churned Count: 0

              Partner Churned Count: 0

              details actor lacp pdu:

                  system priority: 65535

                  system mac address: *************

                  port key: 1

                  port priority: 255

                  port number: 2

                  port state: 63

              details partner lacp pdu:

                  system priority: 65535

                  system mac address: 44:38:39:ff:01:01

                  oper key: 21

                  port priority: 255

                  port number: 1

                  port state: 63

               

               

              Thank you for your response!

            • Re: Teaming (or bonding) ports on Connect-X 4 with MLAG.
              eddie.notz

              Hi Nikita,

               

              Are you all set with the answer?

              • Re: Teaming (or bonding) ports on Connect-X 4 with MLAG.
                eddie.notz

                Hi,

                 

                Can you please run the following on the CentOS server:

                 

                1. cat /proc/net/bonding/bond0

                2. lspci -d 15b3: -vvv
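Among other things, `lspci -vvv` shows each port's PCIe link: `LnkCap` is what the device supports and `LnkSta` is what was actually negotiated. A minimal Python sketch for flagging a link that trained below its capability; the regex assumes the single-line `LnkCap:`/`LnkSta:` format seen in lspci output of this era, and the vendor filter `15b3:` is taken from the command above:

```python
import re
import shutil
import subprocess

# Matches lines such as "LnkCap: Port #0, Speed 8GT/s, Width x8, ..." and
# "LnkSta: Speed 8GT/s, Width x8, ..."; LnkSta2/LnkCtl2 lines are not matched.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_fields(lspci_text: str) -> list:
    """Return (field, speed in GT/s, lane width) tuples in output order."""
    return [(m.group(1), float(m.group(2)), int(m.group(3)))
            for m in LINK_RE.finditer(lspci_text)]


def downtrained(lspci_text: str) -> bool:
    """True if a LnkSta reports lower speed or width than the LnkCap before it."""
    cap = None
    for field, speed, width in link_fields(lspci_text):
        if field == "LnkCap":
            cap = (speed, width)
        elif cap is not None and (speed < cap[0] or width < cap[1]):
            return True
    return False


if __name__ == "__main__":
    if shutil.which("lspci"):
        # Run as root: without privileges lspci may hide the capability details.
        out = subprocess.run(["lspci", "-d", "15b3:", "-vvv"],
                             capture_output=True, text=True).stdout
        for entry in link_fields(out):
            print(entry)
        print("downtrained link:", downtrained(out))
```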

                  • Re: Teaming (or bonding) ports on Connect-X 4 with MLAG.
                    nnikora@veradata.com

                    Hi, Eddie.

                     

                    1. # cat /proc/net/bonding/bond0

                    (Same output as in my previous reply: Transmit Hash Policy layer3+4 (1), both slaves ens4f0 and ens4f1 up at 50000 Mbps in Aggregator ID 15.)

                     

                     

                    2. # lspci -d 15b3: -vvv

                    81:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

                            Subsystem: Mellanox Technologies Device 0039

                            Physical Slot: 4

                            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+

                            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

                            Latency: 0, Cache Line Size: 32 bytes

                            Interrupt: pin A routed to IRQ 33

                            NUMA node: 1

                            Region 0: Memory at f8000000 (64-bit, prefetchable) [size=32M]

                            Expansion ROM at fbe00000 [disabled] [size=1M]

                            Capabilities: [60] Express (v2) Endpoint, MSI 00

                                    DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited

                                            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W

                                    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

                                            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-

                                            MaxPayload 256 bytes, MaxReadReq 512 bytes

                                    DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

                                    LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited

                                            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+

                                    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

                                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

                                    LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

                                    DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

                                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled

                                    LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-

                                             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

                                             Compliance De-emphasis: -6dB

                                    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

                                             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

                            Capabilities: [48] Vital Product Data

                                    Product Name: CX414A - ConnectX-4 QSFP28

                                    Read-only fields:

                                            [PN] Part number: MCX414A-GCAT

                                            [EC] Engineering changes: A6

                                            [SN] Serial number: MT1650K08395

                                            [V0] Vendor specific: PCIeGen3 x8

                                            [RV] Reserved: checksum good, 0 byte(s) reserved

                                    End

                            Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-

                                    Vector table: BAR=0 offset=00002000

                                    PBA: BAR=0 offset=00003000

                            Capabilities: [c0] Vendor Specific Information: Len=18 <?>

                            Capabilities: [40] Power Management version 3

                                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)

                                    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

                            Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

                                    ARICap: MFVC- ACS-, Next Function: 1

                                    ARICtl: MFVC- ACS-, Function Group: 0

                            Capabilities: [110 v1] Advanced Error Reporting

                                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                                    UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

                                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

                                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

                                    AERCap: First Error Pointer: 04, GenCap+ CGenEn- ChkCap+ ChkEn-

                            Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)

                                    IOVCap: Migration-, Interrupt Message Number: 000

                                    IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+

                                    IOVSta: Migration-

                                    Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00

                                    VF offset: 2, stride: 1, Device ID: 1014

                                    Supported Page Size: 000007ff, System Page Size: 00000001

                                    Region 0: Memory at 0000000000000000 (64-bit, prefetchable)

                                    VF Migration: offset: 00000000, BIR: 0

                            Capabilities: [1c0 v1] #19

                            Capabilities: [230 v1] Access Control Services

                                    ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                                    ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                            Kernel driver in use: mlx5_core

                            Kernel modules: mlx5_core

                     

                     

                    81:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

                            Subsystem: Mellanox Technologies Device 0039

                            Physical Slot: 4

                            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+

                            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

                            Latency: 0, Cache Line Size: 32 bytes

                            Interrupt: pin B routed to IRQ 127

                            NUMA node: 1

                            Region 0: Memory at f6000000 (64-bit, prefetchable) [size=32M]

                            Expansion ROM at fbd00000 [disabled] [size=1M]

                            Capabilities: [60] Express (v2) Endpoint, MSI 00

                                    DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited

                                            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W

                                    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

                                            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-

                                            MaxPayload 256 bytes, MaxReadReq 512 bytes

                                    DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

                                    LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited

                                            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+

                                    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

                                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

                                    LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

                                    DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

                                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled

                                    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-

                                             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

                            Capabilities: [48] Vital Product Data

                                    Product Name: CX414A - ConnectX-4 QSFP28

                                    Read-only fields:

                                            [PN] Part number: MCX414A-GCAT

                                            [EC] Engineering changes: A6

                                            [SN] Serial number: MT1650K08395

                                            [V0] Vendor specific: PCIeGen3 x8

                                            [RV] Reserved: checksum good, 0 byte(s) reserved

                                    End

                            Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-

                                    Vector table: BAR=0 offset=00002000

                                    PBA: BAR=0 offset=00003000

                            Capabilities: [c0] Vendor Specific Information: Len=18 <?>

                            Capabilities: [40] Power Management version 3

                                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)

                                    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

                            Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

                                    ARICap: MFVC- ACS-, Next Function: 0

                                    ARICtl: MFVC- ACS-, Function Group: 0

                            Capabilities: [110 v1] Advanced Error Reporting

                                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                                    UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

                                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

                                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

                                    AERCap: First Error Pointer: 04, GenCap+ CGenEn- ChkCap+ ChkEn-

                            Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)

                                    IOVCap: Migration-, Interrupt Message Number: 000

                                    IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-

                                    IOVSta: Migration-

                                    Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00

                                    VF offset: 9, stride: 1, Device ID: 1014

                                    Supported Page Size: 000007ff, System Page Size: 00000001

                                    Region 0: Memory at 0000000000000000 (64-bit, prefetchable)

                                    VF Migration: offset: 00000000, BIR: 0

                            Capabilities: [230 v1] Access Control Services

                                    ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                                    ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                            Kernel driver in use: mlx5_core

                            Kernel modules: mlx5_core
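Both functions in this output (81:00.0 and 81:00.1) report the same serial number, MT1650K08395, and the same physical slot, so they appear to be the two ports of one MCX414A-GCAT card sharing a single PCIe link negotiated at `Speed 8GT/s, Width x8`. The arithmetic below gives a rough per-direction ceiling for that link. It is a back-of-the-envelope sketch: the 24-byte per-TLP overhead is an assumed figure (it depends on header size and whether ECRC is used), and DLLP and flow-control traffic are ignored, so achievable throughput will be lower still. It may be one factor behind the roughly 50 Gbps ceiling reported above, though this thread does not confirm it.

```python
def pcie_gen3_raw_gbps(lanes: int, gt_per_s: float = 8.0) -> float:
    """Gen3 data rate per direction after 128b/130b line encoding, in Gbit/s."""
    return gt_per_s * lanes * 128 / 130


def payload_efficiency(max_payload: int, tlp_overhead: int = 24) -> float:
    """Share of each TLP that is payload; tlp_overhead (bytes) is an assumption."""
    return max_payload / (max_payload + tlp_overhead)


if __name__ == "__main__":
    raw = pcie_gen3_raw_gbps(lanes=8)          # LnkSta: Speed 8GT/s, Width x8
    eff = payload_efficiency(max_payload=256)  # DevCtl: MaxPayload 256 bytes
    ports = 2 * 50.0                           # two ports at Speed: 50000 Mbps
    print(f"PCIe raw rate per direction: {raw:.1f} Gbit/s")        # 63.0
    print(f"after assumed TLP overhead : {raw * eff:.1f} Gbit/s")  # 57.6
    print(f"combined Ethernet ports    : {ports:.1f} Gbit/s")      # 100.0
```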

                     

                     

                    Thanks.