2 Replies Latest reply on Oct 6, 2017 12:19 PM by rsmith

    LAG problems

    rsmith

      My team is currently standing up a new cluster that has an SN2700 core Ethernet switch on our boot network.  LAG links are working fine between this core and the leaf switches in the new cluster.  We also have an older cluster with an SX1036 Ethernet switch serving as its core switch.  LAG links are also working fine between this older core switch and the older leaf switches in that cluster.  Several of us have tried to get LAG working between the SX1036 and SN2700, and we can't get a working link (a single link works fine).  We've done typical troubleshooting, looking for bad cables, bad ports, etc.  We can find no differences when comparing the configurations and status of the working LAG links and the failing link.

       

      The SX1036 is a PPC switch and is running a much older firmware:

       

      Product name:      MLNX-OS

      Product release:   3.4.3002

      Build ID:          #1-dev

      Build date:        2015-07-30 20:13:15

      Target arch:       ppc

      Target hw:         m460ex

      Built by:          jenkins@fit74

      Version summary:   PPC_M460EX 3.4.3002 2015-07-30 20:13:15 ppc

       

      Product model:     ppc

       

      than the SN2700 (X86):

       

      Product name:      MLNX-OS

      Product release:   3.6.3200

      Build ID:          #1-dev

      Build date:        2017-03-09 17:55:58

      Target arch:       x86_64

      Target hw:         x86_64

      Built by:          jenkins@e3f42965d5ee

      Version summary:   X86_64 3.6.3200 2017-03-09 17:55:58 x86_64

       

      Product model:     x86onie

       

      The obvious thing to try is updating the firmware on the SX1036, but this cluster is in production and our team is nervous about messing with that core switch, as it's pretty critical to our infrastructure.  Would a firmware mismatch cause this behavior?

       

      I have seen documentation indicating that MLAG doesn't work between PPC and X86 switches.  I sure hope that's not the case for LAG...

        • Re: LAG problems
          khwaja

          Hi Rick,

           

          LAG should work fine between the SX1036 and the SN2700 switch. Only MLAG has the limitation that the CPU architecture must match on both switches.

          Can you please verify your configs?

          Is this a regular LACP port channel between the two switches?

          What is the status of the second port that you are bundling into the LACP port channel? Is it up, down, or suspended?

          Please share the details.

           

          Thanks

          Khwaja

            • Re: LAG problems
              rsmith

              We're not using LACP.  We actually got it working by changing the port mode to "hybrid" instead of "trunk".  All of our other LAG links work fine in trunk mode.  We figure there's a misconfiguration somewhere in our system causing this, but we have a bunch of switches running at this point.  The hybrid workaround has bumped this pretty low on the priority queue, particularly as any debugging would likely bring down a link critical to production work.  But I'm all ears if somebody has an idea why we have this issue.  Thanks.
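              For anyone chasing a similar misconfiguration without touching the production link, one option is to compare saved configuration dumps offline. The sketch below is only an illustration: the function names (`normalize`, `config_diff`) are hypothetical, and the sample config lines are made up for the example rather than copied from either switch, so they may not match real MLNX-OS output. It keeps only the lines that mention a given port channel, replaces the port name with a placeholder, and reports the lines that appear on one side only.

```python
import re


def normalize(config_text, port_name, placeholder="PORT"):
    """Return the set of config lines that mention port_name,
    with the port name replaced by a placeholder."""
    # (?!\d) stops "port-channel 1" from also matching "port-channel 10".
    pattern = re.compile(re.escape(port_name) + r"(?!\d)")
    lines = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if line and pattern.search(line):
            lines.add(pattern.sub(placeholder, line))
    return lines


def config_diff(working_text, working_port, failing_text, failing_port):
    """Return (only_in_working, only_in_failing) as sorted lists."""
    working = normalize(working_text, working_port)
    failing = normalize(failing_text, failing_port)
    return sorted(working - failing), sorted(failing - working)


# Made-up sample lines, for illustration only (not real switch output).
WORKING = """
interface port-channel 1 switchport mode trunk
interface port-channel 1 mtu 9216
interface port-channel 10 mtu 1500
"""
FAILING = """
interface port-channel 2 switchport mode hybrid
interface port-channel 2 mtu 9216
"""

only_working, only_failing = config_diff(
    WORKING, "port-channel 1", FAILING, "port-channel 2"
)
print("only in working:", only_working)
print("only in failing:", only_failing)
```

              Running the same comparison against dumps from both ends of the link (SX1036 side and SN2700 side) might surface a difference that is easy to miss by eye, such as a switchport mode or allowed-VLAN setting that differs on one end only.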