0 Replies Latest reply on Sep 12, 2017 2:18 AM by rsmith

    LAG problems


      My team is currently standing up a new cluster that has an SN2700 core Ethernet switch on our boot network.  LAG links are working fine between this core and the leaf switches in the new cluster.  We also have an older cluster with an SX1036 Ethernet switch serving as its core switch.  LAG links are also working fine between this older core switch and the older leaf switches in that cluster.  Several of us have tried to get LAG working between the SX1036 and the SN2700, but we can't get a working link (a single link works fine).  We've done the typical troubleshooting, looking for bad cables, bad ports, etc.  Comparing the configuration and status of the working LAG links against the failing link, we can find no differences.
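
      For anyone who wants to repeat the comparison, here is a minimal sketch of how a working and a failing LAG stanza can be diffed once the port and channel numbers are masked out.  The stanza text below is an illustrative placeholder, not our actual configuration, and the helper names (normalize, diff_lag_configs) are made up for this example.

```python
import difflib
import re


def normalize(config_text):
    """Mask port and channel numbers so two LAG stanzas compare line by line."""
    lines = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Replace the LAG id, e.g. "port-channel 2" -> "port-channel <N>".
        line = re.sub(r"(port-channel|channel-group) \d+", r"\1 <N>", line)
        # Replace the physical port id, e.g. "ethernet 1/2" -> "ethernet <P>".
        line = re.sub(r"ethernet \d+/\d+", "ethernet <P>", line)
        lines.append(line)
    return lines


def diff_lag_configs(working, failing):
    """Return only the added/removed lines between two normalized stanzas."""
    diff = difflib.unified_diff(
        working, failing, "working", "failing", lineterm="", n=0
    )
    return [
        d for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    ]


# Placeholder stanzas (assumed syntax, for illustration only).
working = normalize("""
interface port-channel 1 mtu 9216
interface port-channel 1 switchport mode trunk
interface ethernet 1/1 channel-group 1 mode active
""")
failing = normalize("""
interface port-channel 2 mtu 9216
interface port-channel 2 switchport mode trunk
interface ethernet 1/2 channel-group 2 mode on
""")

for line in diff_lag_configs(working, failing):
    print(line)
```

      An empty result means the two stanzas match apart from the port and channel numbers, which is what we see on our switches.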


      The SX1036 is a PPC switch and is running a much older firmware:


      Product name:      MLNX-OS

      Product release:   3.4.3002

      Build ID:          #1-dev

      Build date:        2015-07-30 20:13:15

      Target arch:       ppc

      Target hw:         m460ex

      Built by:          jenkins@fit74

      Version summary:   PPC_M460EX 3.4.3002 2015-07-30 20:13:15 ppc


      Product model:     ppc


      than the SN2700 (X86):


      Product name:      MLNX-OS

      Product release:   3.6.3200

      Build ID:          #1-dev

      Build date:        2017-03-09 17:55:58

      Target arch:       x86_64

      Target hw:         x86_64

      Built by:          jenkins@e3f42965d5ee

      Version summary:   X86_64 3.6.3200 2017-03-09 17:55:58 x86_64


      Product model:     x86onie
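
      To make the gap between the two releases explicit, here is a small sketch that pulls the "Product release" field out of each version listing above and compares the two as numeric tuples.  The function name parse_release is hypothetical, and only the fields needed for the comparison are reproduced.

```python
def parse_release(version_text):
    """Return the 'Product release' value as a tuple of ints, e.g. (3, 4, 3002)."""
    for line in version_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Product release":
            return tuple(int(part) for part in value.strip().split("."))
    raise ValueError("no 'Product release' line found")


# Excerpts of the two listings shown above.
sx1036_version = """
Product name:      MLNX-OS
Product release:   3.4.3002
Build date:        2015-07-30 20:13:15
"""
sn2700_version = """
Product name:      MLNX-OS
Product release:   3.6.3200
Build date:        2017-03-09 17:55:58
"""

sx1036_release = parse_release(sx1036_version)
sn2700_release = parse_release(sn2700_version)

# Tuples compare element by element, so this is a numeric comparison.
print("SX1036 is older:", sx1036_release < sn2700_release)
```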


      The obvious thing to try is updating the firmware on the SX1036, but this cluster is in production and our team is nervous about touching that core switch, as it's critical to our infrastructure.  Would a firmware mismatch cause this behavior?


      I have seen documentation indicating that MLAG doesn't work between PPC and X86 switches.  I sure hope that's not the case for LAG...