HowTo Configure VXLAN with MLAG using Cumulus Linux

Version 8

    This post describes the traffic flow, failover behavior and configuration of VXLAN in a Multi-Chassis LAG (MLAG) deployment. The configuration, topology and behavior described in this document are based on Mellanox SN2000 series switches running Cumulus Linux software version 3.2 and above.

     



    Overview

    MLAG running on Mellanox Spectrum-based switches enables a loop-free, active-active, layer 2 network fabric that offers predictable low-latency switching while achieving maximum throughput and linear scalability. MLAG presents two physical switches as a single logical switch, creating an active-active scenario for traffic arriving from the south. Any type of server, or any third-party switch, can be connected south of the MLAG pair. With MLAG configured, all links between a server (or switch) and the MLAG pair forward traffic without creating a loop. For more information about MLAG, refer to the MLAG chapter on the Ethernet Switch Solutions page.

    Figure 1

    Virtual Extensible LAN (VXLAN) is a network virtualization technology that creates layer 2 overlay networks on top of a layer 3 fabric. It uses a VXLAN header encapsulation technique to encapsulate MAC-based, OSI layer 2 Ethernet frames to form the overlay network. The endpoints that terminate VXLAN tunnels (either virtual or physical switch ports) are known as VXLAN Tunnel End Points (VTEPs). The protocol is typically deployed as a data center technology, mainly for the following reasons:

    • Providing layer 2 connectivity between racks that are connected by a layer 3 fabric (leaf-spine architecture).
    • Connecting geographically dispersed data centers as a Data Center Interconnect (DCI) technology.
    • Multi-tenancy in a shared cloud environment via BYoIP (Bring Your own IP).
    • Scaling layer 2 segments from the existing 4096 VLANs to around 16M VXLAN segments.
    • VM mobility (live migration).
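    To make the encapsulation concrete, the following minimal Python sketch (illustrative only, not Cumulus code) builds the 8-byte VXLAN header defined by RFC 7348 in front of an inner Ethernet frame. A real VXLAN packet also carries outer Ethernet/IP/UDP headers, which are omitted here.

```python
import struct

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header from RFC 7348:
    flags byte 0x08 (VNI-valid), 24 reserved bits,
    the 24-bit VNI, then 8 more reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8) + inner_frame

# VLAN 10 traffic mapped to VNI 2000, as in the example topology below
packet = vxlan_encapsulate(b"\xaa" * 14, vni=2000)
assert packet[:8] == b"\x08\x00\x00\x00\x00\x07\xd0\x00"
```

    The 24-bit VNI field is what lifts the segment limit from 4096 VLAN IDs to roughly 16 million overlay segments.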

     

    Why combine VXLAN with MLAG?

    With a VXLAN deployment, the Mellanox switch acts as the hardware VTEP for the overlay networks. Since it is the tunnel endpoint, the VTEP must provide a resilient, layer 2 active-active topology similar to an MLAG deployment in a traditional VLAN-based layer 2 network. This is achieved by combining MLAG technology with the hardware VTEP functionality of the switch.

    The figure below shows a layer 3 leaf-spine topology deployment for a data center. Leaf-spine topologies are widely used today because they scale simply while delivering consistent throughput and latency for east-west traffic. In this example OSPF is deployed between the leaves and spines, but any dynamic routing protocol (such as BGP) could be used. To load-balance traffic across the two spine switches, Equal Cost Multi-Pathing (ECMP) is configured. The topology includes two servers: Host–1 is single-homed to Leaf–1, and Host–2 is dual-homed to Leaf–2 and Leaf–2A. Leaf–1 is not running MLAG; Leaf–2 and Leaf–2A run MLAG and act as an MLAG pair.

    Figure 2

    For layer 2 connectivity between Host–1 and Host–2, VXLAN is used as the overlay technology. On the Mellanox switches this means the leaves must be configured as VTEPs and the spines as service nodes. The spines act as service nodes because this example uses LNV (Lightweight Network Virtualization), which is essentially a controller-less VXLAN solution. On Leaf–1 the server is single-homed, so a configured loopback can be used as the VTEP; on Leaf–2 and Leaf–2A a logical VTEP must be configured on both leaves.

    Note – For VXLAN with LNV refer to HowTo Configure LNV VXLAN on Cumulus Linux.

    Figure 3

    On Leaf–2 and Leaf–2A, the logical VTEP in combination with MLAG provides an active-active VXLAN functionality.

    When a packet arrives from Host–2, it can be received by either Leaf–2 or Leaf–2A, because both switches are MLAG peers. The peers sync their MAC tables, and because both switches are part of the same logical VTEP, either one can encapsulate the packet and send it to the destination VTEP.

    Similarly, when Leaf–2 or Leaf–2A receives an encapsulated packet from any other VTEP, either switch can decapsulate it and, since the MACs are synced between the peers, forward the packet to Host–2.

    The figure above illustrates a microscopic view of the MLAG pair with VXLAN, showing the traffic flow from Host–2 to Host–1. In this configuration, Host–1 and Host–2 are both members of VLAN 10, and VLAN 10 is mapped to VNI 2000. The VTEP for Host–2 is 7.7.7.7 and the VTEP for Host–1 is 1.1.1.1.

    Figure 4

    VXLAN with MLAG Configuration

    The figure below shows the example topology with port numbers and subnets used to make this topology work.

    Figure 5

    Spine–1

    net add hostname Spine-1
    net add port 1-16 speed 100G
    net add lnv service-node anycast-ip 11.11.11.11
    net add lnv service-node source 2.2.2.2
    net add lnv service-node peers 2.2.2.2 5.5.5.5
    net add loopback lo address 2.2.2.2/32
    net add loopback lo address 11.11.11.11/32
    net add interface eth0 address dhcp
    net add interface swp1 address 12.12.12.2/24
    net add interface swp3 address 23.23.23.2/24
    net add interface swp12 address 24.24.24.2/24
    net add interface swp12 link-speed 40000
    net add interface swp1,3,12 ip ospf area 0.0.0.0
    net add router ospf
    net add ospf router-id 2.2.2.2

     

    Spine–2

    net add hostname Spine-2
    net add port 5 speed 4x10G
    net add port 1-4,7-32 speed 100G
    net add port 6 speed disabled
    net add lnv service-node anycast-ip 11.11.11.11
    net add lnv service-node source 5.5.5.5
    net add lnv service-node peers 2.2.2.2 5.5.5.5
    net add loopback lo address 5.5.5.5/32
    net add loopback lo address 11.11.11.11/32
    net add interface eth0 address dhcp
    net add interface swp2 address 27.27.27.1/24
    net add interface swp9 address 25.25.25.1/24
    net add interface swp24 address 28.28.28.1/24
    net add interface swp2,9,24 ip ospf area 0.0.0.0
    net add router ospf
    net add ospf router-id 5.5.5.5

     

    Leaf–1

    net add hostname Leaf-1
    net add port 1-16 speed 100G
    net add interface swp6
    net add loopback lo address 1.1.1.1/32
    net add loopback lo vxrd-src-ip 1.1.1.1
    net add loopback lo vxrd-svcnode-ip 11.11.11.11
    net add interface eth0 address dhcp
    net add interface swp1 address 12.12.12.1/24
    net add interface swp5 address 25.25.25.10/24
    net add interface swp16 bridge-access 10
    net add interface swp16 link-speed 40000
    net add bridge bridge-ports 'swp16 vni2000'
    net add bridge bridge-stp yes
    net add bridge bridge-vlan-aware yes
    net add bridge bridge-vids 10
    net add vlan-interface vlan10 address 10.10.10.1/24
    net add vlan-interface vlan10 vlan-id 10
    net add vlan-interface vlan10 vlan-raw-device bridge
    net add vni vni2000 bridge-access 10
    net add vni vni2000 mstpctl-bpduguard yes
    net add vni vni2000 mstpctl-portbpdufilter yes
    net add vni vni2000 vxlan-id 2000
    net add vni vni2000 vxlan-local-tunnelip 1.1.1.1
    net add interface swp1,5 ip ospf area 0.0.0.0
    net add router ospf
    net add ospf router-id 1.1.1.1

     

    Leaf–2

     

    net add hostname Leaf-2
    net add port 1-16 speed 100G
    net add interface swp10,16
    net add interface swp14-15 link-speed 40000
    net add vlan-interface vlan10,20 vlan-raw-device bridge
    net add bond bond11 bond-slaves swp16
    net add bond peerlink bond-slaves 'swp14 swp15'
    net add loopback lo address 3.3.3.3/24
    net add loopback lo clagd-vxlan-anycast-ip 7.7.7.7
    net add loopback lo vxrd-src-ip 3.3.3.3
    net add loopback lo vxrd-svcnode-ip 11.11.11.11
    net add interface eth0 address dhcp
    net add interface swp2 address 27.27.27.10/24
    net add interface swp3 address 23.23.23.3/24
    net add interface swp9 bridge-access 10
    net add bond bond11 bridge-access 10
    net add bond bond11 clag-id 1
    net add bridge bridge-ports 'bond11 peerlink swp9 vni2000'
    net add bridge bridge-vids 10,20
    net add bridge bridge-vlan-aware yes
    net add interface peerlink.4094 address 169.254.1.1/30
    net add interface peerlink.4094 clagd-backup-ip 10.20.4.51
    net add interface peerlink.4094 clagd-peer-ip 169.254.1.2
    net add interface peerlink.4094 clagd-sys-mac 44:38:39:FF:01:01
    net add vlan-interface vlan10 vlan-id 10
    net add vlan-interface vlan20 address 20.20.20.3/24
    net add vlan-interface vlan20 vlan-id 20
    net add vni vni2000 bridge-access 10
    net add vni vni2000 vxlan-id 2000
    net add vni vni2000 vxlan-local-tunnelip 3.3.3.3
    net add interface swp2-3 ip ospf area 0.0.0.0
    net add router ospf
    net add ospf router-id 3.3.3.3

     

     

    Leaf–2A

    net add hostname Leaf-2A
    net add port 1-32 speed 100G
    net add interface swp12,14-15 link-speed 40000
    net add bond bond11 bond-slaves swp16
    net add bond peerlink bond-slaves 'swp14 swp15'
    net add interface swp16
    net add interface swp12
    net add interface vlan10
    net add loopback lo address 8.8.8.8/32
    net add loopback lo clagd-vxlan-anycast-ip 7.7.7.7
    net add loopback lo vxrd-src-ip 8.8.8.8
    net add loopback lo vxrd-svcnode-ip 11.11.11.11
    net add interface eth0 address dhcp
    net add interface swp12 address 24.24.24.24/24
    net add interface swp24 address 28.28.28.10/24
    net add bond bond11 bridge-access 10
    net add bond bond11 clag-id 1
    net add bridge bridge-ports 'bond11 peerlink vni2000'
    net add bridge bridge-vids 10
    net add bridge bridge-vlan-aware yes
    net add interface peerlink.4094 address 169.254.1.2/30
    net add interface peerlink.4094 clagd-backup-ip 10.20.4.49
    net add interface peerlink.4094 clagd-peer-ip 169.254.1.1
    net add interface peerlink.4094 clagd-sys-mac 44:38:39:FF:01:01
    net add vni vni2000 bridge-access 10
    net add vni vni2000 mstpctl-bpduguard yes
    net add vni vni2000 mstpctl-portbpdufilter yes
    net add vni vni2000 vxlan-id 2000
    net add vni vni2000 vxlan-local-tunnelip 8.8.8.8
    net add interface swp12,24 ip ospf area 0.0.0.0
    net add router ospf
    net add ospf router-id 8.8.8.8
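    Note that `net add` commands only stage configuration changes; on Cumulus Linux they take effect after a commit. A typical finishing sequence on each switch looks like this:

```shell
# Review the staged (uncommitted) configuration changes
net pending
# Apply the staged changes
net commit
```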

     

    With the logical VTEP's loopback address advertised by OSPF and ECMP enabled, Leaf–2 and Leaf–2A learn a path to the remote VTEP 1.1.1.1 via both spines, and vice versa, Leaf–1 learns the path to the logical VTEP 7.7.7.7 via both spines. When VXLAN traffic is encapsulated and routed between the two VTEPs, it is load-balanced on a per-flow basis between the two spine switches.
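    The per-flow behavior can be sketched as follows: the forwarding plane hashes a flow's 5-tuple onto one of the equal-cost next hops, so all packets of one flow take the same path while different flows spread across the spines. This toy Python model is illustrative only; real switch ASICs use their own hash functions and seeds.

```python
import zlib

def ecmp_next_hop(flow, next_hops):
    """Hash a 5-tuple flow onto one of the equal-cost next hops.
    The same flow always maps to the same path; distinct flows
    spread (statistically) across all paths."""
    key = "|".join(map(str, flow)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Leaf-2's two equal-cost paths toward remote VTEP 1.1.1.1
spines = ["23.23.23.2", "27.27.27.1"]
# (src IP, dst IP, proto, src port, VXLAN UDP port 4789)
flow = ("7.7.7.7", "1.1.1.1", 17, 49152, 4789)
path = ecmp_next_hop(flow, spines)
assert path in spines and ecmp_next_hop(flow, spines) == path
```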

    The following routing tables for each switch in the topology show ECMP in effect:

    Spine1 routes:

    cumulus@spine-1:~$ net show route

    show ip route
    =============
    Codes: K - kernel route, C - connected, S - static, R - RIP,
           O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
           V - VPN,
           > - selected route, * - FIB route

    K>* 0.0.0.0/0 via 10.20.0.251, eth0
    O>* 1.1.1.1/32 [110/10] via 12.12.12.1, swp1, 00:13:18
    O   2.2.2.2/32 [110/0] is directly connected, lo, 00:14:54
    C>* 2.2.2.2/32 is directly connected, lo
    O>* 3.3.3.3/32 [110/10] via 23.23.23.3, swp3, 00:13:19
    O>* 5.5.5.5/32 [110/20] via 12.12.12.1, swp1, 00:11:47
      *                     via 23.23.23.3, swp3, 00:11:47
      *                     via 24.24.24.24, swp12, 00:11:47
    O>* 7.7.7.7/32 [110/10] via 23.23.23.3, swp3, 00:12:24
      *                     via 24.24.24.24, swp12, 00:12:24
    O>* 8.8.8.8/32 [110/10] via 24.24.24.24, swp12, 00:12:24
    C>* 10.20.0.0/16 is directly connected, eth0
    O   11.11.11.11/32 [110/0] is directly connected, lo, 00:07:51
    C>* 11.11.11.11/32 is directly connected, lo
    O   12.12.12.0/24 [110/10] is directly connected, swp1, 00:14:13
    C>* 12.12.12.0/24 is directly connected, swp1
    O   23.23.23.0/24 [110/10] is directly connected, swp3, 00:13:29
    C>* 23.23.23.0/24 is directly connected, swp3
    O   24.24.24.0/24 [110/10] is directly connected, swp12, 00:12:34
    C>* 24.24.24.0/24 is directly connected, swp12
    O>* 25.25.25.0/24 [110/20] via 12.12.12.1, swp1, 00:12:37
    O>* 27.27.27.0/24 [110/20] via 23.23.23.3, swp3, 00:12:40
    O>* 28.28.28.0/24 [110/20] via 24.24.24.24, swp12, 00:12:24

     

    Spine2 routes:

    cumulus@spine-2:~$ net show route

    show ip route
    =============
    Codes: K - kernel route, C - connected, S - static, R - RIP,
           O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
           V - VPN,
           > - selected route, * - FIB route

    K>* 0.0.0.0/0 via 10.20.0.251, eth0
    O>* 1.1.1.1/32 [110/10] via 25.25.25.10, swp9, 00:13:13
    O>* 2.2.2.2/32 [110/20] via 27.27.27.10, swp2, 00:13:13
      *                     via 28.28.28.10, swp24, 00:13:13
      *                     via 25.25.25.10, swp9, 00:13:13
    O>* 3.3.3.3/32 [110/10] via 27.27.27.10, swp2, 00:13:14
    O   5.5.5.5/32 [110/0] is directly connected, lo, 00:14:05
    C>* 5.5.5.5/32 is directly connected, lo
    O>* 7.7.7.7/32 [110/10] via 27.27.27.10, swp2, 00:13:13
      *                     via 28.28.28.10, swp24, 00:13:13
    O>* 8.8.8.8/32 [110/10] via 28.28.28.10, swp24, 00:13:13
    C>* 10.20.0.0/16 is directly connected, eth0
    O   11.11.11.11/32 [110/0] is directly connected, lo, 00:09:14
    C>* 11.11.11.11/32 is directly connected, lo
    O>* 12.12.12.0/24 [110/20] via 25.25.25.10, swp9, 00:13:13
    O>* 23.23.23.0/24 [110/20] via 27.27.27.10, swp2, 00:13:14
    O>* 24.24.24.0/24 [110/20] via 28.28.28.10, swp24, 00:13:13
    O   25.25.25.0/24 [110/10] is directly connected, swp9, 00:13:57
    C>* 25.25.25.0/24 is directly connected, swp9
    O   27.27.27.0/24 [110/10] is directly connected, swp2, 00:14:00
    C>* 27.27.27.0/24 is directly connected, swp2
    O   28.28.28.0/24 [110/10] is directly connected, swp24, 00:13:13
    C>* 28.28.28.0/24 is directly connected, swp24

     

    Leaf2 routes:

    cumulus@Leaf-2:~$ net show route

    show ip route
    =============
    Codes: K - kernel route, C - connected, S - static, R - RIP,
           O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
           V - VPN,
           > - selected route, * - FIB route

    K>* 0.0.0.0/0 via 10.20.0.251, eth0
    O>* 1.1.1.1/32 [110/20] via 23.23.23.2, swp3, 00:13:59
      *                     via 27.27.27.1, swp2, 00:13:59
    O>* 2.2.2.2/32 [110/10] via 23.23.23.2, swp3, 00:15:31
    C>* 3.3.3.0/24 is directly connected, lo
    O>* 3.3.3.3/32 [110/0] is directly connected, lo, 00:16:28
    O>* 5.5.5.5/32 [110/10] via 27.27.27.1, swp2, 00:14:10
    O   7.7.7.7/32 [110/0] is directly connected, lo, 00:15:20
    C>* 7.7.7.7/32 is directly connected, lo
    O>* 8.8.8.8/32 [110/20] via 23.23.23.2, swp3, 00:14:05
      *                     via 27.27.27.1, swp2, 00:14:05
    C>* 10.20.0.0/16 is directly connected, eth0
    O>* 11.11.11.11/32 [110/10] via 23.23.23.2, swp3, 00:10:03
      *                         via 27.27.27.1, swp2, 00:10:03
    O>* 12.12.12.0/24 [110/20] via 23.23.23.2, swp3, 00:15:31
    C>* 20.20.20.0/24 is directly connected, vlan20
    O   23.23.23.0/24 [110/10] is directly connected, swp3, 00:16:21
    C>* 23.23.23.0/24 is directly connected, swp3
    O>* 24.24.24.0/24 [110/20] via 23.23.23.2, swp3, 00:15:27
    O>* 25.25.25.0/24 [110/20] via 27.27.27.1, swp2, 00:14:10
    O   27.27.27.0/24 [110/10] is directly connected, swp2, 00:14:12
    C>* 27.27.27.0/24 is directly connected, swp2
    O>* 28.28.28.0/24 [110/20] via 27.27.27.1, swp2, 00:14:05
    C>* 169.254.1.0/30 is directly connected, peerlink.4094

     

    Leaf2A routes:

    cumulus@Leaf-2A:~$ net show route

    show ip route
    =============
    Codes: K - kernel route, C - connected, S - static, R - RIP,
           O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
           V - VPN,
           > - selected route, * - FIB route

    K>* 0.0.0.0/0 via 10.20.0.251, eth0
    O>* 1.1.1.1/32 [110/20] via 24.24.24.2, swp12, 00:15:58
      *                     via 28.28.28.1, swp24, 00:15:58
    O>* 2.2.2.2/32 [110/10] via 24.24.24.2, swp12, 00:16:36
    O>* 3.3.3.3/32 [110/20] via 24.24.24.2, swp12, 00:16:04
      *                     via 28.28.28.1, swp24, 00:16:04
    O>* 5.5.5.5/32 [110/10] via 28.28.28.1, swp24, 00:16:04
    O   7.7.7.7/32 [110/0] is directly connected, lo, 00:17:18
    C>* 7.7.7.7/32 is directly connected, lo
    O   8.8.8.8/32 [110/0] is directly connected, lo, 00:17:31
    C>* 8.8.8.8/32 is directly connected, lo
    C>* 10.20.0.0/16 is directly connected, eth0
    O>* 11.11.11.11/32 [110/10] via 24.24.24.2, swp12, 00:12:02
      *                         via 28.28.28.1, swp24, 00:12:02
    O>* 12.12.12.0/24 [110/20] via 24.24.24.2, swp12, 00:16:36
    O>* 23.23.23.0/24 [110/20] via 24.24.24.2, swp12, 00:16:36
    O   24.24.24.0/24 [110/10] is directly connected, swp12, 00:17:25
    C>* 24.24.24.0/24 is directly connected, swp12
    O>* 25.25.25.0/24 [110/20] via 28.28.28.1, swp24, 00:16:04
    O>* 27.27.27.0/24 [110/20] via 28.28.28.1, swp24, 00:16:04
    O   28.28.28.0/24 [110/10] is directly connected, swp24, 00:16:49
    C>* 28.28.28.0/24 is directly connected, swp24
    C>* 169.254.1.0/30 is directly connected, peerlink.4094

     

    Traffic Forwarding Behavior

    The following figure illustrates the forwarding behavior of traffic from the logical VTEP to a remote VTEP for an ARP request. The example shows the traffic flow from Host–2 to Host–1 using VXLAN. In this configuration, Host–1 and Host–2 are both members of VLAN 10, and VLAN 10 is mapped to VNI 2000. Only Host–2 is dual-homed; Host–1 is single-homed.

    Figure 6

    The figure above shows how Host–2 discovers Host–1, where both servers reside on the same layer 2 subnet 10.10.10.0/24 but are physically located in different racks:

    1. Due to the load-balancing algorithm of Host–2's port-channel, the initial ARP request for Host–1 is received by Leaf–2. As a broadcast frame, the ARP packet is flooded to all local ports on Leaf–2 that are members of VLAN 10.

     

    2. Per standard MLAG behavior, the frame is also flooded across the MLAG peer link for reception by any single-homed devices on Leaf–2A. The MAC address of Host–2 is synchronized to Leaf–2A across the peer link, allowing Leaf–2A to learn the MAC of Host–2 on its local member port of MLAG port-channel-11.

     

    3. Since VLAN 10 is mapped to VNI 2000, the ARP frame is also flooded to all VTEPs that are members of the VNI (defined by the VTEP's configured flood list), which in this example is 1.1.1.1, the VTEP for Host–1. The ARP frame is VXLAN encapsulated by Leaf–2 with a source IP address of the logical VTEP (7.7.7.7) and a destination IP address of the VTEP for Host–1 (1.1.1.1).

     

    4. Leaf–2 has two potential paths to VTEP 1.1.1.1, one via each spine switch. For a given flow, the ECMP hashing algorithm picks one of the two paths; in this case assume the frame arrives at Spine–1.

     

    5. Spine–1 subsequently routes the VXLAN frame to Leaf–1.

     

    6. On receiving the frame, Leaf–1 removes the VXLAN header and in the process learns the MAC address of Host–2 as residing behind the logical VTEP of Host–2 (i.e. 7.7.7.7).

     

    7. This remote MAC entry would also have been added on the MLAG peer, had Host–1's rack been served by an MLAG pair.
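    The flooding decision in steps 1–3 can be sketched as a small Python model (a toy illustration, not the switch's actual implementation): a broadcast (BUM) frame received on one port is replicated to every other local VLAN member, to the MLAG peer link, and as one VXLAN-encapsulated copy per remote VTEP in the VNI's flood list.

```python
def flood_targets(in_port, local_ports, peerlink, remote_vteps):
    """Return where a VTEP replicates a broadcast frame received on
    in_port: every other local port in the VLAN, the MLAG peer link,
    and one encapsulated copy per remote VTEP (head-end replication)."""
    local = [p for p in local_ports if p != in_port]  # never back out the ingress port
    return local + [peerlink] + [("vxlan", vtep) for vtep in remote_vteps]

# ARP from Host-2 arriving on bond11 at Leaf-2 (VLAN 10 -> VNI 2000)
targets = flood_targets("bond11", ["bond11", "swp9"], "peerlink", ["1.1.1.1"])
assert targets == ["swp9", "peerlink", ("vxlan", "1.1.1.1")]
```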

    Figure 7

    Figure 7 shows how Host–1's response is sent back to Host–2:

    1. On receiving the ARP request, Host–1 responds with a unicast ARP reply directed to the MAC of Host–2.

     

    2. From the initial ARP request, Leaf–1 has learned the MAC of Host–2 as residing behind the logical VTEP (7.7.7.7). Had Leaf–1 been paired with an MLAG peer, this MAC would have been synced to the peer switch; hence, regardless of which port-channel link Host–1 hashed the ARP reply onto, either MLAG switch would have a MAC entry for Host–2.

     

    3. Leaf–1, receiving the ARP reply from Host–1, VXLAN encapsulates the frame with a destination IP address of the logical VTEP of Host–2 (7.7.7.7). Due to the ECMP hashing in this example, the frame is routed to Spine–1.

     

    4. Spine–1 subsequently routes the VXLAN frame to Leaf–2. Note that Spine–1 has two paths to 7.7.7.7 (via Leaf–2 and Leaf–2A); the path chosen is based on the result of Spine–1's ECMP hashing algorithm for the flow.

     

    5. On receiving the VXLAN encapsulated frame, Leaf–2 learns the MAC of Host–1 behind the VTEP of Host–1 (1.1.1.1); this information is synchronized across the MLAG peer link to Leaf–2A.

     

    6. The VXLAN frame is then decapsulated and forwarded down the local link of port-channel-11 where the MAC of Host–2 was learned.
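    The MAC learning in these steps follows ordinary bridge behavior, extended so that a source can be learned against a remote VTEP instead of a local port. A minimal Python sketch (illustrative only; MAC addresses taken from the outputs below):

```python
def learn_and_forward(mac_table, src_mac, dst_mac, learned_from):
    """On each frame: learn the source MAC against the port or remote
    VTEP it arrived from, then look up the destination MAC.
    Returns the destination entry, or None (meaning: flood)."""
    mac_table[src_mac] = learned_from
    return mac_table.get(dst_mac)

table = {}
# Step 5: decapsulating Leaf-2 learns Host-1's MAC behind VTEP 1.1.1.1
learn_and_forward(table, "e4:1d:2d:37:48:88", "ff:ff:ff:ff:ff:ff",
                  ("vtep", "1.1.1.1"))
# Host-2's MAC was learned earlier on the local MLAG bond
table["7c:fe:90:f2:34:c1"] = ("port", "bond11")
# A later frame from Host-2 to Host-1 is now encapsulated, not flooded
assert learn_and_forward(table, "7c:fe:90:f2:34:c1", "e4:1d:2d:37:48:88",
                         ("port", "bond11")) == ("vtep", "1.1.1.1")
```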

     

    The following output shows the MACs learned on the leaves after the operations shown in the figures above:

    Leaf1

    cumulus@Leaf-1:~$ net show bridge macs

    VLAN      Master  Interface  MAC                TunnelDest  State      Flags  LastSeen
    --------  ------  ---------  -----------------  ----------  ---------  -----  -----------------
    10        bridge  bridge     26:fd:18:86:19:a8                                3 days, 10:11:35
    10        bridge  swp16      e4:1d:2d:37:48:88                                00:00:01
    10        bridge  vni2000    7c:fe:90:f2:34:c1                                00:00:01
    untagged          vni2000    00:00:00:00:00:00  7.7.7.7     permanent  self   3 days, 10:10:32
    untagged          vni2000    7c:fe:90:f2:34:c1  7.7.7.7                self   00:00:01
    untagged  bridge  swp16      7c:fe:90:fc:7a:f8              permanent         3 days, 10:11:36
    untagged  bridge  vni2000    26:fd:18:86:19:a8              permanent         3 days, 10:11:36
    cumulus@Leaf-1:~$

     

    Leaf2

    cumulus@Leaf-2:~$ net show bridge macs

    VLAN      Master  Interface  MAC                TunnelDest  State      Flags  LastSeen
    --------  ------  ---------  -----------------  ----------  ---------  -----  ----------------
    10        bridge  bond11     7c:fe:90:f2:34:c1                                00:00:09
    10        bridge  bridge     2a:e2:59:64:f4:25              permanent         1 day, 22:44:30
    10        bridge  vni2000    e4:1d:2d:37:48:88                                00:00:09
    20        bridge  bridge     2a:e2:59:64:f4:25              permanent         1 day, 22:44:30
    untagged          vni2000    00:00:00:00:00:00  1.1.1.1     permanent  self   1 day, 22:43:27
    untagged          vni2000    e4:1d:2d:37:48:88  1.1.1.1                self   02:27:58
    untagged  bridge  bond11     7c:fe:90:fc:7b:f8              permanent         1 day, 22:44:30
    untagged  bridge  peerlink   7c:fe:90:fc:7b:f0              permanent         1 day, 22:44:30
    untagged  bridge  swp9       7c:fe:90:fc:7b:e4              permanent         1 day, 21:30:42
    untagged  bridge  swp10      7c:fe:90:fc:7b:e0              permanent         1 day, 21:30:42
    untagged  bridge  vni2000    2a:e2:59:64:f4:25              permanent         1 day, 22:44:30
    cumulus@Leaf-2:~$

     

    Traffic Failover Behavior

    Traffic forwarding during a failure follows standard MLAG behavior. As shown in the figure below, if a link of the Host–2 port-channel fails (swp16 of Leaf–2 in this example), traffic is forwarded across one of the remaining active links of the port-channel (swp16 of Leaf–2A). When Leaf–2A receives a frame from Host–2 destined for Host–1, it VXLAN encapsulates the frame and routes it over the IP fabric via its own local routing. If the returning traffic lands on Leaf–2 due to the spine's ECMP hash, Leaf–2 decapsulates the frame and, based on its local MAC table, switches it across the peer link to be forwarded to Host–2 via Leaf–2A's local swp16 interface.

    Figure 8

    The first VTEP switch to receive a frame always performs the VXLAN encapsulation or decapsulation. Any subsequent processing is based on that switch's local MAC table (decapsulate and switch) or routing table (encapsulate and route).

    In the event that a leaf switch loses its uplinks to the spine, it is recommended to enable a routing protocol across the peer link so that traffic can still be routed to the spine layer. This concept is illustrated below: switch Leaf–2 has both of its links to the spine inactive, so traffic received on swp16 of Leaf–2 destined for the VTEP of Host–1 is VXLAN encapsulated with a destination IP address of 1.1.1.1. As the local links to the spine are inactive, Leaf–2 learns a route to 1.1.1.1 via the routing protocol running across the peer link, forcing the traffic to be routed across the MLAG peer link and forwarded to the spine based on the routing table of Leaf–2A.

    Figure 9
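    Following the same NCLU syntax used in the configurations above, extending OSPF onto the peer link subinterface might look like this on each MLAG peer (a sketch; verify the exact syntax against your Cumulus Linux release):

```shell
# Run OSPF across the MLAG peer link so the spine stays reachable
# if all local uplinks fail (apply on both Leaf-2 and Leaf-2A)
net add interface peerlink.4094 ip ospf area 0.0.0.0
net commit
```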