HowTo Configure MLAG on Mellanox Switches

Version 55

    This post describes how to configure MLAG (Multi-chassis LAG) in MLNX-OS on Mellanox switch systems.

     

    References

     

    Setup

     

    [Figure: setup diagram (14.png)]

    Terminology

    • IPL (Inter Peer Link): The link between the two switches. The IPL is required; it carries control traffic and may carry data traffic in case of port failures. Its most important role is transmitting keepalives between the switches so that each switch knows the other is still present. In addition, all MAC sync messages, IGMP group sync and other database sync messages are sent across this link. It is therefore critical to enable flow control on this link, so that the control traffic still gets through even under heavy congestion.
    • MLAG Cluster: To actively participate in an MLAG, two switches must belong to the same MLAG domain. The management port serves as a fallback mechanism in case of IPL loss or failure.
    • V-SID (Virtual System-ID): Essentially a MAC address used in the LACP PDU frames. The PDU has a system-identifier field, which tells a switch the ID of the remote switch it is talking to. With MLAG, the two switches need to use the same SID so that the remote partner believes it is talking to one switch. This is a challenge because the SID needs to be unique and is usually a system MAC that can be owned by only one system. Two switches could use the SID belonging to one of them, but if that switch reboots or becomes unavailable, the remaining switch needs to pick a new MAC address (its own SID), which causes the LACP link to be renegotiated. The remaining active link then flaps, defeating the purpose of MLAG. The VSID is a made-up MAC address that both switches calculate based on some configuration parameters. It is not controlled by the user in our implementation. The advantage of the VSID is that each system can claim it, and if one switch goes away, the other switch continues to use it.
    • MLAG Interface: This is the LAG interface that is split between the two switches. In this example there are two MLAG interfaces, one for each host.

     

    To run MLAG you need MLNX-OS version 3.3.5000 or later. However, it is recommended to upgrade to the most recent MLNX-OS version.

    Note: It is recommended to use two similar switches. Mixing CPU architectures is not possible; for example, an SX1036 (PPC) cannot be paired with an SX1710 (x86).

     

    Switch Configuration

     

    Note: Before you start, make sure that both switches run the same software version (run "show version" to verify). In addition, it is advised to upgrade both switches to the same latest MLNX-OS software release.

     

    General

    • Enable LACP. This is required for the IPL.
    • Disable spanning tree (STP). When using MLAG on the switch, there are no loops. Either disable STP globally on the switch or, if you need spanning tree on other interfaces of the switch, at least disable it on the MLAG interfaces. In most cases, STP is not needed when MLAG is configured.
    • Enable IP routing.
    • Enable MLAG protocol.
    • Enable QoS globally

     

     

    Run the following commands on both switches:

     

    sx01 (config) # lacp

    sx01 (config) # no spanning-tree

    sx01 (config) # ip routing

    sx01 (config) # protocol mlag

    sx01 (config) # dcb priority-flow-control enable force

                  

     

     

    IPL

    IPL is configured over LAG (port-channel). For high availability, it is recommended to have more than one physical link within this LAG.

     

    Properties:

    • The IPL MTU defaults to jumbo frames (9126) and is not configurable.
    • All VLANs are open on this port. No configuration is needed: once an interface is mapped as an IPL, all VLANs are opened on it.

    In this example, ports 1/35 and 1/36 are used for the IPL connectivity between the switches.

    The control traffic for the MLAG is sent over the IPL via an L3 interface (interface VLAN). It is recommended to use a VLAN ID that is not used within the subnet (4000 in this example) to avoid mixing the host traffic with the control traffic on this interface.

    Note: The IPL link may pass traffic upon MLAG port failures, but not under normal circumstances (when all ports are in UP state).

    Run the following commands on both switches:

    sx01 (config) # interface port-channel 1

    sx01 (config interface port-channel 1 ) # exit

    sx01 (config) # interface ethernet 1/35 channel-group 1 mode active

    sx01 (config) # interface ethernet 1/36 channel-group 1 mode active

    sx01 (config) # vlan 4000

    sx01 (config vlan 4000) # exit

    sx01 (config) # interface vlan 4000

    sx01 (config interface vlan 4000 ) # exit

    sx01 (config) # interface port-channel 1 ipl 1
    sx01 (config) # interface port-channel 1 dcb priority-flow-control mode on force

                  

     

    Configure IP address for the IPL link on both switches:

     

    Note: The IPL IP address should not be part of the management network. It can be any IP address and subnet that is not in use in the network. This address is not advertised outside the switch.


    Configure the following on one switch (e.g. sx01):

     

    sx01 (config) # interface vlan 4000

    sx01 (config interface vlan 4000) # ip address 10.10.10.1 255.255.255.0

    sx01 (config interface vlan 4000) # ipl 1 peer-address 10.10.10.2                    

    Configure the following on the second switch (e.g. sx02):

    sx02 (config) # interface vlan 4000

    sx02 (config interface vlan 4000) # ip address 10.10.10.2 255.255.255.0

    sx02 (config interface vlan 4000) # ipl 1 peer-address 10.10.10.1                
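The addressing rules above (both ends in one subnet, distinct addresses, no overlap with the management network) can be checked before typing the commands. The following is a minimal Python sketch, not part of MLNX-OS; the function name `check_ipl_addresses` is hypothetical, and the management network 10.209.28.0/24 is assumed from the mgmt0 addresses shown later in this post.

```python
import ipaddress


def check_ipl_addresses(local, peer, netmask, mgmt_network):
    """Return a list of problems found with an IPL address pair (empty if none)."""
    problems = []
    local_if = ipaddress.ip_interface(f"{local}/{netmask}")
    peer_ip = ipaddress.ip_address(peer)

    # Each switch needs its own address on the IPL VLAN interface.
    if local_if.ip == peer_ip:
        problems.append("local and peer addresses are identical")
    # The peer-address must be reachable in the same subnet.
    if peer_ip not in local_if.network:
        problems.append("peer address is outside the local IPL subnet")
    # The IPL subnet should not be part of the management network.
    if local_if.network.overlaps(ipaddress.ip_network(mgmt_network)):
        problems.append("IPL subnet overlaps the management network")
    return problems


if __name__ == "__main__":
    # Values from the example: sx01 = 10.10.10.1, sx02 = 10.10.10.2.
    print(check_ipl_addresses("10.10.10.1", "10.10.10.2",
                              "255.255.255.0", "10.209.28.0/24"))
```

An empty list means the pair passes these three checks; it does not prove that the subnet is unused elsewhere in the network.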

     

    MLAG VIP

    MLAG VIP (Virtual IP) is important for retrieving peer information.

    Note: The IP address should be within the subnet of the management interface (mgmt0).

    The management network is used for keep-alive messages between the switches.

    Each MLAG domain must have a unique name. If you have more than one pair of MLAG switches on the same network, each domain (consisting of two switches) should be configured with a different name.

     

    Configure the following on both switches:

    sx01 (config)# mlag-vip my-mlag-vip-domain ip 10.209.28.200 /24 force

                                      

     

    Set a virtual system MAC. The system MAC is used as the LACP System ID, which identifies the switch pair to the far end. It should be in the unicast range.

    Note: In case of an upgrade the MAC address is auto-calculated. For new MLAG installation, it must be added as configuration.

    Note: This command is applicable from software release 3.4.2008 onward. If you are running an older software version, you can ignore it.

    switch (config)# mlag system-mac 00:00:5E:00:01:5D
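Whether a MAC address is in the unicast range is decided by the least significant bit of its first octet (0 means unicast, 1 means multicast or broadcast). The Python sketch below illustrates that check; the helper name `is_unicast_mac` is hypothetical and the snippet is not an MLNX-OS tool.

```python
def is_unicast_mac(mac):
    """Return True if the MAC address is unicast (I/G bit of the first octet is 0)."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    values = [int(o, 16) for o in octets]  # raises ValueError on non-hex input
    # Bit 0 of the first octet is the individual/group bit.
    return values[0] & 0x01 == 0


if __name__ == "__main__":
    # The system MAC used in the example above.
    print(is_unicast_mac("00:00:5E:00:01:5D"))
```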

     

    Enable MLAG globally

    switch (config) # no mlag shutdown

     

    MLAG Interface

    MLAG configuration is very similar to port-channel configuration. It is recommended (but not required) to use the same port on each switch within the same mlag-port-channel. In this example, there are two MLAG ports, one for each host (host s1 is connected to mlag-port-channel 1 and host s2 is connected to mlag-port-channel 2).

     

    Note:  The "mlag-port-channel" number is globally significant and must be the same on both switches.

    Note: In the example below, "1-2" is a range. Each interface can also be configured individually instead of using a range.

    Configure the following on both switches:

    sx01 (config) # interface mlag-port-channel 1-2

    sx01 (config interface mlag-port-channel 1-2 ) # exit

     

    Set the mode (LACP or static) - Only one option is applicable

    To set the MLAG interface in static mode run:

    sx01 (config) # interface ethernet 1/1 mlag-channel-group 1 mode on

    sx01 (config) # interface ethernet 1/2 mlag-channel-group 2 mode on

     

    To set the MLAG interface in LACP mode, run:

    sx01 (config) # interface ethernet 1/1 mlag-channel-group 1 mode active

    sx01 (config) # interface ethernet 1/2 mlag-channel-group 2 mode active

     

    Note: LACP (bonding mode 4) should be configured on the host side. The considerations for whether to configure LACP are the same for LAG and MLAG ports. LACP notifications arrive via the control protocol, and not via the port physical status. LACP shows the remote system-id and may reveal configuration errors. It is very valuable, especially in large configurations with multiple MLAGs, where it helps to detect mismatched configurations in terms of connectivity.

     

    Enable the two interfaces:

    sx01 (config) # interface mlag-port-channel 1-2 no shutdown

     

    To change an MLAG port parameter, for example the MTU, enter the MLAG interface configuration mode and make the change. Note that for some operations you may need to use "force" or disable the link manually.

    sx01 (config) # interface mlag-port-channel 1-2

    sx01 (config interface mlag-port-channel 1-2 ) # mtu 9216 force

        

     

    To change the LAG/MLAG port speed, all member interfaces must first be removed from the LAG/MLAG, and the speed is then changed in the member interface configuration mode. It is best to set the speed before adding the ports to the LAG/MLAG, because once a port is a member its speed cannot be changed without removing it.

     

    To verify MLAG configuration and status, run the following commands:

    sx01 [my-mlag-vip-domain: master] (config) # show mlag

    Admin status: Enabled

    Operational status: Up

    Reload-delay: 30 sec

    Keepalive-interval: 1 sec

    System-id: F4:52:14:11:E5:38

    MLAG Ports Configuration Summary:

    Configured: 2

    Disabled:   0

    Enabled:    2

     

    MLAG Ports Status Summary:

      Inactive:       0

      Active-partial: 0

      Active-full:    2

     

    MLAG IPLs Summary:

    ID   Group         Vlan       Operational  Local           Peer

         Port-Channel  Interface State        IP address      IP address

    --------------------------------------------------------------------------

    1    Po1           4000       Up           10.10.10.1      10.10.10.2

    sx01 [my-mlag-vip-domain: master] (config) #         
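When the "show mlag" output is collected by a script, the key fields can be extracted and checked automatically. The following Python sketch assumes the output has the "Key: value" layout shown above; the function names are hypothetical, and the health criteria (admin enabled, operational up, no inactive or active-partial MLAG ports) are an assumption based on the healthy example, not an official definition.

```python
import re

# Trimmed copy of the "show mlag" output shown above.
SAMPLE = """\
Admin status: Enabled
Operational status: Up
Reload-delay: 30 sec
Keepalive-interval: 1 sec
System-id: F4:52:14:11:E5:38
MLAG Ports Configuration Summary:
Configured: 2
Disabled:   0
Enabled:    2

MLAG Ports Status Summary:
  Inactive:       0
  Active-partial: 0
  Active-full:    2
"""

# "Key: value" lines; section titles (nothing after the colon) do not match.
FIELD_RE = re.compile(r"\s*([A-Za-z][A-Za-z \-]*?):\s+(.+?)\s*$")


def parse_show_mlag(text):
    """Return a dict of lower-cased field names to their string values."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def mlag_is_healthy(fields):
    """Assumed criteria: enabled, up, and every MLAG port active on both switches."""
    return (fields.get("admin status") == "Enabled"
            and fields.get("operational status") == "Up"
            and fields.get("inactive") == "0"
            and fields.get("active-partial") == "0")


if __name__ == "__main__":
    print(mlag_is_healthy(parse_show_mlag(SAMPLE)))
```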

                                    

    To verify MLAG domain status, run:

    sx01 [my-mlag-vip-domain: master] (config) # show mlag-vip
    MLAG VIP
    ========
    MLAG group name: my-mlag-vip-domain
    MLAG VIP address: 10.209.28.200/24
    Active nodes: 2

    Hostname             VIP-State            IP Address
    ----------------------------------------------------
    sx01                 master               10.209.28.50
    sx02                 standby              10.209.28.51
    sx01 [my-mlag-vip-domain: master] (config) #

                                   

     

    To see MLAG interfaces summary, run:

    sx01 [my-mlag-vip-domain: master] (config) # show interfaces mlag-port-channel summary

    MLAG Port-Channel Flags: D-Down, U-Up

                             P-Partial UP, S - suspended by MLAG

    Port Flags: D - Down, P - Up in port-channel (members)

                S - Suspend in port-channel (members), I - Individual

     

     

    Group

    Port-Channel      Type       Local Ports              Peer Ports

    (D/U/P/S)                    (D/P/S/I)                (D/P/S/I)

    --------------------------------------------------------------------------------

    1 Mpo1(U)         Static     Eth1/1(P)                Eth1/1(P)

    2 Mpo2(U)         Static     Eth1/2(P)                Eth1/2(P)

     

    sx01 [my-mlag-vip-domain: master] (config) # 

                                  

     

     

    Server Configuration

     

    There are various options for configuring a bond on the servers.

     

    Important Note:

    Not all bond modes are applicable. The supported bonding modes are:

    - balance-rr: mode 0

    - balance-xor: mode 2

    - 802.3ad (LACP): mode 4 (starting from 3.4.0000 MLNX-OS release)

     

    All other modes are not supported. The limitation does not derive from the Mellanox MLAG implementation (other switch vendors do not support those modes either). It derives from the fact that modes 1, 3, 5 and 6 are designed to work without a LAG configured on the switch side; in fact, configuring a LAG on the switch side breaks them. If a mode does not work with a single-chassis LAG, it will not work with a multi-chassis LAG either.

     

    Bonding modes that require a LAG on the switch require MLAG when redundant switches are used.

    Bonding modes that do not use a LAG on the switch work with two independent switches, or with non-MLAG ports on MLAG switches.

     

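    Linux Bonding Mode   Mode Number   Is LAG required on the switch?   Can it be used for MLAG interface?
    ------------------------------------------------------------------------------------------------------
    balance-rr           0             Yes                              Yes
    active-backup        1             No                               No
    balance-xor          2             Yes                              Yes
    broadcast            3             No                               No
    802.3ad              4             Yes (with LACP)                  Yes
    balance-tlb          5             No                               No
    balance-alb          6             No                               No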
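The compatibility table above can be expressed as a small lookup for use in host-side checks. This is a minimal Python sketch: the function name `mlag_compatible` is hypothetical, and it assumes the input has the form that the Linux bonding driver reports in /sys/class/net/<bond>/bonding/mode, that is, the mode name followed by its number (for example "802.3ad 4").

```python
# mode name -> (mode number, LAG required on the switch, usable on an MLAG interface)
BOND_MODES = {
    "balance-rr":    (0, True,  True),
    "active-backup": (1, False, False),
    "balance-xor":   (2, True,  True),
    "broadcast":     (3, False, False),
    "802.3ad":       (4, True,  True),   # requires LACP on the switch side
    "balance-tlb":   (5, False, False),
    "balance-alb":   (6, False, False),
}


def mlag_compatible(mode_line):
    """Return True if the bonding mode can be used on an MLAG interface.

    mode_line is text such as "802.3ad 4"; only the mode name is used.
    """
    parts = mode_line.split()
    if not parts or parts[0] not in BOND_MODES:
        raise ValueError(f"unknown bonding mode: {mode_line!r}")
    return BOND_MODES[parts[0]][2]


if __name__ == "__main__":
    print(mlag_compatible("802.3ad 4"))
    print(mlag_compatible("active-backup 1"))
```

On a host, the input could be read from the sysfs file of the bond (the interface name, for instance bond0, depends on the server configuration).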

     

    Here is an example for Linux:

    http://linux.cloudibee.com/2009/10/linux-network-bonding-setup-guide/

     

     

     

    Here is an example for Windows 2012 (or above) - LBFO is configured via the OS.

    http://download.microsoft.com/download/F/6/5/F65196AA-2AB8-49A6-A427-373647880534/[Windows%20Server%202012%20NIC%20Teaming%20(LBFO)%20Deployment%20and%20Management].docx


    In older Windows versions it is configured via the NIC driver configuration.

     

    MLAG Considerations with LACP

     

    MLAG with LACP is supported starting from MLNX-OS release 3.4.0000.

    See the full configuration flow above (to configure LACP, use the command with mode "active"):

    sx01 (config) # interface ethernet 1/1 mlag-channel-group 1 mode active

     


     

    MLAG Upgrade Procedure

     

    From a configuration point of view, to upgrade the MLAG cluster, upgrade the standby switch first. After it reboots with the upgraded software, the standby switch rejoins the MLAG cluster.

    After that, the master can be upgraded. When the master reboots with the upgraded software, the standby node (which is still running) becomes master. After the old master finishes rebooting, it rejoins the cluster and the upgrade is complete.

    For the MLNX-OS upgrade procedure follow: HowTo Upgrade MLNX-OS Software on Mellanox switches, and HowTo Upgrade MLNX-OS Software on an MLAG Switch Pair
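The upgrade order described in this section (standby first, master last) can be derived from the "show mlag-vip" output. The Python sketch below assumes the three-column table layout shown earlier in this post; the function name `upgrade_order` is hypothetical and the snippet only computes the order, it does not perform any upgrade.

```python
# Node table from the "show mlag-vip" output shown earlier.
SAMPLE = """\
Hostname             VIP-State            IP Address
----------------------------------------------------
sx01                 master               10.209.28.50
sx02                 standby              10.209.28.51
"""


def upgrade_order(show_mlag_vip_text):
    """Return hostnames in upgrade order: standby node(s) first, master last."""
    masters, standbys = [], []
    for line in show_mlag_vip_text.splitlines():
        parts = line.split()
        # Node rows have exactly three columns: hostname, VIP state, IP address.
        if len(parts) == 3 and parts[1] == "master":
            masters.append(parts[0])
        elif len(parts) == 3 and parts[1] == "standby":
            standbys.append(parts[0])
    if len(masters) != 1 or not standbys:
        raise ValueError("expected exactly one master and at least one standby")
    return standbys + masters


if __name__ == "__main__":
    print(upgrade_order(SAMPLE))
```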

     

    MLAG and PXE boot

    When hosts are connected to a pair of MLAG switches through mlag-port-channels, a host may need to PXE boot through one of the switches. Normally, during PXE boot, the host may not get an IP address via DHCP because the port-channel spans both switches.

    When lacp-individual is enabled on the mlag-port-channel interface on both switches, then while the host boots, one of the switches is in individual mode and forwards traffic, and the second switch is in suspended mode and blocks traffic.

    You can verify that the state is correct with the "show mlag-port-channel" command. The difficulty is that the host needs to choose the correct port for PXE boot traffic, otherwise the traffic is dropped. Disabling one port on the switch forces the host to choose the correct port. If the host chooses the blocked interface, the attempt fails, but the host is then expected to try the second interface until it succeeds.

     

    In addition, verify that the host tries to PXE boot through all interfaces and not only through one. If it does not, this may be configurable in the host BIOS.

     

    Configuration example:

    interface mlag-port-channel 1 lacp-individual enable force

     

    The following show commands are useful for verification and debugging:

    1. show mlag

    2. show mlag-vip

    3. show interface mlag-port-channel summary

    4. show interface ethernet <number>

    5. show interface mlag-port-channel <number>

    6. show mlag statistics

    7. show lacp counters

    8. show ip route

     

    Troubleshooting

    1. Make sure that the two switches are part of the same management subnet (connected to the same switch, or to different switches on the same subnet).

    2. Make sure that the management network is connected to the mgmt0 port.

    3. Make sure that the mlag-port-channel number is identical on both switches. The member port numbers may differ, but it is still recommended to keep them the same for consistency.

    4. It is recommended to have the same switch version on both switches.

    5. Make sure that the IPL is in UP state. Try to ping the other switch over the IPL.

    6. Make sure that you align the MLAG interface mode on both the server and the switch. For example, if you select LACP mode on the MLAG interface (active), mode 4 should be configured on the bond interface.

     

    For detailed troubleshooting and procedures, refer to MLAG Procedures and Troubleshooting.