HowTo Setup High Availability on ESXi 5.5 with Mellanox Adapters and Switches


    This post discusses high-availability connectivity for servers installed with VMware ESXi 5.5 and connected to Mellanox Ethernet switches.

    This post is meant for IT managers and integrators, and assumes familiarity with Mellanox Ethernet switches, MLAG, and ESXi installation.

     

    >>Learn how to configure MLAG for free on the Mellanox Academy

     

    References

    • HowTo Configure MLAG on Mellanox Switches
    • HowTo Install Mellanox OFED ESX Driver for VMWare ESX 5.5

     

    Overview

    While MLAG works well in most cases for high-availability connectivity to the ToR switches, in some cases the host has its own mechanism to detect failures and maintain active-active links. ESXi is one of those cases. When two NIC ports are configured on the same vSwitch, there are a few teaming options; the default is "Route based on the originating virtual port ID" (see the screenshots below). With this policy, ESXi load-balances the VMs across the two network ports while mapping each VM to a single, dedicated NIC port. If that port fails, ESXi moves the VM to the other available port.

    teaming-1.PNG

     

    teaming-2.PNG

     

    In such a configuration, what should the switch-side HA configuration be?

    Note: MLAG configuration is not mandatory if you have only ESXi servers. You can configure the link between the switches as a regular port-channel, since HA is performed by the host.

    MLAG configuration between the switches is required only for other network considerations, for example if the setup consists of a leaf-spine (VMS-like) topology, or if there are other types of servers (e.g. Linux or Windows) that are configured with a regular bond/teaming over the two ports.

     

    If the two ports are each connected to a different Mellanox Ethernet switch (configured with MLAG), the ports connected to the servers should be configured as regular edge ports and not as an mlag-port-channel. Otherwise, traffic will break: for example, if VM1 is mapped to port-1 but receives data on port-2 (due to MLAG load balancing), VM1 will not receive this data because it is not mapped to port-2.
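    For illustration, here is a minimal sketch of that switch-side difference (MLNX-OS CLI, with example port numbers that are not taken from this setup): the ESXi-facing port stays a plain switchport, while an mlag-port-channel is used only for hosts running a regular bond/LAG.

    switch(config) # interface ethernet 1/1 description esx-server-1
    switch(config) # interface ethernet 1/1 switchport mode access

    Note that the ESXi-facing port gets no mlag-channel-group. For comparison, a Linux or Windows host with a regular bond would be attached with something like:

    switch(config) # interface mlag-port-channel 2
    switch(config) # interface ethernet 1/2 mlag-channel-group 2 mode active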

     

    What about other teaming Load Balancing options?

    It depends on the application's sensitivity; some applications may not handle flows that do not ingress and egress through the same port.

    For example, if "Route based on IP hash" is used, the switch side must be configured with link aggregation (in this topology, an mlag-port-channel), since IP-hash teaming requires a port-channel on the physical switches.

    teaming-6.png

     

    Setup

    To test this configuration, we need two Ethernet switches (e.g. SX1710) configured in MLAG and two servers (S1, S2) installed with ESXi 5.5.

    Two ports (configured as LAG) will be used for the IPL link connecting the MLAG switches.
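    As a rough sketch (MLNX-OS CLI, example port numbers, assuming LACP), the two IPL ports are bundled into a regular port-channel on each switch; the full MLAG procedure (protocol mlag, IPL VLAN interface, virtual IP) is covered in the MLAG HowTo referenced in the Configuration section:

    switch(config) # interface port-channel 1
    switch(config interface port-channel 1) # exit
    switch(config) # interface ethernet 1/35 channel-group 1 mode active
    switch(config) # interface ethernet 1/36 channel-group 1 mode active
    switch(config) # interface port-channel 1 ipl 1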

     

    Note: MLAG configuration between the switches is not mandatory if this is the only setup, as HA is handled by the host in the ESXi case.


     

    Configuration

    1. Follow HowTo Configure MLAG on Mellanox Switches to configure MLAG on sx01 and sx02.

    Note: Don't configure an mlag-port-channel on the ports connected to the servers.
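    Once configured, MLAG health can be sanity-checked from either switch (the exact output varies with the MLNX-OS version):

    switch(config) # show mlag
    switch(config) # show mlag-vip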

     

    2. Install ESXi 5.5 on both servers.

     

    3. Install the Mellanox driver; refer to HowTo Install Mellanox OFED ESX Driver for VMWare ESX 5.5.
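    To confirm that the driver is installed and the 40GbE ports are visible, the following can be run from the ESXi shell (assuming SSH or the ESXi Shell is enabled):

    ~ # esxcli software vib list | grep -i mlx
    ~ # esxcli network nic list

    The nic list output should show the Mellanox vmnics with their link speed.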

     

    4. Make sure two vSwitches are created:

     

    • vSwitch0 - for management (1GbE network)
    • vSwitch1 - for data (40GbE network)

     

    teaming-3.PNG
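    If preferred, the same can be verified from the ESXi shell:

    ~ # esxcli network vswitch standard list

    Each vSwitch entry lists its uplinks; vSwitch1 should show the two 40GbE vmnics.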

     

    5. Click on vSwitch1 Properties

    teaming-4.PNG

     

    6. Select the vSwitch and click Edit.

    Make sure that the Load balancing policy is "Route based on the originating virtual port ID".

    teaming-5.PNG
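    As an alternative to the vSphere Client, the policy can be checked or set from the ESXi shell (a sketch; vSwitch1 is the data vSwitch in this setup):

    ~ # esxcli network vswitch standard policy failover get -v vSwitch1
    ~ # esxcli network vswitch standard policy failover set -v vSwitch1 -l portid

    Here "portid" corresponds to "Route based on the originating virtual port ID".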

     

    7. Create a VM (VM1) on each server with any OS (in this case CentOS 6.5 was used) and verify that it is equipped with two network adapters.

    In this case, the VM Network is the 1GbE network, while the Data Network is the 40GbE network.

    teaming-7.PNG

     

    8. Using the VM console, assign IP addresses to each adapter port on each VM.

    In this case eth0 is used for management, while eth1 is configured for the 40GbE ports.
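    For example, on CentOS 6.5 the address can be applied temporarily with ifconfig, or persistently via the interface file (the values below match this setup):

    # ifconfig eth1 11.11.11.1 netmask 255.255.255.0 up

    # cat /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=11.11.11.1
    NETMASK=255.255.255.0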

    Here is an example of the ifconfig of VM1 on server 1 (eth1 IP 11.11.11.1).

    # ifconfig eth0

    eth0      Link encap:Ethernet  HWaddr 00:0c:29:3f:60:90

              inet addr:10.20.2.101  Bcast:10.20.2.255  Mask:255.255.255.0

              inet6 addr: fe80::20c:29ff:fe3f:6090/64 Scope:Link

              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

              RX packets:2476413 errors:0 dropped:24 overruns:0 frame:0

              TX packets:510895 errors:0 dropped:0 overruns:0 carrier:0

              collisions:0 txqueuelen:1000

              RX bytes:1034283 (1010.0 KiB)  TX bytes:832951 (813.4 KiB)

           

    # ifconfig eth1

    eth1      Link encap:Ethernet  HWaddr 52:54:00:9D:27:2D

              inet addr:11.11.11.1  Bcast:11.11.11.255  Mask:255.255.255.0

              inet6 addr: fe80::5054:ff:fe9d:272d/64 Scope:Link

              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

              RX packets:2476413 errors:0 dropped:24 overruns:0 frame:0

              TX packets:510895 errors:0 dropped:0 overruns:0 carrier:0

              collisions:0 txqueuelen:1000

              RX bytes:1034133283 (986.2 MiB)  TX bytes:83217951 (79.3 MiB)

             

     

    Similarly, in VM1 on server 2, configure the IP address on eth1 to be 11.11.11.2.

     

    Verification

    1. Ping should work between the VMs

    # ping 11.11.11.2

    PING 11.11.11.2 (11.11.11.2) 56(84) bytes of data.

    64 bytes from 11.11.11.2: icmp_seq=1 ttl=64 time=1.52 ms

    64 bytes from 11.11.11.2: icmp_seq=2 ttl=64 time=0.608 ms

    2. Disable the port on the sx01 switch connected to server 1 and verify that the ping is still running.

     

    3. Re-enable that port, then disable the port on the sx02 switch connected to server 1 and verify that the ping is still running.

     

    For example, if the port on the switch is 1/1, you can use the following command to disable the port:

    switch(config) # interface ethernet 1/1 shutdown

    and to enable the port:

    switch(config) # interface ethernet 1/1 no shutdown
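    In addition to the ping, the failover can be observed from the ESXi host itself, for example from the ESXi shell:

    ~ # esxcli network nic list

    The Link column shows which vmnic lost link when the switch port was shut down; the esxtop network view (press n) also shows which physical uplink each VM port is currently mapped to.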