HowTo Configure BGP on Mellanox Switches

Version 11

    This post shows how to configure eBGP on Mellanox switches, based on the recommendations in the IETF document by Lapukhov et al.

    If you are already familiar with the solution, you can refer directly to the running-config example in HowTo Configure BGP on Mellanox Switches (Running-Config).

     

    Setup

    In this setup, four switches and two servers are connected in a VMS topology, as follows:

    • Mellanox SX1036 ToR switches - SX03, SX04
    • Mellanox SX1036 Spine switches - SX05, SX06
    • Two servers (S1, S2), each running CentOS 6.4 with a Mellanox ConnectX-3 adapter card

     

    [Figure: setup topology diagram (3.png)]

     

    BGP Network Design

    The eBGP network design is as follows:

    1. The ToR switches will be configured with one AS number (AS 100), while the spine switches will be configured with another AS number (AS 200).

    2. ECMP will be configured for load balancing.

    3. Either router interfaces or VLAN interfaces can be configured. This example uses VLAN interfaces. Note that router interfaces can be configured only on SwitchX-2 based switches.

    4. Peer group configuration is recommended for large setups. It is shown in this example.

     

    Prerequisites

    1. Connect the setup as described above. You can use any 40GbE QSFP cables (e.g. MC2207130-XXX).

    2. Configure the connected interface on each server with an IP address and a gateway route. There are several ways to do it; here is a simple example for server S1:

    # ifconfig eth2 12.12.5.1/24 up

    # route add -net 12.12.0.0/16 gw 12.12.5.2

            

     

    Here is an example for server S2:

    # ifconfig eth2 12.12.6.1/24 up

    # route add -net 12.12.0.0/16 gw 12.12.6.2

            

     

    3. On each switch perform the following global configuration:

    - Disable spanning-tree

    - Enable LLDP (recommended)

    - Enable IP Routing (L3)

    - Enable BGP Protocol

    switch (config) # no spanning-tree

    switch (config) # lldp

    switch (config) # ip routing

    switch (config) # protocol bgp

            

     

    4. Configure six VLANs on each switch.

    Note: you can configure fewer VLANs on each switch, but creating the same six VLANs everywhere is simpler (not all VLANs are used on every switch):

    switch (config) # vlan 1-6

            

     

    5. Assign the L2 access VLAN to the appropriate L2 interfaces, and configure a VLAN interface (or router interface) for each IP interface, as described in the setup. This configuration is different for each switch.

    For example, here is the VLAN interface configuration on SX03 (the other switches are similar):

     

    switch (config) # interface ethernet 1/1 switchport access vlan 1

    switch (config) # interface vlan 1

    switch (config interface vlan 1) # ip address 12.12.1.1 255.255.255.0

    switch (config interface vlan 1) # exit

     

    switch (config) # interface ethernet 1/2 switchport access vlan 2

    switch (config) # interface vlan 2

    switch (config interface vlan 2) # ip address 12.12.2.1 255.255.255.0

    switch (config interface vlan 2) # exit

     

    switch (config) # interface ethernet 1/30 switchport access vlan 5

    switch (config) # interface vlan 5

    switch (config interface vlan 5) # ip address 12.12.5.2 255.255.255.0

    switch (config interface vlan 5) # exit
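    The per-port pattern above repeats for every link, so it can be convenient to render it from a small table. The following Python sketch is illustrative only: the helper name render_vlan_interfaces and the table layout are assumptions, and the emitted commands simply mirror the SX03 example above.

```python
# Hypothetical helper (not part of MLNX-OS): renders the CLI lines shown
# in the SX03 example from a table of (port, VLAN, IP address, netmask).

SX03_INTERFACES = [
    ("1/1", 1, "12.12.1.1", "255.255.255.0"),
    ("1/2", 2, "12.12.2.1", "255.255.255.0"),
    ("1/30", 5, "12.12.5.2", "255.255.255.0"),
]


def render_vlan_interfaces(rows):
    """Return the CLI lines that bind each port to its access VLAN
    and assign an address to the matching VLAN interface."""
    lines = []
    for port, vlan, ip, mask in rows:
        lines.append(f"interface ethernet {port} switchport access vlan {vlan}")
        lines.append(f"interface vlan {vlan}")
        lines.append(f"ip address {ip} {mask}")
        lines.append("exit")
    return lines


if __name__ == "__main__":
    print("\n".join(render_vlan_interfaces(SX03_INTERFACES)))
```

    The same table could be filled in for the other switches. Review the generated lines before pasting them into a switch session.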

          

     

    Another option is to configure a router interface on each port and assign the IP address directly to the port.

    Here is an example for switch SX03 (the other switches are similar):

    switch (config) # interface ethernet 1/1

    switch (config interface ethernet 1/1) # no switchport force

    switch (config interface ethernet 1/1) # ip address 12.12.1.1 /24

    switch (config interface ethernet 1/1) # encapsulation dot1q vlan 1

     

    switch (config) # interface ethernet 1/2

    switch (config interface ethernet 1/2) # no switchport force

    switch (config interface ethernet 1/2) # ip address 12.12.2.1 /24

    switch (config interface ethernet 1/2) # encapsulation dot1q vlan 2

     

    switch (config) # interface ethernet 1/30

    switch (config interface ethernet 1/30) # no switchport force

    switch (config interface ethernet 1/30) # ip address 12.12.5.2 /24

    switch (config interface ethernet 1/30) # encapsulation dot1q vlan 5

          

     

     

     

    Note: router interfaces can be configured only on SwitchX-2 based switches.

     

    BGP Spine Configuration (SX05, SX06)

    1. Configure the same AS number for all spines (e.g. 200)

    switch (config) # router bgp 200

    switch (config router bgp 200) #

          

     

    2. Optionally, enable the fast-external-fallover BGP attribute. If the link used to reach a directly connected eBGP neighbor goes down, this attribute terminates the session immediately, without waiting for the hold timer to expire. Although this improves BGP convergence time, it may cause instability in the BGP table if an interface is flapping.

    switch (config router bgp 200) # bgp fast-external-fallover

          

     

    3. Configure BGP neighbors. There are two options to configure neighbors:

    First option: Specifically add each neighbor's IP address.

    Here is an example of SX05:

    switch (config router bgp 200) # neighbor 12.12.1.1 remote-as 100

    switch (config router bgp 200) # neighbor 12.12.3.1 remote-as 100

          

     

    Second option: Create a peer-group and add all neighbors in the IP address range of the remote AS.

    Here is an example of SX05:

    switch (config router bgp 200) # neighbor my-peer-group peer-group

    switch (config router bgp 200) # bgp listen range 12.12.0.0 /16 peer-group my-peer-group remote-as 100

          

     

    The choice between the two options depends on security considerations, ease of configuration, and the size of the network:

    • If the setup is large and there are no security threats, the second option is recommended because it simplifies the configuration.
    • If the setup is small and there may be security threats, the first option is recommended because it explicitly defines the remote neighbors.
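    To make the security trade-off concrete, the following Python sketch compares which peer addresses each option would admit on SX05. It models only the address match (a listen range also requires the peer to present the configured remote AS), and the function names are assumptions made for illustration.

```python
import ipaddress

# Values taken from the SX05 examples above.
LISTEN_RANGE = ipaddress.ip_network("12.12.0.0/16")
EXPLICIT_NEIGHBORS = {
    ipaddress.ip_address("12.12.1.1"),
    ipaddress.ip_address("12.12.3.1"),
}


def allowed_by_listen_range(peer):
    """True if the peer address falls inside the configured listen range."""
    return ipaddress.ip_address(peer) in LISTEN_RANGE


def allowed_by_explicit_list(peer):
    """True only if the peer address was configured as a neighbor."""
    return ipaddress.ip_address(peer) in EXPLICIT_NEIGHBORS


if __name__ == "__main__":
    # 12.12.5.1 is server S1; 10.7.12.1 is the management gateway.
    for peer in ("12.12.1.1", "12.12.3.1", "12.12.5.1", "10.7.12.1"):
        print(peer, allowed_by_listen_range(peer), allowed_by_explicit_list(peer))
```

    In this model, the address of server S1 (12.12.5.1) matches the /16 listen range but not the explicit neighbor list, which is why the first option is the more restrictive one.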

     

    BGP ToR Configuration (SX03, SX04)

    1. Configure the same AS number for all ToRs (e.g. 100)

    switch (config) # router bgp 100

    switch (config router bgp 100) #

          

     

    2. Optionally, enable the fast-external-fallover BGP attribute. If the link used to reach a directly connected eBGP neighbor goes down, this attribute terminates the session immediately, without waiting for the hold timer to expire. Although this improves BGP convergence time, it may cause instability in the BGP table if an interface is flapping.

    switch (config router bgp 100) # bgp fast-external-fallover

          

     

    3. Enable ECMP across AS paths. With this option enabled, all best routes with the same AS-path length are considered for ECMP.

    switch (config router bgp 100) # bestpath as-path multipath-relax

          

     

    4. (Optional) Configure the ECMP maximum-paths parameter, which limits the number of paths used by ECMP. This example has two spines, so the maximum is set to two.

    switch (config router bgp 100) # maximum-paths 2
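    As a rough mental model of how multipath-relax and maximum-paths interact, the Python sketch below keeps the candidate paths with the shortest AS-path length and then caps the result. This is a deliberate simplification: real BGP best-path selection compares more attributes, the function name is an assumption, and the third path is hypothetical.

```python
# Simplified model only: it compares AS-path length and nothing else.

def ecmp_next_hops(paths, maximum_paths):
    """Return the next hops of the paths that share the shortest
    AS-path length, limited to maximum_paths entries."""
    shortest = min(len(p["as_path"]) for p in paths)
    equal_length = [p["next_hop"] for p in paths if len(p["as_path"]) == shortest]
    return equal_length[:maximum_paths]


# Paths towards 12.12.6.0/24 as seen on SX03 (one via each spine).
PATHS_TO_VLAN6 = [
    {"next_hop": "12.12.1.2", "as_path": [200, 100]},
    {"next_hop": "12.12.2.2", "as_path": [200, 100]},
    # Hypothetical longer path, added only to show that it is excluded.
    {"next_hop": "12.12.9.9", "as_path": [300, 200, 100]},
]

if __name__ == "__main__":
    print(ecmp_next_hops(PATHS_TO_VLAN6, maximum_paths=2))
    print(ecmp_next_hops(PATHS_TO_VLAN6, maximum_paths=1))
```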

          

     

    5. Advertise the server networks in BGP. For example, on SX03, the network of server S1 should be added:

    switch (config router bgp 100) # network 12.12.5.0 /24

          

     

    6. Configure BGP neighbors. There are two options to configure neighbors:

    First option: Explicitly add each neighbor's IP address, and configure the switch to accept prefixes whose AS path already contains its own AS number (allowas-in). Neighbors from the same AS as the router are considered iBGP peers, and neighbors from other ASs are considered eBGP peers.

    Here is an example of SX03:

    switch (config router bgp 100) # neighbor 12.12.1.2 remote-as 200

    switch (config router bgp 100) # neighbor 12.12.2.2 remote-as 200

    switch (config router bgp 100) # neighbor 12.12.1.2 allowas-in 1

    switch (config router bgp 100) # neighbor 12.12.2.2 allowas-in 1
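    The reason allowas-in is needed here is that all ToRs share AS 100, so a route from SX04 reaches SX03 with the AS path "200 100", which already contains SX03's own AS. The Python sketch below models that loop check in its simplest form; the function name is an assumption and the model ignores every other acceptance rule.

```python
def accepts_route(as_path, local_as, allowas_in=0):
    """Accept the route if the local AS appears in the AS path
    at most allowas_in times (0 models the default loop check)."""
    return as_path.count(local_as) <= allowas_in


if __name__ == "__main__":
    path_from_other_tor = [200, 100]  # as seen on SX03 for 12.12.6.0/24
    print(accepts_route(path_from_other_tor, local_as=100))                # default
    print(accepts_route(path_from_other_tor, local_as=100, allowas_in=1))  # allowas-in 1
```

    With the default behavior the route from the other ToR is rejected; with allowas-in 1 it is accepted, which matches the routes shown in the verification output.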

          

     

    Second option: Create a peer-group and add all neighbors in the IP address range of the remote AS.

    Here is an example of SX03:

    switch (config router bgp 100) # neighbor my-peer-group peer-group

    switch (config router bgp 100) # neighbor my-peer-group allowas-in 1

    switch (config router bgp 100) # bgp listen range 12.12.0.0 /16 peer-group my-peer-group remote-as 200

          

     

    The choice between the two options depends on security considerations, ease of configuration, and the size of the network:

    • If the setup is large and there are no security threats, the second option is recommended because it simplifies the configuration.
    • If the setup is small and there may be security threats, the first option is recommended because it explicitly defines the remote neighbors.

     

    Verification

    1. Check that the BGP session is established on all switches.

    Make sure each neighbor has reached the 'ESTABLISHED' state.

     

    Here is an example from SX03:

    switch (config) # show ip bgp summary

    BGP router identifier 12.12.2.1, local AS number 100

    BGP table version is 11, main routing table version 11

    3 network entries using 588 bytes of memory

    3 path entries using 588 bytes of memory

    3 BGP path attribute entries using 96 bytes of memory

    0 multipath network entries and 0 multipath paths

    0 BGP community entries using 0 bytes of memory

    0 received paths for inbound soft reconfiguration

    BGP using 21216 total bytes of memory

    Dampening disabled. 0 history paths, 0 dampened paths

    BGP activity 10/3 prefixes, 7/3 paths

    Neighbor          V        AS MsgRcvd MsgSent  TblVer  InQ OutQ Up/Down    State/PfxRcd

    12.12.1.2        4        200    1045    1045      9    0    0 0:17:24:21  ESTABLISHED

    12.12.2.2        4        200    1045    1046      10    0    0 0:17:24:18  ESTABLISHED
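    When many switches need to be checked, the neighbor table can be scanned with a short script. The Python sketch below assumes the output format shown above (ten whitespace-separated columns per neighbor, with the state in the last column); the function name and the way the text is captured are assumptions.

```python
import ipaddress

SAMPLE_SUMMARY = """\
Neighbor          V        AS MsgRcvd MsgSent  TblVer  InQ OutQ Up/Down    State/PfxRcd
12.12.1.2        4        200    1045    1045      9    0    0 0:17:24:21  ESTABLISHED
12.12.2.2        4        200    1045    1046      10    0    0 0:17:24:18  ESTABLISHED
"""


def neighbor_states(summary_text):
    """Map each neighbor IP address to the value of its last column."""
    states = {}
    for line in summary_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # not a neighbor row
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # header or statistics line
        states[fields[0]] = fields[-1]
    return states


if __name__ == "__main__":
    states = neighbor_states(SAMPLE_SUMMARY)
    down = [ip for ip, state in states.items() if state != "ESTABLISHED"]
    print(states)
    print("neighbors not established:", down)
```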

         

     

    2. Verify that BGP routes are added and ECMP is operational.

    Make sure that the status codes '*>' and 'm*' appear for ECMP routes.

    Here is an example from SX03 with two next hops for the VLAN 6 network (12.12.6.0):

    switch (config) # show ip bgp
    BGP table version is 11, local router ID is 12.12.2.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
                  r RIB-failure, S Stale, m multipath, b backup-path, x best-external
    Origin codes: i - IGP, e - EGP, ? - incomplete

        Network            Next Hop            Metric    LocPrf    Weight Path
    *>  12.12.5.0/24      0.0.0.0                  0          0      32768 i
    *>  12.12.6.0/24      12.12.1.2                0        100          0 200 100 I
    m*  12.12.6.0/24      12.12.2.2                0        100          0 200 100 I
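    The ECMP check can also be scripted. The Python sketch below groups the table rows by prefix and reports the prefixes that have both a best ('>') and a multipath ('m') entry. It assumes the row layout shown above, where every row repeats the network; the function names are assumptions.

```python
import ipaddress
from collections import defaultdict

SAMPLE_TABLE = """\
    Network            Next Hop            Metric    LocPrf    Weight Path
*>  12.12.5.0/24      0.0.0.0                  0          0      32768 i
*>  12.12.6.0/24      12.12.1.2                0        100          0 200 100 I
m*  12.12.6.0/24      12.12.2.2                0        100          0 200 100 I
"""


def next_hops_by_prefix(table_text):
    """Return {prefix: [(status codes, next hop), ...]} for valid rows."""
    table = defaultdict(list)
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or "*" not in fields[0]:
            continue  # legend, header, or non-valid route
        try:
            prefix = str(ipaddress.ip_network(fields[1]))
            hop = str(ipaddress.ip_address(fields[2]))
        except ValueError:
            continue
        table[prefix].append((fields[0], hop))
    return dict(table)


def ecmp_prefixes(table):
    """Prefixes that have a best path plus at least one multipath entry."""
    result = []
    for prefix, entries in table.items():
        codes = [status for status, _ in entries]
        if any(">" in c for c in codes) and any("m" in c for c in codes):
            result.append(prefix)
    return result


if __name__ == "__main__":
    print(ecmp_prefixes(next_hops_by_prefix(SAMPLE_TABLE)))
```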

         

     

    3. Check the routing table.

    Make sure again that there are two routes for each ECMP destination.

    Make sure that the remote networks (in this example, VLAN 6) were learned correctly with bgp as the source.

     

    switch (config) # show ip route

    Destination      Mask              Gateway          Interface        Source            Distance/Metric

    default          0.0.0.0          10.7.12.1        mgmt0            DHCP              0/0

    10.7.12.0        255.255.252.0    0.0.0.0          mgmt0            direct            0/0

    12.12.1.0        255.255.255.0    0.0.0.0          vlan1            direct            0/0

    12.12.2.0        255.255.255.0    0.0.0.0          vlan2            direct            0/0

    12.12.5.0        255.255.255.0    0.0.0.0          vlan5            direct            0/0

    12.12.6.0        255.255.255.0    12.12.1.2        vlan1            bgp              20/0

                                        12.12.2.2        vlan2            bgp              20/0

         

     

    4. Ping between the servers to make sure there is L3 unicast connectivity.

    For example, ping from server S1 to server S2:

    # ping 12.12.6.1

    PING 12.12.6.1 (12.12.6.1) 56(84) bytes of data.

    64 bytes from 12.12.6.1: icmp_seq=1 ttl=61 time=0.095 ms

    64 bytes from 12.12.6.1: icmp_seq=2 ttl=61 time=0.032 ms

    64 bytes from 12.12.6.1: icmp_seq=3 ttl=61 time=0.049 ms

         

     

    Other show commands can be found under the prefix "show ip bgp" in the MLNX-OS CLI.

     

    Troubleshooting

    Network troubleshooting and debugging can be very painful, and in many cases requires networking experience.

    Here are some guidelines that may help:

    1. Make sure all ports are in UP state.

    2. Verify spanning-tree is disabled in all switches or at least in the relevant ports.

    3. Make sure that the L2 configuration is done properly (e.g. access VLANs).

    4. Make sure all L3 interfaces are configured properly (same subnet for each two ports on the same link)

    5. Go through the BGP configuration. Note that the ToRs require more configuration, as all server networks should be added and ECMP should be configured.
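    The L3 check in guideline 4 (both ends of a link in the same subnet) is easy to automate. The Python sketch below uses address pairs that appear in this post; the link labels, the function name, and the last deliberately broken entry are assumptions added for illustration.

```python
import ipaddress


def same_subnet(end_a, end_b):
    """True if both interface addresses share one network and differ."""
    a = ipaddress.ip_interface(end_a)
    b = ipaddress.ip_interface(end_b)
    return a.network == b.network and a.ip != b.ip


# (label, one end, other end); labels are illustrative only.
LINKS = [
    ("vlan1 link", "12.12.1.1/24", "12.12.1.2/24"),
    ("vlan2 link", "12.12.2.1/24", "12.12.2.2/24"),
    ("S1 link", "12.12.5.1/24", "12.12.5.2/24"),
    ("S2 link", "12.12.6.1/24", "12.12.6.2/24"),
    # Hypothetical misconfiguration, included to show a failing check.
    ("broken link", "12.12.1.1/24", "12.12.2.2/24"),
]

if __name__ == "__main__":
    for label, end_a, end_b in LINKS:
        status = "ok" if same_subnet(end_a, end_b) else "MISMATCH"
        print(f"{label}: {end_a} <-> {end_b} {status}")
```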

     

     

    Additional Considerations

    1. If multicast (PIM, IGMP) is required on the network, you can use the HowTo Configure IP Multicast (PIM, IGMP) on Mellanox Ethernet Switches example to configure PIM and IGMP. It makes no difference whether L3 unicast runs over OSPF or BGP.

    2. In case more than one port is connected between each two switches (e.g. 2x40GbE) there are two options to configure the setup:

    • Use LAG on the ports and one IP interface configured on each end
    • Configure each interface with a different IP address. For example, if two interfaces are connected, you will have two IP addresses configured (one on each port) towards the same switch.

     

    Pros and Cons: Using LAG simplifies the routing table and the configuration, but may degrade performance because there will be two levels of load balancing (first L3 ECMP, then the LAG). It is hard to recommend one option in general; the choice mainly depends on the user application.