6 Replies Latest reply on Apr 20, 2015 12:53 AM by ipuc

    ARP timeout on Mellanox IB Gateway SX6036G

    ale

      Hi,

       

      I have set up an OpenStack Juno installation on 8 nodes: one cloud controller and 7 Nova compute nodes. All nodes are equipped with a Mellanox ConnectX-3 InfiniBand HCA. OpenStack networking (Neutron) is configured to work over the InfiniBand interface, using the eth_ipoib driver for Ethernet para-virtualization. The Ethernet interface created by eth_ipoib is used by the OpenStack networking services over 2 UFM (version 4.8.0) partitions:

       

      • One pkey is used for the private network between the cloud nodes. It is defined as a local network and is connected only to the Logical Server Group of the cloud nodes.
      • The other pkey is used for the OpenStack/Neutron external network. It is defined as a local network and is connected to the Logical Server Group of the cloud nodes and to 2 SX6036G InfiniBand gateways (Mellanox OS 3.3.5006) configured in Active-Active load balancing mode (load balancing algorithm: ib-base-ip).

       

      Neutron is configured with the new Distributed Virtual Router (DVR) feature of Juno (https://wiki.openstack.org/wiki/Neutron/DVR). With DVR enabled, NAT services are distributed across all the nodes (controller and compute). One node (our cloud controller) provides only SNAT to VMs, and all the other compute nodes provide DNAT to VMs.

       

      Networking over the private network works fine. On the external network, however, problems arise when assigning floating IPs to VMs. DVR manages floating IPs by creating a floating IP router namespace, the Floating IP agent gateway (FIP), on each node that runs a VM with a floating IP assigned. Each floating IP of a VM is then assigned to the DVR namespace of the tenant network.

       

      I suspect that floating IPs are not always reachable from the internet because the gateways sometimes map IP addresses to the wrong IB-MAC addresses in their ARP tables.

       

      For example, the ARP tables of the external network (proxy-arp 7) on the 2 gateways currently contain:

       

      root@master# xdsh gw -l admin --devicetype IBSwitch::Mellanox "show ip arp interface proxy-arp 7"

      sx60g01: Mellanox MLNX-OS Switch Management

      sx60g01:

      sx60g01: Total number of entries: 16

      sx60g01:   Address              Type            Hardware Address          Interface          

      sx60g01:   ------------------------------------------------------------------------

      sx60g01:   XXX.XXX.XX.188       Dynamic ETH     00:24:F7:14:7A:C1         proxy-arp 7        

      sx60g01:   XXX.XXX.XX.189       Dynamic ETH     00:24:F7:14:B4:C1         proxy-arp 7        

      sx60g01:   XXX.XXX.XX.190       Dynamic ETH     00:00:0C:07:AC:83         proxy-arp 7        

      sx60g01:   XXX.XXX.XX.130       Dynamic IB      50:05:07:00:5B:01:7E:11   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.132       Dynamic IB      50:05:07:00:5B:01:7E:11   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.134       Dynamic IB      50:05:07:00:5B:01:80:0D   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.136       Dynamic IB      50:05:07:00:5B:01:80:0D   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.138       Dynamic IB      50:05:07:00:5B:01:7F:E5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.142       Dynamic IB      50:05:07:00:5B:01:78:19   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.144       Dynamic IB      50:05:07:00:5B:01:7F:D5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.146       Dynamic IB      50:05:07:00:5B:01:7F:D5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.148       Dynamic IB      50:05:07:00:5B:01:7E:11   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.152       Dynamic IB      50:05:07:00:5B:01:7F:D5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.154       Dynamic IB      50:05:07:00:5B:01:7F:E5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.158       Dynamic IB      50:05:07:00:5B:01:78:19   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.160       Dynamic IB      50:05:07:00:5B:01:7E:CD   proxy-arp 7        

      sx60g01:

      sx60g02: Mellanox MLNX-OS Switch Management

      sx60g02:

      sx60g02: Total number of entries: 12

      sx60g02:   Address              Type            Hardware Address          Interface          

      sx60g02:   ------------------------------------------------------------------------

      sx60g02:   XXX.XXX.XX.188       Dynamic ETH     00:24:F7:14:7A:C1         proxy-arp 7        

      sx60g02:   XXX.XXX.XX.189       Dynamic ETH     00:24:F7:14:B4:C1         proxy-arp 7        

      sx60g02:   XXX.XXX.XX.190       Dynamic ETH     00:00:0C:07:AC:83         proxy-arp 7        

      sx60g02:   XXX.XXX.XX.129       Dynamic IB      50:05:07:00:5B:01:7E:11   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.133       Dynamic IB      50:05:07:00:5B:01:7E:B1   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.135       Dynamic IB      50:05:07:00:5B:01:7E:CD   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.137       Dynamic IB      50:05:07:00:5B:01:80:0D   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.139       Dynamic IB      50:05:07:00:5B:01:78:19   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.141       Dynamic IB      50:05:07:00:5B:01:7F:E5   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.143       Dynamic IB      50:05:07:00:5B:01:7E:B1   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.145       Dynamic IB      50:05:07:00:5B:01:7F:D5   proxy-arp 7        

      sx60g02:   XXX.XXX.XX.147       Dynamic IB      50:05:07:00:5B:01:7E:B1   proxy-arp 7        

      sx60g02:

       

      So, for example, the IP address XXX.XXX.XX.133, which is unreachable from the internet, is mapped on sx60g02 to IB-MAC 50:05:07:00:5B:01:7E:B1. However, the node running the VM to which that floating IP is assigned has MAC address 50:05:07:00:5b:01:7e:cd.
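To make this kind of comparison repeatable, the check can be scripted. The following is only a rough sketch: it assumes the column layout of the `show ip arp interface proxy-arp 7` output shown above (including the gateway prefix added by xdsh), and the names `parse_arp`, `find_mismatches`, and `EXPECTED` are hypothetical. The expected mapping has to be collected separately from the compute nodes.

```python
import re

# One data row of the ARP table, with or without the "sx60g02:" prefix from xdsh.
# The column layout is an assumption taken from the output pasted above.
ROW = re.compile(
    r"^\s*(?:(?P<gw>[\w-]+):)?\s*"
    r"(?P<ip>[0-9X]+(?:\.[0-9X]+){3})\s+"      # IPs are masked with X in this post
    r"(?P<kind>\w+)\s+(?P<link>ETH|IB)\s+"
    r"(?P<mac>[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2})+)\s+"
    r"proxy-arp\s+(?P<pa>\d+)\s*$"
)


def parse_arp(text):
    """Return a list of (gateway, ip, mac) tuples; MACs are lower-cased."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header, separator and empty lines simply do not match
            rows.append((m.group("gw"), m.group("ip"), m.group("mac").lower()))
    return rows


def find_mismatches(rows, expected):
    """expected maps a floating IP to the MAC of the node that hosts it."""
    return [
        (gw, ip, mac, expected[ip].lower())
        for gw, ip, mac in rows
        if ip in expected and mac != expected[ip].lower()
    ]


SAMPLE = """\
sx60g02: Total number of entries: 12
sx60g02:   Address              Type            Hardware Address          Interface
sx60g02:   XXX.XXX.XX.190       Dynamic ETH     00:00:0C:07:AC:83         proxy-arp 7
sx60g02:   XXX.XXX.XX.133       Dynamic IB      50:05:07:00:5B:01:7E:B1   proxy-arp 7
sx60g02:   XXX.XXX.XX.135       Dynamic IB      50:05:07:00:5B:01:7E:CD   proxy-arp 7
"""

# MAC of the node that actually runs the VM holding XXX.XXX.XX.133.
EXPECTED = {"XXX.XXX.XX.133": "50:05:07:00:5b:01:7e:cd"}

for gw, ip, seen, want in find_mismatches(parse_arp(SAMPLE), EXPECTED):
    print(f"{gw}: {ip} is mapped to {seen}, expected {want}")
```

MAC addresses are compared in lower case because the gateway prints them in upper case while the Linux nodes report them in lower case.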

       

      Moreover, the ARP table of sx60g01 still contains entries for the following IPs:

       

      sx60g01:   XXX.XXX.XX.148       Dynamic IB      50:05:07:00:5B:01:7E:11   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.152       Dynamic IB      50:05:07:00:5B:01:7F:D5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.154       Dynamic IB      50:05:07:00:5B:01:7F:E5   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.158       Dynamic IB      50:05:07:00:5B:01:78:19   proxy-arp 7        

      sx60g01:   XXX.XXX.XX.160       Dynamic IB      50:05:07:00:5B:01:7E:CD   proxy-arp 7        

       

      I assigned these floating IPs to VMs more than a week ago. At the moment no node, virtual or physical, has any of these IPs assigned.

       

      Why are they still present in the ARP table of the gateway?
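Stale entries like these can be listed automatically. This is a minimal sketch, again assuming the row format of the xdsh output above. `ib_entries`, `stale_entries`, and `assigned_ips` are hypothetical names, and the set of IPs currently in use on the external network (floating IPs, FIP agent gateway addresses, router gateways) has to come from Neutron or from the nodes themselves.

```python
def ib_entries(text):
    """Yield (gateway, ip, mac) for every 'Dynamic IB' row of the xdsh output."""
    for line in text.splitlines():
        fields = line.split()
        # Assumed row layout: <gw>: <ip> Dynamic IB <mac> proxy-arp <n>
        if len(fields) == 7 and fields[2:4] == ["Dynamic", "IB"]:
            yield fields[0].rstrip(":"), fields[1], fields[4].lower()


def stale_entries(text, assigned_ips):
    """Return the IB entries whose IP is not currently assigned to anything."""
    return [
        (gw, ip, mac)
        for gw, ip, mac in ib_entries(text)
        if ip not in assigned_ips
    ]


SAMPLE = """\
sx60g01:   XXX.XXX.XX.190       Dynamic ETH     00:00:0C:07:AC:83         proxy-arp 7
sx60g01:   XXX.XXX.XX.134       Dynamic IB      50:05:07:00:5B:01:80:0D   proxy-arp 7
sx60g01:   XXX.XXX.XX.148       Dynamic IB      50:05:07:00:5B:01:7E:11   proxy-arp 7
sx60g01:   XXX.XXX.XX.160       Dynamic IB      50:05:07:00:5B:01:7E:CD   proxy-arp 7
"""

# Hypothetical: pretend only .134 is still in use on the external network.
assigned_ips = {"XXX.XXX.XX.134"}

for gw, ip, mac in stale_entries(SAMPLE, assigned_ips):
    print(f"{gw}: stale entry {ip} -> {mac}")
```

Running this at intervals and comparing the results would show whether the dynamic entries ever age out on the gateway.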

       

      Thank you very much in advance

       

      Ale