26 Replies Latest reply on Apr 7, 2014 7:01 AM by ale

    How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?

    ale

      Hi all,

      I'm trying to set up Mellanox-Neutron-Havana-Redhat on OpenStack RDO using SR-IOV. I have successfully set up the Nova compute hosts with openstack-neutron-mellanox, eswitchd and the mlnxvif driver. The Nova compute hosts are equipped with Mellanox ConnectX-3 HCAs with SR-IOV-enabled firmware (CentOS 6.5, Mellanox OFED 2.1).

       

      [root@n08 ~]# lspci |grep Mellanox

      01:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

      01:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:00.2 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:00.3 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:00.4 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:00.5 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:00.6 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:00.7 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

      01:01.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

       

      Nova compute works like a charm (VMs are able to ping each other, and RDMA, tested with ib_{write,read}_*, is working), but I'm having problems configuring the Network node. The Network node is equipped with a Mellanox ConnectX HCA.

       

      [root@n02 ~]# lspci |grep Mellanox

      85:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

       

      Following the instructions at Mellanox-Neutron-Havana-Redhat, I have configured it in paravirtualized mode to make use of DHCP with the Linux Bridge Neutron plugin (patched as described in the guide). The eIPoIB interface (eth2) is used by Linux Bridge as the physical interface:

       

      [linux_bridge]

      physical_interface_mappings = default:eth2


      Following the configuration guide, I have replaced the original IPoIB daemon (/sbin/ipoibd) with the one supplied in the OpenStack Havana repo (http://www.mellanox.com/downloads/solutions/openstack/havana/ipoibd), but it's not working. The Python script exits with the following error:


      [root@n02 ~]# /sbin/ipoibd log_level=debug

      2014-01-24 15:14:29,042 [root] DEBUG: cmdline /sbin/ipoibd log_level=debug

      2014-01-24 15:14:29,045 [root] DEBUG: cmd [/bin/logger -p daemon.info -t ipoibd "ipoibd daemon (pid 4943) started -- version 1.2.2302"], rc [0], out_len [0], retry [0]

      2014-01-24 15:14:29,045 [root] INFO: ipoibd daemon (pid 4943) started -- version 1.2.2302

      2014-01-24 15:14:29,045 [root] DEBUG: all_str              all

      2014-01-24 15:14:29,046 [root] DEBUG: interval             2

      2014-01-24 15:14:29,046 [root] DEBUG: max_iter             0

      2014-01-24 15:14:29,046 [root] DEBUG: gc_enable            yes

      2014-01-24 15:14:29,046 [root] DEBUG: gc_enable            yes

      2014-01-24 15:14:29,046 [root] DEBUG: log_level            10

      2014-01-24 15:14:29,049 [root] DEBUG: cmd [/bin/cat /etc/issue], rc [0], out_len [47], retry [0]

      2014-01-24 15:14:29,050 [root] DEBUG: system               CentOS

      2014-01-24 15:14:29,050 [root] DEBUG: NO MANAGMENT !!!

      2014-01-24 15:14:29,050 [root] DEBUG: UNKNOWN              UNKNOWN

      2014-01-24 15:14:29,050 [root] ERROR: Couldn't find managment tools (e.g. virsh/xenstore)

       

      It is looking for libvirt or Xen client packages that are, obviously, not installed on the Network node.
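
      The error message suggests that the daemon probes for a hypervisor management CLI before doing anything else. The following is a minimal Python 3 sketch of that kind of probe, written only to illustrate why a node without libvirt or Xen tools would stop at this point. It is an assumption about the behaviour, not the ipoibd source: `find_management_tool` and the candidate names `virsh` and `xenstore-ls` are hypothetical choices based on the wording of the error.

```python
import shutil


def find_management_tool(candidates=("virsh", "xenstore-ls"), which=shutil.which):
    """Return the first candidate executable found on PATH, or None.

    `which` is injectable so the probe can be exercised without
    installing libvirt or Xen tools.
    """
    for name in candidates:
        if which(name):
            return name
    return None


tool = find_management_tool()
if tool is None:
    # Roughly the situation on a network node with no hypervisor tools.
    print("no management tool found")
else:
    print("using", tool)
```

      On a network node that runs only the DHCP and L3 agents, neither tool would normally be present, which is consistent with the log above.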

      If I manually configure the eth2 interface (following these instructions: eIPoIB Manual Configuration) everything seems to work fine and VMs are able to get an address from the DHCP on the Network node.

       

      This is my configuration of eIPoIB driver:

       

      [root@n02 ~]# tail /sys/class/net/eth2/eth/{slaves,vifs}

      ==> /sys/class/net/eth2/eth/slaves <==

      ib0.8001.1

      ib0.8001.2

       

      ==> /sys/class/net/eth2/eth/vifs <==

      SLAVE=ib0.8001.1 MAC=fa:16:3e:04:4c:f7 VLAN=1

      SLAVE=ib0.8001.2 MAC=fa:16:3e:52:32:68 VLAN=1

       

      where the MAC addresses are those of the Neutron DHCP and router ports:

       

      [root@n02 ~]# ip netns exec qdhcp-570dc494-a876-401c-a80d-6696e2140a5f ip a | grep -A2 'ns-'

      12: ns-bf3135d2-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

          link/ether fa:16:3e:04:4c:f7 brd ff:ff:ff:ff:ff:ff

          inet 10.0.0.3/24 brd 10.0.0.255 scope global ns-bf3135d2-4b

          inet6 fe80::f816:3eff:fe04:4cf7/64 scope link

             valid_lft forever preferred_lft forever

       

      [root@n02 ~]# ip netns exec qrouter-6e95aabe-b64a-4139-b12a-1413444071c1 ip a | grep -A 2 'q[rg]'

      16: qr-03eecdb6-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

          link/ether fa:16:3e:52:32:68 brd ff:ff:ff:ff:ff:ff

          inet 10.0.0.1/24 brd 10.0.0.255 scope global qr-03eecdb6-1b

          inet6 fe80::f816:3eff:fe52:3268/64 scope link

             valid_lft forever preferred_lft forever

      18: qg-96f34856-9d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

          link/ether fa:16:3e:51:aa:65 brd ff:ff:ff:ff:ff:ff

          inet XXX.XXX.XXX.XXX/XX brd XXX.XXX.XXX.XXX scope global qg-96f34856-9d

          inet6 fe80::f816:3eff:fe51:aa65/64 scope link

             valid_lft forever preferred_lft forever
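
      The check done by eye above (every MAC inside the Neutron namespaces must appear in the vifs table) can be done mechanically. Below is a minimal Python sketch, assuming the vifs file keeps the `SLAVE=... MAC=... VLAN=...` layout shown in this post. `parse_vifs` and `missing_macs` are hypothetical helper names, not part of ipoibd or the eth_ipoib driver.

```python
import re

# One "KEY=value" token, e.g. "SLAVE=ib0.8001.1" or "MAC=fa:16:3e:04:4c:f7".
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_vifs(text):
    """Parse the contents of /sys/class/net/<eipoib>/eth/vifs into dicts."""
    entries = []
    for line in text.splitlines():
        fields = dict(_FIELD.findall(line))
        if "SLAVE" in fields:  # skip blank or unrelated lines
            entries.append(fields)
    return entries


def missing_macs(entries, expected):
    """Return the expected MACs that no slave is registered with."""
    configured = {e.get("MAC", "").lower() for e in entries}
    return sorted(m.lower() for m in expected if m.lower() not in configured)


sample = """\
SLAVE=ib0.8001.1 MAC=fa:16:3e:04:4c:f7 VLAN=1
SLAVE=ib0.8001.2 MAC=fa:16:3e:52:32:68 VLAN=1
"""
vifs = parse_vifs(sample)
# MACs of the DHCP port (ns-...) and the router port (qr-...) from this post.
print(missing_macs(vifs, ["fa:16:3e:04:4c:f7", "fa:16:3e:52:32:68"]))
```

      In real use the text would be read from /sys/class/net/eth2/eth/vifs; an empty result means every expected MAC has a slave.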

       

      The eIPoIB interfaces should be configured automatically by the ipoibd daemon. Is the script in the repository correct? Should I download it from another repo?

       

      Thank you very much.

      Ale


       

      Message was edited by: Alessandro Federico Adding more infos about eth_ipoib config.

        • Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?

          Hi,

           

          If you have Mellanox OFED >= 2.1, the ipoibd file that comes with OFED is fine, so there is no need to copy it from the repository. I updated the wiki.

           

          Itzik

            • Re: Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?
              ale

              Hi Itzik,

              Thanks for the reply. I have already tried to start the ipoibd that comes with Mellanox OFED 2.1, but it does not work: it does not create a correct configuration. It also enters a runaway loop, creating many ib0.8001 clones and enslaving them to eth2. The loop ends only when it tries to create the 256th clone.

               

              [root@n02 ~]# ip a | grep ib0.8001 | wc -l

              256

               

              [root@n02 ~]# head -n 4 /sys/class/net/eth2/eth/{slaves,vifs}

              ==> /sys/class/net/eth2/eth/slaves <==

              ib0.1

              ib0.8001.1

              ib0.8001.2

              ib0.8001.3

               

              ==> /sys/class/net/eth2/eth/vifs <==

              SLAVE=ib0.1      MAC=00:02:c9:50:60:8d VLAN=N/A

              SLAVE=ib0.8001.1 MAC=00:02:c9:50:60:8d VLAN=1

              SLAVE=ib0.8001.2 MAC=N/A              VLAN=N/A

              SLAVE=ib0.8001.3 MAC=N/A              VLAN=N/A

               

              [root@n02 ~]# tail -n 1 /sys/class/net/eth2/eth/{slaves,vifs}

              ==> /sys/class/net/eth2/eth/slaves <==

              ib0.8001.255

               

              ==> /sys/class/net/eth2/eth/vifs <==

              SLAVE=ib0.8001.83 MAC=N/A

               

              As you can see above the vifs table is corrupted at line 83.

              Moreover, the pkey-tagged ib0 clone ib0.8001.1 (the only one configured) has been enslaved with the MAC address of eth2 instead of, for example, the MAC address of the Neutron DHCP port, which is fa:16:3e:04:4c:f7:

               

              [root@n02 ~]# ip netns exec qdhcp-570dc494-a876-401c-a80d-6696e2140a5f ip a | grep -A2 'ns-'

              15: ns-bf3135d2-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                  link/ether fa:16:3e:04:4c:f7 brd ff:ff:ff:ff:ff:ff

                  inet 10.0.0.3/24 brd 10.0.0.255 scope global ns-bf3135d2-4b

                  inet6 fe80::f816:3eff:fe04:4cf7/64 scope link

                     valid_lft forever preferred_lft forever
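
              The three kinds of entry visible in the vifs listing above (never configured, registered with the eIPoIB interface's own MAC, registered with some other MAC) can be sorted out with a short script. This is a minimal Python sketch under the assumption that each line is a whitespace-separated list of `KEY=value` tokens; `classify_vifs` is a hypothetical helper, not something ipoibd provides.

```python
def classify_vifs(lines, parent_mac):
    """Group slave names by the state of their vifs entry."""
    report = {"unconfigured": [], "parent_mac": [], "ok": []}
    for line in lines:
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        slave = fields.get("SLAVE")
        if slave is None:
            continue  # not a vifs entry
        mac = fields.get("MAC", "N/A").lower()
        if mac == "n/a" or "VLAN" not in fields:
            # No MAC assigned, or the line is truncated.
            report["unconfigured"].append(slave)
        elif mac == parent_mac.lower():
            # Registered with the MAC of the eIPoIB interface itself.
            report["parent_mac"].append(slave)
        else:
            report["ok"].append(slave)
    return report


lines = [
    "SLAVE=ib0.1      MAC=00:02:c9:50:60:8d VLAN=N/A",
    "SLAVE=ib0.8001.1 MAC=00:02:c9:50:60:8d VLAN=1",
    "SLAVE=ib0.8001.2 MAC=N/A              VLAN=N/A",
    "SLAVE=ib0.8001.83 MAC=N/A",
]
report = classify_vifs(lines, parent_mac="00:02:c9:50:60:8d")
for state, slaves in report.items():
    print(state, slaves)
```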

               

              Please have a look at the attached ipoibd log file /var/log/eipoib_daemon.log.

               

              Thanks

               

              Ale

                • Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?

                  Hi,

                   

                  Can you please make sure you have only one ipoibd running (ps -ef | grep ipoibd)?

                  Can you kill all the running ipoibd instances and then run /etc/init.d/openibd restart?

                   

                  Thanks,

                  Itzik

                    • Re: Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?
                      ale

                      Hi Itzik,

                      Yes, I had already checked how many instances of the ipoibd daemon were running, because I found it strange that every line was repeated 3 times in the logs ;-)

                      However, there is only one instance:

                       

                      [root@n02 ~]# ps -Lef |grep ipoibd

                      root      2149     1  2149  3    1 11:37 ?        00:00:10 /usr/bin/python /sbin/ipoibd -D eth_ipoib

                      root     31155  3874 31155  0    1 11:41 pts/0    00:00:00 grep ipoibd

                       

                      I have just rebooted the server to be sure :-)

                       

                      Let me know if you need more information about my configuration.

                       

                      thanks

                      ale

                        • Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?

                          Hi Ale,

                          A few points:

                          1. The "3 times print" is a bug that is already fixed in the next version.

                          2. I saw something in the log you sent that may be the reason for the runaway creation loop. I will update you when I am sure (and will send you a new version).


                          Anyway, I am trying to understand the topology you are using:

                          1. Where do you run the eIPoIB interface? Is it on a guest?

                          2. Can you run the "ibstat" command on the device that runs eIPoIB and send its output?


                          Thanks, Erez

                            • Re: Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?
                              ale

                              Hi Erez,

                              my configuration is as follows

                              • nova compute nodes are equipped with Mellanox ConnectX-3 (MT27500) SR-IOV enabled. They are running Mellanox eSwitchd, Mellanox VIF driver and Mellanox Neutron Agent (for details, see my post Mellanox eSwitchd issue on Openstack Havana nova-compute). Of course no eIPoIB driver here, neither on the hosts (nova compute) nor on the guests.
                              • the network node is equipped with a Mellanox ConnectX HCA

                              [root@n02 ~]# ibstat mlx4_0

                              CA 'mlx4_0'

                                      CA type: MT26428

                                      Number of ports: 1

                                      Firmware version: 2.9.1000

                                      Hardware version: b0

                                      Node GUID: 0x0002c9030050608c

                                      System image GUID: 0x0002c9030050608f

                                      Port 1:

                                              State: Active

                                              Physical state: LinkUp

                                              Rate: 40

                                              Base lid: 26

                                              LMC: 0

                                              SM lid: 15

                                              Capability mask: 0x02510868

                                              Port GUID: 0x0002c9030050608d

                                              Link layer: InfiniBand

                              It is configured in paravirtualized mode using the eIPoIB driver as described here: Mellanox-Neutron-Havana-Redhat - OpenStack. The eIPoIB interface is eth2 and it's used by Linux Bridge as the physical interface for the IB network (default:1:6)

                              [root@n02 ~]# grep ^[\[a-z] /etc/neutron/plugin.ini

                              [vlans]

                              tenant_network_type = ib

                              network_vlan_ranges = default:1:6,external

                              [linux_bridge]

                              physical_interface_mappings = default:eth2,external:eth1.55

                              [vxlan]

                              [agent]

                              rpc_support_old_agents = True

                              [securitygroup]

                              firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
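
                              For readers unfamiliar with the two comma-separated settings above, here is a minimal Python sketch of how they decompose. It is an illustration only and not Neutron's own parser; `parse_interface_mappings` and `parse_vlan_ranges` are hypothetical names.

```python
def parse_interface_mappings(value):
    """'default:eth2,external:eth1.55' -> {'default': 'eth2', 'external': 'eth1.55'}"""
    mappings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        network, sep, interface = item.partition(":")
        if not sep or not network or not interface:
            raise ValueError("malformed mapping: %r" % item)
        mappings[network] = interface
    return mappings


def parse_vlan_ranges(value):
    """'default:1:6,external' -> {'default': (1, 6), 'external': None}"""
    ranges = {}
    for item in value.split(","):
        parts = item.strip().split(":")
        if len(parts) == 3:
            ranges[parts[0]] = (int(parts[1]), int(parts[2]))
        elif len(parts) == 1 and parts[0]:
            ranges[parts[0]] = None  # network listed without a VLAN range
        else:
            raise ValueError("malformed range: %r" % item)
    return ranges


print(parse_interface_mappings("default:eth2,external:eth1.55"))
print(parse_vlan_ranges("default:1:6,external"))
```

                              Read this way, the physical network "default" is carried on eth2 with segmentation IDs 1 to 6, and "external" is carried on eth1.55 with no range.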

                              I'm testing with only one tenant IB network, so Linux Bridge has created only one tagged eth2 interface, on VLAN 1:

                              [root@n02 ~]# ip link show | grep -A1 eth2

                              7: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN

                                  link/ether 00:02:c9:50:60:8d brd ff:ff:ff:ff:ff:ff

                              --

                              15: eth2.1@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP

                                  link/ether 00:02:c9:50:60:8d brd ff:ff:ff:ff:ff:ff

                              this interface eth2.1 is bridged with the tap devices for the DHCP and the external router

                              [root@n02 ~]# brctl show brq570dc494-a8

                              bridge name     bridge id               STP enabled     interfaces

                              brq570dc494-a8          8000.0002c950608d       no              eth2.1

                                                                                      tap03eecdb6-1b

                                                                                      tapbf3135d2-4b

                              this is the namespace for the external router

                              [root@n02 ~]# ip netns exec qrouter-6e95aabe-b64a-4139-b12a-1413444071c1 ip a |grep -A2 'qr-'

                              16: qr-03eecdb6-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                  link/ether fa:16:3e:52:32:68 brd ff:ff:ff:ff:ff:ff

                                  inet 10.0.0.1/24 brd 10.0.0.255 scope global qr-03eecdb6-1b

                                  inet6 fe80::f816:3eff:fe52:3268/64 scope link

                                     valid_lft forever preferred_lft forever

                              this is the namespace for the DHCP

                              [root@n02 ~]# ip netns exec qdhcp-570dc494-a876-401c-a80d-6696e2140a5f ip a |grep -A2 'ns-'

                              13: ns-bf3135d2-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                  link/ether fa:16:3e:04:4c:f7 brd ff:ff:ff:ff:ff:ff

                                  inet 10.0.0.3/24 brd 10.0.0.255 scope global ns-bf3135d2-4b

                                  inet6 fe80::f816:3eff:fe04:4cf7/64 scope link

                                     valid_lft forever preferred_lft forever

                              To get it to work, I have to manually create two ib0 clones with the first pkey (8001) and enslave them to the eIPoIB interface (eth2):

                              [root@n02 ~]# tail /sys/class/net/eth2/eth/{slaves,vifs}

                              ==> /sys/class/net/eth2/eth/slaves <==

                              ib0.8001.1

                              ib0.8001.2

                               

                              ==> /sys/class/net/eth2/eth/vifs <==

                              SLAVE=ib0.8001.1 MAC=fa:16:3e:04:4c:f7 VLAN=1

                              SLAVE=ib0.8001.2 MAC=fa:16:3e:52:32:68 VLAN=1

                              • finally, the subnet manager is running on my cloud controller that is also equipped with a Mellanox ConnectX HCA (MT26428).

                              [root@n01 ~(keystone_admin)]# cat /etc/opensm/partitions.conf

                              Default=0xffff, ipoib, mtu=4 : ALL=full;

                              management=0x7fff, ipoib, sl=0, defmember=full : ALL, ALL_SWITCHES=full,SELF=full;

                              vlan1=0x1, ipoib, sl=0, defmember=full : ALL;

                              vlan2=0x2, ipoib, sl=0, defmember=full : ALL;

                              vlan3=0x3, ipoib, sl=0, defmember=full : ALL;

                              vlan4=0x4, ipoib, sl=0, defmember=full : ALL;

                              vlan5=0x5, ipoib, sl=0, defmember=full : ALL;

                              vlan6=0x6, ipoib, sl=0, defmember=full : ALL;
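
                              The relation between the partitions above (vlan1=0x1 ... vlan6=0x6), the Neutron range 1:6 and the clone names such as ib0.8001.1 can be written down in a few lines. In an InfiniBand P_Key the top bit marks full membership, so partition 0x1 appears as 0x8001 on a full member. The sketch below is a minimal Python illustration; the assumption that the VLAN ID maps one-to-one to the partition number matches this setup but is not taken from Mellanox documentation, and `vlan_to_pkey` and `clone_name` are hypothetical names.

```python
FULL_MEMBER_BIT = 0x8000  # top bit of a 16-bit P_Key marks full membership


def vlan_to_pkey(vlan_id):
    """P_Key as seen by a full member, e.g. 1 -> 0x8001."""
    if not 0 < vlan_id < 0x7FFF:
        raise ValueError("VLAN/partition id out of range: %r" % vlan_id)
    return FULL_MEMBER_BIT | vlan_id


def clone_name(parent, vlan_id, index):
    """Name in the '<parent>.<pkey>.<index>' style seen in this thread."""
    return "%s.%04x.%d" % (parent, vlan_to_pkey(vlan_id), index)


for vlan in range(1, 7):  # the range default:1:6 from plugin.ini
    print(vlan, hex(vlan_to_pkey(vlan)), clone_name("ib0", vlan, 1))
```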

                               

                              Thank you very much.

                               

                              Ale

                                • Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?

                                  Hi Ale,

                                   

                                  I found a possible reason for the bug and am checking the fix now. I will update you soon.

                                   

                                  Thanks, Erez

                                      • Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?
                                        ophirmaor

                                        Hi Ale,

                                        Please use this ipoibd file to fix the problem:

                                         

                                        http://www.mellanox.com/downloads/solutions/temp/ipoibd

                                         

                                        Let me know if that solves it.

                                         

                                        Thanks,

                                        Ophir.

                                          • Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?
                                            ale

                                            Hi Ophir,

                                            thanks for the update!

                                            I was out of office last week. I will try the new ipoibd asap.

                                             

                                            Thank you

                                            Ale

                                            • Re: Re: How should I use the ipoibd daemon from Mellanox Openstack Havana repo with SR-IOV?
                                              ale

                                              Hi Ophir,

                                              I have tested the new ipoibd service but unfortunately it's still not working :-(

                                              The new script generates this configuration:

                                               

                                              [root@n02 ~]# tail /sys/class/net/eth2/eth/{slaves,vifs}

                                              ==> /sys/class/net/eth2/eth/slaves <==

                                              ib0.1

                                              ib0.8001.1

                                              ib0.8001.2

                                              ib0.8001.3

                                              ib0.8001.4

                                               

                                              ==> /sys/class/net/eth2/eth/vifs <==

                                              SLAVE=ib0.1      MAC=00:02:c9:50:60:8d VLAN=N/A

                                              SLAVE=ib0.8001.1 MAC=00:02:c9:50:60:8d VLAN=1

                                              SLAVE=ib0.8001.2 MAC=N/A               VLAN=N/A

                                              SLAVE=ib0.8001.3 MAC=d2:50:00:71:e1:90 VLAN=1

                                              SLAVE=ib0.8001.4 MAC=8a:4b:57:2a:49:62 VLAN=1

                                               

                                              It creates 2 clones (ib0.1 and ib0.8001.1, pkey/vlan tagged) and it enslaves them with the MAC address of the eIPoIB interface (eth2) causing these logs by the kernel:

                                               

                                              eth2.1: received packet with own address as source address

                                              eth2.1: received packet with own address as source address

                                               

                                              It creates and enslaves one pkey-tagged clone (ib0.8001.2) without configuring it.

                                              Finally, it creates 2 other pkey/vlan-tagged clones (ib0.8001.3 and ib0.8001.4), but it enslaves them with the MAC addresses of the Linux TAP interfaces used by the bridge:

                                               

                                              [root@n02 ~]# brctl show brq570dc494-a8

                                              bridge name     bridge id               STP enabled     interfaces

                                              brq570dc494-a8          8000.0002c950608d       no              eth2.1

                                                                                                      tap03eecdb6-1b

                                                                                                      tapbf3135d2-4b

                                              [root@n02 ~]# ip a show dev tap03eecdb6-1b

                                              28: tap03eecdb6-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                                  link/ether 8a:4b:57:2a:49:62 brd ff:ff:ff:ff:ff:ff

                                                  inet6 fe80::884b:57ff:fe2a:4962/64 scope link

                                                     valid_lft forever preferred_lft forever

                                              [root@n02 ~]# ip a show dev tapbf3135d2-4b

                                              18: tapbf3135d2-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                                  link/ether d2:50:00:71:e1:90 brd ff:ff:ff:ff:ff:ff

                                                  inet6 fe80::d050:ff:fe71:e190/64 scope link

                                                     valid_lft forever preferred_lft forever

                                               

                                              These last 2 clones should instead be enslaved with the MAC addresses of the VETH interfaces of the DHCP and ROUTER namespaces:

                                               

                                              [root@n02 ~]# ip netns exec qdhcp-570dc494-a876-401c-a80d-6696e2140a5f ip a show dev ns-bf3135d2-4b

                                              17: ns-bf3135d2-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                                  link/ether fa:16:3e:04:4c:f7 brd ff:ff:ff:ff:ff:ff

                                                  inet 10.0.0.3/24 brd 10.0.0.255 scope global ns-bf3135d2-4b

                                                  inet6 fe80::f816:3eff:fe04:4cf7/64 scope link

                                                     valid_lft forever preferred_lft forever

                                              [root@n02 ~]# ip netns exec qrouter-6e95aabe-b64a-4139-b12a-1413444071c1 ip a show dev qr-03eecdb6-1b

                                              27: qr-03eecdb6-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

                                                  link/ether fa:16:3e:52:32:68 brd ff:ff:ff:ff:ff:ff

                                                  inet 10.0.0.1/24 brd 10.0.0.255 scope global qr-03eecdb6-1b

                                                  inet6 fe80::f816:3eff:fe52:3268/64 scope link

                                                     valid_lft forever preferred_lft forever
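
                                              In the output above, each tap device on the bridge shares its suffix with the device inside the namespace (tapbf3135d2-4b with ns-bf3135d2-4b, tap03eecdb6-1b with qr-03eecdb6-1b). A minimal Python sketch of a lookup that uses the namespace-side MAC rather than the tap MAC follows. The name pairing is inferred from this thread only, the `qg-` candidate is an assumption, and both function names are hypothetical.

```python
def namespace_devices_for_tap(tap_name):
    """Candidate in-namespace device names for a Linux Bridge tap device."""
    prefix = "tap"
    if not tap_name.startswith(prefix):
        raise ValueError("not a tap device: %r" % tap_name)
    suffix = tap_name[len(prefix):]
    return ["ns-" + suffix, "qr-" + suffix, "qg-" + suffix]


def pick_vif_mac(tap_name, namespace_macs):
    """Return the MAC of the namespace-side device paired with tap_name."""
    for dev in namespace_devices_for_tap(tap_name):
        if dev in namespace_macs:
            return namespace_macs[dev]
    return None  # no paired device known


# MACs read from inside the qdhcp and qrouter namespaces in this post.
namespace_macs = {
    "ns-bf3135d2-4b": "fa:16:3e:04:4c:f7",
    "qr-03eecdb6-1b": "fa:16:3e:52:32:68",
}
for tap in ("tapbf3135d2-4b", "tap03eecdb6-1b"):
    print(tap, pick_vif_mac(tap, namespace_macs))
```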

                                               

                                              With the configuration generated by the new script, the VM does not get an IP address because the eIPoIB module is misconfigured. Please see these kernel logs:

                                               

                                              eth_ipoib: vif: fa:16:3e:04:4c:f7 with vlan: 1 miss for parent: ib0

                                              eth_ipoib: vif: fa:16:3e:04:4c:f7 with vlan: 1 miss for parent: ib0

                                              eth_ipoib: vif: fa:16:3e:04:4c:f7 with vlan: 1 miss for parent: ib0
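
                                              Kernel messages of this form name exactly which MAC/VLAN pair has no slave, so they can be used to list what is still missing from the vifs table. The following is a minimal Python sketch that assumes the message format shown above; `missed_vifs` is a hypothetical helper.

```python
import re

_MISS = re.compile(
    r"eth_ipoib: vif: (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"with vlan: (?P<vlan>\d+) miss for parent: (?P<parent>\S+)",
    re.IGNORECASE,
)


def missed_vifs(log_text):
    """Return the distinct (mac, vlan, parent) tuples reported as misses."""
    seen = []
    for match in _MISS.finditer(log_text):
        key = (
            match.group("mac").lower(),
            int(match.group("vlan")),
            match.group("parent"),
        )
        if key not in seen:  # the kernel repeats the same miss many times
            seen.append(key)
    return seen


log = """\
eth_ipoib: vif: fa:16:3e:04:4c:f7 with vlan: 1 miss for parent: ib0
eth_ipoib: vif: fa:16:3e:04:4c:f7 with vlan: 1 miss for parent: ib0
eth_ipoib: vif: fa:16:3e:04:4c:f7 with vlan: 1 miss for parent: ib0
"""
print(missed_vifs(log))
```

                                              Here the only miss is the MAC of the DHCP port on VLAN 1, which is the entry present in the working manual configuration.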

                                               

                                              As I wrote before, my working configuration is:

                                               

                                              [root@n02 ~]# tail /sys/class/net/eth2/eth/{slaves,vifs}

                                              ==> /sys/class/net/eth2/eth/slaves <==

                                              ib0.8001.1

                                              ib0.8001.2

                                               

                                              ==> /sys/class/net/eth2/eth/vifs <==

                                              SLAVE=ib0.8001.1 MAC=fa:16:3e:04:4c:f7 VLAN=1  <------- DHCP

                                              SLAVE=ib0.8001.2 MAC=fa:16:3e:52:32:68 VLAN=1  <------- ROUTER

                                               

                                              Thank you very much!

                                              Ale