8 Replies Latest reply on Mar 17, 2015 10:41 AM by mkkang01

    VLAN setup on SR-IOV enabled Mellanox ConnectX-3 (CentOS7)

    mkkang01

      Hello,

       

      I'm working on SR-IOV with a Mellanox ConnectX-3 card (switch: Voltaire 4036) on CentOS 7.

       

      Mellanox OFED Driver Installation and Configuration for SR-IOV

      Mellanox-Neutron-Icehouse-Redhat-Ethernet - OpenStack

      Nova-neutron-sriov - OpenStack

       

      Most of the documentation and packages target CentOS 6.x and Python 2.6.

      Since I'm on CentOS 7, I installed the required packages from git sources and tarballs.

      I verified that SR-IOV is enabled using the lspci command.

       

      # lspci -nn | grep Mell

      21:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]

      21:00.1 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:00.2 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:00.3 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:00.4 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:00.5 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:00.6 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:00.7 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.1 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.2 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.3 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.4 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.5 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.6 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:01.7 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

      21:02.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

       

      I also verified with virsh that a VF can be attached to a VM (SR-IOV, hostdev).

       

      However, launching an OpenStack VM fails with "Set VLAN operation failed", according to the following mlnx-agent and eswitchd logs.

       

      2015-02-24 16:51:31,346 DEBUG eswitchd [-] Handling message - {u'action': u'set_vlan', u'vlan': 1000, u'fabric': u'physnet1', u'port_mac': u'fa:16:3e:cc:76:bd'}

      2015-02-24 16:51:31,346 DEBUG eswitchd [-] Running command: sudo eswitch-rootwrap /etc/eswitchd/rootwrap.conf ip link set ens4 vf 9 vlan 1000 qos 0

      2015-02-24 16:51:31,441 DEBUG eswitchd [-]

      Command: ['sudo', 'eswitch-rootwrap', '/etc/eswitchd/rootwrap.conf', 'ip', 'link', 'set', 'ens4', 'vf', '9', 'vlan', '1000', 'qos', '0']

      Exit code: 2

      Stdout: ''

      Stderr: 'RTNETLINK answers: Operation not supported\n'

      2015-02-24 16:51:31,442 ERROR eswitchd [-] Set VLAN operation failed

       

      I also tried it manually, with the same result: assigning a MAC works, but setting the VLAN fails. iproute2-3.19.0 is installed.

       

      # ip link set ens4 vf 9 mac fa:16:3e:cc:76:bd

      # ip link show ens4

      11: ens4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master ovs-system state DOWN mode DEFAULT qlen 1000

          link/ether 00:02:c9:fb:a4:50 brd ff:ff:ff:ff:ff:ff

          vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 8 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 9 MAC fa:16:3e:cc:76:bd, vlan 4095, spoof checking off, link-state auto

          vf 10 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 11 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 12 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 13 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 14 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

          vf 15 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

      # ip link set ens4 vf 9 vlan 1000

      RTNETLINK answers: Operation not supported
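To confirm whether a VLAN assignment actually took effect, the per-VF VLAN can be read back from the `ip link show` output rather than relying on the exit code alone. The following is a minimal sketch, assuming the line layout shown in the transcript above (`vf N MAC ..., vlan ID, ...`); the function name `vf_vlan` is made up for illustration, and other iproute2 versions may print the VF lines differently.

```shell
# vf_vlan: print the VLAN ID reported for one VF.
# stdin: output of `ip link show <device>`; $1: VF index.
# Hypothetical helper; assumes lines of the form
#   vf 9 MAC fa:16:3e:cc:76:bd, vlan 4095, spoof checking off, ...
vf_vlan() {
    awk -v vf="$1" '
        $1 == "vf" && $2 == vf {
            # Find the "vlan" keyword and print the field after it,
            # with the trailing comma stripped.
            for (i = 3; i < NF; i++) {
                if ($i == "vlan") {
                    gsub(/,/, "", $(i + 1))
                    print $(i + 1)
                    exit
                }
            }
        }'
}

# Example (needs the real device, so it is left commented out):
#   ip link show ens4 | vf_vlan 9
```

In the transcript above, VF 9 still reports `vlan 4095` after the failed command, which is consistent with the VLAN never having been applied.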

       

      # cat /boot/config-3.10.0-123.20.1.el7.x86_64 | grep NETFILTER_NETLINK

      CONFIG_NETFILTER_NETLINK=m

      CONFIG_NETFILTER_NETLINK_ACCT=m

      CONFIG_NETFILTER_NETLINK_QUEUE=m

      CONFIG_NETFILTER_NETLINK_LOG=m

      CONFIG_NETFILTER_NETLINK_QUEUE_CT=y

       

      Any suggestions are welcome.

        • Re: VLAN setup on SR-IOV enabled Mellanox ConnectX-3 (CentOS7)
          mkkang01

          This "ip link set" problem was resolved by upgrading the ConnectX-3 firmware to 2.33.5000.
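For anyone hitting the same error, a quick check of the installed firmware against the version that worked in this thread might look like the sketch below. It is only an illustration: `fw_at_least` is a hypothetical helper, and 2.33.5000 is simply the version that fixed it here, not a documented minimum.

```shell
# fw_at_least: succeed if dotted version $1 is >= dotted version $2.
# Hypothetical helper; compares the components numerically, left to right.
fw_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(want, w, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            a = h[i] + 0      # missing components count as 0
            b = w[i] + 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0                # all components equal
    }'
}

# Example (needs the real device, so it is left commented out).
# `ethtool -i` prints a "firmware-version:" line for the interface:
#   fw=$(ethtool -i ens4 | awk '/^firmware-version:/ {print $2}')
#   fw_at_least "$fw" 2.33.5000 || echo "firmware $fw is older than 2.33.5000"
```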

          However, hca_self_test.ofed now shows FAIL/DOWN results. Any suggestions for resolving this?

           

          # hca_self_test.ofed

           

          ---- Performing Adapter Device Self Test ----

          Number of CAs Detected ................. 17

          PCI Device Check ....................... PASS

          Kernel Arch ............................ x86_64

          Host Driver Version .................... MLNX_OFED_LINUX-2.4-1.0.0 (OFED-2.4-1.0.0): modules

          Host Driver RPM Check .................. PASS

          Firmware on CA #0 VPI .................. v2.33.5000

          Firmware Check on CA #0 (VPI) .......... PASS

          Firmware Check on CA #1 (VPI) .......... FAIL

              REASON: CA #1: failed to get firmware version

          Firmware Check on CA #2 (VPI) .......... FAIL

              REASON: CA #2: failed to get firmware version

          Firmware Check on CA #3 (VPI) .......... FAIL

              REASON: CA #3: failed to get firmware version

          Firmware Check on CA #4 (VPI) .......... FAIL

              REASON: CA #4: failed to get firmware version

          Firmware Check on CA #5 (VPI) .......... FAIL

              REASON: CA #5: failed to get firmware version

          Firmware Check on CA #6 (VPI) .......... FAIL

              REASON: CA #6: failed to get firmware version

          Firmware Check on CA #7 (VPI) .......... FAIL

              REASON: CA #7: failed to get firmware version

          Firmware Check on CA #8 (VPI) .......... FAIL

              REASON: CA #8: failed to get firmware version

          Firmware Check on CA #9 (VPI) .......... FAIL

              REASON: CA #9: failed to get firmware version

          Firmware Check on CA #10 (VPI) .......... FAIL

              REASON: CA #10: failed to get firmware version

          Firmware Check on CA #11 (VPI) .......... FAIL

              REASON: CA #11: failed to get firmware version

          Firmware Check on CA #12 (VPI) .......... FAIL

              REASON: CA #12: failed to get firmware version

          Firmware Check on CA #13 (VPI) .......... FAIL

              REASON: CA #13: failed to get firmware version

          Firmware Check on CA #14 (VPI) .......... FAIL

              REASON: CA #14: failed to get firmware version

          Firmware Check on CA #15 (VPI) .......... FAIL

              REASON: CA #15: failed to get firmware version

          Firmware Check on CA #16 (VPI) .......... FAIL

              REASON: CA #16: failed to get firmware version

          Host Driver Initialization ............. PASS

          Number of CA Ports Active .............. 0

          Port State of Port #1 on CA #0 (VPI)..... DOWN (Ethernet)

          Port State of Port #2 on CA #0 (VPI)..... DOWN (Ethernet)

          Error Counter Check on CA #0 (VPI)...... NA (Eth ports)

          Kernel Syslog Check .................... PASS

          Node GUID on CA #0 (VPI) ............... 00:02:c9:03:00:fb:a4:50

          Node GUID on CA #1 (VPI) ............... NA

          Node GUID on CA #2 (VPI) ............... NA

          Node GUID on CA #3 (VPI) ............... NA

          Node GUID on CA #4 (VPI) ............... NA

          Node GUID on CA #5 (VPI) ............... NA

          Node GUID on CA #6 (VPI) ............... NA

          Node GUID on CA #7 (VPI) ............... NA

          Node GUID on CA #8 (VPI) ............... NA

          Node GUID on CA #9 (VPI) ............... NA

          Node GUID on CA #10 (VPI) ............... NA

          Node GUID on CA #11 (VPI) ............... NA

          Node GUID on CA #12 (VPI) ............... NA

          Node GUID on CA #13 (VPI) ............... NA

          Node GUID on CA #14 (VPI) ............... NA

          Node GUID on CA #15 (VPI) ............... NA

          Node GUID on CA #16 (VPI) ............... NA

          ------------------ DONE ---------------------

            • Re: VLAN setup on SR-IOV enabled Mellanox ConnectX-3 (CentOS7)
              ferbs

              The script simply tries to query the firmware version of each VF you've created, and that query fails. I don't think anything is wrong here. You can see above that the real HCA is identified with firmware 2.33.5000.
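The count of 17 CAs matches one physical function plus 16 virtual functions. A small sketch for tallying them from `lspci -nn` output, assuming the PCI IDs shown in the first post (`15b3:1003` for the physical ConnectX-3, `15b3:1004` for its Virtual Functions); the function name is made up for illustration:

```shell
# count_mlx_functions: tally ConnectX-3 physical and virtual functions.
# stdin: output of `lspci -nn`.
# Assumes the PCI IDs from the first post: 1003 = PF, 1004 = VF.
count_mlx_functions() {
    awk '
        /\[15b3:1003\]/ { pf++ }
        /\[15b3:1004\]/ { vf++ }
        END { printf "PF=%d VF=%d\n", pf, vf }'
}

# Example (needs the real hardware, so it is left commented out):
#   lspci -nn | count_mlx_functions
```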

                • Re: VLAN setup on SR-IOV enabled Mellanox ConnectX-3 (CentOS7)
                  mkkang01

                  Thanks, Erez! OK, so the failed VF firmware checks are fine. But what about "Port State of Port #x on CA #0 (VPI)"? Before I set up SR-IOV, it was not "DOWN".

                    • Re: VLAN setup on SR-IOV enabled Mellanox ConnectX-3 (CentOS7)
                      ferbs

                      It looks like the link type was previously InfiniBand and the ports were in the Init state, whereas now they are Ethernet and the port state is DOWN. You can switch the ports back to IB with the connectx_port_config command.

                        • Re: VLAN setup on SR-IOV enabled Mellanox ConnectX-3 (CentOS7)
                          mkkang01

                          Hi Erez,

                           

                          Thanks for the suggestion. I'm now following "HowTo Change Port Type in Mellanox ConnectX-3 Adapter".

                          Actually, /sys/bus/pci/devices/0000\:21\:00.0/mlx4_port1 and mlx4_port2 are already set to eth.
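A short sketch for printing the current type of every port of one device from sysfs. The directory is passed as an argument, so nothing here is tied to a particular PCI address; `show_port_types` is a hypothetical name, and the only thing assumed about the driver is the `mlx4_port1`/`mlx4_port2` files mentioned above.

```shell
# show_port_types: print "<file>: <type>" for each mlx4_port* entry.
# $1: sysfs directory of the PCI device,
#     e.g. /sys/bus/pci/devices/0000:21:00.0
show_port_types() {
    dev_dir=$1
    for f in "$dev_dir"/mlx4_port*; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Example (needs the real hardware, so it is left commented out):
#   show_port_types /sys/bus/pci/devices/0000:21:00.0
```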

                           

                          Once port_type_array is set in mlx4_core.conf, connectx_port_config refuses to change the port configuration:

                           

                          # cat /etc/modprobe.d/mlx4_core.conf

                          options mlx4_core port_type_array=2,2 num_vfs=16 probe_vf=0 enable_64b_cqe_eqe=0  log_num_mgm_entry_size=-1
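For reference, the `port_type_array=2,2` value can be decoded per port with a sketch like the one below. The mapping 1 = InfiniBand, 2 = Ethernet is taken from the connectx_port_config menu shown in this thread; any other code is printed unchanged rather than guessed at, and the function name is made up for illustration.

```shell
# decode_port_types: print one port mode per line for port_type_array.
# stdin: an "options mlx4_core ..." line from modprobe.d.
# Only codes 1 (ib) and 2 (eth) are mapped; others are passed through.
decode_port_types() {
    tr ' ' '\n' |
    sed -n 's/^port_type_array=//p' |
    tr ',' '\n' |
    while read -r code; do
        case $code in
            1) echo "ib" ;;
            2) echo "eth" ;;
            *) echo "other($code)" ;;
        esac
    done
}

# Example:
#   grep '^options mlx4_core' /etc/modprobe.d/mlx4_core.conf | decode_port_types
```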

                           

                          # connectx_port_config

                          ConnectX PCI devices :

                          |----------------------------|

                          | 1             0000:21:00.0 |

                          |----------------------------|

                          Before port change:

                          eth

                          eth

                          Not allowed to change port configuration, quitting...

                           

                           

                          After commenting out the current settings in mlx4_core.conf, I tried again, but the Ethernet ports are still DOWN:

                           

                          # cat /sys/bus/pci/devices/0000\:21\:00.0/mlx4_port1

                          eth

                          # cat /sys/bus/pci/devices/0000\:21\:00.0/mlx4_port2

                          eth

                           

                          # connectx_port_config -s

                          --------------------------------

                          Port configuration for PCI device: 0000:21:00.0 is:

                          eth

                          eth

                          --------------------------------

                           

                          # connectx_port_config

                          ConnectX PCI devices :

                          |----------------------------|

                          | 1             0000:21:00.0 |

                          |----------------------------|

                          Before port change:

                          eth

                          eth

                          |----------------------------|

                          | Possible port modes:       |

                          | 1: Infiniband              |

                          | 2: Ethernet                |

                          | 3: AutoSense               |

                          |----------------------------|

                          Select mode for port 1 (1,2,3): 1

                          Select mode for port 2 (1,2,3): 1

                          WARNING: Illegal port configuration attempted,

                            Please view dmesg for details.

                           

                          // ... [ 4135.654328] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 ...

                           

                          # connectx_port_config

                          ConnectX PCI devices :

                          |----------------------------|

                          | 1             0000:21:00.0 |

                          |----------------------------|

                          Before port change:

                          eth

                          eth

                          |----------------------------|

                          | Possible port modes:       |

                          | 1: Infiniband              |

                          | 2: Ethernet                |

                          | 3: AutoSense               |

                          |----------------------------|

                          Select mode for port 1 (1,2,3): 2

                          Select mode for port 2 (1,2,3): 2

                          After port change:

                          eth

                          eth

                           

                          # hca_self_test.ofed

                          ---- Performing Adapter Device Self Test ----

                          Number of CAs Detected ................. 1

                          PCI Device Check ....................... PASS

                          Kernel Arch ............................ x86_64

                          Host Driver Version .................... MLNX_OFED_LINUX-2.4-1.0.0 (OFED-2.4-1.0.0): modules

                          Host Driver RPM Check .................. PASS

                          Firmware on CA #0 VPI .................. v2.33.5000

                          Firmware Check on CA #0 (VPI) .......... PASS

                          Host Driver Initialization ............. PASS

                          Number of CA Ports Active .............. 0

                          Port State of Port #1 on CA #0 (VPI)..... DOWN (Ethernet)

                          Port State of Port #2 on CA #0 (VPI)..... DOWN (Ethernet)

                          Error Counter Check on CA #0 (VPI)...... NA (Eth ports)

                          Kernel Syslog Check .................... PASS

                          Node GUID on CA #0 (VPI) ............... 00:02:c9:03:00:fb:a4:50

                          ------------------ DONE ---------------------

                           

                           

                          The ConnectX-3 firmware was not upgraded automatically by "mlnxofedinstall" on CentOS 7.

                          So I upgraded the firmware from 2.11 to 2.33 using the "firmware/mlxfwmanager_sriov_en_x86_64 --online -u -d 21:00.0" command.

                          Is there anything else I should check? Any tips are welcome.