7 Replies Latest reply on Jun 27, 2014 6:47 AM by pseral

    QoS for Virtual Functions (SR-IOV devices)

      Host: CentOS 6.4

      Mellanox Adapter: ConnectX-3 EN

      Mellanox Driver: MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 with SRIOV enabled

      Virtualization: KVM with SR-IOV devices attached via PCI passthrough



      I was able to get QoS (bandwidth control) working on my host by configuring the physical functions, following: End-to-End QoS Configuration for Mellanox Switches (SwitchX) and Adapters


      Not Working:

      My next step was to make QoS work with SR-IOV devices attached (through PCI passthrough) to KVM guests. I assumed configuring the physical function would be enough, so I started my guest (virtual machine) and tested the egress rate of a VLAN device (VLAN 2) that I created from a virtual function (SR-IOV device) inside the guest. Unfortunately, the egress rate does not correspond to what the host (physical function) was configured to achieve.


      I noticed that whichever VLAN index I use inside the guest, the bandwidth I measure does not correspond to that VLAN index in the host configuration, but always to VLAN index 0.


      On the host I created, configured, and tested VLAN 2 to be limited to 1 Gbit/s and VLAN 0 to 5 Gbit/s.

      On the guest I created VLAN 2, and when measuring the bandwidth I got 5 Gbit/s instead of 1 Gbit/s.
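
      For reference, the mismatch can be reproduced with a quick measurement along these lines (a sketch only; the interface name, addresses, and iperf peer are assumptions, not part of the original setup):

      ```shell
      # Inside the guest: create VLAN 2 on the VF interface (eth1 is an assumed name)
      vconfig add eth1 2
      ifconfig eth1.2 192.168.2.10 netmask 255.255.255.0 up

      # Measure egress rate toward an iperf server on the same VLAN
      # (192.168.2.1 is a hypothetical peer). With host VLAN 2 limited to
      # 1 Gbit/s, seeing ~5 Gbit/s here means the VLAN 0 limit is applied.
      iperf -c 192.168.2.1 -t 30
      ```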

      Could anyone point me in the right direction on how to configure SR-IOV devices for QoS properly, or is it even supported on my Mellanox adapter?

      Thanks in advance!

        • Re: QoS for Virtual Functions (SR-IOV devices)

          I was able to solve the problem of SR-IOV with QoS with the help of Setting virtual network attributes on a Virtual...


          But the solution in the link suggests that a virtual function will be bound to only one VLAN.

          Does that mean there is no way to send more than one VLAN through a single virtual function, the way the host does?



            • Re: QoS for Virtual Functions (SR-IOV devices)

              Let me check this issue.

              I moved the question to the Solutions space.


              • Re: QoS for Virtual Functions (SR-IOV devices)


                How many VLANs do you have on the VM? Is it one VLAN or two?

                Do you work in VGT mode (VLAN Guest Tagging) or VST (VLAN Switch Tagging)?

                You can review the SR-IOV chapter of the OFED User Manual.

                Do you run RoCE?

                VST mode is the default mode, which means the hypervisor will enforce (override) the VLAN tagging, no matter what the guest uses.

                You can also configure two VFs for the VM, one for each VLAN  (working in VST mode).
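
                In VST mode the per-VF VLAN is set from the hypervisor side; a minimal sketch using iproute2 (the interface name and VF indices here are assumptions):

                ```shell
                # On the hypervisor: force VF 0 of eth2 onto VLAN 2 (VST mode).
                # The hardware tags/untags the VF's traffic, overriding whatever
                # the guest configures.
                ip link set dev eth2 vf 0 vlan 2

                # A second VF can carry the other VLAN, one VLAN per VF:
                ip link set dev eth2 vf 1 vlan 50

                # Verify the per-VF settings
                ip link show dev eth2
                ```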



                  • Re: QoS for Virtual Functions (SR-IOV devices)

                    Thanks for getting back to me.


                    I would like to have multiple VLANs inside the VM, with each VLAN having its own QoS. It would be nice to configure everything inside the guest instead of doing anything on the host at all, but I am okay with setting VLAN-to-QoS mappings on the host, sending traffic with VLAN tags from inside the VM, and having the traffic throttled according to the host settings.


                    We are not using RoCE.


                    I tried both VST and VGT:

                    VGT: With VGT, the problem I am facing is that inside the VM I created VLAN devices using vconfig and sent traffic, but the rate limit I get is always that of VLAN 0 on the host. So configuring VLANs inside the VM is not working; the VLAN 0 rate limit on the host is all I can get.

                    VST: With VST, I could create more than one VF (each VF having its own VLAN/QoS set on the host), as you said, but that would complicate our design a bit, so we are keeping it as a last resort.


                    If I could use VGT with multiple VLANs inside the VM and get rate control, that would be exactly what I'm looking for.


                    Note: the documentation (Mellanox_OFED_Linux_User_Manual_v2.2-1.0.1.pdf) says VGT is the default behaviour.

                      • Re: QoS for Virtual Functions (SR-IOV devices)


                        Could you try the following:

                        At the VM:

                        - Configure 3 VLANs and map each one of them to egress with a specific priority, for example:

                            - Management VLAN10 - priority 1

                            - Service VLAN30 - priority 3

                            - Storage VLAN50 - priority 5


                        To do that, use the command vconfig set_egress_map; see the example in End-to-End QoS Configuration for Mellanox Switches (SwitchX) and Adapters.


                        # for i in {0..7}; do vconfig set_egress_map eth1.10 $i 1 ; done

                        # for i in {0..7}; do vconfig set_egress_map eth1.30 $i 3 ; done

                        # for i in {0..7}; do vconfig set_egress_map eth1.50 $i 5 ; done
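
                        For completeness, the commands above assume the VLAN devices already exist in the VM; a sketch of creating them (the interface name eth1 matches the example, but is still an assumption about the guest):

                        ```shell
                        # Inside the VM: create the three VLAN devices on the VF interface
                        vconfig add eth1 10   # management VLAN
                        vconfig add eth1 30   # service VLAN
                        vconfig add eth1 50   # storage VLAN
                        ifconfig eth1.10 up
                        ifconfig eth1.30 up
                        ifconfig eth1.50 up
                        ```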


                        At this point, the VM will generate traffic within each VLAN colored with the specific priority (on the VLAN tag)

                        e.g. traffic with the management VLAN will always egress with priority=1 on the VLAN tag.


                        At the hypervisor:

                        Run the command mlx_qos (see the example in End-to-End QoS Configuration for Mellanox Switches (SwitchX) and Adapters).

                        This command maps each priority to a TC (Traffic Class) and supplies the rate-limiting and ETS configuration.
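
                        As a sketch, an invocation along the lines of the linked post might look like this (the interface name, flag spelling, and values are assumptions; verify against mlx_qos --help in your MLNX_OFED version):

                        ```shell
                        # On the hypervisor: map each priority to its own TC, use ETS
                        # scheduling, and rate-limit the TCs carrying priorities 1/3/5
                        # (values in Gbit/s; 0 means unlimited)
                        mlx_qos -i eth2 -p 0,1,2,3,4,5,6,7 \
                                -s ets,ets,ets,ets,ets,ets,ets,ets \
                                -r 0,1,0,3,0,5,0,0

                        # Show the resulting QoS configuration
                        mlx_qos -i eth2
                        ```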


                        Let me know how it goes.




                          • Re: QoS for Virtual Functions (SR-IOV devices)

                            Thanks ophirmaor, setting the egress_map (mapping sk_prio -> user_prio) inside the guest did the trick.


                            To add to the knowledge base (although partly stated in the documentation):

                            We don't use vconfig. What I got to work was setting the socket priority option in the sending application, mapping that priority (sk_prio) to a user priority (user_prio) using tc/tc_wrap.py inside the guest, and then mapping those user priorities to hardware priorities (traffic classes) to control the rate.
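
                            A rough sketch of that flow (the interface name, the mapping values, and the tc_wrap.py argument spelling are assumptions; check tc_wrap.py's help in your OFED install):

                            ```shell
                            # Inside the guest: map socket priorities (sk_prio, which the
                            # application sets via setsockopt(SOL_SOCKET, SO_PRIORITY, ...))
                            # to user priorities (UP) on the VF interface.
                            # Assumed -u flag: one UP value per sk_prio, in order.
                            tc_wrap.py -i eth1 -u 1,1,3,3,5,5,0,0

                            # Show the current sk_prio -> UP mapping
                            tc_wrap.py -i eth1
                            ```

                            The UP-to-TC mapping and rate limiting then stay on the host (mlx_qos), as described above.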