QoS Tuning on Mellanox Spectrum Switches - FAQ

Version 6

    This post discusses QoS tuning on the Mellanox Spectrum switch.




    RED/ECN Configuration

    The RED/ECN profile thresholds affect both throughput and fairness. The low threshold defines the queue size at which packets begin to be dropped/marked. This marking signals the sender to reduce its injection rate. Due to congestion control behavior, the total injection rate of the senders will at some point fall below the network capacity, and the queues then begin to drain. Hence, the recommendation is to begin marking only after some queue has built up, so that there are enough packets in the buffer to keep the outgoing link busy while the senders inject at too low a rate. The high threshold defines the slope, i.e. how the dropping/marking probability grows with queue occupancy. A lower slope is better for fairness, since high-rate flows are more likely to be marked. On the other hand, a larger high threshold makes the queues longer, because packets are dropped/marked at a higher queue occupancy on average. Our rule of thumb is to set the thresholds to: min=150KB, max=1500KB.
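
    As an illustration of how the two thresholds shape the marking behavior, here is a minimal sketch of a textbook RED-style linear ramp. It is not taken from the switch implementation: the function name, the max_prob parameter (marking probability reached at the high threshold) and the choice of 1 KB = 1000 bytes are assumptions made for the example.

```python
def red_mark_probability(queue_bytes, min_th=150_000, max_th=1_500_000, max_prob=1.0):
    """Drop/mark probability for a given queue occupancy (textbook RED ramp).

    min_th / max_th default to the rule-of-thumb values (150KB / 1500KB,
    assuming 1 KB = 1000 bytes). max_prob is a hypothetical parameter.
    """
    if queue_bytes < min_th:
        return 0.0  # below the low threshold nothing is marked
    if queue_bytes >= max_th:
        return 1.0  # at or above the high threshold everything is marked
    # Linear ramp between the thresholds: a larger max_th gives a lower slope.
    return max_prob * (queue_bytes - min_th) / (max_th - min_th)


if __name__ == "__main__":
    for occupancy in (100_000, 150_000, 825_000, 1_500_000):
        print(occupancy, red_mark_probability(occupancy))
```

    Raising max_th while keeping min_th fixed flattens the ramp, which is the "lower slope" trade-off described above: better fairness, but longer queues on average.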


    2. In what cases do I need to change the values, and what would be the effect on my traffic?

    There are several general rules:

    • If throughput utilization is low, but no drops (or pauses) occur, it means that the thresholds are set too low and the link is underutilized.
    • If drops (or pauses) occur, it means that the thresholds are set too high and the buffer overflows.
    • In cases of larger link bandwidth and/or longer links (e.g. inter-data center), the recommendation is to increase the thresholds.
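
    The first two rules can be sketched as a small decision helper. This is only an illustration: the function name and the low_utilization cut-off (here 90% of link capacity) are hypothetical and should be replaced by whatever "low" means for your workload.

```python
def suggest_threshold_change(link_utilization, drops_or_pauses_seen, low_utilization=0.9):
    """Suggest a direction for the RED/ECN thresholds.

    link_utilization: observed fraction of link capacity in use (0.0 to 1.0).
    drops_or_pauses_seen: True if drops or pause frames were observed.
    low_utilization: hypothetical cut-off below which the link counts as underutilized.
    """
    if drops_or_pauses_seen:
        return "decrease"  # thresholds too high, buffer overflows
    if link_utilization < low_utilization:
        return "increase"  # thresholds too low, link underutilized
    return "keep"


if __name__ == "__main__":
    print(suggest_threshold_change(0.6, drops_or_pauses_seen=False))
    print(suggest_threshold_change(0.98, drops_or_pauses_seen=True))
```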


    Buffer management for Lossy/Lossless Fabric

    The general rule is to set the reserved headroom size minus the xoff threshold equal to 2 * MTU + 2 * link BW * link propagation time.
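
    A worked example of this rule is sketched below. The input values (9216-byte MTU, 100 Gb/s link, 100 m cable at roughly 5 ns per meter) are assumptions chosen for illustration, not recommended settings, and the function name is hypothetical.

```python
def required_headroom_gap(mtu_bytes, link_gbps, propagation_ns):
    """Bytes needed between the xoff threshold and the end of the reserved headroom.

    Implements: 2 * MTU + 2 * link BW * link propagation time.
    """
    bytes_per_ns = link_gbps / 8  # 1 Gb/s equals 0.125 bytes per nanosecond
    return 2 * mtu_bytes + 2 * bytes_per_ns * propagation_ns


if __name__ == "__main__":
    # Assumed example: 100 m cable at ~5 ns/m gives 500 ns one-way propagation.
    gap = required_headroom_gap(mtu_bytes=9216, link_gbps=100, propagation_ns=500)
    print(gap)  # 2 * 9216 + 2 * 12.5 * 500 = 30932.0 bytes
```

    The bandwidth-delay term grows linearly with both link speed and cable length, which is why faster or longer links need more headroom above xoff.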


    2. Why do we recommend 17K for both xoff and xon for PFC? Don’t we need hysteresis here?

    We recommend setting the xoff and xon thresholds to the same value so that xon is sent as soon as buffer space becomes available. There is no reason to reserve hysteresis, since the bandwidth used by the pause frames is negligible.
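
    The effect of removing the hysteresis can be seen in a toy simulation. This is a simplified model and not the switch logic: the function name, the occupancy trace (in KB) and the exact comparison rules (xoff sent when occupancy rises above xoff, xon sent when it falls to xon or below) are assumptions.

```python
def pfc_events(occupancy_trace, xoff, xon):
    """Return (sample_index, event) pairs for a buffer occupancy trace."""
    paused = False
    events = []
    for index, occupancy in enumerate(occupancy_trace):
        if not paused and occupancy > xoff:
            paused = True
            events.append((index, "XOFF"))
        elif paused and occupancy <= xon:
            paused = False
            events.append((index, "XON"))
    return events


if __name__ == "__main__":
    trace_kb = [10, 16, 18, 20, 16, 12, 9, 7]  # hypothetical occupancy samples
    print(pfc_events(trace_kb, xoff=17, xon=17))  # sender resumes at sample 4
    print(pfc_events(trace_kb, xoff=17, xon=8))   # sender resumes at sample 7
```

    With equal thresholds the sender is released as soon as occupancy drops back under the threshold; with a lower xon it stays paused for longer while the buffer sits partly empty.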


    3. What is the reason for configuring alpha 2?

    An egress alpha of 2 was found to give the best performance results, based on the article titled "Absorbing micro-burst traffic by enhancing dynamic threshold policy of data center switches".

    The separate ingress and egress accounting in Mellanox's Spectrum switch allows us to extend the model described in the paper, such that we can also set an ingress alpha. We chose to set the ingress alpha to 8, for ingress fairness as well.
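
    To give a feel for what alpha controls, the sketch below assumes that alpha plays the role it has in the classic dynamic threshold scheme, where a queue may grow up to alpha times the currently free shared buffer. Whether the switch applies exactly this formula is an assumption, and the function names are hypothetical.

```python
def admit(queue_len, pool_size, pool_used, alpha):
    """Classic dynamic threshold check: admit while queue_len < alpha * free buffer."""
    return queue_len < alpha * (pool_size - pool_used)


def steady_state_queue_share(alpha, congested_queues):
    """Fraction of the shared pool held by each congested queue at steady state.

    With N congested queues each sitting at alpha * free, the free fraction
    is 1 / (1 + alpha * N), so each queue holds alpha / (1 + alpha * N).
    """
    return alpha / (1 + alpha * congested_queues)


if __name__ == "__main__":
    for alpha in (2, 8):
        for n in (1, 2, 4):
            print(alpha, n, round(steady_state_queue_share(alpha, n), 3))
```

    Under this model a single congested queue with alpha 2 can take up to two thirds of the pool, while the remaining third stays free to absorb bursts on other queues; a larger alpha lets one queue take more and leaves less in reserve.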


    4. Why do we need two pools?

    Two pools allow buffer admission accounting to be done from the ingress point of view and from the egress point of view in parallel.


    5. Why do we configure the reserved buffer differently for ePool and iPool?

    The reserved buffer for ingress is used for lossless headroom, hence it needs to be large enough. There is no such constraint on the egress buffer.