This post discusses the alpha parameter, which is part of the shared buffer configuration of Mellanox Spectrum™ based switch systems.
- HowTo Configure ECN on Mellanox Ethernet Switches (Spectrum)
- HowTo Configure Mellanox Spectrum Switch for Resilient RoCE
- HowTo Configure Mellanox Spectrum Switch for Lossless RoCE
Mellanox Spectrum switches support memory that is shared among all packets that need to be buffered (also called a shared buffer). This is in contrast to the ingress packet buffer memory used in previous devices (SwitchX), in which each port has its own bounded memory for buffering incoming packets. With a shared buffer, all incoming packets can share the whole packet memory space, independent of the port they arrived on.
Sharing the whole buffer space might cause starvation and unfairness problems. Assume that the switch is the congestion point of two long flows that arrive on different ingress ports and are destined to the same egress port. Since the egress port is not able to forward all the packets in time, the excess packets are buffered. If the flows are long enough, their packets will fill the whole shared buffer space. In this case, no space is left to buffer packets of other flows that traverse the switch, and those flows are starved.
To ensure fair buffer usage, an advanced management scheme is used. The amount of buffered data from each region (a region is one of: ingress port, ingress port/PG, egress port, egress port/TC, and egress multicast SP) is bounded by a dynamic threshold. Upon packet arrival, the usage of each of the packet's regions is compared to that region's threshold. The packet is buffered only if none of its regions exceeds its threshold.
The dynamic condition per region is defined using the following equation: Region usage [Bytes] < Threshold [Bytes] = alpha * free_buffer [Bytes].
Alpha is a parameter between 0 and infinity, where the discrete values that can be configured are as follows: 0, 1/128, 1/64, 1/32, 1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16, 32, 64, inf.
switch (config) # interface ethernet 1/1 ingress-buffer iPort.pg0 map pool iPool0 type lossy reserved 20480 shared alpha ?
The parameter free_buffer is the current free space (re-evaluated on packet ingress) in the pool where the packet is destined to be buffered. Spectrum allows configuration of several pools in order to statically divide the buffer among several traffic types. Free space is simply the pool size minus the amount of data currently buffered in that pool.
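The admission check described above can be sketched in a few lines of Python. This is only an illustrative model, not the switch implementation: the function names are hypothetical, reserved (non-shared) space is not modeled, and whether the hardware adds the arriving packet's size to the region usage before comparing is not stated in this post, so the sketch compares the current usage only. The explicit rejection when no free space is left follows note 2 at the end of this post.

```python
import math

def free_space(pool_size: int, pool_used: int) -> int:
    """Free space in a pool: configured size minus currently buffered bytes."""
    return pool_size - pool_used

def region_allows(region_usage: float, alpha: float, free_buffer: float) -> bool:
    """Dynamic threshold condition: usage < alpha * free_buffer."""
    if free_buffer <= 0:
        # No free space: the condition is violated even for alpha = infinity.
        return False
    return region_usage < alpha * free_buffer

def admit_packet(regions, pool_size: int, pool_used: int) -> bool:
    """regions: iterable of (usage_bytes, alpha) for every region of the packet.

    The packet is buffered only if every region satisfies its threshold.
    """
    free = free_space(pool_size, pool_used)
    return all(region_allows(usage, alpha, free) for usage, alpha in regions)

# Hypothetical numbers: a 1000-byte pool with 400 bytes in use (600 free).
print(admit_packet([(50, 1 / 8), (300, 1.0)], 1000, 400))   # both regions pass
print(admit_packet([(100, 1 / 8), (300, 1.0)], 1000, 400))  # first region fails
print(admit_packet([(0, math.inf)], 1000, 1000))            # pool is full
```

Evaluating free space on every call mirrors the statement that free_buffer is re-evaluated on packet ingress.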
Following the above threshold equation, it can be deduced that a single congested region can use at most total_buffer * alpha / (alpha+1), where total_buffer is the configured pool size.
max_used (the threshold) = alpha * free_buffer = alpha * (total_buffer - max_used)
=> max_used = total_buffer * alpha / (alpha + 1)
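As a rough illustration of the derivation above, the following Python sketch evaluates the single-region bound for each configurable alpha value listed in this post. The pool size of 1,000,000 bytes and the function name are hypothetical, and treating alpha = inf as "the region may fill the whole pool" is the limit of the formula rather than something this post states explicitly.

```python
import math

# The discrete alpha values listed in this post: 0, 1/128 ... 64, inf.
ALPHAS = [0.0] + [2.0 ** e for e in range(-7, 7)] + [math.inf]

def max_region_usage(total_buffer: float, alpha: float) -> float:
    """Upper bound for one congested region: total_buffer * alpha / (alpha + 1)."""
    if math.isinf(alpha):
        # Limit of alpha / (alpha + 1) as alpha grows without bound (assumption).
        return float(total_buffer)
    return total_buffer * alpha / (alpha + 1)

if __name__ == "__main__":
    pool = 1_000_000  # hypothetical pool size in bytes
    for alpha in ALPHAS:
        bound = max_region_usage(pool, alpha)
        print(f"alpha={alpha:<10g} max_used={bound:>12.1f} bytes "
              f"({100 * bound / pool:.1f}% of pool)")
```

With alpha = 1 a single region can hold at most half of the pool, and each doubling of alpha moves the bound closer to the full pool size.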
When "k" regions of the pool are congested at the same time (for example, several ingress port groups that compete for the shared buffer, each trying to buffer as many packets as possible), each region is bounded by total_buffer * alpha / (k*alpha+1).
The intuition behind this admission scheme is to allow a region to use more space when more of the buffer is free, and to reduce the allowed usage of a region as the free buffer space shrinks.
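The bound for k congested regions can be checked numerically with a short Python sketch. It assumes that all k regions use the same alpha and settle at the same usage x, so that x = alpha * (total_buffer - k*x); the function names and the 1000-byte pool are hypothetical, and the alpha = inf branch is again just the limit of the formula.

```python
import math

def per_region_bound(total_buffer: float, alpha: float, k: int) -> float:
    """Bound per region when k regions are congested: total * alpha / (k*alpha + 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if math.isinf(alpha):
        # Limit for very large alpha: the pool is split evenly (assumption).
        return total_buffer / k
    return total_buffer * alpha / (k * alpha + 1)

def free_when_congested(total_buffer: float, alpha: float, k: int) -> float:
    """Free space left once all k regions have reached their bound."""
    return total_buffer - k * per_region_bound(total_buffer, alpha, k)

if __name__ == "__main__":
    pool, alpha = 1000, 1.0  # hypothetical pool size (bytes) and alpha
    for k in (1, 2, 4, 8):
        bound = per_region_bound(pool, alpha, k)
        free = free_when_congested(pool, alpha, k)
        # At the bound, usage equals the threshold alpha * free_buffer.
        print(f"k={k}: per-region bound={bound:.1f}, free={free:.1f}, "
              f"threshold={alpha * free:.1f}")
```

For k = 1 the result reduces to the single-region bound total_buffer * alpha / (alpha+1), and part of the pool always stays free as long as alpha is finite.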
1. The thresholds for buffer allowance are checked only once per packet, on packet arrival. Packets that are already buffered are not dropped later if the free space decreases and the region no longer satisfies the threshold condition.
2. When alpha is configured to infinity but there is no free buffer space, the condition is considered to be violated (the packet is not accepted into the buffer).
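Note 1 can be illustrated with a toy simulation in Python. It is a simplified sketch under stated assumptions, not a model of the hardware: packets have a fixed hypothetical size of 10 bytes, nothing is drained from the buffer, both regions use alpha = 1, and the region names "A" and "B" are made up. Region A fills the pool alone first, then region B becomes congested.

```python
def fill(region: str, usage: dict, total: int, alpha: float, pkt: int) -> None:
    """Admit fixed-size packets for `region` until its threshold check fails."""
    while True:
        free = total - sum(usage.values())
        # Arrival-time check: usage < alpha * free_buffer (and some space is free).
        if free <= 0 or not usage[region] < alpha * free:
            return
        usage[region] += pkt

def simulate(total: int = 1000, alpha: float = 1, pkt: int = 10):
    usage = {"A": 0, "B": 0}
    fill("A", usage, total, alpha, pkt)  # A is congested alone
    fill("B", usage, total, alpha, pkt)  # then B becomes congested as well
    free = total - sum(usage.values())
    return usage, free

if __name__ == "__main__":
    usage, free = simulate()
    print("usage:", usage, "free:", free)
    print("A over its current threshold:", usage["A"] > 1 * free)
```

In this run region A ends up holding more than its current threshold allows, yet its packets stay buffered: the check only blocks new arrivals for A until enough of its data has been forwarded.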