
    rx-out-of-buffer

    tbarbette

      Hi Community,

       

      I'd like to better understand a problem we have, which seems to be linked to the fact that DPDK's xstats / ethtool -S shows a lot of "rx-out-of-buffer" packets. I found the performance counter document, but it does not say much about why this can happen or which buffer it refers to. I quote: "Number of times receive queue had no software buffers allocated for the adapter's incoming traffic." As rx_nombuf (the DPDK statistic) is 0, I guess it does not mean that there are not enough software buffers. Are these some internal Mellanox buffers? What can be done to prevent this?
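
      For reference, this is roughly how we read the counter from our DPDK application (a minimal sketch, assuming port 0 and the generic xstats API; the counter name may differ between PMD versions):

          /* Sketch: dump the extended statistics of a port and look for
           * the mlx5 "rx_out_of_buffer" counter. Error handling is minimal. */
          #include <inttypes.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <rte_ethdev.h>

          static void print_rx_out_of_buffer(uint16_t port_id)
          {
              int n = rte_eth_xstats_get_names(port_id, NULL, 0);
              if (n <= 0)
                  return;

              struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
              struct rte_eth_xstat *values = calloc(n, sizeof(*values));
              rte_eth_xstats_get_names(port_id, names, n);
              rte_eth_xstats_get(port_id, values, n);

              for (int i = 0; i < n; i++)
                  if (strcmp(names[i].name, "rx_out_of_buffer") == 0)
                      printf("rx_out_of_buffer = %" PRIu64 "\n", values[i].value);

              free(names);
              free(values);
          }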

       

      Thanks,

      Tom

        • Re: rx-out-of-buffer
          yairi
          This message was posted by Yair Ifergan on behalf of Martijn van Breugel

          Hi Tom,

          Thank you for posting your question on the Mellanox Community.

          Based on the information provided, the following Mellanox Community document explains the 'rx_out_of_buffer' ethtool/xstat statistic.

          You can improve the rx_out_of_buffer behavior by tuning the node and by increasing the ring size on the adapter (query the current size with ethtool -g <int>, set it with ethtool -G <int> rx <N>).
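
          Since the application uses DPDK rather than the kernel driver, the ring size is instead chosen per queue when the queue is configured. A minimal sketch (assuming port 0, one RX queue and an already-created mempool; 4096 is only an illustrative value):

            /* Sketch: ask for a larger RX descriptor ring at queue setup time
             * and let the driver clamp the value to what the HW supports. */
            #include <rte_ethdev.h>

            static int setup_large_rx_ring(uint16_t port_id, struct rte_mempool *mp)
            {
                uint16_t nb_rxd = 4096;   /* illustrative RX ring size */
                uint16_t nb_txd = 1024;

                int ret = rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &nb_rxd, &nb_txd);
                if (ret != 0)
                    return ret;

                return rte_eth_rx_queue_setup(port_id, 0 /* queue id */, nb_rxd,
                                              rte_eth_dev_socket_id(port_id),
                                              NULL /* default rxconf */, mp);
            }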

          Also make sure you follow the DPDK performance recommendations from the following link -> https://doc.dpdk.org/guides/nics/mlx5.html#performance-tuning

          If you still experience performance issues after applying these recommendations, please do not hesitate to open a Mellanox Support case by emailing support@mellanox.com.

          Thanks and regards,
          ~Mellanox Technical Support

          • Re: rx-out-of-buffer
            tbarbette

            Thanks for the answer! But I don't see what you are referring to as "the following Mellanox Community document", so I still don't know what this counter is. If you are referring to the line "Number of times receive queue had no software buffers allocated for the adapter's incoming traffic.", then the tuning you mention will not change the problem, because the rings are never full and the CPU is not busy. So which buffer does "rx-out-of-buffer" count, if not the ring buffers?

              • Re: rx-out-of-buffer
                martijn@mellanox.com

                Hi Tom,

                 

                My apologies for not providing the link to the Mellanox Community document. The link is -> Understanding mlx5 ethtool Counters

                 

                The "rx_out_of_buffer" counter from 'ethtool -S' indicates RX packet drops due to lack of receive buffers. The lack of receive buffers can be related to a system tuning issue or system capability.

                 

                What happens when you turn off interrupt coalescing on the NIC with the following command -> # ethtool -C <int> adaptive-rx off rx-usecs 0 rx-frames 0

                 

                Also make sure you disable flow control on the NIC and set the PCIe Max Read Request size to 4096. Link to document -> Understanding PCIe Configuration for Maximum Performance

                 

                Thanks and regards,
                ~Mellanox Technical Support

                  • Re: rx-out-of-buffer
                    tbarbette

                    Thanks, but how is it possible to have rx_out_of_buffer increasing while not a single queue reports any "imissed" (the DPDK counter that says how many packets could not be received because of a lack of buffers)? Something does not add up here.

                     

                    We use DPDK, so ethtool -C will not impact performance; those settings are overridden by DPDK. We did disable flow control and set the max read request size. But my question here is not about performance (we have a ticket with support for that); it is specifically about rx_out_of_buffer. I do not understand how that number can increase while the rings themselves do not report any miss.

                • Re: rx-out-of-buffer
                  tbarbette

                  So I could finally answer this specific question with help from support:

                   

                  The "imissed" counter, i.e. the number of packets that could not be delivered to a queue because it was full, is not implemented in the DPDK Mellanox (mlx5) driver. So the only way to know whether a specific queue dropped packets is to track its fill level with rte_eth_rx_queue_count(), for which I added support in the mlx5 driver; it is coming in the next DPDK release.
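
                  As an illustration, the per-queue tracking looks roughly like this (a sketch; it assumes the PMD implements the queue-count callback, which is exactly the mlx5 support mentioned above):

                    /* Sketch: check how many RX descriptors are in use on each
                     * queue. A queue that stays near its ring size is the one
                     * whose core cannot keep up (and feeds rx_out_of_buffer). */
                    #include <stdio.h>
                    #include <rte_ethdev.h>

                    static void report_rx_queue_fill(uint16_t port_id, uint16_t nb_queues,
                                                     uint16_t ring_size)
                    {
                        for (uint16_t q = 0; q < nb_queues; q++) {
                            int used = rte_eth_rx_queue_count(port_id, q);
                            if (used < 0)
                                continue; /* not supported by this PMD/queue */
                            printf("port %u queue %u: %d/%u descriptors in use\n",
                                   (unsigned)port_id, (unsigned)q, used,
                                   (unsigned)ring_size);
                        }
                    }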

                  rx_out_of_buffer is actually what imissed should have been: the misses aggregated over all queues, i.e. the number of packets dropped because the CPU does not consume them fast enough and the RX rings fill up.

                   

                  In our case, rx_out_of_buffer did not explain all the drops.

                   

                  So we observed that rx_packets_phy was higher than rx_good_packets. Actually, if you look at ethtool -S (which contains more counters than DPDK xstats), you will also see rx_discards_phy.

                  If there is no intrinsic error in the packets (bad checksums, etc.), you'll have rx_packets_phy = rx_good_packets + rx_discards_phy + rx_out_of_buffer.
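
                  For illustration, with made-up numbers: if rx_packets_phy is 1,000,000, rx_good_packets is 950,000 and rx_out_of_buffer is 30,000, then the remaining 20,000 packets show up as rx_discards_phy.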

                   

                  So rx_discards_phy is actually (as stated in the document mentioned above) the number of packets dropped by the NIC, not because there were not enough buffers in the queue, but because of congestion in the NIC itself or on the bus.

                  We're now investigating why that happens, but this question is resolved.

                   

                  Tom