Counters Troubleshooting for Linux Driver

Version 6

    One of the basic troubleshooting options for networking is checking counters. This post focuses on troubleshooting via the port counters of the Mellanox driver for ConnectX-3/ConnectX-3 Pro adapters.

     

    References

    • MLNX_EN User Manual
    • RoCE Application Note


    Some counters are very useful for network troubleshooting. However, it is important to understand where each counter is incremented. The Mellanox driver supports a variety of counters at different levels, and when the server is configured with SR-IOV it becomes even trickier. The figure below shows the different locations where counters can be checked (hardware and software locations).

     

    In the example shown in the figure below, the hypervisor is connected to the physical function (PF) while each VM is connected to a virtual function (VF). This setup is typical, but there are other options: the PF may be attached to one of the VMs (via pass-through mode) instead of the hypervisor, or an additional VF may be attached to the hypervisor (e.g. a probed VF).

     

    [Figure: ofed.png, showing counter locations (1) through (7)]

    In addition, counters can be viewed with different Linux tools. For example:

    • Running the ifconfig or ip -s link show commands on the hypervisor shows the counters at the PF (2). Similarly, running ifconfig on the VM shows the counters at the VF (3). Note that in non-SR-IOV mode these commands also show the counters at point (1).
    • RDMA (RoCE) traffic bypasses the kernel, so its counters cannot be seen via the standard Linux commands. They can be viewed under /sys/class/infiniband/<device>/ports/<port-number>/counters and are counted at points (5) and (7).
    • ethtool shows many counters at different levels. On the hypervisor it counts at points (1), (2) and (4), while on the VM it counts at points (3) and (6).
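
    As a minimal sketch of reading the RDMA counters mentioned above, the helper below prints every file in a counters directory as a name/value pair. The function name dump_counters is hypothetical, and the sketch assumes the directory holds one plain-text file per counter; the device name and port number in the usage comment are placeholders.

```shell
# Print each counter file in a sysfs-style directory as "name value".
# Assumption: the directory contains one plain-text file per counter.
dump_counters() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Example (mlx4_0 and port 1 are placeholders for your system):
# dump_counters /sys/class/infiniband/mlx4_0/ports/1/counters
```

    Running it twice a few seconds apart and comparing the output is a simple way to see which RDMA counters are moving.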

     

    One of the tools for reading Ethernet counters is ethtool. To get the counters via ethtool, use the command:

     

    # ethtool -S <if-name>
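
    Since the counter list is long, it can help to show only the counters that are non-zero and look error-related. The following is a rough sketch, assuming the usual "name: value" layout of ethtool -S output; the function name nonzero_counters and the interface name eth0 are placeholders, not part of the driver or of ethtool.

```shell
# Read "ethtool -S" style output on stdin and print only the counters whose
# name matches the given regular expression and whose value is non-zero.
nonzero_counters() {
    pattern=$1
    awk -F: -v pat="$pattern" '
        NF >= 2 {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)    # strip indentation around the name
            gsub(/[ \t]/, "", value)   # strip padding around the value
            if (name ~ pat && value + 0 != 0) print name, value
        }'
}

# Example (eth0 is a placeholder interface name):
# ethtool -S eth0 | nonzero_counters 'err|drop|timeout'
```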

              

     

    The full list of counters can be found in the MLNX_EN User Manual. Below is troubleshooting guidance for the most important counters shown by ethtool. Running ethtool on the hypervisor and on a VM returns different lists of counters.

     

    Important counters when running ethtool on the hypervisor (in the example above):

    Counter    Description
    rx_errors

    Number of received packets that were dropped due to PHY-layer errors. For example:

    • Symbol errors or invalid blocks.
    • Length-related errors (length greater than MTU octets, length less than 64 octets, or an error in the length field).
    • Frames with a bad CRC that are not runts, jabbers, or alignment errors.

    This counter is increased at point (1) in the figure above.

    rx_dropped

    Number of received packets that were discarded even though no errors were detected that would prevent them from being passed to the upper layer. For example, drops due to buffer overflow.

    This counter is increased at point (1) in the figure above.

    rx_over_errors

    Number of received frames that were dropped due to an overflow of the hardware port receive buffer.

    This counter is increased at point (1) in the figure above. In most cases, rx_over_errors is equal to rx_dropped.

    rx_crc_errors

    Number of received frames with a bad CRC that are not runts, jabbers, or alignment errors.

    This counter is increased at point (1) in the figure above.

    rx_jabbers

    Number of received frames with a length greater than MTU octets and a bad CRC.

    This counter is increased at point (1) in the figure above.

    tx_errors

    Number of frames that failed to transmit. Includes frames dropped due to an error in the length field.

    This counter is increased at point (1) in the figure above.

    tx_dropped

    Number of transmitted frames that were dropped.

    This counter is increased at point (1) in the figure above.

    vport_rx_dropped

    Received packets discarded due to lack of software receive buffers (WQEs).

    This is an important indication of whether the RX completion routines are keeping up with the hardware ingress packet rate.

    This counter is increased at point (2) in the figure above.

    vport_rx_filtered

    Received packets dropped because a packet check failed. For example:

    • Incorrect VLAN
    • Incorrect Ethertype
    • Unavailable queue/QP
    • Loopback prevention
    • Loopback prevention

    This counter is increased at point (2) in the figure above.

    Note: In high-performance scenarios vport_rx_filtered may increment due to rx_over_errors. In addition, in SR-IOV configurations vport_rx_filtered increments can be seen, and this is a normal (expected) condition.

    vport_tx_errors

    Packets dropped due to transmit errors.

    This counter is increased at point (2) in the figure above.
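
    Because most of these counters are cumulative, what matters during troubleshooting is usually which ones are still growing. As a sketch of that idea, the helper below compares two saved ethtool -S snapshots and prints only the counters that increased, with the size of the increase. The function name counter_deltas and the file names are hypothetical, and the "name: value" layout is assumed. awk does the arithmetic in floating point, so extremely large counter values may lose precision.

```shell
# Compare two saved "ethtool -S" snapshots and print the counters that increased.
# Usage: counter_deltas BEFORE_FILE AFTER_FILE
counter_deltas() {
    awk -F: '
        {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", value)
        }
        # First file: remember the earlier value of every counter.
        NR == FNR { before[name] = value; next }
        # Second file: report counters that grew since the first snapshot.
        (name in before) && value + 0 > before[name] + 0 {
            print name, value - before[name]
        }' "$1" "$2"
}

# Example (eth0 and the file names are placeholders):
# ethtool -S eth0 > before.txt; sleep 10; ethtool -S eth0 > after.txt
# counter_deltas before.txt after.txt
```

    For instance, if rx_dropped and rx_over_errors grow by the same amount between snapshots, that matches the description above of drops caused by hardware receive buffer overflow.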

     

    Important counters when running ethtool on the VM (in the example above):

     

    Counter    Description
    rx_errors

    Received packets dropped because a packet check failed. For example:

    • Incorrect VLAN
    • Incorrect Ethertype
    • Unavailable queue/QP
    • Loopback prevention
    • Loopback prevention

    This counter is increased at point (3) in the figure above.

    rx_dropped

    Received packets discarded due to lack of software receive buffers (WQEs).

    This counter is increased at point (3) in the figure above.

    tx_errors

    Number of frames that failed to transmit. Includes frames dropped due to an error in the length field.

    This counter is increased at point (3) in the figure above.

     

     

    Software Counters

     

    Counter    Description
    rx_lro_aggregated

    The number of packets processed by the LRO (Large Receive Offload) mechanism (relevant for IPv4 TCP). Under normal conditions it should be equal to rx_packets.

    rx_lro_flushed

    The number of aggregated packets the LRO mechanism passed to the kernel. Ideally each such packet is 64KB, which is the maximum packet size (depends on the kernel).

    rx_lro_no_desc

    This is an abnormal condition that should rarely occur: the LRO mechanism has no room to receive packets from the adapter. Under normal conditions this counter should not increase, particularly when using a budget of 64 packets and flushing the LRO descriptors every NAPI cycle. In addition, LRO has a lot of space (much more than 64).

    tx_tso_packets

    When TSO (TCP Segmentation Offload) is used, segmentation work is offloaded from the CPU, which improves CPU utilization. This counter shows the number of TSO packets the driver received from the TCP layer for offloading. The rate of this counter correlates strongly with TX performance and CPU utilization. TSO is crucial for wire-speed performance, and the kernel will enable it only when the CPU is not under heavy load.


    tx_queue_stopped

    The number of times the kernel could not send packets because the queue was full. tx_queue_stopped and tx_wake_queue are usually equal (the TX queue is stopped and later gets a wake-up call). This is an important indication of whether the TX completion routines are keeping up with the transmit routines. If the application sends at a higher rate than the driver evicts CQEs from the buffer, this counter will start to go up.

    tx_wake_queue

    The number of times the kernel got a message from the adapter that there is a queue to run (a stopped TX queue is released). This is an important indication of whether the TX completion routines are keeping up with the transmit routines. If the application sends at a higher rate than the driver evicts CQEs from the buffer, this counter will start to go up.

    tx_timeout

    This is a rare event that usually indicates a severe issue. It means that around 15 seconds passed since a packet was sent without a CQE being generated. The usual cause is a lost interrupt or a bad cable.

    rx_csum_good

    The number of packets received with a good checksum (in L4).

    rx_csum_none

    The number of packets received with no checksum (in L4).

    tx_cksum_offload

    The number of packets sent with hardware checksum.
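
    A few of the software counters above are easiest to read in relation to each other. The sketch below, which assumes the "name: value" layout of ethtool -S output, prints the gap between tx_queue_stopped and tx_wake_queue (usually zero, per the description above), the tx_timeout count, and the ratio of rx_lro_aggregated to rx_lro_flushed. Treating that ratio as the average number of packets per LRO flush is an interpretation of the two descriptions, not something the driver reports. The function name sw_counter_summary is hypothetical.

```shell
# Summarize a few related software counters from "ethtool -S" output on stdin.
sw_counter_summary() {
    awk -F: '
        NF >= 2 {
            name = $1
            gsub(/[ \t]/, "", name)
            v[name] = $2 + 0          # store the value as a number
        }
        END {
            print "tx_queue_stopped - tx_wake_queue =", v["tx_queue_stopped"] - v["tx_wake_queue"]
            print "tx_timeout =", v["tx_timeout"] + 0
            # Only meaningful when LRO has flushed at least one packet.
            if (v["rx_lro_flushed"] > 0)
                printf "packets per LRO flush = %.1f\n", v["rx_lro_aggregated"] / v["rx_lro_flushed"]
        }'
}

# Example (eth0 is a placeholder interface name):
# ethtool -S eth0 | sw_counter_summary
```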