HowTo Troubleshoot Mellanox Ethernet Switches via Port Counters

Version 10

    This post shows how to troubleshoot network problems using Mellanox switch port counters.

     

    References

    • MLNX-OS User Manual

     

     

    Ethernet port counters of SwitchX-based switches

    The table below lists the counters that are most useful for troubleshooting:

     

    Counter          Description

    RX error packets

    The number of ingress packets that contained errors preventing them from being deliverable to a higher-layer protocol. In such cases, you may need to check the physical connection (e.g. cables).
    RX discard packets

    The number of ingress packets that were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. In most cases this is caused by a lack of resources, such as a buffer overflow. In other cases, such as a misconfigured port, this counter may count packets that were filtered because of a VLAN mismatch (e.g. the VLAN ID of the packet is not configured on this port). In that case, verify the switchport mode (access, trunk, hybrid) and the list of assigned VLANs on the port, and compare them with the remote port configuration.

    RX fcs errors

    The number of ingress packets that are an integral number of octets in length and do not pass the FCS check.

    RX undersize packets

    The number of ingress packets received that were shorter than 64 octets (excluding framing bits, but including FCS octets) and were otherwise well formed. Under normal conditions this counter should remain at zero.

    RX pause packets

    The number of MAC Control packets received with an opcode indicating the PAUSE operation.

    This counter helps identify congestion in a lossless network that uses global pause (flow control enabled). If flow control is disabled, this counter remains at zero. If PFC is enabled, this counter also remains at zero; PFC pause frames are counted by a separate per-port, per-priority counter.

    RX unknown control opcode

    The number of ingress packets that were discarded because of an unknown or unsupported protocol (control opcode). Under normal conditions this counter remains at zero.

    For example, if global pause (flow control) is enabled on the port and the remote side sends PFC frames, this counter increases.

    RX symbol errors

    The number of times the PHY indicated a 'Receive Error' signal. Under normal conditions this counter remains at zero. If it increases, check the cable connectivity.

    TX discard packets

    The number of egress packets that were chosen to be discarded even though no errors had been detected to prevent their being transmitted. Under normal conditions this counter remains at zero.
    TX pause packets

    The number of MAC Control packets transmitted with an opcode indicating the PAUSE operation.

    This counter helps identify congestion in a lossless network that uses global pause (flow control enabled). If flow control is disabled, this counter remains at zero. If PFC is enabled, this counter also remains at zero; PFC pause frames are counted by a separate per-port, per-priority counter.

    TX wait

    The number of ticks during which the port had data to transmit but sent no data for the entire tick, either because of insufficient credits or because of lack of arbitration.

    A tick is a multiple of the symbol time, that is, the time needed to transfer one byte on a single lane.

    For example, for links operating at 10GE on a single lane or 40GE on four lanes, the symbol time is 0.8 ns.

    This counter helps detect congestion in the network.

    TX wait useconds

    The time, in microseconds, during which the port had data to transmit but sent no data, either because of insufficient credits or because of lack of arbitration.

    This counter helps detect congestion in the network.

    TX queue depth TC0 ... TC3

    This is not a regular cumulative counter: it reports the current transmit queue depth, in bytes, of the given traffic class on the port.

    This value helps detect congestion. On a port that is not congested, it remains at zero.

    Note: unlike the other counters, this value can both grow and shrink, following the size of the TC queue.
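    The TX wait counters are easier to interpret as a share of time than as raw totals. The Python sketch below assumes you read the counters twice, a known number of seconds apart; the helper names symbol_time_ns and tx_wait_fraction are hypothetical and not part of MLNX-OS. It reproduces the 0.8 ns symbol time from the table (one byte on a lane carrying 10 Gb/s) and turns the change in 'TX wait useconds' into the fraction of the interval during which the port was stalled. Because the exact tick multiplier is not stated here, the sketch does not convert 'TX wait' ticks into time.

```python
"""Back-of-the-envelope helpers for the TX wait counters (hypothetical sketch)."""


def symbol_time_ns(lane_rate_gbps: float) -> float:
    """Time needed to transfer one byte (8 bits) on one lane, in nanoseconds."""
    if lane_rate_gbps <= 0:
        raise ValueError("lane_rate_gbps must be positive")
    return 8.0 / lane_rate_gbps


def tx_wait_fraction(wait_us_before: int, wait_us_after: int,
                     interval_s: float) -> float:
    """Share of a sampling interval during which the port had data queued but
    sent nothing, computed from two readings of 'TX wait useconds'."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_us = wait_us_after - wait_us_before
    if delta_us < 0:
        # A cumulative counter only decreases if it was cleared in between.
        raise ValueError("counter decreased; were the counters cleared?")
    return delta_us / (interval_s * 1_000_000)


if __name__ == "__main__":
    # 10GE on one lane and 40GE on four lanes both carry 10 Gb/s per lane.
    print(f"{symbol_time_ns(10):.2f} ns")  # 0.80 ns
    # Hypothetical readings taken 10 seconds apart.
    print(f"{tx_wait_fraction(1_000, 2_501_000, 10):.1%}")  # 25.0%
```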

     

    Ethernet port priority counters (per port, per priority) of SwitchX-based switches

    The table below lists the counters that are most useful for troubleshooting:

     

    Counter          Description
    RX pause packets

    The number of MAC Control packets received with an opcode indicating the PAUSE operation for this priority. This counter helps identify congestion in a lossless network: if PFC is enabled on this priority, it may increase when this priority is congested.

    RX pause duration milliseconds

    The time, in microseconds, during which transmission of packets has been paused for this priority.

    This counter helps identify congestion in a lossless network.

    TX pause packets

    The number of MAC Control packets transmitted with an opcode indicating the PAUSE operation for this priority. This counter helps identify congestion in a lossless network: if PFC is enabled on this priority, it may increase when this priority is congested.
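    As a hedged illustration of reading the per-priority pause counters, the sketch below compares two samples of the RX and TX pause packet counters for each priority (for example, collected with the per-priority command in step 2 below). The function name pfc_summary and the input layout (a dict mapping priority to counter value) are assumptions for illustration only. Received pause frames mean the remote side asked this port to stop sending on that priority; transmitted pause frames mean this switch asked the remote side to stop.

```python
"""Summarize per-priority PFC pause activity between two samples (sketch)."""


def pfc_summary(rx_before, rx_after, tx_before, tx_after):
    """Each argument maps priority -> cumulative pause-packet count.

    Returns {priority: note} for priorities whose pause counters increased.
    """
    findings = {}
    for prio in sorted(set(rx_after) | set(tx_after)):
        rx_delta = rx_after.get(prio, 0) - rx_before.get(prio, 0)
        tx_delta = tx_after.get(prio, 0) - tx_before.get(prio, 0)
        notes = []
        if rx_delta > 0:
            # The remote side asked this port to pause this priority.
            notes.append(f"received {rx_delta} pause frames")
        if tx_delta > 0:
            # This switch asked the remote side to pause this priority.
            notes.append(f"sent {tx_delta} pause frames")
        if notes:
            findings[prio] = "; ".join(notes)
    return findings


if __name__ == "__main__":
    # Hypothetical samples for priorities 3 and 4.
    rx0, rx1 = {3: 0, 4: 0}, {3: 120, 4: 0}
    tx0, tx1 = {3: 0, 4: 5}, {3: 0, 4: 5}
    print(pfc_summary(rx0, rx1, tx0, tx1))
    # {3: 'received 120 pause frames'}
```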

     

    Showing the counters via CLI

     

    1. To display all counters of a port, run the following command on the desired port (e.g. 1/1):

    switch (config) # show interfaces ethernet 1/1 counters

    Rx
      269654               packets
      2552                 unicast packets
      267001               multicast packets
      101                  broadcast packets
      30143731             bytes
      3322                 packets of 64 bytes
      198588               packets of 65-127 bytes
      67744                packets of 128-255 bytes
      0                    packets of 256-511 bytes
      0                    packets of 512-1023 bytes
      0                    packets of 1024-1518 bytes
      0                    packets Jumbo
      0                    error packets
      0                    discard packets
      0                    fcs errors
      0                    undersize packets
      0                    oversize packets
      0                    pause packets
      0                    unknown control opcode
      0                    symbol errors

    Tx
      383158               packets
      100                  unicast packets
      380506               multicast packets
      2552                 broadcast packets
      37993172             bytes
      0                    discard packets
      0                    pause packets
      0                    TX wait
      0                    TX wait useconds
      0                    queue depth TC0
      0                    queue depth TC1
      0                    queue depth TC2
      0                    queue depth TC3
    switch (config) #
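    If you collect this output as text (for example, from an SSH session), a small parser makes it easier to spot counters that deserve attention. The Python sketch below assumes the 'Rx'/'Tx' layout shown above; parse_counters, nonzero_watched and the WATCH table are hypothetical helpers, and the hints in WATCH are paraphrased from the counter table earlier in this post.

```python
import re

# Matches lines such as "  269654               packets".
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")

# Counters worth a closer look when non-zero, with a hint paraphrased
# from the counter table in this post (hypothetical helper data).
WATCH = {
    ("Rx", "error packets"): "check the physical connection (e.g. cables)",
    ("Rx", "discard packets"): "check for buffer overflow and VLAN/switchport mismatch",
    ("Rx", "fcs errors"): "frames failed the FCS check",
    ("Rx", "undersize packets"): "frames shorter than 64 octets",
    ("Rx", "unknown control opcode"): "check global pause vs. PFC on both ends",
    ("Rx", "symbol errors"): "check cable connectivity",
    ("Tx", "discard packets"): "egress packets were discarded",
}


def parse_counters(text):
    """Return {(section, counter name): value} for the 'Rx'/'Tx' sections."""
    counters = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("Rx", "Tx"):
            section = stripped
            continue
        match = LINE_RE.match(line)
        if section and match:
            counters[(section, match.group(2))] = int(match.group(1))
    return counters


def nonzero_watched(counters):
    """Keep only the watched counters that are above zero, with their hint."""
    return {key: (value, WATCH[key])
            for key, value in counters.items()
            if key in WATCH and value > 0}


if __name__ == "__main__":
    sample = """\
Rx
  269654               packets
  0                    error packets
  3                    symbol errors

Tx
  383158               packets
  0                    discard packets
"""
    print(nonzero_watched(parse_counters(sample)))
    # {('Rx', 'symbol errors'): (3, 'check cable connectivity')}
```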

       

     

     

    2. To display the per-priority counters of a port, run the following command on the desired port and priority (e.g. port 1/1, priority 3):

    switch (config) # show interfaces ethernet 1/1 counters priority 3

    Rx
      0                    packets
      0                    unicast packets
      0                    multicast packets
      0                    broadcast packets
      0                    bytes
      0                    pause packets
      0                    pause duration milliseconds

    Tx
      0                    packets
      0                    unicast packets
      0                    multicast packets
      0                    broadcast packets
      0                    bytes
      0                    pause packets
    switch (config) #
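    Because most of these counters are cumulative, comparing two snapshots taken a few seconds apart shows what is changing right now. The sketch below is a hypothetical Python helper, not an MLNX-OS feature: it parses two captures of the output above and returns only the counters whose values changed. A negative difference would suggest that the counters were cleared between the snapshots.

```python
import re

LINE_RE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse(text):
    """Map (section, counter name) -> value for 'Rx'/'Tx' style CLI output."""
    values, section = {}, None
    for line in text.splitlines():
        if line.strip() in ("Rx", "Tx"):
            section = line.strip()
        elif section and (match := LINE_RE.match(line)):
            values[(section, match.group(2))] = int(match.group(1))
    return values


def changed(before_text, after_text):
    """Return {(section, counter): after - before} for counters that changed.

    A negative difference suggests the counters were cleared in between.
    """
    before, after = parse(before_text), parse(after_text)
    return {key: value - before.get(key, 0)
            for key, value in after.items()
            if value != before.get(key, 0)}


if __name__ == "__main__":
    # Hypothetical snapshots of priority 3, taken a few seconds apart.
    first = "Rx\n  0  pause packets\nTx\n  0  pause packets\n"
    second = "Rx\n  40  pause packets\nTx\n  0  pause packets\n"
    print(changed(first, second))  # {('Rx', 'pause packets'): 40}
```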