0 Replies Latest reply on Dec 7, 2017 7:51 AM by xguerin

    Random packet loss when using raw packet QPs and L2 flows

    xguerin

      Hello there,

       

      I wanted to stress test my CX3 cluster. To do so, I am using the example applications provided here: Raw Ethernet Programming: Basic Introduction - Code Example. The only difference is that, instead of an ICMP packet, the sender sends a raw Ethernet payload (EtherType == 8) with an incrementing counter. I also added an extra wait time at the end of the send.
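
      To make the payload concrete, here is a minimal sketch of how such a frame could be assembled. It is an illustration only, not the code from the linked example: the function name build_counter_frame, the 8-byte big-endian counter layout, and the padding to 60 bytes are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_MAC_LEN 6   /* length of a MAC address in bytes */
#define FRAME_HDR_LEN 14  /* dst MAC + src MAC + EtherType */
#define FRAME_LEN     60  /* minimum Ethernet frame size without FCS */

/*
 * Hypothetical helper: write an Ethernet header followed by a 64-bit
 * counter (big-endian, an assumed layout) into buf, zero-padded to
 * FRAME_LEN bytes. Returns the frame length, or 0 if buf is too small.
 */
size_t build_counter_frame(uint8_t *buf, size_t cap,
                           const uint8_t dst[FRAME_MAC_LEN],
                           const uint8_t src[FRAME_MAC_LEN],
                           uint16_t ethertype, uint64_t counter)
{
    if (cap < FRAME_LEN)
        return 0;

    memset(buf, 0, FRAME_LEN);                      /* zero padding */
    memcpy(buf, dst, FRAME_MAC_LEN);                /* destination MAC */
    memcpy(buf + FRAME_MAC_LEN, src, FRAME_MAC_LEN);/* source MAC */

    /* EtherType is written in network byte order. */
    buf[12] = (uint8_t)(ethertype >> 8);
    buf[13] = (uint8_t)(ethertype & 0xff);

    /* Counter, most significant byte first. */
    for (int i = 0; i < 8; i++)
        buf[FRAME_HDR_LEN + i] = (uint8_t)(counter >> (56 - 8 * i));

    return FRAME_LEN;
}
```

      The resulting buffer would then be referenced by the scatter/gather entry of the send work request, with the counter incremented before each post.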

       

      With that setup I am noticing random packet loss (i.e. non-sequential counters) on the receiver side. The faster the rate (i.e. the smaller the extra wait time), the larger the gaps in sequence numbers. My QPs are large enough (thousands of WRs) and I do not run into any queue overrun. Besides, the problem also appears at very slow rates (~1 pkt/s).
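
      The receiver-side check for non-sequential counters can be sketched roughly as follows. This is an assumed, simplified gap tracker, not the code from the linked example; the struct seq_stats and seq_stats_update names are hypothetical, and a packet that arrives late is counted separately rather than subtracted from the loss total.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for the receive side. */
struct seq_stats {
    uint64_t expected;  /* next counter value we expect to see */
    uint64_t received;  /* total packets seen */
    uint64_t lost;      /* counter values skipped over */
    uint64_t late;      /* packets older than expected (late or duplicate) */
    int      started;   /* 0 until the first packet arrives */
};

/* Feed the counter extracted from each received packet. */
void seq_stats_update(struct seq_stats *s, uint64_t counter)
{
    s->received++;

    if (!s->started) {
        /* First packet: synchronize on whatever counter we see. */
        s->started = 1;
        s->expected = counter + 1;
        return;
    }

    if (counter == s->expected) {
        s->expected++;                       /* in sequence */
    } else if (counter > s->expected) {
        s->lost += counter - s->expected;    /* size of the gap */
        s->expected = counter + 1;
    } else {
        s->late++;                           /* reordered or duplicated */
    }
}
```

      Tracking late arrivals separately helps distinguish genuine drops from reordering: if the late count stays at zero while the lost count grows, the missing packets never reached the receive queue at all.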

       

      I've tried to update all drivers/firmwares, to no avail. Here is my configuration:

       

      Machine A

       

        Device Type:      ConnectX3Pro

        Part Number:      MCX314A-BCC_Ax

        Description:      ConnectX-3 Pro EN network interface card; 40GigE; dual-port QSFP; PCIe3.0 x8 8GT/s; RoHS R6

        PSID:             MT_1090111023

        PCI Device Name:  0000:81:00.0

        Port1 MAC:        248a0772ca40

        Port2 MAC:        248a0772ca41

        Versions:         Current        Available

           FW             2.42.5000      2.40.7000

           PXE            3.4.0752       3.4.0746

       

      Machine B

       

        Device Type:      ConnectX3

        Part Number:      MCX354A-FCB_A2-A5

        Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6

        PSID:             MT_1090120019

        PCI Device Name:  0000:0b:00.0

        Port1 MAC:        7cfe90bed011

        Port2 MAC:        7cfe90bed012

        Versions:         Current        Available

           FW             2.42.5000      2.40.7000

           PXE            3.4.0752       3.4.0746

       

      Both machines use the Mellanox OFED drivers version 4.2-1.2.0.0 and run RHEL 7.3 with kernel 3.10.0-514.6.1.el7.x86_64.

       

      Any help would be welcome.