Are you using both ports of the NIC or just one?
What OS version and kernel are you using?
Did you try the newer EN driver (1.5.10 or the latest MOFED package)?
I am using OS X version 10.8.2.
It doesn't matter whether I use one or both ports; it just takes longer to hit the problem with two ports.
I have eyeballed the differences between 1.5.9 and 1.5.10 and see nothing that would affect this.
In my most recent trial, I reduced the number of rings to a single RX ring, and receive has stopped. Here is some debug info I now have, showing the ring parameters:
ring 0/1 cq 0xffffff80da688000 cqn 8c cons 3ff prod 3ff bytes 2335f59df0e4 pkts ffffffff
Wow. That is nice. The pkts counter is purely a software counter used for adjusting interrupt moderation, so the fact that it is about to wrap is incidental.
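Why a wrap of a moderation-only counter would be harmless can be shown with a small sketch. It assumes (the thread does not show this code) that the moderation logic only uses the difference between two samples of the counter; unsigned 32-bit subtraction in C is defined modulo 2^32, so the delta stays correct across the wrap. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packets seen since the previous moderation sample.
 * Unsigned subtraction wraps modulo 2^32, so the result is correct even if
 * the counter rolled over between samples (as long as fewer than 2^32
 * packets arrived in one interval). */
static uint32_t pkts_since(uint32_t now, uint32_t last)
{
    return now - last;
}

/* Hypothetical helper: packets per second over a sampling interval. */
static uint64_t pkt_rate(uint32_t now, uint32_t last, uint64_t interval_us)
{
    assert(interval_us > 0);
    return (uint64_t)pkts_since(now, last) * 1000000u / interval_us;
}
```

For example, `pkts_since(0x5, 0xfffffffb)` is 10 even though the counter wrapped in between. The assumption only fails if more than 2^32 packets arrive between two samples, which a periodic moderation timer makes practically impossible.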
The last-consumed slot at 3fe shows:
cqe[3fe] owner_sr_op 81 vlan_my_qpn 105 status 1440 byte_cnt 233a checksum ffff
and the next one to be consumed:
cqe[3ff] owner_sr_op 1 vlan_my_qpn 105 status 1440 byte_cnt 233a checksum ffff
cqe owner_sr_op 81 vlan_my_qpn 105 status 1440 byte_cnt 233a checksum ffff
So I will examine that location indefinitely, waiting for the owner bit in owner_sr_op to change.
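The stall is easier to read with the ownership convention spelled out. The sketch below is modeled loosely on the owner-bit test in the Linux mlx4_en RX path, which compares bit 7 (0x80) of owner_sr_op against `cons_index & size`; the helper name and the 1024-entry ring size (inferred from the 0x3ff index) are assumptions. The hardware flips the owner bit it writes on each pass through the CQ, so an entry left over from the previous pass is recognized as stale. In the dump above, cqe[3fe] holds 0x81 (bit set) and cqe[3ff] holds 0x01 (bit clear): if the consumer is on a pass that expects the bit set, slot 3ff is simply a previous-pass entry that the hardware has not overwritten yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CQE_OWNER_MASK 0x80u /* bit 7 of owner_sr_op (mlx4 convention) */

/* Hypothetical helper: is the CQE at cons_index ready for software?
 * cons_index is the free-running consumer counter (not masked to a slot);
 * ring_size must be a power of two. The expected owner bit alternates on
 * every pass through the ring. */
static bool cqe_ready(uint8_t owner_sr_op, uint32_t cons_index,
                      uint32_t ring_size)
{
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    bool owner_bit = (owner_sr_op & CQE_OWNER_MASK) != 0;
    bool pass_parity = (cons_index & ring_size) != 0;
    return owner_bit == pass_parity;
}
```

With a 1024-entry ring, `cqe_ready(0x81, 0x7fe, 0x400)` is true (slot 3fe, odd pass) while `cqe_ready(0x01, 0x7ff, 0x400)` is false, which is exactly the "wait forever at 3ff" state described here.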
If I look at the stats returned from DUMP_ETH_STATS, I see that the RX count still advances. Something is receiving the packets and dropping them. None of the stats counters I have examined show any errors.
If I run "ping -f -b <bcast>" from the other side, I see the broadcast RX counters from DUMP_ETH_STATS increasing steadily.
If it were just a matter of a missed interrupt, polling would fix things. However, I can repeatedly examine the next-to-be-consumed CQE and it is never updated, which has me perplexed.
Are there credits or something similar that need to be replenished?
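The symptom described here (the port's RX counters keep advancing while the CQ consumer index never moves, even under polling) can be detected mechanically, which separates it from a lost interrupt. A minimal sketch with a hypothetical struct and function; where the two counters come from is an assumption, e.g. the RX frame count from DUMP_ETH_STATS and the driver's own CQ consumer index, sampled after each poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of two counters taken from the driver. */
struct rx_sample {
    uint64_t hw_rx_frames; /* e.g. RX frame count from DUMP_ETH_STATS */
    uint32_t cq_cons;      /* software consumer index of the RX CQ */
};

/* Returns true once the port has kept receiving frames while the CQ
 * consumer index stayed put for `limit` consecutive samples.
 * *stuck holds the running count of such samples between calls. */
static bool rx_cq_stalled(const struct rx_sample *prev,
                          const struct rx_sample *cur,
                          unsigned *stuck, unsigned limit)
{
    assert(limit > 0);
    bool hw_moving = cur->hw_rx_frames != prev->hw_rx_frames;
    bool cq_moving = cur->cq_cons != prev->cq_cons;

    if (hw_moving && !cq_moving)
        (*stuck)++;
    else
        *stuck = 0; /* idle link or normal progress */
    return *stuck >= limit;
}
```

Requiring several consecutive samples avoids flagging an idle link (hardware counter not moving) or a single slow poll.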
Figured it out, finally.
There is a bug in the Linux driver code. I have no idea how it manages to work under Linux, but some code in the RX path breaks for me and somehow depletes the RX ring, right around the point where a 32-bit counter wraps.
Note that this area changed in 1.5.10: the code in question was removed.
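The post does not say which counter wraps or what the removed code did, so the following is only a hedged illustration of the general class of bug. Drivers commonly keep free-running 32-bit producer and consumer counters and mask them to get a ring slot. Occupancy computed as `prod - cons` in `uint32_t` survives the wrap; an ordering test such as `prod > cons` does not, and just after the producer wraps it reports an empty ring, so entries stop being consumed or reposted. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: entries posted but not yet consumed.
 * Correct across the 32-bit wrap because unsigned subtraction is
 * modulo 2^32. */
static uint32_t ring_fill(uint32_t prod, uint32_t cons, uint32_t size)
{
    uint32_t fill = prod - cons;
    assert(fill <= size); /* producer can never be more than a ring ahead */
    return fill;
}

/* Hypothetical buggy variant: once prod has wrapped past 0xffffffff and
 * cons has not, prod < cons, so the ring wrongly looks empty. */
static uint32_t ring_fill_naive(uint32_t prod, uint32_t cons)
{
    return prod > cons ? prod - cons : 0;
}
```

With `prod = 0x10` and `cons = 0xfffffff0`, `ring_fill` returns 32 while `ring_fill_naive` returns 0; away from the wrap the two agree, which is why this kind of bug only shows up after billions of packets.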