I ran into an issue with a simple configuration that uses the NVMe-oF protocol between a Host server with a ConnectX-4 and our NVMe-oF target, connected through a 100G MLNX switch. Here are the details of the configuration and the issue:
- Linux kernel version:
  root@host139# uname -a
  Linux host139 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- OS version:
  root@host139:# cat /etc/os-release
  VERSION="16.10 (Yakkety Yak)"
- MLNX RNIC:
  Product Name: CX415A - ConnectX-4 QSFP28
  Part number: MCX415A-CCAT
  FW version: 12.16.1020
- Host connected to a 100G switch port
- Target connected to a 25G switch port
- Target supports the NVMe-oF protocol
Summary of the issue:
- The Host runs an FIO script that sends a workload across the MLNX switch to the NVMe-oF target.
- FIO workload: 100% writes (8K random writes, 4 jobs, queue depth 32).
- A few seconds into the test, the Target answers a SEND (NVMe Write) frame from the Host with an RNR NAK carrying a “wait time” of 0.12 ms. The Host had already sent further SEND frames after the NAK'd one, and, as per the spec, the Target does not acknowledge them.
- The MLNX RNIC retransmits the NAK'd frame after 1.5 ms, and that I/O completes successfully.
- However, the Host never retransmits any of the SEND frames that were sent after the NAK'd one, so the FIO job hangs.
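For reference, the 0.12 ms “wait time” is one of 32 fixed values of the 5-bit RNR NAK timer field. The sketch below reproduces that encoding table as given in the InfiniBand Architecture Specification (the helper names are hypothetical) and checks that the observed 1.5 ms retransmit delay respects the advertised minimum. On a verbs-based target this value would come from the responder QP's `min_rnr_timer` attribute set through `ibv_modify_qp`; whether this particular target is verbs-based is an assumption.

```python
# 5-bit RNR NAK timer encodings (code -> minimum wait in milliseconds),
# reproduced from the InfiniBand Architecture Specification's encoding table.
# Note that code 0 is the LARGEST value (655.36 ms), not zero.
RNR_TIMER_MS = {
    0: 655.36, 1: 0.01, 2: 0.02, 3: 0.03, 4: 0.04, 5: 0.06, 6: 0.08, 7: 0.12,
    8: 0.16, 9: 0.24, 10: 0.32, 11: 0.48, 12: 0.64, 13: 0.96, 14: 1.28, 15: 1.92,
    16: 2.56, 17: 3.84, 18: 5.12, 19: 7.68, 20: 10.24, 21: 15.36, 22: 20.48,
    23: 30.72, 24: 40.96, 25: 61.44, 26: 81.92, 27: 122.88, 28: 163.84,
    29: 245.76, 30: 327.68, 31: 491.52,
}


def decode_rnr_timer(code: int) -> float:
    """Return the minimum wait (ms) encoded by a 5-bit RNR NAK timer code."""
    if not 0 <= code <= 31:
        raise ValueError("RNR NAK timer is a 5-bit field (0..31)")
    return RNR_TIMER_MS[code]


def encode_rnr_timer(wait_ms: float) -> int:
    """Hypothetical helper: find the code whose wait equals wait_ms."""
    for code, ms in RNR_TIMER_MS.items():
        if abs(ms - wait_ms) < 1e-9:
            return code
    raise ValueError(f"{wait_ms} ms is not an encodable RNR NAK timer value")


def retry_delay_ok(observed_ms: float, code: int) -> bool:
    """Assumption: the requester must wait at least the advertised time."""
    return observed_ms >= decode_rnr_timer(code)


if __name__ == "__main__":
    code = encode_rnr_timer(0.12)
    print(code, decode_rnr_timer(code), retry_delay_ok(1.5, code))
```

Run as a script, this prints `7 0.12 True`: the Target advertised code 7, and the RNIC's 1.5 ms delay before retrying is longer than that minimum, so the timer itself does not look like the problem.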
This seems to be fundamental RNR NAK behavior that I would expect the MLNX RNIC to handle, so I am hoping there is a configuration setting or something similar that either the Target should have used or that should be configured on the Host for this to work.
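NVMe-oF over RDMA runs on Reliable Connected (RC) queue pairs. Assuming the usual RC go-back-N rule (the responder drops the RNR-NAK'd packet and everything after it, so the requester resumes from the NAK'd PSN and resends every later packet in order), the expected Host behavior can be modeled as below. This is a simplified sketch with hypothetical names: it ignores PSN wraparound at 2^24, ACK coalescing, timers, and the Host QP's 3-bit `rnr_retry` attribute (where 7 means retry indefinitely on RNR).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RcRequester:
    """Toy model of an RC requester's in-flight packets (hypothetical)."""
    next_psn: int = 0
    unacked: List[int] = field(default_factory=list)  # sent, not yet ACKed

    def send(self) -> int:
        """Transmit a new packet and remember its PSN until it is ACKed."""
        psn = self.next_psn
        self.unacked.append(psn)
        self.next_psn += 1
        return psn

    def on_ack(self, psn: int) -> None:
        """ACKs are cumulative: everything up to and including psn is done."""
        self.unacked = [p for p in self.unacked if p > psn]

    def on_rnr_nak(self, psn: int) -> List[int]:
        """Go-back-N: the NAK'd packet and every later in-flight packet were
        discarded by the responder, so all of them must be resent in order
        (after waiting at least the advertised RNR NAK timer)."""
        return [p for p in self.unacked if p >= psn]


if __name__ == "__main__":
    req = RcRequester()
    for _ in range(4):
        req.send()                 # PSNs 0..3 are in flight
    print(req.on_rnr_nak(1))       # responder RNR-NAKs PSN 1 -> [1, 2, 3]
```

Under this model, an RNR NAK on PSN 1 obliges the Host to resend PSNs 1, 2 and 3; in the trace described above, the Host appears to resend only the first PSN of that list.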
Please let me know if any additional information is needed to understand the issue.