This post shows how to set up a persistent NVMe over Fabrics (NVMe-oF) association between an initiator (host) and a target over the RDMA transport.
The solution uses a systemd service and timer so that the NVMe-oF initiator stays connected even if fatal errors occur in the low-level device (for example, the HCA).
Note: This post assumes that you can already configure an initial NVMe-oF association between the target and the host, and that the RDMA layer is enabled. If not, refer to HowTo Configure NVMe over Fabrics first.
- http://www.nvmexpress.org/specifications/
- NVMe-oF Configuration on Vimeo
- HowTo Configure NVMe over Fabrics Target using nvmetcli
- HowTo Configure NVMe over Fabrics (NVMe-oF) Target Offload
- Two servers: one configured as the NVMe-oF target and the other used as the host (initiator).
1. A configured NVMe-oF target that exposes subsystems over the RDMA transport at traddr 18.104.22.168 and trsvcid 4420.
2. Network interfaces on the initiator side that can reach the target portal (e.g. 18.104.22.168:4420). These interfaces must be able to recover from fatal errors (for example, by configuring them with BOOTPROTO=static in their ifcfg configuration file).
3. For the association to persist across reboots, make sure the nvme-rdma module is loaded on the initiator at boot time (e.g. by adding nvme-rdma to /etc/modules on Debian, or to /etc/modules-load.d/nvme-rdma.conf on RHEL).
Note: Also make sure that the network interfaces needed to reach the target portal (e.g. 18.104.22.168:4420) are configured at boot time.
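Prerequisite 3 can be sketched with systemd's modules-load.d mechanism, which systemd-modules-load reads at boot on systemd-based distributions. The helper name persist_nvme_rdma_module is hypothetical; its optional directory argument exists only so the file can be staged somewhere other than /etc/modules-load.d first.

```shell
#!/bin/sh
# Sketch: load nvme-rdma at every boot through systemd-modules-load.
# persist_nvme_rdma_module is a hypothetical helper; run it as root.
persist_nvme_rdma_module() {
    # Target directory; defaults to the system-wide modules-load.d path.
    conf_dir="${1:-/etc/modules-load.d}"
    mkdir -p "$conf_dir" || return 1
    # modules-load.d files list one kernel module name per line.
    printf '%s\n' nvme-rdma > "$conf_dir/nvme-rdma.conf"
}

# Usage (as root):
#   persist_nvme_rdma_module   # writes /etc/modules-load.d/nvme-rdma.conf
#   modprobe nvme-rdma         # also load the module now, without rebooting
```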
NVMe-oF Persistent Initiator Configuration
- The latest version of nvme-cli installed (it can be installed directly from the linux-nvme/nvme-cli repository on GitHub, "NVMe management command line interface").
1. Create the nvme_fabrics_persistent.service unit file in the /etc/systemd/system directory:
Description=NVMf auto discovery service
Note: Make sure the nvme-cli installation placed the nvme executable under /usr/sbin/. Otherwise, locate it with the "which nvme" command and use that path in the ExecStart line of nvme_fabrics_persistent.service.
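A complete unit file needs more than the Description line shown above. The following is a sketch of one possible nvme_fabrics_persistent.service, not the original file. It assumes a oneshot service that runs `nvme connect-all` with no arguments, in which case nvme-cli reads its list of portals from /etc/nvme/discovery.conf (step 5). The Wants=/After= ordering on network-online.target and systemd-modules-load.service is an assumption, and write_nvmf_service_unit is a hypothetical helper.

```shell
#!/bin/sh
# Sketch (assumed contents, not the original file) of
# /etc/systemd/system/nvme_fabrics_persistent.service.
# write_nvmf_service_unit is a hypothetical helper; run it as root.
write_nvmf_service_unit() {
    unit_dir="${1:-/etc/systemd/system}"
    mkdir -p "$unit_dir" || return 1
    # Quoted 'EOF' keeps the unit text literal (no shell expansion).
    cat > "$unit_dir/nvme_fabrics_persistent.service" <<'EOF'
[Unit]
Description=NVMf auto discovery service
# Assumed ordering: wait for the network and for nvme-rdma to be loaded.
Wants=network-online.target
After=network-online.target systemd-modules-load.service

[Service]
Type=oneshot
# With no arguments, connect-all reads its portals from /etc/nvme/discovery.conf.
# Adjust the path if "which nvme" reports a different location.
ExecStart=/usr/sbin/nvme connect-all

[Install]
WantedBy=multi-user.target
EOF
}
```

Type=oneshot fits here because each run simply performs the connect attempts and exits; the timer in the next step is what makes the attempts repeat.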
2. Create the nvme_fabrics_persistent.timer unit file in the /etc/systemd/system directory:
Description=NVMf auto discovery timer
Note: In this example, the timer triggers the service every 60 seconds. To change the interval, set a different value for the OnUnitActiveSec option.
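A matching sketch of nvme_fabrics_persistent.timer, again with assumed contents and a hypothetical helper (write_nvmf_timer_unit). OnUnitActiveSec= counts from the last time the service was activated, so the service has to be started once (step 3, or at boot once enabled) before the timer starts firing. Unit= names the service explicitly, although systemd defaults to the service with the same base name as the timer.

```shell
#!/bin/sh
# Sketch (assumed contents) of /etc/systemd/system/nvme_fabrics_persistent.timer.
# write_nvmf_timer_unit is a hypothetical helper; run it as root.
write_nvmf_timer_unit() {
    unit_dir="${1:-/etc/systemd/system}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/nvme_fabrics_persistent.timer" <<'EOF'
[Unit]
Description=NVMf auto discovery timer

[Timer]
# Fire 60 seconds after the service was last activated, so it re-runs roughly every minute.
OnUnitActiveSec=60
Unit=nvme_fabrics_persistent.service

[Install]
WantedBy=timers.target
EOF
}
```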
3. Start the new systemd service (and enable it to ensure the service is started at boot):
# systemctl start nvme_fabrics_persistent.service
# systemctl enable nvme_fabrics_persistent.service
4. Start the new systemd timer (and enable it to ensure the timer is started at boot):
# systemctl start nvme_fabrics_persistent.timer
# systemctl enable nvme_fabrics_persistent.timer
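Steps 3 and 4 can also be sketched as one helper (the name enable_nvmf_units is hypothetical). It adds `systemctl daemon-reload` so that systemd re-reads the newly created unit files, and uses `systemctl enable --now`, which enables and starts a unit in one call. Since the service runs nvme connect-all against /etc/nvme/discovery.conf, creating that file (step 5) before the first start may avoid a failed initial run.

```shell
#!/bin/sh
# Sketch of steps 3 and 4 in one hypothetical helper; run it as root.
enable_nvmf_units() {
    # Re-read unit files so systemd sees the newly created .service/.timer.
    systemctl daemon-reload
    # enable --now = enable at boot and start immediately. Starting the
    # service once matters: OnUnitActiveSec counts from its last activation.
    systemctl enable --now nvme_fabrics_persistent.service
    systemctl enable --now nvme_fabrics_persistent.timer
}

# Usage (as root):
#   enable_nvmf_units
#   systemctl list-timers nvme_fabrics_persistent.timer  # last/next trigger
#   journalctl -u nvme_fabrics_persistent.service        # output of each run
```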
5. Create a configuration file with the target configuration (it must be /etc/nvme/discovery.conf):
--transport rdma --traddr 18.104.22.168 --trsvcid 4420
Note: In this example, the host connects to all the subsystems it is allowed to access through the 18.104.22.168:4420 portal. To connect to subsystems behind other portals, duplicate the line above and update the relevant attributes. Other options (such as --host-traddr and --hostnqn) can also be added to the configuration file.
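A sketch of a discovery.conf with two portals, following the note above. Each line holds the arguments of one nvme connect-all call. The second portal (192.0.2.20, reached from the local address 192.0.2.10 via --host-traddr) is a hypothetical placeholder from the documentation address range, and write_discovery_conf is a hypothetical helper.

```shell
#!/bin/sh
# Sketch of /etc/nvme/discovery.conf with two portals.
# write_discovery_conf is a hypothetical helper; run it as root.
# Each line holds the arguments of one "nvme connect-all" call.
# The second portal (192.0.2.20 via local 192.0.2.10) is a hypothetical
# placeholder from the documentation address range.
write_discovery_conf() {
    conf="${1:-/etc/nvme/discovery.conf}"
    mkdir -p "$(dirname "$conf")" || return 1
    cat > "$conf" <<'EOF'
--transport rdma --traddr 18.104.22.168 --trsvcid 4420
--transport rdma --traddr 192.0.2.20 --trsvcid 4420 --host-traddr 192.0.2.10
EOF
}
```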
At this point, the NVMe-oF initiator is resilient to HCA fatal errors for every portal listed in /etc/nvme/discovery.conf.
Note: To remove a portal from the persistent group, delete its line from /etc/nvme/discovery.conf.
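Any editor works for this. As a sketch, the hypothetical helper below deletes one exact line. Removing the line only stops the timer from reconnecting that portal; controllers that are already connected stay connected until they are disconnected, for example with `nvme disconnect -n <subsystem NQN>`.

```shell
#!/bin/sh
# Hypothetical helper: drop one portal line from discovery.conf; run as root.
#   $1  the portal line exactly as written in the file
#   $2  config path (defaults to /etc/nvme/discovery.conf)
remove_portal() {
    line="$1"
    conf="${2:-/etc/nvme/discovery.conf}"
    [ -f "$conf" ] || return 1
    tmp_conf="$conf.tmp.$$"
    # -v: keep non-matching lines; -x -F: match the whole line literally.
    grep -v -x -F -- "$line" "$conf" > "$tmp_conf"
    # grep exits 1 when no lines are kept (that is fine); 2 means an error.
    [ $? -le 1 ] || { rm -f "$tmp_conf"; return 1; }
    mv "$tmp_conf" "$conf"
}

# Usage (as root):
#   remove_portal '--transport rdma --traddr 192.0.2.20 --trsvcid 4420'
```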