0 Replies Latest reply on Feb 25, 2018 1:09 PM by johnsmith

    Soft RoCE not working (no errors)

    johnsmith

      Hello,

       

      I am attempting to run soft RoCE and interface with an X4 card in a different computer.

       

      I am running CentOS 7.4 and installed MLNX_OFED version 4.2-1.2.0.0. The install finished without error, and I ran the service restart command when prompted. I then tried to set up soft RoCE following the directions here: HowTo Configure Soft-RoCE.

       

      When I run rxe_cfg status or rxe_cfg start, the script complains that the rdma_rxe module is not loaded (with no other errors, even in verbose mode). When I run lsmod | grep rdma_rxe, I see that rdma_rxe is in fact loaded, and that it is using mlx_compat. This is a small variation from the above instructions: on my system rdma_rxe is using mlx_compat, not ib_core (even though ib_core is loaded and used by mlx_compat). I assume this is some wrapper used by Mellanox in newer versions of OFED. I have also run modprobe rdma_rxe, which loads the module without any error messages, and dmesg does not show any error messages from the kernel. I have also tried reloading the module and restarting the machine.
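For anyone who wants to repeat the lsmod check without reading the output by eye, it can be scripted roughly as below. This is only a minimal sketch: module_in_lsmod and deps_of_module are hypothetical helper names (not part of rxe_cfg or MLNX_OFED), and it assumes the usual lsmod column layout of name, size, use count, and a comma-separated "Used by" list.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of rxe_cfg or MLNX_OFED.
# Both helpers read lsmod-style text from stdin.

# Succeeds (exit 0) if the module named in $1 appears in the first column.
module_in_lsmod() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}

# Prints every module whose "Used by" list (4th column) contains $1,
# i.e. the modules that $1 depends on.
deps_of_module() {
    awk -v m="$1" '{
        n = split($4, users, ",")
        for (i = 1; i <= n; i++)
            if (users[i] == m) print $1
    }'
}

# Usage on a live system (left commented out so the sketch has no side effects):
# lsmod | module_in_lsmod rdma_rxe && echo "rdma_rxe is loaded"
# lsmod | deps_of_module rdma_rxe
```

If module_in_lsmod succeeds while rxe_cfg status still reports the module as missing, that would suggest the two are looking at different things, which matches what I am seeing.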

       

      After 'starting' rxe_cfg, running rxe_cfg add <adapter_name> does nothing. It does not load any IB devices associated with the NIC, and I still see the 'rdma_rxe module is not loaded' message.
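One way to confirm that no IB device was created is to list what the kernel exposes under /sys/class/infiniband, which is where RDMA devices normally appear. The snippet below is a rough sketch of that check; list_rdma_devices is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at another path.

```shell
#!/bin/sh
# Sketch only: list_rdma_devices is a hypothetical helper name.
# Prints one RDMA device name per line; returns non-zero if the
# directory does not exist (for example, when no RDMA core is loaded).
list_rdma_devices() {
    dir="${1:-/sys/class/infiniband}"
    [ -d "$dir" ] || return 1
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# Usage on a live system (left commented out so the sketch has no side effects):
# list_rdma_devices || echo "no /sys/class/infiniband directory"
```

In my case I would expect an rxe device to show up here after rxe_cfg add, and nothing does.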

       

      I searched around quite a bit and could not find anything that helped. I have also tried the same steps with version 4.2-1.0.0.0 of MLNX_OFED. This computer did have an X4 card in it when I first installed the OFED package. I took it out in case it was preventing soft RoCE from working on other NICs, restarted, re-installed OFED, and repeated the same troubleshooting without the Mellanox card installed.

       

      Any help would be appreciated.