0 Replies Latest reply on Dec 16, 2015 7:50 AM by ronniebr

    RDMA_CM_EVENT_ADDR_ERROR when running in RoCE mode

    ronniebr

      I have developed a test client/server application that uses the verbs library, and it seems to work well when my ConnectX-3 Pro cards are configured to use InfiniBand.

       

      However, if I reconfigure the ports to Ethernet mode and try to use RoCE v1, my client always fails with the same error whenever I call rdma_resolve_addr(...): it generates RDMA_CM_EVENT_ADDR_ERROR, error: -2 (ENOENT).

       

      If I use udaddy instead of my own application, I see exactly the same error:

       

      >strace -f -s 32 -x udaddy -s 192.168.0.100

      ...

      open("/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3

      ...

      write(1, "udaddy: connecting\n", 19udaddy: connecting)    = 19

      write(3,"\x15\x00\x00\x00\x10\x01\x00\x00\x00\x00\x00\x00\xd0\x07\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 280) = 280

      write(3, "\x0c\x00\x00\x00\x08\x00\x48\x01\xa0\xbc\x4e\x2a\xff\x7f\x00\x00", 16) = 16

      write(1, "udaddy: event: RDMA_CM_EVENT_ADD"..., 51udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -2) = 51

      write(1, "test complete\n", 14test complete) = 14

      write(3, "\x01\x00\x00\x00\x10\x00\x04\x00\x30\xc1\x4e\x2a\xff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 24) = 24

      close(3) = 0

      write(1, "return status -2\n", 17return status -2) = 17

      shutdown(4, 2 /* send and receive */)   = 0

      close(4) = 0

      exit_group(-2)  = ?

       

      The ENOENT error seems to come from the rdma_cm kernel module in response to the RDMA_USER_CM_CMD_RESOLVE_ADDR command written to /dev/infiniband/rdma_cm (see the write(3,"\x15... call in the trace above).
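For anyone wanting to double-check which command each write(3, ...) in the trace carries, here is a minimal sketch of a decoder for the first 8 bytes of each buffer. It assumes the header layout of struct rdma_ucm_cmd_hdr in <rdma/rdma_user_cm.h> (a 32-bit command followed by 16-bit input and output lengths) and a little-endian host such as x86_64. The names ucm_hdr, ucm_cmd_name and ucm_decode are made up for this example, and the command numbers should be verified against the header shipped with your kernel or OFED install.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror struct rdma_ucm_cmd_hdr: command number, then the
 * lengths of the input and output payloads that follow the header. */
struct ucm_hdr {
    uint32_t cmd;
    uint16_t in;
    uint16_t out;
};

/* Only the RDMA_USER_CM_CMD_* values that appear in the trace above;
 * check the numbers against your own rdma_user_cm.h. */
const char *ucm_cmd_name(uint32_t cmd)
{
    switch (cmd) {
    case 1:  return "DESTROY_ID";
    case 12: return "GET_EVENT";
    case 21: return "RESOLVE_ADDR";
    default: return "OTHER";
    }
}

/* Decode the 8-byte header at the start of a buffer written to
 * /dev/infiniband/rdma_cm. Bytes are assembled as little-endian.
 * Returns 0 on success, -1 if the buffer is shorter than a header. */
int ucm_decode(const unsigned char *buf, size_t len, struct ucm_hdr *h)
{
    if (len < 8)
        return -1;
    h->cmd = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
             ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    h->in  = (uint16_t)(buf[4] | (buf[5] << 8));
    h->out = (uint16_t)(buf[6] | (buf[7] << 8));
    return 0;
}
```

Feeding it the bytes \x15\x00\x00\x00\x10\x01\x00\x00 from the 280-byte write gives cmd 21 with a 272-byte input, which together with the 8-byte header accounts for the 280 bytes and is consistent with the write being the resolve-address request.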

       

      Looking briefly at the rdma_cm code, ENOENT typically seems to be returned when no matching entry is found in the GID cache.

       

      Is there something I should be doing on my system to ensure that the GID cache is populated?
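One way to see whether a port has any GIDs at all is to read the port's GID table from sysfs. The sketch below assumes the usual /sys/class/infiniband/<device>/ports/<port>/gids/<index> layout. The helper names gid_is_zero and count_gids are made up for this example, and the sysfs table is not necessarily identical to what rdma_cm consults internally, so treat the result as a hint only.

```c
#include <stdio.h>

/* Return 1 if a sysfs GID string contains only zero digits
 * (plus ':' separators and a trailing newline), otherwise 0. */
int gid_is_zero(const char *gid)
{
    for (; *gid; gid++)
        if (*gid != '0' && *gid != ':' && *gid != '\n')
            return 0;
    return 1;
}

/* Print and count the non-zero GIDs among the first max_idx table slots
 * of the given device and port. Returns -1 if slot 0 cannot be opened
 * (for example, the device name is wrong); stops early at the first
 * later slot that cannot be opened. */
int count_gids(const char *dev, int port, int max_idx)
{
    int found = 0;

    for (int i = 0; i < max_idx; i++) {
        char path[256], gid[64];
        FILE *f;

        snprintf(path, sizeof path,
                 "/sys/class/infiniband/%s/ports/%d/gids/%d", dev, port, i);
        f = fopen(path, "r");
        if (!f)
            return i == 0 ? -1 : found;
        if (fgets(gid, sizeof gid, f) && !gid_is_zero(gid)) {
            printf("%s port %d gid[%d] = %s", dev, port, i, gid);
            found++;
        }
        fclose(f);
    }
    return found;
}
```

Calling count_gids("mlx4_0", 1, 16) from a small main() would list the populated entries for port 1. The device name mlx4_0 is an assumption based on ConnectX-3 cards using the mlx4 driver; substitute whatever name appears under /sys/class/infiniband on the system. A result of 0 would suggest the port has no usable GID in Ethernet mode.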

       

      The system is running RHEL6.6 with MLNX_OFED_LINUX-3.1-1.0.3-rhel6.6-x86_64 installed. 

       

      Thanks.

       

      -Ronnie