2 Replies Latest reply on Apr 3, 2016 4:18 PM by weijia

    I have a problem porting my RDMA application from InfiniBand (Mellanox ConnectX-3 40Gb IB) to RoCE (ConnectX-4 100GbE).

    weijia

      So, I have a small application written in C that tests RDMA write. It works perfectly on a Mellanox ConnectX-3 40Gb IB NIC. We got new Mellanox ConnectX-4 100GbE hardware, which supports RoCE (testing with the 'ib_send_bw' tool shows throughput close to 98Gbps, which is exciting). To port the application, I modified the code that moves the queue pair to the RTR/RTS states:

      1) set queue pair attribute: attr->ah_attr.grh fields

      2) set attr->ah_attr.is_global to 1

      The problem shows up in ibv_poll_cq() after the RDMA write requests are posted. The work completion object (struct ibv_wc) reports a failure with status=10 (IBV_WC_REM_ACCESS_ERR). I double-checked my ibv_reg_mr() call, and it does have all of the access flags set:

      ===============================================================

      ctxt.mr=ibv_reg_mr(ctxt.pd, ctxt.pages, page_size*MAX_PAGE,

               IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_ATOMIC );

      ===============================================================

      To find out what was happening, I printed the vendor_err field of the ibv_wc object:

      status=10, qp_num=323, vendor_err=136

      I can't find a reference explaining vendor_err=136, but I do have some information reported by the driver (the message should come from ./libmlx5-1.0.2mlnx1/src/cq.c):

      ================================================================

      mlx5: compute28: got completion with error:

      00000000 00000000 00000000 00000000

      00000000 00000000 00000000 00000000

      00000000 00000000 00000000 00000000

      00000000 00008813 08000143 0000fed0

      ================================================================
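
The last row of the dump can be picked apart with a small sketch like the one below. The field layout is an assumption based on struct mlx5_err_cqe in the mlx5 driver sources (vendor syndrome and syndrome in the second-to-last bytes of word 13, WQE opcode and QP number in word 14, WQE counter, signature and op_own in word 15), and it assumes the dump prints each 32-bit word in host order after byte swapping. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of interest from a 64-byte mlx5 error CQE.
 * Layout is an assumption taken from struct mlx5_err_cqe. */
struct err_cqe_fields {
    uint8_t  vendor_err_synd; /* vendor-specific syndrome */
    uint8_t  syndrome;        /* generic error syndrome */
    uint8_t  wqe_opcode;      /* opcode of the failed send WQE */
    uint32_t qpn;             /* 24-bit QP number */
    uint16_t wqe_counter;
    uint8_t  signature;
    uint8_t  op_own;          /* CQE opcode (high nibble) and owner bit */
};

/* w[] holds the 16 words of the dump, in the order they are printed. */
struct err_cqe_fields decode_err_cqe(const uint32_t w[16])
{
    struct err_cqe_fields f;

    assert(w != NULL);
    f.vendor_err_synd = (uint8_t)((w[13] >> 8) & 0xff);
    f.syndrome        = (uint8_t)(w[13] & 0xff);
    f.wqe_opcode      = (uint8_t)((w[14] >> 24) & 0xff);
    f.qpn             = w[14] & 0xffffff;
    f.wqe_counter     = (uint16_t)((w[15] >> 16) & 0xffff);
    f.signature       = (uint8_t)((w[15] >> 8) & 0xff);
    f.op_own          = (uint8_t)(w[15] & 0xff);
    return f;
}

/* Print the decoded fields in a readable form. */
void print_err_cqe(const uint32_t w[16])
{
    struct err_cqe_fields f = decode_err_cqe(w);

    printf("vendor_err_synd=0x%02x syndrome=0x%02x wqe_opcode=0x%02x "
           "qpn=0x%06x wqe_counter=%u signature=0x%02x op_own=0x%02x\n",
           (unsigned)f.vendor_err_synd, (unsigned)f.syndrome,
           (unsigned)f.wqe_opcode, (unsigned)f.qpn,
           (unsigned)f.wqe_counter, (unsigned)f.signature,
           (unsigned)f.op_own);
}
```

Under that assumed layout, the dump above gives vendor_err_synd=0x88 (136) and qpn=0x143 (323), which match the vendor_err and qp_num printed from ibv_wc, and a syndrome of 0x13, which is the remote access error value in the mlx5 driver. So the dump seems to repeat what the work completion already says; it does not by itself explain what vendor code 136 means.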

      I guess those numbers mean something to Mellanox people. I hope you can help me with this problem. BTW, the OFED version I use is 3.2-2.0.0.0 for ubuntu12.04-x86_64.