3 Replies Latest reply on Oct 30, 2017 2:08 AM by vmansur

    RDMA read failing with Remote Invalid Request Error

    vmansur

      I am attempting an RDMA read between Mellanox ConnectX-4 adapters, and it fails with the following error CQE:

       

      [ 3134.946117] mlx5_0:dump_cqe:262:(pid 5984): dump error cqe

      [ 3134.946120] 00000000 00000000 00000000 00000000

      [ 3134.946122] 00000000 00000000 00000000 00000000

      [ 3134.946124] 00000000 00000000 00000000 00000000

      [ 3134.946127] 00000000 00008a12 100000ac 000008d2

       

      opcode=0xd, syndrome=0x12, vendor syndrome=0x8a

       

      Per the Mellanox PRM, opcode 0xd (13) is a requester error and syndrome 0x12 is Remote_Invalid_Request_Error:

      http://www.mellanox.com/related-docs/user_manuals/Ethernet_Adapters_Programming_Manual.pdf
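
The decode above can be reproduced with a few shifts and masks. What follows is a minimal sketch in C, assuming dump_cqe prints the 64-byte CQE as sixteen 32-bit words in order, and that the fields sit where the values quoted above imply: vendor syndrome and syndrome in the low two bytes of word 13, and the opcode in the top four bits of the last byte. The struct and function names are made up for illustration; the offsets should be checked against the PRM before relying on them.

```c
#include <stdint.h>

/* Fields pulled out of an mlx5 error CQE dump (illustrative names). */
struct err_cqe_fields {
    uint8_t opcode;          /* top 4 bits of the final byte of the CQE */
    uint8_t syndrome;        /* e.g. 0x12 = Remote_Invalid_Request_Error */
    uint8_t vendor_syndrome; /* vendor-specific, meaning not published */
};

/* words: the 16 values in the order dump_cqe printed them (4 per line).
 * Assumed layout: word 13 = ....VVSS, word 15 = ......O. (O = opcode nibble). */
static struct err_cqe_fields decode_err_cqe(const uint32_t words[16])
{
    struct err_cqe_fields f;

    f.vendor_syndrome = (uint8_t)((words[13] >> 8) & 0xff);
    f.syndrome        = (uint8_t)(words[13] & 0xff);
    f.opcode          = (uint8_t)((words[15] >> 4) & 0xf);
    return f;
}
```

Feeding it the last line of the dump above (words 12 to 15 are 00000000 00008a12 100000ac 000008d2, all earlier words zero) gives opcode 0xd, syndrome 0x12 and vendor syndrome 0x8a, matching the values quoted in the post.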

       

      I haven't been able to find much on Remote_Invalid_Request_Error, but one page on the web (rdmamojo) says it can be caused by the remote QP's qp_access_flags not being configured to support the operation, by insufficient buffering to receive a new RDMA or Atomic Operation request, or by the length specified in an RDMA request.

       

      I have validated all of the above: qp_access_flags on the responder has RDMA read enabled, there is enough buffer to receive the RDMA read, and the length specified in the RDMA read request is fine on the requester. In addition, I have validated the remote address/rkey, the local address/lkey, the length, and the entire WQE posted; they all look fine.

       

      Any idea what else could cause this error (Remote_Invalid_Request_Error)? Also, I couldn't find any details on the vendor syndrome 0x8a. Is there a way to decode it for more information about the failure?

       

      Thanks for your help !

        • Re: RDMA read failing with Remote Invalid Request Error
          joakimziegler

          I'm having a similar problem with the RHEL Inbox drivers and NFS over RDMA between some newly installed machines, getting Local Length Errors. While I don't have a solution for you specifically, I'm wondering, how did you do the validation you mention (qp_access flags, buffer sizes, etc.)? I'm wondering if any of the things you're mentioning might help me with my problem.

          • Re: RDMA read failing with Remote Invalid Request Error
            vmansur

            I was able to get this issue resolved. The problem was the "max_dest_rd_atomic" QP attribute. Per the documentation, "max_dest_rd_atomic" is the "number of RDMA Reads outstanding at any time for this QP as a destination". Our code uses RDMACM for connection management, and RDMACM sets "max_dest_rd_atomic" from the "responder_resources" field of the "rdma_conn_param" argument passed to "rdma_connect". It was not obvious that this field mattered, so we never set it. As a result RDMACM set "max_dest_rd_atomic" to zero, and every RDMA read initiated toward this node failed.
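
For anyone hitting the same thing, the fix described above comes down to filling in two fields before the connection call. This is a minimal sketch, not our actual code: init_conn_param is a hypothetical helper, the depth of 16 and the retry values are illustrative, and the fallback struct exists only so the snippet compiles on a machine without the librdmacm headers. The mapping of "initiator_depth" to "max_rd_atomic" is my understanding of RDMACM and is worth confirming in the man pages.

```c
#include <stdint.h>
#include <string.h>

#ifdef __has_include
#  if __has_include(<rdma/rdma_cma.h>)
#    include <rdma/rdma_cma.h>
#    define HAVE_RDMA_CMA 1
#  endif
#endif
#ifndef HAVE_RDMA_CMA
/* Stand-in with only the fields used below, so the sketch builds
 * without librdmacm. This is NOT the real definition. */
struct rdma_conn_param {
    uint8_t responder_resources;
    uint8_t initiator_depth;
    uint8_t retry_count;
    uint8_t rnr_retry_count;
};
#endif

/* Hypothetical helper: prepare the parameters that are later handed to
 * rdma_connect() (active side) or rdma_accept() (passive side). */
static void init_conn_param(struct rdma_conn_param *p,
                            uint8_t inbound_reads, uint8_t outbound_reads)
{
    memset(p, 0, sizeof(*p));
    /* RDMA reads/atomics this node will serve as a target.
     * Leaving this at 0 is what produced max_dest_rd_atomic == 0. */
    p->responder_resources = inbound_reads;
    /* RDMA reads/atomics this node may have outstanding as initiator. */
    p->initiator_depth = outbound_reads;
    /* Illustrative retry settings, unrelated to the bug itself. */
    p->retry_count = 7;
    p->rnr_retry_count = 7;
}
```

With the real library the call site would then look like `init_conn_param(&param, 16, 16); rdma_connect(id, &param);`, where the requested values should not exceed what the device reports in its attributes.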

             

            Basically, the "Remote_Invalid_Request_Error" syndrome covers many different failures that are not clearly distinguished, so it took us a while to find the actual cause. This is where I was hoping the "vendor syndrome" would help pin down the root cause of "Remote_Invalid_Request_Error" and similar errors that have several possible reasons. Unfortunately, the meaning of the "vendor syndrome" values does not seem to be published by Mellanox. It would help Mellanox RDMA users debug similar issues if Mellanox published these codes along with their descriptions.
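
Since the syndrome alone does not say which cause applies, a responder-side sanity check can at least rule out the two QP-attribute causes mentioned in this thread. The sketch below is only that: read_responder_problem is a hypothetical name, it covers just the access flag and "max_dest_rd_atomic", and the fallback definitions are stand-ins so it compiles without the libibverbs headers.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __has_include
#  if __has_include(<infiniband/verbs.h>)
#    include <infiniband/verbs.h>
#    define HAVE_VERBS 1
#  endif
#endif
#ifndef HAVE_VERBS
/* Stand-ins so the sketch builds without libibverbs. NOT the real
 * definitions; only the members used below are present. */
enum { IBV_ACCESS_REMOTE_READ = 1 << 2 };
struct ibv_qp_attr {
    unsigned int qp_access_flags;
    uint8_t max_dest_rd_atomic;
};
#endif

/* Returns a short description of the first problem found on the QP that
 * is the target of RDMA reads, or NULL if neither check trips. */
static const char *read_responder_problem(const struct ibv_qp_attr *attr)
{
    if (!(attr->qp_access_flags & IBV_ACCESS_REMOTE_READ))
        return "qp_access_flags lacks IBV_ACCESS_REMOTE_READ";
    if (attr->max_dest_rd_atomic == 0)
        return "max_dest_rd_atomic is 0, no inbound RDMA reads allowed";
    return NULL;
}
```

On a live connection the attributes would come from `ibv_query_qp(qp, &attr, IBV_QP_ACCESS_FLAGS | IBV_QP_MAX_DEST_RD_ATOMIC, &init_attr)` on the responder's QP. A NULL result only means these two causes are ruled out, not that the QP is otherwise correct.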

             

            Thanks !