0 Replies Latest reply on May 13, 2016 12:49 PM by tepp

    ConnectX-2 CQE remote access error (Protection Error on remote data buffer) vendor syndrome 0x88

    tepp

      Dell hardware running Oracle Solaris 11.3 serving as a storage appliance:

      Kernel version: SunOS 5.11 11.3
      Entire Version : 0.5.11-0.175.3.4.0.5.0
      System Type:  Dell PowerEdge R730xd

      pci bus 0x0081 cardnum 0x00 function 0x00: vendor 0x15b3 device 0x673c
        Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
        CardVendor 0x15b3 card 0x0021 (Mellanox Technologies, Card unknown)
         STATUS    0x0010  COMMAND 0x0046
         CLASS     0x02 0x80 0x00  REVISION 0xb0
         BIST      0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
         BASE0     0x00000000c8000000 SIZE 1048576  MEM64
         BASE2     0x0000037fff800000 SIZE 8388608  MEM64 PREFETCHABLE
         MAX_LAT   0x00  MIN_GNT 0x00  INT_PIN 0x01  INT_LINE 0x0f

       

      Mar 30 21:04:53 zfstgt-1 genunix: [ID 937861 kern.info] mcxnex0: CQE ERR: IB Client 'srpt'
      Mar 30 21:04:53 zfstgt-1 genunix: [ID 204426 kern.warning] WARNING: mcxnex0: CQE ERR: cqe ffffa100fbff5fa0 QPN 0x40064 indx 0x1d81 status 0x13 vendor syndrome 0x88
      Mar 30 21:04:53 zfstgt-1 genunix: [ID 158346 kern.info] mcxnex0: CQE remote access error

       

      Mar 23 22:41:53 zfstgt-2 genunix: [ID 937861 kern.info] mcxnex0: CQE ERR: IB Client 'srpt'
      Mar 23 22:41:53 zfstgt-2 genunix: [ID 204426 kern.warning] WARNING: mcxnex0: CQE ERR: cqe ffffa10004dd6e80 QPN 0x1c0057 indx 0x48d9 status 0x13 vendor syndrome 0x88
      Mar 23 22:41:53 zfstgt-2 genunix: [ID 158346 kern.info] mcxnex0: CQE remote access error

       

      MCXNEX_CONT(state, "CQE ERR: IB Client '%s'", ulp);

      CQE = Completion Queue Entry

      ulp = Upper Layer Protocol

      srpt = SCSI RDMA Protocol Target Driver for InfiniBand (IB)

      QPN = Queue Pair Number

       

      MCXNEX_WARNING(state,
           "CQE ERR: cqe %p QPN 0x%x indx 0x%x status 0x%x "
           "vendor syndrome 0x%x", cqe,
           MCXNEX_CQE_QPNUM_GET(cq, cqe),
           MCXNEX_CQE_WQECNTR_GET(cq, cqe), status,
           MCXNEX_CQE_ERROR_VENDOR_SYNDROME_GET(cq, cqe));
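      To compare these events across both targets, the fields can be pulled back out of the syslog line by mirroring the format string above. This is only a rough sketch: `struct cqe_err` and `parse_cqe_err` are names made up for illustration (they are not part of the mcxnex driver), and it assumes the `%p` value appears as bare hex without a `0x` prefix, as it does in the log lines quoted earlier.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of the "CQE ERR" warning line,
 * in the same order as the driver's format string. */
struct cqe_err {
    unsigned long long cqe;   /* CQE address (printed with %p) */
    unsigned int qpn;         /* queue pair number */
    unsigned int indx;        /* WQE counter */
    unsigned int status;      /* completion status code */
    unsigned int syndrome;    /* vendor syndrome */
};

/* Parse one syslog line. Returns 0 if all five fields were found,
 * -1 if the line is not a complete "CQE ERR: cqe ..." record. */
int parse_cqe_err(const char *line, struct cqe_err *out)
{
    /* Skip the timestamp, host and message ID prefix. */
    const char *p = strstr(line, "CQE ERR: cqe ");
    if (p == NULL)
        return -1;

    int n = sscanf(p,
        "CQE ERR: cqe %llx QPN 0x%x indx 0x%x status 0x%x vendor syndrome 0x%x",
        &out->cqe, &out->qpn, &out->indx, &out->status, &out->syndrome);

    return (n == 5) ? 0 : -1;
}
```

      Applied to the zfstgt-1 warning line, this would yield QPN 0x40064, indx 0x1d81, status 0x13 and vendor syndrome 0x88; the kern.info lines without a `cqe` field are rejected with -1.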

       

      status 0x13 is defined as:

      #define    MCXNEX_CQE_REM_ACC_ERR    0x13

      switch (status) {
           case MCXNEX_CQE_REM_ACC_ERR:
               MCXNEX_CONT(state, MCXNEX_FMA_REMACC);
               ibt_status = IBT_WC_REMOTE_ACCESS_ERR;
               break;

       

      #define    MCXNEX_FMA_REMACC    "CQE remote access error"

      #define    IBT_WC_REMOTE_ACCESS_ERR    23    /* Protection Error on remote data buffer */

       

      Does vendor syndrome 0x88 add further context to this event, or does it just confirm the protection error on the remote data buffer?