
    ceph+RDMA: the RDMA information (gid/lid/qpn) received by the OSD task is 0

    馒头亮

      Hi, I have done as this guide said, but my Ceph cluster is in HEALTH_ERR. The error information is "HEALTH_ERR 320 pgs are stuck inactive for more than 300 seconds; 320 pgs stuck inactive; 320 pgs stuck unclean".
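
      For context, here is a minimal sketch of the ceph.conf messenger settings such a guide usually asks for; the device name and port number are assumptions, not values taken from my cluster:

      [global]
      ms_type = async+rdma                  # switch the async messenger to its RDMA backend
      ms_async_rdma_device_name = mlx5_0    # assumed HCA name; replace with the local device
      ms_async_rdma_port_num = 1            # assumed HCA port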


      In the Ceph ms log, the RDMA information (gid/lid/qpn) sent by the OSD task is correct, but the RDMA information (gid/lid/qpn) received by the OSD task is all zeros.

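      These handshake lines come from the messenger debug log; for reference, a sketch of raising the ms debug level on the OSD (the exact level shown is an assumption):

      sudo ceph daemon osd.0 config set debug_ms 20/20
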
      Ceph log:


      Infiniband send_msg sending: 3, 1321022, 0, 0, fe80000000000000248a070300f8cd01
      Infiniband recv_msg recevd: 0, 0, 0, 0, ▒̽V▒▒
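
      For reference, the lid/gid advertised in the send_msg line can be cross-checked against the local HCA with standard verbs tooling; a sketch, assuming the device is named mlx5_0:

      ibv_devinfo -v -d mlx5_0    # prints port state, port LID and the GID table of the assumed device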


      sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMADispatcher

      {
          "AsyncMessenger::RDMADispatcher": {
              "polling": 0,
              "rx_bufs_in_use": 0,
              "rx_bufs_total": 8192,
              "tx_total_wc": 9514,
              "tx_total_wc_errors": 9514,
              "tx_retry_errors": 4759,
              "tx_wr_flush_errors": 4755,
              "rx_total_wc": 0,
              "rx_total_wc_errors": 0,
              "rx_fin": 0,
              "handshake_errors": 0,
              "total_async_events": 0,
              "async_last_wqe_events": 0,
              "created_queue_pair": 5040,
              "active_queue_pair": 4
          }
      }