2 Replies Latest reply on Oct 2, 2017 12:43 PM by kimabdlinux@gmail.com

    ceph + rdma error: ibv_open_device failed

    xjtabc

      I followed this doc:

      Bring Up Ceph RDMA - Developer's Guide

      But the mon could not start and failed with this error:

      7f5acb890700 -1 Infiniband Device open rdma device failed. (2) No such file or directory

       

      I checked ceph code:

      116  name = ibv_get_device_name(device);
      117  ctxt = ibv_open_device(device);
      118  if (ctxt == NULL) {
      119    lderr(cct) << __func__ << " open rdma device failed. " << cpp_strerror(errno) << dendl;
      120    ceph_abort();
      121  }

       

      Then I stepped through it in gdb:

      Breakpoint 1, Device::Device (this=0x55555f3597e0, cct=0x55555eec01c0, d=<optimized out>)

          at /usr/src/debug/ceph-12.2.0/src/msg/async/rdma/Infiniband.cc:116

      116  name = ibv_get_device_name(device);

      (gdb) p *(struct ibv_device *) device

      $7 = {ops = {alloc_context = 0x0, free_context = 0x0}, node_type = IBV_NODE_CA, transport_type = IBV_TRANSPORT_IB,

        name = "mlx4_0", '\000' <repeats 57 times>, dev_name = "uverbs0", '\000' <repeats 56 times>,

        dev_path = "/sys/class/infiniband_verbs/uverbs0", '\000' <repeats 220 times>,

        ibdev_path = "/sys/class/infiniband/mlx4_0", '\000' <repeats 227 times>}
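      The error code (2) is ENOENT, so one thing worth ruling out is a missing device node. The sketch below assumes that ibv_open_device opens the character device /dev/infiniband/<dev_name> (here dev_name is "uverbs0", taken from the struct above); that path layout is an assumption about libibverbs, not something shown in the Ceph code. The helper names probe_device_node, uverbs_node_path and describe_probe are hypothetical. It only uses POSIX open(), so it needs no RDMA libraries.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Assumption: libibverbs opens /dev/infiniband/<dev_name> read/write.
std::string uverbs_node_path(const std::string& dev_name) {
    return "/dev/infiniband/" + dev_name;
}

// Returns 0 if the node can be opened read/write, otherwise the errno
// value reported by open() (for example ENOENT or EACCES).
int probe_device_node(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        return errno;
    }
    ::close(fd);
    return 0;
}

// Formats the result in the same "(errno) message" style as the Ceph log line.
std::string describe_probe(const std::string& path) {
    int err = probe_device_node(path);
    if (err == 0) {
        return path + ": ok";
    }
    return path + ": (" + std::to_string(err) + ") " + std::strerror(err);
}
```

      Calling describe_probe(uverbs_node_path("uverbs0")) from a small main() and printing the result is enough. To be meaningful it should run as the same user and in the same environment (container or service sandbox) as the mon, since the node may be visible in an interactive shell but not to the daemon.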

       

      Breakpoint 2, Device::Device (this=0x55555f3597e0, cct=0x55555eec01c0, d=<optimized out>)

          at /usr/src/debug/ceph-12.2.0/src/msg/async/rdma/Infiniband.cc:117

      117  ctxt = ibv_open_device(device);

      (gdb) p *(struct CephContext *) ctxt

      Cannot access memory at address 0x646f6e2f305f3478
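      The address in that message may be worth a closer look, because it does not resemble a typical heap pointer. The sketch below decodes the 64-bit value as eight bytes in little-endian order (assuming an x86-64 host) and prints the printable ASCII characters. The helper name pointer_bytes_as_ascii is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Interprets a 64-bit value as 8 little-endian bytes. Printable ASCII
// bytes are kept as characters, everything else is rendered as '.'.
std::string pointer_bytes_as_ascii(std::uint64_t value) {
    std::string out;
    for (int i = 0; i < 8; ++i) {
        unsigned char byte =
            static_cast<unsigned char>((value >> (8 * i)) & 0xff);
        out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

      For 0x646f6e2f305f3478 this yields "x4_0/nod", which looks like a fragment of a string containing "mlx4_0/" rather than a real pointer. Since gdb stops at a breakpoint before the line executes, ctxt has probably not been assigned yet at line 117 and may simply hold leftover memory; stepping over the call with next and then printing ctxt would show the actual return value. Also note that ibv_open_device returns a struct ibv_context *, so the cast to struct CephContext * is likely not the intended one.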

       

      It seems that ibv_open_device failed.

       

      # ibstat
      CA 'mlx4_0'
          CA type: MT26428
          Number of ports: 1
          Firmware version: 2.9.1000
          Hardware version: b0
          Node GUID: 0x0002c90300589efc
          System image GUID: 0x0002c90300589eff
          Port 1:
              State: Active
              Physical state: LinkUp
              Rate: 40
              Base lid: 33
              LMC: 0
              SM lid: 23
              Capability mask: 0x0251086a
              Port GUID: 0x0002c90300589efd
              Link layer: InfiniBand

       

      Is there anything wrong with the contents of struct ibv_device?