Bring Up Ceph RDMA - Developer's Guide

Version 60

    This post provides bring-up examples for a Ceph RDMA cluster.






    Support matrix

    OS:   CentOS 7.2, Ubuntu 16.04
    NIC:  ConnectX-4, ConnectX-4 Lx


    Prerequisites

    1. (Optional) Install the latest MLNX_OFED and restart the openibd driver.


    2. Ensure that rping runs between all nodes:

        Server: rping -s -v -a server_ip

        Client: rping -c -v -a server_ip
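    The per-node check above can be scripted. The sketch below only builds the command lines for review; the node names, the server IP, and the assumption of passwordless ssh are illustrative, not values from this post:

    ```shell
# Sketch: emit the rping verification commands for a node list.
# SERVER_IP and the node names are placeholders -- substitute your own.
build_rping_cmds() {
    server_ip="$1"; shift
    # On the server node:
    echo "rping -s -v -a ${server_ip}"
    # From each client node (assumes passwordless ssh):
    for node in "$@"; do
        echo "ssh ${node} rping -c -v -a ${server_ip}"
    done
}

build_rping_cmds 192.168.1.10 node1 node2
    ```

    Review the printed commands before running them; rping blocks in the foreground, so run the server side first.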



    Installation

    1. Get the latest stable Ceph version with RDMA support from the following branch:


    This version is based on Luminous 12.1.0 RC.


    2. Compile:


    # git clone --recursive -b luminous-12.1.0-rdma <repository URL of the branch above>

    # cd ceph

    # ./do_cmake.sh

    # cd build

    # make -j8

    # sudo make install


    3. Ensure that your build has RDMA support:

    # strings /usr/bin/ceph-osd | grep -i rdma
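    The same strings | grep test can be wrapped in a small function that works against any binary path (the function name is ours, not part of Ceph):

    ```shell
# Sketch: report whether a binary appears to contain RDMA support,
# using the same strings | grep test as above.
has_rdma_support() {
    binary="$1"
    if strings "$binary" | grep -qi rdma; then
        echo "RDMA support: yes"
    else
        echo "RDMA support: no"
        return 1
    fi
}

# Example: has_rdma_support /usr/bin/ceph-osd
    ```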


    4. Stop all Ceph processes on all nodes:

    # sudo systemctl stop ceph-mon.target

    # sudo systemctl stop ceph-osd.target


        Or use the "kill" command.


    5. Ensure that all Ceph processes are down on every Ceph node:

    # ps aux | grep ceph
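    A stricter variant of the check above matches daemon names against the full command line, so the grep itself is never counted (a sketch, not from the original post):

    ```shell
# Sketch: return success only when no Ceph daemons are running.
# pgrep -f matches against the full command line.
ceph_daemons_down() {
    if pgrep -f 'ceph-(mon|osd|mgr|mds)' >/dev/null 2>&1; then
        echo "ceph daemons still running"
        return 1
    fi
    echo "all ceph daemons are down"
}
    ```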

    6. Bring up Ceph in TCP mode (the default async messenger).

    7. Test that Ceph is up and running:

        # ceph -s


    8. Stop all Ceph processes again.


    9. Add the following to your Ceph configuration file under the [global] section:



    // For setting both the frontend and the backend to RDMA:

    ms_type = async+rdma


    // For setting the backend only to RDMA:

    ms_cluster_type = async+rdma


    // Set the device name according to the IB or RoCE device used, e.g.:

    ms_async_rdma_device_name = mlx5_0


    // For better performance when using a Luminous 12.2.x release:

    ms_async_rdma_polling_us = 0


    // Set the local GID for the RoCEv2 interface used for Ceph.
    // The GID corresponding to IPv4 or IPv6 networks should be
    // taken from the show_gids command output.
    // This parameter must be set uniquely per OSD server/client.
    // Not defining this parameter limits the network to RoCEv1,
    // which means no routing and no congestion control (ECN).

    ms_async_rdma_local_gid = <GID taken from show_gids>



    You can get the GID index using the show_gids script; see Understanding show_gids Script.
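    For many hosts it is convenient to generate the settings once. The sketch below writes the step-9 settings to a fragment file for merging into the [global] section; the function name is ours, and the device name and GID are per-host placeholders:

    ```shell
# Sketch: write the RDMA settings from step 9 to a fragment file.
# The device name and local GID are per-host placeholders; take the
# GID from the show_gids output on each node.
write_rdma_fragment() {
    out="$1" device="$2" gid="$3"
    cat > "$out" <<EOF
ms_type = async+rdma
ms_async_rdma_device_name = ${device}
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = ${gid}
EOF
}

# Example (values are illustrative only):
# write_rdma_fragment rdma.conf mlx5_0 <GID taken from show_gids>
    ```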


    10. Update the configuration file on all Ceph nodes.


    11. If you are using systemd services:

    11.1    Validate that the following parameters are set in the relevant systemd unit files (ceph-mon@.service, ceph-osd@.service) in /usr/lib/systemd/system/:

    LimitMEMLOCK=infinity
    PrivateDevices=no


    Note: if you modify the systemd configuration for ceph-mon/ceph-osd, you may need to run:

    # systemctl daemon-reload


    11.2    Restart all cluster processes on the monitor node:

    # sudo systemctl start ceph-mon.target

    # sudo systemctl start ceph-mgr.target


    On the OSD nodes:

    # sudo systemctl start ceph-osd.target


    # for i in $(sudo ls /var/lib/ceph/osd/ | cut -d '-' -f 2); do sudo systemctl start ceph-osd@$i; done
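    The cut -d '-' -f 2 pipeline in the loop above derives each OSD id from directory names such as /var/lib/ceph/osd/ceph-0. The sketch below demonstrates just that extraction on a scratch directory so it can be tried without a cluster:

    ```shell
# Sketch: extract OSD ids the same way the start-up loop does.
list_osd_ids() {
    osd_root="$1"
    ls "$osd_root" | cut -d '-' -f 2
}

# Demonstration on a scratch layout instead of /var/lib/ceph/osd/:
demo=$(mktemp -d)
mkdir "$demo/ceph-0" "$demo/ceph-3"
list_osd_ids "$demo"    # prints 0 and 3, one per line
    ```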


    12. For manual start-up of Ceph processes:

    12.1    Open /etc/security/limits.conf and add the following lines. RDMA pins its buffers in physical memory, so the locked-memory limit must be unlimited.

    * soft memlock unlimited
    * hard memlock unlimited

    root soft memlock unlimited

    root hard memlock unlimited
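    Before starting daemons manually it is worth confirming the limit actually took effect in the current shell (a sketch, not from the original post; log out and back in after editing limits.conf for it to apply):

    ```shell
# Sketch: verify the locked-memory limit in the current session.
# ulimit -l reports the limit in KB, or "unlimited".
check_memlock() {
    limit=$(ulimit -l)
    if [ "$limit" = "unlimited" ]; then
        echo "memlock: unlimited"
    else
        echo "memlock limited to ${limit} KB; RDMA registration may fail"
        return 1
    fi
}
    ```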

    12.2    Run the processes

        On the monitor node

    # sudo /usr/bin/ceph-mon --cluster ceph --id clx-ssp-056 --setuser ceph --setgroup ceph

    # sudo /usr/bin/ceph-mgr --cluster ceph --id clx-ssp-056 --setuser ceph --setgroup ceph


        On the OSD nodes

    # for i in $(sudo ls /var/lib/ceph/osd/ | cut -d '-' -f 2); do sudo /usr/bin/ceph-osd --cluster ceph --id $i --setuser ceph --setgroup ceph & done



    Verification

    1. Check health:

    # ceph -s


    2. Check that RDMA is working as expected.


    The following commands show whether RDMA traffic occurs on server dory01, which hosts osd.0 and a monitor:

    # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ms_type

    "ms_type": "async+rdma"


    # ceph --admin-daemon /var/run/ceph/ceph-mon.dory01.asok config show | grep ms_type

    "ms_type": "async+rdma"


    # ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1


        "AsyncMessenger::RDMAWorker-1": {
            "tx_no_mem": 0,
            "tx_parital_mem": 0,
            "tx_failed_post": 0,
            "rx_no_registered_mem": 0,
            "tx_chunks": 30063062,
            "tx_bytes": 1512924920228,
            "rx_chunks": 23115500,
            "rx_bytes": 480212597532,
            "pending_sent_conns": 0
        }
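    The error counters in the dump above (tx_no_mem, tx_parital_mem -- the spelling is Ceph's -- tx_failed_post, rx_no_registered_mem) should stay at zero. The sketch below scans a saved dump file for non-zero values; the function name and file handling are ours:

    ```shell
# Sketch: scan a saved perf dump (e.g. from
#   ceph daemon osd.0 perf dump > dump.json)
# for non-zero RDMA error counters.
rdma_errors_clean() {
    dump="$1"
    for c in tx_no_mem tx_parital_mem tx_failed_post rx_no_registered_mem; do
        # match the counter only when its value starts with a non-zero digit
        if grep -q "\"$c\": *[1-9]" "$dump"; then
            echo "non-zero error counter: $c"
            return 1
        fi
    done
    echo "no RDMA error counters set"
}
    ```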



    Known issues

    Known issue: ceph pg dump and ceph osd df tree fail with the following error:

        Error EACCES: access denied' does your client key have mgr caps?

    Solution/WA: run:

        ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr 'allow *'


    Version: any Ceph 12.x version and beyond

    Known issue: the rbd map command is not supported with Ceph RDMA.

    Solution/WA: use the rbd-nbd map command instead. Make sure that rbd-nbd is installed and that Network Block Device (NBD) support is enabled in the kernel.