HowTo Configure Ceph RDMA

Version 70

    The purpose of this document is to describe how to bring up a Ceph RDMA cluster.

    These instructions are intended for experienced Ceph users.





    Apply the following to all servers:


    1. Make sure you have ping and rping working between all Ceph nodes.
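    The connectivity check can be sketched as a loop over the cluster hosts. The hostnames below are placeholders, and each command is echoed rather than executed so you can review it first:

```shell
# Placeholder hostnames; replace with your Ceph node names.
NODES="node1 node2 node3"
for n in $NODES; do
    # Remove the leading "echo" to actually run the checks.
    echo ping -c 3 "$n"          # basic IP reachability
    echo rping -c -a "$n" -C 3   # RDMA reachability (run "rping -s" on $n first)
done
```

    rping is part of the librdmacm utilities; the server side must be started on the remote node before the client check will succeed.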


    2. Open /etc/security/limits.conf and add the following lines to pin the memory. RDMA is tightly coupled with physical memory addresses.

    * soft memlock unlimited

    * hard memlock unlimited

    root soft memlock unlimited

    root hard memlock unlimited
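    These limits take effect only for new login sessions. As a quick sanity check (a sketch, not part of the official procedure), verify the effective limit after logging in again:

```shell
# Should print "unlimited" in a fresh login session once limits.conf is updated.
ulimit -l
```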

    3. For the deployment process, enable passwordless SSH login, as required by Ceph. See Preflight Checklist — Ceph Documentation.


    4. Install ceph-deploy

    sudo rpm -Uvh;


    Ceph Cluster Creation

    This configuration is based on repo — ceph-deploy 1.5.37 documentation, and Storage Cluster Quick Start — Ceph Documentation


    1. Create a working directory.

    mkdir my_cluster

    cd my_cluster


    2. Install Ceph:

    i. Create ceph.conf and install ceph on all nodes:

    $ceph-deploy new --cluster-network= --public-network= "list of monitors"

    $ceph-deploy --overwrite-conf install --repo-url= --gpg-url= "list all nodes"

    Note: If the user name or password contains a reserved character such as "@", the character must be replaced with its percent-encoded form, e.g. "@" becomes %40.
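    One way to produce the percent-encoded form is Python's urllib (the password "p@ss" below is a made-up example):

```shell
# Percent-encode every reserved character in the string.
ENCODED=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'p@ss')
echo "$ENCODED"   # "@" becomes "%40", giving p%40ss
```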


    Reminder: the number of Ceph monitors must be odd.


    ii. Add the following lines under my_cluster/ceph.conf.

               "ms_async_rdma_device_name" should be the active device (*).





        (*) To find your active device, you can run the following:

      • $ ifconfig


        ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500             <------------ This is the device you checked rping at the prerequisite phase.

                inet  netmask  broadcast

                inet6 ffff::ffff:ffff:ffff:ffff  prefixlen 64  scopeid 0x20<link>

                ether ff:ff:ff:ff:ff:ff  txqueuelen 1000  (Ethernet)

                RX packets 10104263288  bytes 646758740138 (602.3 GiB)

                RX errors 0  dropped 0  overruns 0  frame 0

                TX packets 13  bytes 2410 (2.3 KiB)

                TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0   


        $ cat /sys/class/net/ens4/device/infiniband/mlx5_0/ports/*/state

             4: ACTIVE

        ----------> mlx5_0 - this is your active device
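    With the active device identified, the RDMA-related lines in ceph.conf might look like the following sketch; ms_type and ms_async_rdma_device_name are the async messenger's RDMA options, and mlx5_0 is taken from the example above (substitute your own device):

```ini
[global]
# Use the async messenger over RDMA instead of plain TCP
ms_type = async+rdma
# Active RDMA device found in the previous step (example value)
ms_async_rdma_device_name = mlx5_0
```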

    iii. Run:

    ceph-deploy --overwrite-conf mon create-initial

    ceph-deploy --overwrite-conf admin  "list all nodes"

    For each node in list_all_nodes, run:

      sudo chmod +rx /etc/ceph/ceph.client.admin.keyring


    3. Configure the Ceph OSD Daemons (OSDs).

    Run the following for each disk on each node.

    ceph-deploy --overwrite-conf disk zap node1:sdb
    ceph-deploy --overwrite-conf osd prepare node1:sdb

    ceph-deploy osd activate node1:sdb1
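    The per-disk commands can be wrapped in a loop. The node and disk names below are placeholders, and each command is echoed rather than executed so the expansion can be reviewed first:

```shell
# Placeholder hostnames and data disks; adjust for your cluster.
NODES="node1 node2 node3"
DISKS="sdb"
for node in $NODES; do
    for disk in $DISKS; do
        # Remove the leading "echo" to actually run ceph-deploy.
        echo ceph-deploy --overwrite-conf disk zap "$node:$disk"
        echo ceph-deploy --overwrite-conf osd prepare "$node:$disk"
        echo ceph-deploy osd activate "$node:${disk}1"
    done
done
```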


    Note: The number of OSD nodes must be at least the number of replicas. The default number of replicas is 3.


    4. Run ceph -s to check the status. You should see HEALTH_OK or similar in the response:

    # ceph -s


       health *HEALTH_OK*

         monmap e1: 1 mons at {r-aa-zorro002=}

                election epoch 7, quorum 0 r-aa-zorro002

         *osdmap e26: 3 osds: 3 up, 3 in*

                flags sortbitwise,require_jewel_osds

          pgmap v57: 64 pgs, 1 pools, 0 bytes data, 0 objects

                101 MB used, 1381 GB / 1381 GB avail

                      64 active+clean


    Congratulations! Your Ceph RDMA cluster is up and running.


    Note: If you see a Number of placement-groups (pg) warning, refer to Placement Groups — Ceph Documentation for more information.
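    As a rough illustration of the common sizing guideline from the Ceph documentation (about 100 PGs per OSD, divided by the replica count and rounded up to a power of two), here is the calculation for the three OSDs and three replicas of this example:

```shell
OSDS=3
REPLICAS=3
# (OSDs * 100) / replicas, rounded up to the next power of two.
PGS=$(python3 -c "import math; t = $OSDS * 100 // $REPLICAS; print(2 ** math.ceil(math.log2(t)))")
echo "$PGS"   # 128
```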