HowTo Configure Ceph RDMA (outdated)

    This page is outdated.

    Please follow the link Bring Up Ceph RDMA - Developer's Guide

     

    The purpose of this document is to describe how to bring up a Ceph RDMA cluster.

    These instructions are intended for experienced Ceph users.

     

    >> Learn RDMA on the Mellanox Academy for free

     


    Prerequisites

    Apply the following to all servers:

     

    1. Make sure you have ping and rping working between all Ceph nodes.
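
    For example, a quick rping sanity check between two nodes can look like this (the IP address below is a placeholder for the RDMA-capable interface of the first node):

    On the first node (server side):
    $ rping -s -a 0.0.0.0 -v

    On the second node (client side):
    $ rping -c -a 11.130.1.11 -v -C 5

    If the client prints the ping payload and exits cleanly, RDMA connectivity between the two nodes is working.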

     

    2. Open /etc/security/limits.conf and add the following lines to pin the memory. RDMA is tightly coupled with physical memory addresses, so the memory used for RDMA must remain locked (pinned).

    * soft memlock unlimited

    * hard memlock unlimited

    root soft memlock unlimited

    root hard memlock unlimited
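
    These limits take effect for new login sessions. After logging in again, you can verify them with:

    $ ulimit -l
    unlimited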

    3. For the deployment process, you must enable SSH login without a password between the nodes, as required by Ceph. See Preflight Checklist — Ceph Documentation.
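
    As a minimal sketch (user name and host names below are placeholders), passwordless SSH can be set up from the admin node with:

    $ ssh-keygen -t rsa
    $ ssh-copy-id cephuser@node1
    $ ssh-copy-id cephuser@node2
    $ ssh-copy-id cephuser@node3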

     

    4. Install ceph-deploy

    sudo rpm -Uvh https://download.ceph.com/rpm-kraken/el7/noarch/ceph-deploy-1.5.36-0.noarch.rpm;
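
    You can confirm the installation with:

    $ ceph-deploy --version
    1.5.36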

     

    Ceph Cluster Creation

    This configuration is based on the ceph-deploy 1.5.37 documentation and the Storage Cluster Quick Start — Ceph Documentation.

     

    1. Create a working directory.

    mkdir my_cluster

    cd my_cluster

     

    2. Install Ceph:

    i. Create ceph.conf and install Ceph on all nodes:

    $ ceph-deploy new --cluster-network=11.130.1.0/24 --public-network=11.130.1.0/24 "list of monitors"

    $ ceph-deploy --overwrite-conf install --repo-url=ftp://user:password@ftpsupport.mellanox.com/rpm-v11.1.0-6639-gb304df1 --gpg-url=ftp://user:password@ftpsupport.mellanox.com/rpm-v11.1.0-6639-gb304df1/release.asc "list all nodes"

    Note: If the user name or password contains a reserved character such as "@", that character must be percent-encoded (e.g., "@" becomes %40).

     

    Reminder: The number of Ceph monitors must be odd.
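
    For illustration, with three monitor nodes named node1, node2 and node3 (placeholder host names), the first command would look like:

    $ ceph-deploy new --cluster-network=11.130.1.0/24 --public-network=11.130.1.0/24 node1 node2 node3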

     

    ii. Add the following lines to my_cluster/ceph.conf.

              "ms_async_rdma_device_name" should be the active device (*).

    [global]

    ...

    ms_type=async+rdma

    ms_async_rdma_device_name=mlx5_0

        (*) To find your active device, you can run the following:

      • $ ifconfig

                  ...

        ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500             <------------ This is the device you ran rping on in the prerequisites step.

                inet 11.1.1.14  netmask 255.255.255.0  broadcast 11.1.1.255

                inet6 ffff::ffff:ffff:ffff:ffff  prefixlen 64  scopeid 0x20<link>

                ether ff:ff:ff:ff:ff:ff  txqueuelen 1000  (Ethernet)

                RX packets 10104263288  bytes 646758740138 (602.3 GiB)

                RX errors 0  dropped 0  overruns 0  frame 0

                TX packets 13  bytes 2410 (2.3 KiB)

                TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0   

         

        $ cat /sys/class/net/ens4/device/infiniband/mlx5_0/ports/*/state

             4: ACTIVE

        ----------> mlx5_0 - this is your active device
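
    Alternatively, if the libibverbs utilities are installed, ibv_devinfo reports the device and port state in one step (output abbreviated):

    $ ibv_devinfo -d mlx5_0
    hca_id: mlx5_0
            ...
            port:   1
                    state:                  PORT_ACTIVE (4)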

    iii. Run:

    ceph-deploy --overwrite-conf mon create-initial

    ceph-deploy --overwrite-conf admin  "list all nodes"

    For each node in the list of nodes, run:

      sudo chmod +rx /etc/ceph/ceph.client.admin.keyring
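
    If passwordless SSH is set up, this can be done in one shot from the admin node (node names are placeholders):

    $ for node in node1 node2 node3; do ssh $node sudo chmod +rx /etc/ceph/ceph.client.admin.keyring; done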

     

    3. Configure the Ceph OSD Daemons (OSDs).

    Run the following for each disk on each node.

    ceph-deploy --overwrite-conf disk zap node1:sdb
    ceph-deploy --overwrite-conf osd prepare node1:sdb

    ceph-deploy osd activate node1:sdb1
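
    For example, with one data disk (sdb) on each of three placeholder nodes, the whole sequence can be scripted as:

    for node in node1 node2 node3; do
        ceph-deploy --overwrite-conf disk zap $node:sdb
        ceph-deploy --overwrite-conf osd prepare $node:sdb
        ceph-deploy osd activate $node:sdb1
    done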

     

    Note: The minimum number of OSD nodes must match the number of replicas. The default number of replicas is 3.
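
    If you are only testing with fewer OSD nodes than the default replica count, you can lower it by adding the following to the [global] section of ceph.conf before deploying (example values):

    osd pool default size = 2
    osd pool default min size = 1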

     

    4. Run ceph -s to check the cluster status. You should see HEALTH_OK (or similar) in the response:

    # ceph -s

     

       health HEALTH_OK

         monmap e1: 1 mons at {r-aa-zorro002=2.2.68.102:6789/0}

                election epoch 7, quorum 0 r-aa-zorro002

         osdmap e26: 3 osds: 3 up, 3 in

                flags sortbitwise,require_jewel_osds

          pgmap v57: 64 pgs, 1 pools, 0 bytes data, 0 objects

                101 MB used, 1381 GB / 1381 GB avail

                      64 active+clean

     

    Congratulations! Your Ceph RDMA cluster is up and running.
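
    To double-check that the RDMA messenger setting took effect, you can query a running daemon through its admin socket. A minimal sketch, assuming an OSD with ID 0 runs on the local node:

    $ sudo ceph daemon osd.0 config get ms_type
    { "ms_type": "async+rdma" }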

     

    Note: If you see a placement group (PG) count warning, refer to Placement Groups — Ceph Documentation for more information.
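
    A common way to clear such a warning is to raise the placement group count of the affected pool, for example the default rbd pool (the pool name and the value 128 are only examples; see the linked documentation for how to size pg_num):

    $ ceph osd pool set rbd pg_num 128
    $ ceph osd pool set rbd pgp_num 128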