HowTo Compile and Configure Ceph and Accelio over RDMA for Ubuntu 14.04

Version 23

    This post is meant for developers or advanced users who wish to understand how to compile and configure Ceph over Accelio over RDMA on Ubuntu 14.04, in a basic setup of two hosts connected through a switch.

     

    Note: This post is outdated

     


     

    Setup

    1. Two hosts (equipped with ConnectX-3 adapters) configured with Ubuntu 14.04 OS:

    • CephServer
      • IP address: 11.11.11.1
    • CephClient
      • IP address: 11.11.11.2

     

    2. Install the latest MLNX_OFED driver on both servers.

    MLNX_OFED_LINUX-2.4-1.0.0 is recommended and can be downloaded here - http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
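    A typical install sequence looks like the following (a sketch; the exact tarball name depends on the package you download for your distribution):

    # tar xzf MLNX_OFED_LINUX-2.4-1.0.0-ubuntu14.04-x86_64.tgz   # adjust to the file you actually downloaded
    # cd MLNX_OFED_LINUX-2.4-1.0.0-ubuntu14.04-x86_64
    # ./mlnxofedinstall
    # /etc/init.d/openibd restart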


    Note: The link layer tested here is Ethernet (which means RoCE is used), but it could be InfiniBand as well.

     

    Configuration

     

    1. Install additional Ubuntu packages

    # apt-get update

    # apt-get install libtool autoconf automake build-essential ibverbs-utils rdmacm-utils infiniband-diags perftest librdmacm-dev libibverbs-dev numactl libnuma-dev libaio-dev libevent-dev autotools-dev autoconf automake cdbs gcc g++ git libboost-dev libedit-dev libssl-dev libtool libfcgi libfcgi-dev libfuse-dev linux-kernel-headers libcrypto++-dev libcrypto++ libexpat1-dev pkg-config uuid-dev libkeyutils-dev libgoogle-perftools-dev libatomic-ops-dev libaio-dev libgdata-common libgdata13 libsnappy-dev libleveldb-dev libboost-regex1.54-dev libboost-thread-dev libboost-program-options1.54-dev libblkid-dev cmake libudev1 libudev-dev xfsprogs xfslibs-dev libcurl4-nss-dev libcurl4-openssl-dev libcurl4-gnutls-dev

    Note: If apt-get does not succeed, try installing each package by itself, as there may be dependency conflicts; see the sketch below.
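    One simple way to do that is a loop over the package list (a sketch; packages.txt is a hypothetical file holding the package names from the apt-get line above, one per line):

    # for p in $(cat packages.txt); do apt-get install -y "$p" || echo "FAILED: $p"; done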

     

    2. Follow this procedure to make sure that RDMA is enabled and functioning.

     

    HowTo Enable, Verify and Troubleshoot RDMA
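    A minimal sanity check, assuming the ConnectX-3 device shows up as mlx4_0 (ibverbs-utils and perftest were installed in step 1):

    # ibv_devinfo | grep -e state -e link_layer      # port state should be PORT_ACTIVE
    # ib_send_bw -d mlx4_0                           # on the server (11.11.11.1)
    # ib_send_bw -d mlx4_0 11.11.11.1                # on the client (11.11.11.2)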

     

    3. Download the Accelio source, then build & install the latest master branch.

     

    Use Accelio master (recommended commit 9cea8291787b72a746e42964d5de42d6d48f0e0d) with latest Ceph master (recommended commit f74c60f429f26c1cd55e219642fa19cb5a803468)

     

    Note: The following instructions show how to build and install both Accelio and Ceph into non-standard locations (/opt/accelio, /opt/ceph).

     

    # mkdir /tmp/xio

    # cd /tmp/xio

    # git clone git://github.com/accelio/accelio.git accelio.git

    ...

     

    # cd accelio.git

    # git checkout -b rec_commit 9cea8291787b72a746e42964d5de42d6d48f0e0d

     

    # ./autogen.sh

    configure.ac:13: installing './compile'

    configure.ac:13: installing './config.guess'

    configure.ac:13: installing './config.sub'

    configure.ac:7: installing './install-sh'

    configure.ac:7: installing './missing'

    benchmarks/usr/xio_perftest/Makefile.am: installing './depcomp'

     

     

    # ./configure --prefix=/opt/accelio

    ...

     

    # make && make install

    ...

     

    4. Verify that Accelio is working

     

    The following test measures Accelio performance with one-way messages.

    On the Ceph server:

    ceph-server # cd accelio.git

    ceph-server # tests/usr/hello_test_ow/run_ow_server.sh 11.11.11.1 1234 0 4096 rdma

    =============================================

    Server Address         : 11.11.11.1

    Server Port            : 1234

    Transport              : rdma

    Header Length          : 0

    Data Length            : 4096

    CPU Affinity           : 1

    Finite run             : 0

    =============================================

    listen to rdma://11.11.11.1:1234

     

    On the Ceph client:

    ceph-client # cd accelio.git

    ceph-client # tests/usr/hello_test_ow/run_ow_client.sh 11.11.11.1 1234 0 4096 rdma

    =============================================

    Server Address         : 11.11.11.1

    Server Port            : 1234

    Transport              : rdma

    Header Length          : 0

    Data Length            : 4096

    Connection Index       : 0

    CPU Affinity           : 1

    Finite run             : 0

    =============================================

    shmget rdma pool sz:2097152 failed (errno=12 Cannot allocate memory)

    **** starting ...

    **** [0x79d160] session established

    session event: connection established. reason: Success

    transactions per second: 387137, bandwidth: TX 1512.25 MB/s, length: TX: 4096 B

    transactions per second: 387349, bandwidth: TX 1513.08 MB/s, length: TX: 4096 B

    transactions per second: 387364, bandwidth: TX 1513.14 MB/s, length: TX: 4096 B

    ...

     

    At this point, Accelio is running on both servers.
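    Note: the "shmget rdma pool sz:2097152 failed (errno=12 Cannot allocate memory)" line in the client output typically means that huge pages were not available for Accelio's RDMA memory pool, so it fell back to regular memory. An optional tweak to avoid it (assuming 2 MB huge pages) is to reserve huge pages before the run:

    # echo 2048 > /proc/sys/vm/nr_hugepages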

     

    5. Download the latest upstream Ceph master source (recommended commit f74c60f429f26c1cd55e219642fa19cb5a803468), then build & install it (on both hosts).

    # mkdir /tmp/ceph ; cd /tmp/ceph

    # git clone --recursive https://github.com/ceph/ceph  ceph.git

    ...

    # cd ceph.git

    # git checkout -b rec_commit f74c60f429f26c1cd55e219642fa19cb5a803468

     

    Compile with automake (refer to the README.xio for more information)

    # ./autogen.sh

    # CXXFLAGS="-I/opt/accelio/include"  CFLAGS="-I/opt/accelio/include" LDFLAGS="-L/opt/accelio/lib"  ./configure --prefix=/opt/ceph --enable-xio

    # make -j32 && make install
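    A quick way to confirm that the build landed under /opt/ceph (same environment variables as the later steps):

    # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/" /opt/ceph/bin/ceph --version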

     

    Although there are other options to create the Ceph cluster, the easiest is to use the mkcephfs script.

    Note: the wget example here is for Ubuntu (mkcephfs is a binary compiled for Ubuntu).

    # cd /opt/ceph/bin
    # wget http://www.mellanox.com/downloads/solutions/temp/mkcephfs
    # chmod a+x mkcephfs

     

    6. Create ceph.conf with the relevant parameters.

    Here is an example ceph.conf; refer to the Ceph documentation for more configuration options.

    To use this example, first create the required directories:

     

    # mkdir -p /ceph-test/var/run/ceph/ /ceph-test/var/log/ceph/ /ceph-test/ceph-data/ /ceph-test/cl_mkcephfs /etc/ceph


    Note:

    • This file should be copied to both the ceph server and the ceph client. The only difference between the two copies is the rdma_local parameter, which should be the local IP address of the host (server or client).
    • In this example, all the Ceph server components are located on one server, i.e. it is a cluster of one server.
    • This example ceph.conf uses /dev/sdb as the filestore OSD device.

     

    Copy the corresponding ceph.conf on each host to the /etc/ceph/ directory; an example of the per-host difference follows.
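    For example, the only line that differs between the two copies (using the addresses from the setup above) is:

      ; on the ceph server (11.11.11.1)
      rdma_local = 11.11.11.1

      ; on the ceph client (11.11.11.2)
      rdma_local = 11.11.11.2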

     

     

      [global]

     

      admin_socket = /ceph-test/var/run/ceph/ceph-$name.$id.asok

     

      ; allow to open a lot of files

      max_open_files = 131072

     

      ; setup logging

      log_file = /ceph-test/var/log/ceph/$name.log

      pid_file = /ceph-test/var/run/ceph/$name.pid

     

      filestore_xattr_use_omap = 1

     

      osd_pool_default_size = 1

      osd_pool_default_min_size = 1

     

      ; turn off debugs

      debug_auth = 0/0

      debug_asok = 0/0

      debug_buffer = 0/0

      debug_client = 0/0

      debug_context = 0/0

      debug_crush = 0/0

      debug_crypto = 0/0

      debug_filer = 0/0

      debug_filestore = 0/0

      debug_finisher = 0/0

      debug_heartbeatmap = 0/0

      debug_journal = 0/0

      debug_journaler = 0/0

      debug_lockdep = 0/0

      debug_monclient = 0/0

      debug_mon = 0/0

      debug_monc = 0/0

      debug_ms = 0/0

      debug_objclass = 0/0

      debug_objecter = 0/0

      debug_objectcacher = 0/0

      debug_optracker = 0/0

      debug_osd = 0/0

      debug_paxos = 0/0

      debug_perfcounter = 0/0

      debug_rados = 0/0

      debug_rbd = 0/0

      debug_rgw = 0/0

      debug_timer = 0/0

      debug_tp = 0/0

      debug_throttle = 0/0

     

      ; debug XIO

      debug_xio = 0

     

       ; default datacrc & headercrc is true

      ms_crc_header = false

      ms_crc_data = false

     

      ; rdma setting

      ; no secure authentication – A MUST for RDMA

      ; XioMessenger currently does not support CEPHX

     

      auth_supported = none

      auth_service_required = none

      auth_client_required = none

      auth_cluster_required = none

        

      ; IMPORTANT: This should be the local IP address of the interface that performs the RDMA (e.g. 11.11.11.1 for the server and 11.11.11.2 for the client)

      rdma_local = 11.11.11.1

      enable experimental unrecoverable data corrupting features = ms-type-xio

      ms_type = xio

      xio_mp_max_64 = 262144

      xio_mp_max_256 = 262144

      xio_mp_max_1k = 262144

      xio_mp_max_page = 131072

      xio_portal_threads = 4

      xio_max_send_inline = 8192

     

      osd_op_threads = 2

      filestore_op_threads = 4

      filestore_fd_cache_size = 64

      filestore_fd_cache_shards = 32

      osd_op_num_threads_per_shard = 1

      osd_op_num_shards = 10

     

      throttle_perf_counter = false

      ms_dispatch_throttle_bytes = 0

     

      rbd_cache = false

     

      [osd]

      osd_client_message_size_cap = 0

      osd_client_message_cap = 0

      osd_enable_op_tracker = false

     

      osd_data = /ceph-test/ceph-data/osd.$id

      osd_journal = /ceph-test/ceph-data/osd.$id/journal

      osd_journal_size = 256

      osd_scrub_load_threshold = 2.5

     

      osd_mkfs_type = xfs

      osd_mount_options_xfs = rw,noatime

      osd_mkfs_options_xfs = -Kf

     

      osd_class_dir = /opt/ceph/lib/rados-classes

     

      [osd.0]

     

      ; Add the real hostname of the ceph server

      host = ceph-server

     

      devs = /dev/sdb

      ms_bind_port_min = 7100

      ms_bind_port_max = 7200

     

      ;[osd.1]

      ;host = ceph-server

     

      ;devs = /dev/ram1

      ;ms_bind_port_min = 7201

      ;ms_bind_port_max = 7300

     

      ; To have mkcephfs creating cluster with more OSDs on other node ex: ceph-server2

      ;[osd.2]

      ; host = ceph-server2

     

      ; devs = /dev/ram0

      ; ms_bind_port_min = 7100

      ; ms_bind_port_max = 7200

     

      ;[osd.3]

      ; host = vlab-028

      ; devs = /dev/ram1

      ; ms_bind_port_min = 7201

      ; ms_bind_port_max = 7300

     

     

      [mon]

      mon_data = /ceph-test/ceph-data/mon.$id

     

      [mon.0]

      ; make sure to have the ceph server hostname

      host = ceph-server

      mon_addr = 11.11.11.1:16789

      ;user = root

     

      [mds]

      ; where the mds keeps its secret encryption keys

      keyring = /ceph-test/ceph-data/keyring.$name

     

      cluster_addr = 11.11.11.1:26789

      public_addr = 11.11.11.1:36789

     

      objecter_timeout = 10

      mds_reconnect_timeout = 5

      mds_beacon_interval = 2

     

      [mds.0]

      ; make sure to have the ceph server hostname

       host = ceph-server

      ;user = root

     

    7. Make sure that the Accelio XioMessenger is working with the xio_client/xio_server.

     

    On the Ceph server:

    ceph-server # cd ceph.git

    ceph-server # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" ./src/xio_server -c /etc/ceph/ceph.conf --addr 11.11.11.1 --port 1234  --dfast

    ping 0 nanos: 1430346838077074547

    ping 65536 nanos: 1430346838361074537

    ping 131072 nanos: 1430346838573074530

    ping 196608 nanos: 1430346838789074522

    ping 262144 nanos: 1430346839001074515

    ping 327680 nanos: 1430346839217074507

    ....

     

    On the Ceph client:

    ceph-client # cd ceph.git

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" ./src/xio_client -c /etc/ceph/ceph.conf --addr 11.11.11.1 --port 1234  --dfast

    xio finished 1000000 1430346861

    xio finished 2000000 1430346864

    xio finished 3000000 1430346867

    xio finished 4000000 1430346870

    xio finished 5000000 1430346874

    xio finished 6000000 1430346877

    Processed 6662463 one-way messages in 25s

     

     

    8. Create mon & osd data on the ceph-server

     

    ceph-server # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" /opt/ceph/bin/mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring -d /ceph-test/ceph-data  --mkfs
    ...

     

    9. Manually launch ceph-mon on the ceph-server.

    ceph-server # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" /opt/ceph/bin/ceph-mon -i 0 --pid-file /ceph-test/var/run/ceph/mon.0.pid

    ...
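    A quick local check that the monitor actually started (paths taken from the ceph.conf above):

    ceph-server # cat /ceph-test/var/run/ceph/mon.0.pid
    ceph-server # tail /ceph-test/var/log/ceph/mon.0.log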

     

    10. Verify that ceph-mon is up and running, from the ceph-client.

    You can see that the health is in an error state, as there are no OSDs running yet.

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/" /opt/ceph/bin/ceph -s

    2015-04-30 02:18:18.249382 7fb29a8e3700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:18:18.250454 7fb29a8e3700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:18:18.250672 7fb29a8e3700 -1 WARNING: experimental feature 'ms-type-xio' is enabled

    Please be aware that this feature is experimental, untested,

    unsupported, and may result in data corruption, data loss,

    and/or irreparable damage to your cluster.  Do not use

    feature with important data.

     

     

    2015-04-30 02:18:18.252814 7fb29a8e3700  0 Peer type: mon throttle_msgs: 512 throttle_bytes: 536870912

        cluster 8eb9f961-0c37-488b-b807-61c8893ed2b1

         health HEALTH_ERR

                64 pgs stuck inactive

                64 pgs stuck unclean

                no osds

         monmap e1: 1 mons at {0=11.11.11.1:16789/0}

                election epoch 2, quorum 0 0

         osdmap e1: 0 osds: 0 up, 0 in

          pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects

                0 kB used, 0 kB / 0 kB avail

                      64 creating

     

    11. Manually launch ceph-osd on the ceph-server ($i is the OSD ID).

    ceph-server# LD_LIBRARY_PATH="/opt/accelio/lib:/opt/ceph/lib" /opt/ceph/bin/ceph-osd -i $i --pid-file /ceph-test/var/run/ceph/osd.$i.pid
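    For the single OSD (osd.0) defined in the ceph.conf above, this reduces to:

    ceph-server# LD_LIBRARY_PATH="/opt/accelio/lib:/opt/ceph/lib" /opt/ceph/bin/ceph-osd -i 0 --pid-file /ceph-test/var/run/ceph/osd.0.pid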

     

    12. Verify that the ceph cluster is up and running (from the ceph client).

     

    You can see that the health is in OK state.

     

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/" /opt/ceph/bin/ceph -s

    2015-04-30 02:24:35.167390 7f9ca026c700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:24:35.168448 7f9ca026c700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:24:35.168682 7f9ca026c700 -1 WARNING: experimental feature 'ms-type-xio' is enabled

    Please be aware that this feature is experimental, untested,

    unsupported, and may result in data corruption, data loss,

    and/or irreparable damage to your cluster.  Do not use

    feature with important data.

     

    2015-04-30 02:24:35.170921 7f9ca026c700  0 Peer type: mon throttle_msgs: 512 throttle_bytes: 536870912

        cluster 8eb9f961-0c37-488b-b807-61c8893ed2b1

         health HEALTH_OK

         monmap e1: 1 mons at {0=11.11.11.1:16789/0}

                election epoch 2, quorum 0 0

         osdmap e3: 1 osds: 1 up, 1 in

          pgmap v7: 64 pgs, 1 pools, 0 bytes data, 0 objects

                289 MB used, 78206 MB / 78495 MB avail

                      64 active+clean

     

    Note: Remember to pass LD_LIBRARY_PATH="/opt/accelio/lib:/opt/ceph/lib" to all commands (ceph, fio, qemu…).

     

    13. Create a pool.

     

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/" /opt/ceph/bin/ceph osd pool create testpool1  1024 1024

    2015-04-30 02:28:38.506956 7fbcec11d700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:28:38.511449 7fbcec11d700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:28:38.511678 7fbcec11d700 -1 WARNING: experimental feature 'ms-type-xio' is enabled

    Please be aware that this feature is experimental, untested,

    unsupported, and may result in data corruption, data loss,

    and/or irreparable damage to your cluster.  Do not use

    feature with important data.

     

     

    2015-04-30 02:28:38.513773 7fbcec11d700  0 Peer type: mon throttle_msgs: 512 throttle_bytes: 536870912

    pool 'testpool1' created

     

    14. Verify that the pool was created.

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/" /opt/ceph/bin/ceph df

    2015-04-30 02:30:11.331050 7f0e45562700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:30:11.334108 7f0e45562700 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:30:11.334369 7f0e45562700 -1 WARNING: experimental feature 'ms-type-xio' is enabled

    Please be aware that this feature is experimental, untested,

    unsupported, and may result in data corruption, data loss,

    and/or irreparable damage to your cluster.  Do not use

    feature with important data.

     

     

    2015-04-30 02:30:11.336560 7f0e45562700  0 Peer type: mon throttle_msgs: 512 throttle_bytes: 536870912

    GLOBAL:

        SIZE       AVAIL      RAW USED     %RAW USED

        78495M     78202M         293M          0.37

    POOLS:

        NAME          ID     USED     %USED     MAX AVAIL     OBJECTS

        rbd           0         0         0        78202M           0

        testpool1     1         0         0        78202M           0

     

    15. Create an image.

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" /opt/ceph/bin/rbd  -p testpool1 --image-format 2 create --size 128 test_img1

    ...
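    You can also list the images in the pool (an optional quick check, with the same environment variables):

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" /opt/ceph/bin/rbd -p testpool1 ls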

     

    16. Verify that the image was created.

    # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" /opt/ceph/bin/rbd  -p testpool1 --image test_img1 info

    2015-04-30 02:32:52.026053 7fc629ff0800 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:32:52.026171 7fc629ff0800 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:32:52.026498 7fc629ff0800 -1 WARNING: the following dangerous and experimental features are enabled: ms-type-xio

    2015-04-30 02:32:52.026792 7fc629ff0800 -1 WARNING: experimental feature 'ms-type-xio' is enabled

    Please be aware that this feature is experimental, untested,

    unsupported, and may result in data corruption, data loss,

    and/or irreparable damage to your cluster.  Do not use

    feature with important data.

     

     

    2015-04-30 02:32:52.028319 7fc629ff0800  0 Peer type: mon throttle_msgs: 512 throttle_bytes: 536870912

    2015-04-30 02:32:52.063657 7fc629ff0800  0 Peer type: osd throttle_msgs: 512 throttle_bytes: 536870912

    rbd image 'test_img1':

            size 128 MB in 32 objects

            order 22 (4096 kB objects)

            block_name_prefix: rbd_data.10076b8b4567

            format: 2

            features: layering

            flags:

     

    17. For I/O benchmarking with fio, you need an fio build compiled with librbd support.

     

    Open the example file "examples/rbd.fio" and set pool=testpool1 and rbdname=test_img1:

     

    [global]

    #logging

    #write_iops_log=write_iops_log

    #write_bw_log=write_bw_log

    #write_lat_log=write_lat_log

    ioengine=rbd

    clientname=admin

    pool=testpool1

    rbdname=test_img1

    invalidate=0    # mandatory

    rw=randwrite

    bs=4k

    runtime=1m

    time_based

     

    [rbd_iodepth32]

    iodepth=32

     

    Run:

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/"  ./fio  examples/rbd.fio

     

    You can download a pre-built fio from here:

    # wget http://www.mellanox.com/downloads/solutions/temp/fio_rbd.tgz

    # tar zxvf fio_rbd.tgz

    # cd fio_rbd

     

    To run:

    # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" PYTHONPATH="/opt/ceph/lib/python2.7/site-packages/"  ./fio  examples/rbd.fio

     

    18. To launch a VM with a QEMU RBD image, run:

    ceph-client # LD_LIBRARY_PATH="/opt/ceph/lib:/opt/accelio/lib" qemu-system-x86_64 -m 1024 -cpu qemu64,+vmx -cdrom /mnt/ubuntu-mini-remix-14.04-amd64.iso -boot d -vnc 0.0.0.0:1 -drive file=rbd:testpool1/test_img1:conf=/etc/ceph/ceph.conf,index=0,media=disk,if=virtio -enable-kvm &
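    The "-vnc 0.0.0.0:1" option exposes the guest console on VNC display 1 (TCP port 5901), so you can watch the VM boot from any workstation, for example:

    # vncviewer <ceph-client-ip>:1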