HowTo Configure and Test BeeGFS with RDMA

    BeeGFS is a scale-out parallel cluster file system developed by the Fraunhofer Competence Center for High Performance Computing (http://www.beegfs.com). BeeGFS utilizes native InfiniBand for data transport via RDMA-CM. This post describes how to set up BeeGFS with Mellanox adapters and how to activate a special network benchmark mode (netbench_mode), which allows evaluating the network alone, without a powerful storage system underneath.

    Setup

    • OS: RHEL 7.x on all nodes
    • OFED: MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0); MLNX_OFED 3.3 does not currently work with BeeGFS
    • Adapter: ConnectX-4 VPI
    • Benchmarking Software: iozone 3.434 (http://iozone.org/src/current/iozone3_434.tar)
    • Servers: Two similar server systems (at least 64GB RAM each), one acting as the BeeGFS server and the other as the BeeGFS client. In the example below, systems with 2x Intel Xeon CPU E5-2620 v3 @ 2.40GHz (Haswell) are used.

     

    Configuration

     

    1. On both nodes, add the appropriate BeeGFS repository from http://www.beegfs.com/release/:

    # wget -O /etc/yum.repos.d/beegfs-rhel7.repo http://www.beegfs.com/release/beegfs_2015.03/dists/beegfs-rhel7.repo
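
    If everything went well, the repository should now appear in the yum configuration. A quick sanity check (the exact repository name may differ slightly between releases):

    # yum repolist | grep -i beegfs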

     

    2. The BeeGFS server node runs the following services:

     

    • Management service (beegfs-mgmtd)
    • Metadata service (beegfs-meta)
    • Storage service (beegfs-storage)

     

    Install the three services as root:

    # yum install beegfs-mgmtd                                 

    # yum install beegfs-meta                                  

    # yum install beegfs-storage  
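
    Optionally verify that the packages were installed:

    # rpm -q beegfs-mgmtd beegfs-meta beegfs-storage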

     

    3. The BeeGFS client node runs the following services:

     

    • Helper service (beegfs-helperd)
    • Client service (beegfs-client)

     

    Install the following tools on the client node:

    # yum install beegfs-client

    # yum install beegfs-helperd

    # yum install beegfs-utils

     

    4. The beegfs-client kernel module has to be built against the OFED ibverbs library. For MLNX OFED, the buildArgs configuration variable in /etc/beegfs/beegfs-client-autobuild.conf has to be modified as follows:

    buildArgs=-j8 BEEGFS_OPENTK_IBVERBS=1 OFED_INCLUDE_PATH=/usr/src/ofa_kernel/default/include/
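
    Before rebuilding, it may be worth verifying that this include path actually exists on the system, as it can differ between MLNX_OFED versions:

    # ls /usr/src/ofa_kernel/default/include/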

     

    5. Rebuild the kernel module:

    # /etc/init.d/beegfs-client rebuild

     

    6. Verify that the compilation completed successfully (the rebuild should finish without errors).
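
    A simple sanity check, assuming the rebuilt module has been installed for the running kernel under its standard name (beegfs), is to query the module information:

    # modinfo beegfs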

     

    7. Configure services on the server. In this example, four storage targets are configured.

    # /opt/beegfs/sbin/beegfs-setup-mgmtd -p <MGMT_DATA_DIR>

    # /opt/beegfs/sbin/beegfs-setup-meta -p <META_DATA_DIR> -s <META_SERVICE_ID> -m <MGMT_SERVER_IP>

    # /opt/beegfs/sbin/beegfs-setup-storage -p <TARGET1_DIR> -s <STORAGE_SERVICE_ID> -i <STORAGE_TARGET_ID1> -m <MGMT_SERVER_IP>

    # /opt/beegfs/sbin/beegfs-setup-storage -p <TARGET2_DIR> -s <STORAGE_SERVICE_ID> -i <STORAGE_TARGET_ID2> -m <MGMT_SERVER_IP>

    # /opt/beegfs/sbin/beegfs-setup-storage -p <TARGET3_DIR> -s <STORAGE_SERVICE_ID> -i <STORAGE_TARGET_ID3> -m <MGMT_SERVER_IP>

    # /opt/beegfs/sbin/beegfs-setup-storage -p <TARGET4_DIR> -s <STORAGE_SERVICE_ID> -i <STORAGE_TARGET_ID4> -m <MGMT_SERVER_IP>

     

    Notes:

    • The IDs can range from 1 to 64k.
    • The MGMT_SERVER_IP in this case is the IP address of the BeeGFS server.
    • STORAGE_SERVICE_ID needs to be identical for all 4 targets.
    • STORAGE_TARGET_ID needs to be different for every target.
    • All directory paths should be absolute, starting from root ("/").
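
    As an illustration, here is a minimal sketch of the server-side setup with hypothetical directories and IDs (adapt the paths, the IDs and the management IP to your environment; 10.20.2.21 is the server address used in the example output below):

    # /opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd

    # /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta -s 1 -m 10.20.2.21

    # /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage1 -s 1 -i 101 -m 10.20.2.21

    # /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage2 -s 1 -i 102 -m 10.20.2.21

    # /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage3 -s 1 -i 103 -m 10.20.2.21

    # /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage4 -s 1 -i 104 -m 10.20.2.21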

     

    8. Start services on the server node:

    # /etc/init.d/beegfs-mgmtd start

    # /etc/init.d/beegfs-meta start

    # /etc/init.d/beegfs-storage start
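
    Optionally confirm that all three services came up (the init scripts also support a status action):

    # /etc/init.d/beegfs-mgmtd status

    # /etc/init.d/beegfs-meta status

    # /etc/init.d/beegfs-storage status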

     

    9. Set up the client:

    # /opt/beegfs/sbin/beegfs-setup-client -m <MGMT_SERVER_IP>

     

    10. Start services on the client:

    # /etc/init.d/beegfs-helperd start

    # /etc/init.d/beegfs-client start

     

    Note: The file system is mounted at /mnt/beegfs by default.
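
    A quick way to confirm that the mount is in place:

    # mount | grep beegfs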

     

    11. Check the setup with the beegfs-check-servers command (on the client). Verify that the Metadata and Storage services are reached via RDMA, while the Management service is reached via TCP.

    # beegfs-check-servers

    Management

    ==========

    mti-mar-s5 [ID: 1]: reachable at 10.20.2.21:8008 (protocol: TCP)

     

    Metadata

    ==========

    mti-mar-s5 [ID: 1]: reachable at 11.11.11.5:8005 (protocol: RDMA)

     

    Storage

    ==========

    mti-mar-s5 [ID: 1]: reachable at 11.11.11.5:8003 (protocol: RDMA)
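
    In addition, the beegfs-net tool (part of beegfs-utils) lists the connections the client has actually established, which is another way to confirm that RDMA is used for metadata and storage traffic:

    # beegfs-net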

     

     

    Benchmarking

     

    1. For benchmarking, CPU Hyper-Threading and power-saving features should be disabled in the BIOS (you can also add intel_pstate=disable to the kernel command line).

    See also Understanding BIOS Configuration for Performance Tuning.

     

    2. Download and build iozone.

    # wget http://iozone.org/src/current/iozone3_434.tar

    ...

    # tar -xvf iozone3_434.tar

    ...

    # cd iozone3_434/src/current/

     

    # make linux

     

    3. Identify the NUMA node the HCA is connected to (in this example: node 0), and the range of cores that belong to that NUMA node (use lscpu, in this example cores 0-5).

    For more information, see Understanding NUMA Node for Performance Benchmarks.
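
    A minimal sketch of how to gather this information, assuming the adapter shows up as InfiniBand device mlx5_0 (the device name is an assumption and may differ on your system):

    # cat /sys/class/infiniband/mlx5_0/device/numa_node

    # lscpu | grep "NUMA node"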

     

    4. The following BeeGFS tuning parameters were identified as delivering the highest performance on this test setup; on different setups they may need to be adapted:

     

    On the server in /etc/beegfs/beegfs-storage.conf:

    tuneNumWorkers=24

    tuneBindToNumaZone=0

     

    On the client in /etc/beegfs/beegfs-client.conf:

    connRDMABufSize=32768

    connRDMABufNum=70

    connMaxInternodeNum=64
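
    After changing these settings, restart the affected services so that the new values take effect:

    On the server:

    # /etc/init.d/beegfs-storage restart

    On the client:

    # /etc/init.d/beegfs-client restart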

     

    5. Increase the chunk size from the default 512k to 1M (on the client side):

    #  beegfs-ctl --setpattern --chunksize=1M --numtargets=4 /mnt/beegfs

    New chunksize: 1048576

    New number of storage targets: 4

     

    Path:

    Mount: /mnt/beegfs

     

    6. Enable netbench_mode on the client. In this mode, data is not written to an actual storage device, which allows benchmarking the network infrastructure in isolation:

    # echo 1 > /proc/fs/beegfs/*/netbench_mode
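
    To confirm that the mode is active, simply read the value back:

    # cat /proc/fs/beegfs/*/netbench_mode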

     

    7. Finally, change into the mountpoint of BeeGFS and run the iozone benchmark:

    # cd /mnt/beegfs

     

    8. Pin the iozone process to the correct cores and start with 6 parallel threads:

    # taskset -c 0-5 ~/iozone3_434/src/current/iozone -i0 -r2m -s128g -x -t6

     

            Iozone: Performance Test of File I/O

                    Version $Revision: 3.434 $

                    Compiled for 64 bit mode.

                    Build: linux-AMD64

     

            Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins

                         Al Slater, Scott Rhine, Mike Wisner, Ken Goss

                         Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,

                         Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,

                         Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,

                         Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,

                         Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,

                         Vangel Bojaxhi, Ben England, Vikentsi Lapa,

                         Alexey Skidanov.

     

            Run began: Tue Jun  7 15:35:02 2016

     

            Record Size 2048 kB

            File size set to 134217728 kB

            Stonewall disabled

            Command line used: /root/iozone3_434/src/current/iozone -i0 -r2m -s128g -x -t6

            Output is in kBytes/sec

            Time Resolution = 0.000001 seconds.

            Processor cache size set to 1024 kBytes.

            Processor cache line size set to 32 bytes.

            File stride size set to 17 * record size.

            Throughput test with 6 processes

            Each process writes a 134217728 kByte file in 2048 kByte records

     

            Children see throughput for  6 initial writers  = 11797417.00 kB/sec

            Parent sees throughput for  6 initial writers   = 11747703.71 kB/sec

            Min throughput per process                      = 1957981.12 kB/sec

            Max throughput per process                      = 1976861.38 kB/sec

            Avg throughput per process                      = 1966236.17 kB/sec

            Min xfer                                        = 134217728.00 kB

     

            Children see throughput for  6 rewriters        = 11848507.25 kB/sec

            Parent sees throughput for  6 rewriters         = 11729938.19 kB/sec

            Min throughput per process                      = 1955025.38 kB/sec

            Max throughput per process                      = 1989349.12 kB/sec

            Avg throughput per process                      = 1974751.21 kB/sec

            Min xfer                                        = 134217728.00 kB

     

    iozone test complete.