
    Own libibverbs application similar to ib_send_bw with low throughput

    mschoett

      Dear Mellanox Community,

       

      I created a small application that mimics the concept of ib_send_bw in order to get familiar with libibverbs and programming for InfiniBand hardware. With the source code of ib_send_bw and resources like rdmamojo.com, it was straightforward to get something up and running. However, my own test application performs much worse with IBV_WR_SEND work requests than ib_send_bw does.
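
      Per message, my code follows what I understand to be the standard post-send / poll-CQ pattern. The simplified sketch below is only meant to illustrate that flow (it is not the exact code from the repository; the QP, CQ and memory region are assumed to be set up elsewhere):

      /* Simplified sketch of the per-message flow (not the exact repo code).
       * Assumes qp, cq, mr, buf and len were set up during initialization. */
      #include <infiniband/verbs.h>
      #include <stdint.h>

      static int send_one(struct ibv_qp *qp, struct ibv_cq *cq,
                          struct ibv_mr *mr, void *buf, uint32_t len)
      {
          struct ibv_sge sge = {
              .addr   = (uintptr_t)buf,
              .length = len,
              .lkey   = mr->lkey,
          };
          struct ibv_send_wr wr = {
              .wr_id      = 1,
              .sg_list    = &sge,
              .num_sge    = 1,
              .opcode     = IBV_WR_SEND,
              .send_flags = IBV_SEND_SIGNALED,  /* every send generates a completion */
          };
          struct ibv_send_wr *bad_wr = NULL;
          struct ibv_wc wc;
          int n;

          if (ibv_post_send(qp, &wr, &bad_wr))
              return -1;

          /* Busy-poll the completion queue until this send completes. */
          do {
              n = ibv_poll_cq(cq, 1, &wc);
          } while (n == 0);

          return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
      }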

       

      The code is available here:

      GitHub - stnot/ib_test

       

      I run this on a cluster of two nodes connected to an 18-port Mellanox 56 Gbit/s switch, with 56 Gbit/s HCAs installed in the nodes. Please let me know if you need more information about my setup to analyze this issue.

      Running ib_send_bw -a prints the following output:

      ---------------------------------------------------------------------------------------
                          Send BW Test
      Dual-port       : OFF        Device         : mlx4_0
      Number of qps   : 1        Transport type : IB
      Connection type : RC        Using SRQ      : OFF
      RX depth        : 512
      CQ Moderation   : 100
      Mtu             : 2048[B]
      Link type       : IB
      Max inline data : 0[B]
      rdma_cm QPs     : OFF
      Data ex. method : Ethernet
      ---------------------------------------------------------------------------------------
      local address: LID 0x04 QPN 0x02bd PSN 0x8f0e46
      remote address: LID 0x08 QPN 0x0341 PSN 0xd9ee2e
      ---------------------------------------------------------------------------------------
      #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]  MsgRate[Mpps]
      2          1000             0.00               11.00              5.765330
      4          1000             0.00               35.07              9.192796
      8          1000             0.00               72.67              9.524639
      16         1000             0.00               145.20             9.516145
      32         1000             0.00               276.78             9.069401
      64         1000             0.00               584.29             9.573034
      128        1000             0.00               1173.28            9.611550
      256        1000             0.00               2108.89            8.637993
      512        1000             0.00               3693.42            7.564126
      1024       1000             0.00               4143.79            4.243243
      2048       1000             0.00               4385.30            2.245272
      4096       1000             0.00               4457.75            1.141185
      8192       1000             0.00               4486.35            0.574253
      16384      1000             0.00               4509.63            0.288616
      32768      1000             0.00               4514.77            0.144473
      65536      1000             0.00               4517.63            0.072282
      131072     1000             0.00               4518.87            0.036151
      262144     1000             0.00               4519.43            0.018078
      524288     1000             0.00               4519.53            0.009039
      1048576    1000             0.00               4519.82            0.004520
      2097152    1000             0.00               4519.94            0.002260
      4194304    1000             0.00               4519.97            0.001130
      8388608    1000             0.00               4519.97            0.000565
      ---------------------------------------------------------------------------------------

       

      I also ran ib_send_bw with settings that should (I assume) be comparable to the ones currently used in my own implementation:

      ib_send_bw --rx-depth=100 --tx-depth=100 --size=1024 --iters=100000

       

      ---------------------------------------------------------------------------------------
                          Send BW Test
      Dual-port       : OFF        Device         : mlx4_0
      Number of qps   : 1        Transport type : IB
      Connection type : RC        Using SRQ      : OFF
      RX depth        : 100
      CQ Moderation   : 100
      Mtu             : 2048[B]
      Link type       : IB
      Max inline data : 0[B]
      rdma_cm QPs     : OFF
      Data ex. method : Ethernet
      ---------------------------------------------------------------------------------------
      local address: LID 0x04 QPN 0x02be PSN 0xd3bf9c
      remote address: LID 0x08 QPN 0x0342 PSN 0x13b557
      ---------------------------------------------------------------------------------------
      #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]  MsgRate[Mpps]
      1024       100000           0.00               2842.02            2.910229
      ---------------------------------------------------------------------------------------

       

      Running my own code linked above, I get a much lower throughput than the ib_send_bw results:

      ./ib_server 1024 10000 msg

       

      *** 10000 MESSAGE_SEND resulted in an average latency of 8.50us ***

       

      114.918097 MB/sec

       

       

      I profiled the various sections of my code. The calls to ibv_poll_cq consume over 99% of the execution time and return 0 (no work completion) most of the time. The numbers also fit together: 1024 bytes / 8.5 µs ≈ 115 MB/s, which matches the measured throughput, so each message seems to pay the full completion latency before the next one goes out. I suspect that something is not configured correctly and adds extra processing time to each send and/or receive request posted to the queue, but I have not been able to figure out the exact cause so far.
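
      In case it helps the comparison: as far as I understand, ib_send_bw keeps many sends outstanding and requests a completion only for every 100th work request (matching the "CQ Moderation : 100" line above), so a single ibv_poll_cq call can reap a whole batch, while my code signals and polls every message individually. The sketch below only illustrates that batching idea as I understand it; the names are made up and it is neither my code nor the actual perftest source:

      /* Illustrative sketch of batched completion signaling (my understanding
       * of the ib_send_bw approach, not actual perftest or repo code).
       * Assumes iters is a multiple of MOD, the QP was created with
       * sq_sig_all = 0 and max_send_wr >= MOD. */
      #include <infiniband/verbs.h>
      #include <stdint.h>

      #define MOD 100  /* signal every MOD-th send, like "CQ Moderation : 100" */

      static int send_batched(struct ibv_qp *qp, struct ibv_cq *cq,
                              struct ibv_mr *mr, void *buf, uint32_t len,
                              unsigned long iters)
      {
          struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
          struct ibv_wc wc[MOD];
          unsigned long posted = 0, completed = 0;

          while (completed < iters) {
              /* Keep up to MOD sends in flight instead of one at a time. */
              while (posted < iters && posted - completed < MOD) {
                  struct ibv_send_wr wr = {
                      .wr_id      = posted,
                      .sg_list    = &sge,
                      .num_sge    = 1,
                      .opcode     = IBV_WR_SEND,
                      /* Only every MOD-th work request generates a completion. */
                      .send_flags = ((posted + 1) % MOD == 0) ? IBV_SEND_SIGNALED : 0,
                  };
                  struct ibv_send_wr *bad_wr = NULL;
                  if (ibv_post_send(qp, &wr, &bad_wr))
                      return -1;
                  posted++;
              }

              /* One poll can reap several completions; each stands for MOD sends. */
              int n = ibv_poll_cq(cq, MOD, wc);
              if (n < 0)
                  return -1;
              for (int i = 0; i < n; i++)
                  if (wc[i].status != IBV_WC_SUCCESS)
                      return -1;
              completed += (unsigned long)n * MOD;
          }
          return 0;
      }

      What I would like to understand is whether per-message signaling and polling alone explains the gap, or whether something else in my configuration is wrong.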

       

      I would appreciate it if someone from the community could take a look at my code and point out where I am using libibverbs incorrectly or inefficiently, or which improperly configured parameters cause this performance loss. If you need more data about my setup or any other information that would help analyze this issue, please let me know and I will gladly provide it.