
    Dual-port RDMA Throughput Issue

    mhedayati87

      I have two nodes connected through an IB switch, each with a dual-port Mellanox ConnectX-3 VPI HCA. The nodes are two-socket machines with Haswell CPUs and two 16 GB DIMMs per socket (64 GB in total). Everything seems to work fine, except that the performance numbers don't look right.

      When I run the ib_read_bw benchmark on a single port:

      server# ib_read_bw --report_gbits
      client# ib_read_bw server --report_gbits

      ---------------------------------------------------------------------------------------
      #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
      65536      1000           37.76              37.76                 0.072016
      ---------------------------------------------------------------------------------------


      But when I run it in dual-port mode:

      server# ib_read_bw --report_gbits -O
      client# ib_read_bw server --report_gbits -O
      ---------------------------------------------------------------------------------------
      #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
      65536      2000           52.47              52.47                 0.100073
      ---------------------------------------------------------------------------------------


      That is less than a 40% improvement over the single-port result. Am I wrong to expect roughly 2x the single-port bandwidth when using both ports?

      I don't know what the bottleneck could be here, or how to find it.
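      In case the HCA's PCIe link is a suspect, this is roughly how I'd check the negotiated link width/speed on each node (the grep patterns just pick out the link capability and status lines; <pci-address> is a placeholder for the HCA's actual address reported by the first command):

      # lspci | grep -i mellanox
      # lspci -s <pci-address> -vv | grep -i -e LnkCap -e LnkSta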

      Other configurations that may be helpful:

      • Each socket has 8 cores; with hyper-threading, each machine has 32 hardware threads in total
      • Each DIMM provides ~14 GB/s of bandwidth (per-socket memory bandwidth ~28 GB/s, ~56 GB/s overall)
      • I used Mellanox's Auto Tuning Utility to tune the interrupts.
      • IB links are 4X 10.0 Gbps (FDR10), i.e., 40 Gb/s per port (the active rate can be double-checked with the commands sketched after this list)
      • I am using Mellanox OFED 4.3.
      • Running on Fedora 22 with a 4.15 kernel.
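      For completeness, this is how I plan to double-check the active rate and state of both ports (mlx4_0 is what I expect the ConnectX-3 to show up as; ibv_devinfo -v prints active_width and active_speed per port):

      # ibstat mlx4_0
      # ibv_devinfo -v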