1 Reply Latest reply on Oct 24, 2017 6:10 AM by alkx

    Scalability issue for multiple clients




      Our setup is:

      1 x Mellanox MX354A dual-port FDR CX3 adapter w/ 1 x QSA adapter

      1 x Xeon E5-2450 processor (8 cores, 2.1 GHz)

      16 GB memory (4 x 2 GB RDIMMs, 1.6 GHz)


      We have a 4-node cluster, and every node acts as both server and client.

      When writing, a node splits its data into 4 pieces and writes them concurrently to the 4 nodes.

      When reading, a node reads concurrently from all 4 nodes.
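For reference, the striping scheme above can be sketched with plain shell tools (file names are made up for illustration; the real system uses RDMA reads/writes, not local files):

```shell
# Create a 4 MiB test file and split it into 4 stripes of 1 MiB each,
# mimicking how a writer splits data into 4 pieces for the 4 nodes.
dd if=/dev/urandom of=data.bin bs=1M count=4 2>/dev/null
split -b 1M -d data.bin stripe.

# A reader fetches all 4 stripes concurrently and reassembles them.
for s in stripe.00 stripe.01 stripe.02 stripe.03; do
  cat "$s" > "$s.fetched" &   # stand-in for a concurrent RDMA read
done
wait
cat stripe.00.fetched stripe.01.fetched stripe.02.fetched stripe.03.fetched > reassembled.bin

# The reassembled file must match the original.
cmp data.bin reassembled.bin && echo "stripes OK"
```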


      We expected throughput to scale with the number of clients.

      When a single node is reading, it gets 6.4 GB/s of bandwidth,

      but when 2 nodes are reading, each gets only 5 GB/s, even though the aggregate bandwidth should be sufficient.


      There is only one CPU socket, so no NUMA effects arise.

      Suspecting NIC cache misses, we measured PCIe reads using pcm-pcie.

      PCIe read throughput simply does not scale with the number of clients, even though the available PCIe bandwidth is much higher.
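For completeness, this is roughly how we sampled the PCIe counters (the binary name and arguments vary between versions of Intel's processor-counter-monitor tools, so adjust to your build):

```shell
# Sample the uncore PCIe read/write event counters about once per second.
# Requires root; the binary is named pcm-pcie.x in some older PCM builds.
sudo pcm-pcie 1.0
```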


      There must be contention when multiple connections(QPs) read from a single server.

      Please Mellanox, can you pinpoint the root cause and possible solution for multiple-client scalability?

        • Re: Scalability issue for multiple clients

          It might be useful if you could translate your test into something that uses ib_read_bw/ib_write_bw, or iperf if you are using TCP, and show the output.
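For example, a minimal RDMA read bandwidth test could look like this (the device name mlx4_0 and the QP count are examples; check your device with ibv_devices):

```shell
# On the server node:
ib_read_bw -d mlx4_0 --report_gbits

# On each client node, pointing at the server. -q sets the number of QPs,
# so you can check whether multiple QPs alone reproduce the drop:
ib_read_bw -d mlx4_0 -q 4 --report_gbits <server-hostname>
```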

          Do you see any drops in the 'ethtool -S' output or in the device statistics (ifconfig, ip)?
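Something along these lines (eth2 is a placeholder; substitute your interface):

```shell
# Look for drops/discards/errors in the NIC's extended counters:
ethtool -S eth2 | grep -iE 'drop|discard|err'

# Per-interface RX/TX statistics, including error and drop columns:
ip -s link show dev eth2
```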

          Does the sender use one CPU when writing to two different clients? Or, in other words, does it use the same thread?
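You can check this with the sysstat tools while the test is running (replace &lt;pid&gt; with the sender's PID):

```shell
# Per-thread CPU usage of the sender process, sampled every second.
# If a single thread sits near 100%, the sender is effectively
# single-threaded and may itself be the bottleneck:
pidstat -t -p <pid> 1

# Per-core utilization while two clients are reading:
mpstat -P ALL 1
```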

          You might also check the output of the 'mlnx_perf' command; note that it requires Mellanox OFED installed on the host.
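A typical invocation looks like this (interface name is an example):

```shell
# mlnx_perf prints per-second deltas of the NIC's hardware counters,
# which can reveal congestion or cache-miss-related events:
mlnx_perf -i eth2
```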