    Questions regarding SRQ performance



      I am trying to benchmark SRQ (shared receive queue) performance in terms of operations per second. The setup is as follows:

      Two senders, 1 QP per sender and 1 thread per sender.

      One receiver, 2 QPs per receiver.


      Scenario #1: Private receive queue per QP. 2 threads, each thread working on 1 QP.

            The aggregate throughput is 18 Mops/s.

      Scenario #2: 1 SRQ, 1 thread for 2 QPs.

            The aggregate throughput is 12 Mops/s.

      Scenario #3: 2 SRQs, 2 threads, 1 thread per QP.

            The aggregate throughput is 12 Mops/s, and the throughput of each SRQ is around 6 Mops/s.
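
      As a rough sanity check on these numbers, the sketch below converts the reported aggregate rates into per-thread rates and per-operation time budgets. It assumes the load is split evenly across the receiver threads (only measured for scenario #3); the names scenarios and per_thread_budget are illustrative and not part of the benchmark.

```python
# Reported aggregate throughput (Mops/s) and receiver thread count per scenario.
scenarios = {
    "1: private RQ per QP, 2 threads": (18.0, 2),
    "2: 1 SRQ, 1 thread": (12.0, 1),
    "3: 2 SRQs, 2 threads": (12.0, 2),
}


def per_thread_budget(aggregate_mops, threads):
    """Return (per-thread Mops/s, nanoseconds per operation per thread).

    Assumption: the aggregate rate is split evenly across the threads.
    """
    per_thread = aggregate_mops / threads
    ns_per_op = 1000.0 / per_thread  # 1 Mops/s is one operation per 1000 ns
    return per_thread, ns_per_op


budgets = {name: per_thread_budget(*v) for name, v in scenarios.items()}

for name, (rate, ns) in budgets.items():
    print(f"Scenario {name}: {rate:.1f} Mops/s per thread, {ns:.1f} ns per op")
```

      Under that assumption, the single thread in scenario #2 sustains 12 Mops/s (about 83 ns per receive), while each thread in scenario #3 sustains only about 6 Mops/s (about 167 ns per receive), so the per-thread rate roughly halves once the second SRQ and thread are added.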


      Going from one SRQ to two does not increase the aggregate throughput: it stays at 12 Mops/s, below the 18 Mops/s achieved with private receive queues. What might be the reason for this?


      I have also tried a multi-process setup, and the results are similar.


