I am trying to benchmark the SRQ performance in terms of operations per second. The setup is as follows:
Two senders, 1 QP per sender and 1 thread per sender.
One receiver, 2 QPs per receiver.
Scenario #1: Private receive queue per QP, 2 threads, each thread working on 1 QP.
The aggregate throughput is 18 Mops/s.
Scenario #2: 1 SRQ, 1 thread for 2 QPs.
The aggregate throughput is 12 Mops/s.
Scenario #3: 2 SRQs, 2 threads, 1 thread per QP.
The aggregate throughput is 12 Mops/s, and the throughput of each SRQ is around 6 Mops/s.
Increasing the number of SRQs from 1 to 2 does not increase the aggregate throughput. What might be the reason for this?
I have also tried the multi-process setup, and the result is similar.
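To make the comparison easier to read, here is a small sketch that works out the per-thread rate for each scenario from the numbers above. It is only arithmetic on the reported figures; the dictionary layout and the `per_thread` helper are illustrative names, not part of any benchmark tool.

```python
# Aggregate throughput (Mops/s) and number of receiver threads,
# taken from the three scenarios reported above.
scenarios = {
    "private RQ per QP, 2 threads": (18.0, 2),
    "1 SRQ, 1 thread": (12.0, 1),
    "2 SRQs, 2 threads": (12.0, 2),
}


def per_thread(aggregate_mops, threads):
    """Average throughput handled by one receiver thread, in Mops/s."""
    return aggregate_mops / threads


per_thread_rates = {
    name: per_thread(aggregate, threads)
    for name, (aggregate, threads) in scenarios.items()
}

for name, rate in per_thread_rates.items():
    print(f"{name}: {rate:.1f} Mops/s per receiver thread")
```

With these figures, a single thread on one SRQ handles 12 Mops/s, but each thread drops to about 6 Mops/s once a second SRQ and thread are added, while the private receive queue case reaches 9 Mops/s per thread. That pattern looks more like a limit shared across the SRQs than a per-thread limit, though the numbers alone do not show where that limit is.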
For the above question, please open a ticket with Mellanox support by sending the info to firstname.lastname@example.org.
This needs to be investigated in order to provide you with the most accurate reply.