
    Performance decrease on other NUMA node

    matro

      Hello Mellanox Community,

       

      I observe a performance decrease of ~15% when I plug my ConnectX-5 CX556A into a PCIe slot belonging to NUMA node 1. On node 0 I get better results.
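
      To rule out a mix-up between the two slots, a quick sysfs check (assuming a standard Linux system; the PCI addresses are the ones from my commands below) shows which NUMA node each address reports:

      cat /sys/bus/pci/devices/0000:d8:00.0/numa_node   # slot used for the node 1 test
      cat /sys/bus/pci/devices/0000:3b:00.0/numa_node   # slot used for the node 0 test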

       

      But I remember that in a test some months ago the performance penalty was on node 0 (on the same machine) while node 1 worked perfectly. So I assume the issue is not caused by the hardware (or by other PCIe components) itself.

       

      Is there any Mellanox configuration in the driver related to NUMA nodes?

       

      I am using a DPDK application.

       

      My command on NUMA node 1:

      "l3fwd-bounce -v -w 0000:d8:00.0 --socket-mem=0,16384 -l 1,5,9,49,53,57 -- -p 0x1 --config '(0,0,1),(0,1,49),(0,2,5),(0,3,53),(0,4,9),(0,5,57)' -P"

       

      My command on NUMA node 0:

      "l3fwd-bounce -v -w 0000:3b:00.0 --socket-mem=16384,0 -l 4,8,10,52,56,58 -- -p 0x1 --config '(0,0,4),(0,1,52),(0,2,8),(0,3,56),(0,4,10),(0,5,58)' -P "

       

      All used cores are isolated (isolcpus).
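
      To make sure cores and memory really line up with the card, a minimal topology check (assuming lscpu and numactl are installed) would be:

      lscpu -p=CPU,NODE | grep -v '^#'   # NUMA node of every logical CPU
      numactl --hardware                 # per-node CPU lists and memory

      The lcores given with -l, the non-zero --socket-mem entry and the numa_node of the PCI device should all refer to the same node.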

       

      Do you have any hints on how I can achieve better results?
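
      For reference, a quick way to rule out a degraded slot (assuming lspci can be run as root) is to check the negotiated link:

      lspci -s d8:00.0 -vv | grep -i 'LnkSta:'   # should report the full speed and width, e.g. 8GT/s, x16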

        • Re: Performance decrease on other NUMA node
          matro

          I found suspicious behavior in how the Mellanox card balances frames across the 6 RX queues. The queue served by core 1 seems to receive only a few packets.

           

          Command:

          l3fwd-bounce -v -w 0000:d8:00.0 --socket-mem=0,16384 -l 1,5,9,49,53,57 -- -p 0x1 --config '(0,0,1),(0,1,49),(0,2,5),(0,3,53),(0,4,9),(0,5,57)' -P

          core 1: received 124389578, sent 124389578, empty rx bursts 141061256

          core 5: received 796841776, sent 796841776, empty rx bursts 1024321074

          core 9: received 796847260, sent 796847260, empty rx bursts 1022132414

          core 49: received 806257286, sent 806257286, empty rx bursts 1389903566

          core 53: received 796918078, sent 796918078, empty rx bursts 1034221027

          core 57: received 796925370, sent 796925370, empty rx bursts 1025396356

          port 0: received 4118179348 packets (1054253913088 bytes); sent 4118179348 packets (1037781195696 bytes)

           

          When I do NOT use CPU core 1 and its HT sibling (core 49), the balancing works better and I achieve the expected performance!
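
          For reference, the HT pairing of core 1 can be confirmed via sysfs (assuming the standard CPU topology files are present):

          cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list   # should list cores 1 and 49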

           

          Command:

          l3fwd-bounce -v -w 0000:d8:00.0 --socket-mem=0,16384 -l 5,9,11,53,57,59 -- -p 0x1 --config '(0,0,5),(0,1,53),(0,2,9),(0,3,57),(0,4,11),(0,5,59)' -P

          core 5: received 806215772, sent 806215772, empty rx bursts 1109925928

          core 9: received 796841011, sent 796841011, empty rx bursts 1114308276

          core 11: received 796880281, sent 796880281, empty rx bursts 1105713499

          core 53: received 806276520, sent 806276520, empty rx bursts 1107186920

          core 57: received 796915472, sent 796915472, empty rx bursts 1114022822

          core 59: received 796870944, sent 796870944, empty rx bursts 1111857481

          port 0: received 4800000000 packets (1228800000000 bytes); sent 4800000000 packets (1209600000000 bytes)

           

           

          How can that be? The port-level shortfall in the first run (4.8 G packets expected minus ~4.118 G received, roughly 680 M packets) matches the deficit on core 1 (~797 M on the other cores versus ~124 M there), so practically all of the lost traffic corresponds to that one under-used queue.
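
          In case it helps to narrow this down, a sketch of what I would check next (assuming standard procfs): whether anything else still runs on core 1 despite isolcpus.

          cat /proc/cmdline                          # confirm the isolcpus= list really covers cores 1 and 49
          ps -eLo pid,tid,psr,comm | awk '$3 == 1'   # threads currently scheduled on CPU 1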