2 Replies Latest reply on Nov 3, 2016 3:29 AM by zak_bm

    SX6036 IPoIB speed issue

    zak_bm

      Hi,

       

      I was hoping that someone would be able to help me.

      I have 15 HP DL380 Gen9 servers running Windows Server 2012 R2, each with a dual-port HP ConnectX-3 VPI card, connected to a Mellanox SX6036 switch.

      The line speed is correctly displayed as 32Gbps (QDR), but we are not getting anywhere near that performance. Real-world RDMA speeds seem to max out at 25Gbps (which is OK but could be better), but the maximum speed we seem to get with IPoIB is 1Gbps. Below is an ntttcp test:

       

      c:\temp\NTttcp-v5.31\x64>ntttcp.exe -r -m 8,*,10.167.255.111  -rb 2M -a 16 -t 30

       

      Copyright Version 5.31

      Network activity progressing...

       

       

      Thread  Time(s) Throughput(KB/s) Avg B / Compl

      ======  ======= ================ =============

           0   30.062         1896.880     65536.000

           1   30.046         1414.365     63434.262

           2   30.124         1459.567     64410.918

           3   30.093         2473.399     65536.000

           4   30.062         1847.914     65536.000

           5   30.062         2452.531     65365.777

           6   30.047         2726.406     63310.495

           7   30.047         2890.476     61291.891


      #####  Totals:  #####


         Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)

      ================ =========== ============== ================

            503.877392      30.069       3966.649           16.757


      Throughput(Buffers/s) Cycles/Byte       Buffers

      ===================== =========== =============

                    268.118     395.755      8062.038


      DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)

      ============= ============= =============== ==============

             81.845        54.124        5762.646          0.769


      Packets Sent Packets Received Retransmits Errors Avg. CPU %

      ============ ================ =========== ====== ==========

              7198           133199           2      6     15.137

       

       

      We have a similar but slightly different environment: 9 HP DL380 servers with ConnectX-3 cards connected to an IS5022 switch. This environment performs as expected:

       

      c:\temp\NTttcp-v5.31\x64>ntttcp.exe -s -m 8,*,192.168.84.10 -l 128k -a 2 -t 30

      Copyright Version 5.31

      Network activity progressing...

       

       

      Thread  Time(s) Throughput(KB/s) Avg B / Compl

      ======  ======= ================ =============

           0   30.000       469060.267    131072.000

           1   30.000       359970.133    131072.000

           2   30.000       446084.267    131072.000

           3   30.000       437909.333    131072.000

           4   30.000       348608.000    131072.000

           5   30.000       387993.600    131072.000

           6   30.000       348654.933    131072.000

           7   30.000       357444.267    131072.000


      #####  Totals:  #####


         Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)

      ================ =========== ============== ================

          92452.875000      30.000       4037.074         3081.762


      Throughput(Buffers/s) Cycles/Byte       Buffers

      ===================== =========== =============

                  24654.100       0.614    739623.000


      DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)

      ============= ============= =============== ==============

          51664.567         1.719       72494.233          1.225


      Packets Sent Packets Received Retransmits Errors Avg. CPU %

      ============ ================ =========== ====== ==========

          24013397          2663789          15      6      6.892

       

       

      The only major difference between the two environments is the switch. I’m pretty sure that the SX6036 is configured correctly, but there must be something wrong if we are getting a throughput of 16 MB/s compared with 3081 MB/s!
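To put the two totals on the same scale as the 32Gbps line speed, here is a rough Python sketch that converts the ntttcp "Throughput(MB/s)" figures to Gbit/s. It assumes ntttcp's "MB" means 2**20 bytes; the IS5022 run is consistent with that, since 739623 buffers of 128 KiB works out to exactly the 92452.875 reported under Bytes(MEG). The function and variable names are only illustrative.

```python
# Convert ntttcp "Throughput(MB/s)" totals to Gbit/s and compare with line rate.
# Assumption: ntttcp's "MB" is 2**20 bytes (checked below against the IS5022 run).

MIB = 2**20               # bytes per ntttcp "MB" (assumed)
LINE_RATE_GBPS = 32.0     # QDR line speed quoted in the post


def mibps_to_gbps(mib_per_s: float) -> float:
    """Convert MiB/s to decimal gigabits per second."""
    return mib_per_s * MIB * 8 / 1e9


# Sanity check of the unit assumption: buffers * buffer size == Bytes(MEG).
assert 739623 * 128 * 1024 / MIB == 92452.875

# Totals copied from the two ntttcp runs above.
runs = {
    "SX6036 (receiver run)": 16.757,
    "IS5022 (sender run)": 3081.762,
}

for name, mibps in runs.items():
    gbps = mibps_to_gbps(mibps)
    share = gbps / LINE_RATE_GBPS
    print(f"{name}: {gbps:.2f} Gbit/s ({share:.1%} of line rate)")
```

Note that the first run was measured on the receiving side (-r) and the second on the sending side (-s), so the two figures are not a strict like-for-like comparison.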

       

      Any help on this issue would be much appreciated. I can provide switch config and more details if required.

       

      Thanks,

      Zak

        • Re: SX6036 IPoIB speed issue
          rage@mellanox.com

          Hello Zak,

           

          Check/Do the following:

          1. Make sure you have the latest Mellanox driver and firmware.

          2. Make sure you can reach line rate with "nd_write_bw.exe", which ships with Mellanox WinOF (http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers).

          3. Consult our Performance Tuning Guide: http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf

          4. Run ntttcp between two servers connected back to back (without the SX6036 switch). What are your performance results?

           

          Cheers,

          ~R

            • Re: SX6036 IPoIB speed issue
              zak_bm

              Hi Rage,

               

              Thanks very much for your reply. Apologies for my late response, it's been a busy couple of weeks.

               

              1. The driver is 5.22.12433.0 and firmware 2.36.5000. This was the latest as of a couple of weeks ago.

               

              2. nd_write_bw results:

              #qp #bytes #iterations    MR [Mmps]     Gb/s     CPU Util.

              0   1048576   22712        0.004        31.75    100.00

               

              I'm not sure why the CPU is displayed at 100% as it is actually using about 4%.

               

              3. I have looked at the performance tuning PDF before and run the balanced tuning option in the driver configuration.

               

              4. These are the results when they are plugged into each other back to back (even worse!):

               

              c:\temp\NTttcp-v5.31\x64>ntttcp.exe -s -m 8,*,192.168.100.11 -l 128k -a 2 -t 30

              Copyright Version 5.31

              Network activity progressing...

              Thread  Time(s) Throughput(KB/s) Avg B / Compl

              ======  ======= ================ =============

                   0   30.114          773.594    131072.000

                   1   29.551           60.641    131072.000

                   2   30.021         2890.776    131072.000

                   3   30.911           62.114    131072.000

                   4   30.036          762.818    131072.000

                   5   29.895          706.473    131072.000

                   6   31.536           60.883    131072.000

                   7   29.926          774.176    131072.000

               

              #####  Totals:  #####

                 Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)

              ================ =========== ============== ================

                    178.625000      30.000       3903.587            5.954

               

              Throughput(Buffers/s) Cycles/Byte       Buffers

              ===================== =========== =============

                             47.633     327.094      1429.000

               

              DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)

              ============= ============= =============== ==============

                    583.767         0.419        4805.100          0.051

               

              Packets Sent Packets Received Retransmits Errors Avg. CPU %

              ============ ================ =========== ====== ==========

                     47982             7338        3356      6      4.445

               

              As the switch has now been ruled out, I will also contact HP to see if they can offer some support.
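One figure that may be worth comparing between the two sender-side runs (IS5022 and back to back) is the share of sent packets that were retransmitted. Below is a minimal Python sketch using the counts copied from the ntttcp output in this thread; the function name is illustrative, and treating Retransmits / Packets Sent as a rough loss indicator is my assumption, not something ntttcp reports directly.

```python
# Rough retransmit ratio for the two sender-side (-s) ntttcp runs in this thread.
# Assumption: Retransmits / Packets Sent is a usable first-order loss indicator.


def retransmit_rate(packets_sent: int, retransmits: int) -> float:
    """Fraction of sent packets that were retransmitted (0.0 if nothing was sent)."""
    return retransmits / packets_sent if packets_sent else 0.0


# (Packets Sent, Retransmits) copied from the ntttcp totals.
runs = {
    "IS5022 switch": (24013397, 15),
    "back to back": (47982, 3356),
}

for name, (sent, retx) in runs.items():
    print(f"{name}: {retransmit_rate(sent, retx):.4%} of sent packets retransmitted")
```

The SX6036 run is left out because it was taken on the receiving side (-r), where the sent and retransmit counters describe only the return traffic.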

               

              Thanks again for your help with this.

               

              Zak