9 Replies Latest reply on Dec 7, 2016 8:35 PM by vtskiboard

    40Gb/s IPoIB only gives 5Gb/s real throughput?!

    dangi12012

      I really need some expertise here:

      For the current situation see update 2 below.

       

Update 1: I tested with Windows 2012 clients to verify, and I still get about 5.5 Gbit/s max.

Does anyone else have 40 Gbit adapters? What speeds do you get?

Update 2: The mainboard slot was physically x16 but only x2 electrically. (Special thanks to Erez, support admin, for a quick and good answer.)

After changing to a PCIe 3.0 x8 slot I now get the following speed (it should still be about 3x faster):

Update 3: One support admin suggested not using passive copper but optical fibre instead. After getting a 56 Gbit optical fibre IB cable I now get these results:

(screenshot: Unbenannt.PNG, benchmark results)

This is still way below the advertised 40 Gbit!

The story goes like this: 40 Gbit advertised, 32 Gbit theoretical after encoding, which according to Erez from Mellanox is really only 25.6 Gbit over the PCIe bus, and which in reality turns out to be 16 Gbit half-duplex!

Am I doing something wrong, or is this just the way it works for Mellanox customers? :/

If something is still wrong, how do I fix it?

       

      OLD PART DO NOT READ: (READ UPDATE 3 instead)

I have two Windows 10 machines, each with an MHQH19B-XTR 40 Gbit adapter, connected directly by a QSFP cable. The subnet manager is opensm.

       

The connection should give about 32 Gbit/s. In reality I only get about 5 Gbit/s, so clearly something is very wrong.

      C:\Program Files\Mellanox\MLNX_VPI\IB\Tools>iblinkinfo

      CA: E8400:

            0x0002c903004cdfb1      2    1[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>       1    1[  ] "IP35" ( )

      CA: IP35:

            0x0002c903004ef325      1    1[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>       2    1[  ] "E8400" ( )

       

I tested my IPoIB link with a program called lanbench and with nd_read_bw:

      nd_read_bw -a -n 100 -C 169.254.195.189

      #qp #bytes #iterations    MR [Mmps]     Gb/s     CPU Util.

      0   512       100          0.843        3.45     0.00

      0   1024      100          0.629        5.15     0.00

      0   2048      100          0.313        5.13     0.00

      0   4096      100          0.165        5.39     0.00

      0   8192      100          0.083        5.44     0.00

      0   16384     100          0.042        5.47     0.00

      0   32768     100          0.021        5.47     100.00

...it stays at 5.47 Gb/s after that, with CPU utilization at 100%.

The processor is an Intel Core i7-4790K, so it should not be at 100%. According to Task Manager only one core is actively used.
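For reference, whether receive-side scaling (RSS) is enabled on the IPoIB interface can be checked like this (only a guess on my part, assuming the WinOF IPoIB adapter exposes RSS at all; "Ethernet 3" is just a placeholder for the interface name):

PS C:\Users\Daniel> Get-NetAdapter                              # find the IPoIB interface name
PS C:\Users\Daniel> Get-NetAdapterRss -Name "Ethernet 3"        # shows whether RSS is enabled and which cores it may use
PS C:\Users\Daniel> Enable-NetAdapterRss -Name "Ethernet 3"     # only if it turns out to be disabled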

Firmware, drivers, and Windows 10 are all up to date.

       

My goal is the fastest possible file sharing between two Windows 10 machines.

      What could be the problem here and how do I fix it?

       

       

       

(screenshot: Speed.PNG)

       

After endless hours of searching I found that vstat shows a 10 Gbit connection.

       

      C:\Users\Daniel>"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\vstat.exe"

       

              hca_idx=0

              uplink={BUS=PCI_E Gen2, SPEED=5.0 Gbps, WIDTH=x8, CAPS=5.0*x8} --> Looks good

              MSI-X={ENABLED=1, SUPPORTED=128, GRANTED=10, ALL_MASKED=N}

              vendor_id=0x02c9

              vendor_part_id=26428

              hw_ver=0xb0

              fw_ver=2.09.1000

              PSID=MT_0D90110009

              node_guid=0002:c903:004e:f324

              num_phys_ports=1

                      port=1

                      port_guid=0002:c903:004e:f325

                      port_state=PORT_ACTIVE (4)

                      link_speed=10.00 Gbps

                      link_width=4x (2)

                      rate=40.00 Gbps

                      real_rate=32.00 Gbps (QDR)

                      port_phys_state=LINK_UP (5)

                     active_speed=10.00 Gbps --> WHY?

                      sm_lid=0x0001

                      port_lid=0x0001

                      port_lmc=0x0

                      transport=IB

                      max_mtu=4096 (5)

                      active_mtu=4096 (5)

                      GID[0]=fe80:0000:0000:0000:0002:c903:004e:f325

       

What I should get is (thanks to Erez):

      PCI_LANES(8)*PCI_SPEED(5)*PCI_ENCODING(0.8)*PCI_HEADERS(128/152)*PCI_FLOW_CONT(0.95) = 25.6 Gbit
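As a sanity check (my own rough arithmetic, using the same formula), the original slot with only x2 electrical lanes would have been limited to roughly:

PCI_LANES(2)*PCI_SPEED(5)*PCI_ENCODING(0.8)*PCI_HEADERS(128/152)*PCI_FLOW_CONT(0.95) = 6.4 Gbit

which lines up with the ~5.5 Gbit/s I was seeing before moving the card.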

       

      Can anyone help me with this problem?

        • Re: 40Gb/s IPoIB only gives 5Gb/s real throughput?!
          zhangsuo

Hi Daniel,

Could you help check the type of your PCIe slot?
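For example, on Linux the negotiated link can be read with lspci; LnkCap shows what the card supports and LnkSta what was actually negotiated (03:00.0 below is only a placeholder for the HCA's bus address). On Windows, the uplink= line in the vstat output above shows the same information.

lspci -vv -s 03:00.0 | grep -i Lnk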

           

          Thanks

          • Re: 40Gb/s IPoIB only gives 5Gb/s real throughput?!
            praetzel

Yes, 40Gb/s signalling rate, but it sends 8 bits of data in a 10-bit symbol, giving 32Gb/s max data throughput; on top of that, the PCIe bus will limit you to about 25Gb/s.

Keep in mind that hardware-to-hardware performance is better than software-to-software. I've only used Mellanox cards with Linux, and hardware-to-hardware performance hits 25Gb/s with ConnectX-2 cards.

The IB equipment you are using has 4 pairs of wires running at 10Gb/s each, hence 40Gb/s total.
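Put another way, using the fields from the vstat output above (rough arithmetic): the per-lane speed times the link width times the 8b/10b factor gives exactly the real_rate vstat reports:

link_speed * link_width * 8/10 = 10 Gbps * 4 * 0.8 = 32 Gbps (QDR data rate)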

             

Real-world file sharing, even with older 10Gb/s InfiniHost cards, is better than 10Gb/s Ethernet. My MAXIMUM performance tests (using the Linux fio program) are below. That being said, we've avoided Windows file servers since at least Windows 2000; the performance has been terrible compared to Linux, especially when one factors in the cost of the hardware required.

             

I would suggest comparing the same two machines over an Ethernet link to see how it compares. In the end theoretical performance is nice, but what really matters is the actual software you are using. In my case, going to 10Gb Ethernet or QDR IB took things like data replication (ZFS snapshots, rsync) from 90 minutes to under 3 minutes. It was often not the increased bandwidth but the lower latency (IOPS) that mattered. For user applications accessing the file server, compile times were only reduced by about 30% going to InfiniBand or 10Gb Ethernet, but the 10Gb Ethernet gear is around 10x as expensive. I've not performance-tested our Oracle database, but it went to 10Gb Ethernet because my IB setup is for the students and I don't trust it yet on a "corporate" server.

             

In the case of file sharing, you'll want to check whether you're using the old NetBIOS ports 137-139 instead of 445, as that can impact performance.
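A quick way to check on the Windows client (PowerShell; Get-SmbConnection needs Windows 8 / Server 2012 or newer and also shows the negotiated SMB dialect):

PS C:\> netstat -an | findstr ":445 :139"     # lists connections on 445 (SMB) and 139 (NetBIOS)
PS C:\> Get-SmbConnection                     # per-share view, including the SMB dialect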

             

            Also - there is no way to exploit the exceptionally low latency of InfiniBand unless you've got SSDs or your data in RAM.

             

             

Network                         Data in 30 s   Aggregate bandwidth     Bandwidth              Latency    IOPS
QDR IB 40Gb/s, NFS over RDMA    94 GB          3,100 MB/s (25 Gb/s)    802 MB/s (6.4 Gb/s)    0.615 ms   12,535
DDR IB 20Gb/s, NFS over RDMA    24.4 GB        834 MB/s (6.7 Gb/s)     208 MB/s (1.7 Gb/s)    2.4 ms     3,256
SDR IB 10Gb/s, NFS over RDMA    22.3 GB        762 MB/s (6.1 Gb/s)     190 MB/s (1.5 Gb/s)    2.57 ms    2,978
QDR IB 40Gb/s                   16.7 GB        568 MB/s (4.5 Gb/s)     142 MB/s (1.1 Gb/s)    3.4 ms     2,218
DDR IB 20Gb/s                   13.9 GB        473 MB/s (3.8 Gb/s)     118 MB/s (0.94 Gb/s)   4.1 ms     1,845
SDR IB 10Gb/s                   13.8 GB        470 MB/s (3.8 Gb/s)     117 MB/s (0.94 Gb/s)   4.2 ms     1,840
10Gb/s ethernet                 5.9 GB         202 MB/s (1.6 Gb/s)     51 MB/s (0.41 Gb/s)    9.7 ms     793
1Gb/s ethernet                  3.2 GB         112 MB/s (0.90 Gb/s)    28 MB/s                17.8 ms    438
100Mb/s ethernet                346 MB         11.5 MB/s               2.9 MB/s               174 ms     45
10Mb/s ethernet via switch      36 MB          1.2 MB/s                279 kB/s               1797 ms    4
10Mb/s ethernet via hub         33 MB          1.0 MB/s                260 kB/s               1920 ms    4