9 Replies Latest reply on Nov 7, 2013 4:31 PM by inbusiness

    IPoIB Performance - ESXi 5.1 U1

      Hi all

       

      First of all I'd just like to say I think it's excellent that Mellanox provides a forum for IB home labs / hobbyists - that's very good service.

       

      For me, my home lab is a way of testing out "new" solutions before I consider recommending them - I like to have full confidence in my recommendations, and of course it's a great way to push your skills in ways you may not be able to in a corporate / budget-constrained environment.

       

      Anyway, I am very new to IB, but with it becoming prominent in the Big Data market and also in VSANs (storage in the cabinet), I wanted to see what it was all about.

       

      I have purchased the following (I realise it's not cutting edge, but early next year I will be upgrading to a QDR / 40Gbps Mellanox IB switch with a built-in SM), assuming I can get this working the way I expect:

       

      1 x Voltaire GridDirector ISR 9024D (not the M model)

      2 x MHGH28-XTC (Rev X1) HCA cards - I flashed these to firmware version 2.7000

      2 x CX4 cables

      2 x VMware ESXi custom systems

      2 x Intel 335 SSDs (500MB/s each) - in 2 weeks this will become 4 x Intel 335 SSDs (providing a theoretical ~2GB/s of IO in RAID-0)

       

      Ok, I have installed the relevant drivers - for the sake of a simple guide (which can be corrected if you think I have missed / done something wrong), here is what I did:

       

      -------------------------

       

      [ INFINIBAND ]

       

      1. Install the Mellanox OFED drivers

      esxcli system module parameters set -m mlx4_core -p mtu_4k=1

       

      esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check

       

      esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip

       

      Installation Result

         Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.

         Reboot Required: true

         VIBs Installed: Mellanox_bootbank_net-ib-cm_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-ib-core_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-ib-ipoib_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-ib-mad_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-ib-sa_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-ib-umad_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-mlx4-core_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_net-mlx4-ib_1.8.2.0-1OEM.500.0.0.472560, Mellanox_bootbank_scsi-ib-srp_1.8.2.0-1OEM.500.0.0.472560

         VIBs Removed:

         VIBs Skipped:

       

      esxcli software acceptance set --level=CommunitySupported

      esxcli software vib install -v /tmp/ib-opensm-3.3.16.x86_64.vib --no-sig-check

       

      2. Reboot
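      As a quick aside (not part of the original steps): after the reboot it's worth sanity-checking that the install took. A sketch, using the standard ESXi 5.x esxcli namespaces:

```shell
# Confirm the Mellanox OFED VIBs, the IPoIB uplinks and the 4K-MTU module
# parameter survived the reboot (run from the ESXi shell):
esxcli software vib list | grep -i mellanox
esxcli network nic list                              # vmnic_ib* should show as up / 20000 Full
esxcli system module parameters list -m mlx4_core | grep mtu_4k
```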

       

      3. Fix MTU and partitions.conf

       

      vi /tmp/partitions.conf

      Default=0x7fff,ipoib,mtu=5:ALL=full;

       

      cp /tmp/partitions.conf /scratch/opensm/0x001a4bffff0c1399/

      cp /tmp/partitions.conf /scratch/opensm/0x001a4bffff0c139a/
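      Worth noting (my addition, based on the standard opensm/IBTA encoding): the mtu=5 in partitions.conf is an IB MTU code, not a byte count - the codes map to 256 << (code - 1) bytes, so 5 means 4096:

```shell
# IB MTU codes: 1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes
CODE=5
MTU_BYTES=$(( 256 << (CODE - 1) ))
echo "$MTU_BYTES"   # prints 4096
```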

       

      4. Flashed both HCA cards to firmware 2.7000

       

      5. Created a virtual network in ESXi using one port of the HCA on each ESXi system - ESXi recognises this vnic as up and 20Gbps

       

      6. TRIED to set the MTU > 2k but failed - it won't go higher than 2k in the vSwitch.
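      For what it's worth, the vSwitch MTU can also be set explicitly from the ESXi shell (vSwitch1 below is a placeholder for whichever vSwitch carries the IPoIB uplink). The 2044 ceiling is expected: it is the 2048-byte IB MTU minus the 4-byte IPoIB encapsulation header.

```shell
# Set and verify the vSwitch MTU (vSwitch name is an assumption - use your own):
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=2044
esxcli network vswitch standard list | grep -i mtu
```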

       

      7. Created 2 x WIN7 systems each with 2x4GHz vCPUs, 8GB RAM, 1 x SSD based HDD (theoretical 500MB/s or slightly less IO - no other VM using this SSD datastore) and configured NICs using IP on the IPoIB vswitch same subnet, ping works etc

       

      8. Copied a 3.6GB ISO from WIN701 to WIN702 - 289MB/s (15 secs) - that's fast, but I was expecting more throughput
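      As a sanity check on that number (my arithmetic, not from the original post), 3.6GB in 15 seconds works out to roughly 245MB/s, i.e. about 2 Gbit/s on the wire:

```shell
# 3.6 GB copied in 15 s -> MB/s (integer arithmetic is close enough here)
SIZE_MB=$(awk 'BEGIN { printf "%d", 3.6 * 1024 }')   # 3686 MB
RATE=$(( SIZE_MB / 15 ))
echo "${RATE} MB/s"   # prints 245 MB/s
```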

       

      9. Created a 4GB RAM disk on each system

       

      10. Re-copied the above file, result: 360MB/sec

      ---------------------------------

       

      I was expecting much quicker copy rates than this, especially via RAMdisk - are there any areas you can suggest I look at, as this is not performing at the level I'd expect?

       

      Thanks

        • Re: IPoIB Performance - ESXi 5.1 U1

          Here is another test using Linux VMs:

           

          [root@ib-lnx-01 ~]# rsync -av --progress /tmp/SM-6.3.2.0.632023-e50-00_OVF10.ova root@192.168.0.116:/tmp

          root@192.168.0.116's password:

          sending incremental file list

          SM-6.3.2.0.632023-e50-00_OVF10.ova

            3370676736 100%  217.62MB/s    0:00:14 (xfer#1, to-check=0/1)


          sent 232340 bytes  received 464511 bytes  16019.56 bytes/sec

          total size is 3370676736  speedup is 4837.01

           

          So I'm getting less than the performance of a 2Gbps connection - does anyone have any ideas, or is this the limit of the ESXi driver / IPoIB implementation with my hardware?

          • Re: IPoIB Performance - ESXi 5.1 U1
            ingvar_j

            Hi. We also noted that it is not possible to use more than a 2044 MTU on the ESXi host in vSphere 5.1 U1.

            To compare your performance with ours:

            -  2 linux VMs on 2 separate ESXi hosts,

            -  MTU size 2044 on the virtual nics and on both the vmknics (IPoIB) on the hosts

            -  iperf is the testing tool (since it does not involve disk access, just shuffles the data in memory)

            - The speed we got was about 9-9.5 Gbit/s

            - If using MTU=1500, the speed dropped to about 7 Gbit/s

            A vMotion of a VM typically took 4 sec.


            Note that the speed gained is of course dependent on the type of hardware you have (CPU speed, #cores, PCI bus type etc.). In our setup we have QDR speed on the IB switches.

             

            As for IPoIB, you should get at least 10-12 Gb/s on a physical Linux box (4k MTU).

             

            One more thing: if you test reads from a server/NAS using disk access, it is good advice to skip the WRITE operations on the receiving end.

            Just output the result to /dev/null like:

            (the command below will find all files, execute 'cat' on each, and redirect the output to /dev/null - point it at a location with LARGE files like ISOs/DVDs to get the best result)

             

            NFS-mount a share on the NAS as /mnt/remote

            cd /mnt/remote/<iso subdir>

            find . -type f -print -exec sh -c 'cat "$1" >/dev/null' {} {} \;

             

            I used this measure to verify the speed from my NAS, which could not run iperf.
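            A simpler single-file variant of the same idea, if you have one large file handy, is a plain dd read to /dev/null (the path here is just an example; the dummy-file step only exists so the snippet runs on its own):

```shell
# Sequential read test: read a large file and throw the data away (no local writes).
FILE="${FILE:-/tmp/readtest.bin}"
# Create a 64 MB dummy file if none exists, purely so this snippet is self-contained:
[ -f "$FILE" ] || dd if=/dev/zero of="$FILE" bs=1M count=64 2>/dev/null
dd if="$FILE" of=/dev/null bs=1M 2>&1 | tail -n 1    # final line reports the throughput
```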

            Regards, Ingvar

              • Re: IPoIB Performance - ESXi 5.1 U1

                Hi Ingvar

                 

                Thank you very much for your response - you are getting some nice speeds there. Would you mind sharing the iperf command you are using to test this, and also describing your VMware configuration (virtual machine network / vSwitch and VM config), purely so I can emulate your setup?

                 

                I was disappointed with the disk speeds, but I will work at it as it was only a first test; first, however, I would like to verify that the setup is correct - and so far, going by the RAMdisk result, it is not.

                 

                Thanks once again

                  • Re: IPoIB Performance - ESXi 5.1 U1
                    ingvar_j

                    The go-to-place for iperf is

                    http://iperf.fr/

                    (win/linux/macOS/solaris and source code download)

                    There is no ESXi version available, so that's why we ran our perf tests from a VM guest instead.

                    An iperf test always takes two boxes: one running the server and the other the client.

                    Server side:

                    iperf -s

                     

                    Client side

                    iperf -c <ip-addr-to-server>

                    You can add the following

                      -t 100          run for 100 seconds

                      -i 5            print the result every 5 seconds

                      -d              run a bidirectional test

                      -P 2            run 2 parallel streams (pick any number; the default is 1)

                     

                    Remember to turn off / adjust the firewall settings to allow incoming traffic on port 5001.
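                    On a CentOS 6-era guest that means iptables (no firewalld yet); the server address below is just a placeholder:

```shell
# On the iperf server VM, allow TCP 5001 (add a udp rule too if you test with -u):
iptables -I INPUT -p tcp --dport 5001 -j ACCEPT
# Then from the client, a combined run using the options above:
iperf -c 10.0.0.20 -t 100 -i 5 -P 2
```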

                     

                    Linux

                    If you run Centos/Fedora style of distro, add the EPEL repository and install iperf from there.

                    http://www.rackspace.com/knowledge_center/article/installing-rhel-epel-repo-on-centos-5x-or-6x

                     

                    There you can also find "nload", which is a nice tool for just checking the current I/O performance.

                    Start nload with

                    nload ib0 (if a physical box using IPoIB)

                    or

                    nload eth0 (if a vmGuest using eth0 as first nic)

                     

                    For our VMware setup, we have a blade chassis system with built-in Mellanox InfiniScale-IV QDR switches and a 4036 on top.

                    The cabling is QDR over QSFP.

                    The HCAs are Mellanox MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]

                    The ESXi hosts have 2 CPUs with 8 cores / 2 threads each - a total of 32 logical processors (E5-2670).

                    ESXi 5.1.0 build 799733

                    IPoIB nics configured as uplinks to a dvSwitch

                     

                    The VM guests are 2 vCPU / 2GB RAM, installed with CentOS 6.4 64-bit.

                    Just let me know if there are any other VMware-specific settings/values to compare. We just installed the Mellanox OFED for VMware according to the manual.

                      • Re: IPoIB Performance - ESXi 5.1 U1

                        Hi Ingvar

                         

                        I really appreciate you taking the time out to post what you have above - thanks, I will try this later once I finish work for the day and come back with my results.

                         

                        I will use the same distro as yourself (good choice btw :-) ) but I will not be using a distributed switch, though that should not make any difference.

                         

                        I will post again later.

                         

                        Thanks

                        • Re: IPoIB Performance - ESXi 5.1 U1

                          Ok, now that's interesting - using "iperf" I am seeing the following performance:

                           

                          # iperf -c 10.0.0.20

                          Interval 0.0-10.0 sec

                          Transfer: 9.12 GBytes

                          Bandwidth: 7.83 Gbits/sec

                           

                          # iperf -c 10.0.0.20 -G

                          Interval: 0.0-10.0 sec

                          Transfer: 9.08 GBytes

                          Bandwidth: 0.91 GBytes/sec

                           

                          # iperf -c 10.0.0.20 -M

                          Interval: 0.0-10.0 sec

                          Transfer: 9330 MBytes

                          Bandwidth: 933 MBytes/sec
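                          One unit subtlety worth knowing when reading these numbers (my observation): iperf reports the transfer in binary GBytes (2^30) but the bandwidth in decimal Gbits (10^9), which is why 9.12 GBytes over 10 sec shows as 7.83 Gbits/sec rather than the naive 9.12 x 8 / 10 = 7.30:

```shell
# 9.12 binary GBytes over 10 s, expressed in decimal Gbits/s:
awk 'BEGIN { printf "%.2f\n", 9.12 * 1073741824 * 8 / 10 / 1e9 }'   # prints 7.83
```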

                           

                          This is between two independent ESXi 5.1 hypervisors hosting two independent VMs:

                           

                          VM #1 (Server) - CentOS x64 6.4 // 2 vCPUs (4GHz each) // 2GB RAM (DDR3 PC3-10666C9 1333MHz Dual Channel)

                          VM #2 (Client) - CentOS x64 6.4 // 2 vCPUs (3.2GHz each) // 2GB RAM (DDR3 PC3-10666C9 1333MHz Dual Channel)

                           

                          ESXi Config:

                           

                          Dedicated vSwitch with Virtual Machine Network (no routing) containing vmnic_ib0 20000 Full MTU=2044

                          ESXi 1 & 2 ConnectX HCA CX4 cables into Voltaire GridDirector ISR9024D (1xUplink per ESXi host)

                           

                          Not sure what kind of performance ceiling I should expect from this kind of setup / test, however - it *looks* about right, but I'm disappointed with the disk performance - I need to work more on that. What kind of performance did you get with your NAS? How many spindles / what speeds / sizes etc.? (if you don't mind me asking) - thanks!

                            • Re: IPoIB Performance - ESXi 5.1 U1
                              ingvar_j

                              In the first iperf test run you got 7.83 Gb/s, which isn't that bad.

                              The other two tests using the "-M" and "-G" switches I haven't tried (-M = TCP_MAXSEG size), but what is the "-G" option?

                               

                              We have two IB ports from each ESXi host connected to the switches; I'm unsure if it helps to speed things up when just running one VM guest on the host (standard Round Robin set-up on the dvSwitch uplinks). I guess all traffic from a single test / TCP connection will use the same vmknic.

                               

                              For disk access in this setup/test we actually used an NFS mount to a NAS with 12 disks and 2*10Gb Ethernet interfaces, connected via an MX6036 with an Ethernet-to-IB gateway. So it was not connected directly with IB interfaces.

                              When doing the read test (described in my previous post: the 'find ... cat >/dev/null' command) from a VM guest we got about 3-4 Gb/s. I think we hit the performance ceiling of this specific NAS. We could have run IOmeter of course to verify the speed, but currently we have no Windows boxes installed for this test (never played with the Linux version of IOmeter).

                               

                              Another option would be to run an NFS NAS server on IPoIB (don't have one) or SRP (neither) to speed things up.

                              I guess SRP would be the best option since it does not use the IP stack at all, just RDMA directly to the target. I have no experience at all with SRP, but it is mentioned in the release notes for 1.8.1.

                               

                              Anyone else with experience of SRP on VMware willing to share?