One of my favorite lines came from a coworker who was asked about the future of InfiniBand:
“What we are seeing is a combination of technologies… you could almost call it Infini-Net or Ether-Band”
Which brings me to a question that I hear from customers again and again: “How is InfiniBand staying so strong in an Ethernet-dominated world?” or “Why do I still need to consider InfiniBand when Ethernet is catching up in latency and bandwidth?”
There are biased and unbiased opinions on both sides of that discussion, but the resounding answer is: “Mellanox is leading the convergence of the two technologies while keeping both on a definitive roadmap with no end in sight.”
Here are some trends in the interconnect industry and how they affect the future of InfiniBand:
- The TOP500 Super Computers in the High Performance Computing arena…
- Oracle has adopted InfiniBand as its de-facto backplane for all Exadata, Exalogic, and Exalytics systems, and more
- Unnamed High Frequency Traders are still using InfiniBand due to its ability to scale with Multicast better than Ethernet in large deployments
- Clouds are being deployed en masse with InfiniBand since it is the only technology that can scale to hundreds and thousands of nodes with true, native multipathing and scalability at layer 2
- Storage vendors are seeing the end of the road with Fibre Channel, while at the same time, FCoE does not seem to be taking off like some had predicted
- Intel validated the future of InfiniBand by acquiring the assets of QLogic
- Microsoft placed RDMA drivers inside Windows Server 2012
So if InfiniBand and Ethernet are converging on bandwidth, and the gap between their switch chips in port-to-port latency and jitter is narrowing, what does InfiniBand still provide that Ethernet does not?
This is an interesting question, but let’s first make one major point: Mellanox offers both technologies (Ethernet and InfiniBand) on every adapter (ConnectX-3) and on our new switch line based on the SwitchX-2 ASIC. We see true value in Ethernet, and we are the only company in the world that subscribes to the concept of Virtual Protocol Interconnect, or VPI, aka “Choose your own L1/L2 transport that is right for the job.”
InfiniBand as a Key Advantage:
So what is InfiniBand, and why are more and more storage vendors moving to it for both the backplane and the network connection? InfiniBand is a standards-based protocol that came into existence circa 2000. It was a merger of two technologies, NGIO and Future I/O, which were competing to be the PCI bus replacement technology. By design, InfiniBand has the characteristics of a bus technology. In fact, PCI Express, the eventual PCI replacement technology, is conceptually a subset of InfiniBand.
InfiniBand’s core differentiator is twofold. First, it uses a credit-based flow control system. This means that data is never sent unless the receiver can guarantee sufficient buffering, which makes InfiniBand a lossless fabric. Second, InfiniBand natively supports Remote Direct Memory Access (RDMA), the ability to move data between memory regions on two remote systems in a manner that fully offloads the CPU and operating system. A legacy of InfiniBand’s original bus design, RDMA is critical to distributed systems. InfiniBand with RDMA enables a number of key advantages.
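The credit-based flow control described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Receiver` class, the `run_link` function, and the tick-based loop are hypothetical names invented for illustration, and real InfiniBand links track credits per virtual lane in hardware. The point it shows is that a sender that transmits only against available credits can never overrun the receiver, so nothing is dropped.

```python
from collections import deque


class Receiver:
    """Toy receiver that offers one credit per free buffer slot."""

    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.buffer_slots = buffer_slots

    def credits_available(self):
        # A credit is a promise that one more packet can be buffered.
        return self.buffer_slots - len(self.buffer)

    def accept(self, packet):
        # Cannot fail as long as the sender respects the credits.
        assert len(self.buffer) < self.buffer_slots, "buffer overrun"
        self.buffer.append(packet)

    def drain(self, count):
        # Consuming packets frees slots, which hands credits back.
        drained = []
        for _ in range(min(count, len(self.buffer))):
            drained.append(self.buffer.popleft())
        return drained


def run_link(packets, receiver, drain_per_tick):
    """Send only while credits remain; wait (never drop) otherwise."""
    if drain_per_tick < 1:
        raise ValueError("receiver must drain at least one packet per tick")
    pending = deque(packets)
    delivered = []
    peak_occupancy = 0
    while pending or receiver.buffer:
        credits = receiver.credits_available()
        while pending and credits > 0:
            receiver.accept(pending.popleft())
            credits -= 1
        peak_occupancy = max(peak_occupancy, len(receiver.buffer))
        delivered.extend(receiver.drain(drain_per_tick))
    return delivered, peak_occupancy


delivered, peak = run_link(list(range(10)), Receiver(buffer_slots=4), drain_per_tick=1)
print(delivered, peak)
```

Even though the sender has ten packets ready and the receiver drains only one per tick, every packet arrives in order and the buffer never holds more than its four slots.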
InfiniBand physical signaling technology has always stayed well ahead of other network technologies, allowing the greatest bandwidth of any networking protocol. InfiniBand today runs at 56Gb/s, with a roadmap to get to EDR (100Gb/s) in the not too distant future. The name InfiniBand itself is a reference to the bandwidth promise. The InfiniBand roadmap is deliberately designed to guarantee that the bandwidth of a single link will remain greater than the data rate of the PCI Express bus. This allows the system to move data over the network as fast as it can possibly generate it, without ever backing up due to a network limitation. This effectively makes the bandwidth of InfiniBand… Infinite.
Although bandwidth is probably the best-known property of InfiniBand, the benefits of RDMA actually result in more performance gain for most storage applications. InfiniBand’s ability to bypass the operating system and CPU using RDMA allows much more efficient data movement paths. The operating system is responsible for managing all resources of the system, including access to the CPU and I/O devices. Traffic on the normal data path for protocols like TCP, UDP, NFS, and iSCSI has to wait in line with the other applications and system processes to get its turn on the CPU. This not only slows the network down, it also uses system resources that could be used for executing jobs faster.
The RDMA bypass allows the data path for InfiniBand traffic to skip the lines. Data is placed immediately when it is received, without being subject to variable delays based on CPU load. This has three effects. First, there is no waiting, so the latency of transactions is incredibly low. Raw RDMA ½ RTT latency is sub-microsecond. Second, because there is no contention for resources, the latency is consistent. Third, skipping the OS with RDMA saves a large number of CPU cycles. With a more efficient system, those saved CPU cycles can be used to accelerate application performance.
How has Ethernet become more like InfiniBand to fill those niches?
Ethernet has traditionally amended its feature set with new concepts or adopted the concepts of other technologies. What has emerged over the last few years is data center bridging (DCB), with the introduction of priority flow control and congestion notification. Both are adopted from InfiniBand’s credit flow system and service level/QoS capabilities. RoCE was introduced to allow RDMA over Converged Ethernet, so applications can enjoy faster network message transfer than sockets-based applications. OpenFlow is being developed to create ‘dumb switches’ where the routing decisions are made by a single, topology-aware entity. Cut-through and lossless chips with shallow buffers are being introduced to the market to provide clean networks in which servers manage their own congestion instead of turning the Ethernet switch into a temporary storage device. TRILL is being developed as a next-generation answer to LAG/LACP and spanning tree, but it is not yet a standard for building the truly large, scale-out designs that InfiniBand has enjoyed with its native multipathing and automatic path migration (APM) for highly available links.
So where does Ethernet still lag behind InfiniBand?
InfiniBand still has the upper hand on Ethernet, even though Ethernet is treated as the standard for local area networking. Ethernet lacks InfiniBand’s destination-based routing, which removes the need to broadcast ARPs and so reduces overhead on the network. The InfiniBand Subnet Manager (SM) predetermines and programs routes, so there is no need for discovery at a regular interval, only when hosts join or leave the fabric. The SM does not participate in the data flow, which makes packet forwarding in the L2 domain more efficient. New Ethernet bandwidth standards take many years to agree upon and implement, while the IBTA, the InfiniBand Trade Association, has an endless roadmap of continued bandwidth increases, including the current Fourteen Data Rate (FDR @ 56Gb/s) and the upcoming EDR and HDR speeds. Enhanced Transmission Selection and Traffic Classes are still confusing and not as robust as the Virtual Lanes (VLs) and mapped Service Levels that have given InfiniBand true Quality of Service (QoS) since day one.
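To make the Subnet Manager idea more concrete, here is a minimal sketch of centrally computed, destination-based forwarding. It assumes a tiny made-up topology and uses a plain breadth-first search; the function names, node names, and address values are hypothetical, and real subnet managers use more elaborate routing algorithms. What it illustrates is that routes are computed once, up front, and each switch afterwards only looks up the destination address in its table.

```python
from collections import deque


def compute_forwarding_tables(links, host_lids):
    """Compute per-switch tables mapping destination address -> next hop.

    links: dict of node name -> list of neighbour names.
    host_lids: dict of host name -> destination address (hypothetical values).
    Any node that is not a host is treated as a switch.
    """
    tables = {node: {} for node in links if node not in host_lids}
    for host, lid in host_lids.items():
        # Search outward from the destination; the node from which a switch
        # is first reached is that switch's next hop towards the host.
        visited = {host}
        queue = deque([host])
        while queue:
            current = queue.popleft()
            for neighbour in links[current]:
                if neighbour in visited:
                    continue
                visited.add(neighbour)
                if neighbour in tables:
                    tables[neighbour][lid] = current
                    queue.append(neighbour)
                # Hosts do not forward traffic, so they are not expanded.
    return tables


def trace(tables, first_switch, lid):
    """Follow table lookups hop by hop until a non-switch node is reached."""
    hops = [first_switch]
    while hops[-1] in tables:
        hops.append(tables[hops[-1]][lid])
    return hops


# Hypothetical fabric: host a on switch s1, host b on switch s2, s1 linked to s2.
links = {"a": ["s1"], "b": ["s2"], "s1": ["a", "s2"], "s2": ["b", "s1"]}
host_lids = {"a": 1, "b": 2}
tables = compute_forwarding_tables(links, host_lids)
print(tables)
print(trace(tables, "s1", 2))
```

In this model the tables only need to be recomputed when `links` or `host_lids` change, which mirrors the point that rediscovery is needed only when hosts join or leave the fabric.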
How is Mellanox accelerating the convergence of Ethernet and InfiniBand?
Mellanox has been working to develop robust products that put the network administrator in control of which technology to deploy, at a moment’s notice. VPI, or Virtual Protocol Interconnect, means choosing any port on any device, switch or adapter, and configuring its speed and protocol. For example, today one might want 10Gb/s Ethernet, and tomorrow the administrator might choose to reconfigure that port and adapter to 56Gb/s InfiniBand based on a new application.
Additionally, InfiniBand has moved from 8b/10b encoding to 64b/66b encoding, the same scheme Ethernet uses. This cuts the encoding overhead of control bits from 20% on QDR InfiniBand to about 3%, which gives nearly line-rate performance on FDR today.
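The effect of the encoding change is easy to check with a few lines of arithmetic. This sketch counts only the line-code bits and ignores every other source of protocol overhead; the function names are made up for illustration, the 40Gb/s figure for a QDR link is an assumption not stated in this post, and 56Gb/s is the nominal FDR figure used above.

```python
def line_code_overhead(payload_bits, coded_bits):
    """Fraction of transmitted bits that carry no payload."""
    return (coded_bits - payload_bits) / coded_bits


def effective_rate_gbps(signal_rate_gbps, payload_bits, coded_bits):
    """Payload rate left over after the line code takes its share."""
    return signal_rate_gbps * payload_bits / coded_bits


overhead_8b10b = line_code_overhead(8, 10)      # encoding used by QDR
overhead_64b66b = line_code_overhead(64, 66)    # encoding used by FDR

qdr_data_rate = effective_rate_gbps(40.0, 8, 10)    # assumed 40Gb/s signaling
fdr_data_rate = effective_rate_gbps(56.0, 64, 66)   # nominal 56Gb/s signaling

print(f"8b/10b overhead:  {overhead_8b10b:.1%}")
print(f"64b/66b overhead: {overhead_64b66b:.1%}")
print(f"QDR payload rate: {qdr_data_rate:.2f} Gb/s")
print(f"FDR payload rate: {fdr_data_rate:.2f} Gb/s")
```

With 8b/10b, two of every ten bits on the wire are encoding, so a fifth of the signaling rate is lost. With 64b/66b, only two of every 66 bits are encoding, which is why the payload rate ends up so close to the line rate.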
Lastly, Mellanox has been driving the signal rate to stay ahead of PCIe bus width, so that more virtual machines can run per server, a database can handle more queries and writes per second, and more blocks of storage can be written per second, all of which makes Fibre Channel look even more antiquated.
The Future of InfiniBand…
InfiniBand is still the most scalable, performance-oriented interconnect technology in the market today. Mellanox believes in the future of both InfiniBand and Ethernet, and ultimately will make the interconnect transparent to the user.