What Happened to the Good Old RFC2544?


    Compromising on the basics has resulted in broken data centers…

    After the Spectrum vs. Tomahawk Tolly report was published, people asked me:

    “Why was this great report commissioned to Tolly? Isn’t there an industry benchmark where multiple switch vendors participate?”

    So, the simple answer is: No, unfortunately there isn’t…

    Up until about 3 years ago, Nick Lippis and Network World ran an “Industry Performance Benchmark”.

    These benchmarks were conducted by a neutral third party, and different switch vendors used to participate and publish reports showing how great their switches were and how they passed RFC 2544, 2889, 3918 and so on.


    Time to check which switch you plan to use in your data center!!!

    Since the release of Trident2, which failed the very basic RFC 2544 test (it lost 19.8% of packets when tested with small packets), these industry reports seem to have vanished. It is as if no one wants to publish an RFC 2544 benchmark anymore. No wonder, when the tests keep failing.

    The questions you really need to ask are the following:

    • Why is it that RFC 2544, which was established to test switches and verify they don’t lose packets, is all of a sudden being “forgotten”?
    • Is the Ethernet community lowering its standards because it has become too hard to keep up with the technology?
    • Has it become difficult to build 40GbE and 100GbE switches running at wire speed for all packet sizes and based on modern, true cut-through technology?

    The answer to all these questions is simple: RFC 2544 is as important as ever and is still the best way to test a data center switch. Yes, it is hard to build a state-of-the-art switch, but that is exactly why RFC 2544 matters more than ever: there are more small packets in the network than ever (request packets, control packets, cellular messages, SYN attacks…), and zero packet loss was, and still is, essential for your Ethernet switches.

    Here is how Arista defined RFC 2544 before abandoning it:

    “RFC 2544 is the industry leading network device benchmarking test specification since 1999, established by the Internet Engineering Task Force (IETF). The standard outlines methodologies to evaluate the performance of network devices using throughput, latency and frame loss. Results of the test provide performance metrics for the Device Under Test (DUT). The test defines bi-directional traffic flows with varying frame size to simulate real world traffic conditions.”

    And indeed, the older Fulcrum-based 10GbE switch passed these tests: http://www.arista.com/media/system/pdf/LatencyReport.pdf

    A simple web search will provide you with numerous articles defining the importance of running RFC 2544 before choosing a switch.
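    To make the methodology concrete, here is a minimal sketch, in Python, of the RFC 2544 throughput procedure: for each standard frame size, binary-search for the highest offered load the device forwards with zero frame loss. The send_trial() function is a hypothetical stand-in for a real traffic-generator API (Ixia, Spirent, TRex and the like), and the simulated loss behavior is purely illustrative:

    ```python
    # Standard RFC 2544 frame sizes, in bytes.
    FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]

    def send_trial(frame_size: int, load_pct: float) -> int:
        """Hypothetical stand-in for a traffic-generator trial: send bidirectional
        traffic at load_pct percent of line rate and return the number of frames
        lost. Here we simply pretend the DUT drops 64B frames above 80% load."""
        limit = 80.0 if frame_size == 64 else 100.0
        return 0 if load_pct <= limit else 1_000

    def rfc2544_throughput(frame_size: int, resolution: float = 0.1) -> float:
        """Binary-search for the highest zero-loss load (percent of line rate)."""
        low, high, best = 0.0, 100.0, 0.0
        while high - low > resolution:
            load = (low + high) / 2
            if send_trial(frame_size, load) == 0:
                best, low = load, load      # no loss: push the load higher
            else:
                high = load                 # loss observed: back off
        return best

    for size in FRAME_SIZES:
        print(f"{size}B frames: zero-loss throughput = {rfc2544_throughput(size):.1f}% of line rate")
    ```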

    While working on this blog, I ran into a performance report, sponsored by one of the big server vendors, for a switch using the Broadcom Tomahawk ASIC. They worked hard to make a failed RFC 2544 look okay: with a very specific port connectivity and carefully chosen packet sizes, and with only 8 ports (out of 32) connected, RFC 2544 failed only with 64-byte packets, and even the mesh test passed. What is a data center customer to conclude from this report? That one should buy a 32-port switch and use only 8 of its ports? Sponsoring such a report clearly shows that RFC 2544, 2889 and 3918 are still important when making a switch decision. I definitely agree: these tests were established to help customers buy the best switches for their data centers.

    So, how has the decline in RFC 2544 testing resulted in unfair clouds?

    Not surprisingly, once the market accepted the packet loss first introduced by Trident2, things have not improved. In fact, they have gotten worse.

    Building a 100GbE switch is harder than building a 40GbE switch, and the compromises keep growing. The 19.8% switch packet loss has soared to 30%, and the range of packet sizes being lost has grown as well.
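    To put a number on that difficulty, consider the packet rate a single port must sustain at wire speed. The back-of-the-envelope calculation below assumes only the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap):

    ```python
    def wire_speed_pps(link_gbps: float, frame_bytes: int) -> float:
        """Packets per second one port must forward at wire speed; every frame
        carries 20 extra bytes on the wire (preamble + SFD + inter-frame gap)."""
        return link_gbps * 1e9 / ((frame_bytes + 20) * 8)

    for link in (40, 100):
        for frame in (64, 1518):
            print(f"{link}GbE, {frame}B frames: {wire_speed_pps(link, frame) / 1e6:.1f} Mpps per port")

    # 100GbE with 64B frames works out to ~148.8 Mpps per port, so a 32-port
    # switch has to forward close to 4.8 billion packets per second to stay lossless.
    ```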

    Moreover, a single switch ASIC now comprises multiple cores, which brings a new type of compromise. When an ASIC is built out of multiple cores, not all ports are equal. What does this actually mean? It means that nothing is predictable any longer. The behavior depends on which ports are allocated to which buffers (yes, it is not a single shared buffer anymore), and on the hardware layout, which defines which ports are assigned to which switch cores. To make it simple: two users connected to two different ports do not get the same bandwidth. For more details, read the Spectrum Tolly report.
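    Here is a toy model of that effect. It does not describe any real ASIC's internals; the per-core capacity and the port-to-core layout below are invented purely for illustration:

    ```python
    PORT_SPEED_GBPS = 100
    CORE_CAPACITY_GBPS = 200                          # assumed per-core forwarding capacity
    PORT_TO_CORE = {0: "A", 1: "A", 2: "A", 3: "B"}   # assumed port-to-core layout

    def per_port_bandwidth(active_ports):
        """Split each core's capacity evenly among its active ports (toy model)."""
        shares = {}
        for core in set(PORT_TO_CORE.values()):
            members = [p for p in active_ports if PORT_TO_CORE[p] == core]
            for p in members:
                shares[p] = min(PORT_SPEED_GBPS, CORE_CAPACITY_GBPS / len(members))
        return shares

    print(per_port_bandwidth({0, 1, 2, 3}))
    # Ports 0-2 end up with ~66.7 Gb/s each while port 3 gets the full 100 Gb/s,
    # even though all four "customers" bought identical 100GbE ports.
    ```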

    The latest Broadcom-based switch Tolly report was released three weeks after the original Tolly report was issued. It attempted to “answer” the RFC 2544 failure, but nowhere did it refute the fairness issue. It is hard to explain why two ports connected to the same switch provide different levels of service. In one test, the results showed 3% vs. 50% of the available bandwidth. So, this means you have one customer who is very happy and another who is very unhappy. But the second customer would only be unhappy if he knew the facts, right? Has anyone told the unhappy customer that the SLA is broken? Probably not.
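    One common way to put a single number on such a split is Jain's fairness index, where 1.0 means a perfectly even share and 1/n means one port gets everything. A quick calculation with the 3% vs. 50% figures above:

    ```python
    def jain_index(shares):
        """Jain's fairness index: 1.0 is perfectly fair, 1/n is maximally unfair."""
        return sum(shares) ** 2 / (len(shares) * sum(x * x for x in shares))

    print(f"{jain_index([3, 50]):.2f}")    # ~0.56 for the 3% vs. 50% split
    print(f"{jain_index([50, 50]):.2f}")   # 1.00 when both ports get an equal share
    ```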

    Bottom line:

    Has compromising on the basics truly benefitted end customers and proven worthwhile? Are they really happy with the additional, worsening compromises made to build switches around faster switch ASICs, which, as all can see, are undergoing multiple ASIC revisions and at the end of the day are yielding delayed, compromised, packet-losing 100GbE data centers? One should think not!

    Meanwhile…

    Mellanox Spectrum™ runs at line rate at all packet sizes. The solution supports true cut-through switching; has a single shared buffer; consists of a single, symmetrically balanced switch core; offers the world’s lowest power consumption; and runs MLNX-OS® and Cumulus Linux, with more network operating systems coming…

    So, stop compromising and get Mellanox Spectrum for your data center today!!!

     

    A Final Word About Latency
    Note that this report also uses a methodology to measure latency that is unusual at best and bordering on deceptive. It is standard industry practice to measure latency from the first bit into the switch to the first bit out (FIFO). By contrast, here they took the unusual approach of using a last-in, first-out (LIFO) latency measurement methodology. Using LIFO measurements has the effect of dramatically reducing reported latencies, but unlike normal FIFO measurements, the results are not particularly enlightening or useful. For example, you cannot simply add latencies to get the end-to-end result through a multi-hop environment. Additionally, for a true cut-through switch such as the Mellanox Spectrum, using LIFO measurements would actually produce negative latency numbers, which clearly doesn’t make sense. The only reason to use these non-standard LIFO measurements is to obscure the penalty caused by switches unable to perform cut-through switching and to reduce the otherwise very large reported latencies that result from store-and-forward switching.
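    The difference between the two methodologies is easy to show with a small worked example. The numbers below are illustrative only, not taken from the report: a 1518-byte frame takes roughly 123 ns to serialize at 100GbE, and the 800 ns store-and-forward delay and 100 ns cut-through latency are assumed values:

    ```python
    # All timestamps in nanoseconds, measured at the switch ports.
    def latencies(first_bit_in, last_bit_in, first_bit_out):
        fifo = first_bit_out - first_bit_in   # standard first-in, first-out latency
        lifo = first_bit_out - last_bit_in    # the report's last-in, first-out latency
        return fifo, lifo

    SERIALIZATION_NS = 123   # ~(1518 + 20) * 8 bits / 100 bits-per-ns at 100GbE

    # Store-and-forward switch: it cannot start sending until the whole frame is in
    # (an assumed 800 ns of internal delay after the last bit arrives).
    print(latencies(0, SERIALIZATION_NS, SERIALIZATION_NS + 800))   # (923, 800)

    # Cut-through switch: the first bit leaves after ~100 ns, before the last bit
    # has even arrived, so the LIFO figure goes negative.
    print(latencies(0, SERIALIZATION_NS, 100))                      # (100, -23)
    ```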