High-performance scientific applications typically require the lowest possible latency so that parallel processes stay as closely synchronized as possible. In the past, this requirement drove the adoption of SMP machines, in which the floating-point elements (CPUs, GPUs) were placed on the same board wherever possible. As demand for compute capability has grown, and as lower adoption costs have made large-scale HPC more widely available, clustering has become the preferred architecture for high-performance computing.
We introduce and explore some of the latest advancements in high-speed networking and suggest new usage models that leverage these technologies to meet the requirements of today’s demanding applications. The recently launched Mellanox Connect-IB™ InfiniBand adapter introduced a novel, scalable architecture for high-performance clusters. The architecture was designed from the ground up to provide high performance and scalability for the largest supercomputers in the world, today and in the future.
The device includes a new network transport mechanism called Dynamically Connected Transport™ Service (DCT). DCT was invented to deliver the capabilities of the Reliable Connection transport (the service that provides many of InfiniBand’s advanced capabilities, such as RDMA, large message sends, and low-latency kernel bypass) at an unlimited cluster size. We will also discuss optimizations for MPI collective communications, which are frequently used for process synchronization, and show how their performance is critical for scalable, high-performance applications.
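To give a rough feel for why connection state matters at scale, here is a back-of-the-envelope sketch in Python. It is not taken from the presentation. It assumes a full-mesh job in which every process keeps one dedicated reliable connection to every other process, and it compares that with a small fixed pool of dynamically connected endpoints per process. The function names and the pool size of 4 are hypothetical and chosen only for illustration.

```python
def rc_connections_per_process(nodes: int, procs_per_node: int) -> int:
    """Connections one process holds if it keeps a dedicated reliable
    connection to every other process in the job (assumed full mesh)."""
    return nodes * procs_per_node - 1


def dc_endpoints_per_process(pool_size: int = 4) -> int:
    """Endpoints one process holds with a dynamically connected transport.
    The pool size is a hypothetical constant, independent of job size."""
    return pool_size


if __name__ == "__main__":
    procs_per_node = 16  # assumed value, for illustration only
    for nodes in (16, 1024, 16384):
        rc = rc_connections_per_process(nodes, procs_per_node)
        dc = dc_endpoints_per_process()
        print(f"{nodes:>6} nodes: per-connection model = {rc:>7}, "
              f"dynamic model = {dc}")
```

Under these assumptions, the per-connection count grows linearly with the number of processes in the job, while the dynamically connected count stays constant. That is the scaling property the paragraph above refers to.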
The presentation posted on this blog was delivered at the NCAR International Computing for the Atmospheric Sciences Symposium (iCAS2013).