NVIDIA will present "Efficient Communication in GPU Clusters with GPUDirect Technologies," covering GPUDirect Async, jointly developed by Mellanox and NVIDIA. The talk will be held at IEEE Hot Interconnects at the Huawei North America Headquarters in Santa Clara, CA on Friday, August 26, 2016 at 1:30pm.
Discrete GPUs have become ubiquitous in computing platforms. State-of-the-art GPUs are connected to a compute node via the PCI Express bus and have dedicated on-board high-bandwidth memory. Efficiently feeding data to the GPU and streaming results out of it is critical to maximizing the utilization of the GPU's compute resources. GPUDirect is a family of technologies that allows peer GPUs, CPUs, third-party network adapters, solid-state drives, and other devices to directly read and write GPU device memory. These technologies eliminate unnecessary memory copies and dramatically reduce the CPU overhead involved in moving data to and from GPU device memory, which can yield significant improvements in communication performance for applications. GPUDirect Async, the most recent addition to the suite, also allows a GPU to trigger operations performed by a third-party I/O device and poll for their completion. This tutorial provides an overview of the GPUDirect family of technologies, then goes into the details of each one: GPUDirect Peer-to-Peer, GDRCopy, GPUDirect RDMA, and GPUDirect Async. We present the capabilities in NVIDIA GPU hardware and software that enable these technologies, detail the user-mode and kernel-mode APIs that let developers add GPUDirect capabilities to communication libraries and network drivers, and discuss how node architecture can affect the performance of GPUDirect technologies.
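To give a feel for the programming model, here is a minimal sketch of GPUDirect Peer-to-Peer using the public CUDA runtime API. The device ordinals (0 and 1) and buffer size are illustrative assumptions, not taken from the talk; the example assumes two GPUs that are P2P-capable with each other (e.g. on the same PCIe root complex).

```cuda
// Sketch: direct GPU-to-GPU copy via GPUDirect Peer-to-Peer (CUDA runtime API).
// Device ordinals 0 and 1 are assumptions for illustration.
#include <cuda_runtime.h>
#include <stdio.h>

#define CHECK(call) do {                                      \
    cudaError_t err = (call);                                 \
    if (err != cudaSuccess) {                                 \
        fprintf(stderr, "CUDA error: %s\n",                   \
                cudaGetErrorString(err));                     \
        return 1;                                             \
    }                                                         \
} while (0)

int main(void) {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { printf("Need at least two GPUs.\n"); return 0; }

    // Check whether GPU 0 can directly access GPU 1's memory.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    if (!canAccess) { printf("P2P not supported between GPUs 0 and 1.\n"); return 0; }

    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary
    float *src = NULL, *dst = NULL;

    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));  // map GPU 1's memory into GPU 0's address space
    CHECK(cudaMalloc((void **)&src, bytes));

    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));
    CHECK(cudaMalloc((void **)&dst, bytes));

    // The copy moves data directly between the two GPUs over PCIe,
    // without staging through host memory.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    printf("Peer-to-peer copy complete.\n");
    return 0;
}
```

GPUDirect RDMA and GPUDirect Async follow the same philosophy but involve a third-party device (such as a Mellanox HCA) reading or writing GPU memory directly, which requires support in the network driver rather than a single runtime call.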