

11 Posts authored by: scotschultz

Yesterday was a full day of sessions, including a keynote from Pavel Shamis of Oak Ridge National Laboratory on OpenSHMEM for Exascale and a second keynote from DK Panda of OSU on HPC programming models for Exascale systems.  Day one also covered a wide variety of topics, from deploying high-performance clouds to how GPUs are being leveraged in machine learning.


Day two at the HPC Advisory Council event at Stanford University! These events are always packed with great sessions covering trends in High Performance Computing from every angle: traditional HPC usage, the latest in liquid cooling, accelerating Big Data, and even who is using HPC in the enterprise.  The HPC Advisory Council events are also a great place to learn how technology is being leveraged in new and exciting ways.  There is something for everyone, even if you think "HPC doesn't really apply to me..."


Richard Graham from Mellanox opened today's sessions with a keynote on the new capabilities and features Mellanox has introduced in its latest 100Gb/s interconnect technology: ConnectX-4, Switch-IB and LinkX.  Mellanox EDR 100Gb/s solutions were selected by the DOE for its 2017-2018 leadership systems on the strength of their performance and scalability relative to current and future competition, along with their application offloads and management capabilities.  He also explained how Mellanox was instrumental in bringing advanced capabilities to meet the requirements for CORAL, including acceleration techniques with other types of hardware such as OpenPOWER and GPUs.



In today's second keynote, Arno Kolster from eBay / PayPal explained how they use advanced HPC in their everyday business for fraud detection and prevention, advanced pattern recognition, and addressing the new challenges that face e-commerce.



Later today, we will also hear from Antonis Karalis of the HPC|Music project, a new area of research showcasing how HPC will be used to usher in a new era of music production.  Audio and music are a huge part of everyday life.  Unlike video and cinema, where advancements in technology are paramount to driving the industry, audio production has a different set of challenges that depend more on real-time data processing, are extremely sensitive, and require low-latency communications.  This is especially true as we begin to model more realistic analog-type sounds at extreme resolution, which just cannot be done on today’s most expensive workstations.


Head over to the HPC Advisory Council site to see where the next workshop will be held. I look forward to seeing you at one of them soon!

Mellanox is participating in the 2014 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IPDPS is an international forum for engineers and scientists from around the world to present their latest research findings in all aspects of parallel computation.


The 28th IEEE-IPDPS is being held in Phoenix at the Arizona Grand Resort.  Exascale levels of computing pose many system and application-level computational  challenges.  Recent technology improvements significantly improve InfiniBand's scalability, performance, and ease of use.



Richard Graham, Senior Solutions Architect at Mellanox, gave a presentation on communication middleware architecture issues as they relate to extreme-scale computing.  Topics included the latest advancements in InfiniBand, such as the Connect-IB architecture, Dynamically Connected Transport, the CoreDirect architecture, and On-Demand Paging, along with the solutions Mellanox is providing to address the challenges of extreme-scale computing.


After the session, Richard Graham also participated in a private panel with over fifty PhD students from various technical disciplines, discussing the industry as a whole and what they might expect as they make the transition from academia to become the next generation of industry thought leaders.

The 10th Annual OpenFabrics International Developer Workshop was held March 30-April 2, a multi-day event dedicated to the development and improvement of OpenFabrics Software (OFS).  This year's workshop theme was "Disruptive Technology": the pros and cons, and how the software must be ready to adapt as we approach a new era in high performance computing.


The Workshop kicked off Sunday, March 30, with a keynote from Dr. Thomas Sterling, Executive Associate Director and Chief Scientist at CREST. Presentations began the following morning covering topics focused on I/O for Exascale systems and Enterprise applications and included interest areas such as distributed computing, storage and data access and data analysis applications.


The 2014 Workshop presentations are available from the OpenFabrics website.


The 2nd Annual IBUG Workshop was also held  at Monterey after the International Developer Workshop.

The annual IBUG Workshop was a 2-day event with sessions on understanding, implementing, and administering OpenFabrics Software (OFS) and the underlying hardware.  By bringing together users of InfiniBand, RoCE and all the RDMA technologies bundled in the OpenFabrics Software suite, the IBUG gives them a setting to discuss the challenges and opportunities of using OFS.  A few of the topics for this year's event included virtualization of IB, tuning MPI for IB, RDMA stacks and SMC-R/RoCE updates.


Look for the videos posted on InsideHPC in the near future.

The HPC Advisory Council published a best practices paper showing record application performance for LS-DYNA® Automotive Crash Simulation, one of the automotive industry’s most compute- and network-intensive applications for automotive design and safety.  The paper can be downloaded here: HPC Advisory Council: LS-Dyna Performance Benchmark and Profiling.  The LS-DYNA benchmarks were run on a 32-node Dell™ PowerEdge R720-based cluster with networking provided by Mellanox Connect-IB™ 56Gb/s InfiniBand adapters and switch.  The results demonstrate that the combined solution delivers world-leading performance versus any other system of this size, and versus systems with larger core counts based on Ethernet or proprietary interconnects.


The TopCrunch project tracks the aggregate performance trends of high performance computer systems and engineering software.  Rather than using a synthetic benchmark, actual engineering software applications are used with real datasets and run on high performance computer systems.



The performance testing was performed at the HPC Advisory Council High Performance Center. The center provides users and vendors with a unique capability to design, develop, test and qualify solutions for the HPC market. The center, located in California, operates 24/7 and provides secure remote access to its users.

The HPC Advisory Council, together with Stanford University, is holding the HPC Advisory Council Stanford Conference and Exascale Workshop 2014 at Stanford, California.  I am attending the conference, which kicked off today with the first keynote from Mark Seager, CTO for the Technical Computing Ecosystem at Intel.  In his presentation, he discussed challenges such as extreme levels of parallelism and trends that technical computing segments should be aware of.


The presentations are being covered by InsideHPC, so if you didn't get to make it here in person, you can watch the presentations as they are posted at InsideHPC.


You may also find the second keynote of the day, "Programming Models for Exascale Systems" from Dhabaleswar K. Panda of Ohio State University, interesting.  He went into the challenges of designing runtime environments for MPI/PGAS (UPC and OpenSHMEM) programming models.  He also covered insights into GPU computing, Intel MIC, and scalable collectives.


Always an exceptional event.  Today's discussion, the "Road to Exascale" panel session, included participation from a great audience as well as industry thought leaders, including Mellanox CTO Michael Kagan, Intel CTO for the Technical Computing Ecosystem Mark Seager, DK Panda from OSU, and Alan Poston from Xyratex.


There were also presentations from Mellanox, Intel and Cray, as well as a session on the work and progress under way at Michigan Technological University.


This week I am here for a few days at George Washington University at the Dell XL user conference. GWU graciously hosted the fall event and provided a tour of their facilities.


The Colonial One HPC initiative is a joint venture between GW’s Division of Information Technology, Columbian College of Arts and Sciences and the School of Medicine and Health Sciences.


The cluster showcases performance, density, and efficiency.  Its modular architecture will support growth, allowing enhancement and expansion.  Using common hardware, the cluster accommodates a variety of hardware configurations and accelerators.


The cluster features 1,408 CPU cores and 159,744 CUDA cores in Dell C8220 and C8220x nodes interconnected with a Mellanox 56 Gb/s FDR ConnectX-3 network.  It is a very impressive collection of technology, and one of the few systems with this many GPU-enabled nodes (about 33% of the system).

The Linux Foundation is the nonprofit organization dedicated to accelerating the growth of Linux.  Enterprise companies today seek high-value solutions that deliver on integration and innovation: solutions built on a solid foundation of knowledge, Linux, and open source technologies.


Mellanox Technologies supplies InfiniBand and Ethernet interconnect solutions and services for servers and storage, delivering data faster to applications by providing high throughput, low latency and offloading technologies such as RDMA. Mellanox’s fast interconnect portfolio of adapters, switches, software, cables and silicon is well-suited to challenging computing environments such as High-Performance Computing, Web 2.0, cloud, financial services, databases and storage.


“Mellanox’s solutions for Linux servers offer world-leading performance for data center applications,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “The Linux Foundation’s long-term support for Linux, and its role in fast-tracking innovation and collaboration with members is extremely valuable to Mellanox and our customers.”


Read the press release here: Linux Foundation Welcomes New Members from Enterprise Software, Hardware and Services

High-performance scientific applications typically require the lowest possible latency to keep parallel processes in sync as much as possible.  In the past, this requirement drove the adoption of SMP machines, where the floating point elements (CPUs, GPUs) were placed as much as possible on the same board.  With the increased demand for higher compute capability, and the push to lower the cost of adoption and make large-scale HPC more widely available, we have witnessed the rise of clustering as the preferred architecture for high-performance computing.


We introduce and explore some of the latest advancements in the areas of high speed networking and suggest new usage models that leverage the latest technologies that meet the desired requirements of today’s demanding applications.   The recently launched Mellanox Connect-IB™ InfiniBand adapter introduced a novel high-performance and scalable architecture for high-performance clusters.  The architecture was designed from the ground up to provide high performance and scalability for the largest supercomputers in the world, today and in the future.


The device includes a new network transport mechanism called Dynamically Connected Transport™ Service (DCT), which was invented to provide a Reliable Connection Transport mechanism (the service that provides many of InfiniBand’s advanced capabilities such as RDMA, large message sends, and low latency kernel bypass) at an unlimited cluster size.  We will also discuss optimizations for MPI collective communications, which are frequently used for process synchronization, and show how their performance is critical for scalable, high-performance applications.
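
To see why a connection-per-peer transport becomes a scaling problem, here is a minimal sketch that counts queue pairs (QPs). It rests on simplifying assumptions that are not taken from the presentation: every process opens one Reliable Connection QP to every process on every other node, and a dynamically connected endpoint needs only a small fixed number of transport objects per process (the value 2 below is purely illustrative). All function names are hypothetical.

```python
# Back-of-the-envelope model of connection state on an InfiniBand cluster.
# Assumption: fully connected job, one RC queue pair per remote peer process.

DC_OBJECTS_PER_PROCESS = 2  # illustrative constant, not a Mellanox figure


def rc_qps_per_process(nodes: int, procs_per_node: int) -> int:
    """QPs one process needs to reach every process on all other nodes."""
    return (nodes - 1) * procs_per_node


def rc_qps_per_node(nodes: int, procs_per_node: int) -> int:
    """Total RC QPs an adapter must hold for all local processes."""
    return procs_per_node * rc_qps_per_process(nodes, procs_per_node)


def dc_objects_per_node(procs_per_node: int) -> int:
    """Transport objects per node when the count is independent of job size."""
    return procs_per_node * DC_OBJECTS_PER_PROCESS


if __name__ == "__main__":
    ppn = 16
    for nodes in (16, 1024, 16384):
        print(
            f"{nodes:>6} nodes: RC per node = {rc_qps_per_node(nodes, ppn):>9,}"
            f"   fixed-size per node = {dc_objects_per_node(ppn)}"
        )
```

Under these assumptions, the per-node RC count grows linearly with the number of nodes, while the fixed-size count depends only on the number of local processes. That growth is the scaling pressure a dynamically connected transport is meant to relieve.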


The presentation posted on this blog was delivered at the NCAR International Computing for the Atmospheric Sciences Symposium (iCAS2013).

High-performance simulations require the most efficient compute platforms. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores and their utilization factor and the interconnect performance, efficiency, and scalability. Efficient high-performance computing systems require high-bandwidth, low-latency connections between thousands of multi-processor nodes, as well as high-speed storage systems.


Mellanox has released "Deploying HPC Clusters with Mellanox InfiniBand Interconnect Solutions".  This guide describes how to design, build, and test a high performance compute (HPC) cluster using Mellanox® InfiniBand interconnect covering the installation and setup of the infrastructure including:


  • HPC cluster design
  • Installation and configuration of the Mellanox Interconnect components
  • Cluster configuration and performance testing

Over the last decade, specialized heterogeneous hardware designs, ranging from Cell through GPGPU to Intel Xeon Phi, have become a viable option in High Performance Computing, mostly because these heterogeneous architectures allow for a better flops-per-watt ratio than conventional multi-core designs.


However, the corresponding programming models for heterogeneous architectures so far remain limited. Neither offload models (like CUDA from Nvidia, OpenACC or Intel offload directives) nor native execution on the accelerator (e.g., execution on Intel Xeon Phi) is able to provide a single cohesive view of the underlying fragmented heterogeneous memory.


The upcoming GASPI standard will be able to bridge this gap in the sense that GASPI can provide partitioned global address spaces (so-called segments) that span both the memory of the host and, e.g., an Intel Xeon Phi.


Download the solution brief here :  Fraunhofer ITWM demonstrates GPI 2.0 with Mellanox Connect-IB™ and Intel® Xeon Phi™

Mellanox presented at the 9th European LS-DYNA Users' Conference this year in Manchester, UK, at the Manchester Central Convention Complex.


Darren J. Harkins, Senior Systems Engineer at Mellanox, presented at the conference, held Sunday, June 2, 2013 through Tuesday, June 4, 2013. Mellanox Technologies was also a silver-level sponsor of the event.  Experts from academia and industry, including Mellanox, presented their work to colleagues, and LS-DYNA developers talked about the latest software developments.  Mellanox presented the latest features and information on the new Connect-IB™ InfiniBand product family, and showed Connect-IB FDR InfiniBand demonstrating superior application performance running the LS-DYNA benchmarks. A summary of the presentation is below:


  • Connect-IB FDR InfiniBand demonstrates superior LS-DYNA scalability performance: up to 336% higher than 1GbE, and over 90% higher than 10GbE at 16 nodes
  • Connect-IB allows LS-DYNA to run at the highest network throughput at FDR rate and delivers ~20% higher system performance than QDR InfiniBand at 16 nodes, and the gap increases with system size
  • MPI tuning with SRQ provides better scalability
  • Speedup of 13% above the baseline at 28 nodes
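
"Percent higher" figures like those in the summary are easy to misread as multipliers. The small sketch below converts between the two, assuming each percentage is relative to the named baseline's performance. The helper names are hypothetical, and the only inputs are the percentages quoted above.

```python
def ratio_from_percent_higher(percent: float) -> float:
    """Convert 'X% higher than baseline' into a performance multiplier."""
    return 1.0 + percent / 100.0


def percent_higher(new: float, baseline: float) -> float:
    """Inverse helper: how many percent higher 'new' is than 'baseline'."""
    return (new / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Percentages quoted in the presentation summary.
    quoted = {
        "FDR vs 1GbE (up to)": 336.0,
        "FDR vs 10GbE, 16 nodes (over)": 90.0,
        "FDR vs QDR, 16 nodes (approx.)": 20.0,
        "SRQ tuning vs baseline, 28 nodes": 13.0,
    }
    for label, pct in quoted.items():
        print(f"{label}: {ratio_from_percent_higher(pct):.2f}x")
```

Read this way, "336% higher" means roughly 4.36 times the 1GbE result rather than 3.36 times, and "over 90% higher" means a little under twice the 10GbE result.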


Mellanox also submitted a white paper entitled “LS-DYNA Performance Optimizations via Connect-IB”, written by Pak Lui, Gilad Shainer and Brian Klaff.  The white paper covers the newest architecture from Mellanox, Connect-IB, and the advantages of its newest transport service, Dynamically Connected Transport™ (DCT).  Click on the link above to download the LS-DYNA Performance Optimizations via Connect-IB white paper.