NVIDIA and Global Partners Launch New Computing Platform to Accelerate Industrial AI

NVIDIA enhances its HGX AI supercomputing platform with new technologies to help accelerate industrial and scientific high-performance computing systems.

News July 09, 2021 by Stephanie Leonida

NVIDIA is a multinational technology company that designs graphics processors and accelerated computing platforms.

Recently, the company announced that it is enhancing its HGX AI supercomputing platform with new technologies that bring together artificial intelligence (AI) and high-performance computing (HPC). NVIDIA intends to make the platform more accessible and applicable to a wider range of industries.

Enhancing Industrial and Scientific HPC Systems

Lenovo, Hewlett Packard Enterprise (HPE), Dell Technologies, and Microsoft Azure are among the partners using NVIDIA’s HGX platform for next-generation AI and computing solutions.

NVIDIA’s AI supercomputing platform, the HGX AI Supercomputer. Image used courtesy of NVIDIA

NVIDIA aims to connect with industry partners by adding three key technologies to its HGX platform: the NVIDIA A100 80GB PCIe (Peripheral Component Interconnect Express) GPU (graphics processing unit), NVIDIA NDR 400G InfiniBand networking, and NVIDIA Magnum IO (input/output) GPUDirect Storage software.

General Electric (GE) is already applying the HGX platform's HPC capabilities to computational fluid dynamics (CFD) simulations that guide the design of large gas turbines and jet engines. According to NVIDIA, the platform has accelerated breakthrough CFD methods at GE, and it also powers scientific HPC systems such as the next-generation supercomputer at the University of Edinburgh.

Additions to the HGX Platform

NVIDIA InfiniBand is designed for industrial and scientific customers whose complex workloads involve large datasets and demand very fast processing of high-resolution simulations and highly parallelized algorithms.

InfiniBand is one of the first offloadable, in-network computing platforms. NVIDIA hopes the platform will help boost the performance of AI and hyperscale cloud infrastructures while reducing cost and complexity.

NVIDIA InfiniBand provides customers with self-healing network capabilities for advanced management, quality of service (QoS), enhanced virtual lane (VL) mapping, and congestion control. Combining these aspects helps to boost application throughput. The NDR InfiniBand Switch family comprises Quantum-2 fixed-configuration switches and modular switches. Based on industry standards, the switches are backward- and forward-compatible, which aids migration and expansion of existing systems and software.

Data Processing Capabilities

Data centers are integral to an organization's daily operations. They centralize shared IT operations and equipment, and they are the physical facilities in which data is stored, processed, and disseminated.

NVIDIA’s Unified Fabric Manager (UFM) Cyber-AI platform combines real-time network telemetry with AI-powered analytics to help IT managers spot operational anomalies and predict network failures. Administrators can anticipate link or port failures and schedule maintenance before they cause data center downtime. Industrial organizations can use the UFM platform to keep day-to-day operations and production running.

NVIDIA’s A100 80GB PCIe GPUs raise GPU memory bandwidth to 2TB/s, 25% more than the A100 40GB. The larger memory capacity and higher memory bandwidth allow more data and larger neural networks to be held in memory, which can help customers in scientific research fields obtain results faster.
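
As a rough illustration of what these figures imply, the short Python sketch below works through the arithmetic using only the rounded numbers quoted above (80GB, 2TB/s, 25%). The function names are hypothetical, and the results are idealized: real datasheet values and achieved bandwidth will differ somewhat.

```python
# Nominal, rounded figures quoted in the article (assumptions, not datasheet values).
A100_80GB_BANDWIDTH_TBPS = 2.0   # "to 2TB/s"
UPLIFT = 0.25                    # "25% more than A100 40GB"
MEMORY_CAPACITY_GB = 80          # capacity of the A100 80GB


def implied_baseline_tbps(new_bandwidth_tbps: float, uplift: float) -> float:
    """Bandwidth of the older part implied by a fractional uplift."""
    return new_bandwidth_tbps / (1.0 + uplift)


def full_sweep_seconds(capacity_gb: float, bandwidth_tbps: float) -> float:
    """Idealized time to read the whole GPU memory once at peak bandwidth."""
    return capacity_gb / (bandwidth_tbps * 1000.0)  # 1 TB/s = 1000 GB/s


baseline = implied_baseline_tbps(A100_80GB_BANDWIDTH_TBPS, UPLIFT)
sweep = full_sweep_seconds(MEMORY_CAPACITY_GB, A100_80GB_BANDWIDTH_TBPS)

print(f"Implied A100 40GB bandwidth: {baseline:.2f} TB/s")   # 1.60 TB/s
print(f"Idealized full-memory sweep: {sweep * 1000:.0f} ms")  # 40 ms
```

Under these assumptions, the quoted 25% uplift corresponds to a baseline of about 1.6TB/s, and a single pass over the full 80GB of memory would take roughly 40 milliseconds at peak bandwidth.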

With NVIDIA’s Magnum IO GPUDirect Storage, customers running HPC applications can move large datasets into and out of GPU memory without burdening the central processing unit (CPU). A direct memory access (DMA) engine near the network adapter or storage transfers the data directly into or out of GPU memory.

GPUDirect Storage creates a direct path between local or remote storage and GPU memory. Industrial and scientific applications can benefit from reduced I/O latency, use the full bandwidth of the network adapters, and better manage the impact of increased data usage.

In a recent press release, NVIDIA founder and CEO Jensen Huang commented, “The HPC revolution started in academia and is rapidly extending across a broad range of industries.” Huang added, “Key dynamics are driving super-exponential, super-Moore’s law advances that have made HPC a useful tool for industries. NVIDIA’s HGX platform gives researchers unparalleled high-performance computing acceleration to tackle the toughest problems industries face.”