Nvidia and Intel unveil advanced HPC initiatives, bolstering AI capabilities at SC2023

The world’s fastest supercomputers are getting faster, with both Nvidia and Intel racing to accelerate the most powerful computing systems on Earth and placing a heavy emphasis on AI.

At the Supercomputing 2023 (SC23) conference in Denver today, the Top500 list of the world’s fastest supercomputers was released. In one form or another, all the systems have components from Nvidia or Intel, and in many cases both. The event is also a showcase for the next generation of supercomputers now being built, the technologies they use and how they will be used.

For Nvidia, the headline system is the JUPITER supercomputer, hosted at the Forschungszentrum Jülich facility in Germany. JUPITER will have 24,000 Nvidia GH200 chips and, when completed, will be the most powerful AI supercomputer ever built, according to Nvidia, with over 90 exaflops of performance for AI training. Nvidia is also using the event to detail a series of new innovations and AI acceleration silicon, including the H200 and a quad configuration for the Grace Hopper GH200 superchip.

Not to be outdone, Intel is highlighting its work on the Aurora supercomputer at the Department of Energy’s Argonne National Laboratory, which is being used to build a 1 trillion (that’s with a T and not a typo) parameter large language model (LLM). Intel is also providing new insights into its next generation of AI acceleration and GPU technology as it ups the competitive ante against rival Nvidia.

Nvidia advances Grace Hopper superchip to build the most powerful AI system in history

Nvidia first announced in May that the Grace Hopper superchip, which combines CPU and GPU capabilities, had entered full production. Those chips have now found their way into the most powerful supercomputers.

“With the introduction of Grace Hopper, a new wave of AI supercomputers are emerging,” Dion Harris, director of accelerated data center product solutions at Nvidia, said in a briefing with press and analysts.

The Grace Hopper GH200 powers the JUPITER supercomputer, which Nvidia sees as a new class of AI supercomputer. JUPITER’s AI power will be used for weather prediction, drug discovery and industrial engineering use cases. The system is being built in collaboration with Nvidia, ParTec, Eviden and SiPearl.

JUPITER is using a new configuration for the GH200 that delivers dramatically more performance. The system uses a quad GH200 architecture which, as the name implies, puts four GH200s in each system node.

“The quad GH200 features an innovative node architecture with 288 Neoverse Arm cores capable of achieving 16 petaflops of AI performance with 2.5 terabytes a second of high speed memory,” Harris explained. “The four way system is connected with high speed NVLink connections to the chip allowing for full coherence across the architecture.”

In total, the system comprises 24,000 GH200 chips connected via Nvidia’s Quantum-2 InfiniBand networking. JUPITER isn’t the only system that will use the quad GH200 approach; Nvidia will be using the same approach in other supercomputers as well.
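The figures quoted above can be cross-checked with some simple arithmetic. The sketch below is only a back-of-the-envelope estimate, not an Nvidia calculation: it assumes that all 24,000 chips sit in quad nodes and that the quoted 16 petaflops per node adds up linearly across the machine. The function and constant names are illustrative.

```python
# Back-of-the-envelope check of the JUPITER figures quoted in the article.
# Assumptions (not from Nvidia): every GH200 sits in a quad node, and
# per-node AI performance sums linearly across nodes.

TOTAL_GH200_CHIPS = 24_000   # total chips, as quoted
CHIPS_PER_NODE = 4           # "quad GH200" node
NODE_AI_PETAFLOPS = 16       # AI performance per quad node, as quoted
NODE_ARM_CORES = 288         # Arm cores per quad node, as quoted
PETAFLOPS_PER_EXAFLOP = 1_000


def jupiter_estimate(
    total_chips: int = TOTAL_GH200_CHIPS,
    chips_per_node: int = CHIPS_PER_NODE,
    node_petaflops: float = NODE_AI_PETAFLOPS,
    node_cores: int = NODE_ARM_CORES,
) -> dict:
    """Derive node count, aggregate AI exaflops and cores per chip."""
    nodes = total_chips // chips_per_node
    ai_exaflops = nodes * node_petaflops / PETAFLOPS_PER_EXAFLOP
    cores_per_chip = node_cores // chips_per_node
    return {
        "nodes": nodes,
        "ai_exaflops": ai_exaflops,
        "cores_per_chip": cores_per_chip,
    }


if __name__ == "__main__":
    for name, value in jupiter_estimate().items():
        print(f"{name}: {value}")
```

Under those assumptions the estimate works out to 6,000 quad nodes and roughly 96 exaflops of AI performance, which is consistent with the "over 90 exaflops" figure, and 72 Arm cores per GH200.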

As part of the SC23 news, Nvidia is also announcing the standalone H200 silicon. While the GH200 integrates both CPU and GPU, the H200 is just the discrete GPU. The H200 will be offered on Nvidia HGX H200 server boards.

“The HGX H200 platform with faster and more high speed memory will deliver incredible performance for HPC and AI inference workloads,” Harris said.

Intel GPU efforts continue to advance supercomputer powers

Intel is also making a very strong showing at SC23 with its HPC and AI technologies.

In a briefing with press and analysts, Ogi Brkic, VP and General Manager, Data Center and AI/HPC Solutions Category at Intel, detailed his company’s efforts for AI and HPC acceleration.

Brkic highlighted the Intel Data Center GPU Max Series and the Intel Habana Gaudi 2 accelerator as helping to lead the way for large supercomputing installations such as the Dawn supercomputer at the University of Cambridge in the UK. Dawn, which is currently in phase 1, is the fastest AI supercomputer in the UK and includes 512 Intel Xeon CPUs and 1,024 Intel Data Center GPU Max Series GPUs.

Aurora, which is being built in the U.S. by Intel, Hewlett Packard Enterprise and the U.S. Department of Energy, will be helping to develop one of the largest large language models (LLMs) in existence. Brkic said that AuroraGPT is a 1 trillion parameter LLM for science research. AuroraGPT is currently being trained across 64 nodes of Aurora, with the target of eventually scaling it to the entire supercomputer, which has over 10,000 nodes.

“We’ve worked with Microsoft DeepSpeed optimizations to ensure that this 1 trillion parameter LLM is available for everybody to use,” Brkic said. “The potential applications for this type of large language model are incredible, every element of science from biology, chemistry, drug research, cosmology and so on, can be impacted by availability of this generative model.”
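To put the AuroraGPT numbers in perspective, the short sketch below estimates how much memory the model weights alone would occupy and how small a slice of Aurora the current 64-node run represents. Intel did not state the numeric precision used for training, so the bytes-per-parameter values are assumptions, and the 10,000-node figure is used as a lower bound for "over 10,000". The estimate ignores optimizer state, gradients and activations, which add substantially to the real footprint.

```python
# Rough sizing of a 1 trillion parameter model, using figures from the article.
# Assumptions (not stated by Intel): 2 or 4 bytes per parameter, and exactly
# 10,000 nodes as a lower bound for the full Aurora system.

PARAMS = 1_000_000_000_000   # 1 trillion parameters, as quoted
TRAINING_NODES = 64          # nodes currently used, as quoted
TOTAL_NODES = 10_000         # "over 10,000" nodes; lower bound
BYTES_PER_TERABYTE = 1e12


def weight_terabytes(params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in terabytes."""
    return params * bytes_per_param / BYTES_PER_TERABYTE


def node_fraction(used_nodes: int, total_nodes: int) -> float:
    """Share of the machine taken up by the current training run."""
    return used_nodes / total_nodes


if __name__ == "__main__":
    for label, width in (("16-bit", 2), ("32-bit", 4)):
        total_tb = weight_terabytes(PARAMS, width)
        per_node_gb = total_tb * 1000 / TRAINING_NODES
        print(f"{label} weights: {total_tb:.1f} TB total, "
              f"{per_node_gb:.2f} GB per node across {TRAINING_NODES} nodes")
    share = node_fraction(TRAINING_NODES, TOTAL_NODES)
    print(f"Current run uses at most {share:.2%} of Aurora's nodes")
```

With 16-bit weights the model needs about 2 TB for parameters alone, and the 64-node run touches well under 1% of the machine, which gives a sense of how much headroom remains for scaling.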

Sean Michael Kerner