Imagine one giant virtual GPU at 2PFLOPS
CEO Jensen Huang is particularly enthusiastic about it all because, with general-purpose processors running into physical walls, server and device makers are turning to dedicated hardware accelerators to pick up the slack and speed up specialist operations, such as neural-network number crunching. Step forward ASICs and GPUs, like those made by Nvidia and its rivals, to be plugged into CPUs to spread the load.
“The world of computing has changed,” mused Huang. “CPU scaling has slowed at a time when computing demand is skyrocketing. NVIDIA’s HGX-2 with Tensor Core GPUs gives the industry a powerful, versatile computing platform that fuses HPC and AI to solve the world’s grand challenges.”
Down to brass tacks: the HGX-2 is an update of the HGX-1 data-center-grade platform. The old version had eight Tesla P100 GPUs in each chassis, but the new one can fit 16 Tesla V100s, based on Nvidia's latest Volta architecture, connected using a dozen NVSwitches: six per eight-GPU baseboard.
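To software, all that plumbing is invisible: a box like this simply presents itself as a pool of CUDA devices. Here's a minimal PyTorch sketch (our own illustration, not Nvidia's tooling) to enumerate what you've got:

```python
import torch

# Each V100 in the chassis shows up as an ordinary CUDA device;
# the NVSwitch fabric underneath is transparent at this level.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"cuda:{i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
```

On a fully loaded HGX-2 that loop should print 16 entries at 32 GiB apiece – the half-terabyte of combined memory Nvidia touts.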
“Every GPU can now talk to every other GPU at 300GB/s bandwidth,” Nvidia’s AI product management and marketing lead Paresh Kharya explained during a conference call with journalists. The whole thing essentially acts as one giant GPU accelerator with half a terabyte of memory, and tops out at two petaFLOPS – that’s two quadrillion Tensor Core floating-point operations a second. It handles FP64 and FP32 precision for HPC workloads, and FP16 and INT8 for AI.
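Whether a single copy stream gets anywhere near that 300GB/s aggregate is another question. If you fancy eyeballing GPU-to-GPU bandwidth on your own hardware, a rough sketch will do (the helper and sizes below are ours, not Nvidia's methodology):

```python
import time
import torch

def copy_bandwidth_gib_s(src="cuda:0", dst="cuda:1", mib=1024, iters=10):
    # Rough device-to-device copy bandwidth between two GPUs. Over an
    # NVSwitch fabric this traffic rides NVLink rather than PCIe; note
    # that Nvidia's 300GB/s figure is a per-GPU aggregate across all
    # links, so a single stream won't reach it.
    buf = torch.empty(mib * 2**20, dtype=torch.uint8, device=src)
    buf.to(dst)                      # warm-up copy
    torch.cuda.synchronize(src)
    torch.cuda.synchronize(dst)
    t0 = time.time()
    for _ in range(iters):
        buf.to(dst)
    torch.cuda.synchronize(dst)
    return (mib / 1024) * iters / (time.time() - t0)

if torch.cuda.device_count() >= 2:
    print(f"{copy_bandwidth_gib_s():.1f} GiB/s")
```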
Kharya also said the complexity of typical neural networks has increased 300,000-fold over the past five years, according to a recent analysis by OpenAI. “The second trend is that these models are being updated at an unprecedented pace,” he added. Thus, in Nvidia’s view, you need some serious oomph to process all that.
The HGX-2 isn’t aimed at your run-of-the-mill startup venturing out into AI. It’s for hyper-scale developers and researchers dealing with huge amounts of data to train and deploy models in production. It’ll be useful for things like large-scale image recognition, language translation, or recommender systems, wherever you can justify an expensive rack of equipment to power your software.
The reemergence of neural networks, particularly those used in deep learning, is partly down to leaps and bounds in processing speed and capabilities, labeled training data, and the storage and interconnects needed to hold and shunt around that enormous amount of data. But contests such as DAWNBench are reminders that most algorithms running on this shiny new metal are still woefully inefficient.
If brute force is what you need to solve your problems before you've had a chance to refine and optimize your AI code, then the HGX-2 may be what you’re looking for. Nvidia’s DGX-2 server, built on the HGX-2 platform, can process 15,500 images per second on the well-known ImageNet database containing 1.28 million images, we’re told.
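For scale, at 15,500 images a second a full pass over those 1.28 million images takes roughly 83 seconds. If you want to sanity-check throughput numbers like that on your own kit, a crude timing loop is enough – the sketch below uses synthetic data and a stand-in model (Nvidia's figure was, we presume, measured training a ResNet-50-class network with its own tuned harness):

```python
import time
import torch
import torch.nn as nn

# Requires a CUDA machine. Stand-in CNN purely to make the loop run;
# swap in a real ResNet-50 for a meaningful comparison.
model = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1000),
).cuda()
model = nn.DataParallel(model)      # spread each batch across all GPUs
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

batch, iters = 256, 20
x = torch.randn(batch, 3, 224, 224, device="cuda")   # fake ImageNet-sized images
y = torch.randint(0, 1000, (batch,), device="cuda")  # fake labels

torch.cuda.synchronize()
t0 = time.time()
for _ in range(iters):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
torch.cuda.synchronize()

imgs_per_s = batch * iters / (time.time() - t0)
print(f"{imgs_per_s:,.0f} images/sec; "
      f"an ImageNet epoch would take ~{1_280_000 / imgs_per_s:,.0f}s")
```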
At the moment, Nvidia is working with server manufacturers, such as Foxconn, Lenovo, and Supermicro, to get HGX-2-based boxes into customers' hands, hopefully by the end of the year.
Our pals at our high-performance computing sister site The Next Platform have a deeper dive on the technology, here. ®