NVIDIA reveals that its Ampere-based A100 GPU is now in full production. This new server chip promises up to 20x higher AI training and inference performance than its predecessor. The chip is made on a 7nm process and packs a massive 54 billion transistors. It features third-gen Tensor Cores with TF32 support, structural sparsity acceleration, multi-instance GPU (MIG) support that allows it to be partitioned into up to seven independent GPUs, and third-generation NVLink.
NVIDIA CEO Jen-Hsun Huang said a consumer version will follow at a later date, and that the consumer variants of Ampere will use "different configurations."
The NVIDIA A100 GPU is a technical design breakthrough fueled by five key innovations:
NVIDIA Ampere architecture — At the heart of A100 is the NVIDIA Ampere GPU architecture, which contains more than 54 billion transistors, making it the world’s largest 7-nanometer processor.
Third-generation Tensor Cores with TF32 — NVIDIA’s widely adopted Tensor Cores are now more flexible, faster and easier to use. Their expanded capabilities include new TF32 for AI, which allows for up to 20x the AI performance of FP32 precision, without any code changes. In addition, Tensor Cores now support FP64, delivering up to 2.5x more compute than the previous generation for HPC applications.
Multi-instance GPU — MIG, a new technical feature, enables a single A100 GPU to be partitioned into as many as seven separate GPUs so it can deliver varying degrees of compute for jobs of different sizes, providing optimal utilization and maximizing return on investment.
Third-generation NVIDIA NVLink — Doubles the high-speed connectivity between GPUs to provide efficient performance scaling in a server.
Structural sparsity — This new efficiency technique harnesses the inherently sparse nature of AI math to double performance.
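NVIDIA's structural sparsity relies on a fixed, fine-grained pattern usually called 2:4 sparsity: in every group of four consecutive weights, at most two may be nonzero. The minimal Python sketch below illustrates the pattern, assuming the common magnitude-based pruning rule (keep the two largest-magnitude weights in each group). The names `prune_2_4` and `is_2_4_sparse` are hypothetical and not part of any NVIDIA library; in practice a network is typically fine-tuned after pruning to recover accuracy.

```python
from typing import List


def prune_2_4(weights: List[float]) -> List[float]:
    """Apply 2:4 structured sparsity: in every group of four consecutive
    weights, keep the two with the largest magnitude and zero the rest."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def is_2_4_sparse(weights: List[float]) -> bool:
    """Check that every group of four holds at most two nonzero values."""
    return all(
        sum(1 for w in weights[s:s + 4] if w != 0.0) <= 2
        for s in range(0, len(weights), 4)
    )


if __name__ == "__main__":
    w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
    p = prune_2_4(w)
    print(p)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
    print(is_2_4_sparse(p))  # True
```

Because half of every group is guaranteed to be zero, sparse Tensor Cores can skip those multiplications, which is where the claimed doubling comes from; the gain applies to the matrix math, not necessarily to end-to-end run time.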
Together, these new features make the NVIDIA A100 ideal for diverse, demanding workloads, including AI training and inference as well as scientific simulation, conversational AI, recommender systems, genomics, high-performance data analytics, seismic modeling and financial forecasting.
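TF32, listed among the innovations above, keeps FP32's 8-bit exponent (and therefore its numeric range) but only 10 mantissa bits, the same precision as FP16. Tensor Cores take ordinary FP32 inputs, reduce them to TF32 for the multiply, and accumulate in FP32, which is why existing FP32 code can benefit without changes. The hypothetical helper `to_tf32` below emulates the precision loss in plain Python by clearing the 13 low mantissa bits of an FP32 value; it assumes simple truncation, whereas the hardware conversion may round to nearest instead.

```python
import struct


def to_tf32(x: float) -> float:
    """Reduce x to TF32 precision: 1 sign, 8 exponent and 10 mantissa bits.

    Assumption: truncation (clearing low bits), not round-to-nearest.
    Raises OverflowError if x is outside the FP32 range.
    """
    # Reinterpret x as the bit pattern of a 32-bit IEEE 754 float.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # FP32 stores 23 mantissa bits; TF32 keeps the top 10, so clear the low 13.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(to_tf32(1.0 + 2**-10))  # 1.0009765625 (10th mantissa bit survives)
    print(to_tf32(1.0 + 2**-11))  # 1.0 (11th mantissa bit is dropped)
    print(to_tf32(1e38))  # still finite: TF32 shares FP32's exponent range
```

FP16, by contrast, overflows above 65,504, so the last line is where TF32's FP32-sized exponent matters.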
NVIDIA also revealed the DGX A100, a new $199,000 server that offers up to 5 petaflops of computing power:
Eight NVIDIA A100 Tensor Core GPUs, delivering 5 petaflops of AI power, with 320GB of total GPU memory and 12.4TB per second of memory bandwidth.
Six NVIDIA NVSwitch™ interconnect fabrics with third-generation NVIDIA NVLink® technology for 4.8TB per second of bi-directional bandwidth.
Nine Mellanox ConnectX-6 HDR 200Gb per second network interfaces, offering a total of 3.6Tb per second of bi-directional bandwidth.
Mellanox In-Network Computing and network acceleration engines such as RDMA, GPUDirect® and Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ to enable the highest performance and scalability.
15TB Gen4 NVMe internal storage, which is 2x faster than Gen3 NVMe SSDs.
NVIDIA DGX software stack, which includes optimized software for AI and data science workloads, delivering maximized performance and enabling enterprises to achieve a faster return on their investment in AI infrastructure.
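The headline DGX A100 totals follow from multiplying per-GPU and per-interface figures. The sketch below assumes NVIDIA's published specifications for the 40GB A100 (about 1,555 GB/s of HBM2 bandwidth, 600 GB/s of total NVLink bandwidth, and 624 TFLOPS of peak FP16 Tensor Core throughput with structural sparsity); those per-unit numbers are not stated in the article, and treating the 5 petaflops figure as the sparse FP16 peak is an assumption.

```python
# Per-unit figures: assumed from NVIDIA's A100 (40GB) specifications, not the article.
GPUS = 8
HBM_PER_GPU_GB = 40
HBM_BW_PER_GPU_GB_S = 1555        # HBM2 bandwidth per GPU, GB/s
NVLINK_BW_PER_GPU_GB_S = 600      # total NVLink bandwidth per GPU, GB/s
FP16_SPARSE_TFLOPS_PER_GPU = 624  # assumed peak FP16 Tensor Core rate with sparsity

NICS = 9
NIC_GBIT_S = 200                  # ConnectX-6 HDR, per direction, Gb/s


def dgx_a100_totals() -> dict:
    """Aggregate the per-unit figures into system-level totals."""
    return {
        "gpu_memory_gb": GPUS * HBM_PER_GPU_GB,
        "memory_bw_tbyte_s": GPUS * HBM_BW_PER_GPU_GB_S / 1000,
        "nvlink_bw_tbyte_s": GPUS * NVLINK_BW_PER_GPU_GB_S / 1000,
        "ai_pflops": GPUS * FP16_SPARSE_TFLOPS_PER_GPU / 1000,
        # Nine NICs at 200Gb/s each, counted in both directions.
        "network_bidir_tbit_s": NICS * NIC_GBIT_S * 2 / 1000,
    }


if __name__ == "__main__":
    for name, value in dgx_a100_totals().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the results (320GB, 12.44TB/s, 4.8TB/s, 4.992 petaflops and 3.6Tb/s) round to the figures NVIDIA quotes for the system.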
Also worth noting: each DGX A100 is powered by two 64-core AMD EPYC "Rome" processors.