NVIDIA’s Blackwell GPU Outpaces Hopper H100 by 2.2x in AI Training

NVIDIA has unveiled impressive performance numbers for its new Blackwell GPU in AI training, showing a 2.2x speedup over the Hopper H100 on MLPerf v4.1 benchmarks. The gain was demonstrated on demanding tasks such as fine-tuning the Llama 2 70B model, positioning Blackwell as a powerful contender in AI acceleration.

The recent MLPerf tests for the Blackwell GPU covered a diverse set of computation-heavy AI models: Llama 2 70B (LLM fine-tuning with LoRA), Stable Diffusion (text-to-image), DLRMv2 (recommendation systems), BERT (NLP), RetinaNet (object detection), GPT-3 175B (pre-training), and R-GAT (graph neural networks). Across this lineup, Blackwell exhibited substantial improvements in training speed, which is especially critical for AI applications in high-demand fields.
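For context, the LLM fine-tuning workload in this suite applies LoRA (low-rank adaptation) to Llama 2 70B. The sketch below shows, in rough outline, what LoRA fine-tuning looks like with the Hugging Face peft library; the stand-in model and hyperparameters are illustrative assumptions, not the MLPerf reference configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Small stand-in model for illustration; the MLPerf benchmark
# fine-tunes Llama 2 70B, which requires a multi-GPU setup.
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA injects small trainable low-rank matrices into the attention
# projections while the base weights stay frozen.
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank adapters (illustrative)
    lora_alpha=32,                        # adapter scaling factor (illustrative)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable

# One illustrative training step on a toy batch.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
batch = tokenizer("MLPerf measures time-to-train.", return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss
outputs.loss.backward()
optimizer.step()
```

Because only the small adapter matrices are trained, fine-tuning a 70B-parameter model becomes tractable enough to serve as a standardized benchmark workload.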

Notably, NVIDIA has continuously optimized its Hopper GPUs since their release, achieving up to 1.3x faster LLM pre-training and a 70% performance increase in GPT-3 training through software upgrades alone. The Blackwell GPU, however, leapfrogs these gains, delivering a new level of per-GPU efficiency and enabling organizations to run substantial workloads with fewer GPUs.

The Blackwell architecture is optimized for higher compute throughput per GPU and includes enhanced high-bandwidth memory, allowing it to meet extensive data needs without compromising performance. According to the benchmark tests, just 64 Blackwell GPUs achieve the training throughput of 256 Hopper GPUs, a fourfold reduction in GPU count that significantly raises efficiency in data centers.
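As a back-of-the-envelope illustration (not an NVIDIA tool), here is a minimal sketch of that sizing arithmetic, assuming throughput scales linearly with GPU count:

```python
import math

def equivalent_gpu_count(baseline_gpus: int, per_gpu_speedup: float) -> int:
    """GPUs of the faster generation needed to match a baseline cluster's
    training throughput, assuming linear scaling (an idealization; real
    scaling efficiency at cluster scale is below 100%)."""
    return math.ceil(baseline_gpus / per_gpu_speedup)

# The reported 64-vs-256 result implies roughly a 4x effective per-GPU
# throughput advantage at that scale:
print(equivalent_gpu_count(baseline_gpus=256, per_gpu_speedup=4.0))  # -> 64
```

Note that the implied 4x advantage at scale exceeds the 2.2x per-GPU headline figure; the article's point about enhanced high-bandwidth memory suggests each Blackwell GPU can also shoulder a larger share of the model, compounding the per-GPU speedup.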

NVIDIA’s recent results underscore its commitment to advancing AI hardware, as highlighted by Blackwell’s speed and capacity for large-scale deployments. Expected to become a standard in data centers, Blackwell promises to drive accelerated progress in AI training and inference across various industries.
