NVIDIA Blackwell Ultra Delivers Up to 50x Performance for Agentic AI
NVIDIA Blackwell Ultra: A Game-Changer for Agentic AI
Big news for the AI industry! New data from SemiAnalysis reveals that the NVIDIA Blackwell Ultra platform is set to revolutionize agentic AI, delivering up to 50x higher performance and up to 35x lower costs than the previous NVIDIA Hopper platform. This leap forward is particularly impactful for demanding applications like agentic coding and sophisticated coding assistants.
What Happened: Unpacking the Performance Boost
The core of this breakthrough lies in the NVIDIA GB300 NVL72 systems. Cloud providers like Microsoft, CoreWeave, and Oracle Cloud Infrastructure are already deploying these systems at scale, focusing on low-latency and long-context agentic AI use cases. The numbers are impressive: GB300 NVL72 systems provide up to 50x higher throughput per megawatt and up to 35x lower cost per million tokens for low-latency agentic applications when compared to the NVIDIA Hopper platform.
This isn't just about raw power. The broader NVIDIA Blackwell platform has already seen wide adoption from leading inference providers such as Baseten, DeepInfra, Fireworks AI, and Together AI, helping them reduce cost per token by up to 10x. Blackwell Ultra further refines this with 1.5x higher NVFP4 compute performance and 2x faster attention processing, crucial for efficiently handling complex AI tasks.
Why It Matters: Powering the Next Generation of AI Agents
Agentic AI and coding assistants are increasingly central to software development, requiring real-time responsiveness and the ability to process vast amounts of information. The significant performance gains and cost reductions offered by Blackwell Ultra are critical here. For instance, Signal65 analysis shows that NVIDIA GB200 NVL72 already delivers over 10x more tokens per watt, translating into one-tenth the cost per token compared to the NVIDIA Hopper platform. Blackwell Ultra takes this even further.
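The link between tokens per watt and cost per token is roughly reciprocal when energy dominates serving cost: 10x more tokens per watt implies about one-tenth the energy cost per token. A back-of-the-envelope sketch of that relationship, using purely illustrative numbers (the throughput and electricity figures below are assumptions, not values from SemiAnalysis or Signal65):

```python
# Back-of-the-envelope: if electricity dominates serving cost,
# cost per token scales inversely with tokens per watt.
# All numeric inputs below are illustrative assumptions.

def cost_per_million_tokens(tokens_per_watt_second: float,
                            price_per_kwh: float) -> float:
    """Energy cost (USD) to generate one million tokens."""
    joules_per_token = 1.0 / tokens_per_watt_second       # 1 W·s = 1 J
    kwh_per_million = joules_per_token * 1_000_000 / 3_600_000  # J -> kWh
    return kwh_per_million * price_per_kwh

hopper_tokens_per_watt = 2.0        # hypothetical baseline throughput
blackwell_tokens_per_watt = 20.0    # 10x better, per the Signal65 ratio
price = 0.10                        # assumed USD per kWh

hopper_cost = cost_per_million_tokens(hopper_tokens_per_watt, price)
blackwell_cost = cost_per_million_tokens(blackwell_tokens_per_watt, price)
print(f"Hopper-class:    ${hopper_cost:.4f} per 1M tokens")
print(f"Blackwell-class: ${blackwell_cost:.4f} per 1M tokens")
print(f"Ratio: {hopper_cost / blackwell_cost:.0f}x cheaper")
```

Whatever baseline numbers are plugged in, the 10x efficiency ratio carries straight through to a 10x cost ratio, which is the pattern the Signal65 analysis describes.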
NVIDIA's continuous innovation in software plays a huge role too. Optimizations through NVIDIA TensorRT-LLM, NVIDIA Dynamo, Mooncake, and SGLang are key contributors to these performance boosts. In fact, TensorRT-LLM library improvements alone delivered up to 5x better performance on GB200 for low-latency workloads in just four months.
For long-context workloads – think AI coding assistants reasoning across entire codebases with 128,000-token inputs and 8,000-token outputs – GB300 NVL72 offers up to 1.5x lower cost per token compared to GB200 NVL72. This efficiency means AI agents can now understand more complex contexts without a prohibitive cost, unlocking new possibilities for highly sophisticated applications.
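To see why long-context efficiency matters per request, consider the workload shape the article cites: 128,000 input tokens plus 8,000 output tokens. A minimal sketch of the per-request arithmetic, assuming a flat blended per-token price (the dollar figure is hypothetical, not a published rate; only the 1.5x ratio comes from the article):

```python
# Illustrative cost model for one long-context agentic request
# (128,000 input tokens + 8,000 output tokens, as cited in the article).
# The per-million-token price is a hypothetical placeholder.

INPUT_TOKENS = 128_000
OUTPUT_TOKENS = 8_000

def request_cost(price_per_million_tokens: float) -> float:
    """Cost of one request at a flat blended per-token price."""
    total_tokens = INPUT_TOKENS + OUTPUT_TOKENS
    return total_tokens / 1_000_000 * price_per_million_tokens

gb200_price = 3.00               # assumed $/1M tokens on GB200 NVL72
gb300_price = gb200_price / 1.5  # 1.5x lower cost per token on GB300 NVL72

print(f"GB200 NVL72: ${request_cost(gb200_price):.3f} per request")
print(f"GB300 NVL72: ${request_cost(gb300_price):.3f} per request")
```

At 136,000 tokens per request, even a modest per-token price difference compounds quickly across the millions of requests a coding assistant might serve.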
What's Next: Enabling Deeper, Faster AI
The deployment of Blackwell Ultra by major cloud providers signifies a pivotal moment for the AI ecosystem. By dramatically lowering the cost and increasing the efficiency of agentic AI, NVIDIA is enabling a new class of applications that can perform real-time reasoning across massive datasets. This will accelerate innovation across industries, making more powerful, intelligent, and interactive AI experiences accessible to more users than ever before.
Read more: NVIDIA Blog: Blackwell Ultra for Agentic AI