Anthropic Upgrades Claude Opus to 4.8, Boosting Benchmarks and Collaboration

Drafted with AI; edited and reviewed by a human.

TL;DR

  • Anthropic has released Claude Opus 4.8, an upgrade to its Claude Opus 4.7 model.
  • Opus 4.8 shows significant benchmark improvements in coding, agentic tasks, reasoning, and practical knowledge work.
  • New features include user control over effort levels on claude.ai and a "dynamic workflows" feature in Claude Code.
  • Fast mode for Opus 4.8 is now three times cheaper and operates at 2.5x the speed.

Anthropic has released Claude Opus 4.8, an upgrade to its flagship model. Building directly on Claude Opus 4.7, the new version promises better performance across a range of tasks, from coding to complex reasoning. It is available today at the same price as its predecessor.

Opus 4.8 arrives with two user-facing additions. On claude.ai, users can now control how much "effort" Claude spends on a given task, trading resource use against output to suit their needs. Claude Code gains a new "dynamic workflows" feature, designed to tackle very large problems by breaking them down and managing complex chains of operations.

The improvements show up as significant gains across benchmark tests. Opus 4.8 is the only model to complete every case end-to-end on the Super-Agent benchmark, outperforming both previous Opus models and GPT-5.5 at cost parity. On CursorBench, Opus 4.8 surpasses earlier Opus models at all effort levels, with more efficient tool calling and a stronger ability to carry tasks through to completion.

On legal tasks, Opus 4.8 achieved the highest score recorded on the Legal Agent Benchmark, breaking the 10% overall threshold on the all-pass standard. That level of accuracy matters for professional legal work, where it lets users delegate more complex tasks with confidence. The model also does well on computer-use and browser-agent tasks, scoring 84% on Online-Mind2Web, a notable leap over Opus 4.7 and GPT-5.5.

Beyond benchmark scores, Claude Opus 4.8 is built for greater reliability and smoother collaboration. Early testers have highlighted its improved judgment: the model is better at asking pertinent questions, identifying its own errors, and pushing back on unsound plans. This makes it a more trustworthy partner for complex development and exploration, especially within Claude Code.

The model's analysis also has a higher signal-to-noise ratio, producing richer, more information-dense outputs more quickly. Opus 4.8 proactively flags potential issues in input and output data, which reduces the burden on users to catch such flaws. Together with a reduced tendency to make unsupported claims, this makes it more trustworthy for high-stakes professional workflows. These capabilities are described further in the Claude Opus 4.8 System Card.

Summary

  • Claude Opus 4.8 offers significant performance upgrades over Opus 4.7, excelling in benchmarks for coding, agentic tasks, and reasoning.
  • New features like dynamic workflows and improved user control over effort levels enhance usability for large-scale problems and tailored assistance.
  • The fast mode for Opus 4.8 is now three times cheaper and 2.5x faster, making advanced AI more accessible and cost-effective.
  • Opus 4.8 demonstrates enhanced reliability, higher quality analysis, and a greater tendency to flag potential issues, making it a more trustworthy AI collaborator.

Source: Introducing Claude Opus 4.8
