Claude Opus 4.8 Release: Benchmarks, Pricing, Features

Anthropic just shipped Claude Opus 4.8, and the headline number is not a benchmark score. It is the price. Fast mode on the new flagship is three times cheaper than fast mode on previous Opus models, while running at 2.5x the standard speed. Claude Opus 4.8 went live on May 28, 2026 at the same per-token price as Opus 4.7. The same day, it became generally available on GitHub Copilot, so if you live in your editor, you already have it.

Here is why this matters for anyone building with AI right now. Opus has always been the expensive tier you reach for when you actually need the reasoning. Anthropic just made that math a lot easier without lowering the sticker price on the base model, which is a different kind of move from the usual price cut.

The benchmark story

Opus 4.8 wins almost everywhere. On SWE-Bench Pro for agentic coding it scored 69.2%, up from 64.3% for Opus 4.7. GPT-5.5 sits at 58.6% on the same test, and Gemini 3.1 Pro at 54.2%. On OSWorld-Verified for agentic computer use, Opus 4.8 scored 83.4%, edging Opus 4.7 at 82.8%, with GPT-5.5 at 78.7% and Gemini 3.1 Pro at 76.2%. Tight gap at the top. Bigger gap below.

On knowledge work the spread is wider. GDPval-AA put Opus 4.8 at 1890 points. GPT-5.5 hit 1769. Opus 4.7 scored 1753. Gemini 3.1 Pro trailed at 1314, which is not close. Finance Agent v2 told the same story: Opus 4.8 at 53.9%, GPT-5.5 at 51.8%, Opus 4.7 at 51.5%, Gemini 3.1 Pro at 43.0%. And on Humanity's Last Exam with tools, the multidisciplinary reasoning test, Opus 4.8 scored 57.9% against 54.7% for Opus 4.7, 52.2% for GPT-5.5, and 51.4% for Gemini 3.1 Pro.

But one category went to OpenAI. Terminal-Bench 2.1 for agentic terminal coding still belongs to GPT-5.5 at 78.2%. Opus 4.8 came in at 74.6%, which is strong but not first. That is the honest catch. If your workflow lives in a shell and you measure your model by terminal autonomy, GPT-5.5 is still the one to beat on that single benchmark.
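For readers who want to check the tally themselves, here is a minimal Python sketch that ranks the scores quoted above and reports the leader and its margin on each benchmark. The numbers are copied from this article; the `SCORES` layout and the `leaders` helper are illustrative names, not anything published by Anthropic. Only two Terminal-Bench 2.1 scores are quoted here, so that margin covers just those two models.

```python
# Scores as quoted in this article. GDPval-AA is in points, the rest in percent.
SCORES = {
    "SWE-Bench Pro": {"Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2},
    "OSWorld-Verified": {"Opus 4.8": 83.4, "Opus 4.7": 82.8, "GPT-5.5": 78.7, "Gemini 3.1 Pro": 76.2},
    "GDPval-AA": {"Opus 4.8": 1890, "GPT-5.5": 1769, "Opus 4.7": 1753, "Gemini 3.1 Pro": 1314},
    "Finance Agent v2": {"Opus 4.8": 53.9, "GPT-5.5": 51.8, "Opus 4.7": 51.5, "Gemini 3.1 Pro": 43.0},
    "Humanity's Last Exam (tools)": {"Opus 4.8": 57.9, "Opus 4.7": 54.7, "GPT-5.5": 52.2, "Gemini 3.1 Pro": 51.4},
    "Terminal-Bench 2.1": {"GPT-5.5": 78.2, "Opus 4.8": 74.6},  # only two scores quoted
}


def leaders(scores):
    """Return {benchmark: (top_model, margin_over_runner_up)}."""
    result = {}
    for bench, by_model in scores.items():
        # Sort models from highest to lowest score on this benchmark.
        ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        (top, top_score), (_, second_score) = ranked[0], ranked[1]
        result[bench] = (top, round(top_score - second_score, 1))
    return result


if __name__ == "__main__":
    for bench, (model, margin) in leaders(SCORES).items():
        print(f"{bench}: {model} leads by {margin}")
```

Run as a script, this prints one line per benchmark. The margin is measured against the runner-up, whichever model that is, which is why the OSWorld-Verified gap comes out so small: the runner-up there is Opus 4.7.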

Dynamic workflows in Claude Code

The biggest product change ships inside Claude Code. A new feature called dynamic workflows lets the model take on much larger engineering problems on its own, scoping work as it goes instead of running a fixed plan from the start. Think of it as the model deciding when to fan out, when to backtrack, and when to call it done. That is useful for the kind of multi-hour refactor that used to need a human babysitter.

Anthropic also opened up effort level controls on claude.ai. You can now pick “extra” (called xhigh in Claude Code) and “max” settings depending on how much thinking you want the model to do per task. Rate limits went up at the same time, which is the only way max actually works in practice.

The 61% cheaper claim

Databricks ran Opus 4.8 inside their Genie AI agent and reported a 61% lower token cost than Opus 4.7 for the same workload. That number is not from Anthropic. It is from a customer running the model in production, which is a different and more useful signal. For teams shipping agent products, the real cost is not the headline per-token rate. It is how many tokens the model burns to get to the answer. Since the per-token price is the same as Opus 4.7, a lower bill means Opus 4.8 apparently gets there with fewer.
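The arithmetic behind that point can be sketched in a few lines of Python. Everything numeric below is a made-up placeholder chosen to reproduce a 61% saving: the price is not Anthropic's actual rate, the token counts are not Databricks' figures, and a single blended per-token price is a simplification, since input and output tokens are typically billed at different rates.

```python
def task_cost(tokens_used: int, price_per_million: float) -> float:
    """Cost of one task in dollars at a single blended per-token price."""
    return tokens_used / 1_000_000 * price_per_million


def savings_fraction(old_tokens: int, new_tokens: int, price_per_million: float) -> float:
    """Fraction saved when the price is unchanged and only token usage moves."""
    old_cost = task_cost(old_tokens, price_per_million)
    new_cost = task_cost(new_tokens, price_per_million)
    return 1 - new_cost / old_cost


# Hypothetical numbers for illustration only.
PRICE_PER_MILLION = 10.0   # placeholder blended price in dollars
OLD_TOKENS = 1_000_000     # tokens the older model spends on a task
NEW_TOKENS = 390_000       # the same task done with 61% fewer tokens

saving = savings_fraction(OLD_TOKENS, NEW_TOKENS, PRICE_PER_MILLION)
print(f"saving: {saving:.0%}")  # prints "saving: 61%"
```

With the price held constant it cancels out of the ratio, so the saving depends only on the token counts. That is why a customer-reported cost drop at an unchanged sticker price reads as a token efficiency gain.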

What this changes for creators

If you build on top of foundation models, the practical wins are reasoning, document handling, and cost efficiency. All three are now better. AI platforms that route to multiple models, the kind we run at MagicShot.ai across image generation and video generation, get to pass those savings through.

Here is the competitive read. OpenAI still holds one benchmark. Google has a pricing and distribution play with Gemini but is behind on the reasoning tests that actually matter for agent work. Anthropic is leaning hard into developer surface area, and Claude Code is the wedge.

Project Glasswing and Claude Mythos

Anthropic teased what comes next. A research program called Project Glasswing is producing a model called Claude Mythos. Anthropic described it as more intelligent than Opus, which is the first time the company has officially put a tier above its flagship. No release date. No benchmarks. Just the name and the positioning, which is enough to tell you Opus is no longer the ceiling for Anthropic's lineup.

Bottom line

Claude Opus 4.8 is the cleanest Opus release Anthropic has put out in a while. Top scores on five of six headline benchmarks. A real product feature in dynamic workflows. Fast mode pricing that finally makes Opus reachable for high-volume agent loops. And a tease of what is coming after it. The interesting fight now is what Anthropic ships under Glasswing while OpenAI defends that one terminal benchmark.

Harish Prajapat

AI Imaging Specialist & Lead Content Strategist

Ahmedabad, India Writing since 2021

Harish has spent the last six years testing AI image and video tools the hard way, shipping thousands of real generations for brands, marketplaces and his own side projects. At MagicShot he turns dense model releases into step-by-step workflows anyone can follow, and personally re-tests every prompt and setting before it lands in a guide. When he is not benchmarking the latest diffusion model, he is answering "which tool should I actually use?" for creators in the community.

