Claude Opus 4.8 Release Beats GPT-5.5 on Most Benchmarks
- Published: May 28, 2026
- Harish Prajapat
Anthropic just shipped Claude Opus 4.8, and the headline number is not a benchmark score. It is the price. Fast mode on the new flagship is three times cheaper than fast mode on previous Opus models, while running at 2.5x the standard speed.
The Claude Opus 4.8 release went live on May 28, 2026, at the same per-token price as Opus 4.7. On the same day, it became generally available on GitHub Copilot. So if you live in your editor, you already have it.
Here is why this matters for anyone building with AI right now. Opus has always been the expensive tier you reach for when you actually need the reasoning. Anthropic just made that math a lot easier without lowering the sticker price on the base model, which is a different kind of move than the usual price cut.
The benchmark story
Opus 4.8 wins almost everywhere. On SWE-Bench Pro for agentic coding it scored 69.2%, up from 64.3% for Opus 4.7. GPT-5.5 sits at 58.6% on the same test. Gemini 3.1 Pro pulls 54.2%.
On OSWorld-Verified for agentic computer use, Opus 4.8 scored 83.4%, edging Opus 4.7 at 82.8%, with GPT-5.5 at 78.7% and Gemini 3.1 Pro at 76.2%. Tight gap at the top. Bigger gap below.
On knowledge work the spread is wider. GDPval-AA put Opus 4.8 at 1890 points. GPT-5.5 hit 1769. Opus 4.7 scored 1753. Gemini 3.1 Pro trailed at 1314, which is not close.
Finance Agent v2 told the same story. Opus 4.8 at 53.9%, GPT-5.5 at 51.8%, Opus 4.7 at 51.5%, Gemini 3.1 Pro at 43.0%. And on Humanity’s Last Exam with tools, the multidisciplinary reasoning test, Opus 4.8 scored 57.9% against 54.7% for Opus 4.7, 52.2% for GPT-5.5, and 51.4% for Gemini 3.1 Pro.
But. One category went to OpenAI. Terminal-Bench 2.1 for agentic terminal coding still belongs to GPT-5.5 at 78.2%. Opus 4.8 came in at 74.6%, which is strong but not first.
That is the honest catch. If your workflow lives in a shell and you measure your model by terminal autonomy, GPT-5.5 is still the one to beat on that single benchmark.
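To make the comparison above easier to check, here is a minimal Python sketch that tabulates only the scores quoted in this article and works out who leads each benchmark and by how much. The names `SCORES`, `leaders`, and `wins` are made up for illustration, and the Terminal-Bench 2.1 row holds just the two models the article gives numbers for.

```python
# Scores exactly as quoted in this article; higher is better on every benchmark.
# Terminal-Bench 2.1 lists only the two models the article reports.
SCORES = {
    "SWE-Bench Pro": {
        "Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2,
    },
    "OSWorld-Verified": {
        "Opus 4.8": 83.4, "Opus 4.7": 82.8, "GPT-5.5": 78.7, "Gemini 3.1 Pro": 76.2,
    },
    "GDPval-AA": {
        "Opus 4.8": 1890, "GPT-5.5": 1769, "Opus 4.7": 1753, "Gemini 3.1 Pro": 1314,
    },
    "Finance Agent v2": {
        "Opus 4.8": 53.9, "GPT-5.5": 51.8, "Opus 4.7": 51.5, "Gemini 3.1 Pro": 43.0,
    },
    "Humanity's Last Exam (with tools)": {
        "Opus 4.8": 57.9, "Opus 4.7": 54.7, "GPT-5.5": 52.2, "Gemini 3.1 Pro": 51.4,
    },
    "Terminal-Bench 2.1": {"GPT-5.5": 78.2, "Opus 4.8": 74.6},
}


def leaders(scores):
    """Map each benchmark to (leading model, margin over the runner-up)."""
    result = {}
    for bench, by_model in scores.items():
        ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        (top, top_score), (_, second_score) = ranked[0], ranked[1]
        result[bench] = (top, round(top_score - second_score, 1))
    return result


def wins(scores, model):
    """Count the benchmarks on which `model` has the top score."""
    return sum(1 for leader, _ in leaders(scores).values() if leader == model)


if __name__ == "__main__":
    for bench, (leader, margin) in leaders(SCORES).items():
        print(f"{bench}: {leader} leads by {margin}")
    print("Opus 4.8 leads", wins(SCORES, "Opus 4.8"), "of", len(SCORES))
```

The margins are raw differences in each benchmark's own unit (percentage points, or points for GDPval-AA), so they are not comparable across rows.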
Dynamic workflows in Claude Code
The biggest product change ships inside Claude Code. A new feature called dynamic workflows lets the model take on much larger engineering problems on its own, scoping work as it goes instead of running a fixed plan from the start.
Think of it as the model deciding when to fan out, when to backtrack, and when to call it done. Useful for the kind of multi-hour refactor that used to need a human babysitter.
Anthropic also opened up effort level controls on claude.ai. You can now pick “extra” (called xhigh in Claude Code) and “max” settings depending on how much thinking you want the model to do per task. Rate limits went up at the same time, which is the only way max actually works in practice.
[IMAGE: Split-screen visualization of a developer terminal running Claude Code with dynamic workflows on the left and a benchmark leaderboard chart on the right showing Opus 4.8 leading SWE-Bench Pro]
The 61% cheaper claim
Databricks ran Opus 4.8 inside its Genie AI agent and reported a 61% lower token cost versus Opus 4.7 for the same workload. That number is not from Anthropic. It is from a customer running the model in production, which is a different and more useful signal.
For teams shipping agent products, the real cost is not the headline per-token rate. It is how many tokens the model burns to get to the answer. Opus 4.8 apparently gets there with fewer.
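The arithmetic behind that point can be sketched in a few lines of Python. Everything numeric here except the 61% figure is a placeholder: `PRICE_PER_MILLION` and `BASELINE_TOKENS` are hypothetical values chosen for round numbers, not Anthropic's pricing or Databricks' workload. The sketch also assumes a single flat per-token rate, ignoring input/output splits and caching, so that a cost reduction at an unchanged price maps directly onto fewer billed tokens.

```python
# Hypothetical inputs for illustration only; not real pricing or workload data.
PRICE_PER_MILLION = 10.0      # assumed flat USD rate per million tokens
BASELINE_TOKENS = 1_000_000   # assumed tokens the older model spends on a task
REPORTED_REDUCTION = 0.61     # the 61% figure Databricks reported


def workload_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a workload billed at a single flat per-token rate."""
    return tokens / 1_000_000 * price_per_million


def tokens_after_reduction(baseline_tokens: int, cost_reduction: float) -> int:
    """Tokens implied by a cost reduction when the per-token price is unchanged."""
    return round(baseline_tokens * (1 - cost_reduction))


if __name__ == "__main__":
    new_tokens = tokens_after_reduction(BASELINE_TOKENS, REPORTED_REDUCTION)
    old_cost = workload_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
    new_cost = workload_cost(new_tokens, PRICE_PER_MILLION)
    print(f"tokens: {BASELINE_TOKENS} -> {new_tokens}")
    print(f"cost:   ${old_cost:.2f} -> ${new_cost:.2f}")
```

Under these assumptions the sticker price never moves; the whole saving comes from the token count.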
What this changes for creators
If you build on top of foundation models, the practical wins are reasoning, document handling, and cost efficiency. All three are now better. AI platforms that route to multiple models, the kind we run at MagicShot.ai across image generation and video generation, get to pass those savings through.
Here is the competitive read. OpenAI still holds one benchmark. Google has a pricing and distribution play with Gemini but is behind on the reasoning tests that actually matter for agent work. Anthropic is leaning hard into developer surface area, and Claude Code is the wedge.
Project Glasswing and Claude Mythos
Anthropic teased what comes next. A research program called Project Glasswing is producing a model called Claude Mythos. Anthropic described it as more intelligent than Opus, which is the first time the company has officially put a tier above its flagship.
No release date. No benchmarks. Just the name and the positioning, which is enough to tell you Opus is no longer the ceiling for Anthropic’s lineup.
Bottom line
Claude Opus 4.8 is the cleanest Opus release Anthropic has put out in a while. Better numbers on five of six headline benchmarks. A real product feature in dynamic workflows. Fast mode pricing that finally makes Opus reachable for high-volume agent loops. And a tease of what is coming after it.
The interesting fight now is what Anthropic ships under Glasswing while OpenAI defends that one terminal benchmark.
Frequently Asked Questions
Anthropic released Claude Opus 4.8 on May 28, 2026, with same-day availability on GitHub Copilot.
Fast mode on Claude Opus 4.8 is three times cheaper than fast mode on previous Opus models while running at 2.5 times the standard speed. Databricks also reported a 61 percent lower token cost versus Opus 4.7 in its Genie AI agent.
Claude Opus 4.8 leads on SWE-Bench Pro with 69.2 percent, OSWorld-Verified with 83.4 percent, GDPval-AA with 1890 points, Finance Agent v2 with 53.9 percent, and Humanity’s Last Exam with 57.9 percent. GPT-5.5 still leads on Terminal-Bench 2.1 at 78.2 percent versus 74.6 percent for Opus 4.8.
