Claude Opus 4.7 Review: Features, Benchmarks, and How to Use It
Claude Opus 4.7 review with verified specs, pricing, xhigh effort level, task budgets, and /ultrareview. Hands-on guide for developers and prompt engineers.

On April 16, 2026, Anthropic shipped Claude Opus 4.7, the latest iteration of its flagship model. Two days later, they launched Claude Design on top of it and publicly conceded that a stronger internal model, Claude Mythos Preview, exists but is being held back for safety reasons. That week of announcements is the most interesting thing Anthropic has done in a quarter, and Opus 4.7 sits at the center of it.
I've been running Opus 4.7 through my usual set of coding, research, and writing workloads for 48 hours. This review sticks to verified facts from Anthropic's announcement, first-party integration notes from GitHub and AWS, and my own hands-on testing. Nothing speculative. If I don't know something, I'll say so.
What's actually new in Claude Opus 4.7
The one-liner version: Opus 4.7 is an incremental but meaningful upgrade over Opus 4.6, with the biggest wins in agentic coding, vision, and long-running tool use. Anthropic has not positioned this as a generational leap. It is positioned as the best commercially available Claude, with the caveat that Mythos Preview is stronger and staying locked up.
Here are the concrete changes, all verified against Anthropic's official announcement.
Pricing stays flat. Input tokens are $5 per million, output tokens are $25 per million, identical to Opus 4.6. That matters because Anthropic clearly had room to raise prices given the capability jump on coding benchmarks, and they chose not to. For teams already budgeted on Opus 4.6 pricing, the upgrade is effectively free.
Model identifier. The string to use in API calls is claude-opus-4-7. If you have been using claude-opus-4-6 in your codebase, that is literally the only line you need to change to start testing.
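In code terms, the swap is a single constant. Here is a minimal sketch: `build_request` is a hypothetical helper of mine, but the parameter shape matches the standard Messages API call (`model`, `max_tokens`, `messages`):

```python
# Only the model constant changes between Opus 4.6 and 4.7.
MODEL = "claude-opus-4-7"  # was: "claude-opus-4-6"

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble kwargs for a client.messages.create(**params) call (sketch)."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

params = build_request("Summarize this diff.")
```

Run your existing eval suite against both model strings before flipping the default.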
Availability. Opus 4.7 is available on the Claude platform and Claude apps, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. All four enterprise channels got it on day one.
Vision upgrade. This is the spec I find most underrated. Opus 4.7 now accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels, which is more than 3× the previous Claude image ceiling. If you work with technical diagrams, screenshots of full dashboards, or chemical structures, this is the difference between usable and unusable.
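If you batch-process screenshots or diagrams, a pre-flight check avoids rejected uploads. `fits_vision_limit` is a hypothetical helper; the 2,576-pixel ceiling is the only number taken from the release notes:

```python
MAX_LONG_EDGE = 2576  # Opus 4.7 image ceiling, per the release notes

def fits_vision_limit(width: int, height: int) -> bool:
    """True if the image's longest side is within the model's long-edge limit."""
    return max(width, height) <= MAX_LONG_EDGE

# A full 2560x1440 dashboard screenshot now fits without downscaling.
ok = fits_vision_limit(2560, 1440)  # True
```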
The three feature launches that matter most
Anthropic rolled out three developer-facing features alongside the model. These are the ones I would actually integrate this week.
1. The xhigh effort level
Claude's effort levels control how much reasoning the model does before answering. Before 4.7, you had low, medium, high, and max. The jump between high and max was large, both in quality and in cost and latency. Teams routinely wanted "a bit more than high, a lot less than max" and had to hack it with prompt framing.
Opus 4.7 introduces xhigh, slotted between high and max. In my testing on an agentic refactor task, xhigh gave me roughly 80% of the quality improvement of max at about 40% of the latency cost. That is not a universal number; it depends heavily on workload, but the direction is clear. If you have been using max as your default, switch to xhigh for most things and reserve max for genuinely hard single-shot reasoning.
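In request terms the switch is one field. I cannot verify the actual parameter name from the docs, so treat `effort` below as an assumed field name; the routing rule is the point:

```python
# Effort tiers as the release describes them. "effort" as a request field is an
# assumption on my part -- check the API reference for the real parameter name.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

def pick_effort(task_kind: str) -> str:
    """Default to xhigh; reserve max for genuinely hard single-shot reasoning."""
    return "max" if task_kind == "hard_single_shot" else "xhigh"

def make_request(prompt: str, task_kind: str = "agentic") -> dict:
    """Assemble kwargs for a messages.create-style call (sketch)."""
    return {
        "model": "claude-opus-4-7",
        "effort": pick_effort(task_kind),  # hypothetical field name
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
```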
2. Task budgets (public beta)
Task budgets let you cap token spend on autonomous agent runs. You set a budget at the start of a session and the agent surfaces a warning or halts when it approaches the cap. This matters because long-running agents can and do rack up real bills. I have personally seen a coding agent chew through $80 of credits on an unattended overnight run because it got stuck in a recursive file-reading loop.
Task budgets do not replace good agent design. They are the seatbelt, not the driver. But the existence of a first-class budget primitive on the platform changes how you architect agents. You can now design for "do as much useful work as you can within this envelope" instead of guessing and praying.
For structured agent orchestration, pair task budgets with the COAST framework (Context, Objective, Actions, Steps, Task). COAST gives you the structure to hand the agent a clearly scoped job; task budgets enforce the ceiling. That combination is the closest I have gotten to "autonomous but financially sane."
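The beta's server-side API surface is not something I will reproduce here, but the idea is easy to sketch client-side. `TaskBudget` below is a hypothetical tracker an agent loop could consult after each call, priced at Opus 4.7 list rates:

```python
class TaskBudget:
    """Client-side token-spend cap for an agent run (sketch; names hypothetical)."""

    def __init__(self, cap_usd: float, warn_at: float = 0.8):
        self.cap_usd = cap_usd      # hard ceiling for the whole run
        self.warn_at = warn_at      # fraction of cap that triggers a warning
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> str:
        """Add one call's cost at $5/M in, $25/M out; return 'ok', 'warn', or 'halt'."""
        self.spent_usd += input_tokens * 5 / 1e6 + output_tokens * 25 / 1e6
        if self.spent_usd >= self.cap_usd:
            return "halt"
        if self.spent_usd >= self.warn_at * self.cap_usd:
            return "warn"
        return "ok"

budget = TaskBudget(cap_usd=10.0)
status = budget.record(200_000, 40_000)  # one heavy agentic call: $2.00 -> "ok"
```

An agent loop would break on "halt" and checkpoint its work on "warn", which is exactly the seatbelt behavior described above.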
3. The /ultrareview slash command in Claude Code
Claude Code got a new dedicated command: /ultrareview. It is not a general review; it is a bug-hunting review, specifically tuned to find latent defects, race conditions, edge cases, and silent failures that a normal /code-review pass will miss.
Pro and Max subscribers get three free /ultrareview runs per cycle, then it counts against usage. I have used it on two PRs so far. On the first one it found an off-by-one in a cursor pagination function I had missed. On the second it flagged a Promise.all in an error recovery path that would swallow the original error. Both were real bugs. Neither would have been caught by the regular review pass.
One sharp tip: /ultrareview is not a substitute for the APE framework (Action, Purpose, Expectation) when you want a custom focused review. Use /ultrareview for "find the bugs," use APE-scaffolded prompts for "review against these specific constraints."
Performance and benchmarks
GitHub reported a 13% lift in resolution on their internal coding benchmark versus Opus 4.6. Anthropic's own release notes report improvements across agentic coding, multidisciplinary reasoning, scaled tool use, and agentic computer use. Anthropic did not release a headline benchmark number like MMLU or HumanEval the way they sometimes do.
A few observations from my own testing, which you should treat as anecdotal:
Instruction following is tighter. Opus 4.7 is more literal. Tell it "do not include bullet points" and it actually does not include bullet points. Tell it "answer in exactly three sentences" and you get three sentences. This has a flip side: if your prompts are sloppy, 4.7 will execute them more literally than 4.6 did. Tighten your prompts.
Long-context reasoning is better. Anthropic mentions improvements to long-context reasoning and file-system-based memory retention across sessions. In practice I noticed fewer moments of "wait, earlier in this conversation you said X" confusion during extended sessions.
Honesty and prompt injection resistance are improved. This is a reported safety improvement from Anthropic. In my testing I pushed a few adversarial prompts at it and it held up better than 4.6.
Cyber capabilities are intentionally reduced. This is explicit in Anthropic's release notes. Opus 4.7 has been deliberately constrained on cybersecurity tasks because Mythos Preview exists and they do not want a weaker version of those capabilities leaking into a generally available model. For legitimate security research, Anthropic offers a Cyber Verification Program.
Pricing math: is Opus 4.7 worth it?
At $5 per million input tokens and $25 per million output tokens, Opus 4.7 is not cheap. Let me lay out the math plainly so you can decide.
A typical agentic coding session with tool use might consume 200,000 input tokens and 40,000 output tokens across a two-hour window. That is roughly $2 in total. If your use case is "engineer-assisted code change that takes me two hours," $2 of model cost is cheaper than 60 seconds of your salaried time. Opus 4.7 is obviously worth it.
A typical marketing copy run (CARE-style brief in, 1,500-word draft out) is maybe 3,000 input tokens and 3,000 output tokens. That is $0.09 per run. A single human copywriter editing pass can go through 30 of those for under $3 of model cost. Obviously worth it.
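The arithmetic behind both figures, if you want to plug in your own volumes (rates from the announcement):

```python
INPUT_USD_PER_MTOK = 5.0    # Opus 4.7 input price per million tokens
OUTPUT_USD_PER_MTOK = 25.0  # Opus 4.7 output price per million tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one run at Opus 4.7 list prices."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

agentic = run_cost(200_000, 40_000)  # 2.00 -> the ~$2 coding session
copy_run = run_cost(3_000, 3_000)    # 0.09 -> the $0.09 draft
```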
Where it gets fuzzy: very high-volume applications (bulk classification, massive RAG retrieval, chatbots at scale) where the economics push you toward Sonnet or a cheaper model. For those, use Opus 4.7 only for the hard reasoning step and route bulk work to Sonnet 4.6 or Haiku 4.5.
How I'd restructure my prompts for 4.7
Because instruction following is tighter, a few things that worked loosely on 4.6 need to be cleaner on 4.7.
Be explicit about format. If you want markdown, say markdown. If you want no markdown, say no markdown. 4.7 obeys more literally.
Use fewer filler qualifiers. "Try to..." or "if possible, consider..." reads as "this is optional" to 4.7. If you mean it, state it as a rule. If you do not mean it, leave it out.
Lean on the RACE framework. Role, Action, Context, Expectation. The tight structure plays well with 4.7's stricter literal interpretation. I ported a few prompts from looser "you are a helpful assistant" framing into RACE and saw noticeably better outputs on the first shot.
Give examples when constraints are unusual. 4.7 is an excellent one-shot learner. For any prompt that has formatting quirks, a single concrete example beats three paragraphs of explanation.
Here's a template I'm now using as my Opus 4.7 default for coding tasks:
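(Illustrative reconstruction of the RACE shape; swap the Context and Expectation lines for your own repo's details.)

```
Role: You are a senior code reviewer for this repository.
Action: Review the diff below for correctness, edge cases, race conditions, and silent failures.
Context: [paste diff here, plus constraints: language version, style rules, test locations]
Expectation: Respond with exactly two markdown sections, "Must fix" and "Consider", each a bullet list. No summary, no praise, no other prose.
```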
That prompt, run at xhigh, is my baseline for PR review. It costs roughly $0.05 per medium PR.
Opus 4.7 vs the competition in April 2026
VentureBeat's coverage described Opus 4.7 as "narrowly retaking lead for most powerful generally available LLM." That is a fair characterization based on the evidence I have seen. It edges past the leading OpenAI and Google models on most agentic coding and tool-use benchmarks, while remaining competitive on general reasoning. The gap is small and will close the next time a competitor ships. That is the nature of this race.
Where Opus 4.7 clearly leads:
- Agentic coding with long tool-use chains
- Vision on technical content at higher resolution
- Instruction following and honesty
- File-system-based memory across sessions
Where competitors still lead:
- Pure reasoning on math-heavy benchmarks
- Speed and cost for high-volume chat
- Native multimodal generation (image out, audio out)
Getting started with Opus 4.7 today
The fastest path to productive use:
- On the Claude apps or claude.ai. Pick Opus 4.7 from the model picker. Done. Start with one of your existing workflows to feel the difference.
- On the API. Change your model string to claude-opus-4-7. Test with xhigh effort on a task where high felt slightly underpowered.
- In Claude Code. Update your CLI if you haven't already. Run /ultrareview on your next PR and see what it surfaces.
- On Bedrock, Vertex AI, or Foundry. The model IDs vary by provider; check your console. For most teams with enterprise procurement already in place, no new contract is needed. Opus 4.7 just shows up.
Week 1. Swap one non-critical internal workflow to 4.7. Measure quality and cost deltas against 4.6 baseline.
Week 2. Enable /ultrareview on all PRs in one service. Track bugs caught.
Week 3. Pilot task budgets on your most expensive agent. Set a conservative cap, observe, loosen if warranted.
Week 4. Write up the deltas. Decide whether to promote 4.7 to default.
This cadence gives you real data, not vibes, and keeps rollback painless if something surprises you.
What I'd keep an eye on
A few open questions and things to watch:
Mythos Preview leak into 4.8. Anthropic will eventually release a model with some of Mythos's capabilities once safeguards mature. The Opus 4.8 or Opus 5 announcement will be worth reading carefully.
Task budgets graduating from beta. The beta-to-GA transition often brings tighter enforcement and sometimes API changes. If you deploy task budgets in production, keep an eye on the release notes.
xhigh becoming the new default. I would not be surprised to see xhigh quietly become the Claude apps default within a release or two.
What to do next
If you want to go deeper on adjacent Anthropic news, read my review of Claude Design, the companion product built on Opus 4.7's vision capabilities, and my Claude Mythos explainer for the safety story behind why Opus 4.7 is the most capable model you can actually use.
For a head-to-head with other writing-first LLMs, my ChatGPT vs Jasper vs Claude comparison walks through where each one wins.
And for the primary source, Anthropic's official Opus 4.7 announcement lays out the release notes, pricing, and developer features in their own words.
Bottom line: Opus 4.7 is a solid upgrade at a flat price, with three developer features (xhigh, task budgets, /ultrareview) that are individually useful and collectively meaningful. If you already pay for Claude, swap the model ID this week and spend an afternoon re-tuning your prompts. You will get your investment back in the first agentic run.
Keyur Patel is the founder of AiPromptsX and an AI engineer with extensive experience in prompt engineering, large language models, and AI application development. After years of working with AI systems like ChatGPT, Claude, and Gemini, he created AiPromptsX to share effective prompt patterns and frameworks with the broader community. His mission is to democratize AI prompt engineering and help developers, content creators, and business professionals harness the full potential of AI tools.
