How to Prompt Reasoning Models: GPT-5.4, DeepSeek R1 & Claude
Learn to prompt reasoning models like GPT-5.4 Thinking, DeepSeek R1, and Claude extended thinking. Techniques, real examples, and when to use each model.

Reasoning models think before they answer. That changes everything about how you should prompt them. Standard chat models generate tokens left-to-right in a single pass. Reasoning models run an internal chain-of-thought process first, working through the problem before producing a final response. If you have been prompting reasoning models the same way you prompt GPT-5.4 or Claude Sonnet, you are leaving significant performance on the table.
Three major reasoning model families dominate right now: OpenAI's GPT-5.4 Thinking, DeepSeek R1, and Anthropic's Claude with extended thinking. Each one handles reasoning differently, is priced differently, and responds to different prompting strategies. I have spent months testing all three across coding, math, logic, and complex analysis tasks. This guide covers exactly how to get the best results from each one, with real prompts you can copy and use immediately.
How Reasoning Models Work
Standard models like GPT-5.4 or Claude Sonnet 4.6 generate responses token by token. They are fast, capable, and perfectly suited for most tasks. Reasoning models add an extra step: before generating the final answer, they allocate tokens to an internal thinking process. This thinking phase lets the model plan, consider alternatives, catch its own mistakes, and work through multi-step logic.
The critical difference is the thinking token budget. Every reasoning model spends tokens on its internal chain-of-thought. More thinking tokens generally mean better answers on hard problems, but also higher costs and longer response times. Understanding this tradeoff is the key to using reasoning models effectively.
Here is where the three families diverge:
- OpenAI GPT-5.4 Thinking uses a built-in reasoning phase before generating the final answer. You control depth via a `reasoning_effort` parameter with five levels: none, low, medium, high, and xhigh. The thinking process is not fully visible in the API, though the ChatGPT interface shows an "Upfront Planning" summary.
- DeepSeek R1 exposes its full thinking process. You can read every intermediate step wrapped in `<think>` tags, making it the most transparent reasoning model available.
- Claude extended thinking shows its reasoning with some caveats. Anthropic recently moved from the `budget_tokens` parameter to an adaptive thinking system with an `effort` parameter, letting the model decide how much thinking each problem needs.
Prompting Each Reasoning Model
OpenAI GPT-5.4 Thinking
GPT-5.4 launched on March 5, 2026, and it consolidates reasoning directly into the flagship model. Earlier, OpenAI offered separate o-series models (o3, o4-mini) for reasoning tasks, but as of February 2026, those have been retired from ChatGPT in favor of GPT-5.4 Thinking. The o-series models remain available through the API on a Priority tier for existing users, but GPT-5.4 Thinking is now the recommended reasoning path for both consumer and API use.
GPT-5.4 Thinking supports five reasoning effort levels (none, low, medium, high, and xhigh) that control how much compute the model dedicates to its internal thinking phase. At xhigh, it approaches the quality of the former o3 on hard problems. At low, it adds minimal reasoning overhead for simpler tasks.
In ChatGPT, the Thinking variant shows an "Upfront Planning" summary that lets you see how the model approaches the problem before generating the full response. On the API side, reasoning tokens are billed as output tokens but are not visible in the response.
The biggest prompting shift: keep your prompts clean and direct. GPT-5.4 Thinking already reasons deeply, so over-explaining or adding verbose chain-of-thought instructions can actually hurt performance. Unlike standard models where "think step by step" helps, GPT-5.4 Thinking does this internally.
Key patterns for GPT-5.4 Thinking:
- State the problem clearly without over-prompting
- Use `reasoning_effort: "xhigh"` for math and logic, `"low"` for simpler analysis
- Specify the output format you want; GPT-5.4 follows format instructions well
- Avoid "think step by step" instructions; the model already handles this internally
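The effort-matching advice above can be sketched as a small helper. This is a sketch only: it builds an OpenAI-style request payload, and the `"gpt-5.4-thinking"` model id, the task categories, and their effort assignments are assumptions for illustration, not confirmed API values.

```python
# Map task types to reasoning effort levels, per the guidance above:
# xhigh for math/logic, low for simpler analysis (illustrative choices).
EFFORT_BY_TASK = {
    "math": "xhigh",
    "logic": "xhigh",
    "analysis": "low",
    "formatting": "none",
}

def build_request(prompt: str, task_type: str) -> dict:
    """Build a Chat Completions-style payload with effort matched to the task."""
    return {
        "model": "gpt-5.4-thinking",  # hypothetical model id
        "reasoning_effort": EFFORT_BY_TASK.get(task_type, "medium"),
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove that the sum of two odd integers is even.", "math")
```

Keeping the prompt itself clean and pushing depth control into the parameter follows the "don't over-prompt" pattern: the effort level does the work that verbose chain-of-thought instructions would otherwise (counterproductively) try to do.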
DeepSeek R1
DeepSeek R1 is the standout open-source reasoning model. What makes it unique is full transparency: every thinking step appears inside `<think>` tags before the final answer. You can literally watch the model consider approaches, catch errors, and revise its reasoning in real time.
R1 is free to use on deepseek.com with no account required. The API pricing is remarkably low at $0.28 per million input tokens and $0.42 per million output tokens, making it the cheapest reasoning model by a wide margin.
R1 excels at math, logic puzzles, and structured analysis. Its visible thinking process is particularly valuable when you need to audit the reasoning, whether for regulated industries, education, or any task where understanding why matters as much as the answer itself. For a detailed comparison of DeepSeek against other major models, see the DeepSeek vs ChatGPT vs Claude breakdown.
Key patterns for DeepSeek R1:
- Explicitly ask it to show its reasoning when you want detailed explanations
- For pure answers, you can ask it to be concise; R1 respects that
- Leverage the `<think>` output for learning and verification
- Combine with follow-up prompts that reference specific thinking steps
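Because R1's reasoning arrives inline with the answer, a small parser makes the `<think>` output easy to log, audit, or feed into follow-up prompts. A minimal sketch, assuming the raw text contains a single `<think>...</think>` block followed by the final answer:

```python
import re

def split_r1_response(text: str) -> tuple[str, str]:
    """Separate DeepSeek R1's <think> reasoning from its final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

raw = "<think>17 is prime: not divisible by 2, 3.</think>Yes, 17 is prime."
thinking, answer = split_r1_response(raw)
```

Storing the `thinking` half separately gives you the audit trail mentioned above without cluttering whatever consumes the final answer.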
Claude Extended Thinking
Anthropic's approach to reasoning is Claude's extended thinking mode. On Claude Opus 4.6, the system previously used a `budget_tokens` parameter to cap thinking length. As of early 2026, Anthropic has transitioned to adaptive thinking with a simpler `effort` parameter, letting Claude dynamically decide how much reasoning each problem needs.
Extended thinking works well for complex code analysis, long-document reasoning, multi-constraint optimization, and tasks requiring careful weighing of tradeoffs. Claude shows its thinking process, giving you visibility into how it approached the problem, though the format is less structured than R1's explicit `<think>` tags.
Key patterns for Claude extended thinking:
- Use it for problems with genuine complexity; simple tasks waste the thinking budget
- The adaptive thinking mode handles most situations well without manual tuning
- Claude's thinking excels at code review, architectural analysis, and multi-constraint problems
- Pair it with specific evaluation criteria so the thinking phase has clear goals
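The last pattern, giving the thinking phase explicit evaluation criteria, is easy to bake into a reusable prompt builder. This sketch targets the code-review use case mentioned above; the wording and criteria are illustrative, not a prescribed format:

```python
def review_prompt(code: str, criteria: list[str]) -> str:
    """Attach explicit evaluation criteria so the thinking phase has clear goals."""
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return (
        "Review the following code. Evaluate it against each criterion "
        "explicitly, then give an overall verdict.\n\n"
        f"Criteria:\n{bullet_list}\n\nCode:\n{code}"
    )

prompt = review_prompt(
    "def add(a, b): return a - b",
    ["Correctness", "Edge cases", "Readability"],
)
```

With criteria spelled out, Claude's thinking phase weighs each one in turn instead of free-associating about the code.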
Model Comparison
| Model | Access | Input/Output per 1M Tokens | Visible Thinking | Best For |
|---|---|---|---|---|
| GPT-5.4 Thinking | ChatGPT Plus/Pro, API | $2.50 / $15.00 (+ reasoning tokens) | Partial (Upfront Planning in ChatGPT) | General reasoning, math, science |
| DeepSeek R1 | Free web, API | $0.28 / $0.42 | Yes (`<think>` tags) | Budget reasoning, education, audit trails |
| Claude Opus 4.6 | Claude Max, API | $5.00 / $25.00 | Yes (adaptive) | Code analysis, long-form reasoning |
5 Prompting Techniques for Reasoning Models
These techniques work across all reasoning models but are especially effective when the model has a dedicated thinking phase. Each uses a structured approach, similar to how prompt frameworks like ROSES and SCOPE organize complex instructions.
1. Problem Decomposition
Break complex problems into explicit sub-problems. Reasoning models handle decomposed problems more reliably because each sub-problem gets its own thinking allocation.
This approach aligns with prompt chaining principles: even within a single prompt, decomposition improves accuracy.
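Decomposition within a single prompt can be as simple as numbering the sub-problems and asking for intermediate results. A minimal sketch (the wording and the example sub-problems are illustrative assumptions, not a fixed template):

```python
def decomposition_prompt(problem: str, subproblems: list[str]) -> str:
    """Frame a complex problem as numbered sub-problems in one prompt."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(subproblems, 1))
    return (
        "Solve this problem by working through each sub-problem in order, "
        "stating intermediate results before the final answer.\n\n"
        f"Problem: {problem}\n\nSub-problems:\n{steps}"
    )

prompt = decomposition_prompt(
    "Estimate the annual cloud cost of a new service",
    ["List the cost components", "Estimate each component", "Sum and sanity-check"],
)
```

Each numbered step gives the thinking phase a concrete checkpoint, which is why decomposed prompts tend to fail less often on long multi-step problems.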
2. Constraint Specification
Reasoning models respond well to explicit constraints. Instead of hoping the model infers your requirements, spell them out. The TRACE framework is useful for structuring these constraint-heavy prompts.
3. Verification Prompts
Ask the model to verify its own answer before finalizing. This plays to reasoning models' natural strength: they already self-check during thinking, and an explicit verification step adds another layer.
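The verification step can be a reusable suffix appended to any prompt. A sketch with illustrative wording; adjust the checks (units, edge cases) to your domain:

```python
# Illustrative self-verification instruction appended to the base prompt.
VERIFY_SUFFIX = (
    "\n\nBefore finalizing, verify your answer: re-derive the key steps "
    "independently, check units and edge cases, and state explicitly "
    "whether the verification passed. If it failed, correct the answer."
)

def with_verification(prompt: str) -> str:
    """Append an explicit self-verification step to any prompt."""
    return prompt + VERIFY_SUFFIX

checked = with_verification(
    "Compute the compound interest on $1,000 at 5% for 10 years."
)
```

Asking the model to report whether verification passed also gives you a cheap signal for flagging low-confidence answers downstream.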
4. Multi-Pass Reasoning
For ambiguous or open-ended problems, ask the model to generate multiple approaches and then evaluate them. This forces the thinking phase to explore alternatives rather than committing to the first viable path.
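A multi-pass prompt just needs to ask for several candidate approaches and a comparison before committing. A sketch (the phrasing and the default of three approaches are illustrative choices):

```python
def multi_pass_prompt(problem: str, n_approaches: int = 3) -> str:
    """Force exploration: generate several approaches, then pick the strongest."""
    return (
        f"Generate {n_approaches} distinct approaches to the problem below. "
        "For each, outline the method and its main risk. Then compare them "
        "and commit to the strongest one, explaining why.\n\n"
        f"Problem: {problem}"
    )

p = multi_pass_prompt("Design a rate limiter for a public API")
```

Requiring a stated risk per approach keeps the comparison honest; without it, models often declare the first option the winner.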
5. Thinking Budget Management
Different problems need different levels of reasoning. Learning when to use xhigh vs. low effort (for GPT-5.4 Thinking) or when to invoke extended thinking (for Claude) is the highest-leverage skill for managing costs. For more on structuring multi-step prompts effectively, see advanced prompt engineering techniques.
Rule of thumb: If a human expert would need more than 30 seconds of focused thought, use high reasoning effort. For tasks you could answer while multitasking, low effort saves time and money.
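The rule of thumb above can be encoded as a tiny routing heuristic. The 30-second boundary comes from the text; the other thresholds are hypothetical and worth tuning against your own cost and accuracy data:

```python
def pick_effort(expert_seconds: float) -> str:
    """Map estimated human-expert thinking time to a reasoning effort level.
    Only the 30s boundary is from the rule of thumb; the rest are guesses."""
    if expert_seconds <= 5:
        return "none"    # answerable while multitasking
    if expert_seconds <= 30:
        return "low"
    if expert_seconds <= 120:
        return "high"
    return "xhigh"       # genuinely hard: spend the thinking budget
```

A router like this, sitting in front of the API call, is often the single biggest cost lever when mixing easy and hard queries in one pipeline.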
When NOT to Use Reasoning Models
Reasoning models are not universally better. They are slower, more expensive, and sometimes overthink straightforward tasks. Skip them for:
- Simple Q&A: "What's the capital of France?" does not need a thinking phase
- Creative writing: Poetry, stories, and marketing copy benefit more from fluency than formal reasoning. Standard models like GPT-5.4 or Claude Sonnet 4.6 are better here
- Batch processing: If you are classifying 10,000 items, the per-token cost of reasoning adds up fast with minimal accuracy gain
- Conversational tasks: Chatbots, brainstorming sessions, and casual back-and-forth work better with faster standard models
- Summarization: Unless the source material contains contradictions or requires critical evaluation, standard models summarize just as well
Cost Comparison
Cost matters, especially when reasoning models can consume 10-50x more tokens on their thinking process than the final output contains.
| Model | Input / 1M Tokens | Output / 1M Tokens | Thinking Cost | Best For |
|---|---|---|---|---|
| GPT-5.4 Thinking | $2.50 | $15.00 | Reasoning tokens billed as output | General reasoning, math, science |
| DeepSeek R1 | $0.28 | $0.42 | Included in output tokens | Budget-friendly reasoning, open-source use |
| Claude Opus 4.6 | $5.00 | $25.00 | Included in output tokens | Code + document analysis |
| GPT-5.4 (standard, no reasoning) | $2.50 | $15.00 | N/A | General tasks, no reasoning needed |
| Claude Sonnet 4.6 (no reasoning) | $3.00 | $15.00 | N/A | Fast code + writing, no reasoning needed |
DeepSeek R1 stands out on cost: you can run heavy reasoning workloads at a fraction of what GPT-5.4 Thinking charges, since reasoning tokens add significantly to the output bill. GPT-5.4 Thinking at xhigh effort tends to score higher on the hardest benchmarks (math olympiad, competitive programming), while R1 performs comparably on typical engineering and analysis tasks.
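A back-of-envelope calculator makes the thinking-token multiplier concrete. Prices come from the table above; the assumption that thinking tokens bill at the output rate matches the table's notes, and the example token counts are made up for illustration:

```python
# (input, output) dollars per 1M tokens, from the cost table above.
PRICES = {
    "gpt-5.4-thinking": (2.50, 15.00),
    "deepseek-r1": (0.28, 0.42),
    "claude-opus-4.6": (5.00, 25.00),
}

def query_cost(model: str, input_tokens: int,
               thinking_tokens: int, answer_tokens: int) -> float:
    """Effective per-query cost, billing thinking tokens at the output rate."""
    inp, out = PRICES[model]
    return (input_tokens * inp + (thinking_tokens + answer_tokens) * out) / 1e6

# Example: 2,000 input, 8,000 thinking, 500 answer tokens.
gpt_cost = query_cost("gpt-5.4-thinking", 2000, 8000, 500)  # ≈ $0.1325
r1_cost = query_cost("deepseek-r1", 2000, 8000, 500)        # ≈ $0.0041
```

At these illustrative counts the thinking tokens dominate the bill, which is exactly why effort routing and model choice matter more for reasoning workloads than headline per-token prices suggest.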
For consumer access, ChatGPT Plus ($20/month) includes GPT-5.4 Thinking. ChatGPT Pro ($200/month) offers unlimited GPT-5.4 Pro with the highest rate limits. DeepSeek R1 is free on deepseek.com. Claude extended thinking requires Claude Max ($100-200/month) or API access.
Frequently Asked Questions
Are reasoning models always better than standard models?
No. Reasoning models excel at problems requiring multi-step logic, math, formal analysis, and complex debugging. For creative writing, simple classification, summarization, and conversational tasks, standard models are faster, cheaper, and equally effective. The overhead of a thinking phase adds latency and cost without improving outputs on straightforward work. Use reasoning models selectively. Think of them as a power tool, not your daily driver.
Can I see the model's thinking process?
It depends on the model. DeepSeek R1 shows its full reasoning inside `<think>` tags, making every intermediate step visible. Claude extended thinking also exposes the reasoning process, though Anthropic reserves the right to filter certain content from the thinking output. OpenAI's GPT-5.4 Thinking shows an "Upfront Planning" summary in ChatGPT but does not expose the full reasoning chain in the API; reasoning tokens are billed but not visible. If transparency matters for your use case (auditing, education, debugging), R1 or Claude are stronger choices.
Which reasoning model is cheapest?
DeepSeek R1 wins on price by a significant margin at $0.28/$0.42 per million tokens for input/output. It is also free to use on deepseek.com with no account needed. GPT-5.4 Thinking starts at $2.50/$15.00, but reasoning tokens (billed as output) can significantly increase the effective cost on high-effort problems. Claude Opus 4.6 extended thinking runs $5.00/$25.00. If you need reasoning capabilities on a budget, start with R1 and only escalate to GPT-5.4 Thinking or Claude extended thinking for problems that genuinely require their additional capabilities.

Keyur Patel is the founder of AiPromptsX and an AI engineer with extensive experience in prompt engineering, large language models, and AI application development. After years of working with AI systems like ChatGPT, Claude, and Gemini, he created AiPromptsX to share effective prompt patterns and frameworks with the broader community. His mission is to democratize AI prompt engineering and help developers, content creators, and business professionals harness the full potential of AI tools.
Related Articles

DeepSeek vs ChatGPT vs Claude: Which AI Should You Use in 2026?

12 Advanced Prompt Engineering Techniques That Actually Work

9 Best AI Prompt Frameworks in 2026 (With Templates)

Prompt Chaining: How to Connect Multiple AI Prompts for Complex Tasks