How to Prompt Reasoning Models: GPT-5.4, DeepSeek R1 & Claude
Learn to prompt reasoning models like GPT-5.4 Thinking, DeepSeek R1, and Claude extended thinking. Techniques, real examples, and when to use each model.

Reasoning models think before they answer. That changes everything about how you should prompt them. Standard chat models generate tokens left-to-right in a single pass. Reasoning models run an internal chain-of-thought process first, working through the problem before producing a final response. If you have been prompting reasoning models the same way you prompt GPT-5.4 or Claude Sonnet, you are leaving significant performance on the table.
Three major reasoning model families dominate right now: OpenAI's GPT-5.4 Thinking, DeepSeek R1, and Anthropic's Claude with extended thinking. Each one handles reasoning differently, is priced differently, and responds to different prompting strategies. I have spent months testing all three across coding, math, logic, and complex analysis tasks. This guide covers exactly how to get the best results from each one, with real prompts you can copy and use immediately.
How Reasoning Models Work
Standard models like GPT-5.4 or Claude Sonnet 4.6 generate responses token by token. They are fast, capable, and perfectly suited for most tasks. Reasoning models add an extra step: before generating the final answer, they allocate tokens to an internal thinking process. This thinking phase lets the model plan, consider alternatives, catch its own mistakes, and work through multi-step logic.
The critical difference is the thinking token budget. Every reasoning model spends tokens on its internal chain-of-thought. More thinking tokens generally mean better answers on hard problems, but also higher costs and longer response times. Understanding this tradeoff is the key to using reasoning models effectively.
Here is where the three families diverge:
- OpenAI GPT-5.4 Thinking uses a built-in reasoning phase before generating the final answer. You control depth via a `reasoning_effort` parameter with five levels: none, low, medium, high, and xhigh. The thinking process is not fully visible in the API, though the ChatGPT interface shows an "Upfront Planning" summary.
- DeepSeek R1 exposes its full thinking process. You can read every intermediate step wrapped in `<think>` tags, making it the most transparent reasoning model available.
- Claude extended thinking shows its reasoning with some caveats. Anthropic recently moved from the `budget_tokens` parameter to an adaptive thinking system with an `effort` parameter, letting the model decide how much thinking each problem needs.
Prompting Each Reasoning Model
OpenAI GPT-5.4 Thinking
GPT-5.4 launched on March 5, 2026, and it consolidates reasoning directly into the flagship model. Earlier, OpenAI offered separate o-series models (o3, o4-mini) for reasoning tasks, but as of February 2026, those have been retired from ChatGPT in favor of GPT-5.4 Thinking. The o-series models remain available through the API on a Priority tier for existing users, but GPT-5.4 Thinking is now the recommended reasoning path for both consumer and API use.
GPT-5.4 Thinking supports five reasoning effort levels (none, low, medium, high, and xhigh) that control how much compute the model dedicates to its internal thinking phase. At xhigh, it approaches the quality of the former o3 on hard problems. At low, it adds minimal reasoning overhead for simpler tasks.
In ChatGPT, the Thinking variant shows an "Upfront Planning" summary that lets you see how the model approaches the problem before generating the full response. On the API side, reasoning tokens are billed as output tokens but are not visible in the response.
The biggest prompting shift: keep your prompts clean and direct. GPT-5.4 Thinking already reasons deeply, so over-explaining or adding verbose chain-of-thought instructions can actually hurt performance. Unlike standard models where "think step by step" helps, GPT-5.4 Thinking does this internally.
Key patterns for GPT-5.4 Thinking:
- State the problem clearly without over-prompting
- Use `reasoning_effort: "xhigh"` for math and logic, `"low"` for simpler analysis
- Specify the output format you want; GPT-5.4 follows format instructions well
- Avoid "think step by step" instructions; the model already handles this internally
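The effort-matching advice above can be sketched as a small helper. This is a sketch only: it builds an OpenAI-style request payload, and the `"gpt-5.4-thinking"` model id, the task categories, and their effort assignments are assumptions for illustration, not confirmed API values.

```python
# Map task types to reasoning effort levels, per the guidance above:
# xhigh for math/logic, low for simpler analysis (illustrative choices).
EFFORT_BY_TASK = {
    "math": "xhigh",
    "logic": "xhigh",
    "analysis": "low",
    "formatting": "none",
}

def build_request(prompt: str, task_type: str) -> dict:
    """Build a Chat Completions-style payload with effort matched to the task."""
    return {
        "model": "gpt-5.4-thinking",  # hypothetical model id
        "reasoning_effort": EFFORT_BY_TASK.get(task_type, "medium"),
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove that the sum of two odd integers is even.", "math")
```

Keeping the prompt itself clean and pushing depth control into the parameter follows the "don't over-prompt" pattern: the effort level does the work that verbose chain-of-thought instructions would otherwise (counterproductively) try to do.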
DeepSeek R1
DeepSeek R1 is the standout open-source reasoning model. What makes it unique is full transparency: every thinking step appears inside `<think>` tags before the final answer. You can literally watch the model consider approaches, catch errors, and revise its reasoning in real time.
R1 is free to use on deepseek.com with no account required. The API pricing is remarkably low at $0.28 per million input tokens and $0.42 per million output tokens, making it the cheapest reasoning model by a wide margin.
R1 excels at math, logic puzzles, and structured analysis. Its visible thinking process is particularly valuable when you need to audit the reasoning, whether for regulated industries, education, or any task where understanding why matters as much as the answer itself. For a detailed comparison of DeepSeek against other major models, see the DeepSeek vs ChatGPT vs Claude breakdown.
Key patterns for DeepSeek R1:
- Explicitly ask it to show its reasoning when you want detailed explanations
- For pure answers, you can ask it to be concise; R1 respects that
- Leverage the `<think>` output for learning and verification
- Combine with follow-up prompts that reference specific thinking steps
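Because R1's reasoning arrives inline with the answer, a small parser makes the `<think>` output easy to log, audit, or feed into follow-up prompts. A minimal sketch, assuming the raw text contains a single `<think>...</think>` block followed by the final answer:

```python
import re

def split_r1_response(text: str) -> tuple[str, str]:
    """Separate DeepSeek R1's <think> reasoning from its final answer."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

raw = "<think>17 is prime: not divisible by 2, 3.</think>Yes, 17 is prime."
thinking, answer = split_r1_response(raw)
```

Storing the `thinking` half separately gives you the audit trail mentioned above without cluttering whatever consumes the final answer.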
Claude Extended Thinking
Anthropic's approach to reasoning is Claude's extended thinking mode. On Claude Opus 4.6, the system previously used a `budget_tokens` parameter to cap thinking length. As of early 2026, Anthropic has transitioned to adaptive thinking with a simpler `effort` parameter, letting Claude dynamically decide how much reasoning each problem needs.
Extended thinking works well for complex code analysis, long-document reasoning, multi-constraint optimization, and tasks requiring careful weighing of tradeoffs. Claude shows its thinking process, giving you visibility into how it approached the problem, though the format is less structured than R1's explicit `<think>` tags.
Key patterns for Claude extended thinking:
- Use it for problems with genuine complexity; simple tasks waste the thinking budget
- The adaptive thinking mode handles most situations well without manual tuning
- Claude's thinking excels at code review, architectural analysis, and multi-constraint problems
- Pair it with specific evaluation criteria so the thinking phase has clear goals
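The last pattern, giving the thinking phase explicit evaluation criteria, is easy to bake into a reusable prompt builder. This sketch targets the code-review use case mentioned above; the wording and criteria are illustrative, not a prescribed format:

```python
def review_prompt(code: str, criteria: list[str]) -> str:
    """Attach explicit evaluation criteria so the thinking phase has clear goals."""
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return (
        "Review the following code. Evaluate it against each criterion "
        "explicitly, then give an overall verdict.\n\n"
        f"Criteria:\n{bullet_list}\n\nCode:\n{code}"
    )

prompt = review_prompt(
    "def add(a, b): return a - b",
    ["Correctness", "Edge cases", "Readability"],
)
```

With criteria spelled out, Claude's thinking phase weighs each one in turn instead of free-associating about the code.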
Model Comparison
| Model | Access | Input/Output per 1M Tokens | Visible Thinking | Best For |
|---|---|---|---|---|
| GPT-5.4 Thinking | ChatGPT Plus/Pro, API | $2.50 / $15.00 (+ reasoning tokens) | Partial (Upfront Planning in ChatGPT) | General reasoning, math, science |
| DeepSeek R1 | Free web, API | $0.28 / $0.42 | Yes (`<think>` tags) | Budget reasoning, education, audit trails |
| Claude Opus 4.6 | Claude Max, API | $5.00 / $25.00 | Yes (adaptive) | Code analysis, long-form reasoning |
5 Prompting Techniques for Reasoning Models
These techniques work across all reasoning models but are especially effective when the model has a dedicated thinking phase. Each uses a structured approach, similar to how prompt frameworks like ROSES and SCOPE organize complex instructions.
1. Problem Decomposition
Break complex problems into explicit sub-problems. Reasoning models handle decomposed problems more reliably because each sub-problem gets its own thinking allocation.
This approach aligns with prompt chaining principles: even within a single prompt, decomposition improves accuracy.
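Decomposition within a single prompt can be as simple as numbering the sub-problems and asking for intermediate results. A minimal sketch (the wording and the example sub-problems are illustrative assumptions, not a fixed template):

```python
def decomposition_prompt(problem: str, subproblems: list[str]) -> str:
    """Frame a complex problem as numbered sub-problems in one prompt."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(subproblems, 1))
    return (
        "Solve this problem by working through each sub-problem in order, "
        "stating intermediate results before the final answer.\n\n"
        f"Problem: {problem}\n\nSub-problems:\n{steps}"
    )

prompt = decomposition_prompt(
    "Estimate the annual cloud cost of a new service",
    ["List the cost components", "Estimate each component", "Sum and sanity-check"],
)
```

Each numbered step gives the thinking phase a concrete checkpoint, which is why decomposed prompts tend to fail less often on long multi-step problems.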
2. Constraint Specification
Reasoning models respond well to explicit constraints. Instead of hoping the model infers your requirements, spell them out. The TRACE framework is useful for structuring these constraint-heavy prompts.
3. Verification Prompts
Ask the model to verify its own answer before finalizing. This plays to reasoning models' natural strength: they already self-check during thinking, and an explicit verification step adds another layer.
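The verification step can be a reusable suffix appended to any prompt. A sketch with illustrative wording; adjust the checks (units, edge cases) to your domain:

```python
# Illustrative self-verification instruction appended to the base prompt.
VERIFY_SUFFIX = (
    "\n\nBefore finalizing, verify your answer: re-derive the key steps "
    "independently, check units and edge cases, and state explicitly "
    "whether the verification passed. If it failed, correct the answer."
)

def with_verification(prompt: str) -> str:
    """Append an explicit self-verification step to any prompt."""
    return prompt + VERIFY_SUFFIX

checked = with_verification(
    "Compute the compound interest on $1,000 at 5% for 10 years."
)
```

Asking the model to report whether verification passed also gives you a cheap signal for flagging low-confidence answers downstream.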
4. Multi-Pass Reasoning
For ambiguous or open-ended problems, ask the model to generate multiple approaches and then evaluate them. This forces the thinking phase to explore alternatives rather than committing to the first viable path.
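A multi-pass prompt just needs to ask for several candidate approaches and a comparison before committing. A sketch (the phrasing and the default of three approaches are illustrative choices):

```python
def multi_pass_prompt(problem: str, n_approaches: int = 3) -> str:
    """Force exploration: generate several approaches, then pick the strongest."""
    return (
        f"Generate {n_approaches} distinct approaches to the problem below. "
        "For each, outline the method and its main risk. Then compare them "
        "and commit to the strongest one, explaining why.\n\n"
        f"Problem: {problem}"
    )

p = multi_pass_prompt("Design a rate limiter for a public API")
```

Requiring a stated risk per approach keeps the comparison honest; without it, models often declare the first option the winner.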
5. Thinking Budget Management
Different problems need different levels of reasoning. Learning when to use xhigh vs. low effort (for GPT-5.4 Thinking) or when to invoke extended thinking (for Claude) is the highest-leverage skill for managing costs. For more on structuring multi-step prompts effectively, see advanced prompt engineering techniques.
Rule of thumb: If a human expert would need more than 30 seconds of focused thought, use high reasoning effort. For tasks you could answer while multitasking, low effort saves time and money.
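The rule of thumb above can be encoded as a tiny routing heuristic. The 30-second boundary comes from the text; the other thresholds are hypothetical and worth tuning against your own cost and accuracy data:

```python
def pick_effort(expert_seconds: float) -> str:
    """Map estimated human-expert thinking time to a reasoning effort level.
    Only the 30s boundary is from the rule of thumb; the rest are guesses."""
    if expert_seconds <= 5:
        return "none"    # answerable while multitasking
    if expert_seconds <= 30:
        return "low"
    if expert_seconds <= 120:
        return "high"
    return "xhigh"       # genuinely hard: spend the thinking budget
```

A router like this, sitting in front of the API call, is often the single biggest cost lever when mixing easy and hard queries in one pipeline.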
When NOT to Use Reasoning Models
Reasoning models are not universally better. They are slower, more expensive, and sometimes overthink straightforward tasks. Skip them for:
- Simple Q&A: "What's the capital of France?" does not need a thinking phase
- Creative writing: Poetry, stories, and marketing copy benefit more from fluency than formal reasoning. Standard models like GPT-5.4 or Claude Sonnet 4.6 are better here
- Batch processing: If you are classifying 10,000 items, the per-token cost of reasoning adds up fast with minimal accuracy gain
- Conversational tasks: Chatbots, brainstorming sessions, and casual back-and-forth work better with faster standard models
- Summarization: Unless the source material contains contradictions or requires critical evaluation, standard models summarize just as well
Cost Comparison
Cost matters, especially when reasoning models can consume 10-50x more tokens on their thinking process than the final output contains.
| Model | Input / 1M Tokens | Output / 1M Tokens | Thinking Cost | Best For |
|---|---|---|---|---|
| GPT-5.4 Thinking | $2.50 | $15.00 | Reasoning tokens billed as output | General reasoning, math, science |
| DeepSeek R1 | $0.28 | $0.42 | Included in output tokens | Budget-friendly reasoning, open-source use |
| Claude Opus 4.6 | $5.00 | $25.00 | Included in output tokens | Code + document analysis |
| GPT-5.4 (standard, no reasoning) | $2.50 | $15.00 | N/A | General tasks, no reasoning needed |
| Claude Sonnet 4.6 (no reasoning) | $3.00 | $15.00 | N/A | Fast code + writing, no reasoning needed |
DeepSeek R1 stands out on cost: you can run heavy reasoning workloads at a fraction of what GPT-5.4 Thinking charges, since reasoning tokens add significantly to the output bill. GPT-5.4 Thinking at xhigh effort tends to score higher on the hardest benchmarks (math olympiad, competitive programming), while R1 performs comparably on typical engineering and analysis tasks.
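A back-of-envelope calculator makes the thinking-token multiplier concrete. Prices come from the table above; the assumption that thinking tokens bill at the output rate matches the table's notes, and the example token counts are made up for illustration:

```python
# (input, output) dollars per 1M tokens, from the cost table above.
PRICES = {
    "gpt-5.4-thinking": (2.50, 15.00),
    "deepseek-r1": (0.28, 0.42),
    "claude-opus-4.6": (5.00, 25.00),
}

def query_cost(model: str, input_tokens: int,
               thinking_tokens: int, answer_tokens: int) -> float:
    """Effective per-query cost, billing thinking tokens at the output rate."""
    inp, out = PRICES[model]
    return (input_tokens * inp + (thinking_tokens + answer_tokens) * out) / 1e6

# Example: 2,000 input, 8,000 thinking, 500 answer tokens.
gpt_cost = query_cost("gpt-5.4-thinking", 2000, 8000, 500)  # ≈ $0.1325
r1_cost = query_cost("deepseek-r1", 2000, 8000, 500)        # ≈ $0.0041
```

At these illustrative counts the thinking tokens dominate the bill, which is exactly why effort routing and model choice matter more for reasoning workloads than headline per-token prices suggest.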
For consumer access, ChatGPT Plus ($20/month) includes GPT-5.4 Thinking. ChatGPT Pro ($200/month) offers unlimited GPT-5.4 Pro with the highest rate limits. DeepSeek R1 is free on deepseek.com. Claude extended thinking requires Claude Max ($100-200/month) or API access.
Frequently Asked Questions
Are reasoning models always better than standard models?
No. Reasoning models excel at problems requiring multi-step logic, math, formal analysis, and complex debugging. For creative writing, simple classification, summarization, and conversational tasks, standard models are faster, cheaper, and equally effective. The overhead of a thinking phase adds latency and cost without improving outputs on straightforward work. Use reasoning models selectively. Think of them as a power tool, not your daily driver.
Can I see the model's thinking process?
It depends on the model. DeepSeek R1 shows its full reasoning inside `<think>` tags, making every intermediate step visible. Claude extended thinking also exposes the reasoning process, though Anthropic reserves the right to filter certain content from the thinking output. OpenAI's GPT-5.4 Thinking shows an "Upfront Planning" summary in ChatGPT but does not expose the full reasoning chain in the API; reasoning tokens are billed but not visible. If transparency matters for your use case (auditing, education, debugging), R1 or Claude are stronger choices.
Which reasoning model is cheapest?
DeepSeek R1 wins on price by a significant margin at $0.28/$0.42 per million tokens for input/output. It is also free to use on deepseek.com with no account needed. GPT-5.4 Thinking starts at $2.50/$15.00, but reasoning tokens (billed as output) can significantly increase the effective cost on high-effort problems. Claude Opus 4.6 extended thinking runs $5.00/$25.00. If you need reasoning capabilities on a budget, start with R1 and only escalate to GPT-5.4 Thinking or Claude extended thinking for problems that genuinely require their additional capabilities.

Keyur Patel is the founder of AiPromptsX and an AI engineer with extensive experience in prompt engineering, large language models, and AI application development. After years of working with AI systems like ChatGPT, Claude, and Gemini, he created AiPromptsX to share effective prompt patterns and frameworks with the broader community. His mission is to democratize AI prompt engineering and help developers, content creators, and business professionals harness the full potential of AI tools.
Related Articles

DeepSeek vs ChatGPT vs Claude: Which AI Should You Use in 2026?

12 Advanced Prompt Engineering Techniques That Actually Work

9 Best AI Prompt Frameworks in 2026 (With Templates)

Prompt Chaining: How to Connect Multiple AI Prompts for Complex Tasks