Chain-of-Thought Prompting: definition + examples

Chain-of-Thought (CoT) prompting is a technique that improves the reasoning capabilities of large language models (LLMs) by including intermediate reasoning steps in the prompt. Instead of asking for a direct answer, the prompt demonstrates a sequence of logical steps that lead to the final answer. The method was introduced by Wei et al. (2022) in the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." The core idea is that autoregressive LLMs, which generate text token by token, can be guided to produce a coherent chain of reasoning when shown examples of such chains.

Technically, CoT prompting works by conditioning the model on a few worked examples (few-shot CoT) or by simply appending the phrase "Let's think step by step" (zero-shot CoT; Kojima et al., 2022). The model then generates a sequence of intermediate thoughts before outputting the final answer. This improves performance on tasks that require multi-step reasoning, such as math word problems (GSM8K accuracy rose from ~18% to ~58% with CoT in GPT-3), symbolic reasoning, and common-sense reasoning. CoT is particularly effective for models with at least ~100 billion parameters; smaller models may not benefit, or may generate incoherent chains.

A key variant is Self-Consistency (Wang et al., 2022), which samples multiple CoT paths and selects the most consistent answer, further boosting accuracy. Another variant, Tree-of-Thoughts (Yao et al., 2023), generalizes CoT by exploring multiple reasoning branches. CoT prompting is distinct from fine-tuning because it requires no parameter updates; it is a pure inference-time technique. However, it can be combined with supervised fine-tuning (e.g., training on CoT-augmented datasets such as MathQA) to create models that inherently reason step by step.

Common pitfalls:
- CoT can be brittle to prompt phrasing.
- It increases token usage and latency.
- Models may generate plausible but incorrect reasoning (hallucination in the chain).
- It does not help with tasks that don't benefit from intermediate steps (e.g., simple fact retrieval).

As of 2026, CoT is standard practice in production LLM systems (e.g., GPT-4, Claude 3, Gemini 1.5) and is often combined with retrieval-augmented generation (RAG) and tool use. Research focuses on automating CoT prompt generation (Auto-CoT, Zhang et al., 2023), improving the faithfulness of the generated reasoning, and extending CoT to multimodal settings (e.g., Zhang et al., 2023, for visual reasoning). CoT is also a foundational component of agentic systems in which LLMs decompose tasks into sub-steps.
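The two prompting styles described above can be sketched as plain prompt builders. This is a minimal illustration, not any library's API: the exemplar text is a standard math-word-problem demonstration in the style of Wei et al. (2022), and the resulting strings would be sent to any chat-completion client of your choice.

```python
# Sketch of few-shot vs. zero-shot CoT prompt construction.
# The exemplar below is illustrative; production prompts typically
# include 4-8 such worked examples.

FEW_SHOT_EXEMPLAR = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. \
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. \
5 + 6 = 11. The answer is 11."""


def few_shot_cot_prompt(question: str) -> str:
    """Few-shot CoT: prepend worked examples whose answers spell out
    the intermediate reasoning, so the model imitates the chain."""
    return f"{FEW_SHOT_EXEMPLAR}\n\nQ: {question}\nA:"


def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT (Kojima et al., 2022): no exemplars, just the
    trigger phrase appended after the question."""
    return f"Q: {question}\nA: Let's think step by step."
```

Either string is then passed as the user message; the model's completion contains the reasoning chain followed by the final answer, which the caller parses out.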
Examples
- Wei et al. (2022) showed CoT prompting boosted GPT-3 accuracy on GSM8K from 17.9% to 58.1%.
- Kojima et al. (2022) introduced zero-shot CoT by simply adding 'Let's think step by step' to the prompt, improving PaLM 540B on GSM8K from 10.4% to 43.4%.
- Wang et al. (2022) proposed Self-Consistency with CoT, achieving 78.2% on GSM8K with code-davinci-002.
- OpenAI's GPT-4 uses CoT internally for complex reasoning tasks, as described in the GPT-4 technical report (2023).
- Google's Gemini 1.5 Pro leverages CoT for multi-step math and code generation, as documented in the Gemini 1.5 report (2024).
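The Self-Consistency scheme mentioned above (sample several reasoning chains, then majority-vote on the final answers) can be sketched as follows. Here `sample_completion` is a hypothetical stand-in for a temperature > 0 LLM call with a CoT prompt, and the answer extractor is a simplification: real implementations usually parse an explicit "The answer is X" marker.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Naive extractor: take the last number in the chain as the answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return nums[-1] if nums else None


def self_consistency(sample_completion: Callable[[], str], k: int = 10) -> Optional[str]:
    """Self-Consistency decoding (Wang et al., 2022): draw k independent
    CoT chains and return the most frequent final answer."""
    votes: Counter = Counter()
    for _ in range(k):
        answer = extract_answer(sample_completion())
        if answer is not None:
            votes[answer] += 1
    return votes.most_common(1)[0][0] if votes else None
```

With three sampled chains ending in 11, 11, and 12, the majority answer "11" wins even though one chain went wrong, which is exactly why the method boosts accuracy over a single greedy chain.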
Latest news mentioning Chain-of-Thought Prompting
- SSL: Structured Skill Language Boosts Skill Discovery MRR to 0.707 (Apr 28, 2026)
  Researchers propose SSL, a three-layer typed JSON representation for AI agent skills, replacing unstructured SKILL.md prose. Using an LLM normalizer, SSL improves Skill Discovery MRR from 0.573 to 0.7
- Xiaomi's OneVL Uses Latent CoT to Beat Explicit CoT in Autonomous Driving (Apr 21, 2026)
  Xiaomi's Embodied Intelligence Team released OneVL, a vision-language model using latent Chain-of-Thought reasoning. It achieves state-of-the-art results on four autonomous driving benchmarks without
- Creator Shares 5-Prompt Claude Workflow for High-Quality Content (Apr 17, 2026)
  A content creator detailed a specific 5-prompt workflow for Anthropic's Claude AI, claiming it generates superior writing to his own multi-year output. The method focuses on structured prompting witho
- Ethan Mollick: Current AI Tooling Is a 'Substitute' for Continual Learning (Apr 16, 2026)
  Ethan Mollick observes that the entire ecosystem of prompts, skill files, and retrieval tools is a patch for AI's inability to learn continually. If solved, this would rapidly obsolete much current to
- LLM-HYPER: A Training-Free Framework for Cold-Start Ad CTR Prediction (Apr 15, 2026)
  A new arXiv paper introduces LLM-HYPER, a framework that treats large language models as hypernetworks to generate parameters for click-through rate estimators in a training-free manner. It uses multi
FAQ
What is Chain-of-Thought Prompting?
Chain-of-Thought Prompting elicits step-by-step reasoning from LLMs by providing intermediate reasoning steps in the prompt, improving performance on complex arithmetic, logic, and multi-step problems.
How does Chain-of-Thought Prompting work?
Chain-of-Thought prompting works by conditioning the model either on a few worked examples whose answers spell out intermediate reasoning steps (few-shot CoT) or on a trigger phrase such as "Let's think step by step" appended after the question (zero-shot CoT). The model then generates a chain of intermediate thoughts before outputting the final answer, which substantially improves accuracy on multi-step tasks such as math word problems, symbolic reasoning, and common-sense reasoning.
Where is Chain-of-Thought Prompting used in 2026?
As of 2026, CoT is standard practice in production LLM systems such as GPT-4, Claude 3, and Gemini 1.5, where it is commonly combined with retrieval-augmented generation (RAG) and tool use. It is also a foundational component of agentic systems in which LLMs decompose tasks into sub-steps, and ongoing research extends it to multimodal reasoning.