gentic.news — AI News Intelligence Platform

Self-Critique: definition + examples

Self-Critique is a technique in which a language model evaluates, identifies flaws in, and iteratively improves its own generated text. It is a form of self-supervised reasoning that does not require external human labels or reward models at inference time. The core idea is to leverage the model's own knowledge of language, facts, and reasoning to detect errors—hallucinations, logical inconsistencies, or suboptimal phrasing—and then produce a revised version.

How it works: The process typically involves two phases: generation and critique. In the generation phase, the model produces an initial response (e.g., an answer, code snippet, or summary). In the critique phase, the same model (or a separate instance) is prompted to analyze that response. The critique prompt might ask: "List any factual errors, missing steps, or contradictions in the above answer." The model outputs a list of issues. Then, a final refinement step uses the original query plus the critique to produce an improved response. This can be repeated multiple times, though diminishing returns are common after 1-2 iterations.
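
The generate → critique → refine loop described above can be sketched as follows. `call_model` is a stand-in for any LLM completion function; it is stubbed here with canned responses so the control flow is runnable as-is.

```python
# Sketch of a generate -> critique -> refine loop. `call_model` is a
# placeholder for a real LLM API call, stubbed for illustration.

CRITIQUE_PROMPT = (
    "List any factual errors, missing steps, or contradictions "
    "in the above answer. Reply NO_ISSUES if there are none."
)

def call_model(prompt: str) -> str:
    # Stub: a real implementation would call an LLM here.
    if CRITIQUE_PROMPT in prompt:
        return "NO_ISSUES" if "revised" in prompt else "- Step 2 is missing."
    if "Critique:" in prompt:
        return "revised answer addressing the critique"
    return "initial answer"

def self_critique(query: str, max_rounds: int = 2) -> str:
    answer = call_model(query)                      # generation phase
    for _ in range(max_rounds):
        # critique phase: the same model inspects its own draft
        critique = call_model(f"{query}\n{answer}\n{CRITIQUE_PROMPT}")
        if critique.strip() == "NO_ISSUES":         # nothing left to fix
            break
        # refinement: original query + previous draft + critique
        answer = call_model(f"{query}\nDraft: {answer}\nCritique: {critique}\nRevise:")
    return answer
```

With the stub, the first round surfaces an issue and the second round finds none, so the loop terminates after one refinement, matching the diminishing-returns pattern noted above.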

Technical implementations vary. Some systems use a single model with a special "critique" system prompt (e.g., Anthropic's Constitutional AI or OpenAI's self-critique in GPT-4). Others use a separate, often smaller, critic model (e.g., a 7B-parameter model checking outputs of a 70B model) to save costs. Chain-of-thought (CoT) prompting is frequently combined with self-critique, as the model's step-by-step reasoning makes it easier to locate errors. A 2023 paper by Madaan et al. ("Self-Refine") formalized this loop and showed improvements on dialogue, code generation, and reasoning tasks.
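
A minimal sketch of the two-model variant: a large "generator" plus a cheaper "critic" that emits structured issues. Both functions are stubbed, and the issue format is an illustrative assumption, not any specific API.

```python
# Two-model self-critique: a stubbed large generator and a stubbed
# small critic that returns machine-readable issues.

def generator(prompt: str) -> str:
    # Stub for the large model; the wrong date is deliberate.
    return "Paris is the capital of France, founded in 1900."

def critic(text: str) -> list[dict]:
    # Stub for a smaller critic model prompted to emit structured issues.
    issues = []
    if "founded in 1900" in text:
        issues.append({"span": "founded in 1900", "problem": "wrong date"})
    return issues

draft = generator("When was Paris founded?")
issues = critic(draft)   # structured issues feed the refinement prompt
```

Returning structured issues (rather than free-form prose) makes the refinement prompt easier to assemble and lets the system skip refinement entirely when the issue list is empty.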

Why it matters: Self-critique reduces reliance on expensive human feedback (RLHF) or external verifiers. It can improve factual accuracy by 10-30% on benchmarks like HotpotQA and GSM8K when applied post-hoc. It also enables models to correct their own biases or safety violations without retraining, which is crucial for deployed systems that must adapt to new guidelines.

When to use vs. alternatives: Self-critique is most effective when the model has strong internal knowledge but occasionally makes slips (e.g., arithmetic errors, missing citations). It is less useful for tasks where the model fundamentally lacks knowledge (e.g., niche domains not in training data) or where errors stem from prompt misunderstanding rather than knowledge gaps. Alternatives include: (1) external tool use (e.g., a calculator for math), (2) human-in-the-loop feedback, (3) ensemble voting across multiple model runs, and (4) fine-tuning with RLHF to internalize corrections. Self-critique is cheaper than human feedback and faster than ensembling, but it can amplify confirmation bias if the model is overconfident in its own reasoning.

Common pitfalls: (1) The model may produce vague or incorrect critiques ("this could be improved") that don't lead to better outputs. (2) Over-iteration can degrade quality as the model drifts from the original query. (3) The critique phase itself can introduce new errors (e.g., the model hallucinates a flaw that wasn't there). (4) Computational cost doubles or triples per query, which can be prohibitive at scale.
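
Several of these pitfalls can be mitigated with simple guards: cap the number of rounds, reject vague critiques, and stop once a revision no longer changes the answer. The sketch below assumes caller-supplied critique and revision functions; the vagueness heuristic is illustrative.

```python
# Guards against the pitfalls above, written around injectable
# critique/revision callables so the loop itself stays model-agnostic.

def vague(critique: str) -> bool:
    # Pitfall 1: critiques like "this could be improved" carry no signal.
    return len(critique.split()) < 4 or "could be improved" in critique.lower()

def refine(answer, critique_fn, revise_fn, max_rounds=2):
    for _ in range(max_rounds):            # pitfall 4: bound the extra cost
        critique = critique_fn(answer)
        if vague(critique):
            break                          # don't revise on an empty critique
        revised = revise_fn(answer, critique)
        if revised == answer:
            break                          # pitfall 2: converged, stop drifting
        answer = revised
    return answer
```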

State of the art (2026): Self-critique is now standard in production LLM systems. OpenAI's o1 and o3 models use an internal self-critique loop during their "thinking" phase, though details are proprietary. Google's Gemini 2.0 employs a separate critic-head that is co-trained with the main model. Open-source models like Llama 3.3 70B can be prompted with structured critique templates (e.g., "You are a strict editor. Find exactly one error in the following...") to achieve near-parity with GPT-4 on factual accuracy. Research in 2025 focused on meta-critique—using a second model to evaluate the quality of the first critique—and on reinforcement learning from critique feedback (RLCF), where the model learns to improve based on its own critiques during training.
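
A structured critique template of the kind mentioned above can be sketched like this. The exact wording and the ERROR/WHY/FIX format are illustrative assumptions; the point is to constrain the critic to one concrete, parseable finding per pass.

```python
# A minimal structured critique template plus a parser for its reply.

CRITIC_TEMPLATE = """You are a strict editor. Find exactly one error in the following answer.
Answer: {answer}
Respond in this format:
ERROR: <quote the erroneous span>
WHY: <one sentence>
FIX: <corrected span>"""

def render(answer: str) -> str:
    # Fill the template with the draft to be critiqued.
    return CRITIC_TEMPLATE.format(answer=answer)

def parse(reply: str) -> dict:
    # Recover the three labelled fields from the critic's reply.
    out = {}
    for line in reply.splitlines():
        if ":" in line:
            key, _, val = line.partition(":")
            out[key.strip()] = val.strip()
    return out
```

Constraining the output format this way makes the critique machine-checkable: a reply that fails to parse can simply be discarded rather than fed into refinement.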

Examples

  • Self-Refine (Madaan et al., 2023): iterative refinement on dialogue generation, improving coherence by 20%.
  • OpenAI GPT-4's 'self-critique' used in safety evaluations to reduce toxic outputs by 29% without retraining.
  • Anthropic's Constitutional AI: model critiques its own responses against a written constitution, reducing harmful outputs by 30%.
  • Google Gemini 2.0: uses a dedicated critic-head to detect hallucinations in real-time, cutting factual errors by 40% on Natural Questions.
  • Llama 3.3 70B with structured critique prompts: achieves 92% of GPT-4's accuracy on GSM8K after one self-correction pass.

Related terms

Self-Refine · Constitutional AI · RLHF · Chain-of-Thought · Reinforcement Learning from Critique Feedback

FAQ

What is Self-Critique?

Self-Critique is a method where an LLM evaluates and refines its own outputs by generating feedback or corrections, often via multi-turn prompting or dedicated critique models.

How does Self-Critique work?

The process has two phases. In the generation phase, the model produces an initial response; in the critique phase, the same model (or a separate instance) is prompted to list factual errors, missing steps, or contradictions in that response. A final refinement step uses the original query plus the critique to produce an improved response. The loop can repeat, though returns typically diminish after one or two iterations.

Where is Self-Critique used in 2026?

Self-critique is now standard in production LLM systems: OpenAI's o1 and o3 models apply an internal self-critique loop during their "thinking" phase, Google's Gemini 2.0 uses a critic-head co-trained with the main model, and open-source models such as Llama 3.3 70B reach near-parity with GPT-4 on factual accuracy via structured critique prompts. Earlier examples include Self-Refine (Madaan et al., 2023) and Anthropic's Constitutional AI.