The next frontier is not agents.
It is loops.
An agent loop is a task with a check: the AI does some work, checks the result, then continues or stops. The model reasons — the loop is what makes it reliable.
Ask an AI agent to fix a bug and you'll often get an answer in seconds: “Done.” Except sometimes it never ran the test. The bug is still there — the agent guessed once and reported a success it never checked.
That gap — between an agent that says it finished and one that can prove it — is the whole story of agent loops. A loop is what closes it: do the work, check the result, and stop only when the check passes. It's the difference between a confident answer and a correct one — and it's quietly becoming the most important thing to get right in applied AI.
How we got here
The question builders ask has shifted in three steps:
Prompt: “What should I tell the model?”
Agent: “What tools, memory, and role should the model have?”
Loop: “How does the system keep working, checking, retrying, escalating, and stopping?”
Prompt, workflow, agent, loop — what's the difference?
These four words get used interchangeably, which is one reason agentic systems end up unreliable: nobody can say which layer is responsible for finishing the job. They are different layers, and the loop is the one that decides whether the agent actually finishes.
| Layer | What it is | Example |
|---|---|---|
| Prompt | A one-shot instruction. No check, no second step. | “Summarize this paper.” |
| Workflow | A predefined sequence of steps on fixed code paths. | Fetch → extract → format → send. |
| Agent | A tool-using AI that dynamically directs its own process. | Reads, searches, writes code, calls APIs. |
| Loop | A task with a check: act, verify the result, continue or stop. | Act → observe → check → retry → stop on evidence. |
“[Agents] are typically just LLMs using tools based on environmental feedback in a loop.” (Anthropic, Building Effective Agents)
Give the agent a way to verify its work
This is the first item in Anthropic's Claude Code best practices, and it is the whole game. The verification — a test, a benchmark, a rubric, a re-query — is what makes a loop a loop. Without it, an agent guesses once and can confidently report a success that never happened. The check, not the agent's opinion, decides whether the work improved and when to stop.
Why loops matter
The reliability of an agentic system depends less on the model and more on the loop around it. Two teams with the same model get very different results depending on how they structure execution.
Anatomy of a good loop
The reliable ones share a recognizable set of parts — each one a place where sloppy agents quietly fail.
Goal: A result you can measure or review. “Improve the code” is vague; “every page loads under 50 ms on the same test” is a real finish line.
Context: The fresh inputs and state the agent should inspect before acting.
Action: One small, bounded, reversible change per iteration, easy to verify and easy to undo.
Observation: The ground truth back from the action: results, errors, tool output, environment state.
Check: A fixed test, benchmark, rubric, or approval. The check, not the agent's opinion, decides if the work improved.
Recovery: How the loop responds to a failed check: adjust, escalate, or try a different path.
Stop condition: When success is reached, when no change is needed, or when the loop is blocked or out of budget.
Evidence: A verifiable artifact (a passing test, a diff, a PR, a report), not just a claim of success.
Budget: A hard ceiling on iterations, tokens, or time so the loop can never run away.
Human approval: A checkpoint where a person reviews before a costly or irreversible step.
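The parts listed above can be sketched as a small driver function. This is a minimal illustration, not a reference implementation: `run_loop`, `LoopResult`, and every callback name are hypothetical, and the toy counter stands in for real work such as editing code and running a test.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    status: str  # "success", "no_op", "blocked", or "out_of_budget"
    iterations: int
    evidence: List[str] = field(default_factory=list)


def run_loop(
    inspect: Callable[[], dict],
    act: Callable[[dict, Optional[str]], str],
    check: Callable[[], bool],
    max_iterations: int = 5,
    needs_approval: Callable[[dict], bool] = lambda state: False,
) -> LoopResult:
    """Act, observe, check, and stop on evidence or when the budget runs out."""
    # No-op condition: if the check already passes, change nothing.
    if check():
        return LoopResult("no_op", 0, ["check already passes"])

    evidence: List[str] = []
    feedback: Optional[str] = None
    for i in range(1, max_iterations + 1):  # hard ceiling on iterations
        state = inspect()  # fresh inputs before every action
        if needs_approval(state):  # human checkpoint before a risky step
            return LoopResult("blocked", i - 1, evidence + ["awaiting human approval"])
        observation = act(state, feedback)  # one small change
        evidence.append(f"iteration {i}: {observation}")
        if check():  # the fixed check decides, not the agent's opinion
            return LoopResult("success", i, evidence)
        feedback = observation  # recovery: feed the observation into the next try
    return LoopResult("out_of_budget", max_iterations, evidence)


# Toy usage: the "work" is incrementing a counter until it reaches 3.
counter = {"value": 0}


def bump(state: dict, feedback: Optional[str]) -> str:
    counter["value"] += 1
    return f"value is now {counter['value']}"


result = run_loop(
    inspect=lambda: dict(counter),
    act=bump,
    check=lambda: counter["value"] >= 3,
)
print(result.status, result.iterations)  # prints "success 3"
```

The design choice worth noting is that `check` takes no input from the agent: it reads the environment directly, so a confident but wrong report cannot end the loop.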
Write a loop in plain language
You don't need a framework to start — a loop is mostly a well-structured prompt. Name the trigger, the inputs to inspect, the one change allowed, the check to run, and the conditions that stop it. Run it by hand once before you schedule it; the first run usually reveals a missing check or a fuzzy stop condition.
When [trigger], inspect [fresh inputs].
Choose one in-scope action using [criteria], then make the change.
Run [acceptance check] under the same conditions.
Record what changed, the evidence, and the next step in [state file].
Repeat only while progress is measurable and [budget] remains.
Stop when [success gate] passes. Stop without changes when [no-op condition] is true.
Ask for approval or report a blocker when [escalation condition] occurs.
Never [forbidden action]. Finish with [pull request, report, artifact, or handoff].
Template structure adapted from Forward Future's Loop Library.
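If you keep the template in a file, a few lines of code can fill the brackets and refuse to produce a prompt while any placeholder is still empty, which catches the "fuzzy stop condition" problem before the first run. This is a sketch under assumptions: `fill_loop_template` is a hypothetical helper, the example values are invented, and the template here is a three-line excerpt of the one above.

```python
import re
from typing import Dict

# Matches bracketed placeholders such as [trigger] or [success gate].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_loop_template(template: str, values: Dict[str, str]) -> str:
    """Replace every [placeholder]; fail loudly if one has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


TEMPLATE = (
    "When [trigger], inspect [fresh inputs].\n"
    "Run [acceptance check] under the same conditions.\n"
    "Stop when [success gate] passes."
)

prompt = fill_loop_template(
    TEMPLATE,
    {
        "trigger": "a bug report is filed",
        "fresh inputs": "the report and the latest test run",
        "acceptance check": "the full test suite",
        "success gate": "the new regression test",
    },
)
print(prompt)
```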
Loops you can copy and run
Paste-ready loop prompts — each one a goal with a fixed check and an explicit stop. Drop them into Claude Code, Cursor, or Codex and adapt the brackets to your project.
Test-driven bug-fix loop
Category: Engineering
Reproduce the reported bug and capture it as a failing test. Make the smallest change that could fix it, then run the test suite. If it still fails, inspect the error, revise, and run again. Stop when the new test passes and no existing tests regress. If you cannot reproduce after 3 attempts, stop and report what you tried. Never change unrelated code. Finish with the diff and the failing-then-passing test run as evidence.
Production error sweep
Category: Operations
Review production logs for errors. If you find an actionable issue, trace it to root cause, apply the smallest safe fix, and verify the fix under the same conditions. If no actionable errors are present, stop without making changes. Escalate anything you cannot safely auto-fix. Finish with a short report of what changed and the evidence.
Content quality loop
Category: Content
For each draft, score it against the editorial rubric: specific headline, fact-led lede, original analysis (not rephrasing), every number/name verifiable, scannable structure, and a forward-looking close. For anything below the bar, web-verify the facts, rewrite to the bar, and re-score. Stop when every item passes or the budget is reached. Never publish an evergreen explainer as news. Finish by saving the improved pieces with their scores.
SEO/GEO visibility loop
Category: Content
Run an SEO/GEO audit across crawlability, indexation, titles, internal links, structured data, source citations, and answer-first content. Rank the gaps by expected traffic impact. Fix the single highest-leverage gap, then re-run the same crawl and re-check the target queries across search and AI answer engines. Record changes and evidence in a state file. Repeat while measurable gaps remain and within budget. Stop when no critical technical issue is left. Finish with a report.
Site health loop
Category: Evaluation
Check that every section feed renders, key pages return 200, no records carry malformed data, and the pipelines are fresh. For any self-healable failure, apply the fix and re-verify. If a failure is not safely auto-fixable, stop and hand it to a human with evidence. Cap the run at 3 iterations. Finish with a pass/fail report and the list of self-heals applied.
gentic.news runs versions of the bottom three on itself — the site-health and content-quality loops keep this platform honest.
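To make the site health prompt concrete, here is a rough sketch of its control flow: run every check, self-heal what has a known fix, escalate what does not, and cap the run at 3 iterations. The names `health_loop`, `checks`, and `healers` are hypothetical, and the in-memory `site` dictionary stands in for real feeds and pages; nothing here describes how gentic.news actually implements it.

```python
from typing import Callable, Dict, List


def health_loop(
    checks: Dict[str, Callable[[], bool]],
    healers: Dict[str, Callable[[], None]],
    max_iterations: int = 3,
) -> dict:
    """Run all checks, self-heal where a healer exists, escalate otherwise."""
    healed: List[str] = []
    for iteration in range(1, max_iterations + 1):
        failing = [name for name, check in checks.items() if not check()]
        if not failing:
            return {"status": "pass", "iterations": iteration, "healed": healed, "escalate": []}
        unhealable = [name for name in failing if name not in healers]
        if unhealable:
            # Not safely auto-fixable: stop and hand over with evidence.
            return {"status": "fail", "iterations": iteration, "healed": healed, "escalate": unhealable}
        for name in failing:
            healers[name]()
            healed.append(name)
        # The next iteration re-verifies every check, not only the healed ones.
    # Budget exhausted: one final verification decides the report.
    failing = [name for name, check in checks.items() if not check()]
    status = "fail" if failing else "pass"
    return {"status": status, "iterations": max_iterations, "healed": healed, "escalate": failing}


# Toy usage: a stale feed can be refreshed automatically.
site = {"feed_fresh": False, "pages_ok": True}
report = health_loop(
    checks={
        "feed_fresh": lambda: site["feed_fresh"],
        "pages_ok": lambda: site["pages_ok"],
    },
    healers={"feed_fresh": lambda: site.update(feed_fresh=True)},
)
print(report["status"], report["healed"])  # prints "pass ['feed_fresh']"
```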
Types of agent loops
Most working agent systems fit one of the thirteen recognizable archetypes below. Knowing which one you need, and what its stop condition is, is half of building a reliable agent.
1. ReAct loop (Yao et al., 2022)
2. Reflection loop
3. Evaluator-optimizer loop (Anthropic pattern)
4. Generator-critic loop
5. Plan-execute-replan loop
6. Tool-use repair loop
7. Test-driven coding loop
8. Research verification loop (used in gentic.news's pipeline)
9. Memory-reflection loop (Generative Agents, 2023)
10. Search / tree loop (Tree of Thoughts / LATS)
11. Multi-agent conversation loop (AutoGen / CrewAI)
12. Human-approval loop
13. Skill-acquisition loop (Voyager, 2023)
Running loops safely in parallel
Parallel loops are where reliability turns into risk: race conditions, conflicting writes, runaway cost, cross-contamination. The rule is simple — parallel actions are safe when they're independent, and dangerous when they share mutable state without coordination.
Give each parallel action its own sandbox — separate files, git worktrees, DB rows, or sessions. No shared mutable state means no collision. (Anthropic's “fan out across files” works for exactly this reason.)
When actions genuinely touch one shared resource (a schema, a counter), serialize just that with a lock or a queue. A shared resource without coordination is how parallel loops overwrite each other's changes and lose updates.
Bound concurrency (how many run at once) and put a hard ceiling on iterations, tokens, and cost. A parallel fan-out without a budget is how you get a runaway bill.
Run N actions independently, then use one barrier step to merge and de-duplicate the results before anything downstream consumes them. The merge is where you catch conflicts.
After a parallel batch, re-check the whole system, not each piece — a set of individually-fine changes can still break together.
Auto-run low-risk actions; require human approval for deploys, payments, or deletes. Give each agent only the tools it needs — and never one untrusted input + powerful tools + an exfiltration path at once.
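Several of these rules fit in one short `asyncio` sketch: a semaphore bounds concurrency, a lock serializes the one shared counter, a fan-out budget rejects oversized batches, and a single barrier step merges and de-duplicates before anything is returned. The function name `fan_out`, the limits, and the file names are all assumptions made for illustration; real workers would call an agent instead of normalizing a string.

```python
import asyncio
from typing import Dict, List


async def fan_out(items: List[str], max_concurrency: int = 2, max_actions: int = 10) -> Dict[str, object]:
    # Budget: refuse a fan-out larger than the ceiling.
    if len(items) > max_actions:
        raise ValueError("fan-out exceeds the action budget")

    semaphore = asyncio.Semaphore(max_concurrency)  # how many run at once
    lock = asyncio.Lock()  # guards the one shared resource
    shared = {"processed": 0}

    async def worker(item: str) -> str:
        async with semaphore:
            # Isolated work: touches only its own input.
            result = item.strip().lower()
            # Shared counter: read, yield, write. Without the lock, two
            # workers could read the same value and one update would be lost.
            async with lock:
                current = shared["processed"]
                await asyncio.sleep(0)
                shared["processed"] = current + 1
            return result

    # Barrier: wait for every worker, then merge and de-duplicate once.
    results = await asyncio.gather(*(worker(item) for item in items))
    merged = sorted(set(results))

    # Verify the combined result, not each piece.
    if shared["processed"] != len(items):
        raise RuntimeError("processed count does not match the batch size")
    return {"merged": merged, "processed": shared["processed"]}


report = asyncio.run(fan_out(["A.md", "b.md ", "a.md"]))
print(report)  # prints {'merged': ['a.md', 'b.md'], 'processed': 3}
```

Only the counter update sits inside the lock; the rest of each worker runs concurrently, which is the "serialize just that" idea in practice.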
A cookbook, not a model
The clearest sign loops are their own layer: someone built a library for them. Loop Library, from Matthew Berman's Forward Future, is a free, community-contributed catalog of copy-paste agent-loop prompts — dozens of them across engineering, evaluation, operations, content, and design. It is not an agent and not a model; it is a cookbook of reusable operating patterns you adapt and run.
“An agent loop is a task with a check. The agent does some work, checks the result, and then continues or stops… Use a loop when the result of one step should change the next step. If it will not, use a one-time task instead.” (Loop Library)
npx skills add Forward-Future/loop-library --skill loop-library -g
Berman launched it with “find loops, submit your own, tokenmaxx”, a nod to spending more agentic compute on hard problems. The discipline the library adds is the point: a real check and a hard stop, so the extra tokens buy reliability rather than runaway loops.
Where builders find real loop examples
The patterns above are documented across frameworks and research — here is where to read the source and copy working code.
31 copy-paste agent-loop prompts with checks and stop conditions
The canonical workflow-vs-agent distinction; evaluator-optimizer loop
“Give Claude a way to verify its work”; explore→plan→code; fan-out
Deterministic loop workflows with explicit termination conditions
Stateful, cyclical agent runtimes; reflection & judge loops
The classic reasoning↔action loop
Reflection + memory; learn from failure without fine-tuning
Embodied lifelong-learning loop with a growing skill library
Loop vs agent, made practical
“An agent without a loop is a capable worker with vague instructions. An agent with a good loop is a worker with a checklist, a test plan, a reviewer, and a stop rule.”
The market talks about agents, but the real competitive advantage is moving to loop design. The teams that win will not only have access to strong models — they will know how to wrap those models in reliable loops: coding loops, research loops, evaluation loops, memory loops, review loops, and deployment loops.
Agent loops — FAQ
What is an agent loop?
An agent loop is a task with a check: the agent does some work, checks the result against a fixed criterion, and then continues or stops. As Anthropic puts it, agents are “typically just LLMs using tools based on environmental feedback in a loop.” Use a loop when the result of one step should change the next step; if it will not, a one-time prompt is enough.
How is an agent loop different from a workflow or a prompt?
A prompt is one instruction with no check. A workflow is a fixed sequence on predefined code paths. A loop is adaptive: it observes the result of each action, checks it against a goal, and decides whether to retry, escalate, or stop. The difference that matters is the check and the stop condition — that is what turns a capable model into a reliable one.
Why do I need a loop if I can just ask the AI to do it?
When you ask a strong agent to do a multi-step task, a loop is already running under the hood — you just don't see it. The loop is what lets it recover when step two fails. The reason builders obsess over loop design: without a way to verify its work, an agent guesses once and can confidently report success that never happened. The verification is what makes the loop a loop.
What is the single most important part of a good loop?
A way for the agent to verify its own work — the #1 item in Anthropic's Claude Code best practices. A fixed test, benchmark, or rubric (not the agent's own opinion) is what lets the loop tell whether it improved and when to stop. Everything else — small reversible steps, budgets, escalation — supports that check.
How do I run loops safely in parallel?
Isolate what you can (separate files, worktrees, or rows, so actions can't collide), coordinate what you must (a lock or queue for any shared resource), cap concurrency and cost, fan out then reconcile in one barrier step, verify the combined result, and gate irreversible actions behind human approval. Parallel actions are safe when independent and dangerous when they share mutable state without coordination.
What is the Loop Library?
Loop Library, from Matthew Berman's Forward Future, is a free catalog of copy-paste agent-loop prompts — dozens of them across engineering, evaluation, operations, content, and design — each a reusable instruction with a built-in check and stop condition that you paste into a coding agent like Claude Code, Cursor, or Codex.
Loops are how agentic AI becomes dependable.