The Execution Layer

The next frontier is not agents.
It is loops.

An agent loop is a task with a check: the AI does some work, checks the result, then continues or stops. The model reasons — the loop is what makes it reliable.

Agent

the worker

Loop

the task-with-a-check around it

Loop Library

a catalog of reusable loop prompts

Ask an AI agent to fix a bug and you'll often get an answer in seconds: “Done.” Except sometimes it never ran the test. The bug is still there — the agent guessed once and reported a success it never checked.

That gap — between an agent that says it finished and one that can prove it — is the whole story of agent loops. A loop is what closes it: do the work, check the result, and stop only when the check passes. It's the difference between a confident answer and a correct one — and it's quietly becoming the most important thing to get right in applied AI.

How we got here

Layer 1

First we engineered prompts

“What should I tell the model?”

Layer 2

Then we designed agents

“What tools, memory, and role should the model have?”

Layer 3

Now the layer is loops

“How does the system keep working, checking, retrying, escalating, and stopping?”

Prompt, workflow, agent, loop — what's the difference?

These four words get used interchangeably, which is one reason many agentic systems are unreliable. They are different layers, and the loop is the one that decides whether the agent actually finishes the job.

Layer	What it is	Example
Prompt	A one-shot instruction. No check, no second step.	“Summarize this paper.”
Workflow	A predefined sequence of steps on fixed code paths.	Fetch → extract → format → send.
Agent	A tool-using AI that dynamically directs its own process.	Reads, searches, writes code, calls APIs.
Loop	A task with a check: act, verify the result, continue or stop.	Act → observe → check → retry → stop on evidence.

“[Agents] are typically just LLMs using tools based on environmental feedback in a loop.”— Anthropic, Building Effective Agents

The one rule

Give the agent a way to verify its work

This is the first item in Anthropic's Claude Code best practices, and it is the whole game. The verification — a test, a benchmark, a rubric, a re-query — is what makes a loop a loop. Without it, an agent guesses once and can confidently report a success that never happened. The check, not the agent's opinion, decides whether the work improved and when to stop.

Why loops matter

The reliability of an agentic system depends less on the model and more on the loop around it. Two teams with the same model get very different results depending on how they structure execution.

→ Loops reduce vague prompting: the pattern carries the intent, not a single fragile instruction.

→ Loops make agents reliable by forcing a check after every action.

→ Loops create repeatable behavior you can reason about and reproduce.

→ Loops make evaluation easy: there is a defined criterion to test against.

→ Loops prevent infinite wandering with explicit stop conditions and budgets.

→ Loops make agent work auditable: every iteration leaves a trace.

Anatomy of a good loop

The reliable ones share a recognizable set of parts — each one a place where sloppy agents quietly fail.

1. Goal

A result you can measure or review. “Improve the code” is vague; “every page loads under 50 ms on the same test” is a real finish line.

2. Context

The fresh inputs and state the agent should inspect before acting.

3. Action

One small, bounded, reversible change per iteration — easy to verify, easy to undo.

4. Observation

The ground truth back from the action: results, errors, tool output, environment state.

5. Check

A fixed test, benchmark, rubric, or approval — the check, not the agent's opinion, decides if the work improved.

6. Retry rule

How the loop responds to a failed check: adjust, escalate, or try a different path.

7. Stop condition

When success is reached, when no change is needed, or when it's blocked or out of budget.

8. Evidence / output

A verifiable artifact — a passing test, a diff, a PR, a report — not just a claim of success.

9. Budget limit

A hard ceiling on iterations, tokens, or time so the loop can never run away.

10. Human approval point

A checkpoint where a person reviews before a costly or irreversible step.
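
The parts listed above can be sketched as a small driver function. This is a minimal illustration, not a reference implementation: `run_loop`, `LoopResult`, and the toy latency state are hypothetical names chosen for the example, and a real loop would call an agent and a real test instead of lambdas.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    status: str                 # "success", "out_of_budget", or "needs_human"
    iterations: int             # how many actions were actually taken
    evidence: List[str] = field(default_factory=list)


def run_loop(
    act: Callable[[int], int],           # action: one small change to the state
    check: Callable[[int], bool],        # fixed check: it decides, not the agent
    state: int,                          # context: the starting state to inspect
    max_iterations: int = 5,             # budget limit
    needs_approval: Callable[[int], bool] = lambda s: False,  # human approval point
) -> LoopResult:
    evidence: List[str] = []
    for i in range(1, max_iterations + 1):
        if check(state):                 # stop condition: goal reached or nothing to do
            return LoopResult("success", i - 1, evidence)
        if needs_approval(state):        # hand off before a risky step
            return LoopResult("needs_human", i - 1, evidence)
        new_state = act(state)           # action, then observation of the new state
        evidence.append(f"iteration {i}: {state} -> {new_state}")
        state = new_state                # retry rule: try again from the new state
    status = "success" if check(state) else "out_of_budget"
    return LoopResult(status, max_iterations, evidence)


# Toy goal, assumed for illustration: page latency under 50 ms,
# where each action shaves 20 ms off.
result = run_loop(act=lambda ms: ms - 20, check=lambda ms: ms < 50, state=100)
print(result.status, result.iterations)
```

The check runs before every action, so a state that already passes produces a no-op run with zero iterations and no changes.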

Write a loop in plain language

You don't need a framework to start — a loop is mostly a well-structured prompt. Name the trigger, the inputs to inspect, the one change allowed, the check to run, and the conditions that stop it. Run it by hand once before you schedule it; the first run usually reveals a missing check or a fuzzy stop condition.

When [trigger], inspect [fresh inputs].
Choose one in-scope action using [criteria], then make the change.
Run [acceptance check] under the same conditions.
Record what changed, the evidence, and the next step in [state file].
Repeat only while progress is measurable and [budget] remains.
Stop when [success gate] passes. Stop without changes when [no-op condition] is true.
Ask for approval or report a blocker when [escalation condition] occurs.
Never [forbidden action]. Finish with [pull request, report, artifact, or handoff].

Template structure adapted from Forward Future's Loop Library.
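
One way to catch a fuzzy loop before running it is to fill the template programmatically and refuse to proceed while any bracket is still empty. The sketch below uses a shortened copy of the template; `fill_template` and the example values are hypothetical, not part of the Loop Library.

```python
import re

# Shortened copy of the template above, for illustration only.
TEMPLATE = (
    "When [trigger], inspect [fresh inputs].\n"
    "Run [acceptance check] under the same conditions.\n"
    "Stop when [success gate] passes."
)

# Matches one bracketed placeholder such as [trigger].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict) -> str:
    """Replace every [placeholder]; fail loudly if any is left unfilled."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = fill_template(
    TEMPLATE,
    {
        "trigger": "a nightly schedule fires",
        "fresh inputs": "the latest test report",
        "acceptance check": "the full test suite",
        "success gate": "the suite",
    },
)
print(prompt)
```

Raising on a missing value is a deliberate choice here: a loop prompt with an unfilled stop condition is exactly the case the first manual run is meant to expose.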

Loops you can copy and run

Paste-ready loop prompts — each one a goal with a fixed check and an explicit stop. Drop them into Claude Code, Cursor, or Codex and adapt the brackets to your project.

Test-driven bug-fix loop

Engineering

Reproduce the reported bug and capture it as a failing test. Make the smallest change that could fix it, then run the test suite. If it still fails, inspect the error, revise, and run again. Stop when the new test passes and no existing tests regress. If you cannot reproduce after 3 attempts, stop and report what you tried. Never change unrelated code. Finish with the diff and the failing-then-passing test run as evidence.

Production error sweep

Operations

Review production logs for errors. If you find an actionable issue, trace it to root cause, apply the smallest safe fix, and verify the fix under the same conditions. If no actionable errors are present, stop without making changes. Escalate anything you cannot safely auto-fix. Finish with a short report of what changed and the evidence.

Content quality loop

Content

For each draft, score it against the editorial rubric: specific headline, fact-led lede, original analysis (not rephrasing), every number/name verifiable, scannable structure, and a forward-looking close. For anything below the bar, web-verify the facts, rewrite to the bar, and re-score. Stop when every item passes or the budget is reached. Never publish an evergreen explainer as news. Finish by saving the improved pieces with their scores.

SEO/GEO visibility loop

Content

Run an SEO/GEO audit across crawlability, indexation, titles, internal links, structured data, source citations, and answer-first content. Rank the gaps by expected traffic impact. Fix the single highest-leverage gap, then re-run the same crawl and re-check the target queries across search and AI answer engines. Record changes and evidence in a state file. Repeat while measurable gaps remain and within budget. Stop when no critical technical issue is left. Finish with a report.

Site health loop

Evaluation

Check that every section feed renders, key pages return 200, no records carry malformed data, and the pipelines are fresh. For any self-healable failure, apply the fix and re-verify. If a failure is not safely auto-fixable, stop and hand it to a human with evidence. Cap the run at 3 iterations. Finish with a pass/fail report and the list of self-heals applied.

gentic.news runs versions of the bottom three on itself — the site-health and content-quality loops keep this platform honest.
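
As an illustration of how a prompt like the site health loop maps onto code, here is a minimal sketch under stated assumptions: checks and self-heals are plain callables keyed by name, and `site_health_loop` is a hypothetical function, not gentic.news's actual implementation.

```python
from typing import Callable, Dict, List

Check = Callable[[], bool]   # returns True when the check passes
Heal = Callable[[], None]    # applies a safe, automatic fix


def site_health_loop(
    checks: Dict[str, Check],
    heals: Dict[str, Heal],
    max_iterations: int = 3,          # the cap named in the prompt
) -> dict:
    healed: List[str] = []
    for _ in range(max_iterations):
        failing = [name for name, check in checks.items() if not check()]
        if not failing:
            return {"status": "pass", "healed": healed, "escalate": []}
        unfixable = [name for name in failing if name not in heals]
        if unfixable:
            # Not safely auto-fixable: stop and hand off with evidence.
            return {"status": "fail", "healed": healed, "escalate": unfixable}
        for name in failing:
            heals[name]()             # apply the fix; the next pass re-verifies it
            healed.append(name)
    failing = [name for name, check in checks.items() if not check()]
    return {"status": "pass" if not failing else "fail", "healed": healed, "escalate": failing}


# Toy state: a stale pipeline that a refresh can repair.
state = {"pipeline_fresh": False}
report = site_health_loop(
    checks={"pipeline_fresh": lambda: state["pipeline_fresh"]},
    heals={"pipeline_fresh": lambda: state.update(pipeline_fresh=True)},
)
print(report)
```

The heal is never trusted on its own: the loop only reports a pass after the same check runs again and succeeds.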

Types of agent loops

Most working agent systems follow one of the thirteen recognizable archetypes below. Knowing which one you need, and its stop condition, is half of building a reliable agent.

1. ReAct loop

Yao et al., 2022

reason → act → observe → reason again

Best for: Tool use, research, browsing, debugging

2. Reflection loop

generate → critique → revise → repeat

Best for: Writing, reasoning, code review

3. Evaluator-optimizer loop

Anthropic pattern

produce → evaluate vs criteria → improve → re-evaluate

Best for: Quality-controlled content, code, structured output

4. Generator-critic loop

generate → critic reviews → revise → approve/reject

Best for: High-accuracy, safety-sensitive, editorial work

5. Plan-execute-replan loop

plan → execute step → observe → update plan

Best for: Long tasks, coding agents, operations

6. Tool-use repair loop

choose tool → call → inspect error → repair args → retry

Best for: API, data, browser, coding agents

7. Test-driven coding loop

reproduce → fix → run tests → inspect → patch → repeat

Best for: Claude Code, Codex, Cursor, Devin-style agents

8. Research verification loop

Used in gentic.news's pipeline

find claim → retrieve sources → cross-check → reject weak → cite

Best for: AI news, intelligence, fact-checking

9. Memory-reflection loop

Generative Agents, 2023

observe → store memory → retrieve → reflect → plan

Best for: Long-running agents, assistants

10. Search / tree loop

Tree of Thoughts / LATS

branch → evaluate → expand best → backtrack or stop

Best for: Complex reasoning, planning

11. Multi-agent conversation loop

AutoGen / CrewAI

planner → executor → reviewer → human approves

Best for: Multi-role / enterprise workflows

12. Human-approval loop

draft → assess risk → request approval → execute

Best for: Emails, payments, production changes

13. Skill-acquisition loop

Voyager, 2023

propose goal → generate → execute → self-verify → save skill

Best for: Embodied / lifelong-learning agents
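
To make one archetype concrete, the tool-use repair loop (choose tool → call → inspect error → repair args → retry) might look like the sketch below. Everything here is hypothetical: `call_with_repair`, the toy `to_celsius` tool, and the rule-based `repair` function stand in for a real tool and for the model that would normally propose the repaired arguments.

```python
from typing import Any, Callable, Dict


def call_with_repair(
    tool: Callable[..., Any],
    args: Dict[str, Any],
    repair: Callable[[Dict[str, Any], Exception], Dict[str, Any]],
    max_attempts: int = 3,            # budget: never retry forever
) -> dict:
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            value = tool(**args)      # call the tool
            return {"ok": True, "value": value, "attempts": attempt, "errors": errors}
        except (TypeError, ValueError) as exc:
            errors.append(str(exc))   # inspect the error and keep it as evidence
            args = repair(args, exc)  # repair the arguments, then retry
    return {"ok": False, "value": None, "attempts": max_attempts, "errors": errors}


def to_celsius(value: float, unit: str) -> float:
    """Toy tool that is strict about its unit argument."""
    if unit == "F":
        return (value - 32) * 5 / 9
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown unit {unit!r}")


def repair(args: Dict[str, Any], exc: Exception) -> Dict[str, Any]:
    """Toy repair rule: normalize the unit string."""
    return {**args, "unit": str(args["unit"]).strip().upper()}


outcome = call_with_repair(to_celsius, {"value": 212, "unit": " f "}, repair)
print(outcome["value"], outcome["attempts"])
```

The stop condition is either a successful call or an exhausted attempt budget, and the collected errors serve as the evidence trail in both cases.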

Running loops safely in parallel

Parallel loops are where reliability turns into risk: race conditions, conflicting writes, runaway cost, cross-contamination. The rule is simple — parallel actions are safe when they're independent, and dangerous when they share mutable state without coordination.

Isolate what you can

Give each parallel action its own sandbox — separate files, git worktrees, DB rows, or sessions. No shared mutable state means no collision. (Anthropic's “fan out across files” works for exactly this reason.)

Coordinate what you must

When actions genuinely touch one shared resource (a schema, a counter), serialize just that with a lock or a queue. A shared resource without coordination is how parallel loops race and overwrite each other's work.
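
A minimal sketch of serializing one shared resource, using Python's standard `threading.Lock`. The `SharedCounter` class and `run_workers` helper are hypothetical names for illustration; the counter stands in for any resource several loops must update.

```python
import threading


class SharedCounter:
    """A counter several workers update; the lock serializes each update."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:              # only one worker at a time may read-modify-write
            self.value += 1


def run_workers(counter: SharedCounter, workers: int = 8, increments: int = 1000) -> int:
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()                 # wait for every worker before reading the total
    return counter.value


total = run_workers(SharedCounter())
print(total)
```

Only the increment is locked. Everything else the workers do can still run in parallel, which is the point of serializing just the shared piece.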

Cap everything

Bound concurrency (how many run at once) and put a hard ceiling on iterations, tokens, and cost. A parallel fan-out without a budget is how you get a runaway bill.
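
Bounding concurrency can be sketched with the standard library's `asyncio.Semaphore`. The `run_capped` function and its toy workload are assumptions for illustration; a real fan-out would also track tokens and cost, which this sketch omits.

```python
import asyncio
from typing import List, Tuple


async def run_capped(task_count: int, max_concurrency: int) -> Tuple[List[int], int]:
    """Run task_count toy tasks with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    running = 0
    peak = 0                              # highest number of tasks seen running at once

    async def one(i: int) -> int:
        nonlocal running, peak
        async with semaphore:             # wait here while the cap is reached
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)     # stand-in for real agent work
            running -= 1
            return i * i

    results = await asyncio.gather(*(one(i) for i in range(task_count)))
    return list(results), peak


results, peak = asyncio.run(run_capped(task_count=10, max_concurrency=3))
print(peak, results)
```

All ten tasks are scheduled immediately, but the semaphore keeps the number actually doing work at or below the cap.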

Fan out, then reconcile

Run N actions independently, then use one barrier step to merge and de-duplicate the results before anything downstream consumes them. The merge is where you catch conflicts.
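
The fan-out and barrier step might be sketched as follows, using `concurrent.futures.ThreadPoolExecutor` from the standard library. The `fan_out` and `reconcile` functions and the proposal format (a dict of file name to new content) are hypothetical choices for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Proposal = Dict[str, str]   # file name -> proposed content


def fan_out(worker: Callable[[dict], Proposal], tasks: List[dict], max_workers: int = 4) -> List[Proposal]:
    """Run each task independently; leaving the with-block is the barrier."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))   # results come back in task order


def reconcile(proposals: List[Proposal]) -> Tuple[Proposal, List[str]]:
    """Merge proposals, drop exact duplicates, and report conflicting keys."""
    merged: Proposal = {}
    conflicts = set()
    for proposal in proposals:
        for name, content in proposal.items():
            if name not in merged:
                merged[name] = content
            elif merged[name] != content:      # same file, different content
                conflicts.add(name)
    return merged, sorted(conflicts)


def propose(task: dict) -> Proposal:
    """Toy worker: each task proposes content for one file."""
    return {task["file"]: task["content"]}


tasks = [
    {"file": "a.txt", "content": "1"},
    {"file": "b.txt", "content": "2"},
    {"file": "a.txt", "content": "1"},   # duplicate of the first proposal
    {"file": "b.txt", "content": "3"},   # conflicts with the second proposal
]
merged, conflicts = reconcile(fan_out(propose, tasks))
print(merged, conflicts)
```

Nothing downstream should consume `merged` while `conflicts` is non-empty; in this sketch the first proposal is kept only as a placeholder until the conflict is resolved.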

Verify the combined result

After a parallel batch, re-check the whole system, not each piece — a set of individually-fine changes can still break together.

Gate the irreversible

Auto-run low-risk actions; require human approval for deploys, payments, or deletes. Give each agent only the tools it needs — and never one untrusted input + powerful tools + an exfiltration path at once.
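
A gate on irreversible actions can be as small as the sketch below. The action kinds, the `execute` function, and the `approve` callback are hypothetical; in practice the approval would come from a person through a review tool rather than a lambda.

```python
from typing import Any, Callable, Dict

# Assumed set of action kinds that always require a human decision.
IRREVERSIBLE = {"deploy", "payment", "delete"}


def execute(
    action: Dict[str, Any],
    run: Callable[[Dict[str, Any]], Any],
    approve: Callable[[Dict[str, Any]], bool],
) -> dict:
    """Auto-run low-risk actions; require approval for irreversible ones."""
    if action["kind"] in IRREVERSIBLE and not approve(action):
        return {"status": "blocked", "kind": action["kind"]}   # nothing was executed
    return {"status": "done", "kind": action["kind"], "result": run(action)}


log = []
run = lambda action: log.append(action["kind"]) or "ok"    # toy executor that records calls

print(execute({"kind": "format_code"}, run, approve=lambda a: False))
print(execute({"kind": "delete"}, run, approve=lambda a: False))
```

The gate sits in front of the executor, so a denied action never reaches `run` at all.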

The Loop Library

A cookbook, not a model

The clearest sign loops are their own layer: someone built a library for them. Loop Library, from Matthew Berman's Forward Future, is a free, community-contributed catalog of copy-paste agent-loop prompts — dozens of them across engineering, evaluation, operations, content, and design. It is not an agent and not a model; it is a cookbook of reusable operating patterns you adapt and run.

“An agent loop is a task with a check. The agent does some work, checks the result, and then continues or stops… Use a loop when the result of one step should change the next step. If it will not, use a one-time task instead.”— Loop Library

Installable skill for your coding agent

npx skills add Forward-Future/loop-library --skill loop-library -g

Berman launched it with “find loops, submit your own, tokenmaxx” — a nod to spending more agentic compute on hard problems. The discipline the library adds is the point: a real check and a hard stop, so the extra tokens buy reliability rather than runaway loops.

Where builders find real loop examples

The patterns above are documented across frameworks and research — here is where to read the source and copy working code.

Forward Future: Loop Library

31 copy-paste agent-loop prompts with checks and stop conditions

Anthropic: Building Effective Agents

The canonical workflow-vs-agent distinction; evaluator-optimizer loop

Claude Code: Best Practices

“Give Claude a way to verify its work”; explore→plan→code; fan-out

Google ADK: LoopAgent

Deterministic loop workflows with explicit termination conditions

LangGraph

Stateful, cyclical agent runtimes; reflection & judge loops

ReAct (paper)

The classic reasoning↔action loop

Reflexion (paper)

Reflection + memory; learn from failure without fine-tuning

Voyager (paper)

Embodied lifelong-learning loop with a growing skill library

Loop vs agent, made practical

“An agent without a loop is a capable worker with vague instructions. An agent with a good loop is a worker with a checklist, a test plan, a reviewer, and a stop rule.”

The agent executes.

The loop governs execution.

The model reasons.

The loop constrains and evaluates.

The tool gives capability.

The loop gives reliability.

gentic.news — strategic read

The market talks about agents, but the real competitive advantage is moving to loop design. The teams that win will not only have access to strong models — they will know how to wrap those models in reliable loops: coding loops, research loops, evaluation loops, memory loops, review loops, and deployment loops.

Agent loops — FAQ

What is an agent loop?

An agent loop is a task with a check: the agent does some work, checks the result against a fixed criterion, and then continues or stops. As Anthropic puts it, agents are “typically just LLMs using tools based on environmental feedback in a loop.” Use a loop when the result of one step should change the next step; if it will not, a one-time prompt is enough.

How is an agent loop different from a workflow or a prompt?

A prompt is one instruction with no check. A workflow is a fixed sequence on predefined code paths. A loop is adaptive: it observes the result of each action, checks it against a goal, and decides whether to retry, escalate, or stop. The difference that matters is the check and the stop condition — that is what turns a capable model into a reliable one.

Why do I need a loop if I can just ask the AI to do it?

When you ask a strong agent to do a multi-step task, a loop is already running under the hood — you just don't see it. The loop is what lets it recover when step two fails. The reason builders obsess over loop design: without a way to verify its work, an agent guesses once and can confidently report success that never happened. The verification is what makes the loop a loop.

What is the single most important part of a good loop?

A way for the agent to verify its own work — the #1 item in Anthropic's Claude Code best practices. A fixed test, benchmark, or rubric (not the agent's own opinion) is what lets the loop tell whether it improved and when to stop. Everything else — small reversible steps, budgets, escalation — supports that check.

How do I run loops safely in parallel?

Isolate what you can (separate files, worktrees, or rows, so actions can't collide), coordinate what you must (a lock or queue for any shared resource), cap concurrency and cost, fan out then reconcile in one barrier step, verify the combined result, and gate irreversible actions behind human approval. Parallel actions are safe when independent and dangerous when they share mutable state without coordination.

What is the Loop Library?

Loop Library, from Matthew Berman's Forward Future, is a free catalog of copy-paste agent-loop prompts — dozens of them across engineering, evaluation, operations, content, and design — each a reusable instruction with a built-in check and stop condition that you paste into a coding agent like Claude Code, Cursor, or Codex.

Loops are how agentic AI becomes dependable.

