AI Agent Loop Patterns

Agents (Russell & Norvig)
In the textbook sense (Russell & Norvig), an agent is anything that perceives its environment through sensors and acts through actuators to pursue a task—often described as mapping percepts to actions over time, with goals, performance measures, and constraints (PEAS-style problem framing). Their taxonomy lines up five designs: simple reflex, model-based reflex, goal-based, utility-based, and learning agents—adding internal state, explicit goals, preference over outcomes, and adaptation from data as you move along the scale. Modern LLM agents are one engineering instantiation: the model reads state (prompt, tool output, memory), proposes the next action, and the runtime executes tools or updates—still the same perceive → decide → act cycle, with stochastic policies and richer natural-language interfaces than classical planners.
An agent is an autonomous system that uses an LLM with tool calling to interact with external systems (databases, APIs, file systems) to perform actions, not just generate text.
This article is about AI Agent Loop Patterns: the repeating cycles of reasoning, tool use, and observation (the Re-Act family and close relatives). The LLM is the reasoning engine — it decides what to do, which tools to call, and how to read tool output. Unlike a chatbot that only emits text, these loops let an agent search, query, run code, or call APIs until a task completes.
Below are twelve common Re-Act-style loop patterns and extensions — from plain think–act–observe to dialogue, parallel tools, sequential iterations, reflection, memory, planning, chain-of-thought, tree-of-thought, sandbox execution, and learning. Pros and cons sit under each card; Python sketches and model tables are collapsible.

Re-Act
Reasoning + acting: the model produces reasoning steps (what to do next and why) and actions (tool calls), then observes the value returned from the tool, and continues until the task is done. Each Observe step is the only place the model sees real tool output, so the loop is grounded in facts rather than pure speculation. The run ends when the model stops requesting tools and returns a final answer (or hands off explicitly).
LOOP: THINK / REASON → ACT → OBSERVE → … → RESULT
Pros: Clear audit trail (think → act → observe). Tool results constrain hallucination on factual questions. Works with most chat models that support function calling. Easy to cap steps or tools for safety.
Cons: Latency and cost scale with loop length. Poor tool schemas or ambiguous prompts cause thrashing (repeated useless calls). The model may still misinterpret correct tool output if the task is underspecified.
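The core loop can be sketched in a few lines. This is a minimal illustration, not a production runtime: `fake_llm` is a hypothetical stand-in for a chat model with function calling, and the tool set and stub policy are made up.

```python
# A minimal Re-Act loop. `fake_llm` stands in for a real chat model with
# function calling; TOOLS maps tool names to plain Python callables.

TOOLS = {"calculator": lambda expr: eval(expr, {"__builtins__": {}})}

def fake_llm(history):
    """Stub policy: call the calculator once, then return a final answer."""
    observations = [payload for kind, payload in history if kind == "observe"]
    if not observations:
        return ("act", "calculator", "2 + 3")             # THINK -> ACT
    return ("final", f"The answer is {observations[-1]}")  # stop requesting tools

def react(task, max_steps=5):
    history = [("input", task)]
    for _ in range(max_steps):                  # cap steps for safety
        decision = fake_llm(history)
        if decision[0] == "final":
            return decision[1]
        _, tool, arg = decision
        history.append(("observe", TOOLS[tool](arg)))  # OBSERVE: the only
                                                       # grounding point
    return "step budget exhausted"
```

The `max_steps` cap is the cheap safety valve mentioned above: a confused model burns the budget instead of looping forever.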

Conversational Re-Act (ReSpAct)
Reason + Speak + Act: the agent talks to the user — asking for clarification and reporting back — before it acts, keeping the user in the loop.
LOOP: INPUT → THINK → (OPTIONALLY) SPEAK/ASK → THINK → ACT → OBSERVE → … → FINAL RESULT
Pros: Fewer wrong-tool calls from ambiguous input. Better trust and transparency. Catches missing parameters early instead of failing inside a silent loop.
Cons: More round-trips and longer wall-clock time. Harder to automate in batch pipelines. Dialogue policy must avoid annoying or redundant questions.
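The Speak/Ask step can be sketched as a guard before the tool call. A hedged sketch: `ask_user` stands in for the real dialogue channel, and the weather lookup is a stubbed tool with invented data.

```python
# Ask for a missing parameter early instead of failing inside a silent loop.
# `ask_user` is a stand-in for the dialogue channel with the human.

def respact_weather(task, ask_user, forecasts):
    city = task.get("city")
    if city is None:                                  # SPEAK/ASK early
        city = ask_user("Which city do you mean?")
    observation = forecasts.get(city.lower(), "unknown")  # ACT -> OBSERVE
    return f"Forecast for {city}: {observation}"
```

The clarifying question fires only when a required slot is empty, which is one way to keep the dialogue policy from asking redundant questions.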

Re-Act Description
An agent that follows the Re-Act pattern with an optional Describe step: it narrates planned or completed actions around tool use. Use it for transparency, UX, or audit logs; skip it when you want less latency or noise—the core loop stays Think → Act → Observe.
LOOP: INPUT → THINK → (OPTIONALLY) DESCRIBE (WHAT IT WILL DO OR HAS DONE) → ACT → OBSERVE → … → FINAL RESULT
Pros: Easier debugging and compliance-friendly traces. Can improve UX when tool payloads are noisy or binary.
Cons: Extra tokens and latency. Descriptions can drift from what tools actually did if not grounded in the same Observe payload. Not ideal for lowest-latency automation.
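One way to keep descriptions from drifting is to generate the after-the-fact narration from the same Observe payload the loop uses. A minimal sketch, with a hypothetical log list standing in for the UX or audit channel:

```python
# Describe around a tool call: narrate the planned action, act, then narrate
# the result from the same Observe payload so the two cannot disagree.

def describe_and_act(tool_name, tool_fn, arg, log):
    log.append(f"I will call {tool_name} with {arg!r}.")   # DESCRIBE (before)
    observation = tool_fn(arg)                             # ACT -> OBSERVE
    log.append(f"{tool_name} returned {observation!r}.")   # DESCRIBE (after),
    return observation                                     # grounded in Observe
```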

Multi-Action Re-Act
In one step the agent can output several tool calls (e.g. search + calculator), then observe all results. Same loop, but Act can be multiple actions per step. Modern chat models often expose parallel or batched tool use so independent lookups can run together and merge in Observe. When steps must be strictly ordered, keep a single Act or split across turns—parallel multi-action assumes no hidden dependency between those tools.
LOOP: INPUT → THINK → ACT (ONE OR MORE TOOL CALLS, E.G. SEARCH + CALCULATOR) → OBSERVE (ALL RESULTS) → THINK → … → FINAL ANSWER
Pros: Lower latency than serial one-tool-per-turn when dependencies allow. Fewer overall LLM calls for the same ground truth.
Cons: Risky when calls are not independent — ordering bugs or duplicate side effects. Larger Observe payloads can overflow context. Not all hosts expose true parallel execution.
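A single multi-call Act step can be sketched with a thread pool: independent calls run together and their results merge into one Observe payload. A sketch under the assumption that the named tools have no hidden dependency on each other:

```python
# One Act step with several independent tool calls run in parallel, merged
# into a single Observe dict. Only safe when the calls are truly independent.

from concurrent.futures import ThreadPoolExecutor

def multi_act(calls, tools):
    """calls: list of (tool_name, argument) pairs chosen in one Think step."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(tools[name], arg) for name, arg in calls}
        return {name: f.result() for name, f in futures.items()}  # merged OBSERVE
```

If a call's input depends on another call's output, split them across turns instead — exactly the ordering caveat above.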

Iterative Re-Act
Multiple sequential Re-Act rounds: after each Observe, the model may Think and Act again with full history—useful when the next tool choice depends on the last result, you want tighter control than packing many tools into one turn, or you need a clear iteration budget for cost and safety. Same atomic loop as Re-Act; emphasis is on repeated cycles until done or a max-step cap, not parallel multi-action in a single step.
LOOP: INPUT → THINK → ACT → OBSERVE → THINK → ACT → OBSERVE → … → FINAL ANSWER
Pros: Each new round sees fresh tool output before the next choice—good for dependent steps and auditable turn-by-turn traces. Easy to cap max_rounds for cost and safety.
Cons: More turns than multi-action when tools could have run in parallel. Latency and token use grow with loop depth; weak stop conditions or vague tools cause thrashing.
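The dependent-steps case can be sketched as a chain where each round's Act takes the previous Observe as input, under an explicit round budget. The ticket-chain lookup here is an invented example:

```python
# Sequential rounds: each Act depends on the previous Observe, e.g. following
# a chain of references one hop at a time, capped by max_rounds.

def iterative_react(start, lookup, max_rounds=10):
    key, trace = start, []
    for _ in range(max_rounds):        # explicit iteration budget
        value = lookup(key)            # ACT -> OBSERVE
        trace.append((key, value))
        if value is None:              # stop condition: chain exhausted
            break
        key = value                    # next THINK sees fresh output
    return key, trace
```

The `trace` list is the turn-by-turn audit trail the Pros line refers to.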

Re-Act + Reflection
After an attempt, the agent reflects (critique, self-correction) and then retries with an updated strategy.
LOOP: INPUT → REASON → ACT → OBSERVE → REFLECT → (MAYBE) REASON → ACT → … → FINAL ANSWER
Pros: Catches mistakes that a single pass would ship. Pairs well with a dedicated critic model or stricter reflection prompt. Improves robustness on complex tasks.
Cons: At least one extra LLM call per cycle — higher cost and latency. Reflection can still miss root causes or over-correct. Needs clear stop conditions to avoid infinite retries.
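The attempt–critique–retry cycle can be sketched as below. `solve` and `critique` are hypothetical stand-ins for LLM calls (the critic may be a separate, stricter model), and the retry cap is the stop condition the Cons line asks for:

```python
# Reflection as a critic loop: attempt, critique, retry with feedback until
# the critic accepts or the retry budget runs out.

def react_with_reflection(solve, critique, max_retries=3):
    feedback = None
    answer = None
    for _ in range(max_retries):       # clear stop condition: no infinite retries
        answer = solve(feedback)       # REASON -> ACT -> OBSERVE (stubbed)
        feedback = critique(answer)    # REFLECT: None means "accepted"
        if feedback is None:
            break
    return answer
```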

Re-Act + Memory
Re-Act loop + long-term or episodic memory (store important data, reuse in later steps). Memory is explicit read/write on top of Observe: tool outputs are ephemeral unless you persist them, while memory holds facts, preferences, or state across turns. Implementations range from a vector store keyed by session to structured slots the model updates after each Observe.
LOOP: INPUT → REASON → ACT → OBSERVE → MEMORY READ/WRITE → … → FINAL RESULT
Pros: Stable behavior over long horizons. Can reduce duplicate API calls. Supports personalization when memory is scoped per user or session.
Cons: Stale or wrong memories poison future turns — needs versioning, TTL, or retrieval with citations. Extra storage and privacy review. Vector memory can retrieve irrelevant facts if queries are vague.
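The explicit read/write layer can be sketched with a dict standing in for a vector store or structured slots — a deliberately tiny model of the idea, not a memory system:

```python
# Explicit memory around the loop: tool outputs are ephemeral unless
# persisted. A dict stands in for a vector store or structured slots.

class Memory:
    def __init__(self):
        self._facts = {}
    def read(self, key):
        return self._facts.get(key)
    def write(self, key, value):
        self._facts[key] = value

def react_with_memory(question, tool, memory):
    cached = memory.read(question)     # MEMORY READ before acting
    if cached is not None:
        return cached                  # skip the duplicate tool call
    answer = tool(question)            # ACT -> OBSERVE
    memory.write(question, answer)     # MEMORY WRITE after Observe
    return answer
```

The second turn hitting memory instead of the tool is the duplicate-call saving from the Pros line; the staleness risk from the Cons line is exactly this cache never being invalidated.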

Re-Act with Planning
Plan first (e.g. high-level steps or subgoals), then run a Re-Act loop within each step. Plan-and-execute with Re-Act as the execution engine.
LOOP: INPUT → PLAN → THINK → ACT → OBSERVE → … → FINAL RESULT
Pros: Reduces aimless tool use. Easier to parallelize or hand off sub-steps. Plan can be shown to users for approval.
Cons: Brittle if the initial plan is wrong — may need replanning logic. Two-layer control adds complexity. Planner and executor can disagree on what “done” means for a step.
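The two-layer control can be sketched as a planner producing subgoals and an executor (here stubbed to one call per step) that sees the results of earlier steps. Both callables are hypothetical stand-ins for LLM-driven components:

```python
# Plan-and-execute: plan once up front, then run an executor per subgoal,
# passing earlier results forward. No replanning logic in this sketch.

def plan_and_execute(task, planner, execute_step):
    plan = planner(task)                             # PLAN: high-level subgoals
    results = []
    for step in plan:
        results.append(execute_step(step, results))  # Re-Act loop per step
    return plan, results[-1]
```

Returning the plan alongside the answer is what lets you show it to users for approval.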

Chain Of Thought Re-Act
Re-Act where each Think step is explicit chain-of-thought: detailed, step-by-step reasoning in text before the model commits to an action.
LOOP: INPUT → THINK (COT) → ACT → OBSERVE → THINK (COT) → … → FINAL RESULT
Pros: Often higher accuracy on structured problems. Reasoning traces help humans trust or audit the path to an action. Works with a single model — no extra critic required.
Cons: Verbose traces increase tokens and latency. CoT can still be wrong while sounding confident. Some deployments prefer not to store raw chain-of-thought for privacy.
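The difference from plain Re-Act is only in the Think step. A sketch with a hypothetical `cot_think` stub emitting the reasoning text a real model would generate:

```python
# CoT-flavored Think: the stub "model" emits step-by-step reasoning text
# before choosing an action; the trace is kept for auditing.

def cot_think(question):
    """Stub: returns (reasoning steps, chosen action)."""
    steps = [
        f"The question is: {question}",
        "This is arithmetic, so the calculator tool is the right choice.",
    ]
    return steps, ("calculator", "12 * 7")

def cot_react(question):
    trace, (tool, arg) = cot_think(question)          # THINK (CoT)
    observation = eval(arg, {"__builtins__": {}})     # ACT -> OBSERVE
    trace.append(f"The tool returned {observation}.")
    return observation, trace
```

Whether to persist `trace` is the privacy trade-off noted in the Cons line.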

Tree-of-thought plus Re-Act
Before acting, the model explores several partial reasoning paths (a tree of candidate thoughts or subplans), prunes or scores them, then commits to a preferred branch and runs the standard Think → Act → Observe loop under that choice. Brings search-over-reasoning structure to hard problems; costs more LLM calls than a single CoT line—control breadth and depth for budget.
LOOP: INPUT → (BRANCH: MULTIPLE CANDIDATE THOUGHTS) → SELECT BEST → THINK → ACT → OBSERVE → … → FINAL RESULT
Pros: Explores several approaches before committing—often better on puzzles, planning, or when the first guess is wrong. Pairs well with strong reasoning models; combines with the same Re-Act tool loop after selection.
Cons: Multiplies up-front token cost (branching + selection). Heuristic picks can eliminate the right branch if scoring is weak. Needs explicit limits on branch count and depth.
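The branch–score–select skeleton can be sketched as below. The candidate thoughts and scores are invented stand-ins for model-generated branches and a model- or heuristic-based scorer; `breadth` is the explicit limit the Cons line calls for:

```python
# Branch, prune by score within a breadth budget, commit to the best thought,
# then run the normal Act step under that choice.

def tree_of_thought(candidates, score, act, breadth=2):
    ranked = sorted(candidates, key=score, reverse=True)[:breadth]  # prune
    best = ranked[0]                                                # SELECT BEST
    return best, act(best)                                          # then Re-Act
```

A weak `score` function eliminating the right branch at the pruning step is the failure mode mentioned above.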

Sandbox Execution Agent
Re-Act (or a planner–executor) where the main Act is not a thin API call but code or shell in an isolated environment: the model proposes commands or scripts, the runtime executes them in a sandbox (container, micro-VM, or restricted worker), and Observe is stdout, stderr, exit code, and optionally files or metrics. Suited to coding tasks, data wrangling, and reproducible runs; you trade latency and hosting cost for real compute and file I/O with guardrails (timeouts, network policy, resource caps).
LOOP: INPUT → THINK → (COMPOSE COMMAND OR SCRIPT) → EXECUTE IN SANDBOX → OBSERVE (STDOUT/STDERR/EXIT/ARTIFACTS) → … → FINAL RESULT
Pros: Real compute and filesystem in a controlled box—code can run, tests can execute, and agents can recover from failed commands using stderr. Easier to align “do work” with production coding copilots than only HTTP tools.
Cons: Sandboxing is non-trivial (escape risk, cost per session, image prep). Long logs or big artifacts blow context; needs truncation and summarization. Misconfigured network or mounts can still leak data or cost.
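The execute-and-observe half can be sketched with a child process, with stdout, stderr, and the exit code as the Observe payload. This is only the process-isolation layer — a real sandbox adds containers or micro-VMs, network policy, and resource caps on top:

```python
# Run a model-proposed script in a subprocess; Observe = stdout, stderr, exit.

import subprocess
import sys

def sandbox_exec(code, timeout=10):
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,  # guardrail: timeout
    )
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit": proc.returncode}
```

A non-zero exit plus the stderr text is what lets the agent recover from a failed command on the next Think step.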

Re-Act + Learning
Agent updates its policy (or a retrievable knowledge store) from experience: RL from rewards, fine-tuning from feedback, or storing corrected strategies for reuse.
LOOP: INPUT → THINK → ACT → OBSERVE → (IF FEEDBACK/REWARD) → UPDATE → … → FINAL RESULT
Pros: System improves without hand-editing every prompt. Experience replay or stored strategies are cheaper than full RL for many teams.
Cons: Risk of learning the wrong pattern from noisy feedback. Fine-tuning and RL need data hygiene, eval harnesses, and rollback. “Learning” without guardrails can amplify biases or unsafe shortcuts.
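The cheap end of the spectrum — storing corrected strategies for reuse, with no fine-tuning or RL — can be sketched as a reward-keyed store. The task types, strategies, and reward scale here are invented for illustration:

```python
# Keep the highest-reward strategy per task type for reuse on later runs.
# This is the "stored strategies" option, not RL or fine-tuning.

class StrategyStore:
    def __init__(self):
        self._best = {}
    def update(self, task_type, strategy, reward):
        _, best_reward = self._best.get(task_type, (None, float("-inf")))
        if reward > best_reward:               # UPDATE only on improvement
            self._best[task_type] = (strategy, reward)
    def recall(self, task_type):
        entry = self._best.get(task_type)
        return entry[0] if entry else None
```

The guard against lower-reward overwrites is a minimal guardrail; noisy rewards can still promote a bad strategy, which is the Cons line's point.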
Conclusion
You rarely need every loop pattern at once. Start with a minimal Re-Act loop and clear tool contracts, then add dialogue, reflection, or memory when user trust, accuracy, or long sessions demand it. For how those pieces fit into sense, think, act, observe, and finish, see Anatomy of an AI agent. For common use-case stacks (how to combine these loops with RAG, memory, tools), see Production Agent-RAG Architectures.