AI agent loop patterns

[Diagram: agent loop pattern with user input, plan, a think → act → observe loop, branch, and final result.]
One agent loop pattern: plan high-level steps, then per step run think → act → observe until the task is done.

Agent loop patterns

An agent is an autonomous system that uses an LLM with tool calling to interact with external systems (databases, APIs, file systems) to perform actions, not just generate text.

This article is about agent loop patterns: the repeating cycles of reasoning, tool use, and observation (the ReAct family and close relatives). The LLM is the reasoning engine: it decides what to do, which tools to call, and how to read tool output. Unlike a chatbot that only emits text, these loops let an agent search, query, run code, or call APIs until a task completes.

Below are nine common ReAct-style loop patterns and extensions, from plain think → act → observe to dialogue, parallel tools, reflection, memory, planning, chain-of-thought, and learning. Pros and cons sit under each card; Python sketches and model tables are collapsible.

Re-Act

Reasoning + acting: the model produces reasoning steps (what to do next and why) and actions (tool calls), then observes the value returned from the tool, and continues until the task is done. Each Observe step is the only place the model sees real tool output, so the loop is grounded in facts rather than pure speculation. The run ends when the model stops requesting tools and returns a final answer (or hands off explicitly).

LOOP: THINK / REASON → ACT → OBSERVE → … → RESULT

Pros: Clear audit trail (think → act → observe). Tool results constrain hallucination on factual questions. Works with most chat models that support function calling. Easy to cap steps or tools for safety.

Cons: Latency and cost scale with loop length. Poor tool schemas or ambiguous prompts cause thrashing (repeated useless calls). The model may still misinterpret correct tool output if the task is underspecified.
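The loop above can be sketched in a few lines of Python. Everything here (run_react, stub_model, the message format) is an illustrative stand-in for a real model and tool-calling SDK, with a hard step cap as the safety valve:

```python
# Minimal ReAct loop sketch: a stubbed "model" decides between calling a
# tool and returning a final answer; every tool result is fed back into
# the history as an observation.

def run_react(task, model, tools, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                      # cap loop length for safety
        step = model(history)                       # THINK: decide the next move
        if step["type"] == "final":                 # model stops requesting tools
            return step["content"]
        tool = tools[step["tool"]]                  # ACT: call the chosen tool
        observation = tool(step["input"])
        history.append({"role": "tool",             # OBSERVE: ground the loop
                        "content": str(observation)})
    raise RuntimeError("step budget exhausted")

# Stub model: call the calculator once, then answer with what it observed.
def stub_model(history):
    if history[-1]["role"] == "tool":
        return {"type": "final", "content": history[-1]["content"]}
    return {"type": "tool", "tool": "calc", "input": "6*7"}

# eval() is only acceptable here because the input is a hard-coded demo string.
result = run_react("What is 6*7?", stub_model, {"calc": lambda e: eval(e)})
print(result)  # "42"
```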

#GPT-4o

#Claude 3.5

#Tool use

Conversational ReAct (ReSpAct)

Reason + Speak + Act: the agent talks to the user, asks for clarification, reports back, and then acts, keeping the user in the loop.

LOOP: INPUT → THINK → (OPTIONALLY) SPEAK/ASK → THINK → ACT → OBSERVE → … → FINAL RESULT

Pros: Fewer wrong-tool calls from ambiguous input. Better trust and transparency. Catches missing parameters early instead of failing inside a silent loop.

Cons: More round-trips and longer wall-clock time. Harder to automate in batch pipelines. Dialogue policy must avoid annoying or redundant questions.

When you might use two models

  • Router + worker: One small model routes (e.g. "needs clarification" vs "ready to act"), another does the main ReAct loop.
  • Specialized roles: One model for dialogue/clarification, another for heavy reasoning or tool use.

For most setups, a single general-purpose chat model (e.g. GPT-4o or Claude 3.5 Sonnet) is enough.
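A single turn of this pattern can be sketched as follows. The missing-parameter check is a trivial heuristic standing in for a model's routing decision, and all names (respact_turn, ask_user, act) are illustrative:

```python
# ReSpAct-style turn sketch: before acting, the agent asks the user for a
# missing parameter instead of guessing, catching gaps early rather than
# failing inside a silent loop.

def respact_turn(request, ask_user, act):
    if "city" not in request:                  # THINK: is the request actionable?
        city = ask_user("Which city?")         # SPEAK/ASK: fill the gap early
        request = {**request, "city": city}
    return act(request)                        # ACT with complete parameters

answer = respact_turn(
    {"intent": "weather"},
    ask_user=lambda q: "Oslo",                 # canned user reply for the demo
    act=lambda r: f"weather for {r['city']}",
)
print(answer)  # "weather for Oslo"
```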

#GPT-4o

#Claude 3.5

#Dialogue

Re-Act Description

An agent that follows the ReAct pattern with an optional Describe step: it narrates planned or completed actions around tool use. Use it for transparency, UX, or audit logs; skip it when you want less latency or noise. The core loop stays Think → Act → Observe.

LOOP: INPUT → THINK → (OPTIONALLY) DESCRIBE (WHAT IT WILL DO OR HAS DONE) → ACT → OBSERVE → … → FINAL RESULT

Pros: Easier debugging and compliance-friendly traces. Can improve UX when tool payloads are noisy or binary.

Cons: Extra tokens and latency. Descriptions can drift from what tools actually did if not grounded in the same Observe payload. Not ideal for lowest-latency automation.
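One way to avoid the drift mentioned above is to render the Describe step from the same Observe payload the model sees, as in this sketch (act_and_describe and the message format are illustrative, not a real API):

```python
# Describe-step sketch: the narration is built from the actual observation,
# so the description cannot drift from what the tool really returned.

def act_and_describe(tool, arg, notify):
    observation = tool(arg)                                      # ACT + OBSERVE
    notify(f"ran {tool.__name__}({arg!r}) -> {observation!r}")   # DESCRIBE from Observe
    return observation

log = []                               # audit trail of descriptions
obs = act_and_describe(len, "hello", log.append)
print(obs)   # 5
print(log)   # ["ran len('hello') -> 5"]
```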

#GPT-4o

#Claude 3.5

#Describe

Multi-Action ReAct

In one step the agent can output several tool calls (e.g. search + calculator), then observe all results. Same loop, but Act can be multiple actions per step. Modern chat models often expose parallel or batched tool use so independent lookups can run together and merge in Observe. When steps must be strictly ordered, keep a single Act or split across turns; parallel multi-action assumes no hidden dependency between those tools.

LOOP: INPUT → THINK → ACT (ONE OR MORE TOOL CALLS, E.G. SEARCH + CALCULATOR) → OBSERVE (ALL RESULTS) → THINK → … → FINAL ANSWER

Pros: Lower latency than serial one-tool-per-turn when dependencies allow. Fewer overall LLM calls for the same ground truth.

Cons: Risky when calls are not independent: ordering bugs or duplicate side effects. Larger Observe payloads can overflow context. Not all hosts expose true parallel execution.
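A multi-action step can be sketched with a thread pool: independent tool calls from one Think step run concurrently and their results merge into a single Observe payload. This assumes the calls have no ordering dependency or shared side effects; multi_act and the stub tools are illustrative:

```python
# Multi-action step sketch: run several independent tool calls in parallel
# and merge the results, preserving input order for the Observe payload.
from concurrent.futures import ThreadPoolExecutor

def multi_act(calls):                  # calls: list of (tool, arg) pairs
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(tool, arg) for tool, arg in calls]
        return [f.result() for f in futures]   # merged OBSERVE, input order kept

search = lambda q: f"results for {q}"          # stub search tool
calc = lambda e: eval(e)                       # demo-only calculator
observations = multi_act([(search, "GDP of Norway"), (calc, "2+2")])
print(observations)  # ['results for GDP of Norway', 4]
```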

#GPT-4o

#Claude 3.5

#Parallel tools

Re-Act + Reflection

After an attempt, the agent reflects (critique, self-correction) and then retries with an updated strategy.

LOOP: INPUT → REASON → ACT → OBSERVE → REFLECT → (MAYBE) REASON → ACT → … → FINAL ANSWER

Pros: Catches mistakes that a single pass would ship. Pairs well with a dedicated critic model or stricter reflection prompt. Improves robustness on complex tasks.

Cons: At least one extra LLM call per cycle, so higher cost and latency. Reflection can still miss root causes or over-correct. Needs clear stop conditions to avoid infinite retries.

Two implementation choices

  • One model: Same LLM handles Reason, Act, and Reflect. Simpler, usually enough. Use a critic-style prompt for the reflection step.
  • Two models: Separate model for reflection (critic). Use when reflection must catch subtle errors or tasks are reasoning-heavy. Stronger reasoning models (o1, DeepSeek-R1) excel as critics.

For most setups, a single capable model is sufficient. Add a specialized reflection model when quality of critique matters more.
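The reflect-and-retry cycle can be sketched like this, with both attempt() and critique() standing in for LLM calls (a critic-style prompt on the same model, or a separate critic model) and an explicit round cap as the stop condition:

```python
# Reflection loop sketch: after each attempt a critic checks the result and
# either accepts it (returns None) or feeds a critique into the next attempt.

def react_with_reflection(attempt, critique, max_rounds=3):
    feedback = None
    result = None
    for _ in range(max_rounds):             # hard cap avoids infinite retries
        result = attempt(feedback)          # REASON + ACT + OBSERVE
        feedback = critique(result)         # REFLECT: None means "accepted"
        if feedback is None:
            return result
    return result                           # best effort once budget runs out

# Demo: first draft fails the critic's length check, the second passes.
drafts = iter(["hi", "hello, here is the full answer"])
final = react_with_reflection(
    attempt=lambda fb: next(drafts),
    critique=lambda r: None if len(r) > 10 else "too short, expand",
)
print(final)  # "hello, here is the full answer"
```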

#GPT-4o

#Claude 3.5

#Reflect

Re-Act + Memory

ReAct loop + long-term or episodic memory (store important data, reuse in later steps). Memory is explicit read/write on top of Observe: tool outputs are ephemeral unless you persist them, while memory holds facts, preferences, or state across turns. Implementations range from a vector store keyed by session to structured slots the model updates after each Observe.

LOOP: INPUT → REASON → ACT → OBSERVE → MEMORY READ/WRITE → … → FINAL RESULT

Pros: Stable behavior over long horizons. Can reduce duplicate API calls. Supports personalization when memory is scoped per user or session.

Cons: Stale or wrong memories poison future turns; needs versioning, TTL, or retrieval with citations. Extra storage and privacy review. Vector memory can retrieve irrelevant facts if queries are vague.
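The explicit read/write on top of Observe can be sketched with a dict-backed store (EpisodicMemory and the turn logic are illustrative; a production version would add TTLs, versioning, or a vector index as noted above):

```python
# Memory sketch: facts are written after Observe and read before Reason,
# so they survive across turns instead of living only in transient history.

class EpisodicMemory:
    def __init__(self):
        self._store = {}

    def write(self, key, value):
        self._store[key] = value            # persist a fact for later turns

    def read(self, key, default=None):
        return self._store.get(key, default)

memory = EpisodicMemory()

def turn(user_input, memory):
    known = memory.read("user_city")            # MEMORY READ before reasoning
    if known is None:
        memory.write("user_city", user_input)   # MEMORY WRITE after observing
        return f"noted: you are in {user_input}"
    return f"using remembered city: {known}"

print(turn("Oslo", memory))       # "noted: you are in Oslo"
print(turn("weather?", memory))   # "using remembered city: Oslo"
```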

#GPT-4o

#Claude 3.5

#Memory

Re-Act with Planning

Plan first (e.g. high-level steps or subgoals), then run a ReAct loop within each step. Plan-and-execute with ReAct as the execution engine.

LOOP: INPUT → PLAN → THINK → ACT → OBSERVE → … → FINAL RESULT

Pros: Reduces aimless tool use. Easier to parallelize or hand off sub-steps. Plan can be shown to users for approval.

Cons: Brittle if the initial plan is wrong; may need replanning logic. Two-layer control adds complexity. Planner and executor can disagree on what "done" means for a step.

One or two models

  • One model: Same LLM does Plan (decomposition) and Execute (ReAct per step). Plan is a different prompt; execution is the usual Think β†’ Act β†’ Observe loop.
  • Two models: Separate planner for high-level decomposition, executor for per-step ReAct. Use when planning is complex (e.g. o1 for planner, cheaper model for execution) or for cost optimization.

For most setups, a single capable model is sufficient. Add a specialized planner when tasks need strong decomposition or long-horizon planning.
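The two-layer control flow can be sketched as follows, with plan() and execute_step() standing in for the planner and executor model calls (whether one model or two); plan_and_execute is an illustrative name:

```python
# Plan-and-execute sketch: a planner decomposes the task into steps, then a
# ReAct-style executor handles each step in order. The plan could be shown
# to a user for approval before execution begins.

def plan_and_execute(task, plan, execute_step):
    steps = plan(task)                          # PLAN: high-level decomposition
    results = []
    for step in steps:
        results.append(execute_step(step))      # per-step THINK -> ACT -> OBSERVE
    return results

out = plan_and_execute(
    "research and summarize",
    plan=lambda t: ["gather sources", "summarize findings"],   # canned plan
    execute_step=lambda s: f"done: {s}",                       # stub executor
)
print(out)  # ['done: gather sources', 'done: summarize findings']
```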

#GPT-4o

#Claude 3.5

#Plan

CoT + Re-Act

Chain-of-thought prompting inside the ReAct loop: the Think step produces detailed, step-by-step reasoning in text before each action.

LOOP: INPUT → THINK (COT) → ACT → OBSERVE → THINK (COT) → … → FINAL RESULT

Pros: Often higher accuracy on structured problems. Reasoning traces help humans trust or audit the path to an action. Works with a single model β€” no extra critic required.

Cons: Verbose traces increase tokens and latency. CoT can still be wrong while sounding confident. Some deployments prefer not to store raw chain-of-thought for privacy.

One model

  • CoT (chain-of-thought) is produced by the same LLM that runs the ReAct loop. The Think step is prompted for detailed step-by-step reasoning; the model outputs both reasoning and tool calls in the same flow.
  • No need for two models. When CoT quality matters (complex logic, math, multi-step tasks), choose models that excel at structured reasoning.

Reasoning models (o1, DeepSeek-R1) and strong general models (GPT-4o) handle CoT well. Use the Best for CoT list when step-by-step reasoning is central.
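Since reasoning and tool calls arrive in the same output, the host usually splits them before acting. The sketch below assumes a simple "Reasoning:" / "Action:" text format purely for illustration; real SDKs return structured tool calls instead:

```python
# CoT + ReAct parsing sketch: separate the step-by-step reasoning trace
# (kept for auditing) from the action the agent should execute next.

def parse_cot_step(model_output):
    reasoning, _, action = model_output.partition("Action:")
    return reasoning.removeprefix("Reasoning:").strip(), action.strip()

output = ("Reasoning: 1) need current price 2) price requires a live lookup\n"
          "Action: search('AAPL price')")
trace, action = parse_cot_step(output)
print(trace)   # "1) need current price 2) price requires a live lookup"
print(action)  # "search('AAPL price')"
```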

#GPT-4o

#Claude 3.5

#CoT

Re-Act + Learning

Agent updates its policy (or a retrievable knowledge store) from experience: RL from rewards, fine-tuning from feedback, or storing corrected strategies for reuse.

LOOP: INPUT → THINK → ACT → OBSERVE → (IF FEEDBACK/REWARD) → UPDATE → … → FINAL RESULT

Pros: System improves without hand-editing every prompt. Experience replay or stored strategies are cheaper than full RL for many teams.

Cons: Risk of learning the wrong pattern from noisy feedback. Fine-tuning and RL need data hygiene, eval harnesses, and rollback. "Learning" without guardrails can amplify biases or unsafe shortcuts.

Three learning modes

  • Storing strategies: General chat models. Corrected strategies are stored and retrieved (like ReAct + Memory). No model update.
  • Fine-tuning: Models that support fine-tuning (e.g. GPT-4o-mini, Qwen). Requires feedback data to update weights.
  • RL from rewards: Smaller open models (Qwen 7B, DeepSeek-Coder). RL is costly; smaller models are more practical for training.

Model choice depends on which learning mode you implement. Storing is simplest; fine-tuning and RL need compatible model infrastructure.
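The simplest mode, storing strategies, can be sketched with a small key-value store (StrategyStore is illustrative; matching here uses exact task keys, where a real system might use embedding similarity):

```python
# "Storing strategies" learning-mode sketch: corrected strategies are saved
# after feedback and recalled on similar future tasks. No weight update.

class StrategyStore:
    def __init__(self):
        self._strategies = {}

    def update(self, task, strategy):          # UPDATE after feedback/reward
        self._strategies[task] = strategy

    def recall(self, task):                    # consult before the next attempt
        return self._strategies.get(task)

store = StrategyStore()
store.update("parse csv", "use the csv module, not string split")
print(store.recall("parse csv"))    # "use the csv module, not string split"
print(store.recall("unseen task"))  # None
```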

#GPT-4o

#Claude 3.5

#Learn

Conclusion

You rarely need every loop pattern at once. Start with a minimal Re-Act loop and clear tool contracts, then add dialogue, reflection, or memory when user trust, accuracy, or long sessions demand it. For how those pieces fit into sense, think, act, observe, and finish, see Anatomy of an AI agent. For common use-case stacks (how to combine these loops with RAG, memory, tools), see Production Agent-RAG Architectures.