AI agent loop patterns

An agent is an autonomous system that uses an LLM with tool calling to interact with external systems (databases, APIs, file systems) to perform actions, not just generate text.
This article is about agent loop patterns: the repeating cycles of reasoning, tool use, and observation (the ReAct family and close relatives). The LLM is the reasoning engine: it decides what to do, which tools to call, and how to read tool output. Unlike a chatbot that only emits text, these loops let an agent search, query, run code, or call APIs until a task completes.
Below are nine common ReAct-style loop patterns and extensions, from plain think → act → observe to dialogue, parallel tools, reflection, memory, planning, chain-of-thought, and learning. Pros and cons follow each pattern.

ReAct
Reasoning + acting: the model produces a reasoning step (what to do next and why) and an action (a tool call), then observes the value returned from the tool, and continues until the task is done. Each Observe step is the only place the model sees real tool output, so the loop is grounded in facts rather than pure speculation. The run ends when the model stops requesting tools and returns a final answer (or hands off explicitly).
LOOP: THINK/REASON → ACT → OBSERVE → … → RESULT
Pros: Clear audit trail (think β act β observe). Tool results constrain hallucination on factual questions. Works with most chat models that support function calling. Easy to cap steps or tools for safety.
Cons: Latency and cost scale with loop length. Poor tool schemas or ambiguous prompts cause thrashing (repeated useless calls). The model may still misinterpret correct tool output if the task is underspecified.
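The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a real client: `fake_llm` and the `TOOLS` registry are hypothetical stand-ins for a chat model with function calling and its tool schemas.

```python
def fake_llm(history):
    # Stub model: request one tool call, then answer from the observation.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "search", "args": {"q": "capital of France"}}
    return {"answer": "Paris"}

TOOLS = {"search": lambda q: f"Top result for {q!r}: Paris"}

def react_loop(task, llm=fake_llm, tools=TOOLS, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                      # cap steps for safety
        step = llm(history)                         # THINK/REASON
        if "answer" in step:                        # no tool requested: done
            return step["answer"]
        result = tools[step["tool"]](**step["args"])          # ACT
        history.append({"role": "tool", "content": result})   # OBSERVE
    return "step limit reached"
```

The step cap is the simplest guardrail against thrashing: the loop terminates even if the model keeps requesting tools.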

Conversational ReAct (ReSpAct)
Reason + Speak + Act: the agent talks to the user, asks for clarification, and reports back before acting, keeping the user in the loop.
LOOP: INPUT → THINK → (OPTIONALLY) SPEAK/ASK → THINK → ACT → OBSERVE → … → FINAL RESULT
Pros: Fewer wrong-tool calls from ambiguous input. Better trust and transparency. Catches missing parameters early instead of failing inside a silent loop.
Cons: More round-trips and longer wall-clock time. Harder to automate in batch pipelines. Dialogue policy must avoid annoying or redundant questions.
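The Speak/Ask branch comes down to a single check before acting: if a required parameter is missing, ask instead of calling the tool. A minimal sketch, assuming a hypothetical `get_weather` tool that requires a `city` parameter:

```python
def respact_turn(known_params):
    # SPEAK/ASK when a required parameter is missing; otherwise ACT.
    required = {"city"}
    missing = sorted(required - known_params.keys())
    if missing:
        return {"speak": f"Before I check, which {', '.join(missing)}?"}
    return {"act": ("get_weather", known_params["city"])}
```

Catching the gap here avoids a wrong-tool call or a silent failure deeper in the loop.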

ReAct + Describe
An agent that follows the ReAct pattern with an optional Describe step: it narrates planned or completed actions around tool use. Use it for transparency, UX, or audit logs; skip it when you want less latency or noise. The core loop stays Think → Act → Observe.
LOOP: INPUT → THINK → (OPTIONALLY) DESCRIBE (WHAT IT WILL DO OR HAS DONE) → ACT → OBSERVE → … → FINAL RESULT
Pros: Easier debugging and compliance-friendly traces. Can improve UX when tool payloads are noisy or binary.
Cons: Extra tokens and latency. Descriptions can drift from what tools actually did if not grounded in the same Observe payload. Not ideal for lowest-latency automation.
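One way to keep descriptions from drifting is to generate the after-the-fact narration from the same Observe payload the model sees. A hypothetical wrapper sketching that idea:

```python
def described_call(tool_name, tool_fn, *args):
    # DESCRIBE before acting, then again from the same Observe payload,
    # so the narration cannot drift from what the tool actually returned.
    log = [f"Calling {tool_name} with arguments {args}."]   # DESCRIBE (plan)
    result = tool_fn(*args)                                 # ACT
    log.append(f"{tool_name} returned: {result!r}.")        # DESCRIBE (grounded)
    return result, log
```

The log doubles as an audit trail; drop it when latency matters more than traceability.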

Multi-Action ReAct
In one step the agent can output several tool calls (e.g. search + calculator), then observe all results. Same loop, but Act can be multiple actions per step. Modern chat models often expose parallel or batched tool use so independent lookups can run together and merge in Observe. When steps must be strictly ordered, keep a single Act or split across turns; parallel multi-action assumes no hidden dependency between those tools.
LOOP: INPUT → THINK → ACT (ONE OR MORE TOOL CALLS, E.G. SEARCH + CALCULATOR) → OBSERVE (ALL RESULTS) → THINK → … → FINAL ANSWER
Pros: Lower latency than serial one-tool-per-turn when dependencies allow. Fewer overall LLM calls for the same ground truth.
Cons: Risky when calls are not independent: ordering bugs or duplicate side effects. Larger Observe payloads can overflow context. Not all hosts expose true parallel execution.
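When the host does expose parallel execution, the batched Act step looks like this sketch: independent calls fan out, and all results merge into one Observe payload. The tool names and registry here are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def act_many(calls, tools):
    # One ACT step with several independent tool calls, merged into a
    # single Observe payload. Assumes no hidden dependency between tools.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(tools[name], *args) for name, args in calls}
        return {name: f.result() for name, f in futures.items()}
```

If one call depends on another's output, fall back to one Act per turn rather than guessing an ordering.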

ReAct + Reflection
After an attempt, the agent reflects (critique, self-correction) and then retries with an updated strategy.
LOOP: INPUT → REASON → ACT → OBSERVE → REFLECT → (MAYBE) REASON → ACT → … → FINAL ANSWER
Pros: Catches mistakes that a single pass would ship. Pairs well with a dedicated critic model or stricter reflection prompt. Improves robustness on complex tasks.
Cons: At least one extra LLM call per cycle, so higher cost and latency. Reflection can still miss root causes or over-correct. Needs clear stop conditions to avoid infinite retries.
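The reflect-and-retry control flow, with the stop condition made explicit, can be sketched as below. `attempt` stands in for a full ReAct pass and `critique` for a critic model or stricter reflection prompt; both are hypothetical callables.

```python
def run_with_reflection(attempt, critique, max_retries=2):
    # REASON -> ACT -> OBSERVE, then REFLECT; retry with the critique
    # folded into the next attempt until the critic is satisfied.
    feedback = None
    for _ in range(max_retries + 1):
        result = attempt(feedback)       # one full ReAct pass
        feedback = critique(result)      # REFLECT: None means "good enough"
        if feedback is None:
            return result
    return result                        # stop condition: give up after retries
```

The retry cap is the guard against infinite self-correction loops mentioned above.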

ReAct + Memory
ReAct loop + long-term or episodic memory (store important data, reuse in later steps). Memory is explicit read/write on top of Observe: tool outputs are ephemeral unless you persist them, while memory holds facts, preferences, or state across turns. Implementations range from a vector store keyed by session to structured slots the model updates after each Observe.
LOOP: INPUT → REASON → ACT → OBSERVE → MEMORY READ/WRITE → … → FINAL RESULT
Pros: Stable behavior over long horizons. Can reduce duplicate API calls. Supports personalization when memory is scoped per user or session.
Cons: Stale or wrong memories poison future turns; they need versioning, TTL, or retrieval with citations. Extra storage and privacy review. Vector memory can retrieve irrelevant facts if queries are vague.
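The structured-slot variant mentioned above is the simplest to sketch: an explicit write after Observe, and reads on later turns. This is a toy in-memory store; a vector store keyed by session would play the same role.

```python
class EpisodicMemory:
    # Structured-slot memory: written after Observe, read on later turns.
    def __init__(self):
        self.slots = {}

    def write(self, key, value):
        self.slots[key] = value

    def read(self, key, default=None):
        return self.slots.get(key, default)

def remember_observation(memory, key, value):
    # Persist a fact so a later turn can skip a duplicate API call.
    memory.write(key, value)
```

Scoping the store per user or session is what enables the personalization listed in the pros.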

ReAct with Planning
Plan first (e.g. high-level steps or subgoals), then run a ReAct loop within each step. Plan-and-execute with ReAct as the execution engine.
LOOP: INPUT → PLAN → THINK → ACT → OBSERVE → … → FINAL RESULT
Pros: Reduces aimless tool use. Easier to parallelize or hand off sub-steps. Plan can be shown to users for approval.
Cons: Brittle if the initial plan is wrong and may need replanning logic. Two-layer control adds complexity. Planner and executor can disagree on what "done" means for a step.
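The two-layer structure reduces to a short sketch: a planner produces subgoals, and an executor (here a plain function standing in for a full ReAct loop) runs each one in order. Both callables are hypothetical.

```python
def plan_and_execute(task, planner, executor):
    # PLAN once up front, then run a ReAct-style loop for each step.
    plan = planner(task)                      # high-level subgoals
    return [executor(step) for step in plan]  # each call is its own loop
```

Because the plan is a plain list, it can be shown to a user for approval before any executor runs, and independent steps can be dispatched in parallel.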

CoT + ReAct
Chain-of-thought ReAct: the Think step spells out detailed, step-by-step reasoning in text before each action.
LOOP: INPUT → THINK (COT) → ACT → OBSERVE → THINK (COT) → … → FINAL RESULT
Pros: Often higher accuracy on structured problems. Reasoning traces help humans trust or audit the path to an action. Works with a single model; no extra critic required.
Cons: Verbose traces increase tokens and latency. CoT can still be wrong while sounding confident. Some deployments prefer not to store raw chain-of-thought for privacy.
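The distinguishing artifact of this pattern is the scratchpad: a text trace interleaving Thoughts, Actions, and Observations. A sketch of that trace-building loop, with `llm` and the tool registry as hypothetical stand-ins:

```python
def cot_react(question, tools, llm, max_steps=4):
    # Build a scratchpad interleaving written-out Thoughts with Actions
    # and Observations; the final Answer closes the trace.
    trace = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(trace)
        trace.append(f"Thought: {step['thought']}")
        if "answer" in step:
            trace.append(f"Answer: {step['answer']}")
            return step["answer"], trace
        observation = tools[step["tool"]](*step["args"])
        trace.append(f"Action: {step['tool']}{step['args']}")
        trace.append(f"Observation: {observation}")
    return None, trace
```

The returned trace is what makes the path auditable; it is also exactly the verbose payload the cons warn about, so decide deliberately whether to store it.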

ReAct + Learning
Agent updates its policy (or a retrievable knowledge store) from experience: RL from rewards, fine-tuning from feedback, or storing corrected strategies for reuse.
LOOP: INPUT → THINK → ACT → OBSERVE → (IF FEEDBACK/REWARD) → UPDATE → … → FINAL RESULT
Pros: System improves without hand-editing every prompt. Experience replay or stored strategies are cheaper than full RL for many teams.
Cons: Risk of learning the wrong pattern from noisy feedback. Fine-tuning and RL need data hygiene, eval harnesses, and rollback. "Learning" without guardrails can amplify biases or unsafe shortcuts.
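The cheapest form of the Update step, storing corrected strategies for reuse, needs no gradient updates at all. A toy sketch, where task types, strategies, and reward scores are all hypothetical:

```python
class StrategyStore:
    # Cheap "learning" without fine-tuning: keep the best-scoring
    # strategy per task type and reuse it on later runs.
    def __init__(self):
        self.best = {}

    def update(self, task_type, strategy, reward):
        # UPDATE: only keep a new strategy if it beats the current best,
        # which also dampens the effect of a single noisy reward.
        current = self.best.get(task_type)
        if current is None or reward > current[1]:
            self.best[task_type] = (strategy, reward)

    def recall(self, task_type):
        entry = self.best.get(task_type)
        return entry[0] if entry else None
```

Recalled strategies feed the Think step of later runs; full RL or fine-tuning replaces this store with weight updates.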
Conclusion
You rarely need every loop pattern at once. Start with a minimal ReAct loop and clear tool contracts, then add dialogue, reflection, or memory when user trust, accuracy, or long sessions demand it. For how those pieces fit into sense, think, act, observe, and finish, see Anatomy of an AI agent. For common use-case stacks (how to combine these loops with RAG, memory, tools), see Production Agent-RAG Architectures.