Anatomy of an AI agent

Introduction
This page names the parts most LLM agents share: what comes in (Sense), how decisions are made (Think / reason), what happens in the world (Act), what comes back (Observe), and how a run ends (Finish). On top of that you can add optional pieces — Evaluate, Memory, Description, Plan, chain-of-thought, Ask — and use RAG to ground answers in your data.
Sense
Sense is every input the agent receives: not only chat text, but also sensors, webhooks, queues, and other events.
| Category | Examples |
|---|---|
| Text | Keyboard input, chat messages, prompts, documents, search queries |
| Sensor | Camera, microphone, IoT devices, GPS, temperature, motion |
| API / events | Webhooks, HTTP requests, DB triggers, queue messages, calendar events, notifications |
Anything that gives the agent information flows through Sense. A minimal mental model: [User] → [Sense] → [Agent].
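One way to see why everything "flows through Sense" is to normalize heterogeneous inputs into one shape before they reach the reasoning step. A minimal sketch, assuming a hypothetical `Event` dataclass and `sense()` function (not from any particular framework):

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One normalized input to the agent, whatever its source."""
    source: str   # "text", "sensor", or "api"
    payload: str

def sense(raw: dict) -> Event:
    # Map heterogeneous inputs (chat message, sensor reading, webhook body)
    # to one shape so the Think step only ever sees Events.
    if "message" in raw:
        return Event("text", raw["message"])
    if "reading" in raw:
        return Event("sensor", str(raw["reading"]))
    return Event("api", str(raw))

print(sense({"message": "book a train"}))
```

The point of the normalization is that Think never has to care whether the input was a keyboard, a webhook, or a thermometer.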
Think / reason
Think is how the agent chooses what to do next — rules, an LLM, or both.
- 🔥 Deterministic: if-then rules, decision trees, state machines — same input, same output; no LLM.
- 🔥 LLM-based: the model reasons (chain-of-thought, planning, ReAct-style thoughts). May be non-deterministic when temperature > 0.
- 🔥 Hybrid: deterministic routing or safety gates, LLM for open-ended reasoning (e.g. if the user asks X → call tool A; else let the LLM decide).
Act
Act is what the agent does with a decision: generate text, speech, or alerts; or invoke tools — read/write databases, call HTTP APIs, control systems.
Observe
Observe is what comes back after an action: tool output, HTTP response, query result. The agent can loop back to Think with that observation or move toward Finish.
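Act and Observe together form the inner loop: Think picks a tool, Act invokes it, the observation is appended to history, and Think runs again with that context. A toy sketch with a deterministic stand-in policy (a real agent would call an LLM in `think`):

```python
def think(history: list[str]) -> str:
    # Toy policy: search once, then finish. In practice an LLM decides here.
    return "finish" if any(h.startswith("obs:") for h in history) else "search"

def act(decision: str) -> str:
    # Invoke the chosen tool and return its raw output.
    tools = {"search": lambda: "Paris trains: 08:13, 09:13"}
    return tools[decision]()

history: list[str] = []
while True:
    decision = think(history)           # Think
    if decision == "finish":
        break                           # move toward Finish
    observation = act(decision)         # Act
    history.append("obs: " + observation)  # Observe feeds back into Think

print(history)
```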
Finish
Finish is when the agent stops the loop and returns a final result. It is not really another pluggable component like a tool or a memory store — it is the termination condition your runtime applies on top of the loop.
Typical reasons a run ends:
- 🔥 Task done: goal met, final answer or artifact returned.
- 🔥 Max steps: step or token budget hit — safety and cost caps.
- 🔥 Error: tool or environment failure that you treat as unrecoverable in this session.
- 🔥 User stop: cancel, timeout, or leaving the session.
- 🔥 Give up: the model decides it cannot complete the task and exits explicitly instead of looping forever.
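Because Finish is a condition the runtime applies on top of the loop, it is natural to implement it as a wrapper around the step function. A minimal sketch covering three of the stop reasons above (task done, max steps, error), with hypothetical names:

```python
MAX_STEPS = 5

def run(agent_step, max_steps: int = MAX_STEPS):
    """Apply termination conditions on top of the agent loop."""
    for step in range(max_steps):
        try:
            result = agent_step(step)
        except RuntimeError as err:
            return ("error", str(err))   # treated as unrecoverable this session
        if result is not None:
            return ("done", result)      # task done: final answer produced
    return ("max_steps", None)           # budget hit: safety and cost cap

# A step that never returns a final answer hits the step budget.
print(run(lambda step: None))  # ('max_steps', None)
```

User stop and explicit give-up fit the same shape: they are just two more branches that return a labeled terminal state instead of looping again.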
Evaluate
Evaluate is the umbrella for feedback that improves behavior over time. RLHF (reinforcement learning from human feedback) is one training-time example. Reflect is one runtime mechanism: the agent critiques its own attempt and retries.
| | Evaluate | Reflect |
|---|---|---|
| Scope | Broad feedback loop | Self-critique in the loop |
| When | Training-time (RLHF) or runtime | Runtime only |
| Who | Humans, environment, or agent | Agent (or a critic) |
Evaluate can take several forms:
- 🔥 Human feedback → reward model → fine-tuning
- 🔥 Self-evaluation: reflection, critic, Constitutional AI
- 🔥 Environment feedback: task success/failure, scores → RL-style rewards
- 🔥 Preference learning: DPO and similar methods on preferred vs non-preferred outputs
A minimal reflection trace: the agent looks at actions and observations, writes a short critique, then may start a new Reason → Act → Observe cycle. Example: Reason — search for a train time; Act — search({"q": "Paris"}); Observe — too many results, not specific enough.
For reflection-heavy runs, general-purpose models such as Claude 3.5 Sonnet and GPT-4o are common; reasoning-focused models (o1, o3, R1) suit harder critique; fine-tuning and RLHF require models that expose those training modes; lightweight critique can run on something like GPT-4o-mini.
Memory
Memory is how the agent stores and reuses information across turns or sessions — not the raw context window alone, but anything you persist and pull back into the next prompt or tool plan.
| Type | What it stores |
|---|---|
| Episodic | Past events, interactions, and tool results from this session (short-lived working state). |
| Long-term | Facts, preferences, and learnings you keep across sessions (profiles, playbooks, vector memory). |
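The two memory types in the table can be sketched as one small class: an episodic list that lives for the session, a long-term store that survives it, and a `recall_for_prompt` method that pulls both back into the next prompt. All names here are hypothetical, not a particular framework's API:

```python
class Memory:
    def __init__(self):
        self.episodic: list[str] = []        # this session's events and tool results
        self.long_term: dict[str, str] = {}  # facts kept across sessions

    def remember_event(self, event: str) -> None:
        self.episodic.append(event)

    def learn(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def recall_for_prompt(self) -> str:
        # Persisted state is only useful if it gets back into the prompt.
        facts = "; ".join(f"{k}={v}" for k, v in self.long_term.items())
        recent = " | ".join(self.episodic[-3:])
        return f"facts: {facts}\nrecent: {recent}"

mem = Memory()
mem.learn("preferred_class", "second")
mem.remember_event("searched Paris trains")
print(mem.recall_for_prompt())
```

In production the long-term store is usually a database or vector index rather than a dict, but the shape is the same: persist, then inject back.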
Description
Description is when the agent emits user-visible text about what it is doing or did — status and transparency. Common in conversational patterns; many agents skip it when latency matters.
Plan
Plan sits inside Think / reasoning — it is a different way to reason: instead of picking the next tool immediately, the LLM first returns a short list of steps that decompose the task.
Example outline the model might emit:
- Search for London–Paris train duration.
- Compute cost at 50€/h.
- Return both answers to the user.
Execute comes next: for each planned step you run the usual Think → Act → Observe loop (ReAct-style). Plan-and-execute uses that outline as the map and the per-step loop as the engine. Often the outline is produced by one LLM call at the start; replanning can add more calls later if the task drifts.
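Plan-and-execute can be sketched as two functions: `plan` returns the outline (one up-front LLM call in practice), and `execute_step` runs each step's own loop (collapsed here to one canned result per step; all values are illustrative):

```python
def plan(task: str) -> list[str]:
    # One up-front LLM call in practice; a fixed outline here.
    return [
        "search London-Paris train duration",
        "compute cost at 50 EUR/h",
        "return both answers",
    ]

def execute_step(step: str) -> str:
    # Each step would run its own Think -> Act -> Observe loop (ReAct-style);
    # collapsed to one canned result per step in this sketch.
    results = {
        "search London-Paris train duration": "2.5 h",
        "compute cost at 50 EUR/h": f"{2.5 * 50:.0f} EUR",
        "return both answers": "2.5 h, 125 EUR",
    }
    return results[step]

outline = plan("train cost")
observations = [execute_step(s) for s in outline]
print(observations[-1])  # 2.5 h, 125 EUR
```

Replanning would re-invoke `plan` with the observations so far when a step fails or the task drifts.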
Chain of thought
Chain of thought (CoT) is also part of Think: the model writes intermediate reasoning in text before choosing a tool — e.g. “get duration first, then multiply by cost,” then search(...).
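In ReAct-style agents, the CoT text and the tool call typically arrive in one completion, so the runtime has to split them apart. A minimal parser, assuming a hypothetical `Thought:` / `Action:` output format (the exact labels vary by prompt template):

```python
def parse_react_output(text: str) -> dict[str, str]:
    # Split a CoT-style completion into its thought and the chosen action.
    out: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("Thought:"):
            out["thought"] = line[len("Thought:"):].strip()
        elif line.startswith("Action:"):
            out["action"] = line[len("Action:"):].strip()
    return out

completion = (
    "Thought: get duration first, then multiply by cost\n"
    'Action: search({"q": "London Paris train duration"})'
)
print(parse_react_output(completion)["action"])
```

The thought can be logged or discarded; only the action is executed.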
Description, plan, and CoT compared
Description, plan, and chain of thought all show up as text, but they answer different questions. Description is aimed at people (or audit logs): “I will search the pricing API next.” Plan is a task outline the runtime can follow step by step. CoT is working reasoning inside one Think turn to pick the next action — often verbose, sometimes hidden from the user.
| | Primary audience | What it encodes |
|---|---|---|
| Description | User, operator, or log reader | What the agent is doing or did — transparency and status, not the full reasoning graph. |
| Plan | Executor (your loop / planner) | Ordered subgoals for the whole task — a map before or between execution phases. |
| Chain of thought | The model (and optionally you, if you expose it) | Intermediate rationale for the current decision — which tool, which argument, why. |
You can mix them: a plan for the journey, CoT at each stoplight, optional description at each leg so users see progress without reading every hidden thought. Skipping description saves latency; skipping CoT often hurts accuracy on hard tools; a bad plan wastes effort until you replan.
Ask
Ask is when the model outputs a question to the user; the user’s reply is appended to history and the run continues. Same idea as conversational ReAct: user-in-the-loop clarification.
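Mechanically, Ask is just another decision type the loop can return: the runtime surfaces the question, appends the reply to history, and re-enters Think. A sketch with a deterministic stand-in policy (a real agent would let the LLM decide when to ask):

```python
def think(history: list[str]) -> tuple[str, str]:
    # If the destination is still unknown, ask; otherwise finish.
    if not any(h.startswith("user:") and "Paris" in h for h in history):
        return ("ask", "Which city are you travelling to?")
    return ("finish", "Next Paris train: 08:13")

history = ["user: book me a train"]
kind, text = think(history)
if kind == "ask":
    history.append("agent: " + text)
    history.append("user: Paris")   # the reply is appended; the run continues
kind, text = think(history)
print(text)
```

In an interactive system the hard-coded `"user: Paris"` line would of course be a real reply awaited from the user.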
Grounding with RAG
Retrieval-augmented generation (RAG) does not replace Sense or Act by itself — it supplies grounded passages so the model’s answers and tool use can align with your documents instead of hallucinating facts. For more on wiring retrieval into agents, see RAG architectures.
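The retrieve-then-prompt shape of RAG can be sketched without any model at all: rank passages against the query, then inject the top hits into the prompt as context. This toy scorer uses keyword overlap purely for illustration; real systems use embeddings and a vector index:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap ranking; stand-in for embedding similarity search.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Ground the model: answers should come from these passages, not memory.
    context = "\n".join("- " + p for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 14 days.",
    "Our office is in Berlin.",
    "Train tickets are refundable up to 24 hours before departure.",
]
query = "when are refunds processed"
print(build_prompt(query, retrieve(query, docs)))
```

The grounded prompt is then what the agent's Think step actually sees, which is how retrieval constrains both answers and tool use.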
Anatomy of AI Agents: Inside LLMs, RAG Systems, & Generative AI (video)
Reflection
Reflection in the narrow sense: a process where the agent or a separate critic reviews an output or plan and revises it before the next action or answer. A Reflexion-style actor pairs a reasoning actor with memory and a critic so failed attempts produce critiques and updated strategies across tries — see also Reflexion actor.
Conclusion
Agent frameworks differ by vendor, but the same skeleton shows up everywhere: bound inputs, a reasoning step, actions with observable results, and explicit stop conditions. Optional pieces — memory, evaluation, narration, planning, user questions, RAG — are where products diverge: latency, safety, and traceability trade off against each other. Naming these parts clearly makes it easier to design prompts, choose models, and debug runs when something loops too long or goes off the rails. For ReAct-style agent loop patterns (think–act–observe and extensions), see Agent loop patterns.