What is an AI Agent?


An agent is an autonomous system that uses an LLM with tool calling to interact with external systems (databases, APIs, file systems) to perform actions, not just generate text.

The LLM is the reasoning engine: it decides what to do, which tools to call, and how to interpret the results. Unlike a simple chatbot that only produces text, an agent can search the web, query a database, run code, or trigger other APIs to accomplish a task.

ReAct

Reasoning + acting: the model produces reasoning steps (what to do next and why) and actions (tool calls), then observes the result (the value returned by the tool), and continues until the task is done.

Loop: think/reason → act → observe → … → result
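The loop above can be sketched in a few lines of Python. The "LLM" here is scripted (`fake_llm`, `SCRIPT`) so the control flow runs end to end; in a real agent each turn would come from a chat-completions call. All names are illustrative.

```python
# Minimal ReAct loop sketch. The LLM is stubbed with scripted turns so
# the think -> act -> observe flow is runnable without an API key.

def calculator(expression: str) -> str:
    """A trivial tool: evaluate an arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

# Scripted "LLM" turns: each entry is either a tool call or a final answer.
SCRIPT = [
    {"thought": "I need to compute 12 * 7.",
     "action": ("calculator", "12 * 7")},
    {"thought": "I have the result.",
     "final": "12 * 7 = 84"},
]

def fake_llm(history, step):
    return SCRIPT[step]

def react(task: str, max_steps: int = 5) -> str:
    history = [("input", task)]
    for step in range(max_steps):
        turn = fake_llm(history, step)            # Think / reason
        if "final" in turn:
            return turn["final"]                  # Result
        tool_name, tool_arg = turn["action"]      # Act
        observation = TOOLS[tool_name](tool_arg)  # Observe
        history.append(("observation", observation))
    raise RuntimeError("step budget exhausted")

print(react("What is 12 * 7?"))  # -> 12 * 7 = 84
```

Swapping `fake_llm` for a real model call and adding a tool-call parser turns this skeleton into a working agent.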

#GPT-4o · #Claude 3.5 · #Tool use

Conversational ReAct (ReSpAct)

Reason + Speak + Act: the agent talks to the user, asks for clarification, and reports back before acting, keeping the user in the loop.

Loop: input → think → (optionally) Speak/ask → think → act → observe → … → final result
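A runnable sketch of the Speak step, assuming scripted model turns and canned user replies (all names illustrative):

```python
# ReSpAct sketch: the scripted "LLM" may Speak (ask the user) before
# acting. User replies are passed in so the loop is testable.

def flight_search(city: str) -> str:
    return f"3 flights found to {city}"   # stand-in for a real tool

TOOLS = {"flight_search": flight_search}

# Scripted turns: speak (ask the user), act on the answer, then finish.
SCRIPT = [
    {"speak": "Which city do you want to fly to?"},
    {"action": "flight_search"},
    {"final": "Done: {obs}"},
]

def respact(task: str, user_replies: list[str]) -> str:
    replies = iter(user_replies)
    answer, obs = None, None
    for turn in SCRIPT:
        if "speak" in turn:                       # Speak: ask for clarification
            print(turn["speak"])
            answer = next(replies)                # the user's reply
        elif "action" in turn:                    # Act on the clarified input
            obs = TOOLS[turn["action"]](answer)   # Observe
        else:
            return turn["final"].format(obs=obs)  # final result

print(respact("Book me a flight", ["Paris"]))
```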

When you might use two models

  • Router + worker: One small model routes (e.g. "needs clarification" vs "ready to act"), another does the main ReAct loop.
  • Specialized roles: One model for dialogue/clarification, another for heavy reasoning or tool use.

For most setups, a single general-purpose chat model (e.g. GPT-4o or Claude 3.5 Sonnet) is enough.
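The router + worker split can be sketched as two stubbed functions; here the "router" is a keyword heuristic standing in for a small classification model, and the "worker" stands in for a full ReAct loop (both hypothetical):

```python
# Router + worker sketch: a cheap router classifies the input, then
# either asks for clarification or hands off to the worker's loop.

def router_model(user_input: str) -> str:
    # A small model would classify here; a keyword check stands in.
    return "needs_clarification" if "something" in user_input else "ready"

def worker_model(user_input: str) -> str:
    # The worker would run a full ReAct loop; stubbed as one step.
    return f"handled: {user_input}"

def handle(user_input: str) -> str:
    route = router_model(user_input)          # route
    if route == "needs_clarification":
        return "Could you be more specific?"  # dialogue path
    return worker_model(user_input)           # ReAct path

print(handle("book something"))        # clarification path
print(handle("book flight to Paris"))  # worker path
```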

#GPT-4o · #Claude 3.5 · #Dialogue

ReAct Description

An agent that follows the ReAct pattern and describes what it is doing.

Loop: Input → Think → (optionally) Describe (what it will do or has done) → Act → Observe → … → final result
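The Describe step is just an extra message emitted before each action. A minimal sketch with a scripted model (all names illustrative):

```python
# ReAct + Describe sketch: before each action the agent narrates what
# it is about to do; the narration and observations go to a transcript.

def web_search(query: str) -> str:
    return f"results for '{query}'"   # stand-in for a real tool

TOOLS = {"web_search": web_search}

SCRIPT = [
    {"describe": "I will search the web for the release date.",
     "action": ("web_search", "python 3.13 release date")},
    {"final": "Answer assembled from the search results."},
]

def react_describe(task: str):
    transcript = []
    for turn in SCRIPT:
        if "describe" in turn:                  # Describe before acting
            transcript.append(turn["describe"])
        if "action" in turn:                    # Act + Observe
            name, arg = turn["action"]
            transcript.append(TOOLS[name](arg))
        if "final" in turn:
            return turn["final"], transcript

answer, transcript = react_describe("When was Python 3.13 released?")
print(transcript[0])   # the agent's narration of its next action
```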

#GPT-4o · #Claude 3.5 · #Describe

Multi-Action ReAct

In one step the agent can output several tool calls (e.g. search + calculator), then observe all results. Same loop, but Act can be multiple actions per step.

Loop: Input → Think → Act (one or more tool calls, e.g. search + calculator) → Observe (all results) → Think → … → Final Answer
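A sketch of the multi-action Act step, using a thread pool so independent tool calls run concurrently (tool names are illustrative):

```python
# Multi-Action ReAct sketch: one Think step emits several tool calls,
# executed together, then all observations are fed back at once.

from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> str:
    return f"top hit for '{query}'"   # stand-in for a real search tool

def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"search": search, "calculator": calculator}

def act_many(calls):
    """Run several tool calls in parallel; return observations in order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[name], arg) for name, arg in calls]
        return [f.result() for f in futures]

# One Act step with two actions, as in the loop above.
observations = act_many([("search", "EUR to USD rate"),
                         ("calculator", "100 * 1.08")])
print(observations)  # both results observed together
```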

#GPT-4o · #Claude 3.5 · #Parallel tools

ReAct + Reflection

After an attempt, the agent reflects (critique, self-correction) and then retries with an updated strategy.

Loop: Input → Reason → Act → Observe → Reflect → (maybe) Reason → Act → … → final answer

Two implementation choices

  • One model: Same LLM handles Reason, Act, and Reflect. Simpler, usually enough. Use a critic-style prompt for the reflection step.
  • Two models: Separate model for reflection (critic). Use when reflection must catch subtle errors or tasks are reasoning-heavy. Stronger reasoning models (o1, DeepSeek-R1) excel as critics.

For most setups, a single capable model is sufficient; add a specialized reflection model when the quality of the critique matters most.
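The single-model variant can be sketched as an attempt/critique retry loop. Both roles are stubbed (the first attempt is deliberately wrong so the retry path runs); all names are illustrative:

```python
# ReAct + Reflection sketch: after an attempt, the same "LLM" is
# re-prompted as a critic; if the critique finds a problem, the agent
# retries with the feedback.

def attempt(task, feedback=None):
    # Stand-in for a Reason -> Act -> Observe attempt; the first try is
    # deliberately wrong, and the retry uses the critique.
    return "17" if feedback is None else "24"

def reflect(task, answer):
    # Critic-style prompt to the same model, stubbed as a simple check.
    return None if answer == "24" else "Recheck the arithmetic."

def react_with_reflection(task, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        answer = attempt(task, feedback)   # Reason -> Act -> Observe
        feedback = reflect(task, answer)   # Reflect
        if feedback is None:
            return answer                  # critique passed
    return answer                          # best effort after retries

print(react_with_reflection("What is 3 * 8?"))  # -> 24
```

In the two-model variant, `reflect` would call a separate critic model instead of re-prompting the same one.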

#GPT-4o · #Claude 3.5 · #Reflect

ReAct + Memory

The ReAct loop plus long-term or episodic memory: the agent stores important data and reuses it in later steps.

Loop: Input → Reason → Act → Observe → Memory read/write → … → final result
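The memory read/write step can be sketched with a plain key-value store; a real agent might back this with a vector database. Tool and key names are illustrative:

```python
# ReAct + Memory sketch: memory is read before acting and written after
# observing, so later steps (or later tasks) can reuse stored facts.

MEMORY: dict[str, str] = {}

def lookup_user_city(task: str) -> str:
    return "Berlin"   # stand-in for a real tool call

def run_task(task: str) -> str:
    # Memory read: skip the tool call if the fact is already stored.
    if "user_city" in MEMORY:
        return f"(from memory) city = {MEMORY['user_city']}"
    city = lookup_user_city(task)   # Act + Observe
    MEMORY["user_city"] = city      # Memory write
    return f"(from tool) city = {city}"

print(run_task("Where does the user live?"))  # tool path, writes memory
print(run_task("Where does the user live?"))  # memory path, no tool call
```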

#GPT-4o · #Claude 3.5 · #Memory

ReAct with Planning

Plan first (e.g. high-level steps or subgoals), then run a ReAct loop within each step: plan-and-execute with ReAct as the execution engine.

Loop: Input → Plan → Think → Act → Observe → … → final result

One or two models

  • One model: Same LLM does Plan (decomposition) and Execute (ReAct per step). Plan is a different prompt; execution is the usual Think → Act → Observe loop.
  • Two models: Separate planner for high-level decomposition, executor for per-step ReAct. Use when planning is complex (e.g. o1 for planner, cheaper model for execution) or for cost optimization.

For most setups, a single capable model is sufficient. Add a specialized planner when tasks need strong decomposition or long-horizon planning.
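The plan-and-execute split can be sketched as two stubbed functions, which could be one model or two. The plan and step results are scripted; all names are illustrative:

```python
# Plan-and-execute sketch: a planner first decomposes the task into
# steps, then a ReAct-style executor runs each step in order.

def planner(task: str) -> list[str]:
    # A planner model would emit subgoals here; scripted for the sketch.
    return ["find flights", "compare prices", "book cheapest"]

def execute_step(step: str) -> str:
    # Each step would be its own Think -> Act -> Observe loop; stubbed.
    return f"done: {step}"

def plan_and_execute(task: str) -> list[str]:
    plan = planner(task)                       # Plan
    return [execute_step(s) for s in plan]     # ReAct per step

print(plan_and_execute("Book me the cheapest flight to Rome"))
```

In the two-model variant, `planner` would call a stronger reasoning model and `execute_step` a cheaper one.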

#GPT-4o · #Claude 3.5 · #Plan

CoT + ReAct

Chain-of-thought combined with ReAct: the Think step produces detailed, step-by-step reasoning in text.

Loop: Input → Think(CoT) → Act → Observe → Think(CoT) → … → final result

One model

  • CoT (chain-of-thought) is produced by the same LLM that runs the ReAct loop. The Think step is prompted for detailed step-by-step reasoning; the model outputs both reasoning and tool calls in the same flow.
  • No need for two models. When CoT quality matters (complex logic, math, multi-step tasks), choose models that excel at structured reasoning.

Reasoning models (o1, DeepSeek-R1) and strong general models (GPT-4o) handle CoT well. Use the Best for CoT list when step-by-step reasoning is central.
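Since the only change from plain ReAct is the Think prompt, a sketch is just a prompt template that asks for explicit reasoning before each action. The template below is illustrative, not any vendor's official format:

```python
# CoT + ReAct prompt sketch: the Think step is prompted for explicit
# step-by-step reasoning before each tool call or final answer.

COT_REACT_PROMPT = """\
Answer the question using the tools available.
At each turn, first write your reasoning step by step under `Thought:`,
then emit either an `Action:` line with a tool call
or a `Final Answer:` line.

Question: {question}
"""

prompt = COT_REACT_PROMPT.format(
    question="How many days between 2024-01-01 and 2024-03-01?")
print(prompt)
```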

#GPT-4o · #Claude 3.5 · #CoT

ReAct + Learning

Agent updates its policy (or a retrievable knowledge store) from experience: RL from rewards, fine-tuning from feedback, or storing corrected strategies for reuse.

Loop: Input → Think → Act → Observe → (if feedback/reward) → Update → … → final result

Three learning modes

  • Storing strategies: General chat models. Corrected strategies are stored and retrieved (like ReAct + Memory). No model update.
  • Fine-tuning: Models that support fine-tuning (e.g. GPT-4o-mini, Qwen). Requires feedback data to update weights.
  • RL from rewards: Smaller open models (Qwen 7B, DeepSeek-Coder). RL is costly; smaller models are more practical for training.

Model choice depends on which learning mode you implement. Storing is simplest; fine-tuning and RL need compatible model infrastructure.
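The simplest mode, storing corrected strategies, can be sketched with a plain dictionary keyed by task; a real system would retrieve by similarity rather than exact match. All names are illustrative:

```python
# ReAct + Learning sketch ("storing strategies" mode): after feedback,
# the corrected strategy is saved and reused on matching future tasks.
# No model weights are updated.

STRATEGY_STORE: dict[str, str] = {}

def solve(task: str) -> str:
    # Reuse a stored strategy when one matches, else a default attempt.
    strategy = STRATEGY_STORE.get(task, "default strategy")
    return f"solved '{task}' with {strategy}"

def learn_from_feedback(task: str, corrected_strategy: str) -> None:
    # Update step: store the correction for later reuse.
    STRATEGY_STORE[task] = corrected_strategy

print(solve("parse the invoice"))                        # default strategy
learn_from_feedback("parse the invoice", "use OCR first")
print(solve("parse the invoice"))                        # learned strategy
```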

#GPT-4o · #Claude 3.5 · #Learn