LangChain in Action: How to Build Intelligent AI Applications Easily and Efficiently?
AI Architectures

Basic Reflection
Basic Reflection is a lightweight yet powerful agentic AI pattern in which a large language model (LLM) critiques its own outputs and attempts to improve them. Instead of relying solely on a one-shot response, the LLM acts as both problem-solver and reviewer.
At its core, the idea is simple: Let the LLM think after it has answered, and then let it try again with that feedback in mind.
This mirrors how humans often work: we give an initial answer, step back to evaluate it, and then refine it based on what we notice was missing or unclear.
How It Works
[Input Query] --> [Initial LLM Response] --> [Self-Evaluation] --> [Retry/Refine] --> [Improved Answer]
Step-by-Step:
- Initial Response - The LLM generates a normal answer based on the input question.
- Self-Evaluation - The LLM reviews its own output with a critical prompt like: “Was this answer helpful, specific, and accurate? What can be improved?”
- Reflection-Based Rewrite - A second prompt takes the critique and re-generates a better answer — ideally more focused, detailed, or relevant.
Implementation Patterns:
Reflection can be implemented in different ways:
- Single-turn Reflection: The LLM generates an answer, critiques it once, and produces a single revised answer.
- Multi-turn Reflection: The critique-and-rewrite cycle repeats several times, letting the LLM refine its response iteratively.
- Chain-of-thought with reflection: The LLM first "thinks out loud," then critiques its own reasoning. This approach can be useful for complex or nuanced responses.
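To make these patterns concrete, here is a minimal sketch in plain Python. The `llm(prompt: str) -> str` callable is an assumption standing in for whatever model client you use (a LangChain chat model, a raw API client, etc.), and the prompts are illustrative rather than canonical:

```python
def reflect_and_refine(llm, question: str, rounds: int = 1) -> str:
    """Basic Reflection sketch: answer, self-evaluate, rewrite.

    `rounds=1` gives single-turn reflection; larger values give
    multi-turn reflection, repeating the critique-and-rewrite cycle.
    """
    # Initial response: a normal one-shot answer.
    answer = llm(f"Answer the question:\n{question}")
    for _ in range(rounds):
        # Self-evaluation: the same model critiques its own output.
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Was this answer helpful, specific, and accurate? "
            "What can be improved?"
        )
        # Reflection-based rewrite: regenerate with the critique in context.
        answer = llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return answer
```

Chain-of-thought with reflection only changes the prompts: ask the model to show its reasoning in the first call, and ask it to check that reasoning in the critique call.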
Use Cases for Basic Reflection:
This architecture is ideal when you want better quality responses without complex orchestration. Some common applications:
- ✍️ Writing & Summarization: Reflect on tone, clarity, or factual quality. E.g., “Rewrite the summary to include more technical detail.”
- 🧪 QA & Tutoring Systems: Improve step-by-step answers to math or logic questions. Spot inconsistencies in answers before returning them to the user.
- 🛠️ Code Generation: Reflect on whether the generated code solves the problem correctly. Retry if the logic or syntax seems off.
- 🤖 Autonomous Agents: Lightweight alternative to planning/execution architectures. Useful when external tool usage isn’t needed, but quality still matters.
Benefits:
- Simple to implement: Often just 2–3 prompts.
- No tools or plugins required.
- Higher quality: Encourages second-order thinking.
- Modular: Can be inserted into any chain or agent easily.
Limitations:
- No real environment feedback: It reflects internally, without knowing whether the original answer actually "worked."
- May hallucinate critiques: Reflection is still powered by the same LLM, so it can misjudge itself.
- Quality depends on the critique: a weak or generic evaluation prompt produces shallow reflections and little improvement.
- Overhead: Adds more tokens and latency.
Reflexion Actor
Reflexion Actor is an extension of the basic reflection pattern where the AI actively monitors and improves its own decision-making over time — across multiple steps and tasks. Unlike Basic Reflection, which is usually a one-off quality boost, Reflexion Actors maintain memory, evaluate past attempts, and re-plan dynamically.
The concept was popularized by the 2023 paper Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al.), where LLMs learn to improve performance through self-generated feedback and retry loops.
A Reflexion Actor doesn't just reflect — it learns from the past, evaluates outcomes, and adjusts its future strategy.
How It Works
This architecture introduces long-term memory and dynamic replanning into the reflection loop. It goes something like this:
[Agent Run] --> [Observe Outcome] --> [Reflection & Critique] --> [Update Memory + Replan] --> [Next Action]
     ^                                                                                              |
     +----------------------------------------------------------------------------------------------+
Step-by-Step:
- Initial Attempt - The agent performs a task using standard LLM reasoning.
- Observation - It receives feedback or inspects the result (e.g., a failed test or a dead end).
- Reflection - The model critiques why the attempt failed and what could be improved.
- Memory Update - It saves the lessons learned and modifies its strategy.
- Replanning - The agent generates the next best step based on the updated context and memory.
Implementation Patterns:
In a typical LangChain or custom pipeline, a Reflexion Actor includes:
- A memory store (vector DB, in-memory array, or persistent storage)
- A reflection prompt to analyze failures
- A planner module to re-calculate actions
- Optional reward estimation (verbal reinforcement)
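A minimal sketch of that loop, under the same assumption of a generic `llm(prompt: str) -> str` callable; `run_task` is a hypothetical environment hook (e.g., executing generated code against tests), and the memory store is just a Python list standing in for a vector DB or persistent storage:

```python
def reflexion_loop(llm, run_task, task: str, max_attempts: int = 3):
    """Reflexion Actor sketch: attempt, observe, reflect, remember, replan.

    `run_task(plan)` is assumed to return `(result, success)` -- the
    observation step that Basic Reflection lacks.
    """
    memory: list[str] = []  # lessons learned across attempts
    plan = llm(f"Task: {task}\nPropose a plan.")
    result = None
    for _ in range(max_attempts):
        result, success = run_task(plan)  # observe the real outcome
        if success:
            return result
        # Reflect: why did this attempt fail, and what should change?
        reflection = llm(
            f"Task: {task}\nPlan: {plan}\nOutcome: {result}\n"
            "Why did this fail, and what should be done differently?"
        )
        memory.append(reflection)  # update memory
        # Replan with all accumulated lessons in context.
        lessons = "\n".join(memory)
        plan = llm(
            f"Task: {task}\nLessons from past attempts:\n{lessons}\n"
            "Propose a revised plan."
        )
    return result  # best effort after exhausting attempts
```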
Use Cases for Reflexion Actors:
This architecture shines in long-horizon, multi-step tasks where learning from experience improves performance.
- Puzzle Solving & Reasoning Tasks - If one path fails, the actor reflects, avoids similar mistakes, and retries. Great for logic puzzles, math problems, or strategy games.
- Document & Web Scraping Agents - Can handle navigation errors, broken links, and missing info by adapting in future runs.
- Skill-Building Agents - AI learns to code better by reflecting on errors in generated code over time. Reflexion helps move from "try and fail" → "fail, reflect, retry smarter."
- Embodied Agents (Minecraft / Robotics) - In agents like Voyager, reflexion enables agents to explore, fail, reflect, and grow autonomously in open environments.
Benefits:
- Meta-cognitive loop: Makes agents more intelligent over time.
- Adaptability: The agent improves based on outcomes and reflection.
- Emergent learning: Mimics how humans learn from experience, not just rules.
Limitations:
- More complex orchestration: Requires memory and planning modules.
- Can reflect poorly: Still depends on the LLM's internal consistency and truthfulness.
- Slower performance: Each loop adds overhead, making this less ideal for quick one-off tasks.
Language Agent Tree Search
Language Agent Tree Search is an architecture where an LLM agent generates multiple possible reasoning paths and explores them like a decision tree. Instead of committing to a single chain of thought, the model explores several options in parallel — and chooses the most promising path through simulation, scoring, or evaluation.
This idea was formalized in the Tree of Thoughts (ToT) paper (Yao et al., 2023), which showed that giving LLMs the ability to "think in trees" rather than "chains" can significantly improve complex problem-solving.
Just like humans brainstorm multiple ideas before picking one, Tree Search allows LLMs to simulate multiple paths before acting.
How It Works:
The architecture simulates a search tree, where:
- Each node represents a partial reasoning step.
- Branches represent different continuations.
- The agent can evaluate and prune bad paths.
- It either picks the best path or continues exploring deeper.
[Start]
  ├─> [Thought A]
  │     ├─> [Thought B]
  │     │     └─> [Thought B1] ──> [Best Path]
  └─> [Thought C]
Implementation Patterns:
Tree Search is generally implemented with:
- A generation loop (expanding reasoning options at each level)
- A scoring or voting function (to evaluate the value of each path)
- A search strategy, like:
  - Depth-limited search
  - Beam search
  - Breadth-first with pruning
LangChain, for instance, lets you build this with:
- Custom prompt templates per tree depth
- A scoring chain that evaluates each branch
- Memory to track visited paths
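As a sketch (not a specific LangChain API), a beam-search variant might look like this; `llm(prompt: str) -> str` is again an assumed model callable, and the scoring function is a toy LLM-based evaluator:

```python
def llm_score(llm, question: str, path: str) -> float:
    """Toy scoring function: ask the model to rate a partial path 0-10."""
    reply = llm(
        f"Question: {question}\nPartial reasoning:\n{path}\n"
        "Rate how promising this reasoning is from 0 to 10. Reply with a number."
    )
    try:
        return float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0  # unparsable ratings are treated as worthless paths


def tree_search(llm, question: str, breadth: int = 3, depth: int = 2) -> str:
    """Tree-of-Thoughts sketch: expand, score, prune, repeat (beam search)."""
    beams = [""]  # each beam is a partial chain of reasoning steps
    for _ in range(depth):
        candidates = []
        for path in beams:
            for _ in range(breadth):
                # Expand: propose one more reasoning step for this path.
                step = llm(
                    f"Question: {question}\nReasoning so far:\n{path}\n"
                    "Propose the next reasoning step."
                )
                candidates.append(f"{path}\n{step}".strip())
        # Prune: keep only the highest-scoring partial paths.
        candidates.sort(key=lambda p: llm_score(llm, question, p), reverse=True)
        beams = candidates[:breadth]
    # Commit: answer from the best surviving path.
    return llm(
        f"Question: {question}\nReasoning:\n{beams[0]}\nGive the final answer."
    )
```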
Use Cases for Tree Search in LLMs:
This architecture is particularly effective for tasks that require long-term reasoning, exploration, and hypothesis testing.
- Math Word Problems ➤ Explore multiple reasoning strategies (e.g., unit conversion → formula application → arithmetic).
- Logical Puzzles / Sudoku ➤ Try different cell placements, prune invalid paths early.
- Creative Writing ➤ Generate multiple plot outlines, choose the most coherent.
- Planning & Decision-Making ➤ Simulate multiple agent strategies before choosing one (e.g. in business decision agents or games).
Benefits:
- Parallel reasoning: Explores more paths than greedy decoding.
- Better solutions: Can catch errors or missed opportunities early.
- Search-time optimization: Evaluates before committing.
Limitations:
- Computationally expensive: Branching paths grow fast.
- Requires good scoring: Poor evaluation leads to bad pruning.
- Harder to scale: Requires orchestration if paths diverge too much.
Plan-And-Execute
Plan-And-Execute is a powerful agentic AI architecture that splits the cognitive workload of an AI agent into two distinct phases:
- Planning what needs to be done.
- Executing those steps one at a time.
This architecture became popular through implementations like LangChain's experimental Plan-and-Execute agent, AutoGPT, and BabyAGI, where the goal is to make LLMs capable of long-term, complex task execution — like writing code, planning an event, or solving real-world problems.
In short, instead of trying to do everything in one go, the LLM thinks: “Let me break this down into steps and then act step-by-step.”
How It Works:
This architecture separates the agent’s brain into:
- A Planner: Generates a roadmap (what to do and in what order).
- An Executor: Carries out each step, using tools or LLM calls.
[Prompt/Goal]
|
v
[Planner] --> [Step 1, Step 2, Step 3...]
|
v
[Executor] --> [Run Step 1] --> [Run Step 2] --> [Run Step 3] --> [Final Result]
You can think of it like a project manager (Planner) and a doer (Executor).
Implementation Patterns:
This is often implemented with two LLM chains:
- Planning Chain ➤ Generates a list of tasks, goals, or subtasks from the main prompt. e.g. “To build a website, first choose a stack, then set up hosting…”
- Execution Chain ➤ Handles each task sequentially (or conditionally), possibly using tools like:
  - API calls
  - Web search
  - Code generation
  - File I/O
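A stripped-down version of the two chains might look like the sketch below; the assumed `llm(prompt: str) -> str` callable does all the work, and the naive line-splitting stands in for the structured output and tool dispatch a real pipeline would use:

```python
def plan_and_execute(llm, goal: str) -> str:
    """Plan-And-Execute sketch: one planning call, then one call per step."""
    # Planner: turn the goal into an ordered list of steps.
    plan = llm(f"Goal: {goal}\nList the steps to achieve it, one per line.")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results: list[str] = []
    for step in steps:
        # Executor: carry out each step, with earlier results as context.
        context = "\n".join(results)
        results.append(
            llm(f"Goal: {goal}\nCompleted so far:\n{context}\nNow do: {step}")
        )
    return results[-1] if results else plan
```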
Use Cases for Plan-And-Execute:
This pattern is ideal for tasks that are too complex for a single LLM call. Common use cases:
- Code Generation ➤ “Build a full-stack app.” → plan architecture, write backend, then frontend.
- Business Workflows ➤ “Create a product launch strategy.” → plan, write press release, draft campaign.
- Agent Loops ➤ Used in BabyAGI: plan tasks → prioritize → execute → update memory → replan.
- Research Assistants ➤ Break down research goals into sub-queries, summarize results, and write reports.
Benefits:
- Modularity: Easier to debug or swap out Planner/Executor logic.
- Scalable: Handles complex, long-horizon reasoning.
- Tool-friendly: Works well with plugins and tool integrations.
Limitations:
- Requires orchestration: More moving parts.
- Brittle if misaligned: If the planner generates bad tasks, the executor can’t recover.
- Slower: More LLM calls and tool usage can increase latency.
Reasoning without Observation
Reasoning Without Observation is an AI architecture pattern where the LLM solves problems entirely through internal thinking, without interacting with external tools, without searching the web, and without getting real-world feedback.
Instead of gathering additional information from the environment, the model relies purely on its internal knowledge and logical reasoning to produce an answer.
Imagine solving a math puzzle on a piece of paper — without asking anyone, without Googling, just reasoning based on what you already know. That's Reasoning without Observation for AI.
How It Works:
The model moves straight from question to solution without any side branches or external verifications.
[Problem/Prompt] --> [Internal Reasoning in LLM] --> [Solution/Answer]
There are no tool calls, no APIs, no database queries — only chain-of-thought inside the model itself.
Implementation Patterns
This is the simplest form of agentic behavior:
- Prompt the LLM with a well-designed question.
- Encourage it to "think step-by-step" internally.
- Let it generate the final answer without help.
Common techniques:
- Chain-of-Thought prompting: "Let's reason step-by-step."
- Self-Ask: The model asks itself sub-questions internally.
- Few-shot examples: Showing the model how to reason about similar problems.
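Because no orchestration is involved, the entire "architecture" can be a single call; here is a minimal chain-of-thought example, reusing the assumed `llm` callable from the earlier sketches:

```python
# Chain-of-Thought prompting: one model call, no tools, no retrieval.
question = "A train travels 120 km in 1.5 hours. What is its average speed?"
prompt = (
    f"{question}\n"
    "Let's reason step-by-step, then state the final answer on its own line."
)
answer = llm(prompt)  # pure internal reasoning; nothing external is consulted
print(answer)
```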
Use Cases for Reasoning Without Observation
This method works best when the task does not require fresh, external information and can be solved by pure cognitive effort.
- Logic and Math Problems ➤ Solving equations, proving theorems, or logical deduction puzzles.
- Trivia and General Knowledge ➤ Questions where the information is common knowledge or already "inside" the model's training.
- Test-Taking AI ➤ Exams like SAT, GRE, or IQ tests where questions are self-contained.
- Creative Tasks ➤ Story generation, poetry, or philosophical questions — where creativity, not fact-checking, is key.
Benefits
- Fast: No API calls or tool orchestration.
- Low cost: Pure LLM inference without external resource usage.
- Simplicity: Ideal when external feedback would add unnecessary complexity.
Limitations
- Outdated knowledge: The model’s information is limited to its training data cutoff.
- Hallucinations: Without real-world checks, the model might confidently generate wrong answers.
- No correction loop: If the initial reasoning is wrong, there’s no opportunity to correct it through feedback.
LLM Compiler
An LLM Compiler is an architecture where a large language model (LLM) is used to transform natural language instructions into structured, executable outputs — like code, SQL queries, API calls, or even workflow plans.
The term "compiler" is inspired by traditional programming, where a compiler translates human-readable source code into machine-readable instructions.
Here, the LLM acts like a semantic compiler: it takes a human prompt and compiles it into something a machine (or another agent) can execute.
Instead of just generating text, the LLM compiles the user’s intention into action-ready outputs.
How It Works
The core idea is:
- Accept natural language as input.
- Internally parse, understand, and structure the request.
- Output executable artifacts like code, API requests, or logic flows.
[Natural Language Prompt] --> [LLM Compiler] --> [Structured Output (Code/API/Plan)]
In advanced setups, you might also have:
- Syntax checking: Validate generated code.
- Post-processing: Execute or integrate the compiled output automatically.
Implementation Patterns
Common strategies include:
- Prompt Engineering: Use templates like "Generate a Python function that does X."
- Chain-of-Thought Generation: Let the LLM think step-by-step before generating the final code.
- Validation Layers: Auto-check output (e.g., parse JSON or compile code snippets).
- Self-Correction Loops: If code fails, let the LLM reflect and retry.
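Combining a validation layer with a self-correction loop might look like the sketch below. The JSON schema is invented purely for illustration, and `llm(prompt: str) -> str` is the same assumed callable as in the earlier examples:

```python
import json


def compile_to_api_call(llm, request: str, max_retries: int = 2) -> dict:
    """LLM Compiler sketch: natural language -> validated JSON 'API call'."""
    prompt = (
        f"User request: {request}\n"
        'Compile this into JSON of the form {"endpoint": "...", "params": {...}}.\n'
        "Return only the JSON."
    )
    for _ in range(max_retries + 1):
        output = llm(prompt)
        try:
            call = json.loads(output)  # validation layer: output must parse
            if "endpoint" in call and "params" in call:
                return call  # compiled artifact, ready to execute
        except json.JSONDecodeError:
            pass
        # Self-correction loop: feed the failure back and retry.
        prompt = (
            f"User request: {request}\n"
            f"Your previous output was not valid:\n{output}\n"
            "Try again. Return only JSON with 'endpoint' and 'params' keys."
        )
    raise ValueError("Could not compile the request into a valid API call")
```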
Examples of LLM Compiler systems:
- OpenDevin: Natural language → DevOps commands.
- Gorilla LLM: Natural language → API calls.
- ReAct + Toolformer: Extend generation by inserting API calls automatically.
Use Cases for LLM Compilers
LLM Compilers unlock tons of real-world applications:
- Code Generation ➤ Turn natural language prompts into Python, JavaScript, SQL, etc.
- API Composition ➤ User: "Get me weather data for Paris." → Compile into API call to a weather service.
- Database Query Builders ➤ User: "Find all users who signed up last month." → Output valid SQL query.
- Automation Workflows ➤ Convert a "todo" list into an executable workflow (e.g., Zapier integration).
Benefits
- Massively increases accessibility: Anyone can "program" with natural language.
- Speeds up development: AI can scaffold projects, code, and scripts almost instantly.
- Versatile: Works across domains (data, APIs, UI generation, DevOps).
Limitations
- Correctness issues: Generated outputs might look right but contain subtle bugs.
- Security concerns: Need validation layers to avoid code injection risks.
- Context dependency: If the model misunderstands the prompt, output will be garbage ("garbage in, garbage out").
Others
| Pattern / Architecture | Core Idea |
| --- | --- |
| Chain of Thought (CoT) | Simply prepend “Let’s think step by step” (or variants) to induce internal reasoning. |
| Self-Consistency | Sample multiple CoT reasoning paths, then vote or aggregate their conclusions. |
| ReAct | Combine reasoning (the “Thought:” steps) with interleaved actions (e.g., tool calls). |
| Tree of Thoughts (ToT) | Explore multiple branches of hypothetical reasoning in a tree search, then backtrack. |
| Language Agent / MRKL | Use a planner plus a suite of specialized skills or tools connected via a “router” or call graph. |
| Graph-of-Thoughts | Represent intermediate ideas explicitly as nodes in a graph and traverse them. |
| Self-Ask with Search | Recursively ask sub-questions and fetch external facts. |
| Least-to-Most Prompting | Decompose a hard problem into simpler “sub-questions” of increasing difficulty. |
| Tree-of-Thoughts Variants | E.g., breadth-first vs. depth-first, heuristic-guided search. |
| Verifier & Repair | Generate an answer, then run a “verifier” prompt to check and patch errors. |
| Tool-Augmented Chain of Thought | Extend CoT by interleaving calls to external APIs (not just reasoning). |
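To make one of these rows concrete, here is a minimal self-consistency sketch. As in the earlier examples, `llm(prompt: str) -> str` is an assumed callable, and it must sample with nonzero temperature for the repeated calls (and the vote) to be meaningful:

```python
from collections import Counter


def self_consistency(llm, question: str, n: int = 5) -> str:
    """Self-Consistency sketch: sample several CoT answers, keep the majority."""
    answers = []
    for _ in range(n):
        response = llm(
            f"{question}\nThink step-by-step, then end with 'Answer: <value>'."
        )
        # Extract the final answer line from each sampled reasoning path.
        for line in reversed(response.splitlines()):
            if line.startswith("Answer:"):
                answers.append(line.removeprefix("Answer:").strip())
                break
    if not answers:
        raise ValueError("No parsable answers were produced")
    # Vote: the most common final answer wins.
    return Counter(answers).most_common(1)[0][0]
```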
The field is still evolving. New hybrids and refinements—combining these patterns in novel ways—are published every few months. If you’re surveying the space, you might categorize them under:
- 🔥 Prompt-only: CoT, self-consistency, least-to-most, reflection.
- 🔥 Planner/Executor: Plan-and-execute, MRKL/Language-Agent, ReAct.
- 🔥 Search-based: Tree of Thoughts, Graph of Thoughts, prompt tree search.
- 🔥 Code-oriented: LLM Compiler, generate-and-run code, verify/repair loops.
Summary
This article surveys six core AI-agent architectures: Basic Reflection, where an LLM critiques and refines its own answers; Reflexion Actor, which adds memory and iterative self-learning loops; Language Agent Tree Search (Tree of Thoughts), which branches and evaluates multiple reasoning paths; Plan-and-Execute, which splits tasks between a high-level planner and a step-by-step executor; Reasoning Without Observation, which relies purely on internal chain-of-thought; and LLM Compiler, which "compiles" natural-language prompts into executable code or API calls. For each, it discusses use cases, strengths, and limitations, and it closes by acknowledging complementary patterns such as Chain-of-Thought prompting, self-consistency sampling, ReAct, and verifier-repair loops.