
Agent architectures


You are already familiar with modern LLM-powered agents and the tasks they can help us with. When designing these agents, various architectural patterns dictate how they operate and make decisions, ranging from simple, immediate-response systems to complex, goal-oriented frameworks. The discussion here stays conceptual, but understanding these foundational patterns is crucial: they provide the core blueprints for building agents that can reliably solve complex, multi-step problems.

ReAct agents

The ReAct (Reason + Act) framework combines thinking (reasoning) with acting (tool calls, environmental interactions) to tackle complex tasks. A ReAct agent cycles between thinking, acting, and observing: first, it reasons about the problem; then, it acts by calling a tool; finally, it observes the result to inform its next thoughts and actions. The loop repeats until the task is complete:

The ReAct loop
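As a concrete (if greatly simplified) illustration, here is a minimal Python sketch of the ReAct loop. The `call_llm` function and the `TOOLS` registry are placeholders invented for this example rather than a real model API, and a production agent would parse the model's output far more robustly.

```python
# Minimal ReAct-style loop (illustrative sketch; not a real model API).

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; a real agent would query a model here."""
    return "Thought: the answer is already known.\nAction: finish[42]"

TOOLS = {
    "search": lambda query: f"(stub) results for {query!r}",
}

def react_agent(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # 1. Reason: ask the model for its next thought and action.
        response = call_llm(transcript + "What do you think and do next?\n")
        transcript += response + "\n"

        # 2. Act: either finish or call the named tool with its argument.
        action = response.split("Action: ", 1)[-1].strip()
        if action.startswith("finish["):
            return action[len("finish["):].rstrip("]")
        tool_name, _, arg = action.partition("[")
        tool = TOOLS.get(tool_name, lambda a: f"unknown tool: {tool_name}")
        observation = tool(arg.rstrip("]"))

        # 3. Observe: feed the result back into the next reasoning step.
        transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."

print(react_agent("What is 6 * 7?"))
```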

This iterative process makes ReAct a popular choice; it is simple and works well in dynamic environments. The ability to observe the results of an action allows the agent to adapt its reasoning, leading to more accurate and up-to-date responses. Furthermore, the explicit thought process provides transparency, which is beneficial for debugging and understanding the agent's decision-making process.

Unfortunately, this iterative nature of ReAct agents can lead to inefficiencies and higher costs due to the numerous interactions with the LLM. After each action, the model must be invoked again to determine the next step. There's also a risk that the agent gets stuck in a loop or strays from the initial goal if it misinterprets an observation. Without proper safeguards, this can result in hallucinations or a drift from the original task.

Planner-executor

The planner-executor paradigm addresses the iterative inefficiencies of frameworks like ReAct. It separates strategic planning from tactical execution—decoupling the "thinking" from the "doing." Unlike ReAct, this architecture usually needs only one main call for the initial plan, with just occasional calls for re-planning. This approach greatly reduces the total number of calls to an LLM:

Planner-executor architecture.

When it receives an initial goal, the planner generates a full, multi-step plan before taking any action. It uses the available tools and knowledge to outline the complete sequence of actions needed to reach the goal. This process usually involves a single, large LLM call to set the overall strategy. For this step, you can use a more capable LLM.
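Below is a minimal sketch of what such a planning call might look like, assuming the model is asked to return its plan as JSON. The `call_planner_llm` stub and the step fields (`step`, `tool`, `input`) are assumptions made for illustration, not a prescribed format.

```python
# Illustrative planning step: one larger LLM call that returns a structured,
# multi-step plan. `call_planner_llm` is a placeholder stub.
import json

def call_planner_llm(prompt: str) -> str:
    """Placeholder; a real system would send this prompt to a capable model."""
    return json.dumps([
        {"step": 1, "tool": "search", "input": "library API for CSV parsing"},
        {"step": 2, "tool": "write_code", "input": "parse the CSV and compute totals"},
        {"step": 3, "tool": "run_tests", "input": "verify totals against fixtures"},
    ])

def make_plan(goal: str, tools: list[str]) -> list[dict]:
    prompt = (
        f"Goal: {goal}\n"
        f"Available tools: {', '.join(tools)}\n"
        "Return a JSON list of steps, each with 'step', 'tool', and 'input'."
    )
    return json.loads(call_planner_llm(prompt))

plan = make_plan("Summarize monthly sales from data.csv",
                 ["search", "write_code", "run_tests"])
for step in plan:
    print(step)
```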

Here's a section of a report from Claude Code (a coding agent) showing the planning step:

Claude Code's reports

The executor takes the pre-determined plan and executes it step-by-step. It is often a logic block or a smaller, non-LLM component. If an LLM is used, it can be a less capable, more cost-effective model. Its job is to call the specified tools and record the outcomes. The executor is guided by a fixed, pre-computed plan. This makes the agent less prone to misinterpreting observations or getting sidetracked, ensuring it remains focused on the original goal throughout the execution.

The executor usually follows the plan strictly. However, a critical error or a fundamental change in the environment might invalidate the remaining steps. In such cases, the executor pauses and returns the state to the planner. The planner then generates a revised plan based on the new context, and execution resumes.
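The sketch below illustrates this execution loop under similar assumptions: `run_tool` and `make_plan` are placeholder stubs, and a failed step simply triggers a fresh planning call with the accumulated context.

```python
# Illustrative executor: follow the precomputed plan step by step and hand
# control back to the planner only when a step fails.

def run_tool(tool: str, argument: str) -> tuple[bool, str]:
    """Placeholder tool runner; returns (success, observation)."""
    return True, f"(stub) ran {tool} on {argument!r}"

def make_plan(goal: str, context: str = "") -> list[dict]:
    """Placeholder planner; a real one would call an LLM (see the sketch above)."""
    return [{"tool": "search", "input": goal}]

def execute(goal: str) -> list[str]:
    plan = make_plan(goal)
    results: list[str] = []
    step_index = 0
    while step_index < len(plan):
        step = plan[step_index]
        ok, observation = run_tool(step["tool"], step["input"])
        if not ok:
            # A failed step invalidates the rest of the plan: re-plan from here.
            plan = make_plan(goal, context="\n".join(results + [observation]))
            step_index = 0
            continue
        results.append(observation)
        step_index += 1
    return results

print(execute("Summarize monthly sales from data.csv"))
```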

Despite its efficiency, the planner-executor framework has its own set of limitations. Its reliance on a pre-computed plan makes the architecture inherently less flexible and adaptable in real-time. What if the environment changes significantly or an unexpected observation occurs? The executor is forced to continue following an obsolete plan until a critical failure point triggers a costly re-planning phase.

Moreover, for complex or longer tasks, generating a flawless plan in a single LLM call can be difficult. If the initial plan is flawed or misses a critical step, the entire execution will be compromised, leading to a major failure instead of a minor course correction. Finally, while the architecture reduces the number of calls, the initial planning call itself requires a detailed, complex, and often long-context prompt. This helps ensure a high-quality plan, but it can increase the cost of that single interaction.

Reflexion

Reflexion (Shinn et al.) improves agent performance using verbal reinforcement learning and a structured self-reflection process. It is very effective for agents that need to learn from past failures and improve their decision-making over multiple trials. The framework enables the agent to build upon its experiences without needing to update the underlying model's weights. Here's a diagram showing the basic ReAct agent loop with the addition of Reflexion:

Basic ReAct agent augmented with Reflexion

The core of the architecture is an actor, an LLM that interacts with an environment (this can also be a base agent architecture like ReAct). A short-term memory component called the trajectory captures the agent's outputs. An evaluator, which is another LLM or a specialized function, then analyzes this trajectory and provides feedback (a reward score) based on the actor's performance.

This feedback and any external observations are passed to a self-reflection LLM. This model is prompted to think about what went wrong (or right) and why, and generates "reflective text." This text contains high-level insights and strategies for future attempts; for instance, the agent might conclude, "I should always use tool X for action Y."

This reflective text is then stored in the agent's long-term memory, referred to as experience. For the next trial, the actor is prompted with the original task instructions plus the reflective text from its long-term memory. This process of self-reflection and verbal reinforcement allows the agent to iteratively improve its approach based on past results.
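Here is a much-simplified sketch of that trial loop, assuming placeholder `call_llm`, `evaluate`, and `reflect` functions; a real implementation would use carefully designed prompts for the actor, evaluator, and self-reflection roles.

```python
# Illustrative Reflexion-style trial loop: run the actor, score the trajectory,
# reflect verbally on the outcome, and carry the reflections into the next trial.

def call_llm(prompt: str) -> str:
    """Placeholder for any of the actor / evaluator / reflection calls."""
    return "(stub model output)"

def run_actor(task: str, reflections: list[str]) -> list[str]:
    """Actor attempt; returns the trajectory (thoughts, actions, observations)."""
    prompt = f"Task: {task}\nLessons from earlier trials:\n" + "\n".join(reflections)
    return [call_llm(prompt)]

def evaluate(trajectory: list[str]) -> float:
    """Evaluator: another LLM call or a heuristic that scores the attempt."""
    return 0.0  # stub score; assume early attempts fail

def reflect(task: str, trajectory: list[str], score: float) -> str:
    """Self-reflection call that turns the outcome into a reusable lesson."""
    return call_llm(f"Task: {task}\nScore: {score}\nTrajectory: {trajectory}\n"
                    "What went wrong and what should change next time?")

def reflexion(task: str, max_trials: int = 3, threshold: float = 0.9) -> list[str]:
    experience: list[str] = []  # long-term memory of reflective text
    trajectory: list[str] = []  # short-term memory of the current attempt
    for _ in range(max_trials):
        trajectory = run_actor(task, experience)
        score = evaluate(trajectory)
        if score >= threshold:
            break
        experience.append(reflect(task, trajectory, score))
    return trajectory

print(reflexion("Book the cheapest flight that arrives before noon"))
```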

The main benefit of this design is that the agent can learn and adapt quickly through trial and error. By verbally reflecting on its performance, the agent can avoid repeating mistakes and find better strategies. This makes the architecture a good fit for complex, multi-step tasks where initial attempts are likely to fail. It is also a flexible approach that you can apply to various types of agents and tasks.

However, this method also has some challenges. The process is computationally intensive because it requires multiple LLM calls for the actor, evaluator, and self-reflection models. Additionally, the effectiveness of the self-reflection depends heavily on the quality of the prompts that guide the LLMs. If the reflective prompts are not designed well, the agent might generate unhelpful or even counterproductive strategies, which can hurt its performance.

LATS

The language agent tree search (LATS) architecture aims to improve decision quality by combining an LLM's reasoning ability with the structured exploration of a search algorithm. It operates on the idea that you can best solve complex problems by exploring multiple potential solutions, not just by following a single path.

LATS works by building and exploring a "tree" of possibilities, similar to how a grandmaster analyzes a chess game:

  • First, the agent prompts the LLM to generate multiple possible actions or short action sequences. These possible actions represent the branches that extend from the current state.

  • The agent then "thinks ahead" by simulating the execution of these branches. Importantly, it does not act in the real environment yet. Instead, it predicts the outcome (the observation) of each branch. It then uses the LLM to generate more possible actions from that predicted state, building a deep tree of potential future paths.

  • A scoring mechanism—often guided by the LLM itself—evaluates all the simulated paths. It assigns a value or confidence score to each complete sequence. The scoring gives a higher priority to paths that seem most likely to achieve the goal successfully.

  • The agent then selects the first action from the highest-scoring path in the tree and executes it in the real environment. The process then restarts from the new observed state, continuously exploring and improving the path forward; a simplified sketch of this loop follows the list.
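The sketch below captures the spirit of this loop in a deliberately simplified form: it enumerates a small lookahead tree exhaustively rather than using the Monte Carlo tree search that LATS actually employs, and `propose_actions`, `simulate`, and `score_path` are placeholder stubs standing in for LLM calls.

```python
# Simplified LATS-flavored step: expand candidate actions, simulate them to a
# fixed depth, score the simulated paths, and execute only the best first action.

def propose_actions(state: str, n: int = 3) -> list[str]:
    """Placeholder: a real agent would ask the LLM for candidate actions."""
    return [f"action_{i} from {state}" for i in range(n)]

def simulate(state: str, action: str) -> str:
    """Placeholder: predict the next state without touching the environment."""
    return f"{state} -> {action}"

def score_path(path: list[str]) -> float:
    """Placeholder: an LLM- or heuristic-based value for a simulated path."""
    return float(len(path[-1]) % 7)  # arbitrary stand-in score

def best_first_action(state: str, depth: int = 2) -> str:
    # Build a small lookahead tree of action sequences up to `depth`.
    frontier = [([a], simulate(state, a)) for a in propose_actions(state)]
    for _ in range(depth - 1):
        frontier = [
            (path + [a], simulate(sim_state, a))
            for path, sim_state in frontier
            for a in propose_actions(sim_state)
        ]
    # Score every simulated path and keep the best one.
    best_score, best_path = float("-inf"), frontier[0][0]
    for path, sim_state in frontier:
        score = score_path(path + [sim_state])
        if score > best_score:
            best_score, best_path = score, path
    # Execute only the first action of the best path in the real environment.
    return best_path[0]

print(best_first_action("initial state"))
```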

LATS offers major advantages for complex, multi-step problems, but it comes at a cost. By exploring several action-observation sequences, LATS overcomes the "greedy" nature of ReAct agents, which only look one step ahead. LATS systematically finds and avoids short-term actions that could lead to long-term dead ends, resulting in better decision quality. It also addresses the primary flaw of the planner-executor model. Instead of committing to a long plan, LATS validates the plan's key stages through internal simulation before any action is taken.

Despite its high performance, LATS adds significant overhead. Simulating and evaluating many paths and states requires more LLM calls than the ReAct, planner-executor, and Reflexion models. This makes it computationally expensive and usually best for problems where the task's complexity justifies the high cost. Additionally, the extensive "thinking ahead" phase adds significant latency before the agent can take its first action. For time-sensitive tasks, this delay can be prohibitive. Finally, the overall architecture is more complex to implement and manage; it requires careful coordination of the tree structure, the simulation logic, and the scoring mechanism.

Conclusion

You have explored four foundational architectural patterns for LLM agents, each offering a different approach to problem-solving. You started with the simple, iterative loop of ReAct, then examined the efficient but rigid structure of the planner-executor model. You then moved to the adaptive learning of Reflexion agents through self-correction and finished with the strategic exploration of the LATS framework. Each pattern presents a unique trade-off between simplicity, cost, flexibility, and decision-making quality.

It's best to see these architectures not as rigid, competing frameworks, but as a conceptual toolkit. In practice, agents often mix these patterns to create hybrid systems for specific tasks. For example, an agent might use a Planner to set a high-level strategy but execute each sub-task with a ReAct loop that includes a Reflexion mechanism for self-learning. Mastering these core ideas is your starting point; you can combine, adapt, and build on these foundations to create more advanced solutions.
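As a rough illustration of such a hybrid, the sketch below has a placeholder planner produce sub-tasks that are each handled by a small ReAct-style loop (the Reflexion layer is left out for brevity); every function here (`make_plan`, `react_step`, `call_llm`) is a stand-in invented for this example.

```python
# Hypothetical hybrid: a planner decomposes the goal, and each sub-task is
# handled by a tiny ReAct-style loop. All functions are illustrative stubs.

def call_llm(prompt: str) -> str:
    """Placeholder for a model call."""
    return "Action: finish[done]"

def make_plan(goal: str) -> list[str]:
    """Placeholder planner; a real one would ask an LLM for sub-tasks."""
    return [f"research {goal}", f"draft an answer for {goal}"]

def react_step(subtask: str, max_steps: int = 3) -> str:
    transcript = f"Sub-task: {subtask}\n"
    for _ in range(max_steps):
        response = call_llm(transcript)          # reason + choose an action
        if "finish[" in response:
            return response.split("finish[", 1)[1].rstrip("]")
        transcript += "Observation: (stub)\n"     # act + observe would go here
    return "gave up"

goal = "compare the four agent architectures"
results = [react_step(task) for task in make_plan(goal)]
print(results)
```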
