Agents use LLMs as their computational engine to exhibit autonomous behavior, including decision-making, tool interaction, and environmental adaptation. Unlike static LLMs, agents are designed to plan actions and iteratively refine outputs using memory and external tools.
The four key components of an agent are:
The Environment it interacts with. For an LLM agent, the environment is typically text conversations or data sources.
Sensors to perceive the environment. For an LLM agent, the sensors are its text inputs.
Actuators to effect change in the environment. The actuators are the tools the agent can use.
Effectors, which serve as the agent's decision-making component, translating observations into actions. Here, the effector is the LLM itself with its reasoning capabilities.
Why do we need agents?
Agents represent a paradigm that extends beyond simple input-output models. They enable LLMs to interact with environments, make decisions, and execute tasks with varying degrees of autonomy.
Agents address the limitations of standalone LLMs by combining several complementary capabilities:
Memory: They remember past conversations or information.
Tools: They can use document loaders, image generation, or other software to get things done.
Planning: They think through what steps to take to solve your problem.
An agent framework allows an LLM to observe its environment, plan actions, use tools, and maintain memory of past interactions.
Components of Agents
The three primary components that make up an agent are memory, tools, and planning.
LLMs by themselves don't "remember" previous interactions. For agents to function effectively, we need to implement a memory mechanism. There are two main types of memory in agent systems:
Short-Term Memory: It functions as a buffer for immediate context. The simplest implementation uses the LLM's context window to store recent interactions. For models with smaller context windows, we might need to summarize conversations to retain important information while managing token usage.
Long-Term Memory: It allows for the retention of information over extended periods. A common approach is to use chat stores, often backed by a database or vector store, to persist and retrieve previous interactions.
Effective memory management enhances an agent's ability to maintain context, learn from past experiences, and make more informed decisions over time.
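The two memory types can be sketched as follows. This is a minimal illustration, not a production design: the class names are hypothetical, short-term memory is a fixed-size buffer standing in for the context window, and long-term memory is a toy keyword search standing in for a real chat store or vector store.

```python
from collections import deque


class ShortTermMemory:
    """Fixed-size buffer of recent turns, standing in for the context window."""
    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)  # oldest turns are evicted first

    def add(self, role: str, content: str):
        self.turns.append({"role": role, "content": content})

    def as_context(self) -> list:
        return list(self.turns)


class LongTermMemory:
    """Toy chat store: keyword lookup over all stored notes.
    Real systems typically use a database or vector store instead."""
    def __init__(self):
        self.store = []

    def add(self, content: str):
        self.store.append(content)

    def search(self, query: str) -> list:
        return [t for t in self.store if query.lower() in t.lower()]


stm = ShortTermMemory(max_turns=2)
stm.add("user", "What is the capital of France?")
stm.add("assistant", "Paris.")
stm.add("user", "And of Japan?")  # the first turn is evicted here

ltm = LongTermMemory()
ltm.add("User prefers answers in metric units.")

print(len(stm.as_context()))   # 2
print(ltm.search("metric"))    # ['User prefers answers in metric units.']
```

The buffer eviction mirrors what happens when old turns fall out of (or get summarized away from) a limited context window, while the long-term store survives across sessions.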
Tools are resources or functions that help the agent achieve a goal. Tool calling, also known as function calling, allows LLMs to interact with external tools and APIs. These extend an agent's capabilities by allowing it to interact with external systems. Examples include browser automation, code interpreters, search APIs such as DuckDuckGo, or an external database.
The Agent decides when it needs to use these tools and how to use them. For an agent to use tools effectively, it needs to:
Understand when a tool is needed
Select the appropriate tool
Format the input correctly
Process the tool's output
This typically requires the agent to generate structured outputs (like JSON) that can be parsed to extract tool calls.
For example, a dummy Weather API Tool can be given as follows:
```python
# Define the tool
def get_weather(location: str, unit: str) -> str:
    """
    This is a dummy function representing a weather API.
    In a real scenario, this would interact with an actual API.
    """
    if location == "Paris, France" and unit == "celsius":
        return "20°C, sunny"
    else:
        return "No weather data available"
```

Declaring the function documentation and type hints for both inputs and outputs, as in the example above, and providing additional context such as few-shot prompting, helps the LLM determine when to use the appropriate tool based on the context.
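To close the loop, the structured output the model generates has to be parsed and dispatched to the right function. The sketch below assumes a JSON tool-call format and a `TOOLS` registry; both are illustrative conventions, not a specific library's API.

```python
import json


def get_weather(location: str, unit: str) -> str:
    """Dummy weather tool from the example above."""
    if location == "Paris, France" and unit == "celsius":
        return "20°C, sunny"
    return "No weather data available"


# Hypothetical tool registry mapping tool names to functions.
TOOLS = {"get_weather": get_weather}

# A structured tool call as a model might emit it (assumed format).
model_output = (
    '{"tool": "get_weather", '
    '"arguments": {"location": "Paris, France", "unit": "celsius"}}'
)


def dispatch(raw: str) -> str:
    """Parse the model's JSON output and invoke the named tool."""
    call = json.loads(raw)
    fn = TOOLS[call["tool"]]
    return fn(**call["arguments"])


print(dispatch(model_output))  # 20°C, sunny
```

Real frameworks add validation and error handling around this parse-and-dispatch step, but the core idea is the same: the LLM produces structured text, and ordinary code executes it.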
Planning is the process by which an agent decides what actions to take. It involves reasoning about the current situation and determining a sequence of steps to achieve a goal.
How planning works:
Task decomposition involves dividing a complex task into smaller, more manageable subtasks. These subtasks can be independent or sequentially dependent.
Once a task is decomposed, the agent needs to plan the steps required to achieve each subtask.
Agents must identify potential actions, reduce the list to optimal actions, prioritize them, and identify dependencies and conditional steps based on environmental changes.
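Decomposition with dependencies can be sketched as a small scheduling problem. The plan below is entirely made up for illustration; a real agent would have the LLM generate such a plan, but the dependency-ordering logic is the same.

```python
# Hypothetical decomposition of "write a report" into subtasks,
# each declaring which earlier steps it depends on.
plan = [
    {"step": "gather_sources", "depends_on": []},
    {"step": "summarize_sources", "depends_on": ["gather_sources"]},
    {"step": "draft_report", "depends_on": ["summarize_sources"]},
]


def executable_steps(plan, done):
    """Return steps not yet done whose dependencies are all satisfied."""
    return [
        s["step"] for s in plan
        if s["step"] not in done and all(d in done for d in s["depends_on"])
    ]


done = set()
order = []
while len(done) < len(plan):
    for step in executable_steps(plan, done):
        order.append(step)   # a real agent would execute the subtask here
        done.add(step)

print(order)  # ['gather_sources', 'summarize_sources', 'draft_report']
```

Independent subtasks (steps with no unmet dependencies) surface together, which is where an agent could run them in parallel.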
Various frameworks like ReAct and Reflection are used for handling dynamic decision-making and planning in agents.
Architecture of an Agent
Many LLM applications implement a particular control flow of steps before and/or after LLM calls. Instead of hard-coding a fixed control flow, we sometimes want LLM systems that can pick their own control flow to solve more complex problems.
Here’s the basic architecture of an agent:
The model receives input from the user, determines which tool calls are required, and generates a response. After analyzing the result, the model decides whether additional tool calls are necessary or whether the response needs further processing. This iterative loop runs for each input.
Although this architecture can be customized with many tools, it does not guarantee efficient and accurate results for every query. This is where popular architectures, such as ReAct, come in.
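A ReAct-style loop alternates between model output and tool observations until the model declares a final answer. The sketch below uses a scripted stand-in (`fake_llm`) that replays fixed decisions, since a real implementation would call an actual model at each step; the "Action:"/"Observation:"/"Final Answer:" strings follow the ReAct prompting convention.

```python
def fake_llm(history: str) -> str:
    """Scripted stand-in for an LLM: asks for the weather tool once,
    then answers after seeing the observation."""
    if "Observation: 20°C, sunny" in history:
        return "Final Answer: It is 20°C and sunny in Paris."
    return 'Action: get_weather("Paris, France", "celsius")'


def get_weather(location: str, unit: str) -> str:
    if (location, unit) == ("Paris, France", "celsius"):
        return "20°C, sunny"
    return "No weather data available"


def react(question: str, max_steps: int = 5) -> str:
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(history)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer: ").strip()
        # Crude action handling for the sketch; a real agent parses
        # the tool name and arguments out of the model's output.
        observation = get_weather("Paris, France", "celsius")
        history += f"\n{step}\nObservation: {observation}"
    return "No answer within step budget"


print(react("What is the weather in Paris?"))
```

The `max_steps` cap matters in practice: it bounds how long the agent can loop when the model never converges on a final answer.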
Types of Agents
AI agents are categorized based on their capabilities. The main types are:
Simple Reflex Agents
Simple Reflex Agents act solely based on the current percept, ignoring the rest of the percept history. They use condition-action rules (if-then rules) to decide what action to take. They do not consider the environment’s history or future consequences.
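Condition-action rules are easy to show concretely. The thermostat below is a hypothetical example: each rule inspects only the current percept (the temperature right now), with no memory of earlier readings.

```python
# Condition-action rules for a toy thermostat agent.
# The first matching condition wins; the last rule is the default.
RULES = [
    (lambda t: t < 18, "heat_on"),
    (lambda t: t > 24, "cool_on"),
    (lambda t: True, "idle"),
]


def simple_reflex_agent(percept: float) -> str:
    """Map the current percept to an action using if-then rules only."""
    for condition, action in RULES:
        if condition(percept):
            return action


print(simple_reflex_agent(15))  # heat_on
print(simple_reflex_agent(30))  # cool_on
print(simple_reflex_agent(21))  # idle
```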
Model-Based Reflex Agents
Model-based reflex agents maintain some internal state to keep track of the part of the world they cannot see. The internal model is updated based on percept history, allowing it to handle partially observable environments.
Goal-Based Agents
Goal-based agents act to achieve specific objectives. They consider future actions and outcomes, using search and planning to select actions that will help them reach their goals.
Utility-Based Agents
Utility-based agents go beyond goals by considering multiple possible desirable states and choosing actions that maximize their own perceived utility (a measure of happiness or satisfaction). They evaluate the desirability of different outcomes and select the one with the highest utility.
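Utility maximization amounts to scoring each candidate outcome and taking the argmax. The route-choice scenario, outcome values, and weights below are all made up for illustration.

```python
# Hypothetical outcomes of two route choices.
actions = {
    "take_highway":  {"time_saved": 20, "toll_cost": 5},
    "take_backroads": {"time_saved": 5,  "toll_cost": 0},
}


def utility(outcome: dict) -> float:
    """Score an outcome; the weights here are illustrative assumptions."""
    return outcome["time_saved"] - 2 * outcome["toll_cost"]


# Pick the action whose expected outcome has the highest utility.
best = max(actions, key=lambda a: utility(actions[a]))
print(best)  # take_highway  (utility 10 vs 5)
```

Changing the weights in `utility` changes the chosen action, which is exactly how a utility-based agent encodes preferences rather than a single fixed goal.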
Learning Agents
Learning agents can improve their performance over time by learning from experience. They have components for learning, performance, criticism, and problem-solving, allowing them to adapt to new situations and environments.
Multi-Agent Systems
Multi-agent systems involve multiple autonomous AI agents collaborating to solve complex problems beyond the capabilities of individual agents. These systems leverage distributed intelligence, adaptability, and parallelism to achieve goals in dynamic environments.
Multi-agent systems tend to outperform single-agent systems thanks to their larger pool of shared resources and greater opportunities for optimization and automation. Instead of each agent learning the same policies independently, agents can share learned experiences, saving time and improving efficiency.
Some of the key characteristics of Multi-Agent Systems are:
Autonomy: Agents operate independently with self-contained decision-making.
Decentralized Control: No central authority; agents coordinate via communication protocols.
Adaptability: Agents adjust behavior based on environmental changes or new data.
Fault Tolerance: The system remains functional even if individual agents fail.
Scalability: New agents can be added without disrupting the system.
Some of the common architectures of Multi-Agent Systems are:
Centralized Networks: A single controller agent manages coordination among the other agents. This supports consistent decisions and simplified coordination, as in supply chain management. However, the controller is a single point of failure: if it goes down, communication across the whole system is disrupted. Centralized networks also face scalability challenges.
Decentralized Networks: Agents communicate directly with one another, without a central agent or hub. Agents in this architecture are typically autonomous, giving the system fault tolerance and better scalability, at the cost of more complex synchronization requirements. This architecture finds use cases in autonomous drones and smart grids.
Conclusion
We have covered agents and their components, and how these enable them to interact with the environment, make decisions, and execute tasks. We have explored the different types of agents and their architectures. Finally, we looked at multi-agent systems, their key characteristics, and their network architectures.