
Introduction to multi-agent systems

12-minute read

A multi-agent system (MAS) is a collection of autonomous AI agents working together within a shared environment to solve complex tasks. Instead of one agent doing everything, you create a team of specialized agents, each excelling at their specific domain. Think of it like the difference between hiring one person to run your entire business versus building a team where each member has a specific role they're really good at.

The single-agent problem

Let's start with a real-world scenario to understand why we need multiple agents in the first place. Imagine you're building a customer service system where one agent is responsible for handling product inquiries, processing refunds, scheduling appointments, analyzing sentiment, and escalating to human agents when things get complicated. At first glance, this might seem efficient. One agent means one system to maintain, one set of instructions, one context to manage. But as you start building this system, problems quickly emerge.

This approach creates several fundamental challenges:

  • Context window bloat: the system prompt includes every domain's instructions, and tool definitions for all capabilities stay in context even when most aren't needed for the current task.

  • High costs: Every simple query pays for the entire mega-agent's context. "What are your hours?" shouldn't need refund processing tools in context.

  • Maintenance nightmare: Updating refund logic risks breaking scheduling. Adding new capabilities affects everything.

  • Poor specialization: Agent tries to be expert at everything, excels at nothing.

These problems compound as your system grows. What starts as a manageable single-agent solution becomes increasingly brittle and expensive. You find yourself constantly tweaking the system prompt, adding more edge cases, expanding tool definitions, and wondering why performance keeps degrading despite a more capable LLM underneath.
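
To make the problem concrete, here is a rough sketch of what such a mega-agent looks like, using the same illustrative Agent abstraction as the examples below (tool names are placeholders):

# One agent carries every domain's instructions and tools on every call
mega_agent = Agent(
    instructions=(
        "You handle product inquiries, refunds, appointment scheduling, "
        "sentiment analysis, and escalation. Follow the refund policy below... "
        "Follow the scheduling rules below... (thousands of tokens more)"
    ),
    tools=[
        search_products, check_inventory,       # product inquiries
        process_refund, check_return_policy,    # refunds
        check_availability, book_appointment,   # scheduling
        analyze_sentiment, escalate_to_human    # sentiment and escalation
    ]
)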

Multi-agent solution

The solution is conceptually simple but architecturally powerful: instead of one overwhelmed agent trying to master every domain, create a specialized team. Each agent becomes an expert in one area, with focused instructions and only the tools they actually need. Let's see how this transforms our customer service system:

# Specialist agents - each focused on one domain
product_expert = Agent(
    instructions="Answer product questions using catalog database",
    tools=[search_products, check_inventory]
)

refund_specialist = Agent(
    instructions="Process refunds and returns",
    tools=[process_refund, check_return_policy]
)

# Scheduler specialist (defined here so the router below can reference it;
# tool names are illustrative)
scheduler = Agent(
    instructions="Schedule and manage customer appointments",
    tools=[check_availability, book_appointment]
)

# Router agent - understands intent, directs to specialists
router = Agent(
    instructions="Route customers to appropriate specialist",
    handoffs=[product_expert, refund_specialist, scheduler]
)

Each agent now:

  • Has a focused system prompt for its domain

  • Includes only relevant tools in context

  • Can be updated independently

  • Performs better at its specialty

Each specialist can be developed, tested, and improved independently. When you need to update refund policies, you modify just the refund specialist without worrying about breaking product search or appointment scheduling. This is the power of specialization in multi-agent systems.

Three pillars of MAS

Now that we've seen why multi-agent systems matter, let's understand what makes them work. Every multi-agent system, regardless of complexity, rests on three fundamental components. Understanding these pillars will help you design effective multi-agent architectures from the start.

  1. Agents: Autonomous entities with specialized roles, system prompts, tools, and handoff capabilities.

  2. Environment: Shared space where agents operate – conversation history, memory stores, external systems, task state.

  3. Interaction mechanisms: How agents coordinate – handoffs, shared message history, common memory, structured protocols.

Consider the following example of a research assistant system:

class ResearchSystem:
    def __init__(self):
        # Specialists are defined first so the coordinator can reference them
        self.web_researcher = Agent(
            instructions="Find and summarize relevant articles",
            tools=[web_search, extract_content]
        )

        self.data_analyst = Agent(
            instructions="Analyze quantitative data and trends",
            tools=[query_database, compute_metrics]
        )

        self.writer = Agent(
            instructions="Synthesize findings into coherent report",
            tools=[create_document]
        )

        self.coordinator = Agent(
            instructions="Break down research tasks and delegate",
            handoffs=[self.web_researcher, self.data_analyst, self.writer]
        )

# Coordinator delegates: "Find articles" → web_researcher
# "Analyze data" → data_analyst
# "Write report" → writer with results from both

This example shows how a coordinator agent orchestrates multiple specialists to complete a complex task. The coordinator doesn't need to know how to scrape websites or run statistical analysis. It just needs to understand the overall goal and know which specialist to call for each subtask. The specialists, in turn, don't need to understand the big picture – they just need to excel at their specific domain.
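
A minimal sketch of driving this system, assuming a hypothetical run() helper that executes an agent, follows any handoffs it makes, and returns the final output:

system = ResearchSystem()

# run() is assumed here, not part of a specific library
report = run(
    system.coordinator,
    "Research electric vehicle adoption trends and write a short report"
)
# Internally: coordinator → web_researcher (articles)
#             coordinator → data_analyst (adoption figures)
#             coordinator → writer (final report)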

Memory in multi-agent systems

Here's where things get interesting. Remember from our first lecture that LLMs are stateless functions with no memory between calls? In multi-agent systems, this fundamental challenge doesn't just persist – it actually multiplies. Now you don't just have one stateless entity to manage; you have multiple stateless entities that somehow need to coordinate and share information. When the web researcher finds important articles and the data analyst computes key metrics, how does the writer agent know about both? This is where memory becomes absolutely critical.

Two approaches to sharing information:

  1. Shared memory store: All agents read/write common facts.

shared_memory = {
    "destination": "Tokyo",
    "dates": "March 15-22",
    "budget": "$2000",
    "interests": ["food", "history"]
}

# Each agent can access and update this shared state

  2. Explicit context in handoffs: Pass relevant info when delegating.

# Destination agent hands off to flight specialist
handoff(
    to=flight_specialist,
    context={
        "destination": "Tokyo",
        "dates": "March 15-22",
        "budget": "$2000",
        "preferences": ["direct flights"]
    }
)

Most systems use both approaches strategically: shared memory for persistent facts that multiple agents need over time, and explicit passing for immediate task context during handoffs. The key is choosing the right mechanism for each piece of information based on how it will be used.
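
For example, a travel-planning flow might combine both mechanisms like this (hotel_specialist and the specific keys are illustrative):

# Persistent facts that several agents will need live in shared memory
shared_memory["visa_required"] = False

# Immediate task context travels with the handoff itself
handoff(
    to=hotel_specialist,
    context={
        "destination": shared_memory["destination"],
        "dates": shared_memory["dates"],
        "max_price_per_night": 150   # only relevant to this delegation
    }
)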

Types of agent interactions

Not all multi-agent systems work the same way. The nature of how your agents interact shapes everything from how you write their system prompts to how you structure information flow between them.

Cooperative systems have all agents working toward a shared goal. Our customer service example is cooperative. Every agent, from the router to the specialists, wants the same thing: to resolve the customer's issue successfully. In cooperative systems, agents share information freely, hand off tasks based on expertise rather than competition, and genuinely help each other succeed. There's no conflict of interest because they all win or lose together.

Competitive systems involve agents with opposing goals. Consider a negotiation simulation where one agent represents a buyer trying to get the lowest price, and another represents a seller trying to get the highest price. Each agent is literally working against the other, though they still need to interact to reach any agreement at all. Competitive systems can also include debate scenarios where agents argue opposite sides of an issue while a judge agent evaluates the quality of their arguments.
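
As a sketch, such a negotiation setup might be wired up like this (agent names and price limits are illustrative):

buyer = Agent(
    instructions="Negotiate the lowest possible price; never agree above $800"
)

seller = Agent(
    instructions="Negotiate the highest possible price; never agree below $600"
)

judge = Agent(
    instructions="After the exchange, decide whether a deal was reached and on what terms"
)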

Mixed systems are perhaps the most interesting because they combine cooperation and competition in productive ways. Imagine a software development team of agents building a product together. Most agents cooperate toward building good software. But you might also have a dedicated critic agent whose explicit job is to challenge decisions and find flaws. This critic cooperates with the team goal (building quality software) while competing with other agents (pointing out their mistakes). When designed well, this tension is healthy and leads to better outcomes.
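
A mixed setup with a dedicated critic might look like this (names and tools are illustrative):

developer = Agent(
    instructions="Implement features according to the specification",
    tools=[write_code, run_tests]
)

# Cooperates with the team goal while challenging the developer's decisions
critic = Agent(
    instructions="Review proposed designs and code; actively look for flaws and risky assumptions",
    tools=[run_tests, static_analysis]
)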

When to use multi-agent systems

This is perhaps the most important question: when does the added complexity of multiple agents actually pay off? Not every problem needs a multi-agent solution, and choosing the wrong architecture can make your system harder to build and maintain without any real benefit. Here's how to think about this decision.

Use MAS when:

  • your product has clearly separable domains of expertise that would be difficult to combine in a single prompt. Example: a chat that handles product catalog knowledge, refund policies, and scheduling logic

  • different tasks need fundamentally different tools. Example: a product query agent needs database access while a refund agent needs payment API access, and keeping both tool sets in one agent's context is wasteful

  • you are able to process tasks in parallel. If parts of your workflow are independent, having multiple agents work simultaneously can dramatically reduce latency compared to one agent doing everything sequentially (see the sketch after this list)

  • different operations require different access control levels
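
As a quick illustration of the parallelism point above, here is a minimal sketch using Python's asyncio; run_agent() is an assumed helper, not part of a specific framework:

import asyncio

# Assumed helper: executes a single agent call and returns its result.
# In a real system this would invoke your agent framework or LLM API.
async def run_agent(agent, task):
    ...

async def handle_request(product_question, appointment_request):
    # Independent subtasks run concurrently instead of one after another
    answer, booking = await asyncio.gather(
        run_agent(product_expert, product_question),
        run_agent(scheduler, appointment_request),
    )
    return answer, booking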

Stick with a single agent when:

  • your task is fundamentally unified and doesn't naturally decompose into separate concerns

  • your context window can comfortably fit everything you need, including the system prompt, all tool definitions, and typical conversation history with room to spare

  • you have extremely tight latency requirements and need the absolute fastest response times

  • you're in early prototyping phases. It's often smarter to start with one agent, understand your problem deeply, and split into multiple agents later when natural boundaries become clear

Cost and efficiency benefits

Let's talk about something that might not be immediately obvious but becomes critically important at scale: cost efficiency. Multi-agent systems can actually save you significant money on API costs compared to monolithic single agents. This happens because you only pay for what you actually use on each request.

Consider what happens with a single agent architecture. Every query, no matter how simple, loads the entire system. The system prompt might be 5,000 tokens of instructions covering every possible scenario. Tool definitions for all capabilities add another 3,000 tokens. So even when someone asks "What are your hours?" – a question that doesn't need any tools at all – you're paying for 8,000+ tokens just to get the context loaded before the LLM even starts processing.

Now compare that to a multi-agent approach:

# Single agent: Every query loads everything
context = system_prompt (5000 tokens) + all_tools (3000 tokens) + conversation
# Simple query: "What are your hours?" = 8000+ tokens

# Multi-agent: Only loads what's needed
router_context = router_prompt (500 tokens) + conversation
# Same query: 500 tokens
# Router realizes this doesn't need specialist

The difference is dramatic at scale. For complex queries needing multiple specialists, you only pay for the relevant agents used, not for every possible agent in your system. Each request is optimized to include only what's actually needed to solve that specific problem.
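
To put rough numbers on it (the monthly query volume here is purely illustrative):

# Back-of-the-envelope math using the per-query token counts above
queries_per_month = 100_000                         # assumed volume
single_agent_tokens = 8_000 * queries_per_month     # 800M input tokens
multi_agent_tokens = 500 * queries_per_month        # 50M input tokens

print(single_agent_tokens / multi_agent_tokens)     # 16.0 – roughly 16x fewer tokens on simple queries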

Conclusion

Multi-agent systems represent a fundamental shift in how we architect LLM applications. By breaking monolithic agents into specialized teams, we gain performance, maintainability, and cost efficiency. But this power comes with coordination complexity that requires thoughtful design. The foundation rests on understanding how agents, environment, and interaction mechanisms work together to create a coherent system. Here are the key takeaways from the topic:

  1. Specialization over generalization – Focused agents perform better than one mega-agent;

  2. Context efficiency – Only include what's needed for current task;

  3. Independent scaling – Run many instances of high-demand agents, few of rare ones;

  4. Isolation – Update one agent without touching others;

  5. Memory becomes critical – Coordination requires information sharing between agents;

  6. Not always better – Simple tasks don't need the complexity of multiple agents.
