In our previous topic, we established why multi-agent systems matter and explored their fundamental components. But understanding that you need multiple agents is only the beginning. The real question is: how do these agents actually work together to accomplish complex goals? This is where collaboration patterns come in.
Modern LLM agents can coordinate in surprisingly sophisticated ways. They're not limited to rigid, predefined workflows where Agent A always passes to Agent B, which in turn always passes to Agent C. Instead, they can reason about who should handle what task, they can discuss problems together like a human team, and they can even challenge each other's ideas to find better solutions.
Let's explore three collaboration patterns that make this possible: delegation, collaborative teamwork, and debate.
Delegation
Delegation is perhaps the most intuitive pattern because it mirrors how work gets distributed in human organizations. One agent (the coordinator) receives a complex task, breaks it down into manageable subtasks, and assigns each subtask to the specialist best equipped to handle it. It's like a project manager who understands the big picture and knows exactly which team member should handle each piece of work. Take a look at the following example:
# Coordinator receives complex task
coordinator = Agent(
instructions="Break down tasks and delegate to specialists",
handoffs=[search_agent, analysis_agent, writing_agent]
)
# Example: Market research report
# Coordinator → search_agent: "Find EV market news from last 3 months"
# Coordinator → analysis_agent: "Analyze Tesla/Rivian revenue trends"
# Coordinator → writing_agent: "Create report from research + analysis"
And here's an implementation with handoffs:
router_agent = Agent(
name="customer_service_router",
instructions="""Route customers to appropriate specialist.
Specialists:
- product_expert: Product questions, features, availability
- refund_specialist: Refunds, returns, cancellations
- technical_support: Technical issues, bugs
- scheduler: Appointments, schedule changes
Include brief summary in handoff.""",
handoff_agents=[product_expert, refund_specialist,
technical_support, scheduler]
)
# User: "I want to return my laptop bought 3 days ago"
# Router analyzes → hands off to refund_specialist
# Handoff includes: "Customer wants return. Order #12345. Item: laptop. 3 days ago."
What makes handoffs powerful is their flexibility. The router agent can make intelligent, context-aware decisions. When a customer says "I bought a laptop three days ago and it won't turn on, I want a refund," the router faces an interesting choice: is this primarily a technical support issue where the laptop might just need troubleshooting, or is it a refund issue where the customer has already made up their mind? An LLM-based router can analyze the customer's tone and intent to make this judgment call, something that would be difficult to encode in rigid rules.
Moreover, handoffs can carry rich context that helps the receiving agent perform better. Instead of just routing to the refund specialist, the handoff can include information like "Customer is frustrated, purchased recently, technical issue with product, may be eligible for exchange rather than refund." The refund specialist receives this context and can proactively offer an exchange option, potentially turning a refund into a satisfied customer.
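As a rough sketch of what that richer handoff could look like in code, the snippet below models the handoff as a small structured payload the router fills in before transferring the conversation. The Handoff dataclass and the route_refund_request helper are hypothetical illustrations, not part of any specific agent framework:
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Structured context passed from the router to a specialist (hypothetical)."""
    target: str                                  # name of the receiving specialist
    summary: str                                 # one-line description of the request
    context: dict = field(default_factory=dict)  # extra signals the specialist can act on

def route_refund_request(message: str) -> Handoff:
    # In a real system the router LLM would extract these fields from the message;
    # they are hard-coded here only to show the shape of the payload.
    return Handoff(
        target="refund_specialist",
        summary="Customer wants to return a laptop purchased 3 days ago",
        context={
            "sentiment": "frustrated",
            "purchase_age_days": 3,
            "technical_issue": "device will not power on",
            "suggested_alternative": "exchange",
        },
    )

handoff = route_refund_request("My laptop won't turn on and I want a refund")
print(handoff.target)                            # refund_specialist
print(handoff.context["suggested_alternative"])  # exchange
With a payload like this, the refund specialist can read suggested_alternative and open with an exchange offer instead of processing a refund by default.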
Hierarchical delegation
As systems grow larger, flat delegation structures become unwieldy. Imagine trying to build a customer service system with fifty different specialist agents. Having one router agent that knows about all fifty specialists and decides between them for every query becomes complicated and error-prone. This is where hierarchical delegation becomes valuable.
Top Router
├── Sales Department Agent
│   ├── Pre-sales Team
│   └── Post-sales Team
├── Support Department Agent
│   ├── Technical Team
│   └── Account Team
└── Billing Department Agent
Think of this hierarchy like a large company's organizational chart. When you call a large organization, you don't immediately reach a specialist who knows everything. You first reach a general receptionist who routes you to a department. Then someone in that department routes you to a specific team. Finally, you reach a specialist within that team. Each level of routing has progressively more specific knowledge about where to send you, making the system both scalable and maintainable.
Each level has progressively more specific knowledge. The top router only needs to distinguish between high-level categories: "Is this sales, support, or billing?" The department agent has deeper domain knowledge: "Is this pre-sales or post-sales?" The team agents know the individual specialists and their specific areas of expertise. This layered approach means no single agent needs to know about every possible specialist in the system, and you can add new specialists by updating only their immediate team-level manager.
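A minimal sketch of this two-level structure, reusing the hypothetical Agent class and handoffs parameter from the earlier examples, might look like this (all department and team agents shown here are illustrative):
# Team-level agents: each knows only its own narrow area.
presales_team = Agent(
    name="presales_team",
    instructions="Answer product, pricing, and demo questions asked before purchase."
)
postsales_team = Agent(
    name="postsales_team",
    instructions="Handle onboarding, renewal, and upgrade questions after purchase."
)

# Department-level router: only distinguishes between its own teams.
sales_department = Agent(
    name="sales_department",
    instructions="""Route sales queries.
    - presales_team: questions before a purchase
    - postsales_team: questions after a purchase""",
    handoffs=[presales_team, postsales_team]
)
support_department = Agent(
    name="support_department",
    instructions="Route support queries to the technical or account team."
)
billing_department = Agent(
    name="billing_department",
    instructions="Handle invoices, payments, and refunds."
)

# Top-level router: only distinguishes between departments.
top_router = Agent(
    name="top_router",
    instructions="""Route each customer to the right department.
    - sales_department: buying, pricing, upgrades
    - support_department: technical issues
    - billing_department: invoices, payments""",
    handoffs=[sales_department, support_department, billing_department]
)
Adding a new specialist now means updating only the team-level agent that owns it; the top router's instructions never change.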
When to use delegation:
Clear task hierarchy
Well-defined specialist roles
Sequential workflow (A → B → C)
Predictable task categories
Collaboration
Collaboration represents a fundamental shift from hierarchical delegation to peer-to-peer interaction. Instead of one agent directing others, multiple agents work as equals, each contributing their specialized perspective to a shared problem. Think of it like the difference between a manager assigning tasks to subordinates versus a group of colleagues brainstorming together around a whiteboard. In collaborative systems, agents build on each other's ideas through conversation, and solutions emerge from the dialogue rather than from top-down planning. Take a look at the following example with the group chat model:
group_chat = GroupChat(
participants=[
product_manager_agent,
architect_agent,
backend_dev_agent,
qa_engineer_agent
],
max_turns=20,
speaker_selection="auto" # or "round_robin"
)
# All agents see full conversation, contribute based on expertise
result = group_chat.run("Add multi-format data export feature")
The power of group chat collaboration lies in its flexibility and emergent intelligence. Unlike delegation where the coordinator must know enough to route tasks appropriately, in group chat collaboration the agents themselves recognize when they have something valuable to contribute. The system doesn't need to predict ahead of time when the QA Engineer should speak. The QA Engineer monitors the conversation and jumps in when testing concerns arise. This makes group chat particularly effective for open-ended problems where you can't predict in advance exactly which expertise will be needed at which moment.
Below, you can see an example of a dialogue:
Product Manager: "Users want JSON and Excel export. Currently only CSV."
Architect: "We could build separate exporters, or create abstraction
layer with format serializers. Abstraction makes future
formats easier."
Backend Dev: "Excel files are larger than CSV. Current pipeline loads
everything in memory. Might need streaming to avoid
memory issues."
QA Engineer: "Need test fixtures in all formats. Verify exported data
can re-import. Watch for special characters in different
encodings."
Product Manager: "Implement JSON first (most requested), then Excel.
Use abstraction approach for easier Excel later."
Notice how each agent contributes from their unique perspective, and the solution that emerges is better than what any single agent would have proposed.
Each agent contributes expertise without needing a coordinator to tell them when to speak. The solution emerges from dialogue rather than top-down planning. This distributed intelligence means you don't need one super-smart coordinator that knows everything. You just need specialists that recognize when their expertise is relevant. Below, you can see the key components:
class GroupChat:
    def select_next_speaker(self, conversation_history):
        """Decide who speaks next based on context"""
        # Option 1: Round-robin (simple, fair)
        # Option 2: LLM-based (intelligent, context-aware)
        context = {
            "conversation": conversation_history,
            "participants": self.get_participant_roles()
        }
        # LLM decides: "QA Engineer should speak next -
        # testing concerns haven't been addressed"
        return self.llm_select_speaker(context)
When to use collaboration:
Open-ended problems
Multiple perspectives beneficial
Solution not known in advance
Design discussions and brainstorming
Debate
Debate represents perhaps the most sophisticated form of multi-agent collaboration, and it's particularly fascinating because it leverages conflict productively. In debate systems, agents are explicitly positioned to argue different perspectives or critique each other's work. This adversarial interaction might seem counterproductive at first, but it's actually incredibly powerful for improving quality. Just as peer review makes academic papers better and code reviews catch bugs before they reach production, agent debate surfaces flaws and challenges assumptions in ways that a single agent working alone simply cannot. The structure is as follows:
debate_system = DebateSystem(
proposer=developer_agent,
critic=code_reviewer_agent,
judge=senior_engineer_agent,
max_rounds=5
Consider the following example of a code review debate:
Developer: [Writes function to solve problem]
Critic: "Issues found:
1. No null input handling
2. O(n²) complexity when O(n log n) exists
3. Variable names violate conventions"
Developer: "Good catches. Adding null check. For complexity: typical
input is <100 items where simpler algorithm is faster
due to lower constants. Adding comment. Fixing names."
Critic: "Null handling good. Complexity comment helpful. Names better.
Remaining: function throws on empty input. Return empty
result instead for robustness."
Developer: [Makes final revision]
Judge: "Code improved significantly. Initial issues addressed.
Current version handles edge cases, includes comments,
follows conventions. Approved."
This debate pattern surfaces issues that might not be caught in a single-pass review. The back-and-forth forces the Developer agent to either defend its choices with sound reasoning (revealing whether the decisions were actually well-thought-out) or to accept valid criticism and improve the code. Meanwhile, the Critic agent goes beyond superficial concerns to understand the actual constraints and requirements. The judge ensures the debate stays productive and reaches a meaningful resolution rather than going in circles.
What makes debate especially valuable is that it catches different types of problems than other review methods. Automated tests catch bugs. Static analysis catches style violations. But debate can catch things like "this algorithm choice makes sense for small inputs but won't scale" or "this architecture decision will make future features harder to implement." These are judgment calls that require reasoning about tradeoffs, which is exactly what LLMs can do well in a debate structure. Take a look at the following implementation:
debater_a = Agent(
name="proposer",
instructions="""Propose solutions. Defend with reasoning.
Consider criticism seriously but don't concede without cause."""
)
debater_b = Agent(
name="critic",
instructions="""Critically evaluate solutions. Look for edge cases,
inefficiencies, violations. Be specific. Acknowledge improvements."""
)
judge = Agent(
name="judge",
instructions="""Evaluate debates. Assess if criticisms valid and
responses adequate. Approve when solution is good. Intervene if
debate goes in circles."""
)
When to use debate:
Correctness is critical
High cost of errors
Code review, legal analysis, medical second opinions
High-stakes decisions
Tradeoff: Multiple LLM calls increase cost and latency. Use when error cost justifies extra scrutiny.
The key insight about debate is knowing when the extra cost is worth it. For a simple internal tool, maybe you don't need debate. For code that handles financial transactions or medical data, the cost of catching even one critical flaw probably justifies the extra LLM calls required for debate.
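One lightweight way to encode that judgment is to gate the debate behind a stakes check and fall back to a single critic pass for low-risk work. The sketch below reuses debate_system and debater_b from the earlier snippets and assumes they expose a run method; the area labels are made up for illustration:
# Hypothetical sketch: route only high-stakes work through the full debate.
HIGH_STAKES_AREAS = {"payments", "authentication", "medical_records"}

def review_change(code: str, area: str) -> str:
    """Run the proposer/critic/judge debate only when errors are expensive."""
    if area in HIGH_STAKES_AREAS:
        # Extra LLM calls are justified here: run the multi-round debate.
        return debate_system.run(code)
    # Low-stakes change: a single critic pass keeps cost and latency down.
    return debater_b.run(f"Review this code once and list any blocking issues:\n{code}")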
Choosing the right pattern
With three distinct collaboration patterns available, how do you decide which one to use? The answer lies in understanding the nature of your problem and what type of agent coordination will actually help solve it. Different problems have fundamentally different structures, and matching your collaboration pattern to your problem structure is crucial for building effective systems.
| Pattern | Best for | Example use cases |
|---|---|---|
| Delegation | Clear hierarchy, defined roles | Customer service, research pipelines, task workflows |
| Collaboration | Open-ended, multiple perspectives | Design discussions, brainstorming, complex problem-solving |
| Debate | Correctness critical | Code review, requirements validation, decision analysis |
In practice, sophisticated systems combine patterns. A research system might:
Use delegation to assign initial tasks
Use collaboration for analysis discussion
Use debate to validate findings
This isn't about picking one pattern and sticking with it rigidly. The most effective multi-agent systems often use different patterns for different phases of work. A research project might start with delegation to gather initial data, move to collaboration when analyzing what the data means, and finish with debate to ensure the conclusions are sound before publishing. Each pattern serves its purpose at the right moment in the workflow.
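As a sketch of what that combined workflow could look like, the pipeline below chains the coordinator, group_chat, and debate_system objects defined earlier in this topic; the run calls and the plain-string hand-offs between phases are assumptions made for illustration rather than a specific framework API:
def research_pipeline(question: str) -> str:
    # Phase 1 - delegation: the coordinator breaks the question down
    # and farms data gathering out to its specialists.
    raw_findings = coordinator.run(f"Gather sources and data for: {question}")

    # Phase 2 - collaboration: peers discuss what the findings mean.
    analysis = group_chat.run(f"Interpret these findings together:\n{raw_findings}")

    # Phase 3 - debate: proposer and critic stress-test the conclusions,
    # and the judge decides when they are sound enough to publish.
    return debate_system.run(f"Validate these conclusions:\n{analysis}")

report = research_pipeline("What are the main trends in the EV market?")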
Conclusion
As we've seen, each pattern has its place. Delegation excels when you have clear hierarchies and well-defined specializations. Collaboration shines when open-ended problems benefit from multiple perspectives converging. Debate becomes invaluable when correctness is critical and the cost of errors is high. Understanding these patterns and their tradeoffs is essential for architecting effective multi-agent systems.
Different patterns for different problems – Delegation works for hierarchies, collaboration for open problems, debate for correctness
Patterns are composable – The most sophisticated systems combine multiple approaches rather than picking just one
Cost versus quality tradeoff – More sophisticated patterns deliver better results but cost more in terms of LLM calls and latency
Don't over-engineer – Always use the simplest pattern that actually solves your problem