
Understanding LLM context


When we talk about LLM context, we mean all the information given to a Large Language Model in a single request – everything the model can "see" and reason about while producing its response. Unlike humans, who maintain ongoing awareness, LLMs are stateless functions: they have no memory between calls and know only what's explicitly provided in each request.

Let's break down what goes into an LLM context when an agent makes a request.

Components of LLM context

1. System prompt: The foundational instructions that define the agent's behavior, personality, and capabilities.

system_prompt = """You are a helpful assistant capable of using tools.
Always think step-by-step before taking actions.
Be concise but thorough in your responses."""

2. Tools description: Detailed schemas of available functions the LLM can call.

tools = [{
    "name": "search_database",
    "description": "Search the knowledge base for relevant information",
    "parameters": {...}
}]

3. MCP tools: External tools developed by third-party providers and connected to your agent via the Model Context Protocol (MCP).

4. Message history: The conversation so far (see the sketch after this list), including:

  • User messages

  • Assistant responses

  • Tool call results

  • System messages
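
Here is a minimal sketch of what that history might look like, assuming an OpenAI-style chat format – the exact shape of tool-call messages varies between providers:

history = [
    {"role": "user", "content": "Do we have docs on rate limits?"},
    # The assistant decides to call a tool instead of answering directly
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "search_database",
                     "arguments": '{"query": "rate limits"}'}
    }]},
    # The tool call result, fed back so the model can use it
    {"role": "tool", "tool_call_id": "call_1",
     "content": "Found: 'API rate limits', updated last week"},
    {"role": "assistant", "content": "Yes – see the 'API rate limits' doc."}
]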


How context is formed

When an agent makes a request to an LLM, here's what happens:

def prepare_context(system_prompt, tools, conversation_history, user_input):
    """Prepare the complete context for an LLM request"""
    context = {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": system_prompt},
            *conversation_history,
            {"role": "user", "content": user_input}
        ],
        "tools": tools,
        "temperature": 0.7
    }
    return context

# `llm` stands in for any chat-completions client;
# the model receives everything in this single request
response = llm.complete(context)

Everything the model needs must be passed every time.

The stateless nature of LLMs

This is crucial to understand: LLMs remember nothing between calls. Each request is completely independent.

# Request 1
response1 = llm.complete({
    "messages": [{"role": "user", "content": "My name is Alice"}]
})
# Returns: "Nice to meet you, Alice!"

# Request 2 - The LLM has no memory of Request 1
response2 = llm.complete({
    "messages": [{"role": "user", "content": "What's my name?"}]
})
# Returns: "I don't know your name. Could you please tell me?"

To maintain continuity, we must explicitly include previous interactions:

# Request 2 with context
response2 = llm.complete({
    "messages": [
        {"role": "user", "content": "My name is Alice"},
        {"role": "assistant", "content": "Nice to meet you, Alice!"},
        {"role": "user", "content": "What's my name?"}
    ]
})
# Returns: "Your name is Alice."
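
In practice, an agent maintains continuity by appending every turn to a history list and resending the whole thing. A minimal sketch (llm.complete again stands in for any chat-completions client, and response.content is an assumed attribute):

history = []

def chat(user_input):
    """Send the full history plus the new message, then record the reply."""
    history.append({"role": "user", "content": user_input})
    response = llm.complete({"messages": history})
    history.append({"role": "assistant", "content": response.content})
    return response.content

chat("My name is Alice")  # history now holds two messages
chat("What's my name?")   # the full history is resent -> "Your name is Alice."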

Context window limitations

Every LLM has a context window – the maximum number of tokens it can process in a single request. Some typical limits:

  • GPT-4 Turbo: 128,000 tokens

  • Claude 3: 200,000 tokens

  • Gemini 1.5 Pro: 1,000,000 tokens

As conversations grow, managing what stays in context becomes critical. This leads us to the concept of memory in agents.
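
As a first taste of that, one common tactic is to count tokens and drop the oldest messages once the history approaches a budget. A rough sketch using the tiktoken library – the budget value and encoding choice are illustrative:

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

def count_tokens(messages):
    """Rough count: sums content tokens, ignoring per-message overhead."""
    return sum(len(encoding.encode(m["content"])) for m in messages)

def trim_history(messages, budget=100_000):
    """Drop the oldest messages until the total fits the budget.

    Assumes messages[0] is the system prompt, which is always kept."""
    system, rest = messages[:1], messages[1:]
    while rest and count_tokens(system + rest) > budget:
        rest.pop(0)  # the oldest non-system message goes first
    return system + rest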

Price consideration

Another important aspect to remember when designing agents is cost. Providers charge per token, and because the full context is resent on every call, an inefficient system prompt that consumes a large number of tokens is billed again and again.


When a model is called many times in succession and unnecessary tool definitions are passed along on every call, we end up overpaying for tokens. This might not be noticeable at small scale with simple agents, but when building complex systems, these considerations become critical for cost management.
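
To put rough numbers on this, here is a back-of-the-envelope estimate; the per-token prices are placeholders, so substitute your provider's current rates:

# Hypothetical prices in dollars per 1M tokens – check your provider's pricing
PRICE_PER_1M_INPUT = 2.50
PRICE_PER_1M_OUTPUT = 10.00

def estimate_cost(input_tokens, output_tokens, calls=1):
    """Cost of `calls` requests that each resend the same input context."""
    per_call = (input_tokens * PRICE_PER_1M_INPUT
                + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000
    return per_call * calls

# A 20k-token context (prompt + tools + history) called 50 times in a loop
print(f"${estimate_cost(20_000, 500, calls=50):.2f}")  # ≈ $2.75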

Conclusion

Let's review the key ideas from the topic:

  1. Everything must be explicit: If information isn't in the context, the LLM doesn't know it

  2. Context = cost: More tokens in context means higher API costs

  3. Context = latency: Larger contexts take longer to process

  4. Quality matters: Irrelevant information in context can degrade performance

Given all these limitations, we arrive at the importance of memory mechanisms in agents. We'll dive deeper into these mechanisms in the following topics.
