When we talk about LLM context, we mean all the information that is given to a Large Language Model when making a request. This context represents everything the model can "see" and reason about during a single request. Unlike humans who maintain ongoing awareness, LLMs are stateless functions – they have no memory between calls and only know what's explicitly provided in each request.
Let's break down what goes into an LLM context when an agent makes a request.
Components of LLM context
1. System prompt: The foundational instructions that define the agent's behavior, personality, and capabilities.
system_prompt = """You are a helpful assistant capable of using tools.
Always think step-by-step before taking actions.
Be concise but thorough in your responses."""
2. Tools description: Detailed schemas of the functions the LLM can call.
tools = [{
    "name": "search_database",
    "description": "Search the knowledge base for relevant information",
    "parameters": {...}
}]
3. MCP servers: External tools connected to your agent through the Model Context Protocol, developed by outside providers.
4. Message history: The conversation so far, including:
User messages
Assistant responses
Tool call results
System messages
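Concretely, the history for an agent that has already used a tool might look like the sketch below. The exact shape varies by provider; this follows the OpenAI-style convention, and the call id and arguments are made up for illustration.
# A hedged sketch of message history in OpenAI-style format.
# The tool name matches the search_database schema above; the id
# and arguments are hypothetical.
conversation_history = [
    {"role": "user", "content": "Do we support SSO?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_001",  # provider-generated id
            "type": "function",
            "function": {
                "name": "search_database",
                "arguments": '{"query": "SSO support"}',
            },
        }],
    },
    {
        "role": "tool",
        "tool_call_id": "call_001",
        "content": "SSO is available on the Enterprise plan.",
    },
    {"role": "assistant", "content": "Yes – SSO is available on the Enterprise plan."},
]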
How context is formed
When an agent makes a request to an LLM, here's what happens:
def prepare_context(system_prompt, tools, conversation_history, user_input):
    """Prepare the complete context for an LLM request"""
    context = {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": system_prompt},
            *conversation_history,
            {"role": "user", "content": user_input}
        ],
        "tools": tools,
        "temperature": 0.7
    }
    return context
# The LLM receives everything in a single request
response = llm.complete(context)
Everything the model needs must be passed every time.
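As a rough illustration of how this conceptual dictionary maps onto a real client, here is a sketch using the OpenAI Python SDK. It reuses the variables defined above and assumes the openai package is installed and an API key is set in the environment; the tool schema is wrapped in the "function" envelope that the Chat Completions API expects.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chat Completions expects each tool wrapped in a "function" envelope
openai_tools = [{"type": "function", "function": t} for t in tools]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        *conversation_history,
        {"role": "user", "content": user_input},
    ],
    tools=openai_tools,
    temperature=0.7,
)
print(response.choices[0].message.content)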
The stateless nature of LLMs
This is crucial to understand: LLMs remember nothing between calls. Each request is completely independent.
# Request 1
response1 = llm.complete({
    "messages": [{"role": "user", "content": "My name is Alice"}]
})
# Returns: "Nice to meet you, Alice!"
# Request 2 - The LLM has no memory of Request 1
response2 = llm.complete({
    "messages": [{"role": "user", "content": "What's my name?"}]
})
# Returns: "I don't know your name. Could you please tell me?"To maintain continuity, we must explicitly include previous interactions:
# Request 2 with context
response2 = llm.complete({
    "messages": [
        {"role": "user", "content": "My name is Alice"},
        {"role": "assistant", "content": "Nice to meet you, Alice!"},
        {"role": "user", "content": "What's my name?"}
    ]
})
# Returns: "Your name is Alice."Context window limitations
Context window limitations
Every LLM has a context window – the maximum number of tokens it can process in a single request:
GPT-4 Turbo / GPT-4o: 128,000 tokens
Claude 3: 200,000 tokens
Gemini 1.5 Pro: 1,000,000 tokens
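Before sending a request, you can estimate how much of the window a given context will consume. Here is a sketch using the tiktoken library (OpenAI's tokenizer); counts for other providers' models will differ, and per-message formatting overhead is ignored.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

def count_tokens(messages):
    """Rough count: sum of tokens in each message's text content."""
    return sum(len(encoding.encode(m["content"])) for m in messages if m.get("content"))

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Summarize our onboarding documentation."},
]
print(count_tokens(messages), "tokens (approximate)")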
As conversations grow, managing what stays in context becomes critical. This leads us to the concept of memory in agents.
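A naive way to manage this, before proper memory mechanisms are introduced, is to keep the system prompt and drop the oldest turns until the context fits a budget. A minimal sketch, using a rough 4-characters-per-token heuristic instead of a real tokenizer:
def trim_history(messages, max_tokens=8000):
    """Keep the system message, drop the oldest other messages until under budget."""
    def rough_tokens(msgs):
        # crude heuristic: ~4 characters per token
        return sum(len(m.get("content") or "") for m in msgs) // 4

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and rough_tokens(system + rest) > max_tokens:
        rest.pop(0)  # drop the oldest non-system message
    return system + rest
Dropping turns wholesale loses information, which is exactly the problem memory mechanisms are designed to solve.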
Cost considerations
Another important aspect to keep in mind when designing agents is cost. LLM API calls are billed per token, so bloated system prompts that consume large numbers of tokens directly increase costs.
When a model is called many times in succession and unnecessary tool definitions are passed on every call, we end up overpaying for tokens. This may be negligible for a simple agent at small scale, but in complex systems it becomes critical for cost management.
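To make the effect visible, here is a back-of-the-envelope sketch. The price and the call counts are hypothetical example figures, not any provider's actual pricing:
PRICE_PER_1K_INPUT_TOKENS = 0.01  # hypothetical example rate – check your provider

def input_cost(prompt_tokens, calls):
    """Cost of re-sending the same prompt tokens on every call."""
    return prompt_tokens * calls * PRICE_PER_1K_INPUT_TOKENS / 1000

# e.g. 3,000 tokens of tool definitions re-sent on 50 calls per task,
# across 1,000 tasks per day:
per_task = input_cost(3_000, 50)  # 1.50 in the hypothetical currency
print(f"{per_task:.2f} per task, {per_task * 1_000:.0f} per day")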
Conclusion
Let's review the key ideas from this topic:
Everything must be explicit: If information isn't in the context, the LLM doesn't know it
Context = cost: More tokens in context means higher API costs
Context = latency: Larger contexts take longer to process
Quality matters: Irrelevant information in context can degrade performance
Given all these limitations, we arrive at the importance of memory mechanisms in agents. We'll dive deeper into these mechanisms in the following topics.