In the previous topic, we learned about different types of long-term memory: episodic (personal experiences), procedural (learned skills), and semantic (general knowledge). Now let's explore the practical question: How do we actually store and retrieve these memories?
The challenge is real: your agent accumulates knowledge over time, but cramming everything into the LLM's context window isn't scalable. You need a persistent storage solution that allows selective retrieval of relevant memories when needed.
Think of it like organizing a library. You wouldn't dump all books on the floor and search through them every time you need information. Instead, you use a cataloging system to quickly find exactly what you need. For AI agents, we have four primary mechanisms for storing and retrieving memories.
In-memory stores
Keeping memories directly in your application's RAM (Python dictionaries, lists) without any external database. Perfect for short-term memory during a single conversation session. Fast and simple, but not persistent – data is lost when the program stops.
Storage:
Storage is straightforward. Just add new memory to your data structure, like a Python list.
from datetime import datetime

memories = []

def store(new_memory):
    """Store a memory in the list"""
    memories.append({
        "content": new_memory,
        "timestamp": datetime.now()
    })
Retrieval:
Basically, there is no retrieval mechanism. You just pass everything (or the last n items) to the context.
def retrieve_all():
    """Retrieve all memories (no search needed)"""
    return memories

def retrieve_recent(n=5):
    """Get the most recent n memories"""
    return sorted(memories,
                  key=lambda x: x["timestamp"],
                  reverse=True)[:n]
When to use:
Short-term memory during a single conversation session.
Small amounts of data that fit comfortably in RAM.
Development and testing phases.
When persistence isn't required.
This approach is simple and extremely fast to implement. It's a go-to strategy for short-term memory and perfect for testing and development.
Vector databases
A specialized database that stores information as high-dimensional vectors (numerical representations) and retrieves it based on semantic similarity rather than exact matches. Text is converted into embeddings – numerical vectors that capture meaning. Similar concepts have similar vectors, enabling semantic search.
# Example: These sentences would have similar embeddings
"The cat sat on the mat" → [0.2, 0.8, 0.1, ..., 0.3]
"A feline rested on the rug" → [0.2, 0.7, 0.1, ..., 0.4]
# These vectors would be "close" in vector space
Storage:
Store memories by converting text to vectors with an embedding model; the vector store then indexes those vectors for fast similarity lookup.
# Import paths depend on your LangChain version; recent versions use:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    texts=["User prefers Italian food", "User allergic to peanuts"],
    embedding=embeddings,
    metadatas=[
        {"type": "preference", "date": "2025-01-15"},
        {"type": "allergy", "date": "2025-01-10"}
    ]
)

# Add new memories
vectorstore.add_texts(
    texts=["User enjoys morning walks"],
    metadatas=[{"type": "habit", "date": "2025-01-20"}]
)
You may also want to break large memories into chunks, since at retrieval time you often need only a portion of the information.
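As a sketch of chunking, assuming LangChain's RecursiveCharacterTextSplitter (the chunk sizes and the sample text are illustrative values):
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split a long memory into overlapping chunks before indexing
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
long_memory = "...a long transcript of the user's travel planning session..."
chunks = splitter.split_text(long_memory)
vectorstore.add_texts(
    texts=chunks,
    metadatas=[{"type": "episodic", "chunk": i} for i in range(len(chunks))]
)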
Retrieval:
Retrieve memories based on semantic similarity to a query. The database finds vectors closest to the query vector.
Important to remember: similarity search doesn't guarantee a perfect match; it only guarantees that you get the closest vectors to the query. That makes the choice of embedding model critical: it has to capture the nuances of your data. For general knowledge, OpenAI embeddings are a reasonable option, but if you store medical data, for example, you're better off with an embedding model pretrained on medical texts.
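One way to guard against weak matches, sketched here with the FAISS store from above, is to retrieve with scores and drop results beyond a distance threshold. For FAISS the score is an L2 distance, so lower means closer; the cutoff value is an illustrative assumption you'd tune for your embedding model.
# Retrieve (document, score) pairs; for FAISS, score is an L2 distance
results = vectorstore.similarity_search_with_score(
    "What foods does the user avoid?", k=3
)
MAX_DISTANCE = 0.5  # illustrative cutoff, tune for your embedding model
relevant = [doc for doc, score in results if score <= MAX_DISTANCE]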
# Query: "What foods does the user avoid?"
results = vectorstore.similarity_search(
    query="What foods does the user avoid?",
    k=3  # Return top 3 most similar memories
)
# Returns: "User allergic to peanuts", "User prefers Italian food", etc.
# Even though "avoid" wasn't in the stored text!
When to use:
When you need semantic search (finding related concepts, not just keywords)
Large volumes of unstructured text
When users phrase questions differently but mean the same thing
Episodic memory (past conversations, experiences)
Semantic memory (facts, knowledge base)
Popular options: FAISS (local/in-memory), Pinecone (managed cloud), Chroma (easy prototyping), Weaviate (hybrid search), Qdrant (high-performance)
Usually, semantic retrieval isn't implemented alone: consider adding metadata filtering on top of it. Perhaps not in the very first implementation, but it's the most obvious way to improve search quality.
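For example, a sketch with the FAISS store from above (LangChain's FAISS wrapper accepts a filter argument; exact filtering support varies by vector database):
# Narrow the semantic search to memories whose metadata matches the filter
results = vectorstore.similarity_search(
    query="What does the user like to eat?",
    k=2,
    filter={"type": "preference"}  # only consider preference memories
)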
Document stores (NoSQL databases)
A database that stores structured or semi-structured documents (usually JSON) and retrieves them using queries on specific fields or full-text search. Excellent for storing extracted facts with metadata and running complex queries.
Storage:
Store memories as JSON documents with structured fields. MongoDB and similar databases make this straightforward.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.agent_memory
memories = db.memories

# Create text index for full-text search
memories.create_index([("content", "text")])

# Store a memory
memories.insert_one({
    "content": "User prefers window seats on flights",
    "type": "procedural",
    "category": "travel_preferences",
    "user_id": "user_456",
    "importance": 6,
    "timestamp": datetime.now()
})
Moreover, instead of storing raw conversations, extract and store structured facts:
# Raw conversation:
# "I'm planning a trip to Japan in April. I'm vegetarian and allergic to peanuts."
# Extract and store as structured facts:
memories.insert_many([
    {
        "content": "User is planning a trip to Japan in April",
        "type": "episodic",
        "user_id": "user_123",
        "topic": "travel",
        "destination": "Japan",
        "month": "April",
        "importance": 8,
        "timestamp": datetime.now()
    },
    {
        "content": "User is vegetarian",
        "type": "semantic",
        "user_id": "user_123",
        "category": "dietary_restriction",
        "importance": 9,
        "timestamp": datetime.now()
    },
    {
        "content": "User is allergic to peanuts",
        "type": "semantic",
        "user_id": "user_123",
        "category": "allergy",
        "importance": 10,
        "timestamp": datetime.now()
    }
])
Retrieval:
Retrieve memories using structured queries on any field or full-text search.
# Get memories by type
episodic_memories = list(memories.find(
    {"type": "episodic", "user_id": "user_123"}
).sort("timestamp", -1).limit(10))

# Full-text search
travel_memories = list(memories.find(
    {"$text": {"$search": "flight hotel travel Paris"}}
).limit(5))
When to use:
When memories have clear structure and categories
When you need to query on specific fields (date, user, importance, type)
Storing extracted facts, user preferences, procedural knowledge
When you need complex filtering and aggregation (see the sketch after this list)
When you want transactional guarantees
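As a sketch of such aggregation, using the memories collection defined above (the pipeline shape and grouping key are illustrative, not a required schema), MongoDB's aggregation pipeline can summarize what the agent has stored:
# Count memories per type and surface the highest importance seen
summary = list(memories.aggregate([
    {"$match": {"user_id": "user_123"}},
    {"$group": {
        "_id": "$type",
        "count": {"$sum": 1},
        "max_importance": {"$max": "$importance"}
    }},
    {"$sort": {"max_importance": -1}}
]))
# e.g. [{"_id": "semantic", "count": 2, "max_importance": 10}, ...]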
Popular options: MongoDB (most popular), Firebase Firestore (managed, real-time), Couchbase (high performance), DynamoDB (AWS serverless)
Graph databases
A database that stores information as nodes (entities) and edges (relationships), making it ideal for representing interconnected knowledge. Instead of documents or vectors, you explicitly model entities and their relationships.
(User)-[:PREFERS]->(Coffee)
(User)-[:WORKS_AT]->(Company)
(Company)-[:LOCATED_IN]->(City)
(City)-[:COUNTRY]->(France)
Storage:
Store entities as nodes and create relationships (edges) between them.
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)

# Create entities (nodes)
with driver.session() as session:
    # Create user
    session.run("""
        CREATE (u:User {name: 'Alice', user_id: 'user_789'})
        RETURN u
    """)
    # Create restaurant
    session.run("""
        CREATE (r:Restaurant {
            name: "Mario's Italian Kitchen",
            cuisine: 'Italian',
            location: 'New York'
        })
        RETURN r
    """)
    # Create relationship
    session.run("""
        MATCH (u:User {user_id: 'user_789'})
        MATCH (r:Restaurant {name: "Mario's Italian Kitchen"})
        CREATE (u)-[:LIKES {strength: 9, last_visit: '2025-08-15'}]->(r)
    """)
Building a knowledge graph:
with driver.session() as session:
    # Create a preference network
    session.run("""
        CREATE (user:User {name: 'Charlie', user_id: 'user_999'})
        CREATE (coffee:Beverage {name: 'Coffee', type: 'hot_drink'})
        CREATE (morning:TimeOfDay {period: 'morning'})
        CREATE (anxiety:Condition {name: 'anxiety'})
        CREATE (user)-[:PREFERS {strength: 8}]->(coffee)
        CREATE (user)-[:DRINKS_DURING]->(morning)
        CREATE (coffee)-[:INCREASES {confidence: 'high'}]->(anxiety)
        CREATE (user)-[:HAS_CONDITION]->(anxiety)
    """)
Retrieval:
Retrieve information by traversing the graph and finding patterns.
# Query: "Where should I take my colleague Bob for dinner?"
with driver.session() as session:
    result = session.run("""
        MATCH (user:User {user_id: 'user_789'})-[:LIKES]->(restaurant:Restaurant)
        MATCH (user)-[:WORKS_WITH]->(colleague:User {name: 'Bob'})
        WHERE (colleague)-[:RECOMMENDS]->(restaurant)
        RETURN restaurant.name, restaurant.cuisine
    """)
    for record in result:
        print(record["restaurant.name"], record["restaurant.cuisine"])
# Returns: "Mario's Italian Kitchen, Italian"
When to use:
When relationships between entities are as important as the entities themselves
Complex knowledge graphs
Recommendation systems (friend-of-friend, collaborative filtering)
Understanding social networks or organizational structures
Semantic memory with rich relationships
Popular options: Neo4j (most popular), Amazon Neptune (managed AWS), ArangoDB (multi-model), JanusGraph (scalable, distributed)
However, note that designing a graph with a sound, logical structure, and writing the traversal queries against it, takes considerable effort.
Comparison table
| Feature | In-Memory | Vector DB | Document Store | Graph DB |
|---|---|---|---|---|
| Best for | Short-term session memory | Semantic search | Structured facts | Relationship-heavy data |
| Persistence | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Search type | Linear scan | Similarity | Keyword/field queries | Graph traversal |
| Scalability | Low | High | High | Medium |
| Setup complexity | Very low | Medium | Medium | High |
| Query speed | Fastest | Fast | Fast (if indexed) | Fast (for connections) |
| Memory types | Short-term only | All types | All types | Semantic + relationships |
Practical recommendations
For most agent systems, use a combination, as sketched after this list:
In-memory store: Current conversation context (last 5-10 messages).
Vector database: Episodic and semantic memories for similarity-based retrieval.
Document store: Structured facts, user preferences, procedural knowledge.
Graph database (optional): Only if relationships are central to your domain.
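As a minimal sketch of such a combination (the AgentMemory class, its method names, and the wiring are illustrative assumptions, not a standard API), the pieces from this topic could be composed like this:
from collections import deque
from datetime import datetime

class AgentMemory:
    """Hypothetical facade combining the stores from this topic."""

    def __init__(self, vectorstore, fact_collection, context_size=10):
        self.context = deque(maxlen=context_size)  # in-memory, last N messages
        self.vectorstore = vectorstore             # e.g. the FAISS store above
        self.facts = fact_collection               # e.g. the MongoDB collection

    def remember_message(self, message):
        # Short-term: keep a rolling conversation window in RAM
        self.context.append(message)

    def remember_episode(self, text, metadata):
        # Long-term episodic/semantic: index for similarity search
        self.vectorstore.add_texts(texts=[text], metadatas=[metadata])

    def remember_fact(self, fact):
        # Structured facts: store as a queryable document
        fact["timestamp"] = datetime.now()
        self.facts.insert_one(fact)

    def build_context(self, query, k=3):
        # Merge the active window with the most relevant long-term memories
        similar = self.vectorstore.similarity_search(query, k=k)
        return {
            "recent_messages": list(self.context),
            "related_memories": [doc.page_content for doc in similar],
        }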
Conclusion
Choosing the right storage mechanism isn't about finding the "best" option – it's about matching your needs to the right tool:
In-memory stores for temporary session state.
Vector databases when meaning matters more than exact matches.
Document stores when you need structured queries and fact extraction.
Graph databases when entities and their relationships are equally important.
Most production agent systems combine multiple approaches: vectors for semantic search, documents for structured facts, and in-memory for active context. Start simple, and add complexity only when you need it.
In the next topic, we'll tackle the equally important question: How do we update and decay memories over time? Because not all memories should last forever.