AI Agent Memory: RAG, Episodic, and Semantic Patterns

An AI agent without memory is an amnesiac — capable and smart, but starting fresh every conversation. Production agents need memory: what they've done before, what the user told them, what facts they've accumulated. Here are the three memory patterns I use and when each applies.

Why memory is the hard problem

LLM context windows are getting larger (100k, 200k, 1M tokens), but they're still bounded and expensive to fill. More importantly, information from one session doesn't persist to the next by default. Memory systems solve two problems:

Cross-session persistence — facts learned in session 1 are available in session 100
Context management — what to include in the current context window from the accumulated memory

These are different problems with different solutions.
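
To make the second problem concrete, here is a minimal sketch of context management: given memory snippets that already carry a relevance score, greedily pack the best ones into a fixed token budget. The names `pack_context` and `estimate_tokens`, and the four-characters-per-token estimate, are assumptions for illustration. A real agent would count tokens with its model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily pick the highest-scoring snippets that fit in `budget` tokens."""
    chosen = []
    used = 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip what doesn't fit; a smaller snippet may still fit
        chosen.append(text)
        used += cost
    return chosen
```

Greedy packing is not optimal, but it is predictable and cheap, which usually matters more when it runs on every request.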

Memory Type 1: Semantic Memory (RAG)

What it is: A knowledge base of facts the agent can query. Think of it as the agent's long-term factual memory.

Implementation: Embed facts as vectors, store in a vector DB, retrieve by semantic similarity to the current query.

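import hashlib

# VectorStore and Embedder are interfaces supplied by your vector DB
# and embedding client.
class SemanticMemory:
    def __init__(self, vector_store: VectorStore, embedder: Embedder):
        self.store = vector_store
        self.embedder = embedder

    async def remember(self, fact: str, metadata: dict | None = None):
        embedding = await self.embedder.embed(fact)
        await self.store.upsert(
            # A content hash gives a stable ID across processes. The built-in
            # hash() is salted per process for strings, so it would not
            # deduplicate the same fact across sessions.
            id=hashlib.sha256(fact.encode("utf-8")).hexdigest(),
            vector=embedding,
            text=fact,
            metadata=metadata or {},
        )

    async def recall(self, query: str, k: int = 5) -> list[str]:
        # Embed the query with the same model used for the stored facts
        query_embedding = await self.embedder.embed(query)
        results = await self.store.search(query_embedding, k=k)
        return [r.text for r in results]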
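
The class above leaves the vector store abstract. As a rough illustration of what "retrieve by semantic similarity" means underneath, here is a toy in-memory store that ranks records by cosine similarity. `InMemoryVectorStore`, `Record`, and `cosine` are hypothetical names, not part of any library. A production vector DB would use an approximate nearest-neighbor index instead of this linear scan.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector DB: brute-force cosine search over a dict."""

    def __init__(self):
        self.records: dict[str, Record] = {}

    async def upsert(self, id: str, vector: list[float], text: str, metadata: dict) -> None:
        # The same id overwrites, so remembering a fact twice keeps one copy.
        self.records[id] = Record(id, vector, text, metadata)

    async def search(self, query_vector: list[float], k: int = 5) -> list[Record]:
        ranked = sorted(
            self.records.values(),
            key=lambda r: cosine(query_vector, r.vector),
            reverse=True,
        )
        return ranked[:k]
```

A store like this is enough for unit tests and small prototypes. The linear scan becomes the bottleneck long before memory does.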

When to use:

Factual knowledge that should be retrievable: product specs, user preferences, domain facts
Large knowledge bases that don't fit in context
Knowledge that needs to be searched, not recalled in order

Limitations:

Can't recall sequences of events
Doesn't naturally represent causality ("X happened because of Y")
Requires good embedding + retrieval quality

Memory Type 2: Episodic Memory

What it is: A chronological record of what happened — what the agent did, what the user said, what decisions were made. Think of it as a diary.

Implementation: Append-only log with timestamps and structured entries.

type Episode struct {
    ID        uuid.UUID
    AgentID   string
    SessionID string
    Timestamp time.Time
    Type      EpisodeType  // user_message, agent_action, system_event
    Content   string
    Metadata  map[string]any
}

type EpisodicMemory struct {
    db *pgxpool.Pool
}

func (m *EpisodicMemory) Record(ctx context.Context, ep Episode) error {
    _, err := m.db.Exec(ctx,
        `INSERT INTO episodes (id, agent_id, session_id, timestamp, type, content, metadata)
         VALUES ($1, $2, $3, $4, $5, $6, $7)`,
        ep.ID, ep.AgentID, ep.SessionID, ep.Timestamp, ep.Type, ep.Content, ep.Metadata,
    )
    return err
}

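func (m *EpisodicMemory) Recall(ctx context.Context, agentID string, since time.Time, limit int) ([]Episode, error) {
    // Name the columns explicitly so the scan order survives schema changes.
    rows, err := m.db.Query(ctx,
        `SELECT id, agent_id, session_id, timestamp, type, content, metadata
         FROM episodes WHERE agent_id = $1 AND timestamp > $2
         ORDER BY timestamp DESC LIMIT $3`,
        agentID, since, limit,
    )
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var episodes []Episode
    for rows.Next() {
        var ep Episode
        if err := rows.Scan(&ep.ID, &ep.AgentID, &ep.SessionID, &ep.Timestamp,
            &ep.Type, &ep.Content, &ep.Metadata); err != nil {
            return nil, err
        }
        episodes = append(episodes, ep)
    }
    // rows.Err reports any error hit while iterating
    return episodes, rows.Err()
}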
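
The combined-agent example later in this post calls the episodic store from Python, so here is a minimal Python sketch of the same append-only log. It uses the standard-library `sqlite3` module as a stand-in for Postgres. `SqliteEpisodicLog` and `_ts` are hypothetical names, and the table layout simply mirrors the INSERT statement above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


def _ts(dt: datetime) -> str:
    # Fixed-width UTC ISO-8601 strings sort chronologically as plain text.
    return dt.astimezone(timezone.utc).isoformat(timespec="microseconds")


class SqliteEpisodicLog:
    """Append-only episode log backed by sqlite3."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS episodes (
                   id TEXT PRIMARY KEY, agent_id TEXT, session_id TEXT,
                   timestamp TEXT, type TEXT, content TEXT, metadata TEXT)"""
        )

    def record(self, agent_id, session_id, kind, content, metadata=None, timestamp=None):
        when = _ts(timestamp or datetime.now(timezone.utc))
        self.db.execute(
            "INSERT INTO episodes VALUES (?, ?, ?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), agent_id, session_id, when, kind, content,
             json.dumps(metadata or {})),
        )
        self.db.commit()

    def recall(self, agent_id, since, limit=10):
        # Newest first, matching the ORDER BY in the Go version.
        rows = self.db.execute(
            "SELECT type, content, timestamp FROM episodes "
            "WHERE agent_id = ? AND timestamp > ? "
            "ORDER BY timestamp DESC LIMIT ?",
            (agent_id, _ts(since), limit),
        ).fetchall()
        return [{"type": t, "content": c, "timestamp": ts} for t, c, ts in rows]
```

This version is synchronous for brevity. An async agent would wrap these calls or use an async driver.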

When to use:

Agents that need to track what they've done (task logs, conversation history)
Auditing and debugging ("what did the agent do at 2pm?")
Sequential reasoning ("I tried X, it failed, so now I'll try Y")
User-facing history ("summarize what we discussed last week")

Limitations:

Grows unboundedly — requires periodic summarization or archival
Not efficient for semantic search (sequential, not indexed by meaning)

Memory Type 3: Procedural Memory (Skills/Tools)

What it is: How to do things — procedures, patterns, and tools the agent has learned. Unlike semantic (what) or episodic (when), procedural memory is about how.

In coding agents, this is implemented as agent skills — domain-specific instructions loaded into context:

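from pathlib import Path
from typing import Callable

class ProceduralMemory:
    def __init__(self, skills_dir: Path, similarity: Callable[[str, str], float]):
        # similarity(task, skill_text) returns a relevance score, e.g. cosine
        # similarity between embeddings of the two texts
        self.similarity = similarity
        self.skills: dict[str, str] = {}
        for skill_file in skills_dir.glob("*.md"):
            # The file name without its extension is the skill name
            self.skills[skill_file.stem] = skill_file.read_text()

    def get_skill(self, name: str) -> str | None:
        return self.skills.get(name)

    def relevant_skills(self, task: str, threshold: float = 0.7) -> list[str]:
        # Semantic matching to find relevant skills
        return [name for name, content in self.skills.items()
                if self.similarity(task, content) > threshold]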
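
`relevant_skills` depends on a similarity score between the task and each skill's text. As a placeholder while wiring things up, one crude option is cosine similarity over word counts. This is a lexical stand-in for embedding similarity, not what an embedding model computes, and `bag_of_words` and `similarity` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase alphanumeric tokens with their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(task: str, content: str) -> float:
    """Cosine similarity between word-count vectors, in [0, 1]."""
    a, b = bag_of_words(task), bag_of_words(content)
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0
```

Lexical scores sit on a different scale from embedding scores, especially when a short task is compared with a long skill file, so a threshold of 0.7 would need retuning.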

When to use:

Reusable patterns that should be consistent across sessions
Domain expertise that doesn't change with experience (e.g., how to write a Go HTTP handler)
Tools and capabilities the agent should always know about

Combining memory types

Production agents typically combine all three:

from datetime import datetime

class AgentWithMemory:
    def __init__(self, agent_id, llm, semantic, episodic, procedural):
        self.agent_id = agent_id
        self.llm = llm
        self.semantic = semantic
        self.episodic = episodic
        self.procedural = procedural

    async def respond(self, user_message: str, session_id: str, session_start: datetime) -> str:
        # Gather relevant memory
        facts = await self.semantic.recall(user_message, k=3)
        history = await self.episodic.recall(self.agent_id, since=session_start, limit=10)
        skills = self.procedural.relevant_skills(user_message)

        # Build context
        context = f"""
        Relevant knowledge: {facts}
        Recent history: {history}
        Available skills: {skills}
        User message: {user_message}
        """

        response = await self.llm.complete(context)

        # Record this interaction. Episode is a Python mirror of the Go struct
        # above, assumed to fill in its own ID and timestamp.
        await self.episodic.record(Episode(
            agent_id=self.agent_id, session_id=session_id,
            type="user_message", content=user_message))
        await self.episodic.record(Episode(
            agent_id=self.agent_id, session_id=session_id,
            type="agent_response", content=response))

        return response

Memory compression and forgetting

Episodic memory grows without bound. Two strategies:

Sliding window: Keep only the last N episodes. Simple, but anything older than the window is lost for good.
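
For an in-process log, a sliding window is nearly free with `collections.deque`. A minimal sketch, with `SlidingWindowLog` as a hypothetical name (a database-backed log would delete rows beyond the newest N instead):

```python
from collections import deque


class SlidingWindowLog:
    """Keeps only the most recent `max_episodes` entries; older ones are dropped."""

    def __init__(self, max_episodes: int):
        self.episodes = deque(maxlen=max_episodes)

    def record(self, episode: str) -> None:
        # deque(maxlen=N) evicts from the left automatically once full.
        self.episodes.append(episode)

    def recent(self) -> list[str]:
        # Oldest to newest within the window.
        return list(self.episodes)
```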

Hierarchical summarization: Periodically summarize old episodes into semantic memory, then delete the raw episodes:

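from datetime import datetime, timezone

async def compress_old_episodes(agent_id: str, cutoff: datetime):
    # Assumes recall() takes an `until` bound and returns every episode in the range.
    epoch = datetime.fromtimestamp(0, tz=timezone.utc)
    old_episodes = await episodic.recall(agent_id, since=epoch, until=cutoff)
    summary = await llm.complete(f"Summarize these agent actions: {old_episodes}")
    await semantic.remember(summary, metadata={"type": "episodic_summary", "period": str(cutoff)})
    # Delete last, so a failed summary or write never loses raw history.
    await episodic.delete_before(agent_id, cutoff)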

This preserves the essence of old experiences without keeping every raw episode.
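
When the backlog is too large to summarize in a single prompt, the "hierarchical" part can be taken literally: summarize in batches, then summarize the summaries until one remains. The sketch below assumes a synchronous `summarize` callable, which stands in for an LLM call. `hierarchical_summary` and `batch_size` are hypothetical names.

```python
from typing import Callable


def hierarchical_summary(
    items: list[str],
    summarize: Callable[[list[str]], str],
    batch_size: int = 20,
) -> str:
    """Summarize in batches, then summarize the summaries until one remains."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 so each level shrinks")
    if not items:
        return ""
    level = items
    while True:
        # Each pass replaces every batch with a single summary string.
        level = [
            summarize(level[i:i + batch_size])
            for i in range(0, len(level), batch_size)
        ]
        if len(level) == 1:
            return level[0]
```

Each level loses detail, so the batch size is a trade-off between prompt size and how many lossy passes the oldest episodes go through.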

FAQ

What are the main types of AI agent memory? Three types: semantic (factual knowledge, retrieved by similarity), episodic (chronological log of events and actions), and procedural (skills and patterns for how to do things). Most production agents need all three.

What is RAG in the context of agent memory? RAG (Retrieval-Augmented Generation) implements semantic memory — it stores facts as vector embeddings and retrieves relevant ones for each query. It's not a complete memory system but handles the "what do I know" part.

How do you prevent agent memory from growing forever? Use hierarchical summarization: periodically summarize old episodic memories into semantic summaries, then delete the raw episodic records. This preserves knowledge while bounding storage growth.

Is a larger context window a substitute for memory systems? Partially. Larger context windows reduce the need for retrieval-based memory for within-session recall. They don't solve cross-session persistence or efficient retrieval from large knowledge bases. Memory systems remain necessary for agents that work across many sessions.

Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: Building common-knowledge: Persistent Memory for AI Agents · RAG in Production: Architecture That Actually Scales.