AI Agent Memory: Semantic (RAG), Episodic, and Procedural Patterns
AI agents without memory forget everything. Here are the three memory patterns (semantic memory via RAG, episodic, and procedural) and when to use each in production agent systems.
An AI agent without memory is an amnesiac — capable and smart, but starting fresh every conversation. Production agents need memory: what they've done before, what the user told them, what facts they've accumulated. Here are the three memory patterns I use and when each applies.
Why memory is the hard problem
LLM context windows are getting larger (100k, 200k, 1M tokens), but they're still bounded and expensive to fill. More importantly, information from one session doesn't persist to the next by default. Memory systems solve two problems:
- Cross-session persistence — facts learned in session 1 are available in session 100
- Context management — what to include in the current context window from the accumulated memory
These are different problems with different solutions.
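To make the context-management half concrete, here is a minimal sketch of choosing which remembered items go into the prompt under a token budget. The `MemoryItem` type, its `relevance` score, and the four-characters-per-token estimate are all assumptions for illustration; a real system would use its retriever's scores and the model's actual tokenizer.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    relevance: float  # hypothetical score in [0, 1], e.g. from a retriever


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def select_for_context(items: list[MemoryItem], budget_tokens: int) -> list[MemoryItem]:
    """Greedily keep the most relevant items that still fit in the budget."""
    chosen: list[MemoryItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            chosen.append(item)
            used += cost
    return chosen
```

Greedy selection is not optimal in the knapsack sense, but it is predictable and cheap, which is usually what matters on the request path.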
Memory Type 1: Semantic Memory (RAG)
What it is: A knowledge base of facts the agent can query. Think of it as the agent's long-term factual memory.
Implementation: Embed facts as vectors, store in a vector DB, retrieve by semantic similarity to the current query.
import hashlib

# VectorStore and Embedder are assumed interfaces, defined elsewhere.
class SemanticMemory:
    def __init__(self, vector_store: VectorStore, embedder: Embedder):
        self.store = vector_store
        self.embedder = embedder

    async def remember(self, fact: str, metadata: dict | None = None):
        embedding = await self.embedder.embed(fact)
        await self.store.upsert(
            # Content hash gives a stable ID; built-in hash() is salted per process.
            id=hashlib.sha256(fact.encode("utf-8")).hexdigest(),
            vector=embedding,
            text=fact,
            metadata=metadata or {},
        )

    async def recall(self, query: str, k: int = 5) -> list[str]:
        query_embedding = await self.embedder.embed(query)
        results = await self.store.search(query_embedding, k=k)
        return [r.text for r in results]
When to use:
- Factual knowledge that should be retrievable: product specs, user preferences, domain facts
- Large knowledge bases that don't fit in context
- Knowledge that needs to be searched, not recalled in order
Limitations:
- Can't recall sequences of events
- Doesn't naturally represent causality ("X happened because of Y")
- Requires good embedding + retrieval quality
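The SemanticMemory class above relies on a vector store with `upsert` and `search` methods. As a rough illustration of what retrieval by similarity means, here is a toy in-memory store that ranks records by cosine similarity with a brute-force scan. The `InMemoryVectorStore` and `Record` names are hypothetical, and this is a stand-in for a real vector DB, not a replacement for one.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat its similarity as 0.
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector DB: brute-force cosine search over a dict."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    async def upsert(self, id: str, vector: list[float], text: str, metadata: dict) -> None:
        # The same id overwrites the previous record, so re-remembering is idempotent.
        self._records[id] = Record(id, vector, text, metadata)

    async def search(self, query_vector: list[float], k: int = 5) -> list[Record]:
        ranked = sorted(
            self._records.values(),
            key=lambda r: cosine(query_vector, r.vector),
            reverse=True,
        )
        return ranked[:k]
```

The scan is O(n) per query, which is fine for tests and small agents; production stores use approximate nearest-neighbor indexes to avoid it.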
Memory Type 2: Episodic Memory
What it is: A chronological record of what happened — what the agent did, what the user said, what decisions were made. Think of it as a diary.
Implementation: Append-only log with timestamps and structured entries.
type Episode struct {
	ID        uuid.UUID
	AgentID   string
	SessionID string
	Timestamp time.Time
	Type      EpisodeType // user_message, agent_action, system_event
	Content   string
	Metadata  map[string]any
}

type EpisodicMemory struct {
	db *pgxpool.Pool
}

func (m *EpisodicMemory) Record(ctx context.Context, ep Episode) error {
	_, err := m.db.Exec(ctx,
		`INSERT INTO episodes (id, agent_id, session_id, timestamp, type, content, metadata)
		 VALUES ($1, $2, $3, $4, $5, $6, $7)`,
		ep.ID, ep.AgentID, ep.SessionID, ep.Timestamp, ep.Type, ep.Content, ep.Metadata,
	)
	return err
}
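
func (m *EpisodicMemory) Recall(ctx context.Context, agentID string, since time.Time, limit int) ([]Episode, error) {
	// Explicit column list keeps the Scan order stable if the table changes.
	rows, err := m.db.Query(ctx,
		`SELECT id, agent_id, session_id, timestamp, type, content, metadata
		 FROM episodes WHERE agent_id = $1 AND timestamp > $2
		 ORDER BY timestamp DESC LIMIT $3`,
		agentID, since, limit,
	)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var episodes []Episode
	for rows.Next() {
		var ep Episode
		if err := rows.Scan(&ep.ID, &ep.AgentID, &ep.SessionID, &ep.Timestamp, &ep.Type, &ep.Content, &ep.Metadata); err != nil {
			return nil, err
		}
		episodes = append(episodes, ep)
	}
	return episodes, rows.Err()
}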
When to use:
- Agents that need to track what they've done (task logs, conversation history)
- Auditing and debugging ("what did the agent do at 2pm?")
- Sequential reasoning ("I tried X, it failed, so now I'll try Y")
- User-facing history ("summarize what we discussed last week")
Limitations:
- Grows unboundedly — requires periodic summarization or archival
- Not efficient for semantic search (sequential, not indexed by meaning)
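The Go code above targets Postgres. To keep the remaining examples in one language, here is a rough Python counterpart of the same append-only log, using the standard-library `sqlite3` module. The class name, the flat argument list, and the choice to store timestamps as epoch seconds are my assumptions for the sketch, not part of the original design.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


class SqliteEpisodicMemory:
    """Append-only episode log backed by SQLite (illustrative only)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS episodes (
                   id TEXT PRIMARY KEY, agent_id TEXT, session_id TEXT,
                   timestamp REAL, type TEXT, content TEXT, metadata TEXT)"""
        )
        # Index matches the recall query: filter by agent, order by time.
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS idx_agent_time ON episodes (agent_id, timestamp)"
        )

    def record(self, agent_id, session_id, type, content, metadata=None, timestamp=None) -> str:
        ep_id = str(uuid.uuid4())
        # Epoch seconds compare correctly regardless of the caller's time zone.
        ts = (timestamp or datetime.now(timezone.utc)).timestamp()
        self.db.execute(
            "INSERT INTO episodes VALUES (?, ?, ?, ?, ?, ?, ?)",
            (ep_id, agent_id, session_id, ts, type, content, json.dumps(metadata or {})),
        )
        self.db.commit()
        return ep_id

    def recall(self, agent_id: str, since: datetime, limit: int = 10) -> list[dict]:
        # Newest first, strictly after `since`, mirroring the Go query.
        rows = self.db.execute(
            """SELECT type, content, timestamp FROM episodes
               WHERE agent_id = ? AND timestamp > ?
               ORDER BY timestamp DESC LIMIT ?""",
            (agent_id, since.timestamp(), limit),
        ).fetchall()
        return [{"type": t, "content": c, "timestamp": ts} for t, c, ts in rows]
```

There is deliberately no update method: episodes are only ever inserted, which is what makes the log usable for auditing.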
Memory Type 3: Procedural Memory (Skills/Tools)
What it is: How to do things — procedures, patterns, and tools the agent has learned. Unlike semantic (what) or episodic (when), procedural memory is about how.
In coding agents, this is implemented as agent skills — domain-specific instructions loaded into context:
from pathlib import Path

class ProceduralMemory:
    def __init__(self, skills_dir: Path):
        # Load every Markdown file in the directory as a named skill.
        self.skills: dict[str, str] = {}
        for skill_file in skills_dir.glob("*.md"):
            skill_name = skill_file.stem
            self.skills[skill_name] = skill_file.read_text(encoding="utf-8")

    def get_skill(self, name: str) -> str | None:
        return self.skills.get(name)

    def relevant_skills(self, task: str, threshold: float = 0.7) -> list[str]:
        # Semantic matching to find relevant skills.
        # self.similarity() is not defined here; it is assumed to return a score in [0, 1].
        return [name for name, content in self.skills.items()
                if self.similarity(task, content) > threshold]
When to use:
- Reusable patterns that should be consistent across sessions
- Domain expertise that doesn't change with experience (e.g., how to write a Go HTTP handler)
- Tools and capabilities the agent should always know about
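The `relevant_skills` method above calls a similarity function that the snippet does not define. As a placeholder, here is a sketch that scores a skill by the fraction of the task's words that appear in the skill text. This is plain word overlap, not semantic matching; a real implementation would more likely compare embeddings. The function names and the standalone `relevant_skills` signature are assumptions for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric words; a deliberately naive tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(task: str, content: str) -> float:
    """Fraction of the task's words that also appear in the skill text (0.0 to 1.0)."""
    task_words = tokenize(task)
    if not task_words:
        return 0.0
    return len(task_words & tokenize(content)) / len(task_words)


def relevant_skills(skills: dict[str, str], task: str, threshold: float = 0.7) -> list[str]:
    # Same filter as the method above, with the skills dict passed in explicitly.
    return [name for name, content in skills.items() if similarity(task, content) > threshold]
```

Word overlap misses synonyms and inflections ("write" does not match "writing"), which is exactly the gap embedding-based matching is meant to close.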
Combining memory types
Production agents typically combine all three:
from datetime import datetime

# Episode is assumed to be a Python counterpart of the Go struct
# that fills in its own ID and timestamp.
class AgentWithMemory:
    def __init__(self, agent_id, llm, semantic, episodic, procedural):
        self.agent_id = agent_id
        self.llm = llm
        self.semantic = semantic
        self.episodic = episodic
        self.procedural = procedural

    async def respond(self, user_message: str, session_id: str, session_start: datetime) -> str:
        # Gather relevant memory
        facts = await self.semantic.recall(user_message, k=3)
        history = await self.episodic.recall(self.agent_id, since=session_start, limit=10)
        skills = self.procedural.relevant_skills(user_message)

        # Build context
        context = f"""
        Relevant knowledge: {facts}
        Recent history: {history}
        Available skills: {skills}
        User message: {user_message}
        """
        response = await self.llm.complete(context)

        # Record this interaction
        await self.episodic.record(Episode(agent_id=self.agent_id, session_id=session_id,
                                           type="user_message", content=user_message))
        await self.episodic.record(Episode(agent_id=self.agent_id, session_id=session_id,
                                           type="agent_response", content=response))
        return response
Memory compression and forgetting
Episodic memory grows without bound. Two strategies:
Sliding window: Keep only the last N episodes. Simple, but anything older than the window is lost.
Hierarchical summarization: Periodically summarize old episodes into semantic memory, then delete the raw episodes:
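from datetime import datetime, timezone

async def compress_old_episodes(episodic, semantic, llm, agent_id: str, cutoff: datetime):
    # Assumes recall() accepts an `until` bound and delete_before() exists.
    epoch = datetime.fromtimestamp(0, tz=timezone.utc)
    old_episodes = await episodic.recall(agent_id, since=epoch, until=cutoff)
    if not old_episodes:
        return  # nothing to compress
    summary = await llm.complete(f"Summarize these agent actions: {old_episodes}")
    await semantic.remember(summary, metadata={"type": "episodic_summary", "period": str(cutoff)})
    await episodic.delete_before(agent_id, cutoff)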
This preserves the essence of old experiences without keeping every raw episode.
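For comparison, the sliding-window strategy needs almost no code. A minimal sketch, assuming episodes are kept in process memory as plain strings (the `SlidingWindowMemory` name is hypothetical):

```python
from collections import deque


class SlidingWindowMemory:
    """Keeps only the most recent `max_episodes` entries; older ones are dropped."""

    def __init__(self, max_episodes: int) -> None:
        self._episodes: deque[str] = deque(maxlen=max_episodes)

    def record(self, content: str) -> None:
        # deque(maxlen=N) evicts from the left automatically once full.
        self._episodes.append(content)

    def recent(self) -> list[str]:
        # Oldest first, newest last.
        return list(self._episodes)
```

Usage: after recording "a", "b", and "c" into a window of size 2, `recent()` returns `["b", "c"]`. The first episode is gone for good, which is the trade-off summarization avoids.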
FAQ
What are the main types of AI agent memory? Three types: semantic (factual knowledge, retrieved by similarity), episodic (chronological log of events and actions), and procedural (skills and patterns for how to do things). Most production agents need all three.
What is RAG in the context of agent memory? RAG (Retrieval-Augmented Generation) implements semantic memory — it stores facts as vector embeddings and retrieves relevant ones for each query. It's not a complete memory system but handles the "what do I know" part.
How do you prevent agent memory from growing forever? Use hierarchical summarization: periodically summarize old episodic memories into semantic summaries, then delete the raw episodic records. This preserves knowledge while bounding storage growth.
Is a larger context window a substitute for memory systems? Partially. Larger context windows reduce the need for retrieval-based memory for within-session recall. They don't solve cross-session persistence or efficient retrieval from large knowledge bases. Memory systems remain necessary for agents that work across many sessions.
Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: Building common-knowledge: Persistent Memory for AI Agents · RAG in Production: Architecture That Actually Scales.