pgvector: Vector Search in PostgreSQL — Complete Guide
pgvector adds vector similarity search to PostgreSQL. Here's the complete guide: installation, indexing (ivfflat vs HNSW), hybrid search, and production tuning.
pgvector is the PostgreSQL extension that adds vector similarity search — enabling semantic search, RAG retrieval, recommendation systems, and anomaly detection without leaving your existing PostgreSQL infrastructure. Here's the complete production guide.
Installation
-- Enable the extension (requires pgvector installed)
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify
SELECT * FROM pg_extension WHERE extname = 'vector';
On AWS RDS: pgvector is available on PostgreSQL 15+ via CREATE EXTENSION vector. No additional installation needed.
On Supabase: available by default — CREATE EXTENSION vector.
On self-hosted PostgreSQL:
# Ubuntu/Debian
apt-get install postgresql-15-pgvector
# From source
git clone https://github.com/pgvector/pgvector
cd pgvector && make && make install
Adding vector columns
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI text-embedding-3-small dimension
source TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
Dimension must match your embedding model: | Model | Dimensions | |-------|-----------| | OpenAI text-embedding-3-small | 1536 | | OpenAI text-embedding-3-large | 3072 | | Cohere embed-english-v3 | 1024 | | Nomic embed-text-v1 | 768 | | BGE-large-en-v1.5 | 1024 |
Inserting vectors
import openai
import psycopg2
def embed(text: str) -> list[float]:
resp = openai.embeddings.create(
input=text,
model="text-embedding-3-small"
)
return resp.data[0].embedding
def insert_document(conn, content: str, tenant_id: str, source: str):
embedding = embed(content)
with conn.cursor() as cur:
cur.execute(
"""
INSERT INTO documents (tenant_id, content, embedding, source)
VALUES (%s, %s, %s::vector, %s)
""",
(tenant_id, content, embedding, source)
)
conn.commit()
Similarity search operators
pgvector supports three distance operators:
-- Cosine distance (most common for text embeddings)
SELECT content, 1 - (embedding <=> $1) AS similarity
FROM documents WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 10;
-- L2 (Euclidean) distance
SELECT content FROM documents
ORDER BY embedding <-> $1
LIMIT 10;
-- Inner product (for normalized vectors)
SELECT content FROM documents
ORDER BY embedding <#> $1
LIMIT 10;
Use cosine distance (<=>) for text embeddings — it's invariant to vector magnitude and works best for semantic similarity tasks.
Indexing: ivfflat vs HNSW
Without an index, pgvector does exact nearest-neighbor search (full table scan). For production, you need an approximate index:
ivfflat (Inverted File with Flat Quantization)
-- Create ivfflat index
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Set probes at query time (higher = more accurate, slower)
SET ivfflat.probes = 10;
lists: number of clusters (guideline:sqrt(num_rows))probes: how many clusters to search (1–lists; higher = better recall, slower)- Build time: fast
- Memory: lower than HNSW
Best for: < 1M vectors, frequent inserts (index updates are fast), memory-constrained environments.
HNSW (Hierarchical Navigable Small World)
-- Create HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Set ef at query time
SET hnsw.ef_search = 40;
m: number of connections per node (higher = better recall, more memory)ef_construction: size of candidate list during build (higher = better recall, slower build)ef_search: size of candidate list during search (higher = better recall, slower query)
Best for: < 5M vectors, high-recall requirements, moderate insert rate (HNSW builds are slower than ivfflat).
Performance comparison (1M vectors, 1536 dims, 1× t3.medium)
| Index | Build time | Query time (P50) | Recall@10 | |-------|-----------|-----------------|-----------| | None (exact) | — | 450ms | 100% | | ivfflat (lists=1000, probes=10) | 8 min | 15ms | 95% | | HNSW (m=16, ef=64, ef_search=40) | 25 min | 8ms | 98% |
HNSW: faster queries, better recall, slower build, more memory. ivfflat: faster builds, lower memory, slightly lower recall. For most production cases: HNSW.
Hybrid search: vector + full-text
Combining semantic similarity with keyword search improves results significantly:
-- Add full-text search index
CREATE INDEX ON documents USING GIN (to_tsvector('english', content));
-- Hybrid query: rank by combined score
WITH semantic AS (
SELECT id, content, 1 - (embedding <=> $1::vector) AS sem_score
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 20
),
keyword AS (
SELECT id, content,
ts_rank(to_tsvector('english', content), plainto_tsquery('english', $3)) AS kw_score
FROM documents
WHERE tenant_id = $2
AND to_tsvector('english', content) @@ plainto_tsquery('english', $3)
LIMIT 20
),
combined AS (
SELECT id, content,
COALESCE(s.sem_score, 0) + COALESCE(k.kw_score, 0) AS total_score
FROM semantic s
FULL OUTER JOIN keyword k USING (id)
)
SELECT * FROM combined ORDER BY total_score DESC LIMIT 10;
Filtering with metadata
pgvector supports metadata filtering via standard SQL WHERE:
-- Filter by tenant + date, then search
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
AND created_at > NOW() - INTERVAL '30 days'
AND source = 'blog'
ORDER BY embedding <=> $1::vector
LIMIT 10;
Important: Put metadata filters BEFORE the ORDER BY. PostgreSQL applies WHERE first, then sorts. Filtering after sort is inefficient.
For high-cardinality metadata filters, add partial indexes:
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WHERE source = 'blog';
Production checklist
-- 1. Confirm index exists
\d+ documents
-- 2. Check index usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM documents ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector LIMIT 10;
-- 3. Monitor index size
SELECT pg_size_pretty(pg_indexes_size('documents'));
-- 4. Check connection pool (pgBouncer, not session mode)
-- HNSW: uses SET hnsw.ef_search — needs session mode or connection-level setting
-- Use prepared statements with SET: SET hnsw.ef_search = 40; SELECT ...
FAQ
What is pgvector? pgvector is a PostgreSQL extension that adds vector data types and similarity search operators (cosine, L2, inner product) plus approximate nearest neighbor indexes (ivfflat, HNSW). It enables semantic search and RAG retrieval within your existing PostgreSQL database.
Which is better: ivfflat or HNSW? HNSW for most production use cases — faster queries, better recall. ivfflat for memory-constrained environments or datasets with very frequent inserts. Both are production-grade.
What's the maximum number of vectors pgvector can handle? Practically, 5–10M vectors with HNSW indexing on a well-configured PostgreSQL instance (16+ GB RAM). Beyond 10M vectors, dedicated vector databases (Pinecone, Weaviate) offer better scalability.
Does pgvector support filtering on metadata? Yes — standard SQL WHERE clauses filter before vector search. For high-cardinality filters, use partial HNSW indexes on specific metadata values.
Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: RAG in Production: Architecture That Actually Scales · Vector Databases Compared: pgvector vs Pinecone vs Weaviate.