pgvector vs dedicated vector DB

Introduction
Retrieval for LLM apps usually needs fast approximate nearest-neighbor (ANN) search over embeddings. You can store vectors inside PostgreSQL with the pgvector extension, or use a dedicated vector database (for example Qdrant, Milvus, Weaviate, Pinecone). Both approaches solve the same core problem; the tradeoffs are integration and simplicity versus scale and productized vector features.
What both solve
In both cases you index dense vectors (and optionally sparse vectors where the product supports them), run similarity search with a distance metric, and attach metadata (payload) for filtering. The difference is where that index lives and what else runs on the same infrastructure.
pgvector in PostgreSQL
pgvector adds a vector type and operators such as cosine / L2 distance, with indexes like HNSW and IVFFlat. Your embeddings sit in ordinary tables next to the rest of your application data.
In practice that means one row holds your tenancy keys, document identity, human-readable text, JSON metadata, and the embedding in the same table — one transaction can insert or update business fields and the vector together. The dimension below is shortened for readability; production models usually use hundreds or thousands of components per vector.
You must install and enable the pgvector extension (CREATE EXTENSION vector) for the vector type and ANN indexes. The lexical column uses tsvector with a GIN index: that path relies on PostgreSQL’s built-in full-text search (to_tsvector, @@ queries) — no second extension is required for a standard configuration like english. Optional extensions such as unaccent only appear if you need extra linguistics.
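A minimal schema sketch under those assumptions (table and column names are illustrative, and vector(8) stands in for a real embedding dimension such as 768 or 1536):

```sql
-- Enable pgvector once per database (built-in FTS needs no extension).
CREATE EXTENSION IF NOT EXISTS vector;

-- One row: tenancy, identity, text, JSON metadata, and the embedding together.
CREATE TABLE doc_chunks (
    id          bigserial PRIMARY KEY,
    tenant_id   bigint    NOT NULL,
    doc_id      bigint    NOT NULL,
    content     text      NOT NULL,
    metadata    jsonb     NOT NULL DEFAULT '{}',
    -- Lexical side: generated tsvector, kept in sync with content automatically.
    content_tsv tsvector  GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    -- Dense side: dimension shortened for readability.
    embedding   vector(8) NOT NULL
);

-- ANN index for cosine distance; GIN index for full-text search.
CREATE INDEX ON doc_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON doc_chunks USING gin (content_tsv);
```

One transaction can insert the business fields and the embedding together, which is the colocation benefit described above.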
| Pros | Cons |
|---|---|
| One database for OLTP + vectors: joins, transactions, foreign keys, backups, and tooling you already use. | Vertical scaling limits on a single primary; very large corpora or extreme QPS (queries per second) may require careful sizing or splitting workload. |
| Filters in SQL: combine vector search with WHERE, JSONB, tenancy, and row-level security in one query. | Heavy ANN load can contend with transactional traffic on the same instance unless you isolate or scale reads. |
| Hybrid retrieval is natural to compose: pgvector plus tsvector / GIN for lexical search, or tag tables — fusion (weighted scores or RRF) is implemented in SQL or application code. | No single vendor “hybrid” primitive in core Postgres; you own score normalization and fusion compared to some vector DB APIs. |
| Recent pgvector versions add halfvec and sparsevec where applicable; still, the mainstream story is dense embeddings plus Postgres full-text for lexical work. | Operational maturity for vector-only SLOs (sharding, replication tuned for ANN) is often stronger in specialized products at the largest scale. |
Dedicated vector databases
Systems such as Qdrant focus on collections, named vectors per point, replication, and APIs aimed at high-throughput ANN and sometimes built-in hybrid query modes.
| Pros | Cons |
|---|---|
| Horizontal scaling and product stories for very large indexes and high QPS (queries per second) without overloading your OLTP database. | Another system to deploy, monitor, secure, and keep in sync with the source of truth (duplication and pipeline complexity). |
| Feature surface tuned for retrieval: multi-vector points, hybrid fusion in one API on many products, collection-level configuration. | Cross-document joins to canonical relational rows often happen in the application or via ETL, not always in one SQL statement. |
| Easier to give vector search its own capacity and SLOs separate from Postgres. | Cost and operational surface area increase (hosted or self-managed). |
On the benchmarking site linked in the resources below, the headline difference for latency is at the tail: Qdrant wins on tail latency, while pgvector wins on single-node throughput. For more detail, see the resources at the bottom.
Does pgvector meet your latency and throughput requirements at your target scale? If your collection is under 5 million vectors and query load is moderate, pgvector almost always works. At around 10 million vectors with hundreds of concurrent queries, test and measure; if pgvector does not meet your requirements, move to a dedicated vector database.
Query sketches
Same shape: bind a query embedding, return the top-k neighbors with optional metadata filter. Postgres expresses it in SQL; Qdrant uses its HTTP/gRPC API (here the official Python client).
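Two hedged sketches of that shape, assuming a doc_chunks table with an embedding column and a Qdrant collection named chunks (all names illustrative). In SQL, `<=>` is pgvector's cosine-distance operator:

```sql
-- Top-5 neighbors for one tenant; :query_embedding is a bound vector parameter.
SELECT id, content, embedding <=> :query_embedding AS distance
FROM doc_chunks
WHERE tenant_id = :tenant_id
ORDER BY embedding <=> :query_embedding
LIMIT 5;
```

With the official Qdrant Python client the same request looks roughly like this (requires a running Qdrant instance):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url="http://localhost:6333")

query_embedding = [0.1] * 8  # stands in for a real query embedding

hits = client.query_points(
    collection_name="chunks",  # illustrative collection name
    query=query_embedding,
    query_filter=Filter(must=[
        FieldCondition(key="tenant_id", match=MatchValue(value=42)),
    ]),
    limit=5,
).points
```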
Hybrid search and metadata
Hybrid retrieval (dense + lexical) improves robustness: embeddings cover paraphrases; keyword-style signals cover exact product names, IDs, and jargon. In Postgres you combine pgvector with tsvector or tag columns; dedicated stacks often expose fusion in one request. Neither removes the need for good chunking and embedding choice — see Anatomy of RAG for the surrounding pipeline.
There is no universal winner: it depends on what you compare and at what scale. Dense + sparse inside a dedicated vector database usually leads when you need very large corpora, high QPS (queries per second), and native sparse and dense ANN (sharded inverted lists plus HNSW-style stacks built for retrieval-only workloads). FTS (full-text search) + dense pgvector in the same Postgres often wins on operational simplicity, a single round trip, joins with OLTP data, and moderate size and QPS — often cheaper and fast enough until the primary becomes the bottleneck (CPU, I/O, single-node ceilings) or you need Elasticsearch-grade lexical tuning at warehouse scale. FTS + dense is not the same algorithm as learned sparse embeddings; you trade different recall and latency profiles, not only raw speed.
Rule of thumb: start with FTS (full-text search) + pgvector on Postgres when scale is moderate; move to a dedicated stack when profiling shows Postgres cannot meet latency or throughput, or you need sparse plus dense at the largest scale.
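The RRF fusion mentioned above is small enough to own in application code. A minimal sketch (function name and the conventional k=60 constant are illustrative, not from any particular library):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every ranked list it appears in; higher is better."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense and lexical retrievers disagree; RRF rewards documents both rank highly.
dense_hits = ["a", "b", "c"]
lexical_hits = ["b", "d", "a"]
print(rrf_fuse([dense_hits, lexical_hits]))  # → ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it sidesteps score normalization between ts_rank-style lexical scores and vector distances.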
One table: lexical + dense in PostgreSQL
One row keeps content, a generated content_tsv, and embedding together. The "sparse" side here is lexical full-text search, not a learned sparse vector over a huge vocabulary.
| Pros | Cons |
|---|---|
| One schema, one transaction, GIN and HNSW on the same table; simplest path to hybrid filters and vector ordering in SQL. | Ranking is ts_rank-style, not BM25; fusion with vector scores is yours to design; single-instance limits still apply. |
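Assuming a table with a content_tsv tsvector column and an embedding vector column (names illustrative), a weighted fusion fits in one statement; the 0.3/0.7 weights are tuning knobs, not recommendations:

```sql
-- Weighted hybrid score: lexical ts_rank plus dense cosine similarity.
-- :query_text and :query_embedding are bound parameters.
SELECT id, content,
       0.3 * ts_rank(content_tsv, websearch_to_tsquery('english', :query_text))
     + 0.7 * (1 - (embedding <=> :query_embedding)) AS hybrid_score
FROM doc_chunks
WHERE content_tsv @@ websearch_to_tsquery('english', :query_text)
ORDER BY hybrid_score DESC
LIMIT 10;
```

In practice many deployments instead run the lexical and dense arms as two indexed queries (or CTEs) and fuse the result lists with RRF, since a single predicate spanning both arms can defeat index usage.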
Two tables: dense row + explicit sparse weights
The chunk stays on doc_chunks; nonzero sparse dimensions live in a child table (chunk_id, dim, weight). Scoring is a join and sum — workable for modest sparsity, not a drop-in for Milvus-style sparse ANN.
| Pros | Cons |
|---|---|
| Stores true sparse coordinates; index on dim supports intersection-heavy scoring patterns. | More writes and joins; no built-in sparse vector index in core Postgres; hybrid with dense still needs a second query or application-level fusion. |
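A sketch of that two-table layout (all names illustrative): nonzero sparse dimensions live in a child table and are scored by joining against the query's own nonzero weights:

```sql
-- Child table: one row per nonzero sparse dimension of a chunk.
CREATE TABLE chunk_sparse (
    chunk_id bigint  NOT NULL REFERENCES doc_chunks(id),
    dim      integer NOT NULL,
    weight   real    NOT NULL,
    PRIMARY KEY (chunk_id, dim)
);
CREATE INDEX ON chunk_sparse (dim);

-- Dot-product scoring against query terms supplied as (dim, weight) pairs.
WITH query_terms(dim, weight) AS (
    VALUES (17, 0.8::real), (103, 0.4::real), (991, 0.2::real)
)
SELECT c.chunk_id, SUM(c.weight * q.weight) AS score
FROM chunk_sparse c
JOIN query_terms q USING (dim)
GROUP BY c.chunk_id
ORDER BY score DESC
LIMIT 10;
```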
Dedicated vector DB: native dense + sparse
Products such as Qdrant let you define named dense vectors and sparse vectors on the same collection, then run prefetch subqueries and fusion (for example RRF) on the server. The vector names dense / sparse must match how the collection was created.
| Pros | Cons |
|---|---|
| Engine built for ANN at scale; hybrid query APIs; easier to isolate retrieval SLOs from OLTP Postgres. | Extra system to run and secure; payload sync with source of truth; cross-row relational work stays in the application or downstream. |
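A hedged sketch with the official Qdrant Python client, assuming a collection created with named vectors dense and sparse (collection name, indices, and values are illustrative; requires a running server):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

dense_query_vector = [0.1] * 8  # stands in for a real query embedding

hits = client.query_points(
    collection_name="chunks",
    prefetch=[
        # Dense arm: ANN over the named dense vector.
        models.Prefetch(query=dense_query_vector, using="dense", limit=50),
        # Sparse arm: learned sparse vector as (indices, values) pairs.
        models.Prefetch(
            query=models.SparseVector(indices=[17, 103, 991],
                                      values=[0.8, 0.4, 0.2]),
            using="sparse",
            limit=50,
        ),
    ],
    # Server-side Reciprocal Rank Fusion over both prefetch result sets.
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
).points
```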
When to choose which
| Lean toward | When |
|---|---|
| pgvector | You already run Postgres; corpus and QPS (queries per second) are moderate; you want one stack and SQL filters next to vectors. |
| Dedicated vector DB | You need maximum ANN throughput, very large sharded indexes, or strict isolation of retrieval from OLTP Postgres. |
| Start pgvector, evolve later | Common path: ship RAG on Postgres; if metrics show index size or QPS (queries per second) as the bottleneck, extract hot retrieval to a dedicated service while keeping authority data in Postgres. |
Conclusion
There is no universal winner: pgvector wins on simplicity and colocation with relational data for many applications; dedicated vector databases win when scale, isolation, or vector-first APIs dominate. Measure latency, recall, and cost on your data; for product-level comparisons of vendors, see also Vector databases.