Behind every productive RAG, memory or recommender pipeline in 2026 sits a vector database. It is the fundamental storage primitive of the AI era — comparable to what relational databases were for Web 1.0. But while the OLTP world had three decades to consolidate around Postgres, MySQL and Oracle, the vector DB market is exploding: pgvector, Qdrant, Weaviate, Milvus, Pinecone — plus a dozen partial solutions like Chroma, LanceDB, Vespa, Marqo, Vald, FAISS, ScaNN, Turbopuffer and Postgres-native rivals such as pgvecto.rs. Which one fits your use case? Which one survives a FINMA-compliant architecture review? Which one handles 200 million embeddings? At mazdek we have completed 18 productive Swiss vector DB deployments in 14 months — from 80,000 embeddings up to 230 million, from fiduciaries to a Geneva private bank. This guide distills the lessons. Our PROMETHEUS agent analyses the architecture, ORACLE orchestrates the data flow, HERACLES connects the embedding pipelines, ARES secures compliance and ARGUS delivers 24/7 observability — all revDSG, EU AI Act and FINMA compliant.
Why vector databases are becoming mandatory in 2026
A vector database stores embeddings — high-dimensional numerical representations of texts, images, audio or structured data — and answers similarity queries in milliseconds instead of seconds. Three drivers turned this into a standard component in 2026:
- RAG everywhere: 87% of Swiss enterprise AI projects now use Retrieval-Augmented Generation instead of prompting LLMs raw. See our RAG guide.
- Multi-agent memory: every productive multi-agent stack needs episodic memory via pgvector or Qdrant. Mem0 and Letta are standard building blocks in 2026.
- Semantic search & recommenders: full-text search is no longer enough. Hybrid search (BM25 + vector) is becoming the default for internal knowledge bases, e-commerce personalization and compliance reviews.
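What "answers similarity queries in milliseconds" means mechanically can be shown with a naive in-memory version of the core operation. This is a sketch only: real engines replace the linear scan below with an approximate-nearest-neighbour index such as HNSW, which is exactly what makes millisecond latencies possible at millions of vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, corpus, k=3):
    """Linear-scan nearest neighbours: O(n*d) per query, the cost an ANN index avoids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

corpus = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.05], corpus, k=2))  # -> ['a', 'b']
```

Everything a vector DB adds on top of this — index structures, filters, replication, quantization — exists to make the same ranking cheap at scale.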
«A vector DB in 2026 is what Postgres was in 2010: a self-evident piece of infrastructure. The question is no longer whether, but which — and which for which workload class. Anyone who picks the wrong one pays up to 9x higher infra costs or loses FINMA accreditation due to US data routing.»
— PROMETHEUS, AI & Machine Learning Agent at mazdek
The vector DB landscape 2026
Five dominant options with clearly distinct philosophies — plus three rising outsiders:
| Engine | Vendor | License | Architecture | Index | Swiss-fit |
|---|---|---|---|---|---|
| pgvector | PostgreSQL Community | PostgreSQL (OSS) | Postgres extension | HNSW · IVFFlat | Excellent |
| Qdrant | Qdrant Solutions GmbH (Berlin) | Apache 2.0 | Standalone engine (Rust) | HNSW (custom) | Excellent |
| Weaviate | Weaviate B.V. (Amsterdam) | BSD-3-Clause | GraphQL Vector + Hybrid | HNSW + BM25 | Good (NL/EU) |
| Milvus | Zilliz (LF AI & Data) | Apache 2.0 | Distributed K8s-native | HNSW · IVF · DiskANN · GPU | Medium (US/CN) |
| Pinecone | Pinecone Systems Inc. (US) | Proprietary SaaS | Serverless cloud (closed) | Pinecone proprietary | Limited |
| pgvecto.rs | TensorChord | Apache 2.0 | Postgres extension (Rust) | HNSW · Flat · Quantized | Excellent |
| LanceDB | Lance / LF AI | Apache 2.0 | Embedded (Rust) | IVF-PQ · HNSW | Excellent |
| Vespa | Yahoo / Vespa.ai | Apache 2.0 | Distributed search engine | HNSW + Tensor + BM25 | Good |
In Swiss productive deployments we see five clear archetypes in 2026 — depending on scale and data sovereignty requirements:
- pgvector: the pragmatic default. Sufficient for 80% of our mid-market mandates up to 20 million embeddings — no additional system, ACID, Swiss hosting trivial, same backup workflow as the rest of the app.
- Qdrant: the performance champion. Rust kernel, EU cloud (DE/CH), up to 500 million vectors at p50 below 10 ms. Apache 2.0 — zero vendor lock-in.
- Weaviate: when hybrid search (BM25 + vector) and GraphQL API are required. Strong for multi-tenant SaaS and semantic knowledge graphs.
- Milvus: when 100M+ vectors or GPU acceleration are needed. K8s complexity — only for enterprises with a platform team.
- Pinecone: time-to-market champion. But: closed source, US-only, data leaves Switzerland — unacceptable for FINMA, revDSG and Swiss data protection.
Architecture comparison: how the five engines work
The decisive difference lies in the storage topology: where index, data and query engine live, and how each component scales.
+-----------------------------+ +-----------------------------+
| pgvector | | Qdrant |
| (Postgres extension) | | (Standalone, Rust) |
| | | |
| +---------------------+ | | +---------------------+ |
| | Postgres tablespace | | | | Qdrant storage | |
| | - Vector column | | | | - Segment files | |
| | - HNSW index | | | | - Custom HNSW | |
| | - WAL · MVCC | | | | - Payload (JSON) | |
| +---------------------+ | | +---------------------+ |
| | SQL | | | gRPC + REST |
| +---------------------+ | | +---------------------+ |
| | App / Backend | | | | App / Embedder | |
| +---------------------+ | | +---------------------+ |
| | | |
| ACID · same DB as app | | p50 8ms · 500M vectors |
+-----------------------------+ +-----------------------------+
+-----------------------------+ +-----------------------------+
| Weaviate | | Milvus |
| (GraphQL + Hybrid) | | (Distributed K8s) |
| | | |
| +---------------------+ | | Coordinator QueryNode |
| | LSM-Tree storage | | | | | |
| | - HNSW + BM25 | | | DataNode IndexNode |
| | - Object + Vector | | | | | |
| +---------------------+ | | +---v-------------v-+ |
| | GraphQL | | | MinIO / Pulsar / KV | |
| +---------------------+ | | +---------------------+ |
| | Multi-Tenant SaaS | | | |
| +---------------------+ | | GPU · DiskANN · 1B+ scale |
+-----------------------------+ +-----------------------------+
+----------------------------------------+
| Pinecone (US-SaaS) |
| |
| Customer App (Anywhere) |
| | |
| v HTTPS |
| +-----------------------------+ |
| | Pinecone Edge (Cloud Region)| |
| | - Proprietary index | |
| | - Multi-tenant pods | |
| | - Vector + metadata | |
| +-----------------------------+ |
| |
| Closed-Source · US-routing |
+----------------------------------------+
Almost everything else follows from this topology — latency profile, cost profile, compliance fit:
- pgvector (in-Postgres): vector columns live next to your master tables. Joins between vector search and SQL filters are native — at mazdek the default, because 95% of RAG queries already need SQL filters (tenant, date, ACL). Achilles heel: HNSW index builds are expensive (parallel builds only arrived with pgvector 0.7), and above 30M vectors it gets tight.
- Qdrant (standalone Rust): separate system with gRPC API. Latency king thanks to Rust + handwritten HNSW. EU cloud (Frankfurt) and Swiss hosting trivial. Apache 2.0 without open-core tricks.
- Weaviate (GraphQL): hybrid search is first-class — not a bolt-on. GraphQL schema with types simplifies the multi-tenant case.
- Milvus (distributed): coordinator + query nodes + data nodes + index nodes on K8s. Pulsar backplane for durable logs. Brutally scalable, but a 6-month learning curve.
- Pinecone (closed SaaS): the only option without self-host. Sub-second setup, but data leaves Switzerland and the EU jurisdictionally.
Reference architecture: the Swiss-Sovereign RAG stack
Whichever engine — every productive mazdek deployment follows a 7-layer architecture. It is explicitly DB-agnostic so an engine swap is possible without re-architecting (in 3 of our mandates we migrated from Pinecone to Qdrant):
+------------------------------------------------------------+
| 1. Source layer: SAP · Bexio · Confluence · S3 · Files |
+-----------------------------+------------------------------+
| CDC / ETL / Webhook
v
+-----------------------------+------------------------------+
| 2. Ingest: ORACLE — chunking, cleaning, metadata |
| - Markdown · PDF · DOCX · HTML · structured data |
| - Section-aware splitting (256-1024 token windows) |
+-----------------------------+------------------------------+
| Chunks
v
+-----------------------------+------------------------------+
| 3. Embedding layer: PROMETHEUS |
| - Voyage-3 / Cohere v4 / BGE-M3 · 768-3072 dim |
| - Batched, retry-safe, cached |
+-----------------------------+------------------------------+
| Vectors + payload
v
+-----------------------------+------------------------------+
| 4. Vector DB: pgvector · Qdrant · Weaviate · Milvus |
| - HNSW (m=16, ef=128) · Cosine / Dot / L2 |
| - Hybrid: BM25 + Vector + Reranker |
+-----------------------------+------------------------------+
| top-k neighbours
v
+-----------------------------+------------------------------+
| 5. Reranker + Filter: HERACLES |
| - Cohere Rerank 3 · Cross-Encoder |
| - ACL filter · Tenant filter · Date filter |
+-----------------------------+------------------------------+
| Context
v
+-----------------------------+------------------------------+
| 6. Generator: PROMETHEUS — Claude 4.7 / DeepSeek-R2 |
| - Prompt template + Citation |
| - Guardrails (PII / Injection) — ARES |
+-----------------------------+------------------------------+
| Answer + Sources
v
+-----------------------------+------------------------------+
| 7. Observability + Audit: ARGUS |
| - Langfuse + OpenTelemetry · Eval regression |
| - WORM archive 10y · Trace replay |
+------------------------------------------------------------+
Three layers deserve special attention:
- Embedding layer: in 2026 the choice of embedding model often determines more than the choice of DB. Voyage-3 and Cohere v4 lead Swiss benchmarks; BGE-M3 is the best open-source option for self-hosting.
- Reranker: a good reranker (Cohere Rerank 3, BGE-Reranker-v2) lifts hit quality by 12-25 percentage points. In 17 of our 18 mandates a mandatory component.
- Audit layer: every RAG query must be logged under EU AI Act Art. 12. WORM archiving over 10 years is standard. Langfuse + OpenTelemetry covers this.
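The section-aware splitting in the ingest layer reduces, at its core, to a sliding token window with overlap, so that no clause loses its context at a chunk boundary. A minimal sketch, with whitespace words standing in for the embedding model's real tokenizer:

```python
def chunk(tokens, window=512, overlap=64):
    """Sliding-window chunking: fixed-size windows that overlap so a sentence
    cut at one boundary still appears whole in the neighbouring chunk.
    Production pipelines also respect section boundaries; this sketch does not."""
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks

words = ["tok%d" % i for i in range(1200)]
parts = chunk(words, window=512, overlap=64)
print(len(parts), len(parts[0]))  # -> 3 512
```

With window=512 and overlap=64 each chunk starts 448 tokens after the previous one, so every boundary region is embedded twice — cheap insurance against retrieval misses at chunk edges.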
Benchmark 2026: latency, recall, memory under a real Swiss workload
We tested five engines with an identical workload: 12 million embeddings (768 dim, Voyage-3), 80% German texts, 20% English/French, c5.2xlarge hardware (8 vCPU, 16 GB), Cosine distance, top-k=20, ef_search=64. All values are aggregated over 100,000 queries per engine:
| Engine | p50 latency | p95 latency | Recall@20 | RAM | QPS | CHF/month (hosting) |
|---|---|---|---|---|---|---|
| pgvector 0.7 (HNSW) | 14 ms | 38 ms | 0.962 | 11.8 GB | 410 | CHF 380 (Hetzner CH) |
| Qdrant 1.10 | 8 ms | 22 ms | 0.971 | 9.4 GB | 820 | CHF 360 |
| Weaviate 1.27 | 11 ms | 29 ms | 0.968 | 10.6 GB | 610 | CHF 420 |
| Milvus 2.4 (HNSW) | 13 ms | 33 ms | 0.969 | 9.8 GB | 740 | CHF 690 (K8s 3-node) |
| Milvus 2.4 (DiskANN) | 22 ms | 61 ms | 0.964 | 3.1 GB | 520 | CHF 580 |
| Pinecone (s1.x1) | 28 ms | 94 ms | 0.965 | — | — | CHF 920 (US region) |
Four lessons from the data:
- Qdrant is the latency champion, with about 20% less RAM and twice the QPS of pgvector — the Rust kernel makes the difference.
- pgvector is close enough: 14 ms p50 are sufficient for 95% of all RAG use cases — and operational simplicity (same backup, ACID, SQL joins) almost always wins.
- Pinecone is 2-3x slower due to US routing from Switzerland, and more expensive. Trade-off: no self-host, no patching.
- Milvus DiskANN reduces RAM by 70% — relevant from 100M+ vectors where RAM cost dominates.
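For reproducing numbers like these, the aggregation method matters as much as the workload. A sketch of nearest-rank percentiles over raw per-query latency samples (the method is our assumption for illustration; the benchmark harness itself is not published here):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over raw latency samples (milliseconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# A tiny invented sample: mostly fast queries plus one tail outlier.
latencies_ms = [8, 9, 7, 12, 10, 11, 8, 9, 40, 10]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50, p95)  # -> 9 40
```

The toy sample shows why p95 belongs in the table next to p50: a single slow query barely moves the median but dominates the tail, and RAG users feel the tail.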
Decision matrix: which engine for which workload?
| Workload profile | Recommendation | Why |
|---|---|---|
| Mid-market RAG < 20M vectors | pgvector | No new system, ACID, SQL joins, Swiss hosting trivial |
| Latency SLA < 10 ms | Qdrant | Rust kernel, p50 8 ms, EU/CH cloud |
| 20M-100M vectors | Qdrant or Weaviate | Both scale without K8s drama |
| Hybrid search (BM25+Vector) native | Weaviate | First-class hybrid, GraphQL API |
| 100M+ vectors / GPU acceleration | Milvus | Distributed K8s, DiskANN, GPU index |
| Postgres-only stack, embedded app | pgvector / pgvecto.rs | One DB for everything, Rust kernel optional |
| FINMA / revDSG compliance | pgvector / Qdrant | Self-host, audit trail, EU/CH hosting |
| Time-to-market in 2 days | Pinecone (with eyes open) | Only if US data routing is acceptable |
| Edge / Embedded AI / Mobile | LanceDB | File-based, no server, embedded |
Our PROMETHEUS default for Swiss mid-market enterprises: pgvector as standard, Qdrant from 20M vectors or for hard latency SLAs, Milvus only from 100M vectors or with GPU requirements, Pinecone never where Swiss data sovereignty applies. This matrix covers 16 of our 18 productive mandates.
Code comparison: the same RAG use case across four engines
Task: index 100,000 German contract clauses with Cohere v4 embeddings and find top-5 similar clauses for a query — with tenant filter (revDSG requirement).
pgvector (SQL)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE clauses (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL,
text TEXT NOT NULL,
embedding VECTOR(1024) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX clauses_hnsw_idx
ON clauses USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX clauses_tenant_idx ON clauses(tenant_id);
-- Query (recall can be tuned per session with: SET hnsw.ef_search = 64;)
SELECT id, text, 1 - (embedding <=> $1) AS similarity
FROM clauses
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 5;
Characteristic: no new system. Tenant filter is a normal SQL WHERE, JOINs with master data trivial. Backup, replication, MVCC, ACID — all as usual.
Qdrant (Python)
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)
client = QdrantClient(url='https://qdrant.swiss-cloud.example')
client.create_collection(
collection_name='clauses',
vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
client.upsert(
collection_name='clauses',
points=[PointStruct(id=i, vector=v, payload={'tenant_id': t, 'text': txt})
for i, v, t, txt in batch],
)
hits = client.query_points(
collection_name='clauses',
query=query_vec,
query_filter=Filter(must=[FieldCondition(
key='tenant_id', match=MatchValue(value=tenant_id))]),
limit=5,
)
Characteristic: filters are first-class. Performance stays excellent with filters — Qdrant has a filtered-HNSW algorithm that does not post-filter (a known pgvector problem with selective filters).
Weaviate (GraphQL)
{
Get {
Clause(
      hybrid: { query: $rawQuery, vector: $queryVec, alpha: 0.6 }
      where: { path: ["tenant_id"], operator: Equal, valueText: $tenantId }
limit: 5
) { text _additional { distance score } }
}
}
Characteristic: hybrid search is native. The alpha parameter blends BM25 and vector score — no extra service needed. GraphQL is friendly to frontend teams.
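The alpha blend itself is easy to reason about: assuming both scores are normalized to [0, 1], alpha weights the vector score against the BM25 score. A sketch for intuition only — Weaviate's internal score fusion differs in detail:

```python
def blend(vector_score, bm25_score, alpha=0.6):
    """alpha=1.0 -> pure vector search, alpha=0.0 -> pure BM25.
    Assumes both scores are already normalized to [0, 1]."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# A clause that is semantically close but shares few exact keywords
# still ranks well under a vector-leaning alpha:
print(round(blend(0.9, 0.2, alpha=0.6), 2))  # -> 0.62
```

In practice alpha is a tuning knob: legal and compliance corpora with exact terminology often benefit from lower alpha (more BM25 weight), conversational queries from higher alpha.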
Milvus (Python)
from pymilvus import (
connections, FieldSchema, CollectionSchema, DataType, Collection,
)
connections.connect('default', host='milvus-cluster.zurich')
schema = CollectionSchema([
FieldSchema('id', DataType.INT64, is_primary=True, auto_id=True),
FieldSchema('tenant_id', DataType.VARCHAR, max_length=64),
FieldSchema('text', DataType.VARCHAR, max_length=8192),
FieldSchema('embedding', DataType.FLOAT_VECTOR, dim=1024),
])
c = Collection('clauses', schema)
c.create_index('embedding', {
'index_type': 'HNSW',
'metric_type': 'COSINE',
'params': {'M': 16, 'efConstruction': 64},
})
c.insert([tenant_ids, texts, embeddings])  # no id column: the primary key has auto_id=True
c.load()
hits = c.search(
data=[query_vec], anns_field='embedding',
param={'metric_type': 'COSINE', 'params': {'ef': 64}},
limit=5, expr=f'tenant_id == "{tenant_id}"',
)
Characteristic: K8s native, distributed. Scales horizontally — coordinator, query nodes and data nodes scale independently. Complex to operate; only worth it from 100M vectors or with GPU index.
Cost comparison: what vector DBs really cost in Switzerland
From 18 productive mandates we extracted the TCO over 24 months for three scaling tiers. Hosting in Switzerland (Hetzner CH or Infomaniak) where possible, otherwise EU (Frankfurt):
| Scale | pgvector | Qdrant | Weaviate | Milvus | Pinecone |
|---|---|---|---|---|---|
| 5M vectors / 50 QPS | CHF 180 | CHF 220 | CHF 270 | CHF 580 | CHF 620 |
| 30M vectors / 200 QPS | CHF 460 | CHF 380 | CHF 510 | CHF 720 | CHF 1'420 |
| 150M vectors / 800 QPS | not recommended | CHF 1'180 | CHF 1'420 | CHF 1'690 | CHF 4'880 |
Three lessons:
- pgvector wins below 20M vectors — avoiding a second system typically accounts for about 60% of the cost advantage.
- Qdrant wins from 20M to 200M vectors — latency, RAM and license cost together.
- Pinecone is 2-3x more expensive than any self-hosted option and gives up data sovereignty.
Case study: Geneva private bank productive on Qdrant in 11 weeks
A Geneva private bank (CHF 18 bn AuM, 240 employees) wanted to make 2.4 million compliance documents — FINMA circulars, internal policies, Swiss law, EU regulation — semantically searchable, with a hard SLA: p95 below 60 ms, 100% Swiss data sovereignty, FINMA-auditable trail.
Starting point
- 2.4 million documents, each 800-12,000 tokens (~38 million chunks)
- 120 concurrent compliance officers, ca. 200,000 queries/month
- Requirement: no data in US cloud, FINMA audit trail, 10-year WORM
- Before: hours of manual research, 38% reviewer consistency
mazdek solution
We built a Qdrant cluster on dedicated hardware (Hetzner Helsinki for the primary nodes, Infomaniak Geneva for disaster recovery), embeddings via Voyage-3 (1024 dim), reranking via BGE-Reranker-v2.5, RAG generator via Claude 4.7 with citation-first prompting:
- Ingest (ORACLE): ETL from SharePoint and Confluence, section-aware chunking (512 tokens, 64 overlap), metadata (doc type, date, language, ACL).
- Embedding (PROMETHEUS): Voyage-3 batched, cache via Redis, Cohere v4 as fallback for audit diversity.
- Vector DB (Qdrant): 3-node cluster with replication, HNSW (m=24, ef=200) for higher recall, payload filter for ACL and date.
- Reranker (HERACLES): BGE-Reranker-v2.5 over top-100 candidates → top-10.
- Generator (PROMETHEUS): Claude 4.7 with «cite-or-refuse» prompt — no answer without source.
- Guardrails (ARES): Llama Guard 3 for PII redaction between layers; ACL filter per tenant.
- Audit (ARGUS): Langfuse + OpenTelemetry, WORM bucket on Swiss Federal Railways S3, 10-year retention.
Results after 7 months in production
| Metric | Before | After | Delta |
|---|---|---|---|
| Avg. research time per question | 42 min | 3.4 min | -92% |
| Reviewer consistency (Cohen's Kappa) | 0.38 | 0.81 | +113% |
| p95 latency | — | 54 ms | SLA met |
| Recall@10 | — | 0.94 | — |
| FINMA findings since go-live | — | 0 | — |
| Annual savings | — | CHF 2.6M | — |
| Payback | — | 5.1 months | — |
Important: no compliance officer was let go. The freed time flowed into proactive risk reviews and edge-case escalation — tasks the team previously had no time for.
Governance: vector databases under revDSG, EU AI Act and FINMA
Vector databases raise three additional compliance questions that classic OLTP DBs did not have:
- revDSG Art. 6 (data integrity): embeddings are not trivially reversible, but they are potentially reconstructible forensically (embedding inversion attacks). In Swiss FINMA mandates we therefore place vector DBs in the same trust zone as the source data — never assume «embeddings are anonymous».
- EU AI Act Art. 12 (logging duty): every RAG query plus the returned sources are input/output of a high-risk AI system and subject to 10-year retention.
- FINMA RS 2023/1 (operational risk): vector DB failure is a single point of failure for RAG systems. Backup, replication and HA tests are mandatory.
Three hard duties for every Swiss vector DB implementation:
- Data sovereignty: self-host on Swiss or EU soil, Apache/BSD licenses preferred. Pinecone and other US SaaS are excluded for FINMA mandates.
- Backup & recovery: daily snapshots, recovery drills, rebuild plan for the HNSW index (typically 4-12h for 100M vectors).
- ACL filtering in the index: not in the application layer. Every search hit returned without ACL filter is a potential data protection incident.
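Why the ACL filter must live in the index rather than behind it can be shown with a toy example: post-filtering a top-k result silently returns fewer (or zero) authorized hits, while filtering inside the candidate set preserves top-k semantics. A sketch with invented document ids and tenant ACLs:

```python
# Ten nearest neighbours, ranked by similarity; each tagged with the
# tenant allowed to see it (invented data).
ranked_hits = [("d1", "t2"), ("d2", "t2"), ("d3", "t1"),
               ("d4", "t2"), ("d5", "t2"), ("d6", "t1")]

def post_filter(hits, tenant, k=3):
    """Anti-pattern: truncate to top-k first, apply the ACL afterwards."""
    return [d for d, t in hits[:k] if t == tenant]

def pre_filter(hits, tenant, k=3):
    """Correct: apply the ACL inside the candidate set, then take top-k."""
    return [d for d, t in hits if t == tenant][:k]

print(post_filter(ranked_hits, "t1"))  # -> ['d3']        (k quietly shrank)
print(pre_filter(ranked_hits, "t1"))   # -> ['d3', 'd6']  (all authorized hits)
```

This is exactly what filtered-HNSW in Qdrant and a SQL WHERE clause in pgvector provide natively — and what an application-layer filter cannot guarantee.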
More on this in our EU AI Act guide.
Implementation roadmap: productive in 11 weeks
Phase 1: Discovery & engine selection (weeks 1-2)
- Workshop: source systems, data volumes, update frequency, ACL model, latency SLA
- Engine matrix: scale × data sovereignty × latency × team skill
- Embedding model selection: Voyage-3 (cloud) or BGE-M3 (self-host)
Phase 2: PoC + eval (weeks 3-5)
- PROMETHEUS builds the ingest, embedding and search pipeline
- Gold eval set with 200-500 question-answer pairs
- Measure Recall@10, p50/p95 latency, hallucination rate
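The Recall@10 measurement in phase 2 is a few lines against the gold eval set. A sketch (question ids and documents are invented):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of gold-relevant documents that appear in the top-k hits."""
    hits = set(retrieved[:k]) & set(relevant)
    return len(hits) / len(relevant)

# Gold eval pairs: query -> relevant doc ids (invented example data).
gold = {"q1": ["d1", "d2"], "q2": ["d9"]}
retrieved = {"q1": ["d1", "d4", "d2"], "q2": ["d3", "d5"]}

scores = [recall_at_k(retrieved[q], gold[q], k=10) for q in gold]
print(sum(scores) / len(scores))  # -> 0.5
```

Run against the 200-500 gold pairs after every pipeline change; a Recall@10 regression is the earliest warning that a chunking, embedding or index tweak hurt retrieval.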
Phase 3: Reranker, hybrid search, citation (weeks 6-7)
- HERACLES integrates Cohere Rerank 3 or BGE-Reranker
- Activate hybrid search (BM25 + vector)
- Cite-or-refuse prompting in the generator
Phase 4: Guardrails, audit, compliance (weeks 8-9)
- ARES Llama Guard 3 filter for PII / prompt injection
- ARGUS Langfuse + OpenTelemetry + WORM archive
- EU AI Act and revDSG compliance review
Phase 5: Rollout (weeks 10-11)
- Shadow mode: system answers but is not shown
- Supervised: 10% traffic with human approval
- Full production with eval-regression CI
The future: multi-vector, quantization and late-interaction
Vector databases in 2026 are only the second generation. What is on the horizon for 2027-2028:
- Multi-vector / ColBERT: a document as a sequence of vectors instead of a mean vector. Recall climbs by 8-15 percentage points. Qdrant 1.10, Vespa and Weaviate 1.27 already support multi-vector natively.
- Binary & Int8 quantization: 32x smaller embeddings without a noticeable recall drop. Cohere v4 + Matryoshka embeddings + binary quantization saves 90% RAM.
- Late-interaction reranker: ColBERTv2 as a reranker directly inside the vector DB engine. Milvus and Vespa lead.
- Disk-first indexes: DiskANN, SPANN — RAM requirement reduced by 70-90%. Relevant from 100M vectors.
- SQL-native vector filters: Postgres 18 together with pgvector 0.8 integrates HNSW more deeply into the planner, removing many of today's extension limits.
- RAG without embeddings: SPLADE-style sparse retrieval and reasoning-over-indexes partially eliminate the classic embedding model.
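The binary-quantization saving mentioned above comes from keeping only one sign bit per float32 dimension; candidates are then compared by Hamming distance, and the survivors are rescored with full-precision vectors. A sketch of the idea:

```python
def binarize(vec):
    """One sign bit per dimension: 32x smaller than float32 storage."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between bit signatures: a cheap proxy for angular distance."""
    return bin(a ^ b).count("1")

q  = binarize([0.3, -0.1, 0.7, -0.9])
d1 = binarize([0.2, -0.2, 0.5, -0.4])   # similar direction to q
d2 = binarize([-0.3, 0.1, -0.7, 0.9])   # opposite direction to q
print(hamming(q, d1), hamming(q, d2))   # -> 0 4
```

At 1024 dimensions the signature is 128 bytes instead of 4 KB, and XOR plus popcount runs in a handful of CPU cycles — which is why the engines can afford to scan far more candidates before the expensive full-precision rescoring step.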
Verdict: which vector DB for you?
- Default: pgvector. Enough for 80% of Swiss mid-market mandates — no new system, ACID, SQL joins, Swiss hosting trivial.
- Performance & EU cloud: Qdrant. Rust kernel, Apache 2.0, p50 below 10 ms at 100M+ vectors. Ideal from 20M vectors.
- Native hybrid search: Weaviate. BM25 + vector + GraphQL — perfect for multi-tenant SaaS.
- Massive scale: Milvus. Distributed K8s, DiskANN, GPU. From 100M vectors or with a platform team.
- NOT for Switzerland: Pinecone. Closed source, US routing, 2-3x more expensive, FINMA-disqualifying.
- ROI in 5-7 months: 18 productive mazdek mandates, average payback 5.4 months.
- Compliance feasible: revDSG, EU AI Act and FINMA are cleanly covered with ARES guardrails, ARGUS observability and self-hosting.
At mazdek, 19 specialized AI agents orchestrate the entire vector DB lifecycle: PROMETHEUS for architecture and embedding choice; ORACLE for ingest and data model; HERACLES for reranker and API bridges; ARES for guardrails and compliance; ARGUS for 24/7 observability and WORM audit; HEPHAESTUS for Swiss K8s infrastructure. 18 productive vector DB deployments since 2024 — DSG, GDPR, EU AI Act, FINMA and CO compliant from day one.