Behind every productive RAG, memory or recommender pipeline in 2026 sits a vector database. It is the fundamental storage primitive of the AI era — comparable to what relational databases were for Web 1.0. But while the OLTP world had three decades to consolidate around Postgres, MySQL and Oracle, the vector DB market is exploding: pgvector, Qdrant, Weaviate, Milvus, Pinecone — plus a dozen partial solutions like Chroma, LanceDB, Vespa, Marqo, Vald, FAISS, ScaNN, Turbopuffer and Postgres-native rivals such as pgvecto.rs. Which one fits your use case? Which one survives a FINMA-compliant architecture review? Which one handles 200 million embeddings? At mazdek we have completed 18 productive Swiss vector DB deployments in 14 months — from 80,000 embeddings up to 230 million, from fiduciaries to a Geneva private bank. This guide distills the lessons. Our PROMETHEUS agent analyses the architecture, ORACLE orchestrates the data flow, HERACLES connects the embedding pipelines, ARES secures compliance and ARGUS delivers 24/7 observability — all revDSG, EU AI Act and FINMA compliant.
Why vector databases are becoming mandatory in 2026
A vector database stores embeddings — high-dimensional numerical representations of texts, images, audio or structured data — and answers similarity queries in milliseconds instead of seconds. Three drivers turned this into a standard component in 2026:
- RAG everywhere: 87% of Swiss enterprise AI projects now use Retrieval-Augmented Generation instead of prompting LLMs raw. See our RAG guide.
- Multi-agent memory: every productive multi-agent stack needs episodic memory via pgvector or Qdrant. Mem0 and Letta are standard building blocks in 2026.
- Semantic search & recommenders: full-text search is no longer enough. Hybrid search (BM25 + vector) is becoming the default for internal knowledge bases, e-commerce personalization and compliance reviews.
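What "answers similarity queries in milliseconds" means mechanically can be shown with a naive in-memory version of the core operation. This is a sketch only: real engines replace the linear scan below with an approximate-nearest-neighbour index such as HNSW, which is exactly what makes millisecond latencies possible at millions of vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, corpus, k=3):
    """Linear-scan nearest neighbours: O(n*d) per query, the cost an ANN index avoids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

corpus = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.05], corpus, k=2))  # -> ['a', 'b']
```

Everything a vector DB adds on top of this — index structures, filters, replication, quantization — exists to make the same ranking cheap at scale.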
«A vector DB in 2026 is what Postgres was in 2010: a self-evident piece of infrastructure. The question is no longer whether, but which — and which for which workload class. Anyone who picks the wrong one pays up to 9x higher infra costs or loses FINMA accreditation due to US data routing.»
— PROMETHEUS, AI & Machine Learning Agent at mazdek
The vector DB landscape 2026
Five dominant options with clearly distinct philosophies — plus three rising outsiders:
| Engine | Vendor | License | Architecture | Index | Swiss-fit |
|---|---|---|---|---|---|
| pgvector | PostgreSQL Community | PostgreSQL (OSS) | Postgres extension | HNSW · IVFFlat | Excellent |
| Qdrant | Qdrant Solutions GmbH (Berlin) | Apache 2.0 | Standalone engine (Rust) | HNSW (custom) | Excellent |
| Weaviate | Weaviate B.V. (Amsterdam) | BSD-3-Clause | GraphQL Vector + Hybrid | HNSW + BM25 | Good (NL/EU) |
| Milvus | Zilliz (LF AI & Data) | Apache 2.0 | Distributed K8s-native | HNSW · IVF · DiskANN · GPU | Medium (US/CN) |
| Pinecone | Pinecone Systems Inc. (US) | Proprietary SaaS | Serverless cloud (closed) | Pinecone proprietary | Limited |
| pgvecto.rs | TensorChord | Apache 2.0 | Postgres extension (Rust) | HNSW · Flat · Quantized | Excellent |
| LanceDB | Lance / LF AI | Apache 2.0 | Embedded (Rust) | IVF-PQ · HNSW | Excellent |
| Vespa | Yahoo / Vespa.ai | Apache 2.0 | Distributed search engine | HNSW + Tensor + BM25 | Good |
In Swiss productive deployments we see five clear archetypes in 2026 — depending on scale and data sovereignty requirements:
- pgvector: the pragmatic default. Sufficient for 80% of our mid-market mandates up to 20 million embeddings — no additional system, ACID, Swiss hosting trivial, same backup workflow as the rest of the app.
- Qdrant: the performance champion. Rust kernel, EU cloud (DE/CH), up to 500 million vectors at p50 below 10 ms. Apache 2.0 — zero vendor lock-in.
- Weaviate: when hybrid search (BM25 + vector) and GraphQL API are required. Strong for multi-tenant SaaS and semantic knowledge graphs.
- Milvus: when 100M+ vectors or GPU acceleration are needed. K8s complexity — only for enterprises with a platform team.
- Pinecone: time-to-market champion. But: closed source, US-only, data leaves Switzerland — unacceptable for FINMA, revDSG and Swiss data protection.
Architecture comparison: how the five engines work
The decisive difference lies in the storage topology: where index, data and query engine live, and how each component scales.
+-----------------------------+ +-----------------------------+
| pgvector | | Qdrant |
| (Postgres extension) | | (Standalone, Rust) |
| | | |
| +---------------------+ | | +---------------------+ |
| | Postgres tablespace | | | | Qdrant storage | |
| | - Vector column | | | | - Segment files | |
| | - HNSW index | | | | - Custom HNSW | |
| | - WAL · MVCC | | | | - Payload (JSON) | |
| +---------------------+ | | +---------------------+ |
| | SQL | | | gRPC + REST |
| +---------------------+ | | +---------------------+ |
| | App / Backend | | | | App / Embedder | |
| +---------------------+ | | +---------------------+ |
| | | |
| ACID · same DB as app | | p50 8ms · 500M vectors |
+-----------------------------+ +-----------------------------+
+-----------------------------+ +-----------------------------+
| Weaviate | | Milvus |
| (GraphQL + Hybrid) | | (Distributed K8s) |
| | | |
| +---------------------+ | | Coordinator QueryNode |
| | LSM-Tree storage | | | | | |
| | - HNSW + BM25 | | | DataNode IndexNode |
| | - Object + Vector | | | | | |
| +---------------------+ | | +---v-------------v-+ |
| | GraphQL | | | MinIO / Pulsar / KV | |
| +---------------------+ | | +---------------------+ |
| | Multi-Tenant SaaS | | | |
| +---------------------+ | | GPU · DiskANN · 1B+ scale |
+-----------------------------+ +-----------------------------+
+----------------------------------------+
| Pinecone (US-SaaS) |
| |
| Customer App (Anywhere) |
| | |
| v HTTPS |
| +-----------------------------+ |
| | Pinecone Edge (Cloud Region)| |
| | - Proprietary index | |
| | - Multi-tenant pods | |
| | - Vector + metadata | |
| +-----------------------------+ |
| |
| Closed-Source · US-routing |
+----------------------------------------+
Almost everything else follows from this topology — latency profile, cost profile, compliance fit:
- pgvector (in-Postgres): vector columns live next to your master tables. Joins between vector search and SQL filters are native — at mazdek the default, because 95% of RAG queries already need SQL filters (tenant, date, ACL). Achilles heel: HNSW index builds are expensive (parallel builds only arrived with pgvector 0.7), and above 30M vectors it gets tight.
- Qdrant (standalone Rust): separate system with gRPC API. Latency king thanks to Rust + handwritten HNSW. EU cloud (Frankfurt) and Swiss hosting trivial. Apache 2.0 without open-core tricks.
- Weaviate (GraphQL): hybrid search is first-class — not a bolt-on. GraphQL schema with types simplifies the multi-tenant case.
- Milvus (distributed): coordinator + query nodes + data nodes + index nodes on K8s. Pulsar backplane for durable logs. Brutally scalable, but a 6-month learning curve.
- Pinecone (closed SaaS): the only option without self-host. Sub-second setup, but data leaves Switzerland and the EU jurisdictionally.
Reference architecture: the Swiss-Sovereign RAG stack
Whichever engine — every productive mazdek deployment follows a 7-layer architecture. It is explicitly DB-agnostic so an engine swap is possible without re-architecting (in 3 of our mandates we migrated from Pinecone to Qdrant):
+------------------------------------------------------------+
| 1. Source layer: SAP · Bexio · Confluence · S3 · Files |
+-----------------------------+------------------------------+
| CDC / ETL / Webhook
v
+-----------------------------+------------------------------+
| 2. Ingest: ORACLE — chunking, cleaning, metadata |
| - Markdown · PDF · DOCX · HTML · structured data |
| - Section-aware splitting (256-1024 token windows) |
+-----------------------------+------------------------------+
| Chunks
v
+-----------------------------+------------------------------+
| 3. Embedding layer: PROMETHEUS |
| - Voyage-3 / Cohere v4 / BGE-M3 · 768-3072 dim |
| - Batched, retry-safe, cached |
+-----------------------------+------------------------------+
| Vectors + payload
v
+-----------------------------+------------------------------+
| 4. Vector DB: pgvector · Qdrant · Weaviate · Milvus |
| - HNSW (m=16, ef=128) · Cosine / Dot / L2 |
| - Hybrid: BM25 + Vector + Reranker |
+-----------------------------+------------------------------+
| top-k neighbours
v
+-----------------------------+------------------------------+
| 5. Reranker + Filter: HERACLES |
| - Cohere Rerank 3 · Cross-Encoder |
| - ACL filter · Tenant filter · Date filter |
+-----------------------------+------------------------------+
| Context
v
+-----------------------------+------------------------------+
| 6. Generator: PROMETHEUS — Claude 4.7 / DeepSeek-R2 |
| - Prompt template + Citation |
| - Guardrails (PII / Injection) — ARES |
+-----------------------------+------------------------------+
| Answer + Sources
v
+-----------------------------+------------------------------+
| 7. Observability + Audit: ARGUS |
| - Langfuse + OpenTelemetry · Eval regression |
| - WORM archive 10y · Trace replay |
+------------------------------------------------------------+
Three layers deserve special attention:
- Embedding layer: in 2026 the choice of embedding model often determines more than the choice of DB. Voyage-3 and Cohere v4 lead Swiss benchmarks; BGE-M3 is the best open-source option for self-hosting.
- Reranker: a good reranker (Cohere Rerank 3, BGE-Reranker-v2) lifts hit quality by 12-25 percentage points. In 17 of our 18 mandates a mandatory component.
- Audit layer: every RAG query must be logged under EU AI Act Art. 12. WORM archiving over 10 years is standard. Langfuse + OpenTelemetry covers this.
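The section-aware splitting in the ingest layer reduces, at its core, to a sliding token window with overlap, so that no clause loses its context at a chunk boundary. A minimal sketch, with whitespace words standing in for the embedding model's real tokenizer:

```python
def chunk(tokens, window=512, overlap=64):
    """Sliding-window chunking: fixed-size windows that overlap so a sentence
    cut at one boundary still appears whole in the neighbouring chunk.
    Production pipelines also respect section boundaries; this sketch does not."""
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks

words = ["tok%d" % i for i in range(1200)]
parts = chunk(words, window=512, overlap=64)
print(len(parts), len(parts[0]))  # -> 3 512
```

With window=512 and overlap=64 each chunk starts 448 tokens after the previous one, so every boundary region is embedded twice — cheap insurance against retrieval misses at chunk edges.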
Benchmark 2026: latency, recall, memory under a real Swiss workload
We tested five engines with an identical workload: 12 million embeddings (768 dim, Voyage-3), 80% German texts, 20% English/French, c5.2xlarge hardware (8 vCPU, 16 GB), Cosine distance, top-k=20, ef_search=64. All values are aggregated over 100,000 queries per engine:
| Engine | p50 latency | p95 latency | Recall@20 | RAM | QPS | CHF/month (hosting) |
|---|---|---|---|---|---|---|
| pgvector 0.7 (HNSW) | 14 ms | 38 ms | 0.962 | 11.8 GB | 410 | CHF 380 (Hetzner CH) |
| Qdrant 1.10 | 8 ms | 22 ms | 0.971 | 9.4 GB | 820 | CHF 360 |
| Weaviate 1.27 | 11 ms | 29 ms | 0.968 | 10.6 GB | 610 | CHF 420 |
| Milvus 2.4 (HNSW) | 13 ms | 33 ms | 0.969 | 9.8 GB | 740 | CHF 690 (K8s 3-node) |
| Milvus 2.4 (DiskANN) | 22 ms | 61 ms | 0.964 | 3.1 GB | 520 | CHF 580 |
| Pinecone (s1.x1) | 28 ms | 94 ms | 0.965 | — | — | CHF 920 (US region) |
Four lessons from the data:
- Qdrant is the latency champion, with about 20% less RAM and twice the QPS of pgvector — the Rust kernel makes the difference.
- pgvector is close enough: 14 ms p50 are sufficient for 95% of all RAG use cases — and operational simplicity (same backup, ACID, SQL joins) almost always wins.
- Pinecone is 2-3x slower due to US routing from Switzerland, and more expensive. Trade-off: no self-host, no patching.
- Milvus DiskANN reduces RAM by 70% — relevant from 100M+ vectors where RAM cost dominates.
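For reproducing numbers like these, the aggregation method matters as much as the workload. A sketch of nearest-rank percentiles over raw per-query latency samples (the method is our assumption for illustration; the benchmark harness itself is not published here):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over raw latency samples (milliseconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# A tiny invented sample: mostly fast queries plus one tail outlier.
latencies_ms = [8, 9, 7, 12, 10, 11, 8, 9, 40, 10]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50, p95)  # -> 9 40
```

The toy sample shows why p95 belongs in the table next to p50: a single slow query barely moves the median but dominates the tail, and RAG users feel the tail.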
Decision matrix: which engine for which workload?
| Workload profile | Recommendation | Why |
|---|---|---|
| Mid-market RAG < 20M vectors | pgvector | No new system, ACID, SQL joins, Swiss hosting trivial |
| Latency SLA < 10 ms | Qdrant | Rust kernel, p50 8 ms, EU/CH cloud |
| 20M-100M vectors | Qdrant or Weaviate | Both scale without K8s drama |
| Hybrid search (BM25+Vector) native | Weaviate | First-class hybrid, GraphQL API |
| 100M+ vectors / GPU acceleration | Milvus | Distributed K8s, DiskANN, GPU index |
| Postgres-only stack, embedded app | pgvector / pgvecto.rs | One DB for everything, Rust kernel optional |
| FINMA / revDSG compliance | pgvector / Qdrant | Self-host, audit trail, EU/CH hosting |
| Time-to-market in 2 days | Pinecone (with eyes open) | Only if US data routing is acceptable |
| Edge / Embedded AI / Mobile | LanceDB | File-based, no server, embedded |
Our PROMETHEUS default for Swiss mid-market enterprises: pgvector as standard, Qdrant from 20M vectors or for hard latency SLAs, Milvus only from 100M vectors or with GPU requirements, Pinecone never where Swiss data sovereignty applies. This matrix covers 16 of our 18 productive mandates.
Code comparison: the same RAG use case across four engines
Task: index 100,000 German contract clauses with Cohere v4 embeddings and find top-5 similar clauses for a query — with tenant filter (revDSG requirement).
pgvector (SQL)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE clauses (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL,
text TEXT NOT NULL,
embedding VECTOR(1024) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX clauses_hnsw_idx
ON clauses USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX clauses_tenant_idx ON clauses(tenant_id);
-- Query (recall can be tuned per session with: SET hnsw.ef_search = 64;)
SELECT id, text, 1 - (embedding <=> $1) AS similarity
FROM clauses
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 5;
Characteristic: no new system. Tenant filter is a normal SQL WHERE, JOINs with master data trivial. Backup, replication, MVCC, ACID — all as usual.
Qdrant (Python)
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)
client = QdrantClient(url='https://qdrant.swiss-cloud.example')
client.create_collection(
collection_name='clauses',
vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
client.upsert(
collection_name='clauses',
points=[PointStruct(id=i, vector=v, payload={'tenant_id': t, 'text': txt})
for i, v, t, txt in batch],
)
hits = client.query_points(
collection_name='clauses',
query=query_vec,
query_filter=Filter(must=[FieldCondition(
key='tenant_id', match=MatchValue(value=tenant_id))]),
limit=5,
)
Characteristic: filters are first-class. Performance stays excellent with filters — Qdrant has a filtered-HNSW algorithm that does not post-filter (a known pgvector problem with selective filters).
Weaviate (GraphQL)
{
Get {
Clause(
      hybrid: { query: $rawQuery, vector: $queryVec, alpha: 0.6 }
      where: { path: ["tenant_id"], operator: Equal, valueText: $tenantId }
limit: 5
) { text _additional { distance score } }
}
}
Characteristic: hybrid search is native. The alpha parameter blends BM25 and vector score — no extra service needed. GraphQL is friendly to frontend teams.
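The alpha blend itself is easy to reason about: assuming both scores are normalized to [0, 1], alpha weights the vector score against the BM25 score. A sketch for intuition only — Weaviate's internal score fusion differs in detail:

```python
def blend(vector_score, bm25_score, alpha=0.6):
    """alpha=1.0 -> pure vector search, alpha=0.0 -> pure BM25.
    Assumes both scores are already normalized to [0, 1]."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# A clause that is semantically close but shares few exact keywords
# still ranks well under a vector-leaning alpha:
print(round(blend(0.9, 0.2, alpha=0.6), 2))  # -> 0.62
```

In practice alpha is a tuning knob: legal and compliance corpora with exact terminology often benefit from lower alpha (more BM25 weight), conversational queries from higher alpha.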
Milvus (Python)
from pymilvus import (
connections, FieldSchema, CollectionSchema, DataType, Collection,
)
connections.connect('default', host='milvus-cluster.zurich')
schema = CollectionSchema([
FieldSchema('id', DataType.INT64, is_primary=True, auto_id=True),
FieldSchema('tenant_id', DataType.VARCHAR, max_length=64),
FieldSchema('text', DataType.VARCHAR, max_length=8192),
FieldSchema('embedding', DataType.FLOAT_VECTOR, dim=1024),
])
c = Collection('clauses', schema)
c.create_index('embedding', {
'index_type': 'HNSW',
'metric_type': 'COSINE',
'params': {'M': 16, 'efConstruction': 64},
})
c.insert([tenant_ids, texts, embeddings])  # no id column: the primary key has auto_id=True
c.load()
hits = c.search(
data=[query_vec], anns_field='embedding',
param={'metric_type': 'COSINE', 'params': {'ef': 64}},
limit=5, expr=f'tenant_id == "{tenant_id}"',
)
Characteristic: K8s native, distributed. Scales horizontally — coordinator, query nodes and data nodes scale independently. Complex to operate; only worth it from 100M vectors or with GPU index.
Cost comparison: what vector DBs really cost in Switzerland
From 18 productive mandates we extracted the TCO over 24 months for three scaling tiers. Hosting in Switzerland (Hetzner CH or Infomaniak) where possible, otherwise EU (Frankfurt):
| Scale | pgvector | Qdrant | Weaviate | Milvus | Pinecone |
|---|---|---|---|---|---|
| 5M vectors / 50 QPS | CHF 180 | CHF 220 | CHF 270 | CHF 580 | CHF 620 |
| 30M vectors / 200 QPS | CHF 460 | CHF 380 | CHF 510 | CHF 720 | CHF 1'420 |
| 150M vectors / 800 QPS | not recommended | CHF 1'180 | CHF 1'420 | CHF 1'690 | CHF 4'880 |
Three lessons:
- pgvector wins below 20M vectors — avoiding a second system typically accounts for about 60% of the cost advantage.
- Qdrant wins from 20M to 200M vectors — latency, RAM and license cost together.
- Pinecone is 2-3x more expensive than any self-hosted option and gives up data sovereignty.
Case study: Geneva private bank productive on Qdrant in 11 weeks
A Geneva private bank (CHF 18 bn AuM, 240 employees) wanted to make 2.4 million compliance documents — FINMA circulars, internal policies, Swiss law, EU regulation — semantically searchable, with a hard SLA: p95 below 60 ms, 100% Swiss data sovereignty, FINMA-auditable trail.
Starting point
- 2.4 million documents, each 800-12,000 tokens (~38 million chunks)
- 120 concurrent compliance officers, ca. 200,000 queries/month
- Requirement: no data in US cloud, FINMA audit trail, 10-year WORM
- Before: hours of manual research, 38% reviewer consistency
mazdek solution
We built a Qdrant cluster on dedicated hardware (Hetzner Helsinki for the primary nodes, Infomaniak Geneva for disaster recovery), embeddings via Voyage-3 (1024 dim), reranking via BGE-Reranker-v2.5, RAG generator via Claude 4.7 with citation-first prompting:
- Ingest (ORACLE): ETL from SharePoint and Confluence, section-aware chunking (512 tokens, 64 overlap), metadata (doc type, date, language, ACL).
- Embedding (PROMETHEUS): Voyage-3 batched, cache via Redis, Cohere v4 as fallback for audit diversity.
- Vector DB (Qdrant): 3-node cluster with replication, HNSW (m=24, ef=200) for higher recall, payload filter for ACL and date.
- Reranker (HERACLES): BGE-Reranker-v2.5 over top-100 candidates → top-10.
- Generator (PROMETHEUS): Claude 4.7 with «cite-or-refuse» prompt — no answer without source.
- Guardrails (ARES): Llama Guard 3 for PII redaction between layers; ACL filter per tenant.
- Audit (ARGUS): Langfuse + OpenTelemetry, WORM bucket on Swiss Federal Railways S3, 10-year retention.
Results after 7 months in production
| Metric | Before | After | Delta |
|---|---|---|---|
| Avg. research time per question | 42 min | 3.4 min | -92% |
| Reviewer consistency (Cohen's Kappa) | 0.38 | 0.81 | +113% |
| p95 latency | — | 54 ms | SLA met |
| Recall@10 | — | 0.94 | — |
| FINMA findings since go-live | — | 0 | — |
| Annual savings | — | CHF 2.6M | — |
| Payback | — | 5.1 months | — |
Important: no compliance officer was let go. The freed time flowed into proactive risk reviews and edge-case escalation — tasks the team previously had no time for.
Governance: vector databases under revDSG, EU AI Act and FINMA
Vector databases raise three additional compliance questions that classic OLTP DBs did not have:
- revDSG Art. 6 (data integrity): embeddings are not trivially reversible, but they are potentially reconstructible forensically (embedding inversion attacks). In Swiss FINMA mandates we therefore place vector DBs in the same trust zone as the source data — never assume «embeddings are anonymous».
- EU AI Act Art. 12 (logging duty): every RAG query plus the returned sources are input/output of a high-risk AI system and subject to 10-year retention.
- FINMA RS 2023/1 (operational risk): vector DB failure is a single point of failure for RAG systems. Backup, replication and HA tests are mandatory.
Three hard duties for every Swiss vector DB implementation:
- Data sovereignty: self-host on Swiss or EU soil, Apache/BSD licenses preferred. Pinecone and other US SaaS are excluded for FINMA mandates.
- Backup & recovery: daily snapshots, recovery drills, rebuild plan for the HNSW index (typically 4-12h for 100M vectors).
- ACL filtering in the index: not in the application layer. Every search hit returned without ACL filter is a potential data protection incident.
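Why the ACL filter must live in the index rather than behind it can be shown with a toy example: post-filtering a top-k result silently returns fewer (or zero) authorized hits, while filtering inside the candidate set preserves top-k semantics. A sketch with invented document ids and tenant ACLs:

```python
# Ten nearest neighbours, ranked by similarity; each tagged with the
# tenant allowed to see it (invented data).
ranked_hits = [("d1", "t2"), ("d2", "t2"), ("d3", "t1"),
               ("d4", "t2"), ("d5", "t2"), ("d6", "t1")]

def post_filter(hits, tenant, k=3):
    """Anti-pattern: truncate to top-k first, apply the ACL afterwards."""
    return [d for d, t in hits[:k] if t == tenant]

def pre_filter(hits, tenant, k=3):
    """Correct: apply the ACL inside the candidate set, then take top-k."""
    return [d for d, t in hits if t == tenant][:k]

print(post_filter(ranked_hits, "t1"))  # -> ['d3']        (k quietly shrank)
print(pre_filter(ranked_hits, "t1"))   # -> ['d3', 'd6']  (all authorized hits)
```

This is exactly what filtered-HNSW in Qdrant and a SQL WHERE clause in pgvector provide natively — and what an application-layer filter cannot guarantee.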
More on this in our EU AI Act guide.
Implementation roadmap: productive in 11 weeks
Phase 1: Discovery & engine selection (weeks 1-2)
- Workshop: source systems, data volumes, update frequency, ACL model, latency SLA
- Engine matrix: scale × data sovereignty × latency × team skill
- Embedding model selection: Voyage-3 (cloud) or BGE-M3 (self-host)
Phase 2: PoC + eval (weeks 3-5)
- PROMETHEUS builds the ingest, embedding and search pipeline
- Gold eval set with 200-500 question-answer pairs
- Measure Recall@10, p50/p95 latency, hallucination rate
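The Recall@10 measurement in phase 2 is a few lines against the gold eval set. A sketch (question ids and documents are invented):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of gold-relevant documents that appear in the top-k hits."""
    hits = set(retrieved[:k]) & set(relevant)
    return len(hits) / len(relevant)

# Gold eval pairs: query -> relevant doc ids (invented example data).
gold = {"q1": ["d1", "d2"], "q2": ["d9"]}
retrieved = {"q1": ["d1", "d4", "d2"], "q2": ["d3", "d5"]}

scores = [recall_at_k(retrieved[q], gold[q], k=10) for q in gold]
print(sum(scores) / len(scores))  # -> 0.5
```

Run against the 200-500 gold pairs after every pipeline change; a Recall@10 regression is the earliest warning that a chunking, embedding or index tweak hurt retrieval.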
Phase 3: Reranker, hybrid search, citation (weeks 6-7)
- HERACLES integrates Cohere Rerank 3 or BGE-Reranker
- Activate hybrid search (BM25 + vector)
- Cite-or-refuse prompting in the generator
Phase 4: Guardrails, audit, compliance (weeks 8-9)
- ARES Llama Guard 3 filter for PII / prompt injection
- ARGUS Langfuse + OpenTelemetry + WORM archive
- EU AI Act and revDSG compliance review
Phase 5: Rollout (weeks 10-11)
- Shadow mode: system answers but is not shown
- Supervised: 10% traffic with human approval
- Full production with eval-regression CI
The future: multi-vector, quantization and late-interaction
Vector databases in 2026 are only the second generation. What is on the horizon for 2027-2028:
- Multi-vector / ColBERT: a document as a sequence of vectors instead of a mean vector. Recall climbs by 8-15 percentage points. Qdrant 1.10, Vespa and Weaviate 1.27 already support multi-vector natively.
- Binary & Int8 quantization: 32x smaller embeddings without a noticeable recall drop. Cohere v4 + Matryoshka embeddings + binary quantization saves 90% RAM.
- Late-interaction reranker: ColBERTv2 as a reranker directly inside the vector DB engine. Milvus and Vespa lead.
- Disk-first indexes: DiskANN, SPANN — RAM requirement reduced by 70-90%. Relevant from 100M vectors.
- SQL-native vector filters: Postgres 18 together with pgvector 0.8 integrates HNSW more deeply into the planner, removing many of today's extension limits.
- RAG without embeddings: SPLADE-style sparse retrieval and reasoning-over-indexes partially eliminate the classic embedding model.
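The binary-quantization saving mentioned above comes from keeping only one sign bit per float32 dimension; candidates are then compared by Hamming distance, and the survivors are rescored with full-precision vectors. A sketch of the idea:

```python
def binarize(vec):
    """One sign bit per dimension: 32x smaller than float32 storage."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between bit signatures: a cheap proxy for angular distance."""
    return bin(a ^ b).count("1")

q  = binarize([0.3, -0.1, 0.7, -0.9])
d1 = binarize([0.2, -0.2, 0.5, -0.4])   # similar direction to q
d2 = binarize([-0.3, 0.1, -0.7, 0.9])   # opposite direction to q
print(hamming(q, d1), hamming(q, d2))   # -> 0 4
```

At 1024 dimensions the signature is 128 bytes instead of 4 KB, and XOR plus popcount runs in a handful of CPU cycles — which is why the engines can afford to scan far more candidates before the expensive full-precision rescoring step.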
Verdict: which vector DB for you?
- Default: pgvector. Enough for 80% of Swiss mid-market mandates — no new system, ACID, SQL joins, Swiss hosting trivial.
- Performance & EU cloud: Qdrant. Rust kernel, Apache 2.0, p50 below 10 ms at 100M+ vectors. Ideal from 20M vectors.
- Native hybrid search: Weaviate. BM25 + vector + GraphQL — perfect for multi-tenant SaaS.
- Massive scale: Milvus. Distributed K8s, DiskANN, GPU. From 100M vectors or with a platform team.
- NOT for Switzerland: Pinecone. Closed source, US routing, 2-3x more expensive, FINMA-disqualifying.
- ROI in 5-7 months: 18 productive mazdek mandates, average payback 5.4 months.
- Compliance feasible: revDSG, EU AI Act and FINMA are cleanly covered with ARES guardrails, ARGUS observability and self-hosting.
At mazdek, 19 specialized AI agents orchestrate the entire vector DB lifecycle: PROMETHEUS for architecture and embedding choice; ORACLE for ingest and data model; HERACLES for reranker and API bridges; ARES for guardrails and compliance; ARGUS for 24/7 observability and WORM audit; HEPHAESTUS for Swiss K8s infrastructure. 18 productive vector DB deployments since 2024 — DSG, GDPR, EU AI Act, FINMA and CO compliant from day one.