Vector DB Search Tuning Cheat Sheet

Balance recall, filters, and latency

Last Updated: November 21, 2025

Focus Areas

Control              Why it matters
Distance metric      Cosine (or dot product) for normalized embeddings; Euclidean when vector magnitude carries meaning.
Dimension reduction  PCA and quantization shrink vectors, cutting storage and query time.
Filter clauses       Apply metadata filters before vector scoring to shrink the candidate set and cut noise.
Shards/replicas      Add shards to spread data and replicas to scale query throughput.
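
A quick way to see why the metric row pairs cosine with normalization: for unit-length vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so both metrics rank neighbors identically. A minimal NumPy sketch with synthetic placeholder vectors:

import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)   # L2-normalize each row
query = rng.normal(size=384)
query /= np.linalg.norm(query)

cosine = corpus @ query                          # cosine similarity (unit vectors)
euclid = np.linalg.norm(corpus - query, axis=1)  # Euclidean distance

# For unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b), so the rankings agree.
print(np.allclose(euclid ** 2, 2 - 2 * cosine))  # True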

Search Tactics

index.search(query_vector, k=10, filter={'status': 'published'})
Pre-filter on business metadata, then rank only the surviving documents by semantic score.
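
The filter argument above is a generic client parameter; the underlying idea is pre-filtering. A sketch of the same pattern over an in-memory NumPy corpus with a parallel metadata list (filtered_search, vectors, and metadata are illustrative names, not a real client API):

import numpy as np

def filtered_search(query, vectors, metadata, k=10, status="published"):
    # Keep only rows whose metadata passes the filter, then score that subset.
    keep = np.flatnonzero([m.get("status") == status for m in metadata])
    scores = vectors[keep] @ query    # cosine similarity if rows are L2-normalized
    order = np.argsort(-scores)[:k]   # best k among the filtered candidates
    return keep[order], scores[order]
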
client.create_index(name='docs', metric='cosine', dimension=1536)
Match your embedding dimension and metric to your model.
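
create_index above is likewise a generic call. As a concrete stand-in (an assumption, not necessarily your client), FAISS illustrates the same rule: the index dimension must equal the model's output size, and L2-normalizing vectors lets an inner-product index behave as cosine:

import faiss                                   # pip install faiss-cpu
import numpy as np

d = 1536                                       # must match the embedding model's output dimension
emb = np.random.rand(10_000, d).astype("float32")

faiss.normalize_L2(emb)                        # normalize so inner product == cosine
index = faiss.IndexFlatIP(d)                   # exact inner-product (cosine) index
index.add(emb)

q = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(q)
scores, ids = index.search(q, 10)              # top-10 neighbors by cosine similarity
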
index.update_config({'ef': 128, 'pq': 16})
Tune query accuracy vs latency for HNSW/IVF indexes.
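
update_config is pseudocode for whatever knobs your index exposes. In FAISS (used here only as an example library), the equivalent levers are efSearch for HNSW and nprobe plus the PQ segment count for IVF-PQ; higher values buy recall at the cost of latency or memory:

import faiss
import numpy as np

d = 384
xb = np.random.rand(50_000, d).astype("float32")

# HNSW: efSearch widens the search beam (higher recall, slower queries).
hnsw = faiss.IndexHNSWFlat(d, 32)              # 32 = graph connectivity (M)
hnsw.hnsw.efSearch = 128
hnsw.add(xb)

# IVF-PQ: nprobe = clusters scanned per query; 16 subquantizers x 8 bits compress each vector.
quantizer = faiss.IndexFlatL2(d)
ivfpq = faiss.IndexIVFPQ(quantizer, d, 1024, 16, 8)
ivfpq.train(xb)
ivfpq.add(xb)
ivfpq.nprobe = 16
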
reranker(query, candidates)
Use a small cross-encoder to rescore and reorder the top candidates.
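
reranker(query, candidates) is a placeholder; one common concrete choice is a sentence-transformers cross-encoder (the model name below is just an example):

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # small pairwise relevance model

def rerank(query, candidates, top_n=5):
    # Rescore the ANN shortlist with the cross-encoder and keep the best hits.
    scores = model.predict([(query, text) for text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]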

Summary

Tuning vector search means balancing recall (smart filters + rerankers) and latency (shard configuration + index parameters). Monitor both.

💡 Pro Tip: Log recall + latency after each change so you can roll back noisy embeddings.
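
One way to act on this tip, sketched against the generic index.search call used above (recall_at_k and timed_search are illustrative helpers, not a library API):

import time

def recall_at_k(approx_ids, exact_ids, k=10):
    # Fraction of the exact top-k that the approximate index also returned.
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def timed_search(index, query_vector, k=10):
    start = time.perf_counter()
    ids = index.search(query_vector, k=k)
    latency_ms = (time.perf_counter() - start) * 1000
    return ids, latency_ms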