Last Updated: November 21, 2025
## Focus Areas
| Control | Why it matters |
|---|---|
| Distance metric | Euclidean for dense, cosine for normalized embeddings. |
| Dimension reduction | PCA + quantization reduce storage and speed up queries. |
| Filter clauses | Apply metadata filters before vector scoring to reduce noise. |
| Shards/replicas | Scale queries by adjusting shard count and replication. |
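The distance-metric row above can be made concrete with a minimal sketch; the function names here are illustrative, not part of any particular client library:

```python
import math

def euclidean(a, b):
    # L2 distance: sensitive to magnitude, suited to dense vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine: compares direction only, suited to normalized embeddings
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

u = [1.0, 0.0]
v = [2.0, 0.0]
print(euclidean(u, v))          # 1.0 (magnitudes differ)
print(cosine_similarity(u, v))  # 1.0 (same direction)
```

Note how the same pair of vectors is "far apart" under L2 but identical under cosine, which is why the metric must match how your embeddings were trained.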
## Search Tactics
- `index.search(query_vector, k=10, filter={'status': 'published'})`: combine the semantic score with business metadata.
- `client.create_index(name='docs', metric='cosine', dimension=1536)`: match the index dimension and metric to your embedding model.
- `index.update_config({'ef': 128, 'pq': 16})`: tune query accuracy vs. latency for HNSW/IVF indexes.
- `reranker(query, candidates)`: use a small cross-encoder to boost top hits.
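The filter-then-score pattern can be sketched against a toy in-memory index; the `search` signature mirrors the snippet above, but the data and function bodies are hypothetical stand-ins for a real client:

```python
import math

# Toy corpus; a real vector database holds these server-side.
DOCS = [
    {"id": 1, "vec": [0.9, 0.1], "status": "published"},
    {"id": 2, "vec": [0.8, 0.2], "status": "draft"},
    {"id": 3, "vec": [0.1, 0.9], "status": "published"},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query_vec, k=10, filter=None):
    # Apply the metadata filter BEFORE vector scoring, then rank
    # the surviving candidates by cosine similarity.
    pool = [d for d in DOCS
            if not filter
            or all(d.get(key) == val for key, val in filter.items())]
    pool.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return pool[:k]

hits = search([1.0, 0.0], k=2, filter={"status": "published"})
print([d["id"] for d in hits])  # [1, 3] — the draft doc never gets scored
```

Filtering first shrinks the candidate pool, so the (expensive) similarity scoring and any downstream reranker only see documents that can actually be returned.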
## Summary
Tuning vector search means balancing recall (smart filters + rerankers) against latency (shard configuration + index parameters such as `ef` and `pq`). Monitor both.
💡 Pro Tip:
Log recall + latency after each change so you can roll back noisy embeddings.
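The tip above can be wired into a small harness; `search_fn` and the query/ground-truth data are hypothetical placeholders for your own pipeline:

```python
import time

def measure(search_fn, queries, ground_truth, k=10):
    # Record recall@k and per-query latency so a regression after an
    # embedding or config change shows up immediately.
    hits, total, latencies = 0, 0, []
    for query, relevant in zip(queries, ground_truth):
        start = time.perf_counter()
        results = search_fn(query, k)
        latencies.append(time.perf_counter() - start)
        hits += len(set(results) & set(relevant))
        total += len(relevant)
    recall = hits / total if total else 0.0
    p50 = sorted(latencies)[len(latencies) // 2]
    return recall, p50

# Stand-in search returning doc ids; 2 of the 3 relevant docs are found.
fake_search = lambda query, k: [1, 2, 3][:k]
recall, p50 = measure(fake_search, ["q1"], [[1, 2, 9]], k=3)
print(f"recall={recall:.2f}")
```

Logging both numbers after every change gives you a baseline to diff against, which is what makes rollback a quick decision rather than a debugging session.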