turbovec
turbovec is a vector search index written in Rust with Python bindings, built on Google Research’s TurboQuant quantization algorithm. It targets RAG workloads where memory, latency, or privacy matter — a 10M-document corpus that needs 31 GB as float32 fits in ~4 GB while searching faster than FAISS.
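The memory figures above are consistent with a back-of-the-envelope calculation. The sketch below assumes 768-dimensional embeddings and 4 bits per coordinate. Neither value is stated in the text, and the estimate ignores per-vector metadata such as norms or ids, so treat it as an illustration rather than the project's own accounting.

```python
# Rough storage estimate. dim=768 and 4 bits per coordinate are assumptions
# chosen for illustration, not figures quoted by the project.
def index_bytes(n_vectors: int, dim: int, bits_per_coord: int) -> int:
    """Bytes needed to store n_vectors of dim coordinates at the given bit width."""
    return n_vectors * dim * bits_per_coord // 8


n, dim = 10_000_000, 768
float32_gb = index_bytes(n, dim, 32) / 1e9  # uncompressed float32
quant4_gb = index_bytes(n, dim, 4) / 1e9    # 4-bit codes only, no metadata

print(f"float32: {float32_gb:.1f} GB, 4-bit: {quant4_gb:.1f} GB")
# prints "float32: 30.7 GB, 4-bit: 3.8 GB"
```

Under these assumptions the compression ratio is simply 32 / 4 = 8, which is where the step from roughly 31 GB to roughly 4 GB comes from.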
What sets it apart
- TurboQuant quantizer: data-oblivious, with distortion that closely tracks the Shannon lower bound (to within a small constant factor, per the paper); no codebook training and no separate train phase
- Online ingest: add vectors and they are indexed immediately; no train step, no parameter tuning, no rebuilds as the corpus grows
- Faster than FAISS: hand-written NEON (ARM) and AVX-512BW (x86) SIMD kernels beat FAISS IndexPQFastScan by 12–20% on ARM and match or beat it on x86
- Filtered search: pass an id allowlist (or slot bitmask) to search(); filtering happens inside the SIMD kernel at 32-vector block granularity, so selective allowlists skip most of the work instead of over-fetching and discarding
- Pure local: no managed service, no data leaving the machine or VPC; pairs with any open-source embedding model for an air-gapped RAG stack
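The block-granular filtering idea can be illustrated in plain NumPy. This is a conceptual sketch only, not turbovec's kernel: the function filtered_topk, its scores_fn callback, and the test data are all hypothetical, and the real implementation is described as running inside SIMD code. Only the 32-vector block size comes from the text above.

```python
import numpy as np

BLOCK = 32  # block size taken from the text; everything else is illustrative


def filtered_topk(scores_fn, n_vectors, allowed_slots, k):
    """Score only blocks that contain an allowed slot, then keep the top k."""
    mask = np.zeros(n_vectors, dtype=bool)
    mask[np.asarray(list(allowed_slots), dtype=np.int64)] = True

    best_scores, best_slots, blocks_scored = [], [], 0
    for start in range(0, n_vectors, BLOCK):
        block_mask = mask[start:start + BLOCK]
        if not block_mask.any():
            continue  # whole block filtered out: skip the scoring work
        blocks_scored += 1
        block_scores = scores_fn(start, start + len(block_mask))
        local = np.nonzero(block_mask)[0]  # allowed positions within the block
        best_scores.extend(block_scores[local])
        best_slots.extend(start + local)

    order = np.argsort(best_scores)[::-1][:k]  # higher score = better match
    return np.asarray(best_scores)[order], np.asarray(best_slots)[order], blocks_scored


rng = np.random.default_rng(0)
data = rng.standard_normal((1024, 16)).astype(np.float32)
query = rng.standard_normal(16).astype(np.float32)


def score_block(lo, hi):
    # Inner-product scores for one block; a stand-in for the SIMD kernel.
    return data[lo:hi] @ query


allow = [3, 5, 700]
scores, slots, blocks_scored = filtered_topk(score_block, len(data), allow, k=2)
print(slots, f"{blocks_scored} of {len(data) // BLOCK} blocks scored")
```

With three allowed slots falling into two blocks, only 2 of the 32 blocks are scored. The alternative of searching unfiltered, over-fetching, and discarding would score all 32.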
Available on PyPI (pip install turbovec) and crates.io (cargo add turbovec). Core types:
- TurboQuantIndex: basic add/search/persist
- IdMapIndex: stable external uint64 ids that survive deletes (add_with_ids, remove, allowlist filtering)
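To make "stable external ids that survive deletes" concrete, the toy sketch below maps external ids to internal slots and uses swap-remove on delete. It is a conceptual model under stated assumptions: the class StableIdMap and its methods are hypothetical and do not reflect how IdMapIndex is implemented or what its method signatures look like.

```python
class StableIdMap:
    """Toy external-id -> slot map; hypothetical, not turbovec's implementation."""

    def __init__(self):
        self.slot_of = {}  # external id -> internal slot
        self.id_at = []    # internal slot -> external id

    def add(self, ext_id: int) -> int:
        if ext_id in self.slot_of:
            raise KeyError(f"id {ext_id} already present")
        self.slot_of[ext_id] = len(self.id_at)
        self.id_at.append(ext_id)
        return self.slot_of[ext_id]

    def remove(self, ext_id: int) -> None:
        slot = self.slot_of.pop(ext_id)
        last_id = self.id_at.pop()
        if last_id != ext_id:
            # Move the last entry into the freed slot so storage stays dense.
            # A real index would move the stored vector code along with it.
            self.id_at[slot] = last_id
            self.slot_of[last_id] = slot

    def allowlist_to_slots(self, ext_ids):
        """Translate an id allowlist into internal slots, ignoring unknown ids."""
        return sorted(self.slot_of[i] for i in ext_ids if i in self.slot_of)


ids = StableIdMap()
for ext_id in (1001, 1002, 1003):
    ids.add(ext_id)
ids.remove(1001)  # 1003 moves into slot 0; its external id is unchanged
print(ids.allowlist_to_slots([1002, 1003, 9999]))  # -> [0, 1]
```

The point of the indirection is that callers keep using 1002 and 1003 after the delete, even though the internal slot of 1003 changed.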
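    from turbovec import TurboQuantIndex

    # vectors and query are assumed to be embedding arrays supplied by the caller
    index = TurboQuantIndex(dim=1536, bit_width=4)
    index.add(vectors)                           # indexed immediately, no train step
    scores, indices = index.search(query, k=10)  # top-10 matches for the query
    index.write("my_index.tq")                   # persist the index to disk

Framework integrations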
Drop-in replacements for the in-tree reference stores in each framework, with the same public surface and persistence semantics:
- LangChain: replaces InMemoryVectorStore
- LlamaIndex: replaces SimpleVectorStore
- Haystack: replaces InMemoryDocumentStore
- Agno: replaces LanceDb
Sources
- turbovec on GitHub: README, API, integrations (accessed 2026-06-08)
- TurboQuant paper (arXiv:2504.19874): underlying quantization algorithm