Meet Turbovec: A Rust Vector Index with Python Bindings, and Built on Google’s TurboQuant Algorithm
Vector search underpins most retrieval-augmented generation (RAG) pipelines. At scale, it gets expensive. Storing 10 million document embeddings in float32 consumes 31 GB of RAM. For development teams running local or on-premise inference, that figure creates real constraints.
A new open-source library called turbovec addresses this directly. It is a vector index written in Rust with Python bindings, built on TurboQuant, a quantization algorithm from Google Research. The same 10-million-document corpus fits in 4 GB with turbovec. On ARM hardware, search speed beats FAISS IndexPQFastScan by 12–20%.
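The memory figures can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the article does not state which embedding dimension or bit width produces the 31 GB and 4 GB numbers, so the 768 dimensions and 4-bit codes used here are assumptions chosen because they land close to those figures (roughly 30.7 GB and 3.8 GB). The helper name index_size_gb is hypothetical, and per-vector norms and index metadata are ignored.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_coord: int) -> float:
    """Raw coordinate storage in decimal gigabytes (norms and metadata excluded)."""
    bytes_per_vector = dim * bits_per_coord / 8
    return num_vectors * bytes_per_vector / 1e9

N = 10_000_000
DIM = 768  # assumption: the dimension behind the 31 GB figure is not stated

fp32_gb = index_size_gb(N, DIM, 32)  # float32 baseline
q4_gb = index_size_gb(N, DIM, 4)     # assumed 4-bit codes
print(f"float32: {fp32_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```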
The TurboQuant Paper
TurboQuant was introduced by Google’s research team, which proposes it as a data-oblivious quantizer. It achieves near-optimal distortion rates across all bit-widths and dimensions. It requires zero training and zero passes over the data.
Most production-grade vector quantizers, including FAISS’s Product Quantization, require a codebook training step. You must run k-means over a representative sample of your vectors before indexing begins. If your corpus grows or shifts, you may need to retrain and rebuild the index entirely. TurboQuant skips all of that. It uses an analytical property of rotated vectors instead of a data-dependent calibration.
How turbovec Quantizes Vectors
The quantization pipeline has four steps:
(1) Each vector is normalized. The length (norm) is stripped and stored as a single float. Every vector becomes a unit direction on a high-dimensional hypersphere.
(2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution. In high dimensions, the coordinates become nearly independent and converge to Gaussian N(0, 1/d). This holds for any input data: the rotation makes the coordinate distribution predictable.
(3) Lloyd-Max scalar quantization is applied. Because the distribution is known analytically, the optimal bucket boundaries and centroids can be precomputed from the math alone. For 2-bit quantization, that means 4 buckets per coordinate. For 4-bit, it means 16 buckets. No data passes are needed.
(4) The quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes at 2-bit. That is a 16x compression ratio.
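The four steps above can be sketched in NumPy. This is a toy illustration under stated assumptions, not turbovec's Rust implementation: every function name here is hypothetical, the rotation is drawn via a QR decomposition, and the Lloyd-Max centroids are fitted to synthetic N(0, 1/d) draws as a stand-in for the analytic precomputation the article describes (no corpus data is used either way).

```python
import numpy as np

def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # sign fix; columns stay orthonormal

def lloyd_max_centroids(bits: int, dim: int, samples: int = 200_000, iters: int = 50) -> np.ndarray:
    """Lloyd iterations on synthetic N(0, 1/dim) draws (stand-in for the analytic solution)."""
    rng = np.random.default_rng(1)
    x = rng.standard_normal(samples) / np.sqrt(dim)
    levels = 2 ** bits
    cents = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        edges = (cents[:-1] + cents[1:]) / 2            # boundaries are centroid midpoints
        bucket = np.searchsorted(edges, x)
        cents = np.array([x[bucket == b].mean() for b in range(levels)])
    return cents

def quantize(vectors: np.ndarray, rotation: np.ndarray, cents: np.ndarray):
    norms = np.linalg.norm(vectors, axis=1)             # step 1: store length as one float
    unit = vectors / norms[:, None]
    rotated = unit @ rotation.T                         # step 2: same rotation for all vectors
    edges = (cents[:-1] + cents[1:]) / 2
    codes = np.searchsorted(edges, rotated).astype(np.uint8)  # step 3: bucket index per coordinate
    return norms.astype(np.float32), codes

def pack_2bit(codes: np.ndarray) -> np.ndarray:
    """Step 4: pack four 2-bit codes into each byte."""
    c = codes.reshape(codes.shape[0], -1, 4)
    return (c[..., 0] | (c[..., 1] << 2) | (c[..., 2] << 4) | (c[..., 3] << 6)).astype(np.uint8)

dim, bits = 1536, 2
rng = np.random.default_rng(42)
vectors = (rng.standard_normal((100, dim)) * 5.0).astype(np.float32)  # toy data

rotation = random_rotation(dim)
cents = lloyd_max_centroids(bits, dim)
norms, codes = quantize(vectors, rotation, cents)
packed = pack_2bit(codes)
print(vectors[0].nbytes, "->", packed[0].nbytes)  # 6144 -> 384
```

The printed sizes match the 6,144-byte and 384-byte figures quoted in step 4. The stored norm adds a few bytes per vector on top of the packed codes.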
At search time, the query is rotated once into the same space. Scoring happens directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM and AVX-512BW on modern x86, with an AVX2 fallback) with nibble-split lookup tables for throughput.
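A plain NumPy sketch can show the idea behind lookup-table scoring without any SIMD. It is an illustration only: inner-product scoring is assumed (the article does not name the metric), the search function and toy database are hypothetical, and the centroids are the textbook 4-level Lloyd-Max values for a unit Gaussian, scaled by 1/sqrt(d).

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 64, 1000

# Assumed setup: a shared random rotation and 2-bit centroids for N(0, 1/dim).
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
cents = np.array([-1.510, -0.4528, 0.4528, 1.510]) / np.sqrt(dim)
edges = (cents[:-1] + cents[1:]) / 2

# Toy database: normalize, rotate, quantize to bucket indices.
db = rng.standard_normal((n, dim))
norms = np.linalg.norm(db, axis=1)
codes = np.searchsorted(edges, (db / norms[:, None]) @ rotation.T)

def search(query: np.ndarray, k: int):
    q_rot = rotation @ query                           # rotate the query once
    lut = q_rot[:, None] * cents[None, :]              # lut[j, c] = q_rot[j] * centroid c
    per_coord = lut[np.arange(dim)[None, :], codes]    # table lookups, no decompression
    scores = norms * per_coord.sum(axis=1)             # restore stored vector lengths
    top = np.argsort(-scores)[:k]
    return scores[top], top

query = rng.standard_normal(dim)
scores, indices = search(query, k=10)
```

The lookup table has only dim x 4 entries at 2-bit, so it is built once per query and reused for every database vector. A production kernel would do the same lookups on packed codes with SIMD shuffles rather than NumPy indexing.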
TurboQuant achieves distortion within roughly 2.7x of the information-theoretic Shannon lower bound.
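A rough per-coordinate check of that gap is possible under the Gaussian approximation. The sketch below is not the paper's proof: it computes the mean squared error of a Lloyd-Max scalar quantizer for a unit-variance Gaussian and compares it with the Gaussian rate-distortion bound D(R) = 2^(-2R). The function names are hypothetical. Scaling the variance to 1/d rescales both quantities equally, so the ratio is unchanged.

```python
import math

def pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_max_distortion(bits: int, iters: int = 1000) -> float:
    """MSE of a Lloyd-Max scalar quantizer for N(0, 1) with 2**bits levels."""
    levels = 2 ** bits
    cents = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        # Boundaries are midpoints between neighboring centroids.
        edges = [-math.inf] + [(a + b) / 2 for a, b in zip(cents, cents[1:])] + [math.inf]
        cells = list(zip(edges, edges[1:]))
        masses = [cdf(hi) - cdf(lo) for lo, hi in cells]
        # Each centroid is the conditional mean of its cell.
        cents = [(pdf(lo) - pdf(hi)) / m for (lo, hi), m in zip(cells, masses)]
    # With conditional-mean centroids, MSE = E[x^2] - sum(p_i * c_i^2).
    return 1.0 - sum(m * c * c for m, c in zip(masses, cents))

for bits in (1, 2, 3, 4):
    d = lloyd_max_distortion(bits)
    shannon = 2.0 ** (-2 * bits)  # Gaussian rate-distortion bound at unit variance
    print(f"{bits}-bit: MSE={d:.4f}, ratio to bound={d / shannon:.2f}")
```

For these bit widths the ratio stays below the roughly 2.7x figure quoted above and grows slowly as the bit width increases.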
Recall and Speed: The Numbers
All benchmarks use 100K vectors, 1,000 queries, k=64, and report the median of 5 runs.
For recall, turbovec is compared against FAISS IndexPQ (LUT256, nbits=8, float32 LUT). This is a strong baseline: FAISS uses a higher-precision LUT at scoring time and k-means++ for codebook training. Despite this, TurboQuant and FAISS are within 0–1 point of each other at R@1 for OpenAI embeddings at d=1536 and d=3072. Both converge to 1.0 recall by k=4–8. GloVe at d=200 is harder. At that dimension, TurboQuant trails FAISS by 3–6 points at R@1, with the gap closing by k≈16–32.
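For readers unfamiliar with the metric, here is a minimal sketch of how such a recall figure can be computed. It assumes that recall at a given k means the fraction of queries whose exact nearest neighbor appears among the top-k approximate results. The article does not spell out the definition, so treat this as one common reading; the function name and toy data are hypothetical.

```python
import numpy as np

def recall_at_1(true_nn: np.ndarray, retrieved: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact nearest neighbor is in the top-k retrieved IDs."""
    hits = (retrieved[:, :k] == true_nn[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 4 queries, top-4 IDs from a hypothetical approximate index.
true_nn = np.array([7, 2, 9, 5])
retrieved = np.array([
    [7, 1, 3, 4],  # hit at rank 1
    [0, 2, 8, 6],  # hit at rank 2
    [1, 3, 4, 9],  # hit at rank 4
    [0, 1, 2, 3],  # miss
])

for k in (1, 2, 4):
    print(k, recall_at_1(true_nn, retrieved, k))  # 0.25, 0.5, 0.75
```

Under this reading, a quantizer that trails at k=1 can still reach full recall once k is large enough for a re-ranking stage to recover the true neighbor.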
On speed, ARM results (Apple M3 Max) show turbovec beating FAISS IndexPQFastScan by 12–20% across every configuration. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), turbovec wins every 4-bit configuration by 1–6%. It runs within ~1% of FAISS on 2-bit single-threaded. Two configurations sit slightly behind FAISS: 2-bit multi-threaded at d=1536 and d=3072. There, the inner accumulate loop is too short to amortize the cost of unrolling, and FAISS’s AVX-512 VBMI path holds the edge in these two cases (by 2–4%).
Python API
Installation is a single command: pip install turbovec. The main class is TurboQuantIndex, initialized with a dimension and bit width.
from turbovec import TurboQuantIndex
index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
scores, indices = index.search(query, k=10)
index.write("my_index.tq")
A second class, IdMapIndex, supports stable external uint64 IDs that survive deletes. Removal is O(1) by ID. This is useful for document stores where vectors are frequently updated or deleted.
turbovec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). The Rust crate is available via cargo add turbovec.
Key Takeaways
- No codebook training. turbovec indexes vectors immediately: no k-means, and no rebuilds as the corpus grows.
- 16x compression. A 1536-dim float32 vector shrinks from 6,144 bytes to 384 bytes at 2-bit quantization.
- Faster than FAISS on ARM. turbovec beats FAISS IndexPQFastScan by 12–20% on ARM across every configuration.
- Near-optimal distortion. TurboQuant achieves distortion within ~2.7x of the Shannon lower bound, provably near the theoretical limit.
- Fully local. No managed service and no data egress; pairs with any open-source embedding model for an air-gapped RAG stack.
The post Meet Turbovec: A Rust Vector Index with Python Bindings, and Built on Google’s TurboQuant Algorithm appeared first on MarkTechPost.
