A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

In this tutorial, we build a complete pgvector playground inside Google Colab and explore how PostgreSQL can serve as an effective vector database for modern AI applications. We begin by installing PostgreSQL, compiling the pgvector extension, connecting through Psycopg, and registering vector types for clean Python integration. Then, we create embeddings with SentenceTransformers, store them in PostgreSQL, build HNSW indexes, and run semantic search, filtered search, distance metric comparisons, half-precision storage, binary quantization, sparse vector search, hybrid retrieval, and vector aggregation. Through this workflow, we learn how pgvector supports practical retrieval-augmented generation, recommendation, similarity search, and hybrid search systems using only open-source tools.


import os
import subprocess
import sys
import time


def sh(cmd: str, check: bool = True):
    """Run a shell command, printing the command and discarding its output."""
    print(f"  $ {cmd}")
    return subprocess.run(cmd, shell=True, check=check,
                          stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)


print("[0/10] Installing PostgreSQL + building pgvector (≈1–2 min)...")
sh("apt-get -qq update")
sh("apt-get -qq install -y postgresql postgresql-contrib "
   "postgresql-server-dev-all build-essential git")
if not os.path.exists("/tmp/pgvector"):
    sh("git clone --depth 1 https://github.com/pgvector/pgvector.git /tmp/pgvector")
sh("cd /tmp/pgvector && make && make install")
sh("service postgresql start")
time.sleep(3)  # give the server a moment to start accepting connections
sh("""sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" """)

print("[0/10] Installing Python packages...")
sh(f"{sys.executable} -m pip install -q pgvector 'psycopg[binary]' "
   f"sentence-transformers numpy")

We set up the complete PostgreSQL and pgvector environment. We install the required system packages, clone and build pgvector from source, start the PostgreSQL service, and set the database password. We also install the Python dependencies needed to connect to PostgreSQL and work with vector embeddings.
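The setup waits a fixed three seconds after starting the service, which may be too short on a slow runtime. As a minimal sketch, assuming the pg_isready utility that ships with PostgreSQL is on the PATH, the wait could instead poll until the server accepts connections. The helper names wait_until and postgres_ready are hypothetical and are not part of the original notebook.

```python
import subprocess
import time


def wait_until(check, timeout=30.0, interval=0.5):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def postgres_ready(host="127.0.0.1", port=5432):
    """Return True once pg_isready reports that the server accepts connections."""
    try:
        result = subprocess.run(
            ["pg_isready", "-h", host, "-p", str(port)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # pg_isready is not installed (assumption: it normally is, with PostgreSQL).
        return False
    return result.returncode == 0


# Possible usage in place of time.sleep(3):
#     if not wait_until(postgres_ready, timeout=30):
#         raise RuntimeError("PostgreSQL did not become ready in time")
```

Polling keeps the happy path fast and turns a silent race into an explicit error when the server never comes up.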


import numpy as np
import psycopg
from pgvector import HalfVector, SparseVector
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

print("\n[1/10] Connecting and enabling the 'vector' extension...")
conn = psycopg.connect(
    "host=127.0.0.1 port=5432 dbname=postgres user=postgres password=postgres",
    autocommit=True,
)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets Psycopg send and receive NumPy arrays as vectors
ver = conn.execute("SELECT extversion FROM pg_extension WHERE extname='vector'").fetchone()[0]
print(f"      pgvector version: {ver}")

print("\n[2/10] Loading embedding model + encoding corpus...")
model = SentenceTransformer("all-MiniLM-L6-v2")
DIM = model.get_sentence_embedding_dimension()
corpus = [
    ("Octopuses have three hearts and blue blood.",             "animals"),
    ("Transformers revolutionized natural language processing.","technology"),
    ("Quantum computers exploit superposition and entanglement.","technology"),
    ("GPUs accelerate deep learning by parallelizing matrix math.","technology"),
    ("Sourdough bread relies on wild yeast and lactobacilli.",  "food"),
    ("Dark chocolate contains flavonoid antioxidants.",         "food"),
    ("A black hole's gravity is so strong light cannot escape.","space")
]
contents   = [c for c, _ in corpus]
categories = [k for _, k in corpus]
embeddings = model.encode(contents, normalize_embeddings=True)

conn.execute("DROP TABLE IF EXISTS documents")
conn.execute(f"""
    CREATE TABLE documents (
        id        bigserial PRIMARY KEY,
        content   text,
        category  text,
        embedding vector({DIM})
    )
""")
with conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
        list(zip(contents, categories, [np.asarray(e) for e in embeddings])),
    )
print(f"      Inserted {len(corpus)} documents with {DIM}-d embeddings.")

We connect to PostgreSQL, enable the pgvector extension, and register vector support with Psycopg. We load the SentenceTransformers model, define a small text corpus, generate normalized embeddings, and create a PostgreSQL table for storing documents. We then insert each document with its category and vector representation so that we can perform semantic search later.


print("\n[3/10] Building HNSW index and running semantic search...")
conn.execute(
    "CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 64)"
)
conn.execute("SET hnsw.ef_search = 100")


def semantic_search(query: str, k: int = 4):
    """Embed the query and return the k nearest documents by cosine distance."""
    q = np.asarray(model.encode(query, normalize_embeddings=True))
    return conn.execute(
        "SELECT content, category, embedding <=> %s AS distance "
        "FROM documents ORDER BY distance LIMIT %s",
        (q, k),
    ).fetchall()


for content, cat, dist in semantic_search("animals that are unusually fast"):
    print(f"      {dist:.3f}  [{cat:<10}] {content}")

print("\n[4/10] Filtered search (only category = 'space')...")
q = np.asarray(model.encode("objects with extreme gravity", normalize_embeddings=True))
rows = conn.execute(
    "SELECT content, embedding <=> %s AS distance "
    "FROM documents WHERE category = %s ORDER BY distance LIMIT 3",
    (q, "space"),
).fetchall()
for content, dist in rows:
    print(f"      {dist:.3f}  {content}")

print("\n[5/10] Same query under different distance metrics (top hit each)...")
q = np.asarray(model.encode("brewing a hot caffeinated drink", normalize_embeddings=True))
for op, label in [("<->", "L2"), ("<=>", "cosine"), ("<#>", "neg-inner"), ("<+>", "L1")]:
    content, score = conn.execute(
        f"SELECT content, embedding {op} %s AS s FROM documents ORDER BY s LIMIT 1", (q,)
    ).fetchone()
    print(f"      {label:<10} {score:+.3f}  {content}")

We build an HNSW index on the embedding column to enable faster, more efficient vector search. We define a semantic search function that converts a query into an embedding and retrieves the most similar documents by cosine distance. We also perform metadata-filtered search and compare pgvector's distance operators: L2 (<->), cosine (<=>), negative inner product (<#>), and L1 (<+>).
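To make the four operators concrete, here is a minimal pure-Python sketch of what each one computes. The toy vectors and helper names are illustrative assumptions, not values from the notebook. Because the embeddings are unit-normalized, cosine distance equals one plus the negative inner product, and squared L2 distance equals twice the cosine distance, so those three metrics should rank documents identically. L1 can in principle order them differently.

```python
import math


def l2(a, b):
    """Euclidean distance, what <-> returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a, b):
    """Taxicab distance, what <+> returns."""
    return sum(abs(x - y) for x, y in zip(a, b))


def neg_inner(a, b):
    """Negative inner product, what <#> returns (smaller means more similar)."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity, what <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


# Toy data (hypothetical), normalized like the notebook's embeddings.
query = normalize([1.0, 2.0, 0.5])
docs = {
    "a": normalize([1.0, 1.8, 0.4]),
    "b": normalize([-1.0, 0.2, 2.0]),
    "c": normalize([0.1, 1.0, 1.0]),
}


def ranking(dist):
    """Document names ordered from nearest to farthest under dist."""
    return sorted(docs, key=lambda name: dist(query, docs[name]))


for name, fn in [("L2", l2), ("cosine", cosine_distance),
                 ("neg-inner", neg_inner), ("L1", l1)]:
    print(f"{name:<10} {ranking(fn)}")
```

This is why the choice among L2, cosine, and inner product matters little for normalized embeddings, while it can change results for unnormalized ones.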


print("\n[6/10] Half-precision storage with halfvec...")
conn.execute(f"ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_half halfvec({DIM})")
conn.execute("UPDATE documents SET embedding_half = embedding::halfvec")
conn.execute(
    "CREATE INDEX ON documents USING hnsw (embedding_half halfvec_cosine_ops)"
)
q_half = HalfVector(model.encode("the galaxy we live in", normalize_embeddings=True))
rows = conn.execute(
    "SELECT content, embedding_half <=> %s AS d FROM documents ORDER BY d LIMIT 2",
    (q_half,),
).fetchall()
for content, d in rows:
    print(f"      {d:.3f}  {content}")

print("\n[7/10] Binary quantization (Hamming) + exact re-rank...")
conn.execute(
    f"CREATE INDEX ON documents "
    f"USING hnsw ((binary_quantize(embedding)::bit({DIM})) bit_hamming_ops)"
)
q = np.asarray(model.encode("parallel hardware for AI training", normalize_embeddings=True))
# Inner query: cheap Hamming shortlist; outer query: exact cosine re-rank.
rerank_sql = f"""
    SELECT content, candidates.embedding <=> %(q)s AS exact_distance
    FROM (
        SELECT content, embedding
        FROM documents
        ORDER BY binary_quantize(embedding)::bit({DIM})
              <~> binary_quantize(%(q)s)::bit({DIM})
        LIMIT 8
    ) AS candidates
    ORDER BY exact_distance
    LIMIT 3
"""
for content, d in conn.execute(rerank_sql, {"q": q}).fetchall():
    print(f"      {d:.3f}  {content}")

print("\n[8/10] Native sparse vectors...")
conn.execute("DROP TABLE IF EXISTS sparse_items")
conn.execute("CREATE TABLE sparse_items (id bigserial PRIMARY KEY, embedding sparsevec(10))")
sparse_data = [
    SparseVector({0: 1.0, 3: 2.0, 7: 1.5}, 10),
    SparseVector({1: 0.5, 3: 1.0, 9: 3.0}, 10),
    SparseVector({0: 0.2, 4: 2.5, 7: 0.8}, 10),
]
with conn.cursor() as cur:
    cur.executemany("INSERT INTO sparse_items (embedding) VALUES (%s)",
                    [(v,) for v in sparse_data])
query_sparse = SparseVector({0: 1.0, 7: 1.0}, 10)
rows = conn.execute(
    "SELECT id, embedding, embedding <#> %s AS neg_ip "
    "FROM sparse_items ORDER BY neg_ip LIMIT 3",
    (query_sparse,),
).fetchall()
for _id, vec, neg_ip in rows:
    print(f"      id={_id}  inner_product={-neg_ip:.2f}  nnz_indices={vec.indices()}")

We explore advanced pgvector storage and retrieval techniques beyond standard dense vectors. We convert embeddings into half-precision vectors to reduce storage, use binary quantization with Hamming distance for fast candidate retrieval, and then re-rank the candidates with full-precision vectors. We also create sparse vectors and query them using inner-product similarity, which is useful for keyword-weighted or SPLADE-style retrieval.
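The two-stage pattern from step 7 can be sketched without a database. The following pure-Python illustration keeps one bit per dimension (1 for a positive component, 0 otherwise, which mirrors what binary_quantize does), shortlists by Hamming distance, then re-ranks the shortlist by exact cosine distance. The toy vectors and the search helper are hypothetical and chosen only to show the effect of the shortlist size.

```python
import math


def binary_quantize(v):
    """One bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in v]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query, docs, shortlist=2, k=1):
    """Coarse Hamming shortlist followed by an exact cosine re-rank."""
    qbits = binary_quantize(query)
    candidates = sorted(
        docs, key=lambda name: hamming(qbits, binary_quantize(docs[name]))
    )[:shortlist]
    return sorted(
        candidates, key=lambda name: cosine_distance(query, docs[name])
    )[:k]


# Toy data (hypothetical): "x" matches the query's sign pattern exactly,
# but "y" points in a much closer direction.
query = [0.9, 0.1, -0.2, 0.4]
docs = {
    "x": [0.1, 0.9, -0.1, 0.1],
    "y": [0.8, -0.05, -0.3, 0.5],
    "z": [-0.7, -0.2, 0.6, -0.1],
}

print(search(query, docs, shortlist=2, k=1))  # re-rank promotes "y"
print(search(query, docs, shortlist=1, k=1))  # shortlist too small, "y" is lost
```

The shortlist size plays the role of the inner LIMIT 8 in the SQL: a larger shortlist costs more exact distance computations but gives the re-rank a better chance of recovering the true nearest neighbors.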


print("\n[9/10] Hybrid search (vector + full-text) via RRF...")
user_query = "fast animal"
qvec = np.asarray(model.encode(user_query, normalize_embeddings=True))
# Each CTE ranks documents independently; the final SELECT fuses the ranks.
hybrid_sql = """
WITH semantic AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s) AS rank
    FROM documents
    ORDER BY embedding <=> %(qvec)s
    LIMIT 20
),
keyword AS (
    SELECT d.id,
           RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('english', d.content), q) DESC) AS rank
    FROM documents d, plainto_tsquery('english', %(qtext)s) AS q
    WHERE to_tsvector('english', d.content) @@ q
    LIMIT 20
)
SELECT d.content,
       COALESCE(1.0 / (60 + semantic.rank), 0.0)
     + COALESCE(1.0 / (60 + keyword.rank),  0.0) AS rrf_score
FROM documents d
LEFT JOIN semantic ON d.id = semantic.id
LEFT JOIN keyword  ON d.id = keyword.id
WHERE semantic.id IS NOT NULL OR keyword.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 4
"""
for content, score in conn.execute(hybrid_sql, {"qvec": qvec, "qtext": user_query}).fetchall():
    print(f"      {score:.5f}  {content}")

print("\n[10/10] Aggregating vectors with AVG (category centroid)...")
centroid = conn.execute(
    "SELECT AVG(embedding) FROM documents WHERE category = %s", ("food",)
).fetchone()[0]
typical = conn.execute(
    "SELECT content, embedding <=> %s AS d FROM documents "
    "WHERE category = %s ORDER BY d LIMIT 1",
    (np.asarray(centroid), "food"),
).fetchone()
print(f"      Centroid dim = {len(centroid)}")
print(f"      Most representative 'food' doc: {typical[0]}")

print("\nDone. You now have a working pgvector playground inside Colab.")
print("   Try editing `corpus`, the queries, or swap in your own embedding model.")

We combine semantic vector search with PostgreSQL full-text search using Reciprocal Rank Fusion (RRF). We retrieve results from both the semantic and keyword rankings, sum their reciprocal-rank scores, and produce a single hybrid ranking. Finally, we compute the average embedding for a category and use it as a centroid to find the most representative document in that group.
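The fusion step is simple enough to sketch in plain Python. In this minimal illustration, each ranked list contributes 1 / (k + rank) to a document's score, with k = 60 as in the SQL, and a document absent from a list contributes nothing. The function name rrf and the toy document ids are assumptions for illustration. Unlike SQL's RANK(), this sketch uses list positions and does not handle ties.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, with ranks starting at 1.
    Returns (id, score) pairs sorted from highest to lowest fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rankings (hypothetical ids): one from vector search, one from full-text.
semantic = ["octopus", "gpu", "sourdough"]
keyword = ["gpu", "chocolate"]

fused = rrf([semantic, keyword])
for doc_id, score in fused:
    print(f"{score:.5f}  {doc_id}")
```

A document that appears in both lists, such as "gpu" here, outranks one that tops only a single list, which is the behavior the hybrid query relies on. Because only ranks are used, the raw distance and ts_rank_cd scores never need to be put on a common scale.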

In conclusion, we now have a working pgvector-based retrieval system that runs entirely in Google Colab, without external services or API keys. We used PostgreSQL not just as a traditional relational database, but as a flexible vector search engine that supports dense vectors, half-precision vectors, binary-quantized retrieval, sparse vectors, full-text search, and aggregation. We also saw how metadata filtering, HNSW indexing, Reciprocal Rank Fusion, and centroid-based analysis make pgvector useful for real-world AI search pipelines.


The post A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System appeared first on MarkTechPost.
