How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama
In this tutorial, we show how to build and query a local knowledge base with OpenKB using a free, open model via OpenRouter. We securely collect the API key with getpass, set up the environment without hardcoding secrets, and initialize a structured, wiki-style knowledge base from scratch. As we move through the workflow, we add source documents, generate summaries and concept pages, inspect the resulting wiki structure, run queries, save explorations, and even perform programmatic analysis of cross-links and page relationships. Along the way, we demonstrate how to turn raw Markdown documents into a navigable, synthesized knowledge system that supports both interactive querying and incremental updates.
import subprocess, sys

def run(cmd, capture=False, cwd=None):
    """Run a shell command; optionally capture and return its output."""
    result = subprocess.run(
        cmd, shell=True, text=True,
        capture_output=capture, cwd=cwd
    )
    if capture:
        return result.stdout.strip(), result.stderr.strip()
    return result.returncode

print("\nInstalling OpenKB…")
run("pip install openkb --quiet")
print("OpenKB installed.\n")
import getpass, os

print("━" * 60)
print("Secure API Key Setup")
print("━" * 60)
print(" Provider : OpenRouter (https://openrouter.ai)")
print(" Model    : meta-llama/llama-3.3-70b-instruct:free")
print(" Sign-up  : free, no credit card required")
print("━" * 60)

# Read the key without echoing it; never hardcode secrets in the notebook.
OPENROUTER_API_KEY = getpass.getpass("\nPaste your OpenRouter API key (hidden): ").strip()
if not OPENROUTER_API_KEY:
    raise ValueError("No API key provided. Please re-run and enter a valid key.")

os.environ["OPENROUTER_API_KEY"] = OPENROUTER_API_KEY
os.environ["LLM_API_KEY"] = OPENROUTER_API_KEY
LLM_MODEL = "openrouter/meta-llama/llama-3.3-70b-instruct:free"
print("\nAPI key set (not printed). Model:", LLM_MODEL, "\n")
import json, textwrap, time, re, shutil
from pathlib import Path
from collections import Counter

KB_DIR = Path("/content/my_knowledge_base")
wiki_dir = KB_DIR / "wiki"
raw_dir = KB_DIR / "raw"

def kb_cmd(command: str) -> str:
    """Run an `openkb` subcommand inside the KB directory and return its output."""
    stdout, stderr = run(f"openkb {command}", capture=True, cwd=str(KB_DIR))
    return stdout or stderr

def section(title: str):
    """Print a boxed section header."""
    bar = "─" * (len(title) + 4)
    print(f"\n┌{bar}┐")
    print(f"│  {title}  │")
    print(f"└{bar}┘")

def show_tree(root: Path, indent=0, max_depth=3):
    """Recursively print a directory tree up to max_depth levels."""
    if indent > max_depth:
        return
    prefix = "  " * indent + ("└─ " if indent else "")
    print(prefix + root.name + ("/" if root.is_dir() else ""))
    if root.is_dir():
        for child in sorted(root.iterdir()):
            show_tree(child, indent + 1, max_depth)

def show_md(path: Path, max_lines=35):
    """Print the first max_lines lines of a Markdown file."""
    lines = path.read_text().splitlines()
    for line in lines[:max_lines]:
        print(line)
    if len(lines) > max_lines:
        print(f"  … ({len(lines) - max_lines} more lines)")

def print_wrapped(text: str, width=90):
    """Print text wrapped to a fixed width, preserving blank lines."""
    for line in text.splitlines():
        print(textwrap.fill(line, width=width, subsequent_indent="  ") if line else "")
We install OpenKB and prepare the Colab environment to run the full workflow smoothly. We securely collect the OpenRouter API key using getpass, store it in environment variables, and configure the free Llama 3.3 70B model without hardcoding any secrets. We also import all required libraries, define the core paths, and create the helper functions we use throughout the tutorial to run commands, print sections, and inspect generated files.
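Before moving on, it can be worth confirming that the key actually works. The snippet below is a minimal, optional sketch (not part of OpenKB) that sends a tiny test request to OpenRouter's OpenAI-compatible chat-completions endpoint; note that the model name on the wire omits the `openrouter/` routing prefix we use in `LLM_MODEL`.

import requests

# Optional sanity check: one tiny request against OpenRouter's
# OpenAI-compatible /chat/completions endpoint.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.3-70b-instruct:free",
        "messages": [{"role": "user", "content": "Reply with OK."}],
        "max_tokens": 5,
    },
    timeout=60,
)
print("Key check:", "OK" if resp.ok else f"failed (HTTP {resp.status_code})")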
DOCS = {
    "transformer_architecture.md": textwrap.dedent("""
        # Transformer Architecture

        ## Overview
        The Transformer is a deep learning architecture introduced in "Attention Is All
        You Need" (Vaswani et al., 2017). It replaced recurrent networks with a
        self-attention mechanism, enabling parallel training and better long-range
        dependency modelling.

        ## Key Components
        - **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
          with its own learned Q/K/V projections, then concatenates and projects.
        - **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
          applied position-wise.
        - **Positional Encoding**: Sinusoidal or learned embeddings that inject
          sequence-order information, since attention is permutation-invariant.
        - **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
          sub-layer, stabilising gradients.
        - **Residual Connections**: Added around each sub-layer to ease gradient flow.

        ## Encoder vs Decoder
        The encoder stack processes input tokens bidirectionally (e.g. BERT).
        The decoder stack uses causal (masked) attention over previous outputs, plus
        cross-attention over encoder outputs in encoder-decoder models (e.g. T5);
        decoder-only models (e.g. GPT) omit the cross-attention.

        ## Scaling Laws
        Kaplan et al. (2020) showed that model loss decreases predictably as a power
        law with compute, data, and parameter count. This motivated GPT-3 (175B) and
        subsequent large language models.

        ## Limitations
        - Quadratic complexity in sequence length: O(n^2)
        - No inherent recurrence -> long-context challenges
        - High memory footprint during training

        ## References
        Vaswani et al. (2017). Attention Is All You Need. NeurIPS.
        Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
    """),
    "rag_systems.md": textwrap.dedent("""
        # Retrieval-Augmented Generation (RAG)

        ## Definition
        RAG augments a generative LLM with a retrieval step: given a query, relevant
        documents are fetched from a corpus and prepended to the prompt, giving the
        model grounded context beyond its training data.

        ## Architecture
        1. **Indexing Phase** — Documents are chunked, embedded via a bi-encoder
           (e.g. text-embedding-3-large), and stored in a vector database (e.g.
           Faiss, Pinecone, Weaviate).
        2. **Retrieval Phase** — The user query is embedded; approximate nearest-
           neighbour (ANN) search returns the top-k chunks.
        3. **Generation Phase** — Retrieved chunks + query are passed to the LLM,
           which synthesises a final answer.

        ## Variants
        - **Dense Retrieval**: DPR, Contriever — queries and docs in the same space.
        - **Sparse Retrieval**: BM25 — term-frequency based, no embeddings needed.
        - **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse.
        - **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.

        ## Challenges
        - Context window limits: long retrieved passages may not fit.
        - Retrieval quality is a hard ceiling on generation quality.
        - Chunking strategy significantly impacts recall.
        - Multi-hop questions require iterative retrieval (IRCoT, ReAct).

        ## Relationship to Transformers
        RAG systems rely on transformer-based encoders for embedding and decoder
        models for generation. The quality of the embedding model directly determines
        retrieval precision and recall.

        ## References
        Lewis et al. (2020). RAG for Knowledge-Intensive NLP Tasks. NeurIPS.
        Gao et al. (2023). RAG for Large Language Models. arXiv:2312.10997.
    """),
    "knowledge_graph_integration.md": textwrap.dedent("""
        # Knowledge Graphs and LLM Integration

        ## What is a Knowledge Graph?
        A knowledge graph (KG) is a directed labelled graph of entities (nodes) and
        relations (edges): (subject, predicate, object) triples, e.g.
        (Vaswani, authored, "Attention Is All You Need").

        ## Why Combine KGs with LLMs?
        LLMs hallucinate facts; KGs provide structured, verifiable ground truth.
        KGs are hard to query in natural language; LLMs provide the interface.
        Together they enable faithful, grounded, explainable question answering.

        ## Integration Strategies
        ### KG-Augmented Generation (KGAG)
        Retrieve triples or sub-graphs instead of text chunks, serialise them into
        text, then feed them to the LLM prompt.
        ### LLM-Assisted KG Construction
        LLMs extract (subject, relation, object) triples from unstructured text,
        reducing manual curation effort significantly.
        ### GraphRAG (Microsoft Research, 2024)
        GraphRAG clusters document communities, generates community summaries, and
        stores them in a KG. Queries answered by map-reduce over community summaries
        outperform flat-vector RAG on sensemaking tasks.

        ## Challenges
        - KG construction quality depends on extraction LLM accuracy.
        - Graph databases add infrastructure complexity.
        - Ontology design requires domain expertise.
        - KGs go stale without continuous update pipelines.

        ## Relation to RAG and Transformers
        KG integration addresses two key RAG limitations: lack of structured reasoning
        and inability to follow multi-hop relations.

        ## References
        Pan et al. (2023). Unifying LLMs and KGs. IEEE Intelligent Systems.
    """),
}
We define the sample source documents that we want to load into the knowledge base. We prepare rich Markdown content on transformer architecture, retrieval-augmented generation, and knowledge graph integration so that OpenKB has meaningful material to summarise and connect. In essence, we build the raw knowledge corpus here, which serves as the foundation for all subsequent indexing, synthesis, and querying steps.
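Before handing these files to OpenKB, a quick sanity check on the corpus can be useful. This is a small optional sketch of our own, not an OpenKB requirement: it just counts headings and words per document so we know the material is substantial enough to summarise.

# Quick corpus overview (optional): headings and word counts per document.
for fname, content in DOCS.items():
    headings = [l for l in content.splitlines() if l.lstrip().startswith("#")]
    print(f"{fname:<38} {len(headings):>2} headings, {len(content.split()):>4} words")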
part("Step 1 — Initialise Knowledge Base")
if KB_DIR.exists():
shutil.rmtree(KB_DIR)
KB_DIR.mkdir(mother and father=True)
config_dir = KB_DIR / ".openkb"
config_dir.mkdir()
(config_dir / "config.yaml").write_text(
f"mannequin: {LLM_MODEL}nlanguage: ennpageindex_threshold: 20n"
)
(KB_DIR / ".env").write_text(
f"OPENROUTER_API_KEY={OPENROUTER_API_KEY}n"
f"LLM_API_KEY={OPENROUTER_API_KEY}n"
)
for sub in ["sources", "summaries", "concepts", "explorations", "reports"]:
(wiki_dir / sub).mkdir(mother and father=True)
(wiki_dir / "AGENTS.md").write_text(textwrap.dedent("""
# Wiki Schema
## Conventions
- All pages use Markdown with [[wikilinks]] for cross-references.
- `summaries/` -- one web page per supply doc.
- `ideas/` -- cross-document synthesis pages.
- `index.md` -- data base overview.
- `log.md` -- operations timeline.
## Concept web page template
# <Concept Title>
## Overview
## Key Points
## Related Concepts
## Sources
"""))
(wiki_dir / "index.md").write_text("# Knowledge Base IndexnnNo paperwork listed but.n")
(wiki_dir / "log.md").write_text("# Operations Lognn")
raw_dir.mkdir()
for fname, content material in DOCS.gadgets():
(raw_dir / fname).write_text(content material)
print(f"
Knowledge base initialised at: {KB_DIR}")
print(f" Model : {LLM_MODEL}")
print(f" Docs : {record(DOCS.keys())}")
part("Step 2 — Compile Documents into the Wiki")
print("Each doc is learn by the LLM, which writes summaries + idea pages.n")
for fname in DOCS:
doc_path = raw_dir / fname
print(f"
Adding: {fname}")
out = kb_cmd(f"add {doc_path}")
print(textwrap.indent(out[:600], " "))
print()
time.sleep(1)
print("
All paperwork compiled.")
part("Step 3 — Explore the Generated Wiki")
print("n
Directory tree (wiki/):n")
show_tree(wiki_dir, max_depth=3)
print("nn
wiki/index.md:")
print("─" * 50)
show_md(wiki_dir / "index.md")
print("nn
wiki/log.md:")
print("─" * 50)
show_md(wiki_dir / "log.md")
ideas = record((wiki_dir / "ideas").glob("*.md"))
print(f"nn
Generated idea pages ({len(ideas)}):")
for cp in sorted(ideas):
print(f" • {cp.identify}")
if ideas:
print(f"nn
Sample idea — {ideas[0].identify}:")
print("─" * 50)
show_md(ideas[0])
We initialize the OpenKB knowledge base, create the required directory structure, and write the configuration and environment files the tool needs. We then save the sample documents into the raw folder and compile them into the wiki so that OpenKB can generate summaries, concepts, and cross-linked knowledge pages. After that, we inspect the generated wiki structure, preview important files such as the index and log, and review the concept pages generated from our input documents.
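One practical caveat: free OpenRouter models are rate-limited, so `openkb add` can occasionally fail partway through compilation. A small retry wrapper around our `kb_cmd` helper is one way to harden the loop above; this is our own sketch, and the error markers and backoff values are assumptions rather than documented OpenKB behaviour.

# Hypothetical retry helper for rate-limited free-tier models: re-runs the
# same OpenKB command with exponential backoff if the output suggests a
# rate-limit error. The "429"/"rate limit" markers are assumptions.
def kb_cmd_retry(command: str, attempts: int = 3, base_delay: float = 5.0) -> str:
    out = ""
    for attempt in range(attempts):
        out = kb_cmd(command)
        if "429" not in out and "rate limit" not in out.lower():
            return out
        time.sleep(base_delay * (2 ** attempt))  # waits 5s, 10s, 20s
    return out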
part("Step 4 — List Indexed Content & Status")
print("── openkb record ──")
print(kb_cmd("record"))
print("n── openkb standing ──")
print(kb_cmd("standing"))
part("Step 5 — Query the Knowledge Base")
QUERIES = [
"What is the Transformer architecture and what problem did it solve?",
"How does RAG differ from a traditional knowledge base like OpenKB?",
"What are the connections between knowledge graphs, RAG, and transformers?",
"What are the shared limitations across all three AI topics covered?",
]
for i, question in enumerate(QUERIES, 1):
print(f"n
Query {i}: {question}")
print("─" * 60)
print_wrapped(kb_cmd(f'question "{question}"'))
part("Step 6 — Save a Deep Synthesis Query")
deep_query = (
"Synthesise the important thing architectural themes throughout transformers, RAG, and "
"data graphs into a unified psychological mannequin."
)
print(f"
Query: {deep_query}n")
out = kb_cmd(f'question "{deep_query}" --save')
print_wrapped(out[:800])
explorations = record((wiki_dir / "explorations").glob("*.md"))
if explorations:
print(f"n
Saved → {explorations[-1].identify}")
print("─" * 50)
show_md(explorations[-1])
part("Step 7 — Lint: Wiki Health Checks")
print(kb_cmd("lint"))
reviews = record((wiki_dir / "reviews").glob("*.md"))
if reviews:
print(f"n
Report — {reviews[-1].identify}:")
print("─" * 50)
show_md(reviews[-1])
We examine the indexed content using the built-in list and status commands to understand what OpenKB has created so far. We then query the knowledge base with several increasingly complex questions and observe how the system synthesizes answers from the stored knowledge. Finally, we save a deeper exploration query into the wiki and run lint checks to evaluate the health, consistency, and completeness of the generated knowledge base.
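If we want a durable record of the answers rather than just printed output, we can batch the queries into a Markdown report inside the wiki. This is a convention of our own (not an OpenKB command), and it re-runs each query, so it costs additional API calls.

# Hypothetical batch runner: append each query/answer pair to a Markdown
# report under wiki/reports/ so the whole session is reproducible.
report_path = wiki_dir / "reports" / "query_session.md"
with report_path.open("a") as f:
    for q in QUERIES:
        answer = kb_cmd(f'query "{q}"')
        f.write(f"## {q}\n\n{answer}\n\n")
print(f"Session log written to {report_path}")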
part("Step 8 — Programmatic Wiki Analysis (Python)")
wiki_pages = {}
for md_file in wiki_dir.rglob("*.md"):
rel = str(md_file.relative_to(wiki_dir))
content material = md_file.read_text()
hyperlinks = re.findall(r'[[([^]]+)]]', content material)
wiki_pages[rel] = {"traces": len(content material.splitlines()), "wikilinks": hyperlinks}
print(f"Total wiki pages : {len(wiki_pages)}n")
print(f"{'Page':<45} {'Lines':>6} {'Links':>5}")
print("─" * 60)
for web page, m in sorted(wiki_pages.gadgets()):
print(f" {web page:<43} {m['lines']:>6} {len(m['wikilinks']):>5}")
link_targets = Counter(
hyperlink for m in wiki_pages.values() for hyperlink in m["wikilinks"]
)
if link_targets:
print("n
Most-referenced wiki pages (hub ideas):")
for web page, depend in link_targets.most_common(8):
print(f" {depend:>3}x [[{page}]]")
print("n
Cross-reference graph:")
for web page, m in sorted(wiki_pages.gadgets()):
if m["wikilinks"]:
proven = ", ".be part of(f"[[{l}]]" for l in m["wikilinks"][:4])
extra = f" +{len(m['wikilinks'])-4} extra" if len(m["wikilinks"]) > 4 else ""
print(f" {web page}")
print(f" -> {proven}{extra}")
part("Step 9 — Incremental Update: Add a 4th Document")
new_doc = raw_dir / "sparse_attention.md"
new_doc.write_text(textwrap.dedent("""
# Sparse Attention Mechanisms
## Motivation
Standard transformer consideration is O(n^2) in sequence size, limiting context
home windows. Sparse consideration patterns cut back this to O(n log n) or O(n*sqrt(n)).
## Key Approaches
- **Longformer** (Beltagy et al., 2020): native sliding-window + world tokens.
- **BigBird** (Zaheer et al., 2020): random + window + world; Turing-complete.
- **Flash Attention** (Dao et al., 2022): actual consideration, hardware-aware CUDA
tiling. Not sparse however dramatically sooner in observe.
## Impact on RAG
Larger context home windows cut back the necessity for chunking and retrieval. However,
retrieval nonetheless helps for corpora bigger than any single context window.
## References
Beltagy et al. (2020). Longformer. arXiv:2004.05150.
Zaheer et al. (2020). Big Bird. NeurIPS.
Dao et al. (2022). FlashAttention. NeurIPS.
"""))
concepts_before = len(record((wiki_dir / "ideas").glob("*.md")))
print(f"Adding: {new_doc.identify}")
print_wrapped(kb_cmd(f"add {new_doc}")[:500])
concepts_after = record((wiki_dir / "ideas").glob("*.md"))
print(f"n
Concept pages: {concepts_before} -> {len(concepts_after)}")
for c in sorted(concepts_after, key=lambda p: p.stat().st_mtime, reverse=True)[:3]:
print(f" • {c.identify}")
part("Tutorial Complete
")
print(textwrap.dedent(f"""
What we lined
───────────────
1. Installed OpenKB
2. Entered API key securely by way of getpass (by no means printed/saved in code)
3. Used FREE open mannequin: meta-llama/llama-3.3-70b-instruct by way of OpenRouter
4. Initialised KB at {KB_DIR}
5. Created 3 AI analysis docs and compiled them into a wiki
6. Explored auto-generated summaries, idea pages, and index
7. Listed content material (openkb record) and checked stats (openkb standing)
8. Ran 4 queries of accelerating complexity
9. Saved a deep synthesis question to wiki/explorations/
10. Linted the wiki for well being points (contradictions, orphans, gaps)
11. Analysed the wiki graph programmatically (hub pages, cross-refs)
12. Added a 4th doc -- demonstrated incremental reside updates
Other free OpenRouter fashions to attempt (change LLM_MODEL):
────────────────────────────────────────────────────────
openrouter/mistralai/mistral-7b-instruct:free
openrouter/google/gemma-3-27b-it:free
openrouter/qwen/qwen3-14b:free
openrouter/microsoft/phi-4-reasoning:free
Docs: https://github.com/VectifyAI/OpenKB
"""))
We analyze the generated wiki programmatically by reading Markdown pages, counting lines, extracting wikilinks, and identifying the most referenced concepts. We also visualize the internal cross-reference structure so that we can better understand how the knowledge base is connected and which pages act as hubs. In the final part, we add a new document incrementally, watch how the wiki updates, and conclude the tutorial with a full summary of everything we built and tested.
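The same page and link data also lets us flag potential orphan pages, i.e., pages that no other page links to, which complements the `openkb lint` report. This is a minimal sketch that assumes wikilink targets refer to pages by filename stem, which may not match OpenKB's exact linking convention.

# Orphan detection (sketch): pages whose filename stem never appears as a
# wikilink target. Index, log, and schema pages are excluded by design.
linked = {t.strip().lower() for t in link_targets}
orphans = [
    page for page in wiki_pages
    if Path(page).stem.lower() not in linked
    and Path(page).name not in ("index.md", "log.md", "AGENTS.md")
]
print(f"Potential orphan pages ({len(orphans)}):")
for page in orphans:
    print(f"  • {page}")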
In conclusion, we created a full OpenKB workflow using a Llama model served via OpenRouter, while keeping API access secure and easy to manage. We initialized the knowledge base, ingested several research documents, generated linked wiki artifacts, queried the compiled knowledge, and validated the structure through linting and Python-based inspection. We also showed how to extend the knowledge base incrementally by adding new material without rebuilding everything from scratch. The result is a practical, reproducible foundation for using OpenKB as a lightweight, LLM-powered system for knowledge organization, synthesis, and exploration.