How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama
In this tutorial, we show how to build and query a local knowledge base with OpenKB using a free, open model via OpenRouter. We securely collect the API key with getpass, set up the environment without hardcoding secrets, and initialize a structured, wiki-style knowledge base from scratch. As we move through the workflow, we add source documents, generate summaries and concept pages, inspect the resulting wiki structure, run queries, save explorations, and even perform programmatic analysis of cross-links and page relationships. Along the way, we demonstrate how to turn raw Markdown documents into a navigable, synthesized knowledge system that supports both interactive querying and incremental updates.
import subprocess, sys

def run(cmd, capture=False, cwd=None):
    """Run a shell command; optionally capture and return its output."""
    result = subprocess.run(
        cmd, shell=True, text=True,
        capture_output=capture, cwd=cwd
    )
    if capture:
        return result.stdout.strip(), result.stderr.strip()
    return result.returncode

print("\nInstalling OpenKB…")
run("pip install openkb --quiet")
print("OpenKB installed.\n")
import getpass, os

print("━" * 60)
print("Secure API Key Setup")
print("━" * 60)
print(" Provider : OpenRouter (https://openrouter.ai)")
print(" Model    : meta-llama/llama-3.3-70b-instruct:free")
print(" Sign-up  : free, no credit card required")
print("━" * 60)

# Read the key without echoing it; never hardcode secrets in the notebook.
OPENROUTER_API_KEY = getpass.getpass("\nPaste your OpenRouter API key (hidden): ").strip()
if not OPENROUTER_API_KEY:
    raise ValueError("No API key provided. Please re-run and enter a valid key.")

os.environ["OPENROUTER_API_KEY"] = OPENROUTER_API_KEY
os.environ["LLM_API_KEY"] = OPENROUTER_API_KEY
LLM_MODEL = "openrouter/meta-llama/llama-3.3-70b-instruct:free"
print("\nAPI key set (not printed). Model:", LLM_MODEL, "\n")
import json, textwrap, time, re, shutil
from pathlib import Path
from collections import Counter

KB_DIR = Path("/content/my_knowledge_base")
wiki_dir = KB_DIR / "wiki"
raw_dir = KB_DIR / "raw"

def kb_cmd(command: str) -> str:
    """Run an `openkb` subcommand inside the KB directory and return its output."""
    stdout, stderr = run(f"openkb {command}", capture=True, cwd=str(KB_DIR))
    return stdout or stderr

def section(title: str):
    """Print a boxed section header."""
    bar = "─" * (len(title) + 4)
    print(f"\n┌{bar}┐")
    print(f"│  {title}  │")
    print(f"└{bar}┘")

def show_tree(root: Path, indent=0, max_depth=3):
    """Recursively print a directory tree up to max_depth levels."""
    if indent > max_depth:
        return
    prefix = "  " * indent + ("└─ " if indent else "")
    print(prefix + root.name + ("/" if root.is_dir() else ""))
    if root.is_dir():
        for child in sorted(root.iterdir()):
            show_tree(child, indent + 1, max_depth)

def show_md(path: Path, max_lines=35):
    """Print the first max_lines lines of a Markdown file."""
    lines = path.read_text().splitlines()
    for line in lines[:max_lines]:
        print(line)
    if len(lines) > max_lines:
        print(f"  … ({len(lines) - max_lines} more lines)")

def print_wrapped(text: str, width=90):
    """Print text wrapped to a fixed width, preserving blank lines."""
    for line in text.splitlines():
        print(textwrap.fill(line, width=width, subsequent_indent="  ") if line else "")
We install OpenKB and prepare the Colab environment to run the full workflow smoothly. We securely collect the OpenRouter API key using getpass, store it in environment variables, and configure the free Llama 3.3 70B model without hardcoding any secrets. We also import all required libraries, define the core paths, and create the helper functions we use throughout the tutorial to run commands, print sections, and inspect generated files.
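Before moving on, it can be worth confirming that the key actually works. The snippet below is a minimal, optional sketch (not part of OpenKB) that sends a tiny test request to OpenRouter's OpenAI-compatible chat-completions endpoint; note that the model name on the wire omits the `openrouter/` routing prefix we use in `LLM_MODEL`.

import requests

# Optional sanity check: one tiny request against OpenRouter's
# OpenAI-compatible /chat/completions endpoint.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.3-70b-instruct:free",
        "messages": [{"role": "user", "content": "Reply with OK."}],
        "max_tokens": 5,
    },
    timeout=60,
)
print("Key check:", "OK" if resp.ok else f"failed (HTTP {resp.status_code})")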
DOCS = {
    "transformer_architecture.md": textwrap.dedent("""
        # Transformer Architecture

        ## Overview
        The Transformer is a deep learning architecture introduced in "Attention Is All
        You Need" (Vaswani et al., 2017). It replaced recurrent networks with a
        self-attention mechanism, enabling parallel training and better long-range
        dependency modelling.

        ## Key Components
        - **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
          with its own learned Q/K/V projections, then concatenates and projects.
        - **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
          applied position-wise.
        - **Positional Encoding**: Sinusoidal or learned embeddings that inject
          sequence-order information, since attention is permutation-invariant.
        - **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
          sub-layer, stabilising gradients.
        - **Residual Connections**: Added around each sub-layer to ease gradient flow.

        ## Encoder vs Decoder
        The encoder stack processes input tokens bidirectionally (e.g. BERT).
        The decoder stack uses causal (masked) attention over previous outputs, plus
        cross-attention over encoder outputs in encoder-decoder models (e.g. T5);
        decoder-only models (e.g. GPT) omit the cross-attention.

        ## Scaling Laws
        Kaplan et al. (2020) showed that model loss decreases predictably as a power
        law with compute, data, and parameter count. This motivated GPT-3 (175B) and
        subsequent large language models.

        ## Limitations
        - Quadratic complexity in sequence length: O(n^2)
        - No inherent recurrence -> long-context challenges
        - High memory footprint during training

        ## References
        Vaswani et al. (2017). Attention Is All You Need. NeurIPS.
        Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
    """),
    "rag_systems.md": textwrap.dedent("""
        # Retrieval-Augmented Generation (RAG)

        ## Definition
        RAG augments a generative LLM with a retrieval step: given a query, relevant
        documents are fetched from a corpus and prepended to the prompt, giving the
        model grounded context beyond its training data.

        ## Architecture
        1. **Indexing Phase** — Documents are chunked, embedded via a bi-encoder
           (e.g. text-embedding-3-large), and stored in a vector database (e.g.
           Faiss, Pinecone, Weaviate).
        2. **Retrieval Phase** — The user query is embedded; approximate nearest-
           neighbour (ANN) search returns the top-k chunks.
        3. **Generation Phase** — Retrieved chunks + query are passed to the LLM,
           which synthesises a final answer.

        ## Variants
        - **Dense Retrieval**: DPR, Contriever — queries and docs in the same space.
        - **Sparse Retrieval**: BM25 — term-frequency based, no embeddings needed.
        - **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse.
        - **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.

        ## Challenges
        - Context window limits: long retrieved passages may not fit.
        - Retrieval quality is a hard ceiling on generation quality.
        - Chunking strategy significantly impacts recall.
        - Multi-hop questions require iterative retrieval (IRCoT, ReAct).

        ## Relationship to Transformers
        RAG systems rely on transformer-based encoders for embedding and decoder
        models for generation. The quality of the embedding model directly determines
        retrieval precision and recall.

        ## References
        Lewis et al. (2020). RAG for Knowledge-Intensive NLP Tasks. NeurIPS.
        Gao et al. (2023). RAG for Large Language Models. arXiv:2312.10997.
    """),
    "knowledge_graph_integration.md": textwrap.dedent("""
        # Knowledge Graphs and LLM Integration

        ## What is a Knowledge Graph?
        A knowledge graph (KG) is a directed labelled graph of entities (nodes) and
        relations (edges): (subject, predicate, object) triples, e.g.
        (Vaswani, authored, "Attention Is All You Need").

        ## Why Combine KGs with LLMs?
        LLMs hallucinate facts; KGs provide structured, verifiable ground truth.
        KGs are hard to query in natural language; LLMs provide the interface.
        Together they enable faithful, grounded, explainable question answering.

        ## Integration Strategies
        ### KG-Augmented Generation (KGAG)
        Retrieve triples or sub-graphs instead of text chunks, serialise them into
        text, then feed them to the LLM prompt.
        ### LLM-Assisted KG Construction
        LLMs extract (subject, relation, object) triples from unstructured text,
        reducing manual curation effort significantly.
        ### GraphRAG (Microsoft Research, 2024)
        GraphRAG clusters document communities, generates community summaries, and
        stores them in a KG. Queries answered by map-reduce over community summaries
        outperform flat-vector RAG on sensemaking tasks.

        ## Challenges
        - KG construction quality depends on extraction LLM accuracy.
        - Graph databases add infrastructure complexity.
        - Ontology design requires domain expertise.
        - KGs go stale without continuous update pipelines.

        ## Relation to RAG and Transformers
        KG integration addresses two key RAG limitations: lack of structured reasoning
        and inability to follow multi-hop relations.

        ## References
        Pan et al. (2023). Unifying LLMs and KGs. IEEE Intelligent Systems.
    """),
}
We define the sample source documents that we want to load into the knowledge base. We prepare rich Markdown content on transformer architecture, retrieval-augmented generation, and knowledge graph integration so that OpenKB has meaningful material to summarise and connect. In essence, we build the raw knowledge corpus here, which serves as the foundation for all subsequent indexing, synthesis, and querying steps.
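Before handing these files to OpenKB, a quick sanity check on the corpus can be useful. This is a small optional sketch of our own, not an OpenKB requirement: it just counts headings and words per document so we know the material is substantial enough to summarise.

# Quick corpus overview (optional): headings and word counts per document.
for fname, content in DOCS.items():
    headings = [l for l in content.splitlines() if l.lstrip().startswith("#")]
    print(f"{fname:<38} {len(headings):>2} headings, {len(content.split()):>4} words")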
part("Step 1 — Initialise Knowledge Base")
if KB_DIR.exists():
shutil.rmtree(KB_DIR)
KB_DIR.mkdir(mother and father=True)
config_dir = KB_DIR / ".openkb"
config_dir.mkdir()
(config_dir / "config.yaml").write_text(
f"mannequin: {LLM_MODEL}nlanguage: ennpageindex_threshold: 20n"
)
(KB_DIR / ".env").write_text(
f"OPENROUTER_API_KEY={OPENROUTER_API_KEY}n"
f"LLM_API_KEY={OPENROUTER_API_KEY}n"
)
for sub in ["sources", "summaries", "concepts", "explorations", "reports"]:
(wiki_dir / sub).mkdir(mother and father=True)
(wiki_dir / "AGENTS.md").write_text(textwrap.dedent("""
# Wiki Schema
## Conventions
- All pages use Markdown with [[wikilinks]] for cross-references.
- `summaries/` -- one web page per supply doc.
- `ideas/` -- cross-document synthesis pages.
- `index.md` -- data base overview.
- `log.md` -- operations timeline.
## Concept web page template
# <Concept Title>
## Overview
## Key Points
## Related Concepts
## Sources
"""))
(wiki_dir / "index.md").write_text("# Knowledge Base IndexnnNo paperwork listed but.n")
(wiki_dir / "log.md").write_text("# Operations Lognn")
raw_dir.mkdir()
for fname, content material in DOCS.gadgets():
(raw_dir / fname).write_text(content material)
print(f"
Knowledge base initialised at: {KB_DIR}")
print(f" Model : {LLM_MODEL}")
print(f" Docs : {record(DOCS.keys())}")
part("Step 2 — Compile Documents into the Wiki")
print("Each doc is learn by the LLM, which writes summaries + idea pages.n")
for fname in DOCS:
doc_path = raw_dir / fname
print(f"
Adding: {fname}")
out = kb_cmd(f"add {doc_path}")
print(textwrap.indent(out[:600], " "))
print()
time.sleep(1)
print("
All paperwork compiled.")
part("Step 3 — Explore the Generated Wiki")
print("n
Directory tree (wiki/):n")
show_tree(wiki_dir, max_depth=3)
print("nn
wiki/index.md:")
print("─" * 50)
show_md(wiki_dir / "index.md")
print("nn
wiki/log.md:")
print("─" * 50)
show_md(wiki_dir / "log.md")
ideas = record((wiki_dir / "ideas").glob("*.md"))
print(f"nn
Generated idea pages ({len(ideas)}):")
for cp in sorted(ideas):
print(f" • {cp.identify}")
if ideas:
print(f"nn
Sample idea — {ideas[0].identify}:")
print("─" * 50)
show_md(ideas[0])
We initialize the OpenKB knowledge base, create the required directory structure, and write the configuration and environment files the tool needs. We then save the sample documents into the raw folder and compile them into the wiki so that OpenKB can generate summaries, concepts, and cross-linked knowledge pages. After that, we inspect the generated wiki structure, preview important files such as the index and log, and review the concept pages generated from our input documents.
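One practical caveat: free OpenRouter models are rate-limited, so `openkb add` can occasionally fail partway through compilation. A small retry wrapper around our `kb_cmd` helper is one way to harden the loop above; this is our own sketch, and the error markers and backoff values are assumptions rather than documented OpenKB behaviour.

# Hypothetical retry helper for rate-limited free-tier models: re-runs the
# same OpenKB command with exponential backoff if the output suggests a
# rate-limit error. The "429"/"rate limit" markers are assumptions.
def kb_cmd_retry(command: str, attempts: int = 3, base_delay: float = 5.0) -> str:
    out = ""
    for attempt in range(attempts):
        out = kb_cmd(command)
        if "429" not in out and "rate limit" not in out.lower():
            return out
        time.sleep(base_delay * (2 ** attempt))  # waits 5s, 10s, 20s
    return out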
part("Step 4 — List Indexed Content & Status")
print("── openkb record ──")
print(kb_cmd("record"))
print("n── openkb standing ──")
print(kb_cmd("standing"))
part("Step 5 — Query the Knowledge Base")
QUERIES = [
"What is the Transformer architecture and what problem did it solve?",
"How does RAG differ from a traditional knowledge base like OpenKB?",
"What are the connections between knowledge graphs, RAG, and transformers?",
"What are the shared limitations across all three AI topics covered?",
]
for i, question in enumerate(QUERIES, 1):
print(f"n
Query {i}: {question}")
print("─" * 60)
print_wrapped(kb_cmd(f'question "{question}"'))
part("Step 6 — Save a Deep Synthesis Query")
deep_query = (
"Synthesise the important thing architectural themes throughout transformers, RAG, and "
"data graphs into a unified psychological mannequin."
)
print(f"
Query: {deep_query}n")
out = kb_cmd(f'question "{deep_query}" --save')
print_wrapped(out[:800])
explorations = record((wiki_dir / "explorations").glob("*.md"))
if explorations:
print(f"n
Saved → {explorations[-1].identify}")
print("─" * 50)
show_md(explorations[-1])
part("Step 7 — Lint: Wiki Health Checks")
print(kb_cmd("lint"))
reviews = record((wiki_dir / "reviews").glob("*.md"))
if reviews:
print(f"n
Report — {reviews[-1].identify}:")
print("─" * 50)
show_md(reviews[-1])
We examine the indexed content using the built-in list and status commands to understand what OpenKB has created so far. We then query the knowledge base with several increasingly complex questions and observe how the system synthesizes answers from the stored knowledge. Finally, we save a deeper exploration query into the wiki and run lint checks to evaluate the health, consistency, and completeness of the generated knowledge base.
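If we want a durable record of the answers rather than just printed output, we can batch the queries into a Markdown report inside the wiki. This is a convention of our own (not an OpenKB command), and it re-runs each query, so it costs additional API calls.

# Hypothetical batch runner: append each query/answer pair to a Markdown
# report under wiki/reports/ so the whole session is reproducible.
report_path = wiki_dir / "reports" / "query_session.md"
with report_path.open("a") as f:
    for q in QUERIES:
        answer = kb_cmd(f'query "{q}"')
        f.write(f"## {q}\n\n{answer}\n\n")
print(f"Session log written to {report_path}")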
part("Step 8 — Programmatic Wiki Analysis (Python)")
wiki_pages = {}
for md_file in wiki_dir.rglob("*.md"):
rel = str(md_file.relative_to(wiki_dir))
content material = md_file.read_text()
hyperlinks = re.findall(r'[[([^]]+)]]', content material)
wiki_pages[rel] = {"traces": len(content material.splitlines()), "wikilinks": hyperlinks}
print(f"Total wiki pages : {len(wiki_pages)}n")
print(f"{'Page':<45} {'Lines':>6} {'Links':>5}")
print("─" * 60)
for web page, m in sorted(wiki_pages.gadgets()):
print(f" {web page:<43} {m['lines']:>6} {len(m['wikilinks']):>5}")
link_targets = Counter(
hyperlink for m in wiki_pages.values() for hyperlink in m["wikilinks"]
)
if link_targets:
print("n
Most-referenced wiki pages (hub ideas):")
for web page, depend in link_targets.most_common(8):
print(f" {depend:>3}x [[{page}]]")
print("n
Cross-reference graph:")
for web page, m in sorted(wiki_pages.gadgets()):
if m["wikilinks"]:
proven = ", ".be part of(f"[[{l}]]" for l in m["wikilinks"][:4])
extra = f" +{len(m['wikilinks'])-4} extra" if len(m["wikilinks"]) > 4 else ""
print(f" {web page}")
print(f" -> {proven}{extra}")
part("Step 9 — Incremental Update: Add a 4th Document")
new_doc = raw_dir / "sparse_attention.md"
new_doc.write_text(textwrap.dedent("""
# Sparse Attention Mechanisms
## Motivation
Standard transformer consideration is O(n^2) in sequence size, limiting context
home windows. Sparse consideration patterns cut back this to O(n log n) or O(n*sqrt(n)).
## Key Approaches
- **Longformer** (Beltagy et al., 2020): native sliding-window + world tokens.
- **BigBird** (Zaheer et al., 2020): random + window + world; Turing-complete.
- **Flash Attention** (Dao et al., 2022): actual consideration, hardware-aware CUDA
tiling. Not sparse however dramatically sooner in observe.
## Impact on RAG
Larger context home windows cut back the necessity for chunking and retrieval. However,
retrieval nonetheless helps for corpora bigger than any single context window.
## References
Beltagy et al. (2020). Longformer. arXiv:2004.05150.
Zaheer et al. (2020). Big Bird. NeurIPS.
Dao et al. (2022). FlashAttention. NeurIPS.
"""))
concepts_before = len(record((wiki_dir / "ideas").glob("*.md")))
print(f"Adding: {new_doc.identify}")
print_wrapped(kb_cmd(f"add {new_doc}")[:500])
concepts_after = record((wiki_dir / "ideas").glob("*.md"))
print(f"n
Concept pages: {concepts_before} -> {len(concepts_after)}")
for c in sorted(concepts_after, key=lambda p: p.stat().st_mtime, reverse=True)[:3]:
print(f" • {c.identify}")
part("Tutorial Complete
")
print(textwrap.dedent(f"""
What we lined
───────────────
1. Installed OpenKB
2. Entered API key securely by way of getpass (by no means printed/saved in code)
3. Used FREE open mannequin: meta-llama/llama-3.3-70b-instruct by way of OpenRouter
4. Initialised KB at {KB_DIR}
5. Created 3 AI analysis docs and compiled them into a wiki
6. Explored auto-generated summaries, idea pages, and index
7. Listed content material (openkb record) and checked stats (openkb standing)
8. Ran 4 queries of accelerating complexity
9. Saved a deep synthesis question to wiki/explorations/
10. Linted the wiki for well being points (contradictions, orphans, gaps)
11. Analysed the wiki graph programmatically (hub pages, cross-refs)
12. Added a 4th doc -- demonstrated incremental reside updates
Other free OpenRouter fashions to attempt (change LLM_MODEL):
────────────────────────────────────────────────────────
openrouter/mistralai/mistral-7b-instruct:free
openrouter/google/gemma-3-27b-it:free
openrouter/qwen/qwen3-14b:free
openrouter/microsoft/phi-4-reasoning:free
Docs: https://github.com/VectifyAI/OpenKB
"""))
We analyze the generated wiki programmatically by reading Markdown pages, counting lines, extracting wikilinks, and identifying the most referenced concepts. We also visualize the internal cross-reference structure so that we can better understand how the knowledge base is connected and which pages act as hubs. In the final part, we add a new document incrementally, watch how the wiki updates, and conclude the tutorial with a full summary of everything we built and tested.
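The same page and link data also lets us flag potential orphan pages, i.e., pages that no other page links to, which complements the `openkb lint` report. This is a minimal sketch that assumes wikilink targets refer to pages by filename stem, which may not match OpenKB's exact linking convention.

# Orphan detection (sketch): pages whose filename stem never appears as a
# wikilink target. Index, log, and schema pages are excluded by design.
linked = {t.strip().lower() for t in link_targets}
orphans = [
    page for page in wiki_pages
    if Path(page).stem.lower() not in linked
    and Path(page).name not in ("index.md", "log.md", "AGENTS.md")
]
print(f"Potential orphan pages ({len(orphans)}):")
for page in orphans:
    print(f"  • {page}")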
In conclusion, we created a full OpenKB workflow using a Llama model served via OpenRouter, while keeping API access secure and easy to manage. We initialized the knowledge base, ingested several research documents, generated linked wiki artifacts, queried the compiled knowledge, and validated the structure through linting and Python-based inspection. We also showed how to extend the knowledge base incrementally by adding new material without rebuilding everything from scratch. The result is a practical, reproducible foundation for using OpenKB as a lightweight, LLM-powered system for knowledge organization, synthesis, and exploration.