How to Build an Agentic Decision-Tree RAG System with Intelligent Query Routing, Self-Checking, and Iterative Refinement?
In this tutorial, we build an advanced Agentic Retrieval-Augmented Generation (RAG) system that goes beyond simple question answering. We design it to intelligently route queries to the appropriate knowledge sources, perform self-checks to assess answer quality, and iteratively refine responses for improved accuracy. We implement the entire system using open-source tools such as FAISS, SentenceTransformers, and Flan-T5. Along the way, we explore how routing, retrieval, generation, and self-evaluation combine to form a decision-tree-style RAG pipeline that mimics real-world agentic reasoning. Check out the FULL CODES here.
print("
Setting up dependencies...")
import subprocess
import sys
def install_packages():
packages = ['sentence-transformers', 'transformers', 'torch', 'faiss-cpu', 'numpy', 'accelerate']
for bundle in packages:
print(f"Installing {bundle}...")
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])
attempt:
import faiss
besides ImportError:
install_packages()
print("✓ All dependencies put in! Importing modules...n")
import torch
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')
print("✓ All modules loaded efficiently!n")
We begin by installing all required dependencies, including Transformers, FAISS, and SentenceTransformers, to ensure smooth local execution. We verify the installation and import the essential modules, such as NumPy, PyTorch, and FAISS, for embedding, retrieval, and generation. We confirm that all libraries load successfully before moving ahead with the main pipeline. Check out the FULL CODES here.
class VectorStore:
    # FAISS-backed store for document embeddings and similarity search.
    def __init__(self, embedding_model='all-MiniLM-L6-v2'):
        print(f"Loading embedding model: {embedding_model}...")
        self.embedder = SentenceTransformer(embedding_model)
        self.documents = []
        self.index = None

    def add_documents(self, docs: List[str], sources: List[str]):
        # Embed every document and build a flat L2 index over the vectors.
        self.documents = [{"text": doc, "source": src} for doc, src in zip(docs, sources)]
        embeddings = self.embedder.encode(docs, show_progress_bar=False)
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatL2(dimension)
        self.index.add(embeddings.astype('float32'))
        print(f"✓ Indexed {len(docs)} documents\n")

    def search(self, query: str, k: int = 3) -> List[Dict]:
        # Embed the query and return the k nearest documents.
        query_vec = self.embedder.encode([query]).astype('float32')
        distances, indices = self.index.search(query_vec, k)
        return [self.documents[i] for i in indices[0]]
We design the VectorStore class to store and retrieve documents efficiently using FAISS-based similarity search. We embed each document with a SentenceTransformer model and build a flat L2 index for fast retrieval. This allows us to quickly fetch the most relevant context for any incoming query. Check out the FULL CODES here.
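To see the retrieval path in isolation, here is a minimal usage sketch; the two snippets and source names are hypothetical stand-ins rather than part of the tutorial's knowledge base.

store = VectorStore()
store.add_documents(
    docs=["FAISS performs efficient similarity search over dense vectors.",       # hypothetical sample text
          "SentenceTransformers turns sentences into fixed-size embeddings."],
    sources=["FAISS Notes", "SBERT Notes"],                                        # hypothetical source labels
)
hits = store.search("How do I run a similarity search?", k=1)
print(hits[0]["source"])  # expected to print "FAISS Notes"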
class QueryRouter:
    # Keyword-based classifier that routes a query to one of four intents.
    def __init__(self):
        self.categories = {
            'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
            'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
            'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
            'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to']
        }

    def route(self, query: str) -> str:
        # Score each category by how many of its keywords appear in the query.
        query_lower = query.lower()
        scores = {}
        for category, keywords in self.categories.items():
            score = sum(1 for kw in keywords if kw in query_lower)
            scores[category] = score
        best_category = max(scores, key=scores.get)
        return best_category if scores[best_category] > 0 else 'factual'
We introduce the QueryRouter class to classify queries by intent: technical, factual, comparative, or procedural. We use keyword matching to decide which category best fits the input question, falling back to 'factual' when nothing matches. This routing step lets the retrieval strategy adapt dynamically to different query types. Check out the FULL CODES here.
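As a quick sanity check on the keyword scoring, the hypothetical queries below should land in the 'technical' and 'comparative' buckets respectively:

router = QueryRouter()
print(router.route("How do I implement this sorting algorithm?"))  # 'technical' (matches how, implement, algorithm)
print(router.route("Compare FAISS versus Annoy for search"))       # 'comparative' (matches compare, versus, vs)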
class AnswerGenerator:
    # Wraps a Flan-T5 pipeline for grounded answer generation plus a heuristic self-check.
    def __init__(self, model_name='google/flan-t5-base'):
        print(f"Loading generation model: {model_name}...")
        self.generator = pipeline('text2text-generation', model=model_name, device=0 if torch.cuda.is_available() else -1, max_length=256)
        device_type = "GPU" if torch.cuda.is_available() else "CPU"
        print(f"✓ Generator ready (using {device_type})\n")

    def generate(self, query: str, context: List[Dict], query_type: str) -> str:
        # Concatenate the retrieved documents into a single context block for the prompt.
        context_text = "\n\n".join([f"[{doc['source']}]: {doc['text']}" for doc in context])
        prompt = f"""Answer this {query_type} question using the context below.

Context:
{context_text}

Question: {query}

Answer:"""
        answer = self.generator(prompt, max_length=200, do_sample=False)[0]['generated_text']
        return answer.strip()

    def self_check(self, query: str, answer: str, context: List[Dict]) -> Tuple[bool, str]:
        # Heuristic quality gate: length, grounding in the context, and relevance to the query.
        if len(answer) < 10:
            return False, "Answer too short - needs more detail"
        context_keywords = set()
        for doc in context:
            context_keywords.update(doc['text'].lower().split()[:20])
        answer_words = set(answer.lower().split())
        overlap = len(context_keywords.intersection(answer_words))
        if overlap < 2:
            return False, "Answer not grounded in context - needs more evidence"
        query_keywords = set(query.lower().split())
        if len(query_keywords.intersection(answer_words)) < 1:
            return False, "Answer does not address the query - rephrase needed"
        return True, "Answer quality acceptable"
We build the AnswerGenerator class to handle answer creation and self-evaluation. Using the Flan-T5 model, we generate text responses grounded in the retrieved documents. We then perform a self-check that assesses answer length, grounding in the context, and relevance to the query, ensuring the output is meaningful and accurate. Check out the FULL CODES here.
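Because self_check only inspects strings and never calls the model, we can exercise the heuristic on its own once a generator instance exists. The inputs below are hypothetical examples chosen to trigger the accept and reject branches:

checker = AnswerGenerator()  # loads flan-t5-base once; self_check itself does not use the model
context = [{"source": "RAG Notes", "text": "RAG combines document retrieval with text generation to produce grounded answers."}]

ok, feedback = checker.self_check(
    "Explain how RAG combines retrieval and generation",
    "RAG combines retrieval with generation for grounded answers.",
    context,
)
print(ok, feedback)  # True  "Answer quality acceptable"

ok, feedback = checker.self_check("Explain RAG", "Yes.", context)
print(ok, feedback)  # False "Answer too short - needs more detail"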
class AgenticRAG:
    # Orchestrates routing, retrieval, generation, and self-checking with iterative refinement.
    def __init__(self):
        self.vector_store = VectorStore()
        self.router = QueryRouter()
        self.generator = AnswerGenerator()
        self.max_iterations = 2

    def add_knowledge(self, documents: List[str], sources: List[str]):
        self.vector_store.add_documents(documents, sources)

    def query(self, question: str, verbose: bool = True) -> Dict:
        if verbose:
            print(f"\n{'='*60}")
            print(f"Query: {question}")
            print(f"{'='*60}")
        # Route the query and choose how many documents to retrieve for that intent.
        query_type = self.router.route(question)
        if verbose:
            print(f"Route: {query_type.upper()} query detected")
        k_docs = {'technical': 2, 'comparative': 4, 'procedural': 3}.get(query_type, 3)
        iteration = 0
        answer_accepted = False
        while iteration < self.max_iterations and not answer_accepted:
            iteration += 1
            if verbose:
                print(f"\nIteration {iteration}")
            context = self.vector_store.search(question, k=k_docs)
            if verbose:
                print(f"Retrieved {len(context)} documents from sources:")
                for doc in context:
                    print(f"   - {doc['source']}")
            answer = self.generator.generate(question, context, query_type)
            if verbose:
                print(f"Generated answer: {answer[:100]}...")
            answer_accepted, feedback = self.generator.self_check(question, answer, context)
            if verbose:
                status = "✓ ACCEPTED" if answer_accepted else "✗ REJECTED"
                print(f"Self-check: {status}")
                print(f"   Feedback: {feedback}")
            # If rejected, refine the question and widen retrieval before retrying.
            if not answer_accepted and iteration < self.max_iterations:
                question = f"{question} (provide more specific details)"
                k_docs += 1
        return {'answer': answer, 'query_type': query_type, 'iterations': iteration, 'accepted': answer_accepted, 'sources': [doc['source'] for doc in context]}
We combine all the components in the AgenticRAG class, which orchestrates routing, retrieval, generation, and quality checking. The system iteratively refines its answers based on self-check feedback, rephrasing the query and expanding the retrieved context when necessary. This creates a feedback-driven, decision-tree-style RAG loop that improves its own output automatically. Check out the FULL CODES here.
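The control flow reduces to a small retrieve-generate-check loop. The standalone sketch below, with hypothetical retrieve, generate, and check callables that are not part of the tutorial code, makes the decision-tree branches explicit:

def refine_loop(question, retrieve, generate, check, max_iterations=2, k=3):
    # retrieve(question, k) -> context, generate(question, context) -> answer,
    # check(question, answer, context) -> (accepted, feedback)
    for iteration in range(1, max_iterations + 1):
        context = retrieve(question, k)
        answer = generate(question, context)
        accepted, _feedback = check(question, answer, context)
        if accepted or iteration == max_iterations:
            return answer, accepted, iteration
        question = f"{question} (provide more specific details)"  # refine the query
        k += 1                                                    # widen retrieval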
def main():
    print("\n" + "="*60)
    print("AGENTIC RAG WITH ROUTING & SELF-CHECK")
    print("="*60 + "\n")
    # Small sample knowledge base: one short snippet per source below
    # (the RAG snippet is from the tutorial; the others are illustrative placeholders).
    documents = [
        "Python is a high-level, general-purpose programming language known for its readable syntax and rich ecosystem of libraries.",
        "Machine learning is a field of AI in which models learn patterns from data and improve with experience rather than being explicitly programmed.",
        "Neural networks are layers of interconnected units whose weights are adjusted during training to approximate complex functions.",
        "Deep learning uses neural networks with many layers to learn hierarchical representations of data such as images and text.",
        "The Transformer architecture relies on self-attention to model relationships between tokens and underpins most modern language models.",
        "RAG (Retrieval-Augmented Generation) combines information retrieval with text generation. It retrieves relevant documents and uses them as context for generating accurate answers."
    ]
    sources = ["Python Documentation", "ML Textbook", "Neural Networks Guide", "Deep Learning Paper", "Transformer Architecture", "RAG Research Paper"]
    rag = AgenticRAG()
    rag.add_knowledge(documents, sources)
    test_queries = ["What is Python?", "How does machine learning work?", "Compare neural networks and deep learning"]
    for query in test_queries:
        result = rag.query(query, verbose=True)
        print(f"\n{'='*60}")
        print(f"FINAL RESULT:")
        print(f"   Answer: {result['answer']}")
        print(f"   Query Type: {result['query_type']}")
        print(f"   Iterations: {result['iterations']}")
        print(f"   Accepted: {result['accepted']}")
        print(f"{'='*60}\n")

if __name__ == "__main__":
    main()
We finalize the demo by loading a small knowledge base and running test queries through the Agentic RAG pipeline. We observe how the model routes, retrieves, and refines answers step by step, printing intermediate results for transparency. By the end, we confirm that our system delivers accurate, self-validated answers using only local computation.
In conclusion, we create a fully functional Agentic RAG framework that autonomously retrieves, reasons, and refines its answers. We see how the system dynamically routes different query types, evaluates its own responses, and improves them through iterative feedback, all within a lightweight, local environment. Through this exercise, we deepen our understanding of RAG architectures and experience how agentic components can transform static retrieval systems into self-improving intelligent agents.
