How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval?

In this tutorial, we walk through the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. We design it so that the agent does more than just retrieve documents: it actively decides when retrieval is needed, selects the best retrieval strategy, and synthesizes responses with contextual awareness. By combining embeddings, FAISS indexing, and a mock LLM, we create a practical demonstration of how agentic decision-making can elevate the standard RAG pipeline into something more adaptive and intelligent. Check out the FULL CODES here.
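Before diving in, note that the only dependencies are NumPy, FAISS, and Sentence-Transformers. A minimal setup, assuming a CPU-only environment (swap in faiss-gpu if you have a CUDA build available):

pip install numpy faiss-cpu sentence-transformers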
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import json
import re
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum


class MockLLM:
    """Simulates LLM decision-making; swap in a real LLM in production."""
    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        prompt_lower = prompt.lower()
        if "decide whether to retrieve" in prompt_lower:
            # "compare" is included so comparison queries also trigger retrieval
            if any(word in prompt_lower for word in ["specific", "recent", "data", "facts", "when", "who", "what", "compare"]):
                return "RETRIEVE: The query requires specific factual information that needs to be retrieved."
            else:
                return "NO_RETRIEVE: This is a general question that can be answered with existing knowledge."
        elif "choose retrieval strategy" in prompt_lower:
            if "compare" in prompt_lower or "versus" in prompt_lower:
                return "STRATEGY: multi_query - Need to retrieve information about multiple entities for comparison."
            elif "recent" in prompt_lower or "latest" in prompt_lower:
                return "STRATEGY: temporal - Focus on recent information."
            else:
                return "STRATEGY: semantic - Standard semantic similarity search."
        elif "synthesize" in prompt_lower and "context:" in prompt_lower:
            return "Based on the retrieved information, here is a comprehensive answer that combines multiple sources and provides specific details with proper context."
        return "This is a mock response. In practice, use a real LLM like OpenAI's GPT or similar."


class RetrievalStrategy(Enum):
    SEMANTIC = "semantic"
    MULTI_QUERY = "multi_query"
    TEMPORAL = "temporal"
    HYBRID = "hybrid"


@dataclass
class Document:
    id: str
    content: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = None
We set up the foundation of our Agentic RAG system. We define a mock LLM to simulate decision-making, create a retrieval-strategy enum, and design a Document dataclass so we can structure and manage our knowledge base efficiently. Check out the FULL CODES here.
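If you later want to swap the mock for a real model, any object exposing the same generate(prompt, max_tokens) method will drop in. Below is a minimal sketch of such an adapter, assuming the openai Python package (v1+) and an OPENAI_API_KEY environment variable; the class and model names here are illustrative, not part of the tutorial's code.

# Hypothetical drop-in replacement for MockLLM (a sketch, not used below).
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

class RealLLM:
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()  # reads OPENAI_API_KEY automatically
        self.model = model

    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        # Same signature as MockLLM.generate, so the rest of the system needs no changes
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content

Because the rest of the system only ever calls self.llm.generate(...), switching backends is a one-line change in the constructor.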
class AgenticRAGSystem:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.llm = MockLLM()
        self.documents: List[Document] = []
        self.index: Optional[faiss.Index] = None

    def add_documents(self, documents: List[Dict[str, Any]]) -> None:
        print(f"Processing {len(documents)} documents...")
        for i, doc in enumerate(documents):
            doc_obj = Document(
                id=doc.get('id', str(i)),
                content=doc['content'],
                metadata=doc.get('metadata', {})
            )
            self.documents.append(doc_obj)
        contents = [doc.content for doc in self.documents]
        embeddings = self.encoder.encode(contents, show_progress_bar=True)
        for doc, embedding in zip(self.documents, embeddings):
            doc.embedding = embedding
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatIP(dimension)
        faiss.normalize_L2(embeddings)
        self.index.add(embeddings.astype('float32'))
        print(f"Knowledge base built with {len(self.documents)} documents")
We build the core of our Agentic RAG system. We initialize the embedding model, set up the FAISS index, and add documents by encoding their contents into vectors, enabling fast and accurate semantic retrieval from our knowledge base. Check out the FULL CODES here.
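One detail worth pausing on: we use an inner-product index (IndexFlatIP) but L2-normalize every vector first, which makes the returned scores cosine similarities. A quick standalone sanity check of that equivalence (illustrative values; all-MiniLM-L6-v2 does produce 384-dimensional embeddings):

# Verify that L2 normalization + inner product == cosine similarity.
import numpy as np
import faiss

rng = np.random.default_rng(0)
vecs = rng.normal(size=(2, 384)).astype('float32')

# Cosine similarity computed directly, before any normalization
cosine = vecs[0] @ vecs[1] / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))

faiss.normalize_L2(vecs)               # in-place normalization
index = faiss.IndexFlatIP(384)
index.add(vecs[1:])                    # index the second vector
scores, _ = index.search(vecs[:1], 1)  # query with the first
print(np.isclose(scores[0][0], cosine, atol=1e-5))  # True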
    def decide_retrieval(self, query: str) -> bool:
        # The template wording deliberately avoids the mock's trigger keywords,
        # so only the user's query can cause a RETRIEVE decision.
        decision_prompt = f"""
        Analyze the following query and decide whether to retrieve information:
        Query: "{query}"
        Decide whether to retrieve information from the knowledge base.
        Consider whether it needs external knowledge or can be answered generally.
        Respond with either:
        RETRIEVE: [reason] or NO_RETRIEVE: [reason]
        """
        response = self.llm.generate(decision_prompt)
        should_retrieve = response.startswith("RETRIEVE:")
        print(f"\nAgent Decision: {'Retrieve' if should_retrieve else 'Direct Answer'}")
        print(f"   Reasoning: {response.split(':', 1)[1].strip() if ':' in response else response}")
        return should_retrieve

    def choose_strategy(self, query: str) -> RetrievalStrategy:
        # Again, the template says "newer documents" rather than "recent
        # information" so the mock's keyword check only fires on the query itself.
        strategy_prompt = f"""
        Choose the best retrieval strategy for this query:
        Query: "{query}"
        Available strategies:
        - semantic: Standard similarity search
        - multi_query: Multiple related queries (for comparisons)
        - temporal: Prioritize newer documents
        - hybrid: Combination approach
        Choose retrieval strategy and explain why.
        Respond with: STRATEGY: [strategy_name] - [reasoning]
        """
        response = self.llm.generate(strategy_prompt)
        if "multi_query" in response.lower():
            strategy = RetrievalStrategy.MULTI_QUERY
        elif "temporal" in response.lower():
            strategy = RetrievalStrategy.TEMPORAL
        elif "hybrid" in response.lower():
            strategy = RetrievalStrategy.HYBRID
        else:
            strategy = RetrievalStrategy.SEMANTIC
        print(f"\nRetrieval Strategy: {strategy.value}")
        print(f"   Reasoning: {response.split('-', 1)[1].strip() if '-' in response else response}")
        return strategy
We give our agent the ability to think before it fetches. We first determine whether a query actually requires retrieval, then select the most suitable strategy: semantic, multi-query, temporal, or hybrid. This lets us target the right context, with clear, printed reasoning at each step. Check out the FULL CODES here.
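To see the routing in isolation, we can call the two methods directly. A quick check, assuming the class above is defined (the queries and expected outcomes here simply reflect the mock's keyword rules):

# Exercise the routing logic on its own; no documents are needed for these calls.
agent = AgenticRAGSystem()

agent.decide_retrieval("What is artificial intelligence?")   # True  ("what" is a factual cue)
agent.decide_retrieval("How are you today?")                 # False (answered directly)

agent.choose_strategy("Compare AI versus Machine Learning")  # RetrievalStrategy.MULTI_QUERY
agent.choose_strategy("What is the latest AI research?")     # RetrievalStrategy.TEMPORAL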
    def retrieve_documents(self, query: str, strategy: RetrievalStrategy, k: int = 3) -> List[Document]:
        if not self.index:
            print("\nNo knowledge base available")
            return []
        if strategy == RetrievalStrategy.MULTI_QUERY:
            # Expand the query into variants, search each, then deduplicate by id
            queries = [query, f"advantages of {query}", f"disadvantages of {query}"]
            all_docs = []
            for q in queries:
                docs = self._semantic_search(q, k=2)
                all_docs.extend(docs)
            seen_ids = set()
            unique_docs = []
            for doc in all_docs:
                if doc.id not in seen_ids:
                    unique_docs.append(doc)
                    seen_ids.add(doc.id)
            return unique_docs[:k]
        elif strategy == RetrievalStrategy.TEMPORAL:
            # Over-fetch, then re-rank by the date stored in metadata
            docs = self._semantic_search(query, k=k*2)
            docs_with_dates = [(doc, doc.metadata.get('date', '1900-01-01')) for doc in docs]
            docs_with_dates.sort(key=lambda x: x[1], reverse=True)
            return [doc for doc, _ in docs_with_dates[:k]]
        else:
            return self._semantic_search(query, k=k)

    def _semantic_search(self, query: str, k: int) -> List[Document]:
        query_embedding = self.encoder.encode([query])
        faiss.normalize_L2(query_embedding)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx < len(self.documents):
                results.append(self.documents[idx])
        return results

    def synthesize_response(self, query: str, retrieved_docs: List[Document]) -> str:
        if not retrieved_docs:
            return self.llm.generate(f"Answer this query: {query}")
        context = "\n\n".join([f"Document {i+1}: {doc.content}"
                               for i, doc in enumerate(retrieved_docs)])
        synthesis_prompt = f"""
        Query: {query}
        Context: {context}
        Synthesize a comprehensive answer using the provided context.
        Be specific and reference the information sources when relevant.
        """
        return self.llm.generate(synthesis_prompt, max_tokens=200)
We implement how we actually fetch and use knowledge. We perform semantic search, branch into multi-query expansion or temporal re-ranking when needed, deduplicate results, and then synthesize a focused answer from the retrieved context. In doing so, we keep retrieval efficient, transparent, and tightly aligned with the query. Check out the FULL CODES here.
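Note that the hybrid strategy currently falls through to plain semantic search. One way you might flesh it out is to blend semantic rank with document recency; here is a rough sketch under that assumption (the helper name and the 0.3 weight are arbitrary choices, not part of the tutorial's code):

# Hypothetical hybrid re-ranker: blend semantic rank position with recency.
from datetime import datetime

def hybrid_rerank(docs, k=3, date_weight=0.3):
    # docs: semantically ranked Document objects with a 'date' in metadata
    def recency(doc):
        date = doc.metadata.get('date', '1900-01-01')
        age_days = (datetime.now() - datetime.fromisoformat(date)).days
        return 1.0 / (1.0 + age_days / 365.0)  # ~1.0 when new, decays with age

    scored = []
    for i, doc in enumerate(docs):
        rank_score = 1.0 - i / max(len(docs), 1)  # 1.0 for the top semantic hit
        scored.append(((1 - date_weight) * rank_score + date_weight * recency(doc), doc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [doc for _, doc in scored[:k]]

Plugging this in would amount to handling RetrievalStrategy.HYBRID in retrieve_documents with docs = self._semantic_search(query, k=k*2) followed by return hybrid_rerank(docs, k=k).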
    def query(self, question: str) -> str:
        print(f"\nProcessing Query: '{question}'")
        print("=" * 50)
        if not self.decide_retrieval(question):
            print("\nGenerating direct response...")
            return self.llm.generate(f"Answer this question: {question}")
        strategy = self.choose_strategy(question)
        print(f"\nRetrieving documents using {strategy.value} strategy...")
        retrieved_docs = self.retrieve_documents(question, strategy)
        print(f"   Retrieved {len(retrieved_docs)} documents")
        print("\nSynthesizing response...")
        response = self.synthesize_response(question, retrieved_docs)
        if retrieved_docs:
            print("\nRetrieved Context:")
            for i, doc in enumerate(retrieved_docs[:2], 1):
                print(f"   {i}. {doc.content[:100]}...")
        return response
We bring all the pieces together into a single pipeline. When we run a query, we first determine whether retrieval is necessary, then select the appropriate strategy, fetch documents accordingly, and finally synthesize a response while also displaying the retrieved context for transparency. This makes the system feel more agentic and explainable. Check out the FULL CODES here.
def create_sample_knowledge_base():
    return [
{
"id": "ai_1",
"content": "Artificial Intelligence (AI) refers to computer systems that can perform tasks requiring human intelligence",
"metadata": {"topic": "AI basics", "date": "2024-01-15"}
},
{
"id": "ml_1",
"content": "ML is a subset of AI.",
"metadata": {"topic": "Machine Learning", "date": "2024-02-10"}
},
{
"id": "rag_1",
"content": "Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval to provide more accurate and up-to-date responses.",
"metadata": {"topic": "RAG", "date": "2024-03-05"}
},
{
"id": "agents_1",
"content": "AI agents",
"metadata": {"topic": "AI Agents", "date": "2024-03-20"}
}
]
if __name__ == "__main__":
    print("Initializing Agentic RAG System...")
    rag_system = AgenticRAGSystem()
    docs = create_sample_knowledge_base()
    rag_system.add_documents(docs)
    demo_queries = [
        "What is artificial intelligence?",
        "How are you today?",
        "Compare AI and Machine Learning",
    ]
    for query in demo_queries:
        response = rag_system.query(query)
        print(f"\nFinal Response: {response}")
        print("\n" + "=" * 80)
    print("\nAgentic RAG Tutorial Complete!")
    print("\nKey Features Demonstrated:")
    print("• Agent-driven retrieval decisions")
    print("• Dynamic strategy selection")
    print("• Multi-strategy retrieval approaches")
    print("• Transparent reasoning process")
We wrap everything into a runnable demo. We create a small knowledge base of AI-related documents, initialize the Agentic RAG system, and run sample queries that highlight different behaviors, including retrieval, direct answering, and comparison. This final block ties the whole tutorial together and showcases the agent's reasoning in action.
In conclusion, we see how agent-driven retrieval decisions, dynamic strategy selection, and transparent reasoning come together to form an advanced Agentic RAG workflow. We now have a working system that highlights the potential of adding agency to RAG, making knowledge retrieval smarter, more targeted, and more human-like in its adaptability. This foundation lets us extend the system with real LLMs, larger knowledge bases, and more sophisticated strategies in future iterations.
Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.