How to Build a Multi-Round Deep Research Agent with Gemini, DuckDuckGo API, and Automated Reporting?

We start this tutorial by designing a modular deep research system that runs directly on Google Colab. We configure Gemini as the core reasoning engine, combine DuckDuckGo's Instant Answer API for lightweight web search, and orchestrate multi-round querying with deduplication and delay handling. We emphasize efficiency by limiting API calls, parsing concise snippets, and using structured prompts to extract key points, themes, and insights. Each component, from source collection to JSON-based analysis, lets us experiment quickly and adapt the workflow for deeper or broader research queries. Check out the FULL CODES here.
import os
import json
import time
import requests
from typing import List, Dict, Any
from dataclasses import dataclass
import google.generativeai as genai
from urllib.parse import quote_plus
import re
We begin by importing the essential Python libraries that handle system operations, JSON processing, web requests, and data structures. We also bring in Google's Generative AI SDK and utilities such as URL encoding to ensure our research system operates smoothly.
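Before wiring the API into a class, it helps to see the JSON shape the DuckDuckGo Instant Answer API returns, since the parsing code below depends on it. The payload here is an illustrative mock (not a live response), but the `RelatedTopics` / `Text` / `FirstURL` keys match what the API documents and what the tutorial's `search_web` method reads:

```python
import json
from urllib.parse import quote_plus

# Illustrative (not live) payload mimicking the DuckDuckGo Instant Answer
# API's JSON shape: 'RelatedTopics' entries carry 'Text' and 'FirstURL'.
sample_payload = json.loads("""
{
  "RelatedTopics": [
    {"Text": "Research agents combine search and LLM reasoning.",
     "FirstURL": "https://example.org/agents"},
    {"Text": "Multi-round querying broadens source coverage.",
     "FirstURL": "https://example.org/multi-round"}
  ]
}
""")

def build_url(query: str) -> str:
    # Same URL scheme the tutorial's search_web method uses
    return f"https://api.duckduckgo.com/?q={quote_plus(query)}&format=json&no_redirect=1"

# Pull snippets the same way the class will: skip entries without 'Text'
snippets = [t["Text"] for t in sample_payload["RelatedTopics"] if "Text" in t]
print(build_url("deep research agent"))
print(snippets)
```

Running this against the real endpoint simply means replacing `sample_payload` with `requests.get(build_url(q), timeout=10).json()`.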
@dataclass
class ResearchConfig:
    gemini_api_key: str
    max_sources: int = 10
    max_content_length: int = 5000
    search_delay: float = 1.0

class DeepResearchSystem:
    def __init__(self, config: ResearchConfig):
        self.config = config
        genai.configure(api_key=config.gemini_api_key)
        self.model = genai.GenerativeModel('gemini-1.5-flash')
    def search_web(self, query: str, num_results: int = 5) -> List[Dict[str, str]]:
        """Search the web using the DuckDuckGo Instant Answer API"""
        try:
            encoded_query = quote_plus(query)
            url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_redirect=1"
            response = requests.get(url, timeout=10)
            data = response.json()
            results = []
            if 'RelatedTopics' in data:
                for topic in data['RelatedTopics'][:num_results]:
                    if isinstance(topic, dict) and 'Text' in topic:
                        results.append({
                            'title': topic.get('Text', '')[:100] + '...',
                            'url': topic.get('FirstURL', ''),
                            'snippet': topic.get('Text', '')
                        })
            if not results:
                # Fallback placeholder so downstream steps always have a source
                results = [{
                    'title': f"Research on: {query}",
                    'url': f"https://search.example.com/q={encoded_query}",
                    'snippet': f"General information and research about {query}"
                }]
            return results
        except Exception as e:
            print(f"Search error: {e}")
            return [{'title': f"Research: {query}", 'url': '', 'snippet': f"Topic: {query}"}]
    def extract_key_points(self, content: str) -> List[str]:
        """Extract key points using Gemini"""
        prompt = f"""
        Extract 5-7 key points from this content. Be concise and factual:
        {content[:2000]}
        Return as numbered list:
        """
        try:
            response = self.model.generate_content(prompt)
            return [line.strip() for line in response.text.split('\n') if line.strip()]
        except Exception:
            return ["Key information extracted from source"]
    def analyze_sources(self, sources: List[Dict[str, str]], query: str) -> Dict[str, Any]:
        """Analyze sources for relevance and extract insights"""
        analysis = {
            'total_sources': len(sources),
            'key_themes': [],
            'insights': [],
            'confidence_score': 0.7
        }
        all_content = " ".join([s.get('snippet', '') for s in sources])
        if len(all_content) > 100:
            prompt = f"""
            Analyze this research content for the query: "{query}"
            Content: {all_content[:1500]}
            Provide:
            1. 3-4 key themes (one line each)
            2. 3-4 important insights (one line each)
            3. Overall confidence (0.1-1.0)
            Format as JSON with keys: themes, insights, confidence
            """
            try:
                response = self.model.generate_content(prompt)
                text = response.text
                if 'themes' in text.lower():
                    analysis['key_themes'] = ["Theme extracted from analysis"]
                    analysis['insights'] = ["Insight derived from sources"]
            except Exception:
                pass
        return analysis
    def generate_comprehensive_report(self, query: str, sources: List[Dict[str, str]],
                                      analysis: Dict[str, Any]) -> str:
        """Generate the final research report"""
        sources_text = "\n".join([f"- {s['title']}: {s['snippet'][:200]}"
                                  for s in sources[:5]])
        prompt = f"""
        Create a comprehensive research report on: "{query}"
        Based on these sources:
        {sources_text}
        Analysis summary:
        - Total sources: {analysis['total_sources']}
        - Confidence: {analysis['confidence_score']}
        Structure the report with:
        1. Executive Summary (2-3 sentences)
        2. Key Findings (3-5 bullet points)
        3. Detailed Analysis (2-3 paragraphs)
        4. Conclusions & Implications (1-2 paragraphs)
        5. Research Limitations
        Be factual, well-structured, and insightful.
        """
        try:
            response = self.model.generate_content(prompt)
            return response.text
        except Exception as e:
            # Fallback template report if the model call fails
            return f"""
# Research Report: {query}

## Executive Summary
Research conducted on "{query}" using {analysis['total_sources']} sources.

## Key Findings
- Multiple perspectives analyzed
- Comprehensive information gathered
- Research completed successfully

## Analysis
The research process involved systematic collection and analysis of information related to {query}. Various sources were consulted to provide a balanced perspective.

## Conclusions
The research provides a foundation for understanding {query} based on available information.

## Research Limitations
Limited by API constraints and source availability.
"""
    def conduct_research(self, query: str, depth: str = "standard") -> Dict[str, Any]:
        """Main research orchestration method"""
        print(f"\nStarting research on: {query}")
        search_rounds = {"basic": 1, "standard": 2, "deep": 3}.get(depth, 2)
        sources_per_round = {"basic": 3, "standard": 5, "deep": 7}.get(depth, 5)
        all_sources = []
        search_queries = [query]
        if depth in ["standard", "deep"]:
            try:
                related_prompt = f"Generate 2 related search queries for: {query}. One line each."
                response = self.model.generate_content(related_prompt)
                additional_queries = [q.strip() for q in response.text.split('\n') if q.strip()][:2]
                search_queries.extend(additional_queries)
            except Exception:
                pass
        for i, search_query in enumerate(search_queries[:search_rounds]):
            print(f"\nSearch round {i+1}: {search_query}")
            sources = self.search_web(search_query, sources_per_round)
            all_sources.extend(sources)
            time.sleep(self.config.search_delay)
        # Deduplicate sources by URL
        unique_sources = []
        seen_urls = set()
        for source in all_sources:
            if source['url'] not in seen_urls:
                unique_sources.append(source)
                seen_urls.add(source['url'])
        print(f"\nAnalyzing {len(unique_sources)} unique sources...")
        analysis = self.analyze_sources(unique_sources[:self.config.max_sources], query)
        print("\nGenerating comprehensive report...")
        report = self.generate_comprehensive_report(query, unique_sources, analysis)
        return {
            'query': query,
            'sources_found': len(unique_sources),
            'analysis': analysis,
            'report': report,
            'sources': unique_sources[:10]
        }
We define a ResearchConfig dataclass to manage parameters like API keys, source limits, and delays, and then build a DeepResearchSystem class that integrates Gemini with DuckDuckGo search. We implement methods for web search, key-point extraction, source analysis, and report generation, allowing us to orchestrate multi-round research and produce structured insights in a streamlined workflow.
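One fragile spot worth noting: the analyze_sources prompt asks Gemini to answer in JSON, but the method only checks whether the word "themes" appears in the reply. A more robust approach, sketched below as our own suggestion (the `parse_llm_json` helper is hypothetical, not part of the tutorial's code), is to actually parse the JSON, tolerating the markdown fences and surrounding prose that models often add:

```python
import json
import re
from typing import Any, Dict

def parse_llm_json(text: str) -> Dict[str, Any]:
    """Best-effort extraction of a JSON object from an LLM reply.
    Handles ```json fences and surrounding prose; returns {} on failure."""
    # Prefer a fenced ```json block if the model emitted one
    fence = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidate = fence.group(1) if fence else None
    if candidate is None:
        # Otherwise grab the first {...} span in the text
        brace = re.search(r"\{.*\}", text, re.DOTALL)
        candidate = brace.group(0) if brace else ""
    try:
        parsed = json.loads(candidate)
        return parsed if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        return {}

reply = 'Sure! ```json\n{"themes": ["agents"], "insights": ["search helps"], "confidence": 0.8}\n```'
print(parse_llm_json(reply)["confidence"])
```

Inside analyze_sources, the parsed dict's `themes`, `insights`, and `confidence` keys could then replace the hard-coded placeholder strings.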
def setup_research_system(api_key: str) -> DeepResearchSystem:
    """Quick setup for Google Colab"""
    config = ResearchConfig(
        gemini_api_key=api_key,
        max_sources=15,
        max_content_length=6000,
        search_delay=0.5
    )
    return DeepResearchSystem(config)
We create a setup_research_system function that simplifies initialization in Google Colab by wrapping our configuration in ResearchConfig and returning a ready-to-use DeepResearchSystem instance with custom limits and delays.
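Hard-coding the API key in a notebook cell is easy to leak when sharing the notebook. One common alternative in Colab (our suggestion, not part of the original code) is to read it from an environment variable, falling back to a hidden interactive prompt:

```python
import os
from getpass import getpass

def get_gemini_key() -> str:
    """Fetch the Gemini API key without hard-coding it in the notebook."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        key = getpass("Enter your Gemini API key: ")  # input stays hidden in Colab
        os.environ["GEMINI_API_KEY"] = key  # cache so later cells reuse it
    return key
```

The setup call then becomes `researcher = setup_research_system(get_gemini_key())`.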
if __name__ == "__main__":
    API_KEY = "Use Your Own API Key Here"
    researcher = setup_research_system(API_KEY)
    query = "Deep Research Agent Architecture"
    results = researcher.conduct_research(query, depth="standard")
    print("="*50)
    print("RESEARCH RESULTS")
    print("="*50)
    print(f"Query: {results['query']}")
    print(f"Sources found: {results['sources_found']}")
    print(f"Confidence: {results['analysis']['confidence_score']}")
    print("\n" + "="*50)
    print("COMPREHENSIVE REPORT")
    print("="*50)
    print(results['report'])
    print("\n" + "="*50)
    print("SOURCES CONSULTED")
    print("="*50)
    for i, source in enumerate(results['sources'][:5], 1):
        print(f"{i}. {source['title']}")
        print(f"   URL: {source['url']}")
        print(f"   Preview: {source['snippet'][:150]}...")
        print()
We add a main execution block where we initialize the research system with our API key, run a query on "Deep Research Agent Architecture," and then display structured outputs. We print the research results, a comprehensive report generated by Gemini, and a list of consulted sources with titles, URLs, and previews.
In conclusion, we see how the entire pipeline consistently transforms unstructured snippets into a structured, well-organized report. We successfully combine search, language-modeling, and analysis layers to simulate a complete research workflow inside Colab. By using Gemini for extraction, synthesis, and reporting, and DuckDuckGo for free search access, we create a reusable foundation for more advanced agentic research systems. This notebook provides a practical, technically detailed template that we can now expand with additional models, custom ranking, or domain-specific integrations, while still retaining a compact, end-to-end architecture.
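As a taste of the "custom ranking" extension mentioned above, here is a deliberately simple sketch (our own addition) that reorders collected sources by how many query terms their snippet contains; a production system would likely use embeddings or a cross-encoder instead:

```python
from typing import Dict, List

def rank_sources(sources: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Rank sources by query-term overlap with their snippet.
    A naive lexical baseline, stable for equal scores (sorted is stable)."""
    terms = set(query.lower().split())

    def score(source: Dict[str, str]) -> int:
        words = set(source.get("snippet", "").lower().split())
        return len(terms & words)  # count of shared terms

    return sorted(sources, key=score, reverse=True)

sources = [
    {"title": "A", "snippet": "general notes on software"},
    {"title": "B", "snippet": "deep research agent architecture overview"},
]
ranked = rank_sources(sources, "deep research agent")
print(ranked[0]["title"])  # B scores 3 overlapping terms, A scores 0
```

Dropping `rank_sources` in just before the `analyze_sources` call in conduct_research would let the `max_sources` cutoff keep the most relevant results rather than the first-seen ones.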
The submit How to Build a Multi-Round Deep Research Agent with Gemini, DuckDuckGo API, and Automated Reporting? appeared first on MarkTechPost.