RAG-Anything Tutorial: Build a Multimodal Retrieval Pipeline for Text, Tables, Equations, and Images in Colab

In this tutorial, we build a RAG-Anything workflow and use it to explore how multimodal retrieval works across text, tables, equations, and images. We begin by preparing the Colab environment, installing the required packages, and securely entering our OpenAI API key at runtime to keep the notebook practical and safe to run. We then create a synthetic multimodal report, generate a chart and PDF, convert the content into RAG-Anything’s direct content_list format, and insert it into the retrieval system. As we move through the tutorial, we configure clean OpenAI-based chat, vision, and embedding functions, initialize RAG-Anything, and test different retrieval modes such as naive, local, global, and hybrid.

Installing RAG-Anything Dependencies

import os
import re
import sys
import json
import time
import shutil
import hashlib
import asyncio
import inspect
import getpass
import subprocess
import importlib
import importlib.metadata
from pathlib import Path
from typing import List, Dict, Any
def run_shell(cmd, check=True):
   # Echo a shell command, run it, and raise if it fails (when check=True).
   print(f"\n$ {cmd}")
   result = subprocess.run(cmd, shell=True, text=True)
   if check and result.returncode != 0:
       raise RuntimeError(f"Command failed: {cmd}")
   return result.returncode
print("=" * 80)
print("RAG-Anything Advanced Colab Tutorial")
print("=" * 80)
print("\n[1/10] Installing dependencies...")
# Drop any already-imported PIL modules so the reinstalled Pillow is picked up.
for module_name in list(sys.modules):
   if module_name == "PIL" or module_name.startswith("PIL."):
       del sys.modules[module_name]
run_shell(
   'pip -q install -U '
   '"raganything[image,text]" '
   '"openai>=1.0.0" '
   '"python-dotenv" '
   '"reportlab" '
   '"pandas" '
   '"matplotlib" '
   '"tabulate"'
)
run_shell('pip -q install --no-cache-dir --force-reinstall "pillow==11.3.0"')
for module_name in list(sys.modules):
   if module_name == "PIL" or module_name.startswith("PIL."):
       del sys.modules[module_name]
importlib.invalidate_caches()
try:
   print("Pillow version:", importlib.metadata.version("Pillow"))
except Exception as e:
   print("Could not read Pillow version:", repr(e))
print("\n[2/10] Importing libraries...")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
from openai import AsyncOpenAI
from raganything import RAGAnything, RAGAnythingConfig
from lightrag.utils import EmbeddingFunc
print("Imports successful.")

We begin by setting up the complete Colab environment for the RAG-Anything workflow. We install the required libraries, repair the Pillow dependency by force-reinstalling a pinned version and purging cached PIL modules from sys.modules, and import all the modules needed for plotting, PDF creation, OpenAI access, and RAG-Anything. We also define a reusable shell helper so the setup stays clean and easy to rerun.
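After an install step like this, it can help to confirm which versions actually landed in the runtime. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the tutorial code, and the package list simply mirrors a few names from the pip command above.

```python
import importlib.metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the pip command in this tutorial.
report = installed_versions(["raganything", "openai", "pillow", "reportlab"])
for name, version in report.items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

A missing entry is reported as `None` rather than raising, so the check can run before or after the install cell without interrupting the notebook.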

Configuring Directories, Runtime Variables

print("\n[3/10] Preparing directories and runtime settings...")
BASE_DIR = Path("/content/raganything_advanced_tutorial") if Path("/content").exists() else Path.cwd() / "raganything_advanced_tutorial"
ASSET_DIR = BASE_DIR / "assets"
OUTPUT_DIR = BASE_DIR / "output"
WORKING_DIR = BASE_DIR / "rag_storage"
LOG_DIR = BASE_DIR / "logs"
RESET_STORAGE = True
RUN_FULL_DOCUMENT_PARSE = False
PARSER_FOR_FULL_PARSE = "mineru"
PARSE_METHOD = "auto"
for d in [BASE_DIR, ASSET_DIR, OUTPUT_DIR, WORKING_DIR, LOG_DIR]:
   d.mkdir(parents=True, exist_ok=True)
# Start from an empty storage directory so earlier runs do not leak into this one.
if RESET_STORAGE and WORKING_DIR.exists():
   shutil.rmtree(WORKING_DIR)
   WORKING_DIR.mkdir(parents=True, exist_ok=True)
os.environ["LOG_DIR"] = str(LOG_DIR)
os.environ["SUMMARY_LANGUAGE"] = "English"
os.environ["ENABLE_LLM_CACHE"] = "false"
os.environ["ENABLE_LLM_CACHE_FOR_EXTRACT"] = "false"
os.environ["MAX_ASYNC"] = "2"
os.environ["CHUNK_SIZE"] = "900"
os.environ["CHUNK_OVERLAP_SIZE"] = "120"
os.environ["TIMEOUT"] = "240"
for var in [
   "OPENAI_API_KEY",
   "OPENAI_ORG_ID",
   "OPENAI_ORGANIZATION",
   "OPENAI_PROJECT",
   "OPENAI_DEFAULT_HEADERS",
   "LLM_BINDING_API_KEY",
   "LLM_BINDING_HOST",
]:
   os.environ.pop(var, None)
print(f"Base directory: {BASE_DIR}")
print(f"Assets directory: {ASSET_DIR}")
print(f"Storage directory: {WORKING_DIR}")
print("\n[4/10] Entering OpenAI API key securely...")
def clean_api_key(raw_value: str) -> str:
   raw_value = str(raw_value or "").strip()
   raw_value = raw_value.replace("Bearer ", "").replace("bearer ", "").strip()
   raw_value = raw_value.strip("'").strip('"').strip("`").strip()
   if "=" in raw_value:
       raw_value = raw_value.split("=", 1)[1].strip().strip("'").strip('"').strip("`")
   raw_value = re.sub(r"\s+", "", raw_value)
   raw_value = raw_value.encode("ascii", errors="ignore").decode("ascii").strip()
   return raw_value
OPENAI_API_KEY_RAW = getpass.getpass("Paste your OpenAI API key here. Input is hidden: ")
OPENAI_API_KEY = clean_api_key(OPENAI_API_KEY_RAW)
if not OPENAI_API_KEY:
   raise ValueError(
       "No API key was captured. Paste the key into the hidden input box and press Enter."
   )
print("Captured key length:", len(OPENAI_API_KEY))
print("Captured key prefix:", OPENAI_API_KEY[:12] + "...")
print("Captured key suffix:", "..." + OPENAI_API_KEY[-6:])
LLM_MODEL = "gpt-4o-mini"
VISION_MODEL = "gpt-4o-mini"
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 1536
openai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)
os.environ["LLM_MODEL"] = LLM_MODEL
os.environ["VISION_MODEL"] = VISION_MODEL
os.environ["EMBEDDING_MODEL"] = EMBEDDING_MODEL
os.environ["EMBEDDING_DIM"] = str(EMBEDDING_DIM)
print("Testing OpenAI chat API with the captured key...")
try:
   test_response = await openai_client.chat.completions.create(
       model=LLM_MODEL,
       messages=[{"role": "user", "content": "Reply with exactly: ok"}],
       temperature=0,
   )
   print("Chat API test response:", test_response.choices[0].message.content)
except Exception as e:
   raise RuntimeError(
       "The key was captured, but OpenAI rejected the request or the account/model access failed. "
       "Check billing, project permissions, and make sure that this is an OpenAI Platform API key."
   ) from e
print("\nTesting OpenAI embedding API...")
try:
   test_embedding = await openai_client.embeddings.create(
       model=EMBEDDING_MODEL,
       input=["RAG-Anything embedding test"],
   )
   print("Embedding vector length:", len(test_embedding.data[0].embedding))
except Exception as e:
   raise RuntimeError(
       "Chat worked, but embeddings failed. Make sure your API key has permission for embeddings."
   ) from e
print("OpenAI API key is working.")
print(f"Chat model: {LLM_MODEL}")
print(f"Vision model: {VISION_MODEL}")
print(f"Embedding model: {EMBEDDING_MODEL}")
print(f"Embedding dimension: {EMBEDDING_DIM}")

We prepare the working directories, output folders, logs, and runtime environment variables that RAG-Anything uses during execution. We securely capture the OpenAI API key through a hidden input, clean the pasted value, and verify that both the chat and embedding calls work correctly. We also define the models and embedding dimensions that power the rest of the tutorial.
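To make the key-cleaning step concrete, the sketch below restates `clean_api_key` so the block runs on its own and feeds it a few common paste mistakes. `FAKE_KEY` and the sample strings are invented placeholders, not real credentials.

```python
import re


def clean_api_key(raw_value: str) -> str:
    """Restated from the tutorial code so this block is self-contained."""
    raw_value = str(raw_value or "").strip()
    raw_value = raw_value.replace("Bearer ", "").replace("bearer ", "").strip()
    raw_value = raw_value.strip("'").strip('"').strip("`").strip()
    if "=" in raw_value:
        # Handles pastes such as OPENAI_API_KEY="..."
        raw_value = raw_value.split("=", 1)[1].strip().strip("'").strip('"').strip("`")
    raw_value = re.sub(r"\s+", "", raw_value)
    return raw_value.encode("ascii", errors="ignore").decode("ascii").strip()


FAKE_KEY = "sk-demo-0000000000000000"  # placeholder, not a real credential

samples = [
    f"  {FAKE_KEY}\n",                      # stray whitespace and newline
    f"Bearer {FAKE_KEY}",                   # copied from an HTTP header
    f'OPENAI_API_KEY="{FAKE_KEY}"',         # copied from a .env file
    f"`{FAKE_KEY}`",                        # copied from markdown
    f"{FAKE_KEY[:10]} {FAKE_KEY[10:]}",     # space introduced mid-key
]
cleaned = [clean_api_key(s) for s in samples]
print(all(c == FAKE_KEY for c in cleaned))
```

Each variant normalizes to the same string, which is why the later API checks can treat a rejected request as an account or permission problem rather than a paste problem.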

Generating a Synthetic Multimodal Report

print("\n[5/10] Creating a synthetic multimodal report...")
monthly_data = pd.DataFrame(
   {
       "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
       "Query Volume": [1200, 1700, 2100, 2600, 3300, 4100],
       "Hybrid Accuracy": [0.71, 0.74, 0.79, 0.83, 0.87, 0.91],
       "Average Latency ms": [980, 920, 850, 790, 760, 730],
   }
)
table_md = monthly_data.to_markdown(index=False)
plt.figure(figsize=(8, 4.8))
plt.plot(monthly_data["Month"], monthly_data["Query Volume"], marker="o", label="Query Volume")
# Accuracy is multiplied by 4000 so both series share one y-axis.
plt.plot(monthly_data["Month"], monthly_data["Hybrid Accuracy"] * 4000, marker="s", label="Hybrid Accuracy scaled")
plt.title("Multimodal RAG Usage and Quality Trend")
plt.xlabel("Month")
plt.ylabel("Volume / Scaled Accuracy")
plt.legend()
plt.grid(True, alpha=0.3)
plt.text(
   0.02,
   0.95,
   "Synthetic figure: usage rises while latency falls",
   transform=plt.gca().transAxes,
   fontsize=9,
   verticalalignment="top",
   bbox=dict(boxstyle="round", alpha=0.15),
)
chart_path = ASSET_DIR / "raganything_quality_trend.png"
plt.tight_layout()
plt.savefig(chart_path, dpi=180)
plt.close()
report_pdf_path = ASSET_DIR / "synthetic_multimodal_rag_report.pdf"
c = canvas.Canvas(str(report_pdf_path), pagesize=letter)
width, height = letter
# Page 1: title, introduction, table, equation, and chart.
c.setFont("Helvetica-Bold", 18)
c.drawString(0.8 * inch, height - 0.8 * inch, "Synthetic Multimodal RAG Evaluation Report")
c.setFont("Helvetica", 10)
intro_lines = [
   "This report evaluates a synthetic multimodal RAG pipeline for enterprise documents.",
   "The knowledge base includes text, tables, equations, and visual evidence.",
   "The central hypothesis is that hybrid retrieval improves answer quality when evidence spans modalities.",
]
y = height - 1.25 * inch
for line in intro_lines:
   c.drawString(0.8 * inch, y, line)
   y -= 0.22 * inch
c.setFont("Helvetica-Bold", 12)
c.drawString(0.8 * inch, y - 0.1 * inch, "Table 1. Monthly system measurements")
y -= 0.4 * inch
c.setFont("Courier", 7.5)
for row in table_md.splitlines():
   c.drawString(0.8 * inch, y, row[:120])
   y -= 0.17 * inch
c.setFont("Helvetica-Bold", 12)
c.drawString(0.8 * inch, y - 0.15 * inch, "Equation 1. Weighted multimodal score")
y -= 0.45 * inch
c.setFont("Helvetica", 9)
c.drawString(
   0.8 * inch,
   y,
   "Score(q, d) = alpha * Sim_text(q, d) + beta * Sim_graph(q, d) + gamma * Sim_visual(q, d)",
)
y -= 0.5 * inch
c.drawImage(str(chart_path), 0.8 * inch, y - 2.8 * inch, width=6.5 * inch, height=2.6 * inch)
c.showPage()
# Page 2: interpretation and findings.
c.setFont("Helvetica-Bold", 16)
c.drawString(0.8 * inch, height - 0.8 * inch, "Interpretation and Findings")
c.setFont("Helvetica", 10)
findings = [
   "Hybrid retrieval combines semantic similarity with graph-based relationship navigation.",
   "The synthetic table shows accuracy improving from 0.71 to 0.91 over six months.",
   "The generated figure shows query volume increasing while latency gradually decreases.",
   "Equation-level retrieval is useful when the question depends on scoring logic rather than plain prose.",
   "A multimodal system should preserve page index, captions, footnotes, and local image paths for traceability.",
]
y = height - 1.25 * inch
for finding in findings:
   c.drawString(0.8 * inch, y, "- " + finding)
   y -= 0.28 * inch
c.save()
print(f"Created chart: {chart_path}")
print(f"Created PDF: {report_pdf_path}")
print("\nSynthetic table:")
display(monthly_data)

We create a synthetic multimodal report that provides realistic content for testing in RAG-Anything. We build a small performance table, generate a chart, and export a PDF containing text, a table, an equation, and a figure. We use this controlled document to clearly observe how the system handles different content types.
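Because the report is synthetic, the expected answers to trend questions can be computed directly from the table values. The short sketch below does that in plain Python; the helper `relative_change` is a hypothetical name introduced only for this illustration.

```python
# Values copied from the synthetic monthly table in this tutorial.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
query_volume = [1200, 1700, 2100, 2600, 3300, 4100]
hybrid_accuracy = [0.71, 0.74, 0.79, 0.83, 0.87, 0.91]
latency_ms = [980, 920, 850, 790, 760, 730]


def relative_change(series):
    """Fractional change from the first to the last value of a series."""
    first, last = series[0], series[-1]
    return (last - first) / first


accuracy_gain = round(hybrid_accuracy[-1] - hybrid_accuracy[0], 2)
latency_drop_ms = latency_ms[0] - latency_ms[-1]
volume_growth = relative_change(query_volume)
best_month = months[hybrid_accuracy.index(max(hybrid_accuracy))]

print(f"Accuracy gain Jan to Jun: {accuracy_gain:.2f}")
print(f"Latency drop Jan to Jun: {latency_drop_ms} ms")
print(f"Query volume growth: {volume_growth:.0%}")
print(f"Month with highest accuracy: {best_month}")
```

These figures give a ground truth to compare against when the retrieval system later answers questions about accuracy and latency.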

Building the RAG-Anything content_list for Text, Tables, Equations, and Images

print("\n[6/10] Building direct multimodal content_list...")
content_list: List[Dict[str, Any]] = [
   {
       "type": "text",
       "text": (
           "This synthetic report evaluates a multimodal retrieval augmented generation system. "
           "The system indexes textual explanations, a structured performance table, a scoring equation, "
           "and a trend figure. The main goal is to answer questions whose evidence is distributed across "
           "several document modalities rather than one plain text passage."
       ),
       "page_idx": 0,
   },
   {
       "type": "table",
       "table_body": table_md,
       "table_caption": ["Table 1: Monthly query volume, hybrid accuracy, and average latency."],
       "table_footnote": ["Synthetic measurements created for a Colab tutorial."],
       "page_idx": 0,
   },
   {
       "type": "equation",
       "latex": r"Score(q,d)=\alpha \cdot Sim_{text}(q,d)+\beta \cdot Sim_{graph}(q,d)+\gamma \cdot Sim_{visual}(q,d)",
       "text": (
           "Weighted multimodal retrieval score. Alpha controls text similarity, beta controls graph relationship "
           "similarity, and gamma controls visual similarity."
       ),
       "page_idx": 0,
   },
   {
       "type": "image",
       "img_path": str(chart_path.resolve()),
       "image_caption": ["Figure 1: Multimodal RAG usage and quality trend."],
       "image_footnote": ["The line chart is synthetic and generated inside this tutorial."],
       "page_idx": 0,
   },
   {
       "type": "text",
       "text": (
           "The key finding is that hybrid retrieval is preferred for cross-modal questions. "
           "Local retrieval is useful for entity-specific lookup, global retrieval is useful for broader themes, "
           "and naive retrieval is a simpler baseline. In this report, hybrid accuracy rises from 0.71 in January "
           "to 0.91 in June, while average latency drops from 980 milliseconds to 730 milliseconds."
       ),
       ),
       "page_idx": 1,
   },
]
content_list_path = ASSET_DIR / "content_list.json"
with open(content_list_path, "w", encoding="utf-8") as f:
   json.dump(content_list, f, indent=2, ensure_ascii=False)
print(f"Saved content list: {content_list_path}")

We convert the synthetic report into RAG-Anything’s direct content_list format. We represent text, tables, equations, and images as separate structured blocks with captions, footnotes, page indexes, and image paths. We save this list of multimodal content as JSON so the workflow stays transparent and reusable.
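A mistyped key in one of these blocks is easy to miss, so a quick structural check before insertion can save a failed indexing run. The sketch below is one possible check; `REQUIRED_KEYS` and `validate_content_list` are hypothetical names, and the required keys are inferred from the blocks used in this tutorial rather than taken from an official RAG-Anything schema.

```python
from typing import Any, Dict, List

# Assumption: keys inferred from this tutorial's blocks, not an official schema.
REQUIRED_KEYS = {
    "text": {"text"},
    "table": {"table_body"},
    "equation": {"latex"},
    "image": {"img_path"},
}


def validate_content_list(blocks: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []
    for i, block in enumerate(blocks):
        block_type = block.get("type")
        if block_type not in REQUIRED_KEYS:
            problems.append(f"block {i}: unknown or missing type {block_type!r}")
            continue
        for key in sorted(REQUIRED_KEYS[block_type] - set(block)):
            problems.append(f"block {i} ({block_type}): missing {key!r}")
        # This tutorial sets page_idx on every block; treat that as a convention.
        if not isinstance(block.get("page_idx"), int):
            problems.append(f"block {i} ({block_type}): page_idx should be an int")
    return problems


good = [{"type": "text", "text": "hello", "page_idx": 0}]
bad = [
    {"kind": "equation", "latex": "x", "page_idx": 0},  # wrong key name for type
    {"type": "table", "page_idx": 0},                   # no table_body
    {"type": "image", "img_path": "chart.png"},         # no page_idx
]
print(validate_content_list(good))
for problem in validate_content_list(bad):
    print(problem)
```

Running a check like this on `content_list` right before saving the JSON keeps structural errors separate from retrieval-quality issues.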

Defining OpenAI Chat, Vision, and Embedding Functions

print("\n[7/10] Defining clean OpenAI model and embedding functions...")
async def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
   messages = []
   if system_prompt:
       messages.append({"role": "system", "content": str(system_prompt)})
   for msg in history_messages or []:
       if isinstance(msg, dict) and "role" in msg and "content" in msg:
           messages.append(msg)
   messages.append({"role": "user", "content": str(prompt)})
   # Forward only the optional parameters the chat API call is expected to accept.
   allowed_kwargs = {}
   for key in ["temperature", "top_p", "max_tokens", "response_format"]:
       if key in kwargs and kwargs[key] is not None:
           allowed_kwargs[key] = kwargs[key]
   response = await openai_client.chat.completions.create(
       model=LLM_MODEL,
       messages=messages,
       **allowed_kwargs,
   )
   return response.choices[0].message.content or ""
async def vision_model_func(
   prompt,
   system_prompt=None,
   history_messages=None,
   image_data=None,
   messages=None,
   **kwargs,
):
   allowed_kwargs = {}
   for key in ["temperature", "top_p", "max_tokens", "response_format"]:
       if key in kwargs and kwargs[key] is not None:
           allowed_kwargs[key] = kwargs[key]
   # If the caller supplies a ready-made message list, send it as-is.
   if messages:
       clean_messages = [m for m in messages if m is not None]
       response = await openai_client.chat.completions.create(
           model=VISION_MODEL,
           messages=clean_messages,
           **allowed_kwargs,
       )
       return response.choices[0].message.content or ""
   built_messages = []
   if system_prompt:
       built_messages.append({"role": "system", "content": str(system_prompt)})
   for msg in history_messages or []:
       if isinstance(msg, dict) and "role" in msg and "content" in msg:
           built_messages.append(msg)
   # Attach the base64 image as a data URL when image data is provided.
   if image_data:
       built_messages.append(
           {
               "role": "user",
               "content": [
                   {"type": "text", "text": str(prompt)},
                   {
                       "type": "image_url",
                       "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                   },
               ],
           }
       )
   else:
       built_messages.append({"role": "user", "content": str(prompt)})
   response = await openai_client.chat.completions.create(
       model=VISION_MODEL,
       messages=built_messages,
       **allowed_kwargs,
   )
   return response.choices[0].message.content or ""
async def openai_embedding_func(texts, **kwargs):
   if isinstance(texts, str):
       texts = [texts]
   texts = [str(t) for t in texts]
   response = await openai_client.embeddings.create(
       model=EMBEDDING_MODEL,
       input=texts,
   )
   vectors = [item.embedding for item in response.data]
   return np.array(vectors, dtype=np.float32)
embedding_func = EmbeddingFunc(
   embedding_dim=EMBEDDING_DIM,
   max_token_size=8192,
   func=openai_embedding_func,
)
print("Model and embedding functions ready.")

We define clean OpenAI-powered functions for text generation, vision generation, and embedding creation. We handle system prompts, chat history, multimodal image inputs, and optional model parameters in a controlled way. We then wrap the embedding function with LightRAG’s EmbeddingFunc so RAG-Anything can use it during indexing and retrieval.
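The message-building and parameter-filtering logic can be checked without calling any API. The sketch below pulls that logic into two pure functions; `build_messages` and `filter_kwargs` are hypothetical helper names that mirror what the model functions above do inline, and the base64 string is a placeholder rather than a real image.

```python
from typing import Any, Dict, List, Optional

# Optional parameters the tutorial forwards to the chat API.
ALLOWED_KEYS = ("temperature", "top_p", "max_tokens", "response_format")


def filter_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allow-listed keys whose value is not None."""
    return {k: kwargs[k] for k in ALLOWED_KEYS if kwargs.get(k) is not None}


def build_messages(
    prompt: str,
    system_prompt: Optional[str] = None,
    history_messages: Optional[List[Any]] = None,
    image_data: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Assemble a chat payload: system prompt, valid history, then the user turn."""
    messages: List[Dict[str, Any]] = []
    if system_prompt:
        messages.append({"role": "system", "content": str(system_prompt)})
    for msg in history_messages or []:
        # Skip anything that is not a well-formed chat message.
        if isinstance(msg, dict) and "role" in msg and "content" in msg:
            messages.append(msg)
    if image_data:
        content: Any = [
            {"type": "text", "text": str(prompt)},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
        ]
    else:
        content = str(prompt)
    messages.append({"role": "user", "content": content})
    return messages


demo_messages = build_messages(
    "Describe the chart.",
    system_prompt="Be brief.",
    history_messages=[{"role": "user", "content": "hi"}, "not a message"],
    image_data="QUJD",  # placeholder base64 string, not a real image
)
demo_kwargs = filter_kwargs({"temperature": 0, "top_p": None, "stream": True})
print(len(demo_messages), demo_kwargs)
```

Keeping this logic pure makes it straightforward to unit-test the payload shape separately from network behavior.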

Initializing RAG-Anything and Running Hybrid Retrieval

print("\n[8/10] Initializing RAG-Anything...")
config = RAGAnythingConfig(
   working_dir=str(WORKING_DIR),
   parser=PARSER_FOR_FULL_PARSE,
   parse_method=PARSE_METHOD,
   enable_image_processing=True,
   enable_table_processing=True,
   enable_equation_processing=True,
)
rag = RAGAnything(
   config=config,
   llm_model_func=llm_model_func,
   vision_model_func=vision_model_func,
   embedding_func=embedding_func,
)
async def maybe_await(value):
   # Await the value only if it is awaitable; otherwise return it unchanged.
   if inspect.isawaitable(value):
       return await value
   return value
if hasattr(rag, "initialize_storages"):
   try:
       await maybe_await(rag.initialize_storages())
       print("RAG-Anything storages initialized.")
   except Exception as e:
       print("Storage initialization skipped or already handled:", repr(e))
print(f"Working directory: {WORKING_DIR}")
print("\n[9/10] Inserting multimodal content and running retrieval queries...")
async def insert_demo_content():
   await rag.insert_content_list(
       content_list=content_list,
       file_path=str(report_pdf_path.name),
       split_by_character=None,
       split_by_character_only=False,
       doc_id="synthetic-multimodal-rag-report",
       display_stats=True,
   )
await insert_demo_content()
print("Insertion complete.")
queries = [
   "What is the main purpose of the multimodal RAG report?",
   "How did hybrid accuracy and latency change from January to June?",
   "Why is hybrid retrieval better than naive retrieval for this report?",
   "What does the weighted multimodal score equation mean?",
]
async def safe_aquery(question, mode="hybrid", vlm_enhanced=False):
   # Fall back to a plain query if this version does not accept vlm_enhanced.
   try:
       return await rag.aquery(question, mode=mode, vlm_enhanced=vlm_enhanced)
   except TypeError:
       return await rag.aquery(question, mode=mode)
async def run_query_suite():
   results = []
   for mode in ["naive", "local", "global", "hybrid"]:
       print("\n" + "=" * 80)
       print(f"QUERY MODE: {mode.upper()}")
       print("=" * 80)
       for q in queries:
           print(f"\nQuestion: {q}")
           try:
               answer = await safe_aquery(q, mode=mode, vlm_enhanced=False)
           except Exception as e:
               answer = f"Query failed in mode={mode}: {repr(e)}"
           print("\nAnswer:")
           print(answer)
           print("-" * 80)
           results.append(
               {
                   "mode": mode,
                   "question": q,
                   "answer_preview": str(answer)[:700],
               }
           )
   return pd.DataFrame(results)
query_results_df = await run_query_suite()
print("\nQuery result preview:")
display(query_results_df)

We initialize RAG-Anything with the working directory, parser settings, and multimodal processing options enabled. We insert the prepared multimodal content list into the system and let RAG-Anything process the text, table, equation, and image blocks. We then run several retrieval modes to compare the behavior of naive, local, global, and hybrid queries.
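Comparing modes is easier when the answers to one question sit side by side. The sketch below regroups result rows by question; `compare_modes` is a hypothetical helper, and the rows are invented placeholders standing in for real query output.

```python
from collections import defaultdict
from typing import Dict, List


def compare_modes(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Group answer previews by question, keyed by retrieval mode."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        grouped[row["question"]][row["mode"]] = row["answer_preview"]
    return dict(grouped)


# Placeholder rows; the answers are invented and are not model output.
rows = [
    {"mode": "naive", "question": "Q1", "answer_preview": "placeholder A"},
    {"mode": "hybrid", "question": "Q1", "answer_preview": "placeholder B"},
    {"mode": "naive", "question": "Q2", "answer_preview": "placeholder C"},
]
side_by_side = compare_modes(rows)
for question, answers in side_by_side.items():
    print(question)
    for mode, preview in answers.items():
        print(f"  {mode:7s} {preview}")
```

Assuming the result rows use the keys `mode`, `question`, and `answer_preview`, a pandas DataFrame of results can be passed in via `DataFrame.to_dict("records")`.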

Running Explicit Multimodal Queries

print("\n[10/10] Running explicit multimodal queries...")
async def run_multimodal_queries():
   multimodal_cases = [
       {
           "name": "Table-aware query",
           "question": (
               "Using the supplied table, identify the month with the highest hybrid accuracy, "
               "the month with the lowest latency, and explain whether the trend supports the report conclusion."
           ),
           "multimodal_content": [
               {
                   "type": "table",
                   "table_data": table_md,
                   "table_caption": "Monthly performance table",
               }
           ],
       },
       {
           "name": "Equation-aware query",
           "question": (
               "Explain how this scoring equation should affect retrieval when the user's question needs "
               "textual, graph, and visual evidence at the same time."
           ),
           "multimodal_content": [
               {
                   "type": "equation",
                   "latex": r"Score(q,d)=\alpha Sim_{text}(q,d)+\beta Sim_{graph}(q,d)+\gamma Sim_{visual}(q,d)",
                   "equation_caption": "Weighted multimodal retrieval score",
               }
           ],
       },
       {
           "name": "Combined multimodal query",
           "question": (
               "Connect the table, equation, and document conclusion into one explanation of why a multimodal "
               "hybrid retriever is useful."
           ),
           "multimodal_content": [
               {
                   "type": "table",
                   "table_data": table_md,
                   "table_caption": "Monthly performance table",
               },
               {
                   "type": "equation",
                   "latex": r"Score(q,d)=\alpha Sim_{text}(q,d)+\beta Sim_{graph}(q,d)+\gamma Sim_{visual}(q,d)",
                   "equation_caption": "Weighted multimodal retrieval score",
               },
           ],
       },
   ]
   outputs = []
   for case in multimodal_cases:
       print("\n" + "=" * 80)
       print(case["name"])
       print("=" * 80)
       print("Question:", case["question"])
       try:
           answer = await rag.aquery_with_multimodal(
               case["question"],
               multimodal_content=case["multimodal_content"],
               mode="hybrid",
           )
       except Exception as e:
           answer = f"Multimodal query failed: {repr(e)}"
       print("\nAnswer:")
       print(answer)
       outputs.append(
           {
               "case": case["name"],
               "question": case["question"],
               "answer_preview": str(answer)[:900],
           }
       )
   return pd.DataFrame(outputs)
multimodal_results_df = await run_multimodal_queries()
print("\nMultimodal result preview:")
display(multimodal_results_df)
print("\nOptional full-parser path:")
print("RUN_FULL_DOCUMENT_PARSE is currently:", RUN_FULL_DOCUMENT_PARSE)
async def optional_full_document_parse():
   if not RUN_FULL_DOCUMENT_PARSE:
       print(
           "Skipping parser-based PDF ingestion. "
           "Set RUN_FULL_DOCUMENT_PARSE=True near the top to test MinerU/Docling/PaddleOCR parsing."
       )
       return
   print("Starting full document parsing.")
   await rag.process_document_complete(
       file_path=str(report_pdf_path),
       output_dir=str(OUTPUT_DIR),
       parse_method=PARSE_METHOD,
       parser=PARSER_FOR_FULL_PARSE,
       display_stats=True,
       doc_id="parser-processed-synthetic-report",
   )
   answer = await safe_aquery(
       "After full parsing, what figures, tables, and equations are present in the report?",
       mode="hybrid",
       vlm_enhanced=False,
   )
   print(answer)
await optional_full_document_parse()
print("\n" + "=" * 80)
print("Tutorial complete.")
print("=" * 80)
print(f"Assets directory: {ASSET_DIR}")
print(f"RAG storage directory: {WORKING_DIR}")
print(f"Output directory: {OUTPUT_DIR}")
print("\nGenerated files:")
for path in sorted(ASSET_DIR.glob("*")):
   print(" -", path)

We test explicit multimodal queries that directly provide table and equation content at query time. We ask the system to reason over structured values, scoring logic, and combined multimodal evidence. We finish by showing the result previews, keeping the optional full-document parser path available for deeper PDF ingestion experiments.
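The scoring equation used in the synthetic report can be worked through numerically to see what the equation-aware query is asking the model to explain. This is only an illustration of the report's formula: the weights and similarity values below are invented, and nothing here is claimed to reflect how RAG-Anything ranks results internally.

```python
def weighted_score(sim_text, sim_graph, sim_visual, alpha=0.5, beta=0.3, gamma=0.2):
    """Score(q, d) = alpha * Sim_text + beta * Sim_graph + gamma * Sim_visual."""
    return alpha * sim_text + beta * sim_graph + gamma * sim_visual


# Hypothetical similarity values for two candidate documents.
candidates = {
    "doc_a": {"sim_text": 0.9, "sim_graph": 0.2, "sim_visual": 0.1},
    "doc_b": {"sim_text": 0.6, "sim_graph": 0.7, "sim_visual": 0.8},
}
scores = {name: round(weighted_score(**sims), 2) for name, sims in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, scores[name])
```

With these assumed weights, the document with weaker text similarity still ranks first because its graph and visual evidence contribute more, which is the behavior a cross-modal question is meant to reward.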

Conclusion

In conclusion, we have a working RAG-Anything pipeline that can ingest multimodal content and answer questions using both standard and multimodal retrieval paths. We saw how text passages, markdown tables, LaTeX equations, and generated figures can be represented as separate content blocks while still contributing to a shared retrieval system. We also tested several query modes and compared the system’s responses when questions require direct facts, broader context, or cross-modal reasoning.

The post RAG-Anything Tutorial: Build a Multimodal Retrieval Pipeline for Text, Tables, Equations, and Images in Colab appeared first on MarkTechPost.
