
Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

Today, Mistral AI launched OCR 4, its newest document-understanding model. The release adds bounding boxes, block classification, and inline confidence scores alongside extracted text. It supports 170 languages across 10 language groups and runs in a single container for fully self-hosted deployments. OCR 4 also serves as an ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines.

TL;DR

  • OCR 4 returns bounding boxes, typed-block labels, and per-word confidence scores, not just text.
  • It supports 170 languages across 10 groups, with gains on rare and low-resource languages.
  • Independent annotators preferred OCR 4 over every system tested, with win rates averaging 72%.
  • Pricing is $4 per 1,000 pages, dropping to $2 with the Batch-API discount.
  • One endpoint serves both raw extraction and schema-driven Document AI output.

Mistral OCR 4

Mistral OCR 4 extracts and structures content from a wide range of documents. Previous generations focused on converting a page into clean text and tables. OCR 4 instead returns a structured representation of the whole document.

Each block is localized with a bounding box and classified by type. Block types include titles, tables, equations, signatures, and more. Inline confidence scores are generated per page and per word.

Downstream systems therefore learn more than what a document says. They also learn where each element sits, what role it plays, and how confident the model is. That extra context matters for citations, redactions, and human-in-the-loop verification.
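To make "where, what role, how confident" concrete, here is a minimal sketch of how a downstream system might model one block. The class and field names (Block, block_type, bbox, confidence) and the coordinate convention are illustrative assumptions, not OCR 4's actual response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One localized, typed region of a page (hypothetical schema)."""
    page: int
    block_type: str                           # e.g. "title", "table", "equation", "signature"
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1); units are an assumption
    text: str
    confidence: float                         # assumed to lie in [0, 1]


def cite(block: Block) -> str:
    """Format a source-grounded citation that points at the block's location."""
    x0, y0, x1, y1 = block.bbox
    return f"p.{block.page} [{block.block_type}] @ ({x0:g}, {y0:g}, {x1:g}, {y1:g})"


def needs_review(block: Block, threshold: float = 0.9) -> bool:
    """Flag a block for human verification when its confidence is below threshold."""
    return block.confidence < threshold
```

With a table block on page 3 at (72, 100.5, 540, 300), cite returns "p.3 [table] @ (72, 100.5, 540, 300)". The 0.9 review threshold is an arbitrary placeholder to tune per workload.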

OCR 4 accepts common enterprise formats, including PDF, DOC, PPT, and OpenDocument. The model is compact enough to deploy in a single container. Self-managed deployment is available to enterprise customers for data residency and compliance.

Benchmark

Mistral compared OCR 4 against AI-native OCR models, frontier general-purpose models, enterprise document services, and Mistral OCR 3.

Independent annotators preferred OCR 4 over every major system tested. Win rates averaged 72% across the comparison set. The evaluation used 600+ documents across 12+ languages, sourced from third-party vendors. Annotators ranked each competitor’s output against OCR 4’s, document by document.

On automated benchmarks, OCR 4 scored 85.20 on the public OlmOCRBench. It scored 93.07 on OmniDocBench and .98 on Mistral’s internal Crawl Multilingual evaluation.

Two customer data points add context. Rogo reported equal accuracy at roughly 8x lower cost and 17x lower latency versus leading agentic parsers. Anaqua measured roughly 4x faster processing per page than its incumbent provider.

Segmentation, Not Just Text

Bounding boxes have been Mistral’s most-requested capability. They localize text for in-context highlighting and reliable data pipelines.
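In-context highlighting usually means mapping a box from page coordinates onto a rendered image of the page. The sketch below shows that scaling step. It assumes boxes arrive as (x0, y0, x1, y1) in the same units as the reported page dimensions, which the source does not specify, and the function name is hypothetical.

```python
def scale_bbox(bbox, page_size, render_size):
    """Map a box from page coordinates to pixel coordinates of a rendered page.

    bbox:        (x0, y0, x1, y1) in page units (assumed convention)
    page_size:   (width, height) of the page in those same units
    render_size: (width, height) of the rendered image in pixels
    """
    x0, y0, x1, y1 = bbox
    sx = render_size[0] / page_size[0]   # horizontal scale factor
    sy = render_size[1] / page_size[1]   # vertical scale factor
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))
```

For example, a box (100, 200, 300, 400) on a 1000 x 2000 page rendered at 500 x 1000 pixels maps to (50, 100, 150, 200), which a viewer can draw as an overlay rectangle.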

Block types and confidence scores serve different jobs. Together they drive source-grounded citations, redactions, and human-in-the-loop verification. This structure supports several downstream workloads.

Clean, classified blocks become better retrieval units for RAG. Agents gain structural primitives to act on documents, not just read them. Connectors receive consistent, typed output for ingestion and indexing.
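One way to picture blocks as retrieval units is a small chunker that keeps citation metadata next to the text. This is a sketch under assumed input: each block is a plain dict with hypothetical keys "page", "type", "bbox", and "text", not the real response objects.

```python
def blocks_to_chunks(blocks):
    """Turn typed blocks into retrieval chunks that keep citation metadata.

    A "title" block is not emitted as its own chunk. Instead it becomes the
    section label attached to every chunk that follows it.
    """
    chunks = []
    section = None
    for block in blocks:
        text = block["text"].strip()
        if not text:
            continue                      # skip empty regions
        if block["type"] == "title":
            section = text                # remember the current heading
            continue
        chunks.append({
            "text": text,
            "section": section,
            "source": {
                "page": block["page"],
                "bbox": block["bbox"],
                "type": block["type"],
            },
        })
    return chunks
```

Because each chunk carries its page and bounding box, an answer generated from it can cite the exact region it came from.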

OCR 4 is also an ingestion component of Mistral Search Toolkit, now in public preview. Search Toolkit is Mistral’s open-source, composable search framework. OCR 4’s structured output gives citation-ready inputs to retrieval and evaluation workflows.

Use Cases With Examples

OCR 4 supports both high-volume pipelines and interactive document workflows.

  • Document parsing and extraction: Turn a multilingual contract into clean, structured markdown for indexing.
  • Retrieval-Augmented Generation (RAG): Feed classified blocks into Search Toolkit for source-grounded answers with citations.
  • Agentic workflows: Give an invoice-processing agent typed fields and bounding boxes to fill forms automatically.
  • Confidence-gated pipelines: Route low-confidence regions to human verifiers, and auto-approve the rest.
  • Enterprise search: Use OCR 4 as a data-source component for ingestion and entity extraction across an archive.

Early users apply OCR 4 to turn invoices into structured fields and digitize company archives. Others extract clean text from technical reports or power enterprise search.

A note on scope from Mistral’s official release: OCR 4 is a document-understanding model, not a decision-maker. It is not intended for medical diagnosis, legal judgment, or high-stakes financial decisions. It is also unsuited to safety-critical systems, real-time processing, or non-document inputs like raw audio or video.

Comparison: Pure Extraction vs Document AI

OCR 4 ships behind a single API endpoint. Every request runs the same model. It always returns extracted content, bounding boxes, block types, confidence scores, and markdown. What varies is how much you layer on top.

| Capability | Pure Extraction Mode | Document AI Mode (same endpoint) |
| --- | --- | --- |
| Output | Markdown, bboxes, block types, confidence | Structured JSON in a schema you define |
| How it works | Raw OCR response | OCR output fed to mistral-small-2603 |
| Image annotation | Not applied | Per-image vision-language call on schema |
| Custom prompt | No | Yes, guides interpretation or summary |
| Best for | Pipelines, agents, batch ingestion | Business users, pilots, no parsing logic |
| Price | $4 / 1,000 pages ($2 batch) | $5 / 1,000 pages |
| Self-hosting | Available for enterprise | Available for enterprise |

The decision rule is simple. Need raw extracted content? Use OCR 4 as-is. Need the output reshaped into a schema or annotated with domain fields? Add the Document AI parameters to the same call.
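The per-1,000-page prices listed above make it easy to budget a job before choosing a mode. The helper below is a back-of-the-envelope sketch. The mode names are made up for illustration, and because the source does not say whether a batch discount applies to Document AI, that combination is not modeled.

```python
# Prices in USD per 1,000 pages, taken from the figures quoted in this article.
# The mode keys are hypothetical labels, not API parameters.
PRICE_PER_1000_PAGES = {
    "extraction": 4.00,
    "extraction_batch": 2.00,
    "document_ai": 5.00,
}


def estimate_cost(pages: int, mode: str = "extraction") -> float:
    """Return the estimated cost in USD for processing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in PRICE_PER_1000_PAGES:
        raise ValueError(f"unknown mode: {mode!r}")
    return pages / 1000 * PRICE_PER_1000_PAGES[mode]
```

For a 250,000-page archive this gives $1,000 for standard extraction, $500 through batch, and $1,250 for Document AI output.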

Working With the API

Basic extraction takes a document URL and returns structured pages. Set include_blocks=True to get the typed blocks and bounding boxes.

import os
from mistralai.client import Mistral

# Read the API key from the environment rather than hard-coding it
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    },
    include_blocks=True,                  # typed blocks + bounding boxes
    table_format="html",                  # None (inline), "markdown", or "html"
    include_image_base64=True             # return extracted images as base64
)

The response is a JSON object with a pages array. Each page carries markdown, images, tables, hyperlinks, dimensions, and confidence_scores. To gate a human-review pipeline, request per-word confidence.

ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url",
              "document_url": "https://arxiv.org/pdf/2201.04234"},
    confidence_scores_granularity="word"   # or "page" for aggregates
)

The "word" setting adds a word_confidence_scores array per page and per table entry. For high-volume jobs, Mistral recommends the Batch Inference service, which halves the per-page cost.
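A review gate can then be built on the per-word scores. The sketch below works on the response after it has been converted to a plain dict. The exact shape is an assumption: it expects each page to hold a word_confidence_scores list whose entries have "text" and "confidence" keys, so check the real payload before relying on it. The 0.85 threshold is also a placeholder.

```python
def low_confidence_words(response: dict, threshold: float = 0.85):
    """Return (page_index, word, score) for every word scoring below threshold.

    ASSUMED shape: response["pages"][i]["word_confidence_scores"] is a list of
    {"text": str, "confidence": float} entries. Verify against the real API.
    """
    flagged = []
    for page_index, page in enumerate(response.get("pages", [])):
        for entry in page.get("word_confidence_scores", []):
            if entry["confidence"] < threshold:
                flagged.append((page_index, entry["text"], entry["confidence"]))
    return flagged


def pages_needing_review(response: dict, threshold: float = 0.85):
    """Return the sorted indices of pages with at least one flagged word."""
    return sorted({page for page, _, _ in low_confidence_words(response, threshold)})
```

Pages returned by pages_needing_review can go to a human verifier, while the remaining pages are auto-approved.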


Try It: Interactive Output Explorer

The embed below visualizes OCR 4’s structured output. Switch between sample documents, toggle bounding boxes and block types, and turn on the confidence heatmap. The Markdown and JSON tabs show the two output shapes side by side. The sample data is illustrative, not a live API call.