Document AI Toolkit

From PDF to LLM Answers,
With Source Verification

Build document Q&A experiences with pinpoint source highlighting. Users see exactly where each answer comes from, so they can check it against the original PDF instead of taking it on trust.

The Pipeline

PDF in, verified answers out. Every step optimized for RAG.

Input (PDF) → Extract (PyMuPDF4LLM) → Process (Your LLM) → Locate (Source Locator) → Display (WebViewer)

Three Building Blocks

Each component works standalone. Together, they're a complete RAG toolkit.

PyMuPDF4LLM

PDF → LLM-Ready Markdown

Extracts text with layout awareness. Tables stay tables. Headers stay headers. Output optimized for LLM consumption.

  • Layout-preserving extraction
  • Table detection & formatting
  • Image extraction with captions
  • Chunk-ready output
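
One way to consume Markdown like this downstream is to split it at headings, so that each chunk carries its section title into retrieval. The sketch below is an assumption about what a consumer might do, not part of PyMuPDF4LLM: `chunkMarkdownByHeading` is a hypothetical helper, and it treats any line starting with `#` as a heading without special handling for code blocks.

```javascript
// Hypothetical helper (not a PyMuPDF4LLM API): split Markdown into
// heading-scoped chunks so tables and paragraphs stay with their section.
function chunkMarkdownByHeading(markdown) {
  const hasContent = (chunk) =>
    chunk.heading !== null || chunk.lines.some((line) => line.trim() !== "");

  const chunks = [];
  let current = { heading: null, lines: [] };

  for (const line of markdown.split(/\r?\n/)) {
    // ATX headings: one to six '#' characters followed by whitespace.
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      // A new heading closes the chunk collected so far.
      if (hasContent(current)) chunks.push(current);
      current = { heading: match[2].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  if (hasContent(current)) chunks.push(current);

  // Text before the first heading is kept as a chunk with heading === null.
  return chunks.map((chunk) => ({
    heading: chunk.heading,
    text: chunk.lines.join("\n").trim(),
  }));
}
```

Each returned item has a `heading` and a `text` field, so a table that follows a heading stays in the same chunk as that heading.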

Source Locator

LLM Output → PDF Coordinates

Maps quoted text from LLM responses back to exact PDF positions. Fuzzy matching absorbs differences between the extracted text and the quote the LLM returns. Matching needs no further LLM calls, so it adds no token cost.

  • Fuse.js fuzzy matching
  • No additional LLM calls
  • Handles whitespace & encoding diffs
  • Returns exact coordinates
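
The idea of mapping a quote back to coordinates can be sketched as follows. This is not the Source Locator's implementation: it swaps Fuse.js for a simple normalized token-window comparison, and the word record shape (`text`, `page`, `bbox`) as well as the `locateQuote` helper and its `minScore` parameter are hypothetical.

```javascript
// Fold compatibility characters (e.g. the "fi" ligature), lowercase,
// and drop punctuation so extraction differences do not block a match.
function normalizeToken(token) {
  return token
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, "");
}

function tokenize(text) {
  return text.split(/\s+/).map(normalizeToken).filter((t) => t.length > 0);
}

// words: [{ text, page, bbox: [x0, y0, x1, y1] }] in reading order.
// This shape is an assumption made for the sketch.
function locateQuote(quote, words, minScore = 0.8) {
  const needle = tokenize(quote);
  const hay = words
    .map((w) => ({ ...w, norm: normalizeToken(w.text) }))
    .filter((w) => w.norm.length > 0);
  if (needle.length === 0 || hay.length < needle.length) return null;

  // Slide a window the size of the quote over the page words and
  // score each position by the fraction of tokens that agree.
  let best = { score: 0, start: -1 };
  for (let start = 0; start + needle.length <= hay.length; start++) {
    let hits = 0;
    for (let i = 0; i < needle.length; i++) {
      if (hay[start + i].norm === needle[i]) hits++;
    }
    const score = hits / needle.length;
    if (score > best.score) best = { score, start };
  }
  if (best.score < minScore) return null;

  const matched = hay.slice(best.start, best.start + needle.length);
  return {
    score: best.score,
    page: matched[0].page,
    boxes: matched.map((w) => w.bbox),
  };
}
```

No LLM is involved at this stage: the lookup is plain string comparison over text that has already been extracted, which is why this step costs no tokens. A production matcher would also need to tolerate inserted or dropped words, which a fixed-size window does not.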

MuPDF WebViewer

In-Browser PDF Experience

Full PDF viewer that runs in the browser, built on JavaScript and WebAssembly. Highlight, annotate, redact. All client-side: no server round trips, works air-gapped.

  • One-line integration
  • WebAssembly performance
  • Programmatic highlighting
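  • Works with any framework

// Illustrative flow; the objects below stand in for your own integration code.
// 1. Extract PDF to Markdown (PyMuPDF4LLM is a Python library, so from
//    JavaScript this call stands for a request to a service that wraps it)
const markdown = await pymupdf4llm.to_markdown(pdfBuffer);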

// 2. Get answer from your LLM (with source quotes)
const answer = await yourLLM.ask(markdown, question);

// 3. Locate source in PDF
const coordinates = sourceLocator.find(answer.sourceText);

// 4. Highlight in WebViewer
webViewer.highlight(coordinates);
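
Step 2 above relies on the LLM returning quotes along with its answer. A minimal sketch of one way to request and check them is shown below. The prompt wording, the JSON reply shape (`answer`, `quotes`), and the helpers `buildPrompt` and `parseAnswer` are all assumptions for illustration, not part of any toolkit API.

```javascript
// Hypothetical prompt builder: asks the model for verbatim quotes.
function buildPrompt(markdown, question) {
  return [
    "Answer the question using only the document below.",
    'Reply with JSON: {"answer": string, "quotes": string[]}.',
    "Each quote must be copied verbatim from the document.",
    "",
    "Document:",
    markdown,
    "",
    "Question: " + question,
  ].join("\n");
}

// Hypothetical reply parser: keeps only quotes that really occur in the
// extracted Markdown (ignoring whitespace differences), so an invented
// quote is never passed on to the locate step.
// Throws if rawReply is not valid JSON.
function parseAnswer(rawReply, markdown) {
  const collapse = (s) => s.replace(/\s+/g, " ").trim();
  const source = collapse(markdown);

  const parsed = JSON.parse(rawReply);
  const quotes = (Array.isArray(parsed.quotes) ? parsed.quotes : []).filter(
    (q) => typeof q === "string" && q.trim() !== ""
  );

  return {
    answer: parsed.answer == null ? "" : String(parsed.answer),
    verified: quotes.filter((q) => source.includes(collapse(q))),
    unverified: quotes.filter((q) => !source.includes(collapse(q))),
  };
}
```

Only the `verified` quotes would be handed to the locate step; anything in `unverified` can be flagged to the user rather than highlighted.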

Built for RAG Developers

Any workflow where users need to verify AI answers against source documents.

Legal

Contract Analysis

Ask questions about contracts and instantly see the exact clause. No more manual searching through 100-page agreements.

Finance

Financial Report Q&A

Query earnings reports and 10-Ks. Highlight the exact figures and footnotes that support the answer.

Research

Academic Paper Review

Search across papers and pinpoint methodology sections, citations, or specific findings instantly.

Enterprise

Internal Knowledge Base

Turn policy documents and SOPs into a Q&A system. Employees get answers with source verification.

See It in Action

Try the demo or talk to us about your RAG project.