mailrag

Your email, answerable on your own hardware.

Turns a mail archive into a queryable knowledge base — hybrid dense+sparse retrieval, thread-aware answers assembled from whole conversations, optional local-LLM cleanup. Runs on your hardware, on open models, with nothing required to leave your network.

Python 3.11+ | LlamaIndex | bge-m3 hybrid | self-hosted | recall@5 46→93% | Apache-2.0
thread-aware contextual RAG over the public Enron demo
$ git clone https://github.com/fmasi/mailrag && cd mailrag
$ make demo                # Qdrant + a thread-aware index over 100 public Enron emails
$ mailrag ask "who approved the Q3 budget, and when?"
  → retrieves the matching message, expands to its full thread,
    and answers from the whole conversation — with citations.

Why it exists

Making your mailbox searchable by an AI usually means handing your entire email history to someone else's servers. For your real correspondence, that's a non-starter.

So I built the opposite: an email RAG you run yourself — on your hardware, on open models, with nothing required to leave your network. And the result isn't just search — these emails are private context my own AI agents can draw on for total recall, without renting my memory to anyone. mailrag covers email; parley covers calls and meetings. Independent tools; my agents know about both and reach for whatever fits.

What it does

Thread-aware answers

The flagship: match a single message, then answer from its entire conversation. Lifts recall@5 from 64% to 93% (+29 points), the single biggest lever, and it needs no LLM.
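As a rough illustration of the expand-to-thread step, here is a minimal sketch. The `Message` fields and the `expand_to_thread` helper are hypothetical names for this example, not mailrag's actual API, and it assumes every message already carries a thread id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record shape; mailrag's real schema may differ.
    msg_id: str
    thread_id: str
    sent_at: str  # ISO 8601 timestamp in one timezone, so string order is chronological
    body: str


def expand_to_thread(hit_id: str, messages: list[Message]) -> list[Message]:
    """Return every message in the retrieved message's thread, oldest first."""
    by_id = {m.msg_id: m for m in messages}
    thread_id = by_id[hit_id].thread_id
    thread = [m for m in messages if m.thread_id == thread_id]
    return sorted(thread, key=lambda m: m.sent_at)


messages = [
    Message("m1", "t1", "2001-05-01T09:00:00", "Can we approve the Q3 budget?"),
    Message("m2", "t1", "2001-05-02T14:30:00", "Approved."),
    Message("m3", "t2", "2001-05-01T10:00:00", "Lunch on Friday?"),
]

# Retrieval matched only "m2"; the answer is built from the whole thread.
thread = expand_to_thread("m2", messages)
thread_ids = [m.msg_id for m in thread]
```

The retriever only has to land on one message of the conversation; the question ("who approved, and when?") is then answered from the request and the reply together.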

Hybrid retrieval

bge-m3 dense + learned sparse vectors, RRF-fused in Qdrant. Gets both the concept and the rare exact token — acronyms, IDs, reference numbers.
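In mailrag the fusion runs inside Qdrant. The standalone sketch below only illustrates the Reciprocal Rank Fusion formula itself; the function name is made up for this example, and `k = 60` is the commonly cited default, assumed here rather than taken from mailrag's configuration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["m1", "m2", "m3"]   # ranked by dense (semantic) similarity
sparse = ["m4", "m1", "m2"]  # ranked by learned-sparse (token) match
fused = rrf_fuse([dense, sparse])
fused_ids = [doc_id for doc_id, _ in fused]
```

A message that ranks well in both lists (`m1`) beats one that tops only a single list (`m4`), which is how the fusion keeps both the concept match and the rare exact token.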

Email-aware preprocessing

Reply-chain stripping, calendar-invite collapsing, noise/newsletter filtering, and exact-text chunk dedup.
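Two of these steps are easy to sketch. The reply markers below are common conventions assumed for illustration, not mailrag's actual rules, and both function names are hypothetical.

```python
import hashlib
import re

# Assumed markers for where a quoted earlier message begins.
REPLY_MARKERS = re.compile(
    r"^(On .+ wrote:|-{2,}\s*Original Message\s*-{2,})$",
    re.IGNORECASE,
)


def strip_reply_chain(body: str) -> str:
    """Keep only the text the sender wrote, dropping quoted replies."""
    kept = []
    for line in body.splitlines():
        if REPLY_MARKERS.match(line.strip()):
            break  # everything below is the quoted earlier message
        if line.lstrip().startswith(">"):
            continue  # inline-quoted line
        kept.append(line)
    return "\n".join(kept).strip()


def dedup_chunks(chunks: list[str]) -> list[str]:
    """Drop chunks whose text is byte-for-byte identical to an earlier one."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


body = "Approved.\n\nOn Mon, Jan 1, 2001, Bob wrote:\n> Can you approve the budget?"
clean = strip_reply_chain(body)
unique = dedup_chunks(["Approved.", "See attached.", "Approved."])
```

Stripping the quoted chain first matters for dedup too: without it, every reply in a long thread would re-embed the same earlier text.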

Local-LLM summaries

Optional summarize step: a local LLM writes a per-email summary + noise judgement, content-addressed and cached so re-runs are free. Point it at a model on 127.0.0.1.
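A content-addressed cache can be sketched in a few lines. This is an assumption about the general shape, not mailrag's implementation: `cached_summary`, the JSON layout, and keying on model name plus email text are all illustrative choices.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cached_summary(
    email_text: str,
    model: str,
    summarize: Callable[[str], dict],
    cache_dir: Path,
) -> dict:
    """Return the cached result for this exact (model, text) pair, else compute and store it."""
    key = hashlib.sha256(f"{model}\n{email_text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = summarize(email_text)  # the only expensive call
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls: list[str] = []


def fake_summarize(text: str) -> dict:
    # Stand-in for a request to a local model; records each invocation.
    calls.append(text)
    return {"summary": text[:20], "is_noise": False}


cache = Path(tempfile.mkdtemp())
first = cached_summary("Budget approved for Q3.", "local-model", fake_summarize, cache)
second = cached_summary("Budget approved for Q3.", "local-model", fake_summarize, cache)
```

Because the key is derived from the content, an unchanged email never reaches the model twice, while an edited email or a different model produces a new key and a fresh summary.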

Pluggable loaders

Public Enron corpus, local .eml archives, or Azure Blob — behind one EmailLoader interface, ready for live sources next.
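The README names the `EmailLoader` interface but not its methods, so the sketch below guesses at a plausible shape: the `load()` method, the `RawEmail` record, and `EmlDirectoryLoader` are hypothetical. Only the standard-library `email` parsing calls are real.

```python
from dataclasses import dataclass
from email import message_from_bytes, policy
from pathlib import Path
from typing import Iterator, Protocol


@dataclass
class RawEmail:
    # Hypothetical minimal record handed to the rest of the pipeline.
    message_id: str
    subject: str
    body: str


class EmailLoader(Protocol):
    # Assumed method name; the real interface may differ.
    def load(self) -> Iterator[RawEmail]: ...


class EmlDirectoryLoader:
    """Loads every .eml file under a directory using the standard library parser."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def load(self) -> Iterator[RawEmail]:
        for path in sorted(self.root.rglob("*.eml")):
            msg = message_from_bytes(path.read_bytes(), policy=policy.default)
            part = msg.get_body(preferencelist=("plain",))
            body = part.get_content() if part is not None else ""
            yield RawEmail(
                message_id=str(msg.get("Message-ID", path.name)),
                subject=str(msg.get("Subject", "")),
                body=body,
            )
```

With a structural interface like this, an Enron or Azure Blob loader only has to yield the same record type, and the cleaning, chunking, and embedding stages never need to know where the mail came from.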

Measured methodology

A 360-query eval that prices every technique, controls for confounds, reports significance — and in several cases overturned the intuitive choice.
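The README does not say which significance test the eval uses. As one plausible option for paired hit/miss outcomes on the same queries, here is a sketch of an exact McNemar test; the function name and the toy data are illustrative only.

```python
from math import comb


def mcnemar_exact(hits_a: list[bool], hits_b: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-query hit/miss outcomes."""
    only_a = sum(a and not b for a, b in zip(hits_a, hits_b))
    only_b = sum(b and not a for a, b in zip(hits_a, hits_b))
    n = only_a + only_b  # queries where the two systems disagree
    if n == 0:
        return 1.0  # the systems never disagree
    # Binomial tail under the null that either system is equally likely to win.
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Toy data: system B wins 9 of the 10 disagreements; 5 queries are hit by both.
hits_a = [True] + [False] * 9 + [True] * 5
hits_b = [False] + [True] * 9 + [True] * 5
p_value = mcnemar_exact(hits_a, hits_b)
```

Only the queries where the two systems disagree carry information, which is why a paired test on the same question set is more sensitive than comparing two headline percentages.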

How it works — the hard parts

Architecture

From a raw mailbox to an answered question — every stage runs on your own hardware.

Diagram: the mailrag pipeline, from loaders (.eml / Enron / Azure Blob) through clean, chunk, embed (bge-m3), the Qdrant hybrid store, and thread-aware retrieval to a local-LLM answer, with llm-none / llm-verify / llm-all personas, all on your own machine.

The compound effect

Stacking the ladder — each technique added one at a time and individually measured on 360 real-email questions. recall@5 = how often the right email lands in the top 5 results:

Technique                    recall@5   gain
plain dense (baseline)       46%
+ learned sparse             49%        +3
+ contextual summary         62%        +13
+ reranking                  64%        +2
+ thread reconstruction ★    93%        +29

★ The final step switches the goal from “find the exact email” to “find its thread” — a legitimately easier, and more useful, target. The two biggest levers (thread reconstruction +29, contextual summary +13) are both about understanding the conversation — not a fancier embedding model. Read the full benchmark →
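To make the two targets concrete, here is a minimal sketch of message-level and thread-level recall@k. The function names and the toy data are made up for illustration; this is not the project's eval harness.

```python
def recall_at_k(results: dict[str, list[str]], gold: dict[str, str], k: int = 5) -> float:
    """Share of queries whose gold message id appears in the top-k results."""
    hits = sum(1 for q, ranked in results.items() if gold[q] in ranked[:k])
    return hits / len(results)


def thread_recall_at_k(
    results: dict[str, list[str]],
    gold: dict[str, str],
    thread_of: dict[str, str],
    k: int = 5,
) -> float:
    """Share of queries where any top-k result belongs to the gold message's thread."""
    hits = sum(
        1
        for q, ranked in results.items()
        if thread_of[gold[q]] in {thread_of[m] for m in ranked[:k]}
    )
    return hits / len(results)


results = {"q1": ["m2", "m9"], "q2": ["m5"]}  # ranked message ids per query
gold = {"q1": "m1", "q2": "m5"}               # the one correct message per query
thread_of = {"m1": "t1", "m2": "t1", "m9": "t3", "m5": "t2"}

message_level = recall_at_k(results, gold)
thread_level = thread_recall_at_k(results, gold, thread_of)
```

Query `q1` misses the exact email but retrieves a sibling from the same thread, so it counts as a miss at message level and a hit at thread level. That is the sense in which the final step targets an easier, and more useful, goal.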

Measured on a real work mailbox — 360 questions, each with a known correct answer (a hard label, no subjective judging) — then cross-checked on the public Enron-QA set (same ordering, so it isn't a quirk of one inbox). Run the identical comparison on legal e-discovery (the TREC Legal benchmark) and the order flips: a general-purpose dense+rerank stack wins there instead — same systems, opposite result, because the task is different. All references anonymized; the public make demo reproduces the method, not the figures. Full write-up in the benchmark post and the case study.

On the roadmap

mailrag is built to be one node in a private context stack — so the next steps make it easier for agents to reach, and keep its memory current:

MCP server

Expose search, ask, and attachment fetch over the Model Context Protocol, so any agent — yours or a teammate's — can query your mail without touching the internals.

Live ingestion

Move from one-time imports to incremental ingest of incoming mail, so the index stays current — a living context source, not a static snapshot.

Guided TUI

A full-screen terminal UI for the cleanup pipeline: pick a persona, watch the funnel, and approve the calibrate gate — replacing today's prompt-by-prompt flow.