Retrieval-Augmented Generation (arXiv:2005.11401v4) — Report

PDF: RAG - 2005.11401v4.pdf

Overview

  • Introduces RAG, a framework that marries dense retrieval (DPR) with a pretrained sequence-to-sequence generator (BART) to tackle knowledge-intensive NLP tasks.
  • Two variants: RAG-Sequence (marginalises over the retrieved documents once per output sequence, so a single passage explains the whole answer) and RAG-Token (marginalises at each decoding step, letting different tokens draw on different passages); both condition the generator on the text of the top-K retrieved passages.
  • Demonstrates substantial gains on open-domain QA (Natural Questions, TriviaQA), fact verification (FEVER), and Jeopardy question generation, while providing provenance through the retrieved passages.

Core Concepts

  • Dense Passage Retriever (DPR): Dual-encoder trained with in-batch negatives to embed questions and passages; enables efficient maximum inner product search (MIPS) over a FAISS index.
  • Marginalised likelihood: The generator conditions on each of the top-K retrieved passages separately; the per-document probabilities are then marginalised, blending retrieval evidence with generation (see the sketch after this list).
  • End-to-end fine-tuning: The query encoder and generator are jointly optimised while the document encoder and index stay fixed, aligning retrieval with the generator’s needs without costly re-indexing.
  • Provenance-aware outputs: Retrieved passages supply evidence, improving factuality and enabling attribution.
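
A minimal sketch of the two marginalisations using toy per-document probabilities (NumPy only; array names and shapes are illustrative, not the paper’s code):

```python
import numpy as np

# Toy setup: K retrieved documents, T output tokens.
# doc_probs[k]      = p(z_k | x), retriever probability after a softmax
# token_probs[k, t] = p(y_t | x, z_k, y_<t), generator probability of the
#                     gold token at step t when conditioning on document k
K, T = 3, 4
rng = np.random.default_rng(0)
doc_probs = rng.dirichlet(np.ones(K))             # sums to 1 over documents
token_probs = rng.uniform(0.1, 0.9, size=(K, T))  # per-document token likelihoods

# RAG-Sequence: one document explains the whole output.
# p(y|x) = sum_k p(z_k|x) * prod_t p(y_t | x, z_k, y_<t)
seq_likelihood = np.sum(doc_probs * np.prod(token_probs, axis=1))

# RAG-Token: each step marginalises over documents independently.
# p(y|x) = prod_t sum_k p(z_k|x) * p(y_t | x, z_k, y_<t)
tok_likelihood = np.prod(doc_probs @ token_probs)

print(f"RAG-Sequence p(y|x) = {seq_likelihood:.4f}")
print(f"RAG-Token    p(y|x) = {tok_likelihood:.4f}")
```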

Relevance to MegaContext

  • Highlights the benefit of explicit retrieval alongside learned gists; we can treat the MegaContext Tree as an internal retriever feeding focused spans to the Working Context (W_max = 8,192 tokens in POC Implementation).
  • Suggests we log provenance metadata for each expansion so users can audit responses; this aligns with recommendations for provenance tracking and Node Metadata (sketched after this list).
  • Provides baselines for knowledge-intensive evaluation, relevant when measuring MegaContext’s impact on fact-heavy queries using ΔNLL@H metrics.
  • RAG’s external retrieval complements MegaContext’s hierarchical compression: RAG fetches external knowledge, MegaContext refocuses internal history.
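
A hedged sketch of the provenance record suggested above: a small dataclass attached to each expansion surfaced into the Working Context. Field names (node_id, lod, span, score) are assumptions, not existing MegaContext schema:

```python
from dataclasses import dataclass, field
import time

@dataclass
class ExpansionProvenance:
    """Audit record for one gist expansion surfaced into the Working Context.

    Field names are hypothetical; MegaContext's Node Metadata may differ.
    """
    node_id: str           # gist node in the MegaContext Tree
    lod: int               # level of detail the span was expanded from
    span: tuple[int, int]  # absolute token positions (start, end)
    score: float           # focus/retrieval score that triggered the expansion
    timestamp: float = field(default_factory=time.time)

def log_expansion(log: list[ExpansionProvenance],
                  record: ExpansionProvenance) -> None:
    # Append-only log so downstream explanations can replay what was surfaced.
    log.append(record)
```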

What We Can Use

  • Adopt dual-encoder style scoring within LensNet to rank candidate gist nodes for expansion, potentially blending retrieval scores with focus scores (see the sketch after this list).
  • Use RAG’s marginalisation trick to fuse multiple gist expansions: treat each expansion as a retrieved document and marginalise across them when scoring focus adjustments in the Focus Allocator.
  • Integrate provenance logging by storing references (node IDs, absolute positions) in Node Metadata, enabling downstream explanations.
  • Leverage RAG benchmarks (NQ, TriviaQA) as evaluation tasks once MegaContext supports long-form question answering beyond the POC scope.
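
A minimal sketch combining the first two ideas above, assuming gist nodes already carry embeddings. The inner-product scoring and softmax-weighted fusion mirror DPR and RAG’s marginalisation; the LensNet-side names and the mixing weight `alpha` are placeholders:

```python
import numpy as np

def score_gist_nodes(query_emb: np.ndarray, gist_embs: np.ndarray) -> np.ndarray:
    """Dual-encoder style scoring: inner product between a query embedding (d,)
    and candidate gist-node embeddings (N, d), as DPR does for passages."""
    return gist_embs @ query_emb

def fuse_expansions(scores: np.ndarray,
                    expansion_values: np.ndarray,
                    focus_scores: np.ndarray | None = None,
                    alpha: float = 0.5) -> float:
    """Marginalise a per-expansion value (e.g. a predicted NLL gain) over
    candidates, RAG-style: softmax the scores into p(z|x) and take the
    expectation. Optionally blend retrieval scores with LensNet focus scores
    (alpha is a hypothetical mixing weight)."""
    if focus_scores is not None:
        scores = alpha * scores + (1.0 - alpha) * focus_scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return float(weights @ expansion_values)
```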

Limitations & Risks

  • Dense retrieval depends on keeping its corpus and index up to date; MegaContext must likewise keep its gist store fresh and searchable to maintain accuracy (see MegaCuration for future pruning strategies).
  • RAG can still hallucinate when retrieved passages are weak; we need guard rails (confidence scoring via ΔNLL@H, fallback strategies) before surfacing expansions (a gating sketch follows this list).
  • Joint training may be resource-intensive; we may start with a frozen base model and GistNet checkpoint (per POC Implementation) and iterate toward full MegaContext End-to-End Training.
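
One way to make the ΔNLL@H guard rail above concrete: compare the model’s negative log-likelihood on the next H tokens with and without the candidate expansion, and surface only expansions that clear a threshold. The `nll` callable is a stand-in for whatever scoring API the frozen base model exposes; this interface is an assumption, not an existing one:

```python
from typing import Callable, Sequence

Tokens = Sequence[int]

def delta_nll_at_h(nll: Callable[[Tokens, Tokens], float],
                   context: Tokens,
                   expanded_context: Tokens,
                   horizon: Tokens) -> float:
    """ΔNLL@H: how much an expansion improves prediction of the next H tokens.

    nll(ctx, target) is assumed to return the total negative log-likelihood of
    the target tokens given ctx (hypothetical interface). A positive delta
    means the expansion helped.
    """
    return nll(context, horizon) - nll(expanded_context, horizon)

def should_surface(delta: float, threshold: float = 0.0) -> bool:
    # Guard rail: surface an expansion only if it measurably reduces NLL.
    return delta > threshold
```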

Potential Follow-Up Reading

  • REALM, RETRO, Atlas for alternative retrieval-augmented LMs that emphasise differentiable memory or larger document stores.
  • FiD (Fusion-in-Decoder) to compare multi-document encoding strategies.
  • Self-RAG / Retrieval-augmented reinforcement learning for ideas on how the model can decide when to retrieve or rely on memory.

Open Questions for MegaContext

  • How to balance learned gists against explicit retrieval: should LensNet prefer expansions that match external retriever hits, or trust hierarchical summaries from the MegaContext Tree?
  • Can we reuse RAG’s index tooling (FAISS, ANN search) to explore MegaContext memory outside of strict chronological traversal, potentially enabling semantic retrieval across LOD levels (a FAISS sketch follows this list)?
  • What telemetry best captures retrieval success so MegaContext knows when to ingest new data or re-gist outdated nodes (deferred to post-POC per Future Plan)?
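
A hedged sketch of the FAISS reuse floated above: index gist embeddings from every LOD level in one flat inner-product index so semantic search can jump across the tree rather than traverse it chronologically. Node labels and the embedding width are illustrative:

```python
import faiss
import numpy as np

d = 768                                                  # embedding width (assumed)
node_ids = ["root/0@lod2", "root/0/1@lod1", "root/0/1/3@lod0"]  # illustrative
gists = np.random.rand(len(node_ids), d).astype("float32")

faiss.normalize_L2(gists)          # unit-normalise so inner product = cosine
index = faiss.IndexFlatIP(d)       # exact MIPS; swap in IndexHNSWFlat at scale
index.add(gists)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)                     # top-2 nodes across LODs
for score, i in zip(scores[0], ids[0]):
    print(node_ids[i], float(score))
```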