Introduces RAG (Lewis et al., 2020), a framework that marries dense retrieval
(DPR) with sequence-to-sequence generation (BART) to tackle knowledge-intensive
NLP tasks.
Two variants: RAG-Token (marginalises over passages at each decoding step, so a
different passage can inform each token) and RAG-Sequence (marginalises once
over whole-sequence likelihoods, so a single passage conditions the entire
output); both condition the generator on the retrieved passages and marginalise
over the top-k documents (see the formulas below).
Demonstrates substantial accuracy improvements on open-domain QA (Natural
Questions, TriviaQA), fact verification (FEVER), and Jeopardy-style question
generation, while providing provenance through the retrieved passages.
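For reference, the two marginalisations from the paper (x the input, y the
output, z a retrieved passage, p_η the retriever, p_θ the generator):

```latex
% RAG-Sequence: marginalise whole-sequence likelihoods over the top-k passages
p_{\text{RAG-Seq}}(y \mid x) \approx \sum_{z \in \text{top-}k} p_\eta(z \mid x)
    \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

% RAG-Token: marginalise per token, so each step can draw on a different passage
p_{\text{RAG-Tok}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k}
    p_\eta(z \mid x) \, p_\theta(y_i \mid x, z, y_{1:i-1})
```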
Core Concepts
Dense Passage Retriever (DPR): Dual-encoder trained with in-batch negatives
to embed questions and passages; enables efficient nearest-neighbour search
(a minimal loss sketch follows this list).
Marginalised likelihood: The generator conditions on each of the top-k
retrieved passages, and the per-document probabilities are marginalised,
blending retrieval evidence with generation (see the fusion sketch after this
list).
End-to-end fine-tuning: The query encoder and generator are optimised jointly
(the document encoder and its index stay frozen), aligning retriever
embeddings with what the generator needs.
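A minimal sketch of DPR's in-batch-negative objective, assuming PyTorch; the
function name and shapes are illustrative, not DPR's actual code:

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    # q_emb, p_emb: (B, d) question/passage embeddings from the two encoders.
    # Row i of p_emb is the positive passage for question i; every other row
    # in the batch serves as a free negative.
    scores = q_emb @ p_emb.T                                  # (B, B) similarities
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)                   # diagonal = positives
```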
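And a sketch of RAG-Token-style fusion at a single decoding step; because
p(z|x) comes from the query embedding, minimising the NLL of this marginal
back-propagates into the retriever as well as the generator (the document
index itself stays frozen). Names are again illustrative:

```python
import torch

def marginal_next_token_probs(doc_scores: torch.Tensor,
                              token_logits: torch.Tensor) -> torch.Tensor:
    # doc_scores: (k,) retrieval logits for the top-k passages.
    # token_logits: (k, V) generator logits for the next token, one row per passage.
    p_doc = torch.softmax(doc_scores, dim=0)        # p(z | x)
    p_tok = torch.softmax(token_logits, dim=-1)     # p(y_i | x, z, y_<i)
    return (p_doc.unsqueeze(-1) * p_tok).sum(dim=0) # (V,) marginal distribution
```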
Relevance to MegaContext
Highlights the benefit of explicit retrieval alongside learned gists; we can
treat the MegaContext Tree as an internal retriever feeding focused spans to
the Working Context (W_max = 8,192 tokens in the POC Implementation).
Adopt dual-encoder-style scoring within LensNet to rank candidate gist nodes
for expansion, potentially blending with focus scores (a hypothetical sketch
follows this list).
Use RAG's marginalisation trick to fuse multiple gist expansions: treat each
expansion as a retrieved document and marginalise across them when scoring
focus adjustments in the Focus Allocator.
Integrate provenance logging by storing references (node IDs, absolute positions) in
Node Metadata, enabling downstream explanations.
Leverage RAG benchmarks (NQ, TriviaQA) as evaluation tasks once MegaContext
supports open-domain question answering beyond the POC scope.
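A hypothetical sketch of the dual-encoder ranking idea above; every name here
(the embeddings, focus scores, and blending weight) is an assumption about
MegaContext internals, not existing code:

```python
import torch

def rank_gist_nodes(query_emb: torch.Tensor,
                    gist_embs: torch.Tensor,
                    focus_scores: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    # query_emb: (d,) embedding of the current Working Context query (assumed).
    # gist_embs: (n, d) embeddings of candidate gist nodes (assumed).
    # focus_scores: (n,) LensNet's own expansion scores (assumed).
    sim = gist_embs @ query_emb                     # dual-encoder dot products
    blended = alpha * sim + (1 - alpha) * focus_scores
    return torch.argsort(blended, descending=True)  # candidate node order
```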
Limitations & Risks
Dense retrieval is only as fresh as its index; MegaContext must keep its gist
store current and searchable to maintain accuracy (see MegaCuration for
future pruning strategies).
RAG can still hallucinate when retrieved passages are weak; we need guardrails
(confidence scoring via ΔNLL@H, fallback strategies) before surfacing
expansions (a rough ΔNLL@H sketch follows this list).
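A rough sketch of the ΔNLL@H guardrail mentioned above, assuming we can score
an H-token horizon with and without a candidate expansion; the helper names
and the thresholding rule are assumptions, not a specified mechanism:

```python
def nll_at_h(token_logprobs: list[float], h: int) -> float:
    # Average negative log-likelihood over the next h tokens.
    window = token_logprobs[:h]
    return -sum(window) / max(len(window), 1)

def expansion_helps(logprobs_with: list[float], logprobs_without: list[float],
                    h: int, threshold: float = 0.0) -> bool:
    # Guardrail (assumed semantics): surface an expansion only when
    # ΔNLL@H = NLL_without - NLL_with exceeds a margin.
    delta = nll_at_h(logprobs_without, h) - nll_at_h(logprobs_with, h)
    return delta > threshold
```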
Related Work
REALM, RETRO, and Atlas for alternative retrieval-augmented LMs that emphasise
differentiable memory or larger document stores.
FiD (Fusion-in-Decoder) to compare multi-document encoding strategies.
Self-RAG / Retrieval-augmented reinforcement learning for ideas on how the
model can decide when to retrieve or rely on memory.
Open Questions for MegaContext
How to balance learned gists against explicit retrieval: should LensNet prefer
expansions that match external retriever hits, or trust the hierarchical
summaries from the MegaContext Tree?
Can we reuse RAG's index tooling (FAISS, approximate nearest-neighbour search)
to explore MegaContext memory outside of strict chronological traversal,
potentially enabling semantic retrieval across LOD levels? (A minimal FAISS
sketch follows this list.)
What telemetry best captures retrieval success, so MegaContext knows when to
ingest new data or re-gist outdated nodes (deferred to post-POC per the
Future Plan)?
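A minimal FAISS sketch for the index-reuse question above, assuming gist
embeddings are stored as float32 vectors; the dimension, index type, and data
are illustrative:

```python
import faiss
import numpy as np

d = 768                                        # gist embedding dimension (assumed)
gist_embs = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(gist_embs)                  # cosine similarity via inner product

index = faiss.IndexFlatIP(d)                   # exact search; swap in an ANN index
index.add(gist_embs)                           # (e.g. IndexHNSWFlat) at scale

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, node_ids = index.search(query, 8)      # top-8 gist nodes, any LOD level
```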