This page is the unified bibliography for all external research cited in the MegaContext documentation. Each entry links to a detailed paper analysis in the reference/papers/ directory.

Papers are referenced with local numbering within each document (e.g., [1], [2], [3]) rather than a single global numbering scheme, so adding a new paper only requires renumbering within the document that cites it.


Core Inspiration

  • MegaTexture (Carmack, 2007) — Analysis — Virtual texturing system that inspired the core hierarchical streaming architecture
  • Perceiver (Jaegle et al., 2021) — Analysis — Latent cross-attention bottleneck architecture
  • Perceiver IO (Jaegle et al., 2021) — Analysis — Query-based decoding for arbitrary structured outputs
  • Slot Attention (Locatello et al., 2020) — Analysis — Object-centric iterative attention mechanism

Compression & Summarization

  • Gist Tokens (Mu et al., 2023) — Analysis — Learned prompt compression via attention masking
  • LLMLingua-2 (Pan et al., 2024) — Analysis — Task-agnostic prompt compression via token classification
  • Compressive Transformer (Rae et al., 2019) — Analysis — Long-term compressed memory for transformers
  • DeepSeek OCR (DeepSeek AI, 2025) — Analysis — ~10× optical compression via rasterization

Retrieval & Memory Augmentation

  • RAG (Lewis et al., 2020) — Analysis — Retrieval-augmented generation baseline
  • RETRO (Borgeaud et al., 2022) — Analysis — Retrieval-enhanced autoregressive transformers
  • Memorizing Transformers (Wu et al., 2022) — Analysis — Attention augmented with approximate kNN retrieval over an external memory of past key-value pairs

Long Context Methods

  • Transformer-XL (Dai et al., 2019) — Analysis — Segment-level recurrence and relative positional encoding
  • LongLoRA (Chen et al., 2023) — Analysis — Efficient finetuning for extended context windows
  • RoPE (Su et al., 2021) — Analysis — Rotary position embeddings used throughout MegaContext

Attention Mechanisms

  • Sparse Transformers (Child et al., 2019) — Analysis — Factorized sparse attention patterns
  • Reformer (Kitaev et al., 2020) — Analysis — LSH attention and reversible layers
  • Flash Attention (Dao et al., 2022) — Analysis — IO-aware exact attention algorithm

Focus & Selection

  • Neural Turing Machines (Graves et al., 2014) — Analysis — Content-based addressing and memory controllers
  • Differentiable Neural Computer (Graves et al., 2016) — Analysis — Learned memory allocation and routing

Training Methods

  • LoRA (Hu et al., 2021) — Analysis — Low-rank adaptation used in GistNet/LensNet training
  • Knowledge Distillation (Hinton et al., 2015) — Analysis — Teacher-student framework for GistNet training

Papers by MegaContext Component

GistNet

  • Gist Tokens
  • LLMLingua-2
  • Compressive Transformer
  • DeepSeek OCR
  • Knowledge Distillation

LensNet

  • Perceiver
  • Perceiver IO
  • Slot Attention
  • Neural Turing Machines
  • Differentiable Neural Computer

Focus Allocator

  • Slot Attention
  • Neural Turing Machines
  • Differentiable Neural Computer

MegaContext Tree

  • MegaTexture
  • Compressive Transformer
  • Transformer-XL

Working Context

  • RoPE
  • Sparse Transformers
  • Reformer
  • Flash Attention

Training & Operations

  • LoRA
  • Knowledge Distillation
  • Alternating optimization (see GAN training literature)

See Comparisons for detailed discussions of how MegaContext relates to:

  • RAG systems (MegaContext & RAG)
  • Standard long-context LLMs
  • Sparse attention methods
  • Compressive transformers

Contributing References

When adding a new reference to the documentation:

  1. Add an inline citation in the source document using square brackets ([1], [2], etc.), with local numbering specific to that document (see the sketch after this list)
  2. Add a References section at the bottom of that document, with each entry linking to its paper analysis:
    ## References

    1. Short Title (Authors, Year) — link to `reference/papers/<Paper Title>.md`
    2. Another Paper (Authors, Year) — same pattern for each additional citation
  3. Add the paper to this page under the appropriate category (entries here are unnumbered)
  4. Create an analysis page at `obsidian/reference/papers/<Paper Name>.md`, following the template established in existing analyses
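
As a concrete sketch of steps 1 and 2, a source document citing two papers might end up looking like this (the surrounding prose, paper choices, and exact link phrasing are placeholders, not a mandated format):

```markdown
GistNet compresses spans into gist tokens using an attention-masking scheme [1]
and is trained with a teacher-student distillation objective [2].

## References

1. Gist Tokens (Mu et al., 2023) — reference/papers/Gist Tokens.md
2. Knowledge Distillation (Hinton et al., 2015) — reference/papers/Knowledge Distillation.md
```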

See any existing paper analysis (e.g., Gist Tokens) for the standard format.


Paper Analysis Status

Complete Analyses (7 papers with PDFs)

  • Gist Tokens
  • LLMLingua-2
  • Perceiver
  • Perceiver IO
  • RAG
  • Slot Attention
  • DeepSeek OCR

Placeholder Analyses (14 papers awaiting PDFs)

  • MegaTexture
  • Compressive Transformer
  • RETRO
  • Memorizing Transformers
  • Transformer-XL
  • LongLoRA
  • RoPE
  • Sparse Transformers
  • Reformer
  • Flash Attention
  • Neural Turing Machines
  • Differentiable Neural Computer
  • LoRA
  • Knowledge Distillation