MegaContext

A system architecture for virtualized hierarchical LLM memory.


What is MegaContext?

MegaContext is a system architecture for virtualized hierarchical LLM memory: the full history lives in a hierarchical store, and only the portions the model currently needs are streamed into its active window at the appropriate level of detail.

The MegaTexture Analogy

MegaContext is inspired by “MegaTexture”, a graphics technology from id Software that virtualized texture memory—streaming visible portions of vast textures at appropriate resolutions. MegaContext applies this same principle to language model context.

See MegaTexture Analogy for the complete explanation.
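To make the analogy concrete, here is a minimal, purely illustrative Python sketch (not the MegaContext implementation; all names are hypothetical): low-resolution summaries of every span stay resident, and only the spans most relevant to the current focus are "paged in" at full resolution, mirroring how MegaTexture streamed visible texture tiles.

```python
# Conceptual sketch only: virtualized context keeps cheap low-res summaries
# resident and expands full-resolution detail where focus lands.
from dataclasses import dataclass


@dataclass
class Span:
    summary: str  # always-resident, low-resolution representation
    detail: str   # full-resolution content, fetched on demand


def relevance(span: Span, focus: str) -> int:
    # Toy relevance score: word overlap between the summary and the focus query.
    return len(set(span.summary.lower().split()) & set(focus.lower().split()))


def build_window(spans: list[Span], focus: str, budget: int) -> list[str]:
    """Assemble a working context: expand the `budget` most relevant spans,
    leave everything else as a cheap summary."""
    expanded = set(
        id(s) for s in sorted(spans, key=lambda s: relevance(s, focus), reverse=True)[:budget]
    )
    return [s.detail if id(s) in expanded else s.summary for s in spans]
```

A real system would learn the relevance function and operate over latent representations rather than strings, but the control flow is the same: constant resident cost, with detail concentrated where attention is needed.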


Why MegaContext Matters

  • Long context only helps if the model can focus. MegaContext treats focus as a learned, continuous process rather than a static prompt.
  • Resulting capabilities (detailed in Grand Vision and Context Focus):
    • Persistent memory at constant compute
    • Dynamic attention that can re-expand old context when needed
    • Smaller, reasoning-focused models whose knowledge lives in the tree
    • Agentic workflows and large, updatable “core knowledge” prompts

Quick Start (Today)

  1. Read: skim this page, then dive into How MegaContext Works.
  2. Environment: follow README.md / SETUP.md for Python 3.11 + CUDA 12.x prerequisites.
  3. Train: use the matrix in Training & Operations—it’s the source of truth for run10.sh, speedrun.sh, run1000.sh, telemetry expectations, and troubleshooting.
  4. Evaluate / chat: follow Base Runtime for CLI/web demos once checkpoints land in ~/.cache/nanochat.

Documentation Navigation

There’s a lot of ground to cover—use these hubs to find the level you need:

Orientation

Architecture & Components

Plans & Roadmap

Execution & Operations

Reference


Project Status

Active work now follows the PRD stack rather than the legacy “Phase 0–4” plan. The near-term focus is:

  • Implement the POR PRDs (End-to-End Training, MegaAttention, MegaPrediction, Cognitive-Core, Hierarchical KV caching) on top of the landed nanochat fork.
  • Scale single-GPU ideas (e.g., run10.sh) into the larger speedrun.sh / run1000.sh trajectories and keep telemetry flowing through tree-aware metrics.

Use the PRD index for requirements and the migration tracker for execution progress.


Contributing

This is an open research project. We welcome contributions that advance the proof-of-concept and the broader vision.

  • Follow conventions in AGENTS.md for directory layout, testing, linting, and communication
  • Bootstrap: uv venv, uv sync, then uv run pytest --maxfail=1 --disable-warnings for smoke tests
  • Lint/format via uv run ruff check src tests and uv run ruff format src tests
  • Update progress in the Obsidian plan notes before hand-off so the next contributor has full context

See the GitHub repository for more details.


MegaContext virtualizes sequence memory just as MegaTexture virtualized textures—focusing detailed computation only where needed. It opens a path to persistent, updatable, and truly lifelong language models.