The runtime loop ingests tokens into the MegaContext Tree, scores the working window via LensNet, applies the Focus Allocator, and decodes with the frozen base LLM while logging telemetry.
- Stages: ingest → gist update → focus scoring → allocation → decode → telemetry.
- Budget: the Working Context remains within `W_max` using block-aligned actions.
- Demo goals: a planned `tools/run_poc_loop` script will showcase expansion/collapse within limits once the runtime loop lands (tracked in MegaContext End-to-End Training).
- Telemetry: swap rates, ΔNLL, and latency feed the pruning and training loops.
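The staged pipeline above can be sketched as a single step function. This is a minimal illustration, not the project's actual API: the component classes, method names (`append`, `refresh`, `score`, `apply`, `decode`, `working_window`), and the `W_MAX` value are all hypothetical stand-ins for GistNet, LensNet, the Focus Allocator, and the frozen base model.

```python
from dataclasses import dataclass, field

BLOCK = 32      # tokens per block (per the ingest stage)
W_MAX = 4096    # working-window budget W_max (illustrative value)

@dataclass
class RuntimeLoop:
    """Minimal sketch of the six-stage loop; all component objects are stubs."""
    tree: object        # MegaContext Tree
    gistnet: object
    lensnet: object
    allocator: object
    base_model: object
    telemetry: list = field(default_factory=list)

    def step(self, tokens):
        # Ingest: chunk incoming tokens into 32-token blocks and append them.
        blocks = [tokens[i:i + BLOCK] for i in range(0, len(tokens), BLOCK)]
        for block in blocks:
            self.tree.append(block)
        # Gist update: keep hierarchical summaries in sync with new tokens.
        self.gistnet.refresh(self.tree)
        # Focus scoring: signed scores over the working window.
        scores = self.lensnet.score(self.tree.working_window())
        # Allocation: expand/collapse spans while respecting the budget.
        swaps = self.allocator.apply(scores, budget=W_MAX)
        # Decode with the frozen base model.
        logits = self.base_model.decode(self.tree.working_window())
        # Telemetry: log swap events for pruning and training loops.
        self.telemetry.append({"swaps": swaps})
        return logits
```

The loop itself stays trivial; all learning lives in GistNet and LensNet, while the base model remains frozen.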
Details
The streaming runtime keeps a frozen base LLM within a fixed working window while preserving the full MegaContext history.
End-to-end flow
- Ingest & chunk: incoming text is tokenized into 32-token blocks and appended to the MegaContext gist tree (`src/megacontext/memory/tree.py`).
- Gist updates: GistNet generates or refreshes hierarchical summaries so higher-level nodes stay in sync with new tokens.
- Focus scoring: LensNet evaluates the working window, producing signed scores that suggest which spans deserve finer or coarser detail.
- Allocation step: the Focus Allocator expands high-score spans to raw tokens or collapses low-score spans to gists, ensuring the window stays within `W_max`.
- Decode: the Working Context feeds the frozen base model (`src/runtime/base_model.py`), producing next-token predictions or downstream logits.
- Telemetry: swap events, ΔNLL comparisons, and latency stats are logged for analysis and future tuning.
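The allocation step is the only stage that touches the budget, so it is worth making concrete. The sketch below is an assumption-laden illustration of a greedy, budget-bounded allocator: the span representation, cost model (a gist occupies one summary slot, a raw span occupies `blocks * 32` tokens), and function signature are all hypothetical, but the core invariant matches the text: collapse negatively scored spans, then expand the highest-scoring gists only while the window stays within `W_max`.

```python
def allocate(spans, scores, w_max, block=32):
    """Greedy, budget-bounded focus allocation (illustrative, not the real API).

    spans:  span_id -> {"state": "gist" | "raw", "blocks": n}
    scores: span_id -> signed LensNet score (positive = wants raw detail,
            negative = fine to summarize)
    Returns the token cost of the window after allocation (<= w_max).
    """
    def cost(s):
        return s["blocks"] * block if s["state"] == "raw" else 1

    # Collapse negatively scored raw spans first to free budget.
    for sid in sorted(spans, key=lambda s: scores[s]):
        if scores[sid] < 0 and spans[sid]["state"] == "raw":
            spans[sid]["state"] = "gist"

    used = sum(cost(s) for s in spans.values())

    # Expand the highest-scoring gists while the window fits within w_max.
    for sid in sorted(spans, key=lambda s: scores[s], reverse=True):
        s = spans[sid]
        if scores[sid] > 0 and s["state"] == "gist":
            extra = s["blocks"] * block - 1  # raw cost minus the freed gist slot
            if used + extra <= w_max:
                s["state"] = "raw"
                used += extra
    return used
```

Because actions are block-aligned, expansion is all-or-nothing per span: a span that would overflow the budget simply stays at gist resolution rather than being partially expanded.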
Demo targets
- Near-term goal: a dedicated `tools/run_poc_loop` entry point (TBD) will stream a synthetic session, showing expansion/collapse while maintaining budget invariants (MegaContext End-to-End Training, POC Architecture). Until then, use the nanochat trainer or unit tests to exercise individual components.
- Research milestone: benchmarking harnesses compare MegaContext runs against baselines and track swap rate, loss, and latency (Research Papers sequence, see Paper 2).
Focus heuristics
- Greedy but bounded: Hysteresis and cooldowns prevent thrashing when spans hover near the decision boundary.
- Multi-scale awareness: Bundles of raw tokens plus their parent gists let the allocator choose hybrid representations.
- Telemetry-driven evolution: Access counts and ΔNLL sensitivity inform pruning strategies explored in Training & Operations Track B/D of Future Plan.
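The hysteresis-and-cooldown heuristic from the first bullet can be captured in a small gate. This is a sketch under assumed parameters, not the project's implementation: the threshold `hi`, the `cooldown` length, and the `decide` interface are invented for illustration. A span only swaps when its score clears a symmetric dead band, and after any swap it is frozen for a few steps, which is what prevents thrashing near the decision boundary.

```python
class FocusGate:
    """Hysteresis + cooldown guard against swap thrashing (illustrative)."""

    def __init__(self, hi=0.5, cooldown=3):
        self.hi = hi              # dead-band half-width: |score| must exceed this
        self.cooldown = cooldown  # steps a span stays frozen after a swap
        self.frozen = {}          # span_id -> cooldown steps remaining

    def decide(self, span_id, score, state):
        """Return 'expand', 'collapse', or 'hold' for one span."""
        left = self.frozen.get(span_id, 0)
        if left > 0:
            # Still cooling down from a recent swap: ignore the score.
            self.frozen[span_id] = left - 1
            return "hold"
        if state == "gist" and score > self.hi:
            self.frozen[span_id] = self.cooldown
            return "expand"
        if state == "raw" and score < -self.hi:
            self.frozen[span_id] = self.cooldown
            return "collapse"
        return "hold"  # score inside the dead band: no action
```

A span whose score hovers around zero never crosses the band, and a span that just expanded cannot immediately collapse even if its score flips sign, so oscillation is bounded by the cooldown period.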