Working Context

The Working Context is the active memory that the base LLM consumes at inference time. Unlike the unbounded MegaContext Tree, the working context maintains a fixed token budget (W_max) by dynamically mixing raw tokens and compressed gists based on relevance.

What is the Working Context?

The Working Context is the fixed-size attention window that sits in GPU memory and gets fed directly into the frozen base model for inference. While the MegaContext Tree can grow unboundedly on disk (or RAM in the POC), the working context must fit within a strict token budget.

Think of it as the “spotlight” that illuminates different parts of your memory at different resolutions:

High resolution (LOD0): Raw tokens for the most relevant parts
Medium resolution (LOD1): 32:1 gists for moderately relevant regions
Low resolution (LOD2): 1024:1 gists for distant or less important context

This spotlight is continuously refocused by LensNet and Focus Allocator as new information arrives and priorities shift.

Core Properties

Fixed Budget

The working context operates under a strict token budget called W_max. In the POC, this is 8,192 tokens (configurable to 16k–32k). The system maintains the invariant that the sum of all entry costs never exceeds this budget.

For detailed budget mechanics, see Budget Invariant.

Mixed Levels of Detail

The working context contains entries at different LODs based on relevance:

Entry Type	Token Cost	Coverage
LOD0 block	32 tokens	32 raw tokens
LOD1 gist	1 token	32 tokens (32:1 compression)
LOD2 gist	1 token	1,024 tokens (1024:1 compression)

Contiguous Tiling

Entries tile the MegaContext Tree timeline without gaps or overlaps, maintaining perfect temporal continuity:

Timeline: [0 ─────────────────────────────────────────────────── T]
Working Context: [LOD0: 0-32] [LOD1: 32-64] [LOD1: 64-96] [LOD0: 96-128] ...

This contiguity ensures:

Coherent narrative flow for the base model
Consistent RoPE [1] positional encodings
No discontinuities during focus changes

Relationship to MegaContext Tree

The working context is not a separate data structure—it’s a dynamic view into the MegaContext Tree:

Aspect	MegaContext Tree	Working Context
Storage	Persistent (disk/RAM)	Ephemeral (GPU)
Scope	Complete history	Recent window
Size	Unbounded	Fixed (W_max)
Content	All LODs stored	Selective LODs
Mutability	Append-only	Dynamic refocus
Role	Long-term memory	Active attention

Analogy: The MegaContext Tree is your brain’s long-term memory (everything you’ve ever learned). The working context is your conscious attention right now (the small subset you’re actively thinking about).

Assembly Process

The working context is assembled from the MegaContext Tree by:

Selecting a temporal span to cover
Choosing appropriate LOD for each region based on relevance (using sparse attention patterns [2])
Fetching data from the tree’s storage
Materializing embeddings (tokens → embeddings, or gist vectors)
Concatenating into a single contiguous tensor

For full assembly details, see Working Context Assembly.

Refocusing

The working context evolves continuously as the conversation progresses:

Cadence: Refocus every K tokens (K=32 in POC)
Process: LensNet scores entries, Focus Allocator applies expand/collapse operations
Budget: Expansions and collapses are balanced to maintain the W_max constraint

Why refocus? Dynamic relevance means what was important 1000 tokens ago may no longer matter. Refocusing allows the system to zoom in on newly relevant regions and zoom out on now-irrelevant ones.

For full refocusing mechanics, see Working Context Refocusing.

Base LLM Interaction

From the base model’s perspective, the working context is just another context window. It doesn’t know some embeddings are gists rather than raw tokens:

Dimensionality match: Gists live in the same embedding space as tokens
RoPE compatibility: Gists positioned at central token index for consistent encoding [1]
Substitutability: GistNet trained so gists produce similar hidden states to original tokens

# Standard forward pass
outputs = base_model(
    inputs_embeds=working_context,  # [N, d] - mixed tokens & gists
    attention_mask=attention_mask,
    position_ids=position_ids        # Absolute indices for RoPE [1]
)

System Invariants

The working context maintains several critical invariants:

Budget Invariant: Total token cost ≤ W_max
Contiguity Invariant: No gaps or overlaps in temporal coverage
Block Alignment Invariant: All boundaries align with K-token blocks
Level Consistency Invariant: Entry LOD matches span size
RoPE Invariant: Consistent positional encoding

For detailed invariant definitions, see Invariants.

Role in the System

The Working Context is the central coordination point for all components:

For the base model: The only context it ever sees—a seemingly normal attention window
For LensNet: Input to analyze and score for focus adjustments
For Focus Allocator: The budget-constrained space where it applies expand/collapse decisions
For GistNet: Provides examples of which gists are actually used (for on-policy training)
For Runtime Loop: The working state that persists across decode steps

POC Implementation

In the proof-of-concept:

Size: W_max = 8,192 tokens
Update frequency: Refocus every K=32 tokens
Initial assembly: Start with most recent tokens/gists from the tree
No streaming: Entire working context in GPU memory
Simple heuristics: Initial focus policy with recency bias

See POC Implementation for full details and constraints.

Working Context Assembly - Detailed assembly process and algorithms
Working Context Refocusing - Refocusing mechanics and LensNet/Focus Allocator interaction
Invariants - System-wide constraints including budget, contiguity, and alignment
MegaContext Tree - The persistent memory structure backing the working context
LensNet - Neural network that scores entries for relevance
Focus Allocator - Component that applies expand/collapse operations
GistNet - Neural network that produces compressed gist embeddings
POC Implementation - Proof-of-concept constraints and implementation details

References

RoPE (Su et al., 2021) — Analysis — Rotary position embeddings used throughout MegaContext
Sparse Transformers (Child et al., 2019) — Analysis — Factorized sparse attention patterns

See Related Work for the complete bibliography of all research papers referenced throughout the documentation.

Mega Context

Explorer

Working Context

What is the Working Context?

Core Properties

Fixed Budget

Mixed Levels of Detail

Contiguous Tiling

Relationship to MegaContext Tree

Assembly Process

Refocusing

Base LLM Interaction

System Invariants

Role in the System

POC Implementation

References

Graph View

Table of Contents

Backlinks

Mega Context

Explorer

Working Context

What is the Working Context?

Core Properties

Fixed Budget

Mixed Levels of Detail

Contiguous Tiling

Relationship to MegaContext Tree

Assembly Process

Refocusing

Base LLM Interaction

System Invariants

Role in the System

POC Implementation

Related Pages

References

Graph View

Table of Contents

Backlinks