Research Papers Roadmap

Purpose: Outline a series of publications that build on one another. Each paper introduces one primary innovation while leveraging results from prior work, ensuring clarity and narrative cohesion across the program.

PaperWorking TitleNovel ContributionPrerequisites
Paper 0MegaContext: Virtualized Context for Frozen LLMsDemonstrate GistNet + LensNet + Focus Allocator + MegaContext Tree (in-memory) on a single GPU (Phase 1)Implementation Roadmap Phase 1
Paper 1Gaussian RoPE for Global Position AwarenessIntroduce the Gaussian RoPE positional encoding driven by global positions/LOD, showing improved long-horizon stabilityPaper 0
Paper 2MegaAttention: Hierarchical Attention over Working-Context TreesPresent MegaAttention masks (pyramidal attention) plus MegaPrediction heads for gist-first decoding; evaluate in 8×H100 settingPapers 0–1
Paper 3MegaCache & MegaCuration: Streaming MegaContext at ScaleDescribe MegaCache (disk↔RAM↔GPU cache) and telemetry-driven MegaCuration for long-lived deploymentsPapers 0–2
Paper 4Cognitive Core & Composite MegaContextsTrain a 1B-parameter Cognitive Core on curated contexts; show composite contexts, dynamic services, and marketplace integrationPapers 0–3

Each paper sits on top of the prior one(s) and cites them explicitly, so reviewers/readers can follow a clear progression of ideas.


Paper 0 — MegaContext: Virtualized Context for Frozen LLMs

Novelty: Introduce MegaContext as a two-context system (MegaContext Tree + Working Context) using GistNet, LensNet, and the Focus Allocator to keep a frozen LLM within fixed compute budgets.

Core sections:

  • Problem framing: frozen LLMs struggle with long-horizon edits.
  • Components: GistNet compression (32→1/1024→1), LensNet focus scoring, Focus Allocator policies, in-memory MegaContext Tree & Working Context.
  • Training recipe: run10 (--mc) on a single GPU with Phase 1 telemetry (ΔNLL@H, swap rate, residency, MFU).
  • Results: show substitutability, focus accuracy, latency.

Artifacts: Reference implementation (mc/), run10 scripts, documentation in Training & Operations.


Paper 1 — Gaussian RoPE for Global Position Awareness

Novelty: Replace standard RoPE with Gaussian RoPE driven by global positions + LOD-derived variance, enabling consistent positional reasoning when swapping gists/tokens.

Core sections:

  • Background: rotary embeddings, limitations for sparse/teleported context.
  • Method: track global positions/LOD in MC; derive (μ, σ) for each entry; compute Gaussian-attenuated cos/sin tensors; override GPT’s rotary embeddings.
  • Experiments: compare ΔNLL@H, swap-rate stability, long-context QA across simple vs Gaussian RoPE; include telemetry from mc/positional.py.
  • Integration: maintain stock RoPE when --mc is disabled (compatibility story).

Artifacts: mc/positional.py, GPT cos_sin_override, CLI knobs (--positional <impl>).


Paper 2 — MegaAttention: Hierarchical Attention over Working-Context Trees

Novelty: Introduce MegaAttention masks operating over wLOD trees and MegaPrediction heads, enabling gist-first decoding and structured attention patterns at scale.

Core sections:

  • Motivation: multi-scale attention vs flat sequences.
  • Method: pyramidal masks, GistNet-informed key sharing, MegaPrediction head predicting gist trajectories before tokens.
  • Experiments: speedrun.sh --mc runs on 8×H100 comparing baseline vs MegaAttention; evaluate sparsity, latency, gist prediction accuracy.
  • Tooling: MC/WC visualizations showing attention flows; telemetry dashboards for cache hits, gist accuracy.

Prereqs: Papers 0–1 infrastructure; ensures positional metadata and MC interfaces exist.


Paper 3 — MegaCache & MegaCuration: Streaming MegaContext at Scale

Novelty: Present the MegaCache (disk↔RAM↔GPU hierarchical cache) plus MegaCuration (telemetry-driven pruning/compaction) for long-lived deployments.

Core sections:

  • MegaCache: architecture, prefetch policies, hit/miss telemetry, streaming latencies.
  • MegaCuration: lens-guided pruning, policy design, before/after tree stats.
  • Experiments: long-running workloads showing storage savings, streaming latency improvements, curation effects on ΔNLL/residency.
  • Ops guidance: dashboards, alerts, runbooks from Phase 3.

Prereqs: Papers 0–2; demonstrates the leap from in-memory prototypes to disk-backed pipelines.


Paper 4 — Cognitive Core & Composite MegaContexts

Novelty: Train a 1B-parameter Cognitive Core over curated MegaContexts, enable LoRA-based “MC-izing” of pretrained bases, and launch the MegaContext marketplace + dynamic services.

Core sections:

  • Cognitive Core: architecture, training on ≥1T tokens, reasoning evaluations.
  • LoRA retrofit pipeline: toolchain for adding MC adapters to existing models.
  • Marketplace & dynamic services: sharing/selling curated contexts, file-watcher services, etc.
  • Experiments: quality gains from composite contexts, end-to-end workflow demos (marketplace, streaming, services).

Prereqs: Papers 0–3; pushes the system to production-scale ecosystems.


Next Steps

  • Use Implementation Roadmap phases to track engineering progress and know when each paper’s prerequisite is satisfied.
  • When a phase completes, create dedicated paper-specific branches with reproducible configs, notebooks, and telemetry snapshots so drafting the paper becomes mechanical.