Migration Plan — nanochat Integration

Status: Migration program (in-flight). Use this plan alongside TODO.md (single source for outstanding tasks) and Training & Operations for the active workflow.

Goal: move the active MegaContext implementation from notebooks + PyTorch Lightning experiments onto the nanochat repository, layering the new PRD-driven architecture on top while keeping the upstream baseline intact.

Phase 0 — Due diligence & repo prep ✅

nanochat audit: confirmed the training entrypoints (train.py, scripts/run_*), data pipeline (tokenizer → binarized dataset), and logging (wandb/local JSON) align with MegaContext needs. Critical hook points:
- Model definition lives in nanochat/model.py — future home for GistNet/LensNet adapters.
- Trainer loop in nanochat/train.py — natural place for MegaContext end-to-end loss wiring and telemetry.
- Inference utilities (chat.py, sample.py) will host MegaPrediction multi-LOD heads.
Repo strategy: fork upstream nanochat, maintain two branches:
- main: mirrors upstream, used for regular rebase pulls.
- megacontext: primary development branch containing MegaContext extensions.
Constraints captured:
- Preserve nanochat’s baseline scripts/CLI so vanilla runs remain reproducible.
- Keep dependency additions modular (extra requirements via optional install).
- Ensure checkpoints/configs stay compatible with upstream formats.

Phase 1 — Skeleton alignment

Reproduce the vanilla nanochat pipeline end-to-end on local hardware to confirm parity. (Already exercised in a separate repo; plan is to bring the tuned “ $5 o n 5090” scr i pt in t o t h e f or k o n ce u p s t re am co d e l an d s, t h e n sc a l e t o$ 100 and $1000 runs for publication.)
Identify hook points for MegaContext subsystems:
- Training loop integration for MegaContext end-to-end plan.
- Decoder extension location for MegaPrediction multi-LOD heads.
- Memory/model boundaries needed by the Cognitive-Core Training PRD.
Document required dependency additions (tree storage, telemetry, notebooks migration).

Phase 2 — Legacy snapshot & archival

After Phase 1 parity is confirmed, move notebook/Lightning code to an _old/ folder (or equivalent archive) so all active work happens in the nanochat fork.
Update README/onboarding to signal the new baseline stack before integrating MegaContext code.
Maintain read-only access to notebooks solely for historical reference; no further development expected.

Phase 3 — MegaContext core scaffolding ✅

Implement the following as isolated modules so nanochat baseline remains functional:
- MegaContext Tree construction utilities (ingest, refresh, serialization).
- Working Context assembly with dynamic mixing of LOD entries.
- GistNet and LensNet entry points compatible with nanochat model classes.
- Registry/config glue for selecting between vanilla nanochat runs and MegaContext runs.
Add end-to-end smoke tests mirroring the current notebook experiments.

Phase 4 — PRD feature layering 🔄

E2E Training: Embed the MegaContext End-to-End Training loop into nanochat’s trainer, including horizon ΔNLL evaluation and gist losses. ✅ (multi-WC batches, opportunistic LOD2 horizons, and telemetry hooks now live; remaining work is dashboarding + inference parity).
MegaPrediction: Attach the multi-LOD decoding head defined in MegaPrediction Training; expose CLI flags for gist-first inference. ⏳
Cognitive Core: Scaffold composite memory management and CK reliance checks per Cognitive-Core Training. ⏳

Phase 5 — Migration & validation

Port critical datasets/configs from notebooks; ensure determinism with nanochat pipelines.
Recreate telemetry for ΔNLL, swap rate, budget utilization, gist regression losses, and composite MC metrics within nanochat’s logging framework.
Validate each subsystem in isolation before full e2e runs:
- GistNet substitutability checks (ΔNLL@H, gist cosine).
- LensNet focus audits (target vs. predicted utilities, budget invariants).
- MegaPrediction decoding smoke tests (gist-first vs. token-first).
- Composite Working Context assembly sanity (LOD mixes, budgets).
Conduct parity runs against the legacy notebooks, monitoring loss curves, ΔNLL@H, latency, and telemetry parity.

Phase 6 — Upstream parity & maintenance

Keep main synced with upstream nanochat releases; regularly rebase the megacontext branch.
Audit MegaContext-specific diffs to ensure baseline scripts remain intact.
Establish a cadence for dependency bumps, CI validation, and telemetry regression checks.

Mega Context

Explorer

Migration Plan - Nanochat Integration