The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation — like the Foxtrot Confluence docs — operates at the *system level*: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.
File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.
## Goal
Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).
All metrics are validated against a **ground truth fixture repository** (`test/fixtures/system-docs/`) containing a hand-labeled mini codebase (~30 files across 5 subsystems) with expected outputs for each module.
| Metric | Target | How Measured |
|--------|--------|-------------|
| Subsystem detection accuracy | ≥90% of modules correctly clustered | Compare `subsystem.js` output against `expected-subsystems.json` fixture. Accuracy = correctly assigned files / total files. |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured | Compare dependency matrix against `expected-deps.json`. Recall = captured edges / expected edges. |
| Contract extraction recall | ≥80% of exported interfaces/types extracted | Compare extracted contracts against `expected-contracts.json`. Recall = extracted / total annotated. |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated | Apply a mock diff to fixture, assert only expected subsystem docs are regenerated (compare content hashes, e.g. md5sum, rather than mtimes, which are flaky). |
| Cascading invalidation | Shared subsystem change propagates to dependents | Apply a diff to a shared subsystem in fixture, assert dependent subsystem docs are also flagged for regeneration. |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) | BACKLOGGED. Measure token count statically in CI (e.g. via `tiktoken`) without calling the API. |
| Flow tracer terminates | All traces complete in <5s on 4,325-file graph | Wall-clock assertion on OpenClaw snapshot. |
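
The accuracy and recall formulas in the table can be sketched as below. This is a minimal illustration, not the pipeline's actual test harness: the function names (`clusteringAccuracy`, `edgeRecall`) and the data shapes (a file-to-subsystem map, `[from, to]` edge pairs) are assumptions made for the example, and the real fixture files may be structured differently.

```javascript
// Hypothetical helpers: names and data shapes are illustrative assumptions.

// Accuracy = correctly assigned files / total files.
// `actual` and `expected` map a file path to a subsystem name.
function clusteringAccuracy(actual, expected) {
  const files = Object.keys(expected);
  if (files.length === 0) return 1;
  const correct = files.filter((file) => actual[file] === expected[file]).length;
  return correct / files.length;
}

// Recall = captured edges / expected edges.
// Each edge is a [fromSubsystem, toSubsystem] pair.
function edgeRecall(capturedEdges, expectedEdges) {
  if (expectedEdges.length === 0) return 1;
  const key = ([from, to]) => `${from}->${to}`;
  const captured = new Set(capturedEdges.map(key));
  const hits = expectedEdges.filter((edge) => captured.has(key(edge))).length;
  return hits / expectedEdges.length;
}

// Made-up fixture data: one of four files lands in the wrong subsystem.
const expectedSubsystems = {
  'gateway/server.ts': 'gateway',
  'agents/run.ts': 'agents',
  'config/load.ts': 'config',
  'utils/log.ts': 'utils',
};
const actualSubsystems = { ...expectedSubsystems, 'utils/log.ts': 'config' };

const accuracy = clusteringAccuracy(actualSubsystems, expectedSubsystems);
const recall = edgeRecall(
  [['gateway', 'agents']],
  [['gateway', 'agents'], ['agents', 'config']],
);

console.log(accuracy); // 0.75
console.log(recall); // 0.5
```

A CI check would then compare each value against its target, for example `accuracy >= 0.9`.
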
## Ground Truth Fixture Repository
Located at `test/fixtures/system-docs/`. Contains:
3. **Graph-based (future):** Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
Subsystems where >60% of edges are **inbound** from other subsystems (high fan-in — many subsystems depend on them, but they depend on almost nothing) are automatically tagged as `cross-cutting`. Examples: `utils/`, `config/`, `types/`. The metric is `inbound_edges / total_edges > 0.6`. Cross-cutting subsystems are:
- Excluded from the dependency matrix visualization (reduces hairball)
- Documented separately as "Shared Infrastructure" in the reference docs
- Still tracked in the raw dependency data for completeness
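
The `inbound_edges / total_edges > 0.6` rule could be applied to a subsystem-level edge list roughly as follows. This is a sketch under assumptions: the function name `tagCrossCutting` and the `{ from, to }` edge shape are hypothetical, and `total_edges` is taken here to mean inbound plus outbound cross-subsystem edges, which the text does not spell out.

```javascript
// Hypothetical sketch: tag subsystems whose share of inbound edges exceeds the threshold.
// `edges` is a list of { from, to } pairs between subsystems (assumed shape).
function tagCrossCutting(edges, threshold = 0.6) {
  const counts = new Map(); // subsystem -> { inbound, total }
  const bump = (name, isInbound) => {
    const entry = counts.get(name) ?? { inbound: 0, total: 0 };
    entry.total += 1;
    if (isInbound) entry.inbound += 1;
    counts.set(name, entry);
  };

  for (const { from, to } of edges) {
    if (from === to) continue; // intra-subsystem edges are not counted (assumption)
    bump(from, false);
    bump(to, true);
  }

  const tagged = [];
  for (const [name, { inbound, total }] of counts) {
    if (inbound / total > threshold) tagged.push(name);
  }
  return tagged.sort();
}

// Made-up edges for illustration.
const subsystemEdges = [
  { from: 'gateway', to: 'utils' },
  { from: 'agents', to: 'utils' },
  { from: 'channels', to: 'utils' },
  { from: 'utils', to: 'config' },
  { from: 'gateway', to: 'agents' },
  { from: 'agents', to: 'gateway' },
  { from: 'agents', to: 'channels' },
];

const crossCutting = tagCrossCutting(subsystemEdges);
console.log(JSON.stringify(crossCutting)); // ["config","utils"]
```

Under this reading of the metric, any leaf subsystem with only inbound edges scores 1.0 and is tagged, even if just one other subsystem uses it. A minimum inbound edge count might be worth considering if that proves too eager.
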
- If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
- Re-exported interfaces (`export { Foo } from './types'`) are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
- Deeply nested type literals (>3 levels) are flattened to `object` to avoid graph bloat
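
The flattening rule for deeply nested type literals might look like the sketch below. The type representation (a string for a named type, or `{ fields: { ... } }` for a literal) and the function name `flattenType` are assumptions for illustration. The example also assumes the outermost literal counts as level 1, so the fourth nested literal is the first to be replaced.

```javascript
// Hypothetical sketch: replace type literals nested deeper than `maxDepth` with 'object'.
// A type is either a string ('string', 'number', ...) or { fields: { name: type } }.
function flattenType(type, maxDepth = 3, depth = 1) {
  if (typeof type === 'string') return type; // named or primitive type: keep as-is
  if (depth > maxDepth) return 'object'; // too deep: collapse the whole literal
  const fields = {};
  for (const [name, fieldType] of Object.entries(type.fields)) {
    fields[name] = flattenType(fieldType, maxDepth, depth + 1);
  }
  return { fields };
}

// Four nested literals: the innermost one (level 4) is collapsed.
const nestedType = {
  fields: {
    a: { fields: { b: { fields: { c: { fields: { d: 'string' } } } } } },
  },
};

const flattened = flattenType(nestedType);
console.log(JSON.stringify(flattened));
// {"fields":{"a":{"fields":{"b":{"fields":{"c":"object"}}}}}}
```
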
**Tests (7B):**
| Test | Input | Expected |
|------|-------|----------|
| Interface extraction | `gateway/types.ts` with 3 interfaces | 3 Interface entities with correct fields |
| Type alias | `type SessionKey = string` | 1 TypeAlias entity |
| Enum extraction | `enum Status { Active, Inactive }` | 1 Enum entity with 2 members |
| Re-exported interface | `gateway/types.ts` re-exports from `config/types.ts` | Resolved to original definition |
| Parse failure | Malformed TS file | Skipped with warning, no crash |
2. BFS through CALLS edges, recording subsystem transitions
3. **Cycle detection:** Maintain a visited set per trace. If a node is revisited, record the cycle and stop that branch (do not re-enter).
4. **God object pruning:** Before tracing, compute in-degree for all nodes. Nodes with in-degree > `godThreshold` (default: 50) are excluded from traversal (they're utility functions called by everything, not meaningful flow participants). Logged as "excluded high-connectivity nodes."
5. **Depth limit:** Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5 (prioritizes cross-subsystem flow).
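
The traversal steps above can be sketched as a single BFS. This is a minimal illustration under stated assumptions, not the tracer's actual implementation: `traceFlow`, the `Map` adjacency shape, the `subsystemOf` callback, and the result fields are all hypothetical names, and entry-point selection and test-file filtering are left out.

```javascript
// Hypothetical sketch of the flow tracer.
// `calls` maps a node id to the node ids it calls (CALLS edges).
// `subsystemOf` returns the subsystem name for a node id.
function traceFlow(calls, subsystemOf, entry, { godThreshold = 50, maxDepth = 8 } = {}) {
  // God object pruning: compute in-degree over all CALLS edges up front.
  const inDegree = new Map();
  for (const callees of calls.values()) {
    for (const callee of callees) {
      inDegree.set(callee, (inDegree.get(callee) ?? 0) + 1);
    }
  }
  const isGod = (node) => (inDegree.get(node) ?? 0) > godThreshold;

  const visited = new Set([entry]); // one visited set per trace
  const queue = [{ node: entry, depth: 0 }];
  const result = { steps: [], transitions: [], cycles: [], excluded: [] };

  for (let head = 0; head < queue.length; head++) {
    const { node, depth } = queue[head];
    for (const callee of calls.get(node) ?? []) {
      if (isGod(callee)) {
        if (!result.excluded.includes(callee)) result.excluded.push(callee);
        continue;
      }
      if (visited.has(callee)) {
        result.cycles.push([node, callee]); // revisit: record it and stop this branch
        continue;
      }
      const crosses = subsystemOf(node) !== subsystemOf(callee);
      const nextDepth = depth + (crosses ? 1 : 0.5); // boundary crossings cost more
      if (nextDepth > maxDepth) continue; // depth limit
      visited.add(callee);
      result.steps.push({ from: node, to: callee, depth: nextDepth });
      if (crosses) result.transitions.push([subsystemOf(node), subsystemOf(callee)]);
      queue.push({ node: callee, depth: nextDepth });
    }
  }
  return result;
}

// Made-up call graph; godThreshold is lowered to 3 so `utils/log` is pruned.
const calls = new Map([
  ['gateway/handle', ['agents/run', 'utils/log']],
  ['agents/run', ['agents/plan', 'utils/log', 'gateway/handle']],
  ['agents/plan', ['channels/send', 'utils/log']],
  ['channels/send', ['utils/log']],
]);
const subsystemOf = (node) => node.split('/')[0];

const flow = traceFlow(calls, subsystemOf, 'gateway/handle', { godThreshold: 3 });
console.log(JSON.stringify(flow.transitions)); // [["gateway","agents"],["agents","channels"]]
console.log(JSON.stringify(flow.cycles)); // [["agents/run","gateway/handle"]]
console.log(JSON.stringify(flow.excluded)); // ["utils/log"]
```

Two caveats apply to this sketch. Treating every revisit as a cycle, as the step above describes, also flags diamond-shaped call graphs (two paths reaching the same callee), so a stricter check may be needed to separate true cycles from shared callees. And because hops carry different weights, the first visit to a node is not guaranteed to be its lowest-depth path.
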
**LLM narration (separate step):** The deterministic JSON flow is the testable artifact. LLM narration is applied *after* as a formatting pass in 7D. This means:
- Flow correctness is tested against `expected-flows.json` (deterministic)
- LLM prose quality is evaluated separately (human review, not CI)
**Performance guarantee:** With a visited set, each trace processes every node and edge at most once, so BFS runs in O(V+E); god object pruning and the depth limit shrink the explored region further. On the OpenClaw graph (23k nodes, 142k edges), each trace must complete in <5 seconds. A trace that exceeds 5s is killed and logged as a timeout.
**Tests (7C):**
| Test | Input | Expected |
|------|-------|----------|
| Simple linear flow | Fixture entry point A→B→C across 3 subsystems | Matches `expected-flows.json` |
| Cycle detection | Fixture circular dep gateway↔agents | Cycle recorded; that branch stops and the rest of the trace completes without looping |
| God object exclusion | Entry point that calls `utils/logger.ts:log` (high in-degree) | `log` excluded from trace |
| Depth limit | Deep call chain (>8 hops) | Trace stops at depth 8 |
| Test file exclusion | Entry point that calls a test helper | Test file skipped |
- **Tutorials:** Human-authored only. Not generated.
- **Reference:** System architecture, per-subsystem docs, contracts, module docs. All deterministic structure + LLM prose.
- **Explanation:** Architecture patterns (from dependency analysis), data flows (from flow traces — these explain *how the system works*, not *how to do a task*), design decisions (from architecture.md + commit history).
- **How-To:** Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.
- Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).
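
The batch-and-merge fallback could be sketched as follows. Everything here is an assumption for illustration: `extractInBatches`, `mergeGraphs`, the `{ nodes, edges }` graph shape, and the stand-in `fakeExtract` extractor are hypothetical. Only `process.memoryUsage().heapUsed` is a real Node.js call, used to compare heap usage against the 500MB budget.

```javascript
// Hypothetical sketch: extract in batches and merge partial graphs.
const HEAP_BUDGET_MB = 500;

// Current V8 heap usage in megabytes (real Node.js API).
function heapUsedMb() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Merge a partial graph into the accumulated graph, in place.
function mergeGraphs(target, partial) {
  for (const [id, node] of partial.nodes) target.nodes.set(id, node);
  for (const edge of partial.edges) target.edges.push(edge);
  return target;
}

// `extractBatch` takes a list of file paths and returns { nodes: Map, edges: Array }.
function extractInBatches(files, extractBatch, batchSize = 100) {
  const graph = { nodes: new Map(), edges: [] };
  for (let i = 0; i < files.length; i += batchSize) {
    const partial = extractBatch(files.slice(i, i + batchSize));
    mergeGraphs(graph, partial);
    if (heapUsedMb() > HEAP_BUDGET_MB) {
      console.warn(`heap above ${HEAP_BUDGET_MB}MB after batch ending at file ${i + batchSize}`);
    }
  }
  return graph;
}

// Stand-in extractor: one node per file, one edge between consecutive files in a batch.
const fakeExtract = (batch) => ({
  nodes: new Map(batch.map((file) => [file, { file }])),
  edges: batch.slice(1).map((file, i) => ({ from: batch[i], to: file })),
});

const merged = extractInBatches(['a.ts', 'b.ts', 'c.ts', 'd.ts', 'e.ts'], fakeExtract, 2);
console.log(merged.nodes.size, merged.edges.length); // 5 2
```

Batching this way bounds only the per-batch working set, such as parse trees. The merged graph still has to fit in the heap, so a graph that exceeds the budget on its own would also need partial results spilled to disk.
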