# Dev Intel Pipeline v2 - Phase 7: System-Level Documentation Generation

**Status:** DRAFT v2 (post-SPA Round 1)

**Author:** Max (AI) + Brian (Human)

**Date:** 2026-03-09

**Depends on:** Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)

---

## Problem Statement

The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation, such as the Foxtrot Confluence docs, operates at the *system level*: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.

File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.

## Goal

Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).

## Success Criteria

All metrics are validated against a **ground truth fixture repository** (`test/fixtures/system-docs/`) containing a hand-labeled mini codebase (~30 files across 5 subsystems) with expected outputs for each module.

| Metric | Target | How Measured |
|--------|--------|-------------|
| Subsystem detection accuracy | ≥90% of modules correctly clustered | Compare `subsystem.js` output against `expected-subsystems.json` fixture. Accuracy = correctly assigned files / total files. |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured | Compare dependency matrix against `expected-deps.json`. Recall = captured edges / expected edges. |
| Contract extraction recall | ≥80% of exported interfaces/types extracted | Compare extracted contracts against `expected-contracts.json`. Recall = extracted / total annotated. |
| Generated doc structure | Matches Divio 4-category template | Structural assertion: verify directory layout, required sections present in each generated .md file. |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated | Apply a mock diff to the fixture and assert that only the expected subsystem docs are regenerated (compare content hashes, e.g. md5sum; avoid mtime checks, which are flaky). |
| Cascading invalidation | Shared subsystem change propagates to dependents | Apply a diff to a shared subsystem in the fixture and assert that dependent subsystem docs are also flagged for regeneration. |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) | BACKLOGGED: measure token count statically in CI (e.g. via `tiktoken`) without hitting the API. |
| Flow tracer terminates | All traces complete in <5s on 4,325-file graph | Wall-clock assertion on OpenClaw snapshot. |

## Ground Truth Fixture Repository

Located at `test/fixtures/system-docs/`. Contains:

```
test/fixtures/system-docs/
├── src/
│   ├── gateway/    (5 files: server.ts, session.ts, middleware.ts, types.ts, utils.ts)
│   ├── agents/     (5 files: runner.ts, scope.ts, tools.ts, types.ts, defaults.ts)
│   ├── channels/
│   │   ├── telegram.ts
│   │   └── discord.ts
│   ├── config/     (3 files: config.ts, schema.ts, types.ts)
│   └── utils/      (3 files: logger.ts, crypto.ts, fs-helpers.ts)
├── expected-subsystems.json   ← hand-labeled subsystem assignments
├── expected-deps.json         ← hand-labeled inter-subsystem edges
├── expected-contracts.json    ← hand-labeled interfaces/types
├── expected-flows.json        ← hand-labeled flow traces for 2 entry points
├── expected-diagrams/         ← expected Mermaid source for each diagram type
└── architecture.md            ← mock architecture doc for ingestion testing
```

**Edge cases included in fixtures:**

- `utils/` as a cross-cutting concern (high fan-in, should be tagged as `cross-cutting`)
- Circular dependency: `gateway/session.ts` ↔ `agents/runner.ts` (mutual CALLS)
- Orphan file: `config/schema.ts` (no inbound edges, only exports)
- Re-exported interface: `gateway/types.ts` re-exports from `config/types.ts`
- Empty subsystem: `channels/` has only 2 files with no internal CALLS
edges

## Architecture

### 7A: Subsystem Aggregator (`subsystem.js`)

**Purpose:** Group file-level entities into logical subsystems and compute inter-subsystem relationships.

**Clustering Strategy (tiered):**

1. **Directory-based (default):** Top-level directory under `src/` = subsystem. `gateway/`, `agents/`, `cli/`, `telegram/`, etc. Simple, deterministic, zero-config.
2. **Config-driven (override):** Optional `subsystems.yaml` that maps directories to named subsystems with human labels and grouping overrides.

   ```yaml
   subsystems:
     - name: Gateway
       label: "Session & Request Gateway"
       paths: ["gateway/", "routing/"]
     - name: Agents
       label: "AI Agent Runtime"
       paths: ["agents/", "auto-reply/"]
     - name: Channels
       label: "Channel Adapters"
       paths: ["telegram/", "discord/", "slack/", "signal/", "whatsapp/"]
   ```

3. **Graph-based (future):** Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.

**Cross-cutting concern detection:** Subsystems where >60% of edges are **inbound** from other subsystems (high fan-in: many subsystems depend on them, but they depend on almost nothing) are automatically tagged as `cross-cutting`. Examples: `utils/`, `config/`, `types/`. The metric is `inbound_edges / total_edges > 0.6`.
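
As a minimal sketch of the fan-in metric above, the helper below sums inbound and outbound edge weights per subsystem and tags those whose inbound share exceeds the threshold. The function name `detectCrossCutting`, the choice to weight edges as `calls + imports`, and the sample numbers are illustrative assumptions; only the `"from→to"` key style and the 0.6 threshold come from this spec.

```javascript
// Hypothetical helper (not the actual subsystem.js API): tags subsystems
// whose edges are mostly inbound, i.e. inbound_edges / total_edges > threshold.
function detectCrossCutting(dependencyMatrix, threshold = 0.6) {
  const inbound = new Map();
  const outbound = new Map();
  const bump = (map, key, n) => map.set(key, (map.get(key) ?? 0) + n);

  for (const [key, counts] of Object.entries(dependencyMatrix)) {
    const [from, to] = key.split("→");
    // Assumption: an edge's weight is its CALLS count plus its IMPORTS count.
    const weight = (counts.calls ?? 0) + (counts.imports ?? 0);
    bump(outbound, from, weight);
    bump(inbound, to, weight);
  }

  const names = new Set([...inbound.keys(), ...outbound.keys()]);
  const tagged = [];
  for (const name of names) {
    const inCount = inbound.get(name) ?? 0;
    const total = inCount + (outbound.get(name) ?? 0);
    if (total > 0 && inCount / total > threshold) tagged.push(name);
  }
  return tagged.sort(); // sorted for deterministic, diff-friendly output
}

// Made-up sample matrix for illustration.
const matrix = {
  "gateway→agents": { calls: 10, imports: 5 },
  "gateway→utils": { calls: 20, imports: 10 },
  "agents→utils": { calls: 25, imports: 10 },
  "utils→config": { calls: 3, imports: 2 },
};
const crossCutting = detectCrossCutting(matrix);
console.log(crossCutting); // [ 'config', 'utils' ]
```

In this sample `utils` has 65 inbound against 5 outbound edges (about 0.93) and `config` has only inbound edges, so both are tagged, while `agents` (15 of 50, or 0.3) stays a domain subsystem.
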
Cross-cutting subsystems are:

- Excluded from the dependency matrix visualization (reduces hairball)
- Documented separately as "Shared Infrastructure" in the reference docs
- Still tracked in the raw dependency data for completeness

**Output:**

```json
{
  "subsystems": [
    {
      "name": "gateway",
      "label": "Session & Request Gateway",
      "kind": "domain",
      "files": ["gateway/session-utils.ts", "gateway/server.ts"],
      "entities": { "functions": 142, "classes": 3, "modules": 28 },
      "publicExports": ["deriveSessionTitle", "loadSessionEntry"],
      "internalDeps": [{"from": "gateway", "to": "agents", "edges": 89, "type": "CALLS"}],
      "externalDeps": ["commander", "node:fs", "node:path"]
    }
  ],
  "crossCutting": ["utils", "config"],
  "dependencyMatrix": {
    "gateway→agents": { "calls": 89, "imports": 34 },
    "agents→config": { "calls": 156, "imports": 120 }
  }
}
```

**Tests (7A):**

| Test | Input | Expected |
|------|-------|----------|
| Directory clustering | Fixture repo | Matches `expected-subsystems.json` (5 subsystems) |
| Config override | Fixture + `subsystems.yaml` merging gateway+routing | Merged subsystem with combined files |
| Cross-cutting detection | Fixture `utils/` (high fan-in) | Tagged as `cross-cutting` |
| Empty subsystem | Fixture `channels/` (2 files, no internal calls) | Valid subsystem with 0 internal edges |
| Orphan file | `config/schema.ts` (no inbound) | Assigned to `config` subsystem, not dropped |

### 7B: Contract Extractor (`contracts.js`)

**Purpose:** Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.

**What to extract:**

- `interface Foo { ... }` → entity type `Interface`, with fields as properties
- `type Foo = { ... }` → entity type `TypeAlias`
- `enum Foo { ...
}` → entity type `Enum`, with members
- Exported `const` objects used as config defaults → entity type `ConfigContract`
- YAML schema keys (from config files) → entity type `ConfigSchema`

**Relationships:**

- `IMPLEMENTS`: class → interface
- `ACCEPTS`: function parameter → interface/type (function signature contracts)
- `RETURNS`: function → return type
- `EXTENDS`: interface → interface

**Error handling:**

- If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
- Re-exported interfaces (`export { Foo } from './types'`) are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
- Deeply nested type literals (>3 levels) are flattened to `object` to avoid graph bloat

**Tests (7B):**

| Test | Input | Expected |
|------|-------|----------|
| Interface extraction | `gateway/types.ts` with 3 interfaces | 3 Interface entities with correct fields |
| Type alias | `type SessionKey = string` | 1 TypeAlias entity |
| Enum extraction | `enum Status { Active, Inactive }` | 1 Enum entity with 2 members |
| Re-exported interface | `gateway/types.ts` re-exports from `config/types.ts` | Resolved to original definition |
| Parse failure | Malformed TS file | Skipped with warning, no crash |
| Recall benchmark | Fixture repo | ≥80% of `expected-contracts.json` extracted |

### 7C: Flow Tracer (`flow.js`)

**Purpose:** Given an entry point, walk the call graph across subsystem boundaries and produce a sequenced narrative of the data flow.

**Algorithm:**

1. Start at entry point entity (e.g., `telegram/bot-handlers.ts:onMessage`)
2. BFS through CALLS edges, recording subsystem transitions
3. **Cycle detection:** Maintain a visited set per trace. If a node is revisited, record the cycle and stop that branch (do not re-enter).
4. **God object pruning:** Before tracing, compute in-degree for all nodes.
   Nodes with in-degree > `godThreshold` (default: 50) are excluded from traversal (they are utility functions called by everything, not meaningful flow participants). Logged as "excluded high-connectivity nodes."
5. **Depth limit:** Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5 (prioritizes cross-subsystem flow).
6. **Test file exclusion:** Skip any file matching `*.test.*`, `*.spec.*`, `test/`, `__tests__/`.
7. At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
8. Output: ordered list of subsystem hops with the specific function calls that cross boundaries

**Output (deterministic JSON, testable without LLM):**

```json
{
  "entryPoint": "telegram/bot-handlers.ts:onMessage",
  "depth": 8,
  "godThreshold": 50,
  "excludedNodes": ["utils/logger.ts:log", "config/config.ts:getConfig"],
  "cyclesDetected": [
    { "at": "gateway/session.ts:loadSession", "backEdgeTo": "agents/runner.ts:runAgent" }
  ],
  "flow": [
    { "subsystem": "telegram", "entity": "telegram/bot-handlers.ts:onMessage", "depth": 0 },
    { "subsystem": "routing", "entity": "routing/session-key.ts:resolveKey", "depth": 1, "crossedVia": "CALLS" },
    { "subsystem": "gateway", "entity": "gateway/session.ts:loadSession", "depth": 2, "crossedVia": "CALLS" },
    { "subsystem": "agents", "entity": "agents/runner.ts:runAgent", "depth": 3, "crossedVia": "CALLS" }
  ],
  "subsystemSequence": ["telegram", "routing", "gateway", "agents"]
}
```

**LLM narration (separate step):** The deterministic JSON flow is the testable artifact. LLM narration is applied *after* as a formatting pass in 7D. This means:

- Flow correctness is tested against `expected-flows.json` (deterministic)
- LLM prose quality is evaluated separately (human review, not CI)

**Performance guarantee:** BFS with visited set + god object pruning + depth limit = O(V+E) bounded by depth.
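
The traversal described in the algorithm steps can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual `flow.js`: the graph shape (`nodes` keyed by entity id with a `subsystem` field, `calls` as an adjacency list), the function name `traceFlow`, and the `revisits` record with `caller`/`revisited` fields are all hypothetical. Following the rule as written, any revisit is recorded, which also covers converging paths and not only true cycles.

```javascript
// Sketch only: graph shape and all names are illustrative assumptions.
const TEST_FILE = /\.(test|spec)\.|(^|\/)(test|__tests__)\//;

function traceFlow(graph, entryPoint, { maxDepth = 8, godThreshold = 50 } = {}) {
  // God object pruning: compute in-degree over CALLS edges before tracing.
  const inDegree = new Map();
  for (const targets of Object.values(graph.calls)) {
    for (const t of targets) inDegree.set(t, (inDegree.get(t) ?? 0) + 1);
  }

  const visited = new Set([entryPoint]);
  const excluded = new Set();
  const revisits = [];
  const flow = [
    { subsystem: graph.nodes[entryPoint].subsystem, entity: entryPoint, depth: 0 },
  ];
  // Plain FIFO queue: a node keeps the weighted depth of the first path reaching it.
  const queue = [{ id: entryPoint, depth: 0 }];

  while (queue.length > 0) {
    const { id, depth } = queue.shift();
    for (const next of graph.calls[id] ?? []) {
      const target = graph.nodes[next];
      if (!target || TEST_FILE.test(next)) continue; // unknown node or test file
      if ((inDegree.get(next) ?? 0) > godThreshold) {
        excluded.add(next); // high-connectivity node, not traversed
        continue;
      }
      if (visited.has(next)) {
        revisits.push({ caller: id, revisited: next }); // stop this branch
        continue;
      }
      const crossing = target.subsystem !== graph.nodes[id].subsystem;
      const nextDepth = depth + (crossing ? 1 : 0.5);
      if (nextDepth > maxDepth) continue; // depth limit
      visited.add(next);
      const hop = { subsystem: target.subsystem, entity: next, depth: nextDepth };
      if (crossing) hop.crossedVia = "CALLS";
      flow.push(hop);
      queue.push({ id: next, depth: nextDepth });
    }
  }

  const subsystemSequence = flow
    .filter((hop) => hop.depth === 0 || hop.crossedVia)
    .map((hop) => hop.subsystem);
  return { entryPoint, excludedNodes: [...excluded].sort(), revisits, flow, subsystemSequence };
}

// Tiny made-up graph: a linear flow, a shared logger, and a gateway/agents loop.
const graph = {
  nodes: {
    "telegram/bot-handlers.ts:onMessage": { subsystem: "telegram" },
    "routing/session-key.ts:resolveKey": { subsystem: "routing" },
    "gateway/session.ts:loadSession": { subsystem: "gateway" },
    "agents/runner.ts:runAgent": { subsystem: "agents" },
    "utils/logger.ts:log": { subsystem: "utils" },
  },
  calls: {
    "telegram/bot-handlers.ts:onMessage": ["routing/session-key.ts:resolveKey", "utils/logger.ts:log"],
    "routing/session-key.ts:resolveKey": ["gateway/session.ts:loadSession", "utils/logger.ts:log"],
    "gateway/session.ts:loadSession": ["agents/runner.ts:runAgent", "utils/logger.ts:log"],
    "agents/runner.ts:runAgent": ["gateway/session.ts:loadSession", "utils/logger.ts:log"],
  },
};
const result = traceFlow(graph, "telegram/bot-handlers.ts:onMessage", { godThreshold: 3 });
console.log(result.subsystemSequence); // [ 'telegram', 'routing', 'gateway', 'agents' ]
```

The threshold is lowered to 3 here only so the four-caller logger counts as a high-connectivity node in such a small graph. Each edge is examined at most once per visited caller, which is where the O(V+E) bound comes from.
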
On the OpenClaw graph (23k nodes, 142k edges), traces must complete in <5 seconds. If a trace exceeds 5s, it is killed and logged as a timeout.

**Tests (7C):**

| Test | Input | Expected |
|------|-------|----------|
| Simple linear flow | Fixture entry point A→B→C across 3 subsystems | Matches `expected-flows.json` |
| Cycle detection | Fixture circular dep gateway↔agents | Cycle recorded, trace continues without loop |
| God object exclusion | Entry point that calls `utils/logger.ts:log` (high in-degree) | `log` excluded from trace |
| Depth limit | Deep call chain (>8 hops) | Trace stops at depth 8 |
| Test file exclusion | Entry point that calls a test helper | Test file skipped |
| Performance | OpenClaw full snapshot | <5s wall clock |
| Empty trace | Entry point with no outgoing CALLS | Returns flow with single entry, no hops |

### 7D: Hierarchical Doc Generator (`sysdoc.js`)

**Purpose:** Orchestrate 7A-7C to produce a complete documentation site in Divio structure.

**Output structure:**

```
docs/
├── tutorials/
│   └── (human-authored only, not auto-generated)
├── reference/
│   ├── system-architecture.md     ← from subsystem aggregator + dependency matrix
│   ├── subsystems/
│   │   ├── gateway.md             ← per-subsystem: purpose, exports, deps, key modules
│   │   ├── agents.md
│   │   └── ...
│   ├── contracts/
│   │   ├── session-types.md       ← from contract extractor
│   │   └── ...
│   └── modules/
│       └── (existing file-level docs from Phase 6)
└── explanation/
    ├── architecture-patterns.md   ← from dependency matrix analysis
    ├── data-flows.md              ← from flow tracer (LLM-narrated flow traces)
    └── design-decisions.md        ← from architecture.md ingestion + commit history
```

**Divio category mapping (corrected):**

- **Tutorials:** Human-authored only. Not generated.
- **Reference:** System architecture, per-subsystem docs, contracts, module docs. All deterministic structure + LLM prose.
- **Explanation:** Architecture patterns (from dependency analysis), data flows (from flow traces; these explain *how the system works*, not *how to do a task*), design decisions (from architecture.md + commit history).
- **How-To:** Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.

**Generation pipeline:**

1. Run subsystem aggregator → subsystem map + dependency matrix
2. Run contract extractor → interface/type entities added to graph
3. Run flow tracer on configured entry points → deterministic flow JSONs
4. For each subsystem: generate reference doc (LLM with subsystem context + architecture.md sections)
5. Generate system architecture overview (LLM with full dependency matrix)
6. Generate data flow explanations (LLM narrates flow JSONs into prose)
7. Generate Mermaid diagrams (7E) and embed in docs

**Incremental updates with cascading invalidation:**

1. Semantic diff identifies changed files
2. Map changed files → directly affected subsystems (set A)
3. For each subsystem in A, find all subsystems that depend on it (set B = dependents of A in dependency matrix)
4. Regeneration set = A ∪ B
5. System architecture overview regenerated only if dependency matrix changed (new/removed inter-subsystem edges)
6.
   Flow traces regenerated only if any entity in the trace path was modified

**Tests (7D):**

| Test | Input | Expected |
|------|-------|----------|
| Full generation | Fixture repo | Correct directory structure with all expected .md files |
| Section completeness | Generated subsystem doc | Contains: Purpose, Key Modules, Public API, Dependencies sections |
| Incremental: direct change | Modify `gateway/server.ts` | Only `gateway.md` + dependents regenerated |
| Incremental: cascading | Modify `config/types.ts` (shared) | `config.md` + all subsystems importing config regenerated |
| Incremental: no-op | No semantic diff | Zero files regenerated |
| Architecture.md ingestion | Fixture with `architecture.md` | LLM prompt includes architecture.md content |

### 7E: Diagram Generator (`diagrams.js`)

**Purpose:** Auto-generate Mermaid diagrams from graph analysis outputs.

**Diagram types:**

1. **Subsystem Dependency Graph** (from 7A dependency matrix)
   - Nodes = subsystems (excluding cross-cutting)
   - Edges = inter-subsystem CALLS/IMPORTS with edge weight labels
   - Cross-cutting subsystems shown as a separate "Shared" cluster
2. **Flow Sequence Diagram** (from 7C flow traces)
   - Participants = subsystems in flow order
   - Messages = function calls at boundary crossings
   - Cycles shown as self-referencing notes
3. **Contract Relationship Diagram** (from 7B contracts)
   - Classes/interfaces with fields
   - IMPLEMENTS/EXTENDS relationships as arrows

**Rendering:** Use `mmdr` (Rust Mermaid renderer) to produce SVG. Embed in generated Markdown docs as `![diagram](./diagrams/subsystem-deps.svg)`.
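
A minimal sketch of the first diagram type, assuming the 7A `dependencyMatrix` and `crossCutting` shapes shown earlier: it emits Mermaid flowchart source with weighted edge labels and a separate "Shared" cluster. The function name `toMermaidDependencyGraph`, the label wording, and the `graph LR` orientation are illustrative choices, and subsystem names are assumed to be valid Mermaid node ids.

```javascript
// Hypothetical generator (not the actual diagrams.js API): turns a
// dependency matrix into Mermaid flowchart source.
function toMermaidDependencyGraph(dependencyMatrix, crossCutting = []) {
  const shared = new Set(crossCutting);
  const lines = ["graph LR"];

  // Sort keys so the output is stable and can be compared against a fixture.
  for (const key of Object.keys(dependencyMatrix).sort()) {
    const [from, to] = key.split("→");
    // Edges touching cross-cutting subsystems are left out of the main graph.
    if (shared.has(from) || shared.has(to)) continue;
    const { calls = 0, imports = 0 } = dependencyMatrix[key];
    lines.push(`  ${from} -->|"calls: ${calls}, imports: ${imports}"| ${to}`);
  }

  // Cross-cutting subsystems are listed in their own "Shared" cluster.
  if (shared.size > 0) {
    lines.push("  subgraph Shared");
    for (const name of [...shared].sort()) lines.push(`    ${name}`);
    lines.push("  end");
  }
  return lines.join("\n");
}

const diagram = toMermaidDependencyGraph(
  {
    "gateway→agents": { calls: 89, imports: 34 },
    "agents→config": { calls: 156, imports: 120 },
  },
  ["utils", "config"],
);
console.log(diagram);
```

For the sample matrix this yields a single `gateway --> agents` edge plus a `Shared` subgraph containing `config` and `utils`. Sorting keys and cluster members keeps the output byte-stable, which is what a comparison against `expected-diagrams/deps.mmd` would need.
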
**Tests (7E):**

| Test | Input | Expected |
|------|-------|----------|
| Dependency diagram | Fixture dependency matrix | Valid Mermaid syntax, matches `expected-diagrams/deps.mmd` |
| Sequence diagram | Fixture flow trace | Valid Mermaid syntax, correct participant order |
| Contract diagram | Fixture contracts | Valid Mermaid syntax, correct relationships |
| Rendering | Any generated .mmd file | mmdr produces valid SVG without errors |

## Architecture.md Ingestion

Each repo may contain human-written architecture documentation. The pipeline:

1. **Discovery:** Scan for `architecture.md`, `docs/architecture.md`, `ARCHITECTURE.md`, `docs/design.md`, with paths resolved relative to the repo root
2. **Parsing:** Extract sections (headings → content blocks) as structured context
3. **Injection:** When generating subsystem docs or explanation docs, include relevant architecture.md sections in the LLM prompt alongside graph data
4. **Diff tracking:** If `architecture.md` changes between releases, flag it in the semantic diff as a documentation-relevant change

## Cross-Repo Output Model

Two output modes:

**Per-repo (reference only):**

- Subsystem architecture docs
- Contract reference
- Module reference
- Mermaid diagrams
- Useful for repo maintainers

**Unified (full Divio):**

- Merges per-repo graphs via namespace registry (Phase 3) into super-graph
- Runs 7A-7E on super-graph
- Generates cross-repo flow traces and dependency diagrams
- Includes human-authored tutorials and explanation docs
- Useful for platform consumers and new engineers

## Implementation Phases

| Phase | Module | Effort | Depends On |
|-------|--------|--------|------------|
| 7-fixtures | Ground truth fixture repo | 0.5 day | none |
| 7A | `subsystem.js` + tests | 1 day | graph.js, fixtures |
| 7B | `contracts.js` + tests | 2 days | extract.js, fixtures |
| 7C | `flow.js` + tests | 2 days | graph.js, subsystem.js, fixtures |
| 7D | `sysdoc.js` + tests | 2 days | 7A, 7B, 7C, docgen.js |
| 7E | `diagrams.js` + tests | 1 day | 7A, 7C, 7B |
| 7F | `supergraph.js` (Multi-repo Merge) | 1 day | namespace.js, graph.js |

**Total: ~9.5 days**

**Critical path:** fixtures → 7A → 7C → 7D

**Parallel:** 7B, 7E, and 7F can run in parallel with core phases.

**Build loop (BMad Wiggum):** Each phase follows: build → test → BMad review → revise → re-review until GO.

## Constraints

- No new external dependencies (same as Phases 1-5)
- LLM calls only for prose generation; all structural analysis is deterministic
- tree-sitter@0.21.1 compatibility maintained
- Templates are Markdown with simple mustache-style slots (no template engine dependency; plain string replacement)
- Must work on OpenClaw codebase (4,325 files) as primary benchmark
- Foxtrot repos are not available in this environment; the design must work from any repo's graph snapshot
- Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).

## Resolved Decisions

1. **Tutorials:** Human-authored only. Flow traces inform but don't generate tutorials; domain knowledge is required.
2. **Design decisions:** Infer from commit history + semantic diffs AND parse `architecture.md` from each repo.
3. **Cross-repo:** Both per-repo (reference) and unified (full Divio). Different audiences.
4. **Mermaid diagrams:** Yes, via 7E. Three diagram types: dependency, sequence, contract.
5. **Architecture.md ingestion:** Parsed and injected as LLM context for subsystem and explanation docs.
6. **Flow traces are Explanation, not How-To:** Corrected Divio mapping. How-To deferred from MVP.
7. **LLM output is not CI-tested:** All testable artifacts are deterministic JSON. LLM prose is a formatting pass evaluated by human review.