Dev Intel Pipeline v2 — Phase 7: System-Level Documentation Generation
Status: DRAFT v2 (post-SPA Round 1)
Author: Max (AI) + Brian (Human)
Date: 2026-03-09
Depends on: Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)
Problem Statement
The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation — like the Foxtrot Confluence docs — operates at the system level: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.
File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.
Goal
Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).
Success Criteria
All metrics are validated against a ground truth fixture repository (test/fixtures/system-docs/) containing a hand-labeled mini codebase (~30 files across 5 subsystems) with expected outputs for each module.
| Metric | Target | How Measured |
|---|---|---|
| Subsystem detection accuracy | ≥90% of modules correctly clustered | Compare subsystem.js output against expected-subsystems.json fixture. Accuracy = correctly assigned files / total files. |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured | Compare dependency matrix against expected-deps.json. Recall = captured edges / expected edges. |
| Contract extraction recall | ≥80% of exported interfaces/types extracted | Compare extracted contracts against expected-contracts.json. Recall = extracted / total annotated. |
| Generated doc structure | Matches Divio 4-category template | Structural assertion: verify directory layout, required sections present in each generated .md file. |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated | Apply a mock diff to fixture, assert only expected subsystem docs are regenerated (content hashing / md5sum check, avoid mtime flakiness). |
| Cascading invalidation | Shared subsystem change propagates to dependents | Apply a diff to a shared subsystem in fixture, assert dependent subsystem docs are also flagged for regeneration. |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) | BACKLOGGED — measure token count statically in CI (e.g. via tiktoken) without hitting API. |
| Flow tracer terminates | All traces complete in <5s on 4,325-file graph | Wall-clock assertion on OpenClaw snapshot. |
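The accuracy and recall formulas in the table come down to a few lines each. The following is a minimal sketch, not the pipeline's actual test harness: `clusteringAccuracy`, `recall`, and the data shapes (a file-to-subsystem map, arrays of edge keys) are assumptions made for illustration.

```javascript
// Hypothetical helpers; names and data shapes are assumptions, not pipeline API.

// Accuracy = correctly assigned files / total files.
// Both arguments map a file path to a subsystem name.
function clusteringAccuracy(actual, expected) {
  const files = Object.keys(expected);
  if (files.length === 0) return 1;
  const correct = files.filter((f) => actual[f] === expected[f]).length;
  return correct / files.length;
}

// Recall = captured items / expected items, over string keys
// such as "gateway→agents" or a contract name.
function recall(captured, expected) {
  if (expected.length === 0) return 1;
  const seen = new Set(captured);
  return expected.filter((e) => seen.has(e)).length / expected.length;
}

const expectedSubsystems = {
  "gateway/server.ts": "gateway",
  "agents/runner.ts": "agents",
  "utils/logger.ts": "utils",
  "config/schema.ts": "config",
};
// One file is deliberately misassigned.
const actualSubsystems = { ...expectedSubsystems, "config/schema.ts": "utils" };

const accuracy = clusteringAccuracy(actualSubsystems, expectedSubsystems);
const edgeRecall = recall(["gateway→agents"], ["gateway→agents", "agents→config"]);
console.log(accuracy, edgeRecall); // 0.75 0.5
```

A CI assertion would then compare these values against the targets (for example `accuracy >= 0.9`).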
Ground Truth Fixture Repository
Located at test/fixtures/system-docs/. Contains:
test/fixtures/system-docs/
├── src/
│ ├── gateway/ (5 files: server.ts, session.ts, middleware.ts, types.ts, utils.ts)
│ ├── agents/ (5 files: runner.ts, scope.ts, tools.ts, types.ts, defaults.ts)
│ ├── channels/
│ │ ├── telegram.ts
│ │ └── discord.ts
│ ├── config/ (3 files: config.ts, schema.ts, types.ts)
│ └── utils/ (3 files: logger.ts, crypto.ts, fs-helpers.ts)
├── expected-subsystems.json ← hand-labeled subsystem assignments
├── expected-deps.json ← hand-labeled inter-subsystem edges
├── expected-contracts.json ← hand-labeled interfaces/types
├── expected-flows.json ← hand-labeled flow traces for 2 entry points
├── expected-diagrams/ ← expected Mermaid source for each diagram type
└── architecture.md ← mock architecture doc for ingestion testing
Edge cases included in fixtures:
- `utils/` as a cross-cutting concern (high fan-in, should be tagged as `cross-cutting`)
- Circular dependency: `gateway/session.ts` ↔ `agents/runner.ts` (mutual CALLS)
- Orphan file: `config/schema.ts` (no inbound edges, only exports)
- Re-exported interface: `gateway/types.ts` re-exports from `config/types.ts`
- Empty subsystem: `channels/` has only 2 files with no internal CALLS edges
Architecture
7A: Subsystem Aggregator (subsystem.js)
Purpose: Group file-level entities into logical subsystems and compute inter-subsystem relationships.
Clustering Strategy (tiered):
- Directory-based (default): Top-level directory under `src/` = subsystem. `gateway/`, `agents/`, `cli/`, `telegram/`, etc. Simple, deterministic, zero-config.
- Config-driven (override): Optional `subsystems.yaml` that maps directories to named subsystems with human labels and grouping overrides.

      subsystems:
        - name: Gateway
          label: "Session & Request Gateway"
          paths: ["gateway/", "routing/"]
        - name: Agents
          label: "AI Agent Runtime"
          paths: ["agents/", "auto-reply/"]
        - name: Channels
          label: "Channel Adapters"
          paths: ["telegram/", "discord/", "slack/", "signal/", "whatsapp/"]

- Graph-based (future): Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
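The first two tiers can be sketched as a single lookup: config overrides win, and the top-level directory is the fallback. This is only an illustration. `assignSubsystem`, `clusterFiles`, the `"(root)"` bucket, and the assumption that `subsystems.yaml` has already been parsed into a plain object are all hypothetical.

```javascript
// Sketch only: function names and the parsed-config shape are assumptions.
// File paths are taken relative to src/, e.g. "gateway/server.ts".
function assignSubsystem(filePath, config = { subsystems: [] }) {
  // Tier 2: a config entry claims the file if any of its path prefixes match.
  for (const { name, paths } of config.subsystems) {
    if (paths.some((prefix) => filePath.startsWith(prefix))) return name;
  }
  // Tier 1 (default): the top-level directory is the subsystem.
  const slash = filePath.indexOf("/");
  return slash === -1 ? "(root)" : filePath.slice(0, slash);
}

function clusterFiles(files, config) {
  const subsystems = {};
  for (const file of files) {
    const name = assignSubsystem(file, config);
    (subsystems[name] ??= []).push(file);
  }
  return subsystems;
}

const config = {
  subsystems: [
    { name: "Gateway", label: "Session & Request Gateway", paths: ["gateway/", "routing/"] },
  ],
};
const clusters = clusterFiles(
  ["gateway/server.ts", "routing/session-key.ts", "agents/runner.ts"],
  config
);
console.log(clusters);
```

Here `gateway/` and `routing/` merge into `Gateway` through the override, while `agents/runner.ts` falls back to its directory name.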
Cross-cutting concern detection:
Subsystems where >60% of edges are inbound from other subsystems (high fan-in — many subsystems depend on them, but they depend on almost nothing) are automatically tagged as cross-cutting. Examples: utils/, config/, types/. The metric is inbound_edges / total_edges > 0.6. Cross-cutting subsystems are:
- Excluded from the dependency matrix visualization (reduces hairball)
- Documented separately as "Shared Infrastructure" in the reference docs
- Still tracked in the raw dependency data for completeness
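The `inbound_edges / total_edges > 0.6` rule can be sketched as follows. This assumes that `total_edges` counts only edges between different subsystems, and that edges arrive as `{ from, to, count }` records; both are assumptions, since the text does not pin them down. `detectCrossCutting` is a hypothetical name.

```javascript
// Assumption: only inter-subsystem edges count toward the totals.
// edges: [{ from: "gateway", to: "utils", count: 40 }, ...]
function detectCrossCutting(edges, threshold = 0.6) {
  const stats = new Map(); // subsystem -> { inbound, total }
  const add = (name, weight, isInbound) => {
    const s = stats.get(name) ?? { inbound: 0, total: 0 };
    s.total += weight;
    if (isInbound) s.inbound += weight;
    stats.set(name, s);
  };
  for (const { from, to, count = 1 } of edges) {
    if (from === to) continue; // skip intra-subsystem edges
    add(from, count, false);
    add(to, count, true);
  }
  return [...stats]
    .filter(([, s]) => s.inbound / s.total > threshold)
    .map(([name]) => name)
    .sort();
}

const crossCutting = detectCrossCutting([
  { from: "gateway", to: "utils", count: 40 },
  { from: "agents", to: "utils", count: 55 },
  { from: "channels", to: "utils", count: 12 },
  { from: "config", to: "utils", count: 5 },
  { from: "gateway", to: "agents", count: 89 },
  { from: "agents", to: "gateway", count: 3 },
  { from: "channels", to: "gateway", count: 20 },
  { from: "gateway", to: "config", count: 30 },
  { from: "agents", to: "config", count: 156 },
]);
console.log(crossCutting); // [ 'config', 'utils' ]
```

In this made-up data `utils` is 100% inbound and `config` about 97% inbound, so both are tagged; `gateway` and `agents` stay below the threshold.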
Output:
{
  "subsystems": [
    {
      "name": "gateway",
      "label": "Session & Request Gateway",
      "kind": "domain",
      "files": ["gateway/session-utils.ts", "gateway/server.ts"],
      "entities": { "functions": 142, "classes": 3, "modules": 28 },
      "publicExports": ["deriveSessionTitle", "loadSessionEntry"],
      "internalDeps": [{"from": "gateway", "to": "agents", "edges": 89, "type": "CALLS"}],
      "externalDeps": ["commander", "node:fs", "node:path"]
    }
  ],
  "crossCutting": ["utils", "config"],
  "dependencyMatrix": {
    "gateway→agents": { "calls": 89, "imports": 34 },
    "agents→config": { "calls": 156, "imports": 120 }
  }
}
Tests (7A):
| Test | Input | Expected |
|---|---|---|
| Directory clustering | Fixture repo | Matches expected-subsystems.json (5 subsystems) |
| Config override | Fixture + `subsystems.yaml` merging gateway+routing | Merged subsystem with combined files |
| Cross-cutting detection | Fixture `utils/` (high fan-in) | Tagged as cross-cutting |
| Empty subsystem | Fixture `channels/` (2 files, no internal calls) | Valid subsystem with 0 internal edges |
| Orphan file | `config/schema.ts` (no inbound) | Assigned to config subsystem, not dropped |
7B: Contract Extractor (contracts.js)
Purpose: Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.
What to extract:
- `interface Foo { ... }` → entity type `Interface`, with fields as properties
- `type Foo = { ... }` → entity type `TypeAlias`
- `enum Foo { ... }` → entity type `Enum`, with members
- Exported `const` objects used as config defaults → entity type `ConfigContract`
- YAML schema keys (from config files) → entity type `ConfigSchema`

Relationships:
- `IMPLEMENTS`: class → interface
- `ACCEPTS`: function parameter → interface/type (function signature contracts)
- `RETURNS`: function → return type
- `EXTENDS`: interface → interface
Error handling:
- If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
- Re-exported interfaces (`export { Foo } from './types'`) are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
- Deeply nested type literals (>3 levels) are flattened to `object` to avoid graph bloat
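The flattening rule for deeply nested type literals might look like the sketch below. The node shape (`{ kind: "object", fields }` versus `{ kind: "primitive", name }`) is an invented stand-in for whatever the extractor derives from tree-sitter, and counting the outermost literal as level 1 is an assumption.

```javascript
// Hypothetical type-node shape; not the extractor's real data model.
// Assumption: the outermost literal is level 1, so level 4 and deeper collapse.
function flattenType(node, maxDepth = 3, depth = 1) {
  if (node.kind !== "object") return node;
  if (depth > maxDepth) return { kind: "primitive", name: "object" };
  const fields = {};
  for (const [key, child] of Object.entries(node.fields)) {
    fields[key] = flattenType(child, maxDepth, depth + 1);
  }
  return { kind: "object", fields };
}

const str = { kind: "primitive", name: "string" };
const obj = (fields) => ({ kind: "object", fields });

// { a: { b: { c: { d: string } }, id: string } }
const deep = obj({ a: obj({ b: obj({ c: obj({ d: str }) }), id: str }) });
const flat = flattenType(deep);
console.log(JSON.stringify(flat.fields.a.fields.b.fields.c)); // {"kind":"primitive","name":"object"}
```

The literal under `c` sits at level 4 and is replaced by `object`, while `id` at level 3 keeps its `string` type.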
Tests (7B):
| Test | Input | Expected |
|---|---|---|
| Interface extraction | `gateway/types.ts` with 3 interfaces | 3 Interface entities with correct fields |
| Type alias | `type SessionKey = string` | 1 TypeAlias entity |
| Enum extraction | `enum Status { Active, Inactive }` | 1 Enum entity with 2 members |
| Re-exported interface | `gateway/types.ts` re-exports from `config/types.ts` | Resolved to original definition |
| Parse failure | Malformed TS file | Skipped with warning, no crash |
| Recall benchmark | Fixture repo | ≥80% of expected-contracts.json extracted |
7C: Flow Tracer (flow.js)
Purpose: Given an entry point, walk the call graph across subsystem boundaries and produce a sequenced narrative of the data flow.
Algorithm:
- Start at entry point entity (e.g., `telegram/bot-handlers.ts:onMessage`)
- BFS through CALLS edges, recording subsystem transitions
- Cycle detection: Maintain a visited set per trace. If a node is revisited, record the cycle and stop that branch (do not re-enter).
- God object pruning: Before tracing, compute in-degree for all nodes. Nodes with in-degree > `godThreshold` (default: 50) are excluded from traversal (they're utility functions called by everything, not meaningful flow participants). Logged as "excluded high-connectivity nodes."
- Depth limit: Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5 (prioritizes cross-subsystem flow).
- Test file exclusion: Skip any file matching `*.test.*`, `*.spec.*`, `test/`, `__tests__/`.
- At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
- Output: ordered list of subsystem hops with the specific function calls that cross boundaries
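The steps above can be sketched as one BFS loop. This is an illustration under stated assumptions, not the `flow.js` implementation: `calls` is a plain object mapping an entity id to its callees, the subsystem is taken to be the first path segment, and cycles are recorded as `{ from, to }` call edges (how they map onto the output's cycle fields is left open). Following the revisit rule as written, a diamond-shaped revisit is recorded as well.

```javascript
// Sketch only. `calls` maps "dir/file.ts:fn" -> array of callee ids (assumed shape).
const TEST_FILE = /\.(test|spec)\.|(^|\/)(test|__tests__)\//;
const subsystemOf = (id) => id.split("/")[0];

function traceFlow(calls, entryPoint, { maxDepth = 8, godThreshold = 50 } = {}) {
  // In-degree per node, used for god object pruning.
  const inDegree = new Map();
  for (const callees of Object.values(calls)) {
    for (const c of callees) inDegree.set(c, (inDegree.get(c) ?? 0) + 1);
  }

  const visited = new Set([entryPoint]);
  const excluded = new Set();
  const cycles = [];
  const flow = [{ subsystem: subsystemOf(entryPoint), entity: entryPoint, depth: 0 }];
  const queue = [{ id: entryPoint, depth: 0 }];

  while (queue.length > 0) {
    const { id, depth } = queue.shift();
    for (const callee of calls[id] ?? []) {
      if (TEST_FILE.test(callee)) continue; // test file exclusion
      if ((inDegree.get(callee) ?? 0) > godThreshold) {
        excluded.add(callee); // god object pruning
        continue;
      }
      if (visited.has(callee)) {
        cycles.push({ from: id, to: callee }); // record and stop this branch
        continue;
      }
      const crossed = subsystemOf(callee) !== subsystemOf(id);
      const nextDepth = depth + (crossed ? 1 : 0.5);
      if (nextDepth > maxDepth) continue; // depth limit
      visited.add(callee);
      const hop = { subsystem: subsystemOf(callee), entity: callee, depth: nextDepth };
      if (crossed) hop.crossedVia = "CALLS";
      flow.push(hop);
      queue.push({ id: callee, depth: nextDepth });
    }
  }

  return {
    entryPoint,
    excludedNodes: [...excluded],
    cycles,
    flow,
    // Collapse consecutive hops that stay in the same subsystem.
    subsystemSequence: flow
      .map((h) => h.subsystem)
      .filter((s, i, all) => i === 0 || s !== all[i - 1]),
  };
}

const calls = {
  "telegram/bot-handlers.ts:onMessage": ["routing/session-key.ts:resolveKey", "utils/logger.ts:log"],
  "routing/session-key.ts:resolveKey": ["gateway/session.ts:loadSession", "utils/logger.ts:log"],
  "gateway/session.ts:loadSession": ["agents/runner.ts:runAgent"],
  "agents/runner.ts:runAgent": [
    "gateway/session.ts:loadSession",
    "utils/logger.ts:log",
    "agents/runner.test.ts:helper",
  ],
};
// A low threshold so the tiny example graph triggers pruning.
const trace = traceFlow(calls, "telegram/bot-handlers.ts:onMessage", { godThreshold: 2 });
console.log(trace.subsystemSequence); // [ 'telegram', 'routing', 'gateway', 'agents' ]
```

In the example, `log` is pruned (in-degree 3), the test helper is skipped, and the `runAgent` → `loadSession` call is recorded as a cycle instead of being followed.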
Output (deterministic JSON — testable without LLM):
{
  "entryPoint": "telegram/bot-handlers.ts:onMessage",
  "depth": 8,
  "godThreshold": 50,
  "excludedNodes": ["utils/logger.ts:log", "config/config.ts:getConfig"],
  "cyclesDetected": [
    { "at": "gateway/session.ts:loadSession", "backEdgeTo": "agents/runner.ts:runAgent" }
  ],
  "flow": [
    { "subsystem": "telegram", "entity": "telegram/bot-handlers.ts:onMessage", "depth": 0 },
    { "subsystem": "routing", "entity": "routing/session-key.ts:resolveKey", "depth": 1, "crossedVia": "CALLS" },
    { "subsystem": "gateway", "entity": "gateway/session.ts:loadSession", "depth": 2, "crossedVia": "CALLS" },
    { "subsystem": "agents", "entity": "agents/runner.ts:runAgent", "depth": 3, "crossedVia": "CALLS" }
  ],
  "subsystemSequence": ["telegram", "routing", "gateway", "agents"]
}
LLM narration (separate step): The deterministic JSON flow is the testable artifact. LLM narration is applied after as a formatting pass in 7D. This means:
- Flow correctness is tested against `expected-flows.json` (deterministic)
- LLM prose quality is evaluated separately (human review, not CI)
Performance guarantee: BFS with a visited set expands each node at most once, so a trace is O(V+E) in the worst case; god object pruning and the depth limit shrink the explored subgraph further. On the OpenClaw graph (23k nodes, 142k edges), traces must complete in <5 seconds. If a trace exceeds 5s, it is killed and logged as a timeout.
Tests (7C):
| Test | Input | Expected |
|---|---|---|
| Simple linear flow | Fixture entry point A→B→C across 3 subsystems | Matches expected-flows.json |
| Cycle detection | Fixture circular dep gateway↔agents | Cycle recorded, cyclic branch stopped, trace terminates without looping |
| God object exclusion | Entry point that calls `utils/logger.ts:log` (high in-degree) | `log` excluded from trace |
| Depth limit | Deep call chain (>8 hops) | Trace stops at depth 8 |
| Test file exclusion | Entry point that calls a test helper | Test file skipped |
| Performance | OpenClaw full snapshot | <5s wall clock |
| Empty trace | Entry point with no outgoing CALLS | Returns flow with single entry, no hops |
7D: Hierarchical Doc Generator (sysdoc.js)
Purpose: Orchestrate 7A-7C to produce a complete documentation site in Divio structure.
Output structure:
docs/
├── tutorials/
│ └── (human-authored only — not auto-generated)
├── reference/
│ ├── system-architecture.md ← from subsystem aggregator + dependency matrix
│ ├── subsystems/
│ │ ├── gateway.md ← per-subsystem: purpose, exports, deps, key modules
│ │ ├── agents.md
│ │ └── ...
│ ├── contracts/
│ │ ├── session-types.md ← from contract extractor
│ │ └── ...
│ └── modules/
│ └── (existing file-level docs from Phase 6)
├── explanation/
│ ├── architecture-patterns.md ← from dependency matrix analysis
│ ├── data-flows.md ← from flow tracer (LLM-narrated flow traces)
│ └── design-decisions.md ← from architecture.md ingestion + commit history
Divio category mapping (corrected):
- Tutorials: Human-authored only. Not generated.
- Reference: System architecture, per-subsystem docs, contracts, module docs. All deterministic structure + LLM prose.
- Explanation: Architecture patterns (from dependency analysis), data flows (from flow traces — these explain how the system works, not how to do a task), design decisions (from architecture.md + commit history).
- How-To: Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.
Generation pipeline:
- Run subsystem aggregator → subsystem map + dependency matrix
- Run contract extractor → interface/type entities added to graph
- Run flow tracer on configured entry points → deterministic flow JSONs
- For each subsystem: generate reference doc (LLM with subsystem context + architecture.md sections)
- Generate system architecture overview (LLM with full dependency matrix)
- Generate data flow explanations (LLM narrates flow JSONs into prose)
- Generate Mermaid diagrams (7E) and embed in docs
Incremental updates with cascading invalidation:
- Semantic diff identifies changed files
- Map changed files → directly affected subsystems (set A)
- For each subsystem in A, find all subsystems that depend on it (set B = dependents of A in dependency matrix)
- Regeneration set = A ∪ B
- System architecture overview regenerated only if dependency matrix changed (new/removed inter-subsystem edges)
- Flow traces regenerated only if any entity in the trace path was modified
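The regeneration set A ∪ B can be computed directly from the dependency matrix keys. This is a minimal sketch, assuming matrix keys of the form `"from→to"` (meaning `from` depends on `to`) and a caller-supplied file-to-subsystem lookup; `regenerationSet` is a hypothetical name. As in the steps above, only direct dependents are included, not transitive ones.

```javascript
// Sketch only: key format "from→to" follows the dependencyMatrix example;
// the function name and the lookup callback are assumptions.
function regenerationSet(changedFiles, fileToSubsystem, dependencyMatrix) {
  const direct = new Set(changedFiles.map(fileToSubsystem)); // set A
  const dependents = new Set(); // set B
  for (const key of Object.keys(dependencyMatrix)) {
    const [from, to] = key.split("→");
    if (direct.has(to)) dependents.add(from);
  }
  return [...new Set([...direct, ...dependents])].sort(); // A ∪ B
}

const matrix = {
  "gateway→agents": { calls: 89, imports: 34 },
  "agents→config": { calls: 156, imports: 120 },
  "channels→gateway": { calls: 20, imports: 8 },
};
const topDir = (file) => file.split("/")[0];

const afterConfigChange = regenerationSet(["config/types.ts"], topDir, matrix);
const afterGatewayChange = regenerationSet(["gateway/server.ts"], topDir, matrix);
console.log(afterConfigChange, afterGatewayChange);
// [ 'agents', 'config' ] [ 'channels', 'gateway' ]
```

An empty diff yields an empty set, which corresponds to the no-op case where nothing is regenerated.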
Tests (7D):
| Test | Input | Expected |
|---|---|---|
| Full generation | Fixture repo | Correct directory structure with all expected .md files |
| Section completeness | Generated subsystem doc | Contains: Purpose, Key Modules, Public API, Dependencies sections |
| Incremental: direct change | Modify `gateway/server.ts` | Only gateway.md + dependents regenerated |
| Incremental: cascading | Modify `config/types.ts` (shared) | config.md + all subsystems importing config regenerated |
| Incremental: no-op | No semantic diff | Zero files regenerated |
| Architecture.md ingestion | Fixture with `architecture.md` | LLM prompt includes architecture.md content |
7E: Diagram Generator (diagrams.js)
Purpose: Auto-generate Mermaid diagrams from graph analysis outputs.
Diagram types:
- Subsystem Dependency Graph (from 7A dependency matrix)
  - Nodes = subsystems (excluding cross-cutting)
  - Edges = inter-subsystem CALLS/IMPORTS with edge weight labels
  - Cross-cutting subsystems shown as a separate "Shared" cluster
- Flow Sequence Diagram (from 7C flow traces)
  - Participants = subsystems in flow order
  - Messages = function calls at boundary crossings
  - Cycles shown as self-referencing notes
- Contract Relationship Diagram (from 7B contracts)
  - Classes/interfaces with fields
  - IMPLEMENTS/EXTENDS relationships as arrows
Rendering: Use mmdr (Rust Mermaid renderer) to produce SVG. Embed the rendered SVG in generated Markdown docs as image links.
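As an illustration of the first diagram type, the sketch below turns a dependency matrix into Mermaid flowchart source. It is not the `diagrams.js` implementation: `dependencyDiagram`, the label wording, and the choice to drop edges that touch a cross-cutting subsystem are assumptions made for the example.

```javascript
// Sketch only: emits Mermaid flowchart source from a dependencyMatrix-shaped object.
function dependencyDiagram(dependencyMatrix, crossCutting = []) {
  const shared = new Set(crossCutting);
  const lines = ["graph LR"];
  for (const [key, { calls = 0, imports = 0 }] of Object.entries(dependencyMatrix)) {
    const [from, to] = key.split("→");
    // Assumption: edges touching cross-cutting subsystems are omitted to reduce clutter.
    if (shared.has(from) || shared.has(to)) continue;
    lines.push(`  ${from} -->|${calls} calls, ${imports} imports| ${to}`);
  }
  if (shared.size > 0) {
    // Cross-cutting subsystems go into a separate "Shared" cluster.
    lines.push("  subgraph Shared");
    for (const name of [...shared].sort()) lines.push(`    ${name}`);
    lines.push("  end");
  }
  return lines.join("\n");
}

const mermaid = dependencyDiagram(
  {
    "gateway→agents": { calls: 89, imports: 34 },
    "agents→config": { calls: 156, imports: 120 },
  },
  ["utils", "config"]
);
console.log(mermaid);
```

Because the output is plain text, it can be compared against an expected `.mmd` fixture before any rendering step runs.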
Tests (7E):
| Test | Input | Expected |
|---|---|---|
| Dependency diagram | Fixture dependency matrix | Valid Mermaid syntax, matches expected-diagrams/deps.mmd |
| Sequence diagram | Fixture flow trace | Valid Mermaid syntax, correct participant order |
| Contract diagram | Fixture contracts | Valid Mermaid syntax, correct relationships |
| Rendering | Any generated .mmd file | mmdr produces valid SVG without errors |
Architecture.md Ingestion
Each repo may contain human-written architecture documentation. The pipeline:
- Discovery: Scan for `architecture.md`, `docs/architecture.md`, `ARCHITECTURE.md`, `docs/design.md`, resolved relative to the repo root
- Parsing: Extract sections (headings → content blocks) as structured context
- Injection: When generating subsystem docs or explanation docs, include relevant architecture.md sections in the LLM prompt alongside graph data
- Diff tracking: If `architecture.md` changes between releases, flag it in the semantic diff as a documentation-relevant change
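The parsing step (headings → content blocks) needs no Markdown library for ATX-style headings. The following is a rough sketch under that assumption; `parseSections` and the naive `sectionsFor` relevance filter are hypothetical names, and matching a section to a subsystem by heading text is only one possible heuristic.

```javascript
// Sketch only: splits Markdown into sections keyed by ATX headings (#, ##, ...).
// Lines inside fenced code blocks are never treated as headings.
function parseSections(markdown) {
  const sections = [];
  let current = null;
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    const m = inFence ? null : /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (m) {
      current = { level: m[1].length, heading: m[2], content: [] };
      sections.push(current);
    } else if (current) {
      current.content.push(line);
    }
  }
  return sections.map((s) => ({ ...s, content: s.content.join("\n").trim() }));
}

// Naive relevance filter (assumption): heading mentions the subsystem name.
const sectionsFor = (sections, subsystem) =>
  sections.filter((s) => s.heading.toLowerCase().includes(subsystem.toLowerCase()));

const doc = [
  "# Overview",
  "Platform summary.",
  "",
  "## Gateway",
  "Handles sessions.",
  "~~~",
  "# not a heading",
  "~~~",
].join("\n");

const sections = parseSections(doc);
console.log(sections.map((s) => `${s.level}:${s.heading}`)); // [ '1:Overview', '2:Gateway' ]
```

The sections returned by `sectionsFor(sections, "gateway")` would then be appended to the prompt for the gateway subsystem doc.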
Cross-Repo Output Model
Two output modes:
Per-repo (reference only):
- Subsystem architecture docs
- Contract reference
- Module reference
- Mermaid diagrams
- Useful for repo maintainers
Unified (full Divio):
- Merges per-repo graphs via namespace registry (Phase 3) into super-graph
- Runs 7A-7E on super-graph
- Generates cross-repo flow traces and dependency diagrams
- Includes human-authored tutorials and explanation docs
- Useful for platform consumers and new engineers
Implementation Phases
| Phase | Module | Effort | Depends On |
|---|---|---|---|
| 7-fixtures | Ground truth fixture repo | 0.5 day | — |
| 7A | subsystem.js + tests | 1 day | graph.js, fixtures |
| 7B | contracts.js + tests | 2 days | extract.js, fixtures |
| 7C | flow.js + tests | 2 days | graph.js, subsystem.js, fixtures |
| 7D | sysdoc.js + tests | 2 days | 7A, 7B, 7C, docgen.js |
| 7E | diagrams.js + tests | 1 day | 7A, 7B, 7C |
| 7F | supergraph.js (Multi-repo Merge) | 1 day | namespace.js, graph.js |
Total: ~9.5 days
Critical path: fixtures → 7A → 7C → 7D
Parallel: 7B can run alongside 7A and 7C, 7E alongside 7D once 7A-7C are done, and 7F at any time (it depends only on earlier-phase modules).
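The critical path can be checked from the table with a longest-path computation over the phase dependencies. The sketch below encodes only the Phase 7 dependencies (modules from earlier phases such as graph.js are treated as already available, which is an assumption), and `criticalPath` is a hypothetical helper.

```javascript
// Effort and dependencies transcribed from the table; earlier-phase modules omitted.
const phases = {
  fixtures: { days: 0.5, deps: [] },
  "7A": { days: 1, deps: ["fixtures"] },
  "7B": { days: 2, deps: ["fixtures"] },
  "7C": { days: 2, deps: ["7A", "fixtures"] },
  "7D": { days: 2, deps: ["7A", "7B", "7C"] },
  "7E": { days: 1, deps: ["7A", "7B", "7C"] },
  "7F": { days: 1, deps: [] },
};

// Longest path by effort: earliest finish = max(finish of deps) + own effort.
function criticalPath(graph) {
  const memo = new Map();
  const finish = (name) => {
    if (memo.has(name)) return memo.get(name);
    let best = { end: 0, path: [] };
    for (const dep of graph[name].deps) {
      const d = finish(dep);
      if (d.end > best.end) best = d;
    }
    const result = { end: best.end + graph[name].days, path: [...best.path, name] };
    memo.set(name, result);
    return result;
  };
  return Object.keys(graph)
    .map((name) => finish(name))
    .reduce((a, b) => (b.end > a.end ? b : a));
}

const critical = criticalPath(phases);
console.log(critical.path.join(" → "), critical.end); // fixtures → 7A → 7C → 7D 5.5
```

Under these assumptions the 9.5 days are the serial total, while the critical path itself is 5.5 days if the parallel phases are staffed separately.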
Build loop (BMad Wiggum): Each phase follows: build → test → BMad review → revise → re-review until GO.
Constraints
- No new external dependencies (same as Phases 1-5)
- LLM calls only for prose generation — all structural analysis is deterministic
- tree-sitter@0.21.1 compatibility maintained
- Templates are Markdown with simple mustache-style slots (no template engine dependency — string replacement)
- Must work on OpenClaw codebase (4,325 files) as primary benchmark
- Foxtrot repos are not available in this environment — design must work from any repo's graph snapshot
- Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).
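The "mustache-style slots via string replacement" constraint can be met with a single regular-expression pass. This is a minimal sketch: `renderTemplate` is a hypothetical name, and leaving unknown slots untouched (rather than blanking them) is a design assumption that makes missing values easy to spot in review.

```javascript
// Sketch only: fills {{slot}} placeholders without a template engine.
function renderTemplate(template, values) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match, key) =>
    // Unknown slots are left as-is so gaps stay visible in the output.
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

const page = renderTemplate(
  "# {{label}}\n\nDepends on: {{ deps }}\n\n{{unknown}}",
  { label: "Session & Request Gateway", deps: "agents, config" }
);
console.log(page);
```

Using a replacer function rather than a replacement string also avoids surprises when a value contains `$` sequences that `String.prototype.replace` would otherwise interpret.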
Resolved Decisions
- Tutorials: Human-authored only. Flow traces inform but don't generate tutorials — domain knowledge required.
- Design decisions: Infer from commit history + semantic diffs AND parse `architecture.md` from each repo.
- Cross-repo: Both per-repo (reference) and unified (full Divio). Different audiences.
- Mermaid diagrams: Yes, via 7E. Three diagram types: dependency, sequence, contract.
- Architecture.md ingestion: Parsed and injected as LLM context for subsystem and explanation docs.
- Flow traces are Explanation, not How-To: Corrected Divio mapping. How-To deferred from MVP.
- LLM output is not CI-tested: All testable artifacts are deterministic JSON. LLM prose is a formatting pass evaluated by human review.