
# Dev Intel Pipeline v2 — Phase 7: System-Level Documentation Generation
**Status:** DRAFT v2 (post-SPA Round 1)
**Author:** Max (AI) + Brian (Human)
**Date:** 2026-03-09
**Depends on:** Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)
---
## Problem Statement
The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation — like the Foxtrot Confluence docs — operates at the *system level*: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.
File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.
## Goal
Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).
## Success Criteria
All metrics are validated against a **ground truth fixture repository** (`test/fixtures/system-docs/`) containing a hand-labeled mini codebase (~30 files across 5 subsystems) with expected outputs for each module.
| Metric | Target | How Measured |
|--------|--------|-------------|
| Subsystem detection accuracy | ≥90% of modules correctly clustered | Compare `subsystem.js` output against `expected-subsystems.json` fixture. Accuracy = correctly assigned files / total files. |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured | Compare dependency matrix against `expected-deps.json`. Recall = captured edges / expected edges. |
| Contract extraction recall | ≥80% of exported interfaces/types extracted | Compare extracted contracts against `expected-contracts.json`. Recall = extracted / total annotated. |
| Generated doc structure | Matches Divio 4-category template | Structural assertion: verify directory layout, required sections present in each generated .md file. |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated | Apply a mock diff to the fixture and assert that only the expected subsystem docs are regenerated. Compare content hashes (e.g. `md5sum`) rather than mtimes, which are flaky. |
| Cascading invalidation | Shared subsystem change propagates to dependents | Apply a diff to a shared subsystem in fixture, assert dependent subsystem docs are also flagged for regeneration. |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) | BACKLOGGED — measure token count statically in CI (e.g. via `tiktoken`) without hitting API. |
| Flow tracer terminates | All traces complete in <5s on 4,325-file graph | Wall-clock assertion on OpenClaw snapshot. |
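The accuracy and recall formulas in the table reduce to a few lines of code. The sketch below is illustrative only: the fixture shapes (a file-to-subsystem map and a flat list of edge strings) and both function names are assumptions, since this spec does not define the layout of `expected-subsystems.json` or `expected-deps.json`.

```javascript
// Hypothetical fixture shapes (not defined by this spec):
//   assignments: { "gateway/server.ts": "gateway", ... }
//   edges:       ["gateway→agents", ...]

// Accuracy = correctly assigned files / total files in the expected fixture.
function clusteringAccuracy(expected, actual) {
  const files = Object.keys(expected);
  if (files.length === 0) return 1;
  const correct = files.filter((f) => actual[f] === expected[f]).length;
  return correct / files.length;
}

// Recall = captured expected items / total expected items.
function recall(expectedItems, actualItems) {
  if (expectedItems.length === 0) return 1;
  const actual = new Set(actualItems);
  const captured = expectedItems.filter((e) => actual.has(e)).length;
  return captured / expectedItems.length;
}

const expected = { "gateway/server.ts": "gateway", "utils/logger.ts": "utils" };
const actual = { "gateway/server.ts": "gateway", "utils/logger.ts": "gateway" };
console.log(clusteringAccuracy(expected, actual)); // 0.5
console.log(recall(["gateway→agents", "agents→config"], ["gateway→agents"])); // 0.5
```

A CI check would then compare these values against the targets (0.90, 0.85, 0.80) rather than asserting exact equality with the fixtures.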
## Ground Truth Fixture Repository
Located at `test/fixtures/system-docs/`. Contains:
```
test/fixtures/system-docs/
├── src/
│   ├── gateway/    (5 files: server.ts, session.ts, middleware.ts, types.ts, utils.ts)
│   ├── agents/     (5 files: runner.ts, scope.ts, tools.ts, types.ts, defaults.ts)
│   ├── channels/
│   │   ├── telegram.ts
│   │   └── discord.ts
│   ├── config/     (3 files: config.ts, schema.ts, types.ts)
│   └── utils/      (3 files: logger.ts, crypto.ts, fs-helpers.ts)
├── expected-subsystems.json   ← hand-labeled subsystem assignments
├── expected-deps.json         ← hand-labeled inter-subsystem edges
├── expected-contracts.json    ← hand-labeled interfaces/types
├── expected-flows.json        ← hand-labeled flow traces for 2 entry points
├── expected-diagrams/         ← expected Mermaid source for each diagram type
└── architecture.md            ← mock architecture doc for ingestion testing
```
**Edge cases included in fixtures:**
- `utils/` as a cross-cutting concern (high fan-in, should be tagged as `cross-cutting`)
- Circular dependency: `gateway/session.ts` ↔ `agents/runner.ts` (mutual CALLS)
- Orphan file: `config/schema.ts` (no inbound edges, only exports)
- Re-exported interface: `gateway/types.ts` re-exports from `config/types.ts`
- Empty subsystem: `channels/` has only 2 files with no internal CALLS edges
## Architecture
### 7A: Subsystem Aggregator (`subsystem.js`)
**Purpose:** Group file-level entities into logical subsystems and compute inter-subsystem relationships.
**Clustering Strategy (tiered):**
1. **Directory-based (default):** Top-level directory under `src/` = subsystem. `gateway/`, `agents/`, `cli/`, `telegram/`, etc. Simple, deterministic, zero-config.
2. **Config-driven (override):** Optional `subsystems.yaml` that maps directories to named subsystems with human labels and grouping overrides.
```yaml
subsystems:
  - name: Gateway
    label: "Session & Request Gateway"
    paths: ["gateway/", "routing/"]
  - name: Agents
    label: "AI Agent Runtime"
    paths: ["agents/", "auto-reply/"]
  - name: Channels
    label: "Channel Adapters"
    paths: ["telegram/", "discord/", "slack/", "signal/", "whatsapp/"]
```
3. **Graph-based (future):** Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
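Tiers 1 and 2 can be sketched together as a single pass over the file list. This is a minimal illustration, not the `subsystem.js` implementation: the function name, the `(root)` bucket for files outside any directory, and the assumption that paths are already relative to `src/` are all hypothetical. The `overrides` argument mirrors the `subsystems.yaml` example after parsing.

```javascript
// Directory-based clustering with an optional config override.
function clusterByDirectory(files, overrides = []) {
  // Map each overridden directory (trailing slash removed) to its subsystem name.
  const prefixToName = new Map();
  for (const { name, paths } of overrides) {
    for (const p of paths) prefixToName.set(p.replace(/\/$/, ""), name);
  }

  const subsystems = new Map(); // subsystem name -> array of files
  for (const file of files) {
    // Paths are assumed relative to src/, e.g. "gateway/server.ts".
    const slash = file.indexOf("/");
    const topDir = slash === -1 ? "(root)" : file.slice(0, slash);
    const name = prefixToName.get(topDir) ?? topDir;
    if (!subsystems.has(name)) subsystems.set(name, []);
    subsystems.get(name).push(file);
  }
  return subsystems;
}

const files = ["gateway/server.ts", "routing/session-key.ts", "agents/runner.ts"];
const merged = clusterByDirectory(files, [
  { name: "Gateway", paths: ["gateway/", "routing/"] },
]);
console.log([...merged.keys()]); // [ 'Gateway', 'agents' ]
```

Without overrides the same call falls back to one subsystem per top-level directory, which keeps the default tier zero-config.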
**Cross-cutting concern detection:**
Subsystems where >60% of edges are **inbound** from other subsystems (high fan-in — many subsystems depend on them, but they depend on almost nothing) are automatically tagged as `cross-cutting`. Examples: `utils/`, `config/`, `types/`. The metric is `inbound_edges / total_edges > 0.6`. Cross-cutting subsystems are:
- Excluded from the dependency matrix visualization (reduces hairball)
- Documented separately as "Shared Infrastructure" in the reference docs
- Still tracked in the raw dependency data for completeness
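The `inbound_edges / total_edges > 0.6` rule can be sketched as follows. The input uses the `"from→to"` key shape from the output example in this section; counting an edge's weight as `calls + imports` is an assumption about how edges are tallied, and the function name is hypothetical.

```javascript
// Returns the subsystems whose inbound share of inter-subsystem edges
// exceeds the threshold.
function detectCrossCutting(matrix, threshold = 0.6) {
  const inbound = new Map();
  const total = new Map();
  const bump = (map, key, n) => map.set(key, (map.get(key) ?? 0) + n);

  for (const [pair, { calls = 0, imports = 0 }] of Object.entries(matrix)) {
    const [from, to] = pair.split("→");
    const weight = calls + imports;
    bump(total, from, weight); // outbound edges count toward the source's total
    bump(total, to, weight); // inbound edges count toward the target's total
    bump(inbound, to, weight);
  }

  return [...total.keys()].filter(
    (name) => (inbound.get(name) ?? 0) / total.get(name) > threshold
  );
}

const matrix = {
  "gateway→agents": { calls: 89, imports: 34 },
  "gateway→utils": { calls: 40, imports: 10 },
  "agents→utils": { calls: 80, imports: 30 },
};
console.log(detectCrossCutting(matrix)); // [ 'utils' ]
```

In this example `agents` has an inbound share of 123/233 (about 0.53), so it stays a domain subsystem, while `utils` receives edges only and is tagged.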
**Output:**
```json
{
  "subsystems": [
    {
      "name": "gateway",
      "label": "Session & Request Gateway",
      "kind": "domain",
      "files": ["gateway/session-utils.ts", "gateway/server.ts"],
      "entities": { "functions": 142, "classes": 3, "modules": 28 },
      "publicExports": ["deriveSessionTitle", "loadSessionEntry"],
      "internalDeps": [{ "from": "gateway", "to": "agents", "edges": 89, "type": "CALLS" }],
      "externalDeps": ["commander", "node:fs", "node:path"]
    }
  ],
  "crossCutting": ["utils", "config"],
  "dependencyMatrix": {
    "gateway→agents": { "calls": 89, "imports": 34 },
    "agents→config": { "calls": 156, "imports": 120 }
  }
}
```
**Tests (7A):**
| Test | Input | Expected |
|------|-------|----------|
| Directory clustering | Fixture repo | Matches `expected-subsystems.json` (5 subsystems) |
| Config override | Fixture + `subsystems.yaml` merging gateway+routing | Merged subsystem with combined files |
| Cross-cutting detection | Fixture `utils/` (high fan-in) | Tagged as `cross-cutting` |
| Empty subsystem | Fixture `channels/` (2 files, no internal calls) | Valid subsystem with 0 internal edges |
| Orphan file | `config/schema.ts` (no inbound) | Assigned to `config` subsystem, not dropped |
### 7B: Contract Extractor (`contracts.js`)
**Purpose:** Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.
**What to extract:**
- `interface Foo { ... }` → entity type `Interface`, with fields as properties
- `type Foo = { ... }` → entity type `TypeAlias`
- `enum Foo { ... }` → entity type `Enum`, with members
- Exported `const` objects used as config defaults → entity type `ConfigContract`
- YAML schema keys (from config files) → entity type `ConfigSchema`
**Relationships:**
- `IMPLEMENTS` — class → interface
- `ACCEPTS` — function parameter → interface/type (function signature contracts)
- `RETURNS` — function → return type
- `EXTENDS` — interface → interface
**Error handling:**
- If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
- Re-exported interfaces (`export { Foo } from './types'`) are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
- Deeply nested type literals (>3 levels) are flattened to `object` to avoid graph bloat
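The flattening rule for deeply nested type literals might look like the sketch below. The field representation (plain objects with type-name strings at the leaves) is a hypothetical stand-in for whatever `contracts.js` actually emits from tree-sitter, and the function name is an assumption.

```javascript
// Collapses type literals nested deeper than maxDepth into the string "object".
// The outermost literal counts as level 1.
function flattenType(fields, maxDepth = 3, depth = 1) {
  const out = {};
  for (const [key, value] of Object.entries(fields)) {
    if (typeof value !== "object" || value === null) {
      out[key] = value; // leaf: a type name such as "string"
    } else if (depth >= maxDepth) {
      out[key] = "object"; // one more level would exceed the limit
    } else {
      out[key] = flattenType(value, maxDepth, depth + 1);
    }
  }
  return out;
}

const session = {
  id: "string",
  user: { profile: { address: { city: "string" } } },
};
console.log(JSON.stringify(flattenType(session)));
// {"id":"string","user":{"profile":{"address":"object"}}}
```

Here `address` would be the fourth level of nesting, so it is replaced by `object` while the first three levels keep their fields.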
**Tests (7B):**
| Test | Input | Expected |
|------|-------|----------|
| Interface extraction | `gateway/types.ts` with 3 interfaces | 3 Interface entities with correct fields |
| Type alias | `type SessionKey = string` | 1 TypeAlias entity |
| Enum extraction | `enum Status { Active, Inactive }` | 1 Enum entity with 2 members |
| Re-exported interface | `gateway/types.ts` re-exports from `config/types.ts` | Resolved to original definition |
| Parse failure | Malformed TS file | Skipped with warning, no crash |
| Recall benchmark | Fixture repo | ≥80% of `expected-contracts.json` extracted |
### 7C: Flow Tracer (`flow.js`)
**Purpose:** Given an entry point, walk the call graph across subsystem boundaries and produce a sequenced narrative of the data flow.
**Algorithm:**
1. Start at entry point entity (e.g., `telegram/bot-handlers.ts:onMessage`)
2. BFS through CALLS edges, recording subsystem transitions
3. **Cycle detection:** Maintain a visited set per trace. If a node is revisited, record the cycle and stop that branch (do not re-enter).
4. **God object pruning:** Before tracing, compute in-degree for all nodes. Nodes with in-degree > `godThreshold` (default: 50) are excluded from traversal (they're utility functions called by everything — not meaningful flow participants). Logged as "excluded high-connectivity nodes."
5. **Depth limit:** Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5 (prioritizes cross-subsystem flow).
6. **Test file exclusion:** Skip any file matching `*.test.*`, `*.spec.*`, `test/`, `__tests__/`.
7. At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
8. Output: ordered list of subsystem hops with the specific function calls that cross boundaries
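The traversal above can be sketched as a single BFS loop. Everything about the graph representation here is an assumption for illustration: a `Map` of entity id to callee ids, entity ids of the form `dir/file.ts:fn`, and the subsystem taken as the first path segment. The orientation of `at` and `backEdgeTo` is one reading of the output example in this section. Note that treating every revisit as a cycle, as step 3 describes, also flags diamond-shaped convergence.

```javascript
// Step 6 patterns: *.test.*, *.spec.*, test/, __tests__/
const TEST_FILE = /(\.test\.|\.spec\.|(^|\/)test\/|(^|\/)__tests__\/)/;
const subsystemOf = (id) => id.split("/")[0]; // assumed id layout

function traceFlow(calls, entryPoint, { maxDepth = 8, godThreshold = 50 } = {}) {
  // Step 4: compute in-degree up front to identify high-connectivity nodes.
  const inDegree = new Map();
  for (const callees of calls.values()) {
    for (const c of callees) inDegree.set(c, (inDegree.get(c) ?? 0) + 1);
  }
  const isGod = (id) => (inDegree.get(id) ?? 0) > godThreshold;

  const visited = new Set([entryPoint]);
  const excludedNodes = new Set();
  const cyclesDetected = [];
  const flow = [{ subsystem: subsystemOf(entryPoint), entity: entryPoint, depth: 0 }];
  const queue = [{ id: entryPoint, depth: 0 }];

  while (queue.length > 0) {
    const { id, depth } = queue.shift();
    for (const callee of calls.get(id) ?? []) {
      if (TEST_FILE.test(callee)) continue; // step 6
      if (isGod(callee)) {
        excludedNodes.add(callee);
        continue;
      }
      if (visited.has(callee)) {
        // Step 3: record the revisit and do not re-enter the branch.
        cyclesDetected.push({ at: callee, backEdgeTo: id });
        continue;
      }
      const crossed = subsystemOf(callee) !== subsystemOf(id);
      const nextDepth = depth + (crossed ? 1 : 0.5); // step 5
      if (nextDepth > maxDepth) continue;
      visited.add(callee);
      const hop = { subsystem: subsystemOf(callee), entity: callee, depth: nextDepth };
      if (crossed) hop.crossedVia = "CALLS"; // step 7
      flow.push(hop);
      queue.push({ id: callee, depth: nextDepth });
    }
  }
  return { entryPoint, flow, cyclesDetected, excludedNodes: [...excludedNodes] };
}

const calls = new Map([
  ["telegram/bot-handlers.ts:onMessage", ["routing/session-key.ts:resolveKey"]],
  ["routing/session-key.ts:resolveKey", ["gateway/session.ts:loadSession"]],
  ["gateway/session.ts:loadSession", ["agents/runner.ts:runAgent"]],
  ["agents/runner.ts:runAgent", ["gateway/session.ts:loadSession"]],
]);
const result = traceFlow(calls, "telegram/bot-handlers.ts:onMessage");
console.log(result.flow.map((hop) => hop.subsystem));
// [ 'telegram', 'routing', 'gateway', 'agents' ]
```

Because hop weights are fractional, a plain FIFO queue does not guarantee that each node is reached at its minimum weighted depth; a priority queue would be needed if that property matters.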
**Output (deterministic JSON — testable without LLM):**
```json
{
  "entryPoint": "telegram/bot-handlers.ts:onMessage",
  "depth": 8,
  "godThreshold": 50,
  "excludedNodes": ["utils/logger.ts:log", "config/config.ts:getConfig"],
  "cyclesDetected": [
    { "at": "gateway/session.ts:loadSession", "backEdgeTo": "agents/runner.ts:runAgent" }
  ],
  "flow": [
    { "subsystem": "telegram", "entity": "telegram/bot-handlers.ts:onMessage", "depth": 0 },
    { "subsystem": "routing", "entity": "routing/session-key.ts:resolveKey", "depth": 1, "crossedVia": "CALLS" },
    { "subsystem": "gateway", "entity": "gateway/session.ts:loadSession", "depth": 2, "crossedVia": "CALLS" },
    { "subsystem": "agents", "entity": "agents/runner.ts:runAgent", "depth": 3, "crossedVia": "CALLS" }
  ],
  "subsystemSequence": ["telegram", "routing", "gateway", "agents"]
}
```
**LLM narration (separate step):** The deterministic JSON flow is the testable artifact. LLM narration is applied *after* as a formatting pass in 7D. This means:
- Flow correctness is tested against `expected-flows.json` (deterministic)
- LLM prose quality is evaluated separately (human review, not CI)
**Performance guarantee:** The visited set ensures each node is expanded at most once, so a trace costs O(V+E) in the worst case; god object pruning and the depth limit reduce the explored portion further in practice. On the OpenClaw graph (23k nodes, 142k edges), traces must complete in <5 seconds. A trace that exceeds 5s is killed and logged as a timeout.
**Tests (7C):**
| Test | Input | Expected |
|------|-------|----------|
| Simple linear flow | Fixture entry point A→B→C across 3 subsystems | Matches `expected-flows.json` |
| Cycle detection | Fixture circular dep gateway↔agents | Cycle recorded, trace continues without loop |
| God object exclusion | Entry point that calls `utils/logger.ts:log` (high in-degree) | `log` excluded from trace |
| Depth limit | Deep call chain (>8 hops) | Trace stops at depth 8 |
| Test file exclusion | Entry point that calls a test helper | Test file skipped |
| Performance | OpenClaw full snapshot | <5s wall clock |
| Empty trace | Entry point with no outgoing CALLS | Returns flow with single entry, no hops |
### 7D: Hierarchical Doc Generator (`sysdoc.js`)
**Purpose:** Orchestrate 7A-7C to produce a complete documentation site in Divio structure.
**Output structure:**
```
docs/
├── tutorials/
│   └── (human-authored only — not auto-generated)
├── reference/
│   ├── system-architecture.md     ← from subsystem aggregator + dependency matrix
│   ├── subsystems/
│   │   ├── gateway.md             ← per-subsystem: purpose, exports, deps, key modules
│   │   ├── agents.md
│   │   └── ...
│   ├── contracts/
│   │   ├── session-types.md       ← from contract extractor
│   │   └── ...
│   └── modules/
│       └── (existing file-level docs from Phase 6)
├── explanation/
│   ├── architecture-patterns.md   ← from dependency matrix analysis
│   ├── data-flows.md              ← from flow tracer (LLM-narrated flow traces)
│   └── design-decisions.md        ← from architecture.md ingestion + commit history
```
**Divio category mapping (corrected):**
- **Tutorials:** Human-authored only. Not generated.
- **Reference:** System architecture, per-subsystem docs, contracts, module docs. All deterministic structure + LLM prose.
- **Explanation:** Architecture patterns (from dependency analysis), data flows (from flow traces — these explain *how the system works*, not *how to do a task*), design decisions (from architecture.md + commit history).
- **How-To:** Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.
**Generation pipeline:**
1. Run subsystem aggregator → subsystem map + dependency matrix
2. Run contract extractor → interface/type entities added to graph
3. Run flow tracer on configured entry points → deterministic flow JSONs
4. For each subsystem: generate reference doc (LLM with subsystem context + architecture.md sections)
5. Generate system architecture overview (LLM with full dependency matrix)
6. Generate data flow explanations (LLM narrates flow JSONs into prose)
7. Generate Mermaid diagrams (7E) and embed in docs
**Incremental updates with cascading invalidation:**
1. Semantic diff identifies changed files
2. Map changed files → directly affected subsystems (set A)
3. For each subsystem in A, find all subsystems that depend on it (set B = dependents of A in dependency matrix)
4. Regeneration set = A ∪ B
5. System architecture overview regenerated only if dependency matrix changed (new/removed inter-subsystem edges)
6. Flow traces regenerated only if any entity in the trace path was modified
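Steps 1-4 amount to a set union over the dependency matrix. The following is a minimal sketch under assumed shapes: `fileToSubsystem` is a hypothetical lookup table, the matrix keys follow the `"from→to"` form from the 7A output example, and the function name is illustrative. As in the steps above, only direct dependents are included, not transitive ones.

```javascript
// Computes the regeneration set A ∪ B.
function regenerationSet(changedFiles, fileToSubsystem, dependencyMatrix) {
  // Set A: subsystems that directly contain a changed file.
  const direct = new Set();
  for (const file of changedFiles) {
    const subsystem = fileToSubsystem[file];
    if (subsystem !== undefined) direct.add(subsystem);
  }
  // Set B: subsystems with an edge into any member of A.
  const result = new Set(direct);
  for (const pair of Object.keys(dependencyMatrix)) {
    const [from, to] = pair.split("→");
    if (direct.has(to)) result.add(from);
  }
  return [...result].sort();
}

const matrix = {
  "gateway→agents": { calls: 89, imports: 34 },
  "agents→config": { calls: 156, imports: 120 },
};
const owners = { "config/types.ts": "config", "gateway/server.ts": "gateway" };
console.log(regenerationSet(["config/types.ts"], owners, matrix)); // [ 'agents', 'config' ]
```

In this example `gateway` is not regenerated when `config` changes, because it reaches `config` only through `agents`.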
**Tests (7D):**
| Test | Input | Expected |
|------|-------|----------|
| Full generation | Fixture repo | Correct directory structure with all expected .md files |
| Section completeness | Generated subsystem doc | Contains: Purpose, Key Modules, Public API, Dependencies sections |
| Incremental: direct change | Modify `gateway/server.ts` | Only `gateway.md` + dependents regenerated |
| Incremental: cascading | Modify `config/types.ts` (shared) | `config.md` + all subsystems importing config regenerated |
| Incremental: no-op | No semantic diff | Zero files regenerated |
| Architecture.md ingestion | Fixture with `architecture.md` | LLM prompt includes architecture.md content |
### 7E: Diagram Generator (`diagrams.js`)
**Purpose:** Auto-generate Mermaid diagrams from graph analysis outputs.
**Diagram types:**
1. **Subsystem Dependency Graph** (from 7A dependency matrix)
- Nodes = subsystems (excluding cross-cutting)
- Edges = inter-subsystem CALLS/IMPORTS with edge weight labels
- Cross-cutting subsystems shown as a separate "Shared" cluster
2. **Flow Sequence Diagram** (from 7C flow traces)
- Participants = subsystems in flow order
- Messages = function calls at boundary crossings
- Cycles shown as self-referencing notes
3. **Contract Relationship Diagram** (from 7B contracts)
- Classes/interfaces with fields
- IMPLEMENTS/EXTENDS relationships as arrows
**Rendering:** Use `mmdr` (Rust Mermaid renderer) to produce SVG. Embed in generated Markdown docs as `![diagram](./diagrams/subsystem-deps.svg)`.
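For the first diagram type, generating the Mermaid source is mostly string assembly. The sketch below assumes the `dependencyMatrix` and `crossCutting` shapes from the 7A output example; the function name and the edge label format are hypothetical, and subsystem names are assumed to be valid Mermaid node ids.

```javascript
// Emits Mermaid flowchart source for the subsystem dependency graph.
function dependencyDiagram(dependencyMatrix, crossCutting = []) {
  const shared = new Set(crossCutting);
  const lines = ["graph LR"];
  for (const [pair, { calls = 0, imports = 0 }] of Object.entries(dependencyMatrix)) {
    const [from, to] = pair.split("→");
    // Edges touching cross-cutting subsystems are left out of the main graph.
    if (shared.has(from) || shared.has(to)) continue;
    lines.push(`  ${from} -->|"${calls} calls, ${imports} imports"| ${to}`);
  }
  if (shared.size > 0) {
    // Cross-cutting subsystems appear as a separate "Shared" cluster.
    lines.push("  subgraph Shared");
    for (const name of shared) lines.push(`    ${name}`);
    lines.push("  end");
  }
  return lines.join("\n");
}

const source = dependencyDiagram(
  {
    "gateway→agents": { calls: 89, imports: 34 },
    "agents→config": { calls: 156, imports: 120 },
  },
  ["utils", "config"]
);
console.log(source);
```

Emitting plain text keeps the diagram tests deterministic: the output can be compared byte-for-byte with `expected-diagrams/deps.mmd` before any SVG rendering happens.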
**Tests (7E):**
| Test | Input | Expected |
|------|-------|----------|
| Dependency diagram | Fixture dependency matrix | Valid Mermaid syntax, matches `expected-diagrams/deps.mmd` |
| Sequence diagram | Fixture flow trace | Valid Mermaid syntax, correct participant order |
| Contract diagram | Fixture contracts | Valid Mermaid syntax, correct relationships |
| Rendering | Any generated .mmd file | mmdr produces valid SVG without errors |
## Architecture.md Ingestion
Each repo may contain human-written architecture documentation. The pipeline:
1. **Discovery:** Scan for `architecture.md`, `ARCHITECTURE.md`, `docs/architecture.md`, and `docs/design.md`, with paths resolved relative to the repo root
2. **Parsing:** Extract sections (headings → content blocks) as structured context
3. **Injection:** When generating subsystem docs or explanation docs, include relevant architecture.md sections in the LLM prompt alongside graph data
4. **Diff tracking:** If `architecture.md` changes between releases, flag it in the semantic diff as a documentation-relevant change
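The parsing step (headings to content blocks) could be approximated as below. This is a simplified sketch, not the pipeline's parser: it handles ATX (`#`) headings only, ignores Setext headings, and does not special-case `#` lines inside fenced code blocks. The function name and the returned shape are assumptions.

```javascript
// Splits Markdown into { heading, level, content } blocks by ATX headings.
// Text before the first heading is dropped.
function parseSections(markdown) {
  const sections = [];
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      current = { heading: match[2].trim(), level: match[1].length, lines: [] };
      sections.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return sections.map(({ heading, level, lines }) => ({
    heading,
    level,
    content: lines.join("\n").trim(),
  }));
}

const doc = "# Overview\nHigh-level notes.\n\n## Gateway\nHandles sessions.\n";
console.log(parseSections(doc).map((s) => s.heading)); // [ 'Overview', 'Gateway' ]
```

The injection step could then select blocks whose heading matches a subsystem name or label before building the LLM prompt.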
## Cross-Repo Output Model
Two output modes:
**Per-repo (reference only):**
- Subsystem architecture docs
- Contract reference
- Module reference
- Mermaid diagrams
- Useful for repo maintainers
**Unified (full Divio):**
- Merges per-repo graphs via namespace registry (Phase 3) into super-graph
- Runs 7A-7E on super-graph
- Generates cross-repo flow traces and dependency diagrams
- Includes human-authored tutorials and explanation docs
- Useful for platform consumers and new engineers
## Implementation Phases
| Phase | Module | Effort | Depends On |
|-------|--------|--------|------------|
| 7-fixtures | Ground truth fixture repo | 0.5 day | — |
| 7A | `subsystem.js` + tests | 1 day | graph.js, fixtures |
| 7B | `contracts.js` + tests | 2 days | extract.js, fixtures |
| 7C | `flow.js` + tests | 2 days | graph.js, subsystem.js, fixtures |
| 7D | `sysdoc.js` + tests | 2 days | 7A, 7B, 7C, docgen.js |
| 7E | `diagrams.js` + tests | 1 day | 7A, 7B, 7C |
| 7F | `supergraph.js` (Multi-repo Merge) | 1 day | namespace.js, graph.js |
**Total: ~9.5 days**
**Critical path:** fixtures → 7A → 7C → 7D
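**Parallel:** 7B can run alongside 7A and 7C but must finish before 7D starts. 7E can begin once 7A-7C are complete and run alongside 7D. 7F depends only on earlier-phase modules and can run at any time.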
**Build loop (BMad Wiggum):** Each phase follows: build → test → BMad review → revise → re-review until GO.
## Constraints
- No new external dependencies (same as Phases 1-5)
- LLM calls only for prose generation — all structural analysis is deterministic
- tree-sitter@0.21.1 compatibility maintained
- Templates are Markdown with simple mustache-style slots (no template engine dependency — string replacement)
- Must work on OpenClaw codebase (4,325 files) as primary benchmark
- Foxtrot repos are not available in this environment — design must work from any repo's graph snapshot
- Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).
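The mustache-style slot constraint above can be met with a single `String.prototype.replace` call. The sketch below is one possible reading: the `{{slot}}` syntax details, the function name, and the choice to leave unknown slots untouched (so missing data stays visible in the output) are assumptions.

```javascript
// Fills {{slot}} placeholders by plain string replacement, with no template engine.
function renderTemplate(template, values) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : whole
  );
}

const template = "# {{label}}\n\n## Purpose\n{{purpose}}\n\n## Dependencies\n{{deps}}\n";
console.log(
  renderTemplate(template, {
    label: "Session & Request Gateway",
    purpose: "(LLM-generated prose goes here)",
  })
);
```

In this call the `{{deps}}` slot is left as-is, which makes an unfilled section easy to detect in the section-completeness test.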
## Resolved Decisions
1. **Tutorials:** Human-authored only. Flow traces inform but don't generate tutorials — domain knowledge required.
2. **Design decisions:** Infer from commit history + semantic diffs AND parse `architecture.md` from each repo.
3. **Cross-repo:** Both per-repo (reference) and unified (full Divio). Different audiences.
4. **Mermaid diagrams:** Yes, via 7E. Three diagram types: dependency, sequence, contract.
5. **Architecture.md ingestion:** Parsed and injected as LLM context for subsystem and explanation docs.
6. **Flow traces are Explanation, not How-To:** Corrected Divio mapping. How-To deferred from MVP.
7. **LLM output is not CI-tested:** All testable artifacts are deterministic JSON. LLM prose is a formatting pass evaluated by human review.