
Dev Intel Pipeline v2 — Phase 7: System-Level Documentation Generation

Status: DRAFT v2 (post-SPA Round 1)
Author: Max (AI) + Brian (Human)
Date: 2026-03-09
Depends on: Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)


Problem Statement

The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation — like the Foxtrot Confluence docs — operates at the system level: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.

File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.

Goal

Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).

Success Criteria

All metrics are validated against a ground truth fixture repository (test/fixtures/system-docs/) containing a hand-labeled mini codebase (~30 files across 5 subsystems) with expected outputs for each module.

| Metric | Target | How Measured |
|---|---|---|
| Subsystem detection accuracy | ≥90% of modules correctly clustered | Compare subsystem.js output against expected-subsystems.json fixture. Accuracy = correctly assigned files / total files. |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured | Compare dependency matrix against expected-deps.json. Recall = captured edges / expected edges. |
| Contract extraction recall | ≥80% of exported interfaces/types extracted | Compare extracted contracts against expected-contracts.json. Recall = extracted / total annotated. |
| Generated doc structure | Matches Divio 4-category template | Structural assertion: verify directory layout, required sections present in each generated .md file. |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated | Apply a mock diff to fixture, assert only expected subsystem docs are regenerated (content hashing / md5sum check, avoid mtime flakiness). |
| Cascading invalidation | Shared subsystem change propagates to dependents | Apply a diff to a shared subsystem in fixture, assert dependent subsystem docs are also flagged for regeneration. |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) | BACKLOGGED: measure token count statically in CI (e.g. via tiktoken) without hitting the API. |
| Flow tracer terminates | All traces complete in <5s on 4,325-file graph | Wall-clock assertion on OpenClaw snapshot. |
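The accuracy and recall formulas in the table can be illustrated with a minimal sketch. The function names and the fixture shapes (a file-to-subsystem map, and edges as `"from→to"` strings) are assumptions made for illustration, not the actual fixture schema.

```javascript
// Hypothetical helpers; the input shapes are assumed, not taken from the fixtures.

// Accuracy = correctly assigned files / total files.
// Both arguments map a file path to a subsystem name.
function subsystemAccuracy(actual, expected) {
  const files = Object.keys(expected);
  if (files.length === 0) return 1;
  const correct = files.filter((f) => actual[f] === expected[f]).length;
  return correct / files.length;
}

// Recall = captured edges / expected edges.
// Edges are strings such as "gateway→agents".
function edgeRecall(capturedEdges, expectedEdges) {
  if (expectedEdges.length === 0) return 1;
  const captured = new Set(capturedEdges);
  const hits = expectedEdges.filter((e) => captured.has(e)).length;
  return hits / expectedEdges.length;
}

const expected = {
  "gateway/server.ts": "gateway",
  "agents/runner.ts": "agents",
  "utils/logger.ts": "utils",
  "config/schema.ts": "config",
};
// One of the four files is assigned to the wrong subsystem.
const actual = { ...expected, "config/schema.ts": "utils" };

console.log(subsystemAccuracy(actual, expected)); // 0.75
console.log(edgeRecall(["gateway→agents"], ["gateway→agents", "agents→config"])); // 0.5
```

A CI assertion would then compare these ratios against the targets in the table (0.90 and 0.85 respectively).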

Ground Truth Fixture Repository

Located at test/fixtures/system-docs/. Contains:

test/fixtures/system-docs/
├── src/
│   ├── gateway/          (5 files: server.ts, session.ts, middleware.ts, types.ts, utils.ts)
│   ├── agents/           (5 files: runner.ts, scope.ts, tools.ts, types.ts, defaults.ts)
│   ├── channels/
│   │   ├── telegram.ts
│   │   └── discord.ts
│   ├── config/           (3 files: config.ts, schema.ts, types.ts)
│   └── utils/            (3 files: logger.ts, crypto.ts, fs-helpers.ts)
├── expected-subsystems.json       ← hand-labeled subsystem assignments
├── expected-deps.json             ← hand-labeled inter-subsystem edges
├── expected-contracts.json        ← hand-labeled interfaces/types
├── expected-flows.json            ← hand-labeled flow traces for 2 entry points
├── expected-diagrams/             ← expected Mermaid source for each diagram type
└── architecture.md                ← mock architecture doc for ingestion testing

Edge cases included in fixtures:

  • utils/ as a cross-cutting concern (high fan-in, should be tagged as cross-cutting)
  • Circular dependency: gateway/session.ts ↔ agents/runner.ts (mutual CALLS)
  • Orphan file: config/schema.ts (no inbound edges, only exports)
  • Re-exported interface: gateway/types.ts re-exports from config/types.ts
  • Empty subsystem: channels/ has only 2 files with no internal CALLS edges

Architecture

7A: Subsystem Aggregator (subsystem.js)

Purpose: Group file-level entities into logical subsystems and compute inter-subsystem relationships.

Clustering Strategy (tiered):

  1. Directory-based (default): Each top-level directory under src/ is one subsystem (gateway/, agents/, cli/, telegram/, etc.). Simple, deterministic, zero-config.

  2. Config-driven (override): Optional subsystems.yaml that maps directories to named subsystems with human labels and grouping overrides.

    subsystems:
      - name: Gateway
        label: "Session & Request Gateway"
        paths: ["gateway/", "routing/"]
      - name: Agents
        label: "AI Agent Runtime"
        paths: ["agents/", "auto-reply/"]
      - name: Channels
        label: "Channel Adapters"
        paths: ["telegram/", "discord/", "slack/", "signal/", "whatsapp/"]
    
  3. Graph-based (future): Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
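Tiers 1 and 2 above can be sketched together as follows. This is a minimal illustration, assuming the overrides arrive as the already-parsed `subsystems` array from subsystems.yaml; `clusterByDirectory` and the `(root)` bucket for files directly under src/ are hypothetical.

```javascript
// Hypothetical sketch: assign files to subsystems by top-level directory,
// with optional overrides shaped like the parsed subsystems.yaml entries.
function clusterByDirectory(files, overrides = []) {
  // Build a lookup from path prefix (e.g. "routing/") to subsystem name.
  const prefixToName = new Map();
  for (const { name, paths } of overrides) {
    for (const prefix of paths) prefixToName.set(prefix, name);
  }

  const subsystems = new Map(); // subsystem name -> list of files
  for (const file of files) {
    const rel = file.replace(/^src\//, "");
    const slash = rel.indexOf("/");
    // Assumption: files directly under src/ are grouped under "(root)".
    const dir = slash === -1 ? "(root)" : rel.slice(0, slash);
    const name = prefixToName.get(dir + "/") ?? dir;
    if (!subsystems.has(name)) subsystems.set(name, []);
    subsystems.get(name).push(rel);
  }
  return subsystems;
}

const files = [
  "src/gateway/server.ts",
  "src/routing/session-key.ts",
  "src/agents/runner.ts",
];
const overrides = [{ name: "Gateway", paths: ["gateway/", "routing/"] }];

const result = clusterByDirectory(files, overrides);
console.log([...result.keys()]); // [ 'Gateway', 'agents' ]
```

Directories without an override fall back to their own name, which keeps the zero-config default intact when subsystems.yaml is absent.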

Cross-cutting concern detection: Subsystems where >60% of edges are inbound from other subsystems (high fan-in — many subsystems depend on them, but they depend on almost nothing) are automatically tagged as cross-cutting. Examples: utils/, config/, types/. The metric is inbound_edges / total_edges > 0.6. Cross-cutting subsystems are:

  • Excluded from the dependency matrix visualization (reduces hairball)
  • Documented separately as "Shared Infrastructure" in the reference docs
  • Still tracked in the raw dependency data for completeness
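The inbound_edges / total_edges > 0.6 rule can be sketched as below. It assumes the input uses the `{ from, to, edges }` shape of `internalDeps` and that total_edges counts only inter-subsystem edges (inbound plus outbound); both are assumptions, and `findCrossCutting` is a hypothetical name.

```javascript
// Hypothetical sketch of the cross-cutting rule.
// `deps` entries look like { from: "gateway", to: "utils", edges: 40 }.
function findCrossCutting(deps, threshold = 0.6) {
  const inbound = new Map();
  const outbound = new Map();
  const bump = (map, key, n) => map.set(key, (map.get(key) ?? 0) + n);

  for (const { from, to, edges } of deps) {
    if (from === to) continue; // assumption: intra-subsystem edges are ignored
    bump(outbound, from, edges);
    bump(inbound, to, edges);
  }

  const names = new Set([...inbound.keys(), ...outbound.keys()]);
  const result = [];
  for (const name of names) {
    const inCount = inbound.get(name) ?? 0;
    const total = inCount + (outbound.get(name) ?? 0);
    if (total > 0 && inCount / total > threshold) result.push(name);
  }
  return result.sort();
}

const deps = [
  { from: "gateway", to: "agents", edges: 89 },
  { from: "gateway", to: "utils", edges: 40 },
  { from: "agents", to: "utils", edges: 60 },
  { from: "agents", to: "gateway", edges: 80 },
  { from: "utils", to: "config", edges: 5 },
  { from: "agents", to: "config", edges: 156 },
];
console.log(findCrossCutting(deps)); // [ 'config', 'utils' ]
```

In this example utils has 100 inbound and 5 outbound edges (ratio about 0.95) and config has only inbound edges, so both pass the threshold, while gateway and agents stay well below it.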

Output:

{
  "subsystems": [
    {
      "name": "gateway",
      "label": "Session & Request Gateway",
      "kind": "domain",
      "files": ["gateway/session-utils.ts", "gateway/server.ts"],
      "entities": { "functions": 142, "classes": 3, "modules": 28 },
      "publicExports": ["deriveSessionTitle", "loadSessionEntry"],
      "internalDeps": [{"from": "gateway", "to": "agents", "edges": 89, "type": "CALLS"}],
      "externalDeps": ["commander", "node:fs", "node:path"]
    }
  ],
  "crossCutting": ["utils", "config"],
  "dependencyMatrix": {
    "gateway→agents": { "calls": 89, "imports": 34 },
    "agents→config": { "calls": 156, "imports": 120 }
  }
}
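As a rough illustration of how the `dependencyMatrix` field could be derived, the sketch below rolls file-level edges up to subsystem pairs. The edge shape `{ from, to, type }` and the rule that the first path segment names the subsystem are assumptions; `buildDependencyMatrix` is hypothetical.

```javascript
// Hypothetical sketch: aggregate file-level edges into the dependencyMatrix shape.
// Assumes paths are relative to src/ and the first segment is the subsystem.
function buildDependencyMatrix(fileEdges) {
  const subsystemOf = (file) => file.split("/")[0];
  const matrix = {};
  for (const { from, to, type } of fileEdges) {
    const a = subsystemOf(from);
    const b = subsystemOf(to);
    if (a === b) continue; // keep only inter-subsystem edges
    const key = `${a}→${b}`;
    if (!matrix[key]) matrix[key] = { calls: 0, imports: 0 };
    if (type === "CALLS") matrix[key].calls += 1;
    else if (type === "IMPORTS") matrix[key].imports += 1;
  }
  return matrix;
}

const fileEdges = [
  { from: "gateway/server.ts", to: "agents/runner.ts", type: "CALLS" },
  { from: "gateway/session.ts", to: "agents/runner.ts", type: "CALLS" },
  { from: "gateway/server.ts", to: "agents/types.ts", type: "IMPORTS" },
  { from: "gateway/server.ts", to: "gateway/session.ts", type: "CALLS" }, // internal, skipped
  { from: "agents/runner.ts", to: "config/config.ts", type: "IMPORTS" },
];
const matrix = buildDependencyMatrix(fileEdges);
console.log(matrix["gateway→agents"]); // { calls: 2, imports: 1 }
```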

Tests (7A):

| Test | Input | Expected |
|---|---|---|
| Directory clustering | Fixture repo | Matches expected-subsystems.json (5 subsystems) |
| Config override | Fixture + subsystems.yaml merging gateway+routing | Merged subsystem with combined files |
| Cross-cutting detection | Fixture utils/ (high fan-in) | Tagged as cross-cutting |
| Empty subsystem | Fixture channels/ (2 files, no internal calls) | Valid subsystem with 0 internal edges |
| Orphan file | config/schema.ts (no inbound) | Assigned to config subsystem, not dropped |

7B: Contract Extractor (contracts.js)

Purpose: Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.

What to extract:

  • interface Foo { ... } → entity type Interface, with fields as properties
  • type Foo = { ... } → entity type TypeAlias
  • enum Foo { ... } → entity type Enum, with members
  • Exported const objects used as config defaults → entity type ConfigContract
  • YAML schema keys (from config files) → entity type ConfigSchema

Relationships:

  • IMPLEMENTS — class → interface
  • ACCEPTS — function parameter → interface/type (function signature contracts)
  • RETURNS — function → return type
  • EXTENDS — interface → interface

Error handling:

  • If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
  • Re-exported interfaces (export { Foo } from './types') are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
  • Deeply nested type literals (>3 levels) are flattened to object to avoid graph bloat
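The flattening rule for deeply nested type literals can be sketched as follows. This assumes a simplified model in which a type literal is a plain object whose values are either type-name strings or nested literals, and that the outermost literal counts as level 1; the real extractor works on tree-sitter nodes, which this sketch does not attempt to model.

```javascript
// Hypothetical sketch: collapse literals nested deeper than maxDepth to "object".
// The outermost literal is level 1 (an assumption about how levels are counted).
function flattenDeepLiterals(literal, maxDepth = 3, depth = 1) {
  const out = {};
  for (const [field, type] of Object.entries(literal)) {
    if (typeof type === "string") {
      out[field] = type; // primitive or named type, kept as-is
    } else if (depth >= maxDepth) {
      out[field] = "object"; // a literal at level maxDepth + 1 is flattened
    } else {
      out[field] = flattenDeepLiterals(type, maxDepth, depth + 1);
    }
  }
  return out;
}

const sessionType = {
  id: "string",
  user: {            // level 2
    profile: {       // level 3
      name: "string",
      address: {     // level 4, flattened
        city: "string",
      },
    },
  },
};
console.log(JSON.stringify(flattenDeepLiterals(sessionType)));
// {"id":"string","user":{"profile":{"name":"string","address":"object"}}}
```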

Tests (7B):

| Test | Input | Expected |
|---|---|---|
| Interface extraction | gateway/types.ts with 3 interfaces | 3 Interface entities with correct fields |
| Type alias | type SessionKey = string | 1 TypeAlias entity |
| Enum extraction | enum Status { Active, Inactive } | 1 Enum entity with 2 members |
| Re-exported interface | gateway/types.ts re-exports from config/types.ts | Resolved to original definition |
| Parse failure | Malformed TS file | Skipped with warning, no crash |
| Recall benchmark | Fixture repo | ≥80% of expected-contracts.json extracted |

7C: Flow Tracer (flow.js)

Purpose: Given an entry point, walk the call graph across subsystem boundaries and produce a sequenced narrative of the data flow.

Algorithm:

  1. Start at entry point entity (e.g., telegram/bot-handlers.ts:onMessage)
  2. BFS through CALLS edges, recording subsystem transitions
  3. Cycle detection: Maintain a visited set per trace. If a node is revisited, record the cycle and stop that branch (do not re-enter).
  4. God object pruning: Before tracing, compute in-degree for all nodes. Nodes with in-degree > godThreshold (default: 50) are excluded from traversal (they're utility functions called by everything — not meaningful flow participants). Logged as "excluded high-connectivity nodes."
  5. Depth limit: Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5 (prioritizes cross-subsystem flow).
  6. Test file exclusion: Skip any file matching *.test.*, *.spec.*, test/, __tests__/.
  7. At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
  8. Output: ordered list of subsystem hops with the specific function calls that cross boundaries

Output (deterministic JSON — testable without LLM):

{
  "entryPoint": "telegram/bot-handlers.ts:onMessage",
  "depth": 8,
  "godThreshold": 50,
  "excludedNodes": ["utils/logger.ts:log", "config/config.ts:getConfig"],
  "cyclesDetected": [
    { "at": "gateway/session.ts:loadSession", "backEdgeTo": "agents/runner.ts:runAgent" }
  ],
  "flow": [
    { "subsystem": "telegram", "entity": "telegram/bot-handlers.ts:onMessage", "depth": 0 },
    { "subsystem": "routing", "entity": "routing/session-key.ts:resolveKey", "depth": 1, "crossedVia": "CALLS" },
    { "subsystem": "gateway", "entity": "gateway/session.ts:loadSession", "depth": 2, "crossedVia": "CALLS" },
    { "subsystem": "agents", "entity": "agents/runner.ts:runAgent", "depth": 3, "crossedVia": "CALLS" }
  ],
  "subsystemSequence": ["telegram", "routing", "gateway", "agents"]
}
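The traversal steps and the output shape above can be sketched as follows. This is an illustration under assumptions rather than the actual flow.js: the graph is modelled as a `Map` from entity id to callee ids, the subsystem is taken from the first path segment, and `traceFlow` is a hypothetical name. Note that a plain visited-set check also flags diamond-shaped re-convergence, not only true cycles.

```javascript
// Hypothetical sketch; the graph shape and helper names are assumptions.
// `calls` maps an entity id ("dir/file.ts:fn") to the entity ids it calls.
const TEST_FILE = /(\.test\.|\.spec\.|(^|\/)test\/|(^|\/)__tests__\/)/;
const subsystemOf = (entity) => entity.split("/")[0];

function traceFlow(calls, entryPoint, { maxDepth = 8, godThreshold = 50 } = {}) {
  // Step 4: compute in-degree up front to identify high-connectivity nodes.
  const inDegree = new Map();
  for (const callees of calls.values()) {
    for (const c of callees) inDegree.set(c, (inDegree.get(c) ?? 0) + 1);
  }
  const isGod = (n) => (inDegree.get(n) ?? 0) > godThreshold;

  const visited = new Set([entryPoint]);
  const excludedNodes = new Set();
  const cyclesDetected = [];
  const flow = [{ subsystem: subsystemOf(entryPoint), entity: entryPoint, depth: 0 }];
  const queue = [{ entity: entryPoint, depth: 0 }];

  while (queue.length > 0) {
    const { entity, depth } = queue.shift(); // step 2: breadth-first order
    for (const callee of calls.get(entity) ?? []) {
      if (TEST_FILE.test(callee)) continue; // step 6: skip test files
      if (isGod(callee)) { excludedNodes.add(callee); continue; } // step 4
      if (visited.has(callee)) { // step 3: record and stop this branch
        cyclesDetected.push({ at: entity, backEdgeTo: callee });
        continue;
      }
      const crossed = subsystemOf(callee) !== subsystemOf(entity);
      const nextDepth = depth + (crossed ? 1 : 0.5); // step 5: weighted depth
      if (nextDepth > maxDepth) continue;
      visited.add(callee);
      const hop = { subsystem: subsystemOf(callee), entity: callee, depth: nextDepth };
      if (crossed) hop.crossedVia = "CALLS"; // step 7: boundary crossing
      flow.push(hop);
      queue.push({ entity: callee, depth: nextDepth });
    }
  }
  const subsystemSequence = flow
    .filter((h) => h.depth === 0 || h.crossedVia)
    .map((h) => h.subsystem);
  return {
    entryPoint, depth: maxDepth, godThreshold,
    excludedNodes: [...excludedNodes], cyclesDetected, flow, subsystemSequence,
  };
}

const calls = new Map([
  ["telegram/bot-handlers.ts:onMessage", ["routing/session-key.ts:resolveKey", "utils/logger.ts:log"]],
  ["routing/session-key.ts:resolveKey", ["gateway/session.ts:loadSession", "utils/logger.ts:log"]],
  ["gateway/session.ts:loadSession", ["agents/runner.ts:runAgent"]],
  ["agents/runner.ts:runAgent", ["gateway/session.ts:loadSession", "utils/logger.ts:log"]],
]);
// godThreshold is lowered to 2 so the three-caller logger is pruned in this tiny graph.
const trace = traceFlow(calls, "telegram/bot-handlers.ts:onMessage", { godThreshold: 2 });
console.log(trace.subsystemSequence); // [ 'telegram', 'routing', 'gateway', 'agents' ]
```

Because the depth increments are not uniform (1 versus 0.5), a FIFO queue does not visit nodes in strictly increasing depth order; a priority queue would be needed if minimum-depth discovery matters.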

LLM narration (separate step): The deterministic JSON flow is the testable artifact. LLM narration is applied after as a formatting pass in 7D. This means:

  • Flow correctness is tested against expected-flows.json (deterministic)
  • LLM prose quality is evaluated separately (human review, not CI)

Performance guarantee: BFS with visited set + god object pruning + depth limit = O(V+E) bounded by depth. On the OpenClaw graph (23k nodes, 142k edges), traces must complete in <5 seconds. If a trace exceeds 5s, it is killed and logged as a timeout.

Tests (7C):

| Test | Input | Expected |
|---|---|---|
| Simple linear flow | Fixture entry point A→B→C across 3 subsystems | Matches expected-flows.json |
| Cycle detection | Fixture circular dep gateway↔agents | Cycle recorded, trace continues without loop |
| God object exclusion | Entry point that calls utils/logger.ts:log (high in-degree) | log excluded from trace |
| Depth limit | Deep call chain (>8 hops) | Trace stops at depth 8 |
| Test file exclusion | Entry point that calls a test helper | Test file skipped |
| Performance | OpenClaw full snapshot | <5s wall clock |
| Empty trace | Entry point with no outgoing CALLS | Returns flow with single entry, no hops |

7D: Hierarchical Doc Generator (sysdoc.js)

Purpose: Orchestrate 7A-7C to produce a complete documentation site in Divio structure.

Output structure:

docs/
├── tutorials/
│   └── (human-authored only — not auto-generated)
├── reference/
│   ├── system-architecture.md      ← from subsystem aggregator + dependency matrix
│   ├── subsystems/
│   │   ├── gateway.md              ← per-subsystem: purpose, exports, deps, key modules
│   │   ├── agents.md
│   │   └── ...
│   ├── contracts/
│   │   ├── session-types.md        ← from contract extractor
│   │   └── ...
│   └── modules/
│       └── (existing file-level docs from Phase 6)
├── explanation/
│   ├── architecture-patterns.md    ← from dependency matrix analysis
│   ├── data-flows.md              ← from flow tracer (LLM-narrated flow traces)
│   └── design-decisions.md        ← from architecture.md ingestion + commit history

Divio category mapping (corrected):

  • Tutorials: Human-authored only. Not generated.
  • Reference: System architecture, per-subsystem docs, contracts, module docs. All deterministic structure + LLM prose.
  • Explanation: Architecture patterns (from dependency analysis), data flows (from flow traces — these explain how the system works, not how to do a task), design decisions (from architecture.md + commit history).
  • How-To: Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.

Generation pipeline:

  1. Run subsystem aggregator → subsystem map + dependency matrix
  2. Run contract extractor → interface/type entities added to graph
  3. Run flow tracer on configured entry points → deterministic flow JSONs
  4. For each subsystem: generate reference doc (LLM with subsystem context + architecture.md sections)
  5. Generate system architecture overview (LLM with full dependency matrix)
  6. Generate data flow explanations (LLM narrates flow JSONs into prose)
  7. Generate Mermaid diagrams (7E) and embed in docs

Incremental updates with cascading invalidation:

  1. Semantic diff identifies changed files
  2. Map changed files → directly affected subsystems (set A)
  3. For each subsystem in A, find all subsystems that depend on it (set B = dependents of A in dependency matrix)
  4. Regeneration set = A ∪ B
  5. System architecture overview regenerated only if dependency matrix changed (new/removed inter-subsystem edges)
  6. Flow traces regenerated only if any entity in the trace path was modified
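Steps 2 to 4 above amount to a set union over the dependency matrix. A minimal sketch, assuming the matrix keys follow the `"from→to"` convention from 7A (meaning `from` depends on `to`) and that only direct dependents are included, as the steps describe; `regenerationSet` is a hypothetical name.

```javascript
// Hypothetical sketch: compute the regeneration set A ∪ B.
function regenerationSet(changedFiles, dependencyMatrix) {
  const subsystemOf = (file) => file.split("/")[0];

  // Set A: subsystems that directly contain a changed file.
  const direct = new Set(changedFiles.map(subsystemOf));

  // Set B: subsystems with an edge pointing into any member of A.
  const dependents = new Set();
  for (const key of Object.keys(dependencyMatrix)) {
    const [from, to] = key.split("→");
    if (direct.has(to) && !direct.has(from)) dependents.add(from);
  }

  return {
    direct: [...direct].sort(),
    dependents: [...dependents].sort(),
    all: [...new Set([...direct, ...dependents])].sort(),
  };
}

const dependencyMatrix = {
  "gateway→agents": { calls: 89, imports: 34 },
  "agents→config": { calls: 156, imports: 120 },
  "gateway→config": { calls: 12, imports: 8 },
  "channels→gateway": { calls: 20, imports: 5 },
};
const plan = regenerationSet(["config/types.ts"], dependencyMatrix);
console.log(plan.all); // [ 'agents', 'config', 'gateway' ]
```

In this example channels is not regenerated because it depends on gateway rather than on config directly; whether invalidation should cascade transitively is left open by the steps above.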

Tests (7D):

| Test | Input | Expected |
|---|---|---|
| Full generation | Fixture repo | Correct directory structure with all expected .md files |
| Section completeness | Generated subsystem doc | Contains: Purpose, Key Modules, Public API, Dependencies sections |
| Incremental: direct change | Modify gateway/server.ts | Only gateway.md + dependents regenerated |
| Incremental: cascading | Modify config/types.ts (shared) | config.md + all subsystems importing config regenerated |
| Incremental: no-op | No semantic diff | Zero files regenerated |
| Architecture.md ingestion | Fixture with architecture.md | LLM prompt includes architecture.md content |
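The incremental tests need a way to tell which docs were actually regenerated without relying on mtime. One possible approach, sketched here with Node's built-in `crypto` module, is to hash doc contents before and after a run. SHA-256 is used in place of md5sum as an illustrative choice, and `hashDocs` / `changedDocs` are hypothetical helpers.

```javascript
const { createHash } = require("crypto");

// Hypothetical sketch: `docs` maps an output path to generated Markdown content.
function hashDocs(docs) {
  const hashes = {};
  for (const [path, content] of Object.entries(docs)) {
    hashes[path] = createHash("sha256").update(content, "utf8").digest("hex");
  }
  return hashes;
}

// Paths whose hash differs between two runs, including added or removed docs.
function changedDocs(before, after) {
  const paths = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...paths].filter((p) => before[p] !== after[p]).sort();
}

const before = hashDocs({
  "reference/subsystems/gateway.md": "Gateway v1",
  "reference/subsystems/agents.md": "Agents v1",
});
const after = hashDocs({
  "reference/subsystems/gateway.md": "Gateway v2",
  "reference/subsystems/agents.md": "Agents v1",
});
console.log(changedDocs(before, after)); // [ 'reference/subsystems/gateway.md' ]
```

This only detects content changes; if LLM prose is non-deterministic, a regenerated doc will almost always hash differently, which is what the "only expected docs regenerated" assertion relies on.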

7E: Diagram Generator (diagrams.js)

Purpose: Auto-generate Mermaid diagrams from graph analysis outputs.

Diagram types:

  1. Subsystem Dependency Graph (from 7A dependency matrix)

    • Nodes = subsystems (excluding cross-cutting)
    • Edges = inter-subsystem CALLS/IMPORTS with edge weight labels
    • Cross-cutting subsystems shown as a separate "Shared" cluster
  2. Flow Sequence Diagram (from 7C flow traces)

    • Participants = subsystems in flow order
    • Messages = function calls at boundary crossings
    • Cycles shown as self-referencing notes
  3. Contract Relationship Diagram (from 7B contracts)

    • Classes/interfaces with fields
    • IMPLEMENTS/EXTENDS relationships as arrows
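For the first diagram type, Mermaid source can be emitted directly from the 7A output. The sketch below is an assumption-laden illustration: `generateDependencyDiagram` is a hypothetical name, subsystem names are assumed to be valid Mermaid node identifiers, and edges touching cross-cutting subsystems are dropped from the main graph.

```javascript
// Hypothetical sketch: emit Mermaid flowchart source from the dependency matrix.
function generateDependencyDiagram(dependencyMatrix, crossCutting = []) {
  const shared = new Set(crossCutting);
  const lines = ["graph LR"];
  for (const [key, { calls, imports }] of Object.entries(dependencyMatrix)) {
    const [from, to] = key.split("→");
    // Edges touching cross-cutting subsystems are left out of the main graph.
    if (shared.has(from) || shared.has(to)) continue;
    lines.push(`  ${from} -->|"${calls} calls, ${imports} imports"| ${to}`);
  }
  if (shared.size > 0) {
    // Cross-cutting subsystems are listed in a separate "Shared" cluster.
    lines.push("  subgraph Shared");
    for (const name of shared) lines.push(`    ${name}`);
    lines.push("  end");
  }
  return lines.join("\n");
}

const mermaid = generateDependencyDiagram(
  {
    "gateway→agents": { calls: 89, imports: 34 },
    "agents→config": { calls: 156, imports: 120 },
  },
  ["utils", "config"]
);
console.log(mermaid);
// graph LR
//   gateway -->|"89 calls, 34 imports"| agents
//   subgraph Shared
//     utils
//     config
//   end
```

Because the output is a plain string, it can be compared byte-for-byte against the files in expected-diagrams/ without invoking a renderer.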

Rendering: Use mmdr (Rust Mermaid renderer) to produce SVG. Embed in generated Markdown docs as ![diagram](./diagrams/subsystem-deps.svg).

Tests (7E):

| Test | Input | Expected |
|---|---|---|
| Dependency diagram | Fixture dependency matrix | Valid Mermaid syntax, matches expected-diagrams/deps.mmd |
| Sequence diagram | Fixture flow trace | Valid Mermaid syntax, correct participant order |
| Contract diagram | Fixture contracts | Valid Mermaid syntax, correct relationships |
| Rendering | Any generated .mmd file | mmdr produces valid SVG without errors |

Architecture.md Ingestion

Each repo may contain human-written architecture documentation. The pipeline:

  1. Discovery: Scan for architecture.md, docs/architecture.md, ARCHITECTURE.md, and docs/design.md, resolved relative to the repo root
  2. Parsing: Extract sections (headings → content blocks) as structured context
  3. Injection: When generating subsystem docs or explanation docs, include relevant architecture.md sections in the LLM prompt alongside graph data
  4. Diff tracking: If architecture.md changes between releases, flag it in the semantic diff as a documentation-relevant change
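The parsing and injection steps can be sketched with a simple heading splitter. This is a minimal illustration, not the pipeline's parser: it recognises only ATX headings (`# Title`), does not special-case fenced code, and both function names are hypothetical. Matching sections to subsystems by heading text is also an assumption.

```javascript
// Hypothetical sketch: split Markdown into { heading, level, content } sections.
// Limitation: a "# ..." line inside a code fence would be treated as a heading.
function parseSections(markdown) {
  const sections = [];
  let current = { heading: "(preamble)", level: 0, lines: [] };
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*?)\s*$/.exec(line);
    if (match) {
      sections.push(current);
      current = { heading: match[2], level: match[1].length, lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  sections.push(current);
  return sections
    .map(({ heading, level, lines }) => ({ heading, level, content: lines.join("\n").trim() }))
    .filter((s) => s.level > 0 || s.content !== ""); // drop an empty preamble
}

// Select sections whose heading mentions a subsystem name (case-insensitive).
function sectionsFor(sections, subsystem) {
  const needle = subsystem.toLowerCase();
  return sections.filter((s) => s.heading.toLowerCase().includes(needle));
}

const doc = [
  "# Architecture",
  "",
  "Overview text.",
  "",
  "## Gateway",
  "",
  "Handles sessions.",
  "",
  "## Agents",
  "",
  "Runs agents.",
  "",
].join("\n");

const sections = parseSections(doc);
console.log(sections.map((s) => s.heading)); // [ 'Architecture', 'Gateway', 'Agents' ]
console.log(sectionsFor(sections, "gateway")[0].content); // Handles sessions.
```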

Cross-Repo Output Model

Two output modes:

Per-repo (reference only):

  • Subsystem architecture docs
  • Contract reference
  • Module reference
  • Mermaid diagrams
  • Useful for repo maintainers

Unified (full Divio):

  • Merges per-repo graphs via namespace registry (Phase 3) into super-graph
  • Runs 7A-7E on super-graph
  • Generates cross-repo flow traces and dependency diagrams
  • Includes human-authored tutorials and explanation docs
  • Useful for platform consumers and new engineers

Implementation Phases

| Phase | Module | Effort | Depends On |
|---|---|---|---|
| 7-fixtures | Ground truth fixture repo | 0.5 day | |
| 7A | subsystem.js + tests | 1 day | graph.js, fixtures |
| 7B | contracts.js + tests | 2 days | extract.js, fixtures |
| 7C | flow.js + tests | 2 days | graph.js, subsystem.js, fixtures |
| 7D | sysdoc.js + tests | 2 days | 7A, 7B, 7C, docgen.js |
| 7E | diagrams.js + tests | 1 day | 7A, 7C, 7B |
| 7F | supergraph.js (Multi-repo Merge) | 1 day | namespace.js, graph.js |

Total: ~9.5 days

Critical path: fixtures → 7A → 7C → 7D

Parallel: 7B, 7E, and 7F can run in parallel with core phases.

Build loop (BMad Wiggum): Each phase follows: build → test → BMad review → revise → re-review until GO.

Constraints

  • No new external dependencies (same as Phases 1-5)
  • LLM calls only for prose generation — all structural analysis is deterministic
  • tree-sitter@0.21.1 compatibility maintained
  • Templates are Markdown with simple mustache-style slots (no template engine dependency — string replacement)
  • Must work on OpenClaw codebase (4,325 files) as primary benchmark
  • Foxtrot repos are not available in this environment — design must work from any repo's graph snapshot
  • Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).
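The template constraint above (mustache-style slots filled by string replacement, with no template engine) can be sketched in a few lines. The `{{slot}}` syntax and the `renderTemplate` name are assumptions; the spec does not define the exact slot grammar.

```javascript
// Hypothetical sketch: fill {{slot}} placeholders by plain string replacement.
function renderTemplate(template, values) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : whole
  );
}

const template = "# {{label}}\n\nPurpose: {{ purpose }}\n\nOwner: {{owner}}\n";
const rendered = renderTemplate(template, {
  label: "Session & Request Gateway",
  purpose: "Handles sessions.",
});
console.log(rendered);
// Unknown slots such as {{owner}} are left in place so gaps stay visible.
```

Leaving unresolved slots untouched, rather than replacing them with an empty string, makes missing inputs easy to catch with a structural assertion on the generated Markdown.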

Resolved Decisions

  1. Tutorials: Human-authored only. Flow traces inform but don't generate tutorials — domain knowledge required.
  2. Design decisions: Infer from commit history + semantic diffs AND parse architecture.md from each repo.
  3. Cross-repo: Both per-repo (reference) and unified (full Divio). Different audiences.
  4. Mermaid diagrams: Yes, via 7E. Three diagram types: dependency, sequence, contract.
  5. Architecture.md ingestion: Parsed and injected as LLM context for subsystem and explanation docs.
  6. Flow traces are Explanation, not How-To: Corrected Divio mapping. How-To deferred from MVP.
  7. LLM output is not CI-tested: All testable artifacts are deterministic JSON. LLM prose is a formatting pass evaluated by human review.