
Jarvis Prime f49a6c2dd9 Phase 8: Helm chart extraction with Go template support

- extract-helm.js: strips Go templates, parses Chart.yaml/values.yaml/templates
- Extracts K8s resource kinds, cross-chart interactions, shared secrets, ports
- generateHelmDiagram() for Mermaid interaction graphs
- Integrated into sysdoc.js: Helm entities merge into main knowledge graph
- Dir-based filenames to handle duplicate chart names
- .gitignore for node_modules, snapshots, venv, wasm
- 76 charts, 1813 entities, 1769 relationships on Foxtrot

2026-03-09 20:03:04 +00:00

20 KiB


Dev Intel Pipeline v2 — Phase 7: System-Level Documentation Generation

Status: DRAFT v2 (post-SPA Round 1)
Author: Max (AI) + Brian (Human)
Date: 2026-03-09
Depends on: Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)

Problem Statement

The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation — like the Foxtrot Confluence docs — operates at the system level: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.

File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.

Goal

Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).

Success Criteria

All metrics are validated against a ground truth fixture repository (test/fixtures/system-docs/) containing a hand-labeled mini codebase (18 source files across 5 subsystems, as listed below) with expected outputs for each module.

Metric	Target	How Measured
Subsystem detection accuracy	≥90% of modules correctly clustered	Compare `subsystem.js` output against `expected-subsystems.json` fixture. Accuracy = correctly assigned files / total files.
Cross-subsystem dependency completeness	≥85% of actual inter-subsystem edges captured	Compare dependency matrix against `expected-deps.json`. Recall = captured edges / expected edges.
Contract extraction recall	≥80% of exported interfaces/types extracted	Compare extracted contracts against `expected-contracts.json`. Recall = extracted / total annotated.
Generated doc structure	Matches Divio 4-category template	Structural assertion: verify directory layout, required sections present in each generated .md file.
Incremental update precision	Only subsystems touched by the semantic diff get regenerated	Apply a mock diff to the fixture and assert that only the expected subsystem docs are regenerated. Compare content hashes (e.g. md5sum) rather than mtimes to avoid flaky results.
Cascading invalidation	Shared subsystem change propagates to dependents	Apply a diff to a shared subsystem in fixture, assert dependent subsystem docs are also flagged for regeneration.
LLM cost per full generation	≤$2 (using local Ollama for drafting)	BACKLOGGED: measure token counts statically in CI (e.g. via `tiktoken`) without calling the API.
Flow tracer terminates	All traces complete in <5s on 4,325-file graph	Wall-clock assertion on OpenClaw snapshot.
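A minimal sketch of how the two headline ratios could be computed. The fixture shapes are assumptions, not part of the spec: `expected-subsystems.json` is taken to be a `{ file: subsystem }` map and `expected-deps.json` a list of `[from, to]` subsystem pairs. The function names are hypothetical.

```javascript
// Hypothetical metric helpers. Assumed fixture shapes:
//   expected-subsystems.json -> { "<file>": "<subsystem>" }
//   expected-deps.json       -> [["from", "to"], ...]

// Accuracy = correctly assigned files / total expected files.
function subsystemAccuracy(actual, expected) {
  const files = Object.keys(expected);
  if (files.length === 0) return 1;
  const correct = files.filter((f) => actual[f] === expected[f]).length;
  return correct / files.length;
}

// Recall = captured edges / expected edges. Direction matters: a→b is not b→a.
function edgeRecall(captured, expected) {
  if (expected.length === 0) return 1;
  const got = new Set(captured.map(([from, to]) => `${from}→${to}`));
  const hits = expected.filter(([from, to]) => got.has(`${from}→${to}`)).length;
  return hits / expected.length;
}
```

A CI check would then assert `subsystemAccuracy(...) >= 0.9` and `edgeRecall(...) >= 0.85`, matching the targets in the table.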

Ground Truth Fixture Repository

Located at test/fixtures/system-docs/. Contains:

test/fixtures/system-docs/
├── src/
│   ├── gateway/          (5 files: server.ts, session.ts, middleware.ts, types.ts, utils.ts)
│   ├── agents/           (5 files: runner.ts, scope.ts, tools.ts, types.ts, defaults.ts)
│   ├── channels/
│   │   ├── telegram.ts
│   │   └── discord.ts
│   ├── config/           (3 files: config.ts, schema.ts, types.ts)
│   └── utils/            (3 files: logger.ts, crypto.ts, fs-helpers.ts)
├── expected-subsystems.json       ← hand-labeled subsystem assignments
├── expected-deps.json             ← hand-labeled inter-subsystem edges
├── expected-contracts.json        ← hand-labeled interfaces/types
├── expected-flows.json            ← hand-labeled flow traces for 2 entry points
├── expected-diagrams/             ← expected Mermaid source for each diagram type
└── architecture.md                ← mock architecture doc for ingestion testing

Edge cases included in fixtures:

utils/ as a cross-cutting concern (high fan-in, should be tagged as cross-cutting)
Circular dependency: gateway/session.ts ↔ agents/runner.ts (mutual CALLS)
Orphan file: config/schema.ts (no inbound edges, only exports)
Re-exported interface: gateway/types.ts re-exports from config/types.ts
Empty subsystem: channels/ has only 2 files with no internal CALLS edges

Architecture

7A: Subsystem Aggregator (`subsystem.js`)

Purpose: Group file-level entities into logical subsystems and compute inter-subsystem relationships.

Clustering Strategy (tiered):

Directory-based (default): Each top-level directory under src/ is a subsystem (e.g. gateway/, agents/, cli/, telegram/). Simple, deterministic, zero-config.

Config-driven (override): Optional subsystems.yaml that maps directories to named subsystems with human labels and grouping overrides.

subsystems:
  - name: Gateway
    label: "Session & Request Gateway"
    paths: ["gateway/", "routing/"]
  - name: Agents
    label: "AI Agent Runtime"
    paths: ["agents/", "auto-reply/"]
  - name: Channels
    label: "Channel Adapters"
    paths: ["telegram/", "discord/", "slack/", "signal/", "whatsapp/"]

Graph-based (future): Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
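The directory default and the subsystems.yaml override can be combined in one lookup. This sketch assumes the YAML has already been parsed into `{ name, paths }` objects (parsing it is out of scope given the no-new-dependencies constraint), that paths are relative to src/, and that the first matching override wins. `assignSubsystem` and the `(root)` bucket for top-level files are hypothetical.

```javascript
// Hypothetical subsystem assignment: config override first, directory default second.
// `overrides` is assumed to be subsystems.yaml already parsed into objects.
function assignSubsystem(filePath, overrides = []) {
  const rel = filePath.replace(/^src\//, ""); // accept paths with or without src/
  for (const { name, paths } of overrides) {
    // Override paths end in "/", so "gateway/" does not match "gateway-legacy/".
    if (paths.some((p) => rel.startsWith(p))) return name;
  }
  const slash = rel.indexOf("/");
  return slash === -1 ? "(root)" : rel.slice(0, slash);
}
```

Keeping the override check first means a config file only has to list the exceptions; everything else falls through to the zero-config directory rule.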

Cross-cutting concern detection: A subsystem is automatically tagged as cross-cutting when more than 60% of its inter-subsystem edges are inbound, i.e. inbound_edges / (inbound_edges + outbound_edges) > 0.6. This captures high fan-in subsystems: many subsystems depend on them, while they depend on comparatively little. Examples: utils/, config/, types/. Cross-cutting subsystems are:

Excluded from the dependency matrix visualization (reduces hairball)
Documented separately as "Shared Infrastructure" in the reference docs
Still tracked in the raw dependency data for completeness
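A sketch of the 60% fan-in rule applied to a dependency matrix of the shape shown in the Output below. It assumes each matrix key is `from→to` and that an edge's weight is `calls + imports`. `detectCrossCutting` is a hypothetical name.

```javascript
// Hypothetical cross-cutting detector over { "from→to": { calls, imports } }.
function detectCrossCutting(matrix, threshold = 0.6) {
  const inbound = new Map();
  const outbound = new Map();
  for (const [key, { calls = 0, imports = 0 }] of Object.entries(matrix)) {
    const [from, to] = key.split("→");
    const weight = calls + imports;
    outbound.set(from, (outbound.get(from) || 0) + weight);
    inbound.set(to, (inbound.get(to) || 0) + weight);
  }
  const names = new Set([...inbound.keys(), ...outbound.keys()]);
  return [...names]
    .filter((s) => {
      const inb = inbound.get(s) || 0;
      const total = inb + (outbound.get(s) || 0);
      return total > 0 && inb / total > threshold; // inbound / total > 0.6
    })
    .sort();
}
```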

Output:

{
  "subsystems": [
    {
      "name": "gateway",
      "label": "Session & Request Gateway",
      "kind": "domain",
      "files": ["gateway/session-utils.ts", "gateway/server.ts"],
      "entities": { "functions": 142, "classes": 3, "modules": 28 },
      "publicExports": ["deriveSessionTitle", "loadSessionEntry"],
      "internalDeps": [{"from": "gateway", "to": "agents", "edges": 89, "type": "CALLS"}],
      "externalDeps": ["commander", "node:fs", "node:path"]
    }
  ],
  "crossCutting": ["utils", "config"],
  "dependencyMatrix": {
    "gateway→agents": { "calls": 89, "imports": 34 },
    "agents→config": { "calls": 156, "imports": 120 }
  }
}
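The `dependencyMatrix` above can be derived by rolling file-level edges up to subsystem pairs. A minimal sketch, assuming edges arrive as `{ from, to, type }` records keyed by file path and that a `subsystemOf` function (for example the directory rule) is supplied. Intra-subsystem edges are dropped because the matrix only describes inter-subsystem traffic. `buildDependencyMatrix` is a hypothetical name.

```javascript
// Hypothetical roll-up of file-level CALLS/IMPORTS edges into "from→to" counts.
function buildDependencyMatrix(edges, subsystemOf) {
  const matrix = {};
  for (const { from, to, type } of edges) {
    if (type !== "CALLS" && type !== "IMPORTS") continue;
    const a = subsystemOf(from);
    const b = subsystemOf(to);
    if (a === b) continue; // intra-subsystem edges stay out of the matrix
    const key = `${a}→${b}`;
    if (!matrix[key]) matrix[key] = { calls: 0, imports: 0 };
    if (type === "CALLS") matrix[key].calls += 1;
    else matrix[key].imports += 1;
  }
  return matrix;
}
```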

Tests (7A):

Test	Input	Expected
Directory clustering	Fixture repo	Matches `expected-subsystems.json` (5 subsystems)
Config override	Fixture + `subsystems.yaml` merging gateway+routing	Merged subsystem with combined files
Cross-cutting detection	Fixture `utils/` (high fan-in)	Tagged as `cross-cutting`
Empty subsystem	Fixture `channels/` (2 files, no internal calls)	Valid subsystem with 0 internal edges
Orphan file	`config/schema.ts` (no inbound)	Assigned to `config` subsystem, not dropped

7B: Contract Extractor (`contracts.js`)

Purpose: Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.

What to extract:

interface Foo { ... } → entity type Interface, with fields as properties
type Foo = { ... } → entity type TypeAlias
enum Foo { ... } → entity type Enum, with members
Exported const objects used as config defaults → entity type ConfigContract
YAML schema keys (from config files) → entity type ConfigSchema

Relationships:

IMPLEMENTS — class → interface
ACCEPTS — function parameter → interface/type (function signature contracts)
RETURNS — function → return type
EXTENDS — interface → interface

Error handling:

If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
Re-exported interfaces (export { Foo } from './types') are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
Deeply nested type literals (>3 levels) are flattened to the `object` type to avoid graph bloat
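The flattening rule can be sketched as a depth-limited rewrite of a type tree. The `{ kind: "literal", fields }` / `{ kind: "ref", name }` shape is a hypothetical intermediate representation, not something tree-sitter produces directly. The top-level literal counts as level 1, so a fourth nested literal becomes `object`. Other type forms (unions, arrays) are passed through unchanged in this sketch.

```javascript
// Hypothetical type IR:
//   { kind: "literal", fields: { [name]: TypeNode } }  -- an inline { ... } type
//   { kind: "ref", name: "string" }                    -- a named or primitive type
function flattenType(node, level = 1, maxLevel = 3) {
  if (node.kind !== "literal") return node; // non-literal types are left as-is
  if (level > maxLevel) return { kind: "ref", name: "object" };
  const fields = {};
  for (const [key, child] of Object.entries(node.fields)) {
    fields[key] = flattenType(child, level + 1, maxLevel);
  }
  return { kind: "literal", fields };
}
```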

Tests (7B):

Test	Input	Expected
Interface extraction	`gateway/types.ts` with 3 interfaces	3 Interface entities with correct fields
Type alias	`type SessionKey = string`	1 TypeAlias entity
Enum extraction	`enum Status { Active, Inactive }`	1 Enum entity with 2 members
Re-exported interface	`gateway/types.ts` re-exports from `config/types.ts`	Resolved to original definition
Parse failure	Malformed TS file	Skipped with warning, no crash
Recall benchmark	Fixture repo	≥80% of `expected-contracts.json` extracted

7C: Flow Tracer (`flow.js`)

Purpose: Given an entry point, walk the call graph across subsystem boundaries and produce a sequenced narrative of the data flow.

Algorithm:

Start at entry point entity (e.g., telegram/bot-handlers.ts:onMessage)
BFS through CALLS edges, recording subsystem transitions
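Cycle detection: Maintain a visited set per trace and never re-enter a visited node. If the revisited node is an ancestor of the current node on the trace path (a back edge), record the cycle and stop that branch; other revisits are converging paths and are simply skipped.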
God object pruning: Before tracing, compute in-degree for all nodes. Nodes with in-degree > godThreshold (default: 50) are excluded from traversal (they're utility functions called by everything — not meaningful flow participants). Logged as "excluded high-connectivity nodes."
Depth limit: Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5, so the depth budget is spent mainly on subsystem crossings.
Test file exclusion: Skip any file matching *.test.*, *.spec.*, test/, __tests__/.
At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
Output: ordered list of subsystem hops with the specific function calls that cross boundaries
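The algorithm above can be sketched as follows. Assumptions not in the spec: CALLS edges are `[fromId, toId]` pairs, ids look like `dir/file.ts:fn`, and the subsystem is the first path segment. Because intra-subsystem and cross-subsystem hops carry different weights, plain FIFO BFS could first reach a node along a costlier path. This sketch therefore expands nodes level by level in half-step buckets (a bucket queue, i.e. BFS generalized to two edge weights), so each node is settled once at its smallest depth. The 5s timeout is a cooperative deadline check inside the loop. `traceFlow` and its options object are hypothetical names.

```javascript
// Hypothetical sketch of the 7C tracer. Assumed (not from the spec): CALLS edges
// are [fromId, toId] pairs, ids look like "dir/file.ts:fn", and the subsystem
// is the first path segment (the 7A directory default).
function subsystemOf(id) {
  return id.split("/")[0];
}

// Spec rule: skip *.test.*, *.spec.*, test/ and __tests__/ files.
function isTestFile(id) {
  const file = id.split(":")[0];
  return /\.(test|spec)\./.test(file) || /(^|\/)(test|__tests__)\//.test(file);
}

function traceFlow(edges, entry, { maxDepth = 8, godThreshold = 50, timeoutMs = 5000 } = {}) {
  const adj = new Map();
  const inDegree = new Map();
  for (const [from, to] of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from).push(to);
    inDegree.set(to, (inDegree.get(to) || 0) + 1);
  }

  // Depth is counted in half-steps: intra hop = 1 (0.5), crossing = 2 (1.0).
  const maxHalf = maxDepth * 2;
  const buckets = Array.from({ length: maxHalf + 1 }, () => []);
  const best = new Map([[entry, 0]]);
  const parent = new Map([[entry, null]]);
  const settled = new Set();
  const excluded = new Set();
  const cycles = [];
  const flow = [];
  const start = Date.now();
  let timedOut = false;
  buckets[0].push(entry);

  // True if `a` is `node` itself or an ancestor of it on the trace path.
  const onPath = (a, node) => {
    for (let x = node; x != null; x = parent.get(x)) if (x === a) return true;
    return false;
  };

  // Bucket queue: every push lands in a later bucket (weights are >= 1 half-step),
  // so iterating buckets[d] while pushing elsewhere is safe.
  for (let d = 0; d <= maxHalf && !timedOut; d++) {
    for (const u of buckets[d]) {
      if (Date.now() - start > timeoutMs) { timedOut = true; break; }
      if (settled.has(u) || best.get(u) !== d) continue; // stale queue entry
      settled.add(u);
      const step = { subsystem: subsystemOf(u), entity: u, depth: d / 2 };
      const p = parent.get(u);
      if (p != null && subsystemOf(p) !== step.subsystem) {
        step.crossedVia = "CALLS";
        step.from = p;
      }
      flow.push(step);

      for (const v of adj.get(u) || []) {
        if (isTestFile(v)) continue;
        if ((inDegree.get(v) || 0) > godThreshold) {
          excluded.add(v); // god object: never traversed
          continue;
        }
        if (settled.has(v)) {
          // Back edge to an ancestor closes a cycle; other revisits just converge.
          if (onPath(v, u)) cycles.push({ from: u, to: v });
          continue;
        }
        const nd = d + (subsystemOf(u) === subsystemOf(v) ? 1 : 2);
        if (nd <= maxHalf && nd < (best.has(v) ? best.get(v) : Infinity)) {
          best.set(v, nd);
          parent.set(v, u);
          buckets[nd].push(v);
        }
      }
    }
  }

  const subsystemSequence = [];
  for (const s of flow) {
    if (!subsystemSequence.includes(s.subsystem)) subsystemSequence.push(s.subsystem);
  }
  return { entryPoint: entry, depth: maxDepth, godThreshold, excludedNodes: [...excluded].sort(),
    cyclesDetected: cycles, flow, subsystemSequence, timedOut };
}
```

The result can be compared field by field with `expected-flows.json`. Note that this sketch records cycles as `{ from, to }` (the edge that closes the cycle) rather than the `at`/`backEdgeTo` names in the example output.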

Output (deterministic JSON — testable without LLM):

{
  "entryPoint": "telegram/bot-handlers.ts:onMessage",
  "depth": 8,
  "godThreshold": 50,
  "excludedNodes": ["utils/logger.ts:log", "config/config.ts:getConfig"],
  "cyclesDetected": [
    { "at": "gateway/session.ts:loadSession", "backEdgeTo": "agents/runner.ts:runAgent" }
  ],
  "flow": [
    { "subsystem": "telegram", "entity": "telegram/bot-handlers.ts:onMessage", "depth": 0 },
    { "subsystem": "routing", "entity": "routing/session-key.ts:resolveKey", "depth": 1, "crossedVia": "CALLS" },
    { "subsystem": "gateway", "entity": "gateway/session.ts:loadSession", "depth": 2, "crossedVia": "CALLS" },
    { "subsystem": "agents", "entity": "agents/runner.ts:runAgent", "depth": 3, "crossedVia": "CALLS" }
  ],
  "subsystemSequence": ["telegram", "routing", "gateway", "agents"]
}

LLM narration (separate step): The deterministic JSON flow is the testable artifact. LLM narration is applied afterwards, as a formatting pass in 7D. This means:

Flow correctness is tested against expected-flows.json (deterministic)
LLM prose quality is evaluated separately (human review, not CI)

Performance guarantee: The visited set ensures each node is expanded at most once, so a trace is bounded by O(V+E); god object pruning and the depth limit shrink the reachable subgraph further. On the OpenClaw graph (23k nodes, 142k edges), traces must complete in <5 seconds. If a trace exceeds 5s, it is aborted and logged as a timeout.

Tests (7C):

Test	Input	Expected
Simple linear flow	Fixture entry point A→B→C across 3 subsystems	Matches `expected-flows.json`
Cycle detection	Fixture circular dep gateway↔agents	Cycle recorded, trace continues without loop
God object exclusion	Entry point that calls `utils/logger.ts:log` (high in-degree)	`log` excluded from trace
Depth limit	Deep call chain (>8 hops)	Trace stops at depth 8
Test file exclusion	Entry point that calls a test helper	Test file skipped
Performance	OpenClaw full snapshot	<5s wall clock
Empty trace	Entry point with no outgoing CALLS	Returns flow with single entry, no hops

7D: Hierarchical Doc Generator (`sysdoc.js`)

Purpose: Orchestrate 7A-7C to produce a complete documentation site in Divio structure, embedding the 7E diagrams.

Output structure:

docs/
├── tutorials/
│   └── (human-authored only — not auto-generated)
├── reference/
│   ├── system-architecture.md      ← from subsystem aggregator + dependency matrix
│   ├── subsystems/
│   │   ├── gateway.md              ← per-subsystem: purpose, exports, deps, key modules
│   │   ├── agents.md
│   │   └── ...
│   ├── contracts/
│   │   ├── session-types.md        ← from contract extractor
│   │   └── ...
│   └── modules/
│       └── (existing file-level docs from Phase 6)
├── explanation/
│   ├── architecture-patterns.md    ← from dependency matrix analysis
│   ├── data-flows.md              ← from flow tracer (LLM-narrated flow traces)
│   └── design-decisions.md        ← from architecture.md ingestion + commit history

Divio category mapping (corrected):

Tutorials: Human-authored only. Not generated.
Reference: System architecture, per-subsystem docs, contracts, module docs. Deterministic structure with LLM-generated prose.
Explanation: Architecture patterns (from dependency analysis), data flows (from flow traces — these explain how the system works, not how to do a task), design decisions (from architecture.md + commit history).
How-To: Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.

Generation pipeline:

Run subsystem aggregator → subsystem map + dependency matrix
Run contract extractor → interface/type entities added to graph
Run flow tracer on configured entry points → deterministic flow JSONs
For each subsystem: generate reference doc (LLM with subsystem context + architecture.md sections)
Generate system architecture overview (LLM with full dependency matrix)
Generate data flow explanations (LLM narrates flow JSONs into prose)
Generate Mermaid diagrams (7E) and embed in docs

Incremental updates with cascading invalidation:

Semantic diff identifies changed files
Map changed files → directly affected subsystems (set A)
For each subsystem in A, find all subsystems that depend on it (set B = dependents of A in dependency matrix)
Regeneration set = A ∪ B
System architecture overview regenerated only if dependency matrix changed (new/removed inter-subsystem edges)
Flow traces regenerated only if any entity in the trace path was modified
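A sketch of the regeneration set and the architecture-overview check. It reads "depend on it" literally as direct dependents (one hop in the matrix); a transitive closure would be a small extension if deeper cascades are wanted. `regenerationSet` and `matrixEdgesChanged` are hypothetical names, and matrix keys are assumed to be `from→to`, meaning `from` depends on `to`.

```javascript
// Hypothetical: A = subsystems owning changed files, B = direct dependents of A.
function regenerationSet(changedFiles, subsystemOfFile, dependencyMatrix) {
  const direct = new Set();
  for (const file of changedFiles) {
    const s = subsystemOfFile(file);
    if (s) direct.add(s);
  }
  const result = new Set(direct); // A
  for (const key of Object.keys(dependencyMatrix)) {
    const [from, to] = key.split("→");
    if (direct.has(to)) result.add(from); // ∪ B
  }
  return [...result].sort();
}

// The overview is regenerated only when inter-subsystem edges appear or disappear;
// changes to edge counts alone do not trigger it.
function matrixEdgesChanged(before, after) {
  const oldKeys = Object.keys(before);
  const newKeys = new Set(Object.keys(after));
  return oldKeys.length !== newKeys.size || oldKeys.some((k) => !newKeys.has(k));
}
```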

Tests (7D):

Test	Input	Expected
Full generation	Fixture repo	Correct directory structure with all expected .md files
Section completeness	Generated subsystem doc	Contains: Purpose, Key Modules, Public API, Dependencies sections
Incremental: direct change	Modify `gateway/server.ts`	Only `gateway.md` + dependents regenerated
Incremental: cascading	Modify `config/types.ts` (shared)	`config.md` + all subsystems importing config regenerated
Incremental: no-op	No semantic diff	Zero files regenerated
Architecture.md ingestion	Fixture with `architecture.md`	LLM prompt includes architecture.md content
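For the incremental tests, regeneration can be detected from content hashes instead of mtimes. A minimal sketch using Node's built-in `crypto` module (so no new dependency). Generated docs are modelled as a `{ path: content }` map, and both helper names are hypothetical. Hashing only detects content changes: a doc rewritten with byte-identical output is not flagged.

```javascript
const crypto = require("node:crypto");

// Hypothetical: md5 of each generated doc, keyed by its path under docs/.
function hashDocs(docs) {
  const hashes = {};
  for (const [docPath, content] of Object.entries(docs)) {
    hashes[docPath] = crypto.createHash("md5").update(content).digest("hex");
  }
  return hashes;
}

// Docs whose hash differs (or which were added/removed) between two runs.
function regeneratedDocs(before, after) {
  const paths = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...paths].filter((p) => before[p] !== after[p]).sort();
}
```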

7E: Diagram Generator (`diagrams.js`)

Purpose: Auto-generate Mermaid diagrams from graph analysis outputs.

Diagram types:

Subsystem Dependency Graph (from 7A dependency matrix)
- Nodes = subsystems (excluding cross-cutting)
- Edges = inter-subsystem CALLS/IMPORTS with edge weight labels
- Cross-cutting subsystems shown as a separate "Shared" cluster
Flow Sequence Diagram (from 7C flow traces)
- Participants = subsystems in flow order
- Messages = function calls at boundary crossings
- Cycles shown as self-referencing notes
Contract Relationship Diagram (from 7B contracts)
- Classes/interfaces with fields
- IMPLEMENTS/EXTENDS relationships as arrows
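A sketch of the first diagram type, emitting Mermaid `flowchart` source from the dependency matrix. Edges touching a cross-cutting subsystem are dropped, and those subsystems are listed in a `Shared` subgraph. The edge-label format and the id sanitization are assumptions, and `dependencyDiagram` is a hypothetical name.

```javascript
// Hypothetical Mermaid emitter for the subsystem dependency graph.
function dependencyDiagram(matrix, crossCutting = []) {
  const shared = new Set(crossCutting);
  const id = (name) => name.replace(/[^A-Za-z0-9_]/g, "_"); // safe Mermaid node id
  const domain = new Set();
  const edges = [];
  for (const key of Object.keys(matrix).sort()) {
    const [from, to] = key.split("→");
    if (shared.has(from) || shared.has(to)) continue; // keep the hairball out
    const { calls = 0, imports = 0 } = matrix[key];
    domain.add(from);
    domain.add(to);
    edges.push(`  ${id(from)} -->|${calls} calls, ${imports} imports| ${id(to)}`);
  }
  const lines = ["flowchart LR"];
  for (const s of [...domain].sort()) lines.push(`  ${id(s)}["${s}"]`);
  if (shared.size > 0) {
    lines.push("  subgraph Shared");
    for (const s of [...shared].sort()) lines.push(`    ${id(s)}["${s}"]`);
    lines.push("  end");
  }
  return [...lines, ...edges].join("\n");
}
```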

Rendering: Use mmdr (Rust Mermaid renderer) to produce SVG. Embed in generated Markdown docs as ![diagram](./diagrams/subsystem-deps.svg).

Tests (7E):

Test	Input	Expected
Dependency diagram	Fixture dependency matrix	Valid Mermaid syntax, matches `expected-diagrams/deps.mmd`
Sequence diagram	Fixture flow trace	Valid Mermaid syntax, correct participant order
Contract diagram	Fixture contracts	Valid Mermaid syntax, correct relationships
Rendering	Any generated .mmd file	mmdr produces valid SVG without errors

Architecture.md Ingestion

Each repo may contain human-written architecture documentation. The pipeline:

Discovery: Scan for architecture.md, docs/architecture.md, ARCHITECTURE.md, and docs/design.md, relative to the repo root
Parsing: Extract sections (headings → content blocks) as structured context
Injection: When generating subsystem docs or explanation docs, include relevant architecture.md sections in the LLM prompt alongside graph data
Diff tracking: If architecture.md changes between releases, flag it in the semantic diff as a documentation-relevant change
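Discovery and parsing might look like the following sketch. It assumes the repo's file list is already available as paths relative to the root, picks the first candidate in the order listed above, and splits on ATX headings (`#` through `######`) while ignoring `#` lines inside fenced code blocks. Setext headings and other Markdown details are not handled. Function names are hypothetical.

```javascript
// Hypothetical discovery: first candidate present in the repo's file list wins.
const CANDIDATES = ["architecture.md", "docs/architecture.md", "ARCHITECTURE.md", "docs/design.md"];

function findArchitectureDoc(repoFiles) {
  const present = new Set(repoFiles);
  return CANDIDATES.find((c) => present.has(c)) || null;
}

// Hypothetical parser: headings -> { heading, level, content } blocks.
function parseSections(markdown) {
  const sections = [];
  let current = { heading: null, level: 0, lines: [] }; // preamble before any heading
  let inFence = false;
  const flush = () => {
    const content = current.lines.join("\n").trim();
    if (current.heading !== null || content) {
      sections.push({ heading: current.heading, level: current.level, content });
    }
  };
  for (const line of markdown.split(/\r?\n/)) {
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence; // "# ..." inside code is not a heading
    const m = inFence ? null : /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (m) {
      flush();
      current = { heading: m[2], level: m[1].length, lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return sections;
}
```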

Cross-Repo Output Model

Two output modes:

Per-repo (reference only):

Subsystem architecture docs
Contract reference
Module reference
Mermaid diagrams
Useful for repo maintainers

Unified (full Divio):

Merges per-repo graphs via namespace registry (Phase 3) into super-graph
Runs 7A-7E on super-graph
Generates cross-repo flow traces and dependency diagrams
Includes human-authored tutorials and explanation docs
Useful for platform consumers and new engineers

Implementation Phases

Phase	Module	Effort	Depends On
7-fixtures	Ground truth fixture repo	0.5 day	—
7A	`subsystem.js` + tests	1 day	graph.js, fixtures
7B	`contracts.js` + tests	2 days	extract.js, fixtures
7C	`flow.js` + tests	2 days	graph.js, subsystem.js, fixtures
7D	`sysdoc.js` + tests	2 days	7A, 7B, 7C, docgen.js
7E	`diagrams.js` + tests	1 day	7A, 7C, 7B
7F	`supergraph.js` (Multi-repo Merge)	1 day	namespace.js, graph.js

Total: ~9.5 days

Critical path: fixtures → 7A → 7C → 7D (5.5 days). 7B and 7F can run in parallel with the core phases; 7E can start once 7A, 7B, and 7C are done and run alongside 7D.

Build loop (BMad Wiggum): Each phase follows: build → test → BMad review → revise → re-review until GO.

Constraints

No new external dependencies (same as Phases 1-5)
LLM calls only for prose generation — all structural analysis is deterministic
tree-sitter@0.21.1 compatibility maintained
Templates are Markdown with simple mustache-style slots (no template engine dependency — string replacement)
Must work on OpenClaw codebase (4,325 files) as primary benchmark
Foxtrot repos are not available in this environment — design must work from any repo's graph snapshot
Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).
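A minimal sketch of the string-replacement template fill named in the constraints. The `{{ slot }}` syntax is an assumption about what "mustache-style" means here. Unknown slots are left in place so missing data stays visible in review. `fillTemplate` is a hypothetical name.

```javascript
// Hypothetical template fill: replaces {{ name }} slots with values from `slots`.
function fillTemplate(template, slots) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(slots, key) ? String(slots[key]) : match
  );
}
```

Passing a function to `String.prototype.replace` means `$` sequences in slot values are inserted literally rather than being treated as replacement patterns, which matters when LLM prose lands in a slot.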

Resolved Decisions

Tutorials: Human-authored only. Flow traces inform but don't generate tutorials — domain knowledge required.
Design decisions: Infer from commit history + semantic diffs AND parse architecture.md from each repo.
Cross-repo: Both per-repo (reference) and unified (full Divio). Different audiences.
Mermaid diagrams: Yes, via 7E. Three diagram types: dependency, sequence, contract.
Architecture.md ingestion: Parsed and injected as LLM context for subsystem and explanation docs.
Flow traces are Explanation, not How-To: Corrected Divio mapping. How-To deferred from MVP.
LLM output is not CI-tested: All testable artifacts are deterministic JSON. LLM prose is a formatting pass evaluated by human review.
