Phase 8: Helm chart extraction with Go template support

- extract-helm.js: strips Go templates, parses Chart.yaml/values.yaml/templates
- Extracts K8s resource kinds, cross-chart interactions, shared secrets, ports
- generateHelmDiagram() for Mermaid interaction graphs
- Integrated into sysdoc.js: Helm entities merge into main knowledge graph
- Dir-based filenames to handle duplicate chart names
- .gitignore for node_modules, snapshots, venv, wasm
- 76 charts, 1813 entities, 1769 relationships on Foxtrot
Jarvis Prime
2026-03-09 20:03:04 +00:00
parent d19cee36d7
commit f49a6c2dd9
7 changed files with 1161 additions and 78 deletions


@@ -1,6 +1,6 @@
# Dev Intel Pipeline v2 — Phase 7: System-Level Documentation Generation
**Status:** DRAFT
**Status:** DRAFT v2 (post-SPA Round 1)
**Author:** Max (AI) + Brian (Human)
**Date:** 2026-03-09
**Depends on:** Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)
@@ -19,14 +19,47 @@ Extend the V2 pipeline to generate Foxtrot-quality system documentation from the
## Success Criteria
| Metric | Target |
|--------|--------|
| Subsystem detection accuracy | ≥90% of modules correctly clustered |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured |
| Contract extraction recall | ≥80% of exported interfaces/types extracted |
| Generated doc structure | Matches Divio 4-category template |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) |
All metrics are validated against a **ground truth fixture repository** (`test/fixtures/system-docs/`) containing a hand-labeled mini codebase (~30 files across 5 subsystems) with expected outputs for each module.
| Metric | Target | How Measured |
|--------|--------|-------------|
| Subsystem detection accuracy | ≥90% of modules correctly clustered | Compare `subsystem.js` output against `expected-subsystems.json` fixture. Accuracy = correctly assigned files / total files. |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured | Compare dependency matrix against `expected-deps.json`. Recall = captured edges / expected edges. |
| Contract extraction recall | ≥80% of exported interfaces/types extracted | Compare extracted contracts against `expected-contracts.json`. Recall = extracted / total annotated. |
| Generated doc structure | Matches Divio 4-category template | Structural assertion: verify directory layout, required sections present in each generated .md file. |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated | Apply a mock diff to the fixture and assert that only the expected subsystem docs are regenerated. Compare content hashes (e.g. md5sum) rather than mtimes, which are flaky. |
| Cascading invalidation | Shared subsystem change propagates to dependents | Apply a diff to a shared subsystem in fixture, assert dependent subsystem docs are also flagged for regeneration. |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) | BACKLOGGED — measure token count statically in CI (e.g. via `tiktoken`) without hitting API. |
| Flow tracer terminates | All traces complete in <5s on 4,325-file graph | Wall-clock assertion on OpenClaw snapshot. |
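
The accuracy and recall formulas in the table above can be sketched as two small helpers. This is an illustrative sketch only: the shapes of `expected-subsystems.json` (a file-to-subsystem map) and `expected-deps.json` (a list of `"from→to"` edge strings) are assumptions, and the function names `clusteringAccuracy` and `edgeRecall` are hypothetical.

```javascript
// Assumption: both maps look like { "path/to/file.ts": "subsystemName" }.
function clusteringAccuracy(actual, expected) {
  const files = Object.keys(expected);
  if (files.length === 0) return 1; // nothing to misassign
  const correct = files.filter((f) => actual[f] === expected[f]).length;
  return correct / files.length; // correctly assigned files / total files
}

// Assumption: edges are strings such as "gateway→agents".
function edgeRecall(capturedEdges, expectedEdges) {
  if (expectedEdges.length === 0) return 1;
  const captured = new Set(capturedEdges);
  const hits = expectedEdges.filter((e) => captured.has(e)).length;
  return hits / expectedEdges.length; // captured edges / expected edges
}

const expected = {
  "gateway/server.ts": "gateway",
  "agents/runner.ts": "agents",
  "utils/logger.ts": "utils",
  "config/config.ts": "config",
};
// One file is deliberately misassigned in this example.
const actual = { ...expected, "utils/logger.ts": "gateway" };

const accuracy = clusteringAccuracy(actual, expected);
const recall = edgeRecall(["gateway→agents"], ["gateway→agents", "agents→config"]);
console.log(accuracy, recall);
```

A CI assertion would then compare these values against the 0.90 and 0.85 targets.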
## Ground Truth Fixture Repository
Located at `test/fixtures/system-docs/`. Contains:
```
test/fixtures/system-docs/
├── src/
│ ├── gateway/ (5 files: server.ts, session.ts, middleware.ts, types.ts, utils.ts)
│ ├── agents/ (5 files: runner.ts, scope.ts, tools.ts, types.ts, defaults.ts)
│ ├── channels/
│ │ ├── telegram.ts
│ │ └── discord.ts
│ ├── config/ (3 files: config.ts, schema.ts, types.ts)
│ └── utils/ (3 files: logger.ts, crypto.ts, fs-helpers.ts)
├── expected-subsystems.json ← hand-labeled subsystem assignments
├── expected-deps.json ← hand-labeled inter-subsystem edges
├── expected-contracts.json ← hand-labeled interfaces/types
├── expected-flows.json ← hand-labeled flow traces for 2 entry points
├── expected-diagrams/ ← expected Mermaid source for each diagram type
└── architecture.md ← mock architecture doc for ingestion testing
```
**Edge cases included in fixtures:**
- `utils/` as a cross-cutting concern (high fan-in, should be tagged as `cross-cutting`)
- Circular dependency: `gateway/session.ts` ↔ `agents/runner.ts` (mutual CALLS)
- Orphan file: `config/schema.ts` (no inbound edges, only exports)
- Re-exported interface: `gateway/types.ts` re-exports from `config/types.ts`
- Empty subsystem: `channels/` has only 2 files with no internal CALLS edges
## Architecture
@@ -54,6 +87,12 @@ Extend the V2 pipeline to generate Foxtrot-quality system documentation from the
3. **Graph-based (future):** Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
**Cross-cutting concern detection:**
Subsystems where >60% of edges are **inbound** from other subsystems (high fan-in — many subsystems depend on them, but they depend on almost nothing) are automatically tagged as `cross-cutting`. Examples: `utils/`, `config/`, `types/`. The metric is `inbound_edges / total_edges > 0.6`. Cross-cutting subsystems are:
- Excluded from the dependency matrix visualization (reduces hairball)
- Documented separately as "Shared Infrastructure" in the reference docs
- Still tracked in the raw dependency data for completeness
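
A minimal sketch of the `inbound_edges / total_edges > 0.6` check follows. The input shape (a flat list of inter-subsystem edges with a `count`) and the function name `detectCrossCutting` are assumptions made for illustration, not the actual `subsystem.js` interface.

```javascript
// Assumption: edges are { from, to, count } records between subsystems.
function detectCrossCutting(edges, threshold = 0.6) {
  const stats = new Map(); // subsystem name -> { inbound, total }
  const bump = (name, isInbound, count) => {
    const s = stats.get(name) ?? { inbound: 0, total: 0 };
    s.total += count;
    if (isInbound) s.inbound += count;
    stats.set(name, s);
  };
  for (const { from, to, count } of edges) {
    if (from === to) continue; // intra-subsystem edges are not counted
    bump(from, false, count); // outbound for the caller
    bump(to, true, count); // inbound for the callee
  }
  return [...stats]
    .filter(([, s]) => s.total > 0 && s.inbound / s.total > threshold)
    .map(([name]) => name)
    .sort();
}

const edges = [
  { from: "gateway", to: "utils", count: 40 },
  { from: "agents", to: "utils", count: 30 },
  { from: "gateway", to: "agents", count: 89 },
  { from: "agents", to: "gateway", count: 60 },
  { from: "utils", to: "config", count: 5 },
];
const crossCutting = detectCrossCutting(edges);
console.log(crossCutting);
```

Note that under this metric any pure sink is tagged, even one with very few inbound edges (here `config` with a single edge group), so a minimum edge count may be worth adding.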
**Output:**
```json
{
@@ -61,21 +100,31 @@ Extend the V2 pipeline to generate Foxtrot-quality system documentation from the
{
"name": "gateway",
"label": "Session & Request Gateway",
"files": ["gateway/session-utils.ts", "gateway/server.ts", ...],
"kind": "domain",
"files": ["gateway/session-utils.ts", "gateway/server.ts"],
"entities": { "functions": 142, "classes": 3, "modules": 28 },
"publicExports": ["deriveSessionTitle", "loadSessionEntry", ...],
"publicExports": ["deriveSessionTitle", "loadSessionEntry"],
"internalDeps": [{"from": "gateway", "to": "agents", "edges": 89, "type": "CALLS"}],
"externalDeps": ["commander", "node:fs", "node:path"]
}
],
"crossCutting": ["utils", "config"],
"dependencyMatrix": {
"gateway→agents": { "calls": 89, "imports": 34 },
"agents→config": { "calls": 156, "imports": 120 },
...
"agents→config": { "calls": 156, "imports": 120 }
}
}
```
**Tests (7A):**
| Test | Input | Expected |
|------|-------|----------|
| Directory clustering | Fixture repo | Matches `expected-subsystems.json` (5 subsystems) |
| Config override | Fixture + `subsystems.yaml` merging gateway+routing | Merged subsystem with combined files |
| Cross-cutting detection | Fixture `utils/` (high fan-in) | Tagged as `cross-cutting` |
| Empty subsystem | Fixture `channels/` (2 files, no internal calls) | Valid subsystem with 0 internal edges |
| Orphan file | `config/schema.ts` (no inbound) | Assigned to `config` subsystem, not dropped |
### 7B: Contract Extractor (`contracts.js`)
**Purpose:** Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.
@@ -93,8 +142,20 @@ Extend the V2 pipeline to generate Foxtrot-quality system documentation from the
- `RETURNS` — function → return type
- `EXTENDS` — interface → interface
**Why this matters:**
Foxtrot docs define explicit contracts: "`accountCreation` expects `reltioCustomerId: string`". Without extracting interfaces/types, we can't generate contract documentation. The LLM has to guess from function bodies, which is unreliable.
**Error handling:**
- If tree-sitter fails to parse a file, skip it and log a warning (same as Phase 1 extract.js behavior)
- Re-exported interfaces (`export { Foo } from './types'`) are tracked via the existing IMPORTS edge; the contract extractor resolves the original definition
- Deeply nested type literals (>3 levels) are flattened to `object` to avoid graph bloat
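
The flattening rule for deeply nested type literals might look like the sketch below. It assumes a simplified representation in which a type literal is a plain nested object whose leaves are type-name strings, and it assumes the top-level literal counts as level 1; the real extractor works on tree-sitter nodes, and `flattenType` is a hypothetical name.

```javascript
// Assumption: a type literal is a nested object, leaves are type names.
function flattenType(node, maxDepth = 3, depth = 1) {
  if (typeof node !== "object" || node === null) return node; // leaf type name
  if (depth > maxDepth) return "object"; // too deep: collapse to `object`
  return Object.fromEntries(
    Object.entries(node).map(([key, value]) => [
      key,
      flattenType(value, maxDepth, depth + 1),
    ])
  );
}

const nested = { id: "string", a: { b: { c: { d: "string" } } } };
const flattened = flattenType(nested);
console.log(JSON.stringify(flattened));
```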
**Tests (7B):**
| Test | Input | Expected |
|------|-------|----------|
| Interface extraction | `gateway/types.ts` with 3 interfaces | 3 Interface entities with correct fields |
| Type alias | `type SessionKey = string` | 1 TypeAlias entity |
| Enum extraction | `enum Status { Active, Inactive }` | 1 Enum entity with 2 members |
| Re-exported interface | `gateway/types.ts` re-exports from `config/types.ts` | Resolved to original definition |
| Parse failure | Malformed TS file | Skipped with warning, no crash |
| Recall benchmark | Fixture repo | ≥80% of `expected-contracts.json` extracted |
### 7C: Flow Tracer (`flow.js`)
@@ -102,25 +163,50 @@ Foxtrot docs define explicit contracts: "`accountCreation` expects `reltioCustom
**Algorithm:**
1. Start at entry point entity (e.g., `telegram/bot-handlers.ts:onMessage`)
2. BFS/DFS through CALLS edges, recording subsystem transitions
3. At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
4. Prune: stop at depth N (configurable, default 5), skip test files, skip utility functions below a connectivity threshold
5. Output: ordered list of subsystem hops with the specific function calls that cross boundaries
2. BFS through CALLS edges, recording subsystem transitions
3. **Cycle detection:** Maintain a visited set per trace. If a node is revisited, record the cycle and stop that branch (do not re-enter).
4. **God object pruning:** Before tracing, compute in-degree for all nodes. Nodes with in-degree > `godThreshold` (default: 50) are excluded from traversal (they're utility functions called by everything — not meaningful flow participants). Logged as "excluded high-connectivity nodes."
5. **Depth limit:** Stop at depth N (configurable, default 8). Each subsystem boundary crossing increments depth by 1; intra-subsystem hops increment by 0.5 (prioritizes cross-subsystem flow).
6. **Test file exclusion:** Skip any file matching `*.test.*`, `*.spec.*`, `test/`, `__tests__/`.
7. At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
8. Output: ordered list of subsystem hops with the specific function calls that cross boundaries
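
The steps above can be sketched as a single BFS. This is a sketch under stated assumptions, not the `flow.js` implementation: the CALLS graph is assumed to be a plain adjacency object, the subsystem is assumed to be the first path segment of the entity ID, and `traceFlow` is a hypothetical name. One caveat: in a BFS, a revisit can also be a diamond (two paths converging) rather than a true cycle; the sketch records both, as step 3 is written.

```javascript
const TEST_FILE = /\.(test|spec)\.|(^|\/)(test|__tests__)\//;

// Assumption: `calls` maps an entity ID to the entity IDs it calls.
function traceFlow(calls, entryPoint, { maxDepth = 8, godThreshold = 50 } = {}) {
  const subsystemOf = (entity) => entity.split("/")[0]; // assumed convention

  // In-degree over the whole CALLS graph, used for god object pruning.
  const inDegree = new Map();
  for (const callees of Object.values(calls)) {
    for (const c of callees) inDegree.set(c, (inDegree.get(c) ?? 0) + 1);
  }

  const visited = new Set([entryPoint]);
  const excluded = new Set();
  const cyclesDetected = [];
  const flow = [{ subsystem: subsystemOf(entryPoint), entity: entryPoint, depth: 0 }];
  const queue = [{ entity: entryPoint, depth: 0 }];

  while (queue.length > 0) {
    const { entity, depth } = queue.shift();
    for (const callee of calls[entity] ?? []) {
      if (TEST_FILE.test(callee)) continue; // test file exclusion
      if ((inDegree.get(callee) ?? 0) > godThreshold) {
        excluded.add(callee); // god object pruning
        continue;
      }
      if (visited.has(callee)) {
        cyclesDetected.push({ at: entity, backEdgeTo: callee });
        continue; // do not re-enter this branch
      }
      const crossed = subsystemOf(callee) !== subsystemOf(entity);
      const nextDepth = depth + (crossed ? 1 : 0.5);
      if (nextDepth > maxDepth) continue; // depth limit
      visited.add(callee);
      if (crossed) {
        flow.push({ subsystem: subsystemOf(callee), entity: callee, depth: nextDepth, crossedVia: "CALLS" });
      }
      queue.push({ entity: callee, depth: nextDepth });
    }
  }
  return {
    entryPoint,
    depth: maxDepth,
    godThreshold,
    excludedNodes: [...excluded],
    cyclesDetected,
    flow,
    subsystemSequence: flow.map((hop) => hop.subsystem),
  };
}

const calls = {
  "telegram/bot-handlers.ts:onMessage": ["routing/session-key.ts:resolveKey", "utils/logger.ts:log"],
  "routing/session-key.ts:resolveKey": ["gateway/session.ts:loadSession", "utils/logger.ts:log"],
  "gateway/session.ts:loadSession": ["agents/runner.ts:runAgent", "utils/logger.ts:log"],
  "agents/runner.ts:runAgent": ["gateway/session.ts:loadSession", "agents/runner.test.ts:helper"],
};
// godThreshold is lowered to 2 so the tiny example graph triggers pruning.
const trace = traceFlow(calls, "telegram/bot-handlers.ts:onMessage", { godThreshold: 2 });
console.log(JSON.stringify(trace, null, 2));
```

Because every node enters the queue at most once, the loop is bounded by the number of nodes and edges reachable within the depth limit.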
**Output:**
**Output (deterministic JSON — testable without LLM):**
```json
{
"entryPoint": "telegram/bot-handlers.ts:onMessage",
"depth": 8,
"godThreshold": 50,
"excludedNodes": ["utils/logger.ts:log", "config/config.ts:getConfig"],
"cyclesDetected": [
{ "at": "gateway/session.ts:loadSession", "backEdgeTo": "agents/runner.ts:runAgent" }
],
"flow": [
{ "subsystem": "telegram", "function": "onMessage", "action": "receives incoming message" },
{ "subsystem": "routing", "function": "routeInbound", "action": "routes to session handler", "crossedVia": "CALLS" },
{ "subsystem": "gateway", "function": "handleSession", "action": "loads session state", "crossedVia": "CALLS" },
{ "subsystem": "agents", "function": "runAgent", "action": "executes AI agent turn", "crossedVia": "CALLS" }
]
{ "subsystem": "telegram", "entity": "telegram/bot-handlers.ts:onMessage", "depth": 0 },
{ "subsystem": "routing", "entity": "routing/session-key.ts:resolveKey", "depth": 1, "crossedVia": "CALLS" },
{ "subsystem": "gateway", "entity": "gateway/session.ts:loadSession", "depth": 2, "crossedVia": "CALLS" },
{ "subsystem": "agents", "entity": "agents/runner.ts:runAgent", "depth": 3, "crossedVia": "CALLS" }
],
"subsystemSequence": ["telegram", "routing", "gateway", "agents"]
}
```
**LLM narration:** Feed the flow trace + source snippets at each hop to the LLM. Ask it to write a prose narrative: "When a Telegram message arrives, the bot handler dispatches it to the routing layer, which resolves the session key and..."
**LLM narration (separate step):** The deterministic JSON flow is the testable artifact. LLM narration is applied *after* as a formatting pass in 7D. This means:
- Flow correctness is tested against `expected-flows.json` (deterministic)
- LLM prose quality is evaluated separately (human review, not CI)
**Performance guarantee:** BFS with visited set + god object pruning + depth limit = O(V+E) bounded by depth. On the OpenClaw graph (23k nodes, 142k edges), traces must complete in <5 seconds. If a trace exceeds 5s, it is killed and logged as a timeout.
**Tests (7C):**
| Test | Input | Expected |
|------|-------|----------|
| Simple linear flow | Fixture entry point A→B→C across 3 subsystems | Matches `expected-flows.json` |
| Cycle detection | Fixture circular dep gateway↔agents | Cycle recorded in `cyclesDetected`; the revisiting branch stops and the rest of the trace completes without looping |
| God object exclusion | Entry point that calls `utils/logger.ts:log` (high in-degree) | `log` excluded from trace |
| Depth limit | Deep call chain (>8 hops) | Trace stops at depth 8 |
| Test file exclusion | Entry point that calls a test helper | Test file skipped |
| Performance | OpenClaw full snapshot | <5s wall clock |
| Empty trace | Entry point with no outgoing CALLS | Returns flow with single entry, no hops |
### 7D: Hierarchical Doc Generator (`sysdoc.js`)
@@ -130,9 +216,7 @@ Foxtrot docs define explicit contracts: "`accountCreation` expects `reltioCustom
```
docs/
├── tutorials/
│ └── (not auto-generated — requires human curation)
├── how-to/
│ └── (generated from flow traces of common operations)
│ └── (human-authored only — not auto-generated)
├── reference/
│ ├── system-architecture.md ← from subsystem aggregator + dependency matrix
│ ├── subsystems/
@@ -146,65 +230,118 @@ docs/
│ └── (existing file-level docs from Phase 6)
├── explanation/
│ ├── architecture-patterns.md ← from dependency matrix analysis
│ ├── data-flows.md ← from flow tracer
│ └── design-decisions.md ← (requires human input or commit history analysis)
│ ├── data-flows.md ← from flow tracer (LLM-narrated flow traces)
│ └── design-decisions.md ← from architecture.md ingestion + commit history
```
**Divio category mapping (corrected):**
- **Tutorials:** Human-authored only. Not generated.
- **Reference:** System architecture, per-subsystem docs, contracts, module docs. All deterministic structure + LLM prose.
- **Explanation:** Architecture patterns (from dependency analysis), data flows (from flow traces — these explain *how the system works*, not *how to do a task*), design decisions (from architecture.md + commit history).
- **How-To:** Not auto-generated in MVP. Requires domain-specific task knowledge. Deferred.
**Generation pipeline:**
1. Run subsystem aggregator → subsystem map + dependency matrix
2. Run contract extractor → interface/type entities added to graph
3. Run flow tracer on configured entry points → flow narratives
4. For each subsystem: generate reference doc (LLM with subsystem context)
3. Run flow tracer on configured entry points → deterministic flow JSONs
4. For each subsystem: generate reference doc (LLM with subsystem context + architecture.md sections)
5. Generate system architecture overview (LLM with full dependency matrix)
6. Generate data flow explanations (LLM with flow traces)
6. Generate data flow explanations (LLM narrates flow JSONs into prose)
7. Generate Mermaid diagrams (7E) and embed in docs
**Incremental updates:**
- Semantic diff identifies changed files
- Map changed files → affected subsystems
- Only regenerate docs for affected subsystems
- System architecture overview regenerated only if dependency matrix changed
**Incremental updates with cascading invalidation:**
1. Semantic diff identifies changed files
2. Map changed files → directly affected subsystems (set A)
3. For each subsystem in A, find all subsystems that depend on it (set B = dependents of A in dependency matrix)
4. Regeneration set = A ∪ B
5. System architecture overview regenerated only if dependency matrix changed (new/removed inter-subsystem edges)
6. Flow traces regenerated only if any entity in the trace path was modified
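
Steps 2 to 4 can be sketched as follows. The sketch assumes a file-to-subsystem lookup object and reuses the `"from→to"` key format of the `dependencyMatrix` output from 7A; `regenerationSet` is a hypothetical name. It follows the steps literally and collects direct dependents only, not transitive ones.

```javascript
// Assumption: fileToSubsystem looks like { "config/types.ts": "config" }.
function regenerationSet(changedFiles, fileToSubsystem, dependencyMatrix) {
  const direct = new Set(); // set A: subsystems containing changed files
  for (const file of changedFiles) {
    const subsystem = fileToSubsystem[file];
    if (subsystem) direct.add(subsystem);
  }
  const dependents = new Set(); // set B: subsystems that depend on A
  for (const key of Object.keys(dependencyMatrix)) {
    const [from, to] = key.split("→");
    if (direct.has(to)) dependents.add(from);
  }
  return [...new Set([...direct, ...dependents])].sort(); // A ∪ B
}

const fileToSubsystem = {
  "gateway/server.ts": "gateway",
  "agents/runner.ts": "agents",
  "config/types.ts": "config",
};
const dependencyMatrix = {
  "gateway→agents": { calls: 89, imports: 34 },
  "agents→config": { calls: 156, imports: 120 },
  "gateway→config": { calls: 12, imports: 8 },
};
const sharedChange = regenerationSet(["config/types.ts"], fileToSubsystem, dependencyMatrix);
const leafChange = regenerationSet(["gateway/server.ts"], fileToSubsystem, dependencyMatrix);
console.log(sharedChange, leafChange);
```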
### Template System
**Tests (7D):**
| Test | Input | Expected |
|------|-------|----------|
| Full generation | Fixture repo | Correct directory structure with all expected .md files |
| Section completeness | Generated subsystem doc | Contains: Purpose, Key Modules, Public API, Dependencies sections |
| Incremental: direct change | Modify `gateway/server.ts` | Only `gateway.md` + dependents regenerated |
| Incremental: cascading | Modify `config/types.ts` (shared) | `config.md` + all subsystems importing config regenerated |
| Incremental: no-op | No semantic diff | Zero files regenerated |
| Architecture.md ingestion | Fixture with `architecture.md` | LLM prompt includes architecture.md content |
Each doc type has a Markdown template with slots:
### 7E: Diagram Generator (`diagrams.js`)
```markdown
# {{subsystem.label}}
**Purpose:** Auto-generate Mermaid diagrams from graph analysis outputs.
## Purpose
{{llm_generated_purpose}}
**Diagram types:**
## Key Modules
{{for module in subsystem.topModules}}
- `{{module.name}}` — {{module.doc}}
{{endfor}}
1. **Subsystem Dependency Graph** (from 7A dependency matrix)
- Nodes = subsystems (excluding cross-cutting)
- Edges = inter-subsystem CALLS/IMPORTS with edge weight labels
- Cross-cutting subsystems shown as a separate "Shared" cluster
## Public API
{{for export in subsystem.publicExports}}
- `{{export.name}}({{export.params}})` → `{{export.returnType}}`
{{endfor}}
2. **Flow Sequence Diagram** (from 7C flow traces)
- Participants = subsystems in flow order
- Messages = function calls at boundary crossings
- Cycles shown as self-referencing notes
## Dependencies
{{dependency_table}}
3. **Contract Relationship Diagram** (from 7B contracts)
- Classes/interfaces with fields
- IMPLEMENTS/EXTENDS relationships as arrows
## Data Flows
{{for flow in subsystem.flows}}
### {{flow.name}}
{{flow.narrative}}
{{endfor}}
```
**Rendering:** Use `mmdr` (Rust Mermaid renderer) to produce SVG. Embed in generated Markdown docs as `![diagram](./diagrams/subsystem-deps.svg)`.
**Tests (7E):**
| Test | Input | Expected |
|------|-------|----------|
| Dependency diagram | Fixture dependency matrix | Valid Mermaid syntax, matches `expected-diagrams/deps.mmd` |
| Sequence diagram | Fixture flow trace | Valid Mermaid syntax, correct participant order |
| Contract diagram | Fixture contracts | Valid Mermaid syntax, correct relationships |
| Rendering | Any generated .mmd file | mmdr produces valid SVG without errors |
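
As an illustration of the subsystem dependency graph from 7E, the sketch below turns a dependency matrix into Mermaid flowchart source. It assumes the `"from→to"` key format from 7A and that subsystem names are valid Mermaid node IDs; `dependencyDiagram` is a hypothetical name and the edge label wording is made up for the example.

```javascript
function dependencyDiagram(dependencyMatrix, crossCutting = []) {
  const shared = new Set(crossCutting);
  const lines = ["graph LR"];
  for (const [key, { calls = 0, imports = 0 }] of Object.entries(dependencyMatrix)) {
    const [from, to] = key.split("→");
    // Edges touching cross-cutting subsystems are left out to avoid a hairball.
    if (shared.has(from) || shared.has(to)) continue;
    lines.push(`  ${from} -->|"calls: ${calls}, imports: ${imports}"| ${to}`);
  }
  if (shared.size > 0) {
    // Cross-cutting subsystems appear as a separate "Shared" cluster.
    lines.push("  subgraph Shared");
    for (const name of [...shared].sort()) lines.push(`    ${name}`);
    lines.push("  end");
  }
  return lines.join("\n");
}

const mermaidSource = dependencyDiagram(
  {
    "gateway→agents": { calls: 89, imports: 34 },
    "agents→config": { calls: 156, imports: 120 },
  },
  ["utils", "config"]
);
console.log(mermaidSource);
```

Keeping the generator a pure string function makes the output easy to compare against the `expected-diagrams/` fixtures before any rendering step.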
## Architecture.md Ingestion
Each repo may contain human-written architecture documentation. The pipeline:
1. **Discovery:** Scan for `architecture.md`, `docs/architecture.md`, `ARCHITECTURE.md`, `docs/design.md`, with paths resolved relative to the repo root
2. **Parsing:** Extract sections (headings → content blocks) as structured context
3. **Injection:** When generating subsystem docs or explanation docs, include relevant architecture.md sections in the LLM prompt alongside graph data
4. **Diff tracking:** If `architecture.md` changes between releases, flag it in the semantic diff as a documentation-relevant change
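
The parsing and injection steps could be sketched like this. It is a simplified illustration: only ATX headings (`#` to `######`) are recognized, relevance is decided by a naive substring match on the subsystem name, and both `parseSections` and `relevantSections` are hypothetical names.

```javascript
// Split Markdown into { level, heading, content } blocks, one per heading.
function parseSections(markdown) {
  const sections = [];
  let current = null;
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence; // ignore "#" inside code
    const match = inFence ? null : /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (match) {
      current = { level: match[1].length, heading: match[2], content: [] };
      sections.push(current);
    } else if (current) {
      current.content.push(line);
    }
  }
  return sections.map((s) => ({
    level: s.level,
    heading: s.heading,
    content: s.content.join("\n").trim(),
  }));
}

// Naive relevance filter: keep sections that mention the subsystem name.
function relevantSections(sections, subsystemName) {
  const needle = subsystemName.toLowerCase();
  return sections.filter(
    (s) => s.heading.toLowerCase().includes(needle) || s.content.toLowerCase().includes(needle)
  );
}

const architectureDoc = [
  "# Architecture",
  "Overview.",
  "## Gateway",
  "Handles sessions.",
  "## Agents",
  "Runs agent turns.",
].join("\n");
const sections = parseSections(architectureDoc);
const gatewayContext = relevantSections(sections, "gateway");
console.log(gatewayContext);
```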
## Cross-Repo Output Model
Two output modes:
**Per-repo (reference only):**
- Subsystem architecture docs
- Contract reference
- Module reference
- Mermaid diagrams
- Useful for repo maintainers
**Unified (full Divio):**
- Merges per-repo graphs via namespace registry (Phase 3) into super-graph
- Runs 7A-7E on super-graph
- Generates cross-repo flow traces and dependency diagrams
- Includes human-authored tutorials and explanation docs
- Useful for platform consumers and new engineers
## Implementation Phases
| Phase | Module | Effort | Depends On |
|-------|--------|--------|------------|
| 7A | `subsystem.js` | 1 day | graph.js |
| 7B | `contracts.js` | 1-2 days | extract.js (new tree-sitter queries) |
| 7C | `flow.js` | 1 day | graph.js, subsystem.js |
| 7D | `sysdoc.js` | 1-2 days | 7A, 7B, 7C, docgen.js |
| 7-fixtures | Ground truth fixture repo | 0.5 day | — |
| 7A | `subsystem.js` + tests | 1 day | graph.js, fixtures |
| 7B | `contracts.js` + tests | 2 days | extract.js, fixtures |
| 7C | `flow.js` + tests | 2 days | graph.js, subsystem.js, fixtures |
| 7D | `sysdoc.js` + tests | 2 days | 7A, 7B, 7C, docgen.js |
| 7E | `diagrams.js` + tests | 1 day | 7A, 7B, 7C |
| 7F | `supergraph.js` (Multi-repo Merge) | 1 day | namespace.js, graph.js |
**Critical path:** 7A → 7C → 7D (flow tracer needs subsystem boundaries)
**Parallel:** 7B can run in parallel with 7A/7C
**Total: ~9.5 days**
**Critical path:** fixtures → 7A → 7C → 7D
**Parallel:** 7B, 7E, and 7F can run in parallel with core phases.
**Build loop (BMad Wiggum):** Each phase follows: build → test → BMad review → revise → re-review until GO.
## Constraints
@@ -214,15 +351,14 @@ Each doc type has a Markdown template with slots:
- Templates are Markdown with simple mustache-style slots (no template engine dependency — string replacement)
- Must work on OpenClaw codebase (4,325 files) as primary benchmark
- Foxtrot repos are not available in this environment — design must work from any repo's graph snapshot
- Memory budget: graph snapshots for OpenClaw are ~30MB JSON. In-memory graph with contract entities should stay under 500MB heap. If exceeded, implement streaming extraction (process files in batches, merge partial graphs).
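
The string-replacement constraint for template slots could be met with something like the sketch below. It handles only simple `{{path.to.value}}` slots; loop constructs such as `{{for ...}}` are outside its scope and are left untouched. `fillTemplate` is a hypothetical name.

```javascript
// Replace {{dotted.path}} slots with values; unknown slots stay as-is.
function fillTemplate(template, values) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (slot, path) => {
    const value = path
      .split(".")
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), values);
    return value === undefined ? slot : String(value);
  });
}

const template = "# {{subsystem.label}}\n\n## Purpose\n{{llm_generated_purpose}}\n\n{{missing_slot}}";
const rendered = fillTemplate(template, {
  subsystem: { label: "Session & Request Gateway" },
  llm_generated_purpose: "Manages sessions.",
});
console.log(rendered);
```

Leaving unknown slots in place rather than emitting an empty string makes missing data visible in the generated doc.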
## Open Questions
## Resolved Decisions
1. **Tutorials:** Should we attempt to auto-generate tutorials from flow traces, or leave that as human-only? Foxtrot tutorials are task-oriented ("Create your first VPC") which requires domain knowledge the graph doesn't have.
2. **Design decisions:** Can we infer design decisions from commit history + semantic diffs? ("We switched from X to Y in v2026.3.1 because...") Or is this always human-authored?
3. **Cross-repo:** For Foxtrot's 14-repo setup, do we generate one unified doc site or per-repo docs with cross-links? The namespace registry (Phase 3) handles entity linking, but the doc generator needs to know the boundary.
4. **Diagram generation:** Should we auto-generate Mermaid diagrams from the dependency matrix and flow traces? (We have the mermaid-renderer skill.)
5. **Config contract depth:** How deep do we go on YAML/HCL config extraction? Just top-level keys, or full schema with types and defaults?
1. **Tutorials:** Human-authored only. Flow traces inform but don't generate tutorials — domain knowledge required.
2. **Design decisions:** Infer from commit history + semantic diffs AND parse `architecture.md` from each repo.
3. **Cross-repo:** Both per-repo (reference) and unified (full Divio). Different audiences.
4. **Mermaid diagrams:** Yes, via 7E. Three diagram types: dependency, sequence, contract.
5. **Architecture.md ingestion:** Parsed and injected as LLM context for subsystem and explanation docs.
6. **Flow traces are Explanation, not How-To:** Corrected Divio mapping. How-To deferred from MVP.
7. **LLM output is not CI-tested:** All testable artifacts are deterministic JSON. LLM prose is a formatting pass evaluated by human review.