specs/system-docs-spec.md

# Dev Intel Pipeline v2 — Phase 7: System-Level Documentation Generation

**Status:** DRAFT
**Author:** Max (AI) + Brian (Human)
**Date:** 2026-03-09
**Depends on:** Phases 1-6 (extract, graph, namespace, semantic-diff, pipeline, docgen)

---

## Problem Statement

The V2 pipeline generates accurate file-level documentation ("this module exports X, depends on Y, calls Z"). But real platform documentation — like the Foxtrot Confluence docs — operates at the *system level*: subsystem architecture, cross-subsystem data flows, configuration contracts, deployment pipelines, and layered dependency narratives.

File-level docs are reference material. System-level docs are what engineers actually read to understand how things work.

## Goal

Extend the V2 pipeline to generate Foxtrot-quality system documentation from the code knowledge graph, organized in the Divio documentation framework (Tutorials, How-To, Reference, Explanation).

## Success Criteria

| Metric | Target |
|--------|--------|
| Subsystem detection accuracy | ≥90% of modules correctly clustered |
| Cross-subsystem dependency completeness | ≥85% of actual inter-subsystem edges captured |
| Contract extraction recall | ≥80% of exported interfaces/types extracted |
| Generated doc structure | Matches Divio 4-category template |
| Incremental update precision | Only subsystems touched by semantic diff get regenerated |
| LLM cost per full generation | ≤$2 (using local Ollama for drafting) |

## Architecture

### 7A: Subsystem Aggregator (`subsystem.js`)

**Purpose:** Group file-level entities into logical subsystems and compute inter-subsystem relationships.

**Clustering Strategy (tiered):**

1. **Directory-based (default):** Each top-level directory under `src/` is one subsystem: `gateway/`, `agents/`, `cli/`, `telegram/`, etc. Simple, deterministic, zero-config.

2. **Config-driven (override):** Optional `subsystems.yaml` that maps directories to named subsystems with human labels and grouping overrides.
   ```yaml
   subsystems:
     - name: Gateway
       label: "Session & Request Gateway"
       paths: ["gateway/", "routing/"]
     - name: Agents
       label: "AI Agent Runtime"
       paths: ["agents/", "auto-reply/"]
     - name: Channels
       label: "Channel Adapters"
       paths: ["telegram/", "discord/", "slack/", "signal/", "whatsapp/"]
   ```

3. **Graph-based (future):** Community detection (Louvain/label propagation) on the CALLS+IMPORTS graph to find natural clusters. Useful for repos without clean directory boundaries.
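
The first two tiers can be sketched as a single lookup: config overrides are checked first, and the top-level directory is the fallback. This is a minimal sketch rather than the actual `subsystem.js`. The function name `assignSubsystem`, the `(root)` placeholder, the sample file names, and the assumption that paths are relative to `src/` and that `subsystems.yaml` has already been parsed are all illustrative.

```javascript
// Hypothetical helper: map a file path (relative to src/) to a subsystem name.
// `config` mirrors the shape of subsystems.yaml after parsing; YAML parsing is out of scope.
function assignSubsystem(filePath, config = { subsystems: [] }) {
  // Tier 2: config-driven overrides win when any configured path prefix matches.
  for (const sub of config.subsystems) {
    if (sub.paths.some((prefix) => filePath.startsWith(prefix))) {
      return sub.name;
    }
  }
  // Tier 1: fall back to the top-level directory.
  const slash = filePath.indexOf("/");
  // Files directly under src/ have no directory; group them under a placeholder (assumption).
  return slash === -1 ? "(root)" : filePath.slice(0, slash);
}

const overrideConfig = {
  subsystems: [
    { name: "Gateway", label: "Session & Request Gateway", paths: ["gateway/", "routing/"] },
    { name: "Channels", label: "Channel Adapters", paths: ["telegram/", "discord/"] },
  ],
};

console.log(assignSubsystem("routing/resolve.ts", overrideConfig)); // Gateway
console.log(assignSubsystem("cli/main.ts", overrideConfig));        // cli
```

Checking overrides before the directory fallback keeps the zero-config behaviour intact for any directory the config does not mention.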

**Output:**
```json
{
  "subsystems": [
    {
      "name": "gateway",
      "label": "Session & Request Gateway",
      "files": ["gateway/session-utils.ts", "gateway/server.ts", ...],
      "entities": { "functions": 142, "classes": 3, "modules": 28 },
      "publicExports": ["deriveSessionTitle", "loadSessionEntry", ...],
      "internalDeps": [{"from": "gateway", "to": "agents", "edges": 89, "type": "CALLS"}],
      "externalDeps": ["commander", "node:fs", "node:path"]
    }
  ],
  "dependencyMatrix": {
    "gateway→agents": { "calls": 89, "imports": 34 },
    "agents→config": { "calls": 156, "imports": 120 },
    ...
  }
}
```
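
One way the `dependencyMatrix` could be derived is by counting file-level edges whose endpoints fall in different subsystems. The sketch below assumes edges arrive as `{ from, to, type }` records keyed by file path and that a `subsystemOf` function is supplied by the caller. Both shapes, and the sample file `agents/run.ts`, are assumptions made for illustration.

```javascript
// Hypothetical sketch: aggregate file-level edges into a subsystem dependency matrix.
// The edge shape ({ from, to, type }) and subsystemOf callback are assumed, not specified.
function buildDependencyMatrix(edges, subsystemOf) {
  const matrix = {};
  for (const edge of edges) {
    const src = subsystemOf(edge.from);
    const dst = subsystemOf(edge.to);
    if (src === dst) continue; // intra-subsystem edges do not appear in the matrix
    const key = `${src}→${dst}`;
    if (!matrix[key]) matrix[key] = { calls: 0, imports: 0 };
    if (edge.type === "CALLS") matrix[key].calls += 1;
    else if (edge.type === "IMPORTS") matrix[key].imports += 1;
  }
  return matrix;
}

// Directory-based tier: the first path segment is the subsystem.
const topLevelDir = (file) => file.split("/")[0];

const sampleEdges = [
  { from: "gateway/server.ts", to: "agents/run.ts", type: "CALLS" },
  { from: "gateway/server.ts", to: "agents/run.ts", type: "IMPORTS" },
  { from: "gateway/server.ts", to: "gateway/session-utils.ts", type: "CALLS" },
];

const sampleMatrix = buildDependencyMatrix(sampleEdges, topLevelDir);
console.log(sampleMatrix["gateway→agents"].calls, sampleMatrix["gateway→agents"].imports); // 1 1
```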

### 7B: Contract Extractor (`contracts.js`)

**Purpose:** Extract TypeScript interfaces, type aliases, enums, and config schemas as first-class graph entities.

**What to extract:**
- `interface Foo { ... }` → entity type `Interface`, with fields as properties
- `type Foo = { ... }` → entity type `TypeAlias`
- `enum Foo { ... }` → entity type `Enum`, with members
- Exported `const` objects used as config defaults → entity type `ConfigContract`
- YAML schema keys (from config files) → entity type `ConfigSchema`

**Relationships:**
- `IMPLEMENTS` — class → interface
- `ACCEPTS` — function parameter → interface/type (function signature contracts)
- `RETURNS` — function → return type
- `EXTENDS` — interface → interface
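
To make the entity and relationship types concrete, the sketch below turns already-parsed declarations into graph entities and edges. The input record shape (`kind`, `name`, `members`, `extends`, `implements`, `params`, `returns`) is a hypothetical intermediate format, and names such as `SessionEntry`, `BaseEntry`, `SessionStore`, and `SessionKey` are invented for the example. The tree-sitter extraction step is deliberately omitted.

```javascript
// Hypothetical sketch: convert parsed declarations into contract entities and edges.
// The declaration record shape is an assumption; real records would come from extract.js.
const ENTITY_TYPES = new Map([
  ["interface", "Interface"],
  ["type", "TypeAlias"],
  ["enum", "Enum"],
]);

function toContractGraph(declarations) {
  const entities = [];
  const edges = [];
  for (const decl of declarations) {
    if (ENTITY_TYPES.has(decl.kind)) {
      entities.push({ type: ENTITY_TYPES.get(decl.kind), name: decl.name, members: decl.members ?? [] });
      // EXTENDS: interface → interface
      for (const parent of decl.extends ?? []) {
        edges.push({ from: decl.name, to: parent, type: "EXTENDS" });
      }
    } else if (decl.kind === "class") {
      // IMPLEMENTS: class → interface
      for (const iface of decl.implements ?? []) {
        edges.push({ from: decl.name, to: iface, type: "IMPLEMENTS" });
      }
    } else if (decl.kind === "function") {
      // ACCEPTS: one edge per typed parameter; RETURNS: one edge for the return type
      for (const param of decl.params ?? []) {
        edges.push({ from: decl.name, to: param.type, type: "ACCEPTS", param: param.name });
      }
      if (decl.returns) edges.push({ from: decl.name, to: decl.returns, type: "RETURNS" });
    }
  }
  return { entities, edges };
}

const contractGraph = toContractGraph([
  { kind: "interface", name: "SessionEntry", members: ["id", "title"], extends: ["BaseEntry"] },
  { kind: "class", name: "FileSessionStore", implements: ["SessionStore"] },
  { kind: "function", name: "loadSessionEntry", params: [{ name: "key", type: "SessionKey" }], returns: "SessionEntry" },
]);
console.log(contractGraph.edges.map((e) => e.type).join(", ")); // EXTENDS, IMPLEMENTS, ACCEPTS, RETURNS
```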

**Why this matters:**
Foxtrot docs define explicit contracts: "`accountCreation` expects `reltioCustomerId: string`". Unless interfaces and types are extracted as graph entities, the LLM has to infer contracts from function bodies, which is unreliable.

### 7C: Flow Tracer (`flow.js`)

**Purpose:** Given an entry point, walk the call graph across subsystem boundaries and produce a sequenced narrative of the data flow.

**Algorithm:**
1. Start at entry point entity (e.g., `telegram/bot-handlers.ts:onMessage`)
2. BFS/DFS through CALLS edges, recording subsystem transitions
3. At each subsystem boundary crossing, record: source subsystem → target subsystem, via which function call
4. Prune: stop at depth N (configurable, default 5), skip test files, skip utility functions below a connectivity threshold
5. Output: ordered list of subsystem hops with the specific function calls that cross boundaries
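
The steps above can be sketched as a breadth-first walk with a depth limit. This is an illustrative sketch, not the actual `flow.js`: the adjacency-map input (`calls`), the `path/file.ts:fn` entity ids other than the entry point, and the `.test.` filename check are assumptions, and the connectivity-threshold pruning from step 4 is left out.

```javascript
// Hypothetical sketch of the flow tracer: BFS over CALLS edges with a depth limit.
// `calls` maps an entity id ("path/file.ts:fn") to the ids it calls; this shape is assumed.
function traceFlow(entryPoint, calls, subsystemOf, maxDepth = 5) {
  const hops = [];
  const visited = new Set([entryPoint]);
  let frontier = [entryPoint];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const caller of frontier) {
      for (const callee of calls[caller] ?? []) {
        if (visited.has(callee)) continue;       // guard against call cycles
        if (callee.includes(".test.")) continue; // prune test files (naming convention assumed)
        visited.add(callee);
        next.push(callee);
        const from = subsystemOf(caller);
        const to = subsystemOf(callee);
        // Record only the calls that cross a subsystem boundary.
        if (from !== to) hops.push({ from, to, via: callee, depth: depth + 1 });
      }
    }
    frontier = next;
  }
  return hops;
}

const sampleCalls = {
  "telegram/bot-handlers.ts:onMessage": ["routing/inbound.ts:routeInbound", "telegram/format.ts:escape"],
  "routing/inbound.ts:routeInbound": ["gateway/server.ts:handleSession"],
  "gateway/server.ts:handleSession": ["agents/run.ts:runAgent", "telegram/bot-handlers.ts:onMessage"],
};
const subsystemOfId = (id) => id.split("/")[0];

const sampleHops = traceFlow("telegram/bot-handlers.ts:onMessage", sampleCalls, subsystemOfId);
console.log(sampleHops.map((h) => `${h.from}→${h.to}`).join(", "));
// telegram→routing, routing→gateway, gateway→agents
```

BFS is used here because it makes the depth limit easy to apply level by level; a DFS variant would instead follow one call chain to the limit before backtracking.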

**Output:**
```json
{
  "entryPoint": "telegram/bot-handlers.ts:onMessage",
  "flow": [
    { "subsystem": "telegram", "function": "onMessage", "action": "receives incoming message" },
    { "subsystem": "routing", "function": "routeInbound", "action": "routes to session handler", "crossedVia": "CALLS" },
    { "subsystem": "gateway", "function": "handleSession", "action": "loads session state", "crossedVia": "CALLS" },
    { "subsystem": "agents", "function": "runAgent", "action": "executes AI agent turn", "crossedVia": "CALLS" }
  ]
}
```

**LLM narration:** Feed the flow trace + source snippets at each hop to the LLM. Ask it to write a prose narrative: "When a Telegram message arrives, the bot handler dispatches it to the routing layer, which resolves the session key and..."
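
As a rough illustration of the narration step, a flow trace in the output format above could be flattened into a prompt like this. The prompt wording and the `snippets` lookup (function name to source text) are assumptions, and the call to the LLM itself is not shown.

```javascript
// Hypothetical sketch: turn a flow trace into an LLM prompt.
// `snippets` maps a function name to its source text; both it and the wording are assumed.
function buildNarrationPrompt(trace, snippets = {}) {
  const steps = trace.flow.map((hop, i) => {
    const code = snippets[hop.function] ? `\n   Source:\n${snippets[hop.function]}` : "";
    return `${i + 1}. [${hop.subsystem}] ${hop.function}: ${hop.action}${code}`;
  });
  return [
    `Entry point: ${trace.entryPoint}`,
    "Describe the following data flow as a short prose narrative.",
    "Mention each subsystem boundary crossing in order.",
    "",
    ...steps,
  ].join("\n");
}

const sampleTrace = {
  entryPoint: "telegram/bot-handlers.ts:onMessage",
  flow: [
    { subsystem: "telegram", function: "onMessage", action: "receives incoming message" },
    { subsystem: "routing", function: "routeInbound", action: "routes to session handler", crossedVia: "CALLS" },
  ],
};
const samplePrompt = buildNarrationPrompt(sampleTrace);
console.log(samplePrompt);
```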

### 7D: Hierarchical Doc Generator (`sysdoc.js`)

**Purpose:** Orchestrate 7A-7C to produce a complete documentation site in Divio structure.

**Output structure:**
```
docs/
├── tutorials/
│   └── (not auto-generated — requires human curation)
├── how-to/
│   └── (generated from flow traces of common operations)
├── reference/
│   ├── system-architecture.md      ← from subsystem aggregator + dependency matrix
│   ├── subsystems/
│   │   ├── gateway.md              ← per-subsystem: purpose, exports, deps, key modules
│   │   ├── agents.md
│   │   └── ...
│   ├── contracts/
│   │   ├── session-types.md        ← from contract extractor
│   │   └── ...
│   └── modules/
│       └── (existing file-level docs from Phase 6)
└── explanation/
    ├── architecture-patterns.md    ← from dependency matrix analysis
    ├── data-flows.md               ← from flow tracer
    └── design-decisions.md         ← (requires human input or commit history analysis)
```

**Generation pipeline:**
1. Run subsystem aggregator → subsystem map + dependency matrix
2. Run contract extractor → interface/type entities added to graph
3. Run flow tracer on configured entry points → flow narratives
4. For each subsystem: generate reference doc (LLM with subsystem context)
5. Generate system architecture overview (LLM with full dependency matrix)
6. Generate data flow explanations (LLM with flow traces)

**Incremental updates:**
- Semantic diff identifies changed files
- Map changed files → affected subsystems
- Only regenerate docs for affected subsystems
- System architecture overview regenerated only if dependency matrix changed
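
The incremental rules above amount to two checks, sketched here under some assumptions: the semantic diff is taken to yield a plain list of changed file paths, and the matrix comparison uses a sorted-key serialization. The function name `planRegeneration` and its return shape are illustrative only.

```javascript
// Hypothetical sketch: decide what to regenerate after a semantic diff.
// changedFiles is assumed to be a list of file paths reported by the diff.
function planRegeneration(changedFiles, subsystemOf, oldMatrix, newMatrix) {
  // Map changed files → affected subsystems, without duplicates.
  const subsystems = [...new Set(changedFiles.map(subsystemOf))].sort();
  // Compare matrices through a canonical form so key order does not matter.
  const canonical = (m) =>
    JSON.stringify(Object.keys(m).sort().map((k) => [k, m[k].calls, m[k].imports]));
  return {
    subsystems,
    regenerateOverview: canonical(oldMatrix) !== canonical(newMatrix),
  };
}

const dirOf = (file) => file.split("/")[0];
const samplePlan = planRegeneration(
  ["gateway/server.ts", "gateway/session-utils.ts", "agents/run.ts"],
  dirOf,
  { "gateway→agents": { calls: 89, imports: 34 } },
  { "gateway→agents": { calls: 90, imports: 34 } },
);
console.log(samplePlan.subsystems.join(", "), samplePlan.regenerateOverview); // agents, gateway true
```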

### Template System

Each doc type has a Markdown template with slots:

```markdown
# {{subsystem.label}}

## Purpose
{{llm_generated_purpose}}

## Key Modules
{{for module in subsystem.topModules}}
- `{{module.name}}` — {{module.doc}}
{{endfor}}

## Public API
{{for export in subsystem.publicExports}}
- `{{export.name}}({{export.params}})` → `{{export.returnType}}`
{{endfor}}

## Dependencies
{{dependency_table}}

## Data Flows
{{for flow in subsystem.flows}}
### {{flow.name}}
{{flow.narrative}}
{{endfor}}
```
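
A renderer for this slot syntax can stay within plain string replacement. The sketch below is one possible approach, not the actual implementation: it handles `{{a.b}}` lookups and non-nested `{{for x in a.b}}` loops only, assumes each loop tag sits on its own line, and leaves unknown slots in place so they are easy to spot. The function names and sample context are hypothetical.

```javascript
// Hypothetical sketch of a slot renderer built on String.prototype.replace.
// Supports {{a.b}} lookups and non-nested {{for x in a.b}}...{{endfor}} loops only.
function lookup(context, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), context);
}

function fillSlots(template, context) {
  return template.replace(/\{\{([\w.]+)\}\}/g, (match, path) => {
    const value = lookup(context, path);
    return value === undefined ? match : String(value); // leave unknown slots visible
  });
}

function renderTemplate(template, context) {
  const loop = /\{\{for (\w+) in ([\w.]+)\}\}\n([\s\S]*?)\{\{endfor\}\}\n?/g;
  // Expand loops first, binding the loop variable for each item, then fill remaining slots.
  const expanded = template.replace(loop, (_, name, listPath, body) => {
    const items = lookup(context, listPath) ?? [];
    return items.map((item) => fillSlots(body, { ...context, [name]: item })).join("");
  });
  return fillSlots(expanded, context);
}

const demoTemplate = [
  "# {{subsystem.label}}",
  "",
  "## Key Modules",
  "{{for module in subsystem.topModules}}",
  "- `{{module.name}}`: {{module.doc}}",
  "{{endfor}}",
  "",
].join("\n");

const demoContext = {
  subsystem: {
    label: "Session & Request Gateway",
    topModules: [
      { name: "server.ts", doc: "HTTP entry point" },
      { name: "session-utils.ts", doc: "session helpers" },
    ],
  },
};

console.log(renderTemplate(demoTemplate, demoContext));
```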

## Implementation Phases

| Phase | Module | Effort | Depends On |
|-------|--------|--------|------------|
| 7A | `subsystem.js` | 1 day | graph.js |
| 7B | `contracts.js` | 1-2 days | extract.js (new tree-sitter queries) |
| 7C | `flow.js` | 1 day | graph.js, subsystem.js |
| 7D | `sysdoc.js` | 1-2 days | 7A, 7B, 7C, docgen.js |

**Critical path:** 7A → 7C → 7D (flow tracer needs subsystem boundaries)
**Parallel:** 7B can run in parallel with 7A/7C

## Constraints

- No new external dependencies (same as Phases 1-5)
- LLM calls only for prose generation — all structural analysis is deterministic
- tree-sitter@0.21.1 compatibility maintained
- Templates are Markdown with simple mustache-style slots and `{{for}}`/`{{endfor}}` loops, expanded by plain string replacement (no template engine dependency)
- Must work on OpenClaw codebase (4,325 files) as primary benchmark
- Foxtrot repos are not available in this environment — design must work from any repo's graph snapshot

## Open Questions

1. **Tutorials:** Should we attempt to auto-generate tutorials from flow traces, or leave that as human-only? Foxtrot tutorials are task-oriented ("Create your first VPC"), which requires domain knowledge the graph doesn't have.

2. **Design decisions:** Can we infer design decisions from commit history + semantic diffs? ("We switched from X to Y in v2026.3.1 because...") Or is this always human-authored?

3. **Cross-repo:** For Foxtrot's 14-repo setup, do we generate one unified doc site or per-repo docs with cross-links? The namespace registry (Phase 3) handles entity linking, but the doc generator needs to know the boundary.

4. **Diagram generation:** Should we auto-generate Mermaid diagrams from the dependency matrix and flow traces? (We have the mermaid-renderer skill.)

5. **Config contract depth:** How deep do we go on YAML/HCL config extraction? Just top-level keys, or full schema with types and defaults?
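
For question 4, if diagram generation is pursued, the dependency matrix maps fairly directly onto Mermaid flowchart syntax. The sketch below is only an illustration of that mapping: the function name, the choice of `graph LR`, and labelling edges with call counts are assumptions, and subsystem names are assumed to be safe to use as Mermaid node ids.

```javascript
// Hypothetical sketch: render a dependency matrix as Mermaid flowchart text.
// Assumes subsystem names are plain identifiers usable as Mermaid node ids.
function matrixToMermaid(matrix) {
  const lines = ["graph LR"];
  for (const [key, counts] of Object.entries(matrix)) {
    const [from, to] = key.split("→");
    lines.push(`  ${from} -->|${counts.calls} calls| ${to}`);
  }
  return lines.join("\n");
}

const mermaidText = matrixToMermaid({
  "gateway→agents": { calls: 89, imports: 34 },
  "agents→config": { calls: 156, imports: 120 },
});
console.log(mermaidText);
// graph LR
//   gateway -->|89 calls| agents
//   agents -->|156 calls| config
```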