specs/agnostic-synthesis-spec.md

# Spec: Repo-Agnostic Reference Page Synthesis

## Context
The Dev-Intel V2 pipeline currently uses a highly bespoke script (`generate-reference-pages.js`) to generate core reference documentation (`network-architecture.md`, `operations.md`, `configuration.md`, `dependencies.md`, `index.md`). This script hardcodes Foxtrot-specific facts (e.g., CIDR ranges, ArgoCD deployment flows, branch mappings) instead of deriving them from the codebase. 
This renders the pipeline incapable of documenting other Reltio repositories (e.g., AnyCloud, BCE) without manual intervention.

## Objective
Refactor the reference page generation to be completely repository-agnostic. The system must extract raw facts from the source code (using existing structural extractors) and use an LLM to synthesize those facts into human- and agent-readable reference pages dynamically.

## Requirements
1. **Remove Hardcoding**: Delete `generate-reference-pages.js` completely.
2. **Generic Fact Extraction**: Ensure the existing `extract-deep.js`, `extract-helm.js`, and `sysdoc.js` patterns are collected into a single context object.
3. **LLM Synthesis**: Create a new function in `prose.js` (e.g., `synthesizeReferencePages(facts, outDir)`) that uses `opus-think` or standard models to generate the 4 core reference pages based *only* on the extracted facts.
4. **Dynamic Index**: Generate the `reference/index.md` file dynamically using the LLM to map the generated pages to their topics.
5. **Pipeline Integration**: Update `sysdoc.js` to call the new synthesis function, passing the extracted data (`deepData`, `patterns`, `subs`).
6. **Execution Script**: Update `wiggum-v2.sh` to reflect the removal of the bespoke script.

## Success Criteria
- Running `wiggum-v2.sh` generates `network-architecture.md`, `operations.md`, `configuration.md`, and `dependencies.md` without using hardcoded strings.
- The output format must still meet the evaluation standards (targeting >77% on the Confluence benchmark).
- The code must be capable of running against any arbitrary repository and producing relevant reference pages based on what it finds.
feat: repo-agnostic refactor (BMad spec-test-build loop) - NEW: repo-profiler.js — deterministic archetype detection (Infra, Frontend, Backend, etc.) - NEW: extract-dynamic.js — generic extractor replacing hardcoded Foxtrot patterns - NEW: eval-generator.js — dynamic ground-truth question generation from any repo graph - NEW: specs/bmad-agnostic-refactor-spec.md — full BMad spec with acceptance criteria - REFACTORED: prose.js — two-pass LLM synthesis with rich context (shared secrets, ports, service refs) - REFACTORED: sysdoc.js — wired repo-profiler + extract-dynamic, --legacy escape hatch - REFACTORED: wiggum-v2.sh — uses eval-generator before benchmarks - FIXED: graph.js — _edgeSet rebuilt on loadSnapshot() (edge dedup was broken) - FIXED: graph.js — recursive sortKeys() for deep equality in diffing - FIXED: prose.js — robust JSON array extraction from LLM output - FIXED: ratchet.js — syntax validation (node --check) before saving LLM mutations - FIXED: extract-dynamic.js — centralized state services regex, added console.warn for silent failures - TESTS: test-eval-generator, test-repo-profiler, test-synthesis-quality + mock fixtures Eval: 81.5% on Foxtrot (fully repo-agnostic, no hardcoded reference pages) BMad reviews: Architect B+, Dev Lead B-, TEA B- 2026-03-11 14:40:31 +00:00			`# Spec: Repo-Agnostic Reference Page Synthesis`

			`## Context`
			The Dev-Intel V2 pipeline currently uses a highly bespoke script (`generate-reference-pages.js`) to generate core reference documentation (`network-architecture.md`, `operations.md`, `configuration.md`, `dependencies.md`, `index.md`). This script hardcodes Foxtrot-specific facts (e.g., CIDR ranges, ArgoCD deployment flows, branch mappings) instead of deriving them from the codebase.
			`This renders the pipeline incapable of documenting other Reltio repositories (e.g., AnyCloud, BCE) without manual intervention.`

			`## Objective`
			`Refactor the reference page generation to be completely repository-agnostic. The system must extract raw facts from the source code (using existing structural extractors) and use an LLM to synthesize those facts into human- and agent-readable reference pages dynamically.`

			`## Requirements`
			1. Remove Hardcoding: Delete `generate-reference-pages.js` completely.
			2. Generic Fact Extraction: Ensure the existing `extract-deep.js`, `extract-helm.js`, and `sysdoc.js` patterns are collected into a single context object.
			3. LLM Synthesis: Create a new function in `prose.js` (e.g., `synthesizeReferencePages(facts, outDir)`) that uses `opus-think` or standard models to generate the 4 core reference pages based only on the extracted facts.
			4. Dynamic Index: Generate the `reference/index.md` file dynamically using the LLM to map the generated pages to their topics.
			5. Pipeline Integration: Update `sysdoc.js` to call the new synthesis function, passing the extracted data (`deepData`, `patterns`, `subs`).
			6. Execution Script: Update `wiggum-v2.sh` to reflect the removal of the bespoke script.

			`## Success Criteria`
			- Running `wiggum-v2.sh` generates `network-architecture.md`, `operations.md`, `configuration.md`, and `dependencies.md` without using hardcoded strings.
			`- The output format must still meet the evaluation standards (targeting >77% on the Confluence benchmark).`
			`- The code must be capable of running against any arbitrary repository and producing relevant reference pages based on what it finds.`