- NEW: repo-profiler.js - deterministic archetype detection (Infra, Frontend, Backend, etc.)
- NEW: extract-dynamic.js - generic extractor replacing hardcoded Foxtrot patterns
- NEW: eval-generator.js - dynamic ground-truth question generation from any repo graph
- NEW: specs/bmad-agnostic-refactor-spec.md - full BMad spec with acceptance criteria
- REFACTORED: prose.js - two-pass LLM synthesis with rich context (shared secrets, ports, service refs)
- REFACTORED: sysdoc.js - wired repo-profiler + extract-dynamic, --legacy escape hatch
- REFACTORED: wiggum-v2.sh - uses eval-generator before benchmarks
- FIXED: graph.js - _edgeSet rebuilt on loadSnapshot() (edge dedup was broken)
- FIXED: graph.js - recursive sortKeys() for deep equality in diffing
- FIXED: prose.js - robust JSON array extraction from LLM output
- FIXED: ratchet.js - syntax validation (node --check) before saving LLM mutations
- FIXED: extract-dynamic.js - centralized state services regex, added console.warn for silent failures
- TESTS: test-eval-generator, test-repo-profiler, test-synthesis-quality + mock fixtures

Eval: 81.5% on Foxtrot (fully repo-agnostic, no hardcoded reference pages)
BMad reviews: Architect B+, Dev Lead B-, TEA B-
# Spec: Repo-Agnostic Reference Page Synthesis

## Context
The Dev-Intel V2 pipeline currently uses a bespoke script (`generate-reference-pages.js`) to generate the core reference documentation (`network-architecture.md`, `operations.md`, `configuration.md`, `dependencies.md`, `index.md`). This script hardcodes Foxtrot-specific facts (e.g., CIDR ranges, ArgoCD deployment flows, branch mappings) instead of deriving them from the codebase.

As a result, the pipeline cannot document other Reltio repositories (e.g., AnyCloud, BCE) without manual intervention.

## Objective
Refactor the reference page generation to be completely repository-agnostic. The system must extract raw facts from the source code (using existing structural extractors) and use an LLM to synthesize those facts into human- and agent-readable reference pages dynamically.

## Requirements
1. **Remove Hardcoding**: Delete `generate-reference-pages.js` completely.
2. **Generic Fact Extraction**: Ensure the existing `extract-deep.js`, `extract-helm.js`, and `sysdoc.js` patterns are collected into a single context object.
3. **LLM Synthesis**: Create a new function in `prose.js` (e.g., `synthesizeReferencePages(facts, outDir)`) that uses `opus-think` or standard models to generate the 4 core reference pages based *only* on the extracted facts.
4. **Dynamic Index**: Generate the `reference/index.md` file dynamically using the LLM to map the generated pages to their topics.
5. **Pipeline Integration**: Update `sysdoc.js` to call the new synthesis function, passing the extracted data (`deepData`, `patterns`, `subs`).
6. **Execution Script**: Update `wiggum-v2.sh` to reflect the removal of the bespoke script.
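
Requirements 2, 3, and 5 could fit together roughly as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: `buildFactContext`, `buildPagePrompt`, and the injected `callModel` function are hypothetical names, the third parameter of `synthesizeReferencePages` is an addition for illustration, and `deepData`, `patterns`, and `subs` are assumed to be JSON-serializable values.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// The four core pages named in the spec; index.md is generated separately.
const CORE_PAGES = [
  'network-architecture.md',
  'operations.md',
  'configuration.md',
  'dependencies.md',
];

// Hypothetical helper: merge extractor outputs into one context object.
// The real shapes of deepData, patterns, and subs are not given in the spec.
function buildFactContext(deepData, patterns, subs) {
  return { deepData: deepData ?? {}, patterns: patterns ?? [], subs: subs ?? [] };
}

// Hypothetical helper: build a prompt that restricts the model to the facts.
function buildPagePrompt(pageName, facts) {
  return [
    `Write the reference page "${pageName}".`,
    'Use only the facts in the JSON below. If a fact is missing, omit the topic.',
    JSON.stringify(facts, null, 2),
  ].join('\n\n');
}

// Sketch of synthesizeReferencePages(facts, outDir). callModel is an assumed
// async function (prompt string in, markdown string out) supplied by the caller.
// Writing the pages directly into outDir is also an assumption.
async function synthesizeReferencePages(facts, outDir, callModel) {
  await fs.mkdir(outDir, { recursive: true });
  const written = [];
  for (const pageName of CORE_PAGES) {
    const markdown = await callModel(buildPagePrompt(pageName, facts));
    const target = path.join(outDir, pageName);
    await fs.writeFile(target, markdown, 'utf8');
    written.push(target);
  }
  return written;
}
```

In `sysdoc.js`, the call might then look like `await synthesizeReferencePages(buildFactContext(deepData, patterns, subs), outDir, callModel)`. Injecting `callModel` keeps the sketch testable with a stub instead of a live model.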

## Success Criteria
- Running `wiggum-v2.sh` generates `network-architecture.md`, `operations.md`, `configuration.md`, and `dependencies.md` without using hardcoded strings.
- The output format must still meet the evaluation standards (targeting >77% on the Confluence benchmark).
- The code must be capable of running against any arbitrary repository and producing relevant reference pages based on what it finds.
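
One way to spot-check the "no hardcoded strings" criterion is to flag specific values in a generated page that never appear in the extracted facts. The sketch below does this for CIDR ranges only, since the spec names them as an example of hardcoded Foxtrot facts. `findUngroundedCidrs` is a hypothetical helper, not part of the pipeline, and the facts object is assumed to be JSON-serializable.

```javascript
// Matches IPv4 CIDR notation such as 10.0.0.0/16 (octet ranges are not validated).
const CIDR_PATTERN = /\b(?:\d{1,3}\.){3}\d{1,3}\/\d{1,2}\b/g;

// Hypothetical check: return every distinct CIDR range mentioned in the
// generated markdown that does not occur anywhere in the extracted facts.
function findUngroundedCidrs(markdown, facts) {
  const known = new Set(JSON.stringify(facts).match(CIDR_PATTERN) ?? []);
  const mentioned = new Set(markdown.match(CIDR_PATTERN) ?? []);
  return [...mentioned].filter((cidr) => !known.has(cidr));
}

// Usage with made-up values: the second range is not backed by any fact.
const exampleFacts = { vpc: { cidr: '10.20.0.0/16' } };
const examplePage = 'The VPC uses 10.20.0.0/16 and peers with 192.168.1.0/24.';
console.log(findUngroundedCidrs(examplePage, exampleFacts)); // [ '192.168.1.0/24' ]
```

An empty result does not prove a page is grounded. It only rules out one class of leaked constants, so a check like this would complement the benchmark score rather than replace it.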