# Product Requirements Document: Dev Intel V3

## 1. Problem Statement
Dev Intel V2 successfully generates documentation from our Foxtrot monorepo, scoring 93% on the agent eval and 78% on the human eval. However, the pipeline relies on ~2000 lines of custom JavaScript, much of which duplicates the functionality of well-established Open Source Software (OSS). We need to simplify the architecture, reduce the maintenance burden, and adopt community-standard tools without sacrificing output quality. Our "ratchet loop" is functionally just a "Ralph Wiggum" loop (re-run the generator until an objective check passes), so we should replace it with a simple, brute-force bash loop with clear, objective completion criteria rather than complex custom code.

## 2. Architecture
The V3 architecture adopts a hybrid approach: "OSS for the heavy lifting, custom code for the magic." 

### OSS Replacements
* **Terraform Documentation:** `terraform-docs` (Replaces `extract-terraform.js`)
* **Helm Chart Documentation:** `helm-docs` (Replaces `extract-helm.js` & `sysdoc.js` chart section)
* **Evaluation Harness:** `promptfoo` (Replaces `eval-agent.js`, `eval-human.js`, `eval.js`)
* **Documentation Serving:** `mkdocs-material` (Replaces custom doc serving)
* **Ratchet Loop:** Simple Ralph Wiggum bash loop (Replaces `ratchet.js`)
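
As a rough sketch of how the glue layer might shell out to the two documentation CLIs from Node.js, assuming both binaries are on `PATH` (the helper names `terraformDocsCommand`, `helmDocsCommand`, and `runTool` are hypothetical): `terraform-docs markdown table <dir>` prints a Markdown table for one Terraform module to stdout, and `helm-docs --chart-search-root=<dir>` writes a `README.md` next to each chart it finds under `<dir>`.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical helper: argv for rendering one Terraform module as a Markdown table.
function terraformDocsCommand(moduleDir) {
  return { cmd: "terraform-docs", args: ["markdown", "table", moduleDir] };
}

// Hypothetical helper: argv for scanning a directory tree for Helm charts;
// helm-docs writes a README.md beside each Chart.yaml it finds.
function helmDocsCommand(chartSearchRoot) {
  return { cmd: "helm-docs", args: [`--chart-search-root=${chartSearchRoot}`] };
}

// Run a tool without a shell and return its stdout as a string.
// Throws if the binary is missing or exits non-zero.
function runTool({ cmd, args }) {
  return execFileSync(cmd, args, { encoding: "utf8" });
}

module.exports = { terraformDocsCommand, helmDocsCommand, runTool };
```

Keeping command construction separate from execution lets the argument lists be unit-tested without the binaries installed, and `execFileSync` passes paths as arguments rather than through a shell.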

### Retained Custom Components (The Value Add)
* **Graph Builder (`graph.js` + `extract.js`):** Tree-sitter extraction to build a unified knowledge graph across the 13 repositories in the Foxtrot monorepo.
* **Subsystem Aggregator (`subsystem.js`):** Grouping files into logical subsystems and detecting cross-cutting concerns.
* **Cross-Chart Interaction Analysis:** Analyzing shared secrets, ports, and service references across Helm charts (which `helm-docs` cannot do natively).
* **LLM Prose Enrichment (`prose.js`):** Feeding the dependency matrix and anomaly flags into Claude to generate "why" explanations.
* **Glue Layer:** Minimal orchestration connecting OSS tools and custom analysis into unified output.
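
The cross-chart interaction analysis is the part `helm-docs` cannot provide, so a sketch helps pin down what it computes. Assuming each chart's secret names, ports, and referenced service names have already been extracted into arrays (the input shape and the name `findSharedReferences` are hypothetical), the core step is inverting chart-to-references into reference-to-charts and keeping anything used by more than one chart:

```javascript
// Input (hypothetical shape):
//   { chartName: { secrets: [...], ports: [...], services: [...] } }
// Output: for each kind, a map from a shared value to the sorted charts using it.
function findSharedReferences(charts) {
  const kinds = ["secrets", "ports", "services"];
  const shared = {};

  for (const kind of kinds) {
    // Invert chart -> values into value -> set of charts.
    const users = new Map();
    for (const [chart, refs] of Object.entries(charts)) {
      for (const value of refs[kind] ?? []) {
        const key = String(value); // ports become string keys
        if (!users.has(key)) users.set(key, new Set());
        users.get(key).add(chart);
      }
    }
    // Only values referenced by two or more charts are interactions.
    shared[kind] = {};
    for (const [value, chartSet] of users) {
      if (chartSet.size > 1) shared[kind][value] = [...chartSet].sort();
    }
  }
  return shared;
}

module.exports = { findSharedReferences };
```

The resulting map is the kind of dependency data the prose enrichment step could turn into "why" explanations.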

## 3. Requirements
* **LLM Engine:** Use `http://192.168.86.11:8000/v1` with the `claude-haiku-4.5` model.
* **Scale:** Must handle the Foxtrot monorepo (13 subdirectories, 17K+ files).
* **Footprint constraint:** The pipeline's custom code should total ~500 lines of Node.js, plus config files.
* **Speed constraint:** Must run end-to-end in under 10 minutes (excluding LLM execution wait times).
* **Cost constraint:** Target cost is $1.00 per release.
* **Code Implementation:** Replace the existing Terraform and per-chart Helm doc generation with the `terraform-docs` and `helm-docs` CLI tools.
* **Docs Website:** Implement an `mkdocs.yml` configuration to serve the output as a searchable site.
* **Evaluation Implementation:** Configure `promptfoo` via YAML to act as the objective judge.
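
The LLM engine requirement points at a `/v1` base URL, which suggests an OpenAI-compatible server. Assuming that (it is not stated above), a call from the Node.js glue could look like the following sketch. It relies on the global `fetch` in Node.js 18+, sends no API key, and the helpers `buildChatRequest` and `complete` are hypothetical.

```javascript
const LLM_BASE_URL = "http://192.168.86.11:8000/v1";
const LLM_MODEL = "claude-haiku-4.5";

// Build an OpenAI-style chat completion request (assumed API shape).
function buildChatRequest(systemPrompt, userPrompt, baseUrl = LLM_BASE_URL) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`, // tolerate trailing slash
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: LLM_MODEL,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userPrompt },
        ],
      }),
    },
  };
}

// Send the request and return the first choice's text.
async function complete(systemPrompt, userPrompt) {
  const { url, init } = buildChatRequest(systemPrompt, userPrompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`LLM request failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

module.exports = { buildChatRequest, complete };
```

Splitting request construction from the network call keeps the request shape testable without a live server.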

## 4. Ralph Wiggum Loop Spec
The `ratchet.js` implementation will be replaced by a `bash` script that runs an AI agent in a simple, well-known ratchet pattern: loop until objective completion criteria are met.

**Execution Flow:**
1. **Generate:** Run the Dev Intel V3 pipeline.
2. **Evaluate:** Run `promptfoo eval` against the pipeline's output.
3. **Diagnose:** Check the `promptfoo` score against the required threshold.
4. **Condition:**
   * **If Score >= Threshold:** Success, exit the loop.
   * **If Score < Threshold:** Feed the previous output and the failure context (the `promptfoo` evaluation feedback) into the generator prompt for the next iteration.
5. **Repeat:** Return to step 1, stopping once the criteria are met or after at most *N* iterations.
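
Although the loop will be a `bash` script, its control flow is small enough to model in a few lines of JavaScript, which makes the stopping rule easy to unit-test. In this sketch `generate` and `evaluate` are hypothetical callbacks standing in for the V3 pipeline run and `promptfoo eval`, and `evaluate` is assumed to return a numeric score plus textual feedback:

```javascript
// Model of the loop: generate -> evaluate -> compare to threshold -> retry with feedback.
// generate(context) returns pipeline output; evaluate(output) returns { score, feedback }.
async function ralphLoop({ generate, evaluate, threshold, maxIterations }) {
  let context = null; // previous output + evaluation feedback for the next attempt
  let last = null;
  for (let i = 1; i <= maxIterations; i++) {
    const output = await generate(context);             // 1. Generate
    const { score, feedback } = await evaluate(output); // 2. Evaluate
    last = { iterations: i, score, output };
    if (score >= threshold) return { passed: true, ...last }; // 3-4. Success: exit
    context = { previousOutput: output, feedback };      // 4. Re-feed failure context
  }
  return { passed: false, ...last }; // 5. Gave up after maxIterations attempts
}

module.exports = { ralphLoop };
```

The iteration cap is what keeps a brute-force loop compatible with the cost constraint: every failed attempt spends another generation and evaluation run, so *N* bounds the worst-case spend per release.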

## 5. Success Metrics
* **Quality Parity or Better:** Agent eval score >= 93%, Human eval score >= 78%.
* **Simplicity:** Custom codebase shrinks from ~2000 lines to ~500 lines.
* **Performance:** End-to-end runtime, excluding LLM wait time, is under 10 minutes.
* **Efficiency:** Pipeline inference costs remain <= $1 per release.
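
These metrics can be checked mechanically at the end of each run. A minimal sketch, assuming the run emits a summary object with the hypothetical fields `agentScore` and `humanScore` (percent), `customLines`, `minutes` (excluding LLM wait time), and `costUsd`:

```javascript
// Targets taken from the success metrics above.
const TARGETS = {
  minAgentScore: 93,   // percent
  minHumanScore: 78,   // percent
  maxCustomLines: 500, // "~500" is approximate; treated here as a hard ceiling
  maxMinutes: 10,      // must be strictly under, excluding LLM wait time
  maxCostUsd: 1.0,     // per release
};

// Return human-readable failures; an empty array means every metric passed.
function checkSuccessMetrics(run, targets = TARGETS) {
  const failures = [];
  if (run.agentScore < targets.minAgentScore)
    failures.push(`agent eval ${run.agentScore}% < ${targets.minAgentScore}%`);
  if (run.humanScore < targets.minHumanScore)
    failures.push(`human eval ${run.humanScore}% < ${targets.minHumanScore}%`);
  if (run.customLines > targets.maxCustomLines)
    failures.push(`custom code ${run.customLines} lines > ${targets.maxCustomLines}`);
  if (run.minutes >= targets.maxMinutes)
    failures.push(`runtime ${run.minutes} min >= ${targets.maxMinutes} min`);
  if (run.costUsd > targets.maxCostUsd)
    failures.push(`cost $${run.costUsd} > $${targets.maxCostUsd}`);
  return failures;
}

module.exports = { TARGETS, checkSuccessMetrics };
```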

## 6. Migration Plan
To safely deprecate V2 while maintaining documentation pipelines:
1. **Remove Custom Extractors:** Delete `extract-terraform.js`, `extract-helm.js`, and the Helm-specific logic inside `sysdoc.js`.
2. **Remove Custom Evaluators:** Delete `eval-agent.js`, `eval-human.js`, and `eval.js`.
3. **Remove Custom Ratchet:** Delete `ratchet.js`.
4. **Integrate CLI Binaries:** Install and wire up `terraform-docs` and `helm-docs`.
5. **Add Configs:** Write `promptfoo.yaml` for evaluations and `mkdocs.yml` for serving docs.
6. **Implement Bash Script:** Write the Ralph Wiggum loop.
7. **Re-wire Glue Code:** Feed the outputs of the OSS tools into the retained `prose.js` module.
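
Step 7 is the main new piece of glue. A sketch of its first half, assuming `helm-docs` has already written a `README.md` beside each `Chart.yaml`: the hypothetical `collectChartReadmes` walks the tree and gathers those files so they can be passed on. The `prose.js` input format is not specified in this document, so the shape of the returned objects is an assumption.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively find chart directories (those containing Chart.yaml) and return
// the helm-docs README.md for each, keyed by its path relative to root.
function collectChartReadmes(root) {
  const results = [];
  const walk = (dir) => {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    if (entries.some((e) => e.isFile() && e.name === "Chart.yaml")) {
      const readme = path.join(dir, "README.md");
      if (fs.existsSync(readme)) {
        results.push({
          chart: path.relative(root, dir) || ".",
          markdown: fs.readFileSync(readme, "utf8"),
        });
      }
    }
    for (const e of entries) {
      // Skip VCS metadata and dependency trees; Dirent.isDirectory() does not
      // follow symlinks, so symlink cycles cannot cause infinite recursion.
      if (e.isDirectory() && e.name !== ".git" && e.name !== "node_modules") {
        walk(path.join(dir, e.name));
      }
    }
  };
  walk(root);
  return results.sort((a, b) => a.chart.localeCompare(b.chart));
}

module.exports = { collectChartReadmes };
```

`terraform-docs` output can be gathered the same way, by capturing stdout per module, before both are merged into whatever context `prose.js` consumes.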