dev-intel-v2/docs/prd-v3.md
Jarvis Prime 0265ec7a60 feat: confluence benchmark, pattern extractor, agent KB, UX spec
- extract-patterns.js: mines layered arch, ArgoCD appsets, cloud regions,
  CIDR allocations, naming conventions, sync waves, tech stack from code
- agent-kb.js: token-efficient JSON rendering of same doc tree
- eval-confluence-ref-questions.json: 32 reference-only benchmark questions
- wiggum-v2.sh: Ralph Wiggum loop targeting confluence baseline (77.8%)
- docs/human-ux-spec.md: BMad UX designer spec for human doc structure
- Eval results: V2 at 28.7% vs confluence 77.8% baseline
- Hub/spoke ownership now correctly extracted (95% on that question)
- Naming conventions, regions, CIDRs surfaced in system-architecture.md
2026-03-10 14:20:35 +00:00

Product Requirements Document: Dev Intel V3

1. Problem Statement

Dev Intel V2 successfully generates documentation from our Foxtrot monorepo, scoring 93% on the agent eval and 78% on the human eval. However, the pipeline relies on ~2000 lines of custom JavaScript, much of which duplicates the functionality of well-established open source software (OSS). We need to simplify the architecture, reduce the maintenance burden, and adopt community-standard tools without sacrificing output quality. Our "ratchet loop" is functionally just a "Ralph Wiggum" loop. We should replace it with a simple, brute-force bash loop that has clear, objective completion criteria, rather than maintain complex custom code.

2. Architecture

The V3 architecture adopts a hybrid approach: "OSS for the heavy lifting, custom code for the magic."

OSS Replacements

  • Terraform Documentation: terraform-docs (Replaces extract-terraform.js)
  • Helm Chart Documentation: helm-docs (Replaces extract-helm.js & sysdoc.js chart section)
  • Evaluation Harness: promptfoo (Replaces eval-agent.js, eval-human.js, eval.js)
  • Documentation Serving: mkdocs-material (Replaces custom doc serving)
  • Ratchet Loop: Simple Ralph Wiggum bash loop (Replaces ratchet.js)
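The two doc generators need only a thin wrapper in the glue layer. The following is a minimal sketch that makes two assumptions: a Terraform module is any directory containing `*.tf` files, and generated pages are collected under an output directory. The helper names (`run`, `tf_module_dirs`, `gen_oss_docs`) and the `DRY_RUN` switch are hypothetical. The CLI usage is standard: `terraform-docs markdown table <dir>` prints to stdout, and `helm-docs --chart-search-root <dir>` writes a `README.md` next to each `Chart.yaml`.

```shell
#!/usr/bin/env bash
# Sketch: generate Terraform and Helm reference docs with the OSS CLIs.
# All function names are hypothetical; set DRY_RUN=1 to print commands instead of running them.
set -euo pipefail

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi
}

# List unique directories under $1 that contain .tf files, skipping .terraform caches.
tf_module_dirs() {
  find "$1" -name '*.tf' -not -path '*/.terraform/*' -exec dirname {} \; | sort -u
}

# gen_oss_docs <repo_root> <out_dir>
gen_oss_docs() {
  local root="$1" out="$2" dir page
  run mkdir -p "$out/terraform"
  while IFS= read -r dir; do
    # One Markdown page per module, e.g. infra/vpc -> infra_vpc.md
    page="$out/terraform/$(printf '%s' "${dir#"$root"/}" | tr '/' '_').md"
    if [[ -n "${DRY_RUN:-}" ]]; then
      echo "terraform-docs markdown table $dir > $page"
    else
      terraform-docs markdown table "$dir" > "$page"
    fi
  done < <(tf_module_dirs "$root")
  # helm-docs walks the tree itself and writes README.md next to each Chart.yaml.
  run helm-docs --chart-search-root "$root"
}
```

Writing terraform-docs output to stdout avoids depending on its in-file injection markers. Pointing helm-docs at the repository root lets it discover all charts in one call.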

Retained Custom Components (The Value Add)

  • Graph Builder (graph.js + extract.js): Tree-sitter extraction to build a unified knowledge graph across 13 repositories.
  • Subsystem Aggregator (subsystem.js): Grouping files into logical subsystems and detecting cross-cutting concerns.
  • Cross-Chart Interaction Analysis: Analyzing shared secrets, ports, and service references across Helm charts (which helm-docs cannot do natively).
  • LLM Prose Enrichment (prose.js): Feeding the dependency matrix and anomaly flags into Claude to generate "why" explanations.
  • Glue Layer: Minimal orchestration connecting OSS tools and custom analysis into unified output.
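Once references have been extracted, the cross-chart interaction analysis reduces to a group-by. This is a minimal sketch that assumes a hypothetical earlier step emits one tab-separated line per reference: chart, kind (`secret`, `port`, or `service`), and name. The function `shared_refs` is illustrative only and is not the existing retained logic.

```shell
#!/usr/bin/env bash
# Sketch: find secrets, ports, and services referenced by more than one Helm chart.
# Input (stdin): chart<TAB>kind<TAB>name, one reference per line (hypothetical format).
# Output: kind<TAB>name<TAB>comma-separated charts, sorted.
set -euo pipefail

shared_refs() {
  awk -F'\t' -v OFS='\t' '
    NF == 3 {
      key = $2 OFS $3
      if (!seen[key OFS $1]++) {           # count each chart once per resource
        charts[key] = charts[key] (charts[key] == "" ? "" : ",") $1
        n[key]++
      }
    }
    END { for (k in n) if (n[k] > 1) print k, charts[k] }
  ' | sort
}
```

For example, feeding it `api`/`secret`/`db-creds` and `worker`/`secret`/`db-creds` reports `db-creds` as shared by `api,worker`. This is the kind of coupling helm-docs cannot see, because it documents one chart at a time.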

3. Requirements

  • LLM Engine: Use http://192.168.86.11:8000/v1 with the claude-haiku-4.5 model.
  • Scale: Must handle the Foxtrot monorepo (13 subdirectories, 17K+ files).
  • Footprint constraint: The pipeline should consist of ~500 lines of custom Node.js code plus configuration files.
  • Speed constraint: Must run end-to-end in under 10 minutes (excluding LLM execution wait times).
  • Cost constraint: Target cost is $1.00 per release.
  • Code Implementation: Replace the existing Terraform and per-chart Helm doc generation with the terraform-docs and helm-docs CLI tools.
  • Docs Website: Implement an mkdocs.yml configuration to serve the output as a searchable site.
  • Evaluation Implementation: Configure promptfoo via YAML to act as the objective judge.
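Both configuration files are small enough to generate from the glue layer. This sketch rests on the following assumptions:
- Generated Markdown lands in `docs/`.
- The eval prompt and questions live in `prompts/answer.txt` and `tests.yaml` (hypothetical paths).
- The endpoint at `http://192.168.86.11:8000/v1` speaks the OpenAI chat-completions protocol, which is what promptfoo's `openai:chat` provider expects when `apiBaseUrl` is overridden.

`write_configs` is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: write the two config files the V3 pipeline needs.
# File layout and names below are assumptions, not the final repo structure.
set -euo pipefail

write_configs() {
  local dir="$1"
  mkdir -p "$dir"

  # mkdocs-material site over the generated Markdown; the search plugin builds the search index.
  cat > "$dir/mkdocs.yml" <<'EOF'
site_name: Dev Intel
docs_dir: docs
theme:
  name: material
plugins:
  - search
EOF

  # promptfoo judge config; assumes the local endpoint is OpenAI-compatible.
  # promptfoo may still expect OPENAI_API_KEY to be set when using the openai provider.
  cat > "$dir/promptfoo.yaml" <<'EOF'
description: Dev Intel V3 eval
prompts:
  - file://prompts/answer.txt
providers:
  - id: openai:chat:claude-haiku-4.5
    config:
      apiBaseUrl: http://192.168.86.11:8000/v1
tests: file://tests.yaml
EOF
}
```

Keeping the judge in YAML means that changing questions or assertions is a config edit, not a code change.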

4. Ralph Wiggum Loop Spec

The previous ratchet.js implementation will be replaced by a bash script that runs an AI agent in a simple, well-known ratchet pattern: loop until objective completion criteria are met.

Execution Flow:

  1. Generate: Run the Dev Intel V3 pipeline.
  2. Evaluate: Run promptfoo eval against the pipeline's output.
  3. Diagnose: Check the promptfoo score against the required threshold.
  4. Condition:
    • If Score >= Threshold: Success, exit the loop.
    • If Score < Threshold: Feed the previous output and the failure context (the evaluation feedback) back into the generator prompt.
  5. Repeat: Continue up to N iterations until criteria are met.
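The flow above fits in a few dozen lines of bash. This is a minimal sketch with two assumptions. First, the pipeline accepts a feedback file (the `node glue.js --feedback` entry point is hypothetical). Second, promptfoo's JSON output exposes pass/fail counts under `results.stats`; verify this path against the installed promptfoo version. `THRESHOLD`, `MAX_ITERS`, `generate`, `evaluate`, `meets`, and `ralph_loop` are illustrative names, and the default `evaluate` requires `jq`.

```shell
#!/usr/bin/env bash
# Sketch of the Ralph Wiggum loop: generate, evaluate, retry with feedback until the score clears the bar.
set -euo pipefail

THRESHOLD="${THRESHOLD:-0.93}"        # pass rate required to stop (mirrors the 93% agent-eval target)
MAX_ITERS="${MAX_ITERS:-5}"           # N: hard cap on iterations
FEEDBACK="${FEEDBACK:-feedback.json}" # previous eval results, re-fed as context

# Hypothetical pipeline entry point; it must tolerate a missing feedback file on the first pass.
generate() { node glue.js --feedback "$FEEDBACK"; }

# Run promptfoo, keep its JSON results as next-round feedback, and print the pass rate in [0,1].
# promptfoo exits non-zero when assertions fail, so that status is ignored here.
evaluate() {
  promptfoo eval -c promptfoo.yaml -o "$FEEDBACK" >/dev/null || true
  jq '.results.stats | .successes / ((.successes + .failures) | if . == 0 then 1 else . end)' "$FEEDBACK"
}

# Succeeds when $1 >= $2; "+ 0" forces numeric comparison so non-numbers count as 0.
meets() { awk -v s="$1" -v t="$2" 'BEGIN { exit !(s + 0 >= t + 0) }'; }

ralph_loop() {
  local i score
  rm -f "$FEEDBACK"   # the first iteration runs without feedback
  for (( i = 1; i <= MAX_ITERS; i++ )); do
    generate
    score="$(evaluate)"
    echo "iteration $i: score $score" >&2
    if meets "$score" "$THRESHOLD"; then
      return 0
    fi
  done
  echo "threshold $THRESHOLD not met after $MAX_ITERS iterations" >&2
  return 1
}
```

Sourcing the file and calling `ralph_loop` keeps `generate` and `evaluate` swappable. The function's exit status is what CI would gate on.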

5. Success Metrics

  • Quality Parity or Better: Agent eval score >= 93%, Human eval score >= 78%.
  • Simplicity: Custom codebase shrinks from ~2000 lines to ~500 lines.
  • Performance: End-to-end runtime, excluding LLM wait time, is under 10 minutes.
  • Efficiency: Pipeline inference costs remain <= $1 per release.
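These metrics can be checked mechanically at the end of a release run. This is a minimal sketch that assumes earlier steps already supply five values:
- the two eval scores as percentages;
- the inference cost in dollars;
- the runtime in seconds, already excluding LLM wait time;
- the custom line count.

How those numbers are collected is not specified here. `check_release` and `MAX_LOC` are hypothetical, and treating "~500 lines" as a hard cap of 500 is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: gate a release on the V3 success metrics.
# Usage: check_release <agent_pct> <human_pct> <cost_usd> <runtime_sec> <custom_loc>
# Prints one line per failed metric; exits non-zero if any metric fails.
set -euo pipefail

check_release() {
  awk -v agent="$1" -v human="$2" -v cost="$3" -v secs="$4" -v loc="$5" \
      -v maxloc="${MAX_LOC:-500}" 'BEGIN {
    ok = 1
    if (agent + 0 < 93)      { print "agent eval below 93%"; ok = 0 }
    if (human + 0 < 78)      { print "human eval below 78%"; ok = 0 }
    if (cost + 0 > 1.00)     { print "cost above $1.00"; ok = 0 }
    if (secs + 0 >= 600)     { print "runtime not under 10 minutes"; ok = 0 }
    if (loc + 0 > maxloc)    { print "custom code above " maxloc " lines"; ok = 0 }
    exit !ok
  }'
}
```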

6. Migration Plan

To safely deprecate V2 while maintaining documentation pipelines:

  1. Remove Custom Extractors: Delete extract-terraform.js, extract-helm.js, and the Helm-specific logic inside sysdoc.js.
  2. Remove Custom Evaluators: Delete eval-agent.js, eval-human.js, and eval.js.
  3. Remove Custom Ratchet: Delete ratchet.js.
  4. Integrate CLI Binaries: Install and wire up terraform-docs and helm-docs.
  5. Add Configs: Write promptfoo.yaml for evaluations and mkdocs.yml for serving docs.
  6. Implement Bash Script: Write the Ralph Wiggum loop.
  7. Re-wire Glue Code: Connect the outputs from the OSS tools into the preserved prose.js module.
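Steps 1 to 4 can be scripted as a single repeatable pass. This sketch assumes the V2 scripts are tracked in git under a hypothetical `V2_DIR`. `missing_tools`, `migrate_v2`, `V2_FILES`, and `REQUIRED_TOOLS` are illustrative names. The Helm-specific logic inside sysdoc.js still has to be removed by hand. For safety, the sketch checks that the replacement CLIs are on PATH before it deletes anything, which is an ordering choice made here.

```shell
#!/usr/bin/env bash
# Sketch of migration steps 1-4. Set DRY_RUN=1 to print actions without touching the repo.
set -euo pipefail

V2_DIR="${V2_DIR:-.}"   # hypothetical location of the V2 scripts
V2_FILES=(extract-terraform.js extract-helm.js eval-agent.js eval-human.js eval.js ratchet.js)
REQUIRED_TOOLS=(terraform-docs helm-docs promptfoo mkdocs)

# Print each CLI that is not on PATH; return non-zero if any are missing.
missing_tools() {
  local t rc=0
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then echo "$t"; rc=1; fi
  done
  return "$rc"
}

migrate_v2() {
  local f
  # Step 4 check first: refuse to delete V2 code until the replacements are installed.
  if ! missing_tools "${REQUIRED_TOOLS[@]}" >&2; then
    echo "install the missing tools before removing V2 code" >&2
    return 1
  fi
  # Steps 1-3: remove the custom extractors, evaluators, and ratchet.
  # --ignore-unmatch lets the pass be re-run after files are already gone.
  for f in "${V2_FILES[@]}"; do
    if [[ -n "${DRY_RUN:-}" ]]; then
      echo "git rm $V2_DIR/$f"
    else
      git rm -q --ignore-unmatch -- "$V2_DIR/$f"
    fi
  done
}
```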