Jarvis Prime 0265ec7a60 feat: confluence benchmark, pattern extractor, agent KB, UX spec

- extract-patterns.js: mines layered arch, ArgoCD appsets, cloud regions,
  CIDR allocations, naming conventions, sync waves, tech stack from code
- agent-kb.js: token-efficient JSON rendering of same doc tree
- eval-confluence-ref-questions.json: 32 reference-only benchmark questions
- wiggum-v2.sh: Ralph Wiggum loop targeting confluence baseline (77.8%)
- docs/human-ux-spec.md: BMad UX designer spec for human doc structure
- Eval results: V2 at 28.7% vs confluence 77.8% baseline
- Hub/spoke ownership now correctly extracted (95% on that question)
- Naming conventions, regions, CIDRs surfaced in system-architecture.md

2026-03-10 14:20:35 +00:00

4.4 KiB


Product Requirements Document: Dev Intel V3

1. Problem Statement

Dev Intel V2 successfully generates documentation from our Foxtrot monorepo, achieving a 93% agent eval score and a 78% human eval score. However, the pipeline relies on ~2000 lines of custom JavaScript, much of which duplicates the functionality of well-established open source software (OSS). We need to simplify the architecture, reduce the maintenance burden, and adopt community-standard tools without sacrificing output quality. Our "ratchet loop" is functionally just a "Ralph Wiggum" loop, so we should replace the complex custom code with a simple, brute-force bash loop that has clear, objective completion criteria.

2. Architecture

The V3 architecture adopts a hybrid approach: "OSS for the heavy lifting, custom code for the magic."

OSS Replacements

Terraform Documentation: terraform-docs (Replaces extract-terraform.js)
Helm Chart Documentation: helm-docs (Replaces extract-helm.js & sysdoc.js chart section)
Evaluation Harness: promptfoo (Replaces eval-agent.js, eval-human.js, eval.js)
Documentation Serving: mkdocs-material (Replaces custom doc serving)
Ratchet Loop: Simple Ralph Wiggum bash loop (Replaces ratchet.js)

Retained Custom Components (The Value Add)

Graph Builder (graph.js + extract.js): Tree-sitter extraction to build a unified knowledge graph across 13 repositories.
Subsystem Aggregator (subsystem.js): Grouping files into logical subsystems and detecting cross-cutting concerns.
Cross-Chart Interaction Analysis: Analyzing shared secrets, ports, and service references across Helm charts (which helm-docs cannot do natively).
LLM Prose Enrichment (prose.js): Feeding the dependency matrix and anomaly flags into Claude to generate "why" explanations.
Glue Layer: Minimal orchestration connecting OSS tools and custom analysis into unified output.

3. Requirements

LLM Engine: Use http://192.168.86.11:8000/v1 with the claude-haiku-4.5 model.
Scale: Must handle the Foxtrot monorepo (13 subdirectories, 17K+ files).
Footprint constraint: The pipeline should be composed of ~500 lines of custom Node.js code plus config files.
Speed constraint: Must run end-to-end in under 10 minutes (excluding LLM execution wait times).
Cost constraint: Target cost is $1.00 per release.
Code Implementation: Replace the existing terraform and per-chart Helm doc generation with the CLI tools (terraform-docs and helm-docs).
Docs Website: Implement an mkdocs.yml configuration to serve the output as a searchable site.
Evaluation Implementation: Configure promptfoo via YAML to act as the objective judge.
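
As a rough illustration of the Code Implementation requirement, the CLI wiring might look like the sketch below. The `--chart-search-root` flag of helm-docs and the `markdown table --output-file` form of terraform-docs are existing CLI options. The `terraform/modules/*` layout, the `DRY_RUN` switch, and the names `run_cmd` and `generate_reference_docs` are assumptions made for illustration and are not taken from the current pipeline.

```shell
#!/usr/bin/env bash
# Sketch: replace the custom Terraform/Helm extractors with CLI calls.

# run_cmd: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# generate_reference_docs REPO_ROOT (hypothetical name)
generate_reference_docs() {
  local root="$1"
  local module

  # helm-docs searches recursively for charts and writes a README.md per chart.
  run_cmd helm-docs --chart-search-root "$root" || return 1

  # terraform-docs documents one module directory per invocation.
  # ASSUMPTION: modules live under terraform/modules/; adjust to the real layout.
  for module in "$root"/terraform/modules/*/; do
    [ -d "$module" ] || continue
    run_cmd terraform-docs markdown table --output-file README.md "$module" || return 1
  done
}
```

Setting `DRY_RUN=1` before calling `generate_reference_docs /path/to/foxtrot` prints the commands instead of running them, which is a cheap way to review the wiring before the binaries are installed.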

4. Ralph Wiggum Loop Spec

The previous ratchet.js implementation will be replaced by a bash script. The script runs an AI agent in a simple, well-known ratchet pattern: loop until objective completion criteria are met.

Execution Flow:

Generate: Run the Dev Intel V3 pipeline.
Evaluate: Run promptfoo eval against the pipeline's output.
Diagnose: Check the promptfoo score against the required threshold.
Condition:
- If Score >= Threshold: Success, exit the loop.
- If Score < Threshold: Feed the previous output and the evaluation feedback (the failure context) back into the generator prompt, then run the next iteration.
Repeat: Continue up to N iterations until criteria are met.
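
The execution flow above can be sketched as a small bash function. This is a minimal sketch under stated assumptions, not the final script: `wiggum_loop`, `generate_docs`, and `evaluate_docs` are hypothetical names, the score is assumed to be an integer percentage, and the previous output is assumed to stay on disk where the pipeline wrote it, so only the feedback file is passed between iterations.

```shell
#!/usr/bin/env bash
# Sketch of the generate -> evaluate -> diagnose loop.
# The caller supplies two hooks (both hypothetical names):
#   generate_docs FEEDBACK_FILE  runs the pipeline; FEEDBACK_FILE holds the previous
#                                iteration's evaluation feedback (empty on the first pass)
#   evaluate_docs FEEDBACK_FILE  runs the eval, writes feedback to FEEDBACK_FILE,
#                                and prints an integer score (0-100) on stdout

# wiggum_loop THRESHOLD MAX_ITERATIONS
# Returns 0 when the threshold is met, 1 when iterations run out, 2 on a hook failure.
wiggum_loop() {
  local threshold="$1"
  local max_iters="$2"
  local feedback score
  local i=1

  feedback="$(mktemp)" || return 2

  while [ "$i" -le "$max_iters" ]; do
    # Generate: the hook reads the feedback left by the previous iteration.
    generate_docs "$feedback" || { rm -f "$feedback"; return 2; }
    # Evaluate: the hook overwrites the feedback file and prints the score.
    score="$(evaluate_docs "$feedback")" || { rm -f "$feedback"; return 2; }
    echo "iteration $i: score=$score threshold=$threshold"

    # Diagnose: stop as soon as the objective criterion is met.
    if [ "$score" -ge "$threshold" ]; then
      rm -f "$feedback"
      return 0
    fi
    i=$((i + 1))
  done

  rm -f "$feedback"
  return 1
}
```

In a real script, `evaluate_docs` would presumably wrap `promptfoo eval` and reduce its results to a single percentage. With both hooks defined, `wiggum_loop 93 5` would run up to five iterations against the 93% agent eval target.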

5. Success Metrics

Quality Parity or Better: Agent eval score >= 93%, Human eval score >= 78%.
Simplicity: Custom codebase shrinks from ~2000 lines to ~500 lines.
Performance: End-to-end run time is under 10 minutes, excluding LLM execution wait times.
Efficiency: Pipeline inference costs remain <= $1 per release.
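
The quality and simplicity metrics lend themselves to an automated gate. The sketch below is one possible shape, with several assumptions: `count_js_lines` and `check_metrics` are hypothetical names, scores are passed in as integer percentages, and the "~500 lines" target is treated as a hard default limit that can be overridden. Performance and cost are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: gate a release on the quality and footprint metrics.

# count_js_lines DIR: total line count of *.js files under DIR, skipping node_modules.
count_js_lines() {
  find "$1" -name node_modules -prune -o -type f -name '*.js' -exec cat {} + | wc -l | tr -d ' '
}

# check_metrics AGENT_PCT HUMAN_PCT SRC_DIR [MAX_LINES]
# ASSUMPTION: MAX_LINES defaults to 500; the PRD only says "~500".
check_metrics() {
  local agent="$1"
  local human="$2"
  local src="$3"
  local max_lines="${4:-500}"
  local lines
  local failed=0

  lines="$(count_js_lines "$src")"

  [ "$agent" -ge 93 ] || { echo "FAIL agent eval: ${agent}% < 93%"; failed=1; }
  [ "$human" -ge 78 ] || { echo "FAIL human eval: ${human}% < 78%"; failed=1; }
  [ "$lines" -le "$max_lines" ] || { echo "FAIL footprint: $lines lines > $max_lines"; failed=1; }

  return "$failed"
}
```

For example, `check_metrics 94 80 ./src` exits with status 0 only if both scores meet their targets and the custom JavaScript under `./src` fits the line budget.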

6. Migration Plan

To safely deprecate V2 while maintaining documentation pipelines:

Remove Custom Extractors: Delete extract-terraform.js, extract-helm.js, and the Helm-specific logic inside sysdoc.js.
Remove Custom Evaluators: Delete eval-agent.js, eval-human.js, and eval.js.
Remove Custom Ratchet: Delete ratchet.js.
Integrate CLI Binaries: Install and wire up terraform-docs and helm-docs.
Add Configs: Write promptfoo.yaml for evaluations and mkdocs.yml for serving docs.
Implement Bash Script: Write the Ralph Wiggum loop.
Re-wire Glue Code: Connect the outputs from the OSS tools into the preserved prose.js module.
