
Jarvis Prime 0265ec7a60 feat: confluence benchmark, pattern extractor, agent KB, UX spec

- extract-patterns.js: mines layered arch, ArgoCD appsets, cloud regions,
  CIDR allocations, naming conventions, sync waves, tech stack from code
- agent-kb.js: token-efficient JSON rendering of same doc tree
- eval-confluence-ref-questions.json: 32 reference-only benchmark questions
- wiggum-v2.sh: Ralph Wiggum loop targeting confluence baseline (77.8%)
- docs/human-ux-spec.md: BMad UX designer spec for human doc structure
- Eval results: V2 at 28.7% vs confluence 77.8% baseline
- Hub/spoke ownership now correctly extracted (95% on that question)
- Naming conventions, regions, CIDRs surfaced in system-architecture.md

2026-03-10 14:20:35 +00:00

3.5 KiB


Product Requirements Document: Dev Intel V2

1. Problem Statement

Dev Intel V2 currently extracts code entities and Helm chart structures to build a unified knowledge graph and generate Diataxis-structured documentation for infrastructure monorepos. The pipeline performs well for AI agents (93.4% eval score), but human engineers fare worse (78.6% eval score) because the generated prose describes what exists without explaining why. In addition, critical infrastructure components such as Terraform are missing from the extraction, and there is no architectural flow tracing. Both gaps limit how useful the generated documentation is for understanding change impact and structural anomalies.

2. User Personas

Infrastructure Engineer: Needs to understand the "why" behind the architecture, trace execution flows across boundaries, and quickly assess the blast radius of changes (e.g., modifying a secret or Helm chart).
AI Coding Agent: Relies on high-fidelity, highly structured knowledge graphs and inlined dependencies to reliably answer questions about the codebase without getting lost in nested wrapper charts.

3. Requirements

Tier 1: Fix What's Broken (Explanation & Accuracy)

T1.1: Inline Sub-chart Dependencies: Wrapper charts must inline their sub-chart dependencies in the index to ensure dependency queries do not fail.
T1.2: Explanatory LLM Prose: Update the LLM enrichment prompts to explain why subsystems depend on each other and why certain structural anomalies exist (e.g., subsystems with zero functions).
T1.3: Architectural Anomaly Resolution: Documentation must explicitly identify and explain structural anomalies in the architecture, to improve on the current 30% success rate for architectural "why" questions.
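
One way to read T1.1 is as a transitive flattening of each chart's dependency list. The sketch below is only an illustration under stated assumptions: the index shape (chart name mapped to an array of direct dependency names), the function name `inlineDependencies`, and the chart names `platform-wrapper`, `vault`, and `ingress` are hypothetical, since the real Dev Intel index format is not shown here.

```javascript
// Hypothetical index shape: chart name -> array of direct dependency names.
// The actual Dev Intel index format is an assumption, not taken from the source.
function inlineDependencies(index) {
  const resolved = {};
  for (const chart of Object.keys(index)) {
    const seen = new Set();
    const stack = [...(index[chart] || [])];
    while (stack.length > 0) {
      const dep = stack.pop();
      // Skip self-references and already-visited charts so cycles terminate.
      if (dep === chart || seen.has(dep)) continue;
      seen.add(dep);
      stack.push(...(index[dep] || []));
    }
    // Sorted output keeps the inlined list stable between runs.
    resolved[chart] = [...seen].sort();
  }
  return resolved;
}

// Illustrative data only; these chart relationships are made up.
const exampleIndex = {
  "platform-wrapper": ["vault", "ingress"],
  vault: ["vault-secret"],
  ingress: [],
  "vault-secret": [],
};

console.log(inlineDependencies(exampleIndex)["platform-wrapper"]);
// prints [ 'ingress', 'vault', 'vault-secret' ]
```

With the transitive list stored directly on the wrapper chart, a dependency query no longer has to walk nested sub-charts at read time. The cost is a larger index, which is the context-size risk noted later in this document.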

Tier 2: Fill Real Gaps (Coverage & Tracing)

T2.1: Terraform Extraction (extract-terraform.js): Implement robust Terraform entity extraction. Currently, only 1 module is detected out of 336 files in control-core.
T2.2: Auto-Detection of Entry Points: Implement flow tracing by automatically detecting entry points. Targets: Helm Deployments with Services, main() in shell scripts, __main__ in Python, and CI pipelines.
T2.3: Change Impact Analysis Interface: Build a query interface leveraging existing knowledge graph edges to answer change impact questions (e.g., "If I modify vault-secret, which charts redeploy?").
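
The change impact question in T2.3 can be sketched as a reverse traversal over dependency edges. This is a minimal illustration, not the actual query interface: the edge shape `{ from, to }` (meaning "from depends on to"), the function name `impactedBy`, and every chart name other than `vault-secret` are assumptions made for the example.

```javascript
// Hypothetical edge shape: { from, to } means "from depends on to".
// The real knowledge graph edge format is an assumption here.
function impactedBy(edges, changed) {
  // Build a reverse adjacency map: node -> nodes that depend on it.
  const dependents = new Map();
  for (const { from, to } of edges) {
    if (!dependents.has(to)) dependents.set(to, []);
    dependents.get(to).push(from);
  }

  // Breadth-first walk upward from the changed node.
  const impacted = new Set();
  const queue = [changed];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const parent of dependents.get(node) || []) {
      if (parent !== changed && !impacted.has(parent)) {
        impacted.add(parent);
        queue.push(parent);
      }
    }
  }
  return [...impacted].sort();
}

// Illustrative edges only; these relationships are made up.
const exampleEdges = [
  { from: "vault", to: "vault-secret" },
  { from: "platform-wrapper", to: "vault" },
  { from: "ingress", to: "cert-manager" },
];

console.log(impactedBy(exampleEdges, "vault-secret"));
// prints [ 'platform-wrapper', 'vault' ]
```

Because the traversal follows whatever edges the graph contains, any noisy or missing edge shows up directly as a false or missed impact.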

4. Success Metrics

Agent Eval Score: Maintain > 90%.
Human Eval Score: Increase from 78.6% to > 90%.
Terraform Coverage: Increase from ~0% to > 80% of control-core entities extracted.
Flow Traces: Document at least 5 meaningful entry-to-exit execution paths.

5. Out of Scope

Support for new languages outside of the current stack (Python, Go, TypeScript, Shell, HCL/Terraform).
Interactive UI dashboards (focus remains on markdown generation and query interfaces).
Modifying the core Diataxis structural framework.

6. Dependencies and Risks

Risk (LLM Context Limits): Inlining sub-chart dependencies and expanding explanatory prose could bloat the context window for the evaluating LLM.
Dependency: The change impact query interface relies heavily on the accuracy of the existing graph edges; if current edges are noisy, the impact analysis will be flawed.
Dependency: Terraform extraction requires successfully parsing HCL, whose module resolution can be more complex than the tree-sitter-based extraction used for standard code.
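
To give a sense of what a first pass at Terraform entity detection might look like, here is a deliberately naive sketch that scans HCL text for top-level block headers with a regular expression. It is not a real HCL parser and is not the design of extract-terraform.js: it ignores comments, heredocs, and module source resolution, and the function name `scanTerraformBlocks` and the sample configuration are hypothetical.

```javascript
// Naive, assumption-laden scan: matches block headers such as
//   resource "aws_s3_bucket" "logs" {
//   module "vpc" {
// It does not parse HCL and will be fooled by comments or heredocs.
function scanTerraformBlocks(hclText) {
  const pattern =
    /^\s*(resource|data|module|variable|output)\s+"([^"]+)"(?:\s+"([^"]+)")?\s*\{/gm;
  const entities = [];
  for (const match of hclText.matchAll(pattern)) {
    const [, kind, first, second] = match;
    // Two-label blocks (resource, data) carry a type and a name;
    // one-label blocks (module, variable, output) carry only a name.
    entities.push(
      second === undefined
        ? { kind, name: first }
        : { kind, type: first, name: second }
    );
  }
  return entities;
}

// Made-up sample configuration for illustration.
const sampleHcl = `
module "vpc" {
  source = "./modules/vpc"
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}
`;

console.log(scanTerraformBlocks(sampleHcl).length); // prints 2
```

A scan like this could serve as a quick coverage baseline, but resolving what a `module` block's `source` actually points to is the harder part this dependency refers to.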
