Commit Graph

19 Commits

Author SHA1 Message Date
Jarvis Prime
b8403be96c feat: repo-agnostic refactor (BMad spec-test-build loop)
- NEW: repo-profiler.js — deterministic archetype detection (Infra, Frontend, Backend, etc.)
- NEW: extract-dynamic.js — generic extractor replacing hardcoded Foxtrot patterns
- NEW: eval-generator.js — dynamic ground-truth question generation from any repo graph
- NEW: specs/bmad-agnostic-refactor-spec.md — full BMad spec with acceptance criteria
- REFACTORED: prose.js — two-pass LLM synthesis with rich context (shared secrets, ports, service refs)
- REFACTORED: sysdoc.js — wired repo-profiler + extract-dynamic, --legacy escape hatch
- REFACTORED: wiggum-v2.sh — uses eval-generator before benchmarks
- FIXED: graph.js — _edgeSet rebuilt on loadSnapshot() (edge dedup was broken)
- FIXED: graph.js — recursive sortKeys() for deep equality in diffing
- FIXED: prose.js — robust JSON array extraction from LLM output
- FIXED: ratchet.js — syntax validation (node --check) before saving LLM mutations
- FIXED: extract-dynamic.js — centralized state services regex, added console.warn for silent failures
- TESTS: test-eval-generator, test-repo-profiler, test-synthesis-quality + mock fixtures

Eval: 81.5% on Foxtrot (fully repo-agnostic, no hardcoded reference pages)
BMad reviews: Architect B+, Dev Lead B-, TEA B-
2026-03-11 14:40:31 +00:00
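The graph.js fix in b8403be96c mentions a recursive sortKeys() for deep equality in diffing. A minimal sketch of that idea, assuming plain JSON-like values; the function bodies and the deepEqual helper are illustrative and are not taken from the commit itself:

```javascript
// Hypothetical sketch: the real sortKeys() in graph.js may differ.
function sortKeys(value) {
  if (Array.isArray(value)) {
    // Arrays keep their order; only their elements are normalized.
    return value.map(sortKeys);
  }
  if (value !== null && typeof value === "object") {
    const sorted = {};
    // Rebuild the object with keys in sorted order, recursing into values.
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value; // primitives pass through unchanged
}

// Assumed helper: compares two values by their key-sorted JSON form.
function deepEqual(a, b) {
  return JSON.stringify(sortKeys(a)) === JSON.stringify(sortKeys(b));
}

const before = { id: "svc", meta: { port: 8080, tags: ["a", "b"] } };
const after = { meta: { tags: ["a", "b"], port: 8080 }, id: "svc" };
console.log(deepEqual(before, after)); // true
```

Sorting only at the top level would leave nested objects such as meta order-sensitive, which is the case a recursive version is meant to cover.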
Jarvis Prime
15fb1a753b Add deep extractors, reference pages, keyword index; eval 53.3%
- extract-deep.js: mines addon versions, TF configs, script params, helm values, state services
- generate-reference-pages.js: creates operations.md, configuration.md, network-architecture.md
- reference/index.md: keyword-rich topic-to-file routing table
- Enriched CIDR extractor with inline comment capture
- Eval progression: 28.7% -> 33.4% -> 46.7% -> 52.5% -> 53.3%
- NOT_FOUND: 25 -> 20 -> 16 -> 10 -> 11
- Top scores: config-region-code 95%, argo-gen-params 95%, multiple 100%s
- Remaining gap: agent planner (Haiku) doesn't consistently follow index routing
2026-03-10 19:01:21 +00:00
Jarvis Prime
0265ec7a60 feat: confluence benchmark, pattern extractor, agent KB, UX spec
- extract-patterns.js: mines layered arch, ArgoCD appsets, cloud regions,
  CIDR allocations, naming conventions, sync waves, tech stack from code
- agent-kb.js: token-efficient JSON rendering of same doc tree
- eval-confluence-ref-questions.json: 32 reference-only benchmark questions
- wiggum-v2.sh: Ralph Wiggum loop targeting confluence baseline (77.8%)
- docs/human-ux-spec.md: BMad UX designer spec for human doc structure
- Eval results: V2 at 28.7% vs confluence 77.8% baseline
- Hub/spoke ownership now correctly extracted (95% on that question)
- Naming conventions, regions, CIDRs surfaced in system-architecture.md
2026-03-10 14:20:35 +00:00
Jarvis Prime
049609a358 Phase 9d: Human eval score improvement

- Human readability score increased from 63.9% to 78.6%
- Structural table additions and quick lookup index resolved navigation bottlenecks
- NOT_FOUND rate dropped from 17.9% to 3.6%
2026-03-10 00:46:37 +00:00
Jarvis Prime
ca11b4459a Agent eval hits 93.4% — target exceeded
- Fixed ground truth generator to merge Helm entities (matching sysdoc.js pipeline)
- Added Quick Lookup index with name-to-file mapping for agent navigation
- Enriched All Charts table with AppVersion, Dependencies, Values Keys columns
- Increased agent file read cap to 30K for full index coverage
- Tree depth 4 for chart file discovery

Score progression: 54.3% → 84.3% → 88.4% → 93.4%
NOT_FOUND: 41% → 0%
All categories above 75%, easy questions at 98.1%
2026-03-10 00:40:38 +00:00
Jarvis Prime
304f0a9e9f Phase 9c: Split eval into Agent (file-browsing) and Human (readability) tracks
Agent eval: 54.3% (22 questions, 40.9% NOT_FOUND)
Human eval: 63.9% (28 questions, 17.9% NOT_FOUND)

Key findings:
- Agent navigation is the bottleneck (2.09/5) — long path-based filenames hurt discoverability
- Human findability is decent (3.46/5) but dependency questions fail (0%) because chart docs for wrapper charts don't surface their sub-chart deps
- Both tracks show strong precision (4.4+/5) — very low hallucination
- Resources (91%) and interactions (95%) score great for humans
- Configuration and contracts are solid across both tracks
2026-03-09 23:55:54 +00:00
Jarvis Prime
0cc4abcb0f Phase 9b: structural documentation improvements

- sysdoc.js: Added Summary Statistics, Top Charts, and K8s Resource Types to architecture doc
- Addresses ratchet failures where system-wide rollups were missing from generated prose
- Eval v2 shows minor improvement, though RAG context window still limits wide scatter-gather queries
2026-03-09 23:40:07 +00:00
Jarvis Prime
b99341e8bc Phase 9: Doc Evaluation Harness

- eval-questions.js: Generates ground-truth questions from raw source data
- eval.js: LLM-as-judge scoring harness (answers from docs, scores against truth)
- Generated 33 questions covering config, dependencies, resources, and interactions
- Baseline score: 66.7% (configuration 93%, dependencies 77%, structural 31%)
2026-03-09 22:32:41 +00:00
Jarvis Prime
d9fa087e22 Phase 6+7: LLM prose generation pass over Foxtrot docs

- Ran Claude Haiku to generate prose for architecture, subsystems, flows, and 124 Helm contracts
- Fixed describeContract prompt in prose.js to correctly identify and describe Helm contract types without hallucinating
- 80 files generated with rich architectural summaries
2026-03-09 20:15:50 +00:00
Jarvis Prime
4f7c77b3b1 Phase 8b: Helm contract extraction + diagram support
- extractHelmContracts() in contracts.js: values, services, workloads, deps
- Merged Helm contracts into main pipeline (124 contracts on Foxtrot)
- diagrams.js: generateContractDiagram now handles Helm types
- Sanitized Mermaid class names for Helm contracts
- 1601-line contracts index with full classDiagram
2026-03-09 20:05:52 +00:00
Jarvis Prime
f49a6c2dd9 Phase 8: Helm chart extraction with Go template support
- extract-helm.js: strips Go templates, parses Chart.yaml/values.yaml/templates
- Extracts K8s resource kinds, cross-chart interactions, shared secrets, ports
- generateHelmDiagram() for Mermaid interaction graphs
- Integrated into sysdoc.js: Helm entities merge into main knowledge graph
- Dir-based filenames to handle duplicate chart names
- .gitignore for node_modules, snapshots, venv, wasm
- 76 charts, 1813 entities, 1769 relationships on Foxtrot
2026-03-09 20:03:04 +00:00
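Commit f49a6c2dd9 says extract-helm.js strips Go templates before parsing chart files. One rough way to do that, sketched under the assumption that a line-based pass is enough; the function name, the regular expressions, and the TEMPLATE placeholder are hypothetical and may not match what extract-helm.js does:

```javascript
// Hypothetical sketch: extract-helm.js may handle more cases than this.
function stripGoTemplates(text) {
  return text
    .split("\n")
    // Drop lines made up only of template actions, e.g. "{{- if .Values.x }}".
    .filter((line) => !/^\s*(\{\{.*?\}\}\s*)+$/.test(line))
    // Replace inline actions with a placeholder so the rest of the line survives.
    .map((line) => line.replace(/\{\{.*?\}\}/g, "TEMPLATE"))
    .join("\n");
}

const template = [
  "{{- if .Values.service.enabled }}",
  "kind: Service",
  "metadata:",
  "  name: {{ .Release.Name }}-web",
  "{{- end }}",
].join("\n");

console.log(stripGoTemplates(template));
// kind: Service
// metadata:
//   name: TEMPLATE-web
```

A pass like this loses conditional structure (both branches of an if/else would be kept), so it suits extraction of resource kinds and names rather than faithful rendering.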
Jarvis Prime
d19cee36d7 Phase 6+7D: Sonnet prose generation integration 2026-03-09 18:44:19 +00:00
Jarvis Prime
1869fcb5b2 7B: Add parse error tracking (BMad review fix) 2026-03-09 18:35:13 +00:00
Jarvis Prime
ca02fe131b Phase 7F: Supergraph Multi-Repo Merge 2026-03-09 18:19:14 +00:00
Jarvis Prime
d9fd7e3284 Phase 7B, 7E, 7D: Contracts, Diagrams, Sysdoc 2026-03-09 14:42:15 +00:00
Jarvis Prime
4c212740a2 Phase 7A+7C: Subsystem aggregator + Flow tracer (post-review fixes) 2026-03-09 06:51:32 +00:00
Jarvis Prime
4221ab4d76 Phase 6: LLM doc generation + Phase 7 system-docs spec 2026-03-09 06:20:54 +00:00
Jarvis Prime
7d5b6cbc32 Add README with benchmarks and V1 vs V2 comparison 2026-03-09 05:30:21 +00:00
Jarvis Prime
efb12d003b Dev Intel Pipeline v2 — multi-language semantic graph extractor
Phase 1: extract.js — tree-sitter AST parser (TS/JS/Python/Go/Java/Bash) + config parsers (YAML/HCL)
Phase 2: graph.js — in-memory directed graph store with build/query/diff CLI
Phase 3: namespace.js — cross-repo namespace registry with 3-tier resolution
Phase 4: semantic-diff.js — categorized diffs with impact scoring (0-100)
Phase 5: pipeline.js — batch extraction, incremental diffing, benchmarking

Benchmark: 4,325 files, 21,646 nodes, 133,979 edges in 67s (15ms/file)
BMad SPA reviews: all phases GO
2026-03-09 05:29:29 +00:00
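Phase 2 above describes graph.js as an in-memory directed graph store, and the later fix in b8403be96c notes that _edgeSet has to be rebuilt on loadSnapshot() for edge dedup to work. A minimal sketch of that pattern, assuming edges are identified by source, target, and type; the class, field, and method names here are illustrative and not the actual graph.js API:

```javascript
// Hypothetical sketch: names and snapshot format are assumptions, not graph.js itself.
class GraphStore {
  constructor() {
    this.nodes = new Map();   // id -> node attributes
    this.edges = [];          // ordered list of { from, to, type }
    this.edgeSet = new Set(); // "from|type|to" keys for fast duplicate checks
  }

  static edgeKey(from, to, type) {
    return `${from}|${type}|${to}`;
  }

  addNode(id, attrs = {}) {
    // Merge attributes if the node already exists.
    this.nodes.set(id, { ...this.nodes.get(id), ...attrs });
  }

  // Returns true if the edge was added, false if it was a duplicate.
  addEdge(from, to, type) {
    const key = GraphStore.edgeKey(from, to, type);
    if (this.edgeSet.has(key)) return false;
    this.edgeSet.add(key);
    this.edges.push({ from, to, type });
    return true;
  }

  outgoing(id) {
    return this.edges.filter((edge) => edge.from === id);
  }

  toSnapshot() {
    return JSON.stringify({ nodes: [...this.nodes], edges: this.edges });
  }

  static fromSnapshot(json) {
    const data = JSON.parse(json);
    const store = new GraphStore();
    store.nodes = new Map(data.nodes);
    // Re-adding through addEdge rebuilds edgeSet, so dedup keeps working after a load.
    for (const edge of data.edges) store.addEdge(edge.from, edge.to, edge.type);
    return store;
  }
}

const graph = new GraphStore();
graph.addNode("api");
graph.addNode("db");
graph.addEdge("api", "db", "calls");
graph.addEdge("api", "db", "calls"); // duplicate, ignored

const restored = GraphStore.fromSnapshot(graph.toSnapshot());
console.log(restored.addEdge("api", "db", "calls")); // false
console.log(restored.edges.length); // 1
```

If the snapshot loader copied the edge array without repopulating the set, the final addEdge call would succeed and the graph would hold a duplicate edge, which is the failure mode the fix describes.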