
2 Commits

Author SHA1 Message Date
Jarvis Prime
ca11b4459a Agent eval hits 93.4% — target exceeded
- Fixed ground truth generator to merge Helm entities (matching sysdoc.js pipeline)
- Added Quick Lookup index with name-to-file mapping for agent navigation
- Enriched All Charts table with AppVersion, Dependencies, Values Keys columns
- Increased agent file read cap to 30K for full index coverage
- Set tree depth to 4 for chart file discovery

Score progression: 54.3% → 84.3% → 88.4% → 93.4%
NOT_FOUND: 41% → 0%
All categories score above 75%; easy questions reach 98.1%
2026-03-10 00:40:38 +00:00
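The Quick Lookup index and the depth-4 chart discovery from ca11b4459a can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual generator. The function names `find_charts` and `render_quick_lookup` are hypothetical, as are using the chart's directory name as its lookup key and the two-column table layout. The sketch scans directories up to `max_depth` levels below the root for `Chart.yaml` and maps each chart name to its file.

```python
import os
from pathlib import Path


def find_charts(root: str, max_depth: int = 4) -> dict[str, str]:
    """Map chart names to Chart.yaml paths, scanning at most `max_depth` levels.

    Assumption (hypothetical): a chart's directory name serves as its lookup name.
    """
    root_path = Path(root).resolve()
    index: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # deterministic traversal order
        depth = len(Path(dirpath).relative_to(root_path).parts)
        if depth >= max_depth:
            dirnames.clear()  # scan this directory's files but do not descend further
        if "Chart.yaml" in filenames:
            rel = Path(dirpath, "Chart.yaml").relative_to(root_path).as_posix()
            index.setdefault(Path(dirpath).name, rel)  # first match wins on name clashes
    return index


def render_quick_lookup(index: dict[str, str]) -> str:
    """Render the name-to-file mapping as a two-column Markdown table."""
    lines = ["| Chart | File |", "| --- | --- |"]
    for name in sorted(index):
        lines.append(f"| {name} | {index[name]} |")
    return "\n".join(lines)
```

Usage: `print(render_quick_lookup(find_charts("charts")))`. Clearing `dirnames` in place is how top-down `os.walk` skips subtrees, so the depth cap also keeps the scan cheap on deeply nested repositories.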
Jarvis Prime
304f0a9e9f Phase 9c: Split eval into Agent (file-browsing) and Human (readability) tracks
Agent eval: 54.3% (22 questions, 40.9% NOT_FOUND)
Human eval: 63.9% (28 questions, 17.9% NOT_FOUND)

Key findings:
- Agent navigation is the bottleneck (2.09/5): long path-based filenames hurt discoverability
- Human findability is decent (3.46/5), but dependency questions fail (0%) because the docs for wrapper charts don't surface their sub-chart dependencies
- Both tracks show strong precision (4.4+/5), i.e. very little hallucination
- Resources (91%) and interactions (95%) score highly on the human track
- Configuration and contracts are solid across both tracks
2026-03-09 23:55:54 +00:00
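Surfacing a wrapper chart's sub-charts comes down to reading the `dependencies` list that Helm `apiVersion: v2` charts declare in `Chart.yaml`. The same data can feed both a per-chart dependency list and an All Charts row with AppVersion, Dependencies and Values Keys columns. A minimal sketch, assuming `Chart.yaml` and `values.yaml` have already been parsed into dicts (for example with a YAML library). `render_chart_row`, `render_dependencies_section` and the exact output layout are hypothetical and not taken from sysdoc.js.

```python
from typing import Optional


def _dep_label(dep: dict) -> str:
    """Format one Chart.yaml dependency entry as 'name version'."""
    return f"{dep['name']} {dep.get('version', '')}".strip()


def render_chart_row(chart: dict, values: Optional[dict]) -> str:
    """One hypothetical All Charts row: Chart | AppVersion | Dependencies | Values Keys."""
    deps = chart.get("dependencies") or []
    dep_text = ", ".join(_dep_label(d) for d in deps) or "none"
    keys_text = ", ".join(sorted(values or {})) or "none"  # top-level values.yaml keys
    return f"| {chart['name']} | {chart.get('appVersion', '')} | {dep_text} | {keys_text} |"


def render_dependencies_section(chart: dict) -> str:
    """List sub-chart dependencies so a wrapper chart's doc surfaces them."""
    deps = chart.get("dependencies") or []
    lines = ["Dependencies:"]
    if not deps:
        lines.append("- none")
    for dep in deps:
        item = f"- {_dep_label(dep)}"
        if dep.get("repository"):
            item += f" ({dep['repository']})"
        lines.append(item)
    return "\n".join(lines)
```

Treating an empty `values.yaml` (which parses to `None`) as having no keys keeps the row renderer from failing on charts without defaults.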