dev-intel-v2

max/dev-intel-v2

Commit Graph

Author	SHA1	Message	Date

Jarvis Prime

ca11b4459a

Agent eval hits 93.4% — target exceeded

- Fixed ground truth generator to merge Helm entities (matching sysdoc.js pipeline)
- Added Quick Lookup index with name-to-file mapping for agent navigation
- Enriched All Charts table with AppVersion, Dependencies, Values Keys columns
- Increased agent file read cap to 30K for full index coverage
- Tree depth 4 for chart file discovery

Score progression: 54.3% → 84.3% → 88.4% → 93.4%
NOT_FOUND: 41% → 0%
All categories above 75%, easy questions at 98.1%

2026-03-10 00:40:38 +00:00
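The "Quick Lookup index with name-to-file mapping" mentioned in this commit could look roughly like the sketch below. This is only an illustration and not the repository's code: the function names, the rule that a file's stem is the entity name, and the example paths are all hypothetical.

```python
from pathlib import PurePosixPath


def build_quick_lookup(doc_paths):
    """Map a short entity name to the doc file(s) that describe it.

    Assumption (hypothetical): the entity name is the file stem,
    so an agent can search by name instead of walking long paths.
    """
    index = {}
    for path in doc_paths:
        name = PurePosixPath(path).stem
        index.setdefault(name, []).append(path)
    return index


def render_lookup(index):
    """Render the index as a small two-column text table."""
    lines = ["Name | File", "--- | ---"]
    for name in sorted(index):
        for path in index[name]:
            lines.append(f"{name} | {path}")
    return "\n".join(lines)


# Hypothetical doc paths, for illustration only.
docs = [
    "docs/charts/platform/ingress/gateway.md",
    "docs/charts/data/cache/redis-wrapper.md",
]
index = build_quick_lookup(docs)
print(render_lookup(index))
```

A flat table like this lets a file-browsing agent resolve a name with a single read, which is the kind of navigation problem the earlier commit attributes to long path-based filenames.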

Jarvis Prime

304f0a9e9f

Phase 9c: Split eval into Agent (file-browsing) and Human (readability) tracks

Agent eval: 54.3% (22 questions, 40.9% NOT_FOUND)
Human eval: 63.9% (28 questions, 17.9% NOT_FOUND)

Key findings:
- Agent navigation is the bottleneck (2.09/5) — long path-based filenames hurt discoverability
- Human findability is decent (3.46/5) but dependency questions fail (0%) because chart docs for wrapper charts don't surface their sub-chart deps
- Both tracks show strong precision (4.4+/5) — very low hallucination
- Resources (91%) and interactions (95%) score great for humans
- Configuration and contracts are solid across both tracks

2026-03-09 23:55:54 +00:00
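The NOT_FOUND percentages in this commit can be cross-checked against the question counts. The commit message does not state the raw counts; the values 9 and 5 below are inferred as the only whole numbers that round to the reported rates, so treat them as an assumption.

```python
def not_found_rate(not_found, total):
    """Return the NOT_FOUND share as a percentage, rounded to one decimal."""
    return round(100 * not_found / total, 1)


# Inferred counts (not stated in the commit message).
agent_rate = not_found_rate(9, 22)   # Agent track: 22 questions
human_rate = not_found_rate(5, 28)   # Human track: 28 questions

print(agent_rate, human_rate)
```

Under that assumption, 9 of 22 agent questions and 5 of 28 human questions came back NOT_FOUND, matching the reported 40.9% and 17.9%.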

2 Commits