dev-intel-v2/wiggum-v2-ref.log
Jarvis Prime 0265ec7a60 feat: confluence benchmark, pattern extractor, agent KB, UX spec
- extract-patterns.js: mines layered arch, ArgoCD appsets, cloud regions,
  CIDR allocations, naming conventions, sync waves, tech stack from code
- agent-kb.js: token-efficient JSON rendering of same doc tree
- eval-confluence-ref-questions.json: 32 reference-only benchmark questions
- wiggum-v2.sh: Ralph Wiggum loop targeting confluence baseline (77.8%)
- docs/human-ux-spec.md: BMad UX designer spec for human doc structure
- Eval results: V2 at 28.7% vs confluence 77.8% baseline
- Hub/spoke ownership now correctly extracted (95% on that question)
- Naming conventions, regions, CIDRs surfaced in system-architecture.md
2026-03-10 14:20:35 +00:00

🔁 Ralph Wiggum Loop (V2) — max 3 iterations, target 77%
Benchmark: Confluence Gold Standard (/home/node/.openclaw/workspace/projects/dev-intel-v2/eval-confluence-ref-questions.json)
=== Iteration 1/3 ===
📝 Running V2 pipeline...
Generating prose for subsystem: app-tools...
Generating prose for subsystem: compute-common...
Generating prose for subsystem: compute-tools...
Generating prose for subsystem: control-core...
Generating prose for subsystem: ipam-core...
Generating prose for subsystem: ipam-tools...
Generating prose for subsystem: network-common...
Generating prose for subsystem: network-core...
Generating prose for subsystem: runtime...
Generating prose for subsystem: root...
Generating prose for 124 contracts...
Generated docs in ./foxtrot-docs
- 12 subsystems
- 124 contracts
- 0 flows
📊 Running agent file-browsing eval against Confluence questions...
Using model: claude-haiku-4.5
Agent Eval: 32 machine-audience questions
[1/32] arch-layered-order... 30% (A:2 C:2 P:1 N:1) files:5
[2/32] arch-hub-spoke-ownership... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[3/32] arch-aws-regions... 50% (A:2 C:5 P:1 N:2) files:5
[4/32] arch-gcp-shared-vpc-host... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[5/32] arch-cidr-employee-access... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[6/32] arch-production-cidr... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[7/32] dep-runtime-common-horizontal... 35% (A:2 C:1 P:2 N:2) files:5
[8/32] dep-vertical-layers... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[9/32] dep-create-account-repos... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[10/32] dep-create-cluster-repos... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[11/32] dep-compute-common-deps... 40% (A:2 C:1 P:3 N:2) files:5
[12/32] ops-argocd-deployment-flow... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[13/32] ops-ebf-release-pattern... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[14/32] ops-rollback-procedure... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[15/32] ops-branch-cluster-mapping... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[16/32] ops-jenkins-jobs... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[17/32] ops-create-cluster-timeout... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[18/32] config-cloud-resource-naming... 40% (A:2 C:2 P:2 N:2) files:5
[19/32] config-region-code-algorithm... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[20/32] config-app-config-merge-order... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[21/32] config-account-creation-product-id... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[22/32] config-ipam-rds-backup... 25% (A:0 C:0 P:5 N:0) files:4 [NOT_FOUND]
[23/32] config-dev-artifact-naming... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[24/32] services-tech-stack-orchestration... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[25/32] services-state-management... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[26/32] services-eks-addon-versions... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[27/32] services-aws-nat-egress-model... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[28/32] services-ipam-netbox-role... 85% (A:5 C:5 P:3 N:4) files:5
[29/32] contracts-argo-gen-params-required... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[30/32] contracts-azure-xrd-naming... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]
[31/32] contracts-helm-chart-required-values... 20% (A:1 C:1 P:1 N:1) files:5
[32/32] contracts-sync-wave-ordering... 15% (A:1 C:0 P:1 N:1) files:5
════════════════════════════════════════════════════════════
AGENT EVAL REPORT
════════════════════════════════════════════════════════════
Overall Score: 28.6%
Accuracy: 0.53/5 Completeness: 0.53/5 Precision: 4.19/5 Navigation: 0.47/5
Not Found: 24/32 (75.0%)
By Category:
architecture: 30.0% (6 questions)
dependencies: 30.0% (5 questions)
operations: 25.0% (6 questions)
configuration: 27.5% (6 questions)
services: 37.0% (5 questions)
contracts: 21.3% (4 questions)
By Difficulty:
easy: 30.5% (10 questions)
medium: 29.1% (17 questions)
hard: 23.0% (5 questions)
Weakest:
[contracts-sync-wave-ordering] 15% — What are the ArgoCD sync wave values and what resource types are deplo... (read: reference/helm/charts/app-common-charts-argocd-apps.md, reference/helm/index.md, reference/subsystems/app-common.md, diagrams/helm-interactions.mmd, reference/system-architecture.md)
[contracts-helm-chart-required-values] 20% — What are the five required values that all app Helm charts must define... (read: reference/helm/index.md, reference/subsystems/app-common.md, reference/contracts/index.md, reference/system-architecture.md, diagrams/app-common-contracts.mmd)
[arch-hub-spoke-ownership] 25% — Which ArgoCD instance owns the account, network, and compute layers, a... (read: reference/system-architecture.md, reference/subsystems/root.md, reference/helm/index.md, reference/subsystems/control-core.md, reference/contracts/index.md)
[arch-gcp-shared-vpc-host] 25% — What is the default GCP host project used for Shared VPC in network-co... (read: reference/subsystems/network-common.md, reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md, reference/system-architecture.md, reference/helm/index.md, diagrams/network-common-contracts.mmd)
[arch-cidr-employee-access] 25% — What is the CIDR range for the employee access (bastions) segment on A... (read: reference/system-architecture.md, reference/subsystems/network-core.md, reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md, reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md, reference/subsystems/network-common.md)
Full report: /home/node/.openclaw/workspace/projects/dev-intel-v2/eval-wiggum-v2-iter-1.json
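
The per-question percentages above appear to be the sum of the four rubric values (A, C, P, N, each out of 5) divided by 20. For example, A:2 C:2 P:1 N:1 gives 6/20 = 30%, and a NOT_FOUND row with only P:5 gives 5/20 = 25%. The sketch below recomputes scores from log lines under that assumption. The formula is inferred from this log rather than taken from the evaluator's source, and the names parseResultLine, questionScore, and summarize are hypothetical helpers, not part of the repository.

```javascript
// Assumed line format, taken from the log above:
//   [i/n] question-id... S% (A:a C:c P:p N:n) files:k [NOT_FOUND]
const LINE_RE =
  /^\[(\d+)\/(\d+)\] ([\w-]+)\.\.\. (\d+)% \(A:(\d) C:(\d) P:(\d) N:(\d)\) files:(\d+)( \[NOT_FOUND\])?$/;

// Parse one per-question line; returns null for any other log line.
function parseResultLine(line) {
  const m = LINE_RE.exec(line.trim());
  if (!m) return null;
  return {
    id: m[3],
    reported: Number(m[4]),
    rubric: { A: Number(m[5]), C: Number(m[6]), P: Number(m[7]), N: Number(m[8]) },
    notFound: m[10] !== undefined,
  };
}

// Inferred scoring rule: four rubric values of 0..5 each, so 20 points maximum.
function questionScore({ A, C, P, N }) {
  return ((A + C + P + N) * 100) / 20;
}

// Mean of the per-question scores plus the NOT_FOUND count.
function summarize(results) {
  const total = results.reduce((sum, r) => sum + questionScore(r.rubric), 0);
  return {
    overall: total / results.length,
    notFound: results.filter((r) => r.notFound).length,
  };
}

// Usage with three lines copied from the log.
const sample = [
  "[1/32] arch-layered-order... 30% (A:2 C:2 P:1 N:1) files:5",
  "[2/32] arch-hub-spoke-ownership... 25% (A:0 C:0 P:5 N:0) files:5 [NOT_FOUND]",
  "[28/32] services-ipam-netbox-role... 85% (A:5 C:5 P:3 N:4) files:5",
].map(parseResultLine);

// Checks that the recomputed score matches the reported one on each line.
console.log(sample.every((r) => r.reported === questionScore(r.rubric)));
```

Under this rule a NOT_FOUND answer still earns 25% from Precision alone, which is why 24 unanswered questions out of 32 still produce an overall score of 28.6% rather than something near zero.
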
🏁 Iteration 1 Score: 29% (Target: 77%)
❌ Below threshold. To iterate, we need a diagnosis and code fix step here.
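
The run stops after one iteration because no diagnosis or fix step exists yet between iterations. A minimal sketch of what the loop control could look like once such a step exists is shown below, written in JavaScript to match the repo's other tooling. Everything here is an assumption: wiggumLoop, runPipelineAndEval, and diagnoseAndFix are hypothetical names, and the real wiggum-v2.sh may be structured differently.

```javascript
// Hypothetical loop control: run the pipeline and eval, stop on success,
// otherwise call a diagnose-and-fix hook before the next attempt.
// Both callbacks are placeholders supplied by the caller.
function wiggumLoop({ maxIterations, target, runPipelineAndEval, diagnoseAndFix }) {
  const history = [];
  for (let i = 1; i <= maxIterations; i++) {
    const score = runPipelineAndEval(i); // overall score in percent
    history.push(score);
    if (score >= target) {
      return { passed: true, iterations: i, history };
    }
    // No point in fixing after the final attempt.
    if (i < maxIterations) {
      diagnoseAndFix(i, score);
    }
  }
  return { passed: false, iterations: maxIterations, history };
}

// Usage with stubbed callbacks and the limits from this log
// (max 3 iterations, target 77%).
const result = wiggumLoop({
  maxIterations: 3,
  target: 77,
  runPipelineAndEval: () => 28.6, // stub: always returns the iteration 1 score
  diagnoseAndFix: (i, score) => console.log(`iteration ${i}: ${score}% is below target`),
});
console.log(result.passed, result.iterations);
```

Passing the two steps in as callbacks keeps the loop itself trivial to test with stubs, independent of how the pipeline or the fix step ends up being implemented.
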