{
  "evalType": "agent",
  "timestamp": "2026-03-10T15:56:10.050Z",
  "docsDir": "./foxtrot-docs",
  "totalQuestions": 32,
  "overallScore": "33.4",
  "avgAccuracy": "0.91",
  "avgCompleteness": "1.03",
  "avgPrecision": "3.75",
  "avgNavigation": "1.00",
  "notFoundCount": 20,
  "notFoundRate": "62.5%",
  "byCategory": {
    "architecture": {
      "avg": "35.0",
      "count": 6
    },
    "dependencies": {
      "avg": "30.0",
      "count": 5
    },
    "operations": {
      "avg": "25.0",
      "count": 6
    },
    "configuration": {
      "avg": "38.3",
      "count": 6
    },
    "services": {
      "avg": "49.0",
      "count": 5
    },
    "contracts": {
      "avg": "21.3",
      "count": 4
    }
  },
  "byDifficulty": {
    "easy": {
      "avg": "38.0",
      "count": 10
    },
    "medium": {
      "avg": "25.3",
      "count": 17
    },
    "hard": {
      "avg": "52.0",
      "count": 5
    }
  },
  "results": [
    {
      "id": "arch-layered-order",
      "category": "architecture",
      "difficulty": "easy",
      "question": "What are the five layers in Foxtrot's layered architecture, listed from top to bottom?",
      "groundTruth": "Application Layer (app-*), Compute Layer (compute-*), Network Layer (network-*), Account Layer (account-*), Control Plane (control-core).",
      "llmAnswer": "Based on the system-architecture.md file, the five layers in Foxtrot's layered architecture from top to bottom are:\n\n1. Runtime (shared)\n2. Application\n3. Compute\n4. Network\n5. IPAM",
      "filesRead": [
        "reference/system-architecture.md",
        "diagrams/system-deps.mmd",
        "reference/subsystems/root.md",
        "reference/helm/index.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "diagrams/system-deps.mmd",
        "reference/subsystems/root.md",
        "reference/helm/index.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 1,
        "completeness": 2,
        "precision": 1,
        "navigation": 2,
        "notes": "AI correctly identified Application, Compute, and Network layers but missed Account Layer and Control Plane entirely. Introduced hallucinated layers (Runtime, IPAM) not in ground truth. Only 3 of 5 correct items present, with 2 false positives. Agent cited a source file but retrieved incorrect information, suggesting either wrong file access or misinterpretation of content."
      },
      "composite": 30,
      "notFound": false
    },
    {
      "id": "arch-hub-spoke-ownership",
      "category": "architecture",
      "difficulty": "easy",
      "question": "Which ArgoCD instance owns the account, network, and compute layers, and which owns the application layer?",
      "groundTruth": "Hub ArgoCD owns account, network, and compute layers (infrastructure and control-plane appsets). Spoke ArgoCD owns the application layer (app-runtime appsets only).",
      "llmAnswer": "Based on the Deployment Topology section in the system-architecture.md file:\n\n**Hub (Infrastructure/Control Plane) ArgoCD manages:**\n- Account layer: `account-creation` manages `account-runtime`\n- Network layer: `aws-vpcs`, `gcp-vpcs`, `azure-vpcs` manage `network-runtime`\n- Compute layer: `aws-argocd-feature-clusters`, `gcp-argocd-feature-clusters`, `azure-argocd-feature-clusters` manage `compute-runtime`\n\n**Application layer ownership:**\nThe documentation shows a \"Spoke (Applications/Runtime)\" section in the deployment topology, but it is incomplete/truncated in the provided files (showing only `{{` manages ``), so the specific ArgoCD instance managing the application layer cannot be determined from the available documentation.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/control-core.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/control-core.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd"
      ],
      "score": {
        "accuracy": 3,
        "completeness": 2,
        "precision": 4,
        "navigation": 3,
        "notes": "AI correctly identified Hub ArgoCD ownership of account, network, and compute layers with specific appset details. However, it failed to definitively answer the second part of the question—that Spoke ArgoCD owns the application layer. Instead, it claimed the documentation was incomplete/truncated, which contradicts the ground truth. The agent found the right file and section but didn't extract or confirm the complete answer."
      },
      "composite": 60,
      "notFound": false
    },
    {
      "id": "arch-aws-regions",
      "category": "architecture",
      "difficulty": "easy",
      "question": "What AWS regions does Foxtrot support?",
      "groundTruth": "us-east-1, us-west-2, eu-west-1, eu-central-1.",
      "llmAnswer": "Based on the system architecture documentation, Foxtrot supports the following AWS regions:\n\n- ap-southeast-1\n- eu-central-1\n- eu-west-1\n- eu-west-2\n- eu-west-3\n- us-east-1\n- us-east-2\n- us-west-1\n- us-west-2",
      "filesRead": [
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 5,
        "precision": 1,
        "navigation": 2,
        "notes": "AI answer includes all 4 ground truth regions but adds 5 unsupported regions (ap-southeast-1, eu-west-2, eu-west-3, us-east-2, us-west-1), indicating hallucination or unreliable source verification. While completeness is achieved, the answer contains significant false positives that undermine accuracy and precision."
      },
      "composite": 50,
      "notFound": false
    },
    {
      "id": "arch-gcp-shared-vpc-host",
      "category": "architecture",
      "difficulty": "medium",
      "question": "What is the default GCP host project used for Shared VPC in network-common, and what is the host network path it attaches to?",
      "groundTruth": "The default host project is `network-services-436015`. When `sharedVpc.enabled: true`, service project resources attach to the host network path `projects/HOST_PROJECT/global/networks/gcp-core-network`.",
      "llmAnswer": "Based on the provided documentation:\n\n**Default GCP host project:** `network-services-436015`\n\n**Host network path:** NOT_FOUND\n\nThe system-architecture.md file confirms the host_project value and that shared_vpc is enabled for the foxtrot-gcp-vpc chart, but the specific host network path that the service project attaches to is not documented in the provided files.",
      "filesRead": [
        "reference/subsystems/network-common.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "diagrams/network-common-contracts.mmd"
      ],
      "plannedFiles": [
        "reference/subsystems/network-common.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "diagrams/network-common-contracts.mmd"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 1,
        "precision": 4,
        "navigation": 1,
        "notes": "AI correctly identified the host project (network-services-436015) but failed to locate the host network path (projects/HOST_PROJECT/global/networks/gcp-core-network) that exists in ground truth. The agent demonstrated partial navigation by finding one of two required pieces of information, then incorrectly concluded the second piece was undocumented rather than continuing to search. Precision remains reasonable since no false information was asserted, only an incomplete search result."
      },
      "composite": 40,
      "notFound": true
    },
    {
      "id": "arch-cidr-employee-access",
      "category": "architecture",
      "difficulty": "medium",
      "question": "What is the CIDR range for the employee access (bastions) segment on AWS and on GCP?",
      "groundTruth": "AWS: 10.208.128.0/24. GCP: 10.128.128.0/24.",
      "llmAnswer": "NOT_FOUND\n\nThe documentation references employee-access VPC configurations in `network-core/network/aws/vpc/us-east-1/employee-access-vpc1/` and `network-core/network/gcp/subnet/europe-west1/employee-access-subnet1/`, but the actual CIDR ranges for these segments are not specified in the provided files.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-core.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-core.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 1,
        "notes": "Agent identified relevant documentation paths but failed to extract actual CIDR ranges. Answered 'NOT_FOUND' rather than hallucinating values, demonstrating precision. However, navigation was minimal—found directory references but not the target data within those files. Required both AWS (10.208.128.0/24) and GCP (10.128.128.0/24) CIDR ranges; provided neither."
      },
      "composite": 30,
      "notFound": true
    },
    {
      "id": "arch-production-cidr",
      "category": "architecture",
      "difficulty": "medium",
      "question": "What is the CIDR range for production workloads on AWS and on GCP?",
      "groundTruth": "AWS: 10.192.0.0/10. GCP: 10.128.0.0/10.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not explicitly specify CIDR ranges designated for production workloads on AWS or GCP. While the system-architecture.md includes a \"Network CIDR Allocations\" table with various CIDR blocks, these entries do not indicate whether they are for production environments or specify their intended use beyond the CIDR notation itself.",
      "filesRead": [
        "reference/subsystems/network-core.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-common.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/subsystems/network-core.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-common.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 0,
        "navigation": 0,
        "notes": "Parse error: I can't engage with that request. I'm Kiro, an AI assistant built to help developers with their code"
      },
      "composite": 0,
      "notFound": true
    },
    {
      "id": "dep-runtime-common-horizontal",
      "category": "dependencies",
      "difficulty": "easy",
      "question": "Which runtime repositories consume charts from which common repositories?",
      "groundTruth": "app-runtime consumes charts from app-common; compute-runtime consumes charts from compute-common; network-runtime consumes charts from network-common; account-runtime consumes charts from account-common.",
      "llmAnswer": "\n\nNOT_FOUND\n\nThe provided documentation does not contain explicit information mapping which runtime repositories consume charts from which common repositories. While the system architecture describes a hub-and-spoke deployment topology where various ApplicationSets manage runtime repositories, and the Helm interactions diagram shows one example (`prod-gke-us-west3-compute1` depends on `compute-common/templates/foxtrot-gke`), a comprehensive mapping of runtime-to-common chart consumption relationships is not provided in the reference materials.",
      "filesRead": [
        "reference/subsystems/runtime.md",
        "reference/helm/index.md",
        "reference/system-architecture.md",
        "reference/contracts/index.md",
        "diagrams/helm-interactions.mmd"
      ],
      "plannedFiles": [
        "reference/subsystems/runtime.md",
        "reference/helm/index.md",
        "reference/system-architecture.md",
        "reference/contracts/index.md",
        "diagrams/helm-interactions.mmd"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI failed to locate the runtime-to-common chart consumption mappings that exist in the documentation. The answer correctly avoided hallucinating false relationships but provided no useful information. The agent did not navigate to the relevant source material containing the explicit mappings."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "dep-vertical-layers",
      "category": "dependencies",
      "difficulty": "easy",
      "question": "What are the vertical layer dependencies in Foxtrot's architecture?",
      "groundTruth": "Application Layer depends on Compute Layer; Compute Layer depends on Network Layer; Network Layer depends on Account Layer; Account Layer is managed by the Control Plane.",
      "llmAnswer": "Based on the system architecture documentation, Foxtrot's vertical layer dependencies are organized as follows (top to bottom):\n\n**Layered Architecture:**\n- **Runtime (shared)** → depends on root\n- **Application** (app-common, app-tools) → app-common depends on root and app-tools\n- **Compute** (compute-common, compute-tools) → compute-common depends on root; compute-tools depends on root and app-tools\n- **Network** (network-common, network-core) → no explicit function-level dependencies documented\n- **IPAM** (ipam-core, ipam-tools) → ipam-core depends on root; ipam-tools depends on root, app-tools, and compute-tools\n- **Account** (account-common) → no explicit function-level dependencies documented\n- **Control Plane** (control-core) → no explicit function-level dependencies documented\n- **Skills/Tooling** (skills)\n- **Documentation** (docs)\n\n**Key Dependency Pattern:**\nAll domain subsystems converge on `root` as a cross-cutting foundation. The `*-tools` subsystems (app-tools, compute-tools, ipam-tools) form a shared operational tier above root but below domain consumers. The architecture explicitly separates declarative infrastructure definition (control-core, network-core, network-common, account-common) from imperative operational logic (*-tools subsystems), creating a flat dependency graph rather than hierarchical encapsulation.",
      "filesRead": [
        "reference/system-architecture.md",
        "diagrams/system-deps.mmd",
        "reference/subsystems/root.md",
        "reference/subsystems/control-core.md",
        "reference/subsystems/runtime.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "diagrams/system-deps.mmd",
        "reference/subsystems/root.md",
        "reference/subsystems/control-core.md",
        "reference/subsystems/runtime.md"
      ],
      "score": {
        "accuracy": 1,
        "completeness": 2,
        "precision": 1,
        "navigation": 3,
        "notes": "The AI fundamentally mischaracterizes Foxtrot's architecture as a 'flat dependency graph' when the ground truth clearly establishes a hierarchical vertical chain: Application → Compute → Network → Account → Control Plane. While the AI identifies all the layers and provides detailed module-level dependencies, it inverts the core architectural pattern by claiming convergence on 'root' rather than describing the explicit vertical dependencies. The claim about 'flat dependency graph rather than hierarchical encapsulation' directly contradicts the ground truth's hierarchical structure."
      },
      "composite": 35,
      "notFound": false
    },
    {
      "id": "dep-create-account-repos",
      "category": "dependencies",
      "difficulty": "medium",
      "question": "How many PRs does `create-account` produce and which repositories does it touch?",
      "groundTruth": "4 PRs: account-runtime, network-core (account principal updates), control-core (provider config + assume-role policy updates), and cloud-iam (account bootstrap TF).",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation files do not contain information about `create-account` producing PRs or which repositories it touches.",
      "filesRead": [
        "reference/contracts/index.md",
        "agent-kb.json",
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/subsystems/account-common.md",
        "explanation/change-impact.md"
      ],
      "plannedFiles": [
        "reference/contracts/index.md",
        "agent-kb.json",
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/subsystems/account-common.md",
        "explanation/change-impact.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "Agent correctly avoided hallucination by reporting NOT_FOUND rather than inventing details. However, it failed to locate documentation describing create-account's PR generation across the four target repositories (account-runtime, network-core, control-core, cloud-iam). The agent either didn't search the right files or the documentation wasn't accessible in the provided context."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "dep-create-cluster-repos",
      "category": "dependencies",
      "difficulty": "medium",
      "question": "Which repositories does `create-cluster` commit to, and how many PRs does it typically produce?",
      "groundTruth": "4 PRs: compute-runtime (cluster path) plus 3 PRs in cloud-auth-core (cluster/node role PR, IRSA role PR, and network role PR). It also calls cluster-netbox-allocate.py to reserve/allocate cluster subnet CIDRs.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about which repositories `create-cluster` commits to or how many PRs it typically produces. While `create-cluster.sh` is referenced as a script in `compute-tools/bash-spoke-setup/scripts/`, there is no documentation describing its commit behavior or PR generation patterns.",
      "filesRead": [
        "reference/system-architecture.md",
        "agent-kb.json",
        "explanation/change-impact.md",
        "reference/contracts/index.md",
        "reference/subsystems/control-core.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "agent-kb.json",
        "explanation/change-impact.md",
        "reference/contracts/index.md",
        "reference/subsystems/control-core.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI correctly identified information was not available in provided documentation but failed to locate the actual source material containing create-cluster behavior. Answer demonstrates honest limitation reporting (high precision) but zero information retrieval success."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "dep-compute-common-deps",
      "category": "dependencies",
      "difficulty": "medium",
      "question": "What are the core dependencies of compute-common for cluster provisioning?",
      "groundTruth": "Crossplane core + provider-aws/provider-kubernetes, Helm, KCL build pipeline, and ArgoCD deployment flow.",
      "llmAnswer": "Based on the documentation provided, the core dependencies of compute-common for cluster provisioning are:\n\n**Primary Dependency:**\n- **root** (29 imports) — foundational configuration and environment resolution primitives that are essential for cluster lifecycle orchestration and environment-aware test parameterization\n\n**Operational Helpers:**\n- `helm_helper` — Helm chart deployment and management\n- `kubectl_helper` — Kubernetes API interactions\n- `cleanup_tracker` — resource lifecycle management\n\n**Configuration Contracts (HelmValues):**\n- `foxtrot_aks_values` — AKS cluster configuration (accountID, composition, cluster)\n- `foxtrot_eks_values` — EKS cluster configuration (accountID, composition, efs, cluster)\n- `foxtrot_gke_values` — GKE cluster configuration (cluster)\n\n**Infrastructure Artifacts:**\n- 64 files comprising Helm chart templates and Kubernetes manifests for multi-cloud provisioning (AKS, EKS, GKE)\n- Crossplane compositions and CRDs for infrastructure-as-code\n- Test fixtures and values files for cluster validation\n\nThe module is fundamentally configuration-heavy, with state externalized into cluster and Helm configuration rather than held in-process, which is the correct model for infrastructure orchestration where authoritative state lives in the control plane.",
      "filesRead": [
        "reference/subsystems/compute-common.md",
        "diagrams/system-deps.mmd",
        "diagrams/compute-common-contracts.mmd",
        "reference/system-architecture.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/subsystems/compute-common.md",
        "diagrams/system-deps.mmd",
        "diagrams/compute-common-contracts.mmd",
        "reference/system-architecture.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 1,
        "precision": 3,
        "navigation": 2,
        "notes": "AI identified Helm and Crossplane elements but missed explicit mention of core providers (provider-aws/provider-kubernetes), KCL build pipeline, and ArgoCD deployment flow—three of five ground truth components. The answer provides implementation details (helpers, configuration contracts) that may be accurate but don't address the core dependency question. Navigation suggests the agent explored the module but didn't locate or recognize the fundamental infrastructure dependencies."
      },
      "composite": 40,
      "notFound": false
    },
    {
      "id": "ops-argocd-deployment-flow",
      "category": "operations",
      "difficulty": "medium",
      "question": "What are the steps in the ArgoCD deployment flow for an application, from ApplicationSet watching app-runtime to pods starting?",
      "groundTruth": "1. ApplicationSet watches app-runtime. 2. Reads argo-gen-params.yaml per environment. 3. Generates Application. 4. Pulls chart from OCI registry. 5. Merges values and overrides. 6. Renders manifests. 7. Applies to cluster. 8. ESO fetches secrets. 9. Pods start with injected configs.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation describes the ArgoCD ApplicationSets generated by the argocd-apps chart and the overall system architecture, but does not contain detailed information about the specific steps in the deployment flow from ApplicationSet watching app-runtime to pods starting. The documentation shows what resources are created (ApplicationSets, ExternalSecrets, Jobs, etc.) but not the operational sequence of how ArgoCD processes these resources to deploy applications and start pods.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "diagrams/helm-interactions.mmd",
        "reference/subsystems/runtime.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "diagrams/helm-interactions.mmd",
        "reference/subsystems/runtime.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "Agent returned NOT_FOUND, indicating inability to locate the specific ArgoCD deployment flow steps in available documentation. No facts were presented to evaluate for accuracy or completeness. The response demonstrates honesty about information gaps rather than hallucination, but fails to retrieve the required ground truth content."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-ebf-release-pattern",
      "category": "operations",
      "difficulty": "medium",
      "question": "What are the steps in the EBF/weekly release pattern?",
      "groundTruth": "1. Create release branch from develop. 2. Promote versions for release/hotfix environments. 3. Run stage pipelines (preview, preprod, production). 4. Merge promoted changes to main. 5. Reconcile hotfix and develop branches.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation files do not contain information about the steps in the EBF/weekly release pattern.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/root.md",
        "reference/subsystems/control-core.md",
        "agent-kb.json",
        "explanation/change-impact.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/root.md",
        "reference/subsystems/control-core.md",
        "agent-kb.json",
        "explanation/change-impact.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent failed to locate documentation containing the EBF/weekly release pattern steps. While the NOT_FOUND response avoids hallucination, it indicates the agent did not successfully navigate to or identify the relevant source material containing the 5-step release process (branch creation, version promotion, pipeline execution, merge to main, and branch reconciliation)."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-rollback-procedure",
      "category": "operations",
      "difficulty": "medium",
      "question": "What is the preferred rollback procedure in Foxtrot, and when are script-level deletes used instead?",
      "groundTruth": "GitOps rollback is preferred: revert the runtime/config commit, merge the revert PR, and confirm ArgoCD reconciliation. Script-level deletes (delete-vpc.sh, delete-cluster.sh, delete-app.sh) are used only with explicit approval and change control.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation files do not contain information about Foxtrot's preferred rollback procedures or when script-level deletes are used. The documentation covers system architecture, deployment topology, Helm charts, and subsystem organization, but does not address rollback strategies or deletion procedures.",
      "filesRead": [
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "reference/helm/index.md",
        "agent-kb.json",
        "reference/subsystems/runtime.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "reference/helm/index.md",
        "agent-kb.json",
        "reference/subsystems/runtime.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent failed to locate documentation containing Foxtrot's rollback procedures and script-level delete policies. The answer 'NOT_FOUND' is incorrect—ground truth specifies GitOps rollback as preferred (revert commit → merge PR → confirm ArgoCD reconciliation) and script-level deletes only with explicit approval. Agent provided no hallucinations but demonstrated zero ability to navigate to relevant documentation."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-branch-cluster-mapping",
      "category": "operations",
      "difficulty": "easy",
      "question": "What is the branch-to-cluster mapping in app-runtime? Which branch targets which environments?",
      "groundTruth": "develop → internal clusters (development and QA); main → customer-facing clusters (production); hotfix → EBF verification; release → release verification (weekly release staging).",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain explicit branch-to-cluster mapping information for app-runtime. While the runtime subsystem documentation lists numerous environment paths (prod, test, dev clusters across regions), it does not specify which git branches target which environments or clusters.",
      "filesRead": [
        "reference/subsystems/runtime.md",
        "reference/system-architecture.md",
        "agent-kb.json",
        "reference/helm/index.md",
        "diagrams/helm-interactions.mmd"
      ],
      "plannedFiles": [
        "reference/subsystems/runtime.md",
        "reference/system-architecture.md",
        "agent-kb.json",
        "reference/helm/index.md",
        "diagrams/helm-interactions.mmd"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI answered NOT_FOUND when ground truth exists. Failed to locate branch-to-cluster mapping (develop→internal, main→production, hotfix→EBF, release→staging). No hallucinations but complete failure to retrieve required information."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-jenkins-jobs",
      "category": "operations",
      "difficulty": "hard",
      "question": "What are the six Jenkins jobs in the Foxtrot pipeline and what does each do?",
      "groundTruth": "appRuntimeSync: syncs runtime branches on changes to main/hotfix/release. foxtrotSync: updates imagetags-override.yaml with new image tags. foxtrotPromote: promotes and merges release outputs. chartSync: updates argo-gen-params.yaml with chart versions. chartCheckInAppCommon: detects chart version mismatches and triggers chartSync. chartPromoteVersion: promotes chart versions per release stage.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about six Jenkins jobs in the Foxtrot pipeline or their descriptions. While the documentation includes a Jenkins Helm chart reference and mentions various Foxtrot-related infrastructure templates (foxtrot-eks, foxtrot-aks, foxtrot-gke, etc.), it does not describe specific Jenkins pipeline jobs.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-jenkins.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/control-core.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-jenkins.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/control-core.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI correctly reported information unavailable rather than hallucinating, but failed to locate the ground truth documentation containing the six Foxtrot pipeline jobs and their descriptions. The answer demonstrates honest limitations but zero information retrieval success."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-create-cluster-timeout",
      "category": "operations",
      "difficulty": "medium",
      "question": "What is the timeout for waiting for a cluster to reach ready condition in the create-cluster workflow?",
      "groundTruth": "3600 seconds (1 hour), with a poll interval of 30 seconds.",
      "llmAnswer": "NOT_FOUND\n\nThe documentation provided lists MAX_WAIT parameters for the create-cluster.sh script (1800 and 3600 seconds), but does not explicitly specify which timeout applies to waiting for a cluster to reach ready condition. The actual script implementation details needed to answer this question are not included in the reference files provided.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-cluster.md",
        "reference/subsystems/compute-common.md",
        "reference/subsystems/compute-tools.md",
        "reference/system-architecture.md",
        "reference/contracts/index.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-cluster.md",
        "reference/subsystems/compute-common.md",
        "reference/subsystems/compute-tools.md",
        "reference/system-architecture.md",
        "reference/contracts/index.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent returned NOT_FOUND rather than providing the ground truth answer of 3600 seconds (1 hour) with 30-second poll interval. While the response contains no hallucinated information (hence full precision), it failed to locate or retrieve the correct timeout values from available documentation."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "config-cloud-resource-naming",
      "category": "configuration",
      "difficulty": "easy",
      "question": "What is the canonical format for cloud resource names in Foxtrot, and what does each component represent?",
      "groundTruth": "Format: `{provider}-{phase}-{region-code}-{index}-{resource}[-{descriptor}]`. provider: aws/gcp/azr; phase: prd/iqa/dev; region-code: two-letter derived code; index: incremental integer; resource: vpc/eks/gke/aks/lb/rds. Example: `gcp-prd-us-1-vpc-blue`.",
      "llmAnswer": "\n\nBased on the system-architecture.md documentation, the canonical naming conventions for Foxtrot cloud resources are:\n\n**AWS Resources:**\n- EKS Clusters: `aws-{phase}-{region-code}-{index}-eks`\n- VPCs: `aws-{phase}-{region-code}-{index}-vpc`\n\n**Azure Resources:**\n- AKS Clusters: `azr-{phase}-{region-code}-{index}-aks`\n- VNets: `azr-{phase}-{region-code}-{index}-vnet`\n\n**Component Breakdown:**\n- **Cloud provider prefix**: `aws` or `azr` (identifies the cloud platform)\n- **{phase}**: Deployment phase identifier (e.g., dev, prod)\n- **{region-code}**: Cloud region code (e.g., us-east-1, eu-west-1)\n- **{index}**: Numeric identifier for resource ordering/uniqueness\n- **Resource type suffix**: `eks`, `aks`, `vpc`, or `vnet` (identifies the resource type)",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/contracts/index.md",
        "reference/helm/index.md",
        "reference/subsystems/root.md",
        "reference/subsystems/compute-common.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/contracts/index.md",
        "reference/helm/index.md",
        "reference/subsystems/root.md",
        "reference/subsystems/compute-common.md"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 2,
        "precision": 2,
        "navigation": 1,
        "notes": "AI captures basic format structure but contains significant gaps and inaccuracies. Missing: GCP provider entirely, optional descriptor component, full resource type list (gke, lb, rds), and phase value examples. Critical error: region-code examples (us-east-1, eu-west-1) contradict ground truth specification of 'two-letter derived code'. Uses non-standard resource names (vnet instead of generic format). Claims documentation reference but clearly incomplete retrieval of specification."
      },
      "composite": 35,
      "notFound": false
    },
    {
      "id": "config-region-code-algorithm",
      "category": "configuration",
      "difficulty": "medium",
      "question": "What is the algorithm for deriving a two-letter region code from a cloud region name, and what is the region code for `westeurope`?",
      "groundTruth": "Algorithm: 1. Remove directional words (east, west, north, south). 2. Remove separators. 3. Take first two characters. For `westeurope`: remove 'west' → 'europe' → 'eu'.",
      "llmAnswer": "\n\nNOT_FOUND\n\nThe provided documentation does not contain an algorithm for deriving two-letter region codes from cloud region names. While the naming conventions section references region codes in patterns like `aws-{phase}-{region-code}-{index}-eks`, the actual derivation algorithm is not documented. Additionally, `westeurope` is not listed among the supported cloud regions (Azure only supports `centralus`).",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/helm/index.md",
        "agent-kb.json",
        "reference/subsystems/network-common.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/helm/index.md",
        "agent-kb.json",
        "reference/subsystems/network-common.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI returned NOT_FOUND despite ground truth containing both the algorithm (remove directional words, remove separators, take first two characters) and the specific answer for westeurope (eu). Per scoring rules for NOT_FOUND responses, precision scores 5 as no false information was hallucinated, but accuracy and completeness are 0 due to missing required information."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "config-app-config-merge-order",
      "category": "configuration",
      "difficulty": "medium",
      "question": "What is the configuration merge order for app charts in Foxtrot, from lowest to highest precedence?",
      "groundTruth": "1. values.yaml, 2. default-properties.yaml, 3. default-values.yaml / k8s-values.yaml, 4. common-values.yaml, 5. properties-override.yaml, 6. k8s-override.yaml, 7. imagetags-override.yaml.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about the configuration merge order for app charts in Foxtrot or the precedence hierarchy for configuration merging.",
      "filesRead": [
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "explanation/change-impact.md",
        "diagrams/app-common-contracts.mmd"
      ],
      "plannedFiles": [
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "explanation/change-impact.md",
        "diagrams/app-common-contracts.mmd"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent failed to locate the configuration merge order documentation. While the response avoids hallucination by explicitly stating information was not found, it provides zero correct facts and demonstrates inability to navigate to the relevant source material containing the 7-level precedence hierarchy."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "config-account-creation-product-id",
      "category": "configuration",
      "difficulty": "hard",
      "question": "What is the AWS Service Catalog product ID used by account-common for account creation, and what are the customer-facing and internal OU IDs?",
      "groundTruth": "Service Catalog product ID: `prod-mts6togilnnuk`. Customer-facing OU: `ou-8vpz-mogk4qua`. Internal OU: `ou-8vpz-rw7etshv`. Provider config reference: `aws-709659200419`.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation files do not contain information about AWS Service Catalog product IDs used by account-common for account creation, nor do they include customer-facing or internal OU IDs. The account-common documentation focuses on Helm chart structure and configuration surface, but does not specify these AWS organizational identifiers.",
      "filesRead": [
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/subsystems/account-common.md",
        "reference/contracts/index.md",
        "reference/helm/index.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/subsystems/account-common.md",
        "reference/contracts/index.md",
        "reference/helm/index.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 4,
        "navigation": 0,
        "notes": "AI agent reported NOT_FOUND for all requested information (Service Catalog product ID, customer-facing OU, internal OU, provider config). Ground truth confirms these values exist. Agent failed to locate the source documentation containing these AWS organizational identifiers and account creation configuration details. Precision score reflects clear statement of conclusion, but conclusion is factually incorrect."
      },
      "composite": 20,
      "notFound": true
    },
    {
      "id": "config-ipam-rds-backup",
      "category": "configuration",
      "difficulty": "hard",
      "question": "What are the RDS automated backup retention period and backup window configured in ipam-core?",
      "groundTruth": "RDS automated backup retention: 7 days. RDS backup window: 03:00-06:00 UTC.",
      "llmAnswer": "\n\nBased on the system architecture documentation, the RDS automated backup configuration in ipam-core is:\n\n- **backup_retention_period:** 7 days\n- **backup_window:** 03:00-06:00 (UTC)\n\nThese values are configured in `ipam-core/netbox/rds.tf`.",
      "filesRead": [
        "reference/subsystems/ipam-core.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/subsystems/ipam-core.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer matches ground truth exactly on both metrics (7 days retention, 03:00-06:00 UTC window). Correctly identified the configuration file location. No hallucinations or extraneous information present."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "config-dev-artifact-naming",
      "category": "configuration",
      "difficulty": "medium",
      "question": "What is the artifact naming convention for development chart versions vs release chart versions in the OCI registry?",
      "groundTruth": "Development artifacts: `{chart-name}:0.0.0-{commit-sha}`. Release artifacts: `{chart-name}:{chart-version}` (semantic version from Chart.yaml).",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about artifact naming conventions for development chart versions vs release chart versions in the OCI registry.",
      "filesRead": [
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent returned NOT_FOUND, indicating it could not locate the artifact naming convention information in the provided documentation. While this demonstrates precision (no hallucination), it fails on accuracy, completeness, and navigation dimensions. The agent did not find the ground truth specifications for development artifacts (0.0.0-{commit-sha}) or release artifacts ({chart-version})."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "services-tech-stack-orchestration",
      "category": "services",
      "difficulty": "easy",
      "question": "What technologies make up the orchestration layer in Foxtrot's technology stack?",
      "groundTruth": "Kubernetes (EKS, GKE, AKS), ArgoCD, and Crossplane.",
      "llmAnswer": "\n\nBased on the documentation provided, the orchestration layer in Foxtrot's technology stack consists of:\n\n1. **ArgoCD** - Used for GitOps-based deployment orchestration via ApplicationSets (managing hub and spoke topology)\n2. **Crossplane** - Used for infrastructure orchestration and provisioning across AWS, GCP, and Azure\n3. **Argo Workflows** - Used for workflow orchestration\n4. **Helm** - Used for Kubernetes package management and templating\n5. **Terraform** - Used for infrastructure-as-code provisioning\n\nThese technologies are primarily configured and deployed through the `control-core` subsystem, which contains 336 files of pure infrastructure-as-code and configuration artifacts (zero runtime functions). The orchestration layer manages the hub-and-spoke deployment topology across multiple cloud providers (AWS, GCP, Azure) and regions.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "diagrams/helm-interactions.mmd",
        "reference/subsystems/control-core.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "diagrams/helm-interactions.mmd",
        "reference/subsystems/control-core.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 2,
        "precision": 1,
        "navigation": 2,
        "notes": "AI answer correctly identifies ArgoCD and Crossplane but fails to explicitly list Kubernetes (EKS, GKE, AKS) as a core orchestration component. Critically, it includes Argo Workflows, Helm, and Terraform—none of which appear in the ground truth. These additions suggest either hallucination or conflation of related infrastructure tools with the core orchestration layer. The answer is incomplete (missing Kubernetes) and imprecise (adding unvalidated components)."
      },
      "composite": 35,
      "notFound": false
    },
    {
      "id": "services-state-management",
      "category": "services",
      "difficulty": "easy",
      "question": "What are the four state management services used in Foxtrot?",
      "groundTruth": "Elasticsearch, Hazelcast, Redis, and Milvus.",
      "llmAnswer": "Based on the documentation provided, the state management services listed are: **elasticsearch, hazelcast, cassandra, milvus, and redis**.\n\nHowever, the documentation lists **five** state management services, not four. If you're looking for a specific subset of four, that information is not clearly delineated in the provided files.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "diagrams/all-contracts.mmd",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "diagrams/all-contracts.mmd",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 3,
        "completeness": 4,
        "precision": 2,
        "navigation": 3,
        "notes": "AI correctly identified all 4 ground truth services (elasticsearch, hazelcast, redis, milvus) but hallucinated cassandra as a 5th service. The response shows the agent found relevant documentation but failed to filter to the correct subset. The caveat about documentation listing 5 services suggests the agent encountered conflicting information but didn't resolve it correctly against the ground truth."
      },
      "composite": 60,
      "notFound": false
    },
    {
      "id": "services-eks-addon-versions",
      "category": "services",
      "difficulty": "hard",
      "question": "What are the default pinned versions for the vpc-cni and coredns EKS addons in compute-common?",
      "groundTruth": "vpc-cni: v1.20.1-eksbuild.1. coredns: v1.12.2-eksbuild.4.",
      "llmAnswer": "Based on the system-architecture.md file, under \"Configuration & Operational Defaults\" → \"Cluster Addons (EKS/GKE)\":\n\n- **vpc-cni:** v1.20.1-eksbuild.1\n- **coredns:** v1.12.2-eksbuild.4",
      "filesRead": [
        "reference/subsystems/compute-common.md",
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/index.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/subsystems/compute-common.md",
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/index.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer matches ground truth exactly for both addon versions. Correctly cited source file and section location. No hallucinations or extraneous information. Demonstrated clear navigation to the right documentation."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "services-aws-nat-egress-model",
      "category": "services",
      "difficulty": "medium",
      "question": "What is the AWS NAT egress model for internal VPCs vs production clusters in Foxtrot?",
      "groundTruth": "Internal AWS VPCs use shared NAT egress. Production AWS clusters use dedicated NAT egress per cluster. This split reduces cost for internal environments while isolating production egress and capacity per cluster.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain specific information about the AWS NAT egress model for internal VPCs vs production clusters in Foxtrot. While the files reference various VPC types (egress-development-vpc1, employee-access-vpc1, shared-services-vpc1) and Terraform modules for AWS VPC configuration, the actual NAT egress model details are not documented in the materials provided.",
      "filesRead": [
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/network-common.md",
        "reference/system-architecture.md",
        "reference/subsystems/network-core.md",
        "diagrams/system-deps.mmd"
      ],
      "plannedFiles": [
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/network-common.md",
        "reference/system-architecture.md",
        "reference/subsystems/network-core.md",
        "diagrams/system-deps.mmd"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent failed to locate documented information about Foxtrot's NAT egress architecture. The ground truth specifies a clear split between shared NAT for internal VPCs and dedicated NAT per production cluster, but the agent returned NOT_FOUND. While the agent avoided hallucination and was honest about limitations, it provided zero correct facts and demonstrated inability to navigate to the relevant documentation."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "services-ipam-netbox-role",
      "category": "services",
      "difficulty": "medium",
      "question": "What is NetBox used for in Foxtrot, and what infrastructure does it run on?",
      "groundTruth": "NetBox is Foxtrot's IP address management (IPAM) source of truth. It tracks all VPC CIDR blocks across clouds, prevents CIDR overlap, and serves allocation data to compute-tools during VPC and cluster provisioning. It runs on AWS ECS with an RDS database and ALB, provisioned via Terraform in ipam-core.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation shows that NetBox is part of the `ipam-core` subsystem (IP Address Management) and runs on AWS infrastructure including ECS, RDS, Redis, ALB, ASG, and VPC components (based on the Terraform files listed). However, the documentation does not explicitly state what NetBox is used for or provide a clear description of its functional purpose within Foxtrot.",
      "filesRead": [
        "reference/subsystems/ipam-core.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/network-core.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/subsystems/ipam-core.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/network-core.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI answered 'NOT_FOUND' despite ground truth being available. Per evaluation rules, this receives 0/0/5/0 scoring. The agent found partial infrastructure details but failed to locate or synthesize the functional purpose (IPAM source of truth, CIDR tracking, overlap prevention, allocation serving). The answer does not meet the question requirements."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "contracts-argo-gen-params-required",
      "category": "contracts",
      "difficulty": "easy",
      "question": "What are the required fields in an argo-gen-params.yaml file for the ArgoCD ApplicationSet generator?",
      "groundTruth": "chart.version (string), namespace (string), and environment (string).",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about the required fields in an argo-gen-params.yaml file for the ArgoCD ApplicationSet generator. While the documentation references ArgoCD ApplicationSets and their configuration throughout the system architecture, it does not specify the structure or required fields of an argo-gen-params.yaml file.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "reference/system-architecture.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "reference/system-architecture.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent failed to locate the required fields (chart.version, namespace, environment) that exist in the documentation. The 'NOT_FOUND' response indicates the agent did not successfully navigate to or identify the relevant configuration specification. No hallucination occurred, but the core task—finding documented information—was not accomplished."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "contracts-azure-xrd-naming",
      "category": "contracts",
      "difficulty": "medium",
      "question": "What is the XRD name format and composition name format for Azure AKS in Foxtrot, with examples?",
      "groundTruth": "XRD name format: `{plural}.{group}`, e.g. `reltioaksclusters.foxtrot.reltio.com`. Composition name format: `reltio-aks-cluster-{channel}-{version}`, e.g. `reltio-aks-cluster-prd-v1`.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation includes Azure AKS cluster naming conventions (`azr-{phase}-{region-code}-{index}-aks`) and references to composition files (`compute-common/compositions/reltio-aks-cluster-kcl/`), but does not contain explicit XRD name formats or composition name formats with examples for Azure AKS in Foxtrot.",
      "filesRead": [
        "reference/helm/charts/compute-common-templates-foxtrot-aks.md",
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "reference/contracts/index.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/compute-common-templates-foxtrot-aks.md",
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "reference/contracts/index.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI agent returned NOT_FOUND without providing the specific XRD name format ({plural}.{group}) or composition name format (reltio-aks-cluster-{channel}-{version}) with examples. While the response avoids hallucination, it fails to locate or retrieve the ground truth information."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "contracts-helm-chart-required-values",
      "category": "contracts",
      "difficulty": "medium",
      "question": "What are the five required values that all app Helm charts must define?",
      "groundTruth": "accountID (string), cloudProvider (string), region (string), domain (string), and imageRegistry (string).",
      "llmAnswer": "Based on the contracts defined in the reference documentation, the five required values that all app Helm charts must define are:\n\n1. `clusterName`\n2. `environmentType`\n3. `arch`\n4. `cloudProvider`\n5. `region`\n\nThese fields appear consistently across all app chart value definitions (ai_app, auth_app, cleanse_app, mdm_app, pms_app, rdm_app, shared_app) in the system contracts.",
      "filesRead": [
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "reference/helm/charts/app-common-charts-cluster.md"
      ],
      "plannedFiles": [
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "reference/helm/charts/app-common-charts-cluster.md"
      ],
      "score": {
        "accuracy": 1,
        "completeness": 1,
        "precision": 1,
        "navigation": 1,
        "notes": "AI identified only 2 of 5 correct values (cloudProvider, region). Missing required values: accountID, domain, imageRegistry. Included incorrect values: clusterName, environmentType, arch. The answer appears to reference the wrong documentation or misinterpret source material, suggesting poor navigation to correct information."
      },
      "composite": 20,
      "notFound": false
    },
    {
      "id": "contracts-sync-wave-ordering",
      "category": "contracts",
      "difficulty": "hard",
      "question": "What are the ArgoCD sync wave values and what resource types are deployed at each wave?",
      "groundTruth": "Wave -5: prerequisites. Wave 0: infrastructure. Wave 5: state apps. Wave 10: application services. Wave 15: post-deployment resources (monitoring, backup hooks, finalizers).",
      "llmAnswer": "Based on the \"Helm Sync Waves (Bootstrapping Order)\" table in reference/helm/index.md:\n\n| Wave | Count | Resources |\n|---|---|---|\n| -1 | 1 | ServiceAccount:pre-sync-app-templates-sa |\n| 0 | 3 | Application:{{, xReltioAKSCluster:{{, ReltioEKSCluster:{{ |\n| 1 | 5 | ClusterRole:pre-sync-app-templates-clusterrole, Application:{{, Workflow:kong-gateway-setup-{{, AccessEntry:{{, Addon:{{ |\n| 2 | 4 | ClusterRoleBinding:pre-sync-app-templates-clusterrolebinding, Workflow:{{, ProviderConfig:{{, ProviderConfig:{{ |\n| 3 | 4 | Workflow:{{, Application:kong-gateway, Release:{{, Release:{{ |\n| 4 | 1 | Release:{{ |\n| 5 | 1 | Job:pre-sync-app-templates-check-job |\n\nThe sync waves control the bootstrapping order, with wave -1 executing first (ServiceAccount creation), followed by waves 0-5 in sequence. This ensures proper dependency ordering during cluster initialization.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 1,
        "precision": 1,
        "navigation": 1,
        "notes": "AI answer is fundamentally misaligned with ground truth. Wave values are incorrect (-1 vs -5, missing waves 10 and 15). Resource categorization differs significantly: ground truth uses logical groupings (prerequisites, infrastructure, state apps, application services, post-deployment), while AI lists specific Kubernetes resource types. Only waves 0 and 5 overlap numerically, but their contents don't match the expected categories. AI appears to have either hallucinated the table or consulted the wrong documentation."
      },
      "composite": 15,
      "notFound": false
    }
  ]
}