{
  "evalType": "agent",
  "timestamp": "2026-03-10T19:18:23.561Z",
  "docsDir": "./foxtrot-docs",
  "totalQuestions": 32,
  "overallScore": "68.0",
  "avgAccuracy": "3.41",
  "avgCompleteness": "3.31",
  "avgPrecision": "3.75",
  "avgNavigation": "3.13",
  "notFoundCount": 4,
  "notFoundRate": "12.5%",
  "byCategory": {
    "architecture": {
      "avg": "87.5",
      "count": 6
    },
    "dependencies": {
      "avg": "66.0",
      "count": 5
    },
    "operations": {
      "avg": "47.5",
      "count": 6
    },
    "configuration": {
      "avg": "87.5",
      "count": 6
    },
    "services": {
      "avg": "70.0",
      "count": 5
    },
    "contracts": {
      "avg": "40.0",
      "count": 4
    }
  },
  "byDifficulty": {
    "easy": {
      "avg": "81.0",
      "count": 10
    },
    "medium": {
      "avg": "61.5",
      "count": 17
    },
    "hard": {
      "avg": "64.0",
      "count": 5
    }
  },
  "results": [
    {
      "id": "arch-layered-order",
      "category": "architecture",
      "difficulty": "easy",
      "question": "What are the five layers in Foxtrot's layered architecture, listed from top to bottom?",
      "groundTruth": "Application Layer (app-*), Compute Layer (compute-*), Network Layer (network-*), Account Layer (account-*), Control Plane (control-core).",
      "llmAnswer": "Based on the system-architecture.md file, the five layers in Foxtrot's layered architecture, listed from top to bottom, are:\n\n1. **Application Layer** — `app-common`, `app-tools`\n2. **Compute Layer** — `compute-common`, `compute-tools`\n3. **Network Layer** — `network-common`, `network-core`\n4. **Account Layer** — `account-common`\n5. **Control Plane** — `control-core`",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/index.md",
        "diagrams/system-deps.mmd",
        "reference/helm/index.md",
        "explanation/change-impact.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/index.md",
        "diagrams/system-deps.mmd",
        "reference/helm/index.md",
        "explanation/change-impact.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 4,
        "navigation": 5,
        "notes": "All five layers correctly identified in proper order with accurate names. Agent cited the correct source file (system-architecture.md). The answer includes specific module examples (app-common, app-tools, etc.) beyond the ground truth's generic patterns (app-*, compute-*), which adds helpful context but represents information not explicitly required. No hallucinations or contradictions detected."
      },
      "composite": 95,
      "notFound": false
    },
    {
      "id": "arch-hub-spoke-ownership",
      "category": "architecture",
      "difficulty": "easy",
      "question": "Which ArgoCD instance owns the account, network, and compute layers, and which owns the application layer?",
      "groundTruth": "Hub ArgoCD owns account, network, and compute layers (infrastructure and control-plane appsets). Spoke ArgoCD owns the application layer (app-runtime appsets only).",
      "llmAnswer": "\n\nBased on the \"Deployment Topology (Hub & Spoke)\" section in the system architecture documentation:\n\n**Hub (control-core) ArgoCD instance** owns the account, network, and compute layers:\n- `account-creation` manages `account-runtime`\n- `aws-vpcs`, `gcp-vpcs`, `azure-vpcs` manage `network-runtime`\n- `aws-argocd-feature-clusters`, `gcp-argocd-feature-clusters`, `azure-argocd-feature-clusters` manage `compute-runtime`\n\n**Spoke clusters' ArgoCD instances** own the application layer, as indicated by the hub-and-spoke topology where ApplicationSets in `control-core` deploy to spoke clusters, which then manage application deployments via their own ArgoCD instances.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/control-core.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/control-core.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 4,
        "precision": 5,
        "navigation": 5,
        "notes": "Answer correctly identifies Hub ArgoCD ownership of account/network/compute layers and Spoke ownership of application layer. Provides specific, verifiable examples (account-creation, vpc appsets, argocd-feature-clusters). Minor gap: doesn't explicitly reference 'app-runtime appsets' terminology from ground truth, though the concept is present. Demonstrates clear navigation to correct documentation section with detailed supporting examples."
      },
      "composite": 95,
      "notFound": false
    },
    {
      "id": "arch-aws-regions",
      "category": "architecture",
      "difficulty": "easy",
      "question": "What AWS regions does Foxtrot support?",
      "groundTruth": "us-east-1, us-west-2, eu-west-1, eu-central-1.",
      "llmAnswer": "Based on the system architecture documentation, Foxtrot supports the following AWS regions:\n\n- ap-southeast-1\n- eu-central-1\n- eu-west-1\n- eu-west-2\n- eu-west-3\n- us-east-1\n- us-east-2\n- us-west-1\n- us-west-2",
      "filesRead": [
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 5,
        "precision": 1,
        "navigation": 2,
        "notes": "AI answer includes all 4 ground truth regions but adds 5 unsupported regions (ap-southeast-1, eu-west-2, eu-west-3, us-east-2, us-west-1), indicating hallucination or unreliable source verification. While completeness is achieved, the answer contains significant false positives that undermine accuracy and precision."
      },
      "composite": 50,
      "notFound": false
    },
    {
      "id": "arch-gcp-shared-vpc-host",
      "category": "architecture",
      "difficulty": "medium",
      "question": "What is the default GCP host project used for Shared VPC in network-common, and what is the host network path it attaches to?",
      "groundTruth": "The default host project is `network-services-436015`. When `sharedVpc.enabled: true`, service project resources attach to the host network path `projects/HOST_PROJECT/global/networks/gcp-core-network`.",
      "llmAnswer": "\n\nBased on the documentation provided:\n\n**Default GCP host project:** `network-services-436015`\n\n**Host network path:** `projects/network-services-436015/global/networks/gcp-core-network`\n\nWhen `sharedVpc.enabled: true`, service project resources attach to this host network path. Configuration is found in `runtime/network-runtime/gcp/*/override.yaml`.",
      "filesRead": [
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-common.md",
        "reference/network-architecture.md",
        "reference/helm/index.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-common.md",
        "reference/network-architecture.md",
        "reference/helm/index.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 4,
        "navigation": 4,
        "notes": "Answer correctly identifies both the default host project (network-services-436015) and the complete host network path. Covers the sharedVpc.enabled condition and attachment behavior. Includes additional contextual information about configuration file location (override.yaml) which, while not in ground truth, is accurate and helpful rather than hallucinated. Navigation appears sound based on specific values and file paths provided."
      },
      "composite": 90,
      "notFound": false
    },
    {
      "id": "arch-cidr-employee-access",
      "category": "architecture",
      "difficulty": "medium",
      "question": "What is the CIDR range for the employee access (bastions) segment on AWS and on GCP?",
      "groundTruth": "AWS: 10.208.128.0/24. GCP: 10.128.128.0/24.",
      "llmAnswer": "**AWS:** `10.208.128.0/24`\n\n**GCP:** `10.128.128.0/24`",
      "filesRead": [
        "reference/network-architecture.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-core.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/network-architecture.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/subsystems/network-core.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer matches ground truth exactly for both AWS and GCP CIDR ranges. Response is clean, direct, and contains no hallucinated or extra information. Correct answer indicates successful retrieval from authoritative source."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "arch-production-cidr",
      "category": "architecture",
      "difficulty": "medium",
      "question": "What is the CIDR range for production workloads on AWS and on GCP?",
      "groundTruth": "AWS: 10.192.0.0/10. GCP: 10.128.0.0/10.",
      "llmAnswer": "**AWS:** `10.192.0.0/10`\n\n**GCP:** `10.128.0.0/10`",
      "filesRead": [
        "reference/network-architecture.md",
        "reference/subsystems/network-core.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/network-architecture.md",
        "reference/subsystems/network-core.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/helm/charts/network-common-charts-foxtrot-gcp-vpc.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 4,
        "notes": "AI answer matches ground truth exactly for both AWS and GCP CIDR ranges. No hallucinations or extraneous information. Navigation score reduced slightly due to lack of visible evidence of information retrieval process, though perfect accuracy suggests correct sourcing."
      },
      "composite": 95,
      "notFound": false
    },
    {
      "id": "dep-runtime-common-horizontal",
      "category": "dependencies",
      "difficulty": "easy",
      "question": "Which runtime repositories consume charts from which common repositories?",
      "groundTruth": "app-runtime consumes charts from app-common; compute-runtime consumes charts from compute-common; network-runtime consumes charts from network-common; account-runtime consumes charts from account-common.",
      "llmAnswer": "\n\nBased on the system architecture documentation provided:\n\n**Runtime repositories consume charts from their respective common repositories as follows:**\n\n- `app-runtime` → consumes from `app-common`\n- `compute-runtime` → consumes from `compute-common`\n- `network-runtime` → consumes from `network-common`\n- `account-runtime` → consumes from `account-common`\n\nAs stated in the architecture: \"Cross-cutting: `runtime` repos (`app-runtime`, `compute-runtime`, `network-runtime`, `account-runtime`) consume charts horizontally from their respective `-common` repos and apply environment-specific overrides.\"",
      "filesRead": [
        "reference/helm/index.md",
        "reference/subsystems/runtime.md",
        "reference/system-architecture.md",
        "diagrams/helm-interactions.mmd",
        "reference/contracts/index.md"
      ],
      "plannedFiles": [
        "reference/helm/index.md",
        "reference/subsystems/runtime.md",
        "reference/system-architecture.md",
        "diagrams/helm-interactions.mmd",
        "reference/contracts/index.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer correctly identifies all four runtime-to-common repository mappings matching ground truth exactly. Includes relevant architectural context about cross-cutting concerns and environment-specific overrides without hallucination. Demonstrates successful navigation to authoritative source documentation."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "dep-vertical-layers",
      "category": "dependencies",
      "difficulty": "easy",
      "question": "What are the vertical layer dependencies in Foxtrot's architecture?",
      "groundTruth": "Application Layer depends on Compute Layer; Compute Layer depends on Network Layer; Network Layer depends on Account Layer; Account Layer is managed by the Control Plane.",
      "llmAnswer": "\n\nFoxtrot's architecture follows a strict five-layer dependency order where each layer may only depend on layers below it:\n\n1. **Application Layer** (top) — `app-common`, `app-tools`: Application Helm charts, state management services (elasticsearch, hazelcast, redis, milvus), Jenkins CI\n\n2. **Compute Layer** — `compute-common`, `compute-tools`: EKS/GKE/AKS cluster provisioning via Crossplane XRDs\n\n3. **Network Layer** — `network-common`, `network-core`: VPCs, shared VPC, CIDR allocation, NAT egress\n\n4. **Account Layer** — `account-common`: AWS account creation via Service Catalog\n\n5. **Control Plane** (bottom) — `control-core`: ArgoCD hub/spoke, ApplicationSets, provider configs\n\nCross-cutting: `runtime` repos (`app-runtime`, `compute-runtime`, `network-runtime`, `account-runtime`) consume charts horizontally from their respective `-common` repos and apply environment-specific overrides.",
      "filesRead": [
        "reference/system-architecture.md",
        "diagrams/system-deps.mmd",
        "reference/index.md",
        "explanation/change-impact.md",
        "reference/subsystems/root.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "diagrams/system-deps.mmd",
        "reference/index.md",
        "explanation/change-impact.md",
        "reference/subsystems/root.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 2,
        "navigation": 3,
        "notes": "The core dependency chain is correct and complete: Application→Compute→Network→Account→Control Plane matches ground truth exactly. However, the answer includes extensive unverifiable implementation details (specific repo names, component types, technologies) that appear plausible but cannot be confirmed against the ground truth. This creates precision risk—the agent may be hallucinating architectural specifics beyond what was actually verified. Navigation score reflects that while the answer directly addresses the question, there's no evidence of actual source file consultation to validate the additional claims."
      },
      "composite": 75,
      "notFound": false
    },
    {
      "id": "dep-create-account-repos",
      "category": "dependencies",
      "difficulty": "medium",
      "question": "How many PRs does `create-account` produce and which repositories does it touch?",
      "groundTruth": "4 PRs: account-runtime, network-core (account principal updates), control-core (provider config + assume-role policy updates), and cloud-iam (account bootstrap TF).",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about how many PRs `create-account` produces or which repositories it touches. While there are references to `account-creation` (an ArgoCD ApplicationSet in the `account-runtime` repository) and the `account-common` subsystem, there is no specific information about PR generation or the `create-account` component.",
      "filesRead": [
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/contracts/index.md",
        "agent-kb.json",
        "reference/subsystems/account-common.md",
        "explanation/change-impact.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/contracts/index.md",
        "agent-kb.json",
        "reference/subsystems/account-common.md",
        "explanation/change-impact.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "Agent correctly identified information was not available in provided documentation rather than hallucinating. However, failed to locate or access the correct source material containing the ground truth about create-account's 4 PRs and repository targets."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "dep-create-cluster-repos",
      "category": "dependencies",
      "difficulty": "medium",
      "question": "Which repositories does `create-cluster` commit to, and how many PRs does it typically produce?",
      "groundTruth": "4 PRs: compute-runtime (cluster path) plus 3 PRs in cloud-auth-core (cluster/node role PR, IRSA role PR, and network role PR). It also calls cluster-netbox-allocate.py to reserve/allocate cluster subnet CIDRs.",
      "llmAnswer": "Based on the documentation provided:\n\n`create-cluster` (via `compute-tools/bash-spoke-setup/scripts/create-cluster.sh`) produces **4 PRs** touching:\n- `compute-runtime` \n- `cloud-auth-core` (3 PRs)",
      "filesRead": [
        "reference/contracts/index.md",
        "agent-kb.json",
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "reference/subsystems/control-core.md"
      ],
      "plannedFiles": [
        "reference/contracts/index.md",
        "agent-kb.json",
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "reference/subsystems/control-core.md"
      ],
      "score": {
        "accuracy": 4,
        "completeness": 3,
        "precision": 5,
        "navigation": 4,
        "notes": "AI correctly identified 4 PRs across compute-runtime and cloud-auth-core with accurate breakdown. However, omitted the cluster-netbox-allocate.py call for CIDR allocation, which is part of create-cluster's complete behavior per ground truth. No hallucinations present, but answer is incomplete regarding full scope of tool functionality."
      },
      "composite": 80,
      "notFound": false
    },
    {
      "id": "dep-compute-common-deps",
      "category": "dependencies",
      "difficulty": "medium",
      "question": "What are the core dependencies of compute-common for cluster provisioning?",
      "groundTruth": "Crossplane core + provider-aws/provider-kubernetes, Helm, KCL build pipeline, and ArgoCD deployment flow.",
      "llmAnswer": "Based on the documentation provided, the core dependencies of `compute-common` for cluster provisioning are:\n\n1. **root** — 29 imports providing foundational configuration and environment resolution primitives. Without these, cluster lifecycle orchestration and environment-aware test parameterization cannot initialize.\n\n2. **Helm Charts** — Three cloud-specific chart templates:\n   - `foxtrot-aks` (Azure AKS clusters)\n   - `foxtrot-eks` (AWS EKS clusters)\n   - `foxtrot-gke` (GCP GKE clusters)\n\n3. **Crossplane Compositions** — XRD-based infrastructure definitions for multi-cloud cluster provisioning:\n   - `reltio-aks-cluster-kcl`\n   - `reltio-eks-cluster-kcl`\n   - `reltio-gke-spotinst`\n\n4. **Operational Helpers** — Pure functions that externalize state into cluster and Helm configuration:\n   - `helm_helper` — Helm operations\n   - `kubectl_helper` — Kubernetes operations\n   - `cleanup_tracker` — Resource lifecycle management\n\n5. **Configuration Contracts** — Helm values structures defining cluster parameters:\n   - `foxtrot_aks_values` (accountID, composition, cluster)\n   - `foxtrot_eks_values` (accountID, composition, efs, cluster)\n   - `foxtrot_gke_values` (cluster)\n\nThe module is configuration-heavy (64 files, predominantly manifests and templates) with test utilities as its public surface, making the Helm charts and Crossplane compositions the authoritative infrastructure definitions.",
      "filesRead": [
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "diagrams/compute-common-contracts.mmd",
        "diagrams/system-deps.mmd",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/subsystems/compute-common.md",
        "reference/system-architecture.md",
        "diagrams/compute-common-contracts.mmd",
        "diagrams/system-deps.mmd",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 2,
        "precision": 3,
        "navigation": 3,
        "notes": "AI correctly identifies Helm and Crossplane as dependencies but critically misses ArgoCD deployment flow and KCL build pipeline from ground truth. Also fails to explicitly call out Crossplane core and specific providers (provider-aws/provider-kubernetes). The answer is broader than ground truth, including operational helpers and configuration contracts that add context but aren't core dependencies. Navigation shows partial exploration—found Helm charts and Crossplane compositions but missed key infrastructure orchestration components."
      },
      "composite": 50,
      "notFound": false
    },
    {
      "id": "ops-argocd-deployment-flow",
      "category": "operations",
      "difficulty": "medium",
      "question": "What are the steps in the ArgoCD deployment flow for an application, from ApplicationSet watching app-runtime to pods starting?",
      "groundTruth": "1. ApplicationSet watches app-runtime. 2. Reads argo-gen-params.yaml per environment. 3. Generates Application. 4. Pulls chart from OCI registry. 5. Merges values and overrides. 6. Renders manifests. 7. Applies to cluster. 8. ESO fetches secrets. 9. Pods start with injected configs.",
      "llmAnswer": "Based on the system-architecture.md documentation, the ArgoCD deployment flow is:\n\n1. **ApplicationSet in `control-core` watches the runtime repo for changes**\n2. **Reads `argo-gen-params.yaml` per environment** (required fields: `chart.version`, `namespace`, `environment`)\n3. **Generates Application custom resource**\n4. **Pulls chart from OCI registry**\n5. **Merges values in precedence order**: values.yaml → default-properties → common-values → overrides → imagetags-override\n6. **Renders manifests and applies to target cluster via sync waves**\n\nThe flow ends with manifests being applied to the target cluster, which triggers pod creation through Kubernetes reconciliation.",
      "filesRead": [
        "reference/subsystems/runtime.md",
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/system-architecture.md",
        "diagrams/helm-interactions.mmd",
        "reference/subsystems/app-common.md"
      ],
      "plannedFiles": [
        "reference/subsystems/runtime.md",
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/system-architecture.md",
        "diagrams/helm-interactions.mmd",
        "reference/subsystems/app-common.md"
      ],
      "score": {
        "accuracy": 4,
        "completeness": 3,
        "precision": 4,
        "navigation": 4,
        "notes": "Answer captures the core ArgoCD flow accurately through manifest application but omits two critical steps: ESO secret fetching and explicit pod startup with injected configs. The agent found the right documentation and provided accurate details about argo-gen-params.yaml and values merging precedence, but the flow is incomplete. The answer ends at cluster application rather than following through to the full pod initialization with secrets."
      },
      "composite": 75,
      "notFound": false
    },
    {
      "id": "ops-ebf-release-pattern",
      "category": "operations",
      "difficulty": "medium",
      "question": "What are the steps in the EBF/weekly release pattern?",
      "groundTruth": "1. Create release branch from develop. 2. Promote versions for release/hotfix environments. 3. Run stage pipelines (preview, preprod, production). 4. Merge promoted changes to main. 5. Reconcile hotfix and develop branches.",
      "llmAnswer": "Based on the documentation provided:\n\n**EBF (Emergency Bug Fix) Release Pattern:**\n1. Changes are cherry-picked from `develop`\n2. Changes are applied to the `hotfix` branch in app-runtime\n3. Changes are verified on EBF verification clusters\n4. Changes are promoted to production via the `main` branch\n\n**Weekly Release Pattern:**\nThe documentation mentions that the `release` branch is used for release verification (weekly release staging), but detailed step-by-step instructions for the weekly release workflow are not provided in the reference files.",
      "filesRead": [
        "reference/operations.md",
        "reference/system-architecture.md",
        "reference/index.md",
        "explanation/change-impact.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/operations.md",
        "reference/system-architecture.md",
        "reference/index.md",
        "explanation/change-impact.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 1,
        "precision": 3,
        "navigation": 2,
        "notes": "AI answer captures partial elements (develop, hotfix, main branches, verification, production) but misses the systematic 5-step process. Explicitly admits weekly release details aren't found in documentation. Separates EBF and weekly patterns rather than presenting unified process. No hallucination but significant gaps in coverage of version promotion, specific pipeline stages (preview/preprod/production), and reconciliation steps."
      },
      "composite": 40,
      "notFound": false
    },
    {
      "id": "ops-rollback-procedure",
      "category": "operations",
      "difficulty": "medium",
      "question": "What is the preferred rollback procedure in Foxtrot, and when are script-level deletes used instead?",
      "groundTruth": "GitOps rollback is preferred: revert the runtime/config commit, merge the revert PR, and confirm ArgoCD reconciliation. Script-level deletes (delete-vpc.sh, delete-cluster.sh, delete-app.sh) are used only with explicit approval and change control.",
      "llmAnswer": "The preferred rollback procedure in Foxtrot is to revert the `chart.version` in `argo-gen-params.yaml` to the previous known-good version. ArgoCD detects the change and re-syncs the application with the prior chart version from the OCI registry.\n\nRegarding when script-level deletes are used instead: **NOT_FOUND** — The provided documentation does not specify when script-level deletes would be used as an alternative to the standard rollback procedure.",
      "filesRead": [
        "reference/operations.md",
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "reference/subsystems/control-core.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/operations.md",
        "reference/system-architecture.md",
        "explanation/change-impact.md",
        "reference/subsystems/control-core.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 1,
        "completeness": 1,
        "precision": 2,
        "navigation": 1,
        "notes": "AI provided a narrower, file-specific rollback approach (chart.version revert) rather than the full GitOps workflow (commit revert → PR merge → reconciliation confirmation). Critically, it claimed script-level delete usage was undocumented when ground truth explicitly specifies explicit approval and change control requirements. The agent failed to locate the complete rollback procedure documentation and missed the entire second component of the answer."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-branch-cluster-mapping",
      "category": "operations",
      "difficulty": "easy",
      "question": "What is the branch-to-cluster mapping in app-runtime? Which branch targets which environments?",
      "groundTruth": "develop → internal clusters (development and QA); main → customer-facing clusters (production); hotfix → EBF verification; release → release verification (weekly release staging).",
      "llmAnswer": "Based on the documentation provided, the branch-to-cluster mapping in app-runtime is:\n\n- `develop` → internal clusters (development and QA)\n- `main` → customer-facing clusters (production)\n- `hotfix` → EBF verification clusters\n- `release` → release verification (weekly release staging)",
      "filesRead": [
        "reference/subsystems/runtime.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "reference/configuration.md"
      ],
      "plannedFiles": [
        "reference/subsystems/runtime.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "reference/configuration.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 4,
        "notes": "AI answer perfectly matches ground truth across all four branch mappings (develop, main, hotfix, release). No hallucinations or extraneous information. Minor uncertainty on navigation scoring due to lack of visibility into actual tool usage/file discovery process, though the phrase 'based on the documentation provided' suggests successful information retrieval."
      },
      "composite": 95,
      "notFound": false
    },
    {
      "id": "ops-jenkins-jobs",
      "category": "operations",
      "difficulty": "hard",
      "question": "What are the six Jenkins jobs in the Foxtrot pipeline and what does each do?",
      "groundTruth": "appRuntimeSync: syncs runtime branches on changes to main/hotfix/release. foxtrotSync: updates imagetags-override.yaml with new image tags. foxtrotPromote: promotes and merges release outputs. chartSync: updates argo-gen-params.yaml with chart versions. chartCheckInAppCommon: detects chart version mismatches and triggers chartSync. chartPromoteVersion: promotes chart versions per release stage.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation does not contain information about six specific Jenkins jobs in the Foxtrot pipeline. While the documentation mentions that Jenkins runs as a StatefulSet on CI clusters and \"handles CI/CD pipelines for application builds and chart publishing,\" it does not list or describe six individual Jenkins jobs.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-jenkins.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/runtime.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-jenkins.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/subsystems/runtime.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "AI correctly reported information unavailable in provided documentation but failed to locate or retrieve the ground truth data about the six Foxtrot pipeline jobs. No hallucinations present, but complete failure to answer the question."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "ops-create-cluster-timeout",
      "category": "operations",
      "difficulty": "medium",
      "question": "What is the timeout for waiting for a cluster to reach ready condition in the create-cluster workflow?",
      "groundTruth": "3600 seconds (1 hour), with a poll interval of 30 seconds.",
      "llmAnswer": "NOT_FOUND\n\nThe provided documentation files do not contain information about the timeout for waiting for a cluster to reach ready condition in the create-cluster workflow.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-cluster.md",
        "reference/contracts/index.md",
        "reference/helm/index.md",
        "reference/configuration.md",
        "reference/subsystems/app-common.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-cluster.md",
        "reference/contracts/index.md",
        "reference/helm/index.md",
        "reference/configuration.md",
        "reference/subsystems/app-common.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 5,
        "navigation": 0,
        "notes": "Agent failed to locate the timeout configuration (3600 seconds with 30-second poll interval) that exists in the documentation. NOT_FOUND response indicates unsuccessful navigation to the relevant source material."
      },
      "composite": 25,
      "notFound": true
    },
    {
      "id": "config-cloud-resource-naming",
      "category": "configuration",
      "difficulty": "easy",
      "question": "What is the canonical format for cloud resource names in Foxtrot, and what does each component represent?",
      "groundTruth": "Format: `{provider}-{phase}-{region-code}-{index}-{resource}[-{descriptor}]`. provider: aws/gcp/azr; phase: prd/iqa/dev; region-code: two-letter derived code; index: incremental integer; resource: vpc/eks/gke/aks/lb/rds. Example: `gcp-prd-us-1-vpc-blue`.",
      "llmAnswer": "Based on the documentation provided, the canonical format for cloud resource names in Foxtrot follows these patterns:\n\n**AWS Resources:**\n- Clusters: `aws-{phase}-{region-code}-{index}-eks`\n- VPCs: `aws-{phase}-{region-code}-{index}-vpc`\n\n**Azure Resources:**\n- Clusters: `azr-{phase}-{region-code}-{index}-aks`\n- VNets: `azr-{phase}-{region-code}-{index}-vnet`\n\n**Component Breakdown:**\n- **Cloud provider prefix**: `aws` or `azr` (Azure)\n- **Phase**: Environment indicator (e.g., `dev`, `iqa`, `prd`)\n- **Region code**: Cloud region identifier (e.g., `us-1`, `eu-2`)\n- **Index**: Numeric identifier for the resource instance\n- **Resource type**: Service suffix (`eks`, `aks`, `vpc`, `vnet`)\n\nThese naming conventions are enforced via scripts in `compute-tools/bash-spoke-setup/scripts/` (create-cluster.sh and create-vpc.sh).",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/network-architecture.md",
        "reference/helm/index.md",
        "reference/index.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/network-architecture.md",
        "reference/helm/index.md",
        "reference/index.md"
      ],
      "score": {
        "accuracy": 3,
        "completeness": 2,
        "precision": 3,
        "navigation": 2,
        "notes": "AI correctly identifies the general format and component structure for AWS/Azure, but omits GCP provider entirely. Missing resource types (gke, lb, rds) and the optional descriptor suffix component. Region-code description lacks specificity ('two-letter derived code'). References to specific script files suggest some navigation effort, but the answer is incomplete against ground truth. The format structure shown is accurate for what's covered, but significant gaps in provider coverage and resource type enumeration reduce completeness."
      },
      "composite": 50,
      "notFound": false
    },
    {
      "id": "config-region-code-algorithm",
      "category": "configuration",
      "difficulty": "medium",
      "question": "What is the algorithm for deriving a two-letter region code from a cloud region name, and what is the region code for `westeurope`?",
      "groundTruth": "Algorithm: 1. Remove directional words (east, west, north, south). 2. Remove separators. 3. Take first two characters. For `westeurope`: remove 'west' → 'europe' → 'eu'.",
      "llmAnswer": "**Algorithm for deriving a two-letter region code:**\n\n1. Remove directional words (east, west, north, south)\n2. Remove separators (hyphens, spaces)\n3. Take first two characters\n\n**Region code for `westeurope`:**\n\n`eu`\n\n**Example derivation:** `westeurope` → remove 'west' → `europe` → take first two characters → `eu`",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/subsystems/runtime.md",
        "reference/configuration.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/subsystems/runtime.md",
        "reference/configuration.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 3,
        "notes": "AI answer matches ground truth exactly on algorithm steps and region code derivation. Formatting and example enhance clarity without introducing errors. Navigation score reduced because no search/file access process is demonstrated—answer appears provided directly without showing how the information was located."
      },
      "composite": 90,
      "notFound": false
    },
    {
      "id": "config-app-config-merge-order",
      "category": "configuration",
      "difficulty": "medium",
      "question": "What is the configuration merge order for app charts in Foxtrot, from lowest to highest precedence?",
      "groundTruth": "1. values.yaml, 2. default-properties.yaml, 3. default-values.yaml / k8s-values.yaml, 4. common-values.yaml, 5. properties-override.yaml, 6. k8s-override.yaml, 7. imagetags-override.yaml.",
      "llmAnswer": "From lowest to highest precedence:\n\n1. `values.yaml` (chart defaults)\n2. `default-properties.yaml`\n3. `default-values.yaml` / `k8s-values.yaml`\n4. `common-values.yaml`\n5. `properties-override.yaml`\n6. `k8s-override.yaml`\n7. `imagetags-override.yaml` (highest precedence)",
      "filesRead": [
        "reference/configuration.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/contracts/index.md"
      ],
      "plannedFiles": [
        "reference/configuration.md",
        "reference/subsystems/app-common.md",
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/contracts/index.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer matches ground truth exactly in both content and order. All 7 configuration files listed in correct precedence sequence. Helpful annotations (chart defaults, highest precedence) add clarity without introducing hallucinations or inaccuracies."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "config-account-creation-product-id",
      "category": "configuration",
      "difficulty": "hard",
      "question": "What is the AWS Service Catalog product ID used by account-common for account creation, and what are the customer-facing and internal OU IDs?",
      "groundTruth": "Service Catalog product ID: `prod-mts6togilnnuk`. Customer-facing OU: `ou-8vpz-mogk4qua`. Internal OU: `ou-8vpz-rw7etshv`. Provider config reference: `aws-709659200419`.",
      "llmAnswer": "Based on the configuration reference documentation:\n\n**AWS Service Catalog Product ID for account creation:** `prod-mts6togilnnuk`\n\n**Customer-facing OU ID:** `ou-8vpz-mogk4qua`\n\n**Internal OU ID:** `ou-8vpz-rw7etshv`",
      "filesRead": [
        "reference/subsystems/account-common.md",
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/contracts/index.md",
        "reference/configuration.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/subsystems/account-common.md",
        "reference/helm/charts/account-common-charts-account-creation.md",
        "reference/contracts/index.md",
        "reference/configuration.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 3,
        "precision": 5,
        "navigation": 4,
        "notes": "AI correctly identified all three primary identifiers (product ID, customer-facing OU, internal OU) with exact matches to ground truth. However, omitted the provider config reference (aws-709659200419) that was part of the complete ground truth. Answer is accurate and free of hallucination, but incomplete as it covers 3 of 4 expected items."
      },
      "composite": 85,
      "notFound": false
    },
    {
      "id": "config-ipam-rds-backup",
      "category": "configuration",
      "difficulty": "hard",
      "question": "What are the RDS automated backup retention period and backup window configured in ipam-core?",
      "groundTruth": "RDS automated backup retention: 7 days. RDS backup window: 03:00-06:00 UTC.",
      "llmAnswer": "Based on the configuration reference documentation:\n\n**Backup retention period:** 7 days\n\n**Backup window:** \"03:00-06:00\" UTC\n\nThese settings are configured in `ipam-core/netbox/rds.tf` for the IPAM RDS (NetBox Database) which uses PostgreSQL as the engine.",
      "filesRead": [
        "reference/subsystems/ipam-core.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/configuration.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/subsystems/ipam-core.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/configuration.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer matches ground truth exactly on both metrics (7 days retention, 03:00-06:00 UTC window). Additional context about file location and database engine adds value without introducing hallucinations or inaccuracies. Demonstrates successful navigation to the correct configuration source."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "config-dev-artifact-naming",
      "category": "configuration",
      "difficulty": "medium",
      "question": "What is the artifact naming convention for development chart versions vs release chart versions in the OCI registry?",
      "groundTruth": "Development artifacts: `{chart-name}:0.0.0-{commit-sha}`. Release artifacts: `{chart-name}:{chart-version}` (semantic version from Chart.yaml).",
      "llmAnswer": "Development artifacts: `{chart-name}:0.0.0-{commit-sha}`\n\nRelease artifacts: `{chart-name}:{chart-version}` (semantic version from Chart.yaml)",
      "filesRead": [
        "reference/helm/index.md",
        "reference/configuration.md",
        "reference/system-architecture.md",
        "reference/index.md",
        "agent-kb.json"
      ],
      "plannedFiles": [
        "reference/helm/index.md",
        "reference/configuration.md",
        "reference/system-architecture.md",
        "reference/index.md",
        "agent-kb.json"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer is verbatim match to ground truth. Both development and release artifact naming conventions are correctly stated with proper formatting and context. No extraneous information or hallucinations present."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "services-tech-stack-orchestration",
      "category": "services",
      "difficulty": "easy",
      "question": "What technologies make up the orchestration layer in Foxtrot's technology stack?",
      "groundTruth": "Kubernetes (EKS, GKE, AKS), ArgoCD, and Crossplane.",
      "llmAnswer": "Based on the documentation provided, Foxtrot's orchestration layer consists of the following technologies:\n\n1. **ArgoCD** — GitOps-based deployment orchestration via ApplicationSets that watch runtime repositories and manage Application custom resources across hub/spoke topology\n2. **Argo Workflows** — Workflow orchestration for multi-step operational tasks\n3. **Argo Events** — Event-driven automation and triggering\n4. **Crossplane** — Infrastructure provisioning and resource lifecycle management via XRDs (Custom Resource Definitions) and compositions\n5. **Helm** — Package management and templating for Kubernetes resources\n\nThese technologies are centralized in the `control-core` subsystem, which forms the bottom layer of Foxtrot's layered architecture. ArgoCD ApplicationSets serve as the primary orchestration mechanism, reading environment-specific parameters from `argo-gen-params.yaml`, generating Applications, pulling charts from OCI registries, and applying manifests to target clusters via sync waves.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/control-core.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/control-core.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd"
      ],
      "score": {
        "accuracy": 3,
        "completeness": 2,
        "precision": 2,
        "navigation": 3,
        "notes": "AI correctly identifies ArgoCD and Crossplane but critically omits Kubernetes (EKS, GKE, AKS) as an explicit orchestration layer component. Adds Argo Workflows, Argo Events, and Helm which are not in ground truth, suggesting scope creep or hallucination. Answer provides detailed implementation context but fails to match the specific, concise ground truth list."
      },
      "composite": 50,
      "notFound": false
    },
    {
      "id": "services-state-management",
      "category": "services",
      "difficulty": "easy",
      "question": "What are the four state management services used in Foxtrot?",
      "groundTruth": "Elasticsearch, Hazelcast, Redis, and Milvus.",
      "llmAnswer": "Based on the documentation provided, the four state management services used in Foxtrot are:\n\n1. **Elasticsearch** — full-text search and analytics\n2. **Hazelcast** — distributed in-memory cache and compute\n3. **Redis** — key-value cache and session store\n4. **Milvus** — vector database for AI/ML workloads",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "explanation/change-impact.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "explanation/change-impact.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer correctly identifies all four state management services matching ground truth exactly. Descriptions are accurate and relevant. No hallucinations or extraneous information detected. Perfect coverage of the question."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "services-eks-addon-versions",
      "category": "services",
      "difficulty": "hard",
      "question": "What are the default pinned versions for the vpc-cni and coredns EKS addons in compute-common?",
      "groundTruth": "vpc-cni: v1.20.1-eksbuild.1. coredns: v1.12.2-eksbuild.4.",
      "llmAnswer": "Based on the information in the provided files, specifically from the \"Configuration & Operational Defaults\" section in `reference/system-architecture.md`:\n\n- **vpc-cni**: v1.20.1-eksbuild.1\n- **coredns**: v1.12.2-eksbuild.4",
      "filesRead": [
        "reference/subsystems/compute-common.md",
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/index.md",
        "reference/configuration.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/subsystems/compute-common.md",
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md",
        "reference/helm/index.md",
        "reference/configuration.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer matches ground truth exactly on both addon versions. Properly cited source file and section, demonstrating effective information retrieval. No hallucinations or extraneous details."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "services-aws-nat-egress-model",
      "category": "services",
      "difficulty": "medium",
      "question": "What is the AWS NAT egress model for internal VPCs vs production clusters in Foxtrot?",
      "groundTruth": "Internal AWS VPCs use shared NAT egress. Production AWS clusters use dedicated NAT egress per cluster. This split reduces cost for internal environments while isolating production egress and capacity per cluster.",
      "llmAnswer": "**Internal AWS VPCs:** Shared NAT egress (`nat_gateway_configuration = \"all_azs\"` for shared VPCs)\n\n**Production AWS clusters:** Dedicated NAT egress per cluster\n\n**Internal environments:** `nat_gateway_configuration = \"none\"` (no dedicated NAT)\n\nThis split reduces cost for internal environments while isolating production egress and capacity.",
      "filesRead": [
        "reference/network-architecture.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/network-core.md",
        "reference/system-architecture.md",
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md"
      ],
      "plannedFiles": [
        "reference/network-architecture.md",
        "reference/helm/charts/network-common-charts-foxtrot-aws-vpc.md",
        "reference/subsystems/network-core.md",
        "reference/system-architecture.md",
        "reference/helm/charts/compute-common-templates-foxtrot-eks.md"
      ],
      "score": {
        "accuracy": 3,
        "completeness": 4,
        "precision": 2,
        "navigation": 2,
        "notes": "Core facts correct: shared NAT for internal VPCs, dedicated for production clusters, cost/isolation benefits stated. However, AI introduces unverified configuration syntax (`nat_gateway_configuration` parameters) not in ground truth. Internal environments section contradicts earlier statement (claims both 'all_azs' and 'none'). Cannot assess navigation quality from answer alone."
      },
      "composite": 55,
      "notFound": false
    },
    {
      "id": "services-ipam-netbox-role",
      "category": "services",
      "difficulty": "medium",
      "question": "What is NetBox used for in Foxtrot, and what infrastructure does it run on?",
      "groundTruth": "NetBox is Foxtrot's IP address management (IPAM) source of truth. It tracks all VPC CIDR blocks across clouds, prevents CIDR overlap, and serves allocation data to compute-tools during VPC and cluster provisioning. It runs on AWS ECS with an RDS database and ALB, provisioned via Terraform in ipam-core.",
      "llmAnswer": "NetBox is Foxtrot's IP address management (IPAM) source of truth. It tracks all VPC CIDR blocks across AWS, GCP, and Azure, prevents CIDR overlap, and serves allocation data to `compute-tools` during VPC and cluster provisioning. \n\nInfrastructure: RDS PostgreSQL backend in `ipam-core/netbox/`.",
      "filesRead": [
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/subsystems/network-core.md",
        "reference/subsystems/network-common.md"
      ],
      "plannedFiles": [
        "reference/system-architecture.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/subsystems/network-core.md",
        "reference/subsystems/network-common.md"
      ],
      "score": {
        "accuracy": 3,
        "completeness": 2,
        "precision": 2,
        "navigation": 2,
        "notes": "AI correctly identifies NetBox's IPAM role and core functions. However, infrastructure description is incomplete—omits ECS and ALB explicitly stated in ground truth. Hallucination: specifies AWS/GCP/Azure multi-cloud support not confirmed in ground truth (which only says 'across clouds'). Missing Terraform provisioning detail. Partial file path reference suggests incomplete information gathering."
      },
      "composite": 45,
      "notFound": false
    },
    {
      "id": "contracts-argo-gen-params-required",
      "category": "contracts",
      "difficulty": "easy",
      "question": "What are the required fields in an argo-gen-params.yaml file for the ArgoCD ApplicationSet generator?",
      "groundTruth": "chart.version (string), namespace (string), and environment (string).",
      "llmAnswer": "Based on the \"ArgoCD Deployment Flow\" section in the system architecture documentation, the required fields in an `argo-gen-params.yaml` file are:\n\n- `chart.version`\n- `namespace`\n- `environment`",
      "filesRead": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/subsystems/app-common.md",
        "reference/helm/index.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 5,
        "completeness": 5,
        "precision": 5,
        "navigation": 5,
        "notes": "AI answer correctly identifies all three required fields (chart.version, namespace, environment) with no hallucinations or extraneous information. Cites specific source location, demonstrating navigation to relevant documentation."
      },
      "composite": 100,
      "notFound": false
    },
    {
      "id": "contracts-azure-xrd-naming",
      "category": "contracts",
      "difficulty": "medium",
      "question": "What is the XRD name format and composition name format for Azure AKS in Foxtrot, with examples?",
      "groundTruth": "XRD name format: `{plural}.{group}`, e.g. `reltioaksclusters.foxtrot.reltio.com`. Composition name format: `reltio-aks-cluster-{channel}-{version}`, e.g. `reltio-aks-cluster-prd-v1`.",
      "llmAnswer": "Based on the system architecture documentation:\n\n**XRD Name Format:** `{plural}.{group}`\n- Example: `xreltioaksclusters.reltio.crossplane.io`\n\n**Composition Name Format:** `reltio-aks-cluster-{channel}-{version}`\n- Example: `reltio-aks-cluster-dev-v1`\n\n**XRD Kind:** `xReltioAKSCluster`\n\n**Source:** `compute-common/compositions/`",
      "filesRead": [
        "reference/helm/charts/compute-common-templates-foxtrot-aks.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/helm/index.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/compute-common-templates-foxtrot-aks.md",
        "reference/contracts/index.md",
        "reference/system-architecture.md",
        "reference/subsystems/compute-common.md",
        "reference/helm/index.md"
      ],
      "score": {
        "accuracy": 2,
        "completeness": 4,
        "precision": 2,
        "navigation": 2,
        "notes": "AI correctly identified format patterns ({plural}.{group} and reltio-aks-cluster-{channel}-{version}) but provided incorrect examples. XRD example shows wrong domain (reltio.crossplane.io instead of foxtrot.reltio.com) and wrong plural form (xreltioaksclusters vs reltioaksclusters). Composition example uses wrong channel (dev instead of prd). Agent appears to have found related documentation but not the correct Foxtrot-specific configuration."
      },
      "composite": 50,
      "notFound": false
    },
    {
      "id": "contracts-helm-chart-required-values",
      "category": "contracts",
      "difficulty": "medium",
      "question": "What are the five required values that all app Helm charts must define?",
      "groundTruth": "accountID (string), cloudProvider (string), region (string), domain (string), and imageRegistry (string).",
      "llmAnswer": "According to the Helm Chart Required Values section in the configuration reference, all app Helm charts must define these five required values:\n\n1. `replicaCount` — number of pod replicas\n2. `image.repository` — container image repository\n3. `image.tag` — container image tag\n4. `service.port` — service port number\n5. `resources` — CPU/memory resource requests and limits",
      "filesRead": [
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "reference/contracts/index.md",
        "reference/helm/charts/app-common-charts-cluster.md",
        "reference/configuration.md"
      ],
      "plannedFiles": [
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "reference/contracts/index.md",
        "reference/helm/charts/app-common-charts-cluster.md",
        "reference/configuration.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 0,
        "navigation": 0,
        "notes": "The AI provided five completely incorrect values (replicaCount, image.repository, image.tag, service.port, resources) that bear no relation to the ground truth (accountID, cloudProvider, region, domain, imageRegistry). This appears to be a hallucination of generic Helm chart values rather than the specific required configuration values. The agent failed to locate or consult the correct documentation source."
      },
      "composite": 0,
      "notFound": false
    },
    {
      "id": "contracts-sync-wave-ordering",
      "category": "contracts",
      "difficulty": "hard",
      "question": "What are the ArgoCD sync wave values and what resource types are deployed at each wave?",
      "groundTruth": "Wave -5: prerequisites. Wave 0: infrastructure. Wave 5: state apps. Wave 10: application services. Wave 15: post-deployment resources (monitoring, backup hooks, finalizers).",
      "llmAnswer": "Based on the \"Helm Sync Waves (Bootstrapping Order)\" section in `reference/helm/index.md`:\n\n| Wave | Count | Resources |\n|---|---|---|\n| -1 | 1 | ServiceAccount:pre-sync-app-templates-sa |\n| 0 | 3 | Application:{{, xReltioAKSCluster:{{, ReltioEKSCluster:{{ |\n| 1 | 5 | ClusterRole:pre-sync-app-templates-clusterrole, Application:{{, Workflow:kong-gateway-setup-{{, AccessEntry:{{, Addon:{{ |\n| 2 | 4 | ClusterRoleBinding:pre-sync-app-templates-clusterrolebinding, Workflow:{{, ProviderConfig:{{, ProviderConfig:{{ |\n| 3 | 4 | Workflow:{{, Application:kong-gateway, Release:{{, Release:{{ |\n| 4 | 1 | Release:{{ |\n| 5 | 1 | Job:pre-sync-app-templates-check-job |\n\nThe sync waves range from **-1 to 5**, with wave -1 executing first (ServiceAccount creation), followed by progressive waves that deploy infrastructure resources (Applications, Clusters), RBAC resources (ClusterRoles, ClusterRoleBindings), workflows, and finally validation jobs at wave 5.",
      "filesRead": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd",
        "reference/system-architecture.md"
      ],
      "plannedFiles": [
        "reference/helm/charts/app-common-charts-argocd-apps.md",
        "reference/helm/index.md",
        "reference/subsystems/app-common.md",
        "diagrams/helm-interactions.mmd",
        "reference/system-architecture.md"
      ],
      "score": {
        "accuracy": 0,
        "completeness": 0,
        "precision": 1,
        "navigation": 1,
        "notes": "The AI answer describes a completely different sync wave configuration than ground truth. Ground truth specifies waves at -5, 0, 5, 10, 15 with conceptual categories (prerequisites, infrastructure, state apps, application services, post-deployment). AI provides waves -1 through 5 with specific resource types that don't align with ground truth values or categories. The AI did locate a source file and provided structured output, but the fundamental wave values and resource groupings are incorrect. Either the AI consulted the wrong documentation or the ground truth and AI answer describe different systems entirely."
      },
      "composite": 10,
      "notFound": false
    }
  ]
}