# dd0c Platform - BDD Specification Gap Analysis

## Executive Summary

This gap analysis compares the BDD acceptance specifications against the currently implemented Node.js/Fastify source code and PostgreSQL database migrations for the dd0c monorepo (P2-P6).

Overall, the **Dashboard APIs** required by the React Console are largely implemented across all services, so the frontend will render and operate. The major gaps lie in the out-of-band background workers, external agents, robust message queuing (SQS/DLQ), and advanced intelligence/scoring heuristics.

**Estimated Implementation Completion:**

* **P4 - Lightweight IDP:** ~75% (Core scanners, catalog, and search are functional)
* **P3 - Alert Intelligence:** ~65% (Ingestion, basic correlation, and UI APIs are solid)
* **P5 - AWS Cost Anomaly:** ~50% (Scorer and APIs exist, but CloudTrail ingestion is missing)
* **P6 - Runbook Automation:** ~40% (APIs and Slackbot exist; parsing, classification, and agent execution are completely missing)
* **P2 - IaC Drift Detection:** ~30% (SaaS ingestion APIs exist; the entire external agent, mTLS, and diff engines are missing)

---

## Per-Service Breakdown by Epic

### P2: IaC Drift Detection

* **Epic 1: Drift Detection Agent** ❌ **MISSING** - No Go agent binary. Terraform, CloudFormation, Kubernetes, and Pulumi state scanning engines do not exist. Secret scrubbing logic is missing.
* **Epic 2: Agent Communication** 🟡 **PARTIAL** - Basic HTTP ingestion route exists (`/v1/ingest/drift`), but mTLS authentication and SQS FIFO message queues are not implemented.
* **Epic 3: Event Processor** 🟡 **PARTIAL** - Ingestion, nonce replay prevention, and PostgreSQL persistence with RLS are implemented. Missing canonical schema normalization and chunked report reassembly.
* **Epic 4: Notification Engine** 🟡 **PARTIAL** - Slack Block Kit, Email (Resend), Webhook, and PagerDuty dispatchers are implemented. Missing Daily Digest job and severity-based routing logic.
* **Epic 5: Remediation** ❌ **MISSING** - Interactive Slack buttons exist in notification payloads, but the backend workflow engine, approval tracking, and agent-side execution dispatch are missing.
* **Epic 6 & 7: Dashboard UI & API** ✅ **IMPLEMENTED** - `fetchStacks`, `fetchStackHistory`, and `fetchLatestReport` endpoints are fully implemented with tenant RLS.
* **Epic 8 & 9: Infrastructure / PLG** ❌ **MISSING** - No CDK templates, CI/CD pipelines, Stripe billing, or CLI setup logic.
* **Epic 10: Transparent Factory** 🟡 **PARTIAL** - Database migrations and RLS are implemented. Missing Feature Flag service and OTEL Tracing.

### P3: Alert Intelligence

* **Epic 1: Webhook Ingestion** 🟡 **PARTIAL** - Webhook routes and HMAC validation for Datadog, PagerDuty, OpsGenie, and Grafana are implemented via Redis queue. Missing S3 archival, oversized payload handling, and SQS/DLQ.
* **Epic 2: Alert Normalization** 🟡 **PARTIAL** - Basic provider mapping logic exists in `webhook-processor.ts`.
* **Epic 3: Correlation Engine** 🟡 **PARTIAL** - Time-window correlation and fingerprint deduplication are implemented using Redis. Missing Service-Affinity matching and strict cross-tenant worker isolation.
* **Epic 4: Notification & Escalation** 🟡 **PARTIAL** - Slack, Email, and Webhook dispatchers are implemented. Missing PagerDuty auto-escalation cron and Daily Noise Report.
* **Epic 5: Slack Bot** 🟡 **PARTIAL** - Missing interactive feedback button handlers (`/slack/interactions`) for noise/helpful marking, and missing `/dd0c` slash commands.
* **Epic 6 & 7: Dashboard UI & API** 🟡 **PARTIAL** - Incident CRUD, filtering, and summary endpoints are implemented. Missing `MTTR` and `Noise Reduction` analytics endpoints requested by the spec.
* **Epic 8 & 9: Infrastructure / PLG** ❌ **MISSING** - No CDK, Stripe billing, or Free Tier (10K alerts/month) limit enforcement.
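
For readers unfamiliar with the fingerprint deduplication and time-window correlation noted under P3 Epic 3, the following is a minimal sketch of the general technique, not the code in the repository. The `Alert` shape, the fields hashed into the fingerprint, and the `Deduplicator` class are all hypothetical, and an in-memory `Map` stands in for the Redis store that the actual implementation is reported to use.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalized alert shape; the real schema is not shown in this analysis.
interface Alert {
  tenantId: string;
  provider: string;
  service: string;
  title: string;
  receivedAt: number; // epoch milliseconds
}

// Build a stable fingerprint from the fields assumed to identify "the same" alert.
// Including tenantId keeps fingerprints from colliding across tenants.
function fingerprint(alert: Alert): string {
  return createHash("sha256")
    .update([alert.tenantId, alert.provider, alert.service, alert.title].join("\u0000"))
    .digest("hex");
}

// In-memory stand-in for a Redis-backed store (assumption for illustration only).
class Deduplicator {
  private readonly windowMs: number;
  private readonly lastSeen = new Map<string, number>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true when an alert with the same fingerprint was seen within the window.
  // The timestamp is refreshed on every call, so the window slides with each repeat.
  isDuplicate(alert: Alert): boolean {
    const key = fingerprint(alert);
    const previous = this.lastSeen.get(key);
    this.lastSeen.set(key, alert.receivedAt);
    return previous !== undefined && alert.receivedAt - previous <= this.windowMs;
  }
}
```

With a Redis backend, the same idea is commonly expressed as a key per fingerprint with an expiry equal to the window, which also removes the need to prune old entries by hand.
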
### P4: Lightweight IDP

* **Epic 1: AWS Discovery Scanner** 🟡 **PARTIAL** - ECS, Lambda, and RDS resource discovery implemented. Missing CloudFormation, API Gateway, and Step Functions orchestration.
* **Epic 2: GitHub Discovery Scanner** 🟡 **PARTIAL** - Repository fetching, pagination, and basic `package.json`/`Dockerfile` heuristics implemented. Missing advanced CODEOWNERS and commit history parsing.
* **Epic 3: Service Catalog** 🟡 **PARTIAL** - Catalog ingestion, partial update staging, ownership resolution, and DB APIs implemented. Missing PagerDuty/OpsGenie on-call mapping.
* **Epic 4: Search Engine** 🟡 **PARTIAL** - Meilisearch integration with PostgreSQL fallback implemented. Missing Redis prefix caching for `Cmd+K` performance optimization.
* **Epic 5: Dashboard API** ✅ **IMPLEMENTED** - Service CRUD and ownership summary endpoints are fully functional and align with Console requirements.
* **Epic 6: Analytics Dashboards** ❌ **MISSING** - API endpoints for Ownership Coverage, Health Scorecards, and Tech Debt tracking are missing.

### P5: AWS Cost Anomaly

* **Epic 1: CloudTrail Ingestion** ❌ **MISSING** - A batch ingestion API exists, but the AWS EventBridge cross-account rules, SQS FIFO, and Lambda normalizer are entirely missing.
* **Epic 2: Anomaly Detection** 🟡 **PARTIAL** - Welford's algorithm and basic Z-Score computation are implemented. Missing novelty scoring, cold-start fast path, and composite scoring logic.
* **Epic 3: Zombie Hunter** ❌ **MISSING** - No scheduled jobs or logic to detect idle EC2, RDS, or EBS resources.
* **Epic 4: Notification & Remediation** 🟡 **PARTIAL** - Slack notification generation is implemented. Missing the `/slack/interactions` endpoint to process remediation buttons (e.g., Stop Instance).
* **Epic 6 & 7: Dashboard UI & API** ✅ **IMPLEMENTED** - Anomalies, Baselines, and Governance rule CRUD endpoints match Console expectations.
* **Epic 10: Transparent Factory** 🟡 **PARTIAL** - The 14-day `GovernanceEngine` (Shadow -> Audit -> Enforce) auto-promotion and Panic Mode logic is implemented. Missing Circuit Breakers and OTEL spans.

### P6: Runbook Automation

* **Epic 1: Runbook Parser** ❌ **MISSING** - The system currently expects raw YAML inputs. Confluence HTML, Notion Markdown, and LLM step extraction parsing engines are entirely missing.
* **Epic 2: Action Classifier** ❌ **MISSING** - Neither the deterministic regex safety scanner nor the secondary LLM risk classifier exists.
* **Epic 3: Execution Engine** 🟡 **PARTIAL** - Basic state transitions are handled in `api/runbooks.ts`. Missing Trust Level enforcement, network partition recovery, and step idempotency logic.
* **Epic 4: Agent** ❌ **MISSING** - No Go agent binary, gRPC bidirectional streaming, or local sandbox execution environments exist.
* **Epic 5: Audit Trail** 🟡 **PARTIAL** - Basic Postgres `audit_entries` table exists. Missing the immutable append-only hash chain logic and CSV/PDF compliance export APIs.
* **Epic 6: Dashboard API** ✅ **IMPLEMENTED** - Runbook, execution, and approval APIs are implemented. Redis pub/sub Agent Bridge exists. Slackbot interaction handlers are fully implemented with signature verification.

---

## Priority Ranking (What to Implement Next)

This ranking aims to minimize time-to-value: it prioritizes services where the Console UI is already supported, the backend logic is mostly complete, and the remaining gaps are well-defined.

**1. P4 - Lightweight IDP**

* **Why:** It is functionally the most complete. The Console APIs work, Meilisearch sync works, and basic AWS/GitHub discovery is operational.
* **Next Steps:** Implement the missing AWS scanners (CloudFormation, API Gateway) and the `Redis` prefix caching for search. Add the analytics endpoints (Ownership, Health, Tech Debt) to unlock the remaining UI views.

**2. P3 - Alert Intelligence**

* **Why:** The core pipeline (Webhook -> Redis -> Worker -> DB) is functional and deduplication logic works. Console APIs are satisfied.
* **Next Steps:** Build the `MTTR` and `Noise Reduction` analytics SQL queries, add PagerDuty escalation triggers, and implement the interactive Slack button handlers.

**3. P5 - AWS Cost Anomaly**

* **Why:** The complex math (Welford running stats) and database governance logic are done, making the dashboard functional for demo data.
* **Next Steps:** The biggest blocker is that there is no data pipeline. Implement the CDK stack to deploy the EventBridge rules and the `Lambda Normalizer` to translate CloudTrail events into the existing `/v1/ingest` API.

**4. P6 - Runbook Automation**

* **Why:** The API orchestration, Slack integrations, and Redis Pub/Sub bridges are nicely implemented, but it is currently a "brain without a body."
* **Next Steps:** It requires two massive standalone systems: the `Runbook Parser` (LLM + AST logic) and the actual external `Agent` (Go binary with gRPC and sandboxing).

**5. P2 - IaC Drift Detection**

* **Why:** Furthest from completion. While the SaaS API exists, it requires a highly complex external Go agent capable of reading Terraform/K8s/Pulumi state, a secure mTLS CA registration system, and a diffing/scoring engine, none of which currently exist.
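
As background for the P5 items above, the Welford running statistics and basic Z-score computation can be sketched roughly as follows. This illustrates the standard algorithm only; the `RunningStats` class and its method names are hypothetical and are not taken from the dd0c source, and the novelty, cold-start, and composite scoring listed as missing are not covered.

```typescript
// Welford's online algorithm: updates mean and variance one sample at a time,
// without storing the samples and without the cancellation errors of the
// naive sum-of-squares approach.
class RunningStats {
  private n = 0;    // number of samples seen
  private mean = 0; // running mean
  private m2 = 0;   // running sum of squared deviations from the mean

  push(x: number): void {
    this.n += 1;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    // Uses the updated mean for the second factor, as Welford's method requires.
    this.m2 += delta * (x - this.mean);
  }

  get count(): number {
    return this.n;
  }

  get average(): number {
    return this.mean;
  }

  // Sample variance (n - 1 denominator); defined as 0 until two samples exist.
  variance(): number {
    return this.n > 1 ? this.m2 / (this.n - 1) : 0;
  }

  // Z-score of a candidate value against the baseline seen so far.
  // Returns 0 when the baseline has no spread, to avoid dividing by zero.
  zScore(x: number): number {
    const sd = Math.sqrt(this.variance());
    return sd === 0 ? 0 : (x - this.mean) / sd;
  }
}

// Usage: feed historical hourly costs, then score a new observation.
const baseline = new RunningStats();
for (const cost of [2, 4, 4, 4, 5, 5, 7, 9]) {
  baseline.push(cost);
}
const score = baseline.zScore(9);
console.log(`z-score of latest cost: ${score.toFixed(2)}`);
```

Returning 0 for a flat baseline is one possible convention; a production scorer might instead treat any deviation from a constant baseline as anomalous, which is a design decision this sketch does not attempt to settle.
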