P1, P2, P3, P4, P6 reviewed by Gemini subagents. P5 reviewed manually (Gemini credential errors). All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
dd0c/cost — Test Architecture Review
Reviewer: TDD Consultant (Manual Review)
Date: March 1, 2026
Verdict: 🔴 NEEDS SIGNIFICANT WORK — This is the weakest test architecture of all 6 products.
1. Coverage Analysis
This document is 232 lines. For comparison, P1 (route) is 2,241 lines and P6 (run) is 1,762 lines. The coverage gaps are massive.
| Epic | Coverage Status | Notes |
|---|---|---|
| Epic 1: CloudTrail Ingestion | ⚠️ Partial | Section 3.1 has 5 test cases for the normalizer. Missing: SQS FIFO deduplication tests, DLQ retry behavior, EventBridge cross-account rule tests, S3 raw event archival. Story 1.2 (SQS + DLQ) has zero dedicated tests. |
| Epic 2: Anomaly Detection | ✅ Decent | Section 3.2 covers Z-score, novelty, cold-start. But missing: composite score weighting tests, edge cases (zero stddev, negative costs, NaN handling), baseline maturity transition tests. |
| Epic 3: Zombie Hunter | ❌ Missing | Zero test cases. The daily scan for idle/stopped resources that are still costing money has no tests at all. |
| Epic 4: Notification & Remediation | ⚠️ Thin | Section 4.2 has 3 integration tests for cross-account actions. Missing: Slack Block Kit formatting tests, daily digest aggregation, snooze/dismiss logic, interactive payload signature validation. |
| Epic 5: Onboarding & PLG | ❌ Missing | Zero test cases. CloudFormation template generation, Stripe billing, free tier enforcement — none tested. |
| Epic 6: Dashboard API | ❌ Missing | Zero test cases. REST API endpoints, tenant isolation, query performance — nothing. |
| Epic 7: Dashboard UI | ❌ Missing | Zero test cases. |
| Epic 8: Infrastructure (CDK) | ❌ Missing | Zero test cases. No CDK snapshot tests, no infrastructure drift detection (ironic). |
| Epic 9: Multi-Account Management | ❌ Missing | Zero test cases. Account linking, bulk scanning, cross-account permissions — nothing. |
| Epic 10: Transparent Factory | 🔴 Skeletal | Section 8 has exactly 3 test cases total across 2 of 5 tenets. Elastic Schema, Cognitive Durability, and Semantic Observability have zero tests. |
Bottom line: 5 of 10 epics have zero test coverage in this document. This is a skeleton, not a test architecture.
2. TDD Workflow Critique
The philosophy in Section 1 is sound — "test the math first" is correct for an anomaly detection product. But the execution is incomplete:
- The "strict TDD" list correctly identifies scoring and governance as test-first. Good.
- The "integration tests lead" for CloudTrail ingestion is acceptable.
- Missing: No guidance on testing the Welford algorithm implementation. This is a numerical algorithm with known floating-point edge cases (catastrophic cancellation with large values). The test architecture should mandate property-based testing (e.g., fast-check) for the baseline calculator, not just 3 example-based tests.
- Missing: No guidance on testing the 14-day auto-promotion state machine. This is a time-dependent state transition that needs fake clock testing.
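To make the property-based-testing mandate concrete, here is a minimal sketch of the kind of test the review is asking for. The `WelfordBaseline` class name and its API are hypothetical (the real stack is presumably TypeScript with fast-check; Python is used here for brevity), and the property loop is hand-rolled rather than using a property-testing library: it checks the one-pass Welford result against a two-pass reference on random inputs, including large offsets that trigger catastrophic cancellation in naive one-pass formulas.

```python
# Property-style test sketch for a Welford baseline calculator.
# "WelfordBaseline" and "naive_stddev" are hypothetical names.
import math
import random

class WelfordBaseline:
    """Online mean/variance via Welford's algorithm (numerically stable)."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

def naive_stddev(xs):
    """Two-pass reference implementation used as the property oracle."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

# Property: for random inputs, including a large offset that defeats the
# naive sum-of-squares formula, Welford agrees with the two-pass reference.
rng = random.Random(42)
for _ in range(200):
    offset = rng.choice([0.0, 1e9])  # 1e9 offset stresses float precision
    xs = [offset + rng.uniform(0, 100) for _ in range(rng.randint(1, 50))]
    w = WelfordBaseline()
    for x in xs:
        w.add(x)
    assert math.isclose(w.mean, sum(xs) / len(xs), rel_tol=1e-9)
    assert math.isclose(w.stddev, naive_stddev(xs), rel_tol=1e-6, abs_tol=1e-6)
```

A real fast-check version would replace the hand-rolled loop with generated arbitraries, but the property itself (streaming result equals batch reference) is the same.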
3. Test Pyramid Balance
The 70/20/10 ratio is stated but not justified. For dd0c/cost:
- Unit tests should be higher (80%) — the anomaly scoring engine is pure math. It should have exhaustive property-based tests, not just 50 example tests.
- Integration tests (15%) — DynamoDB Single-Table patterns, EventBridge→SQS→Lambda pipeline, cross-account STS.
- E2E (5%) — two journeys is fine for V1, but they need to be more detailed.
The current Section 6 (Performance) has exactly 2 test cases. For a product that processes CloudTrail events at scale, this is dangerously thin.
4. Anti-Patterns
- Section 3.3 — Welford Algorithm: Only 3 tests for a numerical algorithm. This is the "happy path only" anti-pattern. Missing: what happens when stddev is 0 (division by zero in the Z-score)? What happens with a single data point? What happens with extremely large values (float overflow)?
- Section 4.1 — DynamoDB Transaction Test: "writes CostEvent and updates Baseline in single transaction" tests only the happy path. Where is the test for transaction failure? DynamoDB transactions can fail due to conflicts, and the system must handle partial writes.
- Section 5 — E2E Journeys: Journey 2 tests "Stop Instance" remediation but doesn't test what happens when the customer's IAM role has been revoked between alert and remediation click. This is a real-world race condition.
- No negative tests anywhere. What happens when CloudTrail sends malformed JSON? What happens when the pricing table doesn't have the instance type? (Section 3.1 mentions "fallback pricing" but there is only 1 test for it.)
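Two of these negative paths can be sketched as unit tests. All function names are hypothetical (Python for brevity): a Z-score that guards the zero-stddev division, and a normalizer that rejects malformed CloudTrail JSON instead of letting the exception poison the SQS batch.

```python
# Sketch of the missing negative tests; z_score and normalize_event
# are hypothetical stand-ins for the real scoring/ingestion functions.
import json
import math

def z_score(cost: float, mean: float, stddev: float) -> float:
    """Guard the stddev == 0 case: with zero observed variance, an exact
    match scores 0 and any deviation is treated as maximally anomalous."""
    if stddev == 0:
        return 0.0 if cost == mean else math.inf
    return (cost - mean) / stddev

def normalize_event(raw: str):
    """Parse a CloudTrail payload; return None for malformed or
    incomplete input rather than raising into the Lambda batch."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "eventName" not in event:
        return None
    return event

# The negative cases the review calls for:
assert z_score(50.0, 50.0, 0.0) == 0.0        # single-point baseline, exact match
assert z_score(51.0, 50.0, 0.0) == math.inf   # zero stddev, any deviation flags
assert normalize_event("{not json") is None    # malformed JSON
assert normalize_event('{"foo": 1}') is None   # missing required field
assert normalize_event('{"eventName": "RunInstances"}')["eventName"] == "RunInstances"
```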
5. Transparent Factory Gaps
Section 8 is the biggest problem. It has 3 test cases across 2 tenets. Here's what's missing:
Atomic Flagging (1 test → needs ~10)
- Missing: flag default state (off), flag TTL enforcement, flag owner metadata, local evaluation (no network calls), CI block on expired flags, multiple concurrent flags.
- The single circuit breaker test uses ">10 alerts/hour" but Epic 10.1 specifies ">3x baseline" — inconsistency.
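A few of the missing flag tests are trivial to write once the flag object carries its TTL and owner. The `FeatureFlag` API below is hypothetical (Python for brevity); it shows default-off state, local TTL enforcement with an injected clock, and a CI gate on expired flags.

```python
# Sketch of Atomic Flagging tests; FeatureFlag and the CI check are
# hypothetical names, not the product's real API.
from datetime import datetime, timedelta, timezone

class FeatureFlag:
    def __init__(self, name, owner, expires_at, enabled=False):
        self.name = name
        self.owner = owner          # flag owner metadata
        self.expires_at = expires_at
        self.enabled = enabled      # flags default to off

    def is_on(self, now: datetime) -> bool:
        # Local evaluation only, no network call; expired flags read as off.
        if now >= self.expires_at:
            return False
        return self.enabled

def ci_expired_flags(flags, now):
    """CI gate: return the names of flags that have outlived their TTL."""
    return [f.name for f in flags if now >= f.expires_at]

now = datetime(2026, 3, 1, tzinfo=timezone.utc)
flag = FeatureFlag("cost-v2-scoring", owner="team-cost",
                   expires_at=now + timedelta(days=14))
assert flag.is_on(now) is False                        # default state: off
flag.enabled = True
assert flag.is_on(now) is True
assert flag.is_on(now + timedelta(days=15)) is False   # TTL enforced locally
assert ci_expired_flags([flag], now + timedelta(days=15)) == ["cost-v2-scoring"]
```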
Elastic Schema (0 tests → needs ~8)
- Zero tests. Need: migration lint (no DROP/RENAME/TYPE), additive-only DynamoDB attribute changes, V1 code ignoring V2 attributes, sunset date enforcement, dual-write during migration.
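Two of these tests need almost no infrastructure. The sketch below (hypothetical change format and function names, Python for brevity) shows an additive-only migration lint and a V1 reader that ignores unknown V2 attributes instead of failing on them.

```python
# Sketch of Elastic Schema tests; the change-record format and
# read_cost_event_v1 are hypothetical illustrations.
FORBIDDEN_OPS = {"remove_attribute", "rename_attribute", "change_type"}

def lint_schema_changes(changes):
    """Migration lint: flag any non-additive DynamoDB schema change."""
    return [c for c in changes if c["op"] in FORBIDDEN_OPS]

changes = [
    {"op": "add_attribute", "name": "anomaly_score_v2"},
    {"op": "rename_attribute", "name": "cost", "to": "cost_usd"},
]
assert [v["op"] for v in lint_schema_changes(changes)] == ["rename_attribute"]

def read_cost_event_v1(item: dict) -> dict:
    """V1 reader: keep known attributes, silently ignore V2 additions."""
    known = {"pk", "sk", "cost", "eventName"}
    return {k: v for k, v in item.items() if k in known}

item = {"pk": "ACCT#1", "sk": "EVT#1", "cost": 12.5, "anomaly_score_v2": 0.9}
assert "anomaly_score_v2" not in read_cost_event_v1(item)
assert read_cost_event_v1(item)["cost"] == 12.5
```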
Cognitive Durability (0 tests → needs ~5)
- Zero tests. Need: decision log schema validation, CI enforcement for scoring PRs, cyclomatic complexity gate, decision log presence check.
Semantic Observability (0 tests → needs ~8)
- Zero tests. Need: OTEL span emission on every anomaly scoring decision, span attributes (cost.anomaly_score, cost.z_score, cost.baseline_days), PII protection (account ID hashing), fast-path span attributes.
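The span-attribute and PII tests are straightforward to express even before the OTEL plumbing exists. This sketch replaces the OTEL SDK with a hand-rolled recorder (everything here is hypothetical illustration, Python for brevity): scoring emits a span with the named attributes, and the account ID is hashed, never emitted raw.

```python
# Sketch of Semantic Observability tests with a hand-rolled span
# recorder standing in for the OTEL SDK; all names are hypothetical.
import hashlib

SPANS = []

def record_span(name, attributes):
    SPANS.append({"name": name, "attributes": attributes})

def score_anomaly(account_id, cost, mean, stddev):
    z = (cost - mean) / stddev if stddev else 0.0
    record_span("cost.anomaly_scoring", {
        "cost.z_score": z,
        "cost.anomaly_score": min(abs(z) / 3.0, 1.0),
        # PII protection: account IDs are hashed before emission
        "account.id_hash": hashlib.sha256(account_id.encode()).hexdigest()[:16],
    })
    return z

score_anomaly("123456789012", cost=500.0, mean=100.0, stddev=100.0)
span = SPANS[-1]
assert span["name"] == "cost.anomaly_scoring"           # span on every decision
assert span["attributes"]["cost.z_score"] == 4.0        # attributes present
assert span["attributes"]["account.id_hash"] != "123456789012"  # never raw
assert len(span["attributes"]["account.id_hash"]) == 16
```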
Configurable Autonomy (2 tests → needs ~8)
- The 14-day auto-promotion tests are good but incomplete. Missing: panic mode activation (<1s), panic mode stops all alerting, per-account governance override, policy decision logging, governance drift monitoring.
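The fake-clock pattern the review asks for (here and in Section 2) looks like this. The `GovernancePolicy` class is a hypothetical stand-in (Python for brevity): the clock is passed in rather than read from the system, so the 14-day boundary and the panic override are testable deterministically.

```python
# Fake-clock sketch for the 14-day auto-promotion state machine;
# GovernancePolicy and its mode names are hypothetical.
from datetime import datetime, timedelta, timezone

class GovernancePolicy:
    """Accounts start in 'observe' mode and auto-promote to 'alert'
    after 14 days; panic mode overrides everything."""
    def __init__(self, onboarded_at):
        self.onboarded_at = onboarded_at
        self.panic = False

    def mode(self, now: datetime) -> str:
        if self.panic:
            return "silent"   # panic mode stops all alerting
        if now - self.onboarded_at >= timedelta(days=14):
            return "alert"
        return "observe"

t0 = datetime(2026, 3, 1, tzinfo=timezone.utc)
policy = GovernancePolicy(onboarded_at=t0)
assert policy.mode(t0) == "observe"
assert policy.mode(t0 + timedelta(days=13, hours=23)) == "observe"
assert policy.mode(t0 + timedelta(days=14)) == "alert"    # promotion boundary
policy.panic = True
assert policy.mode(t0 + timedelta(days=30)) == "silent"   # panic overrides
```

The same injected-clock shape covers the missing per-account override and policy-logging tests.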
6. Performance Test Gaps
Section 6 has 2 tests. For a real-time cost monitoring product, this is inadequate:
- Missing: Burst ingestion (what happens when 1000 CloudTrail events arrive in 1 second during an auto-scaling event?)
- Missing: Baseline calculation performance with 90 days of historical data per account
- Missing: Anomaly scoring latency under concurrent multi-account evaluation
- Missing: DynamoDB hot partition detection (all events for one account hitting the same partition key)
- Missing: SQS FIFO throughput limits (300 msg/s per message group — what happens when a large account exceeds this?)
- Missing: Lambda cold start impact on end-to-end latency
7. Missing Test Scenarios
Security
- CloudTrail event forgery: What if someone sends fake CloudTrail events to the EventBridge bus? HMAC/signature validation?
- Slack interactive payload signature: Slack sends a signing secret with interactive payloads. No test validates this.
- Cross-account IAM role revocation: Customer revokes the dd0c role between alert and remediation click.
- Remediation authorization: Who can click "Terminate"? No RBAC tests for remediation actions.
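The Slack signature test in particular is cheap to write: Slack's documented v0 scheme is an HMAC-SHA256 of `v0:{timestamp}:{body}` with the app's signing secret, compared against the `X-Slack-Signature` header, with stale timestamps rejected to block replays. A sketch (hypothetical verifier name, Python for brevity):

```python
# Sketch of a Slack interactive-payload signature test using Slack's
# documented v0 signing scheme; verify_slack_signature is a hypothetical name.
import hashlib
import hmac
import time

def verify_slack_signature(secret, timestamp, body, signature,
                           now=None, tolerance=300):
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:   # reject replayed payloads
        return False
    basestring = f"v0:{timestamp}:{body}".encode()
    expected = "v0=" + hmac.new(secret.encode(), basestring,
                                hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

secret = "test-signing-secret"
ts = "1761000000"
body = 'payload={"type":"block_actions"}'
good = "v0=" + hmac.new(secret.encode(), f"v0:{ts}:{body}".encode(),
                        hashlib.sha256).hexdigest()

assert verify_slack_signature(secret, ts, body, good, now=1761000010)
assert not verify_slack_signature(secret, ts, body + "x", good, now=1761000010)
assert not verify_slack_signature(secret, ts, body, good, now=1761003600)  # stale
```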
Data Integrity
- CloudTrail event deduplication: CloudTrail can send duplicate events. SQS FIFO dedup is mentioned in Epic 1.2 but has zero tests.
- Baseline corruption recovery: What if a DynamoDB write partially fails and corrupts the running mean/stddev? No recovery tests.
- Pricing table staleness: Static pricing tables will become stale. No test validates that the system handles unknown instance types gracefully beyond the single "fallback pricing" test.
- Cost calculation precision: Floating-point arithmetic on money. No tests for rounding behavior or currency precision.
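The precision tests write themselves once money is kept as integer cents instead of floats. A sketch of the rounding behavior worth pinning down (hypothetical helper name, Python for brevity):

```python
# Sketch of cost-precision tests: money as integer cents, not floats.
# to_cents is a hypothetical helper illustrating banker's rounding.
from decimal import Decimal, ROUND_HALF_EVEN

def to_cents(usd: str) -> int:
    """Parse a decimal dollar string into integer cents, half-even rounding."""
    return int(Decimal(usd).quantize(Decimal("0.01"),
                                     rounding=ROUND_HALF_EVEN) * 100)

# The classic float trap this guards against:
assert 0.1 + 0.2 != 0.3                                      # floats drift
assert to_cents("0.1") + to_cents("0.2") == to_cents("0.3")  # cents do not

assert to_cents("10.005") == 1000   # half-even: rounds down to 10.00
assert to_cents("10.015") == 1002   # half-even: rounds up to 10.02
assert sum(to_cents("0.10") for _ in range(100)) == to_cents("10.00")
```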
Operational
- DLQ overflow: What happens when the DLQ fills up? No backpressure tests.
- Multi-tenant isolation: No tests ensuring one tenant's anomalies don't leak to another tenant's Slack channel.
- Account onboarding race condition: What if CloudTrail events arrive before the account is fully onboarded?
8. Top 5 Recommendations (Prioritized)
1. Expand to cover all 10 epics. 5 epics have zero tests. At minimum, add unit test stubs for Zombie Hunter (Epic 3), Onboarding (Epic 5), and Dashboard API (Epic 6). These are customer-facing features.
2. Rewrite Section 8 (Transparent Factory) from scratch. 3 tests across 2 tenets is unacceptable. Every tenet needs 5-10 tests. The Elastic Schema and Semantic Observability sections are completely empty.
3. Add property-based testing for the anomaly math. The Welford algorithm, Z-score calculation, and composite scoring are numerical — they need fast-check or equivalent, not just example-based tests. Test edge cases: zero stddev, single data point, NaN, Infinity, negative costs.
4. Add security tests. Slack payload signature validation, CloudTrail event authenticity, cross-account IAM revocation handling, remediation RBAC. This product executes StopInstances and DeleteDBInstance — security testing is non-negotiable.
5. Expand the performance section to 10+ tests. Burst ingestion, baseline calculation at scale, DynamoDB hot partitions, SQS FIFO throughput limits, Lambda cold starts. The current 2 tests give zero confidence in production readiness.
This document needs a complete rewrite before it can guide TDD implementation. The scoring engine tests are a good start, but everything else is a placeholder.