Add Gemini TDD reviews for all 6 products

P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
2026-03-01 00:29:24 +00:00
parent 1101fef096
commit 2fe0ed856e
6 changed files with 501 additions and 0 deletions


# dd0c/cost — Test Architecture Review
**Reviewer:** TDD Consultant (Manual Review)
**Date:** March 1, 2026
**Verdict:** 🔴 NEEDS SIGNIFICANT WORK — This is the weakest test architecture of all 6 products.
---
## 1. Coverage Analysis
This document is 232 lines. For comparison, P1 (route) is 2,241 lines and P6 (run) is 1,762 lines. The coverage gaps are massive.
| Epic | Coverage Status | Notes |
|------|----------------|-------|
| Epic 1: CloudTrail Ingestion | ⚠️ Partial | Section 3.1 has 5 test cases for the normalizer. Missing: SQS FIFO deduplication tests, DLQ retry behavior, EventBridge cross-account rule tests, S3 raw event archival. Story 1.2 (SQS + DLQ) has zero dedicated tests. |
| Epic 2: Anomaly Detection | ✅ Decent | Section 3.2 covers Z-score, novelty, cold-start. But missing: composite score weighting tests, edge cases (zero stddev, negative costs, NaN handling), baseline maturity transition tests. |
| Epic 3: Zombie Hunter | ❌ Missing | Zero test cases. The daily scan for idle/stopped resources that are still costing money has no tests at all. |
| Epic 4: Notification & Remediation | ⚠️ Thin | Section 4.2 has 3 integration tests for cross-account actions. Missing: Slack Block Kit formatting tests, daily digest aggregation, snooze/dismiss logic, interactive payload signature validation. |
| Epic 5: Onboarding & PLG | ❌ Missing | Zero test cases. CloudFormation template generation, Stripe billing, free tier enforcement — none tested. |
| Epic 6: Dashboard API | ❌ Missing | Zero test cases. REST API endpoints, tenant isolation, query performance — nothing. |
| Epic 7: Dashboard UI | ❌ Missing | Zero test cases. |
| Epic 8: Infrastructure (CDK) | ❌ Missing | Zero test cases. No CDK snapshot tests, no infrastructure drift detection (ironic). |
| Epic 9: Multi-Account Management | ❌ Missing | Zero test cases. Account linking, bulk scanning, cross-account permissions — nothing. |
| Epic 10: Transparent Factory | 🔴 Skeletal | Section 8 has exactly 3 test cases total across 2 of 5 tenets. Elastic Schema, Cognitive Durability, and Semantic Observability have zero tests. |
**Bottom line:** 5 of 10 epics have zero test coverage in this document. This is a skeleton, not a test architecture.
---
## 2. TDD Workflow Critique
The philosophy in Section 1 is sound — "test the math first" is correct for an anomaly detection product. But the execution is incomplete:
- The "strict TDD" list correctly identifies scoring and governance as test-first. Good.
- The "integration tests lead" for CloudTrail ingestion is acceptable.
- **Missing:** No guidance on testing the Welford algorithm implementation. This is a numerical algorithm with known floating-point edge cases (catastrophic cancellation with large values). The test architecture should mandate property-based testing (e.g., `fast-check`) for the baseline calculator, not just 3 example-based tests.
- **Missing:** No guidance on testing the 14-day auto-promotion state machine. This is a time-dependent state transition that needs fake clock testing.
---
## 3. Test Pyramid Balance
The 70/20/10 ratio is stated but not justified. For dd0c/cost:
- **Unit tests should be higher (80%)** — the anomaly scoring engine is pure math. It should have exhaustive property-based tests, not just 50 example tests.
- **Integration tests (15%)** — DynamoDB Single-Table patterns, EventBridge→SQS→Lambda pipeline, cross-account STS.
- **E2E (5%)** — two journeys are fine for V1, but each needs more detail.
The current Section 6 (Performance) has exactly 2 test cases. For a product that processes CloudTrail events at scale, this is dangerously thin.
---
## 4. Anti-Patterns
1. **Section 3.3 — Welford Algorithm:** Only 3 tests for a numerical algorithm. This is the "happy path only" anti-pattern. Missing: what happens when stddev is 0 (division by zero in Z-score)? What happens with a single data point? What happens with extremely large values (float overflow)?
2. **Section 4.1 — DynamoDB Transaction Test:** "writes CostEvent and updates Baseline in single transaction" — this tests the happy path. Where's the test for transaction failure? DynamoDB transactions can fail due to conflicts, and the system must handle partial writes.
3. **Section 5 — E2E Journeys:** Journey 2 tests "Stop Instance" remediation but doesn't test what happens when the customer's IAM role has been revoked between alert and remediation click. This is a real-world race condition.
4. **No negative tests anywhere.** What happens when CloudTrail sends malformed JSON? What happens when the pricing table doesn't have the instance type? (Section 3.1 mentions "fallback pricing" but there's only 1 test for it.)
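As an illustration of anti-pattern 1, here is what a guard-first Z-score test could look like. The helper name and the zero-stddev fallback (return 0 and defer flat-baseline deviations to a separate novelty rule) are assumptions, not the product's documented behavior:

```typescript
// Hypothetical Z-score helper; the fallbacks are assumed policies.
function zScore(cost: number, mean: number, stddev: number): number {
  if (stddev === 0) return 0;             // flat baseline: no meaningful Z
  if (!Number.isFinite(cost)) return NaN; // surface bad input, don't alert on it
  return (cost - mean) / stddev;
}

// The edge cases the review calls out, as executable expectations:
if (zScore(50, 10, 0) !== 0) throw new Error("zero stddev must not divide by zero");
if (zScore(20, 10, 5) !== 2) throw new Error("normal case");
if (!Number.isNaN(zScore(NaN, 10, 5))) throw new Error("NaN input must propagate, not alert");
```

A single data point yields stddev 0, so the same guard covers that case; float-overflow inputs should get an analogous explicit test.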
---
## 5. Transparent Factory Gaps
Section 8 is the biggest problem. It has 3 test cases across 2 tenets. Here's what's missing:
### Atomic Flagging (1 test → needs ~10)
- Missing: flag default state (off), flag TTL enforcement, flag owner metadata, local evaluation (no network calls), CI block on expired flags, multiple concurrent flags.
- The single circuit breaker test uses ">10 alerts/hour" but Epic 10.1 specifies ">3x baseline" — inconsistency.
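One of the missing flag tests (TTL enforcement as a CI gate) could be sketched as follows. The flag record shape and field names are assumptions, not the product's actual flag schema:

```typescript
// Hypothetical flag record; field names are illustrative assumptions.
interface FeatureFlag {
  name: string;
  owner: string;
  defaultOn: boolean;  // tenet: flags must default off
  expiresAt: string;   // ISO date; tenet: flags must carry a TTL
}

// CI-style check: return the names of flags past their TTL so the
// build can fail with an actionable message.
function expiredFlags(flags: FeatureFlag[], now: Date): string[] {
  return flags
    .filter((f) => new Date(f.expiresAt).getTime() < now.getTime())
    .map((f) => f.name);
}
```

The same pattern extends to the other gaps: assert `defaultOn === false` for every flag, assert `owner` is non-empty, and assert evaluation is a pure local lookup with no network calls.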
### Elastic Schema (0 tests → needs ~8)
- Zero tests. Need: migration lint (no DROP/RENAME/TYPE), additive-only DynamoDB attribute changes, V1 code ignoring V2 attributes, sunset date enforcement, dual-write during migration.
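A minimal version of the migration lint could be a token check over migration text; the forbidden-token list below is taken from the tenet description, and the patterns are illustrative, not an actual lint config:

```typescript
// Additive-only migration lint sketch: flag destructive DDL tokens.
const FORBIDDEN: RegExp[] = [
  /\bDROP\b/i,                          // no dropping tables/attributes
  /\bRENAME\b/i,                        // renames break old readers
  /\bALTER\s+COLUMN\b[\s\S]*\bTYPE\b/i, // no in-place type changes
];

// Returns the patterns a migration violates; empty array means it passes.
function lintMigration(migrationText: string): string[] {
  return FORBIDDEN.filter((re) => re.test(migrationText)).map((re) => re.source);
}
```

For DynamoDB specifically the equivalent test is behavioral rather than textual: write an item with an unknown V2 attribute and assert the V1 reader ignores it instead of throwing.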
### Cognitive Durability (0 tests → needs ~5)
- Zero tests. Need: decision log schema validation, CI enforcement for scoring PRs, cyclomatic complexity gate, decision log presence check.
### Semantic Observability (0 tests → needs ~8)
- Zero tests. Need: OTEL span emission on every anomaly scoring decision, span attributes (cost.anomaly_score, cost.z_score, cost.baseline_days), PII protection (account ID hashing), fast-path span attributes.
### Configurable Autonomy (2 tests → needs ~8)
- The 14-day auto-promotion tests are good but incomplete. Missing: panic mode activation (<1s), panic mode stops all alerting, per-account governance override, policy decision logging, governance drift monitoring.
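The fake-clock approach recommended in Section 2 applies directly here. A sketch of the promotion/panic state machine with an injected clock; the 14-day window comes from the review, while the mode names and API are assumptions:

```typescript
type Mode = "observe" | "autonomous" | "panic";

// Hypothetical governance state machine; the injected clock makes the
// 14-day auto-promotion testable without waiting 14 real days.
class Governance {
  private mode: Mode = "observe";
  private readonly since: number;

  constructor(private readonly clock: () => number) {
    this.since = clock();
  }

  tick(): Mode {
    if (this.mode === "observe" && this.clock() - this.since >= 14 * 86_400_000) {
      this.mode = "autonomous"; // auto-promotion after 14 clean days
    }
    return this.mode;
  }

  panic(): void {
    this.mode = "panic"; // must halt all alerting, regardless of prior mode
  }

  get current(): Mode {
    return this.mode;
  }
}
```

A test then sets a fake `now`, advances it to day 13 (still `observe`), day 14 (`autonomous`), and calls `panic()` to assert the override wins; the <1s activation requirement becomes a latency assertion around that call.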
---
## 6. Performance Test Gaps
Section 6 has 2 tests. For a real-time cost monitoring product, this is inadequate:
- **Missing:** Burst ingestion (what happens when 1000 CloudTrail events arrive in 1 second during an auto-scaling event?)
- **Missing:** Baseline calculation performance with 90 days of historical data per account
- **Missing:** Anomaly scoring latency under concurrent multi-account evaluation
- **Missing:** DynamoDB hot partition detection (all events for one account hitting the same partition key)
- **Missing:** SQS FIFO throughput limits (300 msg/s per message group — what happens when a large account exceeds this?)
- **Missing:** Lambda cold start impact on end-to-end latency
---
## 7. Missing Test Scenarios
### Security
- **CloudTrail event forgery:** What if someone sends fake CloudTrail events to the EventBridge bus? HMAC/signature validation?
- **Slack interactive payload signature:** Slack sends a signing secret with interactive payloads. No test validates this.
- **Cross-account IAM role revocation:** Customer revokes the dd0c role between alert and remediation click.
- **Remediation authorization:** Who can click "Terminate"? No RBAC tests for remediation actions.
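For the Slack payload point: Slack's documented v0 scheme signs `v0:{timestamp}:{rawBody}` with the app's signing secret (HMAC-SHA256, hex, `v0=` prefix) and sends it in the `X-Slack-Signature` header. A testable verifier could look like this; the secret and header values in any test are made up, and `nowSeconds` is injected so the replay window is testable:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number
): boolean {
  // Reject replays: Slack recommends discarding requests older than 5 minutes.
  if (Math.abs(nowSeconds - Number(timestamp)) > 300) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  // Constant-time comparison to avoid leaking the signature via timing.
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The test suite should cover at least: a valid signature passes, a tampered body fails, and a stale timestamp fails even with a valid signature.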
### Data Integrity
- **CloudTrail event deduplication:** CloudTrail can send duplicate events. SQS FIFO dedup is mentioned in Epic 1.2 but has zero tests.
- **Baseline corruption recovery:** What if a DynamoDB write partially fails and corrupts the running mean/stddev? No recovery tests.
- **Pricing table staleness:** Static pricing tables will become stale. No test validates that the system handles unknown instance types gracefully beyond the single "fallback pricing" test.
- **Cost calculation precision:** Floating-point arithmetic on money. No tests for rounding behavior or currency precision.
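On precision: one common remedy is to keep money in integer micro-USD and pin the rounding rule down with tests. The unit choice and helper below are assumptions for illustration, not the product's actual convention:

```typescript
// Hypothetical proration helper: hourly price held as an integer count
// of micro-USD (1e-6 USD), rounded half-up per second of runtime.
function prorateMicroUsd(microUsdPerHour: number, seconds: number): number {
  return Math.round((microUsdPerHour * seconds) / 3600);
}
```

Tests should pin exact results for clean divisions and make the lossy cases explicit, e.g. that sub-micro-USD amounts round to zero, so the rounding behavior is a documented decision rather than an accident of `number` arithmetic.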
### Operational
- **DLQ overflow:** What happens when the DLQ fills up? No backpressure tests.
- **Multi-tenant isolation:** No tests ensuring one tenant's anomalies don't leak to another tenant's Slack channel.
- **Account onboarding race condition:** What if CloudTrail events arrive before the account is fully onboarded?
---
## 8. Top 5 Recommendations (Prioritized)
1. **Expand to cover all 10 epics.** 5 epics have zero tests. At minimum, add unit test stubs for Zombie Hunter (Epic 3), Onboarding (Epic 5), and Dashboard API (Epic 6). These are customer-facing features.
2. **Rewrite Section 8 (Transparent Factory) from scratch.** 3 tests across 2 tenets is unacceptable. Every tenet needs 5-10 tests. The Elastic Schema and Semantic Observability sections are completely empty.
3. **Add property-based testing for the anomaly math.** The Welford algorithm, Z-score calculation, and composite scoring are numerical — they need `fast-check` or equivalent, not just example-based tests. Test edge cases: zero stddev, single data point, NaN, Infinity, negative costs.
4. **Add security tests.** Slack payload signature validation, CloudTrail event authenticity, cross-account IAM revocation handling, remediation RBAC. This product executes `StopInstances` and `DeleteDBInstance` — security testing is non-negotiable.
5. **Expand performance section to 10+ tests.** Burst ingestion, baseline calculation at scale, DynamoDB hot partitions, SQS FIFO throughput limits, Lambda cold starts. The current 2 tests give zero confidence in production readiness.
---
*This document needs a complete rewrite before it can guide TDD implementation. The scoring engine tests are a good start, but everything else is a placeholder.*