BMad code reviews for P4 (portal) and P5 (cost) — manual

P4: Discovery reliability flagged as existential risk, VCR cassette staleness, ownership conflict race condition, Step Functions→cron gap P5: Concurrent baseline update risk, remediation RBAC gap, pricing staleness, property-based tests need 10K runs, Clock interface needed for governance
2026-03-01 02:06:06 +00:00
parent b7cce013ed
commit 9cc5aeaa03
2 changed files with 127 additions and 0 deletions
--- a/products/05-aws-cost-anomaly/test-architecture/bmad-review.md
+++ b/products/05-aws-cost-anomaly/test-architecture/bmad-review.md
@@ -0,0 +1,63 @@
+# dd0c/cost — BMad Code Review
+
+**Reviewer:** BMad Code Review Agent (Manual)
+**Date:** March 1, 2026
+**Verdict:** Test architecture is solid for the math core but has gaps in operational and self-hosted paths.
+
+---
+
+## Severity-Rated Findings
+
+### 🔴 Critical
+
+1. **No test for concurrent baseline updates from multiple Lambda invocations.** The Welford algorithm updates mean/stddev incrementally. If two Lambda invocations process events for the same resource type simultaneously, the DynamoDB TransactWriteItem will handle conflicts — but there's no test proving the baseline converges correctly after a conflict retry. This is a data corruption risk.
+
+2. **Remediation authorization is under-specified.** The test suite validates Slack signature and cross-account STS, but doesn't test WHO can click "Stop Instance." Any user in the Slack workspace can click the button. There's no RBAC test ensuring only authorized users (e.g., account owners) can trigger destructive remediation.
+
+3. **No test for pricing table version mismatch.** The static pricing table is bundled with the Lambda. If AWS changes pricing and the table is stale, cost calculations will be wrong. There's no test for: (a) detecting staleness, (b) graceful degradation when pricing is unknown, beyond the single "fallback pricing" test.
+
+### 🟡 Important
+
+4. **Property-based tests need explicit shrinking strategy.** The fast-check tests define properties but don't specify `numRuns` or shrinking behavior. Default is 100 runs — for financial math, recommend 10,000 runs minimum with explicit seed for reproducibility.
+
+5. **14-day auto-promotion is time-dependent — flakiness risk.** The governance tests use "simulate 14 days" but don't specify how time is injected. If using real `Date.now()`, these tests will be flaky. Need a Clock interface (same pattern as dd0c/alert's FakeClock).
+
+6. **Zombie Hunter has no test for CloudWatch API throttling.** The daily scan calls DescribeInstances + GetMetricStatistics across potentially hundreds of instances. CloudWatch throttles at 400 TPS. No test for backoff/retry behavior.
+
+7. **DynamoDB→PostgreSQL JSONB migration (self-hosted) has no behavioral parity tests.** DynamoDB conditional writes behave differently from PostgreSQL UPSERT with conflict handling. The TransactWriteItem tests need a PostgreSQL equivalent that proves identical behavior.
+
+8. **No test for EventBridge rule lag.** CloudTrail events can arrive 5-15 minutes after the API call. The test suite assumes near-real-time delivery. Need a test proving the system handles delayed events correctly (e.g., doesn't double-count if a delayed event arrives after a scan).
+
+### 🟢 Nice-to-Have
+
+9. **Zombie Hunter could use property-based testing for cost estimation.** The cumulative waste calculation (days × hourly rate) is simple math but could have precision issues over long periods.
+
+10. **No chaos test for Redis failure during panic mode check.** If Redis is down when panic mode is checked, does the system default to "panic active" (safe) or "panic inactive" (dangerous)?
+
+11. **Dashboard API pagination tests don't cover DynamoDB LastEvaluatedKey edge cases.** When the last page has exactly `limit` items, DynamoDB still returns a LastEvaluatedKey. The cursor implementation must handle this.
+
+---
+
+## V1 Cut List (Skip for MVP)
+
+- Multi-account management (Epic 9) — launch with single account, add multi later
+- Dashboard UI Playwright tests (Epic 7) — manual testing sufficient for V1
+- Zombie Hunter (Epic 3) — nice-to-have, not core value prop
+- Performance/load tests — premature until you have real traffic
+- Self-hosted dual-mode tests — cloud-first, self-hosted later
+
+## Must-Have Before Launch
+
+- All property-based tests for Z-score/Welford/composite scoring (with 10K runs)
+- Clock interface for time-dependent governance tests
+- Concurrent baseline update conflict test
+- Remediation RBAC test (who can click "Stop Instance")
+- Slack signature validation (timing-safe)
+- Cross-account STS with IAM revocation handling
+- Circuit breaker trip/reset cycle
+- Free tier enforcement (1 account limit)
+- Panic mode activation/deactivation round-trip
+
+---
+
+*Overall: The math core is well-tested. The operational edges (concurrency, authorization, pricing staleness) need work before launch.*