diff --git a/products/04-lightweight-idp/test-architecture/bmad-review.md b/products/04-lightweight-idp/test-architecture/bmad-review.md
new file mode 100644
index 0000000..58d4fe2
--- /dev/null
+++ b/products/04-lightweight-idp/test-architecture/bmad-review.md
@@ -0,0 +1,64 @@
+# dd0c/portal — BMad Code Review
+
+**Reviewer:** BMad Code Review Agent (Manual)
+**Date:** March 1, 2026
+**Verdict:** Weakest test architecture of the 6 products. Discovery reliability is the existential risk, and it's under-tested.
+
+---
+
+## Severity-Rated Findings
+
+### 🔴 Critical
+
+1. **VCR cassettes are a maintenance trap.** The test architecture recommends VCR cassettes over moto for "real AWS API shapes." This is correct in theory, but VCR cassettes become stale when AWS changes API responses (which happens frequently for DescribeServices, ListFunctions, etc.). There's no test or CI step that validates cassette freshness. Recommend: a weekly CI job that re-records cassettes against a real AWS account and diffs them.
+
+2. **No test for discovery scan timeout/partial failure recovery.** If the AWS scanner times out after discovering 500 of 1000 resources, what happens to the catalog? The test architecture mentions "partial discovery failure resilience" but has no concrete test. This is the #1 operational risk — a partial scan could mark 500 services as "stale" and delete them from the catalog.
+
+3. **Ownership conflict resolution has no integration test.** The unit test checks explicit > implicit > heuristic priority, but there's no integration test proving this works when two discovery sources (AWS tags + GitHub CODEOWNERS) claim the same service simultaneously. Race condition risk.
+
+### 🟡 Important
+
+4. **Meilisearch index rebuild "zero-downtime" test is vague.** The test says "verify zero-downtime index swapping during mapping updates" but doesn't specify the mechanism. Meilisearch performs this via the `swap-indexes` API — this needs an explicit test proving search queries return results during the swap window.
+
+5. **Step Functions → cron (self-hosted) loses orchestration guarantees.** Step Functions provides retry with backoff, parallel execution, error handling, and state persistence. A simple cron scheduler has none of these. The self-hosted discovery pipeline needs its own error-handling tests, which don't exist in the current architecture.
+
+6. **GitHub GraphQL rate limit test doesn't cover secondary rate limits.** GitHub has both primary (5000 points/hr) and secondary (100 concurrent requests) rate limits. The test only covers primary. Secondary limits cause 403s that look different from primary 429s.
+
+7. **No test for pgvector search quality.** The architecture uses pgvector for semantic search alongside Meilisearch for full-text. But there's no test proving pgvector returns relevant results for service discovery queries. Embedding quality is not guaranteed.
+
+8. **WebSocket progress streaming has no reconnection test.** If the WebSocket drops during onboarding discovery, does the client reconnect and resume? Or does it show stale progress?
+
+### 🟢 Nice-to-Have
+
+9. **No accessibility tests for the Cmd+K search modal.** Keyboard navigation, screen reader support, focus management.
+
+10. **No test for service catalog data export.** Users will want to export their catalog to CSV/JSON for compliance.
+
+11. **PagerDuty/OpsGenie sync tests don't cover schedule rotation edge cases.** What happens when an on-call rotation changes mid-sync?
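To make finding 6 concrete: before choosing a backoff strategy, the client has to tell the two limit types apart. A minimal TypeScript sketch, assuming the response shapes described in the finding (429 with `x-ratelimit-remaining: 0` for primary, 403 with a `retry-after` header for secondary); the helper name and shapes are illustrative, not the repo's actual client code:

```typescript
// Hypothetical classifier distinguishing GitHub's primary rate limit
// from its secondary (abuse-detection) limit, per finding 6's framing.
type RateLimitKind = "primary" | "secondary" | "none";

function classifyRateLimit(
  status: number,
  headers: Record<string, string>
): RateLimitKind {
  // Primary: quota exhausted, signaled via the x-ratelimit-* headers.
  if (status === 429 && headers["x-ratelimit-remaining"] === "0") {
    return "primary";
  }
  // Secondary: concurrency/abuse limit, signaled as a 403 with retry-after.
  if (status === 403 && headers["retry-after"] !== undefined) {
    return "secondary";
  }
  return "none";
}
```

A retry wrapper could then branch on the result: wait for the quota window to reset on `primary`, and honor the `retry-after` delay on `secondary`, with a test asserting each branch fires on the right response shape.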
+
+---
+
+## V1 Cut List (Skip for MVP)
+
+- Analytics dashboards (Epic 6) — launch with the basic catalog, add analytics later
+- PagerDuty/OpsGenie integration — manual on-call mapping is fine for V1
+- pgvector semantic search — Meilisearch full-text is sufficient for V1
+- Performance benchmarks (10K resources) — premature
+- Self-hosted dual-mode tests — cloud-first
+- WebSocket progress streaming — a polling fallback is simpler for V1
+
+## Must-Have Before Launch
+
+- Discovery scan timeout/partial failure recovery test
+- Ownership conflict resolution integration test
+- Meilisearch index rebuild zero-downtime test
+- GitHub rate limit handling (both primary and secondary)
+- VCR cassette freshness validation in CI
+- Free tier enforcement (50 services)
+- Cmd+K search latency test (<10 ms from Redis cache)
+- Tenant isolation (cross-tenant service visibility)
+- OAuth signup flow (Cognito → tenant creation)
+
+---
+
+*Overall: The "5-Minute Miracle" onboarding is the product's moat, but the discovery pipeline — the thing that makes it work — has the thinnest test coverage. Fix discovery reliability before anything else.*
diff --git a/products/05-aws-cost-anomaly/test-architecture/bmad-review.md b/products/05-aws-cost-anomaly/test-architecture/bmad-review.md
new file mode 100644
index 0000000..aaf6787
--- /dev/null
+++ b/products/05-aws-cost-anomaly/test-architecture/bmad-review.md
@@ -0,0 +1,63 @@
+# dd0c/cost — BMad Code Review
+
+**Reviewer:** BMad Code Review Agent (Manual)
+**Date:** March 1, 2026
+**Verdict:** Test architecture is solid for the math core but has gaps in the operational and self-hosted paths.
+
+---
+
+## Severity-Rated Findings
+
+### 🔴 Critical
+
+1. **No test for concurrent baseline updates from multiple Lambda invocations.** The Welford algorithm updates mean/stddev incrementally. If two Lambda invocations process events for the same resource type simultaneously, the DynamoDB TransactWriteItem will handle conflicts — but there's no test proving the baseline converges correctly after a conflict retry. This is a data corruption risk.
+
+2. **Remediation authorization is under-specified.** The test suite validates the Slack signature and cross-account STS, but doesn't test WHO can click "Stop Instance." Any user in the Slack workspace can click the button. There's no RBAC test ensuring only authorized users (e.g., account owners) can trigger destructive remediation.
+
+3. **No test for pricing table version mismatch.** The static pricing table is bundled with the Lambda. If AWS changes pricing and the table is stale, cost calculations will be wrong. Beyond the single "fallback pricing" test, there's no test for (a) detecting staleness or (b) graceful degradation when pricing is unknown.
+
+### 🟡 Important
+
+4. **Property-based tests need an explicit shrinking strategy.** The fast-check tests define properties but don't specify `numRuns` or shrinking behavior. The default is 100 runs — for financial math, recommend 10,000 runs minimum with an explicit seed for reproducibility.
+
+5. **14-day auto-promotion is time-dependent — flakiness risk.** The governance tests use "simulate 14 days" but don't specify how time is injected. If they use the real `Date.now()`, these tests will be flaky. They need a Clock interface (the same pattern as dd0c/alert's FakeClock).
+
+6. **Zombie Hunter has no test for CloudWatch API throttling.** The daily scan calls DescribeInstances + GetMetricStatistics across potentially hundreds of instances. CloudWatch throttles at 400 TPS. No test for backoff/retry behavior.
+
+7. **DynamoDB → PostgreSQL JSONB migration (self-hosted) has no behavioral parity tests.** DynamoDB conditional writes behave differently from PostgreSQL UPSERT with conflict handling. The TransactWriteItem tests need a PostgreSQL equivalent that proves identical behavior.
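The Clock seam that finding 5 calls for might look like the following sketch. The interface, `FakeClock`, and the promotion predicate are illustrative assumptions modeled on the pattern the finding references, not actual repo code:

```typescript
// Illustrative Clock seam for the 14-day auto-promotion tests.
interface Clock {
  now(): number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;
const PROMOTION_WINDOW_MS = 14 * DAY_MS; // the 14-day governance window

// Production code depends on Clock, never on Date.now() directly.
function isAutoPromoted(createdAtMs: number, clock: Clock): boolean {
  return clock.now() - createdAtMs >= PROMOTION_WINDOW_MS;
}

// Test double: time only moves when the test says so.
class FakeClock implements Clock {
  constructor(private t: number = 0) {}
  now(): number {
    return this.t;
  }
  advanceDays(days: number): void {
    this.t += days * DAY_MS;
  }
}
```

A test can then advance the fake 13 days (expecting no promotion) and one more (expecting promotion) deterministically, instead of sleeping or monkey-patching `Date.now()`.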
+
+8. **No test for EventBridge rule lag.** CloudTrail events can arrive 5-15 minutes after the API call. The test suite assumes near-real-time delivery. Need a test proving the system handles delayed events correctly (e.g., doesn't double-count if a delayed event arrives after a scan).
+
+### 🟢 Nice-to-Have
+
+9. **Zombie Hunter could use property-based testing for cost estimation.** The cumulative waste calculation (days × hourly rate) is simple math but could have precision issues over long periods.
+
+10. **No chaos test for Redis failure during the panic mode check.** If Redis is down when panic mode is checked, does the system default to "panic active" (safe) or "panic inactive" (dangerous)?
+
+11. **Dashboard API pagination tests don't cover DynamoDB LastEvaluatedKey edge cases.** When the last page has exactly `limit` items, DynamoDB still returns a LastEvaluatedKey. The cursor implementation must handle this.
+
+---
+
+## V1 Cut List (Skip for MVP)
+
+- Multi-account management (Epic 9) — launch with a single account, add multi-account later
+- Dashboard UI Playwright tests (Epic 7) — manual testing is sufficient for V1
+- Zombie Hunter (Epic 3) — nice-to-have, not the core value prop
+- Performance/load tests — premature until you have real traffic
+- Self-hosted dual-mode tests — cloud-first, self-hosted later
+
+## Must-Have Before Launch
+
+- All property-based tests for Z-score/Welford/composite scoring (with 10K runs)
+- Clock interface for time-dependent governance tests
+- Concurrent baseline update conflict test
+- Remediation RBAC test (who can click "Stop Instance")
+- Slack signature validation (timing-safe)
+- Cross-account STS with IAM revocation handling
+- Circuit breaker trip/reset cycle
+- Free tier enforcement (1-account limit)
+- Panic mode activation/deactivation round-trip
+
+---
+
+*Overall: The math core is well-tested. The operational edges (concurrency, authorization, pricing staleness) need work before launch.*
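As a closing note on finding 10: the safe default becomes trivially testable once the fail-open/fail-closed choice is explicit in code. A sketch, where the `PanicStore` client shape and the `panic_mode` key are hypothetical names for illustration, not the product's actual implementation:

```typescript
// Sketch for finding 10: fail safe when Redis is unreachable.
interface PanicStore {
  get(key: string): Promise<string | null>;
}

async function isPanicActive(store: PanicStore): Promise<boolean> {
  try {
    return (await store.get("panic_mode")) === "1";
  } catch {
    // Defaulting to "panic active" blocks remediation when state is
    // unknown -- the safe failure mode for destructive actions.
    return true;
  }
}
```

The chaos test then only needs a stub store that throws, asserting `isPanicActive` resolves to `true` rather than letting remediation proceed.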