BMad code reviews for P4 (portal) and P5 (cost) — manual
P4: Discovery reliability flagged as existential risk, VCR cassette staleness,
ownership conflict race condition, Step Functions→cron gap
P5: Concurrent baseline update risk, remediation RBAC gap, pricing staleness,
property-based tests need 10K runs, Clock interface needed for governance
products/04-lightweight-idp/test-architecture/bmad-review.md

# dd0c/portal — BMad Code Review

**Reviewer:** BMad Code Review Agent (Manual)

**Date:** March 1, 2026

**Verdict:** Weakest test architecture of the 6 products. Discovery reliability is the existential risk and it's under-tested.

---

## Severity-Rated Findings

### 🔴 Critical

1. **VCR cassettes are a maintenance trap.** The test architecture recommends VCR cassettes over moto for "real AWS API shapes." This is correct in theory, but VCR cassettes become stale when AWS changes API responses (which happens frequently for DescribeServices, ListFunctions, etc.). There's no test or CI step that validates cassette freshness. Recommend: a weekly CI job that re-records cassettes against a real AWS account and diffs them.
2. **No test for discovery scan timeout/partial failure recovery.** If the AWS scanner times out after discovering 500 of 1000 resources, what happens to the catalog? The test architecture mentions "partial discovery failure resilience" but has no concrete test. This is the #1 operational risk — a partial scan could mark 500 services as "stale" and delete them from the catalog.
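
The invariant the missing test should pin down can be sketched as follows. The names (`ScanResult`, `apply_scan`) are hypothetical, not the portal's real API; the point is that an incomplete scan may add services but must never prune them, while only a complete scan makes the catalog mirror what was discovered.

```python
# Sketch of the partial-failure guard this finding asks to test.
# All names are illustrative stand-ins for the real discovery pipeline.
from dataclasses import dataclass


@dataclass
class ScanResult:
    discovered: set[str]  # resource ARNs seen during this scan
    complete: bool        # False on timeout or per-region failure


def apply_scan(catalog: set[str], scan: ScanResult) -> set[str]:
    if scan.complete:
        return set(scan.discovered)   # full scan: stale entries drop out
    return catalog | scan.discovered  # partial scan: add, never delete
```

The concrete test then simulates a scanner that times out at 500 of 1000 resources and asserts the other 500 catalog entries survive untouched.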
3. **Ownership conflict resolution has no integration test.** The unit test checks explicit > implicit > heuristic priority, but there's no integration test proving this works when two discovery sources (AWS tags + GitHub CODEOWNERS) claim the same service simultaneously. Race condition risk.
### 🟡 Important
4. **Meilisearch index rebuild "zero-downtime" test is vague.** The test says "verify zero-downtime index swapping during mapping updates" but doesn't specify the mechanism. Meilisearch swaps indexes atomically via its `swap-indexes` API — this needs an explicit test proving search queries return results during the swap window.
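
The rebuild pattern the test should drive looks roughly like this. The `POST /swap-indexes` endpoint and its payload follow Meilisearch's documented v1 API; the `client` object is a stand-in for any wrapper exposing these three calls, not a real SDK.

```python
# Zero-downtime rebuild sketch: build a staging index next to the live one,
# swap atomically, then drop the staging index (which now holds old data).
def zero_downtime_rebuild(client, docs, live="services", staging="services_new"):
    client.add_documents(staging, docs)                  # 1. build alongside live
    client.swap_indexes([{"indexes": [live, staging]}])  # 2. atomic swap
    client.delete_index(staging)                         # 3. discard old data
```

The assertion the review actually wants is that queries issued against `services` during step 2 keep returning results; the swap is atomic server-side, so the test should hammer the live index with searches while the swap runs.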
5. **Step Functions → cron (self-hosted) loses orchestration guarantees.** Step Functions provides: retry with backoff, parallel execution, error handling, state persistence. A simple cron scheduler has none of these. The self-hosted discovery pipeline needs its own error handling tests that don't exist in the current architecture.
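
A minimal sketch of what the cron path must reimplement from a Step Functions `Retry` policy (its `IntervalSeconds`/`BackoffRate` fields); the `sleep` parameter is injected so the error-handling tests this finding calls for can run instantly.

```python
# Exponential-backoff retry wrapper, a sketch of the orchestration guarantee
# lost in the Step Functions -> cron move. Parameters are illustrative.
import time


def with_retries(fn, attempts=4, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # exhausted: surface the error like a failed execution
            sleep(base_delay * backoff**i)
```

State persistence and parallel fan-out still need their own substitutes; this only covers the retry guarantee.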
6. **GitHub GraphQL rate limit test doesn't cover secondary rate limits.** GitHub has both primary (5000 points/hr) and secondary (100 concurrent requests) rate limits. The test only covers primary. Secondary limits cause 403s that look different from primary 429s.
7. **No test for pgvector search quality.** The architecture uses pgvector for semantic search alongside Meilisearch for full-text. But there's no test proving pgvector returns relevant results for service discovery queries. Embedding quality is not guaranteed.
8. **WebSocket progress streaming has no reconnection test.** If the WebSocket drops during onboarding discovery, does the client reconnect and resume? Or does it show stale progress?
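
One testable resume protocol, sketched under loud assumptions (nothing here is the portal's real API): the client remembers the last event sequence it applied and sends it on reconnect (e.g. a hypothetical `?since=<seq>` query parameter), so the server can replay from there and duplicates are dropped instead of showing stale or regressed progress.

```python
# Hypothetical reconnect-and-resume client for onboarding progress events.
class ProgressClient:
    def __init__(self):
        self.last_seq = 0
        self.progress = 0

    def on_event(self, seq: int, pct: int) -> None:
        if seq <= self.last_seq:
            return  # replayed duplicate after reconnect: ignore
        self.last_seq, self.progress = seq, pct

    def resume_cursor(self) -> int:
        return self.last_seq  # sent to the server on reconnect
```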
### 🟢 Nice-to-Have
9. **No accessibility tests for the Cmd+K search modal.** Keyboard navigation, screen reader support, focus management.
10. **No test for service catalog data export.** Users will want to export their catalog to CSV/JSON for compliance.
11. **PagerDuty/OpsGenie sync tests don't cover schedule rotation edge cases.** What happens when an on-call rotation changes mid-sync?
---
## V1 Cut List (Skip for MVP)
- Analytics dashboards (Epic 6) — launch with basic catalog, add analytics later
- PagerDuty/OpsGenie integration — manual on-call mapping is fine for V1
- pgvector semantic search — Meilisearch full-text is sufficient for V1
- Performance benchmarks (10K resources) — premature
- Self-hosted dual-mode tests — cloud-first
- WebSocket progress streaming — polling fallback is simpler for V1

## Must-Have Before Launch
- Discovery scan timeout/partial failure recovery test
- Ownership conflict resolution integration test
- Meilisearch index rebuild zero-downtime test
- GitHub rate limit handling (both primary and secondary)
- VCR cassette freshness validation in CI
- Free tier enforcement (50 services)
- Cmd+K search latency test (<10ms from Redis cache)
- Tenant isolation (cross-tenant service visibility)
- OAuth signup flow (Cognito → tenant creation)

---
*Overall: The "5-Minute Miracle" onboarding is the product's moat, but the discovery pipeline — the thing that makes it work — has the thinnest test coverage. Fix discovery reliability before anything else.*