BMad code reviews complete for all 6 products

P1 route: Gemini - 'Ship the proxy, stop writing tests for the tests'
P2 drift: Gemini - mTLS revocation, state lock corruption, RLS pool leak
P3 alert: Gemini - replay attacks, trace propagation, SQS claim-check
P4 portal: Manual - discovery reliability is existential risk
P5 cost: Manual - concurrent baselines, remediation RBAC, pricing staleness
P6 run: Gemini - policy update loophole, AST parsing, audit streaming
@@ -0,0 +1,47 @@
# dd0c/alert — BMad Code Review
**Reviewer:** BMad Code Review Agent (Gemini)
**Date:** March 1, 2026

---
## Severity-Rated Findings
### 🔴 Critical
1. **Ingestion Security: Replay Attack Vulnerability.** The HMAC tests validate signatures but never enforce timestamp freshness. Datadog and PagerDuty have timestamp headers, but OpsGenie doesn't always package one cleanly. Unless payloads older than ~5 minutes are rejected, an attacker can capture a valid webhook and replay it against the ingestion endpoint, flooding SQS queues and Redis windows.

2. **Trace Propagation: Cross-SQS CI Verification.** Checking that `traceparent` is attached to SQS MessageAttributes isn't enough. AWS SDKs often drop or mangle these attributes when crossing into ECS. CI needs to assert on the *reassembled* trace tree: verify that the ECS span registers as a child of the Lambda ingestion span, not as a disconnected root span.

3. **Multi-tenancy: Confused Deputy Vulnerabilities.** Partition key enforcement tests only check the happy path for a single tenant. Explicit negative tests are needed: insert data for Tenant A and Tenant B, query using Tenant A's JWT, and assert that none of Tenant B's data appears in the result set (a direct lookup of a Tenant B record should return `undefined`).
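
The negative test described in finding 3 could look roughly like the sketch below. It runs against an in-memory stand-in rather than the real data layer, and every name in it (`InMemoryAlertStore`, `Claims`, `crossTenantViolations`, the record fields) is hypothetical, chosen only to illustrate the shape of the assertion.

```typescript
// Hypothetical record and claim shapes; the real schema is not shown in the review.
interface AlertRecord {
  tenantId: string; // partition key
  alertId: string;
  payload: string;
}

interface Claims {
  tenantId: string; // stand-in for the tenant claim of an already verified JWT
}

class InMemoryAlertStore {
  private readonly rows: AlertRecord[] = [];

  put(record: AlertRecord): void {
    this.rows.push(record);
  }

  // The tenant filter comes only from the verified claims, never from caller input.
  list(claims: Claims): AlertRecord[] {
    return this.rows.filter((r) => r.tenantId === claims.tenantId);
  }

  get(claims: Claims, alertId: string): AlertRecord | undefined {
    return this.rows.find(
      (r) => r.tenantId === claims.tenantId && r.alertId === alertId,
    );
  }
}

// Negative test: seeds two tenants, queries as Tenant A, and returns a list of
// isolation violations. An empty list means no Tenant B data was visible.
function crossTenantViolations(store: InMemoryAlertStore): string[] {
  store.put({ tenantId: "tenant-a", alertId: "a-1", payload: "cpu high" });
  store.put({ tenantId: "tenant-b", alertId: "b-1", payload: "disk full" });

  const asTenantA: Claims = { tenantId: "tenant-a" };
  const violations: string[] = [];

  if (store.list(asTenantA).some((r) => r.tenantId !== "tenant-a")) {
    violations.push("list() returned another tenant's rows");
  }
  if (store.get(asTenantA, "b-1") !== undefined) {
    violations.push("get() resolved a Tenant B record for Tenant A");
  }
  return violations;
}
```

Against the real store, the same two assertions would presumably run with one signed JWT per tenant, so that the check covers the full path from token to partition key.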
### 🟡 Important
4. **Correlation Quality: Out-of-Order Delivery.** The tests likely simulate perfect chronological delivery, but distributed monitoring is messy. What happens if an alert from T-minus-30s arrives *after* the correlation window has closed and the incident has shipped to SQS? Does it trigger a duplicate Slack ping, or correctly attach to the existing incident timeline?

5. **SQS 256KB: Claim-Check Edge Cases.** Compression plus S3 pointers is the standard approach, but the tests must cover: (a) S3 put/get latency causing Lambda timeouts, (b) orphaned S3 payloads when no lifecycle rule cleans them up, (c) ECS Fargate failing to fetch the payload because of IAM boundary issues.

6. **Self-Hosted Mode: Behavioral Gaps.** DynamoDB scales connections and handles TTL natively. PostgreSQL requires explicit connection pooling (PgBouncer) and manual partitioning/pruning. Lambda recycles memory, while Fastify is a persistent process: a tiny memory leak in the correlation engine will crash Fastify but go unnoticed in Lambda. The test suite doesn't simulate long-running process memory limits or Postgres connection exhaustion.
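
For finding 4, one possible answer to the "duplicate ping or attach" question is sketched below: match on the alert's event time rather than its arrival time, and treat a hit on an already shipped incident as a silent timeline attach. The `Alert`, `Incident`, and `Outcome` shapes and the `correlate` function are assumptions made for illustration; the review does not show the real correlation engine.

```typescript
// Hypothetical shapes; the review does not show the real correlation engine.
interface Alert {
  id: string;
  fingerprint: string; // correlation key
  occurredAt: number; // event time in epoch ms (not arrival time)
}

interface Incident {
  id: string;
  fingerprint: string;
  windowStart: number;
  windowEnd: number;
  alertIds: string[];
  shipped: boolean; // true once the window closed and the incident went to SQS
}

type Outcome =
  | { kind: "buffered"; incidentId: string } // window still open
  | { kind: "attached_late"; incidentId: string } // timeline update, no new ping
  | { kind: "new_incident" };

function correlate(alert: Alert, incidents: Incident[]): Outcome {
  // Match on event time so a delayed alert still lands in its original window.
  const match = incidents.find(
    (i) =>
      i.fingerprint === alert.fingerprint &&
      alert.occurredAt >= i.windowStart &&
      alert.occurredAt <= i.windowEnd,
  );
  if (match === undefined) return { kind: "new_incident" };

  // Idempotent attach so a redelivered message does not duplicate the entry.
  if (!match.alertIds.includes(alert.id)) match.alertIds.push(alert.id);

  return match.shipped
    ? { kind: "attached_late", incidentId: match.id }
    : { kind: "buffered", incidentId: match.id };
}
```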
---
## V1 Cut List
- Self-hosted mode / DB abstractions — pick AWS SaaS and commit. Supporting Postgres/Fastify doubles testing surface for zero immediate revenue.
- Dashboard UI E2E (Playwright) — test the API thoroughly, visually verify the UI.
- OTEL trace propagation tests — visually verify in Jaeger once.
- DLQ replay with backpressure — manual replay is fine for V1.
- Slack circuit breaker — if Slack is down, alerts queue. Accept it.
## Must-Have Before Launch
1. **HMAC timestamp validation** — reject payloads older than 5 minutes (all 4 sources).
2. **Cross-tenant negative tests** — explicitly assert data isolation between tenants.
3. **Correlation window edge cases** — out-of-order delivery, late arrivals after window close.
4. **SQS 256KB S3 pointer round-trip** — prove the claim-check pattern works end-to-end.
5. **Free tier enforcement** — 10K alerts/month counter, 7-day retention purge.
6. **Slack signature validation** — timing-safe HMAC for interactive payloads.
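
Items 1 and 6 share one mechanism: check freshness first, then compare signatures in constant time. The sketch below shows that combination using Node's `createHmac` and `timingSafeEqual`. The signing string (`<timestamp>.<body>`), the hex encoding, and the names `sign`, `verifyWebhook`, and `MAX_SKEW_SECONDS` are assumptions for illustration; each provider defines its own header names and signing format, and those would need to be substituted per source. A replay inside the five-minute window would still pass this check, so it is a floor rather than complete replay protection.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Freshness window taken from the review's "older than 5 minutes" rule.
const MAX_SKEW_SECONDS = 5 * 60;

type VerifyResult = "ok" | "stale" | "bad_signature";

// Signs "<timestamp>.<body>" so the timestamp cannot be changed without
// invalidating the signature. This layout is an assumption, not a provider spec.
function sign(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${body}`)
    .digest("hex");
}

function verifyWebhook(
  secret: string,
  timestamp: number, // Unix seconds, as sent by the provider
  body: string,
  signatureHex: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): VerifyResult {
  // Reject anything outside the freshness window before doing any crypto.
  if (
    !Number.isFinite(timestamp) ||
    Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS
  ) {
    return "stale";
  }

  const expected = Buffer.from(sign(secret, timestamp, body), "hex");
  const received = Buffer.from(signatureHex, "hex");

  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length || !timingSafeEqual(expected, received)) {
    return "bad_signature";
  }
  return "ok";
}
```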
---
*"If you aren't testing for leakage, it will leak."*