Add BMad review epic addendums for all 6 products

Per-product surgical additions to existing epics (not cross-cutting):
- P1 route: 8pts (key redaction, SSE billing, token math, CI runner)
- P2 drift: 12pts (mTLS revocation, state lock recovery, pgmq visibility, RLS leak, entropy scrubber)
- P3 alert: 10pts (HMAC replay, claim-check, out-of-order correlation, free tier, tenant isolation)
- P4 portal: 9pts (partial scan recovery, ownership conflicts, Meilisearch rebuild, VCR freshness, free tier)
- P5 cost: 7pts (concurrent baselines, remediation RBAC, Clock interface, property tests, Redis fallback)
- P6 run: 15pts (shell AST parsing, canary suite, intervention TTL, streaming audit, crypto signatures)

Total: 61 story points across 30 new stories
products/03-alert-intelligence/epics/epic-addendum-bmad.md
@@ -0,0 +1,76 @@
# dd0c/alert — Epic Addendum (BMad Review Findings)

**Source:** BMad Code Review (March 1, 2026)
**Approach:** Surgical additions to existing epics; no new epics created.

---

## Epic 1 Addendum: Webhook Ingestion

### Story 1.6: HMAC Timestamp Freshness (Replay Prevention)
As a security-conscious operator, I want webhook payloads older than 5 minutes to be rejected, so that captured webhooks cannot be replayed to flood my ingestion pipeline.

**Acceptance Criteria:**
- Datadog: Rejects requests whose `dd-webhook-timestamp` is more than 300 seconds old.
- PagerDuty: Rejects payloads with a missing timestamp header.
- OpsGenie: Extracts the timestamp from the payload body and validates its freshness.
- Fresh webhooks (within the 5-minute window) are accepted normally.

**Estimate:** 2 points
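The freshness rule in this story could be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the `"<timestamp>.<body>"` signing layout and the names `sign`, `verify_webhook`, and `WebhookRejected` are hypothetical, and each provider defines its own signing scheme and header names. The sketch includes the timestamp in the signed message, since otherwise a replayed request could simply carry a fresh timestamp.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 300  # the 5-minute window from the story


class WebhookRejected(Exception):
    """Raised when a webhook fails the freshness or signature check."""


def sign(secret: bytes, timestamp: str, body: bytes) -> str:
    # Hypothetical scheme: HMAC-SHA256 over "<timestamp>.<body>".
    message = timestamp.encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret, timestamp, signature, body, now=None):
    """Return None if the webhook is fresh and authentic, else raise."""
    if timestamp is None:
        raise WebhookRejected("missing timestamp")
    try:
        sent_at = int(timestamp)
    except ValueError:
        raise WebhookRejected("malformed timestamp")
    now = time.time() if now is None else now
    # Reject anything older than the window before doing any further work.
    if now - sent_at > MAX_AGE_SECONDS:
        raise WebhookRejected("timestamp older than 300 seconds")
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature):
        raise WebhookRejected("signature mismatch")
```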

### Story 1.7: SQS 256KB Claim-Check Round-Trip
As a reliable ingestion pipeline, I want large alert payloads (>256KB) to round-trip through S3 claim-check without data loss, so that high-cardinality incidents are fully preserved.

**Acceptance Criteria:**
- Payloads larger than 256KB are compressed and stored in S3; the SQS message contains only the S3 pointer.
- Correlation engine fetches the object from S3 and processes the full payload.
- An S3 fetch that exceeds the 10s timeout sends the message to the DLQ without crashing the engine.
- Engine health check returns 200 after S3 timeout recovery.

**Estimate:** 3 points
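The round trip described in this story might look roughly like the sketch below. It is only an illustration: `InMemoryBlobStore` is a hypothetical stand-in for S3 so the code runs without AWS, the envelope keys `inline` and `claim_check` are invented for the example, and the dead-letter queue is modelled as a plain list. The size check is applied to the full envelope rather than the raw payload so that the wrapper itself cannot push a message over the limit.

```python
import json
import uuid
import zlib

SQS_LIMIT_BYTES = 256 * 1024  # threshold from the story


class InMemoryBlobStore:
    """Hypothetical stand-in for S3."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key, timeout_s):
        # A real client would raise TimeoutError if the fetch exceeds timeout_s.
        return self._blobs[key]


def to_message(payload, store):
    """Build the queue message, offloading oversized payloads to the store."""
    inline = json.dumps({"inline": payload})
    if len(inline.encode()) <= SQS_LIMIT_BYTES:
        return inline
    key = f"claims/{uuid.uuid4()}"
    store.put(key, zlib.compress(json.dumps(payload).encode()))
    return json.dumps({"claim_check": key})


def from_message(message, store, dead_letters, timeout_s=10.0):
    """Return the full payload, or None after dead-lettering on timeout."""
    envelope = json.loads(message)
    if "inline" in envelope:
        return envelope["inline"]
    try:
        blob = store.get(envelope["claim_check"], timeout_s)
    except TimeoutError:
        # Park the message instead of letting the exception kill the engine.
        dead_letters.append(message)
        return None
    return json.loads(zlib.decompress(blob))
```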

---

## Epic 2 Addendum: Correlation Engine

### Story 2.6: Out-of-Order Alert Delivery
As a reliable correlation engine, I want late-arriving alerts to attach to existing incidents (not create duplicates), so that distributed monitoring delays don't fragment the incident timeline.

**Acceptance Criteria:**
- An alert arriving after window close but within 2x the window attaches to the existing incident.
- An alert arriving after 3x the window creates a new incident.
- Attached alerts update the incident timeline with their correct original timestamp.

**Estimate:** 2 points
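One way to read these criteria is sketched below. Several details are assumptions made for the example rather than anything the story states: the 300-second `WINDOW_S`, the idea of one open incident per correlation key, measuring lateness from the moment the incident opened, and the names `Incident` and `route_alert`. The criteria leave arrivals between 2x and 3x the window unspecified; this sketch treats anything beyond 2x as a new incident, which also satisfies the 3x rule. The timeline is kept sorted by each alert's original event time, not its arrival time.

```python
import bisect
from dataclasses import dataclass, field

WINDOW_S = 300.0  # hypothetical correlation window; the story does not give one


@dataclass
class Incident:
    opened_at: float
    # Sorted list of (event_time, alert_id) pairs.
    timeline: list = field(default_factory=list)


def route_alert(open_incidents, key, alert_id, event_time, arrival_time,
                window_s=WINDOW_S):
    """Attach the alert to the incident for `key`, or open a new one."""
    incident = open_incidents.get(key)
    too_late = (
        incident is not None
        and arrival_time - incident.opened_at > 2 * window_s
    )
    if incident is None or too_late:
        incident = Incident(opened_at=arrival_time)
        open_incidents[key] = incident
    # insort keeps the timeline ordered by the original event timestamp.
    bisect.insort(incident.timeline, (event_time, alert_id))
    return incident
```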

---

## Epic 5 Addendum: Slack Bot

### Story 5.6: Free Tier Enforcement (10K alerts/month)
As a PLG product, I want free tier tenants limited to 10K alerts/month with 7-day retention, so that the free tier is sustainable and upgrades are incentivized.

**Acceptance Criteria:**
- Alert at count 9,999 is accepted; alert at count 10,001 returns 429 with a Stripe upgrade URL.
- Counter resets on the first of each month.
- Data older than 7 days is purged for the free tier; the pro tier has 90-day retention.

**Estimate:** 2 points
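The quota and retention rules could be prototyped along these lines. The sketch makes assumptions the story does not settle: alert number 10,000 is accepted (the criteria only pin down 9,999 and 10,001), the month boundary is evaluated in UTC, accepted alerts return 202, and `UPGRADE_URL` is a placeholder rather than a real Stripe link. `FreeTierCounter` and `is_expired` are hypothetical names, and the counter lives in memory only.

```python
from datetime import datetime, timedelta, timezone

FREE_TIER_LIMIT = 10_000
UPGRADE_URL = "https://example.com/upgrade"  # placeholder, not a real Stripe URL
RETENTION = {"free": timedelta(days=7), "pro": timedelta(days=90)}


class FreeTierCounter:
    """In-memory monthly alert counter for free tier tenants."""

    def __init__(self):
        self._counts = {}  # (tenant_id, "YYYY-MM") -> alerts accepted

    def ingest(self, tenant_id, now):
        """Return (status, body) for one incoming alert at aware time `now`."""
        # Keying on the UTC month makes the counter reset on the first implicitly.
        period = now.astimezone(timezone.utc).strftime("%Y-%m")
        key = (tenant_id, period)
        count = self._counts.get(key, 0)
        if count >= FREE_TIER_LIMIT:
            return 429, {"upgrade_url": UPGRADE_URL}
        self._counts[key] = count + 1
        return 202, {}


def is_expired(plan, stored_at, now):
    """True when a record is past the retention period for its plan."""
    return now - stored_at > RETENTION[plan]
```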

---

## Epic 6 Addendum: Dashboard API

### Story 6.7: Cross-Tenant Negative Isolation Tests
As a multi-tenant SaaS, I want explicit negative tests proving Tenant A cannot read Tenant B's data, so that confused deputy vulnerabilities are caught before launch.

**Acceptance Criteria:**
- Tenant A query returns zero Tenant B incidents (explicit assertion, not just "works for A").
- Cross-tenant incident access returns 404 rather than 403, so the response does not leak the incident's existence.
- Tenant A analytics reflect only Tenant A's alert count.

**Estimate:** 1 point
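The shape of these negative tests is sketched below against a hypothetical in-memory `DashboardApi`; the class, its method names, and the fixture data are all invented for illustration, and real tests would run against the actual API and database. Each test asserts what Tenant A must not see, instead of only checking that Tenant A's own data comes back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    tenant_id: str
    incident_id: str
    alert_count: int


class DashboardApi:
    """Hypothetical in-memory stand-in for the Dashboard API."""

    def __init__(self, incidents):
        self._incidents = list(incidents)

    def list_incidents(self, caller):
        return [i for i in self._incidents if i.tenant_id == caller]

    def get_incident(self, caller, incident_id):
        for incident in self._incidents:
            if incident.incident_id == incident_id and incident.tenant_id == caller:
                return 200, incident
        # Same response whether the incident is missing or belongs to someone else.
        return 404, None

    def alert_count(self, caller):
        return sum(i.alert_count for i in self.list_incidents(caller))


def make_api():
    return DashboardApi([
        Incident("tenant-a", "a-1", alert_count=3),
        Incident("tenant-b", "b-1", alert_count=40),
    ])


def test_list_returns_zero_foreign_incidents():
    ids = {i.incident_id for i in make_api().list_incidents("tenant-a")}
    assert "b-1" not in ids
    assert ids == {"a-1"}


def test_cross_tenant_fetch_is_404_not_403():
    status, body = make_api().get_incident("tenant-a", "b-1")
    assert status == 404
    assert body is None


def test_analytics_count_only_own_alerts():
    assert make_api().alert_count("tenant-a") == 3
```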

---

**Total Addendum:** 10 points across 5 stories