Add BMad review epic addendums for all 6 products
Per-product surgical additions to existing epics (not cross-cutting):
- P1 route: 8pts (key redaction, SSE billing, token math, CI runner)
- P2 drift: 12pts (mTLS revocation, state lock recovery, pgmq visibility, RLS leak, entropy scrubber)
- P3 alert: 10pts (HMAC replay, claim-check, out-of-order correlation, free tier, tenant isolation)
- P4 portal: 9pts (partial scan recovery, ownership conflicts, Meilisearch rebuild, VCR freshness, free tier)
- P5 cost: 7pts (concurrent baselines, remediation RBAC, Clock interface, property tests, Redis fallback)
- P6 run: 15pts (shell AST parsing, canary suite, intervention TTL, streaming audit, crypto signatures)
Total: 61 story points across 30 new stories
products/05-aws-cost-anomaly/epics/epic-addendum-bmad.md
@@ -0,0 +1,75 @@
# dd0c/cost — Epic Addendum (BMad Review Findings)

**Source:** BMad Code Review (March 1, 2026)
**Approach:** Surgical additions to existing epics — no new epics created.

---

## Epic 2 Addendum: Anomaly Detection Engine

### Story 2.8: Concurrent Baseline Update Conflict Resolution
As a reliable anomaly detector, I want concurrent Lambda invocations updating the same baseline to converge correctly via DynamoDB conditional writes, so that Welford running stats are never corrupted.

**Acceptance Criteria:**
- Two simultaneous updates to the same baseline both succeed (one retries via ConditionalCheckFailed).
- Final baseline count reflects both observations.
- Retry reads fresh baseline before re-applying the update.

**Estimate:** 2 points
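
The retry behaviour in Story 2.8 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Baseline`, `InMemoryBaselineStore`, `putIfVersion`, and `recordObservation` are hypothetical names, and the in-memory store only imitates a DynamoDB conditional write by comparing a `version` attribute.

```typescript
// Hypothetical baseline record: Welford running stats plus a version
// attribute used for optimistic locking (all names are illustrative).
interface Baseline {
  count: number;
  mean: number;
  m2: number; // running sum of squared deviations from the mean
  version: number;
}

const EMPTY_BASELINE: Baseline = { count: 0, mean: 0, m2: 0, version: 0 };

// Pure Welford step: fold one observation into the running stats.
function welfordUpdate(b: Baseline, x: number): Baseline {
  const count = b.count + 1;
  const delta = x - b.mean;
  const mean = b.mean + delta / count;
  const m2 = b.m2 + delta * (x - mean);
  return { count, mean, m2, version: b.version + 1 };
}

// In-memory stand-in for the table (an assumption, not the AWS SDK).
class InMemoryBaselineStore {
  private items: Record<string, Baseline> = {};

  get(key: string): Baseline {
    return this.items[key] || EMPTY_BASELINE;
  }

  // Imitates a conditional write: succeeds only if the stored version is
  // still the one the caller read. Returns false where DynamoDB would
  // reject the write with a conditional check failure.
  putIfVersion(key: string, next: Baseline, expectedVersion: number): boolean {
    if (this.get(key).version !== expectedVersion) return false;
    this.items[key] = next;
    return true;
  }
}

// Applies one observation, re-reading the baseline after every conflict.
// Returns the number of attempts used. `afterRead` is a test-only hook for
// injecting a competing writer between the read and the write.
function recordObservation(
  store: InMemoryBaselineStore,
  key: string,
  x: number,
  maxAttempts: number = 5,
  afterRead?: (attempt: number) => void
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = store.get(key); // fresh read on every attempt
    if (afterRead) afterRead(attempt);
    if (store.putIfVersion(key, welfordUpdate(current, x), current.version)) {
      return attempt;
    }
  }
  throw new Error(`baseline ${key}: gave up after ${maxAttempts} conflicts`);
}
```

Because the Welford step is recomputed from the freshly read record on each attempt, a losing writer never applies its update on top of stale statistics.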

### Story 2.9: Property-Based Anomaly Scorer Validation (10K runs)
As a mathematically sound anomaly detector, I want the scorer validated with 10K property-based test runs, so that edge cases in the scoring function are caught before launch.

**Acceptance Criteria:**
- Score is always between 0 and 100 for any valid input (10K runs, seed=42).
- Score monotonically increases as cost increases (10K runs).
- Reproducible via fixed seed.

**Estimate:** 1 point
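
The properties in Story 2.9 could be checked with a small hand-rolled harness like the one below. It is only a sketch: the real scorer is not shown in this addendum, so `scoreAnomaly` is a hypothetical stand-in (a clamped linear z-score), and `seededRandom`, `generateSample`, and `findCounterexample` are illustrative helpers rather than a property-testing library. Since a clamped score saturates at 100, the monotonicity check here is "never decreases" rather than "strictly increases".

```typescript
// Small Lehmer (Park-Miller) generator so runs are reproducible from a seed.
// The seed must be an integer in [1, 2147483646].
function seededRandom(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647; // value in (0, 1)
  };
}

// Hypothetical scorer: maps a z-score linearly onto 0..100, saturating at z >= 6.
function scoreAnomaly(cost: number, mean: number, stddev: number): number {
  const z = (cost - mean) / Math.max(stddev, 1e-9);
  return Math.min(100, Math.max(0, (z * 100) / 6));
}

interface Sample {
  cost: number;
  higherCost: number; // always >= cost, used for the monotonicity property
  mean: number;
  stddev: number;
}

function generateSample(rand: () => number): Sample {
  const mean = rand() * 10000;
  const stddev = rand() * 1000;
  const cost = rand() * 50000;
  const higherCost = cost + rand() * 50000;
  return { cost, higherCost, mean, stddev };
}

// Runs `property` against `runs` generated samples and returns the first
// counterexample, or null if every sample satisfied the property.
function findCounterexample(
  runs: number,
  seed: number,
  property: (s: Sample) => boolean
): Sample | null {
  const rand = seededRandom(seed);
  for (let i = 0; i < runs; i++) {
    const sample = generateSample(rand);
    if (!property(sample)) return sample;
  }
  return null;
}

// Property 1: the score stays within [0, 100].
const scoreIsBounded = (s: Sample): boolean => {
  const score = scoreAnomaly(s.cost, s.mean, s.stddev);
  return score >= 0 && score <= 100;
};

// Property 2: a higher cost never yields a lower score.
const scoreNeverDecreases = (s: Sample): boolean =>
  scoreAnomaly(s.higherCost, s.mean, s.stddev) >=
  scoreAnomaly(s.cost, s.mean, s.stddev);
```

Calling `findCounterexample(10000, 42, scoreIsBounded)` replays the same 10K samples on every run, so any failing sample can be reproduced from the seed alone.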

---

## Epic 3 Addendum: Notification Service

### Story 3.7: Remediation RBAC (Slack Action Authorization)
As a security-conscious operator, I want only account owners to trigger destructive remediation actions (Stop Instance), so that a random Slack viewer can't shut down production.

**Acceptance Criteria:**
- Owner role can trigger "Stop Instance" (200).
- Viewer role gets 403 with "insufficient permissions".
- User from different Slack workspace gets 403.
- Non-destructive actions (snooze, mark-expected) allowed for all authenticated users.

**Estimate:** 2 points
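
One way to read the acceptance criteria for Story 3.7 is as a single authorization function. The sketch below assumes hypothetical types (`SlackUser`, `Action`, `AuthResult`) and assumes that a workspace mismatch is rejected for every action, which the criteria do not state explicitly; the "workspace mismatch" message is likewise invented for illustration.

```typescript
type Role = "owner" | "viewer";
type Action = "stop-instance" | "snooze" | "mark-expected";

// Hypothetical shape of an already-authenticated Slack user.
interface SlackUser {
  userId: string;
  workspaceId: string;
  role: Role;
}

interface AuthResult {
  status: 200 | 403;
  message: string;
}

// Assumption: "stop-instance" is the only destructive action in this sketch.
const DESTRUCTIVE_ACTIONS: Action[] = ["stop-instance"];

function authorizeAction(
  user: SlackUser,
  accountWorkspaceId: string,
  action: Action
): AuthResult {
  // Users from another workspace are rejected before any role check.
  if (user.workspaceId !== accountWorkspaceId) {
    return { status: 403, message: "workspace mismatch" };
  }
  // Destructive actions require the owner role.
  const destructive = DESTRUCTIVE_ACTIONS.indexOf(action) !== -1;
  if (destructive && user.role !== "owner") {
    return { status: 403, message: "insufficient permissions" };
  }
  // Non-destructive actions are open to any authenticated workspace member.
  return { status: 200, message: "ok" };
}
```

Checking the workspace first means a foreign user learns nothing about which roles exist on the account.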

---

## Epic 4 Addendum: Customer Onboarding

### Story 4.7: Clock Interface for Governance Tests
As a testable governance engine, I want time-dependent logic (14-day auto-promotion) to use an injectable Clock interface, so that governance tests are deterministic and don't depend on wall-clock time.

**Acceptance Criteria:**
- `FakeClock` can be injected into `GovernanceEngine`.
- Day 13: no promotion. Day 15 + low FP rate: promotion. Day 15 + high FP rate: no promotion.
- No `Date.now()` calls in governance logic — all via Clock interface.

**Estimate:** 1 point
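
The injectable clock from Story 4.7 might look something like this. `FakeClock` and `GovernanceEngine` are named in the criteria, but everything else is assumed for illustration: the `Clock` method signature, `SystemClock`, `shouldPromote`, `advanceDays`, and the 0.1 false-positive threshold are not taken from the source.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const PROMOTION_AFTER_DAYS = 14;

// All time reads in governance logic go through this interface.
interface Clock {
  now(): number; // epoch milliseconds
}

// Production clock: the only place that touches Date.now().
class SystemClock implements Clock {
  now(): number {
    return Date.now();
  }
}

// Test clock whose time only moves when the test says so.
class FakeClock implements Clock {
  private current: number;

  constructor(startMs: number = 0) {
    this.current = startMs;
  }

  now(): number {
    return this.current;
  }

  advanceDays(days: number): void {
    this.current += days * MS_PER_DAY;
  }
}

class GovernanceEngine {
  private clock: Clock;
  private maxFpRate: number;

  // maxFpRate is an assumed threshold for a "low" false-positive rate.
  constructor(clock: Clock, maxFpRate: number = 0.1) {
    this.clock = clock;
    this.maxFpRate = maxFpRate;
  }

  // Promote only after the waiting period has elapsed and the
  // false-positive rate is at or below the threshold.
  shouldPromote(enrolledAtMs: number, fpRate: number): boolean {
    const elapsedDays = (this.clock.now() - enrolledAtMs) / MS_PER_DAY;
    return elapsedDays >= PROMOTION_AFTER_DAYS && fpRate <= this.maxFpRate;
  }
}
```

In a test, `new GovernanceEngine(new FakeClock())` followed by `advanceDays(13)` and `advanceDays(2)` walks through the day 13 and day 15 cases without sleeping or mocking globals.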

---

## Epic 8 Addendum: Infrastructure & DevOps

### Story 8.7: Redis Failure Safe Default for Panic Mode
As a resilient platform, I want panic mode checks to default to "active" (safe) when Redis is unreachable, so that a Redis outage doesn't accidentally disable safety controls.

**Acceptance Criteria:**
- Redis disconnect → `checkPanicMode()` returns `true` (panic active).
- Warning logged: "Redis unreachable — defaulting to panic=active".
- Normal operation resumes when Redis reconnects.

**Estimate:** 1 point
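
The fail-safe default in Story 8.7 can be illustrated with a few lines. This sketch makes several assumptions: the `PanicFlagStore` interface, the `panic-mode` key, the `"1"` flag value, and the injected `warn` callback are all hypothetical, and the store is modelled as synchronous for brevity, whereas a real Redis client would return promises.

```typescript
// Hypothetical store abstraction; `get` throws when Redis is unreachable.
// Modelled as synchronous here purely to keep the sketch short.
interface PanicFlagStore {
  get(key: string): string | null;
}

const PANIC_KEY = "panic-mode"; // assumed key name

// Returns true when panic mode is active. Any failure to reach the store
// is treated as "panic active" so an outage cannot switch safety off.
function checkPanicMode(
  store: PanicFlagStore,
  warn: (message: string) => void
): boolean {
  try {
    return store.get(PANIC_KEY) === "1";
  } catch (err) {
    warn("Redis unreachable — defaulting to panic=active");
    return true;
  }
}
```

Nothing is cached on failure, so the next call after Redis reconnects reads the real flag again and normal operation resumes.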

---

**Total Addendum:** 7 points across 5 stories