dd0c/products/02-iac-drift-detection/epics/epic-addendum-bmad.md
Max Mayfield 72a0f26a7b Add BMad review epic addendums for all 6 products
Per-product surgical additions to existing epics (not cross-cutting):
- P1 route: 8pts (key redaction, SSE billing, token math, CI runner)
- P2 drift: 12pts (mTLS revocation, state lock recovery, pgmq visibility, RLS leak, entropy scrubber)
- P3 alert: 10pts (HMAC replay, claim-check, out-of-order correlation, free tier, tenant isolation)
- P4 portal: 9pts (partial scan recovery, ownership conflicts, Meilisearch rebuild, VCR freshness, free tier)
- P5 cost: 7pts (concurrent baselines, remediation RBAC, Clock interface, property tests, Redis fallback)
- P6 run: 15pts (shell AST parsing, canary suite, intervention TTL, streaming audit, crypto signatures)

Total: 61 story points across 30 new stories
2026-03-01 02:27:55 +00:00


dd0c/drift — Epic Addendum (BMad Review Findings)

Source: BMad Code Review (March 1, 2026)
Approach: Surgical additions to existing epics; no new epics created.


Epic 2 Addendum: Agent Communication

Story 2.7: mTLS Revocation — Instant Lockout

As a security-conscious platform operator, I want revoked agent certificates to be instantly locked out (including active connections), so that a compromised agent cannot continue sending data.

Acceptance Criteria:

  • CRL refresh triggers within 30 seconds of cert revocation.
  • Existing mTLS connections from revoked certs are terminated (not just new connections rejected).
  • New connection attempts with revoked certs return TLS handshake failure.
  • Payload replay with captured nonce returns HTTP 409 Conflict.
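
The lockout and replay criteria above can be sketched as a small in-memory model. This is an illustrative sketch only, not the agent gateway's actual code: `RevocationGate`, `Connection`, and the 202 status for accepted payloads are hypothetical names and choices, and a real deployment would enforce revocation in the TLS layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Stand-in for one live mTLS connection (hypothetical type)."""
    cert_serial: str
    open: bool = True

    def close(self) -> None:
        self.open = False


class RevocationGate:
    """Tracks revoked serials, live connections, and nonces already seen."""

    def __init__(self) -> None:
        self.revoked: set[str] = set()
        self.active: dict[str, list[Connection]] = {}
        self.seen_nonces: set[str] = set()

    def accept(self, cert_serial: str) -> Connection:
        # New connections with a revoked cert fail at "handshake" time.
        if cert_serial in self.revoked:
            raise ConnectionRefusedError("handshake rejected: certificate revoked")
        conn = Connection(cert_serial)
        self.active.setdefault(cert_serial, []).append(conn)
        return conn

    def apply_crl(self, revoked_serials: set[str]) -> int:
        """Apply a refreshed CRL and terminate live connections of revoked certs."""
        self.revoked |= set(revoked_serials)
        closed = 0
        for serial in list(self.active):
            if serial in self.revoked:
                for conn in self.active.pop(serial):
                    conn.close()
                    closed += 1
        return closed

    def handle_payload(self, conn: Connection, nonce: str) -> int:
        """Return an HTTP-style status: 409 for a replayed nonce, else 202 (assumed)."""
        if not conn.open:
            raise ConnectionError("connection was terminated")
        if nonce in self.seen_nonces:
            return 409
        self.seen_nonces.add(nonce)
        return 202
```

Calling `apply_crl` from whatever job refreshes the CRL is what closes the gap between "new connections rejected" and "existing connections terminated".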

Estimate: 3 points


Epic 3 Addendum: Drift Analysis Engine

Story 3.8: Terraform State Lock Recovery on Panic

As a customer, I want the panic button to safely release Terraform state locks, so that hitting "stop" doesn't brick my infrastructure.

Acceptance Criteria:

  • Panic mode runs terraform force-unlock with the lock ID if the normal unlock fails.
  • State lock is verified released within 10 seconds of panic signal.
  • Agent logs the force-unlock attempt for audit trail.
  • If both unlock methods fail, agent alerts the admin with the lock ID for manual recovery.
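
The fallback order in these criteria could look roughly like the sketch below. It assumes the agent already knows the lock ID and has some way to attempt a normal unlock; `release_lock_on_panic`, `normal_unlock`, `alert_admin`, the logger name, and the 10-second subprocess timeout are all hypothetical. The only real CLI detail used is `terraform force-unlock -force LOCK_ID`, where `-force` skips the confirmation prompt.

```python
import logging
import subprocess
from typing import Callable

log = logging.getLogger("drift.agent.panic")  # hypothetical logger name


def force_unlock_cmd(lock_id: str) -> list[str]:
    # -force suppresses the interactive "are you sure" prompt.
    return ["terraform", "force-unlock", "-force", lock_id]


def run_cmd(cmd: list[str], timeout_s: float = 10.0) -> bool:
    """Run a command; treat a missing binary or a timeout as failure."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout_s).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def release_lock_on_panic(
    lock_id: str,
    normal_unlock: Callable[[], bool],
    alert_admin: Callable[[str], None],
    run: Callable[[list[str]], bool] = run_cmd,
) -> str:
    """Try the normal unlock, then force-unlock, then escalate to a human."""
    if normal_unlock():
        return "released"
    # Log the attempt before running it so the audit trail survives a crash.
    log.warning("normal unlock failed; attempting force-unlock lock_id=%s", lock_id)
    if run(force_unlock_cmd(lock_id)):
        log.warning("force-unlock succeeded lock_id=%s", lock_id)
        return "force-released"
    alert_admin(f"Manual recovery needed: Terraform state lock {lock_id} could not be released")
    return "manual-recovery"
```

Injecting `run` keeps the decision logic testable without a Terraform binary. Verifying that the lock is actually gone within 10 seconds would be a separate check against the state backend, which this sketch does not cover.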

Estimate: 3 points

Story 3.9: pgmq Visibility Timeout for Long Scans

As a self-hosted operator, I want long-running drift scans to extend their pgmq visibility timeout, so that a second worker doesn't pick up the same job mid-scan.

Acceptance Criteria:

  • Worker extends visibility by 2 minutes every 90 seconds during processing.
  • No duplicate processing occurs for scans taking up to 15 minutes.
  • If the worker crashes without extending, the job becomes visible again once the timeout expires (intended behavior, so another worker can retry it).
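
One way to meet the extension cadence above is a background heartbeat that runs alongside the scan. The sketch below is an assumption about how a worker might be structured, not the project's implementation: `VisibilityHeartbeat` and the `extend` callable are hypothetical, and the callable is assumed to wrap something like pgmq's `pgmq.set_vt(queue_name, msg_id, vt_offset)` with an offset of 120 seconds.

```python
import threading
from typing import Callable


class VisibilityHeartbeat:
    """Calls extend(extension_s) every interval_s seconds until the block exits.

    `extend` is a hypothetical callable; in a pgmq-backed worker it might run
    SELECT pgmq.set_vt(queue_name, msg_id, extension_s) for the current message.
    """

    def __init__(
        self,
        extend: Callable[[int], None],
        interval_s: float = 90.0,
        extension_s: int = 120,
    ) -> None:
        self.extend = extend
        self.interval_s = interval_s
        self.extension_s = extension_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop is requested.
        while not self._stop.wait(self.interval_s):
            self.extend(self.extension_s)

    def __enter__(self) -> "VisibilityHeartbeat":
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self._stop.set()
        self._thread.join()
```

A worker would wrap the scan in `with VisibilityHeartbeat(extend): run_scan()`, where `run_scan` is whatever performs the scan. Because the first extension fires only after 90 seconds, the timeout used when the message is first read has to be longer than that. If the process dies, the heartbeat dies with it and the message reappears at most 120 seconds after the last extension.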

Estimate: 2 points


Epic 5 Addendum: Dashboard API

Story 5.8: RLS Connection Pool Leak Prevention

As a multi-tenant SaaS, I want PgBouncer to clear tenant context between requests, so that Tenant A's drift data never leaks to Tenant B.

Acceptance Criteria:

  • app.tenant_id, set with SET LOCAL, is cleared before the connection returns to the pool (SET LOCAL is transaction-scoped, so it resets at COMMIT or ROLLBACK).
  • 100 concurrent tenant requests produce zero cross-tenant data leakage.
  • Stress test with interleaved tenant requests on same PgBouncer connection passes.
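
A minimal sketch of transaction-scoped tenant context, assuming a DB-API style connection (for example psycopg2 in its default non-autocommit mode) behind PgBouncer. `tenant_transaction` is a hypothetical helper. It uses PostgreSQL's `set_config(name, value, is_local)` with `is_local = true`, which behaves like SET LOCAL but accepts a bound parameter.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Run queries with app.tenant_id set for exactly one transaction.

    The setting is discarded at COMMIT or ROLLBACK, so nothing tenant-specific
    remains on the server connection when the pooler hands it to another client.
    """
    cur = conn.cursor()
    try:
        # is_local=true scopes the value to the current transaction only.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Request handlers would then run every tenant query inside `with tenant_transaction(conn, tenant_id) as cur:`. The thing to avoid is a session-level `SET app.tenant_id`, which outlives the transaction and is what can leak across clients on a shared connection.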

Estimate: 2 points


Epic 10 Addendum: Transparent Factory Compliance

Story 10.6: Secret Scrubber Entropy Scanning

As a security-first platform, I want the secret scrubber to detect high-entropy strings (not just regex patterns), so that Base64-encoded keys and custom tokens are caught.

Acceptance Criteria:

  • Strings longer than 20 characters with Shannon entropy above 3.5 bits/char trigger redaction.
  • Base64-encoded AWS keys detected and scrubbed.
  • Multi-line RSA private keys detected and replaced with [REDACTED RSA KEY].
  • Normal log messages (low entropy) are not redacted (no false positives).
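
The entropy rule above can be illustrated with a short sketch. It is not the scrubber's real implementation: `scrub`, the token character class, and the generic `[REDACTED]` marker are assumptions, while the 3.5 bits/char threshold, the 20-character minimum, and the `[REDACTED RSA KEY]` marker come from the criteria.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 21+ Base64/URL-safe characters, plus any "=" padding.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_-]{21,}=*")
# Multi-line PEM block for RSA private keys.
RSA_KEY_RE = re.compile(
    r"-----BEGIN RSA PRIVATE KEY-----.*?-----END RSA PRIVATE KEY-----",
    re.DOTALL,
)


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scrub(text: str, threshold: float = 3.5) -> str:
    # Replace whole PEM blocks first so their body is not scanned token by token.
    text = RSA_KEY_RE.sub("[REDACTED RSA KEY]", text)

    def redact(match: re.Match) -> str:
        token = match.group()
        return "[REDACTED]" if shannon_entropy(token) > threshold else token

    return TOKEN_RE.sub(redact, text)
```

For example, `scrub("aws_secret=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")` redacts only the value after `=`, while a line such as `scan finished for workspace prod-us-east-1` passes through unchanged because it contains no token longer than 20 characters. Long but ordinary identifiers (file paths, module names) can still exceed 3.5 bits/char, so the false-positive criterion needs a realistic log corpus to validate.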

Estimate: 2 points


Total Addendum: 12 points across 5 stories