# dd0c/drift — Epic Addendum (BMad Review Findings)

**Source:** BMad Code Review (March 1, 2026)
**Approach:** Surgical additions to existing epics; no new epics created.

---

## Epic 2 Addendum: Agent Communication

### Story 2.7: mTLS Revocation — Instant Lockout

As a security-conscious platform operator, I want revoked agent certificates to be instantly locked out (including active connections), so that a compromised agent cannot continue sending data.

**Acceptance Criteria:**
- CRL refresh triggers within 30 seconds of cert revocation.
- Existing mTLS connections from revoked certs are terminated (not just new connections rejected).
- New connection attempts with revoked certs return TLS handshake failure.
- Payload replay with captured nonce returns HTTP 409 Conflict.
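
The criteria above can be pictured with a small in-memory model. This is a minimal Python sketch, not the gateway's real code: `RevocationGate`, its method names, and the `202` success status are hypothetical, and a real deployment would enforce revocation in the TLS layer itself rather than in application code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class RevocationGate:
    """Tracks live agent connections by certificate serial (hypothetical helper)."""

    revoked: Set[str] = field(default_factory=set)
    # serial -> callbacks that close each live connection using that cert
    live: Dict[str, List[Callable[[], None]]] = field(default_factory=dict)
    seen_nonces: Set[str] = field(default_factory=set)

    def admit(self, serial: str, close: Callable[[], None]) -> bool:
        """Register a new connection; False models a failed TLS handshake."""
        if serial in self.revoked:
            return False
        self.live.setdefault(serial, []).append(close)
        return True

    def apply_crl(self, revoked_serials: Set[str]) -> int:
        """Apply a refreshed CRL and terminate connections from revoked certs."""
        self.revoked |= revoked_serials
        closed = 0
        for serial in revoked_serials:
            for close in self.live.pop(serial, []):
                close()
                closed += 1
        return closed

    def check_nonce(self, nonce: str) -> int:
        """Return an HTTP status: 409 for a replayed nonce, 202 otherwise (assumed)."""
        if nonce in self.seen_nonces:
            return 409
        self.seen_nonces.add(nonce)
        return 202
```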

**Estimate:** 3 points

---

## Epic 3 Addendum: Drift Analysis Engine

### Story 3.8: Terraform State Lock Recovery on Panic

As a customer, I want the panic button to safely release Terraform state locks, so that hitting "stop" doesn't brick my infrastructure.

**Acceptance Criteria:**
- Panic mode runs `terraform force-unlock` against the held lock ID if the normal unlock fails.
- The state lock is verified as released within 10 seconds of the panic signal.
- The agent logs the force-unlock attempt for the audit trail.
- If both unlock methods fail, the agent alerts the admin with the lock ID for manual recovery.
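
One way to read the unlock flow in these criteria is sketched below. `release_state_lock`, the `normal_unlock` and `alert_admin` callables, and the injectable `run` parameter are hypothetical names; the only real CLI call is `terraform force-unlock -force LOCK_ID`, where `-force` skips the confirmation prompt. The 10-second release verification is left out of this sketch.

```python
import logging
import subprocess
from typing import Callable, List

log = logging.getLogger("drift.panic")

Runner = Callable[[List[str]], int]  # takes a command, returns its exit code


def run_command(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def release_state_lock(
    lock_id: str,
    normal_unlock: Callable[[], bool],
    alert_admin: Callable[[str], None],
    run: Runner = run_command,
) -> str:
    """Return 'normal', 'forced', or 'manual' depending on which path was taken."""
    if normal_unlock():
        return "normal"
    # Audit trail: record the attempt before running the command.
    log.warning("normal unlock failed; attempting force-unlock for lock %s", lock_id)
    if run(["terraform", "force-unlock", "-force", lock_id]) == 0:
        log.warning("force-unlock succeeded for lock %s", lock_id)
        return "forced"
    # Both methods failed: hand the lock ID to a human for manual recovery.
    log.error("force-unlock failed for lock %s", lock_id)
    alert_admin(lock_id)
    return "manual"
```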

**Estimate:** 3 points

### Story 3.9: pgmq Visibility Timeout for Long Scans

As a self-hosted operator, I want long-running drift scans to extend their pgmq visibility timeout, so that a second worker doesn't pick up the same job mid-scan.

**Acceptance Criteria:**
- The worker extends visibility by 2 minutes every 90 seconds during processing.
- No duplicate processing occurs for scans taking up to 15 minutes.
- If the worker crashes without extending, the job becomes visible again after the timeout (correct behavior).
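
The timing works because each 2-minute extension outlives the 90-second interval by 30 seconds. Below is a minimal heartbeat sketch, assuming the message's initial visibility timeout is longer than 90 seconds and that `extend` wraps whatever call resets the timeout (for pgmq this would likely be `pgmq.set_vt`, but that wiring is an assumption and is not shown). `VisibilityHeartbeat` is a hypothetical name. If the worker process dies, the thread dies with it and the job reappears once the last extension lapses.

```python
import threading
from typing import Callable


class VisibilityHeartbeat:
    """Calls `extend(extension_s)` every `interval_s` seconds until stopped."""

    def __init__(
        self,
        extend: Callable[[int], None],
        interval_s: float = 90.0,
        extension_s: int = 120,
    ) -> None:
        self._extend = extend
        self._interval_s = interval_s
        self._extension_s = extension_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (keep extending)
        # and True once stop has been requested.
        while not self._stop.wait(self._interval_s):
            self._extend(self._extension_s)

    def __enter__(self) -> "VisibilityHeartbeat":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()


# Usage (hypothetical names):
#   with VisibilityHeartbeat(lambda s: set_vt(queue, msg_id, s)):
#       run_scan()
```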

**Estimate:** 2 points

---

## Epic 5 Addendum: Dashboard API

### Story 5.8: RLS Connection Pool Leak Prevention

As a multi-tenant SaaS operator, I want PgBouncer to clear tenant context between requests, so that Tenant A's drift data never leaks to Tenant B.

**Acceptance Criteria:**
- No `app.tenant_id` value set via `SET LOCAL` remains when a connection returns to the pool.
- 100 concurrent tenant requests produce zero cross-tenant data leakage.
- A stress test with interleaved tenant requests on the same PgBouncer connection passes.
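
A sketch of how the tenant context might be scoped so that nothing survives the transaction. It assumes PgBouncer runs in transaction pooling mode and that `conn` is a DB-API connection with psycopg-style `%s` placeholders; `tenant_transaction` is a hypothetical helper. PostgreSQL's `set_config(name, value, true)` has the same transaction-local effect as `SET LOCAL` while accepting a bound parameter.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Yield a cursor whose tenant context lasts for exactly one transaction.

    Assumes `conn` opens a transaction implicitly on the first statement,
    as DB-API drivers do by default.
    """
    cur = conn.cursor()
    try:
        # is_local=true: the value is discarded at COMMIT or ROLLBACK,
        # before the pooled server connection can be handed to another tenant.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```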

**Estimate:** 2 points

---

## Epic 10 Addendum: Transparent Factory Compliance

### Story 10.6: Secret Scrubber Entropy Scanning

As a security-first platform, I want the secret scrubber to detect high-entropy strings (not just regex patterns), so that Base64-encoded keys and custom tokens are caught.

**Acceptance Criteria:**
- Shannon entropy > 3.5 bits/char on strings > 20 chars triggers redaction.
- Base64-encoded AWS keys are detected and scrubbed.
- Multi-line RSA private keys are detected and replaced with `[REDACTED RSA KEY]`.
- Normal log messages (low entropy) are not redacted, so they produce no false positives.
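
The entropy rule can be sketched in a few lines of Python. The thresholds come from the criteria above; everything else is an assumption of this sketch: the `scrub` and `shannon_entropy` names, the generic `[REDACTED]` label for high-entropy tokens, and the choice to treat whitespace-delimited runs as candidate strings.

```python
import math
import re
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # bits per character, from the acceptance criteria
MIN_LENGTH = 20          # only strings longer than this are checked

RSA_BLOCK = re.compile(
    r"-----BEGIN RSA PRIVATE KEY-----.*?-----END RSA PRIVATE KEY-----",
    re.DOTALL,  # let the match span multiple lines
)


def shannon_entropy(s: str) -> float:
    """Shannon entropy of `s` in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scrub(text: str) -> str:
    """Redact RSA private key blocks first, then long high-entropy tokens."""
    text = RSA_BLOCK.sub("[REDACTED RSA KEY]", text)

    def redact(match: "re.Match[str]") -> str:
        token = match.group(0)
        if len(token) > MIN_LENGTH and shannon_entropy(token) > ENTROPY_THRESHOLD:
            return "[REDACTED]"  # label is an assumption, not from the criteria
        return token

    # Candidate strings are whitespace-delimited runs (assumption).
    return re.sub(r"\S+", redact, text)
```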

**Estimate:** 2 points

---

**Total Addendum:** 12 points across 5 stories