Implement review remediation + PLG analytics SDK

- All 6 test architectures patched with Section 11 addendums
- P5 (cost) fully rewritten from 232 to ~600 lines
- PLG brainstorm + party mode advisory board results
- Analytics SDK v2 (PostHog Cloud, Zod strict, Lambda-safe)
- Analytics tests v2 (safeParse, no , no timestamp, no PII)
- Addresses all Gemini review findings across P1-P6
2026-03-01 01:42:49 +00:00
parent 2fe0ed856e
commit 03bfe931fc
9 changed files with 2950 additions and 85 deletions
--- a/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
+++ b/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
@@ -1,8 +1,8 @@
 # dd0c/cost — Test Architecture & TDD Strategy

 **Product:** dd0c/cost — AWS Cost Anomaly Detective
-**Author:** Test Architecture Phase
-**Date:** February 28, 2026
+**Author:** Test Architecture Phase (v2 — Post-Review Rewrite)
+**Date:** March 1, 2026
 **Status:** V1 MVP — Solo Founder Scope

 ---
@@ -13,7 +13,9 @@

 dd0c/cost sits at the intersection of **money and infrastructure**. A false negative means a customer loses thousands of dollars. A false positive means alert fatigue and churn. The test suite's primary job is to mathematically prove the anomaly scoring engine works across edge cases.

-Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively unit-tested with synthetic data before any AWS APIs are mocked.
+Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively tested with property-based testing before any AWS APIs are mocked.
+
+Second principle: **Every dollar matters.** Cost calculations involve floating-point arithmetic on money. Rounding errors, precision loss, and currency handling must be tested with the same rigor as a financial system.

 ### 1.2 Red-Green-Refactor Adapted to dd0c/cost

@@ -28,33 +30,52 @@ REFACTOR → Optimize the baseline lookup, extract novelty checks,
 ```

 **When to write tests first (strict TDD):**
- Anomaly scoring engine (Z-scores, novelty checks, composite severity)
- Cold-start heuristics (fast-path for >$5/hr resources)
- Baseline calculation (moving averages, standard deviation)
- Governance policy (strict vs. audit mode, 14-day promotion)
+- All anomaly scoring (Z-scores, novelty checks, composite severity)
+- All cold-start heuristics (fast-path for >$5/hr resources)
+- All baseline calculations (Welford algorithm, maturity transitions)
+- All governance policies (strict vs. audit mode, 14-day auto-promotion, panic mode)
+- All Slack signature validation (security-critical)
+- All cost calculations (pricing lookup, hourly cost estimation)
+- All feature flag circuit breakers

 **When integration tests lead:**
 - CloudTrail ingestion (implement against LocalStack EventBridge, then lock in)
 - DynamoDB Single-Table schema (build access patterns, then integration test)
+- Cross-account STS role assumption (test against LocalStack)

 **When E2E tests lead:**
- The Slack alert interaction (format block kit, test the "Snooze/Terminate" buttons)
+- Slack alert interaction (format block kit, test "Snooze/Terminate" buttons)
+- Onboarding wizard (CloudFormation quick-create → role validation → first alert)

 ### 1.3 Test Naming Conventions

 ```typescript
+// Unit tests
 describe('AnomalyScorer', () => {
-  it('assigns critical severity when Z-score > 3 and hourly cost > $1', () => {});
-  it('flags actor novelty when IAM role has never launched this service', () => {});
+  it('assigns critical severity when Z-score exceeds 3 and hourly cost exceeds $1', () => {});
+  it('flags actor novelty when IAM role has never launched this service type', () => {});
  it('bypasses baseline and triggers fast-path critical for $10/hr instance', () => {});
 });

-describe('CloudTrailNormalizer', () => {
-  it('extracts instance type and region from RunInstances event', () => {});
-  it('looks up correct on-demand pricing for us-east-1 r6g.xlarge', () => {});
+describe('BaselineCalculator', () => {
+  it('updates running mean using Welford online algorithm', () => {});
+  it('handles zero standard deviation without division by zero', () => {});
+});
+
+// Property-based tests
+describe('AnomalyScorer (property-based)', () => {
+  it('always returns severity between 0 and 100 for any valid input', () => {});
+  it('monotonically increases score as Z-score increases', () => {});
+  it('never assigns critical to events below $0.50/hr regardless of Z-score', () => {});
 });
 ```

+**Rules:**
+- Describe the observable outcome, not the implementation
+- Use present tense
+- If you need "and" in the name, split into two tests
+- Property-based tests explicitly state the invariant
+
 ---

 ## Section 2: Test Pyramid
@@ -63,93 +84,441 @@ describe('CloudTrailNormalizer', () => {

 | Level | Target | Count (V1) | Runtime |
 |-------|--------|------------|---------|
-| Unit | 70% | ~250 tests | <20s |
-| Integration | 20% | ~80 tests | <3min |
-| E2E/Smoke | 10% | ~15 tests | <5min |
+| Unit | 80% | ~350 tests | <25s |
+| Integration | 15% | ~65 tests | <4min |
+| E2E/Smoke | 5% | ~15 tests | <8min |
+
+Higher unit ratio than other dd0c products because the core value is pure math (scoring, baselines, Z-scores).

 ### 2.2 Unit Test Targets

 | Component | Key Behaviors | Est. Tests |
 |-----------|--------------|------------|
-| Event Normalizer | CloudTrail parsing, pricing lookup, deduplication | 40 |
-| Baseline Engine | Running mean/stddev calculation, maturity checks | 35 |
-| Anomaly Scorer | Z-score math, novelty detection, composite scoring | 50 |
-| Remediation Handler | Stop/Terminate payload parsing, IAM role assumption logic | 20 |
-| Notification Engine | Slack formatting, daily digest aggregation | 30 |
-| Governance Policy | Mode enforcement, 14-day auto-promotion | 25 |
-| Feature Flags | Circuit breaker on alert volume, flag metadata | 15 |
+| CloudTrail Normalizer | Event parsing, pricing lookup, dedup, field extraction | 40 |
+| Baseline Engine | Welford algorithm, maturity transitions, feedback loop | 45 |
+| Anomaly Scorer | Z-score, novelty, composite scoring, cold-start fast-path | 60 |
+| Zombie Hunter | Idle resource detection, cost estimation, age calculation | 25 |
+| Notification Formatter | Slack Block Kit, daily digest, CLI command generation | 30 |
+| Slack Bot | Command parsing, signature validation, action handling | 25 |
+| Remediation Handler | Stop/Terminate logic, IAM role assumption, snooze/dismiss | 20 |
+| Dashboard API | CRUD, tenant isolation, pagination, filtering | 25 |
+| Governance Policy | Mode enforcement, 14-day promotion, panic mode | 30 |
+| Feature Flags | Circuit breaker, flag lifecycle, local evaluation | 15 |
+| Onboarding | CFN template validation, role validation, free tier enforcement | 20 |
+| Cost Calculations | Pricing precision, rounding, fallback pricing, currency | 15 |
+
+### 2.3 Integration Test Boundaries
+
+| Boundary | What's Tested | Infrastructure |
+|----------|--------------|----------------|
+| EventBridge → SQS FIFO | Cross-account event routing, dedup, ordering | LocalStack |
+| SQS → Event Processor Lambda | Batch processing, error handling, DLQ routing | LocalStack |
+| Event Processor → DynamoDB | CostEvent writes, baseline updates, transactions | Testcontainers DynamoDB Local |
+| Anomaly Scorer → DynamoDB | Baseline reads, anomaly record writes | Testcontainers DynamoDB Local |
+| Notifier → Slack API | Block Kit delivery, rate limiting, message updates | WireMock |
+| API Gateway → Lambda | Auth (Cognito JWT), routing, throttling | LocalStack |
+| STS → Customer Account | Cross-account role assumption, ExternalId validation | LocalStack |
+| CDK Synth | Infrastructure snapshot, resource policy validation | CDK assertions |
+
+### 2.4 E2E/Smoke Scenarios
+
+1. **Real-Time Anomaly Detection**: CloudTrail event → scoring → Slack alert (<30s)
+2. **Interactive Remediation**: Slack button click → StopInstances → message update
+3. **Onboarding Flow**: Signup → CFN deploy → role validation → first alert
+4. **14-Day Auto-Promotion**: Simulate 14 days → verify strict→audit transition
+5. **Zombie Hunter**: Daily scan → detect idle EC2 → Slack digest
+6. **Panic Mode**: Enable panic → all alerting stops → anomalies still logged

 ---

 ## Section 3: Unit Test Strategy

-### 3.1 Cost Ingestion & Normalization
+### 3.1 CloudTrail Normalizer

 ```typescript
 describe('CloudTrailNormalizer', () => {
-  it('normalizes EC2 RunInstances event to CostEvent schema', () => {});
-  it('normalizes RDS CreateDBInstance event to CostEvent schema', () => {});
-  it('extracts assumed role ARN as actor instead of base STS role', () => {});
-  it('applies fallback pricing when instance type is not in static table', () => {});
-  it('ignores non-cost-generating events (e.g., DescribeInstances)', () => {});
+  describe('Event Parsing', () => {
+    it('normalizes EC2 RunInstances to CostEvent schema', () => {});
+    it('normalizes RDS CreateDBInstance to CostEvent schema', () => {});
+    it('normalizes Lambda CreateFunction to CostEvent schema', () => {});
+    it('extracts assumed role ARN as actor (not base STS role)', () => {});
+    it('extracts instance type, region, and AZ from event detail', () => {});
+    it('handles batched RunInstances (multiple instances in one call)', () => {});
+    it('ignores non-cost-generating events (DescribeInstances, ListBuckets)', () => {});
+    it('handles malformed CloudTrail JSON without crashing', () => {});
+    it('handles missing optional fields gracefully', () => {});
+  });
+
+  describe('Pricing Lookup', () => {
+    it('looks up correct on-demand price for us-east-1 m5.xlarge', () => {});
+    it('looks up correct on-demand price for us-west-2 r6g.2xlarge', () => {});
+    it('applies fallback pricing when instance type not in static table', () => {});
+    it('returns $0 for instance types with no pricing data and logs warning', () => {});
+    it('handles GPU instances (p4d, g5) with correct pricing', () => {});
+  });
+
+  describe('Deduplication', () => {
+    it('generates deterministic fingerprint from eventID', () => {});
+    it('detects duplicate CloudTrail events by eventID', () => {});
+    it('allows same resource type from different events', () => {});
+  });
+
+  describe('Cost Precision', () => {
+    it('calculates hourly cost with 4 decimal places', () => {});
+    it("rounds consistently (banker's rounding) to avoid accumulation errors", () => {});
+    it('handles sub-cent costs for Lambda invocations', () => {});
+  });
 });
 ```
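
The cost-precision cases above depend on how hourly prices are rounded. A minimal sketch, assuming costs are held as integer micro-dollars rather than floats; `toMicros`, `roundHalfEven`, and `roundToFourDecimals` are hypothetical helper names, not part of the dd0c codebase:

```typescript
// Hypothetical helpers: money is held as integer micro-dollars (1 USD = 1000000)
// so rounding decisions are made on exact integers rather than binary floats.
const MICROS_PER_DOLLAR = 1000000;

export function toMicros(dollars: number): number {
  // Math.round absorbs the representation error of inputs such as 0.1.
  return Math.round(dollars * MICROS_PER_DOLLAR);
}

// Rounds `amount` to the nearest multiple of `unit`; exact ties go to the even multiple.
export function roundHalfEven(amount: number, unit: number): number {
  const quotient = Math.trunc(amount / unit);
  const remainder = Math.abs(amount - quotient * unit);
  const awayFromZero = quotient + (amount < 0 ? -1 : 1);
  if (remainder * 2 < unit) return quotient * unit;
  if (remainder * 2 > unit) return awayFromZero * unit;
  return (quotient % 2 === 0 ? quotient : awayFromZero) * unit;
}

// Four decimal places of a dollar correspond to multiples of 100 micro-dollars.
export const roundToFourDecimals = (micros: number): number =>
  roundHalfEven(micros, 100);
```

Ties alternate between rounding down and up depending on the neighbouring digit, which is what keeps a long sum of rounded values from drifting in one direction.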

-### 3.2 Anomaly Engine (The Math)
+### 3.2 Anomaly Scorer
+
+The most critical component. Uses property-based testing via `fast-check`.

 ```typescript
 describe('AnomalyScorer', () => {
-  describe('Statistical Scoring (Z-Score)', () => {
-    it('returns score=0 when event cost exactly matches baseline mean', () => {});
+  describe('Z-Score Calculation', () => {
+    it('returns 0 when event cost exactly matches baseline mean', () => {});
    it('returns proportional score for Z-scores between 1.0 and 3.0', () => {});
-    it('caps Z-score contribution at max threshold', () => {});
+    it('caps Z-score contribution at configurable max threshold', () => {});
+    it('handles zero standard deviation without division by zero', () => {});
+    it('handles single data point baseline (stddev undefined)', () => {});
+    it('handles extremely large values without float overflow', () => {});
+    it('handles negative cost delta (cost decrease) as non-anomalous', () => {});
  });

  describe('Novelty Scoring', () => {
-    it('adds novelty penalty when instance type is first seen for account', () => {});
-    it('adds novelty penalty when IAM user has never provisioned this service', () => {});
+    it('adds instance novelty penalty when type first seen for account', () => {});
+    it('adds actor novelty penalty when IAM role is new', () => {});
+    it('does not penalize known instance type + known actor', () => {});
+    it('weights instance novelty higher than actor novelty', () => {});
+  });
+
+  describe('Composite Scoring', () => {
+    it('combines Z-score + novelty into composite severity', () => {});
+    it('classifies composite < 30 as info', () => {});
+    it('classifies composite 30-60 as warning', () => {});
+    it('classifies composite > 60 as critical', () => {});
+    it('never assigns critical to events below $0.50/hr', () => {});
  });

  describe('Cold-Start Fast Path', () => {
    it('flags $5/hr instance as warning when baseline < 14 days', () => {});
    it('flags $25/hr instance as critical immediately, bypassing baseline', () => {});
-    it('ignores $0.10/hr instances during cold-start learning period', () => {});
+    it('ignores $0.10/hr instances during cold-start learning', () => {});
+    it('fast-path is always on — not behind a feature flag', () => {});
+    it('transitions from fast-path to statistical scoring at maturity', () => {});
+  });
+
+  describe('Feedback Loop', () => {
+    it('reduces score for resources marked as expected', () => {});
+    it('adds actor to expected list after mark-as-expected', () => {});
+    it('still flags expected actor if cost is 10x above baseline', () => {});
+  });
+
+  describe('Property-Based Tests (fast-check)', () => {
+    it('score is always between 0 and 100 for any valid input', () => {
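+      // NaN and Infinity are excluded so that only valid inputs are generated.
+      // const amount = () => fc.float({ min: 0, noNaN: true, noDefaultInfinity: true });
+      // fc.assert(fc.property(
+      //   fc.record({ cost: amount(), mean: amount(), stddev: amount() }),
+      //   (input) => { const score = scorer.score(input); return score >= 0 && score <= 100; }
+      // ));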
+    });
+    it('score monotonically increases as cost increases (baseline fixed)', () => {});
+    it('score monotonically increases as Z-score increases', () => {});
+    it('cold-start fast-path always triggers for cost > $25/hr', () => {});
+    it('mature baseline never uses fast-path thresholds', () => {});
  });
 });
 ```
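
The scoring rules these tests pin down can be sketched as a few pure functions. This is an illustration only: the severity bands (<30, 30-60, >60) and the $0.50/hr critical floor come from the test names above, while the weights, the Z-score cap, and every identifier here are assumptions.

```typescript
// All weights and caps are illustrative assumptions, not tuned dd0c values.
export interface ScoreInput {
  hourlyCost: number;         // estimated $/hr of the new resource
  mean: number;               // baseline mean $/hr
  stddev: number;             // baseline standard deviation
  novelInstanceType: boolean; // instance type first seen for this account
  novelActor: boolean;        // IAM role never seen launching this service
}

export type Severity = 'info' | 'warning' | 'critical';

const Z_CAP = 3;               // Z-scores above this add nothing further
const Z_WEIGHT = 70;           // maximum points from the statistical signal
const INSTANCE_NOVELTY = 20;   // weighted higher than actor novelty
const ACTOR_NOVELTY = 10;
const MIN_CRITICAL_COST = 0.5; // $/hr floor below which nothing is critical

export function zScore(cost: number, mean: number, stddev: number): number {
  const delta = cost - mean;
  if (delta <= 0) return 0;       // cost decreases are treated as non-anomalous
  if (stddev === 0) return Z_CAP; // flat baseline: any increase counts as maximal
  return Math.min(delta / stddev, Z_CAP);
}

export function compositeScore(input: ScoreInput): number {
  const z = zScore(input.hourlyCost, input.mean, input.stddev);
  const raw =
    (z / Z_CAP) * Z_WEIGHT +
    (input.novelInstanceType ? INSTANCE_NOVELTY : 0) +
    (input.novelActor ? ACTOR_NOVELTY : 0);
  return Math.max(0, Math.min(100, raw)); // clamp to the 0..100 invariant
}

export function classify(score: number, hourlyCost: number): Severity {
  if (score > 60 && hourlyCost >= MIN_CRITICAL_COST) return 'critical';
  if (score >= 30) return 'warning';
  return 'info';
}
```

Keeping the functions pure is what makes the property-based invariants (bounded score, monotonicity in the Z-score) cheap to check over thousands of generated inputs.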

-### 3.3 Baseline Learning
+### 3.3 Baseline Engine

 ```typescript
 describe('BaselineCalculator', () => {
-  it('updates running mean and stddev using Welford algorithm', () => {});
-  it('adds new actor to observed_actors set', () => {});
-  it('marks baseline as mature when event_count > 20 and age_days > 14', () => {});
+  describe('Welford Online Algorithm', () => {
+    it('updates running mean correctly after each observation', () => {});
+    it('updates running variance correctly after each observation', () => {});
+    it('produces correct stddev after 100 observations', () => {});
+    it('handles first observation (count=1, stddev=0)', () => {});
+    it('handles identical observations (stddev=0)', () => {});
+    it('handles catastrophic cancellation with large values', () => {
+      // Welford is numerically stable — verify this property
+    });
+  });
+
+  describe('Maturity Transitions', () => {
+    it('starts in cold-start state', () => {});
+    it('transitions to learning after 5 events', () => {});
+    it('transitions to mature after 20 events AND 14 days', () => {});
+    it('does not mature with 100 events but only 3 days', () => {});
+    it('does not mature with 14 days but only 5 events', () => {});
+  });
+
+  describe('Actor & Instance Tracking', () => {
+    it('adds new actor to observed_actors set', () => {});
+    it('adds new instance type to observed_types set', () => {});
+    it('does not duplicate existing actors', () => {});
+  });
+
+  describe('Property-Based Tests', () => {
+    it('mean converges to true mean as observations increase', () => {});
+    it('variance is always non-negative', () => {});
+    it('stddev equals sqrt(variance) within float tolerance', () => {});
+  });
+});
+```
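
For reference, the Welford update and the maturity rule these tests exercise can be sketched as below. The state shape and function names are hypothetical; the use of sample variance (dividing by `count - 1`) and the inclusive `>=` boundaries are assumptions that the real implementation may choose differently.

```typescript
export interface BaselineState {
  count: number;
  mean: number;
  m2: number; // running sum of squared deviations from the current mean
}

export const emptyBaseline = (): BaselineState => ({ count: 0, mean: 0, m2: 0 });

// One Welford step: folds observation `x` into the running statistics
// without storing past observations.
export function observe(state: BaselineState, x: number): BaselineState {
  const count = state.count + 1;
  const delta = x - state.mean;
  const mean = state.mean + delta / count;
  const m2 = state.m2 + delta * (x - mean);
  return { count, mean, m2 };
}

// Sample variance; defined as 0 for fewer than two observations (assumption).
export function variance(state: BaselineState): number {
  return state.count > 1 ? state.m2 / (state.count - 1) : 0;
}

export const stddev = (state: BaselineState): number => Math.sqrt(variance(state));

export type Maturity = 'cold-start' | 'learning' | 'mature';

// Thresholds from the test names: learning after 5 events,
// mature only with 20 events AND 14 days of history.
export function maturity(eventCount: number, ageDays: number): Maturity {
  if (eventCount >= 20 && ageDays >= 14) return 'mature';
  if (eventCount >= 5) return 'learning';
  return 'cold-start';
}
```

Folding a list of hourly costs through `observe` with `reduce` gives the same mean and variance as a two-pass computation, which is a convenient oracle for the property-based tests.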
+
+### 3.4 Zombie Hunter
+
+```typescript
+describe('ZombieHunter', () => {
+  it('detects EC2 instance running >7 days with <5% CPU utilization', () => {});
+  it('detects RDS instance with 0 connections for >3 days', () => {});
+  it('detects unattached EBS volumes older than 7 days', () => {});
+  it('calculates cumulative waste cost for each zombie', () => {});
+  it('excludes instances tagged dd0c:ignore', () => {});
+  it('handles API pagination for accounts with 500+ instances', () => {});
+  it('respects read-only IAM permissions (never modifies resources)', () => {});
+});
+```
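
A rough sketch of the EC2 idle check and waste estimate described by these tests. The snapshot shape, the choice to treat the presence of a `dd0c:ignore` tag key as the opt-out, and the assumption that the whole running time counts as waste are illustrative, not confirmed by the source.

```typescript
// Hypothetical, already-fetched view of an instance; no AWS calls are made here.
export interface Ec2Snapshot {
  instanceId: string;
  launchedAt: Date;
  avgCpuPercent: number; // average CPU utilization over the lookback window
  hourlyCost: number;    // on-demand $/hr
  tags: Record<string, string>;
}

const MS_PER_HOUR = 60 * 60 * 1000;
const IDLE_MIN_AGE_DAYS = 7;
const IDLE_MAX_CPU_PERCENT = 5;

export function isZombie(instance: Ec2Snapshot, now: Date): boolean {
  if ('dd0c:ignore' in instance.tags) return false; // explicit opt-out
  const ageDays = (now.getTime() - instance.launchedAt.getTime()) / (24 * MS_PER_HOUR);
  return ageDays > IDLE_MIN_AGE_DAYS && instance.avgCpuPercent < IDLE_MAX_CPU_PERCENT;
}

// Cumulative waste, assuming every hour since launch was idle.
export function wasteToDate(instance: Ec2Snapshot, now: Date): number {
  const hours = (now.getTime() - instance.launchedAt.getTime()) / MS_PER_HOUR;
  return hours * instance.hourlyCost;
}
```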
+
+### 3.5 Notification Formatter
+
+```typescript
+describe('NotificationFormatter', () => {
+  describe('Slack Block Kit', () => {
+    it('formats EC2 anomaly with resource type, region, cost, actor', () => {});
+    it('formats RDS anomaly with engine, storage, multi-AZ status', () => {});
+    it('includes "Why this alert" section with anomaly signals', () => {});
+    it('includes suggested CLI commands for remediation', () => {});
+    it('includes Snooze/Mark Expected/Stop Instance buttons', () => {});
+    it('generates correct aws ec2 stop-instances command', () => {});
+    it('generates correct aws rds stop-db-instance command', () => {});
+  });
+
+  describe('Daily Digest', () => {
+    it('aggregates 24h of anomalies into summary stats', () => {});
+    it('includes total estimated spend across all accounts', () => {});
+    it('highlights top 3 costliest anomalies', () => {});
+    it('includes zombie resource count and waste estimate', () => {});
+    it('shows baseline learning progress for new accounts', () => {});
+  });
+});
+```
+
+### 3.6 Slack Bot
+
+```typescript
+describe('SlackBot', () => {
+  describe('Signature Validation', () => {
+    it('validates correct Slack request signature (HMAC-SHA256)', () => {});
+    it('rejects request with invalid signature', () => {});
+    it('rejects request with missing X-Slack-Signature header', () => {});
+    it('rejects request with expired timestamp (>5 min)', () => {});
+    it('uses timing-safe comparison to prevent timing attacks', () => {});
+  });
+
+  describe('Command Parsing', () => {
+    it('routes /dd0c status to status handler', () => {});
+    it('routes /dd0c anomalies to anomaly list handler', () => {});
+    it('routes /dd0c digest to digest handler', () => {});
+    it('returns help text for unknown commands', () => {});
+    it('responds within 3 seconds or defers with 200 OK', () => {});
+  });
+
+  describe('Interactive Actions', () => {
+    it('validates interactive payload signature', () => {});
+    it('handles mark_expected action and updates baseline', () => {});
+    it('handles snooze_1h action and sets snoozeUntil', () => {});
+    it('handles snooze_24h action', () => {});
+    it('updates original Slack message after action', () => {});
+    it('rejects action from user not in authorized workspace', () => {});
+  });
+});
+```
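
The signature-validation cases follow Slack's v0 signing scheme: an HMAC-SHA256 over `v0:{timestamp}:{raw body}`, compared against the `X-Slack-Signature` header. A minimal sketch using Node's built-in `crypto` module; the function names and parameter object are hypothetical, and the caller is assumed to pass the raw, unparsed request body.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

const MAX_SKEW_SECONDS = 5 * 60; // reject requests older than 5 minutes

export function slackSignature(secret: string, timestamp: string, rawBody: string): string {
  const base = `v0:${timestamp}:${rawBody}`;
  return 'v0=' + createHmac('sha256', secret).update(base).digest('hex');
}

export function verifySlackRequest(params: {
  signingSecret: string;
  signatureHeader: string | undefined; // X-Slack-Signature
  timestampHeader: string | undefined; // X-Slack-Request-Timestamp
  rawBody: string;
  nowSeconds: number;
}): boolean {
  const { signingSecret, signatureHeader, timestampHeader, rawBody, nowSeconds } = params;
  if (!signatureHeader || !timestampHeader) return false;

  const ts = Number(timestampHeader);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;

  const expected = Buffer.from(slackSignature(signingSecret, timestampHeader, rawBody));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on unequal lengths, so guard first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Injecting `nowSeconds` instead of reading the clock inside the function keeps the expired-timestamp test deterministic.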
+
+### 3.7 Governance Policy Engine
+
+```typescript
+describe('GovernancePolicy', () => {
+  describe('Mode Enforcement', () => {
+    it('strict mode: logs anomaly but does not send Slack alert', () => {});
+    it('audit mode: sends Slack alert with full logging', () => {});
+    it('defaults new accounts to strict mode', () => {});
+  });
+
+  describe('14-Day Auto-Promotion', () => {
+    it('does not promote account with <14 days of baseline', () => {});
+    it('does not promote account with >10% false-positive rate', () => {});
+    it('promotes account on day 15 if FP rate <10%', () => {});
+    it('calculates false-positive rate from mark-as-expected actions', () => {});
+    it('auto-promotion check runs daily via cron', () => {});
+  });
+
+  describe('Panic Mode', () => {
+    it('stops all alerting when panic=true', () => {});
+    it('continues scoring and logging during panic', () => {});
+    it('activates in <1 second via Redis key', () => {});
+    it('activatable via POST /admin/panic', () => {});
+    it('dashboard API returns "alerting paused" header during panic', () => {});
+  });
+
+  describe('Per-Account Override', () => {
+    it('account can set stricter mode than system default', () => {});
+    it('account cannot downgrade from system strict to audit', () => {});
+    it('merge logic: max_restrictive(system, account)', () => {});
+  });
+
+  describe('Policy Decision Logging', () => {
+    it('logs "suppressed by strict mode" with anomaly context', () => {});
+    it('logs "auto-promoted to audit mode" with baseline stats', () => {});
+    it('logs "panic mode active — alerting paused"', () => {});
+  });
+});
+```
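
The mode-merge, promotion, and panic rules above reduce to three small predicates. A sketch under stated assumptions: `baselineDays` counts completed days (so "day 15" means 14 completed days), the 10% false-positive threshold is exclusive, and all names are hypothetical.

```typescript
export type Mode = 'strict' | 'audit';

// strict (log-only) is more restrictive than audit (alerts are sent).
const RESTRICTIVENESS: Record<Mode, number> = { audit: 0, strict: 1 };

// An account may tighten the system mode but never loosen it.
export function maxRestrictive(system: Mode, account?: Mode): Mode {
  if (!account) return system;
  return RESTRICTIVENESS[account] > RESTRICTIVENESS[system] ? account : system;
}

// Promotion from strict to audit needs both enough history and a low FP rate.
export function shouldPromote(baselineDays: number, falsePositiveRate: number): boolean {
  return baselineDays >= 14 && falsePositiveRate < 0.1;
}

// Panic mode suppresses delivery only; scoring and logging happen elsewhere.
export function shouldSendAlert(mode: Mode, panic: boolean): boolean {
  return !panic && mode === 'audit';
}
```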
+
+### 3.8 Dashboard API
+
+```typescript
+describe('DashboardAPI', () => {
+  describe('Account Management', () => {
+    it('GET /v1/accounts returns connected accounts for tenant', () => {});
+    it('DELETE /v1/accounts/:id marks account as disconnecting', () => {});
+    it('returns 401 without valid Cognito JWT', () => {});
+    it('scopes all queries to authenticated tenantId', () => {});
+  });
+
+  describe('Anomaly Listing', () => {
+    it('GET /v1/anomalies returns recent anomalies', () => {});
+    it('supports since, status, severity filters', () => {});
+    it('implements cursor-based pagination', () => {});
+    it('includes slackMessageUrl when alert was sent', () => {});
+  });
+
+  describe('Baseline Overrides', () => {
+    it('PATCH /v1/accounts/:id/baselines/:service/:type updates sensitivity', () => {});
+    it('rejects invalid sensitivity values', () => {});
+  });
+
+  describe('Tenant Isolation', () => {
+    it('never returns anomalies from another tenant', () => {});
+    it('never returns accounts from another tenant', () => {});
+    it('enforces tenantId on all DynamoDB queries', () => {});
+  });
+});
+```
+
+### 3.9 Onboarding & PLG
+
+```typescript
+describe('Onboarding', () => {
+  describe('CloudFormation Template', () => {
+    it('generates valid CFN YAML with correct IAM permissions', () => {});
+    it('includes ExternalId parameter', () => {});
+    it('includes EventBridge rule for cost-relevant CloudTrail events', () => {});
+    it('quick-create URL contains correct template URL and parameters', () => {});
+  });
+
+  describe('Role Validation', () => {
+    it('successfully assumes role with correct ExternalId', () => {});
+    it('returns clear error on role not found', () => {});
+    it('returns clear error on ExternalId mismatch', () => {});
+    it('triggers zombie scan on successful connection', () => {});
+  });
+
+  describe('Free Tier Enforcement', () => {
+    it('allows first account connection on free tier', () => {});
+    it('rejects second account with 403 and upgrade prompt', () => {});
+    it('allows multiple accounts on pro tier', () => {});
+  });
+
+  describe('Stripe Integration', () => {
+    it('creates Stripe Checkout session with correct pricing', () => {});
+    it('handles checkout.session.completed webhook', () => {});
+    it('handles customer.subscription.deleted webhook', () => {});
+    it('validates Stripe webhook signature', () => {});
+    it('updates tenant tier to pro on successful payment', () => {});
+    it('downgrades tenant on subscription cancellation', () => {});
+  });
+});
+```
+
+### 3.10 Feature Flag Circuit Breaker
+
+```typescript
+describe('AlertVolumeCircuitBreaker', () => {
+  it('allows alerting when volume is within 3x baseline', () => {});
+  it('trips breaker when alerts exceed 3x baseline over 1 hour', () => {});
+  it('auto-disables the scoring flag when breaker trips', () => {});
+  it('buffers suppressed alerts in DLQ for review', () => {});
+  it('tracks alert-per-account rate in Redis sliding window', () => {});
+  it('resets breaker after manual flag re-enable', () => {});
+  it('fast-path alerts are exempt from circuit breaker', () => {});
 });
 ```
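
A sliding-window breaker matching these cases can be sketched in memory. The tests above name Redis as the backing store; the array here is a stand-in for illustration, and the class name, the choice not to count fast-path alerts toward the window, and the manual `reset` are assumptions.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // 1 hour sliding window
const TRIP_MULTIPLIER = 3;        // trip when volume exceeds 3x baseline

export class AlertVolumeBreaker {
  private timestamps: number[] = [];
  private tripped = false;
  private readonly baselineAlertsPerHour: number;

  constructor(baselineAlertsPerHour: number) {
    this.baselineAlertsPerHour = baselineAlertsPerHour;
  }

  // Returns true if the alert may be delivered.
  allow(nowMs: number, fastPath = false): boolean {
    if (fastPath) return true;      // fast-path alerts bypass the breaker
    if (this.tripped) return false; // stays open until manually reset

    // Drop entries that have aged out of the window, then record this alert.
    this.timestamps = this.timestamps.filter((t) => nowMs - t < WINDOW_MS);
    this.timestamps.push(nowMs);

    if (this.timestamps.length > TRIP_MULTIPLIER * this.baselineAlertsPerHour) {
      this.tripped = true;
      return false;
    }
    return true;
  }

  reset(): void {
    this.tripped = false;
    this.timestamps = [];
  }

  get isTripped(): boolean {
    return this.tripped;
  }
}
```

An alert rejected by `allow` would be routed to the DLQ by the caller rather than dropped, in line with the buffering case above.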

 ---
-
 ## Section 4: Integration Test Strategy

 ### 4.1 DynamoDB Data Layer (Testcontainers)

 ```typescript
-describe('DynamoDB Single-Table Patterns', () => {
-  it('writes CostEvent and updates Baseline in single transaction', async () => {});
-  it('queries all anomalies for tenant within time range', async () => {});
-  it('fetches tenant config and Slack tokens securely', async () => {});
+import { GenericContainer, StartedTestContainer } from 'testcontainers';
+
+describe('DynamoDB Integrations', () => {
+  let dynamodb: StartedTestContainer;
+
+  beforeAll(async () => {
+    dynamodb = await new GenericContainer('amazon/dynamodb-local:latest')
+      .withExposedPorts(8000).start();
+    // Create dd0c-cost-main table with GSIs
+  });
+
+  describe('Transactional Writes', () => {
+    it('writes CostEvent and updates Baseline in single TransactWriteItems call', async () => {});
+    it('fails gracefully if TransactWriteItems is cancelled with ConditionalCheckFailed', async () => {});
+    it('handles partial failure recovery when Baseline update conflicts', async () => {});
+  });
+
+  describe('Access Patterns', () => {
+    it('queries all anomalies for tenant within time range (GSI3)', async () => {});
+    it('fetches tenant config and Slack tokens securely', async () => {});
+    it('retrieves accurate Baseline snapshot by resource type', async () => {});
+  });
 });
 ```

-### 4.2 AWS API Contract Tests
+### 4.2 Cross-Account STS & AWS APIs (LocalStack)

 ```typescript
-describe('AWS Cross-Account Actions', () => {
-  // Uses LocalStack to simulate target account
-  it('assumes target account remediation role successfully', async () => {});
-  it('executes ec2:StopInstances when remediation approved', async () => {});
-  it('executes rds:DeleteDBInstance with skip-final-snapshot', async () => {});
+import { GenericContainer, StartedTestContainer } from 'testcontainers';
+
+describe('AWS Cross-Account Integrations', () => {
+  let localstack: StartedTestContainer;
+
+  beforeAll(async () => {
+    localstack = await new GenericContainer('localstack/localstack:3')
+      .withEnvironment({ SERVICES: 'sts,ec2,rds' })
+      .withExposedPorts(4566).start();
+  });
+
+  describe('Role Assumption', () => {
+    it('successfully assumes target account remediation role via STS', async () => {});
+    it('fails when ExternalId does not match (Security)', async () => {});
+    it('handles STS credential expiration gracefully', async () => {});
+  });
+
+  describe('Remediation Actions', () => {
+    it('executes ec2:StopInstances when remediation approved', async () => {});
+    it('executes rds:StopDBInstance when remediation approved', async () => {});
+    it('fails safely when target IAM role lacks StopInstances permission', async () => {});
+  });
+});
+```
+
+### 4.3 Slack API Contract (WireMock)
+
+```typescript
+describe('Slack Integration', () => {
+  it('formats and delivers Block Kit message successfully', async () => {});
+  it('handles 429 Rate Limit by throwing retryable error for SQS visibility timeout', async () => {});
+  it('updates existing Slack message when anomaly is snoozed', async () => {});
 });
 ```

@@ -159,24 +528,65 @@ describe('AWS Cross-Account Actions', () => {

 ### 5.1 Critical User Journeys

-**Journey 1: Real-Time Anomaly Detection**
-1. Send synthetic `RunInstances` event to EventBridge (p9.16xlarge, $40/hr).
-2. Verify system processes event and triggers fast-path (no baseline).
-3. Verify Slack alert is generated with correct cost estimate.
+**Journey 1: Real-Time Anomaly Detection (The Golden Path)**
+```typescript
+describe('E2E: Anomaly Detection', () => {
+  it('detects anomaly and alerts Slack within 30 seconds', async () => {
+    // 1. Inject synthetic CloudTrail `RunInstances` event (p4d.24xlarge) into SQS Ingestion Queue
+    // 2. Poll DynamoDB to ensure CostEvent was recorded
+    // 3. Poll DynamoDB to ensure AnomalyRecord was created (fast-path triggered)
+    // 4. Assert WireMock received the Slack chat.postMessage call with Block Kit
+  });
+});
+```

 **Journey 2: Interactive Remediation**
-1. Send webhook simulating user clicking "Stop Instance" in Slack.
-2. Verify API Gateway → Lambda executes `StopInstances` against LocalStack.
-3. Verify Slack message updates to "Remediation Successful".
+```typescript
+describe('E2E: Interactive Remediation', () => {
+  it('stops EC2 instance when user clicks Stop in Slack', async () => {
+    // 1. Simulate Slack sending interactive webhook payload for "Stop Instance"
+    // 2. Validate HMAC signature in the Lambda behind API Gateway
+    // 3. Verify LocalStack EC2 mock receives StopInstances call
+    // 4. Verify Slack message is updated to "Remediation Successful"
+  });
+});
+```
+
+**Journey 3: Onboarding & First Scan**
+```typescript
+describe('E2E: Onboarding', () => {
+  it('validates IAM role and triggers initial zombie scan', async () => {
+    // 1. Trigger POST /v1/accounts with new role ARN
+    // 2. Verify account marked active
+    // 3. Verify EventBridge Scheduler creates cron for Zombie Hunter
+  });
+});
+```

 ---

 ## Section 6: Performance & Load Testing

+### 6.1 Ingestion & Scoring Throughput
 ```typescript
-describe('Ingestion Throughput', () => {
-  it('processes 500 CloudTrail events/second via SQS FIFO', async () => {});
-  it('DynamoDB baseline updates complete in <20ms p95', async () => {});
+describe('Performance: Alert Storm', () => {
+  it('processes 1000 CloudTrail events/sec without SQS DLQ overflow', async () => {
+    // k6 load test hitting SQS directly
+  });
+  
+  it('DynamoDB baseline updates complete in <20ms p95 under load', async () => {
+    // Ensure Single-Table schema does not create hot partitions
+  });
+
+  it('Anomaly Scorer Lambda consumes <256MB memory during burst', async () => {});
+});
+```
+
+### 6.2 Data Scale Tests
+```typescript
+describe('Performance: Baseline Scale', () => {
+  it('calculates Z-score in <5ms even when observed_actors set exceeds 1000', async () => {});
+  it('handles accounts with 100,000+ daily CostEvents without throttling DynamoDB (On-Demand scaling)', async () => {});
 });
 ```

@@ -184,49 +594,119 @@ describe('Ingestion Throughput', () => {

 ## Section 7: CI/CD Pipeline Integration

- **PR Gate:** Unit tests (<2min), Coverage >85% (Scoring engine >95%).
- **Merge:** Integration tests with LocalStack & Testcontainers DynamoDB.
- **Staging:** E2E journeys against isolated staging AWS account.
+### 7.1 Pipeline Stages
+```
+┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
+│ Pre-Commit  │───▶│ PR Gate  │───▶│ Merge    │───▶│ Staging  │───▶│ Prod     │
+│ (local)     │    │ (CI)     │    │ (CI)     │    │ (CD)     │    │ (CD)     │
+└─────────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
+  lint + type       unit tests      integration     E2E + perf     canary
+  <10s              math prop       Testcontainers  LocalStack     <5 mins
+                    tests <1m       <4 mins         <10 mins
+```
+
+### 7.2 Coverage Gates
+| Component | Threshold |
+|-----------|-----------|
+| Anomaly Scorer (Math) | 100% |
+| CloudTrail Normalizer | 95% |
+| Governance Policy | 95% |
+| Slack Signature Auth | 100% |
+| Overall Pipeline | 85% |

 ---

 ## Section 8: Transparent Factory Tenet Testing

-### 8.1 Atomic Flagging (Circuit Breaker)
+### 8.1 Atomic Flagging
 ```typescript
-it('auto-disables scoring rule if it generates >10 alerts/hour for single tenant', () => {});
+describe('Atomic Flagging', () => {
+  it('auto-disables scoring rule flag if alert volume exceeds 3x baseline in 1hr', () => {});
+  it('buffers suppressed anomalies in SQS DLQ while flag is off', () => {});
+  it('fails CI if any flag TTL exceeds 14 days', () => {});
+  it('evaluates flags strictly locally (in-memory provider)', () => {});
+});
 ```

-### 8.2 Configurable Autonomy (14-Day Auto-Promotion)
+### 8.2 Elastic Schema
 ```typescript
-it('keeps new tenant in strict mode (log-only) for first 14 days', () => {});
-it('auto-promotes to audit mode (auto-alert) on day 15 if false-positive rate < 10%', () => {});
+describe('Elastic Schema', () => {
+  it('rejects DynamoDB table definition modifications that alter key schemas', () => {});
+  it('requires all DynamoDB item updates to use ADD/SET (additive only)', () => {});
+  it('ignores unknown attributes (V2 fields) in V1 CostEvent decoders', () => {});
+});
+```
+
+### 8.3 Cognitive Durability
+```typescript
+describe('Cognitive Durability', () => {
+  it('requires decision_log.json for any PR modifying Z-score thresholds or weights', () => {});
+  it('enforces cyclomatic complexity < 10 for all AnomalyScorer math functions', () => {});
+});
+```
+
+### 8.4 Semantic Observability
+```typescript
+describe('Semantic Observability', () => {
+  it('emits OTEL span for every Anomaly Scoring decision', () => {});
+  it('includes attributes: cost.z_score, cost.anomaly_score, cost.baseline_days', () => {});
+  it('includes cost.fast_path_triggered flag when baseline is bypassed', () => {});
+  it('hashes AWS Account ID in spans to protect PII/tenant identity', () => {});
+});
+```
+
+### 8.5 Configurable Autonomy
+```typescript
+describe('Configurable Autonomy', () => {
+  it('keeps new tenant in Strict Mode (log-only) for first 14 days', () => {});
+  it('auto-promotes to Audit Mode on day 15 if false-positive rate < 10%', () => {});
+  it('Panic Mode halts ALL Slack alerts in <1 second via Redis check', () => {});
+  it('Panic Mode does NOT halt baseline recording (read-only tracking continues)', () => {});
+});
 ```

 ---

 ## Section 9: Test Data & Fixtures

-```
-fixtures/
-  cloudtrail/
-    ec2-runinstances.json
-    rds-create-db.json
-    lambda-create-function.json
-  baselines/
-    mature-steady-spend.json
-    volatile-dev-account.json
-    cold-start.json
+### 9.1 Data Factories
+```typescript
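+// Factories return plain objects. Overrides are spread shallowly, so overriding a
+// nested key (e.g. requestParameters) replaces the whole nested object.
+export const makeCloudTrailEvent = (overrides: Record<string, unknown> = {}) => ({
+  eventVersion: '1.08',
+  // Assumed-role ARNs have the form assumed-role/<role-name>/<session-name>.
+  userIdentity: { type: 'AssumedRole', arn: 'arn:aws:sts::123456789012:assumed-role/ci/session' },
+  eventTime: new Date().toISOString(),
+  eventSource: 'ec2.amazonaws.com',
+  eventName: 'RunInstances',
+  requestParameters: { instanceType: 'm5.large' },
+  ...overrides
+});
+
+// Defaults describe a mature baseline (more than 20 events, more than 14 days).
+export const makeBaseline = (overrides: Record<string, unknown> = {}) => ({
+  meanHourlyCost: 1.25,
+  stdDev: 0.15,
+  eventCount: 45,
+  ageDays: 16,
+  observedActors: ['arn:aws:iam::123456789012:role/ci'],
+  observedInstanceTypes: ['t3.medium', 'm5.large'],
+  ...overrides
+});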
 ```

 ---

 ## Section 10: TDD Implementation Order

-1. **Phase 1:** Anomaly math + Unit tests (Strict TDD).
-2. **Phase 2:** CloudTrail normalizer + Pricing tables.
-3. **Phase 3:** DynamoDB single-table implementation (Integration led).
-4. **Phase 4:** Slack formatting + Remediation Lambda.
-5. **Phase 5:** Governance policies (14-day promotion logic).
+1. **Phase 1: Math & Core Logic (Strict TDD)**
+   - Welford algorithm, Z-score math, Novelty scoring, `fast-check` property tests.
+2. **Phase 2: Ingestion & Normalization**
+   - CloudTrail parsers, pricing static tables, event deduplication.
+3. **Phase 3: Data Persistence (Integration Led)**
+   - DynamoDB Single-Table setup, TransactWriteItems, Testcontainers tests.
+4. **Phase 4: Notifications & Slack Actions**
+   - Block Kit formatting, Slack signature validation, API Gateway endpoints.
+5. **Phase 5: Governance & Tenets**
+   - 14-day promotion logic, Panic mode, OTEL tracing.
+6. **Phase 6: E2E Pipeline**
+   - CDK definitions, LocalStack event injection, wire everything together.

-*End of dd0c/cost Test Architecture*
+*End of dd0c/cost Test Architecture (v2)*