From 1101fef09614a391f5aceb683f20ee3cbb776201 Mon Sep 17 00:00:00 2001
From: Max Mayfield <max@dd0c.net>
Date: Sat, 28 Feb 2026 23:33:07 +0000
Subject: [PATCH] Update test architectures for P3, P4, P5

---
 .../test-architecture/test-architecture.md    | 1420 +++++++++++++++-
 .../test-architecture/test-architecture.md    | 1488 +++++++++++------
 .../test-architecture/test-architecture.md    |  285 +++-
 3 files changed, 2575 insertions(+), 618 deletions(-)

diff --git a/products/03-alert-intelligence/test-architecture/test-architecture.md b/products/03-alert-intelligence/test-architecture/test-architecture.md
index 20826c8..1744ed4 100644
--- a/products/03-alert-intelligence/test-architecture/test-architecture.md
+++ b/products/03-alert-intelligence/test-architecture/test-architecture.md
@@ -1,69 +1,1411 @@
 # dd0c/alert — Test Architecture & TDD Strategy
-**Product:** dd0c/alert (Alert Intelligence Platform)
-**Version:** 2.0 | **Date:** 2026-02-28 | **Phase:** 7 — Test Architecture
-**Stack:** TypeScript / Node.js 20 | Vitest | Testcontainers | LocalStack
+
+**Product:** dd0c/alert — Alert Intelligence Platform
+**Author:** Test Architecture Phase
+**Date:** February 28, 2026
+**Status:** V1 MVP — Solo Founder Scope
 
 ---
 
-## 1. Testing Philosophy & TDD Workflow
+## Section 1: Testing Philosophy & TDD Workflow
 
-### 1.1 Core Principle
+### 1.1 Core Philosophy
 
-dd0c/alert is an intelligence platform — it makes decisions about what engineers see during incidents. A wrong suppression decision can hide a P1. A wrong correlation can create noise. **Tests are not optional; they are the specification.**
+dd0c/alert is a **safety-critical observability tool** — a bug that silently suppresses a real alert during an incident is worse than having no tool at all. The test suite is the contract that guarantees "we will never eat your alerts."
 
-Every behavioral rule in the Correlation Engine, Noise Scorer, and Notification Router must be expressed as a failing test before a single line of implementation is written.
+Guiding principle: **tests describe observable behavior from the on-call engineer's perspective**. If a test can't be explained as "when X happens, the engineer sees Y," it's testing implementation, not behavior.
 
-### 1.2 Red-Green-Refactor Cycle
+For a solo founder, the test suite is also the **regression safety net** — it catches the subtle scoring bugs that would erode customer trust over weeks.
+
+### 1.2 Red-Green-Refactor Adapted to dd0c/alert
 
 ```
-RED    → Write a failing test that describes the desired behavior.
-         The test must fail for the right reason (not a compile error).
+RED   → Write a failing test that describes the desired behavior
+         (e.g., "3 Datadog alerts for the same service within 5 minutes
+          should produce 1 correlated incident")
 
-GREEN  → Write the minimum code to make the test pass.
-         No gold-plating. No "while I'm here" changes.
+GREEN → Write the minimum code to make it pass
+         (hardcode the window, just make it work)
 
-REFACTOR → Clean up the implementation without breaking tests.
-           Extract functions, rename for clarity, remove duplication.
-           Tests stay green throughout.
+REFACTOR → Clean up without breaking tests
+            (extract the window manager, add Redis backing,
+             optimize the fingerprinting)
 ```
 
-**Strict rule:** No implementation code is written without a failing test first. PRs that add implementation without a corresponding test are blocked by CI.
+**When to write tests first (strict TDD):**
+- All correlation logic (time-window clustering, service graph traversal, deploy correlation)
+- All noise scoring algorithms (rule-based scoring, threshold calculations)
+- All HMAC signature validation (security-critical)
+- All fingerprinting/deduplication logic
+- All suppression governance (strict vs. audit mode)
+- All circuit breaker state transitions (suppression DLQ replay)
 
-### 1.3 Test Naming Convention
+**When integration tests lead (test-after, then harden):**
+- Provider webhook parsers — implement against real payload samples, then lock in with contract tests
+- SQS FIFO message ordering — test against LocalStack after implementation
+- Slack message formatting — build the blocks, then snapshot test the output
 
-Tests follow the `given_when_then` pattern using Vitest's `describe`/`it` structure:
+**When E2E tests lead:**
+- The 60-second time-to-value journey — define the happy path first, build backward
+- Weekly noise digest generation — define expected output, then build the aggregation
+
+### 1.3 Test Naming Conventions
 
 ```typescript
+// Unit tests (vitest)
+describe('CorrelationEngine', () => {
+  it('groups alerts for same service within 5min window into single incident', () => {});
+  it('extends window by 2min when alert arrives in last 30 seconds', () => {});
+  it('caps window extension at 15 minutes total', () => {});
+  it('merges downstream service alerts when upstream window is active', () => {});
+});
+
 describe('NoiseScorer', () => {
-  describe('given a deploy-correlated alert window', () => {
-    it('should boost noise score by 25 points when deploy is attached', () => { ... });
-    it('should add 5 additional points when PR title contains "feature-flag"', () => { ... });
-    it('should not boost score above 50 when service matches never-suppress safelist', () => { ... });
+  it('scores deploy-correlated alerts higher when deploy is within 10min', () => {});
+  it('returns zero noise score for first-ever alert from a service', () => {});
+  it('adds 5 points when PR title matches config or feature-flag', () => {});
+});
+
+describe('HmacValidator', () => {
+  it('rejects Datadog webhook with missing DD-WEBHOOK-SIGNATURE header', () => {});
+  it('rejects PagerDuty webhook with tampered body', () => {});
+  it('accepts valid signature and passes payload through', () => {});
+});
+```
+
+**Rules:**
+- Describe the **observable outcome**, not the internal mechanism
+- Use present tense ("groups", "rejects", "scores")
+- If you need "and" in the name, split into two tests
+- Group by component in `describe` blocks
+
+---
+
+## Section 2: Test Pyramid
+
+### 2.1 Ratio
+
+| Level | Target | Count (V1) | Runtime |
+|-------|--------|------------|---------|
+| Unit | 70% | ~350 tests | <30s |
+| Integration | 20% | ~100 tests | <5min |
+| E2E/Smoke | 10% | ~20 tests | <10min |
+
+### 2.2 Unit Test Targets (per component)
+
+| Component | Key Behaviors | Est. Tests |
+|-----------|--------------|------------|
+| Webhook Parsers (Datadog, PD, OpsGenie, Grafana) | Payload normalization, field mapping, batch handling | 60 |
+| HMAC Validator | Signature verification per provider, rejection paths | 20 |
+| Fingerprint Generator | Deterministic hashing, dedup detection | 15 |
+| Correlation Engine | Time-window open/close/extend, service graph merge, deploy correlation | 80 |
+| Noise Scorer | Rule-based scoring, deploy proximity weighting, threshold calculations | 60 |
+| Suggestion Engine | Suppression recommendations, "what would have happened" calculations | 30 |
+| Notification Formatter | Slack block formatting, digest generation, in-place message updates | 25 |
+| Governance Policy | Strict/audit mode enforcement, panic mode, per-customer overrides | 30 |
+| Feature Flags | Circuit breaker on suppression volume, flag lifecycle | 15 |
+| Canonical Schema Mapper | Provider → canonical field mapping, severity normalization | 15 |
+
+### 2.3 Integration Test Boundaries
+
+| Boundary | What's Tested | Infrastructure |
+|----------|--------------|----------------|
+| Lambda → SQS FIFO | Message ordering, dedup, tenant partitioning | LocalStack |
+| SQS → Correlation Engine | Consumer polling, batch processing, error handling | LocalStack |
+| Correlation Engine → Redis | Window CRUD, sorted set operations, TTL expiry | Testcontainers Redis |
+| Correlation Engine → DynamoDB | Incident persistence, tenant config reads | Testcontainers DynamoDB Local |
+| Correlation Engine → TimescaleDB | Time-series writes, continuous aggregate queries | Testcontainers PostgreSQL + TimescaleDB |
+| Notification Service → Slack | Block formatting, rate limiting, message update | WireMock |
+| API Gateway → Lambda | Webhook routing, auth, throttling | LocalStack |
+
+### 2.4 E2E/Smoke Scenarios
+
+1. **60-Second TTV Journey**: Webhook received → alert in Slack within 60s
+2. **Alert Storm Correlation**: 50 alerts in 2 minutes → grouped into 1 incident
+3. **Deploy Correlation**: Deploy event + alert storm → deploy identified as trigger
+4. **Noise Digest**: 7 days of alerts → weekly Slack digest with noise stats
+5. **Multi-Provider Merge**: Datadog + PagerDuty alerts for same service → single incident
+6. **Panic Mode**: Enable panic → all suppression stops → alerts pass through raw
+
+---
+
+## Section 3: Unit Test Strategy
+
+### 3.1 Webhook Parsers
+
+Each provider parser is a pure function: payload in, canonical alert(s) out. No side effects, no DB calls.
+
+```typescript
+// tests/unit/parsers/datadog.test.ts
+describe('DatadogParser', () => {
+  it('normalizes single alert payload to canonical schema', () => {});
+  it('normalizes batched alert array into multiple canonical alerts', () => {});
+  it('maps Datadog P1 to critical, P5 to info', () => {});
+  it('extracts service name from tags array', () => {});
+  it('handles missing optional fields without throwing', () => {});
+  it('generates stable fingerprint from title + service + tenant', () => {});
+});
+
+// tests/unit/parsers/pagerduty.test.ts
+describe('PagerDutyParser', () => {
+  it('normalizes incident.triggered event to canonical alert', () => {});
+  it('normalizes incident.resolved event with resolution metadata', () => {});
+  it('ignores incident.acknowledged events (not alerts)', () => {});
+  it('maps PD urgency high to critical, low to info', () => {});
+});
+
+// tests/unit/parsers/opsgenie.test.ts
+describe('OpsGenieParser', () => {
+  it('normalizes alert.created action to canonical alert', () => {});
+  it('extracts priority P1-P5 and maps to severity', () => {});
+  it('handles custom fields in details object', () => {});
+});
+
+// tests/unit/parsers/grafana.test.ts
+describe('GrafanaParser', () => {
+  it('normalizes Grafana Alertmanager webhook payload', () => {});
+  it('handles multiple alerts in single webhook (Grafana batches)', () => {});
+  it('extracts dashboard URL as context link', () => {});
+});
+```
+
+**Mocking strategy:** None needed — parsers are pure functions. Use recorded payload fixtures from `fixtures/webhooks/{provider}/`.
+
+**Fixture structure:**
+```
+fixtures/webhooks/
+  datadog/
+    single-alert.json
+    batched-alerts.json
+    monitor-recovered.json
+  pagerduty/
+    incident-triggered.json
+    incident-resolved.json
+    incident-acknowledged.json
+  opsgenie/
+    alert-created.json
+    alert-closed.json
+  grafana/
+    single-firing.json
+    multi-firing.json
+    resolved.json
+```
+
+### 3.2 HMAC Validator
+
+```typescript
+describe('HmacValidator', () => {
+  // Datadog uses hex-encoded HMAC-SHA256
+  it('validates correct Datadog DD-WEBHOOK-SIGNATURE header', () => {});
+  it('rejects Datadog webhook with wrong signature', () => {});
+  it('rejects Datadog webhook with missing signature header', () => {});
+
+  // PagerDuty uses v1= prefix with HMAC-SHA256
+  it('validates correct PagerDuty X-PagerDuty-Signature header', () => {});
+  it('rejects PagerDuty webhook with tampered body', () => {});
+
+  // OpsGenie uses different header name
+  it('validates correct OpsGenie X-OpsGenie-Signature header', () => {});
+
+  // Edge cases
+  it('rejects empty body with any signature', () => {});
+  it('compares signatures in constant time to prevent timing attacks', () => {});
+});
+```
+
+**Mocking strategy:** None — crypto operations are deterministic. Use known secret + body + expected signature triples.
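+
+A known secret + body + signature triple can be produced and checked with Node's built-in `crypto` module. The sketch below is a minimal illustration, assuming a hex-encoded HMAC-SHA256 over the raw body (as noted for Datadog above); `computeHmacHex` and `verifyHmacHex` are hypothetical names, and a provider that prefixes its signature (such as PagerDuty's `v1=`) would need that prefix stripped first.
+
+```typescript
+import { createHmac, timingSafeEqual } from 'node:crypto';
+
+// Hypothetical helper: hex-encoded HMAC-SHA256 of the raw request body.
+export function computeHmacHex(rawBody: string, secret: string): string {
+  return createHmac('sha256', secret).update(rawBody, 'utf8').digest('hex');
+}
+
+// Hypothetical helper: true only when the supplied signature matches.
+// Empty bodies and missing signatures are rejected up front.
+export function verifyHmacHex(
+  rawBody: string,
+  signature: string | undefined,
+  secret: string,
+): boolean {
+  if (!rawBody || !signature) return false;
+  const expected = Buffer.from(computeHmacHex(rawBody, secret), 'utf8');
+  const received = Buffer.from(signature, 'utf8');
+  // timingSafeEqual throws on length mismatch, so compare lengths first.
+  if (expected.length !== received.length) return false;
+  return timingSafeEqual(expected, received);
+}
+```
+
+Comparing with `timingSafeEqual` rather than `===` is what the timing-attack test case is meant to pin down.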
+
+### 3.3 Fingerprint Generator
+
+```typescript
+describe('FingerprintGenerator', () => {
+  it('generates deterministic SHA-256 from tenant_id + provider + service + title', () => {});
+  it('produces same fingerprint for identical alerts regardless of timestamp', () => {});
+  it('produces different fingerprints when service differs', () => {});
+  it('normalizes title whitespace before hashing', () => {});
+  it('handles unicode characters in title consistently', () => {});
+});
+```
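+
+The behaviors listed above could be satisfied by a function along these lines. This is a sketch under assumptions: the field order follows the first test name, while the `\u001f` separator, the NFC normalization, and the `fingerprint` name are illustrative choices, not part of the spec.
+
+```typescript
+import { createHash } from 'node:crypto';
+
+interface FingerprintInput {
+  tenantId: string;
+  provider: string;
+  service: string;
+  title: string;
+}
+
+// Hypothetical implementation: SHA-256 over tenant + provider + service + title.
+// Timestamps are deliberately excluded so repeated alerts hash identically.
+export function fingerprint(input: FingerprintInput): string {
+  // NFC keeps composed and decomposed unicode forms identical;
+  // runs of whitespace collapse to a single space.
+  const title = input.title.normalize('NFC').trim().replace(/\s+/g, ' ');
+  // The unit separator (assumed) avoids ambiguity between ("ab", "c") and ("a", "bc").
+  const material = [input.tenantId, input.provider, input.service, title].join('\u001f');
+  return createHash('sha256').update(material, 'utf8').digest('hex');
+}
+```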
+
+### 3.4 Correlation Engine
+
+The most complex component. Heavy use of table-driven tests.
+
+```typescript
+describe('CorrelationEngine', () => {
+  describe('Time-Window Management', () => {
+    it('opens new 5min window on first alert for a service', () => {});
+    it('adds subsequent alerts to existing open window', () => {});
+    it('extends window by 2min when alert arrives in last 30 seconds', () => {});
+    it('caps total window duration at 15 minutes', () => {});
+    it('closes window after timeout with no new alerts', () => {});
+    it('generates incident record when window closes', () => {});
+  });
+
+  describe('Service Graph Correlation', () => {
+    it('merges downstream alerts into upstream window when dependency exists', () => {});
+    it('does not merge alerts for unrelated services', () => {});
+    it('handles circular dependencies without infinite loop', () => {});
+    it('traverses multi-level dependency chains (A→B→C)', () => {});
+  });
+
+  describe('Deploy Correlation', () => {
+    it('tags incident with deploy_id when deploy event within 10min of first alert', () => {});
+    it('does not correlate deploy older than 10 minutes', () => {});
+    it('correlates deploy to correct service even with multiple recent deploys', () => {});
+    it('adds deploy correlation score boost to noise calculation', () => {});
+  });
+
+  describe('Multi-Tenant Isolation', () => {
+    it('never correlates alerts across different tenants', () => {});
+    it('maintains separate windows per tenant', () => {});
+    it('handles concurrent alerts from multiple tenants', () => {});
   });
 });
 ```
 
-Test file naming: `{module}.test.ts` for unit tests, `{module}.integration.test.ts` for integration tests, `{journey}.e2e.test.ts` for E2E.
+**Mocking strategy:**
+- Mock Redis client (`ioredis-mock`) for window state
+- Mock DynamoDB client for service dependency reads
+- Mock SQS for downstream message publishing
+- Use Vitest's `vi.useFakeTimers()` for time-window testing
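+
+The window rules exercised above (5-minute window, 2-minute extension for alerts in the last 30 seconds, 15-minute cap) lend themselves to a pure function plus a case table. The sketch below is one possible shape; `openWindow`, `applyAlert`, and the constant names are hypothetical, and timestamps are plain epoch milliseconds so no clock mocking is needed.
+
+```typescript
+const WINDOW_MS = 5 * 60_000;      // initial window length
+const EXTEND_MS = 2 * 60_000;      // extension granted per late alert
+const EXTEND_ZONE_MS = 30_000;     // "last 30 seconds" of the window
+const MAX_WINDOW_MS = 15 * 60_000; // hard cap on total duration
+
+export interface CorrelationWindow {
+  openedAt: number;
+  closesAt: number;
+}
+
+export function openWindow(now: number): CorrelationWindow {
+  return { openedAt: now, closesAt: now + WINDOW_MS };
+}
+
+// Returns the window after an alert at `alertAt`, or null when the window
+// has already closed and a new one should be opened instead.
+export function applyAlert(w: CorrelationWindow, alertAt: number): CorrelationWindow | null {
+  if (alertAt >= w.closesAt) return null;
+  const inExtendZone = w.closesAt - alertAt <= EXTEND_ZONE_MS;
+  if (!inExtendZone) return w;
+  const cap = w.openedAt + MAX_WINDOW_MS;
+  return { ...w, closesAt: Math.min(w.closesAt + EXTEND_MS, cap) };
+}
+
+// Case table: [description, window, alert time, expected closesAt or null].
+export const windowCases: Array<[string, CorrelationWindow, number, number | null]> = [
+  ['early alert leaves window unchanged', openWindow(0), 60_000, 300_000],
+  ['alert at T+4m31s extends by 2min', openWindow(0), 271_000, 420_000],
+  ['alert after close is not added', openWindow(0), 300_000, null],
+  ['extension is capped at 15min total', { openedAt: 0, closesAt: 840_000 }, 830_000, 900_000],
+];
+```
+
+A table like `windowCases` can be passed to Vitest's `it.each`, which keeps each boundary condition to a single row.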
 
-### 1.4 When Tests Lead (TDD Mandatory)
+### 3.5 Noise Scorer
 
-TDD is **mandatory** for:
-- All noise scoring logic (`src/scoring/`)
-- All correlation rules (`src/correlation/`)
-- All suppression decisions (`src/suppression/`)
-- HMAC validation per provider
-- Canonical schema mapping (every provider parser)
-- Feature flag circuit breaker logic
-- Governance policy enforcement (`policy.json` evaluation)
-- Any function with cyclomatic complexity > 3
+```typescript
+describe('NoiseScorer', () => {
+  describe('Rule-Based Scoring', () => {
+    it('returns 0 for first-ever alert from a service (no history)', () => {});
+    it('scores higher when alert has fired >5 times in 24 hours', () => {});
+    it('scores higher when alert auto-resolved within 5 minutes', () => {});
+    it('adds deploy correlation bonus (+15 points) when deploy is recent', () => {});
+    it('adds feature-flag bonus (+5 points) when PR title matches config/feature-flag', () => {});
+    it('caps total score at 100', () => {});
+    it('never scores critical severity alerts above 80 (safety cap)', () => {});
+  });
 
-TDD is **recommended but not enforced** for:
-- Infrastructure glue code (SQS consumers, DynamoDB adapters)
-- Slack Block Kit message formatting
-- Dashboard API route handlers (covered by integration tests)
+  describe('Threshold Calculations', () => {
+    it('classifies score 0-30 as signal (keep)', () => {});
+    it('classifies score 31-70 as review (annotate)', () => {});
+    it('classifies score 71-100 as noise (suggest suppress)', () => {});
+    it('uses tenant-specific thresholds when configured', () => {});
+  });
 
-### 1.5 Test Ownership
+  describe('What-Would-Have-Happened', () => {
+    it('calculates suppression count for historical window', () => {});
+    it('reports zero false negatives when no suppressed alert was critical', () => {});
+    it('flags false negative when suppressed alert was later escalated', () => {});
+  });
+});
+```
 
-Each epic owns its tests. The Correlation Engine team owns `src/correlation/**/*.test.ts`. No cross-team test ownership. If a test breaks due to a dependency change, the team that changed the dependency fixes the test.
+**Mocking strategy:** Mock the alert history store (DynamoDB queries). Scorer logic itself is pure calculation.
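+
+Because the scorer is pure calculation, the threshold and cap rules above can be expressed without any mocks. The following sketch assumes the default bands from the test names (0-30 signal, 31-70 review, 71-100 noise) and the 80-point cap for critical alerts; the `Thresholds` shape and function names are hypothetical.
+
+```typescript
+export type Verdict = 'signal' | 'review' | 'noise';
+
+export interface Thresholds {
+  reviewAbove: number; // scores above this are at least "review"
+  noiseAbove: number;  // scores above this are "noise"
+}
+
+export const DEFAULT_THRESHOLDS: Thresholds = { reviewAbove: 30, noiseAbove: 70 };
+const CRITICAL_CAP = 80;
+
+// Clamp to 0..100, then apply the safety cap for critical alerts.
+export function capScore(raw: number, severity: string): number {
+  const bounded = Math.max(0, Math.min(100, raw));
+  return severity === 'critical' ? Math.min(bounded, CRITICAL_CAP) : bounded;
+}
+
+// Tenant-specific thresholds can be passed in place of the defaults.
+export function classify(score: number, t: Thresholds = DEFAULT_THRESHOLDS): Verdict {
+  if (score > t.noiseAbove) return 'noise';
+  if (score > t.reviewAbove) return 'review';
+  return 'signal';
+}
+```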
+
+### 3.6 Notification Formatter
+
+```typescript
+describe('NotificationFormatter', () => {
+  describe('Slack Blocks', () => {
+    it('formats single-alert notification with service, title, severity', () => {});
+    it('formats correlated incident with alert count and sources', () => {});
+    it('includes deploy trigger when deploy correlation exists', () => {});
+    it('includes noise score badge (🟢 signal / 🟡 review / 🔴 noise)', () => {});
+    it('includes feedback buttons (👍 Helpful / 👎 Not helpful)', () => {});
+    it('formats in-place update message (replaces initial alert)', () => {});
+  });
+
+  describe('Weekly Digest', () => {
+    it('aggregates 7 days of incidents into summary stats', () => {});
+    it('highlights top 3 noisiest services', () => {});
+    it('shows suppression savings ("would have saved X pages")', () => {});
+  });
+});
+```
+
+**Mocking strategy:** Snapshot tests — render the Slack blocks to JSON and compare against golden fixtures.
+
+### 3.7 Governance Policy Engine
+
+```typescript
+describe('GovernancePolicy', () => {
+  describe('Mode Enforcement', () => {
+    it('in strict mode: annotates alerts but never suppresses', () => {});
+    it('in audit mode: auto-suppresses with full logging', () => {});
+    it('defaults new tenants to strict mode', () => {});
+  });
+
+  describe('Panic Mode', () => {
+    it('when panic=true: all suppression stops immediately', () => {});
+    it('when panic=true: all alerts pass through unmodified', () => {});
+    it('panic mode activatable via Redis key check', () => {});
+    it('panic mode shows banner in dashboard API response', () => {});
+  });
+
+  describe('Per-Customer Override', () => {
+    it('customer can set stricter mode than system default', () => {});
+    it('customer cannot set less restrictive mode than system default', () => {});
+    it('merge logic: max_restrictive(system, customer)', () => {});
+  });
+
+  describe('Policy Decision Logging', () => {
+    it('logs "suppressed by audit mode" with full context', () => {});
+    it('logs "annotation-only, strict mode active" for strict tenants', () => {});
+    it('logs "panic mode active — all alerts passing through"', () => {});
+  });
+});
+```
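+
+The `max_restrictive(system, customer)` merge and the panic override can be sketched as two small functions. This assumes strict (annotate only) ranks as more restrictive than audit (auto-suppress), as the mode descriptions above imply; the type and function names are illustrative.
+
+```typescript
+export type Mode = 'strict' | 'audit';
+export type Action = 'pass_through' | 'annotate' | 'suppress';
+
+// Higher number means more restrictive (less automatic suppression).
+const RESTRICTIVENESS: Record<Mode, number> = { audit: 0, strict: 1 };
+
+// A customer override only takes effect when it is stricter than the system default.
+export function maxRestrictive(system: Mode, customer?: Mode): Mode {
+  if (!customer) return system;
+  return RESTRICTIVENESS[customer] > RESTRICTIVENESS[system] ? customer : system;
+}
+
+// Panic mode wins over everything; otherwise the effective mode decides
+// whether a noisy alert is annotated or suppressed.
+export function decide(
+  system: Mode,
+  customer: Mode | undefined,
+  panic: boolean,
+  isNoise: boolean,
+): Action {
+  if (panic || !isNoise) return 'pass_through';
+  return maxRestrictive(system, customer) === 'strict' ? 'annotate' : 'suppress';
+}
+```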
+
+### 3.8 Feature Flag Circuit Breaker
+
+```typescript
+describe('SuppressionCircuitBreaker', () => {
+  it('allows suppression when volume is within baseline', () => {});
+  it('trips breaker when suppression exceeds 2x baseline over 30min', () => {});
+  it('auto-disables the scoring flag when breaker trips', () => {});
+  it('replays suppressed alerts from DLQ when breaker trips', () => {});
+  it('resets breaker after manual flag re-enable', () => {});
+  it('tracks suppression count per flag in Redis sliding window', () => {});
+});
+```
 
 ---
+
+## Section 4: Integration Test Strategy
+
+### 4.1 Webhook Contract Tests
+
+Each provider integration gets a contract test suite that validates the full path: HTTP request → Lambda → SQS message.
+
+```typescript
+// tests/integration/webhooks/datadog.contract.test.ts
+describe('Datadog Webhook Contract', () => {
+  let localstack: LocalStackContainer;
+  let sqsClient: SQSClient;
+
+  beforeAll(async () => {
+    localstack = await new LocalStackContainer().start();
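+    // AWS SDK v3 needs a region and credentials even against LocalStack; dummy values suffice
+    sqsClient = new SQSClient({
+      endpoint: localstack.getEndpoint(),
+      region: 'us-east-1',
+      credentials: { accessKeyId: 'test', secretAccessKey: 'test' },
+    });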
+    // Create SQS FIFO queue
+    await sqsClient.send(new CreateQueueCommand({
+      QueueName: 'alert-ingested.fifo',
+      Attributes: { FifoQueue: 'true', ContentBasedDeduplication: 'true' }
+    }));
+  });
+
+  it('accepts valid Datadog webhook and produces canonical SQS message', async () => {
+    const payload = loadFixture('webhooks/datadog/single-alert.json');
+    const signature = computeHmac(payload, TEST_SECRET);
+
+    const res = await request(app)
+      .post('/v1/wh/tenant-123/datadog')
+      .set('DD-WEBHOOK-SIGNATURE', signature)
+      .send(payload);
+
+    expect(res.status).toBe(200);
+
+    const messages = await pollSqs(sqsClient, 'alert-ingested.fifo');
+    expect(messages).toHaveLength(1);
+    expect(messages[0].body).toMatchObject({
+      tenant_id: 'tenant-123',
+      provider: 'datadog',
+      severity: expect.stringMatching(/^(critical|high|medium|low|info)$/),
+      fingerprint: expect.stringMatching(/^[a-f0-9]{64}$/),
+    });
+  });
+
+  it('rejects webhook with invalid HMAC and produces no SQS message', async () => {
+    const payload = loadFixture('webhooks/datadog/single-alert.json');
+
+    const res = await request(app)
+      .post('/v1/wh/tenant-123/datadog')
+      .set('DD-WEBHOOK-SIGNATURE', 'bad-signature')
+      .send(payload);
+
+    expect(res.status).toBe(401);
+    const messages = await pollSqs(sqsClient, 'alert-ingested.fifo', { waitMs: 1000 });
+    expect(messages).toHaveLength(0);
+  });
+});
+```
+
+Repeat this pattern for PagerDuty, OpsGenie, and Grafana, each with its provider-specific signature header and payload format.
+
+### 4.2 Correlation Engine → Redis Integration
+
+```typescript
+// tests/integration/correlation/redis-windows.test.ts
+describe('Correlation Engine + Redis', () => {
+  let redis: StartedTestContainer;
+  let redisClient: Redis;
+
+  beforeAll(async () => {
+    redis = await new GenericContainer('redis:7-alpine')
+      .withExposedPorts(6379)
+      .start();
+    redisClient = new Redis({ host: redis.getHost(), port: redis.getMappedPort(6379) });
+  });
+
+  it('opens window in Redis sorted set with correct TTL', async () => {
+    await correlationEngine.processAlert(makeAlert({ service: 'payment-api' }));
+
+    const windows = await redisClient.zrange('windows:tenant-123', 0, -1, 'WITHSCORES');
+    expect(windows).toHaveLength(2); // [windowId, closesAtEpoch]
+    const ttl = await redisClient.ttl('window:tenant-123:payment-api');
+    expect(ttl).toBeGreaterThan(280); // ~5min minus processing time
+  });
+
+  it('extends window when alert arrives in last 30 seconds', async () => {
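+    // Open window, advance the Node clock to T+4m31s, send another alert.
+    // Only Date is faked so Redis socket I/O keeps working. Redis still expires
+    // keys on its own clock, so the engine must derive the new TTL from Date.now().
+    vi.useFakeTimers({ toFake: ['Date'] });
+    await correlationEngine.processAlert(makeAlert({ service: 'payment-api' }));
+    vi.advanceTimersByTime(4 * 60 * 1000 + 31 * 1000);
+    await correlationEngine.processAlert(makeAlert({ service: 'payment-api' }));
+
+    const ttl = await redisClient.ttl('window:tenant-123:payment-api');
+    vi.useRealTimers();
+    expect(ttl).toBeGreaterThan(100); // Extended by ~2min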
+  });
+
+  it('isolates windows between tenants', async () => {
+    await correlationEngine.processAlert(makeAlert({ tenant: 'A', service: 'api' }));
+    await correlationEngine.processAlert(makeAlert({ tenant: 'B', service: 'api' }));
+
+    const windowsA = await redisClient.zrange('windows:A', 0, -1);
+    const windowsB = await redisClient.zrange('windows:B', 0, -1);
+    expect(windowsA).toHaveLength(1);
+    expect(windowsB).toHaveLength(1);
+    expect(windowsA[0]).not.toBe(windowsB[0]);
+  });
+});
+```
+
+### 4.3 Correlation Engine → DynamoDB Integration
+
+```typescript
+// tests/integration/correlation/dynamodb-incidents.test.ts
+describe('Correlation Engine + DynamoDB', () => {
+  let dynamodb: StartedTestContainer;
+
+  beforeAll(async () => {
+    dynamodb = await new GenericContainer('amazon/dynamodb-local:latest')
+      .withExposedPorts(8000)
+      .start();
+    // Create tables: alerts, incidents, tenant_config, service_dependencies
+  });
+
+  it('persists incident record when correlation window closes', async () => {
+    await correlationEngine.processAlert(makeAlert({ service: 'api' }));
+    await correlationEngine.processAlert(makeAlert({ service: 'api' }));
+    await correlationEngine.closeExpiredWindows();
+
+    const incidents = await queryIncidents('tenant-123');
+    expect(incidents).toHaveLength(1);
+    expect(incidents[0].alert_count).toBe(2);
+    expect(incidents[0].services).toContain('api');
+  });
+
+  it('reads service dependencies for cascading correlation', async () => {
+    await putServiceDependency('tenant-123', 'api', 'database');
+    await correlationEngine.processAlert(makeAlert({ service: 'database' }));
+    await correlationEngine.processAlert(makeAlert({ service: 'api' }));
+
+    // Both should be in the same window
+    const windows = await getActiveWindows('tenant-123');
+    expect(windows).toHaveLength(1);
+    expect(windows[0].services).toEqual(expect.arrayContaining(['api', 'database']));
+  });
+});
+```
+
+### 4.4 Correlation Engine → TimescaleDB Integration
+
+```typescript
+// tests/integration/correlation/timescaledb-trends.test.ts
+describe('Correlation Engine + TimescaleDB', () => {
+  let pg: StartedTestContainer;
+
+  beforeAll(async () => {
+    pg = await new GenericContainer('timescale/timescaledb:latest-pg16')
+      .withExposedPorts(5432)
+      .withEnvironment({ POSTGRES_PASSWORD: 'test' })
+      .start();
+    // Run migrations: create hypertables, continuous aggregates
+  });
+
+  it('writes alert frequency data to hypertable', async () => {
+    await correlationEngine.recordAlertEvent(makeAlert({ service: 'api' }));
+    const rows = await query('SELECT * FROM alert_events WHERE service = $1', ['api']);
+    expect(rows).toHaveLength(1);
+  });
+
+  it('continuous aggregate calculates hourly alert counts', async () => {
+    // Insert 10 alerts spread over 2 hours
+    await insertAlertEvents(10, { spreadHours: 2 });
+    await refreshContinuousAggregate('hourly_alert_summary');
+
+    const summary = await query('SELECT * FROM hourly_alert_summary');
+    expect(summary).toHaveLength(2);
+    // node-postgres returns bigint counts as strings, so coerce before summing
+    expect(summary.reduce((s, r) => s + Number(r.alert_count), 0)).toBe(10);
+  });
+});
+```
+
+### 4.5 Notification Service → Slack (WireMock)
+
+```typescript
+// tests/integration/notifications/slack.test.ts
+describe('Notification Service + Slack', () => {
+  let wiremock: WireMockContainer;
+
+  beforeAll(async () => {
+    wiremock = await new WireMockContainer().start();
+    wiremock.stub({
+      request: { method: 'POST', urlPath: '/api/chat.postMessage' },
+      response: { status: 200, body: JSON.stringify({ ok: true, ts: '1234.5678' }) }
+    });
+    wiremock.stub({
+      request: { method: 'POST', urlPath: '/api/chat.update' },
+      response: { status: 200, body: JSON.stringify({ ok: true }) }
+    });
+  });
+
+  it('sends initial alert notification to correct Slack channel', async () => {});
+  it('updates message in-place when correlation completes', async () => {});
+  it('respects Slack rate limits (1 msg/sec per channel)', async () => {});
+  it('retries on 429 with exponential backoff', async () => {});
+  it('includes feedback buttons in correlated incident message', async () => {});
+});
+```
+
+---
+
+## Section 5: E2E & Smoke Tests
+
+### 5.1 Critical User Journeys
+
+**Journey 1: 60-Second Time-to-Value**
+
+The defining test for dd0c/alert. Validates the entire pipeline from webhook to Slack notification.
+
+```typescript
+// tests/e2e/journeys/sixty-second-ttv.test.ts
+describe('60-Second Time-to-Value', () => {
+  it('delivers first correlated incident to Slack within 60 seconds of webhook', async () => {
+    const start = Date.now();
+
+    // 1. Send Datadog webhook
+    await sendWebhook('datadog', fixtures.datadog.singleAlert, { tenant: 'e2e-tenant' });
+
+    // 2. Wait for Slack message
+    const slackMessage = await waitForSlackMessage('e2e-channel', { timeoutMs: 60_000 });
+
+    const elapsed = Date.now() - start;
+    expect(elapsed).toBeLessThan(60_000);
+    expect(slackMessage.text).toContain('New alert');
+    expect(slackMessage.blocks).toBeDefined();
+  }, 90_000); // Vitest's default 5s test timeout is shorter than the 60s budget
+});
+```
+
+**Journey 2: Alert Storm Correlation**
+
+```typescript
+// tests/e2e/journeys/alert-storm.test.ts
+describe('Alert Storm Correlation', () => {
+  it('groups 50 alerts in 2 minutes into a single correlated incident', async () => {
+    // Fire 50 alerts for same service over 2 minutes
+    for (let i = 0; i < 50; i++) {
+      await sendWebhook('datadog', makeAlertPayload({
+        service: 'payment-api',
+        title: `High latency on payment-api (${i})`,
+      }));
+      await sleep(2400); // ~50 alerts in 2 min
+    }
+
+    // Wait for correlation window to close
+    await sleep(5 * 60 * 1000 + 30_000); // 5min window + buffer
+
+    const slackMessages = await getSlackMessages('e2e-channel');
+    const incidentMessages = slackMessages.filter(m => m.text.includes('Incident'));
+    expect(incidentMessages).toHaveLength(1);
+    expect(incidentMessages[0].text).toContain('50 alerts grouped');
+  }, 10 * 60 * 1000); // ~7.5min of sleeps exceeds Vitest's default 5s test timeout
+});
+```
+
+**Journey 3: Deploy Correlation**
+
+```typescript
+// tests/e2e/journeys/deploy-correlation.test.ts
+describe('Deploy Correlation', () => {
+  it('identifies deploy as trigger when alerts follow within 10 minutes', async () => {
+    // 1. Send deploy event
+    await sendWebhook('github-actions', makeDeployPayload({
+      service: 'payment-api',
+      commit: 'abc123',
+      pr_title: 'feat: add retry logic',
+    }));
+
+    // 2. Wait 2 minutes, then fire alerts
+    await sleep(2 * 60 * 1000);
+    await sendWebhook('datadog', makeAlertPayload({ service: 'payment-api' }));
+    await sendWebhook('pagerduty', makeAlertPayload({ service: 'payment-api' }));
+
+    // 3. Wait for correlation
+    await sleep(6 * 60 * 1000);
+
+    const slackMessage = await getLatestSlackMessage('e2e-channel');
+    expect(slackMessage.text).toContain('Deploy #');
+    expect(slackMessage.text).toContain('abc123');
+  }, 10 * 60 * 1000); // ~8min of sleeps exceeds Vitest's default 5s test timeout
+});
+```
+
+**Journey 4: Panic Mode**
+
+```typescript
+// tests/e2e/journeys/panic-mode.test.ts
+describe('Panic Mode', () => {
+  it('stops all suppression immediately when panic mode is activated', async () => {
+    // 1. Enable audit mode, verify suppression works
+    await setGovernanceMode('e2e-tenant', 'audit');
+    await sendNoisyAlerts(10);
+    const beforePanic = await getSlackMessages('e2e-channel');
+    const suppressedBefore = beforePanic.filter(m => m.text.includes('suppressed'));
+
+    // 2. Activate panic mode
+    // Node's fetch rejects relative URLs; APP_BASE_URL is an assumed test constant
+    await fetch(`${APP_BASE_URL}/admin/panic`, { method: 'POST' });
+
+    // 3. Send more alerts — all should pass through
+    await sendNoisyAlerts(10);
+    const afterPanic = await getSlackMessages('e2e-channel');
+    const rawAlerts = afterPanic.filter(m => !m.text.includes('suppressed'));
+    expect(rawAlerts.length).toBeGreaterThanOrEqual(10);
+  });
+});
+```
+
+### 5.2 E2E Infrastructure
+
+```yaml
+# docker-compose.e2e.yml
+services:
+  localstack:
+    image: localstack/localstack:3
+    environment:
+      SERVICES: sqs,s3,dynamodb,apigateway,lambda
+    ports: ["4566:4566"]
+
+  timescaledb:
+    image: timescale/timescaledb:latest-pg16
+    environment:
+      POSTGRES_PASSWORD: test
+    ports: ["5432:5432"]
+
+  redis:
+    image: redis:7-alpine
+    ports: ["6379:6379"]
+
+  wiremock:
+    image: wiremock/wiremock:3
+    ports: ["8080:8080"]
+    volumes:
+      - ./fixtures/wiremock:/home/wiremock/mappings
+
+  app:
+    build: .
+    environment:
+      AWS_ENDPOINT: http://localstack:4566
+      REDIS_URL: redis://redis:6379
+      TIMESCALE_URL: postgres://postgres:test@timescaledb:5432/test
+      SLACK_API_URL: http://wiremock:8080
+    depends_on: [localstack, timescaledb, redis, wiremock]
+```
+
+### 5.3 Synthetic Alert Generation
+
+```typescript
+// tests/e2e/helpers/alert-generator.ts
+import { faker } from '@faker-js/faker';
+import { ulid } from 'ulid';
+
+export function makeAlertPayload(overrides: Partial<AlertPayload> = {}): DatadogWebhookPayload {
+  // Map override fields onto Datadog field names instead of spreading them,
+  // so `service` and `severity` do not leak into the webhook body as unknown keys.
+  return {
+    id: ulid(),
+    title: overrides.title ?? `Alert: ${faker.hacker.phrase()}`,
+    text: faker.lorem.sentence(),
+    date_happened: Math.floor(Date.now() / 1000),
+    priority: overrides.priority ?? 'normal',
+    tags: [`service:${overrides.service ?? 'test-service'}`],
+    alert_type: overrides.severity ?? 'warning',
+  };
+}
+
+export async function sendNoisyAlerts(count: number, opts?: { service?: string }) {
+  for (let i = 0; i < count; i++) {
+    await sendWebhook('datadog', makeAlertPayload({
+      service: opts?.service ?? 'noisy-service',
+      title: `Flapping alert #${i}`,
+    }));
+  }
+}
+```
+
+---
+
+## Section 6: Performance & Load Testing
+
+### 6.1 Alert Ingestion Throughput
+
+```typescript
+// tests/perf/ingestion-throughput.test.ts
+describe('Ingestion Throughput', () => {
+  it('processes 1000 webhooks/second without dropping payloads', async () => {
+    // `k6.run` is assumed to be a project wrapper around the k6 CLI; k6 itself ships no Node API.
+    const results = await k6.run({
+      vus: 100,
+      duration: '30s',
+      thresholds: {
+        http_req_duration: ['p(95)<200'],  // 200ms p95
+        http_req_failed: ['rate<0.001'],  // <0.1% failure
+      },
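+      // The script body executes inside k6, not Node, so Node-side values are interpolated as literals.
+      script: `
+        import http from 'k6/http';
+        export default function() {
+          http.post('${WEBHOOK_URL}/v1/wh/perf-tenant/datadog',
+            ${JSON.stringify(JSON.stringify(makeAlertPayload()))},
+            { headers: { 'DD-WEBHOOK-SIGNATURE': '${validSig}' } }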
+          );
+        }
+      `,
+    });
+    expect(results.metrics.http_req_failed.rate).toBeLessThan(0.001);
+  });
+});
+```
+
+### 6.2 Correlation Latency Under Alert Storms
+
+```typescript
+describe('Correlation Storm Performance', () => {
+  it('correlates 500 alerts across 10 services within 30 seconds', async () => {
+    // Simulate incident storm: 500 alerts, 10 services, 2 minutes
+    await generateAlertStorm({ alerts: 500, services: 10, durationMs: 120_000 });
+
+    // Start timing only after generation finishes, so generator overrun is not counted
+    const stormEnd = Date.now();
+
+    // Wait for all windows to close
+    await waitForIncidents('perf-tenant', { minCount: 1, timeoutMs: 30_000 });
+
+    const elapsed = Date.now() - stormEnd;
+    expect(elapsed).toBeLessThan(30_000);
+  }, 180_000); // explicit timeout: the 2-minute storm exceeds Vitest's 5s default
+
+  it('Redis memory stays under 50MB during 10K active windows', async () => {
+    // Open 10K windows across 100 tenants
+    for (let t = 0; t < 100; t++) {
+      for (let s = 0; s < 100; s++) {
+        await correlationEngine.processAlert(makeAlert({
+          tenant: `tenant-${t}`,
+          service: `service-${s}`,
+        }));
+      }
+    }
+    const memoryUsage = await redisClient.info('memory');
+    const usedMb = parseRedisMemory(memoryUsage);
+    expect(usedMb).toBeLessThan(50);
+  });
+});
+```
+
+### 6.3 Noise Scoring Latency
+
+```typescript
+describe('Noise Scoring Performance', () => {
+  it('scores a correlated incident with 50 alerts in <100ms', async () => {
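+    // `alert_count` is the field name used by the Section 9.2 factory. That factory has no
+    // history option, so historical data for the scorer is assumed to be seeded separately.
+    const incident = makeIncident({ alert_count: 50 });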
+    
+    const start = performance.now();
+    const score = await noiseScorer.score(incident);
+    const elapsed = performance.now() - start;
+    
+    expect(elapsed).toBeLessThan(100);
+    expect(score).toBeGreaterThanOrEqual(0);
+    expect(score).toBeLessThanOrEqual(100);
+  });
+});
+```
+
+### 6.4 Memory Pressure During High-Cardinality Correlation
+
+```typescript
+describe('Memory Pressure', () => {
+  it('ECS task stays under 512MB with 1000 concurrent correlation windows', async () => {
+    // Proxy for ECS task memory: resident set size of this process. The container limit
+    // applies to total resident memory, not only the V8 heap.
+    const memBefore = process.memoryUsage().rss;
+
+    await processHighCardinalityAlerts({ tenants: 100, servicesPerTenant: 10 });
+
+    const memAfter = process.memoryUsage().rss;
+    const deltaMb = (memAfter - memBefore) / 1024 / 1024;
+    expect(deltaMb).toBeLessThan(256); // Leave headroom in 512MB task
+  });
+});
+```
+
+---
+
+## Section 7: CI/CD Pipeline Integration
+
+### 7.1 Pipeline Stages
+
+```
+┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
+│ Pre-Commit  │───▶│ PR Gate  │───▶│ Merge    │───▶│ Staging  │───▶│ Prod     │
+│ (local)     │    │ (CI)     │    │ (CI)     │    │ (CD)     │    │ (CD)     │
+└─────────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
+  lint + format     unit tests      full suite      E2E + perf     smoke + canary
+  type check        integration     coverage gate   LocalStack     deploy event
+  <10s              <5min           <10min          <15min         self-dogfood
+```
+
+### 7.2 Stage Details
+
+**Pre-Commit (local, <10s):**
+- `eslint` + `prettier` format check
+- `tsc --noEmit` type check
+- Affected unit tests only (`vitest --changed`)
+
+**PR Gate (CI, <5min):**
+- Full unit test suite
+- Integration tests (Testcontainers spin up in CI)
+- Schema migration lint (no DROP/RENAME/TYPE changes)
+- Decision log presence check for scoring/correlation PRs
+- Coverage diff: new code must have ≥80% coverage
+
+**Merge to Main (CI, <10min):**
+- Full test suite (unit + integration)
+- Coverage gate: overall ≥80%, scoring engine ≥90%
+- CDK synth + diff (infrastructure changes)
+- Security scan (`npm audit`, `trivy`)
+
+**Staging (CD, <15min):**
+- Deploy to staging environment
+- E2E journey tests against LocalStack
+- Performance benchmarks (ingestion throughput, correlation latency)
+- Synthetic alert generation + validation
+
+**Production (CD):**
+- Canary deploy (10% traffic for 5 minutes)
+- Smoke tests (send test webhook, verify Slack delivery)
+- dd0c/alert dogfoods itself: deploy event sent to own webhook
+- Automated rollback if error rate >1% during canary
+
+### 7.3 Coverage Thresholds
+
+| Component | Minimum | Target |
+|-----------|---------|--------|
+| Webhook Parsers | 90% | 95% |
+| HMAC Validator | 95% | 100% |
+| Correlation Engine | 85% | 90% |
+| Noise Scorer | 90% | 95% |
+| Governance Policy | 90% | 95% |
+| Notification Formatter | 75% | 85% |
+| Overall | 80% | 85% |
+
+### 7.4 Test Parallelization
+
+```yaml
+# .github/workflows/test.yml
+jobs:
+  unit:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        shard: [1, 2, 3, 4]
+    steps:
+      - run: vitest --shard=${{ matrix.shard }}/4
+
+  integration:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        suite: [webhooks, correlation, notifications, storage]
+    steps:
+      # Positional filter: runs only test files whose path contains the suite name
+      - run: vitest --project=integration ${{ matrix.suite }}
+
+  e2e:
+    needs: [unit, integration]
+    runs-on: ubuntu-latest
+    steps:
+      - run: docker compose -f docker-compose.e2e.yml up -d
+      - run: vitest --project=e2e
+```
+
+---
+
+## Section 8: Transparent Factory Tenet Testing
+
+### 8.1 Atomic Flagging — Suppression Circuit Breaker
+
+```typescript
+describe('Atomic Flagging', () => {
+  describe('Flag Lifecycle', () => {
+    it.todo('new scoring rule flag defaults to false (off)');
+    it.todo('flag has owner and ttl metadata');
+    it.todo('CI blocks when flag at 100% exceeds 14-day TTL');
+  });
+
+  describe('Circuit Breaker on Suppression Volume', () => {
+    it.todo('allows suppression when volume is within 2x baseline');
+    it.todo('trips breaker when suppression exceeds 2x baseline over 30min');
+    it.todo('auto-disables the flag when breaker trips');
+    it.todo('buffers suppressed alerts in DLQ during normal operation');
+    it('replays DLQ alerts when breaker trips', async () => {
+      // 1. Enable scoring flag, suppress 20 alerts
+      // 2. Trip the breaker by spiking suppression rate
+      // 3. Verify all 20 suppressed alerts are re-emitted from DLQ
+      // 4. Verify flag is now disabled
+    });
+    it.todo('DLQ retains alerts for 1 hour before expiry');
+  });
+
+  describe('Local Evaluation', () => {
+    it.todo('flag evaluation does not make network calls');
+    it.todo('flag state is cached in-memory and refreshed every 60s');
+  });
+});
+```
+
+### 8.2 Elastic Schema — Migration Validation
+
+```typescript
+describe('Elastic Schema', () => {
+  describe('Migration Lint', () => {
+    it('rejects migration with DROP COLUMN statement', () => {
+      const migration = 'ALTER TABLE alert_events DROP COLUMN old_field;';
+      expect(lintMigration(migration)).toContainError('DROP not allowed');
+    });
+    it('rejects migration with ALTER COLUMN TYPE', () => {
+      const migration = 'ALTER TABLE alert_events ALTER COLUMN severity TYPE integer;';
+      expect(lintMigration(migration)).toContainError('TYPE change not allowed');
+    });
+    it.todo('rejects migration with RENAME COLUMN');
+    it('accepts migration with ADD COLUMN (nullable)', () => {
+      const migration = 'ALTER TABLE alert_events ADD COLUMN noise_score_v2 integer;';
+      expect(lintMigration(migration)).toBeValid();
+    });
+    it.todo('accepts migration with new table creation');
+  });
+
+  describe('DynamoDB Schema', () => {
+    it.todo('rejects attribute type change in table definition');
+    it.todo('accepts new attribute addition');
+    it.todo('V1 code ignores V2 attributes without error');
+  });
+
+  describe('Sunset Enforcement', () => {
+    it('every migration file contains sunset_date comment', () => {
+      const migrations = glob.sync('migrations/*.sql');
+      for (const m of migrations) {
+        const content = fs.readFileSync(m, 'utf-8');
+        expect(content).toMatch(/-- sunset_date: \d{4}-\d{2}-\d{2}/);
+      }
+    });
+    it.todo('CI warns when migration is past sunset date');
+  });
+});
+```
+
+### 8.3 Cognitive Durability — Decision Log Validation
+
+```typescript
+describe('Cognitive Durability', () => {
+  it('decision_log.json exists for every PR touching scoring/', () => {
+    // CI hook: check git diff for files in src/scoring/
+    // If touched, require docs/decisions/*.json in the same PR
+  });
+
+  it('decision log has required fields', () => {
+    const logs = glob.sync('docs/decisions/*.json');
+    for (const log of logs) {
+      const entry = JSON.parse(fs.readFileSync(log, 'utf-8'));
+      expect(entry).toHaveProperty('reasoning');
+      expect(entry).toHaveProperty('alternatives_considered');
+      expect(entry).toHaveProperty('confidence');
+      expect(entry).toHaveProperty('timestamp');
+      expect(entry).toHaveProperty('author');
+    }
+  });
+
+  it('cyclomatic complexity stays under 10 for all scoring functions', () => {
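+    // Run eslint with the complexity rule. execSync returns only stdout and throws on a
+    // non-zero exit code, so the assertion is that the call does not throw.
+    expect(() => execSync('eslint src/scoring/ --rule "complexity: [error, 10]"')).not.toThrow();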
+  });
+});
+```
+
+### 8.4 Semantic Observability — OTEL Span Assertions
+
+```typescript
+describe('Semantic Observability', () => {
+  let spanExporter: InMemorySpanExporter;
+
+  beforeEach(() => {
+    spanExporter = new InMemorySpanExporter();
+    // Configure OTEL with in-memory exporter for testing
+  });
+
+  describe('Alert Evaluation Spans', () => {
+    it('emits parent alert_evaluation span for each alert', async () => {
+      await processAlert(makeAlert());
+      const spans = spanExporter.getFinishedSpans();
+      const evalSpan = spans.find(s => s.name === 'alert_evaluation');
+      expect(evalSpan).toBeDefined();
+    });
+
+    it('emits child noise_scoring span with score attributes', async () => {
+      await processAlert(makeAlert());
+      const spans = spanExporter.getFinishedSpans();
+      const scoreSpan = spans.find(s => s.name === 'noise_scoring');
+      expect(scoreSpan).toBeDefined();
+      expect(scoreSpan.attributes['alert.noise_score']).toBeGreaterThanOrEqual(0);
+      expect(scoreSpan.attributes['alert.noise_score']).toBeLessThanOrEqual(100);
+    });
+
+    it('emits child correlation_matching span with match data', async () => {
+      await processAlert(makeAlert());
+      const spans = spanExporter.getFinishedSpans();
+      const corrSpan = spans.find(s => s.name === 'correlation_matching');
+      expect(corrSpan).toBeDefined();
+      expect(corrSpan.attributes).toHaveProperty('alert.correlation_matches');
+    });
+
+    it('emits suppression_decision span with reason', async () => {
+      await processAlert(makeAlert());
+      const spans = spanExporter.getFinishedSpans();
+      const suppSpan = spans.find(s => s.name === 'suppression_decision');
+      expect(suppSpan).toBeDefined();
+      expect(suppSpan.attributes).toHaveProperty('alert.suppressed');
+      expect(suppSpan.attributes).toHaveProperty('alert.suppression_reason');
+    });
+  });
+
+  describe('PII Protection', () => {
+    it('never includes raw alert payload in span attributes', async () => {
+      await processAlert(makeAlert({ title: 'User john@example.com failed login' }));
+      const spans = spanExporter.getFinishedSpans();
+      for (const span of spans) {
+        const attrs = JSON.stringify(span.attributes);
+        expect(attrs).not.toContain('john@example.com');
+      }
+    });
+
+    it('uses hashed alert source identifier, not raw', async () => {
+      await processAlert(makeAlert({ source: 'prod-payment-api' }));
+      const spans = spanExporter.getFinishedSpans();
+      const evalSpan = spans.find(s => s.name === 'alert_evaluation');
+      expect(evalSpan.attributes['alert.source']).toMatch(/^[a-f0-9]+$/);
+    });
+  });
+});
+```
+
+### 8.5 Configurable Autonomy — Governance Policy Tests
+
+```typescript
+describe('Configurable Autonomy', () => {
+  describe('Governance Mode Enforcement', () => {
+    it('strict mode: annotates but never suppresses', async () => {
+      setPolicy({ governance_mode: 'strict' });
+      const result = await processNoisyAlert(makeAlert({ noiseScore: 95 }));
+      expect(result.suppressed).toBe(false);
+      expect(result.annotation).toContain('noise_score: 95');
+    });
+
+    it('audit mode: auto-suppresses with logging', async () => {
+      setPolicy({ governance_mode: 'audit' });
+      const result = await processNoisyAlert(makeAlert({ noiseScore: 95 }));
+      expect(result.suppressed).toBe(true);
+      expect(result.log).toContain('suppressed by audit mode');
+    });
+  });
+
+  describe('Panic Mode', () => {
+    it('activates in <1 second via API call', async () => {
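+      const start = Date.now();
+      // APP_URL is an assumed base URL for the app under test; Node's fetch rejects relative URLs
+      await fetch(`${APP_URL}/admin/panic`, { method: 'POST' });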
+      const panicActive = await redisClient.get('dd0c:panic');
+      expect(Date.now() - start).toBeLessThan(1000);
+      expect(panicActive).toBe('true');
+    });
+
+    it('stops all suppression when active', async () => {
+      await activatePanic();
+      const results = await Promise.all(
+        Array.from({ length: 10 }, () => processNoisyAlert(makeAlert({ noiseScore: 99 })))
+      );
+      expect(results.every(r => r.suppressed === false)).toBe(true);
+    });
+  });
+
+  describe('Per-Customer Override', () => {
+    it('customer strict overrides system audit', async () => {
+      setPolicy({ governance_mode: 'audit' });
+      setCustomerPolicy('tenant-123', { governance_mode: 'strict' });
+      const result = await processNoisyAlert(makeAlert({ tenant: 'tenant-123', noiseScore: 95 }));
+      expect(result.suppressed).toBe(false);
+    });
+
+    it('customer cannot downgrade from system strict to audit', async () => {
+      setPolicy({ governance_mode: 'strict' });
+      setCustomerPolicy('tenant-123', { governance_mode: 'audit' });
+      const result = await processNoisyAlert(makeAlert({ tenant: 'tenant-123', noiseScore: 95 }));
+      expect(result.suppressed).toBe(false); // System strict wins
+    });
+  });
+});
+```
+
+---
+
+## Section 9: Test Data & Fixtures
+
+### 9.1 Directory Structure
+
+```
+tests/
+  fixtures/
+    webhooks/
+      datadog/
+        single-alert.json
+        batched-alerts.json
+        monitor-recovered.json
+        high-priority.json
+      pagerduty/
+        incident-triggered.json
+        incident-resolved.json
+        incident-acknowledged.json
+      opsgenie/
+        alert-created.json
+        alert-closed.json
+      grafana/
+        single-firing.json
+        multi-firing.json
+        resolved.json
+    deploys/
+      github-actions-success.json
+      github-actions-failure.json
+      gitlab-ci-pipeline.json
+      argocd-sync.json
+    scenarios/
+      alert-storm-50-alerts.json
+      cascading-failure-3-services.json
+      flapping-alert-10-cycles.json
+      maintenance-window-suppression.json
+      deploy-correlated-incident.json
+    slack/
+      initial-alert-blocks.json
+      correlated-incident-blocks.json
+      weekly-digest-blocks.json
+    schemas/
+      canonical-alert.json
+      incident-record.json
+      tenant-config.json
+```
+
+### 9.2 Alert Payload Factory
+
+```typescript
+// tests/helpers/factories.ts
+export function makeCanonicalAlert(overrides: Partial<CanonicalAlert> = {}): CanonicalAlert {
+  return {
+    alert_id: ulid(),
+    tenant_id: overrides.tenant_id ?? 'test-tenant',
+    provider: overrides.provider ?? 'datadog',
+    service: overrides.service ?? 'test-service',
+    title: overrides.title ?? `Alert: ${faker.hacker.phrase()}`,
+    severity: overrides.severity ?? 'warning',
+    fingerprint: overrides.fingerprint ?? crypto.randomBytes(32).toString('hex'),
+    timestamp: overrides.timestamp ?? new Date().toISOString(),
+    raw_payload_s3_key: overrides.raw_payload_s3_key ?? `raw/${ulid()}.json`,
+    metadata: overrides.metadata ?? {},
+    ...overrides,
+  };
+}
+
+export function makeIncident(overrides: Partial<Incident> = {}): Incident {
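+  const alertCount = overrides.alert_count ?? 5;
+  const tenantId = overrides.tenant_id ?? 'test-tenant';
+  const services = overrides.services ?? ['test-service'];
+  return {
+    incident_id: ulid(),
+    tenant_id: tenantId,
+    services,
+    alert_count: alertCount,
+    // Generated alerts inherit the incident's tenant and rotate through its services
+    alerts: Array.from({ length: alertCount }, (_, i) => makeCanonicalAlert({
+      tenant_id: tenantId,
+      service: services[i % services.length] ?? 'test-service',
+    })),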
+    noise_score: overrides.noise_score ?? 0,
+    deploy_correlation: overrides.deploy_correlation ?? null,
+    window_opened_at: overrides.window_opened_at ?? new Date().toISOString(),
+    window_closed_at: overrides.window_closed_at ?? new Date().toISOString(),
+    ...overrides,
+  };
+}
+
+export function makeDeployEvent(overrides: Partial<DeployEvent> = {}): DeployEvent {
+  return {
+    deploy_id: ulid(),
+    tenant_id: overrides.tenant_id ?? 'test-tenant',
+    service: overrides.service ?? 'test-service',
+    commit_sha: overrides.commit_sha ?? faker.git.commitSha(),
+    pr_title: overrides.pr_title ?? faker.git.commitMessage(),
+    deployed_at: overrides.deployed_at ?? new Date().toISOString(),
+    provider: overrides.provider ?? 'github-actions',
+    ...overrides,
+  };
+}
+```
+
+### 9.3 Noise Scenario Fixtures
+
+```typescript
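+// tests/helpers/scenarios.ts
+// Offset helper used by cascadingFailure below. Assumption: the argument is seconds from now.
+const t = (seconds: number) => new Date(Date.now() + seconds * 1000).toISOString();
+
+export const NOISE_SCENARIOS = {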
+  alertStorm: {
+    description: '50 alerts for same service in 2 minutes',
+    alerts: Array.from({ length: 50 }, (_, i) => makeCanonicalAlert({
+      service: 'payment-api',
+      title: `High latency variant ${i}`,
+      timestamp: new Date(Date.now() + i * 2400).toISOString(),
+    })),
+    expectedIncidents: 1,
+    expectedNoiseScore: { min: 70, max: 95 },
+  },
+
+  flappingAlert: {
+    description: 'Alert fires and resolves 10 times in 1 hour',
+    alerts: Array.from({ length: 20 }, (_, i) => makeCanonicalAlert({
+      service: 'health-check',
+      title: 'Health check failed',
+      severity: i % 2 === 0 ? 'warning' : 'info', // alternating fire/resolve
+      timestamp: new Date(Date.now() + i * 3 * 60 * 1000).toISOString(),
+    })),
+    expectedNoiseScore: { min: 80, max: 100 },
+  },
+
+  cascadingFailure: {
+    description: 'Database fails, then API, then frontend',
+    alerts: [
+      makeCanonicalAlert({ service: 'database', severity: 'critical', timestamp: t(0) }),
+      makeCanonicalAlert({ service: 'api', severity: 'high', timestamp: t(30) }),
+      makeCanonicalAlert({ service: 'api', severity: 'high', timestamp: t(45) }),
+      makeCanonicalAlert({ service: 'frontend', severity: 'medium', timestamp: t(60) }),
+      makeCanonicalAlert({ service: 'frontend', severity: 'medium', timestamp: t(90) }),
+    ],
+    serviceDependencies: [['api', 'database'], ['frontend', 'api']],
+    expectedIncidents: 1, // All merged via dependency graph
+    expectedNoiseScore: { min: 0, max: 30 }, // Real incident, not noise
+  },
+
+  deployCorrelated: {
+    description: 'Deploy followed by alert storm',
+    deploy: makeDeployEvent({ service: 'payment-api', pr_title: 'feat: add retry logic' }),
+    alerts: Array.from({ length: 8 }, () => makeCanonicalAlert({
+      service: 'payment-api',
+      severity: 'high',
+    })),
+    deployToAlertGapMs: 2 * 60 * 1000, // 2 minutes after deploy
+    expectedNoiseScore: { min: 50, max: 85 }, // Deploy correlation boosts noise score
+  },
+};
+```
+
+---
+
+## Section 10: TDD Implementation Order
+
+### 10.1 Bootstrap Sequence
+
+The test infrastructure itself must be built before any product code. This is the order:
+
+```
+Phase 0: Test Infrastructure (Week 0)
+  ├── 0.1 vitest config + TypeScript setup
+  ├── 0.2 Testcontainers helper (Redis, DynamoDB Local, TimescaleDB)
+  ├── 0.3 LocalStack helper (SQS, S3, API Gateway)
+  ├── 0.4 Fixture loader utility
+  ├── 0.5 Factory functions (makeCanonicalAlert, makeIncident, makeDeployEvent)
+  ├── 0.6 WireMock Slack stub
+  └── 0.7 CI pipeline with test stages
+```
+
+### 10.2 Epic-by-Epic TDD Order
+
+```
+Phase 1: Webhook Ingestion (Epic 1) — Tests First
+  ├── 1.1 RED: HMAC validator tests (all providers)
+  ├── 1.2 GREEN: Implement HMAC validation
+  ├── 1.3 RED: Datadog parser tests (single + batch)
+  ├── 1.4 GREEN: Implement Datadog parser
+  ├── 1.5 RED: PagerDuty parser tests
+  ├── 1.6 GREEN: Implement PagerDuty parser
+  ├── 1.7 RED: Fingerprint generator tests
+  ├── 1.8 GREEN: Implement fingerprinting
+  ├── 1.9 INTEGRATION: Lambda → SQS contract test
+  └── 1.10 REFACTOR: Extract provider parser interface
+
+Phase 2: Correlation Engine (Epic 2) — Tests First
+  ├── 2.1 RED: Time-window open/close/extend tests
+  ├── 2.2 GREEN: Implement window manager
+  ├── 2.3 RED: Service graph correlation tests
+  ├── 2.4 GREEN: Implement dependency traversal
+  ├── 2.5 RED: Deploy correlation tests
+  ├── 2.6 GREEN: Implement deploy tracker
+  ├── 2.7 INTEGRATION: Correlation → Redis window tests
+  ├── 2.8 INTEGRATION: Correlation → DynamoDB incident persistence
+  └── 2.9 INTEGRATION: Correlation → TimescaleDB trend writes
+
+Phase 3: Noise Analysis (Epic 3) — Tests First
+  ├── 3.1 RED: Rule-based noise scoring tests (all rules)
+  ├── 3.2 GREEN: Implement scorer
+  ├── 3.3 RED: Threshold classification tests
+  ├── 3.4 GREEN: Implement classifier
+  ├── 3.5 RED: "What would have happened" calculation tests
+  ├── 3.6 GREEN: Implement historical analysis
+  └── 3.7 REFACTOR: Extract scoring rules into configurable pipeline
+
+Phase 4: Notifications (Epic 4) — Integration Tests Lead
+  ├── 4.1 Implement Slack block formatter
+  ├── 4.2 RED: Snapshot tests for all message formats
+  ├── 4.3 INTEGRATION: Notification → Slack (WireMock)
+  ├── 4.4 RED: Rate limiting tests
+  └── 4.5 GREEN: Implement rate limiter
+
+Phase 5: Governance (Epic 10) — Tests First
+  ├── 5.1 RED: Strict/audit mode enforcement tests
+  ├── 5.2 GREEN: Implement policy engine
+  ├── 5.3 RED: Panic mode tests (<1s activation)
+  ├── 5.4 GREEN: Implement panic mode
+  ├── 5.5 RED: Circuit breaker + DLQ replay tests
+  ├── 5.6 GREEN: Implement circuit breaker
+  ├── 5.7 RED: OTEL span assertion tests
+  └── 5.8 GREEN: Instrument all components
+
+Phase 6: E2E Validation
+  ├── 6.1 60-second TTV journey
+  ├── 6.2 Alert storm correlation journey
+  ├── 6.3 Deploy correlation journey
+  ├── 6.4 Panic mode journey
+  └── 6.5 Performance benchmarks
+```
+
+### 10.3 "Never Ship Without" Checklist
+
+Before any release, these tests must pass:
+
+- [ ] All HMAC validation tests (security gate)
+- [ ] All correlation window tests (correctness gate)
+- [ ] All noise scoring tests (safety gate — never eat real alerts)
+- [ ] All governance policy tests (compliance gate)
+- [ ] Circuit breaker DLQ replay test (safety net gate)
+- [ ] 60-second TTV E2E journey (product promise gate)
+- [ ] PII protection span tests (privacy gate)
+- [ ] Schema migration lint (no breaking changes)
+- [ ] Coverage ≥80% overall, ≥90% on scoring engine
+
+---
+
+*End of dd0c/alert Test Architecture*
diff --git a/products/04-lightweight-idp/test-architecture/test-architecture.md b/products/04-lightweight-idp/test-architecture/test-architecture.md
index c7e0417..da06895 100644
--- a/products/04-lightweight-idp/test-architecture/test-architecture.md
+++ b/products/04-lightweight-idp/test-architecture/test-architecture.md
@@ -1,623 +1,1109 @@
 # dd0c/portal — Test Architecture & TDD Strategy
-**Product:** Lightweight Internal Developer Portal
-**Phase:** 6 — Architecture Design
-**Date:** 2026-02-28
-**Status:** Draft
+
+**Product:** dd0c/portal — Lightweight Internal Developer Platform
+**Author:** Test Architecture Phase
+**Date:** February 28, 2026
+**Status:** V1 MVP — Solo Founder Scope
 
 ---
 
-## 1. Testing Philosophy & TDD Workflow
+## Section 1: Testing Philosophy & TDD Workflow
 
-### Core Principle
+### 1.1 Core Philosophy
 
-dd0c/portal's most critical logic — ownership inference, discovery reconciliation, and confidence scoring — is pure algorithmic code with well-defined inputs and outputs. This is ideal TDD territory. The test suite is the specification.
+dd0c/portal is a **trust-critical catalog tool** — if auto-discovery assigns a service to the wrong team, or misses a service entirely, the platform loses credibility instantly. The >80% auto-discovery accuracy target from the party mode review is a hard gate, not a suggestion.
 
-The product's >80% discovery accuracy target is not a QA metric — it's a product promise. Tests enforce it continuously.
+Guiding principle: **tests validate what the platform engineer sees in the catalog**. Every test should map to a visible outcome — a service appearing, an ownership assignment, a scorecard grade.
 
-### Red-Green-Refactor Adapted to This Product
+### 1.2 Red-Green-Refactor Adapted to dd0c/portal
 
 ```
-RED   → Write a failing test that encodes a discovery heuristic or ownership rule
-GREEN → Write the minimum code to pass it (no clever abstractions yet)
-REFACTOR → Clean up once the rule is proven correct against real-world fixtures
+RED   → Write a failing test that describes the desired catalog state
+         (e.g., "after scanning an AWS account with 3 ECS services,
+          the catalog should contain 3 services with correct names")
+
+GREEN → Write the minimum code to make it pass
+
+REFACTOR → Extract the discovery logic, add confidence scoring,
+            optimize the scan parallelism
 ```
 
-**Adapted cycle for discovery heuristics:**
+**When to write tests first (strict TDD):**
+- All ownership inference logic (CODEOWNERS parsing, git blame weighting, signal merging)
+- All service reconciliation (AWS + GitHub cross-referencing)
+- All confidence scoring calculations
+- All governance policy enforcement (strict suggest-only vs. audit auto-mutate)
+- All phantom service quarantine logic
 
-1. Capture a real-world failure case (e.g., "Lambda functions named `payment-*` were not grouped into a service")
-2. Write a unit test encoding the expected grouping behavior using a fixture of that Lambda response
-3. Fix the heuristic
-4. Add the fixture to the regression suite permanently
+**When integration tests lead:**
+- AWS scanner (implement against LocalStack, then lock in contract tests)
+- GitHub GraphQL scanner (implement against recorded responses, then contract test)
+- Meilisearch indexing (build the index, then test search relevance)
 
-This means every production accuracy bug becomes a permanent test. The test suite grows as a living record of every edge case the discovery engine has encountered.
+**When E2E tests lead:**
+- 5-minute auto-discovery journey — define the expected catalog state, build backward
+- Cmd+K search experience — define expected search results, then build the index
 
-### When to Write Tests First vs. Integration Tests Lead
+### 1.3 Test Naming Conventions
 
-| Scenario | Approach | Rationale |
-|----------|----------|-----------|
-| Ownership scoring algorithm | Unit-first TDD | Pure function, deterministic, no I/O |
-| Discovery heuristics (CFN → service mapping) | Unit-first TDD | Deterministic logic over fixture data |
-| GitHub GraphQL query construction | Unit-first TDD | Query builder logic is pure |
-| AWS API pagination handling | Integration-first | Behavior depends on real API shape |
-| Meilisearch index sync | Integration-first | Depends on Meilisearch document model |
-| DynamoDB schema migrations | Integration-first | Requires real DynamoDB Local behavior |
-| WebSocket progress events | E2E-first | Requires full pipeline to be meaningful |
-| Stripe webhook handling | Integration-first | Depends on Stripe event payload shape |
+```python
+# Python unit tests (pytest) — AWS/GitHub scanners
+class TestAWSScanner:
+    def test_discovers_ecs_services_from_cluster_listing(self): ...
+    def test_groups_resources_by_cloudformation_stack_name(self): ...
+    def test_assigns_confidence_095_to_cfn_stack_services(self): ...
 
-### Test Naming Conventions
+class TestOwnershipInference:
+    def test_codeowners_signal_weighted_040(self): ...
+    def test_top_committer_signal_weighted_030(self): ...
+    def test_returns_ambiguous_when_top_scores_tied_under_050(self): ...
+```
 
-All tests follow the pattern: `[unit under test]_[scenario]_[expected outcome]`
-
-**TypeScript/Node.js (Jest):**
 ```typescript
-describe('OwnershipInferenceEngine', () => {
-  describe('scoreOwnership', () => {
-    it('returns_primary_owner_when_codeowners_present_with_high_confidence', () => {})
-    it('marks_service_unowned_when_top_score_below_threshold', () => {})
-    it('marks_service_ambiguous_when_top_two_scores_within_tolerance', () => {})
-  })
-})
-```
+// TypeScript tests (vitest) — API, frontend
+describe('CatalogAPI', () => {
+  it('returns services sorted by confidence score descending', () => {});
+  it('filters services by team ownership', () => {});
+});
 
-**Python (pytest):**
-```python
-class TestOwnershipScorer:
-    def test_codeowners_signal_weighted_highest_among_all_signals(self): ...
-    def test_git_blame_frequency_used_when_codeowners_absent(self): ...
-    def test_confidence_below_threshold_flags_service_as_unowned(self): ...
+describe('OwnershipInference', () => {
+  it('merges CODEOWNERS + git blame + PR reviewer signals', () => {});
+  it('flags service as ambiguous when confidence < 0.50', () => {});
+});
 ```
 
-**File naming:**
-- Unit tests: `*.test.ts` / `test_*.py` co-located with source
-- Integration tests: `*.integration.test.ts` / `test_*_integration.py` in `tests/integration/`
-- E2E tests: `tests/e2e/*.spec.ts` (Playwright)
-
 ---
 
-## 2. Test Pyramid
+## Section 2: Test Pyramid
 
-### Recommended Ratio: 70 / 20 / 10
+### 2.1 Ratio
 
-```
-         ┌─────────────┐
-         │   E2E / Smoke│  10%  (~30 tests)
-         │  (Playwright)│       Critical user journeys only
-         ├─────────────┤
-         │ Integration  │  20%  (~80 tests)
-         │  (real deps) │       Service boundaries, API contracts
-         ├─────────────┤
-         │    Unit      │  70%  (~280 tests)
-         │  (pure logic)│       All heuristics, scoring, parsing
-         └─────────────┘
-```
+| Level | Target | Count (V1) | Runtime |
+|-------|--------|------------|---------|
+| Unit | 70% | ~300 tests | <30s |
+| Integration | 20% | ~85 tests | <5min |
+| E2E/Smoke | 10% | ~15 tests | <10min |
 
-### Unit Test Targets (per component)
+### 2.2 Unit Test Targets
 
-| Component | Language | Test Framework | Target Coverage |
-|-----------|----------|---------------|----------------|
-| AWS Scanner (heuristics) | Python | pytest | 90% |
-| GitHub Scanner (parsers) | Node.js | Jest | 90% |
-| Reconciliation Engine | Node.js | Jest | 85% |
-| Ownership Inference | Python | pytest | 95% |
-| Portal API (route handlers) | Node.js | Jest + Supertest | 80% |
-| Search proxy + cache logic | Node.js | Jest | 85% |
-| Slack Bot command handlers | Node.js | Jest | 80% |
-| Feature flag evaluation | Node.js/Python | Jest/pytest | 95% |
-| Governance policy engine | Node.js | Jest | 95% |
-| Schema migration validators | Node.js | Jest | 100% |
+| Component | Key Behaviors | Est. Tests |
+|-----------|--------------|------------|
+| AWS Scanner (CloudFormation, ECS, Lambda, RDS) | Resource enumeration, tag extraction, service grouping | 50 |
+| GitHub Scanner (repos, CODEOWNERS, workflows) | GraphQL parsing, CODEOWNERS parsing, CI/CD target extraction | 40 |
+| Reconciliation Engine | AWS↔GitHub cross-reference, confidence scoring, dedup | 35 |
+| Ownership Inference | Signal weighting, ambiguity detection, team resolution | 40 |
+| Catalog API | CRUD, search, filtering, pagination | 30 |
+| Governance Policy | Strict/audit modes, panic mode, per-team overrides | 25 |
+| Feature Flags | Phantom quarantine circuit breaker, flag lifecycle | 15 |
+| Scorecard Engine (V1 basic) | Criteria evaluation, grade calculation | 20 |
+| Template Engine | Service template generation from catalog data | 15 |
+| Slack Bot | Command parsing, response formatting | 30 |
 
-### Integration Test Boundaries
+### 2.3 Integration Test Boundaries
 
-| Boundary | What to Test | Tool |
-|----------|-------------|------|
-| Discovery → GitHub API | GraphQL query shape, pagination, rate limit handling | MSW (mock service worker) or nock |
-| Discovery → AWS APIs | boto3 call sequences, pagination, error handling | moto (AWS mock library) |
-| Reconciler → PostgreSQL | Upsert logic, conflict resolution, RLS enforcement | Testcontainers (PostgreSQL) |
-| Inference → PostgreSQL | Ownership write, confidence update, correction propagation | Testcontainers (PostgreSQL) |
-| API → Meilisearch | Index sync, search query construction, tenant filter injection | Meilisearch test instance (Docker) |
-| API → Redis | Cache set/get/invalidation, TTL behavior | ioredis-mock or Testcontainers (Redis) |
-| Slack Bot → Portal API | Command → search → format response | Supertest against local API |
-| Stripe webhook → API | Subscription activation, plan change, cancellation | Stripe CLI webhook forwarding |
+| Boundary | What's Tested | Infrastructure |
+|----------|--------------|----------------|
+| AWS Scanner → AWS APIs | STS assume role, CloudFormation, ECS, Lambda, RDS listing | LocalStack |
+| GitHub Scanner → GitHub API | GraphQL queries, rate limiting, pagination | WireMock (recorded responses) |
+| Reconciler → PostgreSQL | Service upsert, ownership writes, conflict resolution | Testcontainers PostgreSQL |
+| API → PostgreSQL | Catalog queries, tenant isolation, search | Testcontainers PostgreSQL |
+| API → Meilisearch | Index sync, full-text search, faceted filtering | Testcontainers Meilisearch |
+| API → Redis | Session management, cache invalidation, rate limiting | Testcontainers Redis |
+| Slack Bot → Slack API | Command handling, block formatting | WireMock |
+| Step Functions → Lambdas | Discovery orchestration flow | LocalStack |
 
-### E2E / Smoke Test Scenarios
+### 2.4 E2E/Smoke Scenarios
 
-1. Full onboarding: GitHub OAuth → AWS connection → discovery trigger → catalog populated
-2. Cmd+K search returns results in <200ms after discovery
-3. Ownership correction propagates to similar services
-4. Slack `/dd0c who owns` returns correct owner
-5. Discovery accuracy: synthetic org with known ground truth scores >80%
-6. Governance strict mode: discovery populates pending queue, not catalog directly
-7. Panic mode: all catalog writes return 503
+1. **5-Minute Miracle**: Connect AWS + GitHub → auto-discover services → catalog populated with >80% accuracy
+2. **Cmd+K Search**: Type service name → results appear in <200ms with correct ranking
+3. **Ownership Assignment**: Discover services → infer ownership → correct team assigned
+4. **Phantom Quarantine**: Bad discovery rule → phantom services quarantined, not added to catalog
+5. **Panic Mode**: Enable panic → all discovery halts → catalog frozen read-only
 
 ---
 
-## 3. Unit Test Strategy (Per Component)
+## Section 3: Unit Test Strategy
 
-### 3.1 AWS Scanner (Python / pytest)
-
-**What to test:**
-- Resource-to-service grouping heuristics (the core logic)
-- Confidence score assignment per signal type
-- Pagination handling for each AWS API
-- Cross-region scan aggregation
-- Error handling for throttling, missing permissions, empty accounts
-
-**Key test cases:**
+### 3.1 AWS Scanner
 
 ```python
-# tests/unit/test_cfn_scanner.py
+# tests/unit/scanners/test_aws_scanner.py
 
 class TestCloudFormationScanner:
-    def test_stack_name_becomes_service_name_with_high_confidence(self):
-        # Given a CFN stack named "payment-api"
-        # Expect service entity with name="payment-api", confidence=0.95
+    def test_lists_all_stacks_with_pagination(self): ...
+    def test_extracts_service_name_from_stack_name(self): ...
+    def test_maps_stack_resources_to_service_components(self): ...
+    def test_assigns_confidence_095_to_cfn_discovered_services(self): ...
+    def test_handles_deleted_stacks_gracefully(self): ...
+    def test_extracts_service_team_project_tags(self): ...
 
-    def test_stack_tags_extracted_as_service_metadata(self):
-        # Given stack with tags {"service": "payment", "team": "payments"}
-        # Expect service.metadata includes both tags
-
-    def test_stacks_in_multiple_regions_deduplicated_by_name(self):
-        # Given same stack name in us-east-1 and us-west-2
-        # Expect single service entity with both regions in infrastructure
-
-    def test_deleted_stacks_excluded_from_results(self):
-        # Given stack with status DELETE_COMPLETE
-        # Expect it is not included in discovered services
-
-    def test_pagination_fetches_all_stacks_beyond_first_page(self):
-        # Given mock returning 2 pages of stacks
-        # Expect all stacks from both pages are processed
+class TestECSScanner:
+    def test_lists_all_clusters_and_services(self): ...
+    def test_extracts_container_image_from_task_definition(self): ...
+    def test_maps_ecs_service_to_cfn_stack_when_tagged(self): ...
+    def test_standalone_ecs_service_without_cfn_gets_confidence_070(self): ...
+    def test_handles_empty_cluster_without_error(self): ...
 
 class TestLambdaScanner:
-    def test_lambdas_with_shared_prefix_grouped_into_single_service(self):
-        # Given ["payment-webhook", "payment-processor", "payment-refund"]
-        # Expect single service "payment" with confidence=0.60
+    def test_lists_all_functions_with_pagination(self): ...
+    def test_extracts_api_gateway_event_source_mapping(self): ...
+    def test_links_lambda_to_api_gateway_route(self): ...
+    def test_standalone_lambda_without_trigger_still_discovered(self): ...
 
-    def test_lambda_with_apigw_trigger_gets_higher_confidence(self):
-        # Given Lambda with API Gateway event source mapping
-        # Expect confidence=0.85 (not 0.60)
+class TestRDSScanner:
+    def test_lists_rds_instances_with_tags(self): ...
+    def test_maps_database_to_service_by_naming_prefix(self): ...
+    def test_maps_database_to_service_by_cfn_stack_membership(self): ...
+    def test_marks_rds_as_infrastructure_not_service(self): ...
 
-    def test_standalone_lambda_without_prefix_pattern_kept_as_individual(self):
-        # Given Lambda named "data-export-job" with no siblings
-        # Expect individual service entity, not grouped
-
-class TestServiceGroupingHeuristics:
-    def test_cfn_stack_takes_priority_over_ecs_service_for_same_name(self):
-        # Given CFN stack "payment-api" AND ECS service "payment-api"
-        # Expect single service entity (not duplicate), source=cloudformation
-
-    def test_explicit_github_repo_tag_overrides_name_matching(self):
-        # Given AWS resource with tag github_repo="acme/payments-v2"
-        # Expect repo_link="acme/payments-v2" with confidence=0.95
-        # (not fuzzy name match result)
+class TestSTSRoleAssumption:
+    def test_assumes_cross_account_role_with_external_id(self): ...
+    def test_raises_clear_error_on_role_not_found(self): ...
+    def test_raises_clear_error_on_invalid_external_id(self): ...
+    def test_caches_credentials_until_expiry(self): ...
 ```
 
-**Mocking strategy:**
-- Use `moto` to mock all boto3 calls — no real AWS calls in unit tests
-- Fixture files in `tests/fixtures/aws/` contain realistic API response payloads
-- Each fixture named after the scenario: `cfn_stacks_multi_region.json`, `lambda_functions_with_apigw.json`
+**Mocking strategy:** Unit tests mock AWS APIs with the `moto` library; integration tests run against LocalStack.
+
+### 3.2 GitHub Scanner
 
 ```python
-@pytest.fixture
-def mock_aws(aws_credentials):
-    with mock_cloudformation(), mock_ecs(), mock_lambda_():
-        yield
+# tests/unit/scanners/test_github_scanner.py
 
-def test_full_scan_produces_expected_service_count(mock_aws, cfn_fixture):
-    setup_mock_cfn_stacks(cfn_fixture)
-    result = AWSScanner(tenant_id="test", role_arn="arn:aws:iam::123:role/test").scan()
-    assert len(result.services) == cfn_fixture["expected_service_count"]
+class TestRepoScanner:
+    def test_lists_active_non_archived_non_forked_repos(self): ...
+    def test_extracts_primary_language(self): ...
+    def test_extracts_top_5_committers(self): ...
+    def test_batches_graphql_queries_at_100_repos_per_call(self): ...
+    def test_handles_rate_limit_with_retry_after(self): ...
+    def test_paginates_through_large_orgs(self): ...
+
+class TestCodeownersParser:
+    def test_parses_team_ownership_from_codeowners(self): ...
+    def test_handles_wildcard_pattern_matching(self): ...
+    def test_handles_multiple_owners_per_path(self): ...
+    def test_returns_empty_when_codeowners_missing(self): ...
+    def test_handles_comment_lines_and_blank_lines(self): ...
+    def test_resolves_github_team_to_display_name(self): ...
+
+class TestWorkflowParser:
+    def test_extracts_ecs_deploy_action_target(self): ...
+    def test_extracts_lambda_deploy_action_target(self): ...
+    def test_links_repo_to_aws_service_by_task_definition_name(self): ...
+    def test_handles_matrix_strategy_with_multiple_targets(self): ...
+    def test_ignores_non_deploy_workflows(self): ...
+
+class TestReadmeExtractor:
+    def test_extracts_first_descriptive_paragraph(self): ...
+    def test_skips_badges_and_header_images(self): ...
+    def test_returns_empty_for_missing_readme(self): ...
+    def test_truncates_at_500_characters(self): ...
 ```
 
----
+**Mocking strategy:** Unit tests replay recorded GraphQL responses stored in `fixtures/github/`, with the `responses` library mocking the HTTP layer.
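+
+The `TestCodeownersParser` cases above (comment lines, blank lines, multiple owners per path, missing file) suggest a parser shaped roughly like the following. This is a minimal sketch, not the project's implementation: `CodeownersRule` and `parse_codeowners` are hypothetical names, and the sketch only treats full-line `#` comments as comments.
+
+```python
+from typing import NamedTuple, Optional
+
+
+class CodeownersRule(NamedTuple):
+    """One CODEOWNERS line: a path pattern and its owners (hypothetical type)."""
+    pattern: str
+    owners: list[str]
+
+
+def parse_codeowners(text: Optional[str]) -> list[CodeownersRule]:
+    """Parse CODEOWNERS text into rules, preserving file order."""
+    if not text:
+        return []  # missing or empty file: no ownership signal
+    rules = []
+    for raw_line in text.splitlines():
+        line = raw_line.strip()
+        if not line or line.startswith("#"):
+            continue  # skip blank lines and full-line comments
+        pattern, *owners = line.split()
+        rules.append(CodeownersRule(pattern, owners))
+    return rules
+```
+
+Matching a file path against the parsed patterns (wildcards, last-match-wins) is a separate step that this sketch leaves out.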
 
-### 3.2 GitHub Scanner (Node.js / Jest)
-
-**What to test:**
-- GraphQL query construction and batching
-- CODEOWNERS file parsing (all valid formats)
-- README first-paragraph extraction
-- Deploy workflow target extraction
-- Rate limit detection and backoff
-
-**Key test cases:**
-
-```typescript
-// tests/unit/github-scanner/codeowners-parser.test.ts
-
-describe('CODEOWNERSParser', () => {
-  it('parses_simple_wildcard_ownership_to_team', () => {
-    const input = '* @acme/platform-team'
-    expect(parse(input)).toEqual([{ pattern: '*', owners: ['@acme/platform-team'] }])
-  })
-
-  it('parses_path_specific_ownership', () => {
-    const input = '/src/payments/ @acme/payments-team'
-    expect(parse(input)).toEqual([{ pattern: '/src/payments/', owners: ['@acme/payments-team'] }])
-  })
-
-  it('handles_multiple_owners_per_pattern', () => {
-    const input = '*.ts @acme/frontend @acme/platform'
-    expect(parse(input).owners).toHaveLength(2)
-  })
-
-  it('ignores_comment_lines', () => {
-    const input = '# This is a comment\n* @acme/team'
-    expect(parse(input)).toHaveLength(1)
-  })
-
-  it('returns_empty_array_for_missing_codeowners_file', () => {
-    expect(parse(null)).toEqual([])
-  })
-
-  it('handles_individual_user_ownership_not_just_teams', () => {
-    const input = '* @sarah-chen'
-    expect(parse(input)[0].owners[0]).toBe('@sarah-chen')
-  })
-})
-
-describe('READMEExtractor', () => {
-  it('extracts_first_non_heading_non_badge_paragraph', () => {
-    const readme = `# Payment Gateway\n\n![build](badge.svg)\n\nHandles Stripe checkout flows.`
-    expect(extractDescription(readme)).toBe('Handles Stripe checkout flows.')
-  })
-
-  it('returns_null_when_readme_has_only_headings_and_badges', () => {
-    const readme = `# Title\n\n![badge](url)`
-    expect(extractDescription(readme)).toBeNull()
-  })
-})
-
-describe('WorkflowTargetExtractor', () => {
-  it('extracts_ecs_service_name_from_deploy_workflow', () => {
-    const yaml = loadFixture('deploy-workflow-ecs.yml')
-    expect(extractDeployTarget(yaml)).toEqual({
-      type: 'ecs_service',
-      name: 'payment-api',
-      cluster: 'production'
-    })
-  })
-
-  it('extracts_lambda_function_name_from_serverless_deploy', () => {
-    const yaml = loadFixture('deploy-workflow-lambda.yml')
-    expect(extractDeployTarget(yaml)).toEqual({
-      type: 'lambda_function',
-      name: 'payment-webhook-handler'
-    })
-  })
-})
-```
-
-**Mocking strategy:**
-- Use `nock` or `msw` to intercept GitHub GraphQL API calls
-- Fixture files in `tests/fixtures/github/` for realistic API responses
-- Test the GraphQL query builder separately from the HTTP client
-
----
-
-### 3.3 Reconciliation Engine (Node.js / Jest)
-
-**What to test:**
-- Cross-referencing AWS resources with GitHub repos (all 5 matching rules)
-- Deduplication when multiple signals point to the same service
-- Conflict resolution when signals disagree
-- Batch processing of SQS messages
-
-**Key test cases:**
-
-```typescript
-describe('ReconciliationEngine', () => {
-  describe('matchAWSToGitHub', () => {
-    it('explicit_tag_match_takes_highest_priority', () => {
-      const awsService = buildAWSService({ tags: { github_repo: 'acme/payment-gateway' } })
-      const ghRepo = buildGHRepo({ name: 'payment-gateway', org: 'acme' })
-      const result = reconcile([awsService], [ghRepo])
-      expect(result[0].repoLinkSource).toBe('explicit_tag')
-      expect(result[0].repoLinkConfidence).toBe(0.95)
-    })
-
-    it('deploy_workflow_match_used_when_no_explicit_tag', () => {
-      const awsService = buildAWSService({ name: 'payment-api' })
-      const ghRepo = buildGHRepo({ deployTarget: 'payment-api' })
-      const result = reconcile([awsService], [ghRepo])
-      expect(result[0].repoLinkSource).toBe('deploy_workflow')
-    })
-
-    it('fuzzy_name_match_used_as_fallback', () => {
-      const awsService = buildAWSService({ name: 'payment-service' })
-      const ghRepo = buildGHRepo({ name: 'payment-svc' })
-      const result = reconcile([awsService], [ghRepo])
-      expect(result[0].repoLinkSource).toBe('name_match')
-      expect(result[0].repoLinkConfidence).toBe(0.75)
-    })
-
-    it('no_match_produces_aws_only_service_entity', () => {
-      const awsService = buildAWSService({ name: 'legacy-monolith' })
-      const result = reconcile([awsService], [])
-      expect(result[0].repoUrl).toBeNull()
-      expect(result[0].discoverySources).toContain('cloudformation')
-      expect(result[0].discoverySources).not.toContain('github_repo')
-    })
-
-    it('deduplicates_cfn_stack_and_ecs_service_with_same_name', () => {
-      const cfnService = buildAWSService({ source: 'cloudformation', name: 'payment-api' })
-      const ecsService = buildAWSService({ source: 'ecs_service', name: 'payment-api' })
-      const result = reconcile([cfnService, ecsService], [])
-      expect(result).toHaveLength(1)
-      expect(result[0].discoverySources).toContain('cloudformation')
-      expect(result[0].discoverySources).toContain('ecs_service')
-    })
-  })
-})
-```
-
----
-
-### 3.4 Ownership Inference Engine (Python / pytest)
-
-This is the highest-value unit test target. Ownership inference is the most complex logic and the most likely source of accuracy failures.
-
-**Key test cases:**
+### 3.3 Reconciliation Engine
 
 ```python
-class TestOwnershipScorer:
-    def test_codeowners_weighted_highest_at_0_40(self):
-        signals = [Signal(type='codeowners', team='payments', raw_score=1.0)]
-        result = score_ownership(signals)
-        assert result['payments'].weighted_score == pytest.approx(0.40)
+# tests/unit/test_reconciler.py
 
-    def test_multiple_signals_summed_correctly(self):
-        signals = [
-            Signal(type='codeowners', team='payments', raw_score=1.0),      # 0.40
-            Signal(type='cfn_tag', team='payments', raw_score=1.0),          # 0.20
-            Signal(type='git_blame_frequency', team='payments', raw_score=1.0), # 0.25
-        ]
-        result = score_ownership(signals)
-        assert result['payments'].total_score == pytest.approx(0.85)
-
-    def test_primary_owner_is_highest_scoring_team(self):
-        signals = [
-            Signal(type='codeowners', team='payments', raw_score=1.0),
-            Signal(type='git_blame_frequency', team='platform', raw_score=1.0),
-        ]
-        result = score_ownership(signals)
-        assert result.primary_owner == 'payments'
-
-    def test_service_marked_unowned_when_top_score_below_0_50(self):
-        signals = [Signal(type='git_blame_frequency', team='unknown', raw_score=0.3)]
-        result = score_ownership(signals)
-        assert result.status == 'unowned'
-
-    def test_service_marked_ambiguous_when_top_two_within_0_10(self):
-        signals = [
-            Signal(type='codeowners', team='payments', raw_score=0.8),
-            Signal(type='codeowners', team='platform', raw_score=0.75),
-        ]
-        result = score_ownership(signals)
-        assert result.status == 'ambiguous'
-
-    def test_user_correction_overrides_all_inference_with_score_1_00(self):
-        signals = [
-            Signal(type='codeowners', team='payments', raw_score=1.0),
-            Signal(type='user_correction', team='platform', raw_score=1.0),
-        ]
-        result = score_ownership(signals)
-        assert result.primary_owner == 'platform'
-        assert result.primary_confidence == 1.00
-        assert result.primary_source == 'user_correction'
-
-    def test_correction_propagation_applies_to_matching_repo_prefix(self):
-        correction = Correction(repo='payment-gateway', team='payments')
-        candidates = ['payment-processor', 'payment-webhook', 'auth-service']
-        propagated = propagate_correction(correction, candidates)
-        assert 'payment-processor' in propagated
-        assert 'payment-webhook' in propagated
-        assert 'auth-service' not in propagated
+class TestReconciler:
+    def test_matches_github_repo_to_aws_service_by_deploy_target(self): ...
+    def test_matches_github_repo_to_aws_service_by_naming_convention(self): ...
+    def test_merges_aws_and_github_metadata_into_single_service(self): ...
+    def test_deduplicates_services_discovered_from_multiple_sources(self): ...
+    def test_assigns_higher_confidence_when_both_sources_agree(self): ...
+    def test_creates_separate_services_when_no_cross_reference_found(self): ...
+    def test_preserves_manual_overrides_during_rescan(self): ...
+    def test_marks_previously_discovered_service_as_stale_when_missing(self): ...
 ```
 
----
+### 3.4 Ownership Inference
 
-### 3.5 Portal API — Route Handlers (Node.js / Jest + Supertest)
+Ownership inference is the highest-risk logic in the product, so it is tested exhaustively.
 
-**What to test:**
-- Tenant isolation enforcement (tenant_id injected into every query)
-- Search endpoint proxies to Meilisearch with mandatory tenant filter
-- PATCH /services enforces correction logging
-- Auth middleware rejects unauthenticated requests
+```python
+# tests/unit/test_ownership_inference.py
+
+class TestOwnershipInference:
+    # Signal weighting
+    def test_codeowners_signal_weighted_040(self): ...
+    def test_top_committer_signal_weighted_030(self): ...
+    def test_pr_reviewer_signal_weighted_020(self): ...
+    def test_aws_tag_signal_weighted_010(self): ...
+
+    # Confidence calculation
+    def test_single_strong_signal_produces_moderate_confidence(self): ...
+    def test_multiple_agreeing_signals_produce_high_confidence(self): ...
+    def test_conflicting_signals_produce_low_confidence(self): ...
+    def test_returns_ambiguous_when_top_scores_tied(self): ...
+    def test_returns_ambiguous_when_confidence_under_050(self): ...
+    def test_flags_unowned_when_no_signals_found(self): ...
+
+    # Edge cases
+    def test_handles_individual_owner_not_in_any_team(self): ...
+    def test_handles_deleted_github_team(self): ...
+    def test_handles_repo_with_single_committer(self): ...
+    def test_handles_repo_with_no_codeowners_file(self): ...
+    def test_manual_override_always_wins_regardless_of_signals(self): ...
+
+    # Table-driven: signal combinations
+    @pytest.mark.parametrize("signals,expected_team,expected_confidence", [
+        ({"codeowners": "team-a", "committers": "team-a", "reviewers": "team-a"}, "team-a", 0.90),
+        ({"codeowners": "team-a", "committers": "team-b", "reviewers": "team-a"}, "team-a", 0.60),
+        ({"codeowners": None, "committers": "team-b", "reviewers": "team-b"}, "team-b", 0.50),
+        ({"codeowners": "team-a", "committers": "team-b", "reviewers": "team-c"}, None, None),  # ambiguous
+    ])
+    def test_signal_combination_produces_expected_ownership(self, signals, expected_team, expected_confidence): ...
+```
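+
+The weights and parametrized rows above are consistent with a simple additive scoring rule, sketched below. The function name `infer_owner`, the integer-percent representation, and the choice to sum weights per team are assumptions made for illustration; the source only fixes the four weights, the 0.50 threshold, and the expected outcomes.
+
+```python
+from typing import Optional
+
+# Signal weights from the tests above, as integer percentages so that
+# sums stay exact (0.4 + 0.3 + 0.2 is not exactly 0.9 in floating point).
+SIGNAL_WEIGHTS = {"codeowners": 40, "committers": 30, "reviewers": 20, "aws_tag": 10}
+AMBIGUITY_THRESHOLD = 50  # confidence below 0.50 is treated as ambiguous
+
+
+def infer_owner(signals: dict[str, Optional[str]]) -> tuple[Optional[str], Optional[float]]:
+    """Return (team, confidence), or (None, None) when ownership is ambiguous."""
+    scores: dict[str, int] = {}
+    for signal, team in signals.items():
+        if team is not None:
+            scores[team] = scores.get(team, 0) + SIGNAL_WEIGHTS[signal]
+    if not scores:
+        return None, None  # no signals at all
+    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
+    top_team, top_score = ranked[0]
+    tied = len(ranked) > 1 and ranked[1][1] == top_score
+    if tied or top_score < AMBIGUITY_THRESHOLD:
+        return None, None
+    return top_team, top_score / 100
+```
+
+This sketch returns `(None, None)` both for "no signals" and for "ambiguous"; the test list above treats unowned and ambiguous as distinct outcomes, so a real implementation would need a richer result type.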
+
+### 3.5 Catalog API
 
 ```typescript
-describe('GET /api/v1/services/search', () => {
-  it('injects_tenant_id_filter_into_meilisearch_query', async () => {
-    const spy = jest.spyOn(meilisearchClient, 'search')
-    await request(app).get('/api/v1/services/search?q=payment').set('Authorization', `Bearer ${tenantAToken}`)
-    expect(spy).toHaveBeenCalledWith(expect.objectContaining({
-      filter: expect.stringContaining(`tenant_id = '${TENANT_A_ID}'`)
-    }))
-  })
+// tests/unit/api/catalog.test.ts
+describe('CatalogAPI', () => {
+  describe('Service CRUD', () => {
+    it('creates service with all required fields', () => {});
+    it('returns 404 for non-existent service', () => {});
+    it('updates service metadata without overwriting ownership', () => {});
+    it('soft-deletes service (marks stale, does not remove)', () => {});
+  });
 
-  it('returns_401_when_no_auth_token_provided', async () => {
-    const res = await request(app).get('/api/v1/services/search?q=payment')
-    expect(res.status).toBe(401)
-  })
+  describe('Search & Filtering', () => {
+    it('returns services sorted by confidence descending', () => {});
+    it('filters by team ownership', () => {});
+    it('filters by language', () => {});
+    it('filters by discovery source (aws/github/manual)', () => {});
+    it('paginates with cursor-based pagination', () => {});
+  });
 
-  it('tenant_a_cannot_see_tenant_b_services', async () => {
-    // Seed Meilisearch with services for both tenants
-    // Query as tenant A, assert no tenant B results
-  })
-})
-
-describe('PATCH /api/v1/services/:id', () => {
-  it('stores_correction_in_corrections_table', async () => {
-    await request(app)
-      .patch(`/api/v1/services/${SERVICE_ID}`)
-      .send({ team_id: NEW_TEAM_ID })
-      .set('Authorization', `Bearer ${adminToken}`)
-    const correction = await db.corrections.findFirst({ where: { service_id: SERVICE_ID } })
-    expect(correction).toBeDefined()
-    expect(correction.new_value).toMatchObject({ team_id: NEW_TEAM_ID })
-  })
-
-  it('sets_confidence_to_1_00_on_user_correction', async () => {
-    await request(app).patch(`/api/v1/services/${SERVICE_ID}`).send({ team_id: NEW_TEAM_ID })
-    const ownership = await db.service_ownership.findFirst({ where: { service_id: SERVICE_ID } })
-    expect(ownership.confidence).toBe(1.00)
-    expect(ownership.source).toBe('user_correction')
-  })
-})
+  describe('Tenant Isolation', () => {
+    it('never returns services from another tenant', () => {});
+    it('enforces tenant_id on all queries', () => {});
+  });
+});
 ```
 
-### 3.6 Slack Bot Command Handlers (Node.js / Jest)
+### 3.6 Governance Policy Engine
 
-**What to test:**
-- Command parsing (`/dd0c who owns <service>`)
-- Typo tolerance matching logic (delegated to search, but bot needs to handle 0 results)
-- Block kit message formatting
-- Error handling (unauthorized workspace, missing service)
+```typescript
+describe('GovernancePolicy', () => {
+  describe('Mode Enforcement', () => {
+    it('strict mode: discovery populates pending review queue', () => {});
+    it('strict mode: never auto-mutates catalog', () => {});
+    it('audit mode: auto-applies discoveries with logging', () => {});
+    it('defaults new tenants to strict mode', () => {});
+  });
 
-### 3.7 Feature Flags & Governance Policy (Node.js / Jest)
+  describe('Panic Mode', () => {
+    it('halts all discovery scans when panic=true', () => {});
+    it('freezes catalog as read-only', () => {});
+    it('API returns 503 for write operations during panic', () => {});
+    it('shows maintenance banner in API response headers', () => {});
+  });
 
-**What to test:**
-- Flag evaluation (`openfeature` provider)
-- Governance strict vs. audit mode
-- Panic mode blocking writes
+  describe('Per-Team Override', () => {
+    it('team can lock services to strict even when system is audit', () => {});
+    it('team cannot downgrade from system strict to audit', () => {});
+    it('merge logic: max_restrictive(system, team)', () => {});
+  });
+});
+```
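+
+The per-team override rule, `max_restrictive(system, team)`, can be expressed by ordering the modes by restrictiveness and taking the maximum. The sketch below is written in Python for consistency with the other examples in this document; `GovernanceMode` and `effective_mode` are hypothetical names, and the two-mode ordering (audit below strict) is inferred from the test names above.
+
+```python
+from enum import IntEnum
+from typing import Optional
+
+
+class GovernanceMode(IntEnum):
+    """Modes ordered by restrictiveness: a higher value is more restrictive."""
+    AUDIT = 0
+    STRICT = 1
+
+
+def effective_mode(system: GovernanceMode, team: Optional[GovernanceMode] = None) -> GovernanceMode:
+    """Merge system and team settings; a team can tighten but never loosen."""
+    if team is None:
+        return system  # no team override configured
+    return max(system, team)
+```
+
+Because the merge is a plain `max`, a team asking for audit under a strict system setting still ends up in strict mode, which is the "cannot downgrade" behavior the tests describe.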
+
+### 3.7 Feature Flag Circuit Breaker
+
+```typescript
+describe('PhantomQuarantineBreaker', () => {
+  it('allows service creation when discovery rate is normal', () => {});
+  it('trips breaker when >5 unconfirmed services created in single scan', () => {});
+  it('quarantines phantom services instead of deleting them', () => {});
+  it('auto-disables the discovery flag when breaker trips', () => {});
+  it('quarantined services have status=quarantined, not active', () => {});
+  it('quarantined services visible in admin review queue', () => {});
+});
+```
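+
+One possible reading of the breaker behavior above: count the unconfirmed services a scan produced, and if the count exceeds 5, quarantine them and switch the discovery flag off. The sketch is in Python for consistency with the other examples here; `DiscoveredService`, its `confirmed` field, and `apply_quarantine_breaker` are hypothetical, and what makes a service "unconfirmed" is not defined in this section.
+
+```python
+from dataclasses import dataclass, replace
+
+MAX_UNCONFIRMED_PER_SCAN = 5  # breaker trips above this count
+
+
+@dataclass(frozen=True)
+class DiscoveredService:
+    name: str
+    confirmed: bool
+    status: str = "active"
+
+
+def apply_quarantine_breaker(
+    services: list[DiscoveredService],
+    limit: int = MAX_UNCONFIRMED_PER_SCAN,
+) -> tuple[list[DiscoveredService], bool]:
+    """Return (services, discovery_flag_enabled) after applying the breaker."""
+    unconfirmed = sum(1 for service in services if not service.confirmed)
+    if unconfirmed <= limit:
+        return list(services), True  # normal discovery rate: nothing changes
+    # Breaker tripped: quarantine (never delete) the unconfirmed services
+    # and report that the discovery flag should be disabled.
+    quarantined = [
+        service if service.confirmed else replace(service, status="quarantined")
+        for service in services
+    ]
+    return quarantined, False
+```
+
+Quarantined services stay in the returned list with `status="quarantined"`, so an admin review queue can still display them.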
+
+### 3.8 Slack Bot
+
+```typescript
+describe('SlackBot', () => {
+  describe('Command Parsing', () => {
+    it('parses /portal search <query> command', () => {});
+    it('parses /portal service <name> command', () => {});
+    it('parses /portal owner <service> command', () => {});
+    it('returns help text for unknown commands', () => {});
+  });
+
+  describe('Response Formatting', () => {
+    it('formats service card with name, team, language, links', () => {});
+    it('formats search results as compact list (max 10)', () => {});
+    it('formats ownership info with confidence badge', () => {});
+    it('includes "View in Portal" button link', () => {});
+  });
+});
+```
 
 ---
 
-## 4. Integration Test Strategy
+## Section 4: Integration Test Strategy
 
-Integration tests verify that our code correctly interacts with external boundaries: databases, caches, search indices, and third-party APIs. 
+### 4.1 AWS Scanner → LocalStack
 
-### 4.1 Service Boundary Tests
-- **Discovery ↔ GitHub/GitLab:** Use `nock` or `MSW` to mock the GitHub GraphQL endpoint. Assert that the Node.js scanner constructs the correct query and handles rate limits (HTTP 403/429) via retries.
-- **Catalog ↔ PostgreSQL:** Use Testcontainers for PostgreSQL to verify complex `upsert` queries, foreign key constraints, and RLS (Row-Level Security) tenant isolation.
-- **API ↔ Meilisearch:** Use a Meilisearch Docker container. Assert that document syncing (PostgreSQL -> SQS -> Meilisearch) completes and search queries with `tenant_id` filters return the expected subset of data.
+```python
+# tests/integration/scanners/test_aws_integration.py
+import boto3
+import pytest
 
-### 4.2 Git Provider API Contract Tests
-- Write scheduled "contract tests" that run against the *live* GitHub API daily using a dedicated test org.
-- These detect if GitHub changes their GraphQL schema or rate limit behavior.
-- Assert that `HEAD:CODEOWNERS` blob extraction still works.
+class TestAWSIntegration:
+    @pytest.fixture(autouse=True)
+    def setup_localstack(self, localstack_endpoint):
+        """Create test resources in LocalStack."""
+        self.cfn = boto3.client('cloudformation', endpoint_url=localstack_endpoint)
+        self.ecs = boto3.client('ecs', endpoint_url=localstack_endpoint)
+        # Create test stacks, clusters, services, lambdas
+        self.cfn.create_stack(StackName='payment-api', TemplateBody=MINIMAL_TEMPLATE)
+        self.ecs.create_cluster(clusterName='prod')
+        # Remaining create_service arguments are elided in this outline.
+        self.ecs.create_service(cluster='prod', serviceName='payment-api')
 
-### 4.3 Testcontainers for Local Infrastructure
-- **Database:** `testcontainers-node` spinning up `postgres:15-alpine`.
-- **Search:** `getmeili/meilisearch:latest`.
-- **Cache:** `redis:7-alpine`.
-- Run these in GitHub Actions via Docker-in-Docker.
+    def test_full_aws_scan_discovers_all_resource_types(self): ...
+    def test_scan_groups_resources_by_cfn_stack(self): ...
+    def test_scan_handles_cross_region_resources(self): ...
+    def test_scan_respects_api_rate_limits(self): ...
+    def test_scan_completes_within_60_seconds_for_50_resources(self): ...
+```
+
+### 4.2 GitHub Scanner → WireMock
+
+```python
+# tests/integration/scanners/test_github_integration.py
+
+class TestGitHubIntegration:
+    @pytest.fixture(autouse=True)
+    def setup_wiremock(self, wiremock_url):
+        """Load recorded GitHub GraphQL responses."""
+        # Stub: POST /graphql → recorded response with 10 repos
+        wiremock.stub_for(post('/graphql').will_return(
+            json_response(load_fixture('github/org-repos-page1.json'))
+        ))
+
+    def test_full_github_scan_discovers_repos_with_metadata(self): ...
+    def test_scan_extracts_codeowners_for_each_repo(self): ...
+    def test_scan_extracts_deploy_workflows(self): ...
+    def test_scan_handles_graphql_rate_limit_with_retry(self): ...
+    def test_scan_paginates_through_100_plus_repos(self): ...
+```
+
+### 4.3 Reconciler → PostgreSQL
+
+```python
+# tests/integration/test_reconciler_db.py
+
+class TestReconcilerDB:
+    @pytest.fixture(autouse=True)
+    def setup_db(self, pg_container):
+        """Run migrations against Testcontainers PostgreSQL."""
+        run_migrations(pg_container.get_connection_url())
+
+    def test_upserts_discovered_service_without_duplicates(self): ...
+    def test_preserves_manual_ownership_override_on_rescan(self): ...
+    def test_marks_missing_services_as_stale(self): ...
+    def test_tenant_isolation_enforced_at_db_level(self): ...
+    def test_concurrent_scans_for_different_tenants_dont_conflict(self): ...
+```
+
+### 4.4 API → Meilisearch
+
+```typescript
+// tests/integration/search/meilisearch.test.ts
+import { GenericContainer, StartedTestContainer } from 'testcontainers';
+
+describe('Meilisearch Integration', () => {
+  let meili: StartedTestContainer;
+
+  beforeAll(async () => {
+    meili = await new GenericContainer('getmeili/meilisearch:v1')
+      .withExposedPorts(7700)
+      .start();
+    // Index test services
+    await indexServices(testCatalog);
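+  });
+
+  afterAll(async () => {
+    // Stop the container so repeated runs do not leak Docker resources.
+    await meili.stop();
+  });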
+
+  it('returns relevant results for service name search', async () => {
+    const results = await search('payment');
+    expect(results[0].name).toContain('payment');
+  });
+
+  it('returns results within 200ms for 1000-service catalog', async () => {
+    await indexServices(generate1000Services());
+    const start = performance.now();
+    await search('api');
+    expect(performance.now() - start).toBeLessThan(200);
+  });
+
+  it('supports faceted filtering by team and language', async () => {
+    const results = await search('', { filters: { team: 'platform', language: 'TypeScript' } });
+    expect(results.every(r => r.team === 'platform' && r.language === 'TypeScript')).toBe(true);
+  });
+});
+```
+
+### 4.5 Step Functions → Lambda Orchestration (LocalStack)
+
+```python
+# tests/integration/test_discovery_orchestration.py
+
+class TestDiscoveryOrchestration:
+    def test_step_function_executes_aws_then_github_then_reconcile(self): ...
+    def test_step_function_retries_failed_scanner_once(self): ...
+    def test_step_function_completes_within_5_minutes(self): ...
+    def test_step_function_sends_completion_event_to_sqs(self): ...
+```
 
 ---
 
-## 5. E2E & Smoke Tests
+## Section 5: E2E & Smoke Tests
 
-E2E tests treat the system as a black box, interacting only through the API and the React UI. We keep these fast and focused on the "5-Minute Miracle" critical path.
+### 5.1 The 5-Minute Miracle
 
-### 5.1 Critical User Journeys (Playwright)
-1. **The Onboarding Flow:** Mock GitHub OAuth login -> Connect AWS (mock CFN role ARN validation) -> Trigger Discovery -> Wait for WebSocket completion -> Verify 147 services appear in catalog.
-2. **Cmd+K Search:** Open modal (`Cmd+K`) -> type "pay" -> assert "payment-gateway" is highlighted in < 200ms -> press Enter -> assert service detail card opens.
-3. **Correcting Ownership:** Open service detail -> Click "Correct Owner" -> select new team -> assert badge changes to 100% confidence -> assert Meilisearch is updated.
+```typescript
+// tests/e2e/journeys/five-minute-miracle.test.ts
+describe('5-Minute Auto-Discovery', () => {
+  it('discovers >80% of services from AWS + GitHub within 5 minutes', async () => {
+    // Setup: LocalStack with 20 known services, WireMock GitHub with 15 repos
+    const knownServices = await setupTestInfrastructure(20);
+    const knownRepos = await setupTestGitHub(15);
 
-### 5.2 The >80% Auto-Discovery Accuracy Validation
-- **The "Party Mode" Org:** Maintain a real GitHub org and a mock AWS environment with exactly 100 known services, 10 known teams, and specific chaotic naming conventions.
-- **The Assertion:** Run discovery. Assert that > 80 of the services are correctly inferred with the right primary owner and repo link.
-- *This is the most important test in the suite. If a PR drops this below 80%, it cannot be merged.*
+    // Trigger discovery
+    const start = Date.now();
+    await triggerDiscovery('e2e-tenant');
+    await waitForDiscoveryComplete('e2e-tenant', { timeoutMs: 5 * 60 * 1000 });
+    const elapsed = Date.now() - start;
 
-### 5.3 Synthetic Topology Generation
-- Script to generate `N` mock CFN stacks, `M` ECS services, and `K` GitHub repos to feed the E2E environment without hitting AWS/GitHub limits.
+    // Validate
+    expect(elapsed).toBeLessThan(5 * 60 * 1000);
+    const catalog = await getCatalog('e2e-tenant');
+    const matchedServices = catalog.filter(s =>
+      knownServices.some(k => s.name === k.name)
+    );
+    const accuracy = matchedServices.length / knownServices.length;
+    expect(accuracy).toBeGreaterThan(0.80);
+  }, 6 * 60 * 1000); // explicit timeout: the 5-minute budget plus setup exceeds Vitest's 5s default
+});
+```
+
+### 5.2 Cmd+K Search
+
+```typescript
+describe('Cmd+K Search Experience', () => {
+  it('returns search results within 200ms', async () => {
+    await populateCatalog(100);
+    const start = performance.now();
+    const results = await searchAPI('payment');
+    expect(performance.now() - start).toBeLessThan(200);
+    expect(results.length).toBeGreaterThan(0);
+  });
+
+  it('ranks exact name match above partial match', async () => {
+    await populateCatalog([
+      { name: 'payment-api' },
+      { name: 'payment-processor' },
+      { name: 'api-gateway' },
+    ]);
+    const results = await searchAPI('payment-api');
+    expect(results[0].name).toBe('payment-api');
+  });
+});
+```
+
+### 5.3 Phantom Quarantine Journey
+
+```typescript
+describe('Phantom Quarantine', () => {
+  it('quarantines phantom services when discovery rule misfires', async () => {
+    // Enable a bad discovery flag that creates phantom services
+    await enableFlag('experimental-tag-scanner');
+
+    // Trigger discovery — bad rule creates 8 phantom services
+    await triggerDiscovery('e2e-tenant');
+    await waitForDiscoveryComplete('e2e-tenant');
+
+    // Circuit breaker should have tripped (>5 unconfirmed)
+    const catalog = await getCatalog('e2e-tenant');
+    const quarantined = catalog.filter(s => s.status === 'quarantined');
+    expect(quarantined.length).toBeGreaterThan(5);
+
+    // Flag should be auto-disabled
+    const flagState = await getFlagState('experimental-tag-scanner');
+    expect(flagState.enabled).toBe(false);
+  });
+});
+```
+
+### 5.4 E2E Infrastructure
+
+```yaml
+# docker-compose.e2e.yml
+services:
+  localstack:
+    image: localstack/localstack:3
+    environment:
+      SERVICES: sts,cloudformation,ecs,lambda,rds,s3,sqs,stepfunctions
+    ports: ["4566:4566"]
+
+  postgres:
+    image: postgres:16-alpine
+    environment:
+      POSTGRES_PASSWORD: test
+    ports: ["5432:5432"]
+
+  redis:
+    image: redis:7-alpine
+    ports: ["6379:6379"]
+
+  meilisearch:
+    image: getmeili/meilisearch:v1
+    ports: ["7700:7700"]
+
+  wiremock:
+    image: wiremock/wiremock:3
+    ports: ["8080:8080"]
+    volumes:
+      - ./fixtures/wiremock:/home/wiremock/mappings
+
+  app:
+    build: .
+    environment:
+      AWS_ENDPOINT: http://localstack:4566
+      DATABASE_URL: postgres://postgres:test@postgres:5432/postgres
+      REDIS_URL: redis://redis:6379
+      MEILI_URL: http://meilisearch:7700
+      GITHUB_API_URL: http://wiremock:8080
+      SLACK_API_URL: http://wiremock:8080
+    depends_on: [localstack, postgres, redis, meilisearch, wiremock]
+```
 
 ---
 
-## 6. Performance & Load Testing
-
-Load tests ensure the serverless architecture scales correctly and the Cmd+K search remains instantaneous.
+## Section 6: Performance & Load Testing
 
 ### 6.1 Discovery Scan Benchmarks
-- **Target:** 500 AWS resources + 500 GitHub repos scanned and reconciled in < 120 seconds.
-- **Tooling:** K6 or Artillery. Push 5,000 synthetic SQS messages into the Reconciler queue and measure Lambda batch processing throughput.
+
+```python
+# tests/perf/test_discovery_performance.py
+
+class TestDiscoveryPerformance:
+    def test_aws_scan_completes_within_60s_for_50_resources(self): ...
+    def test_aws_scan_completes_within_3min_for_500_resources(self): ...
+    def test_github_scan_completes_within_60s_for_100_repos(self): ...
+    def test_github_scan_completes_within_3min_for_500_repos(self): ...
+    def test_full_discovery_pipeline_completes_within_5min_for_medium_org(self):
+        """Medium org: 200 AWS resources + 150 GitHub repos."""
+        ...
+    def test_reconciliation_completes_within_30s_for_200_services(self): ...
+```
 
 ### 6.2 Catalog Query Latency
-- **Target:** API search endpoint returns in < 100ms at the 99th percentile.
-- **Test:** Load Meilisearch with 10,000 service documents. Fire 50 concurrent Cmd+K search requests per second. Assert p99 latency.
 
-### 6.3 Concurrent Scorecard Evaluation
-- Ensure the Python inference Lambda can evaluate 1,000 services concurrently without database connection exhaustion (using Aurora Serverless v2 connection pooling).
+```typescript
+describe('Catalog Query Performance', () => {
+  it('returns service list in <100ms with 1000 services', async () => {
+    await populateCatalog(1000);
+    const start = performance.now();
+    await getCatalog('perf-tenant', { limit: 50 });
+    expect(performance.now() - start).toBeLessThan(100);
+  });
+
+  it('Meilisearch returns results in <200ms with 5000 services', async () => {
+    await indexServices(generate5000Services());
+    const start = performance.now();
+    await search('payment');
+    expect(performance.now() - start).toBeLessThan(200);
+  });
+
+  it('50 concurrent catalog queries complete within 500ms at p95', async () => {
+    await populateCatalog(1000);
+    const results = await Promise.all(
+      Array.from({ length: 50 }, () => timedQuery('perf-tenant'))
+    );
+    const p95 = percentile(results.map(r => r.elapsed), 95);
+    expect(p95).toBeLessThan(500);
+  });
+});
+```
+
+### 6.3 Ownership Inference at Scale
+
+```python
+class TestOwnershipPerformance:
+    def test_infers_ownership_for_200_services_within_60s(self): ...
+    def test_memory_stays_under_256mb_during_500_service_inference(self): ...
+    def test_handles_org_with_50_teams_without_degradation(self): ...
+```
 
 ---
 
-## 7. CI/CD Pipeline Integration
+## Section 7: CI/CD Pipeline Integration
 
-The test pyramid is enforced through GitHub Actions.
+### 7.1 Pipeline Stages
 
-### 7.1 Test Stages
-- **Pre-commit:** Husky runs ESLint, Prettier, and fast unit tests (Jest/pytest) for changed files only.
-- **PR Gate:** Runs the full Unit and Integration test suites. Blocks merge if coverage drops or tests fail.
-- **Merge (Main):** Deploys to Staging. Runs E2E Critical User Journeys and the 80% Accuracy Validation suite against the Party Mode org.
-- **Post-Deploy:** Smoke tests verify health endpoints and ALB routing in production.
+```
+┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
+│ Pre-Commit  │───▶│ PR Gate  │───▶│ Merge    │───▶│ Staging  │───▶│ Prod     │
+│ (local)     │    │ (CI)     │    │ (CI)     │    │ (CD)     │    │ (CD)     │
+└─────────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
+  lint + type       unit tests      full suite      E2E + perf     smoke + canary
+  <10s              <5min           <10min          <15min         5-min miracle
+```
 
 ### 7.2 Coverage Thresholds
-- Global Unit Test Coverage: 80%
-- Ownership Inference & Reconciliation Logic: 95%
-- Feature Flag & Governance Evaluators: 100%
+
+| Component | Minimum | Target |
+|-----------|---------|--------|
+| Ownership Inference | 90% | 95% |
+| Reconciliation Engine | 85% | 90% |
+| AWS Scanner | 80% | 85% |
+| GitHub Scanner | 80% | 85% |
+| Governance Policy | 90% | 95% |
+| Catalog API | 80% | 85% |
+| Overall | 80% | 85% |
 
 ### 7.3 Test Parallelization
-- Jest tests run with `--maxWorkers=50%` locally, `100%` in CI.
-- Integration tests using Testcontainers run serially per file to avoid database port conflicts, or use dynamic port binding and separate schemas for parallel execution.
+
+```yaml
+# .github/workflows/test.yml
+jobs:
+  unit-python:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        suite: [scanners, reconciler, ownership, governance]
+    steps:
+      - run: pytest tests/unit/${{ matrix.suite }} -x --tb=short
+
+  unit-typescript:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        shard: [1, 2, 3]
+    steps:
+      - run: vitest --shard=${{ matrix.shard }}/3
+
+  integration:
+    runs-on: ubuntu-latest
+    services:
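+      localstack: { image: localstack/localstack:3, ports: ["4566:4566"] }
+      postgres: { image: postgres:16-alpine, ports: ["5432:5432"], env: { POSTGRES_PASSWORD: test } }
+      redis: { image: redis:7-alpine, ports: ["6379:6379"] }
+      meilisearch: { image: getmeili/meilisearch:v1, ports: ["7700:7700"] }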
+    steps:
+      - run: pytest tests/integration/ -x
+      - run: vitest --project=integration
+
+  e2e:
+    needs: [unit-python, unit-typescript, integration]
+    runs-on: ubuntu-latest
+    steps:
+      - run: docker compose -f docker-compose.e2e.yml up -d --wait
+      - run: vitest --project=e2e
+```
 
 ---
 
-## 8. Transparent Factory Tenet Testing
+## Section 8: Transparent Factory Tenet Testing
 
-Testing the governance and compliance features of the IDP itself.
+### 8.1 Atomic Flagging — Phantom Quarantine Circuit Breaker
 
-### 8.1 Feature Flag Circuit Breakers
-- **Test:** Enable a flagged discovery heuristic that generates 10 phantom services.
-- **Assert:** The system detects the threshold (>5 unconfirmed), auto-disables the flag, and marks the 10 services as `status: quarantined`.
+```typescript
+describe('Atomic Flagging', () => {
+  describe('Flag Lifecycle', () => {
+    it('new discovery source flag defaults to false', () => {});
+    it('flag has owner and ttl metadata (max 14 days)', () => {});
+    it('CI blocks when flag at 100% exceeds TTL', () => {});
+  });
 
-### 8.2 Schema Migration Validation
-- **Test:** Attempt to apply a PR that drops a column from the `services` table.
-- **Assert:** CI migration validator script fails the build (additive-only rule).
+  describe('Phantom Quarantine Breaker', () => {
+    it('allows service creation when ≤5 unconfirmed per scan', () => {});
+    it('trips breaker when >5 unconfirmed services in single scan', () => {});
+    it('quarantines phantom services (status=quarantined)', () => {});
+    it('auto-disables the discovery flag', () => {});
+    it('quarantined services appear in admin review queue', () => {});
+    it('admin can approve quarantined services into catalog', () => {});
+    it('admin can purge quarantined services', () => {});
+  });
 
-### 8.3 Decision Log Enforcement
-- **Test:** Run a discovery scan where service ownership is inferred from `git blame`.
-- **Assert:** A `decision_log` entry is written to PostgreSQL with the prompt/reasoning, alternatives, and confidence.
+  describe('Local Evaluation', () => {
+    it('flag check does not make network calls during scan', () => {});
+    it('flag state refreshed from file/env every 60s', () => {});
+  });
+});
+```
 
-### 8.4 OTEL Span Assertions
-- **Test:** Run the Reconciler Lambda.
-- **Assert:** The `catalog_scan` parent span contains child spans for `ownership_inference` with attributes for `catalog.service_id`, `catalog.ownership_signals`, and `catalog.confidence_score`. Use an in-memory OTEL exporter for testing.
+### 8.2 Elastic Schema — Migration Validation
 
-### 8.5 Governance Policy Enforcement
-- **Test:** Set tenant policy to `strict` mode. Simulate auto-discovery finding a new service.
-- **Assert:** Service is placed in the "pending review" queue and NOT visible in the main catalog.
-- **Test:** Set `panic_mode: true`. Attempt a `PATCH /api/v1/services/123`.
-- **Assert:** HTTP 503 Service Unavailable.
+```python
+class TestElasticSchema:
+    def test_rejects_migration_with_drop_column(self): ...
+    def test_rejects_migration_with_alter_column_type(self): ...
+    def test_rejects_migration_with_rename_column(self): ...
+    def test_accepts_migration_with_add_nullable_column(self): ...
+    def test_accepts_migration_with_new_table(self): ...
+    def test_v1_code_ignores_v2_columns_without_error(self): ...
+    def test_every_migration_has_sunset_date_comment(self):
+        for f in glob.glob('migrations/*.sql'):
+            content = open(f).read()
+            assert re.search(r'-- sunset_date: \d{4}-\d{2}-\d{2}', content), f'{f} has no sunset_date comment'
+    def test_ci_warns_on_past_sunset_migrations(self): ...
+```
+
+### 8.3 Cognitive Durability — Decision Log Validation
+
+```typescript
+describe('Cognitive Durability', () => {
+  it('decision_log.json required for PRs touching ownership inference', () => {});
+  it('decision_log.json required for PRs touching reconciliation', () => {});
+
+  it('decision log has all required fields', () => {
+    const logs = glob.sync('docs/decisions/*.json');
+    for (const log of logs) {
+      const entry = JSON.parse(fs.readFileSync(log, 'utf-8'));
+      expect(entry).toHaveProperty('reasoning');
+      expect(entry).toHaveProperty('alternatives_considered');
+      expect(entry).toHaveProperty('confidence');
+      expect(entry).toHaveProperty('timestamp');
+      expect(entry).toHaveProperty('author');
+    }
+  });
+
+  it('ownership signal weight changes include before/after examples', () => {
+    // Decision logs for ownership changes must include sample scenarios
+  });
+});
+```
+
+### 8.4 Semantic Observability — OTEL Span Assertions
+
+```typescript
+describe('Semantic Observability', () => {
+  let spanExporter: InMemorySpanExporter;
+
+  describe('Discovery Scan Spans', () => {
+    it('emits parent catalog_scan span', async () => {
+      await triggerDiscovery('test-tenant');
+      const spans = spanExporter.getFinishedSpans();
+      expect(spans.find(s => s.name === 'catalog_scan')).toBeDefined();
+    });
+
+    it('emits child aws_scan and github_scan spans', async () => {
+      await triggerDiscovery('test-tenant');
+      const spans = spanExporter.getFinishedSpans();
+      expect(spans.find(s => s.name === 'aws_scan')).toBeDefined();
+      expect(spans.find(s => s.name === 'github_scan')).toBeDefined();
+    });
+  });
+
+  describe('Ownership Inference Spans', () => {
+    it('emits ownership_inference span with all signals considered', async () => {
+      await inferOwnership('test-service');
+      const span = spanExporter.getFinishedSpans().find(s => s.name === 'ownership_inference');
+      expect(span.attributes['catalog.ownership_signals']).toBeDefined();
+      expect(span.attributes['catalog.confidence_score']).toBeGreaterThanOrEqual(0);
+    });
+
+    it('includes rejected signals in span attributes', async () => {
+      await inferOwnership('test-service');
+      const span = spanExporter.getFinishedSpans().find(s => s.name === 'ownership_inference');
+      const signals = JSON.parse(span.attributes['catalog.ownership_signals']);
+      expect(signals.length).toBeGreaterThan(0);
+      // Each signal should have: source, team, weight, accepted/rejected
+    });
+  });
+
+  describe('PII Protection', () => {
+    it('hashes repo names in span attributes', async () => {
+      await triggerDiscovery('test-tenant');
+      const spans = spanExporter.getFinishedSpans();
+      for (const span of spans) {
+        const attrs = JSON.stringify(span.attributes);
+        expect(attrs).not.toContain('payment-api'); // real name
+      }
+    });
+
+    it('emits a hashed (hex) service ID in ownership spans', async () => {
+      await inferOwnership('test-service');
+      const span = spanExporter.getFinishedSpans().find(s => s.name === 'ownership_inference');
+      expect(span.attributes['catalog.service_id']).toMatch(/^[a-f0-9]+$/);
+    });
+  });
+});
+```
+
+### 8.5 Configurable Autonomy — Governance Tests
+
+```typescript
+describe('Configurable Autonomy', () => {
+  describe('Strict Mode (suggest-only)', () => {
+    it('discovery results go to pending review queue', async () => {
+      setPolicy({ governance_mode: 'strict' });
+      await triggerDiscovery('test-tenant');
+      const pending = await getPendingReview('test-tenant');
+      expect(pending.length).toBeGreaterThan(0);
+      const catalog = await getCatalog('test-tenant');
+      expect(catalog.length).toBe(0); // Nothing auto-added
+    });
+  });
+
+  describe('Audit Mode (auto-mutate)', () => {
+    it('discovery results auto-applied to catalog with logging', async () => {
+      setPolicy({ governance_mode: 'audit' });
+      await triggerDiscovery('test-tenant');
+      const catalog = await getCatalog('test-tenant');
+      expect(catalog.length).toBeGreaterThan(0);
+      const logs = await getPolicyLogs('test-tenant');
+      expect(logs.some(l => l.includes('auto-created in audit mode'))).toBe(true);
+    });
+  });
+
+  describe('Panic Mode', () => {
+    it('halts discovery scans immediately', async () => {
+      await activatePanic();
+      const result = await triggerDiscovery('test-tenant');
+      expect(result.status).toBe('halted');
+    });
+
+    it('catalog API returns 503 for writes', async () => {
+      await activatePanic();
+      const res = await fetch('/api/services', { method: 'POST', body: '{}' });
+      expect(res.status).toBe(503);
+    });
+
+    it('catalog API allows reads during panic', async () => {
+      await activatePanic();
+      const res = await fetch('/api/services');
+      expect(res.status).toBe(200);
+    });
+  });
+
+  describe('Per-Team Override', () => {
+    it('team strict lock prevents auto-mutation even in audit mode', async () => {
+      setPolicy({ governance_mode: 'audit' });
+      setTeamPolicy('platform-team', { governance_mode: 'strict' });
+      await triggerDiscovery('test-tenant');
+      const platformServices = (await getCatalog('test-tenant'))
+        .filter(s => s.team === 'platform-team');
+      expect(platformServices.length).toBe(0); // Blocked by team lock
+    });
+  });
+});
+```
 
 ---
 
-## 9. Test Data & Fixtures
+## Section 9: Test Data & Fixtures
 
-High-quality fixtures are the lifeblood of this TDD strategy.
+### 9.1 Directory Structure
 
-### 9.1 GitHub/GitLab API Response Factories
-- JSON files containing real obfuscated GraphQL responses for Repositories, `CODEOWNERS` blobs, and Team memberships.
-- Use factories (e.g., `fishery` or custom functions) to easily override fields: `buildGHRepo({ name: 'auth-service', languages: ['Go'] })`.
+```
+tests/
+  fixtures/
+    aws/
+      cloudformation/
+        payment-api-stack.json
+        user-service-stack.json
+        empty-stack.json
+      ecs/
+        prod-cluster-services.json
+        staging-cluster-services.json
+      lambda/
+        functions-list.json
+        api-gateway-mappings.json
+      rds/
+        instances-list.json
+    github/
+      graphql/
+        org-repos-page1.json
+        org-repos-page2.json
+        repo-details-with-codeowners.json
+        repo-details-no-codeowners.json
+      codeowners/
+        simple-team-ownership.txt
+        multi-path-ownership.txt
+        wildcard-patterns.txt
+        empty.txt
+      workflows/
+        ecs-deploy.yml
+        lambda-deploy.yml
+        matrix-deploy.yml
+        non-deploy-ci.yml
+    scenarios/
+      medium-org-200-resources.json
+      large-org-500-resources.json
+      conflicting-ownership.json
+      no-github-match.json
+    slack/
+      service-card-blocks.json
+      search-results-blocks.json
+      ownership-info-blocks.json
+```
 
-### 9.2 Synthetic Topology Generators
-- Scripts that generate interconnected AWS resources (e.g., a CFN stack containing an API Gateway routing to 3 Lambdas interacting with 1 RDS instance).
+### 9.2 Service Factory
 
-### 9.3 `CODEOWNERS` and Git Blame Mocks
-- Diverse `CODEOWNERS` files covering edge cases: wildcard matching, deep path matching, invalid syntax, user-vs-team owners.
+```python
+# tests/helpers/factories.py
+def make_aws_service(overrides=None):
+    defaults = {
+        "name": f"service-{fake.word()}",
+        "source": "aws",
+        "aws_resources": [
+            {"type": "ecs-service", "arn": f"arn:aws:ecs:us-east-1:123456789012:service/prod/{fake.word()}"},
+        ],
+        "tags": {"service": fake.word(), "team": fake.word()},
+        "confidence": 0.85,
+        "discovered_at": datetime.now(timezone.utc).isoformat(),  # utcnow() is deprecated since Python 3.12
+    }
+    return {**defaults, **(overrides or {})}
+
+def make_github_repo(overrides=None):
+    defaults = {
+        "name": f"{fake.word()}-{fake.word()}",
+        "language": random.choice(["TypeScript", "Python", "Go", "Java"]),
+        "codeowners": [{"path": "*", "owners": [f"@org/{fake.word()}-team"]}],
+        "top_committers": [fake.name() for _ in range(5)],
+        "has_deploy_workflow": random.choice([True, False]),
+        "deploy_target": None,
+    }
+    return {**defaults, **(overrides or {})}
+
+def make_catalog_service(overrides=None):
+    defaults = {
+        "service_id": str(uuid4()),
+        "tenant_id": "test-tenant",
+        "name": f"{fake.word()}-{random.choice(['api', 'service', 'worker', 'lambda'])}",
+        "team": f"{fake.word()}-team",
+        "language": random.choice(["TypeScript", "Python", "Go"]),
+        "sources": random.sample(["aws", "github"], k=random.randint(1, 2)),
+        "confidence": round(random.uniform(0.5, 1.0), 2),
+        "status": "active",
+        "ownership_signals": [],
+    }
+    return {**defaults, **(overrides or {})}
+```
+
+### 9.3 Synthetic Org Topology Generator
+
+```python
+# tests/helpers/org_generator.py
+def generate_org_topology(num_teams=5, services_per_team=10, repos_per_service=1.5):
+    """Generate a realistic org with teams, services, repos, and dependencies."""
+    teams = [f"team-{fake.word()}" for _ in range(num_teams)]
+    services = []
+    repos = []
+
+    for team in teams:
+        for i in range(services_per_team):
+            svc_name = f"{team.split('-')[1]}-{fake.word()}-{random.choice(['api', 'worker', 'lambda'])}"
+            services.append(make_aws_service({"name": svc_name, "tags": {"team": team}}))
+
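+            # repos_per_service is an average: its fractional part is the chance of one extra repo
+            for j in range(int(repos_per_service) + (random.random() < repos_per_service % 1)):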
+                repos.append(make_github_repo({
+                    "name": svc_name if j == 0 else f"{svc_name}-lib",
+                    "codeowners": [{"path": "*", "owners": [f"@org/{team}"]}],
+                    "deploy_target": svc_name if j == 0 else None,
+                }))
+
+    return {"teams": teams, "services": services, "repos": repos}
+```
 
 ---
 
-## 10. TDD Implementation Order
+## Section 10: TDD Implementation Order
 
-To bootstrap the platform efficiently, testing and development should follow this sequence based on Epic dependencies:
+### 10.1 Bootstrap Sequence
 
-1. **Epic 2 (GitHub Parsers):** Write pure unit tests for `CODEOWNERS` parser and `README` extractor. *Value: High ROI, zero dependencies.*
-2. **Epic 1 (AWS Heuristics):** Write unit tests for mapping CFN stacks and Tags to Service entities. *Value: Core product logic.*
-3. **Epic 2 (Ownership Inference):** TDD the scoring algorithm. Build the weighting math. *Value: The brain of the platform.*
-4. **Epic 3 (Service Catalog Schema):** Integration tests for PostgreSQL RLS and upserting services. *Value: Data durability.*
-5. **Epic 2 (Reconciliation):** Unit tests merging AWS and GitHub mock entities. *Value: Pipeline glue.*
-6. **Epic 4 (Search Sync):** Integration tests for pushing DB updates to Meilisearch.
-7. **Epic 5 (API & UI):** E2E test for the Cmd+K search flow.
-8. **Epic 10 (Governance & Flags):** Unit tests for feature flag circuit breakers and strict mode.
-9. **Epic 9 (Onboarding):** Playwright E2E for the 5-Minute Miracle flow.
+```
+Phase 0: Test Infrastructure (Week 0)
+  ├── 0.1 pytest + vitest config
+  ├── 0.2 LocalStack helper (STS, CFN, ECS, Lambda, RDS, SQS, Step Functions)
+  ├── 0.3 Testcontainers helpers (PostgreSQL, Redis, Meilisearch)
+  ├── 0.4 WireMock GitHub GraphQL stubs
+  ├── 0.5 Factory functions (make_aws_service, make_github_repo, make_catalog_service)
+  ├── 0.6 Org topology generator
+  └── 0.7 CI pipeline with test stages
+```
 
-This sequence ensures the most complex algorithmic logic is proven before it is wired to databases and APIs.
+### 10.2 Epic-by-Epic TDD Order
+
+```
+Phase 1: AWS Discovery (Epic 1) — Tests First for STS, Integration-Led for Scanners
+  ├── 1.1 RED: STS role assumption tests (security-critical)
+  ├── 1.2 GREEN: Implement STS client
+  ├── 1.3 Implement CFN scanner against LocalStack
+  ├── 1.4 RED: CFN scanner unit tests (lock in behavior)
+  ├── 1.5 Implement ECS + Lambda + RDS scanners
+  ├── 1.6 RED: Scanner unit tests for each resource type
+  ├── 1.7 INTEGRATION: Full AWS scan against LocalStack
+  └── 1.8 REFACTOR: Extract scanner interface, add parallelism
+
+Phase 2: GitHub Discovery (Epic 2) — Integration-Led
+  ├── 2.1 Implement repo scanner against WireMock
+  ├── 2.2 RED: CODEOWNERS parser tests (strict TDD)
+  ├── 2.3 GREEN: Implement CODEOWNERS parser
+  ├── 2.4 RED: Workflow parser tests
+  ├── 2.5 GREEN: Implement workflow parser
+  ├── 2.6 INTEGRATION: Full GitHub scan against WireMock
+  └── 2.7 RED: Rate limit handling tests
+
+Phase 3: Reconciliation (Epic 3) — Tests First
+  ├── 3.1 RED: Cross-reference matching tests
+  ├── 3.2 GREEN: Implement reconciler
+  ├── 3.3 RED: Deduplication tests
+  ├── 3.4 GREEN: Implement dedup logic
+  ├── 3.5 INTEGRATION: Reconciler → PostgreSQL
+  └── 3.6 REFACTOR: Confidence scoring pipeline
+
+Phase 4: Ownership Inference (Epic 4) — Strict TDD
+  ├── 4.1 RED: Signal weighting tests (all combinations)
+  ├── 4.2 GREEN: Implement inference engine
+  ├── 4.3 RED: Ambiguity detection tests
+  ├── 4.4 GREEN: Implement ambiguity logic
+  ├── 4.5 RED: Manual override tests
+  ├── 4.6 GREEN: Implement override handling
+  └── 4.7 INTEGRATION: Inference → PostgreSQL
+
+Phase 5: Catalog API + Search (Epics 5-6) — Integration-Led
+  ├── 5.1 Implement API endpoints
+  ├── 5.2 RED: API unit tests (CRUD, filtering, pagination)
+  ├── 5.3 INTEGRATION: API → PostgreSQL
+  ├── 5.4 INTEGRATION: API → Meilisearch
+  └── 5.5 RED: Tenant isolation tests
+
+Phase 6: Governance (Epic 10) — Strict TDD
+  ├── 6.1 RED: Strict/audit mode tests
+  ├── 6.2 GREEN: Implement policy engine
+  ├── 6.3 RED: Panic mode tests
+  ├── 6.4 GREEN: Implement panic mode
+  ├── 6.5 RED: Phantom quarantine circuit breaker tests
+  ├── 6.6 GREEN: Implement circuit breaker
+  ├── 6.7 RED: OTEL span assertion tests
+  └── 6.8 GREEN: Instrument all components
+
+Phase 7: E2E Validation
+  ├── 7.1 5-Minute Miracle journey (>80% accuracy gate)
+  ├── 7.2 Cmd+K search journey (<200ms gate)
+  ├── 7.3 Phantom quarantine journey
+  ├── 7.4 Panic mode journey
+  └── 7.5 Performance benchmarks
+```
+
+### 10.3 "Never Ship Without" Checklist
+
+- [ ] All STS role assumption tests (security gate)
+- [ ] All ownership inference tests (accuracy gate — >80%)
+- [ ] All CODEOWNERS parser tests (correctness gate)
+- [ ] All governance policy tests (compliance gate)
+- [ ] Phantom quarantine circuit breaker test (safety gate)
+- [ ] 5-Minute Miracle E2E journey (product promise gate)
+- [ ] PII protection span tests (privacy gate)
+- [ ] Schema migration lint (no breaking changes)
+- [ ] Coverage ≥80% overall, ≥90% on ownership inference
+- [ ] Meilisearch search latency <200ms with 1000 services
+
+---
+
+*End of dd0c/portal Test Architecture*
diff --git a/products/05-aws-cost-anomaly/test-architecture/test-architecture.md b/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
index 7bc363b..00b7784 100644
--- a/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
+++ b/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
@@ -1,103 +1,232 @@
 # dd0c/cost — Test Architecture & TDD Strategy
 
-**Version:** 2.0  
-**Date:** February 28, 2026  
-**Status:** Authoritative  
-**Audience:** Founding engineer, future contributors
+**Product:** dd0c/cost — AWS Cost Anomaly Detective
+**Author:** Test Architecture Phase
+**Date:** February 28, 2026
+**Status:** V1 MVP — Solo Founder Scope
 
 ---
 
-> **Guiding principle:** A cost anomaly detector that misses a $3,000 GPU instance is worse than useless — it's a liability. A cost anomaly detector that cries wolf 40% of the time gets disabled. Tests are the only way to ship with confidence at solo-founder velocity.
+## Section 1: Testing Philosophy & TDD Workflow
 
----
+### 1.1 Core Philosophy
 
-## Table of Contents
+dd0c/cost sits at the intersection of **money and infrastructure**. A false negative means a customer loses thousands of dollars. A false positive means alert fatigue and churn. The test suite's primary job is to verify, with exact expected values, that the anomaly scoring engine behaves correctly across edge cases.
 
-1. [Testing Philosophy & TDD Workflow](#1-testing-philosophy--tdd-workflow)
-2. [Test Pyramid](#2-test-pyramid)
-3. [Unit Test Strategy](#3-unit-test-strategy)
-4. [Integration Test Strategy](#4-integration-test-strategy)
-5. [E2E & Smoke Tests](#5-e2e--smoke-tests)
-6. [Performance & Load Testing](#6-performance--load-testing)
-7. [CI/CD Pipeline Integration](#7-cicd-pipeline-integration)
-8. [Transparent Factory Tenet Testing](#8-transparent-factory-tenet-testing)
-9. [Test Data & Fixtures](#9-test-data--fixtures)
-10. [TDD Implementation Order](#10-tdd-implementation-order)
+Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively unit-tested with synthetic data before any AWS APIs are mocked.
 
----
-
-## 1. Testing Philosophy & TDD Workflow
-
-### Red-Green-Refactor for dd0c/cost
-
-TDD is non-negotiable for the anomaly scoring engine and baseline learning components. A scoring bug that ships to production means either missed anomalies (customers lose money) or false positives (customers disable the product). The cost of a test is minutes. The cost of a scoring bug is churn.
-
-**Where TDD is mandatory:**
-- `src/scoring/` — every scoring signal, composite calculation, and severity classification
-- `src/baseline/` — all statistical operations (mean, stddev, rolling window, cold-start transitions)
-- `src/parsers/` — every CloudTrail event parser (RunInstances, CreateDBInstance, etc.)
-- `src/pricing/` — pricing lookup logic and cost estimation
-- `src/governance/` — policy.json evaluation, auto-promotion logic, panic mode
-
-**Where TDD is recommended but not mandatory:**
-- `src/notifier/` — Slack Block Kit formatting (snapshot tests are sufficient)
-- `src/api/` — REST handlers (contract tests cover these)
-- `src/infra/` — CDK stacks (CDK assertions cover these)
-
-**Where tests follow implementation:**
-- `src/onboarding/` — CloudFormation URL generation, Cognito flows (integration tests only)
-- `src/slack/` — OAuth flows, signature verification (integration tests)
-
-### The Red-Green-Refactor Cycle
+### 1.2 Red-Green-Refactor Adapted to dd0c/cost
 
 ```
-RED:   Write a failing test that describes the desired behavior.
-       Name it precisely: what component, what input, what expected output.
-       Run it. Watch it fail. Confirm it fails for the right reason.
+RED   → Write a failing test that asserts a specific Z-score and severity
+         for a given historical baseline and new cost event.
 
-GREEN: Write the minimum code to make the test pass.
-       No gold-plating. No "while I'm here" refactors.
-       Run the test. Watch it pass.
+GREEN → Implement the scoring math to make it pass.
 
-REFACTOR: Clean up the implementation without changing behavior.
-          Extract constants. Rename variables. Simplify logic.
-          Tests must still pass after every refactor step.
+REFACTOR → Optimize the baseline lookup, extract novelty checks,
+            refine the heuristic weights.
 ```
 
-### Test Naming Convention
+**When to write tests first (strict TDD):**
+- Anomaly scoring engine (Z-scores, novelty checks, composite severity)
+- Cold-start heuristics (fast-path for >$5/hr resources)
+- Baseline calculation (moving averages, standard deviation)
+- Governance policy (strict vs. audit mode, 14-day promotion)
 
-All tests follow the pattern: `[unit under test] [scenario] [expected outcome]`
+**When integration tests lead:**
+- CloudTrail ingestion (implement against LocalStack EventBridge, then lock in)
+- DynamoDB Single-Table schema (build access patterns, then integration test)
+
+**When E2E tests lead:**
+- Slack alert interaction (format the Block Kit message, then test the "Snooze/Terminate" buttons)
+
+### 1.3 Test Naming Conventions
 
 ```typescript
-// ✅ Good — precise, readable, searchable
-describe('scoreAnomaly', () => {
-  it('returns critical severity when z-score exceeds 5.0 and instance type is novel', () => {});
-  it('returns none severity when account is in cold-start and cost is below $0.50/hr', () => {});
-  it('returns warning severity when actor is novel but cost is within 2 standard deviations', () => {});
-  it('compounds severity when multiple signals fire simultaneously', () => {});
+describe('AnomalyScorer', () => {
+  it('assigns critical severity when Z-score > 3 and hourly cost > $1', () => {});
+  it('flags actor novelty when IAM role has never launched this service', () => {});
+  it('bypasses baseline and triggers fast-path critical for $10/hr instance', () => {});
 });
 
-// ❌ Bad — vague, not searchable
-describe('scoring', () => {
-  it('works correctly', () => {});
-  it('handles edge cases', () => {});
+describe('CloudTrailNormalizer', () => {
+  it('extracts instance type and region from RunInstances event', () => {});
+  it('looks up correct on-demand pricing for us-east-1 r6g.xlarge', () => {});
 });
 ```
 
-### Decision Log Requirement
-
-Per Transparent Factory tenet (Story 10.3), any PR touching `src/scoring/`, `src/baseline/`, or `src/detection/` must include a `docs/decisions/<YYYY-MM-DD>-<slug>.json` file. The test suite validates this in CI.
-
-```json
-{
-  "prompt": "Should Z-score threshold be 2.5 or 3.0?",
-  "reasoning": "At 2.5, false positive rate in design partner data was 28%. At 3.0, it dropped to 18% with only 2 additional missed true positives over 30 days.",
-  "alternatives_considered": ["2.0 (too noisy)", "3.5 (misses too many real anomalies)"],
-  "confidence": "medium",
-  "timestamp": "2026-02-28T10:00:00Z",
-  "author": "brian"
-}
-```
-
 ---
 
+## Section 2: Test Pyramid
+
+### 2.1 Ratio
+
+| Level | Target Share | Count (V1) | Runtime |
+|-------|--------|------------|---------|
+| Unit | 70% | ~250 tests | <20s |
+| Integration | 20% | ~80 tests | <3min |
+| E2E/Smoke | 10% | ~15 tests | <5min |
+
+### 2.2 Unit Test Targets
+
+| Component | Key Behaviors | Est. Tests |
+|-----------|--------------|------------|
+| Event Normalizer | CloudTrail parsing, pricing lookup, deduplication | 40 |
+| Baseline Engine | Running mean/stddev calculation, maturity checks | 35 |
+| Anomaly Scorer | Z-score math, novelty detection, composite scoring | 50 |
+| Remediation Handler | Stop/Terminate payload parsing, IAM role assumption logic | 20 |
+| Notification Engine | Slack formatting, daily digest aggregation | 30 |
+| Governance Policy | Mode enforcement, 14-day auto-promotion | 25 |
+| Feature Flags | Circuit breaker on alert volume, flag metadata | 15 |
+
+---
+
+## Section 3: Unit Test Strategy
+
+### 3.1 Cost Ingestion & Normalization
+
+```typescript
+describe('CloudTrailNormalizer', () => {
+  it('normalizes EC2 RunInstances event to CostEvent schema', () => {});
+  it('normalizes RDS CreateDBInstance event to CostEvent schema', () => {});
+  it('extracts assumed role ARN as actor instead of base STS role', () => {});
+  it('applies fallback pricing when instance type is not in static table', () => {});
+  it('ignores non-cost-generating events (e.g., DescribeInstances)', () => {});
+});
+```
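+
+As a rough illustration of what these normalizer tests could target, the sketch below maps a CloudTrail `RunInstances` record to a flat event. `CostEvent`, `normalize`, the price table, and the fallback rate are hypothetical names and placeholder values, not the project's schema or real AWS prices. Only the record fields read here (`eventName`, `awsRegion`, `userIdentity.arn`, `requestParameters.instanceType`) come from the CloudTrail record format.
+
+```typescript
+// Subset of a CloudTrail record: only the fields this sketch reads.
+interface CloudTrailRecord {
+  eventName: string;
+  awsRegion: string;
+  userIdentity: { arn?: string };
+  requestParameters?: { instanceType?: string };
+}
+
+// Hypothetical normalized shape; not the project's actual CostEvent schema.
+interface CostEvent {
+  service: 'ec2';
+  region: string;
+  instanceType: string;
+  actor: string;
+  hourlyCostUsd: number;
+}
+
+// Placeholder rates for illustration only; these are not real AWS prices.
+const PLACEHOLDER_PRICES: Record<string, number> = { 'example.large': 0.1 };
+const FALLBACK_HOURLY_USD = 1.0; // assumed default when the type is unknown
+
+function normalize(record: CloudTrailRecord): CostEvent | null {
+  // Read-only and other non-cost-generating events are dropped.
+  if (record.eventName !== 'RunInstances') return null;
+  const instanceType = record.requestParameters?.instanceType;
+  if (!instanceType) return null;
+  return {
+    service: 'ec2',
+    region: record.awsRegion,
+    instanceType,
+    actor: record.userIdentity.arn ?? 'unknown',
+    hourlyCostUsd: PLACEHOLDER_PRICES[instanceType] ?? FALLBACK_HOURLY_USD,
+  };
+}
+```
+
+Returning `null` for ignored events keeps the "non-cost-generating" case a plain equality assertion rather than an exception test.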
+
+### 3.2 Anomaly Engine (The Math)
+
+```typescript
+describe('AnomalyScorer', () => {
+  describe('Statistical Scoring (Z-Score)', () => {
+    it('returns score=0 when event cost exactly matches baseline mean', () => {});
+    it('returns proportional score for Z-scores between 1.0 and 3.0', () => {});
+    it('caps Z-score contribution at max threshold', () => {});
+  });
+
+  describe('Novelty Scoring', () => {
+    it('adds novelty penalty when instance type is first seen for account', () => {});
+    it('adds novelty penalty when IAM user has never provisioned this service', () => {});
+  });
+
+  describe('Cold-Start Fast Path', () => {
+    it('flags $5/hr instance as warning when baseline < 14 days', () => {});
+    it('flags $25/hr instance as critical immediately, bypassing baseline', () => {});
+    it('ignores $0.10/hr instances during cold-start learning period', () => {});
+  });
+});
+```
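+
+The Z-score and cold-start cases above could be exercised against a pure function along these lines. This is a minimal sketch, not the shipped scorer: `Baseline`, `zScore`, `scoreEvent`, and all four thresholds are illustrative assumptions loosely based on the test names, and novelty scoring is left out.
+
+```typescript
+// Hypothetical baseline shape; field names are assumptions.
+interface Baseline {
+  mean: number;    // mean hourly cost in USD
+  stddev: number;  // standard deviation of hourly cost
+  mature: boolean; // false during the cold-start learning period
+}
+
+type Severity = 'none' | 'warning' | 'critical';
+
+// Assumed thresholds, chosen to mirror the test names in this section.
+const FAST_PATH_WARNING_USD = 5;
+const FAST_PATH_CRITICAL_USD = 25;
+const Z_WARNING = 1.0;
+const Z_CRITICAL = 3.0;
+
+function zScore(cost: number, b: Baseline): number {
+  // Guard against a zero stddev (perfectly flat history).
+  if (b.stddev === 0) return cost === b.mean ? 0 : Infinity;
+  return (cost - b.mean) / b.stddev;
+}
+
+function scoreEvent(hourlyCost: number, b: Baseline): Severity {
+  // Very expensive resources skip the baseline entirely.
+  if (hourlyCost >= FAST_PATH_CRITICAL_USD) return 'critical';
+  // Cold start: no usable statistics yet, so fall back to a flat cut-off.
+  if (!b.mature) {
+    return hourlyCost >= FAST_PATH_WARNING_USD ? 'warning' : 'none';
+  }
+  const z = zScore(hourlyCost, b);
+  if (z > Z_CRITICAL) return 'critical';
+  if (z >= Z_WARNING) return 'warning';
+  return 'none';
+}
+```
+
+Keeping the scorer free of I/O means every case in the `describe` block can be a table-driven assertion over synthetic baselines.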
+
+### 3.3 Baseline Learning
+
+```typescript
+describe('BaselineCalculator', () => {
+  it('updates running mean and stddev using Welford algorithm', () => {});
+  it('adds new actor to observed_actors set', () => {});
+  it('marks baseline as mature when event_count > 20 and age_days > 14', () => {});
+});
+```
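+
+For reference, one Welford update step can be written as below. This is a sketch under assumptions: `RunningStats`, `welfordUpdate`, `stddev`, and `isMature` are hypothetical names, and the use of the sample (n - 1) standard deviation is a choice made here, not something the tests above specify.
+
+```typescript
+// Hypothetical running-statistics state; field names are assumptions.
+interface RunningStats {
+  count: number;
+  mean: number;
+  m2: number; // sum of squared deviations from the current mean
+}
+
+const emptyStats = (): RunningStats => ({ count: 0, mean: 0, m2: 0 });
+
+// One Welford step: folds a new observation into the state
+// without storing the full history.
+function welfordUpdate(s: RunningStats, x: number): RunningStats {
+  const count = s.count + 1;
+  const delta = x - s.mean;
+  const mean = s.mean + delta / count;
+  const m2 = s.m2 + delta * (x - mean);
+  return { count, mean, m2 };
+}
+
+// Sample standard deviation; 0 until there are at least two observations.
+function stddev(s: RunningStats): number {
+  return s.count < 2 ? 0 : Math.sqrt(s.m2 / (s.count - 1));
+}
+
+// Maturity rule as worded in the test above.
+function isMature(eventCount: number, ageDays: number): boolean {
+  return eventCount > 20 && ageDays > 14;
+}
+```
+
+Because the state is three numbers, it can be stored on the baseline item and updated per event without re-reading historical cost events.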
+
+---
+
+## Section 4: Integration Test Strategy
+
+### 4.1 DynamoDB Data Layer (Testcontainers)
+
+```typescript
+describe('DynamoDB Single-Table Patterns', () => {
+  it('writes CostEvent and updates Baseline in single transaction', async () => {});
+  it('queries all anomalies for tenant within time range', async () => {});
+  it('fetches tenant config and Slack tokens securely', async () => {});
+});
+```
+
+### 4.2 AWS API Contract Tests
+
+```typescript
+describe('AWS Cross-Account Actions', () => {
+  // Uses LocalStack to simulate target account
+  it('assumes target account remediation role successfully', async () => {});
+  it('executes ec2:StopInstances when remediation approved', async () => {});
+  it('executes rds:DeleteDBInstance with skip-final-snapshot', async () => {});
+});
+```
+
+---
+
+## Section 5: E2E & Smoke Tests
+
+### 5.1 Critical User Journeys
+
+**Journey 1: Real-Time Anomaly Detection**
+1. Send synthetic `RunInstances` event to EventBridge (p9.16xlarge, $40/hr).
+2. Verify system processes event and triggers fast-path (no baseline).
+3. Verify Slack alert is generated with correct cost estimate.
+
+**Journey 2: Interactive Remediation**
+1. Send webhook simulating user clicking "Stop Instance" in Slack.
+2. Verify API Gateway → Lambda executes `StopInstances` against LocalStack.
+3. Verify Slack message updates to "Remediation Successful".
+
+---
+
+## Section 6: Performance & Load Testing
+
+```typescript
+describe('Ingestion Throughput', () => {
+  it('processes 500 CloudTrail events/second via SQS FIFO', async () => {});
+  it('DynamoDB baseline updates complete in <20ms p95', async () => {});
+});
+```
+
+---
+
+## Section 7: CI/CD Pipeline Integration
+
+- **PR Gate:** Unit tests (<2 min); coverage >85% overall, >95% for the scoring engine.
+- **Merge:** Integration tests with LocalStack & Testcontainers DynamoDB.
+- **Staging:** E2E journeys against isolated staging AWS account.
+
+---
+
+## Section 8: Transparent Factory Tenet Testing
+
+### 8.1 Atomic Flagging (Circuit Breaker)
+```typescript
+it('auto-disables scoring rule if it generates >10 alerts/hour for single tenant', () => {});
+```
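+
+One way to make that breaker testable is a sliding-window counter with an injected clock, sketched below. `AlertCircuitBreaker` and its `record` method are hypothetical. The 10-alert limit and one-hour window come from the test name, and the caller is assumed to disable the rule when `record` returns `true`.
+
+```typescript
+// Hypothetical sliding-window breaker; the class name and API are assumptions.
+class AlertCircuitBreaker {
+  private timestamps = new Map<string, number[]>();
+
+  constructor(
+    private readonly maxAlerts = 10,
+    private readonly windowMs = 60 * 60 * 1000,
+  ) {}
+
+  // Records one alert and returns true when the rule has exceeded
+  // maxAlerts for this tenant inside the window.
+  record(ruleId: string, tenantId: string, nowMs: number): boolean {
+    const key = `${tenantId}:${ruleId}`;
+    const recent = (this.timestamps.get(key) ?? []).filter(
+      (t) => nowMs - t < this.windowMs,
+    );
+    recent.push(nowMs);
+    this.timestamps.set(key, recent);
+    return recent.length > this.maxAlerts;
+  }
+}
+```
+
+Passing `nowMs` in, rather than calling `Date.now()` inside, lets the unit test walk time forward without fake timers.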
+
+### 8.2 Configurable Autonomy (14-Day Auto-Promotion)
+```typescript
+it('keeps new tenant in strict mode (log-only) for first 14 days', () => {});
+it('auto-promotes to audit mode (auto-alert) on day 15 if false-positive rate < 10%', () => {});
+```
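+
+The promotion rule in these two tests reduces to a small pure function. The sketch below assumes a 1-indexed day counter and a false-positive rate expressed as a fraction. `resolveMode` and `Mode` are hypothetical names.
+
+```typescript
+type Mode = 'strict' | 'audit';
+
+// dayNumber is 1 on the tenant's first day (an assumption of this sketch).
+function resolveMode(dayNumber: number, falsePositiveRate: number): Mode {
+  // Days 1 through 14 are always strict (log-only).
+  if (dayNumber <= 14) return 'strict';
+  // From day 15, promote only while the false-positive rate is under 10%.
+  return falsePositiveRate < 0.1 ? 'audit' : 'strict';
+}
+```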
+
+---
+
+## Section 9: Test Data & Fixtures
+
+```
+fixtures/
+  cloudtrail/
+    ec2-runinstances.json
+    rds-create-db.json
+    lambda-create-function.json
+  baselines/
+    mature-steady-spend.json
+    volatile-dev-account.json
+    cold-start.json
+```
+
+---
+
+## Section 10: TDD Implementation Order
+
+1. **Phase 1:** Anomaly math + Unit tests (Strict TDD).
+2. **Phase 2:** CloudTrail normalizer + Pricing tables.
+3. **Phase 3:** DynamoDB single-table implementation (Integration led).
+4. **Phase 4:** Slack formatting + Remediation Lambda.
+5. **Phase 5:** Governance policies (14-day promotion logic).
+
+*End of dd0c/cost Test Architecture*