# dd0c/alert — Test Architecture & TDD Strategy

**Product:** dd0c/alert — Alert Intelligence Platform
**Author:** Test Architecture Phase
**Date:** February 28, 2026
**Status:** V1 MVP — Solo Founder Scope

---

## Section 1: Testing Philosophy & TDD Workflow

### 1.1 Core Philosophy

dd0c/alert is a **safety-critical observability tool** — a bug that silently suppresses a real alert during an incident is worse than having no tool at all. The test suite is the contract that guarantees "we will never eat your alerts."

Guiding principle: **tests describe observable behavior from the on-call engineer's perspective**. If a test can't be explained as "when X happens, the engineer sees Y," it's testing implementation, not behavior.

For a solo founder, the test suite is also the **regression safety net** — it catches the subtle scoring bugs that would erode customer trust over weeks.

### 1.2 Red-Green-Refactor Adapted to dd0c/alert

```
RED   → Write a failing test that describes the desired behavior
         (e.g., "3 Datadog alerts for the same service within 5 minutes
          should produce 1 correlated incident")

GREEN → Write the minimum code to make it pass
         (hardcode the window, just make it work)

REFACTOR → Clean up without breaking tests
            (extract the window manager, add Redis backing,
             optimize the fingerprinting)
```

**When to write tests first (strict TDD):**
- All correlation logic (time-window clustering, service graph traversal, deploy correlation)
- All noise scoring algorithms (rule-based scoring, threshold calculations)
- All HMAC signature validation (security-critical)
- All fingerprinting/deduplication logic
- All suppression governance (strict vs. audit mode)
- All circuit breaker state transitions (suppression DLQ replay)

**When integration tests lead (test-after, then harden):**
- Provider webhook parsers — implement against real payload samples, then lock in with contract tests
- SQS FIFO message ordering — test against LocalStack after implementation
- Slack message formatting — build the blocks, then snapshot test the output

**When E2E tests lead:**
- The 60-second time-to-value journey — define the happy path first, build backward
- Weekly noise digest generation — define expected output, then build the aggregation

### 1.3 Test Naming Conventions

```typescript
// Unit tests (vitest)
describe('CorrelationEngine', () => {
  it('groups alerts for same service within 5min window into single incident', () => {});
  it('extends window by 2min when alert arrives in last 30 seconds', () => {});
  it('caps window extension at 15 minutes total', () => {});
  it('merges downstream service alerts when upstream window is active', () => {});
});

describe('NoiseScorer', () => {
  it('scores deploy-correlated alerts higher when deploy is within 10min', () => {});
  it('returns zero noise score for first-ever alert from a service', () => {});
  it('adds 5 points when PR title matches config or feature-flag', () => {});
});

describe('HmacValidator', () => {
  it('rejects Datadog webhook with missing DD-WEBHOOK-SIGNATURE header', () => {});
  it('rejects PagerDuty webhook with tampered body', () => {});
  it('accepts valid signature and passes payload through', () => {});
});
```

**Rules:**
- Describe the **observable outcome**, not the internal mechanism
- Use present tense ("groups", "rejects", "scores")
- If you need "and" in the name, split into two tests
- Group by component in `describe` blocks

---

## Section 2: Test Pyramid

### 2.1 Ratio

| Level | Target | Count (V1) | Runtime |
|-------|--------|------------|---------|
| Unit | 70% | ~350 tests | <30s |
| Integration | 20% | ~100 tests | <5min |
| E2E/Smoke | 10% | ~20 tests | <10min |

### 2.2 Unit Test Targets (per component)

| Component | Key Behaviors | Est. Tests |
|-----------|--------------|------------|
| Webhook Parsers (Datadog, PD, OpsGenie, Grafana) | Payload normalization, field mapping, batch handling | 60 |
| HMAC Validator | Signature verification per provider, rejection paths | 20 |
| Fingerprint Generator | Deterministic hashing, dedup detection | 15 |
| Correlation Engine | Time-window open/close/extend, service graph merge, deploy correlation | 80 |
| Noise Scorer | Rule-based scoring, deploy proximity weighting, threshold calculations | 60 |
| Suggestion Engine | Suppression recommendations, "what would have happened" calculations | 30 |
| Notification Formatter | Slack block formatting, digest generation, in-place message updates | 25 |
| Governance Policy | Strict/audit mode enforcement, panic mode, per-customer overrides | 30 |
| Feature Flags | Circuit breaker on suppression volume, flag lifecycle | 15 |
| Canonical Schema Mapper | Provider → canonical field mapping, severity normalization | 15 |

### 2.3 Integration Test Boundaries

| Boundary | What's Tested | Infrastructure |
|----------|--------------|----------------|
| Lambda → SQS FIFO | Message ordering, dedup, tenant partitioning | LocalStack |
| SQS → Correlation Engine | Consumer polling, batch processing, error handling | LocalStack |
| Correlation Engine → Redis | Window CRUD, sorted set operations, TTL expiry | Testcontainers Redis |
| Correlation Engine → DynamoDB | Incident persistence, tenant config reads | Testcontainers DynamoDB Local |
| Correlation Engine → TimescaleDB | Time-series writes, continuous aggregate queries | Testcontainers PostgreSQL + TimescaleDB |
| Notification Service → Slack | Block formatting, rate limiting, message update | WireMock |
| API Gateway → Lambda | Webhook routing, auth, throttling | LocalStack |

### 2.4 E2E/Smoke Scenarios

1. **60-Second TTV Journey**: Webhook received → alert in Slack within 60s
2. **Alert Storm Correlation**: 50 alerts in 2 minutes → grouped into 1 incident
3. **Deploy Correlation**: Deploy event + alert storm → deploy identified as trigger
4. **Noise Digest**: 7 days of alerts → weekly Slack digest with noise stats
5. **Multi-Provider Merge**: Datadog + PagerDuty alerts for same service → single incident
6. **Panic Mode**: Enable panic → all suppression stops → alerts pass through raw

---

## Section 3: Unit Test Strategy

### 3.1 Webhook Parsers

Each provider parser is a pure function: payload in, canonical alert(s) out. No side effects, no DB calls.

```typescript
// tests/unit/parsers/datadog.test.ts
describe('DatadogParser', () => {
  it('normalizes single alert payload to canonical schema', () => {});
  it('normalizes batched alert array into multiple canonical alerts', () => {});
  it('maps Datadog P1 to critical, P5 to info', () => {});
  it('extracts service name from tags array', () => {});
  it('handles missing optional fields without throwing', () => {});
  it('generates stable fingerprint from title + service + tenant', () => {});
});

// tests/unit/parsers/pagerduty.test.ts
describe('PagerDutyParser', () => {
  it('normalizes incident.triggered event to canonical alert', () => {});
  it('normalizes incident.resolved event with resolution metadata', () => {});
  it('ignores incident.acknowledged events (not alerts)', () => {});
  it('maps PD urgency high to critical, low to info', () => {});
});

// tests/unit/parsers/opsgenie.test.ts
describe('OpsGenieParser', () => {
  it('normalizes alert.created action to canonical alert', () => {});
  it('extracts priority P1-P5 and maps to severity', () => {});
  it('handles custom fields in details object', () => {});
});

// tests/unit/parsers/grafana.test.ts
describe('GrafanaParser', () => {
  it('normalizes Grafana Alertmanager webhook payload', () => {});
  it('handles multiple alerts in single webhook (Grafana batches)', () => {});
  it('extracts dashboard URL as context link', () => {});
});
```

**Mocking strategy:** None needed — parsers are pure functions. Use recorded payload fixtures from `fixtures/webhooks/{provider}/`.

**Fixture structure:**
```
fixtures/webhooks/
  datadog/
    single-alert.json
    batched-alerts.json
    monitor-recovered.json
  pagerduty/
    incident-triggered.json
    incident-resolved.json
    incident-acknowledged.json
  opsgenie/
    alert-created.json
    alert-closed.json
  grafana/
    single-firing.json
    multi-firing.json
    resolved.json
```
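As a rough illustration of the "payload in, canonical alert out" shape, a Datadog parser might look like the sketch below. The payload and canonical field names are assumptions, as are the P2 to P4 mappings and the `'unknown'` / `'info'` fallbacks; only the P1 and P5 mappings and the tag-based service extraction come from the test list above.

```typescript
// Hypothetical shapes: the real Datadog payload and canonical schema may differ.
interface DatadogWebhookPayload {
  id: string;
  title: string;
  priority?: string;      // assumed to carry "P1".."P5"
  tags?: string[];        // e.g. ["service:payment-api", "env:prod"]
  date_happened?: number; // epoch seconds
}

type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

interface CanonicalAlert {
  tenant_id: string;
  provider: 'datadog';
  service: string;
  title: string;
  severity: Severity;
  occurred_at: string | null; // ISO 8601, null when the provider omits it
}

// P1 and P5 follow the test names; the middle mappings are assumed.
const PRIORITY_TO_SEVERITY: Record<string, Severity> = {
  P1: 'critical', P2: 'high', P3: 'medium', P4: 'low', P5: 'info',
};

// Pure function: no I/O, no clock reads, so fixtures alone can drive the tests.
function parseDatadog(tenantId: string, payload: DatadogWebhookPayload): CanonicalAlert {
  const serviceTag = (payload.tags ?? []).find((t) => t.startsWith('service:'));
  return {
    tenant_id: tenantId,
    provider: 'datadog',
    service: serviceTag ? serviceTag.slice('service:'.length) : 'unknown',
    title: payload.title,
    severity: PRIORITY_TO_SEVERITY[payload.priority ?? ''] ?? 'info',
    occurred_at:
      payload.date_happened !== undefined
        ? new Date(payload.date_happened * 1000).toISOString()
        : null,
  };
}
```

Because the function never touches the network or the clock, each fixture file maps to exactly one expected canonical object, which keeps the parser tests free of mocks.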

### 3.2 HMAC Validator

```typescript
describe('HmacValidator', () => {
  // Datadog uses hex-encoded HMAC-SHA256
  it('validates correct Datadog DD-WEBHOOK-SIGNATURE header', () => {});
  it('rejects Datadog webhook with wrong signature', () => {});
  it('rejects Datadog webhook with missing signature header', () => {});

  // PagerDuty uses v1= prefix with HMAC-SHA256
  it('validates correct PagerDuty X-PagerDuty-Signature header', () => {});
  it('rejects PagerDuty webhook with tampered body', () => {});

  // OpsGenie uses different header name
  it('validates correct OpsGenie X-OpsGenie-Signature header', () => {});

  // Edge cases
  it('rejects empty body with any signature', () => {});
  it('handles timing-safe comparison to prevent timing attacks', () => {});
});
```

**Mocking strategy:** None — crypto operations are deterministic. Use known secret + body + expected signature triples.
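A minimal sketch of the validation the tests above exercise, assuming hex-encoded HMAC-SHA256 over the raw request body. The function names and the `prefix` parameter (to cover a scheme like the `v1=` prefix) are illustrative, not the project's actual API.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Computes a hex-encoded HMAC-SHA256 over the raw request body.
function computeHmac(rawBody: string, secret: string): string {
  return createHmac('sha256', secret).update(rawBody, 'utf8').digest('hex');
}

// Returns true only when the header carries the expected signature.
// `prefix` is an assumed parameter for providers that prepend e.g. "v1=".
function verifySignature(
  rawBody: string,
  header: string | undefined,
  secret: string,
  prefix = '',
): boolean {
  // Missing header, empty body, or wrong prefix are all rejections.
  if (!header || rawBody.length === 0 || !header.startsWith(prefix)) return false;
  const provided = Buffer.from(header.slice(prefix.length), 'utf8');
  const expected = Buffer.from(computeHmac(rawBody, secret), 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (provided.length !== expected.length) return false;
  return timingSafeEqual(provided, expected);
}
```

A known triple for a test is then just `(secret, body, computeHmac(body, secret))`, with tampering simulated by changing one byte of the body.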

### 3.3 Fingerprint Generator

```typescript
describe('FingerprintGenerator', () => {
  it('generates deterministic SHA-256 from tenant_id + provider + service + title', () => {});
  it('produces same fingerprint for identical alerts regardless of timestamp', () => {});
  it('produces different fingerprints when service differs', () => {});
  it('normalizes title whitespace before hashing', () => {});
  it('handles unicode characters in title consistently', () => {});
});
```
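One way the fingerprint could be computed is sketched below. The input shape and the choice of `JSON.stringify` as the field separator are assumptions; the hashed fields, whitespace normalization, and unicode handling follow the test names above.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical input shape; timestamps are deliberately excluded.
interface FingerprintInput {
  tenant_id: string;
  provider: string;
  service: string;
  title: string;
}

// NFC-normalizes unicode and collapses runs of whitespace to one space.
function normalizeTitle(title: string): string {
  return title.normalize('NFC').trim().replace(/\s+/g, ' ');
}

function fingerprint(input: FingerprintInput): string {
  const parts = [input.tenant_id, input.provider, input.service, normalizeTitle(input.title)];
  // JSON.stringify keeps field boundaries unambiguous, so ("ab", "c") and
  // ("a", "bc") cannot hash to the same value.
  return createHash('sha256').update(JSON.stringify(parts), 'utf8').digest('hex');
}
```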

### 3.4 Correlation Engine

The most complex component. Heavy use of table-driven tests.

```typescript
describe('CorrelationEngine', () => {
  describe('Time-Window Management', () => {
    it('opens new 5min window on first alert for a service', () => {});
    it('adds subsequent alerts to existing open window', () => {});
    it('extends window by 2min when alert arrives in last 30 seconds', () => {});
    it('caps total window duration at 15 minutes', () => {});
    it('closes window after timeout with no new alerts', () => {});
    it('generates incident record when window closes', () => {});
  });

  describe('Service Graph Correlation', () => {
    it('merges downstream alerts into upstream window when dependency exists', () => {});
    it('does not merge alerts for unrelated services', () => {});
    it('handles circular dependencies without infinite loop', () => {});
    it('traverses multi-level dependency chains (A→B→C)', () => {});
  });

  describe('Deploy Correlation', () => {
    it('tags incident with deploy_id when deploy event within 10min of first alert', () => {});
    it('does not correlate deploy older than 10 minutes', () => {});
    it('correlates deploy to correct service even with multiple recent deploys', () => {});
    it('adds deploy correlation score boost to noise calculation', () => {});
  });

  describe('Multi-Tenant Isolation', () => {
    it('never correlates alerts across different tenants', () => {});
    it('maintains separate windows per tenant', () => {});
    it('handles concurrent alerts from multiple tenants', () => {});
  });
});
```

**Mocking strategy:**
- Mock Redis client (`ioredis-mock`) for window state
- Mock DynamoDB client for service dependency reads
- Mock SQS for downstream message publishing
- Use `vi.useFakeTimers()` (vitest's built-in fake timers) for time-window testing
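The time-window rules above (5-minute window, 2-minute extension in the last 30 seconds, 15-minute cap) can be sketched as a pure state transition. The `CorrelationWindow` shape and the `applyAlert` name are hypothetical; passing `now` in explicitly is one way to keep the logic testable without touching Redis or the system clock.

```typescript
const WINDOW_MS = 5 * 60_000;        // initial window length
const EXTEND_MS = 2 * 60_000;        // extension per late alert
const EXTEND_THRESHOLD_MS = 30_000;  // "last 30 seconds"
const MAX_WINDOW_MS = 15 * 60_000;   // cap on total window duration

// Hypothetical in-memory view of a window; times are epoch milliseconds.
interface CorrelationWindow {
  openedAt: number;
  closesAt: number;
  alertCount: number;
}

// Returns the window state after an alert arrives at `now`.
function applyAlert(window: CorrelationWindow | undefined, now: number): CorrelationWindow {
  // No window yet, or the previous one already expired: open a fresh one.
  if (!window || now >= window.closesAt) {
    return { openedAt: now, closesAt: now + WINDOW_MS, alertCount: 1 };
  }
  let closesAt = window.closesAt;
  if (closesAt - now <= EXTEND_THRESHOLD_MS) {
    // Extend, but never beyond 15 minutes from when the window opened.
    closesAt = Math.min(closesAt + EXTEND_MS, window.openedAt + MAX_WINDOW_MS);
  }
  return { ...window, closesAt, alertCount: window.alertCount + 1 };
}
```

With the transition isolated like this, the table-driven cases reduce to rows of `(previous window, arrival time, expected closesAt)`.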

### 3.5 Noise Scorer

```typescript
describe('NoiseScorer', () => {
  describe('Rule-Based Scoring', () => {
    it('returns 0 for first-ever alert from a service (no history)', () => {});
    it('scores higher when alert has fired >5 times in 24 hours', () => {});
    it('scores higher when alert auto-resolved within 5 minutes', () => {});
    it('adds deploy correlation bonus (+15 points) when deploy is recent', () => {});
    it('adds feature-flag bonus (+5 points) when PR title matches config/feature-flag', () => {});
    it('caps total score at 100', () => {});
    it('never scores critical severity alerts above 80 (safety cap)', () => {});
  });

  describe('Threshold Calculations', () => {
    it('classifies score 0-30 as signal (keep)', () => {});
    it('classifies score 31-70 as review (annotate)', () => {});
    it('classifies score 71-100 as noise (suggest suppress)', () => {});
    it('uses tenant-specific thresholds when configured', () => {});
  });

  describe('What-Would-Have-Happened', () => {
    it('calculates suppression count for historical window', () => {});
    it('reports zero false negatives when no suppressed alert was critical', () => {});
    it('flags false negative when suppressed alert was later escalated', () => {});
  });
});
```

**Mocking strategy:** Mock the alert history store (DynamoDB queries). Scorer logic itself is pure calculation.
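Since the scorer is pure calculation, its rules can be sketched without any store. In the sketch below the deploy (+15) and feature-flag (+5) bonuses, the caps, and the threshold bands come from the tests above; the points for frequency and auto-resolve, and all type and function names, are placeholders chosen only for illustration.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

// Hypothetical pre-computed inputs; in practice these would come from the history store.
interface ScoringInput {
  severity: Severity;
  hasHistory: boolean;
  firesLast24h: number;
  autoResolvedWithin5Min: boolean;
  deployWithin10Min: boolean;
  prTitleMatchesConfigOrFlag: boolean;
}

const POINTS = {
  frequent: 50,     // assumed value
  autoResolved: 40, // assumed value
  deploy: 15,       // from the test list
  featureFlag: 5,   // from the test list
};

function scoreNoise(input: ScoringInput): number {
  if (!input.hasHistory) return 0; // first-ever alert is never noise
  let score = 0;
  if (input.firesLast24h > 5) score += POINTS.frequent;
  if (input.autoResolvedWithin5Min) score += POINTS.autoResolved;
  if (input.deployWithin10Min) score += POINTS.deploy;
  if (input.prTitleMatchesConfigOrFlag) score += POINTS.featureFlag;
  score = Math.min(score, 100);
  // Safety cap: critical alerts can never score above 80.
  if (input.severity === 'critical') score = Math.min(score, 80);
  return score;
}

type Classification = 'signal' | 'review' | 'noise';

// Defaults match the 0-30 / 31-70 / 71-100 bands; tenants may override them.
function classify(score: number, thresholds = { signalMax: 30, reviewMax: 70 }): Classification {
  if (score <= thresholds.signalMax) return 'signal';
  if (score <= thresholds.reviewMax) return 'review';
  return 'noise';
}
```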

### 3.6 Notification Formatter

```typescript
describe('NotificationFormatter', () => {
  describe('Slack Blocks', () => {
    it('formats single-alert notification with service, title, severity', () => {});
    it('formats correlated incident with alert count and sources', () => {});
    it('includes deploy trigger when deploy correlation exists', () => {});
    it('includes noise score badge (🟢 signal / 🟡 review / 🔴 noise)', () => {});
    it('includes feedback buttons (👍 Helpful / 👎 Not helpful)', () => {});
    it('formats in-place update message (replaces initial alert)', () => {});
  });

  describe('Weekly Digest', () => {
    it('aggregates 7 days of incidents into summary stats', () => {});
    it('highlights top 3 noisiest services', () => {});
    it('shows suppression savings ("would have saved X pages")', () => {});
  });
});
```

**Mocking strategy:** Snapshot tests — render the Slack blocks to JSON and compare against golden fixtures.
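For snapshot tests to be stable, the formatter needs to be a deterministic function from incident to Block Kit JSON. The sketch below shows that shape with a `section` block and an `actions` block; the `IncidentSummary` fields and the `action_id` values are hypothetical.

```typescript
type NoiseClass = 'signal' | 'review' | 'noise';

// Hypothetical summary passed to the formatter.
interface IncidentSummary {
  incidentId: string;
  service: string;
  title: string;
  severity: string;
  alertCount: number;
  noiseClass: NoiseClass;
}

const BADGE: Record<NoiseClass, string> = {
  signal: '🟢 signal',
  review: '🟡 review',
  noise: '🔴 noise',
};

// Returns Slack Block Kit blocks. No timestamps or random IDs are generated
// here, so the JSON output is identical for identical input.
function formatIncidentBlocks(incident: IncidentSummary): object[] {
  const summary =
    `*${incident.title}*\n` +
    `Service: ${incident.service} | Severity: ${incident.severity} | ` +
    `${incident.alertCount} alerts grouped | ${BADGE[incident.noiseClass]}`;
  return [
    { type: 'section', text: { type: 'mrkdwn', text: summary } },
    {
      type: 'actions',
      elements: [
        {
          type: 'button',
          action_id: 'feedback_helpful', // assumed identifier
          value: incident.incidentId,
          text: { type: 'plain_text', text: '👍 Helpful', emoji: true },
        },
        {
          type: 'button',
          action_id: 'feedback_not_helpful', // assumed identifier
          value: incident.incidentId,
          text: { type: 'plain_text', text: '👎 Not helpful', emoji: true },
        },
      ],
    },
  ];
}
```

A golden fixture is then just `JSON.stringify(formatIncidentBlocks(incident), null, 2)` checked into the repo.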

### 3.7 Governance Policy Engine

```typescript
describe('GovernancePolicy', () => {
  describe('Mode Enforcement', () => {
    it('in strict mode: annotates alerts but never suppresses', () => {});
    it('in audit mode: auto-suppresses with full logging', () => {});
    it('defaults new tenants to strict mode', () => {});
  });

  describe('Panic Mode', () => {
    it('when panic=true: all suppression stops immediately', () => {});
    it('when panic=true: all alerts pass through unmodified', () => {});
    it('activates panic mode when the Redis panic key is set', () => {});
    it('panic mode shows banner in dashboard API response', () => {});
  });

  describe('Per-Customer Override', () => {
    it('customer can set stricter mode than system default', () => {});
    it('customer cannot set less restrictive mode than system default', () => {});
    it('applies the more restrictive of system and customer modes', () => {});
  });

  describe('Policy Decision Logging', () => {
    it('logs "suppressed by audit mode" with full context', () => {});
    it('logs "annotation-only, strict mode active" for strict tenants', () => {});
    it('logs "panic mode active — all alerts passing through"', () => {});
  });
});
```
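The mode-merge and panic rules can be sketched as a small decision function. Everything named here (`effectiveMode`, `decide`, the reason codes) is hypothetical; the behavior follows the tests above: panic overrides everything, strict only annotates, audit suppresses, and a customer can only tighten the system default.

```typescript
type Mode = 'strict' | 'audit';

// Higher number means more restrictive (less willing to suppress).
const RESTRICTIVENESS: Record<Mode, number> = { audit: 0, strict: 1 };

// Picks the more restrictive of the system default and the customer override.
function effectiveMode(system: Mode, customer?: Mode): Mode {
  if (customer === undefined) return system;
  return RESTRICTIVENESS[customer] > RESTRICTIVENESS[system] ? customer : system;
}

interface Decision {
  action: 'pass_through' | 'annotate' | 'suppress';
  reason: string; // assumed reason codes, logged with every decision
}

interface DecisionInput {
  panic: boolean;
  system: Mode;
  customer?: Mode;
  isNoise: boolean; // result of noise classification
}

function decide(input: DecisionInput): Decision {
  // Panic mode is checked first so nothing below can suppress an alert.
  if (input.panic) return { action: 'pass_through', reason: 'panic_mode_active' };
  if (!input.isNoise) return { action: 'pass_through', reason: 'not_classified_as_noise' };
  const mode = effectiveMode(input.system, input.customer);
  return mode === 'strict'
    ? { action: 'annotate', reason: 'strict_mode_annotation_only' }
    : { action: 'suppress', reason: 'suppressed_by_audit_mode' };
}
```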

### 3.8 Feature Flag Circuit Breaker

```typescript
describe('SuppressionCircuitBreaker', () => {
  it('allows suppression when volume is within baseline', () => {});
  it('trips breaker when suppression exceeds 2x baseline over 30min', () => {});
  it('auto-disables the scoring flag when breaker trips', () => {});
  it('replays suppressed alerts from DLQ when breaker trips', () => {});
  it('resets breaker after manual flag re-enable', () => {});
  it('tracks suppression count per flag in Redis sliding window', () => {});
});
```
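A minimal in-memory sketch of the breaker's state transitions, assuming the baseline is expressed as suppressions per 30-minute window. The class and method names are illustrative, and the real implementation would keep the sliding window in Redis and trigger the flag disable and DLQ replay on trip.

```typescript
const BREAKER_WINDOW_MS = 30 * 60_000;
const TRIP_MULTIPLIER = 2;

class SuppressionBreakerSketch {
  private timestamps: number[] = [];
  private tripped = false;
  private readonly baseline: number;

  // `baselinePerWindow`: expected suppressions per 30 minutes (assumed unit).
  constructor(baselinePerWindow: number) {
    this.baseline = baselinePerWindow;
  }

  // Records one suppression at `now` (epoch ms).
  // Returns true while suppression is still allowed.
  recordSuppression(now: number): boolean {
    if (this.tripped) return false;
    this.timestamps.push(now);
    const cutoff = now - BREAKER_WINDOW_MS;
    // Drop entries that have slid out of the 30-minute window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length > TRIP_MULTIPLIER * this.baseline) {
      this.tripped = true;
    }
    return !this.tripped;
  }

  isTripped(): boolean {
    return this.tripped;
  }

  // Stands in for the manual flag re-enable.
  reset(): void {
    this.tripped = false;
    this.timestamps = [];
  }
}
```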

---

## Section 4: Integration Test Strategy

### 4.1 Webhook Contract Tests

Each provider integration gets a contract test suite that validates the full path: HTTP request → Lambda → SQS message.

```typescript
// tests/integration/webhooks/datadog.contract.test.ts
describe('Datadog Webhook Contract', () => {
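  let localstack: StartedLocalStackContainer;
  let sqsClient: SQSClient;

  beforeAll(async () => {
    localstack = await new LocalStackContainer().start();
    // LocalStack accepts any credentials, but the AWS SDK still needs a region and a key pair.
    sqsClient = new SQSClient({
      endpoint: localstack.getConnectionUri(),
      region: 'us-east-1',
      credentials: { accessKeyId: 'test', secretAccessKey: 'test' },
    });
    // Create SQS FIFO queue
    await sqsClient.send(new CreateQueueCommand({
      QueueName: 'alert-ingested.fifo',
      Attributes: { FifoQueue: 'true', ContentBasedDeduplication: 'true' }
    }));
  });

  afterAll(async () => {
    await localstack.stop();
  });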

  it('accepts valid Datadog webhook and produces canonical SQS message', async () => {
    const payload = loadFixture('webhooks/datadog/single-alert.json');
    const signature = computeHmac(payload, TEST_SECRET);

    const res = await request(app)
      .post('/v1/wh/tenant-123/datadog')
      .set('DD-WEBHOOK-SIGNATURE', signature)
      .send(payload);

    expect(res.status).toBe(200);

    const messages = await pollSqs(sqsClient, 'alert-ingested.fifo');
    expect(messages).toHaveLength(1);
    expect(messages[0].body).toMatchObject({
      tenant_id: 'tenant-123',
      provider: 'datadog',
      severity: expect.stringMatching(/critical|high|medium|low|info/),
      fingerprint: expect.stringMatching(/^[a-f0-9]{64}$/),
    });
  });

  it('rejects webhook with invalid HMAC and produces no SQS message', async () => {
    const payload = loadFixture('webhooks/datadog/single-alert.json');

    const res = await request(app)
      .post('/v1/wh/tenant-123/datadog')
      .set('DD-WEBHOOK-SIGNATURE', 'bad-signature')
      .send(payload);

    expect(res.status).toBe(401);
    const messages = await pollSqs(sqsClient, 'alert-ingested.fifo', { waitMs: 1000 });
    expect(messages).toHaveLength(0);
  });
});
```

Repeat pattern for PagerDuty, OpsGenie, Grafana — each with provider-specific signature headers and payload formats.
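The contract tests depend on how the Lambda shapes its SQS message. As a sketch under assumptions, the send parameters could be built as below: SQS FIFO preserves order within a message group, so using the tenant ID as `MessageGroupId` is one way to get per-tenant ordering. The `CanonicalAlertMessage` shape and `toFifoMessage` name are hypothetical; the returned field names match the SQS `SendMessage` input.

```typescript
// Hypothetical canonical message body.
interface CanonicalAlertMessage {
  tenant_id: string;
  provider: string;
  severity: string;
  fingerprint: string;
}

// Subset of the SQS SendMessage input used for FIFO queues.
interface FifoSendParams {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId: string;
}

function toFifoMessage(queueUrl: string, alert: CanonicalAlertMessage): FifoSendParams {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(alert),
    // Ordering is guaranteed only within a message group, so grouping by
    // tenant keeps one tenant's alerts in order without blocking other tenants.
    MessageGroupId: alert.tenant_id,
    // No MessageDeduplicationId: the queue above enables ContentBasedDeduplication.
  };
}
```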

### 4.2 Correlation Engine → Redis Integration

```typescript
// tests/integration/correlation/redis-windows.test.ts
describe('Correlation Engine + Redis', () => {
  let redis: StartedTestContainer;
  let redisClient: Redis;

  beforeAll(async () => {
    redis = await new GenericContainer('redis:7-alpine')
      .withExposedPorts(6379)
      .start();
    redisClient = new Redis({ host: redis.getHost(), port: redis.getMappedPort(6379) });
  });

  it('opens window in Redis sorted set with correct TTL', async () => {
    await correlationEngine.processAlert(makeAlert({ service: 'payment-api' }));

    const windows = await redisClient.zrange('windows:tenant-123', 0, -1, 'WITHSCORES');
    expect(windows).toHaveLength(2); // [windowId, closesAtEpoch]
    const ttl = await redisClient.ttl('window:tenant-123:payment-api');
    expect(ttl).toBeGreaterThan(280); // ~5min minus processing time
  });

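  it('extends window when alert arrives in last 30 seconds', async () => {
    // Fake only Date so the engine's clock can jump while ioredis keeps its real I/O timers.
    vi.useFakeTimers({ toFake: ['Date'] });
    try {
      // Open window, move the clock to T+4m31s, send another alert
      await correlationEngine.processAlert(makeAlert({ service: 'payment-api' }));
      vi.setSystemTime(Date.now() + 4 * 60 * 1000 + 31 * 1000);
      await correlationEngine.processAlert(makeAlert({ service: 'payment-api' }));

      // Redis expires keys on its own clock, so these bounds assume the engine
      // re-sets the TTL from its (faked) clock: ~29s remaining + 2min extension.
      const ttl = await redisClient.ttl('window:tenant-123:payment-api');
      expect(ttl).toBeGreaterThan(100); // Extended by ~2min
      expect(ttl).toBeLessThanOrEqual(150); // an untouched 5min TTL would still read ~300
    } finally {
      vi.useRealTimers();
    }
  });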

  it('isolates windows between tenants', async () => {
    await correlationEngine.processAlert(makeAlert({ tenant: 'A', service: 'api' }));
    await correlationEngine.processAlert(makeAlert({ tenant: 'B', service: 'api' }));

    const windowsA = await redisClient.zrange('windows:A', 0, -1);
    const windowsB = await redisClient.zrange('windows:B', 0, -1);
    expect(windowsA).toHaveLength(1);
    expect(windowsB).toHaveLength(1);
    expect(windowsA[0]).not.toBe(windowsB[0]);
  });
});
```

### 4.3 Correlation Engine → DynamoDB Integration

```typescript
// tests/integration/correlation/dynamodb-incidents.test.ts
describe('Correlation Engine + DynamoDB', () => {
  let dynamodb: StartedTestContainer;

  beforeAll(async () => {
    dynamodb = await new GenericContainer('amazon/dynamodb-local:latest')
      .withExposedPorts(8000)
      .start();
    // Create tables: alerts, incidents, tenant_config, service_dependencies
  });

  it('persists incident record when correlation window closes', async () => {
    await correlationEngine.processAlert(makeAlert({ service: 'api' }));
    await correlationEngine.processAlert(makeAlert({ service: 'api' }));
    await correlationEngine.closeExpiredWindows();

    const incidents = await queryIncidents('tenant-123');
    expect(incidents).toHaveLength(1);
    expect(incidents[0].alert_count).toBe(2);
    expect(incidents[0].services).toContain('api');
  });

  it('reads service dependencies for cascading correlation', async () => {
    await putServiceDependency('tenant-123', 'api', 'database');
    await correlationEngine.processAlert(makeAlert({ service: 'database' }));
    await correlationEngine.processAlert(makeAlert({ service: 'api' }));

    // Both should be in the same window
    const windows = await getActiveWindows('tenant-123');
    expect(windows).toHaveLength(1);
    expect(windows[0].services).toEqual(expect.arrayContaining(['api', 'database']));
  });
});
```
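The cascading correlation tested above depends on walking the dependency graph safely. A minimal sketch, assuming dependencies are loaded into a map from each service to the services it depends on directly (the names `upstreamsOf` and `shouldMerge` are hypothetical), with a visited set so that cycles terminate:

```typescript
// service -> services it depends on directly (its upstreams)
type DependencyGraph = Map<string, string[]>;

// Returns every upstream reachable from `service`, excluding the service itself.
function upstreamsOf(graph: DependencyGraph, service: string): Set<string> {
  const seen = new Set<string>();
  const stack = [...(graph.get(service) ?? [])];
  while (stack.length > 0) {
    const next = stack.pop() as string;
    // The visited set is what keeps circular dependencies from looping forever.
    if (next === service || seen.has(next)) continue;
    seen.add(next);
    stack.push(...(graph.get(next) ?? []));
  }
  return seen;
}

// An alert joins an open window when it is for the same service, or when the
// window's service is somewhere upstream of the alerting service.
function shouldMerge(graph: DependencyGraph, alertService: string, windowService: string): boolean {
  return alertService === windowService || upstreamsOf(graph, alertService).has(windowService);
}
```

In the test above, `api` depends on `database`, so an `api` alert merges into the window that the `database` alert opened.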

### 4.4 Correlation Engine → TimescaleDB Integration

```typescript
// tests/integration/correlation/timescaledb-trends.test.ts
describe('Correlation Engine + TimescaleDB', () => {
  let pg: StartedTestContainer;

  beforeAll(async () => {
    pg = await new GenericContainer('timescale/timescaledb:latest-pg16')
      .withExposedPorts(5432)
      .withEnvironment({ POSTGRES_PASSWORD: 'test' })
      .start();
    // Run migrations: create hypertables, continuous aggregates
  });

  it('writes alert frequency data to hypertable', async () => {
    await correlationEngine.recordAlertEvent(makeAlert({ service: 'api' }));
    const rows = await query('SELECT * FROM alert_events WHERE service = $1', ['api']);
    expect(rows).toHaveLength(1);
  });

  it('continuous aggregate calculates hourly alert counts', async () => {
    // Insert 10 alerts spread over 2 hours
    await insertAlertEvents(10, { spreadHours: 2 });
    await refreshContinuousAggregate('hourly_alert_summary');

    const summary = await query('SELECT * FROM hourly_alert_summary');
    expect(summary).toHaveLength(2);
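    // COUNT(*) is a bigint, which node-postgres returns as a string, so coerce before summing.
    expect(summary.reduce((s, r) => s + Number(r.alert_count), 0)).toBe(10);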
  });
});
```

### 4.5 Notification Service → Slack (WireMock)

```typescript
// tests/integration/notifications/slack.test.ts
describe('Notification Service + Slack', () => {
  let wiremock: WireMockContainer;

  beforeAll(async () => {
    wiremock = await new WireMockContainer().start();
    wiremock.stub({
      request: { method: 'POST', urlPath: '/api/chat.postMessage' },
      response: { status: 200, body: JSON.stringify({ ok: true, ts: '1234.5678' }) }
    });
    wiremock.stub({
      request: { method: 'POST', urlPath: '/api/chat.update' },
      response: { status: 200, body: JSON.stringify({ ok: true }) }
    });
  });

  it('sends initial alert notification to correct Slack channel', async () => {});
  it('updates message in-place when correlation completes', async () => {});
  it('respects Slack rate limits (1 msg/sec per channel)', async () => {});
  it('retries on 429 with exponential backoff', async () => {});
  it('includes feedback buttons in correlated incident message', async () => {});
});
```
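The 429 retry behavior can be sketched independently of any HTTP client. The version below assumes the caller wraps the Slack call in a `send` function that reports the status and, when present, the `Retry-After` value in seconds; `postWithRetry`, its parameters, and the default delays are illustrative. Injecting `sleep` keeps the test instant.

```typescript
// Hypothetical minimal view of a Slack API response.
interface SlackResponse {
  status: number;
  retryAfterSec?: number; // parsed from the Retry-After header on 429
}

async function postWithRetry(
  send: () => Promise<SlackResponse>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<SlackResponse> {
  let last: SlackResponse = { status: 0 };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    last = await send();
    if (last.status !== 429) return last;
    if (attempt === maxAttempts - 1) break; // out of attempts, give up
    // Exponential backoff: 1s, 2s, 4s, ... but never shorter than Retry-After.
    const backoffMs = baseDelayMs * 2 ** attempt;
    const retryAfterMs = (last.retryAfterSec ?? 0) * 1000;
    await sleep(Math.max(backoffMs, retryAfterMs));
  }
  return last; // still 429 after maxAttempts
}
```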

---

## Section 5: E2E & Smoke Tests

### 5.1 Critical User Journeys

**Journey 1: 60-Second Time-to-Value**

The defining test for dd0c/alert. Validates the entire pipeline from webhook to Slack notification.

```typescript
// tests/e2e/journeys/sixty-second-ttv.test.ts
describe('60-Second Time-to-Value', () => {
  it('delivers first correlated incident to Slack within 60 seconds of webhook', async () => {
    const start = Date.now();

    // 1. Send Datadog webhook
    await sendWebhook('datadog', fixtures.datadog.singleAlert, { tenant: 'e2e-tenant' });

    // 2. Wait for Slack message
    const slackMessage = await waitForSlackMessage('e2e-channel', { timeoutMs: 60_000 });

    const elapsed = Date.now() - start;
    expect(elapsed).toBeLessThan(60_000);
    expect(slackMessage.text).toContain('New alert');
    expect(slackMessage.blocks).toBeDefined();
  }, 70_000); // explicit timeout: vitest's 5s default is shorter than the 60s budget
});
```

**Journey 2: Alert Storm Correlation**

```typescript
// tests/e2e/journeys/alert-storm.test.ts
describe('Alert Storm Correlation', () => {
  it('groups 50 alerts in 2 minutes into a single correlated incident', async () => {
    // Fire 50 alerts for same service over 2 minutes
    for (let i = 0; i < 50; i++) {
      await sendWebhook('datadog', makeAlertPayload({
        service: 'payment-api',
        title: `High latency on payment-api (${i})`,
      }));
      await sleep(2400); // ~50 alerts in 2 min
    }

    // Wait for correlation window to close
    await sleep(5 * 60 * 1000 + 30_000); // 5min window + buffer

    const slackMessages = await getSlackMessages('e2e-channel');
    const incidentMessages = slackMessages.filter(m => m.text.includes('Incident'));
    expect(incidentMessages).toHaveLength(1);
    expect(incidentMessages[0].text).toContain('50 alerts grouped');
  }, 10 * 60 * 1000); // explicit timeout: the sleeps above total ~7.5min, far past vitest's 5s default
});
```

**Journey 3: Deploy Correlation**

```typescript
// tests/e2e/journeys/deploy-correlation.test.ts
describe('Deploy Correlation', () => {
  it('identifies deploy as trigger when alerts follow within 10 minutes', async () => {
    // 1. Send deploy event
    await sendWebhook('github-actions', makeDeployPayload({
      service: 'payment-api',
      commit: 'abc123',
      pr_title: 'feat: add retry logic',
    }));

    // 2. Wait 2 minutes, then fire alerts
    await sleep(2 * 60 * 1000);
    await sendWebhook('datadog', makeAlertPayload({ service: 'payment-api' }));
    await sendWebhook('pagerduty', makeAlertPayload({ service: 'payment-api' }));

    // 3. Wait for correlation
    await sleep(6 * 60 * 1000);

    const slackMessage = await getLatestSlackMessage('e2e-channel');
    expect(slackMessage.text).toContain('Deploy #');
    expect(slackMessage.text).toContain('abc123');
  }, 10 * 60 * 1000); // explicit timeout: the sleeps above total ~8min, far past vitest's 5s default
});
```

**Journey 4: Panic Mode**

```typescript
// tests/e2e/journeys/panic-mode.test.ts
describe('Panic Mode', () => {
  it('stops all suppression immediately when panic mode is activated', async () => {
    // 1. Enable audit mode, verify suppression works
    await setGovernanceMode('e2e-tenant', 'audit');
    await sendNoisyAlerts(10);
    const beforePanic = await getSlackMessages('e2e-channel');
    const suppressedBefore = beforePanic.filter(m => m.text.includes('suppressed'));
    expect(suppressedBefore.length).toBeGreaterThan(0);

    // 2. Activate panic mode. Node's fetch needs an absolute URL;
    //    APP_BASE_URL is an assumed test constant.
    await fetch(`${APP_BASE_URL}/admin/panic`, { method: 'POST' });

    // 3. Send more alerts: all should pass through.
    //    Only inspect messages posted after panic (assumes chronological order).
    await sendNoisyAlerts(10);
    const afterPanic = (await getSlackMessages('e2e-channel')).slice(beforePanic.length);
    const rawAlerts = afterPanic.filter(m => !m.text.includes('suppressed'));
    expect(rawAlerts.length).toBeGreaterThanOrEqual(10);
  });
});
```

### 5.2 E2E Infrastructure

```yaml
# docker-compose.e2e.yml
services:
  localstack:
    image: localstack/localstack:3
    environment:
      SERVICES: sqs,s3,dynamodb,apigateway,lambda
    ports: ["4566:4566"]

  timescaledb:
    image: timescale/timescaledb:latest-pg16
    environment:
      POSTGRES_PASSWORD: test
      POSTGRES_DB: test  # TIMESCALE_URL below connects to the "test" database
    ports: ["5432:5432"]

  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  wiremock:
    image: wiremock/wiremock:3
    ports: ["8080:8080"]
    volumes:
      - ./fixtures/wiremock:/home/wiremock/mappings

  app:
    build: .
    environment:
      AWS_ENDPOINT: http://localstack:4566
      REDIS_URL: redis://redis:6379
      TIMESCALE_URL: postgres://postgres:test@timescaledb:5432/test
      SLACK_API_URL: http://wiremock:8080
    depends_on: [localstack, timescaledb, redis, wiremock]
```

### 5.3 Synthetic Alert Generation

```typescript
// tests/e2e/helpers/alert-generator.ts
import { faker } from '@faker-js/faker';
import { ulid } from 'ulid';

export function makeAlertPayload(overrides: Partial<AlertPayload> = {}): DatadogWebhookPayload {
  // Map each override onto its Datadog field explicitly. Spreading `overrides`
  // into the result would leak non-Datadog keys (service, severity) into the payload.
  return {
    id: ulid(),
    title: overrides.title ?? `Alert: ${faker.hacker.phrase()}`,
    text: faker.lorem.sentence(),
    date_happened: Math.floor(Date.now() / 1000),
    priority: overrides.priority ?? 'normal',
    tags: [`service:${overrides.service ?? 'test-service'}`],
    alert_type: overrides.severity ?? 'warning',
  };
}

export async function sendNoisyAlerts(count: number, opts?: { service?: string }) {
  for (let i = 0; i < count; i++) {
    await sendWebhook('datadog', makeAlertPayload({
      service: opts?.service ?? 'noisy-service',
      title: `Flapping alert #${i}`,
    }));
  }
}
```

---

## Section 6: Performance & Load Testing

### 6.1 Alert Ingestion Throughput

```typescript
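// tests/perf/ingestion-throughput.k6.ts
// Goal: process 1000 webhooks/second without dropping payloads.
// k6 is a standalone binary with no Node API, so this runs via `k6 run`, not vitest.
// WEBHOOK_URL and WEBHOOK_SECRET are assumed to be passed with `-e NAME=value`.
import http from 'k6/http';
import crypto from 'k6/crypto';

export const options = {
  scenarios: {
    ingest: {
      executor: 'constant-arrival-rate', // holds 1000 req/s regardless of response time
      rate: 1000,
      timeUnit: '1s',
      duration: '30s',
      preAllocatedVUs: 100,
      maxVUs: 500,
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<200'],  // 200ms p95
    http_req_failed: ['rate<0.001'],   // <0.1% failure
  },
};

export default function () {
  const body = JSON.stringify({
    id: `${__VU}-${__ITER}`,
    title: 'Perf test alert',
    tags: ['service:perf-service'],
    alert_type: 'warning',
  });
  // Sign each body so the request passes HMAC validation.
  const signature = crypto.hmac('sha256', __ENV.WEBHOOK_SECRET, body, 'hex');
  http.post(`${__ENV.WEBHOOK_URL}/v1/wh/perf-tenant/datadog`, body, {
    headers: { 'Content-Type': 'application/json', 'DD-WEBHOOK-SIGNATURE': signature },
  });
}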
```

### 6.2 Correlation Latency Under Alert Storms

```typescript
describe('Correlation Storm Performance', () => {
  it('correlates 500 alerts across 10 services within 30 seconds', async () => {
    const start = Date.now();
    
    // Simulate incident storm: 500 alerts, 10 services, 2 minutes
    await generateAlertStorm({ alerts: 500, services: 10, durationMs: 120_000 });
    
    // Wait for all windows to close
    await waitForIncidents('perf-tenant', { minCount: 1, timeoutMs: 30_000 });
    
    const elapsed = Date.now() - start - 120_000; // subtract generation time
    expect(elapsed).toBeLessThan(30_000);
  });

  it('Redis memory stays under 50MB during 10K active windows', async () => {
    // Open 10K windows across 100 tenants
    for (let t = 0; t < 100; t++) {
      for (let s = 0; s < 100; s++) {
        await correlationEngine.processAlert(makeAlert({
          tenant: `tenant-${t}`,
          service: `service-${s}`,
        }));
      }
    }
    const memoryUsage = await redisClient.info('memory');
    const usedMb = parseRedisMemory(memoryUsage);
    expect(usedMb).toBeLessThan(50);
  });
});
```
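The `parseRedisMemory` helper used above is not defined in this document. One possible sketch, assuming it receives the raw text returned by `INFO memory` and returns megabytes:

```typescript
// Extracts `used_memory` (bytes) from Redis INFO output and converts to MB.
// INFO lines are "key:value" pairs separated by \r\n.
function parseRedisMemory(info: string): number {
  // Anchoring on "used_memory:" avoids matching used_memory_human, used_memory_rss, etc.
  const match = /^used_memory:(\d+)\s*$/m.exec(info);
  if (!match) throw new Error('used_memory not found in INFO output');
  return Number(match[1]) / (1024 * 1024);
}
```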

### 6.3 Noise Scoring Latency

```typescript
describe('Noise Scoring Performance', () => {
  it('scores a correlated incident with 50 alerts in <100ms', async () => {
    const incident = makeIncident({ alertCount: 50, withHistory: true });
    
    const start = performance.now();
    const score = await noiseScorer.score(incident);
    const elapsed = performance.now() - start;
    
    expect(elapsed).toBeLessThan(100);
    expect(score).toBeGreaterThanOrEqual(0);
    expect(score).toBeLessThanOrEqual(100);
  });
});
```

### 6.4 Memory Pressure During High-Cardinality Correlation

```typescript
describe('Memory Pressure', () => {
  it('ECS task stays under 512MB with 1000 concurrent correlation windows', async () => {
    // Approximation: measures this test process, not the real ECS task. RSS is used
    // because a container memory limit applies to resident memory, not just the JS heap.
    const memBefore = process.memoryUsage().rss;
    
    await processHighCardinalityAlerts({ tenants: 100, servicesPerTenant: 10 });
    
    const memAfter = process.memoryUsage().rss;
    const deltaMb = (memAfter - memBefore) / 1024 / 1024;
    expect(deltaMb).toBeLessThan(256); // Leave headroom in 512MB task
  });
});
```

---

## Section 7: CI/CD Pipeline Integration

### 7.1 Pipeline Stages

```
┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Pre-Commit   │───▶│ PR Gate  │───▶│ Merge    │───▶│ Staging  │───▶│ Prod     │
│ (local)      │    │ (CI)     │    │ (CI)     │    │ (CD)     │    │ (CD)     │
└─────────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
  lint + format     unit tests      full suite      E2E + perf     smoke + canary
  type check        integration     coverage gate   LocalStack     deploy event
  <10s              <5min           <10min          <15min         self-dogfood
```

### 7.2 Stage Details

**Pre-Commit (local, <10s):**
- `eslint` + `prettier` format check
- `tsc --noEmit` type check
- Affected unit tests only (`vitest run --changed`)

**PR Gate (CI, <5min):**
- Full unit test suite
- Integration tests (Testcontainers spin up in CI)
- Schema migration lint (no DROP/RENAME/TYPE changes)
- Decision log presence check for scoring/correlation PRs
- Coverage diff: new code must have ≥80% coverage

**Merge to Main (CI, <10min):**
- Full test suite (unit + integration)
- Coverage gate: overall ≥80%, scoring engine ≥90%
- CDK synth + diff (infrastructure changes)
- Security scan (`npm audit`, `trivy`)

**Staging (CD, <15min):**
- Deploy to staging environment
- E2E journey tests against LocalStack
- Performance benchmarks (ingestion throughput, correlation latency)
- Synthetic alert generation + validation

**Production (CD):**
- Canary deploy (10% traffic for 5 minutes)
- Smoke tests (send test webhook, verify Slack delivery)
- dd0c/alert dogfoods itself: deploy event sent to own webhook
- Automated rollback if error rate >1% during canary
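
The canary rollback rule can be expressed as a pure decision function, which makes the 1% boundary easy to unit test. This is a sketch only: `CanaryStats`, `shouldRollback`, and the zero-traffic behavior are assumptions, not part of the pipeline described here.

```typescript
// Hypothetical canary gate; names and the zero-traffic rule are assumptions.
interface CanaryStats {
  requests: number; // requests served by the canary during the window
  errors: number;   // how many of those failed
}

const CANARY_ERROR_RATE_THRESHOLD = 0.01; // 1%, from the rule above

function shouldRollback(
  stats: CanaryStats,
  threshold: number = CANARY_ERROR_RATE_THRESHOLD,
): boolean {
  // Assumption: no canary traffic means no evidence of failure, so do not roll back.
  if (stats.requests === 0) return false;
  // Strictly greater than: exactly 1% errors does not trigger a rollback.
  return stats.errors / stats.requests > threshold;
}
```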

### 7.3 Coverage Thresholds

| Component | Minimum | Target |
|-----------|---------|--------|
| Webhook Parsers | 90% | 95% |
| HMAC Validator | 95% | 100% |
| Correlation Engine | 85% | 90% |
| Noise Scorer | 90% | 95% |
| Governance Policy | 90% | 95% |
| Notification Formatter | 75% | 85% |
| Overall | 80% | 85% |
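
One way to enforce the Minimum column in CI is a small check over the coverage report. The sketch below assumes coverage has already been aggregated into one percentage per component; `findCoverageViolations` and the shape of its input are hypothetical.

```typescript
// Minimum thresholds copied from the table above.
const COVERAGE_MINIMUMS: Record<string, number> = {
  'Webhook Parsers': 90,
  'HMAC Validator': 95,
  'Correlation Engine': 85,
  'Noise Scorer': 90,
  'Governance Policy': 90,
  'Notification Formatter': 75,
  'Overall': 80,
};

// Hypothetical gate: returns one message per component below its minimum.
// A component missing from the report counts as a violation.
function findCoverageViolations(measured: Record<string, number>): string[] {
  const violations: string[] = [];
  for (const [component, minimum] of Object.entries(COVERAGE_MINIMUMS)) {
    const actual = measured[component];
    if (actual === undefined) {
      violations.push(`${component}: no coverage reported`);
    } else if (actual < minimum) {
      violations.push(`${component}: ${actual}% < ${minimum}%`);
    }
  }
  return violations;
}
```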

### 7.4 Test Parallelization

```yaml
# .github/workflows/test.yml
jobs:
  unit:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - run: vitest --shard=${{ matrix.shard }}/4

  integration:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        suite: [webhooks, correlation, notifications, storage]
    steps:
      - run: vitest run --project=integration --testNamePattern=${{ matrix.suite }}

  e2e:
    needs: [unit, integration]
    runs-on: ubuntu-latest
    steps:
      - run: docker compose -f docker-compose.e2e.yml up -d
      - run: vitest --project=e2e
```

---

## Section 8: Transparent Factory Tenet Testing

### 8.1 Atomic Flagging — Suppression Circuit Breaker

```typescript
describe('Atomic Flagging', () => {
  describe('Flag Lifecycle', () => {
    it('new scoring rule flag defaults to false (off)', () => {});
    it('flag has owner and ttl metadata', () => {});
    it('CI blocks when flag at 100% exceeds 14-day TTL', () => {});
  });

  describe('Circuit Breaker on Suppression Volume', () => {
    it('allows suppression when volume is within 2x baseline', () => {});
    it('trips breaker when suppression exceeds 2x baseline over 30min', () => {});
    it('auto-disables the flag when breaker trips', () => {});
    it('buffers suppressed alerts in DLQ during normal operation', () => {});
    it('replays DLQ alerts when breaker trips', async () => {
      // 1. Enable scoring flag, suppress 20 alerts
      // 2. Trip the breaker by spiking suppression rate
      // 3. Verify all 20 suppressed alerts are re-emitted from DLQ
      // 4. Verify flag is now disabled
    });
    it('DLQ retains alerts for 1 hour before expiry', () => {});
  });

  describe('Local Evaluation', () => {
    it('flag evaluation does not make network calls', () => {});
    it('flag state is cached in-memory and refreshed every 60s', () => {});
  });
});
```
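
The breaker tests above imply a sliding-window counter that trips once suppressions exceed twice the baseline. A minimal sketch under stated assumptions: the baseline is the expected number of suppressions per 30-minute window, time is passed in explicitly, and the class name and method names are hypothetical.

```typescript
interface BreakerOptions {
  baselinePerWindow: number; // assumed: expected suppressions per window
  windowMs?: number;         // defaults to 30 minutes
  multiplier?: number;       // defaults to 2x baseline
}

// Hypothetical breaker: once tripped it stays tripped until replaced or reset.
class SuppressionCircuitBreaker {
  private timestamps: number[] = [];
  private tripped = false;
  private readonly windowMs: number;
  private readonly limit: number;

  constructor(opts: BreakerOptions) {
    this.windowMs = opts.windowMs ?? 30 * 60 * 1000;
    this.limit = opts.baselinePerWindow * (opts.multiplier ?? 2);
  }

  /** Records one suppression at `nowMs`; returns true if suppression is still allowed. */
  recordSuppression(nowMs: number): boolean {
    if (this.tripped) return false;
    this.timestamps.push(nowMs);
    // Keep only suppressions inside the sliding window.
    const cutoff = nowMs - this.windowMs;
    this.timestamps = this.timestamps.filter((ts) => ts > cutoff);
    if (this.timestamps.length > this.limit) {
      this.tripped = true; // the caller would disable the flag and replay the DLQ here
    }
    return !this.tripped;
  }

  get isTripped(): boolean {
    return this.tripped;
  }
}
```

Passing the timestamp in keeps the breaker deterministic in unit tests, with no timers to fake.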

### 8.2 Elastic Schema — Migration Validation

```typescript
describe('Elastic Schema', () => {
  describe('Migration Lint', () => {
    it('rejects migration with DROP COLUMN statement', () => {
      const migration = 'ALTER TABLE alert_events DROP COLUMN old_field;';
      expect(lintMigration(migration)).toContainError('DROP not allowed');
    });
    it('rejects migration with ALTER COLUMN TYPE', () => {
      const migration = 'ALTER TABLE alert_events ALTER COLUMN severity TYPE integer;';
      expect(lintMigration(migration)).toContainError('TYPE change not allowed');
    });
    it('rejects migration with RENAME COLUMN', () => {});
    it('accepts migration with ADD COLUMN (nullable)', () => {
      const migration = 'ALTER TABLE alert_events ADD COLUMN noise_score_v2 integer;';
      expect(lintMigration(migration)).toBeValid();
    });
    it('accepts migration with new table creation', () => {});
  });

  describe('DynamoDB Schema', () => {
    it('rejects attribute type change in table definition', () => {});
    it('accepts new attribute addition', () => {});
    it('V1 code ignores V2 attributes without error', () => {});
  });

  describe('Sunset Enforcement', () => {
    it('every migration file contains sunset_date comment', () => {
      const migrations = glob.sync('migrations/*.sql');
      for (const m of migrations) {
        const content = fs.readFileSync(m, 'utf-8');
        expect(content).toMatch(/-- sunset_date: \d{4}-\d{2}-\d{2}/);
      }
    });
    it('CI warns when migration is past sunset date', () => {});
  });
});
```
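
The migration lint tests call `lintMigration` through custom matchers (`toContainError`, `toBeValid`). A minimal sketch of the function itself, assuming a regex heuristic is acceptable for V1; it is not a SQL parser, and the `LintResult` shape and the RENAME message are assumptions.

```typescript
interface LintResult {
  valid: boolean;
  errors: string[];
}

// Heuristic rules for the three forbidden change types named in the PR gate.
const FORBIDDEN_PATTERNS: Array<{ pattern: RegExp; message: string }> = [
  { pattern: /\bDROP\s+(COLUMN|TABLE)\b/i, message: 'DROP not allowed' },
  { pattern: /\bRENAME\s+(COLUMN|TO)\b/i, message: 'RENAME not allowed' },
  {
    pattern: /\bALTER\s+COLUMN\s+\S+\s+(SET\s+DATA\s+)?TYPE\b/i,
    message: 'TYPE change not allowed',
  },
];

function lintMigration(sql: string): LintResult {
  // Strip `--` line comments so headers such as `-- sunset_date: ...`
  // or commented-out statements are not linted.
  const withoutComments = sql.replace(/--.*$/gm, '');
  const errors = FORBIDDEN_PATTERNS
    .filter((rule) => rule.pattern.test(withoutComments))
    .map((rule) => rule.message);
  return { valid: errors.length === 0, errors };
}
```

The custom matchers can then be thin wrappers over `errors` and `valid`.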

### 8.3 Cognitive Durability — Decision Log Validation

```typescript
describe('Cognitive Durability', () => {
  it('decision_log.json exists for every PR touching scoring/', () => {
    // CI hook: check git diff for files in src/scoring/
    // If touched, require docs/decisions/*.json in the same PR
  });

  it('decision log has required fields', () => {
    const logs = glob.sync('docs/decisions/*.json');
    for (const log of logs) {
      const entry = JSON.parse(fs.readFileSync(log, 'utf-8'));
      expect(entry).toHaveProperty('reasoning');
      expect(entry).toHaveProperty('alternatives_considered');
      expect(entry).toHaveProperty('confidence');
      expect(entry).toHaveProperty('timestamp');
      expect(entry).toHaveProperty('author');
    }
  });

  it('cyclomatic complexity stays under 10 for all scoring functions', () => {
    // execSync returns stdout and throws when eslint exits non-zero,
    // so the assertion is that the call does not throw
    expect(() =>
      execSync('eslint src/scoring/ --rule "complexity: [error, 10]"', { stdio: 'pipe' })
    ).not.toThrow();
  });
});
```

### 8.4 Semantic Observability — OTEL Span Assertions

```typescript
describe('Semantic Observability', () => {
  let spanExporter: InMemorySpanExporter;

  beforeEach(() => {
    spanExporter = new InMemorySpanExporter();
    // Configure OTEL with in-memory exporter for testing
  });

  describe('Alert Evaluation Spans', () => {
    it('emits parent alert_evaluation span for each alert', async () => {
      await processAlert(makeAlert());
      const spans = spanExporter.getFinishedSpans();
      const evalSpan = spans.find(s => s.name === 'alert_evaluation');
      expect(evalSpan).toBeDefined();
    });

    it('emits child noise_scoring span with score attributes', async () => {
      await processAlert(makeAlert());
      const spans = spanExporter.getFinishedSpans();
      const scoreSpan = spans.find(s => s.name === 'noise_scoring');
      expect(scoreSpan).toBeDefined();
      expect(scoreSpan.attributes['alert.noise_score']).toBeGreaterThanOrEqual(0);
      expect(scoreSpan.attributes['alert.noise_score']).toBeLessThanOrEqual(100);
    });

    it('emits child correlation_matching span with match data', async () => {
      await processAlert(makeAlert());
      const spans = spanExporter.getFinishedSpans();
      const corrSpan = spans.find(s => s.name === 'correlation_matching');
      expect(corrSpan).toBeDefined();
      expect(corrSpan.attributes).toHaveProperty('alert.correlation_matches');
    });

    it('emits suppression_decision span with reason', async () => {
      await processAlert(makeAlert());
      const spans = spanExporter.getFinishedSpans();
      const suppSpan = spans.find(s => s.name === 'suppression_decision');
      expect(suppSpan).toBeDefined();
      expect(suppSpan.attributes).toHaveProperty('alert.suppressed');
      expect(suppSpan.attributes).toHaveProperty('alert.suppression_reason');
    });
  });

  describe('PII Protection', () => {
    it('never includes raw alert payload in span attributes', async () => {
      await processAlert(makeAlert({ title: 'User john@example.com failed login' }));
      const spans = spanExporter.getFinishedSpans();
      for (const span of spans) {
        const attrs = JSON.stringify(span.attributes);
        expect(attrs).not.toContain('john@example.com');
      }
    });

    it('uses hashed alert source identifier, not raw', async () => {
      await processAlert(makeAlert({ source: 'prod-payment-api' }));
      const spans = spanExporter.getFinishedSpans();
      const evalSpan = spans.find(s => s.name === 'alert_evaluation');
      expect(evalSpan.attributes['alert.source']).toMatch(/^[a-f0-9]+$/);
    });
  });
});
```
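
The PII test expects `alert.source` to be a lowercase hex string rather than the raw identifier. A minimal sketch of one way to produce it, assuming SHA-256 is acceptable; `hashAlertSource` and the optional per-tenant salt are hypothetical and not specified in this document.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: returns a 64-character lowercase hex digest.
// The optional salt is an assumption; without one, low-entropy names such as
// service identifiers can be recovered by hashing a list of guesses.
function hashAlertSource(source: string, tenantSalt: string = ''): string {
  return createHash('sha256').update(tenantSalt).update(source).digest('hex');
}
```

The digest is deterministic for a given salt, so spans for the same source can still be grouped without exposing the raw name.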

### 8.5 Configurable Autonomy — Governance Policy Tests

```typescript
describe('Configurable Autonomy', () => {
  describe('Governance Mode Enforcement', () => {
    it('strict mode: annotates but never suppresses', async () => {
      setPolicy({ governance_mode: 'strict' });
      const result = await processNoisyAlert(makeAlert({ noiseScore: 95 }));
      expect(result.suppressed).toBe(false);
      expect(result.annotation).toContain('noise_score: 95');
    });

    it('audit mode: auto-suppresses with logging', async () => {
      setPolicy({ governance_mode: 'audit' });
      const result = await processNoisyAlert(makeAlert({ noiseScore: 95 }));
      expect(result.suppressed).toBe(true);
      expect(result.log).toContain('suppressed by audit mode');
    });
  });

  describe('Panic Mode', () => {
    it('activates in <1 second via API call', async () => {
      const start = Date.now();
      await fetch('/admin/panic', { method: 'POST' });
      const panicActive = await redisClient.get('dd0c:panic');
      expect(Date.now() - start).toBeLessThan(1000);
      expect(panicActive).toBe('true');
    });

    it('stops all suppression when active', async () => {
      await activatePanic();
      const results = await Promise.all(
        Array.from({ length: 10 }, () => processNoisyAlert(makeAlert({ noiseScore: 99 })))
      );
      expect(results.every(r => r.suppressed === false)).toBe(true);
    });
  });

  describe('Per-Customer Override', () => {
    it('customer strict overrides system audit', async () => {
      setPolicy({ governance_mode: 'audit' });
      setCustomerPolicy('tenant-123', { governance_mode: 'strict' });
      const result = await processNoisyAlert(makeAlert({ tenant: 'tenant-123', noiseScore: 95 }));
      expect(result.suppressed).toBe(false);
    });

    it('customer cannot downgrade from system strict to audit', async () => {
      setPolicy({ governance_mode: 'strict' });
      setCustomerPolicy('tenant-123', { governance_mode: 'audit' });
      const result = await processNoisyAlert(makeAlert({ tenant: 'tenant-123', noiseScore: 95 }));
      expect(result.suppressed).toBe(false); // System strict wins
    });
  });
});
```
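
The governance tests reduce to a small precedence rule: panic mode beats everything, and `strict` wins whenever either the system or the customer asks for it. A sketch of that rule as pure functions; the names and the default suppression threshold of 90 are assumptions (the tests above only show that a score of 95 is suppressed in audit mode).

```typescript
type GovernanceMode = 'strict' | 'audit';

interface PolicyContext {
  systemMode: GovernanceMode;
  customerMode?: GovernanceMode; // per-customer override, if any
  panicActive: boolean;
}

// A customer can tighten the system policy but never loosen it.
function resolveMode(ctx: PolicyContext): GovernanceMode {
  if (ctx.systemMode === 'strict' || ctx.customerMode === 'strict') return 'strict';
  return 'audit';
}

// Hypothetical decision function; `threshold` is an assumed default.
function maySuppress(ctx: PolicyContext, noiseScore: number, threshold: number = 90): boolean {
  if (ctx.panicActive) return false;               // panic stops all suppression
  if (resolveMode(ctx) === 'strict') return false; // strict annotates only
  return noiseScore >= threshold;
}
```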

---

## Section 9: Test Data & Fixtures

### 9.1 Directory Structure

```
tests/
  fixtures/
    webhooks/
      datadog/
        single-alert.json
        batched-alerts.json
        monitor-recovered.json
        high-priority.json
      pagerduty/
        incident-triggered.json
        incident-resolved.json
        incident-acknowledged.json
      opsgenie/
        alert-created.json
        alert-closed.json
      grafana/
        single-firing.json
        multi-firing.json
        resolved.json
    deploys/
      github-actions-success.json
      github-actions-failure.json
      gitlab-ci-pipeline.json
      argocd-sync.json
    scenarios/
      alert-storm-50-alerts.json
      cascading-failure-3-services.json
      flapping-alert-10-cycles.json
      maintenance-window-suppression.json
      deploy-correlated-incident.json
    slack/
      initial-alert-blocks.json
      correlated-incident-blocks.json
      weekly-digest-blocks.json
    schemas/
      canonical-alert.json
      incident-record.json
      tenant-config.json
```

### 9.2 Alert Payload Factory

```typescript
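// tests/helpers/factories.ts
import * as crypto from 'node:crypto';
import { faker } from '@faker-js/faker';
import { ulid } from 'ulid';
// CanonicalAlert, Incident, and DeployEvent are the product's domain types (import path not shown here).
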
export function makeCanonicalAlert(overrides: Partial<CanonicalAlert> = {}): CanonicalAlert {
  return {
    alert_id: ulid(),
    tenant_id: overrides.tenant_id ?? 'test-tenant',
    provider: overrides.provider ?? 'datadog',
    service: overrides.service ?? 'test-service',
    title: overrides.title ?? `Alert: ${faker.hacker.phrase()}`,
    severity: overrides.severity ?? 'warning',
    fingerprint: overrides.fingerprint ?? crypto.randomBytes(32).toString('hex'),
    timestamp: overrides.timestamp ?? new Date().toISOString(),
    raw_payload_s3_key: overrides.raw_payload_s3_key ?? `raw/${ulid()}.json`,
    metadata: overrides.metadata ?? {},
    ...overrides,
  };
}

export function makeIncident(overrides: Partial<Incident> = {}): Incident {
  const alertCount = overrides.alert_count ?? 5;
  return {
    incident_id: ulid(),
    tenant_id: overrides.tenant_id ?? 'test-tenant',
    services: overrides.services ?? ['test-service'],
    alert_count: alertCount,
    alerts: Array.from({ length: alertCount }, () => makeCanonicalAlert()),
    noise_score: overrides.noise_score ?? 0,
    deploy_correlation: overrides.deploy_correlation ?? null,
    window_opened_at: overrides.window_opened_at ?? new Date().toISOString(),
    window_closed_at: overrides.window_closed_at ?? new Date().toISOString(),
    ...overrides,
  };
}

export function makeDeployEvent(overrides: Partial<DeployEvent> = {}): DeployEvent {
  return {
    deploy_id: ulid(),
    tenant_id: overrides.tenant_id ?? 'test-tenant',
    service: overrides.service ?? 'test-service',
    commit_sha: overrides.commit_sha ?? faker.git.commitSha(),
    pr_title: overrides.pr_title ?? faker.git.commitMessage(),
    deployed_at: overrides.deployed_at ?? new Date().toISOString(),
    provider: overrides.provider ?? 'github-actions',
    ...overrides,
  };
}
```

### 9.3 Noise Scenario Fixtures

```typescript
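// tests/helpers/scenarios.ts
import { makeCanonicalAlert, makeDeployEvent } from './factories';

// Timestamp helper used by cascadingFailure: ISO time at `offsetSeconds` after a fixed base.
// The offsets are assumed to be seconds.
const scenarioStart = Date.now();
const t = (offsetSeconds: number): string =>
  new Date(scenarioStart + offsetSeconds * 1000).toISOString();

export const NOISE_SCENARIOS = {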
  alertStorm: {
    description: '50 alerts for same service in 2 minutes',
    alerts: Array.from({ length: 50 }, (_, i) => makeCanonicalAlert({
      service: 'payment-api',
      title: `High latency variant ${i}`,
      timestamp: new Date(Date.now() + i * 2400).toISOString(),
    })),
    expectedIncidents: 1,
    expectedNoiseScore: { min: 70, max: 95 },
  },

  flappingAlert: {
    description: 'Alert fires and resolves 10 times in 1 hour',
    alerts: Array.from({ length: 20 }, (_, i) => makeCanonicalAlert({
      service: 'health-check',
      title: 'Health check failed',
      severity: i % 2 === 0 ? 'warning' : 'info', // alternating fire/resolve
      timestamp: new Date(Date.now() + i * 3 * 60 * 1000).toISOString(),
    })),
    expectedNoiseScore: { min: 80, max: 100 },
  },

  cascadingFailure: {
    description: 'Database fails, then API, then frontend',
    alerts: [
      makeCanonicalAlert({ service: 'database', severity: 'critical', timestamp: t(0) }),
      makeCanonicalAlert({ service: 'api', severity: 'high', timestamp: t(30) }),
      makeCanonicalAlert({ service: 'api', severity: 'high', timestamp: t(45) }),
      makeCanonicalAlert({ service: 'frontend', severity: 'medium', timestamp: t(60) }),
      makeCanonicalAlert({ service: 'frontend', severity: 'medium', timestamp: t(90) }),
    ],
    serviceDependencies: [['api', 'database'], ['frontend', 'api']],
    expectedIncidents: 1, // All merged via dependency graph
    expectedNoiseScore: { min: 0, max: 30 }, // Real incident, not noise
  },

  deployCorrelated: {
    description: 'Deploy followed by alert storm',
    deploy: makeDeployEvent({ service: 'payment-api', pr_title: 'feat: add retry logic' }),
    alerts: Array.from({ length: 8 }, () => makeCanonicalAlert({
      service: 'payment-api',
      severity: 'high',
    })),
    deployToAlertGapMs: 2 * 60 * 1000, // 2 minutes after deploy
    expectedNoiseScore: { min: 50, max: 85 }, // Deploy correlation boosts noise score
  },
};
```

---

## Section 10: TDD Implementation Order

### 10.1 Bootstrap Sequence

The test infrastructure itself must be built before any product code. This is the order:

```
Phase 0: Test Infrastructure (Week 0)
  ├── 0.1 vitest config + TypeScript setup
  ├── 0.2 Testcontainers helper (Redis, DynamoDB Local, TimescaleDB)
  ├── 0.3 LocalStack helper (SQS, S3, API Gateway)
  ├── 0.4 Fixture loader utility
  ├── 0.5 Factory functions (makeCanonicalAlert, makeIncident, makeDeployEvent)
  ├── 0.6 WireMock Slack stub
  └── 0.7 CI pipeline with test stages
```

### 10.2 Epic-by-Epic TDD Order

```
Phase 1: Webhook Ingestion (Epic 1) — Tests First
  ├── 1.1 RED: HMAC validator tests (all providers)
  ├── 1.2 GREEN: Implement HMAC validation
  ├── 1.3 RED: Datadog parser tests (single + batch)
  ├── 1.4 GREEN: Implement Datadog parser
  ├── 1.5 RED: PagerDuty parser tests
  ├── 1.6 GREEN: Implement PagerDuty parser
  ├── 1.7 RED: Fingerprint generator tests
  ├── 1.8 GREEN: Implement fingerprinting
  ├── 1.9 INTEGRATION: Lambda → SQS contract test
  └── 1.10 REFACTOR: Extract provider parser interface

Phase 2: Correlation Engine (Epic 2) — Tests First
  ├── 2.1 RED: Time-window open/close/extend tests
  ├── 2.2 GREEN: Implement window manager
  ├── 2.3 RED: Service graph correlation tests
  ├── 2.4 GREEN: Implement dependency traversal
  ├── 2.5 RED: Deploy correlation tests
  ├── 2.6 GREEN: Implement deploy tracker
  ├── 2.7 INTEGRATION: Correlation → Redis window tests
  ├── 2.8 INTEGRATION: Correlation → DynamoDB incident persistence
  └── 2.9 INTEGRATION: Correlation → TimescaleDB trend writes

Phase 3: Noise Analysis (Epic 3) — Tests First
  ├── 3.1 RED: Rule-based noise scoring tests (all rules)
  ├── 3.2 GREEN: Implement scorer
  ├── 3.3 RED: Threshold classification tests
  ├── 3.4 GREEN: Implement classifier
  ├── 3.5 RED: "What would have happened" calculation tests
  ├── 3.6 GREEN: Implement historical analysis
  └── 3.7 REFACTOR: Extract scoring rules into configurable pipeline

Phase 4: Notifications (Epic 4) — Integration Tests Lead
  ├── 4.1 Implement Slack block formatter
  ├── 4.2 RED: Snapshot tests for all message formats
  ├── 4.3 INTEGRATION: Notification → Slack (WireMock)
  ├── 4.4 RED: Rate limiting tests
  └── 4.5 GREEN: Implement rate limiter

Phase 5: Governance (Epic 10) — Tests First
  ├── 5.1 RED: Strict/audit mode enforcement tests
  ├── 5.2 GREEN: Implement policy engine
  ├── 5.3 RED: Panic mode tests (<1s activation)
  ├── 5.4 GREEN: Implement panic mode
  ├── 5.5 RED: Circuit breaker + DLQ replay tests
  ├── 5.6 GREEN: Implement circuit breaker
  ├── 5.7 RED: OTEL span assertion tests
  └── 5.8 GREEN: Instrument all components

Phase 6: E2E Validation
  ├── 6.1 60-second TTV journey
  ├── 6.2 Alert storm correlation journey
  ├── 6.3 Deploy correlation journey
  ├── 6.4 Panic mode journey
  └── 6.5 Performance benchmarks
```

### 10.3 "Never Ship Without" Checklist

Before any release, these tests must pass:

- [ ] All HMAC validation tests (security gate)
- [ ] All correlation window tests (correctness gate)
- [ ] All noise scoring tests (safety gate — never eat real alerts)
- [ ] All governance policy tests (compliance gate)
- [ ] Circuit breaker DLQ replay test (safety net gate)
- [ ] 60-second TTV E2E journey (product promise gate)
- [ ] PII protection span tests (privacy gate)
- [ ] Schema migration lint (no breaking changes)
- [ ] Coverage ≥80% overall, ≥90% on scoring engine

---

*End of dd0c/alert Test Architecture*

---

## Section 11: Review Remediation Addendum (Post-Gemini Review)

### 11.1 Missing Epic Coverage

#### Epic 6: Dashboard API

```typescript
describe('Dashboard API', () => {
  describe('Authentication', () => {
    it('returns 401 for missing Cognito JWT', async () => {});
    it('returns 401 for expired JWT', async () => {});
    it('returns 401 for JWT signed by wrong issuer', async () => {});
    it('extracts tenantId from JWT claims', async () => {});
  });

  describe('Incident Listing (GET /v1/incidents)', () => {
    it('returns paginated incidents for authenticated tenant', async () => {});
    it('supports cursor-based pagination', async () => {});
    it('filters by status (open, acknowledged, resolved)', async () => {});
    it('filters by severity (critical, warning, info)', async () => {});
    it('filters by time range (since, until)', async () => {});
    it('returns empty array for tenant with no incidents', async () => {});
  });

  describe('Incident Detail (GET /v1/incidents/:id)', () => {
    it('returns full incident with correlated alerts', async () => {});
    it('returns 404 for incident belonging to different tenant', async () => {});
    it('includes timeline of state transitions', async () => {});
  });

  describe('Analytics (GET /v1/analytics)', () => {
    it('returns MTTR for last 7/30/90 days', async () => {});
    it('returns alert volume by source', async () => {});
    it('returns noise reduction percentage', async () => {});
    it('scopes all analytics to authenticated tenant', async () => {});
  });

  describe('Tenant Isolation', () => {
    it('tenant A cannot read tenant B incidents via API', async () => {});
    it('tenant A cannot read tenant B analytics', async () => {});
    it('all DynamoDB queries include tenantId partition key', async () => {});
  });
});
```

#### Epic 7: Dashboard UI (Playwright)

```typescript
// tests/e2e/ui/dashboard.spec.ts
import { test, expect } from '@playwright/test';

test('login redirects to Cognito hosted UI', async ({ page }) => {
  await page.goto('/dashboard');
  await expect(page).toHaveURL(/cognito/);
});

test('incident list renders with correct severity badges', async ({ page }) => {
  await page.goto('/dashboard/incidents');
  await expect(page.locator('[data-testid="incident-card"]')).toHaveCount(5);
  await expect(page.locator('.severity-critical')).toBeVisible();
});

test('incident detail shows correlated alert timeline', async ({ page }) => {
  await page.goto('/dashboard/incidents/inc-123');
  await expect(page.locator('[data-testid="alert-timeline"]')).toBeVisible();
  // Playwright has no toHaveCountGreaterThan matcher; poll the count instead
  await expect.poll(() => page.locator('.timeline-event').count()).toBeGreaterThan(1);
});

test('MTTR chart renders with real data', async ({ page }) => {
  await page.goto('/dashboard/analytics');
  await expect(page.locator('[data-testid="mttr-chart"]')).toBeVisible();
});

test('noise reduction percentage displays correctly', async ({ page }) => {
  await page.goto('/dashboard/analytics');
  const noise = page.locator('[data-testid="noise-reduction"]');
  await expect(noise).toContainText('%');
});

test('webhook setup wizard generates correct URL', async ({ page }) => {
  await page.goto('/dashboard/settings/integrations');
  await page.click('[data-testid="add-datadog"]');
  const url = await page.locator('[data-testid="webhook-url"]').textContent();
  expect(url).toMatch(/\/v1\/webhooks\/ingest\/.+/);
});
```

#### Epic 9: Onboarding & PLG

```typescript
describe('Free Tier Enforcement', () => {
  it('allows up to 10,000 alerts/month on free tier', async () => {});
  it('returns 429 with upgrade prompt at 10,001st alert', async () => {});
  it('resets counter on first of each month', async () => {});
  it('purges alert data older than 7 days on free tier', async () => {});
  it('retains alert data for 90 days on pro tier', async () => {});
});

describe('OAuth Signup', () => {
  it('creates tenant record on first Cognito login', async () => {});
  it('assigns free tier by default', async () => {});
  it('generates unique webhook URL per tenant', async () => {});
});

describe('Stripe Integration', () => {
  it('creates checkout session with correct pricing', async () => {});
  it('upgrades tenant on checkout.session.completed webhook', async () => {});
  it('downgrades tenant on subscription.deleted webhook', async () => {});
  it('validates Stripe webhook signature', async () => {});
});
```
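
The free-tier tests hinge on two details: the 10,001st alert is the first one rejected, and the counter resets at the month boundary. A sketch of both as pure functions; the names, the key format, and the use of UTC for the month boundary are assumptions.

```typescript
type Tier = 'free' | 'pro';

const FREE_TIER_MONTHLY_LIMIT = 10_000;

// Hypothetical counter key: a new key each calendar month gives the monthly reset.
// Assumption: the month boundary is evaluated in UTC.
function quotaKey(tenantId: string, at: Date): string {
  const month = String(at.getUTCMonth() + 1).padStart(2, '0');
  return `quota:${tenantId}:${at.getUTCFullYear()}-${month}`;
}

// `acceptedThisMonth` is the number of alerts already accepted before this one.
function admitAlert(
  tier: Tier,
  acceptedThisMonth: number,
): { status: 200 | 429; upgradePrompt: boolean } {
  if (tier === 'free' && acceptedThisMonth >= FREE_TIER_MONTHLY_LIMIT) {
    return { status: 429, upgradePrompt: true };
  }
  return { status: 200, upgradePrompt: false };
}
```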

#### Epic 5.3: Slack Feedback Endpoint

```typescript
describe('Slack Interactive Actions Endpoint', () => {
  it('validates Slack request signature (HMAC-SHA256)', async () => {});
  it('rejects request with invalid signature', async () => {});
  it('handles "helpful" feedback — updates incident quality score', async () => {});
  it('handles "noise" feedback — adds to suppression training data', async () => {});
  it('handles "escalate" action — triggers PagerDuty/OpsGenie', async () => {});
  it('updates original Slack message after action', async () => {});
  it('scopes action to correct tenant', async () => {});
});
```

#### Epic 1.4: S3 Raw Payload Archival

```typescript
describe('Raw Payload Archival', () => {
  it('saves raw webhook payload to S3 asynchronously', async () => {});
  it('S3 key includes tenantId, source, and timestamp', async () => {});
  it('archival failure does not block alert processing', async () => {});
  it('archived payload is retrievable for replay', async () => {});
  it('S3 lifecycle policy deletes after retention period', async () => {});
});
```

### 11.2 Anti-Pattern Fixes

#### Replace ioredis-mock with WindowStore Interface

```typescript
// BEFORE (anti-pattern):
// import RedisMock from 'ioredis-mock';
// const engine = new CorrelationEngine(new RedisMock());

// AFTER (correct):
interface WindowStore {
  addEvent(tenantId: string, key: string, event: Alert, ttlMs: number): Promise<void>;
  getWindow(tenantId: string, key: string): Promise<Alert[]>;
  clearWindow(tenantId: string, key: string): Promise<void>;
}

class InMemoryWindowStore implements WindowStore {
  private store = new Map<string, { events: Alert[]; expiresAt: number }>();
  
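  async addEvent(tenantId: string, key: string, event: Alert, ttlMs: number) {
    const fullKey = `${tenantId}:${key}`;
    const now = Date.now();
    const existing = this.store.get(fullKey);
    if (!existing || existing.expiresAt < now) {
      // No live window: start a new one instead of appending to an expired entry
      this.store.set(fullKey, { events: [event], expiresAt: now + ttlMs });
      return;
    }
    existing.events.push(event);
  }

  async getWindow(tenantId: string, key: string): Promise<Alert[]> {
    const fullKey = `${tenantId}:${key}`;
    const entry = this.store.get(fullKey);
    if (!entry || entry.expiresAt < Date.now()) return [];
    return entry.events;
  }

  // Required by the WindowStore interface
  async clearWindow(tenantId: string, key: string): Promise<void> {
    this.store.delete(`${tenantId}:${key}`);
  }
}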

// Unit tests use InMemoryWindowStore — no Redis dependency
// Integration tests use RedisWindowStore with Testcontainers
```

#### Replace sinon.useFakeTimers with Clock Interface

```typescript
// BEFORE (anti-pattern):
// sinon.useFakeTimers(new Date('2026-03-01T00:00:00Z'));

// AFTER (correct):
interface Clock {
  now(): number;
  advanceBy(ms: number): void;
}

class FakeClock implements Clock {
  private current: number;
  constructor(start: Date = new Date()) { this.current = start.getTime(); }
  now() { return this.current; }
  advanceBy(ms: number) { this.current += ms; }
}

class SystemClock implements Clock {
  now() { return Date.now(); }
  advanceBy() { throw new Error('Cannot advance system clock'); }
}

// Inject into CorrelationEngine:
const engine = new CorrelationEngine(new InMemoryWindowStore(), new FakeClock());
```
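
The two interfaces pay off when combined: a window store that reads time from an injected clock can have its expiry tested without waiting or patching globals. The sketch below is self-contained and simplified, with synchronous methods and a stand-in `Alert` type for brevity; `ClockedWindowStore` is a hypothetical name.

```typescript
interface Clock {
  now(): number;
}

class FakeClock implements Clock {
  private current: number;
  constructor(startMs: number = 0) {
    this.current = startMs;
  }
  now(): number {
    return this.current;
  }
  advanceBy(ms: number): void {
    this.current += ms;
  }
}

interface Alert {
  id: string; // stand-in for the real alert type
}

// Hypothetical in-memory store whose expiry is driven by the injected clock.
class ClockedWindowStore {
  private store = new Map<string, { events: Alert[]; expiresAt: number }>();
  private readonly clock: Clock;

  constructor(clock: Clock) {
    this.clock = clock;
  }

  addEvent(tenantId: string, key: string, event: Alert, ttlMs: number): void {
    const fullKey = `${tenantId}:${key}`;
    const now = this.clock.now();
    const entry = this.store.get(fullKey);
    if (!entry || entry.expiresAt < now) {
      // Expired or missing: open a fresh window.
      this.store.set(fullKey, { events: [event], expiresAt: now + ttlMs });
    } else {
      entry.events.push(event);
    }
  }

  getWindow(tenantId: string, key: string): Alert[] {
    const entry = this.store.get(`${tenantId}:${key}`);
    if (!entry || entry.expiresAt < this.clock.now()) return [];
    return entry.events;
  }
}
```

A test then advances the fake clock past the TTL and asserts that the window reads as empty.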

### 11.3 Trace Context Propagation Tests

```typescript
describe('Trace Context Propagation', () => {
  it('API Gateway passes trace_id to Lambda via X-Amzn-Trace-Id', async () => {});
  
  it('Lambda propagates trace_id into SQS message attributes', async () => {
    // Verify SQS message has MessageAttribute 'traceparent' with W3C format
    const msg = await getLastSQSMessage(localstack, 'alert-queue');
    expect(msg.MessageAttributes.traceparent).toBeDefined();
    expect(msg.MessageAttributes.traceparent.StringValue).toMatch(
      /^00-[0-9a-f]{32}-[0-9a-f]{16}-0[01]$/
    );
  });

  it('ECS Correlation Engine extracts trace_id from SQS message', async () => {
    // Verify the correlation span has the correct parent from SQS
    const spans = inMemoryExporter.getFinishedSpans();
    const correlationSpan = spans.find(s => s.name === 'alert.correlation');
    const ingestSpan = spans.find(s => s.name === 'webhook.ingest');
    expect(correlationSpan.parentSpanId).toBeDefined();
    // Parent chain must trace back to the original ingest span
    expect(correlationSpan.spanContext().traceId).toBe(ingestSpan.spanContext().traceId);
  });

  it('end-to-end trace spans webhook → SQS → correlation → notification', async () => {
    // Fire a webhook, wait for Slack notification, verify all spans share trace_id
    const traceId = await fireWebhookAndGetTraceId();
    const spans = await getSpansByTraceId(traceId);
    const spanNames = spans.map(s => s.name);
    expect(spanNames).toContain('webhook.ingest');
    expect(spanNames).toContain('alert.normalize');
    expect(spanNames).toContain('alert.correlation');
    expect(spanNames).toContain('notification.slack');
  });
});
```
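
The SQS test validates the `traceparent` attribute against a fixed pattern. A minimal sketch of a parser and formatter built on that same pattern, assuming only version `00` and the sampled flag need to be handled; the function and type names are hypothetical.

```typescript
interface TraceContext {
  traceId: string;      // 32 lowercase hex characters
  parentSpanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

// Same shape as the regex asserted in the SQS test above.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-(0[01])$/;

function parseTraceparent(header: string): TraceContext | null {
  const m = TRACEPARENT_RE.exec(header);
  if (!m) return null;
  const [, traceId, parentSpanId, flags] = m;
  // W3C Trace Context treats all-zero trace and parent ids as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { traceId, parentSpanId, sampled: flags === '01' };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.parentSpanId}-${ctx.sampled ? '01' : '00'}`;
}
```

Returning `null` for a malformed header lets the consumer start a new trace instead of failing the message.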

### 11.4 HMAC Security Hardening

```typescript
describe('HMAC Signature Validation (Hardened)', () => {
  it('uses crypto.timingSafeEqual, not === comparison', () => {
    // Inspect the source to verify timing-safe comparison
    const source = fs.readFileSync('src/ingestion/hmac.ts', 'utf8');
    expect(source).toContain('timingSafeEqual');
    expect(source).not.toMatch(/signature\s*===\s*/);
  });

  it('handles case-insensitive header names (dd-webhook-signature vs DD-WEBHOOK-SIGNATURE)', async () => {
    const payload = makeAlertPayload('datadog');
    const sig = computeHMAC(payload, DATADOG_SECRET);
    
    // Lowercase header
    const resp1 = await ingest(payload, { 'dd-webhook-signature': sig });
    expect(resp1.status).toBe(200);
    
    // Uppercase header
    const resp2 = await ingest(payload, { 'DD-WEBHOOK-SIGNATURE': sig });
    expect(resp2.status).toBe(200);
  });

  it('rejects completely missing signature header', async () => {
    const resp = await ingest(makeAlertPayload('datadog'), {});
    expect(resp.status).toBe(401);
  });

  it('rejects empty signature header', async () => {
    const resp = await ingest(makeAlertPayload('datadog'), { 'dd-webhook-signature': '' });
    expect(resp.status).toBe(401);
  });
});
```
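
The hardened tests describe three behaviors: timing-safe comparison, case-insensitive header lookup, and rejection of missing or empty signatures. A minimal sketch that satisfies all three, assuming a hex-encoded HMAC-SHA256 signature; the header name is taken from the tests above, and the function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

function computeHmac(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload, 'utf8').digest('hex');
}

// HTTP header names are case-insensitive, so compare lowercased keys.
function findHeader(
  headers: Record<string, string | undefined>,
  name: string,
): string | undefined {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

function verifySignature(
  payload: string,
  headers: Record<string, string | undefined>,
  secret: string,
  headerName: string = 'dd-webhook-signature', // name used in the tests above
): boolean {
  const provided = findHeader(headers, headerName);
  if (!provided) return false; // covers both a missing and an empty header
  const expected = Buffer.from(computeHmac(payload, secret), 'utf8');
  const actual = Buffer.from(provided, 'utf8');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (actual.length !== expected.length) return false;
  return timingSafeEqual(actual, expected);
}
```

A caller would map a `false` result to the 401 responses asserted above.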

### 11.5 SQS 256KB Payload Limit

```typescript
describe('Large Payload Handling', () => {
  it('compresses payloads >200KB before sending to SQS', async () => {
    const largePayload = makeLargeAlertPayload(300 * 1024); // 300KB
    const resp = await ingest(largePayload);
    expect(resp.status).toBe(200);

    const msg = await getLastSQSMessage(localstack, 'alert-queue');
    // Payload must be compressed or use S3 pointer; the SQS limit is measured in bytes, not characters
    expect(Buffer.byteLength(msg.Body, 'utf8')).toBeLessThan(256 * 1024);
  });

  it('uses S3 pointer for payloads >256KB after compression', async () => {
    const hugePayload = makeLargeAlertPayload(500 * 1024); // 500KB
    const resp = await ingest(hugePayload);
    expect(resp.status).toBe(200);

    const msg = await getLastSQSMessage(localstack, 'alert-queue');
    const body = JSON.parse(msg.Body);
    expect(body.s3Pointer).toBeDefined();
    expect(body.s3Pointer).toMatch(/^s3:\/\/dd0c-alert-overflow\//);
  });

  it('strips unnecessary fields from Datadog payload before SQS', async () => {
    const payload = makeDatadogPayloadWithLargeTags(100); // 100 tags
    const resp = await ingest(payload);
    expect(resp.status).toBe(200);

    const msg = await getLastSQSMessage(localstack, 'alert-queue');
    const normalized = JSON.parse(msg.Body);
    // Only essential fields should remain
    expect(normalized.tags.length).toBeLessThanOrEqual(20);
  });

  it('rejects payloads >2MB at API Gateway level', async () => {
    const massive = makeLargeAlertPayload(3 * 1024 * 1024);
    const resp = await ingest(massive);
    expect(resp.status).toBe(413);
  });
});
```
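
The first two tests describe a three-way decision: send small payloads inline, compress medium ones, and fall back to an S3 pointer when compression is not enough. A sketch of that decision, assuming gzip plus base64 as the encoding; `prepareSqsBody`, the envelope fields, and the `overflowKey` parameter are hypothetical, while the bucket name comes from the test above.

```typescript
import { gzipSync } from 'node:zlib';

const SQS_MAX_BYTES = 256 * 1024;            // limit named in this section
const COMPRESS_THRESHOLD_BYTES = 200 * 1024; // compress above this size

type SqsBody =
  | { kind: 'inline'; body: string }
  | { kind: 'gzip'; body: string } // JSON envelope holding base64 gzip data
  | { kind: 's3'; body: string };  // JSON envelope holding an s3Pointer

function prepareSqsBody(payload: string, overflowKey: string): SqsBody {
  if (Buffer.byteLength(payload, 'utf8') <= COMPRESS_THRESHOLD_BYTES) {
    return { kind: 'inline', body: payload };
  }
  // base64 inflates the compressed bytes by about a third, so measure the final envelope.
  const envelope = JSON.stringify({
    encoding: 'gzip+base64',
    data: gzipSync(payload).toString('base64'),
  });
  if (Buffer.byteLength(envelope, 'utf8') < SQS_MAX_BYTES) {
    return { kind: 'gzip', body: envelope };
  }
  // Still too large: the caller is assumed to upload `payload` to S3 under `overflowKey`.
  return {
    kind: 's3',
    body: JSON.stringify({ s3Pointer: `s3://dd0c-alert-overflow/${overflowKey}` }),
  };
}
```

Measuring the envelope rather than the raw gzip output matters because the encoded form is what actually travels in the message body.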

### 11.6 DLQ Backpressure & Replay

```typescript
describe('DLQ Replay with Backpressure', () => {
  it('replays DLQ messages in batches of 100', async () => {
    await seedDLQ(10000); // 10K messages
    const replayer = new DLQReplayer({ batchSize: 100, delayBetweenBatchesMs: 500 });
    await replayer.start();

    // Verify batched processing: 10,000 messages / 100 per batch = 100 batches
    expect(replayer.batchesProcessed).toBe(100);
    expect(replayer.maxConcurrentMessages).toBeLessThanOrEqual(100);
  });

  it('pauses replay if correlation engine error rate exceeds 10%', async () => {
    await seedDLQ(1000);
    const replayer = new DLQReplayer({ batchSize: 100, errorThreshold: 0.1 });
    
    // Simulate correlation engine returning errors
    mockCorrelationEngine.failRate = 0.15;
    await replayer.start();

    expect(replayer.state).toBe('paused');
    expect(replayer.pauseReason).toContain('error rate exceeded');
  });

  it('does not replay if circuit breaker is currently tripped', async () => {
    await seedDLQ(100);
    await tripCircuitBreaker();

    const replayer = new DLQReplayer();
    await replayer.start();

    expect(replayer.messagesReplayed).toBe(0);
    expect(replayer.state).toBe('blocked_by_circuit_breaker');
  });

  it('tracks replay progress for resumability', async () => {
    await seedDLQ(500);
    const replayer = new DLQReplayer({ batchSize: 50 });
    
    // Process 3 batches then stop
    await replayer.processNBatches(3);
    expect(replayer.checkpoint).toBe(150);

    // Resume from checkpoint
    const replayer2 = new DLQReplayer({ resumeFrom: replayer.checkpoint });
    await replayer2.start();
    expect(replayer2.startedFrom).toBe(150);
  });
});
```
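
The batching, pause, and checkpoint behavior described by these tests can be sketched as a small loop. This is an illustrative sketch under assumptions, not the `DLQReplayer` implementation: `replayInBatches` and its types are hypothetical, the handler is synchronous for brevity, and the delay between batches and the circuit-breaker check are left out.

```typescript
type ReplayState = 'completed' | 'paused';

interface ReplayResult {
  state: ReplayState;
  checkpoint: number;        // number of messages consumed so far; resume from here
  batchesProcessed: number;
}

interface ReplayOptions {
  batchSize: number;
  errorThreshold: number;    // e.g. 0.1 pauses when more than 10% of a batch fails
  resumeFrom?: number;
}

// Replays messages batch by batch and pauses as soon as one batch exceeds
// the error threshold. A real replayer would also return failed messages
// to the DLQ; that step is omitted here.
function replayInBatches<T>(
  messages: T[],
  handle: (msg: T) => boolean,   // returns true on success
  opts: ReplayOptions,
): ReplayResult {
  let checkpoint = opts.resumeFrom === undefined ? 0 : opts.resumeFrom;
  let batchesProcessed = 0;

  while (checkpoint < messages.length) {
    const batch = messages.slice(checkpoint, checkpoint + opts.batchSize);
    const failures = batch.filter((msg) => !handle(msg)).length;
    checkpoint += batch.length;
    batchesProcessed += 1;

    if (failures / batch.length > opts.errorThreshold) {
      return { state: 'paused', checkpoint, batchesProcessed };
    }
  }
  return { state: 'completed', checkpoint, batchesProcessed };
}
```

Because the checkpoint only advances in whole batches, a resumed run starts on a batch boundary, which matches the `checkpoint` value of 150 expected after three batches of 50.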

### 11.7 Multi-Tenancy Isolation (DynamoDB)

```typescript
describe('DynamoDB Tenant Isolation', () => {
  it('all DAO methods require tenantId parameter', () => {
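    // Static source check (runs at test time): every public DAO method takes tenantId as its first parameter
    const daoSource = fs.readFileSync('src/data/incident-dao.ts', 'utf8');
    const methods = extractPublicMethods(daoSource);
    // Guard against a vacuous pass if the parser finds no methods
    expect(methods.length).toBeGreaterThan(0);
    for (const method of methods) {
      expect(method.params[0].name).toBe('tenantId');
    }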
  });

  it('query for tenant A returns zero results for tenant B data', async () => {
    const dao = new IncidentDAO(dynamoClient);
    await dao.create('tenant-A', makeIncident());
    await dao.create('tenant-B', makeIncident());

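    const results = await dao.list('tenant-A');
    // every() is true for an empty array, so assert that something came back first
    expect(results.length).toBeGreaterThan(0);
    expect(results.every(r => r.tenantId === 'tenant-A')).toBe(true);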
  });

  it('partition key always includes tenantId prefix', async () => {
    const dao = new IncidentDAO(dynamoClient);
    await dao.create('tenant-X', makeIncident());

    // Read the raw DynamoDB item, bypassing the DAO
    const item = await dynamoClient.scan({ TableName: 'dd0c-alert-main' });
    expect(item.Items[0].PK.S).toMatch(/^TENANT#tenant-X/);
  });
});
```
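
One way to make the tenant prefix hard to forget is to build every key through a single helper. The sketch below is an assumption about how that could look: the `TENANT#` partition-key prefix comes from the test above, while `incidentKey`, the `IncidentKey` type, and the `INCIDENT#` sort-key format are hypothetical.

```typescript
interface IncidentKey {
  PK: string;
  SK: string;
}

// Builds the DynamoDB key for an incident. The tenant ID is mandatory and
// always forms the partition-key prefix, so a query cannot span tenants.
function incidentKey(tenantId: string, incidentId: string): IncidentKey {
  if (tenantId.length === 0) {
    throw new Error('tenantId is required');
  }
  return {
    PK: `TENANT#${tenantId}`,
    SK: `INCIDENT#${incidentId}`,   // hypothetical sort-key format
  };
}
```

If the DAO only ever obtains keys from a helper like this, the raw-item assertion above checks a property that holds by construction rather than by convention.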

### 11.8 Slack Circuit Breaker

```typescript
describe('Slack Notification Circuit Breaker', () => {
  // Shared by every test below; clock and mockSlack are suite-level fakes
  let slackClient: SlackClient;

  beforeEach(() => {
    slackClient = new SlackClient({ circuitBreakerThreshold: 10 });
  });

  it('opens circuit after 10 consecutive 429s from Slack', async () => {
    for (let i = 0; i < 10; i++) {
      mockSlack.respondWith(429);
      await slackClient.send(makeMessage()).catch(() => {});
    }
    expect(slackClient.circuitState).toBe('open');
  });

  it('queues notifications while circuit is open', async () => {
    slackClient.openCircuit();
    await slackClient.send(makeMessage());
    expect(slackClient.queuedMessages).toBe(1);
  });

  it('half-opens circuit after 60 seconds', async () => {
    slackClient.openCircuit();
    clock.advanceBy(61000);
    expect(slackClient.circuitState).toBe('half-open');
  });

  it('drains queue on successful half-open probe', async () => {
    slackClient.openCircuit();
    slackClient.queue(makeMessage());
    slackClient.queue(makeMessage());
    clock.advanceBy(61000);
    mockSlack.respondWith(200);
    await slackClient.probe();
    expect(slackClient.circuitState).toBe('closed');
    expect(slackClient.queuedMessages).toBe(0);
  });
});
```
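
The state transitions exercised above (closed, open, half-open, closed again) can be sketched as a small state machine with an injected clock. This is a minimal sketch, not the `SlackClient` internals: the `CircuitBreaker` class and its method names are hypothetical, and message queueing is left out. The threshold of 10 and the 60-second cooldown come from the tests.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;   // injected so tests can control time
  }

  // State is derived from the clock: an open circuit becomes half-open
  // once the cooldown has elapsed.
  get state(): CircuitState {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open';
  }

  recordFailure(): void {
    const state = this.state;
    if (state === 'open') return;        // no calls should be attempted while open
    if (state === 'half-open') {         // failed probe: re-open and restart the cooldown
      this.openedAt = this.now();
      return;
    }
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }

  // Any success resets the consecutive-failure count and closes the circuit.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }
}
```

Deriving the state from the clock, rather than scheduling a timer, is what lets a fake clock move the circuit to half-open without waiting 60 real seconds.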

### 11.9 Updated Test Pyramid (Post-Review)

| Level | Original | Revised | Rationale |
|-------|----------|---------|-----------|
| Unit | 70% (~140) | 65% (~180) | More tests total, but integration share grows |
| Integration | 20% (~40) | 25% (~70) | Dashboard API, tenant isolation, trace propagation |
| E2E | 10% (~20) | 10% (~28) | Dashboard UI (Playwright), onboarding flow |

*End of P3 Review Remediation Addendum*

---

## 12. BMad Review Implementation (Must-Have Before Launch)

### 12.1 HMAC Timestamp Freshness (Replay Attack Prevention)

```typescript
describe('HMAC Replay Attack Prevention', () => {
  it('rejects Datadog webhook with timestamp older than 5 minutes', async () => {
    const payload = makeDatadogPayload();
    const staleTimestamp = Math.floor(Date.now() / 1000) - 301; // 5min + 1s
    const sig = computeDatadogHMAC(payload, staleTimestamp);
    
    const resp = await ingest(payload, {
      'dd-webhook-timestamp': staleTimestamp.toString(),
      'dd-webhook-signature': sig,
    });
    expect(resp.status).toBe(401);
    expect(resp.body.error).toContain('stale timestamp');
  });

  it('rejects PagerDuty webhook with missing timestamp', async () => {
    const payload = makePagerDutyPayload();
    const sig = computePagerDutyHMAC(payload);
    
    const resp = await ingest(payload, {
      'x-pagerduty-signature': sig,
      // No timestamp header
    });
    expect(resp.status).toBe(401);
  });

  it('rejects OpsGenie webhook replayed after 5 minutes', async () => {
    // OpsGenie does not reliably send a timestamp header, so the timestamp
    // must be extracted from the payload body and validated
    const payload = makeOpsGeniePayload({ timestamp: fiveMinutesAgo() });
    const sig = computeOpsGenieHMAC(payload);
    
    const resp = await ingest(payload, { 'x-opsgenie-signature': sig });
    expect(resp.status).toBe(401);
  });

  it('accepts fresh webhook within 5-minute window', async () => {
    const payload = makeDatadogPayload();
    const freshTimestamp = Math.floor(Date.now() / 1000);
    const sig = computeDatadogHMAC(payload, freshTimestamp);
    
    const resp = await ingest(payload, {
      'dd-webhook-timestamp': freshTimestamp.toString(),
      'dd-webhook-signature': sig,
    });
    expect(resp.status).toBe(200);
  });
});
```
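
The freshness rule these tests pin down can be sketched as a pure check that runs alongside signature verification. This is a sketch under assumptions: `isFreshTimestamp` is a hypothetical name, the 5-minute window comes from the tests, and rejecting timestamps more than 5 minutes in the future is an added assumption that the tests above do not cover.

```typescript
const MAX_AGE_SECONDS = 5 * 60;

// Returns true only when a timestamp header is present, numeric, and within
// the allowed window around the current time (both in Unix seconds).
function isFreshTimestamp(
  header: string | undefined,
  nowSeconds: number,
  maxAgeSeconds: number = MAX_AGE_SECONDS,
): boolean {
  if (header === undefined) return false;       // missing timestamp is rejected
  if (!/^\d+$/.test(header)) return false;      // non-numeric timestamp is rejected
  const age = nowSeconds - parseInt(header, 10);
  // Assumption: clock skew into the future is bounded by the same window.
  return age <= maxAgeSeconds && age >= -maxAgeSeconds;
}
```

Taking `nowSeconds` as a parameter keeps the check deterministic, so the 300-second and 301-second boundaries can be tested without a fake clock.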

### 12.2 Cross-Tenant Negative Isolation Tests

```typescript
describe('DynamoDB Tenant Isolation (Negative Tests)', () => {
  it('Tenant A cannot read Tenant B incidents', async () => {
    // Seed data for both tenants
    await createIncident('tenant-a', { title: 'A incident' });
    await createIncident('tenant-b', { title: 'B incident' });
    
    // Query as Tenant A
    const results = await dao.listIncidents('tenant-a');
    
    // Explicitly assert Tenant B data is absent
    const tenantIds = results.map(r => r.tenantId);
    expect(tenantIds).not.toContain('tenant-b');
    expect(results.every(r => r.tenantId === 'tenant-a')).toBe(true);
  });

  it('Tenant A cannot read Tenant B analytics', async () => {
    await seedAnalytics('tenant-a', { alertCount: 100 });
    await seedAnalytics('tenant-b', { alertCount: 200 });
    
    const analytics = await dao.getAnalytics('tenant-a');
    expect(analytics.alertCount).toBe(100); // Not 300 (combined)
  });

  it('API returns 404 (not 403) for cross-tenant incident access', async () => {
    const incident = await createIncident('tenant-b', { title: 'secret' });
    
    const resp = await api.get(`/v1/incidents/${incident.id}`)
      .set('Authorization', `Bearer ${tenantAToken}`);
    
    // 404 not 403 — don't leak existence
    expect(resp.status).toBe(404);
  });
});
```
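
The 404-not-403 behavior follows naturally when the lookup treats "belongs to another tenant" and "does not exist" as the same case. The sketch below illustrates that idea with an in-memory list; `getIncidentForTenant` and its types are hypothetical and stand in for the real API handler and DAO.

```typescript
interface Incident {
  id: string;
  tenantId: string;
  title: string;
}

type LookupResult = { status: 200; body: Incident } | { status: 404 };

// Looks up an incident on behalf of a tenant. A record owned by a different
// tenant is indistinguishable from a missing one, so existence never leaks.
function getIncidentForTenant(
  incidents: Incident[],
  callerTenantId: string,
  incidentId: string,
): LookupResult {
  const match = incidents.filter(
    (i) => i.id === incidentId && i.tenantId === callerTenantId,
  )[0];
  return match === undefined ? { status: 404 } : { status: 200, body: match };
}
```

Because the tenant ID is part of the match condition itself, there is no separate authorization branch that could return 403 by mistake.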

### 12.3 Correlation Window Edge Cases

```typescript
describe('Out-of-Order Alert Delivery', () => {
  it('late alert attaches to existing incident (not duplicate)', async () => {
    const clock = new FakeClock();
    const engine = new CorrelationEngine(new InMemoryWindowStore(), clock);
    
    // Alert 1 arrives at T=0
    const alert1 = makeAlert({ service: 'auth', fingerprint: 'cpu-high', timestamp: 0 });
    const incident1 = await engine.process(alert1);
    
    // Window closes at T=5min, incident shipped
    clock.advanceBy(5 * 60 * 1000);
    await engine.flushWindows();
    
    // Late alert arrives at T=6min with timestamp T=2min (within original window)
    clock.advanceBy(60 * 1000);
    const lateAlert = makeAlert({ service: 'auth', fingerprint: 'cpu-high', timestamp: 2 * 60 * 1000 });
    const result = await engine.process(lateAlert);
    
    // Must attach to existing incident, not create new one
    expect(result.incidentId).toBe(incident1.incidentId);
    expect(result.action).toBe('attached_to_existing');
  });

  it('very late alert (>2x window) creates new incident', async () => {
    const clock = new FakeClock();
    const engine = new CorrelationEngine(new InMemoryWindowStore(), clock);
    
    const alert1 = makeAlert({ service: 'auth', fingerprint: 'cpu-high' });
    const incident1 = await engine.process(alert1);
    
    // 15 minutes later (3x the 5-min window)
    clock.advanceBy(15 * 60 * 1000);
    
    const lateAlert = makeAlert({ service: 'auth', fingerprint: 'cpu-high' });
    const result = await engine.process(lateAlert);
    
    expect(result.incidentId).not.toBe(incident1.incidentId);
    expect(result.action).toBe('new_incident');
  });
});
```
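
One possible rule that satisfies both tests is sketched below. It is an assumption, not the `CorrelationEngine` logic: `placeAlert` and its types are hypothetical, and the rule "attach when the alert's own timestamp falls inside the original window and it arrives within twice the window length" is inferred from the test titles.

```typescript
interface IncidentWindow {
  incidentId: string;
  openedAt: number;   // ms, time the correlation window opened
  windowMs: number;   // window length, e.g. 5 minutes
}

type Placement =
  | { action: 'attached_to_existing'; incidentId: string }
  | { action: 'new_incident' };

// Decides whether a (possibly late) alert joins an already-shipped incident.
function placeAlert(
  win: IncidentWindow | undefined,
  eventTimestamp: number,   // when the alert fired at the source
  arrivalTime: number,      // when it reached the engine
): Placement {
  if (win === undefined) return { action: 'new_incident' };

  const firedInsideWindow =
    eventTimestamp >= win.openedAt && eventTimestamp <= win.openedAt + win.windowMs;
  // Assumed grace period: arrivals later than 2x the window start a new incident.
  const arrivedWithinGrace = arrivalTime - win.openedAt <= 2 * win.windowMs;

  return firedInsideWindow && arrivedWithinGrace
    ? { action: 'attached_to_existing', incidentId: win.incidentId }
    : { action: 'new_incident' };
}
```

Separating the event timestamp from the arrival time is what distinguishes a late delivery of an old alert from a genuinely new occurrence of the same fingerprint.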

### 12.4 SQS Claim-Check Round-Trip

```typescript
describe('SQS 256KB Claim-Check End-to-End', () => {
  it('large payload round-trips through S3 pointer', async () => {
    const largePayload = makeLargeAlertPayload(300 * 1024); // 300KB
    
    // Ingestion compresses and stores in S3
    const resp = await ingest(largePayload);
    expect(resp.status).toBe(200);
    
    // SQS message contains S3 pointer
    const sqsMsg = await getLastSQSMessage(localstack, 'alert-queue');
    const body = JSON.parse(sqsMsg.Body);
    expect(body.s3Pointer).toBeDefined();
    
    // Correlation engine fetches from S3 and processes
    const incident = await waitForIncidentCreated(5000);
    expect(incident).toBeDefined();
    expect(incident.sourceAlertCount).toBeGreaterThan(0);
  });

  it('S3 fetch timeout does not crash correlation engine', async () => {
    // Inject S3 latency (10 second delay)
    mockS3.setLatency(10000);
    
    const largePayload = makeLargeAlertPayload(300 * 1024);
    await ingest(largePayload);
    
    // Correlation engine should timeout and send to DLQ
    const dlqMsg = await getDLQMessage(localstack, 'alert-dlq', 15000);
    expect(dlqMsg).toBeDefined();
    
    // Engine is still healthy
    const health = await api.get('/health');
    expect(health.status).toBe(200);
  });
});
```

### 12.5 Free Tier Enforcement

```typescript
describe('Free Tier (10K alerts/month, 7-day retention)', () => {
  it('accepts alert at 9,999 count', async () => {
    await setAlertCounter('tenant-free', 9999);
    const resp = await ingestAsTenant('tenant-free', makeAlert());
    expect(resp.status).toBe(200);
  });

  it('rejects alert at 10,001 with upgrade prompt', async () => {
    await setAlertCounter('tenant-free', 10000);
    const resp = await ingestAsTenant('tenant-free', makeAlert());
    expect(resp.status).toBe(429);
    expect(resp.body.upgrade_url).toContain('stripe');
  });

  it('counter resets on first of month', async () => {
    await setAlertCounter('tenant-free', 10000);
    clock.advanceToFirstOfNextMonth();
    await runMonthlyReset();
    
    const resp = await ingestAsTenant('tenant-free', makeAlert());
    expect(resp.status).toBe(200);
  });

  it('purges data older than 7 days on free tier', async () => {
    await createIncident('tenant-free', { createdAt: eightDaysAgo() });
    await runRetentionPurge();
    
    const incidents = await dao.listIncidents('tenant-free');
    expect(incidents).toHaveLength(0);
  });

  it('retains data for 90 days on pro tier', async () => {
    await createIncident('tenant-pro', { createdAt: thirtyDaysAgo() });
    await runRetentionPurge();
    
    const incidents = await dao.listIncidents('tenant-pro');
    expect(incidents).toHaveLength(1);
  });
});
```
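
The quota and retention boundaries in these tests reduce to two small comparisons, sketched below. The function names and types are hypothetical; the 10,000-alert limit, the 429 response with `upgrade_url`, and the 7-day and 90-day retention periods come from the tests. In a real system the check and the increment would need to be atomic (for example a DynamoDB conditional update), which this sketch does not attempt.

```typescript
const FREE_TIER_MONTHLY_LIMIT = 10000;
const DAY_MS = 24 * 60 * 60 * 1000;

type QuotaDecision =
  | { status: 200; newCount: number }
  | { status: 429; upgrade_url: string };

// Admits an alert only while the tenant is below its monthly limit.
// currentCount is the number of alerts already accepted this month.
function admitAlert(
  currentCount: number,
  upgradeUrl: string,
  limit: number = FREE_TIER_MONTHLY_LIMIT,
): QuotaDecision {
  if (currentCount >= limit) {
    return { status: 429, upgrade_url: upgradeUrl };
  }
  return { status: 200, newCount: currentCount + 1 };
}

// True when a record is older than the tier's retention period.
function isExpired(createdAtMs: number, nowMs: number, retentionDays: number): boolean {
  return nowMs - createdAtMs > retentionDays * DAY_MS;
}
```

With the counter at 9,999 the next alert becomes the 10,000th and is accepted; with the counter at 10,000 the next alert would be the 10,001st and is rejected, matching the two boundary tests above.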

*End of P3 BMad Implementation*