From 1101fef09614a391f5aceb683f20ee3cbb776201 Mon Sep 17 00:00:00 2001 From: Max Mayfield Date: Sat, 28 Feb 2026 23:33:07 +0000 Subject: [PATCH] Update test architectures for P3, P4, P5 --- .../test-architecture/test-architecture.md | 1420 +++++++++++++++- .../test-architecture/test-architecture.md | 1488 +++++++++++------ .../test-architecture/test-architecture.md | 285 +++- 3 files changed, 2575 insertions(+), 618 deletions(-) diff --git a/products/03-alert-intelligence/test-architecture/test-architecture.md b/products/03-alert-intelligence/test-architecture/test-architecture.md index 20826c8..1744ed4 100644 --- a/products/03-alert-intelligence/test-architecture/test-architecture.md +++ b/products/03-alert-intelligence/test-architecture/test-architecture.md @@ -1,69 +1,1411 @@ # dd0c/alert — Test Architecture & TDD Strategy -**Product:** dd0c/alert (Alert Intelligence Platform) -**Version:** 2.0 | **Date:** 2026-02-28 | **Phase:** 7 — Test Architecture -**Stack:** TypeScript / Node.js 20 | Vitest | Testcontainers | LocalStack + +**Product:** dd0c/alert — Alert Intelligence Platform +**Author:** Test Architecture Phase +**Date:** February 28, 2026 +**Status:** V1 MVP — Solo Founder Scope --- -## 1. Testing Philosophy & TDD Workflow +## Section 1: Testing Philosophy & TDD Workflow -### 1.1 Core Principle +### 1.1 Core Philosophy -dd0c/alert is an intelligence platform — it makes decisions about what engineers see during incidents. A wrong suppression decision can hide a P1. A wrong correlation can create noise. **Tests are not optional; they are the specification.** +dd0c/alert is a **safety-critical observability tool** — a bug that silently suppresses a real alert during an incident is worse than having no tool at all. The test suite is the contract that guarantees "we will never eat your alerts." 
-Every behavioral rule in the Correlation Engine, Noise Scorer, and Notification Router must be expressed as a failing test before a single line of implementation is written. +Guiding principle: **tests describe observable behavior from the on-call engineer's perspective**. If a test can't be explained as "when X happens, the engineer sees Y," it's testing implementation, not behavior. -### 1.2 Red-Green-Refactor Cycle +For a solo founder, the test suite is also the **regression safety net** — it catches the subtle scoring bugs that would erode customer trust over weeks. + +### 1.2 Red-Green-Refactor Adapted to dd0c/alert ``` -RED → Write a failing test that describes the desired behavior. - The test must fail for the right reason (not a compile error). +RED → Write a failing test that describes the desired behavior + (e.g., "3 Datadog alerts for the same service within 5 minutes + should produce 1 correlated incident") -GREEN → Write the minimum code to make the test pass. - No gold-plating. No "while I'm here" changes. +GREEN → Write the minimum code to make it pass + (hardcode the window, just make it work) -REFACTOR → Clean up the implementation without breaking tests. - Extract functions, rename for clarity, remove duplication. - Tests stay green throughout. +REFACTOR → Clean up without breaking tests + (extract the window manager, add Redis backing, + optimize the fingerprinting) ``` -**Strict rule:** No implementation code is written without a failing test first. PRs that add implementation without a corresponding test are blocked by CI. +**When to write tests first (strict TDD):** +- All correlation logic (time-window clustering, service graph traversal, deploy correlation) +- All noise scoring algorithms (rule-based scoring, threshold calculations) +- All HMAC signature validation (security-critical) +- All fingerprinting/deduplication logic +- All suppression governance (strict vs. 
audit mode) +- All circuit breaker state transitions (suppression DLQ replay) -### 1.3 Test Naming Convention +**When integration tests lead (test-after, then harden):** +- Provider webhook parsers — implement against real payload samples, then lock in with contract tests +- SQS FIFO message ordering — test against LocalStack after implementation +- Slack message formatting — build the blocks, then snapshot test the output -Tests follow the `given_when_then` pattern using Vitest's `describe`/`it` structure: +**When E2E tests lead:** +- The 60-second time-to-value journey — define the happy path first, build backward +- Weekly noise digest generation — define expected output, then build the aggregation + +### 1.3 Test Naming Conventions ```typescript +// Unit tests (vitest) +describe('CorrelationEngine', () => { + it('groups alerts for same service within 5min window into single incident', () => {}); + it('extends window by 2min when alert arrives in last 30 seconds', () => {}); + it('caps window extension at 15 minutes total', () => {}); + it('merges downstream service alerts when upstream window is active', () => {}); +}); + describe('NoiseScorer', () => { - describe('given a deploy-correlated alert window', () => { - it('should boost noise score by 25 points when deploy is attached', () => { ... }); - it('should add 5 additional points when PR title contains "feature-flag"', () => { ... }); - it('should not boost score above 50 when service matches never-suppress safelist', () => { ... 
}); + it('scores deploy-correlated alerts higher when deploy is within 10min', () => {}); + it('returns zero noise score for first-ever alert from a service', () => {}); + it('adds 5 points when PR title matches config or feature-flag', () => {}); +}); + +describe('HmacValidator', () => { + it('rejects Datadog webhook with missing DD-WEBHOOK-SIGNATURE header', () => {}); + it('rejects PagerDuty webhook with tampered body', () => {}); + it('accepts valid signature and passes payload through', () => {}); +}); +``` + +**Rules:** +- Describe the **observable outcome**, not the internal mechanism +- Use present tense ("groups", "rejects", "scores") +- If you need "and" in the name, split into two tests +- Group by component in `describe` blocks + +--- + +## Section 2: Test Pyramid + +### 2.1 Ratio + +| Level | Target | Count (V1) | Runtime | +|-------|--------|------------|---------| +| Unit | 70% | ~350 tests | <30s | +| Integration | 20% | ~100 tests | <5min | +| E2E/Smoke | 10% | ~20 tests | <10min | + +### 2.2 Unit Test Targets (per component) + +| Component | Key Behaviors | Est. 
Tests | +|-----------|--------------|------------| +| Webhook Parsers (Datadog, PD, OpsGenie, Grafana) | Payload normalization, field mapping, batch handling | 60 | +| HMAC Validator | Signature verification per provider, rejection paths | 20 | +| Fingerprint Generator | Deterministic hashing, dedup detection | 15 | +| Correlation Engine | Time-window open/close/extend, service graph merge, deploy correlation | 80 | +| Noise Scorer | Rule-based scoring, deploy proximity weighting, threshold calculations | 60 | +| Suggestion Engine | Suppression recommendations, "what would have happened" calculations | 30 | +| Notification Formatter | Slack block formatting, digest generation, in-place message updates | 25 | +| Governance Policy | Strict/audit mode enforcement, panic mode, per-customer overrides | 30 | +| Feature Flags | Circuit breaker on suppression volume, flag lifecycle | 15 | +| Canonical Schema Mapper | Provider → canonical field mapping, severity normalization | 15 | + +### 2.3 Integration Test Boundaries + +| Boundary | What's Tested | Infrastructure | +|----------|--------------|----------------| +| Lambda → SQS FIFO | Message ordering, dedup, tenant partitioning | LocalStack | +| SQS → Correlation Engine | Consumer polling, batch processing, error handling | LocalStack | +| Correlation Engine → Redis | Window CRUD, sorted set operations, TTL expiry | Testcontainers Redis | +| Correlation Engine → DynamoDB | Incident persistence, tenant config reads | Testcontainers DynamoDB Local | +| Correlation Engine → TimescaleDB | Time-series writes, continuous aggregate queries | Testcontainers PostgreSQL + TimescaleDB | +| Notification Service → Slack | Block formatting, rate limiting, message update | WireMock | +| API Gateway → Lambda | Webhook routing, auth, throttling | LocalStack | + +### 2.4 E2E/Smoke Scenarios + +1. **60-Second TTV Journey**: Webhook received → alert in Slack within 60s +2. 
**Alert Storm Correlation**: 50 alerts in 2 minutes → grouped into 1 incident +3. **Deploy Correlation**: Deploy event + alert storm → deploy identified as trigger +4. **Noise Digest**: 7 days of alerts → weekly Slack digest with noise stats +5. **Multi-Provider Merge**: Datadog + PagerDuty alerts for same service → single incident +6. **Panic Mode**: Enable panic → all suppression stops → alerts pass through raw + +--- + +## Section 3: Unit Test Strategy + +### 3.1 Webhook Parsers + +Each provider parser is a pure function: payload in, canonical alert(s) out. No side effects, no DB calls. + +```typescript +// tests/unit/parsers/datadog.test.ts +describe('DatadogParser', () => { + it('normalizes single alert payload to canonical schema', () => {}); + it('normalizes batched alert array into multiple canonical alerts', () => {}); + it('maps Datadog P1 to critical, P5 to info', () => {}); + it('extracts service name from tags array', () => {}); + it('handles missing optional fields without throwing', () => {}); + it('generates stable fingerprint from title + service + tenant', () => {}); +}); + +// tests/unit/parsers/pagerduty.test.ts +describe('PagerDutyParser', () => { + it('normalizes incident.triggered event to canonical alert', () => {}); + it('normalizes incident.resolved event with resolution metadata', () => {}); + it('ignores incident.acknowledged events (not alerts)', () => {}); + it('maps PD urgency high to critical, low to info', () => {}); +}); + +// tests/unit/parsers/opsgenie.test.ts +describe('OpsGenieParser', () => { + it('normalizes alert.created action to canonical alert', () => {}); + it('extracts priority P1-P5 and maps to severity', () => {}); + it('handles custom fields in details object', () => {}); +}); + +// tests/unit/parsers/grafana.test.ts +describe('GrafanaParser', () => { + it('normalizes Grafana Alertmanager webhook payload', () => {}); + it('handles multiple alerts in single webhook (Grafana batches)', () => {}); + it('extracts 
dashboard URL as context link', () => {}); +}); +``` + +**Mocking strategy:** None needed — parsers are pure functions. Use recorded payload fixtures from `fixtures/webhooks/{provider}/`. + +**Fixture structure:** +``` +fixtures/webhooks/ + datadog/ + single-alert.json + batched-alerts.json + monitor-recovered.json + pagerduty/ + incident-triggered.json + incident-resolved.json + incident-acknowledged.json + opsgenie/ + alert-created.json + alert-closed.json + grafana/ + single-firing.json + multi-firing.json + resolved.json +``` + +### 3.2 HMAC Validator + +```typescript +describe('HmacValidator', () => { + // Datadog uses hex-encoded HMAC-SHA256 + it('validates correct Datadog DD-WEBHOOK-SIGNATURE header', () => {}); + it('rejects Datadog webhook with wrong signature', () => {}); + it('rejects Datadog webhook with missing signature header', () => {}); + + // PagerDuty uses v1= prefix with HMAC-SHA256 + it('validates correct PagerDuty X-PagerDuty-Signature header', () => {}); + it('rejects PagerDuty webhook with tampered body', () => {}); + + // OpsGenie uses different header name + it('validates correct OpsGenie X-OpsGenie-Signature header', () => {}); + + // Edge cases + it('rejects empty body with any signature', () => {}); + it('handles timing-safe comparison to prevent timing attacks', () => {}); +}); +``` + +**Mocking strategy:** None — crypto operations are deterministic. Use known secret + body + expected signature triples. 
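
A minimal sketch of the validation these tests describe, assuming a hex-encoded HMAC-SHA256 over the raw request body. The helper names `computeHexHmac` and `isValidHexSignature` are hypothetical and are not taken from the codebase; provider-specific prefixes (such as PagerDuty's `v1=`) would need to be stripped before calling it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: hex-encoded HMAC-SHA256 of the raw request body.
export function computeHexHmac(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Hypothetical helper: true only when the supplied signature matches the body.
export function isValidHexSignature(
  rawBody: string,
  secret: string,
  signature: string | undefined,
): boolean {
  // A missing header or an empty body is rejected outright,
  // mirroring the rejection cases in the test list.
  if (!signature || rawBody.length === 0) return false;

  const expected = Buffer.from(computeHexHmac(rawBody, secret), "utf8");
  const received = Buffer.from(signature, "utf8");

  // timingSafeEqual throws when the lengths differ, so compare lengths first.
  if (expected.length !== received.length) return false;

  // Constant-time comparison, so the check does not leak how many
  // leading characters of the signature were correct.
  return timingSafeEqual(expected, received);
}
```

A "known secret + body + expected signature" triple can be produced once with `computeHexHmac` and then checked into the fixtures, so the tests never recompute the expected value with the code under test.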
+ +### 3.3 Fingerprint Generator + +```typescript +describe('FingerprintGenerator', () => { + it('generates deterministic SHA-256 from tenant_id + provider + service + title', () => {}); + it('produces same fingerprint for identical alerts regardless of timestamp', () => {}); + it('produces different fingerprints when service differs', () => {}); + it('normalizes title whitespace before hashing', () => {}); + it('handles unicode characters in title consistently', () => {}); +}); +``` + +### 3.4 Correlation Engine + +The most complex component. Heavy use of table-driven tests. + +```typescript +describe('CorrelationEngine', () => { + describe('Time-Window Management', () => { + it('opens new 5min window on first alert for a service', () => {}); + it('adds subsequent alerts to existing open window', () => {}); + it('extends window by 2min when alert arrives in last 30 seconds', () => {}); + it('caps total window duration at 15 minutes', () => {}); + it('closes window after timeout with no new alerts', () => {}); + it('generates incident record when window closes', () => {}); + }); + + describe('Service Graph Correlation', () => { + it('merges downstream alerts into upstream window when dependency exists', () => {}); + it('does not merge alerts for unrelated services', () => {}); + it('handles circular dependencies without infinite loop', () => {}); + it('traverses multi-level dependency chains (A→B→C)', () => {}); + }); + + describe('Deploy Correlation', () => { + it('tags incident with deploy_id when deploy event within 10min of first alert', () => {}); + it('does not correlate deploy older than 10 minutes', () => {}); + it('correlates deploy to correct service even with multiple recent deploys', () => {}); + it('adds deploy correlation score boost to noise calculation', () => {}); + }); + + describe('Multi-Tenant Isolation', () => { + it('never correlates alerts across different tenants', () => {}); + it('maintains separate windows per tenant', () => {}); + 
it('handles concurrent alerts from multiple tenants', () => {}); }); }); ``` -Test file naming: `{module}.test.ts` for unit tests, `{module}.integration.test.ts` for integration tests, `{journey}.e2e.test.ts` for E2E. +**Mocking strategy:** +- Mock Redis client (`ioredis-mock`) for window state +- Mock DynamoDB client for service dependency reads +- Mock SQS for downstream message publishing +- Use Vitest fake timers (`vi.useFakeTimers()`) for time-window testing -### 1.4 When Tests Lead (TDD Mandatory) +### 3.5 Noise Scorer -TDD is **mandatory** for: -- All noise scoring logic (`src/scoring/`) -- All correlation rules (`src/correlation/`) -- All suppression decisions (`src/suppression/`) -- HMAC validation per provider -- Canonical schema mapping (every provider parser) -- Feature flag circuit breaker logic -- Governance policy enforcement (`policy.json` evaluation) -- Any function with cyclomatic complexity > 3 +```typescript +describe('NoiseScorer', () => { + describe('Rule-Based Scoring', () => { + it('returns 0 for first-ever alert from a service (no history)', () => {}); + it('scores higher when alert has fired >5 times in 24 hours', () => {}); + it('scores higher when alert auto-resolved within 5 minutes', () => {}); + it('adds deploy correlation bonus (+15 points) when deploy is recent', () => {}); + it('adds feature-flag bonus (+5 points) when PR title matches config/feature-flag', () => {}); + it('caps total score at 100', () => {}); + it('never scores critical severity alerts above 80 (safety cap)', () => {}); + }); -TDD is **recommended but not enforced** for: -- Infrastructure glue code (SQS consumers, DynamoDB adapters) -- Slack Block Kit message formatting -- Dashboard API route handlers (covered by integration tests) + describe('Threshold Calculations', () => { + it('classifies score 0-30 as signal (keep)', () => {}); + it('classifies score 31-70 as review (annotate)', () => {}); + it('classifies score 71-100 as noise (suggest suppress)', () => {}); + it('uses 
tenant-specific thresholds when configured', () => {}); + }); -### 1.5 Test Ownership + describe('What-Would-Have-Happened', () => { + it('calculates suppression count for historical window', () => {}); + it('reports zero false negatives when no suppressed alert was critical', () => {}); + it('flags false negative when suppressed alert was later escalated', () => {}); + }); +}); +``` -Each epic owns its tests. The Correlation Engine team owns `src/correlation/**/*.test.ts`. No cross-team test ownership. If a test breaks due to a dependency change, the team that changed the dependency fixes the test. +**Mocking strategy:** Mock the alert history store (DynamoDB queries). Scorer logic itself is pure calculation. + +### 3.6 Notification Formatter + +```typescript +describe('NotificationFormatter', () => { + describe('Slack Blocks', () => { + it('formats single-alert notification with service, title, severity', () => {}); + it('formats correlated incident with alert count and sources', () => {}); + it('includes deploy trigger when deploy correlation exists', () => {}); + it('includes noise score badge (🟢 signal / 🟡 review / 🔴 noise)', () => {}); + it('includes feedback buttons (👍 Helpful / 👎 Not helpful)', () => {}); + it('formats in-place update message (replaces initial alert)', () => {}); + }); + + describe('Weekly Digest', () => { + it('aggregates 7 days of incidents into summary stats', () => {}); + it('highlights top 3 noisiest services', () => {}); + it('shows suppression savings ("would have saved X pages")', () => {}); + }); +}); +``` + +**Mocking strategy:** Snapshot tests — render the Slack blocks to JSON and compare against golden fixtures. 
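
The snapshot approach depends on the formatter being a deterministic function from incident data to Slack blocks. The sketch below shows one possible shape, using the 0-30 / 31-70 / 71-100 bands from the Noise Scorer tests. `IncidentSummary`, `classifyNoise`, and `formatIncidentBlocks` are hypothetical names introduced for illustration, and the block layout is an assumption, not the product's actual message design.

```typescript
type NoiseClass = "signal" | "review" | "noise";

// Hypothetical input shape; the real incident record may carry more fields.
interface IncidentSummary {
  service: string;
  title: string;
  severity: string;
  noiseScore: number; // 0-100
}

interface Thresholds {
  signalMax: number; // scores at or below this are "signal"
  reviewMax: number; // scores at or below this (and above signalMax) are "review"
}

const DEFAULT_THRESHOLDS: Thresholds = { signalMax: 30, reviewMax: 70 };

export function classifyNoise(
  score: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): NoiseClass {
  if (score <= thresholds.signalMax) return "signal";
  if (score <= thresholds.reviewMax) return "review";
  return "noise";
}

const BADGES: Record<NoiseClass, string> = {
  signal: "🟢 signal",
  review: "🟡 review",
  noise: "🔴 noise",
};

// Pure function: the same incident always yields the same blocks,
// which is what makes golden-fixture comparison stable.
export function formatIncidentBlocks(incident: IncidentSummary) {
  const badge = BADGES[classifyNoise(incident.noiseScore)];
  return [
    {
      type: "section",
      text: {
        type: "mrkdwn",
        text: `*${incident.title}*\nService: ${incident.service} | Severity: ${incident.severity}`,
      },
    },
    {
      type: "context",
      elements: [
        { type: "mrkdwn", text: `${badge} (score ${incident.noiseScore})` },
      ],
    },
  ];
}
```

Because the output is plain JSON, a golden fixture can be compared with `JSON.stringify(formatIncidentBlocks(incident), null, 2)`; keeping timestamps and random IDs out of the formatter keeps the snapshot from churning.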
+ +### 3.7 Governance Policy Engine + +```typescript +describe('GovernancePolicy', () => { + describe('Mode Enforcement', () => { + it('in strict mode: annotates alerts but never suppresses', () => {}); + it('in audit mode: auto-suppresses with full logging', () => {}); + it('defaults new tenants to strict mode', () => {}); + }); + + describe('Panic Mode', () => { + it('when panic=true: all suppression stops immediately', () => {}); + it('when panic=true: all alerts pass through unmodified', () => {}); + it('panic mode activatable via Redis key check', () => {}); + it('panic mode shows banner in dashboard API response', () => {}); + }); + + describe('Per-Customer Override', () => { + it('customer can set stricter mode than system default', () => {}); + it('customer cannot set less restrictive mode than system default', () => {}); + it('merge logic: max_restrictive(system, customer)', () => {}); + }); + + describe('Policy Decision Logging', () => { + it('logs "suppressed by audit mode" with full context', () => {}); + it('logs "annotation-only, strict mode active" for strict tenants', () => {}); + it('logs "panic mode active — all alerts passing through"', () => {}); + }); +}); +``` + +### 3.8 Feature Flag Circuit Breaker + +```typescript +describe('SuppressionCircuitBreaker', () => { + it('allows suppression when volume is within baseline', () => {}); + it('trips breaker when suppression exceeds 2x baseline over 30min', () => {}); + it('auto-disables the scoring flag when breaker trips', () => {}); + it('replays suppressed alerts from DLQ when breaker trips', () => {}); + it('resets breaker after manual flag re-enable', () => {}); + it('tracks suppression count per flag in Redis sliding window', () => {}); +}); +``` --- + +## Section 4: Integration Test Strategy + +### 4.1 Webhook Contract Tests + +Each provider integration gets a contract test suite that validates the full path: HTTP request → Lambda → SQS message. 
+ +```typescript +// tests/integration/webhooks/datadog.contract.test.ts +import { LocalStackContainer, StartedLocalStackContainer } from '@testcontainers/localstack'; +import { SQSClient, CreateQueueCommand } from '@aws-sdk/client-sqs'; +import request from 'supertest'; +// `app`, `loadFixture`, `computeHmac`, `pollSqs`, and `TEST_SECRET` are project test helpers assumed to exist. + +describe('Datadog Webhook Contract', () => { + let localstack: StartedLocalStackContainer; + let sqsClient: SQSClient; + + beforeAll(async () => { + localstack = await new LocalStackContainer('localstack/localstack:3').start(); + // LocalStack accepts any credentials, but the AWS SDK still needs a region and a credential pair + sqsClient = new SQSClient({ + endpoint: localstack.getConnectionUri(), + region: 'us-east-1', + credentials: { accessKeyId: 'test', secretAccessKey: 'test' }, + }); + // Create SQS FIFO queue + await sqsClient.send(new CreateQueueCommand({ + QueueName: 'alert-ingested.fifo', + Attributes: { FifoQueue: 'true', ContentBasedDeduplication: 'true' } + })); + }); + + afterAll(async () => { + await localstack.stop(); + }); + + it('accepts valid Datadog webhook and produces canonical SQS message', async () => { + const payload = loadFixture('webhooks/datadog/single-alert.json'); + const signature = computeHmac(payload, TEST_SECRET); + + const res = await request(app) + .post('/v1/wh/tenant-123/datadog') + .set('DD-WEBHOOK-SIGNATURE', signature) + .send(payload); + + expect(res.status).toBe(200); + + const messages = await pollSqs(sqsClient, 'alert-ingested.fifo'); + expect(messages).toHaveLength(1); + expect(messages[0].body).toMatchObject({ + tenant_id: 'tenant-123', + provider: 'datadog', + severity: expect.stringMatching(/^(critical|high|medium|low|info)$/), + fingerprint: expect.stringMatching(/^[a-f0-9]{64}$/), + }); + }); + + it('rejects webhook with invalid HMAC and produces no SQS message', async () => { + const payload = loadFixture('webhooks/datadog/single-alert.json'); + + const res = await request(app) + .post('/v1/wh/tenant-123/datadog') + .set('DD-WEBHOOK-SIGNATURE', 'bad-signature') + .send(payload); + + expect(res.status).toBe(401); + const messages = await pollSqs(sqsClient, 'alert-ingested.fifo', { waitMs: 1000 }); + expect(messages).toHaveLength(0); + }); +}); +``` + +Repeat this pattern for PagerDuty, OpsGenie, and Grafana, each with its provider-specific signature header and payload format. 
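
One way to avoid copy-pasting the contract suite per provider is to drive it from a small table. The sketch below is an assumption about how that could look, not the project's actual layout: `PROVIDER_CASES` and `buildContractCases` are hypothetical names, the header names and fixture paths are the ones listed elsewhere in this document, and the Grafana signature header is left as `null` because the document does not name one.

```typescript
type Provider = "datadog" | "pagerduty" | "opsgenie" | "grafana";

interface ContractCase {
  provider: Provider;
  route: string; // webhook path, following the /v1/wh/{tenant}/{provider} pattern
  signatureHeader: string | null; // null = header not specified in this document
  fixture: string; // path relative to fixtures/
}

// Hypothetical table; header names and fixtures come from the unit-test sections.
const PROVIDER_CASES: Record<Provider, { signatureHeader: string | null; fixture: string }> = {
  datadog: { signatureHeader: "DD-WEBHOOK-SIGNATURE", fixture: "webhooks/datadog/single-alert.json" },
  pagerduty: { signatureHeader: "X-PagerDuty-Signature", fixture: "webhooks/pagerduty/incident-triggered.json" },
  opsgenie: { signatureHeader: "X-OpsGenie-Signature", fixture: "webhooks/opsgenie/alert-created.json" },
  grafana: { signatureHeader: null, fixture: "webhooks/grafana/single-firing.json" },
};

// Expands the table into one concrete case per provider for a given tenant.
export function buildContractCases(tenantId: string): ContractCase[] {
  return (Object.keys(PROVIDER_CASES) as Provider[]).map((provider) => ({
    provider,
    route: `/v1/wh/${tenantId}/${provider}`,
    ...PROVIDER_CASES[provider],
  }));
}
```

The resulting array can be passed to Vitest's `describe.each` so that every provider runs the same "valid signature produces one SQS message" and "invalid signature produces none" assertions, with only the header and fixture varying.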
+ +### 4.2 Correlation Engine → Redis Integration + +```typescript +// tests/integration/correlation/redis-windows.test.ts +describe('Correlation Engine + Redis', () => { + let redis: StartedTestContainer; + let redisClient: Redis; + + beforeAll(async () => { + redis = await new GenericContainer('redis:7-alpine') + .withExposedPorts(6379) + .start(); + redisClient = new Redis({ host: redis.getHost(), port: redis.getMappedPort(6379) }); + }); + + it('opens window in Redis sorted set with correct TTL', async () => { + await correlationEngine.processAlert(makeAlert({ service: 'payment-api' })); + + const windows = await redisClient.zrange('windows:tenant-123', 0, -1, 'WITHSCORES'); + expect(windows).toHaveLength(2); // [windowId, closesAtEpoch] + const ttl = await redisClient.ttl('window:tenant-123:payment-api'); + expect(ttl).toBeGreaterThan(280); // ~5min minus processing time + }); + + it('extends window when alert arrives in last 30 seconds', async () => { + // Open window, advance clock to T+4m31s, send another alert + await correlationEngine.processAlert(makeAlert({ service: 'payment-api' })); + vi.advanceTimersByTime(4 * 60 * 1000 + 31 * 1000); + await correlationEngine.processAlert(makeAlert({ service: 'payment-api' })); + + const ttl = await redisClient.ttl('window:tenant-123:payment-api'); + expect(ttl).toBeGreaterThan(100); // Extended by ~2min + }); + + it('isolates windows between tenants', async () => { + await correlationEngine.processAlert(makeAlert({ tenant: 'A', service: 'api' })); + await correlationEngine.processAlert(makeAlert({ tenant: 'B', service: 'api' })); + + const windowsA = await redisClient.zrange('windows:A', 0, -1); + const windowsB = await redisClient.zrange('windows:B', 0, -1); + expect(windowsA).toHaveLength(1); + expect(windowsB).toHaveLength(1); + expect(windowsA[0]).not.toBe(windowsB[0]); + }); +}); +``` + +### 4.3 Correlation Engine → DynamoDB Integration + +```typescript +// 
tests/integration/correlation/dynamodb-incidents.test.ts +describe('Correlation Engine + DynamoDB', () => { + let dynamodb: StartedTestContainer; + + beforeAll(async () => { + dynamodb = await new GenericContainer('amazon/dynamodb-local:latest') + .withExposedPorts(8000) + .start(); + // Create tables: alerts, incidents, tenant_config, service_dependencies + }); + + it('persists incident record when correlation window closes', async () => { + await correlationEngine.processAlert(makeAlert({ service: 'api' })); + await correlationEngine.processAlert(makeAlert({ service: 'api' })); + await correlationEngine.closeExpiredWindows(); + + const incidents = await queryIncidents('tenant-123'); + expect(incidents).toHaveLength(1); + expect(incidents[0].alert_count).toBe(2); + expect(incidents[0].services).toContain('api'); + }); + + it('reads service dependencies for cascading correlation', async () => { + await putServiceDependency('tenant-123', 'api', 'database'); + await correlationEngine.processAlert(makeAlert({ service: 'database' })); + await correlationEngine.processAlert(makeAlert({ service: 'api' })); + + // Both should be in the same window + const windows = await getActiveWindows('tenant-123'); + expect(windows).toHaveLength(1); + expect(windows[0].services).toEqual(expect.arrayContaining(['api', 'database'])); + }); +}); +``` + +### 4.4 Correlation Engine → TimescaleDB Integration + +```typescript +// tests/integration/correlation/timescaledb-trends.test.ts +describe('Correlation Engine + TimescaleDB', () => { + let pg: StartedTestContainer; + + beforeAll(async () => { + pg = await new GenericContainer('timescale/timescaledb:latest-pg16') + .withExposedPorts(5432) + .withEnvironment({ POSTGRES_PASSWORD: 'test' }) + .start(); + // Run migrations: create hypertables, continuous aggregates + }); + + it('writes alert frequency data to hypertable', async () => { + await correlationEngine.recordAlertEvent(makeAlert({ service: 'api' })); + const rows = await 
query('SELECT * FROM alert_events WHERE service = $1', ['api']); + expect(rows).toHaveLength(1); + }); + + it('continuous aggregate calculates hourly alert counts', async () => { + // Insert 10 alerts spread over 2 hours + await insertAlertEvents(10, { spreadHours: 2 }); + await refreshContinuousAggregate('hourly_alert_summary'); + + const summary = await query('SELECT * FROM hourly_alert_summary'); + expect(summary).toHaveLength(2); + expect(summary.reduce((s, r) => s + r.alert_count, 0)).toBe(10); + }); +}); +``` + +### 4.5 Notification Service → Slack (WireMock) + +```typescript +// tests/integration/notifications/slack.test.ts +describe('Notification Service + Slack', () => { + let wiremock: WireMockContainer; + + beforeAll(async () => { + wiremock = await new WireMockContainer().start(); + wiremock.stub({ + request: { method: 'POST', urlPath: '/api/chat.postMessage' }, + response: { status: 200, body: JSON.stringify({ ok: true, ts: '1234.5678' }) } + }); + wiremock.stub({ + request: { method: 'POST', urlPath: '/api/chat.update' }, + response: { status: 200, body: JSON.stringify({ ok: true }) } + }); + }); + + it('sends initial alert notification to correct Slack channel', async () => {}); + it('updates message in-place when correlation completes', async () => {}); + it('respects Slack rate limits (1 msg/sec per channel)', async () => {}); + it('retries on 429 with exponential backoff', async () => {}); + it('includes feedback buttons in correlated incident message', async () => {}); +}); +``` + +--- + +## Section 5: E2E & Smoke Tests + +### 5.1 Critical User Journeys + +**Journey 1: 60-Second Time-to-Value** + +The defining test for dd0c/alert. Validates the entire pipeline from webhook to Slack notification. + +```typescript +// tests/e2e/journeys/sixty-second-ttv.test.ts +describe('60-Second Time-to-Value', () => { + it('delivers first correlated incident to Slack within 60 seconds of webhook', async () => { + const start = Date.now(); + + // 1. 
Send Datadog webhook + await sendWebhook('datadog', fixtures.datadog.singleAlert, { tenant: 'e2e-tenant' }); + + // 2. Wait for Slack message + const slackMessage = await waitForSlackMessage('e2e-channel', { timeoutMs: 60_000 }); + + const elapsed = Date.now() - start; + expect(elapsed).toBeLessThan(60_000); + expect(slackMessage.text).toContain('New alert'); + expect(slackMessage.blocks).toBeDefined(); + }); +}); +``` + +**Journey 2: Alert Storm Correlation** + +```typescript +// tests/e2e/journeys/alert-storm.test.ts +describe('Alert Storm Correlation', () => { + it('groups 50 alerts in 2 minutes into a single correlated incident', async () => { + // Fire 50 alerts for same service over 2 minutes + for (let i = 0; i < 50; i++) { + await sendWebhook('datadog', makeAlertPayload({ + service: 'payment-api', + title: `High latency on payment-api (${i})`, + })); + await sleep(2400); // ~50 alerts in 2 min + } + + // Wait for correlation window to close + await sleep(5 * 60 * 1000 + 30_000); // 5min window + buffer + + const slackMessages = await getSlackMessages('e2e-channel'); + const incidentMessages = slackMessages.filter(m => m.text.includes('Incident')); + expect(incidentMessages).toHaveLength(1); + expect(incidentMessages[0].text).toContain('50 alerts grouped'); + }); +}); +``` + +**Journey 3: Deploy Correlation** + +```typescript +// tests/e2e/journeys/deploy-correlation.test.ts +describe('Deploy Correlation', () => { + it('identifies deploy as trigger when alerts follow within 10 minutes', async () => { + // 1. Send deploy event + await sendWebhook('github-actions', makeDeployPayload({ + service: 'payment-api', + commit: 'abc123', + pr_title: 'feat: add retry logic', + })); + + // 2. Wait 2 minutes, then fire alerts + await sleep(2 * 60 * 1000); + await sendWebhook('datadog', makeAlertPayload({ service: 'payment-api' })); + await sendWebhook('pagerduty', makeAlertPayload({ service: 'payment-api' })); + + // 3. 
Wait for correlation + await sleep(6 * 60 * 1000); + + const slackMessage = await getLatestSlackMessage('e2e-channel'); + expect(slackMessage.text).toContain('Deploy #'); + expect(slackMessage.text).toContain('abc123'); + }); +}); +``` + +**Journey 4: Panic Mode** + +```typescript +// tests/e2e/journeys/panic-mode.test.ts +describe('Panic Mode', () => { + it('stops all suppression immediately when panic mode is activated', async () => { + // 1. Enable audit mode, verify suppression works + await setGovernanceMode('e2e-tenant', 'audit'); + await sendNoisyAlerts(10); + const beforePanic = await getSlackMessages('e2e-channel'); + const suppressedBefore = beforePanic.filter(m => m.text.includes('suppressed')); + expect(suppressedBefore.length).toBeGreaterThan(0); + + // 2. Activate panic mode + // Node's fetch needs an absolute URL; APP_BASE_URL and the fallback port are assumptions, not project config + const baseUrl = process.env.APP_BASE_URL ?? 'http://localhost:3000'; + await fetch(`${baseUrl}/admin/panic`, { method: 'POST' }); + + // 3. Send more alerts: all should pass through + await sendNoisyAlerts(10); + const afterPanic = await getSlackMessages('e2e-channel'); + const suppressedAfter = afterPanic.filter(m => m.text.includes('suppressed')); + // No new suppressions after panic, and all 10 new alerts were delivered + expect(suppressedAfter.length).toBe(suppressedBefore.length); + expect(afterPanic.length - beforePanic.length).toBeGreaterThanOrEqual(10); + }); +}); +``` + +### 5.2 E2E Infrastructure + +```yaml +# docker-compose.e2e.yml +services: + localstack: + image: localstack/localstack:3 + environment: + SERVICES: sqs,s3,dynamodb,apigateway,lambda + ports: ["4566:4566"] + + timescaledb: + image: timescale/timescaledb:latest-pg16 + environment: + POSTGRES_PASSWORD: test + POSTGRES_DB: test + ports: ["5432:5432"] + + redis: + image: redis:7-alpine + ports: ["6379:6379"] + + wiremock: + image: wiremock/wiremock:3 + ports: ["8080:8080"] + volumes: + - ./fixtures/wiremock:/home/wiremock/mappings + + app: + build: . 
+ environment: + AWS_ENDPOINT: http://localstack:4566 + REDIS_URL: redis://redis:6379 + TIMESCALE_URL: postgres://postgres:test@timescaledb:5432/test + SLACK_API_URL: http://wiremock:8080 + depends_on: [localstack, timescaledb, redis, wiremock] +``` + +### 5.3 Synthetic Alert Generation + +```typescript +// tests/e2e/helpers/alert-generator.ts +export function makeAlertPayload(overrides: Partial = {}): DatadogWebhookPayload { + return { + id: ulid(), + title: overrides.title ?? `Alert: ${faker.hacker.phrase()}`, + text: faker.lorem.sentence(), + date_happened: Math.floor(Date.now() / 1000), + priority: overrides.priority ?? 'normal', + tags: [`service:${overrides.service ?? 'test-service'}`], + alert_type: overrides.severity ?? 'warning', + ...overrides, + }; +} + +export async function sendNoisyAlerts(count: number, opts?: { service?: string }) { + for (let i = 0; i < count; i++) { + await sendWebhook('datadog', makeAlertPayload({ + service: opts?.service ?? 'noisy-service', + title: `Flapping alert #${i}`, + })); + } +} +``` + +--- + +## Section 6: Performance & Load Testing + +### 6.1 Alert Ingestion Throughput + +```typescript +// tests/perf/ingestion-throughput.test.ts +describe('Ingestion Throughput', () => { + it('processes 1000 webhooks/second without dropping payloads', async () => { + const results = await k6.run({ + vus: 100, + duration: '30s', + thresholds: { + http_req_duration: ['p95<200'], // 200ms p95 + http_req_failed: ['rate<0.001'], // <0.1% failure + }, + script: ` + import http from 'k6/http'; + export default function() { + http.post('${WEBHOOK_URL}/v1/wh/perf-tenant/datadog', + JSON.stringify(makeAlertPayload()), + { headers: { 'DD-WEBHOOK-SIGNATURE': validSig } } + ); + } + `, + }); + expect(results.metrics.http_req_failed.rate).toBeLessThan(0.001); + }); +}); +``` + +### 6.2 Correlation Latency Under Alert Storms + +```typescript +describe('Correlation Storm Performance', () => { + it('correlates 500 alerts across 10 services within 30 
seconds', async () => { + const start = Date.now(); + + // Simulate incident storm: 500 alerts, 10 services, 2 minutes + await generateAlertStorm({ alerts: 500, services: 10, durationMs: 120_000 }); + + // Wait for all windows to close + await waitForIncidents('perf-tenant', { minCount: 1, timeoutMs: 30_000 }); + + const elapsed = Date.now() - start - 120_000; // subtract generation time + expect(elapsed).toBeLessThan(30_000); + }); + + it('Redis memory stays under 50MB during 10K active windows', async () => { + // Open 10K windows across 100 tenants + for (let t = 0; t < 100; t++) { + for (let s = 0; s < 100; s++) { + await correlationEngine.processAlert(makeAlert({ + tenant: `tenant-${t}`, + service: `service-${s}`, + })); + } + } + const memoryUsage = await redisClient.info('memory'); + const usedMb = parseRedisMemory(memoryUsage); + expect(usedMb).toBeLessThan(50); + }); +}); +``` + +### 6.3 Noise Scoring Latency + +```typescript +describe('Noise Scoring Performance', () => { + it('scores a correlated incident with 50 alerts in <100ms', async () => { + const incident = makeIncident({ alertCount: 50, withHistory: true }); + + const start = performance.now(); + const score = await noiseScorer.score(incident); + const elapsed = performance.now() - start; + + expect(elapsed).toBeLessThan(100); + expect(score).toBeGreaterThanOrEqual(0); + expect(score).toBeLessThanOrEqual(100); + }); +}); +``` + +### 6.4 Memory Pressure During High-Cardinality Correlation + +```typescript +describe('Memory Pressure', () => { + it('ECS task stays under 512MB with 1000 concurrent correlation windows', async () => { + // Monitor ECS task memory while processing high-cardinality alerts + const memBefore = process.memoryUsage().heapUsed; + + await processHighCardinalityAlerts({ tenants: 100, servicesPerTenant: 10 }); + + const memAfter = process.memoryUsage().heapUsed; + const deltaMb = (memAfter - memBefore) / 1024 / 1024; + expect(deltaMb).toBeLessThan(256); // Leave headroom in 512MB 
task + }); +}); +``` + +--- + +## Section 7: CI/CD Pipeline Integration + +### 7.1 Pipeline Stages + +``` +┌─────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ +│ Pre-Commit │───▶│ PR Gate │───▶│ Merge │───▶│ Staging │───▶│ Prod │ +│ (local) │ │ (CI) │ │ (CI) │ │ (CD) │ │ (CD) │ +└─────────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ + lint + format unit tests full suite E2E + perf smoke + canary + type check integration coverage gate LocalStack deploy event + <10s <5min <10min <15min self-dogfood +``` + +### 7.2 Stage Details + +**Pre-Commit (local, <10s):** +- `eslint` + `prettier` format check +- `tsc --noEmit` type check +- Affected unit tests only (`vitest --changed`) + +**PR Gate (CI, <5min):** +- Full unit test suite +- Integration tests (Testcontainers spin up in CI) +- Schema migration lint (no DROP/RENAME/TYPE changes) +- Decision log presence check for scoring/correlation PRs +- Coverage diff: new code must have ≥80% coverage + +**Merge to Main (CI, <10min):** +- Full test suite (unit + integration) +- Coverage gate: overall ≥80%, scoring engine ≥90% +- CDK synth + diff (infrastructure changes) +- Security scan (`npm audit`, `trivy`) + +**Staging (CD, <15min):** +- Deploy to staging environment +- E2E journey tests against LocalStack +- Performance benchmarks (ingestion throughput, correlation latency) +- Synthetic alert generation + validation + +**Production (CD):** +- Canary deploy (10% traffic for 5 minutes) +- Smoke tests (send test webhook, verify Slack delivery) +- dd0c/alert dogfoods itself: deploy event sent to own webhook +- Automated rollback if error rate >1% during canary + +### 7.3 Coverage Thresholds + +| Component | Minimum | Target | +|-----------|---------|--------| +| Webhook Parsers | 90% | 95% | +| HMAC Validator | 95% | 100% | +| Correlation Engine | 85% | 90% | +| Noise Scorer | 90% | 95% | +| Governance Policy | 90% | 95% | +| Notification Formatter | 75% | 85% | +| Overall | 80% | 85% | + +### 7.4 Test 
Parallelization + +```yaml +# .github/workflows/test.yml +jobs: + unit: + runs-on: ubuntu-latest + strategy: + matrix: + shard: [1, 2, 3, 4] + steps: + - run: vitest --shard=${{ matrix.shard }}/4 + + integration: + runs-on: ubuntu-latest + strategy: + matrix: + suite: [webhooks, correlation, notifications, storage] + steps: + - run: vitest --project=integration --grep=${{ matrix.suite }} + + e2e: + needs: [unit, integration] + runs-on: ubuntu-latest + steps: + - run: docker compose -f docker-compose.e2e.yml up -d + - run: vitest --project=e2e +``` + +--- + +## Section 8: Transparent Factory Tenet Testing + +### 8.1 Atomic Flagging — Suppression Circuit Breaker + +```typescript +describe('Atomic Flagging', () => { + describe('Flag Lifecycle', () => { + it('new scoring rule flag defaults to false (off)', () => {}); + it('flag has owner and ttl metadata', () => {}); + it('CI blocks when flag at 100% exceeds 14-day TTL', () => {}); + }); + + describe('Circuit Breaker on Suppression Volume', () => { + it('allows suppression when volume is within 2x baseline', () => {}); + it('trips breaker when suppression exceeds 2x baseline over 30min', () => {}); + it('auto-disables the flag when breaker trips', () => {}); + it('buffers suppressed alerts in DLQ during normal operation', () => {}); + it('replays DLQ alerts when breaker trips', async () => { + // 1. Enable scoring flag, suppress 20 alerts + // 2. Trip the breaker by spiking suppression rate + // 3. Verify all 20 suppressed alerts are re-emitted from DLQ + // 4. 
Verify flag is now disabled + }); + it('DLQ retains alerts for 1 hour before expiry', () => {}); + }); + + describe('Local Evaluation', () => { + it('flag evaluation does not make network calls', () => {}); + it('flag state is cached in-memory and refreshed every 60s', () => {}); + }); +}); +``` + +### 8.2 Elastic Schema — Migration Validation + +```typescript +describe('Elastic Schema', () => { + describe('Migration Lint', () => { + it('rejects migration with DROP COLUMN statement', () => { + const migration = 'ALTER TABLE alert_events DROP COLUMN old_field;'; + expect(lintMigration(migration)).toContainError('DROP not allowed'); + }); + it('rejects migration with ALTER COLUMN TYPE', () => { + const migration = 'ALTER TABLE alert_events ALTER COLUMN severity TYPE integer;'; + expect(lintMigration(migration)).toContainError('TYPE change not allowed'); + }); + it('rejects migration with RENAME COLUMN', () => {}); + it('accepts migration with ADD COLUMN (nullable)', () => { + const migration = 'ALTER TABLE alert_events ADD COLUMN noise_score_v2 integer;'; + expect(lintMigration(migration)).toBeValid(); + }); + it('accepts migration with new table creation', () => {}); + }); + + describe('DynamoDB Schema', () => { + it('rejects attribute type change in table definition', () => {}); + it('accepts new attribute addition', () => {}); + it('V1 code ignores V2 attributes without error', () => {}); + }); + + describe('Sunset Enforcement', () => { + it('every migration file contains sunset_date comment', () => { + const migrations = glob.sync('migrations/*.sql'); + for (const m of migrations) { + const content = fs.readFileSync(m, 'utf-8'); + expect(content).toMatch(/-- sunset_date: \d{4}-\d{2}-\d{2}/); + } + }); + it('CI warns when migration is past sunset date', () => {}); + }); +}); +``` + +### 8.3 Cognitive Durability — Decision Log Validation + +```typescript +describe('Cognitive Durability', () => { + it('decision_log.json exists for every PR touching scoring/', () => 
{ + // CI hook: check git diff for files in src/scoring/ + // If touched, require docs/decisions/*.json in the same PR + }); + + it('decision log has required fields', () => { + const logs = glob.sync('docs/decisions/*.json'); + for (const log of logs) { + const entry = JSON.parse(fs.readFileSync(log, 'utf-8')); + expect(entry).toHaveProperty('reasoning'); + expect(entry).toHaveProperty('alternatives_considered'); + expect(entry).toHaveProperty('confidence'); + expect(entry).toHaveProperty('timestamp'); + expect(entry).toHaveProperty('author'); + } + }); + + it('cyclomatic complexity stays under 10 for all scoring functions', () => { + // execSync returns the command output and throws on a non-zero exit code, so assert that it does not throw + const runLint = () => execSync('eslint src/scoring/ --rule "complexity: [error, 10]"'); + expect(runLint).not.toThrow(); + }); +}); +``` + +### 8.4 Semantic Observability — OTEL Span Assertions + +```typescript +describe('Semantic Observability', () => { + let spanExporter: InMemorySpanExporter; + + beforeEach(() => { + spanExporter = new InMemorySpanExporter(); + // Configure OTEL with in-memory exporter for testing + }); + + describe('Alert Evaluation Spans', () => { + it('emits parent alert_evaluation span for each alert', async () => { + await processAlert(makeAlert()); + const spans = spanExporter.getFinishedSpans(); + const evalSpan = spans.find(s => s.name === 'alert_evaluation'); + expect(evalSpan).toBeDefined(); + }); + + it('emits child noise_scoring span with score attributes', async () => { + await processAlert(makeAlert()); + const spans = spanExporter.getFinishedSpans(); + const scoreSpan = spans.find(s => s.name === 'noise_scoring'); + expect(scoreSpan).toBeDefined(); + expect(scoreSpan.attributes['alert.noise_score']).toBeGreaterThanOrEqual(0); + expect(scoreSpan.attributes['alert.noise_score']).toBeLessThanOrEqual(100); + }); + + it('emits child correlation_matching span with match data', async () => { + await processAlert(makeAlert()); + const spans = spanExporter.getFinishedSpans();
+ const corrSpan = spans.find(s => s.name === 'correlation_matching'); + expect(corrSpan).toBeDefined(); + expect(corrSpan.attributes).toHaveProperty('alert.correlation_matches'); + }); + + it('emits suppression_decision span with reason', async () => { + await processAlert(makeAlert()); + const spans = spanExporter.getFinishedSpans(); + const suppSpan = spans.find(s => s.name === 'suppression_decision'); + expect(suppSpan.attributes).toHaveProperty('alert.suppressed'); + expect(suppSpan.attributes).toHaveProperty('alert.suppression_reason'); + }); + }); + + describe('PII Protection', () => { + it('never includes raw alert payload in span attributes', async () => { + await processAlert(makeAlert({ title: 'User john@example.com failed login' })); + const spans = spanExporter.getFinishedSpans(); + for (const span of spans) { + const attrs = JSON.stringify(span.attributes); + expect(attrs).not.toContain('john@example.com'); + } + }); + + it('uses hashed alert source identifier, not raw', async () => { + await processAlert(makeAlert({ source: 'prod-payment-api' })); + const spans = spanExporter.getFinishedSpans(); + const evalSpan = spans.find(s => s.name === 'alert_evaluation'); + expect(evalSpan.attributes['alert.source']).toMatch(/^[a-f0-9]+$/); + }); + }); +}); +``` + +### 8.5 Configurable Autonomy — Governance Policy Tests + +```typescript +describe('Configurable Autonomy', () => { + describe('Governance Mode Enforcement', () => { + it('strict mode: annotates but never suppresses', async () => { + setPolicy({ governance_mode: 'strict' }); + const result = await processNoisyAlert(makeAlert({ noiseScore: 95 })); + expect(result.suppressed).toBe(false); + expect(result.annotation).toContain('noise_score: 95'); + }); + + it('audit mode: auto-suppresses with logging', async () => { + setPolicy({ governance_mode: 'audit' }); + const result = await processNoisyAlert(makeAlert({ noiseScore: 95 })); + expect(result.suppressed).toBe(true); + 
expect(result.log).toContain('suppressed by audit mode'); + }); + }); + + describe('Panic Mode', () => { + it('activates in <1 second via API call', async () => { + const start = Date.now(); + await fetch('/admin/panic', { method: 'POST' }); + const panicActive = await redisClient.get('dd0c:panic'); + expect(Date.now() - start).toBeLessThan(1000); + expect(panicActive).toBe('true'); + }); + + it('stops all suppression when active', async () => { + await activatePanic(); + const results = await Promise.all( + Array.from({ length: 10 }, () => processNoisyAlert(makeAlert({ noiseScore: 99 }))) + ); + expect(results.every(r => r.suppressed === false)).toBe(true); + }); + }); + + describe('Per-Customer Override', () => { + it('customer strict overrides system audit', async () => { + setPolicy({ governance_mode: 'audit' }); + setCustomerPolicy('tenant-123', { governance_mode: 'strict' }); + const result = await processNoisyAlert(makeAlert({ tenant: 'tenant-123', noiseScore: 95 })); + expect(result.suppressed).toBe(false); + }); + + it('customer cannot downgrade from system strict to audit', async () => { + setPolicy({ governance_mode: 'strict' }); + setCustomerPolicy('tenant-123', { governance_mode: 'audit' }); + const result = await processNoisyAlert(makeAlert({ tenant: 'tenant-123', noiseScore: 95 })); + expect(result.suppressed).toBe(false); // System strict wins + }); + }); +}); +``` + +--- + +## Section 9: Test Data & Fixtures + +### 9.1 Directory Structure + +``` +tests/ + fixtures/ + webhooks/ + datadog/ + single-alert.json + batched-alerts.json + monitor-recovered.json + high-priority.json + pagerduty/ + incident-triggered.json + incident-resolved.json + incident-acknowledged.json + opsgenie/ + alert-created.json + alert-closed.json + grafana/ + single-firing.json + multi-firing.json + resolved.json + deploys/ + github-actions-success.json + github-actions-failure.json + gitlab-ci-pipeline.json + argocd-sync.json + scenarios/ + alert-storm-50-alerts.json + 
cascading-failure-3-services.json + flapping-alert-10-cycles.json + maintenance-window-suppression.json + deploy-correlated-incident.json + slack/ + initial-alert-blocks.json + correlated-incident-blocks.json + weekly-digest-blocks.json + schemas/ + canonical-alert.json + incident-record.json + tenant-config.json +``` + +### 9.2 Alert Payload Factory + +```typescript +// tests/helpers/factories.ts +export function makeCanonicalAlert(overrides: Partial<CanonicalAlert> = {}): CanonicalAlert { + return { + alert_id: ulid(), + tenant_id: overrides.tenant_id ?? 'test-tenant', + provider: overrides.provider ?? 'datadog', + service: overrides.service ?? 'test-service', + title: overrides.title ?? `Alert: ${faker.hacker.phrase()}`, + severity: overrides.severity ?? 'warning', + fingerprint: overrides.fingerprint ?? crypto.randomBytes(32).toString('hex'), + timestamp: overrides.timestamp ?? new Date().toISOString(), + raw_payload_s3_key: overrides.raw_payload_s3_key ?? `raw/${ulid()}.json`, + metadata: overrides.metadata ?? {}, + ...overrides, + }; +} + +export function makeIncident(overrides: Partial<Incident> = {}): Incident { + const alertCount = overrides.alert_count ?? 5; + return { + incident_id: ulid(), + tenant_id: overrides.tenant_id ?? 'test-tenant', + services: overrides.services ?? ['test-service'], + alert_count: alertCount, + alerts: Array.from({ length: alertCount }, () => makeCanonicalAlert()), + noise_score: overrides.noise_score ?? 0, + deploy_correlation: overrides.deploy_correlation ?? null, + window_opened_at: overrides.window_opened_at ?? new Date().toISOString(), + window_closed_at: overrides.window_closed_at ?? new Date().toISOString(), + ...overrides, + }; +} + +export function makeDeployEvent(overrides: Partial<DeployEvent> = {}): DeployEvent { + return { + deploy_id: ulid(), + tenant_id: overrides.tenant_id ?? 'test-tenant', + service: overrides.service ?? 'test-service', + commit_sha: overrides.commit_sha ?? faker.git.commitSha(), + pr_title: overrides.pr_title ??
faker.git.commitMessage(), + deployed_at: overrides.deployed_at ?? new Date().toISOString(), + provider: overrides.provider ?? 'github-actions', + ...overrides, + }; +} +``` + +### 9.3 Noise Scenario Fixtures + +```typescript +// tests/helpers/scenarios.ts +export const NOISE_SCENARIOS = { + alertStorm: { + description: '50 alerts for same service in 2 minutes', + alerts: Array.from({ length: 50 }, (_, i) => makeCanonicalAlert({ + service: 'payment-api', + title: `High latency variant ${i}`, + timestamp: new Date(Date.now() + i * 2400).toISOString(), + })), + expectedIncidents: 1, + expectedNoiseScore: { min: 70, max: 95 }, + }, + + flappingAlert: { + description: 'Alert fires and resolves 10 times in 1 hour', + alerts: Array.from({ length: 20 }, (_, i) => makeCanonicalAlert({ + service: 'health-check', + title: 'Health check failed', + severity: i % 2 === 0 ? 'warning' : 'info', // alternating fire/resolve + timestamp: new Date(Date.now() + i * 3 * 60 * 1000).toISOString(), + })), + expectedNoiseScore: { min: 80, max: 100 }, + }, + + cascadingFailure: { + description: 'Database fails, then API, then frontend', + alerts: [ + makeCanonicalAlert({ service: 'database', severity: 'critical', timestamp: t(0) }), + makeCanonicalAlert({ service: 'api', severity: 'high', timestamp: t(30) }), + makeCanonicalAlert({ service: 'api', severity: 'high', timestamp: t(45) }), + makeCanonicalAlert({ service: 'frontend', severity: 'medium', timestamp: t(60) }), + makeCanonicalAlert({ service: 'frontend', severity: 'medium', timestamp: t(90) }), + ], + serviceDependencies: [['api', 'database'], ['frontend', 'api']], + expectedIncidents: 1, // All merged via dependency graph + expectedNoiseScore: { min: 0, max: 30 }, // Real incident, not noise + }, + + deployCorrelated: { + description: 'Deploy followed by alert storm', + deploy: makeDeployEvent({ service: 'payment-api', pr_title: 'feat: add retry logic' }), + alerts: Array.from({ length: 8 }, () => makeCanonicalAlert({ + service: 
'payment-api', + severity: 'high', + })), + deployToAlertGapMs: 2 * 60 * 1000, // 2 minutes after deploy + expectedNoiseScore: { min: 50, max: 85 }, // Deploy correlation boosts noise score + }, +}; +``` + +--- + +## Section 10: TDD Implementation Order + +### 10.1 Bootstrap Sequence + +The test infrastructure itself must be built before any product code. This is the order: + +``` +Phase 0: Test Infrastructure (Week 0) + ├── 0.1 vitest config + TypeScript setup + ├── 0.2 Testcontainers helper (Redis, DynamoDB Local, TimescaleDB) + ├── 0.3 LocalStack helper (SQS, S3, API Gateway) + ├── 0.4 Fixture loader utility + ├── 0.5 Factory functions (makeCanonicalAlert, makeIncident, makeDeployEvent) + ├── 0.6 WireMock Slack stub + └── 0.7 CI pipeline with test stages +``` + +### 10.2 Epic-by-Epic TDD Order + +``` +Phase 1: Webhook Ingestion (Epic 1) — Tests First + ├── 1.1 RED: HMAC validator tests (all providers) + ├── 1.2 GREEN: Implement HMAC validation + ├── 1.3 RED: Datadog parser tests (single + batch) + ├── 1.4 GREEN: Implement Datadog parser + ├── 1.5 RED: PagerDuty parser tests + ├── 1.6 GREEN: Implement PagerDuty parser + ├── 1.7 RED: Fingerprint generator tests + ├── 1.8 GREEN: Implement fingerprinting + ├── 1.9 INTEGRATION: Lambda → SQS contract test + └── 1.10 REFACTOR: Extract provider parser interface + +Phase 2: Correlation Engine (Epic 2) — Tests First + ├── 2.1 RED: Time-window open/close/extend tests + ├── 2.2 GREEN: Implement window manager + ├── 2.3 RED: Service graph correlation tests + ├── 2.4 GREEN: Implement dependency traversal + ├── 2.5 RED: Deploy correlation tests + ├── 2.6 GREEN: Implement deploy tracker + ├── 2.7 INTEGRATION: Correlation → Redis window tests + ├── 2.8 INTEGRATION: Correlation → DynamoDB incident persistence + └── 2.9 INTEGRATION: Correlation → TimescaleDB trend writes + +Phase 3: Noise Analysis (Epic 3) — Tests First + ├── 3.1 RED: Rule-based noise scoring tests (all rules) + ├── 3.2 GREEN: Implement scorer + ├── 3.3 RED: 
Threshold classification tests + ├── 3.4 GREEN: Implement classifier + ├── 3.5 RED: "What would have happened" calculation tests + ├── 3.6 GREEN: Implement historical analysis + └── 3.7 REFACTOR: Extract scoring rules into configurable pipeline + +Phase 4: Notifications (Epic 4) — Integration Tests Lead + ├── 4.1 Implement Slack block formatter + ├── 4.2 RED: Snapshot tests for all message formats + ├── 4.3 INTEGRATION: Notification → Slack (WireMock) + ├── 4.4 RED: Rate limiting tests + └── 4.5 GREEN: Implement rate limiter + +Phase 5: Governance (Epic 10) — Tests First + ├── 5.1 RED: Strict/audit mode enforcement tests + ├── 5.2 GREEN: Implement policy engine + ├── 5.3 RED: Panic mode tests (<1s activation) + ├── 5.4 GREEN: Implement panic mode + ├── 5.5 RED: Circuit breaker + DLQ replay tests + ├── 5.6 GREEN: Implement circuit breaker + ├── 5.7 RED: OTEL span assertion tests + └── 5.8 GREEN: Instrument all components + +Phase 6: E2E Validation + ├── 6.1 60-second TTV journey + ├── 6.2 Alert storm correlation journey + ├── 6.3 Deploy correlation journey + ├── 6.4 Panic mode journey + └── 6.5 Performance benchmarks +``` + +### 10.3 "Never Ship Without" Checklist + +Before any release, these tests must pass: + +- [ ] All HMAC validation tests (security gate) +- [ ] All correlation window tests (correctness gate) +- [ ] All noise scoring tests (safety gate — never eat real alerts) +- [ ] All governance policy tests (compliance gate) +- [ ] Circuit breaker DLQ replay test (safety net gate) +- [ ] 60-second TTV E2E journey (product promise gate) +- [ ] PII protection span tests (privacy gate) +- [ ] Schema migration lint (no breaking changes) +- [ ] Coverage ≥80% overall, ≥90% on scoring engine + +--- + +*End of dd0c/alert Test Architecture* diff --git a/products/04-lightweight-idp/test-architecture/test-architecture.md b/products/04-lightweight-idp/test-architecture/test-architecture.md index c7e0417..da06895 100644 --- 
a/products/04-lightweight-idp/test-architecture/test-architecture.md +++ b/products/04-lightweight-idp/test-architecture/test-architecture.md @@ -1,623 +1,1109 @@ # dd0c/portal — Test Architecture & TDD Strategy -**Product:** Lightweight Internal Developer Portal -**Phase:** 6 — Architecture Design -**Date:** 2026-02-28 -**Status:** Draft + +**Product:** dd0c/portal — Lightweight Internal Developer Platform +**Author:** Test Architecture Phase +**Date:** February 28, 2026 +**Status:** V1 MVP — Solo Founder Scope --- -## 1. Testing Philosophy & TDD Workflow +## Section 1: Testing Philosophy & TDD Workflow -### Core Principle +### 1.1 Core Philosophy -dd0c/portal's most critical logic — ownership inference, discovery reconciliation, and confidence scoring — is pure algorithmic code with well-defined inputs and outputs. This is ideal TDD territory. The test suite is the specification. +dd0c/portal is a **trust-critical catalog tool** — if auto-discovery assigns a service to the wrong team, or misses a service entirely, the platform loses credibility instantly. The >80% auto-discovery accuracy target from the party mode review is a hard gate, not a suggestion. -The product's >80% discovery accuracy target is not a QA metric — it's a product promise. Tests enforce it continuously. +Guiding principle: **tests validate what the platform engineer sees in the catalog**. Every test should map to a visible outcome — a service appearing, an ownership assignment, a scorecard grade. 
-### Red-Green-Refactor Adapted to This Product +### 1.2 Red-Green-Refactor Adapted to dd0c/portal ``` -RED → Write a failing test that encodes a discovery heuristic or ownership rule -GREEN → Write the minimum code to pass it (no clever abstractions yet) -REFACTOR → Clean up once the rule is proven correct against real-world fixtures +RED → Write a failing test that describes the desired catalog state + (e.g., "after scanning an AWS account with 3 ECS services, + the catalog should contain 3 services with correct names") + +GREEN → Write the minimum code to make it pass + +REFACTOR → Extract the discovery logic, add confidence scoring, + optimize the scan parallelism ``` -**Adapted cycle for discovery heuristics:** +**When to write tests first (strict TDD):** +- All ownership inference logic (CODEOWNERS parsing, git blame weighting, signal merging) +- All service reconciliation (AWS + GitHub cross-referencing) +- All confidence scoring calculations +- All governance policy enforcement (strict suggest-only vs. audit auto-mutate) +- All phantom service quarantine logic -1. Capture a real-world failure case (e.g., "Lambda functions named `payment-*` were not grouped into a service") -2. Write a unit test encoding the expected grouping behavior using a fixture of that Lambda response -3. Fix the heuristic -4. Add the fixture to the regression suite permanently +**When integration tests lead:** +- AWS scanner (implement against LocalStack, then lock in contract tests) +- GitHub GraphQL scanner (implement against recorded responses, then contract test) +- Meilisearch indexing (build the index, then test search relevance) -This means every production accuracy bug becomes a permanent test. The test suite grows as a living record of every edge case the discovery engine has encountered. 
+**When E2E tests lead:** +- 5-minute auto-discovery journey — define the expected catalog state, build backward +- Cmd+K search experience — define expected search results, then build the index -### When to Write Tests First vs. Integration Tests Lead +### 1.3 Test Naming Conventions -| Scenario | Approach | Rationale | -|----------|----------|-----------| -| Ownership scoring algorithm | Unit-first TDD | Pure function, deterministic, no I/O | -| Discovery heuristics (CFN → service mapping) | Unit-first TDD | Deterministic logic over fixture data | -| GitHub GraphQL query construction | Unit-first TDD | Query builder logic is pure | -| AWS API pagination handling | Integration-first | Behavior depends on real API shape | -| Meilisearch index sync | Integration-first | Depends on Meilisearch document model | -| DynamoDB schema migrations | Integration-first | Requires real DynamoDB Local behavior | -| WebSocket progress events | E2E-first | Requires full pipeline to be meaningful | -| Stripe webhook handling | Integration-first | Depends on Stripe event payload shape | +```python +# Python unit tests (pytest) — AWS/GitHub scanners +class TestAWSScanner: + def test_discovers_ecs_services_from_cluster_listing(self): ... + def test_groups_resources_by_cloudformation_stack_name(self): ... + def test_assigns_confidence_095_to_cfn_stack_services(self): ... -### Test Naming Conventions +class TestOwnershipInference: + def test_codeowners_signal_weighted_040(self): ... + def test_top_committer_signal_weighted_030(self): ... + def test_returns_ambiguous_when_top_scores_tied_under_050(self): ... 
+``` -All tests follow the pattern: `[unit under test]_[scenario]_[expected outcome]` - -**TypeScript/Node.js (Jest):** ```typescript -describe('OwnershipInferenceEngine', () => { - describe('scoreOwnership', () => { - it('returns_primary_owner_when_codeowners_present_with_high_confidence', () => {}) - it('marks_service_unowned_when_top_score_below_threshold', () => {}) - it('marks_service_ambiguous_when_top_two_scores_within_tolerance', () => {}) - }) -}) -``` +// TypeScript tests (vitest) — API, frontend +describe('CatalogAPI', () => { + it('returns services sorted by confidence score descending', () => {}); + it('filters services by team ownership', () => {}); +}); -**Python (pytest):** -```python -class TestOwnershipScorer: - def test_codeowners_signal_weighted_highest_among_all_signals(self): ... - def test_git_blame_frequency_used_when_codeowners_absent(self): ... - def test_confidence_below_threshold_flags_service_as_unowned(self): ... +describe('OwnershipInference', () => { + it('merges CODEOWNERS + git blame + PR reviewer signals', () => {}); + it('flags service as ambiguous when confidence < 0.50', () => {}); +}); ``` -**File naming:** -- Unit tests: `*.test.ts` / `test_*.py` co-located with source -- Integration tests: `*.integration.test.ts` / `test_*_integration.py` in `tests/integration/` -- E2E tests: `tests/e2e/*.spec.ts` (Playwright) - --- -## 2. 
Test Pyramid +## Section 2: Test Pyramid -### Recommended Ratio: 70 / 20 / 10 +### 2.1 Ratio -``` - ┌─────────────┐ - │ E2E / Smoke│ 10% (~30 tests) - │ (Playwright)│ Critical user journeys only - ├─────────────┤ - │ Integration │ 20% (~80 tests) - │ (real deps) │ Service boundaries, API contracts - ├─────────────┤ - │ Unit │ 70% (~280 tests) - │ (pure logic)│ All heuristics, scoring, parsing - └─────────────┘ -``` +| Level | Target | Count (V1) | Runtime | +|-------|--------|------------|---------| +| Unit | 70% | ~300 tests | <30s | +| Integration | 20% | ~85 tests | <5min | +| E2E/Smoke | 10% | ~15 tests | <10min | -### Unit Test Targets (per component) +### 2.2 Unit Test Targets -| Component | Language | Test Framework | Target Coverage | -|-----------|----------|---------------|----------------| -| AWS Scanner (heuristics) | Python | pytest | 90% | -| GitHub Scanner (parsers) | Node.js | Jest | 90% | -| Reconciliation Engine | Node.js | Jest | 85% | -| Ownership Inference | Python | pytest | 95% | -| Portal API (route handlers) | Node.js | Jest + Supertest | 80% | -| Search proxy + cache logic | Node.js | Jest | 85% | -| Slack Bot command handlers | Node.js | Jest | 80% | -| Feature flag evaluation | Node.js/Python | Jest/pytest | 95% | -| Governance policy engine | Node.js | Jest | 95% | -| Schema migration validators | Node.js | Jest | 100% | +| Component | Key Behaviors | Est. 
Tests | +|-----------|--------------|------------| +| AWS Scanner (CloudFormation, ECS, Lambda, RDS) | Resource enumeration, tag extraction, service grouping | 50 | +| GitHub Scanner (repos, CODEOWNERS, workflows) | GraphQL parsing, CODEOWNERS parsing, CI/CD target extraction | 40 | +| Reconciliation Engine | AWS↔GitHub cross-reference, confidence scoring, dedup | 35 | +| Ownership Inference | Signal weighting, ambiguity detection, team resolution | 40 | +| Catalog API | CRUD, search, filtering, pagination | 30 | +| Governance Policy | Strict/audit modes, panic mode, per-team overrides | 25 | +| Feature Flags | Phantom quarantine circuit breaker, flag lifecycle | 15 | +| Scorecard Engine (V1 basic) | Criteria evaluation, grade calculation | 20 | +| Template Engine | Service template generation from catalog data | 15 | +| Slack Bot | Command parsing, response formatting | 30 | -### Integration Test Boundaries +### 2.3 Integration Test Boundaries -| Boundary | What to Test | Tool | -|----------|-------------|------| -| Discovery → GitHub API | GraphQL query shape, pagination, rate limit handling | MSW (mock service worker) or nock | -| Discovery → AWS APIs | boto3 call sequences, pagination, error handling | moto (AWS mock library) | -| Reconciler → PostgreSQL | Upsert logic, conflict resolution, RLS enforcement | Testcontainers (PostgreSQL) | -| Inference → PostgreSQL | Ownership write, confidence update, correction propagation | Testcontainers (PostgreSQL) | -| API → Meilisearch | Index sync, search query construction, tenant filter injection | Meilisearch test instance (Docker) | -| API → Redis | Cache set/get/invalidation, TTL behavior | ioredis-mock or Testcontainers (Redis) | -| Slack Bot → Portal API | Command → search → format response | Supertest against local API | -| Stripe webhook → API | Subscription activation, plan change, cancellation | Stripe CLI webhook forwarding | +| Boundary | What's Tested | Infrastructure | 
+|----------|--------------|----------------| +| AWS Scanner → AWS APIs | STS assume role, CloudFormation, ECS, Lambda, RDS listing | LocalStack | +| GitHub Scanner → GitHub API | GraphQL queries, rate limiting, pagination | WireMock (recorded responses) | +| Reconciler → PostgreSQL | Service upsert, ownership writes, conflict resolution | Testcontainers PostgreSQL | +| API → PostgreSQL | Catalog queries, tenant isolation, search | Testcontainers PostgreSQL | +| API → Meilisearch | Index sync, full-text search, faceted filtering | Testcontainers Meilisearch | +| API → Redis | Session management, cache invalidation, rate limiting | Testcontainers Redis | +| Slack Bot → Slack API | Command handling, block formatting | WireMock | +| Step Functions → Lambdas | Discovery orchestration flow | LocalStack | -### E2E / Smoke Test Scenarios +### 2.4 E2E/Smoke Scenarios -1. Full onboarding: GitHub OAuth → AWS connection → discovery trigger → catalog populated -2. Cmd+K search returns results in <200ms after discovery -3. Ownership correction propagates to similar services -4. Slack `/dd0c who owns` returns correct owner -5. Discovery accuracy: synthetic org with known ground truth scores >80% -6. Governance strict mode: discovery populates pending queue, not catalog directly -7. Panic mode: all catalog writes return 503 +1. **5-Minute Miracle**: Connect AWS + GitHub → auto-discover services → catalog populated with >80% accuracy +2. **Cmd+K Search**: Type service name → results appear in <200ms with correct ranking +3. **Ownership Assignment**: Discover services → infer ownership → correct team assigned +4. **Phantom Quarantine**: Bad discovery rule → phantom services quarantined, not added to catalog +5. **Panic Mode**: Enable panic → all discovery halts → catalog frozen read-only --- -## 3. 
Unit Test Strategy (Per Component) +## Section 3: Unit Test Strategy -### 3.1 AWS Scanner (Python / pytest) - -**What to test:** -- Resource-to-service grouping heuristics (the core logic) -- Confidence score assignment per signal type -- Pagination handling for each AWS API -- Cross-region scan aggregation -- Error handling for throttling, missing permissions, empty accounts - -**Key test cases:** +### 3.1 AWS Scanner ```python -# tests/unit/test_cfn_scanner.py +# tests/unit/scanners/test_aws_scanner.py class TestCloudFormationScanner: - def test_stack_name_becomes_service_name_with_high_confidence(self): - # Given a CFN stack named "payment-api" - # Expect service entity with name="payment-api", confidence=0.95 + def test_lists_all_stacks_with_pagination(self): ... + def test_extracts_service_name_from_stack_name(self): ... + def test_maps_stack_resources_to_service_components(self): ... + def test_assigns_confidence_095_to_cfn_discovered_services(self): ... + def test_handles_deleted_stacks_gracefully(self): ... + def test_extracts_service_team_project_tags(self): ... - def test_stack_tags_extracted_as_service_metadata(self): - # Given stack with tags {"service": "payment", "team": "payments"} - # Expect service.metadata includes both tags - - def test_stacks_in_multiple_regions_deduplicated_by_name(self): - # Given same stack name in us-east-1 and us-west-2 - # Expect single service entity with both regions in infrastructure - - def test_deleted_stacks_excluded_from_results(self): - # Given stack with status DELETE_COMPLETE - # Expect it is not included in discovered services - - def test_pagination_fetches_all_stacks_beyond_first_page(self): - # Given mock returning 2 pages of stacks - # Expect all stacks from both pages are processed +class TestECSScanner: + def test_lists_all_clusters_and_services(self): ... + def test_extracts_container_image_from_task_definition(self): ... + def test_maps_ecs_service_to_cfn_stack_when_tagged(self): ... 
+ def test_standalone_ecs_service_without_cfn_gets_confidence_070(self): ... + def test_handles_empty_cluster_without_error(self): ... class TestLambdaScanner: - def test_lambdas_with_shared_prefix_grouped_into_single_service(self): - # Given ["payment-webhook", "payment-processor", "payment-refund"] - # Expect single service "payment" with confidence=0.60 + def test_lists_all_functions_with_pagination(self): ... + def test_extracts_api_gateway_event_source_mapping(self): ... + def test_links_lambda_to_api_gateway_route(self): ... + def test_standalone_lambda_without_trigger_still_discovered(self): ... - def test_lambda_with_apigw_trigger_gets_higher_confidence(self): - # Given Lambda with API Gateway event source mapping - # Expect confidence=0.85 (not 0.60) +class TestRDSScanner: + def test_lists_rds_instances_with_tags(self): ... + def test_maps_database_to_service_by_naming_prefix(self): ... + def test_maps_database_to_service_by_cfn_stack_membership(self): ... + def test_marks_rds_as_infrastructure_not_service(self): ... - def test_standalone_lambda_without_prefix_pattern_kept_as_individual(self): - # Given Lambda named "data-export-job" with no siblings - # Expect individual service entity, not grouped - -class TestServiceGroupingHeuristics: - def test_cfn_stack_takes_priority_over_ecs_service_for_same_name(self): - # Given CFN stack "payment-api" AND ECS service "payment-api" - # Expect single service entity (not duplicate), source=cloudformation - - def test_explicit_github_repo_tag_overrides_name_matching(self): - # Given AWS resource with tag github_repo="acme/payments-v2" - # Expect repo_link="acme/payments-v2" with confidence=0.95 - # (not fuzzy name match result) +class TestSTSRoleAssumption: + def test_assumes_cross_account_role_with_external_id(self): ... + def test_raises_clear_error_on_role_not_found(self): ... + def test_raises_clear_error_on_invalid_external_id(self): ... + def test_caches_credentials_until_expiry(self): ... 
``` -**Mocking strategy:** -- Use `moto` to mock all boto3 calls — no real AWS calls in unit tests -- Fixture files in `tests/fixtures/aws/` contain realistic API response payloads -- Each fixture named after the scenario: `cfn_stacks_multi_region.json`, `lambda_functions_with_apigw.json` +**Mocking strategy:** `moto` library for AWS API mocking in unit tests. LocalStack for integration tests. + +### 3.2 GitHub Scanner ```python -@pytest.fixture -def mock_aws(aws_credentials): - with mock_cloudformation(), mock_ecs(), mock_lambda_(): - yield +# tests/unit/scanners/test_github_scanner.py -def test_full_scan_produces_expected_service_count(mock_aws, cfn_fixture): - setup_mock_cfn_stacks(cfn_fixture) - result = AWSScanner(tenant_id="test", role_arn="arn:aws:iam::123:role/test").scan() - assert len(result.services) == cfn_fixture["expected_service_count"] +class TestRepoScanner: + def test_lists_active_non_archived_non_forked_repos(self): ... + def test_extracts_primary_language(self): ... + def test_extracts_top_5_committers(self): ... + def test_batches_graphql_queries_at_100_repos_per_call(self): ... + def test_handles_rate_limit_with_retry_after(self): ... + def test_paginates_through_large_orgs(self): ... + +class TestCodeownersParser: + def test_parses_team_ownership_from_codeowners(self): ... + def test_handles_wildcard_pattern_matching(self): ... + def test_handles_multiple_owners_per_path(self): ... + def test_returns_empty_when_codeowners_missing(self): ... + def test_handles_comment_lines_and_blank_lines(self): ... + def test_resolves_github_team_to_display_name(self): ... + +class TestWorkflowParser: + def test_extracts_ecs_deploy_action_target(self): ... + def test_extracts_lambda_deploy_action_target(self): ... + def test_links_repo_to_aws_service_by_task_definition_name(self): ... + def test_handles_matrix_strategy_with_multiple_targets(self): ... + def test_ignores_non_deploy_workflows(self): ... 
+ +class TestReadmeExtractor: + def test_extracts_first_descriptive_paragraph(self): ... + def test_skips_badges_and_header_images(self): ... + def test_returns_empty_for_missing_readme(self): ... + def test_truncates_at_500_characters(self): ... ``` ---- +**Mocking strategy:** Recorded GraphQL responses in `fixtures/github/`. Use `responses` library for HTTP mocking. -### 3.2 GitHub Scanner (Node.js / Jest) - -**What to test:** -- GraphQL query construction and batching -- CODEOWNERS file parsing (all valid formats) -- README first-paragraph extraction -- Deploy workflow target extraction -- Rate limit detection and backoff - -**Key test cases:** - -```typescript -// tests/unit/github-scanner/codeowners-parser.test.ts - -describe('CODEOWNERSParser', () => { - it('parses_simple_wildcard_ownership_to_team', () => { - const input = '* @acme/platform-team' - expect(parse(input)).toEqual([{ pattern: '*', owners: ['@acme/platform-team'] }]) - }) - - it('parses_path_specific_ownership', () => { - const input = '/src/payments/ @acme/payments-team' - expect(parse(input)).toEqual([{ pattern: '/src/payments/', owners: ['@acme/payments-team'] }]) - }) - - it('handles_multiple_owners_per_pattern', () => { - const input = '*.ts @acme/frontend @acme/platform' - expect(parse(input).owners).toHaveLength(2) - }) - - it('ignores_comment_lines', () => { - const input = '# This is a comment\n* @acme/team' - expect(parse(input)).toHaveLength(1) - }) - - it('returns_empty_array_for_missing_codeowners_file', () => { - expect(parse(null)).toEqual([]) - }) - - it('handles_individual_user_ownership_not_just_teams', () => { - const input = '* @sarah-chen' - expect(parse(input)[0].owners[0]).toBe('@sarah-chen') - }) -}) - -describe('READMEExtractor', () => { - it('extracts_first_non_heading_non_badge_paragraph', () => { - const readme = `# Payment Gateway\n\n![build](badge.svg)\n\nHandles Stripe checkout flows.` - expect(extractDescription(readme)).toBe('Handles Stripe checkout flows.') - }) 
- - it('returns_null_when_readme_has_only_headings_and_badges', () => { - const readme = `# Title\n\n![badge](url)` - expect(extractDescription(readme)).toBeNull() - }) -}) - -describe('WorkflowTargetExtractor', () => { - it('extracts_ecs_service_name_from_deploy_workflow', () => { - const yaml = loadFixture('deploy-workflow-ecs.yml') - expect(extractDeployTarget(yaml)).toEqual({ - type: 'ecs_service', - name: 'payment-api', - cluster: 'production' - }) - }) - - it('extracts_lambda_function_name_from_serverless_deploy', () => { - const yaml = loadFixture('deploy-workflow-lambda.yml') - expect(extractDeployTarget(yaml)).toEqual({ - type: 'lambda_function', - name: 'payment-webhook-handler' - }) - }) -}) -``` - -**Mocking strategy:** -- Use `nock` or `msw` to intercept GitHub GraphQL API calls -- Fixture files in `tests/fixtures/github/` for realistic API responses -- Test the GraphQL query builder separately from the HTTP client - ---- - -### 3.3 Reconciliation Engine (Node.js / Jest) - -**What to test:** -- Cross-referencing AWS resources with GitHub repos (all 5 matching rules) -- Deduplication when multiple signals point to the same service -- Conflict resolution when signals disagree -- Batch processing of SQS messages - -**Key test cases:** - -```typescript -describe('ReconciliationEngine', () => { - describe('matchAWSToGitHub', () => { - it('explicit_tag_match_takes_highest_priority', () => { - const awsService = buildAWSService({ tags: { github_repo: 'acme/payment-gateway' } }) - const ghRepo = buildGHRepo({ name: 'payment-gateway', org: 'acme' }) - const result = reconcile([awsService], [ghRepo]) - expect(result[0].repoLinkSource).toBe('explicit_tag') - expect(result[0].repoLinkConfidence).toBe(0.95) - }) - - it('deploy_workflow_match_used_when_no_explicit_tag', () => { - const awsService = buildAWSService({ name: 'payment-api' }) - const ghRepo = buildGHRepo({ deployTarget: 'payment-api' }) - const result = reconcile([awsService], [ghRepo]) - 
expect(result[0].repoLinkSource).toBe('deploy_workflow') - }) - - it('fuzzy_name_match_used_as_fallback', () => { - const awsService = buildAWSService({ name: 'payment-service' }) - const ghRepo = buildGHRepo({ name: 'payment-svc' }) - const result = reconcile([awsService], [ghRepo]) - expect(result[0].repoLinkSource).toBe('name_match') - expect(result[0].repoLinkConfidence).toBe(0.75) - }) - - it('no_match_produces_aws_only_service_entity', () => { - const awsService = buildAWSService({ name: 'legacy-monolith' }) - const result = reconcile([awsService], []) - expect(result[0].repoUrl).toBeNull() - expect(result[0].discoverySources).toContain('cloudformation') - expect(result[0].discoverySources).not.toContain('github_repo') - }) - - it('deduplicates_cfn_stack_and_ecs_service_with_same_name', () => { - const cfnService = buildAWSService({ source: 'cloudformation', name: 'payment-api' }) - const ecsService = buildAWSService({ source: 'ecs_service', name: 'payment-api' }) - const result = reconcile([cfnService, ecsService], []) - expect(result).toHaveLength(1) - expect(result[0].discoverySources).toContain('cloudformation') - expect(result[0].discoverySources).toContain('ecs_service') - }) - }) -}) -``` - ---- - -### 3.4 Ownership Inference Engine (Python / pytest) - -This is the highest-value unit test target. Ownership inference is the most complex logic and the most likely source of accuracy failures. 
- -**Key test cases:** +### 3.3 Reconciliation Engine ```python -class TestOwnershipScorer: - def test_codeowners_weighted_highest_at_0_40(self): - signals = [Signal(type='codeowners', team='payments', raw_score=1.0)] - result = score_ownership(signals) - assert result['payments'].weighted_score == pytest.approx(0.40) +# tests/unit/test_reconciler.py - def test_multiple_signals_summed_correctly(self): - signals = [ - Signal(type='codeowners', team='payments', raw_score=1.0), # 0.40 - Signal(type='cfn_tag', team='payments', raw_score=1.0), # 0.20 - Signal(type='git_blame_frequency', team='payments', raw_score=1.0), # 0.25 - ] - result = score_ownership(signals) - assert result['payments'].total_score == pytest.approx(0.85) - - def test_primary_owner_is_highest_scoring_team(self): - signals = [ - Signal(type='codeowners', team='payments', raw_score=1.0), - Signal(type='git_blame_frequency', team='platform', raw_score=1.0), - ] - result = score_ownership(signals) - assert result.primary_owner == 'payments' - - def test_service_marked_unowned_when_top_score_below_0_50(self): - signals = [Signal(type='git_blame_frequency', team='unknown', raw_score=0.3)] - result = score_ownership(signals) - assert result.status == 'unowned' - - def test_service_marked_ambiguous_when_top_two_within_0_10(self): - signals = [ - Signal(type='codeowners', team='payments', raw_score=0.8), - Signal(type='codeowners', team='platform', raw_score=0.75), - ] - result = score_ownership(signals) - assert result.status == 'ambiguous' - - def test_user_correction_overrides_all_inference_with_score_1_00(self): - signals = [ - Signal(type='codeowners', team='payments', raw_score=1.0), - Signal(type='user_correction', team='platform', raw_score=1.0), - ] - result = score_ownership(signals) - assert result.primary_owner == 'platform' - assert result.primary_confidence == 1.00 - assert result.primary_source == 'user_correction' - - def test_correction_propagation_applies_to_matching_repo_prefix(self): - 
correction = Correction(repo='payment-gateway', team='payments') - candidates = ['payment-processor', 'payment-webhook', 'auth-service'] - propagated = propagate_correction(correction, candidates) - assert 'payment-processor' in propagated - assert 'payment-webhook' in propagated - assert 'auth-service' not in propagated +class TestReconciler: + def test_matches_github_repo_to_aws_service_by_deploy_target(self): ... + def test_matches_github_repo_to_aws_service_by_naming_convention(self): ... + def test_merges_aws_and_github_metadata_into_single_service(self): ... + def test_deduplicates_services_discovered_from_multiple_sources(self): ... + def test_assigns_higher_confidence_when_both_sources_agree(self): ... + def test_creates_separate_services_when_no_cross_reference_found(self): ... + def test_preserves_manual_overrides_during_rescan(self): ... + def test_marks_previously_discovered_service_as_stale_when_missing(self): ... ``` ---- +### 3.4 Ownership Inference -### 3.5 Portal API — Route Handlers (Node.js / Jest + Supertest) +The highest-risk logic in the product. Exhaustive testing required. -**What to test:** -- Tenant isolation enforcement (tenant_id injected into every query) -- Search endpoint proxies to Meilisearch with mandatory tenant filter -- PATCH /services enforces correction logging -- Auth middleware rejects unauthenticated requests +```python +# tests/unit/test_ownership_inference.py + +class TestOwnershipInference: + # Signal weighting + def test_codeowners_signal_weighted_040(self): ... + def test_top_committer_signal_weighted_030(self): ... + def test_pr_reviewer_signal_weighted_020(self): ... + def test_aws_tag_signal_weighted_010(self): ... + + # Confidence calculation + def test_single_strong_signal_produces_moderate_confidence(self): ... + def test_multiple_agreeing_signals_produce_high_confidence(self): ... + def test_conflicting_signals_produce_low_confidence(self): ... + def test_returns_ambiguous_when_top_scores_tied(self): ... 
+ def test_returns_ambiguous_when_confidence_under_050(self): ... + def test_flags_unowned_when_no_signals_found(self): ... + + # Edge cases + def test_handles_individual_owner_not_in_any_team(self): ... + def test_handles_deleted_github_team(self): ... + def test_handles_repo_with_single_committer(self): ... + def test_handles_repo_with_no_codeowners_file(self): ... + def test_manual_override_always_wins_regardless_of_signals(self): ... + + # Table-driven: signal combinations + @pytest.mark.parametrize("signals,expected_team,expected_confidence", [ + ({"codeowners": "team-a", "committers": "team-a", "reviewers": "team-a"}, "team-a", 0.90), + ({"codeowners": "team-a", "committers": "team-b", "reviewers": "team-a"}, "team-a", 0.60), + ({"codeowners": None, "committers": "team-b", "reviewers": "team-b"}, "team-b", 0.50), + ({"codeowners": "team-a", "committers": "team-b", "reviewers": "team-c"}, None, None), # ambiguous + ]) + def test_signal_combination_produces_expected_ownership(self, signals, expected_team, expected_confidence): ... 
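# Illustrative sketch only, not the product's implementation: one way the
# weighted scoring exercised by the table-driven cases above could work.
# The weights come from the signal-weighting test names (0.40 / 0.30 / 0.20 /
# 0.10) and the 0.50 floor from test_returns_ambiguous_when_confidence_under_050.
# The names infer_owner, SIGNAL_WEIGHTS, AMBIGUITY_FLOOR, Ownership and the
# "aws_tag" key are hypothetical and do not appear in the original document.
from collections import defaultdict
from typing import Dict, NamedTuple, Optional

# Weights are kept in hundredths so sums stay exact (no float drift).
SIGNAL_WEIGHTS = {"codeowners": 40, "committers": 30, "reviewers": 20, "aws_tag": 10}
AMBIGUITY_FLOOR = 50  # assumption: a top score under 0.50 is reported as ambiguous


class Ownership(NamedTuple):
    team: Optional[str]
    confidence: Optional[float]
    status: str  # "owned", "ambiguous", or "unowned"


def infer_owner(signals: Dict[str, Optional[str]]) -> Ownership:
    """Sum the weight of every signal per team and pick the clear winner, if any."""
    scores: Dict[str, int] = defaultdict(int)
    for name, team in signals.items():
        if team is not None:
            scores[team] += SIGNAL_WEIGHTS.get(name, 0)
    if not scores:
        return Ownership(None, None, "unowned")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_team, top_score = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_score
    if tied or top_score < AMBIGUITY_FLOOR:
        return Ownership(None, None, "ambiguous")
    return Ownership(top_team, top_score / 100, "owned")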
+``` + +### 3.5 Catalog API ```typescript -describe('GET /api/v1/services/search', () => { - it('injects_tenant_id_filter_into_meilisearch_query', async () => { - const spy = jest.spyOn(meilisearchClient, 'search') - await request(app).get('/api/v1/services/search?q=payment').set('Authorization', `Bearer ${tenantAToken}`) - expect(spy).toHaveBeenCalledWith(expect.objectContaining({ - filter: expect.stringContaining(`tenant_id = '${TENANT_A_ID}'`) - })) - }) +// tests/unit/api/catalog.test.ts +describe('CatalogAPI', () => { + describe('Service CRUD', () => { + it('creates service with all required fields', () => {}); + it('returns 404 for non-existent service', () => {}); + it('updates service metadata without overwriting ownership', () => {}); + it('soft-deletes service (marks stale, does not remove)', () => {}); + }); - it('returns_401_when_no_auth_token_provided', async () => { - const res = await request(app).get('/api/v1/services/search?q=payment') - expect(res.status).toBe(401) - }) + describe('Search & Filtering', () => { + it('returns services sorted by confidence descending', () => {}); + it('filters by team ownership', () => {}); + it('filters by language', () => {}); + it('filters by discovery source (aws/github/manual)', () => {}); + it('paginates with cursor-based pagination', () => {}); + }); - it('tenant_a_cannot_see_tenant_b_services', async () => { - // Seed Meilisearch with services for both tenants - // Query as tenant A, assert no tenant B results - }) -}) - -describe('PATCH /api/v1/services/:id', () => { - it('stores_correction_in_corrections_table', async () => { - await request(app) - .patch(`/api/v1/services/${SERVICE_ID}`) - .send({ team_id: NEW_TEAM_ID }) - .set('Authorization', `Bearer ${adminToken}`) - const correction = await db.corrections.findFirst({ where: { service_id: SERVICE_ID } }) - expect(correction).toBeDefined() - expect(correction.new_value).toMatchObject({ team_id: NEW_TEAM_ID }) - }) - - 
it('sets_confidence_to_1_00_on_user_correction', async () => { - await request(app).patch(`/api/v1/services/${SERVICE_ID}`).send({ team_id: NEW_TEAM_ID }) - const ownership = await db.service_ownership.findFirst({ where: { service_id: SERVICE_ID } }) - expect(ownership.confidence).toBe(1.00) - expect(ownership.source).toBe('user_correction') - }) -}) + describe('Tenant Isolation', () => { + it('never returns services from another tenant', () => {}); + it('enforces tenant_id on all queries', () => {}); + }); +}); ``` -### 3.6 Slack Bot Command Handlers (Node.js / Jest) +### 3.6 Governance Policy Engine -**What to test:** -- Command parsing (`/dd0c who owns `) -- Typo tolerance matching logic (delegated to search, but bot needs to handle 0 results) -- Block kit message formatting -- Error handling (unauthorized workspace, missing service) +```typescript +describe('GovernancePolicy', () => { + describe('Mode Enforcement', () => { + it('strict mode: discovery populates pending review queue', () => {}); + it('strict mode: never auto-mutates catalog', () => {}); + it('audit mode: auto-applies discoveries with logging', () => {}); + it('defaults new tenants to strict mode', () => {}); + }); -### 3.7 Feature Flags & Governance Policy (Node.js / Jest) + describe('Panic Mode', () => { + it('halts all discovery scans when panic=true', () => {}); + it('freezes catalog as read-only', () => {}); + it('API returns 503 for write operations during panic', () => {}); + it('shows maintenance banner in API response headers', () => {}); + }); -**What to test:** -- Flag evaluation (`openfeature` provider) -- Governance strict vs. 
audit mode -- Panic mode blocking writes + describe('Per-Team Override', () => { + it('team can lock services to strict even when system is audit', () => {}); + it('team cannot downgrade from system strict to audit', () => {}); + it('merge logic: max_restrictive(system, team)', () => {}); + }); +}); +``` + +### 3.7 Feature Flag Circuit Breaker + +```typescript +describe('PhantomQuarantineBreaker', () => { + it('allows service creation when discovery rate is normal', () => {}); + it('trips breaker when >5 unconfirmed services created in single scan', () => {}); + it('quarantines phantom services instead of deleting them', () => {}); + it('auto-disables the discovery flag when breaker trips', () => {}); + it('quarantined services have status=quarantined, not active', () => {}); + it('quarantined services visible in admin review queue', () => {}); +}); +``` + +### 3.8 Slack Bot + +```typescript +describe('SlackBot', () => { + describe('Command Parsing', () => { + it('parses /portal search command', () => {}); + it('parses /portal service command', () => {}); + it('parses /portal owner command', () => {}); + it('returns help text for unknown commands', () => {}); + }); + + describe('Response Formatting', () => { + it('formats service card with name, team, language, links', () => {}); + it('formats search results as compact list (max 10)', () => {}); + it('formats ownership info with confidence badge', () => {}); + it('includes "View in Portal" button link', () => {}); + }); +}); +``` --- -## 4. Integration Test Strategy +## Section 4: Integration Test Strategy -Integration tests verify that our code correctly interacts with external boundaries: databases, caches, search indices, and third-party APIs. +### 4.1 AWS Scanner → LocalStack -### 4.1 Service Boundary Tests -- **Discovery ↔ GitHub/GitLab:** Use `nock` or `MSW` to mock the GitHub GraphQL endpoint. Assert that the Node.js scanner constructs the correct query and handles rate limits (HTTP 403/429) via retries. 
-- **Catalog ↔ PostgreSQL:** Use Testcontainers for PostgreSQL to verify complex `upsert` queries, foreign key constraints, and RLS (Row-Level Security) tenant isolation. -- **API ↔ Meilisearch:** Use a Meilisearch Docker container. Assert that document syncing (PostgreSQL -> SQS -> Meilisearch) completes and search queries with `tenant_id` filters return the expected subset of data. +```python +# tests/integration/scanners/test_aws_integration.py -### 4.2 Git Provider API Contract Tests -- Write scheduled "contract tests" that run against the *live* GitHub API daily using a dedicated test org. -- These detect if GitHub changes their GraphQL schema or rate limit behavior. -- Assert that `HEAD:CODEOWNERS` blob extraction still works. +class TestAWSIntegration: + @pytest.fixture(autouse=True) + def setup_localstack(self, localstack_endpoint): + """Create test resources in LocalStack.""" + self.cfn = boto3.client('cloudformation', endpoint_url=localstack_endpoint) + self.ecs = boto3.client('ecs', endpoint_url=localstack_endpoint) + # Create test stacks, clusters, services, lambdas + self.cfn.create_stack(StackName='payment-api', TemplateBody=MINIMAL_TEMPLATE) + self.ecs.create_cluster(clusterName='prod') + self.ecs.create_service(cluster='prod', serviceName='payment-api', ...) -### 4.3 Testcontainers for Local Infrastructure -- **Database:** `testcontainers-node` spinning up `postgres:15-alpine`. -- **Search:** `getmeili/meilisearch:latest`. -- **Cache:** `redis:7-alpine`. -- Run these in GitHub Actions via Docker-in-Docker. + def test_full_aws_scan_discovers_all_resource_types(self): ... + def test_scan_groups_resources_by_cfn_stack(self): ... + def test_scan_handles_cross_region_resources(self): ... + def test_scan_respects_api_rate_limits(self): ... + def test_scan_completes_within_60_seconds_for_50_resources(self): ... 
+``` + +### 4.2 GitHub Scanner → WireMock + +```python +# tests/integration/scanners/test_github_integration.py + +class TestGitHubIntegration: + @pytest.fixture(autouse=True) + def setup_wiremock(self, wiremock_url): + """Load recorded GitHub GraphQL responses.""" + # Stub: POST /graphql → recorded response with 10 repos + wiremock.stub_for(post('/graphql').will_return( + json_response(load_fixture('github/org-repos-page1.json')) + )) + + def test_full_github_scan_discovers_repos_with_metadata(self): ... + def test_scan_extracts_codeowners_for_each_repo(self): ... + def test_scan_extracts_deploy_workflows(self): ... + def test_scan_handles_graphql_rate_limit_with_retry(self): ... + def test_scan_paginates_through_100_plus_repos(self): ... +``` + +### 4.3 Reconciler → PostgreSQL + +```python +# tests/integration/test_reconciler_db.py + +class TestReconcilerDB: + @pytest.fixture(autouse=True) + def setup_db(self, pg_container): + """Run migrations against Testcontainers PostgreSQL.""" + run_migrations(pg_container.get_connection_url()) + + def test_upserts_discovered_service_without_duplicates(self): ... + def test_preserves_manual_ownership_override_on_rescan(self): ... + def test_marks_missing_services_as_stale(self): ... + def test_tenant_isolation_enforced_at_db_level(self): ... + def test_concurrent_scans_for_different_tenants_dont_conflict(self): ... 
+``` + +### 4.4 API → Meilisearch + +```typescript +// tests/integration/search/meilisearch.test.ts +describe('Meilisearch Integration', () => { + let meili: StartedTestContainer; + + beforeAll(async () => { + meili = await new GenericContainer('getmeili/meilisearch:v1') + .withExposedPorts(7700) + .start(); + // Index test services + await indexServices(testCatalog); + }); + + it('returns relevant results for service name search', async () => { + const results = await search('payment'); + expect(results[0].name).toContain('payment'); + }); + + it('returns results within 200ms for 1000-service catalog', async () => { + await indexServices(generate1000Services()); + const start = performance.now(); + await search('api'); + expect(performance.now() - start).toBeLessThan(200); + }); + + it('supports faceted filtering by team and language', async () => { + const results = await search('', { filters: { team: 'platform', language: 'TypeScript' } }); + expect(results.every(r => r.team === 'platform')).toBe(true); + }); +}); +``` + +### 4.5 Step Functions → Lambda Orchestration (LocalStack) + +```python +# tests/integration/test_discovery_orchestration.py + +class TestDiscoveryOrchestration: + def test_step_function_executes_aws_then_github_then_reconcile(self): ... + def test_step_function_retries_failed_scanner_once(self): ... + def test_step_function_completes_within_5_minutes(self): ... + def test_step_function_sends_completion_event_to_sqs(self): ... +``` --- -## 5. E2E & Smoke Tests +## Section 5: E2E & Smoke Tests -E2E tests treat the system as a black box, interacting only through the API and the React UI. We keep these fast and focused on the "5-Minute Miracle" critical path. +### 5.1 The 5-Minute Miracle -### 5.1 Critical User Journeys (Playwright) -1. **The Onboarding Flow:** Mock GitHub OAuth login -> Connect AWS (mock CFN role ARN validation) -> Trigger Discovery -> Wait for WebSocket completion -> Verify 147 services appear in catalog. -2. 
**Cmd+K Search:** Open modal (`Cmd+K`) -> type "pay" -> assert "payment-gateway" is highlighted in < 200ms -> press Enter -> assert service detail card opens. -3. **Correcting Ownership:** Open service detail -> Click "Correct Owner" -> select new team -> assert badge changes to 100% confidence -> assert Meilisearch is updated. +```typescript +// tests/e2e/journeys/five-minute-miracle.test.ts +describe('5-Minute Auto-Discovery', () => { + it('discovers >80% of services from AWS + GitHub within 5 minutes', async () => { + // Setup: LocalStack with 20 known services, WireMock GitHub with 15 repos + const knownServices = await setupTestInfrastructure(20); + const knownRepos = await setupTestGitHub(15); -### 5.2 The >80% Auto-Discovery Accuracy Validation -- **The "Party Mode" Org:** Maintain a real GitHub org and a mock AWS environment with exactly 100 known services, 10 known teams, and specific chaotic naming conventions. -- **The Assertion:** Run discovery. Assert that > 80 of the services are correctly inferred with the right primary owner and repo link. -- *This is the most important test in the suite. If a PR drops this below 80%, it cannot be merged.* + // Trigger discovery + const start = Date.now(); + await triggerDiscovery('e2e-tenant'); + await waitForDiscoveryComplete('e2e-tenant', { timeoutMs: 5 * 60 * 1000 }); + const elapsed = Date.now() - start; -### 5.3 Synthetic Topology Generation -- Script to generate `N` mock CFN stacks, `M` ECS services, and `K` GitHub repos to feed the E2E environment without hitting AWS/GitHub limits. 
+ // Validate + expect(elapsed).toBeLessThan(5 * 60 * 1000); + const catalog = await getCatalog('e2e-tenant'); + const matchedServices = catalog.filter(s => + knownServices.some(k => s.name === k.name) + ); + const accuracy = matchedServices.length / knownServices.length; + expect(accuracy).toBeGreaterThan(0.80); + }); +}); +``` + +### 5.2 Cmd+K Search + +```typescript +describe('Cmd+K Search Experience', () => { + it('returns search results within 200ms', async () => { + await populateCatalog(100); + const start = performance.now(); + const results = await searchAPI('payment'); + expect(performance.now() - start).toBeLessThan(200); + expect(results.length).toBeGreaterThan(0); + }); + + it('ranks exact name match above partial match', async () => { + await populateCatalog([ + { name: 'payment-api' }, + { name: 'payment-processor' }, + { name: 'api-gateway' }, + ]); + const results = await searchAPI('payment-api'); + expect(results[0].name).toBe('payment-api'); + }); +}); +``` + +### 5.3 Phantom Quarantine Journey + +```typescript +describe('Phantom Quarantine', () => { + it('quarantines phantom services when discovery rule misfires', async () => { + // Enable a bad discovery flag that creates phantom services + await enableFlag('experimental-tag-scanner'); + + // Trigger discovery — bad rule creates 8 phantom services + await triggerDiscovery('e2e-tenant'); + await waitForDiscoveryComplete('e2e-tenant'); + + // Circuit breaker should have tripped (>5 unconfirmed) + const catalog = await getCatalog('e2e-tenant'); + const quarantined = catalog.filter(s => s.status === 'quarantined'); + expect(quarantined.length).toBeGreaterThanOrEqual(5); + + // Flag should be auto-disabled + const flagState = await getFlagState('experimental-tag-scanner'); + expect(flagState.enabled).toBe(false); + }); +}); +``` + +### 5.4 E2E Infrastructure + +```yaml +# docker-compose.e2e.yml +services: + localstack: + image: localstack/localstack:3 + environment: + SERVICES: 
sts,cloudformation,ecs,lambda,rds,s3,sqs,stepfunctions + ports: ["4566:4566"] + + postgres: + image: postgres:16-alpine + environment: + POSTGRES_PASSWORD: test + ports: ["5432:5432"] + + redis: + image: redis:7-alpine + ports: ["6379:6379"] + + meilisearch: + image: getmeili/meilisearch:v1 + ports: ["7700:7700"] + + wiremock: + image: wiremock/wiremock:3 + ports: ["8080:8080"] + volumes: + - ./fixtures/wiremock:/home/wiremock/mappings + + app: + build: . + environment: + AWS_ENDPOINT: http://localstack:4566 + DATABASE_URL: postgres://postgres:test@postgres:5432/test + REDIS_URL: redis://redis:6379 + MEILI_URL: http://meilisearch:7700 + GITHUB_API_URL: http://wiremock:8080 + SLACK_API_URL: http://wiremock:8080 + depends_on: [localstack, postgres, redis, meilisearch, wiremock] +``` --- -## 6. Performance & Load Testing - -Load tests ensure the serverless architecture scales correctly and the Cmd+K search remains instantaneous. +## Section 6: Performance & Load Testing ### 6.1 Discovery Scan Benchmarks -- **Target:** 500 AWS resources + 500 GitHub repos scanned and reconciled in < 120 seconds. -- **Tooling:** K6 or Artillery. Push 5,000 synthetic SQS messages into the Reconciler queue and measure Lambda batch processing throughput. + +```python +# tests/perf/test_discovery_performance.py + +class TestDiscoveryPerformance: + def test_aws_scan_completes_within_60s_for_50_resources(self): ... + def test_aws_scan_completes_within_3min_for_500_resources(self): ... + def test_github_scan_completes_within_60s_for_100_repos(self): ... + def test_github_scan_completes_within_3min_for_500_repos(self): ... + def test_full_discovery_pipeline_completes_within_5min_for_medium_org(self): + """Medium org: 200 AWS resources + 150 GitHub repos.""" + ... + def test_reconciliation_completes_within_30s_for_200_services(self): ... +``` ### 6.2 Catalog Query Latency -- **Target:** API search endpoint returns in < 100ms at the 99th percentile. 
-- **Test:** Load Meilisearch with 10,000 service documents. Fire 50 concurrent Cmd+K search requests per second. Assert p99 latency. -### 6.3 Concurrent Scorecard Evaluation -- Ensure the Python inference Lambda can evaluate 1,000 services concurrently without database connection exhaustion (using Aurora Serverless v2 connection pooling). +```typescript +describe('Catalog Query Performance', () => { + it('returns service list in <100ms with 1000 services', async () => { + await populateCatalog(1000); + const start = performance.now(); + await getCatalog('perf-tenant', { limit: 50 }); + expect(performance.now() - start).toBeLessThan(100); + }); + + it('Meilisearch returns results in <200ms with 5000 services', async () => { + await indexServices(generate5000Services()); + const start = performance.now(); + await search('payment'); + expect(performance.now() - start).toBeLessThan(200); + }); + + it('concurrent 50 catalog queries complete within 500ms p95', async () => { + await populateCatalog(1000); + const results = await Promise.all( + Array.from({ length: 50 }, () => timedQuery('perf-tenant')) + ); + const p95 = percentile(results.map(r => r.elapsed), 95); + expect(p95).toBeLessThan(500); + }); +}); +``` + +### 6.3 Ownership Inference at Scale + +```python +class TestOwnershipPerformance: + def test_infers_ownership_for_200_services_within_60s(self): ... + def test_memory_stays_under_256mb_during_500_service_inference(self): ... + def test_handles_org_with_50_teams_without_degradation(self): ... +``` --- -## 7. CI/CD Pipeline Integration +## Section 7: CI/CD Pipeline Integration -The test pyramid is enforced through GitHub Actions. +### 7.1 Pipeline Stages -### 7.1 Test Stages -- **Pre-commit:** Husky runs ESLint, Prettier, and fast unit tests (Jest/pytest) for changed files only. -- **PR Gate:** Runs the full Unit and Integration test suites. Blocks merge if coverage drops or tests fail. -- **Merge (Main):** Deploys to Staging. 
Runs E2E Critical User Journeys and the 80% Accuracy Validation suite against the Party Mode org. -- **Post-Deploy:** Smoke tests verify health endpoints and ALB routing in production. +``` +┌─────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ +│ Pre-Commit │───▶│ PR Gate │───▶│ Merge │───▶│ Staging │───▶│ Prod │ +│ (local) │ │ (CI) │ │ (CI) │ │ (CD) │ │ (CD) │ +└─────────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ + lint + type unit tests full suite E2E + perf smoke + canary + <10s <5min <10min <15min 5-min miracle +``` ### 7.2 Coverage Thresholds -- Global Unit Test Coverage: 80% -- Ownership Inference & Reconciliation Logic: 95% -- Feature Flag & Governance Evaluators: 100% + +| Component | Minimum | Target | +|-----------|---------|--------| +| Ownership Inference | 90% | 95% | +| Reconciliation Engine | 85% | 90% | +| AWS Scanner | 80% | 85% | +| GitHub Scanner | 80% | 85% | +| Governance Policy | 90% | 95% | +| Catalog API | 80% | 85% | +| Overall | 80% | 85% | ### 7.3 Test Parallelization -- Jest tests run with `--maxWorkers=50%` locally, `100%` in CI. -- Integration tests using Testcontainers run serially per file to avoid database port conflicts, or use dynamic port binding and separate schemas for parallel execution. 
+ +```yaml +# .github/workflows/test.yml +jobs: + unit-python: + runs-on: ubuntu-latest + strategy: + matrix: + suite: [scanners, reconciler, ownership, governance] + steps: + - run: pytest tests/unit/${{ matrix.suite }} -x --tb=short + + unit-typescript: + runs-on: ubuntu-latest + strategy: + matrix: + shard: [1, 2, 3] + steps: + - run: vitest --shard=${{ matrix.shard }}/3 + + integration: + runs-on: ubuntu-latest + services: + localstack: { image: localstack/localstack:3 } + postgres: { image: postgres:16-alpine } + redis: { image: redis:7-alpine } + meilisearch: { image: getmeili/meilisearch:v1 } + steps: + - run: pytest tests/integration/ -x + - run: vitest --project=integration + + e2e: + needs: [unit-python, unit-typescript, integration] + runs-on: ubuntu-latest + steps: + - run: docker compose -f docker-compose.e2e.yml up -d + - run: vitest --project=e2e +``` --- -## 8. Transparent Factory Tenet Testing +## Section 8: Transparent Factory Tenet Testing -Testing the governance and compliance features of the IDP itself. +### 8.1 Atomic Flagging — Phantom Quarantine Circuit Breaker -### 8.1 Feature Flag Circuit Breakers -- **Test:** Enable a flagged discovery heuristic that generates 10 phantom services. -- **Assert:** The system detects the threshold (>5 unconfirmed), auto-disables the flag, and marks the 10 services as `status: quarantined`. +```typescript +describe('Atomic Flagging', () => { + describe('Flag Lifecycle', () => { + it('new discovery source flag defaults to false', () => {}); + it('flag has owner and ttl metadata (max 14 days)', () => {}); + it('CI blocks when flag at 100% exceeds TTL', () => {}); + }); -### 8.2 Schema Migration Validation -- **Test:** Attempt to apply a PR that drops a column from the `services` table. -- **Assert:** CI migration validator script fails the build (additive-only rule). 
+ describe('Phantom Quarantine Breaker', () => { + it('allows service creation when <5 unconfirmed per scan', () => {}); + it('trips breaker when >5 unconfirmed services in single scan', () => {}); + it('quarantines phantom services (status=quarantined)', () => {}); + it('auto-disables the discovery flag', () => {}); + it('quarantined services appear in admin review queue', () => {}); + it('admin can approve quarantined services into catalog', () => {}); + it('admin can purge quarantined services', () => {}); + }); -### 8.3 Decision Log Enforcement -- **Test:** Run a discovery scan where service ownership is inferred from `git blame`. -- **Assert:** A `decision_log` entry is written to PostgreSQL with the prompt/reasoning, alternatives, and confidence. + describe('Local Evaluation', () => { + it('flag check does not make network calls during scan', () => {}); + it('flag state refreshed from file/env every 60s', () => {}); + }); +}); +``` -### 8.4 OTEL Span Assertions -- **Test:** Run the Reconciler Lambda. -- **Assert:** The `catalog_scan` parent span contains child spans for `ownership_inference` with attributes for `catalog.service_id`, `catalog.ownership_signals`, and `catalog.confidence_score`. Use an in-memory OTEL exporter for testing. +### 8.2 Elastic Schema — Migration Validation -### 8.5 Governance Policy Enforcement -- **Test:** Set tenant policy to `strict` mode. Simulate auto-discovery finding a new service. -- **Assert:** Service is placed in the "pending review" queue and NOT visible in the main catalog. -- **Test:** Set `panic_mode: true`. Attempt a `PATCH /api/v1/services/123`. -- **Assert:** HTTP 503 Service Unavailable. +```python +class TestElasticSchema: + def test_rejects_migration_with_drop_column(self): ... + def test_rejects_migration_with_alter_column_type(self): ... + def test_rejects_migration_with_rename_column(self): ... + def test_accepts_migration_with_add_nullable_column(self): ... 
+ def test_accepts_migration_with_new_table(self): ... + def test_v1_code_ignores_v2_columns_without_error(self): ... + def test_every_migration_has_sunset_date_comment(self): + for f in glob.glob('migrations/*.sql'): + content = open(f).read() + assert re.search(r'-- sunset_date: \d{4}-\d{2}-\d{2}', content) + def test_ci_warns_on_past_sunset_migrations(self): ... +``` + +### 8.3 Cognitive Durability — Decision Log Validation + +```typescript +describe('Cognitive Durability', () => { + it('decision_log.json required for PRs touching ownership inference', () => {}); + it('decision_log.json required for PRs touching reconciliation', () => {}); + + it('decision log has all required fields', () => { + const logs = glob.sync('docs/decisions/*.json'); + for (const log of logs) { + const entry = JSON.parse(fs.readFileSync(log, 'utf-8')); + expect(entry).toHaveProperty('reasoning'); + expect(entry).toHaveProperty('alternatives_considered'); + expect(entry).toHaveProperty('confidence'); + expect(entry).toHaveProperty('timestamp'); + expect(entry).toHaveProperty('author'); + } + }); + + it('ownership signal weight changes include before/after examples', () => { + // Decision logs for ownership changes must include sample scenarios + }); +}); +``` + +### 8.4 Semantic Observability — OTEL Span Assertions + +```typescript +describe('Semantic Observability', () => { + let spanExporter: InMemorySpanExporter; + + describe('Discovery Scan Spans', () => { + it('emits parent catalog_scan span', async () => { + await triggerDiscovery('test-tenant'); + const spans = spanExporter.getFinishedSpans(); + expect(spans.find(s => s.name === 'catalog_scan')).toBeDefined(); + }); + + it('emits child aws_scan and github_scan spans', async () => { + await triggerDiscovery('test-tenant'); + const spans = spanExporter.getFinishedSpans(); + expect(spans.find(s => s.name === 'aws_scan')).toBeDefined(); + expect(spans.find(s => s.name === 'github_scan')).toBeDefined(); + }); + }); + + 
describe('Ownership Inference Spans', () => { + it('emits ownership_inference span with all signals considered', async () => { + await inferOwnership('test-service'); + const span = spanExporter.getFinishedSpans().find(s => s.name === 'ownership_inference'); + expect(span.attributes['catalog.ownership_signals']).toBeDefined(); + expect(span.attributes['catalog.confidence_score']).toBeGreaterThanOrEqual(0); + }); + + it('includes rejected signals in span attributes', async () => { + await inferOwnership('test-service'); + const span = spanExporter.getFinishedSpans().find(s => s.name === 'ownership_inference'); + const signals = JSON.parse(span.attributes['catalog.ownership_signals']); + expect(signals.length).toBeGreaterThan(0); + // Each signal should have: source, team, weight, accepted/rejected + }); + }); + + describe('PII Protection', () => { + it('hashes repo names in span attributes', async () => { + await triggerDiscovery('test-tenant'); + const spans = spanExporter.getFinishedSpans(); + for (const span of spans) { + const attrs = JSON.stringify(span.attributes); + expect(attrs).not.toContain('payment-api'); // real name + } + }); + + it('hashes team names in ownership spans', async () => { + await inferOwnership('test-service'); + const span = spanExporter.getFinishedSpans().find(s => s.name === 'ownership_inference'); + expect(span.attributes['catalog.service_id']).toMatch(/^[a-f0-9]+$/); + }); + }); +}); +``` + +### 8.5 Configurable Autonomy — Governance Tests + +```typescript +describe('Configurable Autonomy', () => { + describe('Strict Mode (suggest-only)', () => { + it('discovery results go to pending review queue', async () => { + setPolicy({ governance_mode: 'strict' }); + await triggerDiscovery('test-tenant'); + const pending = await getPendingReview('test-tenant'); + expect(pending.length).toBeGreaterThan(0); + const catalog = await getCatalog('test-tenant'); + expect(catalog.length).toBe(0); // Nothing auto-added + }); + }); + + describe('Audit 
Mode (auto-mutate)', () => { + it('discovery results auto-applied to catalog with logging', async () => { + setPolicy({ governance_mode: 'audit' }); + await triggerDiscovery('test-tenant'); + const catalog = await getCatalog('test-tenant'); + expect(catalog.length).toBeGreaterThan(0); + const logs = await getPolicyLogs('test-tenant'); + expect(logs.some(l => l.includes('auto-created in audit mode'))).toBe(true); + }); + }); + + describe('Panic Mode', () => { + it('halts discovery scans immediately', async () => { + await activatePanic(); + const result = await triggerDiscovery('test-tenant'); + expect(result.status).toBe('halted'); + }); + + it('catalog API returns 503 for writes', async () => { + await activatePanic(); + const res = await fetch('/api/services', { method: 'POST', body: '{}' }); + expect(res.status).toBe(503); + }); + + it('catalog API allows reads during panic', async () => { + await activatePanic(); + const res = await fetch('/api/services'); + expect(res.status).toBe(200); + }); + }); + + describe('Per-Team Override', () => { + it('team strict lock prevents auto-mutation even in audit mode', async () => { + setPolicy({ governance_mode: 'audit' }); + setTeamPolicy('platform-team', { governance_mode: 'strict' }); + await triggerDiscovery('test-tenant'); + const platformServices = (await getCatalog('test-tenant')) + .filter(s => s.team === 'platform-team'); + expect(platformServices.length).toBe(0); // Blocked by team lock + }); + }); +}); +``` --- -## 9. Test Data & Fixtures +## Section 9: Test Data & Fixtures -High-quality fixtures are the lifeblood of this TDD strategy. +### 9.1 Directory Structure -### 9.1 GitHub/GitLab API Response Factories -- JSON files containing real obfuscated GraphQL responses for Repositories, `CODEOWNERS` blobs, and Team memberships. -- Use factories (e.g., `fishery` or custom functions) to easily override fields: `buildGHRepo({ name: 'auth-service', languages: ['Go'] })`. 
+``` +tests/ + fixtures/ + aws/ + cloudformation/ + payment-api-stack.json + user-service-stack.json + empty-stack.json + ecs/ + prod-cluster-services.json + staging-cluster-services.json + lambda/ + functions-list.json + api-gateway-mappings.json + rds/ + instances-list.json + github/ + graphql/ + org-repos-page1.json + org-repos-page2.json + repo-details-with-codeowners.json + repo-details-no-codeowners.json + codeowners/ + simple-team-ownership.txt + multi-path-ownership.txt + wildcard-patterns.txt + empty.txt + workflows/ + ecs-deploy.yml + lambda-deploy.yml + matrix-deploy.yml + non-deploy-ci.yml + scenarios/ + medium-org-200-resources.json + large-org-500-resources.json + conflicting-ownership.json + no-github-match.json + slack/ + service-card-blocks.json + search-results-blocks.json + ownership-info-blocks.json +``` -### 9.2 Synthetic Topology Generators -- Scripts that generate interconnected AWS resources (e.g., a CFN stack containing an API Gateway routing to 3 Lambdas interacting with 1 RDS instance). +### 9.2 Service Factory -### 9.3 `CODEOWNERS` and Git Blame Mocks -- Diverse `CODEOWNERS` files covering edge cases: wildcard matching, deep path matching, invalid syntax, user-vs-team owners. 
+```python +# tests/helpers/factories.py +def make_aws_service(overrides=None): + defaults = { + "name": f"service-{fake.word()}", + "source": "aws", + "aws_resources": [ + {"type": "ecs-service", "arn": f"arn:aws:ecs:us-east-1:123456789:service/prod/{fake.word()}"}, + ], + "tags": {"service": fake.word(), "team": fake.word()}, + "confidence": 0.85, + "discovered_at": datetime.utcnow().isoformat(), + } + return {**defaults, **(overrides or {})} + +def make_github_repo(overrides=None): + defaults = { + "name": f"{fake.word()}-{fake.word()}", + "language": random.choice(["TypeScript", "Python", "Go", "Java"]), + "codeowners": [{"path": "*", "owners": [f"@org/{fake.word()}-team"]}], + "top_committers": [fake.name() for _ in range(5)], + "has_deploy_workflow": random.choice([True, False]), + "deploy_target": None, + } + return {**defaults, **(overrides or {})} + +def make_catalog_service(overrides=None): + defaults = { + "service_id": str(uuid4()), + "tenant_id": "test-tenant", + "name": f"{fake.word()}-{random.choice(['api', 'service', 'worker', 'lambda'])}", + "team": f"{fake.word()}-team", + "language": random.choice(["TypeScript", "Python", "Go"]), + "sources": random.sample(["aws", "github"], k=random.randint(1, 2)), + "confidence": round(random.uniform(0.5, 1.0), 2), + "status": "active", + "ownership_signals": [], + } + return {**defaults, **(overrides or {})} +``` + +### 9.3 Synthetic Org Topology Generator + +```python +# tests/helpers/org_generator.py +def generate_org_topology(num_teams=5, services_per_team=10, repos_per_service=1.5): + """Generate a realistic org with teams, services, and repos.""" + teams = [f"team-{fake.word()}" for _ in range(num_teams)] + services = [] + repos = [] + + for team in teams: + for i in range(services_per_team): + svc_name = f"{team.split('-')[1]}-{fake.word()}-{random.choice(['api', 'worker', 'lambda'])}" + services.append(make_aws_service({"name": svc_name, "tags": {"team": team}})) + + # Each service has 1-2 repos: the fractional part of repos_per_service is the chance of an extra repo (int() alone would truncate 1.5 to 1)
+ for j in range(int(repos_per_service) + (random.random() < repos_per_service % 1)): + repos.append(make_github_repo({ + "name": svc_name if j == 0 else f"{svc_name}-lib", + "codeowners": [{"path": "*", "owners": [f"@org/{team}"]}], + "deploy_target": svc_name if j == 0 else None, + })) + + return {"teams": teams, "services": services, "repos": repos} +``` --- -## 10. TDD Implementation Order +## Section 10: TDD Implementation Order -To bootstrap the platform efficiently, testing and development should follow this sequence based on Epic dependencies: +### 10.1 Bootstrap Sequence -1. **Epic 2 (GitHub Parsers):** Write pure unit tests for `CODEOWNERS` parser and `README` extractor. *Value: High ROI, zero dependencies.* -2. **Epic 1 (AWS Heuristics):** Write unit tests for mapping CFN stacks and Tags to Service entities. *Value: Core product logic.* -3. **Epic 2 (Ownership Inference):** TDD the scoring algorithm. Build the weighting math. *Value: The brain of the platform.* -4. **Epic 3 (Service Catalog Schema):** Integration tests for PostgreSQL RLS and upserting services. *Value: Data durability.* -5. **Epic 2 (Reconciliation):** Unit tests merging AWS and GitHub mock entities. *Value: Pipeline glue.* -6. **Epic 4 (Search Sync):** Integration tests for pushing DB updates to Meilisearch. -7. **Epic 5 (API & UI):** E2E test for the Cmd+K search flow. -8. **Epic 10 (Governance & Flags):** Unit tests for feature flag circuit breakers and strict mode. -9. **Epic 9 (Onboarding):** Playwright E2E for the 5-Minute Miracle flow. 
+``` +Phase 0: Test Infrastructure (Week 0) + ├── 0.1 pytest + vitest config + ├── 0.2 LocalStack helper (STS, CFN, ECS, Lambda, RDS, SQS, Step Functions) + ├── 0.3 Testcontainers helpers (PostgreSQL, Redis, Meilisearch) + ├── 0.4 WireMock GitHub GraphQL stubs + ├── 0.5 Factory functions (make_aws_service, make_github_repo, make_catalog_service) + ├── 0.6 Org topology generator + └── 0.7 CI pipeline with test stages +``` -This sequence ensures the most complex algorithmic logic is proven before it is wired to databases and APIs. +### 10.2 Epic-by-Epic TDD Order + +``` +Phase 1: AWS Discovery (Epic 1) — Tests First for STS, Integration-Led for Scanners + ├── 1.1 RED: STS role assumption tests (security-critical) + ├── 1.2 GREEN: Implement STS client + ├── 1.3 Implement CFN scanner against LocalStack + ├── 1.4 RED: CFN scanner unit tests (lock in behavior) + ├── 1.5 Implement ECS + Lambda + RDS scanners + ├── 1.6 RED: Scanner unit tests for each resource type + ├── 1.7 INTEGRATION: Full AWS scan against LocalStack + └── 1.8 REFACTOR: Extract scanner interface, add parallelism + +Phase 2: GitHub Discovery (Epic 2) — Integration-Led + ├── 2.1 Implement repo scanner against WireMock + ├── 2.2 RED: CODEOWNERS parser tests (strict TDD) + ├── 2.3 GREEN: Implement CODEOWNERS parser + ├── 2.4 RED: Workflow parser tests + ├── 2.5 GREEN: Implement workflow parser + ├── 2.6 INTEGRATION: Full GitHub scan against WireMock + └── 2.7 RED: Rate limit handling tests + +Phase 3: Reconciliation (Epic 3) — Tests First + ├── 3.1 RED: Cross-reference matching tests + ├── 3.2 GREEN: Implement reconciler + ├── 3.3 RED: Deduplication tests + ├── 3.4 GREEN: Implement dedup logic + ├── 3.5 INTEGRATION: Reconciler → PostgreSQL + └── 3.6 REFACTOR: Confidence scoring pipeline + +Phase 4: Ownership Inference (Epic 4) — Strict TDD + ├── 4.1 RED: Signal weighting tests (all combinations) + ├── 4.2 GREEN: Implement inference engine + ├── 4.3 RED: Ambiguity detection tests + ├── 4.4 GREEN: Implement 
ambiguity logic + ├── 4.5 RED: Manual override tests + ├── 4.6 GREEN: Implement override handling + └── 4.7 INTEGRATION: Inference → PostgreSQL + +Phase 5: Catalog API + Search (Epics 5-6) — Integration-Led + ├── 5.1 Implement API endpoints + ├── 5.2 RED: API unit tests (CRUD, filtering, pagination) + ├── 5.3 INTEGRATION: API → PostgreSQL + ├── 5.4 INTEGRATION: API → Meilisearch + └── 5.5 RED: Tenant isolation tests + +Phase 6: Governance (Epic 10) — Strict TDD + ├── 6.1 RED: Strict/audit mode tests + ├── 6.2 GREEN: Implement policy engine + ├── 6.3 RED: Panic mode tests + ├── 6.4 GREEN: Implement panic mode + ├── 6.5 RED: Phantom quarantine circuit breaker tests + ├── 6.6 GREEN: Implement circuit breaker + ├── 6.7 RED: OTEL span assertion tests + └── 6.8 GREEN: Instrument all components + +Phase 7: E2E Validation + ├── 7.1 5-Minute Miracle journey (>80% accuracy gate) + ├── 7.2 Cmd+K search journey (<200ms gate) + ├── 7.3 Phantom quarantine journey + ├── 7.4 Panic mode journey + └── 7.5 Performance benchmarks +``` + +### 10.3 "Never Ship Without" Checklist + +- [ ] All STS role assumption tests (security gate) +- [ ] All ownership inference tests (accuracy gate — >80%) +- [ ] All CODEOWNERS parser tests (correctness gate) +- [ ] All governance policy tests (compliance gate) +- [ ] Phantom quarantine circuit breaker test (safety gate) +- [ ] 5-Minute Miracle E2E journey (product promise gate) +- [ ] PII protection span tests (privacy gate) +- [ ] Schema migration lint (no breaking changes) +- [ ] Coverage ≥80% overall, ≥90% on ownership inference +- [ ] Meilisearch search latency <200ms with 1000 services + +--- + +*End of dd0c/portal Test Architecture* diff --git a/products/05-aws-cost-anomaly/test-architecture/test-architecture.md b/products/05-aws-cost-anomaly/test-architecture/test-architecture.md index 7bc363b..00b7784 100644 --- a/products/05-aws-cost-anomaly/test-architecture/test-architecture.md +++ 
b/products/05-aws-cost-anomaly/test-architecture/test-architecture.md @@ -1,103 +1,232 @@ # dd0c/cost — Test Architecture & TDD Strategy -**Version:** 2.0 -**Date:** February 28, 2026 -**Status:** Authoritative -**Audience:** Founding engineer, future contributors +**Product:** dd0c/cost — AWS Cost Anomaly Detective +**Author:** Test Architecture Phase +**Date:** February 28, 2026 +**Status:** V1 MVP — Solo Founder Scope --- -> **Guiding principle:** A cost anomaly detector that misses a $3,000 GPU instance is worse than useless — it's a liability. A cost anomaly detector that cries wolf 40% of the time gets disabled. Tests are the only way to ship with confidence at solo-founder velocity. +## Section 1: Testing Philosophy & TDD Workflow ---- +### 1.1 Core Philosophy -## Table of Contents +dd0c/cost sits at the intersection of **money and infrastructure**. A false negative means a customer loses thousands of dollars. A false positive means alert fatigue and churn. The test suite's primary job is to mathematically prove the anomaly scoring engine works across edge cases. -1. [Testing Philosophy & TDD Workflow](#1-testing-philosophy--tdd-workflow) -2. [Test Pyramid](#2-test-pyramid) -3. [Unit Test Strategy](#3-unit-test-strategy) -4. [Integration Test Strategy](#4-integration-test-strategy) -5. [E2E & Smoke Tests](#5-e2e--smoke-tests) -6. [Performance & Load Testing](#6-performance--load-testing) -7. [CI/CD Pipeline Integration](#7-cicd-pipeline-integration) -8. [Transparent Factory Tenet Testing](#8-transparent-factory-tenet-testing) -9. [Test Data & Fixtures](#9-test-data--fixtures) -10. [TDD Implementation Order](#10-tdd-implementation-order) +Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively unit-tested with synthetic data before any AWS APIs are mocked. ---- - -## 1. 
Testing Philosophy & TDD Workflow - -### Red-Green-Refactor for dd0c/cost - -TDD is non-negotiable for the anomaly scoring engine and baseline learning components. A scoring bug that ships to production means either missed anomalies (customers lose money) or false positives (customers disable the product). The cost of a test is minutes. The cost of a scoring bug is churn. - -**Where TDD is mandatory:** -- `src/scoring/` — every scoring signal, composite calculation, and severity classification -- `src/baseline/` — all statistical operations (mean, stddev, rolling window, cold-start transitions) -- `src/parsers/` — every CloudTrail event parser (RunInstances, CreateDBInstance, etc.) -- `src/pricing/` — pricing lookup logic and cost estimation -- `src/governance/` — policy.json evaluation, auto-promotion logic, panic mode - -**Where TDD is recommended but not mandatory:** -- `src/notifier/` — Slack Block Kit formatting (snapshot tests are sufficient) -- `src/api/` — REST handlers (contract tests cover these) -- `src/infra/` — CDK stacks (CDK assertions cover these) - -**Where tests follow implementation:** -- `src/onboarding/` — CloudFormation URL generation, Cognito flows (integration tests only) -- `src/slack/` — OAuth flows, signature verification (integration tests) - -### The Red-Green-Refactor Cycle +### 1.2 Red-Green-Refactor Adapted to dd0c/cost ``` -RED: Write a failing test that describes the desired behavior. - Name it precisely: what component, what input, what expected output. - Run it. Watch it fail. Confirm it fails for the right reason. +RED → Write a failing test that asserts a specific Z-score and severity + for a given historical baseline and new cost event. -GREEN: Write the minimum code to make the test pass. - No gold-plating. No "while I'm here" refactors. - Run the test. Watch it pass. +GREEN → Implement the scoring math to make it pass. -REFACTOR: Clean up the implementation without changing behavior. - Extract constants. Rename variables. 
Simplify logic. - Tests must still pass after every refactor step. +REFACTOR → Optimize the baseline lookup, extract novelty checks, + refine the heuristic weights. ``` -### Test Naming Convention +**When to write tests first (strict TDD):** +- Anomaly scoring engine (Z-scores, novelty checks, composite severity) +- Cold-start heuristics (fast-path for >$5/hr resources) +- Baseline calculation (moving averages, standard deviation) +- Governance policy (strict vs. audit mode, 14-day promotion) -All tests follow the pattern: `[unit under test] [scenario] [expected outcome]` +**When integration tests lead:** +- CloudTrail ingestion (implement against LocalStack EventBridge, then lock in) +- DynamoDB Single-Table schema (build access patterns, then integration test) + +**When E2E tests lead:** +- The Slack alert interaction (format block kit, test the "Snooze/Terminate" buttons) + +### 1.3 Test Naming Conventions ```typescript -// ✅ Good — precise, readable, searchable -describe('scoreAnomaly', () => { - it('returns critical severity when z-score exceeds 5.0 and instance type is novel', () => {}); - it('returns none severity when account is in cold-start and cost is below $0.50/hr', () => {}); - it('returns warning severity when actor is novel but cost is within 2 standard deviations', () => {}); - it('compounds severity when multiple signals fire simultaneously', () => {}); +describe('AnomalyScorer', () => { + it('assigns critical severity when Z-score > 3 and hourly cost > $1', () => {}); + it('flags actor novelty when IAM role has never launched this service', () => {}); + it('bypasses baseline and triggers fast-path critical for $10/hr instance', () => {}); }); -// ❌ Bad — vague, not searchable -describe('scoring', () => { - it('works correctly', () => {}); - it('handles edge cases', () => {}); +describe('CloudTrailNormalizer', () => { + it('extracts instance type and region from RunInstances event', () => {}); + it('looks up correct on-demand pricing for us-east-1 
r6g.xlarge', () => {}); }); ``` -### Decision Log Requirement - -Per Transparent Factory tenet (Story 10.3), any PR touching `src/scoring/`, `src/baseline/`, or `src/detection/` must include a `docs/decisions/-.json` file. The test suite validates this in CI. - -```json -{ - "prompt": "Should Z-score threshold be 2.5 or 3.0?", - "reasoning": "At 2.5, false positive rate in design partner data was 28%. At 3.0, it dropped to 18% with only 2 additional missed true positives over 30 days.", - "alternatives_considered": ["2.0 (too noisy)", "3.5 (misses too many real anomalies)"], - "confidence": "medium", - "timestamp": "2026-02-28T10:00:00Z", - "author": "brian" -} -``` - --- +## Section 2: Test Pyramid + +### 2.1 Ratio + +| Level | Target | Count (V1) | Runtime | +|-------|--------|------------|---------| +| Unit | 70% | ~250 tests | <20s | +| Integration | 20% | ~80 tests | <3min | +| E2E/Smoke | 10% | ~15 tests | <5min | + +### 2.2 Unit Test Targets + +| Component | Key Behaviors | Est. Tests | +|-----------|--------------|------------| +| Event Normalizer | CloudTrail parsing, pricing lookup, deduplication | 40 | +| Baseline Engine | Running mean/stddev calculation, maturity checks | 35 | +| Anomaly Scorer | Z-score math, novelty detection, composite scoring | 50 | +| Remediation Handler | Stop/Terminate payload parsing, IAM role assumption logic | 20 | +| Notification Engine | Slack formatting, daily digest aggregation | 30 | +| Governance Policy | Mode enforcement, 14-day auto-promotion | 25 | +| Feature Flags | Circuit breaker on alert volume, flag metadata | 15 | + +--- + +## Section 3: Unit Test Strategy + +### 3.1 Cost Ingestion & Normalization + +```typescript +describe('CloudTrailNormalizer', () => { + it('normalizes EC2 RunInstances event to CostEvent schema', () => {}); + it('normalizes RDS CreateDBInstance event to CostEvent schema', () => {}); + it('extracts assumed role ARN as actor instead of base STS role', () => {}); + it('applies fallback pricing 
when instance type is not in static table', () => {}); + it('ignores non-cost-generating events (e.g., DescribeInstances)', () => {}); +}); +``` + +### 3.2 Anomaly Engine (The Math) + +```typescript +describe('AnomalyScorer', () => { + describe('Statistical Scoring (Z-Score)', () => { + it('returns score=0 when event cost exactly matches baseline mean', () => {}); + it('returns proportional score for Z-scores between 1.0 and 3.0', () => {}); + it('caps Z-score contribution at max threshold', () => {}); + }); + + describe('Novelty Scoring', () => { + it('adds novelty penalty when instance type is first seen for account', () => {}); + it('adds novelty penalty when IAM user has never provisioned this service', () => {}); + }); + + describe('Cold-Start Fast Path', () => { + it('flags $5/hr instance as warning when baseline < 14 days', () => {}); + it('flags $25/hr instance as critical immediately, bypassing baseline', () => {}); + it('ignores $0.10/hr instances during cold-start learning period', () => {}); + }); +}); +``` + +### 3.3 Baseline Learning + +```typescript +describe('BaselineCalculator', () => { + it('updates running mean and stddev using Welford algorithm', () => {}); + it('adds new actor to observed_actors set', () => {}); + it('marks baseline as mature when event_count > 20 and age_days > 14', () => {}); +}); +``` + +--- + +## Section 4: Integration Test Strategy + +### 4.1 DynamoDB Data Layer (Testcontainers) + +```typescript +describe('DynamoDB Single-Table Patterns', () => { + it('writes CostEvent and updates Baseline in single transaction', async () => {}); + it('queries all anomalies for tenant within time range', async () => {}); + it('fetches tenant config and Slack tokens securely', async () => {}); +}); +``` + +### 4.2 AWS API Contract Tests + +```typescript +describe('AWS Cross-Account Actions', () => { + // Uses LocalStack to simulate target account + it('assumes target account remediation role successfully', async () => {}); + it('executes 
ec2:StopInstances when remediation approved', async () => {}); + it('executes rds:DeleteDBInstance with skip-final-snapshot', async () => {}); +}); +``` + +--- + +## Section 5: E2E & Smoke Tests + +### 5.1 Critical User Journeys + +**Journey 1: Real-Time Anomaly Detection** +1. Send synthetic `RunInstances` event to EventBridge (p9.16xlarge, $40/hr). +2. Verify system processes event and triggers fast-path (no baseline). +3. Verify Slack alert is generated with correct cost estimate. + +**Journey 2: Interactive Remediation** +1. Send webhook simulating user clicking "Stop Instance" in Slack. +2. Verify API Gateway → Lambda executes `StopInstances` against LocalStack. +3. Verify Slack message updates to "Remediation Successful". + +--- + +## Section 6: Performance & Load Testing + +```typescript +describe('Ingestion Throughput', () => { + it('processes 500 CloudTrail events/second via SQS FIFO', async () => {}); + it('DynamoDB baseline updates complete in <20ms p95', async () => {}); +}); +``` + +--- + +## Section 7: CI/CD Pipeline Integration + +- **PR Gate:** Unit tests (<2min), Coverage >85% (Scoring engine >95%). +- **Merge:** Integration tests with LocalStack & Testcontainers DynamoDB. +- **Staging:** E2E journeys against isolated staging AWS account. 
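
The PR gate thresholds listed above (coverage above 85% overall, above 95% for the scoring engine) could be enforced with a small check along these lines. This is a minimal sketch, not the project's actual gate: `FileCoverage`, `Threshold`, `checkCoverage`, and the `src/scoring/` path prefix are all hypothetical names chosen for illustration, and the input shape is an assumed per-file summary rather than the output of any particular coverage tool.

```typescript
// Hypothetical types and names: none of these come from the dd0c/cost codebase.

/** Covered and total line counts for one source file. */
interface FileCoverage {
  covered: number;
  total: number;
}

interface Threshold {
  /** Path prefix the rule applies to; "" matches every file. */
  prefix: string;
  /** Coverage must be strictly above this percentage. */
  minPercent: number;
}

// Values from the PR gate: >85% overall, >95% for the scoring engine.
// The "src/scoring/" prefix is an assumed directory layout.
const thresholds: Threshold[] = [
  { prefix: "", minPercent: 85 },
  { prefix: "src/scoring/", minPercent: 95 },
];

/** Returns one message per violated rule; an empty array means the gate passes. */
function checkCoverage(
  summary: Record<string, FileCoverage>,
  rules: Threshold[],
): string[] {
  const failures: string[] = [];
  for (const rule of rules) {
    let covered = 0;
    let total = 0;
    for (const path of Object.keys(summary)) {
      if (path.startsWith(rule.prefix)) {
        covered += summary[path].covered;
        total += summary[path].total;
      }
    }
    if (total === 0) continue; // no matching files, nothing to enforce
    // Aggregate by line counts so large files weigh more than small ones.
    const percent = (100 * covered) / total;
    if (percent <= rule.minPercent) {
      const label = rule.prefix === "" ? "overall" : rule.prefix;
      failures.push(`${label}: ${percent.toFixed(1)}% is not above ${rule.minPercent}%`);
    }
  }
  return failures;
}
```

A CI step would call `checkCoverage` on the parsed coverage summary and exit non-zero when the returned array is non-empty. Aggregating by line counts rather than averaging per-file percentages keeps a handful of tiny, fully covered files from masking a large, poorly covered one.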
+ +--- + +## Section 8: Transparent Factory Tenet Testing + +### 8.1 Atomic Flagging (Circuit Breaker) +```typescript +it('auto-disables scoring rule if it generates >10 alerts/hour for single tenant', () => {}); +``` + +### 8.2 Configurable Autonomy (14-Day Auto-Promotion) +```typescript +it('keeps new tenant in strict mode (log-only) for first 14 days', () => {}); +it('auto-promotes to audit mode (auto-alert) on day 15 if false-positive rate < 10%', () => {}); +``` + +--- + +## Section 9: Test Data & Fixtures + +``` +fixtures/ + cloudtrail/ + ec2-runinstances.json + rds-create-db.json + lambda-create-function.json + baselines/ + mature-steady-spend.json + volatile-dev-account.json + cold-start.json +``` + +--- + +## Section 10: TDD Implementation Order + +1. **Phase 1:** Anomaly math + Unit tests (Strict TDD). +2. **Phase 2:** CloudTrail normalizer + Pricing tables. +3. **Phase 3:** DynamoDB single-table implementation (Integration led). +4. **Phase 4:** Slack formatting + Remediation Lambda. +5. **Phase 5:** Governance policies (14-day promotion logic). + +*End of dd0c/cost Test Architecture*