Update test architectures for P3, P4, P5

2026-02-28 23:33:07 +00:00
parent 5ee95d8b13
commit 1101fef096
3 changed files with 2575 additions and 618 deletions
--- a/products/03-alert-intelligence/test-architecture/test-architecture.md
+++ b/products/03-alert-intelligence/test-architecture/test-architecture.md
--- a/products/04-lightweight-idp/test-architecture/test-architecture.md
+++ b/products/04-lightweight-idp/test-architecture/test-architecture.md
--- a/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
+++ b/products/05-aws-cost-anomaly/test-architecture/test-architecture.md
@@ -1,103 +1,232 @@
 # dd0c/cost — Test Architecture & TDD Strategy
-**Version:** 2.0  
+**Product:** dd0c/cost — AWS Cost Anomaly Detective
 **Author:** Test Architecture Phase
 **Date:** February 28, 2026
-**Status:** Authoritative  
+**Status:** V1 MVP — Solo Founder Scope
 **Audience:** Founding engineer, future contributors
 ---
-> **Guiding principle:** A cost anomaly detector that misses a $3,000 GPU instance is worse than useless — it's a liability. A cost anomaly detector that cries wolf 40% of the time gets disabled. Tests are the only way to ship with confidence at solo-founder velocity.
+## Section 1: Testing Philosophy & TDD Workflow
---
+### 1.1 Core Philosophy
-## Table of Contents
+dd0c/cost sits at the intersection of **money and infrastructure**. A false negative means a customer loses thousands of dollars. A false positive means alert fatigue and churn. The test suite's primary job is to demonstrate, through exhaustive synthetic cases, that the anomaly scoring engine behaves correctly across edge cases.
-1. [Testing Philosophy & TDD Workflow](#1-testing-philosophy--tdd-workflow)
+Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively unit-tested with synthetic data before any AWS APIs are mocked.
 2. [Test Pyramid](#2-test-pyramid)
 3. [Unit Test Strategy](#3-unit-test-strategy)
 4. [Integration Test Strategy](#4-integration-test-strategy)
 5. [E2E & Smoke Tests](#5-e2e--smoke-tests)
 6. [Performance & Load Testing](#6-performance--load-testing)
 7. [CI/CD Pipeline Integration](#7-cicd-pipeline-integration)
 8. [Transparent Factory Tenet Testing](#8-transparent-factory-tenet-testing)
 9. [Test Data & Fixtures](#9-test-data--fixtures)
 10. [TDD Implementation Order](#10-tdd-implementation-order)
---
+### 1.2 Red-Green-Refactor Adapted to dd0c/cost
 ## 1. Testing Philosophy & TDD Workflow
 ### Red-Green-Refactor for dd0c/cost
 TDD is non-negotiable for the anomaly scoring engine and baseline learning components. A scoring bug that ships to production means either missed anomalies (customers lose money) or false positives (customers disable the product). The cost of a test is minutes. The cost of a scoring bug is churn.
 **Where TDD is mandatory:**
 - `src/scoring/` — every scoring signal, composite calculation, and severity classification
 - `src/baseline/` — all statistical operations (mean, stddev, rolling window, cold-start transitions)
 - `src/parsers/` — every CloudTrail event parser (RunInstances, CreateDBInstance, etc.)
 - `src/pricing/` — pricing lookup logic and cost estimation
 - `src/governance/` — policy.json evaluation, auto-promotion logic, panic mode
 **Where TDD is recommended but not mandatory:**
 - `src/notifier/` — Slack Block Kit formatting (snapshot tests are sufficient)
 - `src/api/` — REST handlers (contract tests cover these)
 - `src/infra/` — CDK stacks (CDK assertions cover these)
 **Where tests follow implementation:**
 - `src/onboarding/` — CloudFormation URL generation, Cognito flows (integration tests only)
 - `src/slack/` — OAuth flows, signature verification (integration tests)
 ### The Red-Green-Refactor Cycle
 ```
-RED:   Write a failing test that describes the desired behavior.
+RED   → Write a failing test that asserts a specific Z-score and severity
-       Name it precisely: what component, what input, what expected output.
+         for a given historical baseline and new cost event.
       Run it. Watch it fail. Confirm it fails for the right reason.
-GREEN: Write the minimum code to make the test pass.
+GREEN → Implement the scoring math to make it pass.
       No gold-plating. No "while I'm here" refactors.
       Run the test. Watch it pass.
-REFACTOR: Clean up the implementation without changing behavior.
+REFACTOR → Optimize the baseline lookup, extract novelty checks,
-          Extract constants. Rename variables. Simplify logic.
+            refine the heuristic weights.
          Tests must still pass after every refactor step.
 ```
-### Test Naming Convention
+**When to write tests first (strict TDD):**
 - Anomaly scoring engine (Z-scores, novelty checks, composite severity)
 - Cold-start heuristics (fast-path for >$5/hr resources)
 - Baseline calculation (moving averages, standard deviation)
 - Governance policy (strict vs. audit mode, 14-day promotion)
-All tests follow the pattern: `[unit under test] [scenario] [expected outcome]`
+**When integration tests lead:**
 - CloudTrail ingestion (implement against LocalStack EventBridge, then lock in)
 - DynamoDB Single-Table schema (build access patterns, then integration test)
 **When E2E tests lead:**
 - The Slack alert interaction (format the Block Kit message, then test the "Snooze/Terminate" buttons)
 ### 1.3 Test Naming Conventions
 ```typescript
-// ✅ Good — precise, readable, searchable
+describe('AnomalyScorer', () => {
-describe('scoreAnomaly', () => {
+  it('assigns critical severity when Z-score > 3 and hourly cost > $1', () => {});
-  it('returns critical severity when z-score exceeds 5.0 and instance type is novel', () => {});
+  it('flags actor novelty when IAM role has never launched this service', () => {});
-  it('returns none severity when account is in cold-start and cost is below $0.50/hr', () => {});
+  it('bypasses baseline and triggers fast-path critical for $10/hr instance', () => {});
  it('returns warning severity when actor is novel but cost is within 2 standard deviations', () => {});
  it('compounds severity when multiple signals fire simultaneously', () => {});
 });
-// ❌ Bad — vague, not searchable
+describe('CloudTrailNormalizer', () => {
-describe('scoring', () => {
+  it('extracts instance type and region from RunInstances event', () => {});
-  it('works correctly', () => {});
+  it('looks up correct on-demand pricing for us-east-1 r6g.xlarge', () => {});
  it('handles edge cases', () => {});
 });
 ```
 ### Decision Log Requirement
 Per Transparent Factory tenet (Story 10.3), any PR touching `src/scoring/`, `src/baseline/`, or `src/detection/` must include a `docs/decisions/<YYYY-MM-DD>-<slug>.json` file. The test suite validates this in CI.
 ```json
 {
  "prompt": "Should Z-score threshold be 2.5 or 3.0?",
  "reasoning": "At 2.5, false positive rate in design partner data was 28%. At 3.0, it dropped to 18% with only 2 additional missed true positives over 30 days.",
  "alternatives_considered": ["2.0 (too noisy)", "3.5 (misses too many real anomalies)"],
  "confidence": "medium",
  "timestamp": "2026-02-28T10:00:00Z",
  "author": "brian"
 }
 ```
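The CI check described above could be sketched roughly as follows. The field list comes from the sample decision log; the allowed `confidence` values, the slug character set, and the function names `isDecisionLogPath` and `validateDecisionLog` are assumptions made for illustration.

```typescript
// Sketch only: field names mirror the sample decision log above.
// The confidence levels and slug pattern are assumptions, not a confirmed spec.
const REQUIRED_STRING_FIELDS = ['prompt', 'reasoning', 'confidence', 'timestamp', 'author'] as const;
const ASSUMED_CONFIDENCE_LEVELS = ['low', 'medium', 'high'];
const DECISION_PATH = /^docs\/decisions\/\d{4}-\d{2}-\d{2}-[a-z0-9-]+\.json$/;

// True when the path matches docs/decisions/<YYYY-MM-DD>-<slug>.json.
function isDecisionLogPath(path: string): boolean {
  return DECISION_PATH.test(path);
}

// Returns a list of human-readable problems; an empty list means the log is valid.
function validateDecisionLog(raw: unknown): string[] {
  if (typeof raw !== 'object' || raw === null) {
    return ['decision log must be a JSON object'];
  }
  const log = raw as Record<string, unknown>;
  const problems: string[] = [];

  for (const field of REQUIRED_STRING_FIELDS) {
    const value = log[field];
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`missing or empty "${field}"`);
    }
  }

  const alternatives = log['alternatives_considered'];
  if (!Array.isArray(alternatives) || !alternatives.every((a) => typeof a === 'string')) {
    problems.push('"alternatives_considered" must be an array of strings');
  }

  const confidence = log['confidence'];
  if (typeof confidence === 'string' && !ASSUMED_CONFIDENCE_LEVELS.includes(confidence)) {
    problems.push(`unexpected confidence "${confidence}"`);
  }

  const timestamp = log['timestamp'];
  if (typeof timestamp === 'string' && Number.isNaN(Date.parse(timestamp))) {
    problems.push('"timestamp" is not a parseable date');
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a CI job report every issue in a single run.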
 ---
 ## Section 2: Test Pyramid
 ### 2.1 Ratio
 | Level | Target Share | Count (V1) | Runtime Budget |
 |-------|--------|------------|---------|
 | Unit | 70% | ~250 tests | <20s |
 | Integration | 20% | ~80 tests | <3min |
 | E2E/Smoke | 10% | ~15 tests | <5min |
 ### 2.2 Unit Test Targets
 | Component | Key Behaviors | Est. Tests |
 |-----------|--------------|------------|
 | Event Normalizer | CloudTrail parsing, pricing lookup, deduplication | 40 |
 | Baseline Engine | Running mean/stddev calculation, maturity checks | 35 |
 | Anomaly Scorer | Z-score math, novelty detection, composite scoring | 50 |
 | Remediation Handler | Stop/Terminate payload parsing, IAM role assumption logic | 20 |
 | Notification Engine | Slack formatting, daily digest aggregation | 30 |
 | Governance Policy | Mode enforcement, 14-day auto-promotion | 25 |
 | Feature Flags | Circuit breaker on alert volume, flag metadata | 15 |
 ---
 ## Section 3: Unit Test Strategy
 ### 3.1 Cost Ingestion & Normalization
 ```typescript
 describe('CloudTrailNormalizer', () => {
  it('normalizes EC2 RunInstances event to CostEvent schema', () => {});
  it('normalizes RDS CreateDBInstance event to CostEvent schema', () => {});
  it('extracts assumed role ARN as actor instead of base STS role', () => {});
  it('applies fallback pricing when instance type is not in static table', () => {});
  it('ignores non-cost-generating events (e.g., DescribeInstances)', () => {});
 });
 ```
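As a rough illustration of what these normalizer tests exercise, here is a minimal sketch for the EC2 case only. The CloudTrail field names (`eventName`, `awsRegion`, `userIdentity`, `requestParameters.instanceType`) follow the CloudTrail record format; the `CostEvent` shape, the price values, the fallback rate, and the reading of "assumed role ARN" as the session issuer's role ARN are all assumptions.

```typescript
// Sketch only. CostEvent, the price table, and the fallback rate are hypothetical.
interface CloudTrailRecord {
  eventName: string;
  awsRegion: string;
  userIdentity: {
    type: string;
    arn: string;
    sessionContext?: { sessionIssuer?: { arn?: string } };
  };
  requestParameters?: { instanceType?: string } | null;
}

interface CostEvent {
  service: 'ec2';
  region: string;
  instanceType: string;
  actor: string;
  hourlyCostUsd: number;
  pricingSource: 'static' | 'fallback';
}

// Placeholder numbers for illustration, not real AWS prices.
const STATIC_PRICES_USD_PER_HR: Record<string, number> = {
  'us-east-1/r6g.xlarge': 0.2,
};
const FALLBACK_USD_PER_HR = 1.0;

// One possible reading of the actor rule: prefer the IAM role that issued
// the session over the per-session STS ARN.
function resolveActor(identity: CloudTrailRecord['userIdentity']): string {
  return identity.sessionContext?.sessionIssuer?.arn ?? identity.arn;
}

// Returns null for events that do not create billable resources.
function normalize(record: CloudTrailRecord): CostEvent | null {
  if (record.eventName !== 'RunInstances') return null; // e.g. DescribeInstances
  const instanceType = record.requestParameters?.instanceType;
  if (!instanceType) return null;

  const staticPrice = STATIC_PRICES_USD_PER_HR[`${record.awsRegion}/${instanceType}`];
  const hasStaticPrice = staticPrice !== undefined;
  return {
    service: 'ec2',
    region: record.awsRegion,
    instanceType,
    actor: resolveActor(record.userIdentity),
    hourlyCostUsd: hasStaticPrice ? staticPrice : FALLBACK_USD_PER_HR,
    pricingSource: hasStaticPrice ? 'static' : 'fallback',
  };
}
```

Recording `pricingSource` on the event keeps fallback estimates distinguishable from table lookups, which makes the fallback test easy to assert.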
 ### 3.2 Anomaly Engine (The Math)
 ```typescript
 describe('AnomalyScorer', () => {
  describe('Statistical Scoring (Z-Score)', () => {
    it('returns score=0 when event cost exactly matches baseline mean', () => {});
    it('returns proportional score for Z-scores between 1.0 and 3.0', () => {});
    it('caps Z-score contribution at max threshold', () => {});
  });
  describe('Novelty Scoring', () => {
    it('adds novelty penalty when instance type is first seen for account', () => {});
    it('adds novelty penalty when IAM user has never provisioned this service', () => {});
  });
  describe('Cold-Start Fast Path', () => {
     it('flags $5/hr instance as warning when baseline is less than 14 days old', () => {});
    it('flags $25/hr instance as critical immediately, bypassing baseline', () => {});
    it('ignores $0.10/hr instances during cold-start learning period', () => {});
  });
 });
 ```
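The statistical and cold-start cases above can be pictured with a small sketch. It is not the product's scoring engine: the `Baseline` shape, the Z-score cap of 5, the warning threshold at Z >= 2, and the $10/hr fast-path critical cutoff are assumptions chosen to be consistent with the test names in this document. Novelty signals are left out to keep the example short.

```typescript
// Sketch only: thresholds marked "assumed" are illustrative, not confirmed values.
type Severity = 'none' | 'warning' | 'critical';

interface Baseline {
  mean: number;    // mean hourly cost in USD
  stddev: number;  // standard deviation of hourly cost in USD
  mature: boolean; // false while the account is still in cold-start
}

const MAX_Z = 5;                         // assumed cap on the Z-score contribution
const WARNING_Z = 2;                     // assumed
const CRITICAL_Z = 3;                    // from "Z-score > 3 and hourly cost > $1"
const CRITICAL_MIN_USD_PER_HR = 1;
const FAST_PATH_WARNING_USD_PER_HR = 5;  // from the cold-start heuristic
const FAST_PATH_CRITICAL_USD_PER_HR = 10; // assumed

// One-sided Z-score: spend below the mean never counts as an anomaly.
function cappedZScore(costPerHour: number, baseline: Baseline): number {
  if (baseline.stddev === 0) {
    return costPerHour > baseline.mean ? MAX_Z : 0;
  }
  const z = (costPerHour - baseline.mean) / baseline.stddev;
  return Math.min(Math.max(z, 0), MAX_Z);
}

function classifySeverity(costPerHour: number, baseline: Baseline): Severity {
  if (!baseline.mature) {
    // Cold-start fast path: absolute cost only, no statistics.
    if (costPerHour >= FAST_PATH_CRITICAL_USD_PER_HR) return 'critical';
    if (costPerHour >= FAST_PATH_WARNING_USD_PER_HR) return 'warning';
    return 'none';
  }
  const z = cappedZScore(costPerHour, baseline);
  if (z > CRITICAL_Z && costPerHour > CRITICAL_MIN_USD_PER_HR) return 'critical';
  if (z >= WARNING_Z) return 'warning';
  return 'none';
}
```

Keeping the scoring functions pure, with the baseline passed in, is what allows these cases to run against synthetic data before any AWS API is mocked.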
 ### 3.3 Baseline Learning
 ```typescript
 describe('BaselineCalculator', () => {
  it('updates running mean and stddev using Welford algorithm', () => {});
  it('adds new actor to observed_actors set', () => {});
  it('marks baseline as mature when event_count > 20 and age_days > 14', () => {});
 });
 ```
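For reference, a minimal sketch of the Welford update that the first test names. The `RunningStats` shape and the function names are hypothetical, and the use of sample (n - 1) standard deviation is an assumption; the maturity rule mirrors the test description above.

```typescript
// Sketch only: Welford's online algorithm for running mean and variance.
interface RunningStats {
  count: number;
  mean: number;
  m2: number; // running sum of squared deviations from the mean
}

const EMPTY_STATS: RunningStats = { count: 0, mean: 0, m2: 0 };

// Folds one observation into the stats without storing past observations.
function updateStats(stats: RunningStats, x: number): RunningStats {
  const count = stats.count + 1;
  const delta = x - stats.mean;
  const mean = stats.mean + delta / count;
  const m2 = stats.m2 + delta * (x - mean);
  return { count, mean, m2 };
}

// Sample standard deviation (divides by n - 1); 0 until two observations exist.
function sampleStddev(stats: RunningStats): number {
  return stats.count > 1 ? Math.sqrt(stats.m2 / (stats.count - 1)) : 0;
}

// Maturity rule as stated in the test name above.
function isMature(eventCount: number, ageDays: number): boolean {
  return eventCount > 20 && ageDays > 14;
}
```

A usage example: `[2, 4, 4, 4, 5, 5, 7, 9].reduce(updateStats, EMPTY_STATS)` yields a mean of 5 and an `m2` of 32, which is convenient for hand-checked unit tests.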
 ---
 ## Section 4: Integration Test Strategy
 ### 4.1 DynamoDB Data Layer (Testcontainers)
 ```typescript
 describe('DynamoDB Single-Table Patterns', () => {
  it('writes CostEvent and updates Baseline in single transaction', async () => {});
  it('queries all anomalies for tenant within time range', async () => {});
  it('fetches tenant config and Slack tokens securely', async () => {});
 });
 ```
 ### 4.2 AWS API Contract Tests
 ```typescript
 describe('AWS Cross-Account Actions', () => {
  // Uses LocalStack to simulate target account
  it('assumes target account remediation role successfully', async () => {});
  it('executes ec2:StopInstances when remediation approved', async () => {});
  it('executes rds:DeleteDBInstance with skip-final-snapshot', async () => {});
 });
 ```
 ---
 ## Section 5: E2E & Smoke Tests
 ### 5.1 Critical User Journeys
 **Journey 1: Real-Time Anomaly Detection**
 1. Send synthetic `RunInstances` event to EventBridge (p9.16xlarge, $40/hr).
 2. Verify system processes event and triggers fast-path (no baseline).
 3. Verify Slack alert is generated with correct cost estimate.
 **Journey 2: Interactive Remediation**
 1. Send webhook simulating user clicking "Stop Instance" in Slack.
 2. Verify API Gateway → Lambda executes `StopInstances` against LocalStack.
 3. Verify Slack message updates to "Remediation Successful".
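To simulate the Slack webhook in Journey 2, the test has to sign its synthetic request the way Slack does: an HMAC-SHA256 over `v0:<timestamp>:<raw body>`, sent as `v0=<hex>` in the `X-Slack-Signature` header alongside `X-Slack-Request-Timestamp`. The sketch below shows one way a test helper and the matching verifier might look; the function names and the five-minute skew window applied here are illustrative choices.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sketch only: helper names are hypothetical.
const MAX_SKEW_SECONDS = 60 * 5; // reject stale or replayed requests

// Builds the value a test would send in the X-Slack-Signature header.
function signSlackRequest(signingSecret: string, timestamp: number, rawBody: string): string {
  const base = `v0:${timestamp}:${rawBody}`;
  return 'v0=' + createHmac('sha256', signingSecret).update(base).digest('hex');
}

// Recomputes the signature and compares it in constant time.
function verifySlackRequest(
  signingSecret: string,
  timestamp: number,
  rawBody: string,
  signature: string,
  nowSeconds: number,
): boolean {
  if (Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) return false;
  const encoder = new TextEncoder();
  const expected = encoder.encode(signSlackRequest(signingSecret, timestamp, rawBody));
  const received = encoder.encode(signature);
  // timingSafeEqual throws on length mismatch, so check length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Passing `nowSeconds` explicitly keeps the verifier deterministic, so the stale-timestamp case can be tested without faking the system clock.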
 ---
 ## Section 6: Performance & Load Testing
 ```typescript
 describe('Ingestion Throughput', () => {
  it('processes 500 CloudTrail events/second via SQS FIFO', async () => {});
  it('DynamoDB baseline updates complete in <20ms p95', async () => {});
 });
 ```
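Asserting a p95 latency budget requires a percentile calculation over the collected samples. A minimal nearest-rank sketch follows; the helper name and the choice of the nearest-rank method are assumptions, and a real suite might rely on its load-testing tool's own statistics instead.

```typescript
// Sketch only: nearest-rank percentile over latency samples (milliseconds).
// `p` is a whole-number percentile in the range 1..100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples collected');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// True when the p95 of the samples is strictly under the budget.
function meetsP95Budget(latenciesMs: number[], budgetMs: number): boolean {
  return percentile(latenciesMs, 95) < budgetMs;
}
```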
 ---
 ## Section 7: CI/CD Pipeline Integration
 - **PR Gate:** Unit tests (<2min), Coverage >85% (Scoring engine >95%).
 - **Merge:** Integration tests with LocalStack & Testcontainers DynamoDB.
 - **Staging:** E2E journeys against isolated staging AWS account.
 ---
 ## Section 8: Transparent Factory Tenet Testing
 ### 8.1 Atomic Flagging (Circuit Breaker)
 ```typescript
 it('auto-disables scoring rule if it generates >10 alerts/hour for single tenant', () => {});
 ```
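One way such a circuit breaker might be structured is a sliding one-hour window per tenant and rule. The class name, the key format, and the decision to keep a rule disabled until someone re-enables it are assumptions; only the ">10 alerts/hour for a single tenant" trigger comes from the test above.

```typescript
// Sketch only: in-memory sliding-window breaker keyed by tenant and rule.
const ONE_HOUR_MS = 60 * 60 * 1000;

class AlertCircuitBreaker {
  private readonly maxAlertsPerHour: number;
  private readonly timestamps = new Map<string, number[]>();
  private readonly disabled = new Set<string>();

  constructor(maxAlertsPerHour = 10) {
    this.maxAlertsPerHour = maxAlertsPerHour;
  }

  // Records one alert and trips the breaker once the hourly limit is exceeded.
  recordAlert(tenantId: string, ruleId: string, nowMs: number): void {
    const key = `${tenantId}#${ruleId}`;
    const windowStart = nowMs - ONE_HOUR_MS;
    const recent = (this.timestamps.get(key) ?? []).filter((t) => t > windowStart);
    recent.push(nowMs);
    this.timestamps.set(key, recent);
    if (recent.length > this.maxAlertsPerHour) {
      this.disabled.add(key); // assumed: stays off until manually re-enabled
    }
  }

  isEnabled(tenantId: string, ruleId: string): boolean {
    return !this.disabled.has(`${tenantId}#${ruleId}`);
  }
}
```

Taking `nowMs` as a parameter instead of reading the clock lets the unit test replay an hour of alerts instantly.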
 ### 8.2 Configurable Autonomy (14-Day Auto-Promotion)
 ```typescript
 it('keeps new tenant in strict mode (log-only) for first 14 days', () => {});
 it('auto-promotes to audit mode (auto-alert) on day 15 if false-positive rate < 10%', () => {});
 ```
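The promotion rule in these two tests can be expressed as a small pure function. The state shape, the way false positives are counted, and the choice to stay in strict mode when no alerts have been reviewed are assumptions made for this sketch.

```typescript
// Sketch only: the 14-day window and the <10% threshold come from the tests above.
type GovernanceMode = 'strict' | 'audit';

interface TenantGovernanceState {
  onboardedAtMs: number;  // epoch milliseconds when the tenant connected
  reviewedAlerts: number; // alerts logged during strict mode and reviewed
  falsePositives: number; // reviewed alerts judged to be false positives
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const STRICT_PERIOD_DAYS = 14;
const MAX_FALSE_POSITIVE_RATE = 0.1;

function resolveMode(state: TenantGovernanceState, nowMs: number): GovernanceMode {
  const fullDaysElapsed = Math.floor((nowMs - state.onboardedAtMs) / MS_PER_DAY);
  if (fullDaysElapsed < STRICT_PERIOD_DAYS) return 'strict'; // days 1..14
  if (state.reviewedAlerts === 0) return 'strict';           // assumed: no evidence, no promotion
  const rate = state.falsePositives / state.reviewedAlerts;
  return rate < MAX_FALSE_POSITIVE_RATE ? 'audit' : 'strict';
}
```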
 ---
 ## Section 9: Test Data & Fixtures
 ```
 fixtures/
  cloudtrail/
    ec2-runinstances.json
    rds-create-db.json
    lambda-create-function.json
  baselines/
    mature-steady-spend.json
    volatile-dev-account.json
    cold-start.json
 ```
 ---
 ## Section 10: TDD Implementation Order
 1. **Phase 1:** Anomaly math + Unit tests (Strict TDD).
 2. **Phase 2:** CloudTrail normalizer + Pricing tables.
 3. **Phase 3:** DynamoDB single-table implementation (Integration led).
 4. **Phase 4:** Slack formatting + Remediation Lambda.
 5. **Phase 5:** Governance policies (14-day promotion logic).
 *End of dd0c/cost Test Architecture*