
dd0c/cost — Test Architecture & TDD Strategy

Version: 2.0
Date: February 28, 2026
Status: Authoritative
Audience: Founding engineer, future contributors


Guiding principle: A cost anomaly detector that misses a $3,000 GPU instance is worse than useless — it's a liability. A cost anomaly detector that cries wolf 40% of the time gets disabled. Tests are the only way to ship with confidence at solo-founder velocity.


Table of Contents

  1. Testing Philosophy & TDD Workflow
  2. Test Pyramid
  3. Unit Test Strategy
  4. Integration Test Strategy
  5. E2E & Smoke Tests
  6. Performance & Load Testing
  7. CI/CD Pipeline Integration
  8. Transparent Factory Tenet Testing
  9. Test Data & Fixtures
  10. TDD Implementation Order

1. Testing Philosophy & TDD Workflow

Red-Green-Refactor for dd0c/cost

TDD is non-negotiable for the anomaly scoring engine and baseline learning components. A scoring bug that ships to production means either missed anomalies (customers lose money) or false positives (customers disable the product). The cost of a test is minutes. The cost of a scoring bug is churn.

Where TDD is mandatory:

  • src/scoring/ — every scoring signal, composite calculation, and severity classification
  • src/baseline/ — all statistical operations (mean, stddev, rolling window, cold-start transitions)
  • src/parsers/ — every CloudTrail event parser (RunInstances, CreateDBInstance, etc.)
  • src/pricing/ — pricing lookup logic and cost estimation
  • src/governance/ — policy.json evaluation, auto-promotion logic, panic mode
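As a sketch of what a mandatory TDD unit looks like under test, consider a rolling-window statistic from src/baseline/. The function name and signature below are illustrative assumptions, not the actual module API:

```typescript
// Hypothetical sketch of a src/baseline/ statistical helper.
// rollingStats is illustrative only; the real API may differ.
function rollingStats(values: number[], window: number): { mean: number; stddev: number } {
  const slice = values.slice(-window); // only the most recent `window` samples
  const mean = slice.reduce((a, b) => a + b, 0) / slice.length;
  const variance = slice.reduce((a, b) => a + (b - mean) ** 2, 0) / slice.length;
  return { mean, stddev: Math.sqrt(variance) };
}

// Test written first (RED), implementation written second (GREEN):
// "rollingStats returns mean and stddev over the last N hourly cost samples"
const stats = rollingStats([1.0, 1.2, 0.8, 1.0], 4);
```

Because these helpers are pure functions over numeric arrays, they are cheap to test exhaustively, which is exactly why TDD pays off here.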

Where TDD is recommended but not mandatory:

  • src/notifier/ — Slack Block Kit formatting (snapshot tests are sufficient)
  • src/api/ — REST handlers (contract tests cover these)
  • src/infra/ — CDK stacks (CDK assertions cover these)
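For the notifier, a snapshot test is just "serialize the Block Kit payload, compare to a stored string." A minimal sketch, assuming a hypothetical formatAnomalyAlert helper (the real src/notifier/ API may differ):

```typescript
// Hypothetical sketch of a Slack Block Kit formatter under snapshot test.
// formatAnomalyAlert and the Anomaly shape are illustrative assumptions.
interface Anomaly { instanceType: string; estCostPerHour: number; severity: string }

function formatAnomalyAlert(a: Anomaly): object {
  return {
    blocks: [
      { type: "header", text: { type: "plain_text", text: `${a.severity.toUpperCase()}: cost anomaly` } },
      { type: "section", text: { type: "mrkdwn", text: `*${a.instanceType}* at ~$${a.estCostPerHour.toFixed(2)}/hr` } },
    ],
  };
}

// A snapshot test serializes the payload and diffs it against a committed snapshot.
const snapshot = JSON.stringify(
  formatAnomalyAlert({ instanceType: "p4d.24xlarge", estCostPerHour: 32.77, severity: "critical" })
);
```

If the formatting changes intentionally, the snapshot is regenerated and reviewed in the PR diff; no behavioral assertions are needed.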

Where tests follow implementation:

  • src/onboarding/ — CloudFormation URL generation, Cognito flows (integration tests only)
  • src/slack/ — OAuth flows, signature verification (integration tests)

The Red-Green-Refactor Cycle

RED:   Write a failing test that describes the desired behavior.
       Name it precisely: what component, what input, what expected output.
       Run it. Watch it fail. Confirm it fails for the right reason.

GREEN: Write the minimum code to make the test pass.
       No gold-plating. No "while I'm here" refactors.
       Run the test. Watch it pass.

REFACTOR: Clean up the implementation without changing behavior.
          Extract constants. Rename variables. Simplify logic.
          Tests must still pass after every refactor step.
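A compressed illustration of one cycle, using a hypothetical classifySeverity helper. The name, signature, and thresholds below are illustrative, not the tuned values from src/scoring/:

```typescript
// GREEN-stage minimum implementation for a hypothetical severity classifier.
// Thresholds are placeholders for illustration, not the product's tuned values.
type Severity = "none" | "warning" | "critical";

function classifySeverity(zScore: number, novelInstanceType: boolean): Severity {
  if (zScore > 5.0 && novelInstanceType) return "critical"; // exactly what the RED test demanded
  if (zScore > 3.0) return "warning";
  return "none";
}

// The RED-stage test, written first and watched to fail:
// "returns critical severity when z-score exceeds 5.0 and instance type is novel"
classifySeverity(5.1, true); // → "critical"
```

Only after the test passes does the REFACTOR step extract the magic numbers into named constants.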

Test Naming Convention

All tests follow the pattern: [unit under test] [scenario] [expected outcome]

// ✅ Good — precise, readable, searchable
describe('scoreAnomaly', () => {
  it('returns critical severity when z-score exceeds 5.0 and instance type is novel', () => {});
  it('returns none severity when account is in cold-start and cost is below $0.50/hr', () => {});
  it('returns warning severity when actor is novel but cost is within 2 standard deviations', () => {});
  it('compounds severity when multiple signals fire simultaneously', () => {});
});

// ❌ Bad — vague, not searchable
describe('scoring', () => {
  it('works correctly', () => {});
  it('handles edge cases', () => {});
});

Decision Log Requirement

Per Transparent Factory tenet (Story 10.3), any PR touching src/scoring/, src/baseline/, or src/detection/ must include a docs/decisions/<YYYY-MM-DD>-<slug>.json file. The test suite validates this in CI.

An example decision file:

{
  "prompt": "Should Z-score threshold be 2.5 or 3.0?",
  "reasoning": "At 2.5, false positive rate in design partner data was 28%. At 3.0, it dropped to 18% with only 2 additional missed true positives over 30 days.",
  "alternatives_considered": ["2.0 (too noisy)", "3.5 (misses too many real anomalies)"],
  "confidence": "medium",
  "timestamp": "2026-02-28T10:00:00Z",
  "author": "brian"
}