dd0c/products/05-aws-cost-anomaly/test-architecture/review.md
Max Mayfield 2fe0ed856e Add Gemini TDD reviews for all 6 products
P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
2026-03-01 00:29:24 +00:00


dd0c/cost — Test Architecture Review

Reviewer: TDD Consultant (Manual Review)
Date: March 1, 2026
Verdict: 🔴 NEEDS SIGNIFICANT WORK — this is the weakest test architecture of all 6 products.


1. Coverage Analysis

This document is 232 lines. For comparison, P1 (route) is 2,241 lines and P6 (run) is 1,762 lines. The coverage gaps are massive.

| Epic | Coverage Status | Notes |
| --- | --- | --- |
| Epic 1: CloudTrail Ingestion | ⚠️ Partial | Section 3.1 has 5 test cases for the normalizer. Missing: SQS FIFO deduplication tests, DLQ retry behavior, EventBridge cross-account rule tests, S3 raw event archival. Story 1.2 (SQS + DLQ) has zero dedicated tests. |
| Epic 2: Anomaly Detection | Decent | Section 3.2 covers Z-score, novelty, cold-start. But missing: composite score weighting tests, edge cases (zero stddev, negative costs, NaN handling), baseline maturity transition tests. |
| Epic 3: Zombie Hunter | Missing | Zero test cases. The daily scan for idle/stopped resources that are still costing money has no tests at all. |
| Epic 4: Notification & Remediation | ⚠️ Thin | Section 4.2 has 3 integration tests for cross-account actions. Missing: Slack Block Kit formatting tests, daily digest aggregation, snooze/dismiss logic, interactive payload signature validation. |
| Epic 5: Onboarding & PLG | Missing | Zero test cases. CloudFormation template generation, Stripe billing, free tier enforcement — none tested. |
| Epic 6: Dashboard API | Missing | Zero test cases. REST API endpoints, tenant isolation, query performance — nothing. |
| Epic 7: Dashboard UI | Missing | Zero test cases. |
| Epic 8: Infrastructure (CDK) | Missing | Zero test cases. No CDK snapshot tests, no infrastructure drift detection (ironic). |
| Epic 9: Multi-Account Management | Missing | Zero test cases. Account linking, bulk scanning, cross-account permissions — nothing. |
| Epic 10: Transparent Factory | 🔴 Skeletal | Section 8 has exactly 3 test cases total across 2 of 5 tenets. Elastic Schema, Cognitive Durability, and Semantic Observability have zero tests. |

Bottom line: 5 of 10 epics have zero test coverage in this document. This is a skeleton, not a test architecture.


2. TDD Workflow Critique

The philosophy in Section 1 is sound — "test the math first" is correct for an anomaly detection product. But the execution is incomplete:

  • The "strict TDD" list correctly identifies scoring and governance as test-first. Good.
  • The "integration tests lead" for CloudTrail ingestion is acceptable.
  • Missing: No guidance on testing the Welford algorithm implementation. This is a numerical algorithm with known floating-point edge cases (catastrophic cancellation with large values). The test architecture should mandate property-based testing (e.g., fast-check) for the baseline calculator, not just 3 example-based tests.
  • Missing: No guidance on testing the 14-day auto-promotion state machine. This is a time-dependent state transition that needs fake clock testing.
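To make the property-based recommendation concrete, here is a dependency-free sketch of the core property for the baseline calculator: Welford's running statistics must agree with the naive two-pass formula on any sample. With fast-check, the predicate below would simply be wrapped in a property over generated number arrays; all names here are illustrative, not the product's actual API.

```typescript
// Baseline state for Welford's online mean/variance algorithm.
interface Baseline { count: number; mean: number; m2: number; }

function update(b: Baseline, x: number): Baseline {
  const count = b.count + 1;
  const delta = x - b.mean;
  const mean = b.mean + delta / count;
  const m2 = b.m2 + delta * (x - mean);
  return { count, mean, m2 };
}

function sampleVariance(b: Baseline): number {
  return b.count < 2 ? 0 : b.m2 / (b.count - 1);
}

// Reference implementation: naive two-pass mean/variance.
function naiveStats(xs: number[]): { mean: number; variance: number } {
  const mean = xs.reduce((a, x) => a + x, 0) / xs.length;
  const variance = xs.length < 2 ? 0
    : xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (xs.length - 1);
  return { mean, variance };
}

// The property: for any non-empty sample, Welford agrees with the two-pass formula.
function welfordMatchesNaive(xs: number[]): boolean {
  const b = xs.reduce(update, { count: 0, mean: 0, m2: 0 });
  const n = naiveStats(xs);
  const close = (a: number, c: number) =>
    Math.abs(a - c) <= 1e-9 * Math.max(1, Math.abs(c));
  return close(b.mean, n.mean) && close(sampleVariance(b), n.variance);
}
```

A property runner would then throw large values (the catastrophic-cancellation case), single-element samples, and negatives at `welfordMatchesNaive` automatically.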

3. Test Pyramid Balance

The 70/20/10 ratio is stated but not justified. For dd0c/cost:

  • Unit tests should be higher (80%) — the anomaly scoring engine is pure math. It should have exhaustive property-based tests, not just 50 example tests.
  • Integration tests (15%) — DynamoDB Single-Table patterns, EventBridge→SQS→Lambda pipeline, cross-account STS.
  • E2E (5%) — two journeys are fine for V1, but they need to be more detailed.

The current Section 6 (Performance) has exactly 2 test cases. For a product that processes CloudTrail events at scale, this is dangerously thin.


4. Anti-Patterns

  1. Section 3.3 — Welford Algorithm: Only 3 tests for a numerical algorithm. This is the "happy path only" anti-pattern. Missing: what happens when stddev is 0 (division by zero in Z-score)? What happens with a single data point? What happens with extremely large values (float overflow)?

  2. Section 4.1 — DynamoDB Transaction Test: "writes CostEvent and updates Baseline in single transaction" — this tests the happy path. Where's the test for transaction failure? DynamoDB transactions are atomic, but they can be cancelled outright due to conflicts or throttling, and the system must detect the cancellation and retry, or the CostEvent and Baseline records silently drift apart.

  3. Section 5 — E2E Journeys: Journey 2 tests "Stop Instance" remediation but doesn't test what happens when the customer's IAM role has been revoked between alert and remediation click. This is a real-world race condition.

  4. No negative tests anywhere. What happens when CloudTrail sends malformed JSON? What happens when the pricing table doesn't have the instance type? (Section 3.1 mentions "fallback pricing" but there's only 1 test for it.)
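The negative paths in point 4 are cheap to pin down. A minimal sketch of the two failure modes, with hypothetical function names and an assumed fallback rate (the source only says "fallback pricing" exists):

```typescript
// Assumed default rate when the pricing table misses an instance type.
const FALLBACK_HOURLY_USD = 0.05;

// Malformed CloudTrail JSON should be rejected (e.g. routed to the DLQ),
// not crash the consumer Lambda.
function safeParseEvent(raw: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// Unknown instance types fall back to a default rate instead of throwing.
function hourlyPrice(pricing: Map<string, number>, instanceType: string): number {
  return pricing.get(instanceType) ?? FALLBACK_HOURLY_USD;
}
```

Each branch here is one of the missing negative tests: garbage input, non-object JSON, and a pricing-table miss.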


5. Transparent Factory Gaps

Section 8 is the biggest problem. It has 3 test cases across 2 tenets. Here's what's missing:

Atomic Flagging (1 test → needs ~10)

  • Missing: flag default state (off), flag TTL enforcement, flag owner metadata, local evaluation (no network calls), CI block on expired flags, multiple concurrent flags.
  • The single circuit breaker test uses ">10 alerts/hour" but Epic 10.1 specifies ">3x baseline" — inconsistency.
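A sketch of what the missing TTL and owner-metadata checks could assert; the flag shape is an assumption, since the document never shows one:

```typescript
// Hypothetical flag metadata; owner and expiry are the fields the review
// says currently go untested.
interface FeatureFlag {
  name: string;
  owner: string;
  enabled: boolean;
  expiresAt: number; // epoch millis
}

// A CI gate could fail the build whenever this returns a non-empty list.
function expiredFlags(flags: FeatureFlag[], now: number): string[] {
  return flags.filter((f) => f.expiresAt <= now).map((f) => f.name);
}
```

Taking `now` as a parameter keeps the check deterministic in tests, with no real clock involved.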

Elastic Schema (0 tests → needs ~8)

  • Zero tests. Need: migration lint (no DROP/RENAME/TYPE), additive-only DynamoDB attribute changes, V1 code ignoring V2 attributes, sunset date enforcement, dual-write during migration.
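For the migration lint, an additive-only check over a declared DynamoDB attribute map could look like this; representing the "schema" as a name-to-type record is an assumption (DynamoDB itself is schemaless, so the product would have to declare one):

```typescript
// Attribute name -> DynamoDB type descriptor ("S", "N", "B", ...).
type Schema = Record<string, string>;

// Additive-only rule: v2 may add attributes, but must never remove
// or re-type any attribute that v1 declared.
function lintSchemaChange(v1: Schema, v2: Schema): string[] {
  const violations: string[] = [];
  for (const [attr, type] of Object.entries(v1)) {
    if (!(attr in v2)) violations.push(`removed attribute: ${attr}`);
    else if (v2[attr] !== type) violations.push(`type change: ${attr}`);
  }
  return violations;
}
```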

Cognitive Durability (0 tests → needs ~5)

  • Zero tests. Need: decision log schema validation, CI enforcement for scoring PRs, cyclomatic complexity gate, decision log presence check.

Semantic Observability (0 tests → needs ~8)

  • Zero tests. Need: OTEL span emission on every anomaly scoring decision, span attributes (cost.anomaly_score, cost.z_score, cost.baseline_days), PII protection (account ID hashing), fast-path span attributes.
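The span-attribute tests could pin down the exact attribute set. The attribute names come from the list above; the hashing scheme (SHA-256, truncated) is an assumed policy for the PII-protection requirement:

```typescript
import { createHash } from "node:crypto";

// Builds the attributes an anomaly-scoring span would carry;
// the account ID is hashed, never emitted raw.
function anomalySpanAttributes(
  anomalyScore: number,
  zScore: number,
  baselineDays: number,
  accountId: string,
): Record<string, number | string> {
  return {
    "cost.anomaly_score": anomalyScore,
    "cost.z_score": zScore,
    "cost.baseline_days": baselineDays,
    "cost.account_hash": createHash("sha256").update(accountId).digest("hex").slice(0, 16),
  };
}
```

A test can then assert both presence of the required keys and absence of the raw account ID anywhere in the attribute values.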

Configurable Autonomy (2 tests → needs ~8)

  • The 14-day auto-promotion tests are good but incomplete. Missing: panic mode activation (<1s), panic mode stops all alerting, per-account governance override, policy decision logging, governance drift monitoring.
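The missing fake-clock tests become trivial once the state machine takes the current time as a parameter instead of reading the wall clock. The mode names and panic semantics below are assumptions sketched from the review's description, not the product's actual interface:

```typescript
type Mode = "observe" | "recommend" | "auto";

interface GovernanceState {
  mode: Mode;
  cleanSinceMs: number; // when the account last entered a clean streak
  panic: boolean;
}

const FOURTEEN_DAYS_MS = 14 * 24 * 60 * 60 * 1000;

// Pure transition function: injecting `nowMs` lets tests advance a fake clock.
function evaluate(state: GovernanceState, nowMs: number): GovernanceState {
  if (state.panic) return { ...state, mode: "observe" }; // panic mode revokes autonomy
  if (state.mode === "recommend" && nowMs - state.cleanSinceMs >= FOURTEEN_DAYS_MS) {
    return { ...state, mode: "auto" }; // 14 clean days -> auto-promotion
  }
  return state;
}
```

Tests then assert the boundary exactly: 13 days 23:59:59 stays in `recommend`, 14 days flips to `auto`, and panic wins over everything.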

6. Performance Test Gaps

Section 6 has 2 tests. For a real-time cost monitoring product, this is inadequate:

  • Missing: Burst ingestion (what happens when 1000 CloudTrail events arrive in 1 second during an auto-scaling event?)
  • Missing: Baseline calculation performance with 90 days of historical data per account
  • Missing: Anomaly scoring latency under concurrent multi-account evaluation
  • Missing: DynamoDB hot partition detection (all events for one account hitting the same partition key)
  • Missing: SQS FIFO throughput limits (300 msg/s per message group — what happens when a large account exceeds this?)
  • Missing: Lambda cold start impact on end-to-end latency

7. Missing Test Scenarios

Security

  • CloudTrail event forgery: What if someone sends fake CloudTrail events to the EventBridge bus? HMAC/signature validation?
  • Slack interactive payload signature: Slack signs interactive payloads with the app's signing secret (the X-Slack-Signature header); the secret itself is never transmitted. No test validates this signature.
  • Cross-account IAM role revocation: Customer revokes the dd0c role between alert and remediation click.
  • Remediation authorization: Who can click "Terminate"? No RBAC tests for remediation actions.
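Slack's v0 signing scheme is documented: the signature is an HMAC-SHA256 of `v0:{timestamp}:{body}` keyed by the app's signing secret. A verification helper the missing test could exercise:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies Slack's X-Slack-Signature header (v0 scheme) in constant time.
function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  body: string,
  signature: string,
): boolean {
  const expected =
    "v0=" + createHmac("sha256", signingSecret).update(`v0:${timestamp}:${body}`).digest("hex");
  if (expected.length !== signature.length) return false;
  return timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}
```

A complete test suite should also reject stale timestamps (replay protection), which this sketch deliberately omits.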

Data Integrity

  • CloudTrail event deduplication: CloudTrail can send duplicate events. SQS FIFO dedup is mentioned in Epic 1.2 but has zero tests.
  • Baseline corruption recovery: What if a DynamoDB write partially fails and corrupts the running mean/stddev? No recovery tests.
  • Pricing table staleness: Static pricing tables will become stale. No test validates that the system handles unknown instance types gracefully beyond the single "fallback pricing" test.
  • Cost calculation precision: Floating-point arithmetic on money. No tests for rounding behavior or currency precision.
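One standard answer to the precision gap is to do cost arithmetic in integer cents and convert at the edges; whether dd0c/cost should use cents or a finer unit is an open design question, but the test shape is the same either way:

```typescript
// Convert dollars to integer cents once at the boundary.
function toCents(usd: number): number {
  return Math.round(usd * 100);
}

// Summing in integer cents avoids accumulating binary-float drift.
function sumCostsUsd(usdAmounts: number[]): number {
  const cents = usdAmounts.reduce((acc, usd) => acc + toCents(usd), 0);
  return cents / 100;
}
// In raw floats, 0.1 + 0.2 === 0.30000000000000004; summed in cents it is exactly 0.3.
```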

Operational

  • DLQ overflow: What happens when the DLQ fills up? No backpressure tests.
  • Multi-tenant isolation: No tests ensuring one tenant's anomalies don't leak to another tenant's Slack channel.
  • Account onboarding race condition: What if CloudTrail events arrive before the account is fully onboarded?

8. Top 5 Recommendations (Prioritized)

  1. Expand to cover all 10 epics. 5 epics have zero tests. At minimum, add unit test stubs for Zombie Hunter (Epic 3), Onboarding (Epic 5), and Dashboard API (Epic 6). These are customer-facing features.

  2. Rewrite Section 8 (Transparent Factory) from scratch. 3 tests across 2 tenets is unacceptable. Every tenet needs 5-10 tests. The Elastic Schema and Semantic Observability sections are completely empty.

  3. Add property-based testing for the anomaly math. The Welford algorithm, Z-score calculation, and composite scoring are numerical — they need fast-check or equivalent, not just example-based tests. Test edge cases: zero stddev, single data point, NaN, Infinity, negative costs.

  4. Add security tests. Slack payload signature validation, CloudTrail event authenticity, cross-account IAM revocation handling, remediation RBAC. This product executes StopInstances and DeleteDBInstance — security testing is non-negotiable.

  5. Expand performance section to 10+ tests. Burst ingestion, baseline calculation at scale, DynamoDB hot partitions, SQS FIFO throughput limits, Lambda cold starts. The current 2 tests give zero confidence in production readiness.
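The edge cases in recommendation 3 can be made concrete with a guarded Z-score. The cap value for a flat baseline is an assumed policy, not something this document specifies; the point is that every branch below is a test the current architecture lacks:

```typescript
const Z_CAP = 10;      // assumed cap when the baseline has zero spread
const EPSILON = 1e-9;  // treat stddev below this as zero

// Z-score that never divides by zero and rejects non-finite inputs (NaN, Infinity).
function guardedZScore(value: number, mean: number, stddev: number): number {
  if (![value, mean, stddev].every(Number.isFinite) || stddev < 0) {
    throw new RangeError("invalid input to guardedZScore");
  }
  if (stddev < EPSILON) {
    // Flat baseline: no deviation scores 0, any deviation saturates at the cap.
    return value === mean ? 0 : Math.sign(value - mean) * Z_CAP;
  }
  return (value - mean) / stddev;
}
```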


This document needs a complete rewrite before it can guide TDD implementation. The scoring engine tests are a good start, but everything else is a placeholder.