Add Gemini TDD reviews for all 6 products

P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
2026-03-01 00:29:24 +00:00
parent 1101fef096
commit 2fe0ed856e
6 changed files with 501 additions and 0 deletions


# dd0c/cost — Test Architecture Review
**Reviewer:** TDD Consultant (Manual Review)
**Date:** March 1, 2026
**Verdict:** 🔴 NEEDS SIGNIFICANT WORK — This is the weakest test architecture of all 6 products.
---
## 1. Coverage Analysis
This document is 232 lines. For comparison, P1 (route) is 2,241 lines and P6 (run) is 1,762 lines. The coverage gaps are massive.
| Epic | Coverage Status | Notes |
|------|----------------|-------|
| Epic 1: CloudTrail Ingestion | ⚠️ Partial | Section 3.1 has 5 test cases for the normalizer. Missing: SQS FIFO deduplication tests, DLQ retry behavior, EventBridge cross-account rule tests, S3 raw event archival. Story 1.2 (SQS + DLQ) has zero dedicated tests. |
| Epic 2: Anomaly Detection | ✅ Decent | Section 3.2 covers Z-score, novelty, cold-start. But missing: composite score weighting tests, edge cases (zero stddev, negative costs, NaN handling), baseline maturity transition tests. |
| Epic 3: Zombie Hunter | ❌ Missing | Zero test cases. The daily scan for idle/stopped resources that are still costing money has no tests at all. |
| Epic 4: Notification & Remediation | ⚠️ Thin | Section 4.2 has 3 integration tests for cross-account actions. Missing: Slack Block Kit formatting tests, daily digest aggregation, snooze/dismiss logic, interactive payload signature validation. |
| Epic 5: Onboarding & PLG | ❌ Missing | Zero test cases. CloudFormation template generation, Stripe billing, free tier enforcement — none tested. |
| Epic 6: Dashboard API | ❌ Missing | Zero test cases. REST API endpoints, tenant isolation, query performance — nothing. |
| Epic 7: Dashboard UI | ❌ Missing | Zero test cases. |
| Epic 8: Infrastructure (CDK) | ❌ Missing | Zero test cases. No CDK snapshot tests, no infrastructure drift detection (ironic). |
| Epic 9: Multi-Account Management | ❌ Missing | Zero test cases. Account linking, bulk scanning, cross-account permissions — nothing. |
| Epic 10: Transparent Factory | 🔴 Skeletal | Section 8 has exactly 3 test cases total across 2 of 5 tenets. Elastic Schema, Cognitive Durability, and Semantic Observability have zero tests. |
**Bottom line:** 5 of 10 epics have zero test coverage in this document. This is a skeleton, not a test architecture.
---
## 2. TDD Workflow Critique
The philosophy in Section 1 is sound — "test the math first" is correct for an anomaly detection product. But the execution is incomplete:
- The "strict TDD" list correctly identifies scoring and governance as test-first. Good.
- The "integration tests lead" for CloudTrail ingestion is acceptable.
- **Missing:** No guidance on testing the Welford algorithm implementation. This is a numerical algorithm with known floating-point edge cases (catastrophic cancellation with large values). The test architecture should mandate property-based testing (e.g., `fast-check`) for the baseline calculator, not just 3 example-based tests.
- **Missing:** No guidance on testing the 14-day auto-promotion state machine. This is a time-dependent state transition that needs fake clock testing.
---
## 3. Test Pyramid Balance
The 70/20/10 ratio is stated but not justified. For dd0c/cost:
- **Unit tests should be higher (80%)** — the anomaly scoring engine is pure math. It should have exhaustive property-based tests, not just 50 example tests.
- **Integration tests (15%)** — DynamoDB Single-Table patterns, EventBridge→SQS→Lambda pipeline, cross-account STS.
- **E2E (5%)** — two journeys are fine for V1, but each needs more detail.
The current Section 6 (Performance) has exactly 2 test cases. For a product that processes CloudTrail events at scale, this is dangerously thin.
---
## 4. Anti-Patterns
1. **Section 3.3 — Welford Algorithm:** Only 3 tests for a numerical algorithm. This is the "happy path only" anti-pattern. Missing: what happens when stddev is 0 (division by zero in Z-score)? What happens with a single data point? What happens with extremely large values (float overflow)?
2. **Section 4.1 — DynamoDB Transaction Test:** "writes CostEvent and updates Baseline in single transaction" — this tests the happy path. Where's the test for transaction failure? DynamoDB transactions can fail due to conflicts, and the system must handle partial writes.
3. **Section 5 — E2E Journeys:** Journey 2 tests "Stop Instance" remediation but doesn't test what happens when the customer's IAM role has been revoked between alert and remediation click. This is a real-world race condition.
4. **No negative tests anywhere.** What happens when CloudTrail sends malformed JSON? What happens when the pricing table doesn't have the instance type? (Section 3.1 mentions "fallback pricing" but there's only 1 test for it.)
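As an illustration of anti-pattern 1, here is what a guard-first Z-score test could look like. The helper name and the zero-stddev fallback (return 0 and defer flat-baseline deviations to a separate novelty rule) are assumptions, not the product's documented behavior:

```typescript
// Hypothetical Z-score helper; the fallbacks are assumed policies.
function zScore(cost: number, mean: number, stddev: number): number {
  if (stddev === 0) return 0;             // flat baseline: no meaningful Z
  if (!Number.isFinite(cost)) return NaN; // surface bad input, don't alert on it
  return (cost - mean) / stddev;
}

// The edge cases the review calls out, as executable expectations:
if (zScore(50, 10, 0) !== 0) throw new Error("zero stddev must not divide by zero");
if (zScore(20, 10, 5) !== 2) throw new Error("normal case");
if (!Number.isNaN(zScore(NaN, 10, 5))) throw new Error("NaN input must propagate, not alert");
```

A single data point yields stddev 0, so the same guard covers that case; float-overflow inputs should get an analogous explicit test.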
---
## 5. Transparent Factory Gaps
Section 8 is the biggest problem. It has 3 test cases across 2 tenets. Here's what's missing:
### Atomic Flagging (1 test → needs ~10)
- Missing: flag default state (off), flag TTL enforcement, flag owner metadata, local evaluation (no network calls), CI block on expired flags, multiple concurrent flags.
- The single circuit breaker test uses ">10 alerts/hour" but Epic 10.1 specifies ">3x baseline" — inconsistency.
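One of the missing flag tests (TTL enforcement as a CI gate) could be sketched as follows. The flag record shape and field names are assumptions, not the product's actual flag schema:

```typescript
// Hypothetical flag record; field names are illustrative assumptions.
interface FeatureFlag {
  name: string;
  owner: string;
  defaultOn: boolean;  // tenet: flags must default off
  expiresAt: string;   // ISO date; tenet: flags must carry a TTL
}

// CI-style check: return the names of flags past their TTL so the
// build can fail with an actionable message.
function expiredFlags(flags: FeatureFlag[], now: Date): string[] {
  return flags
    .filter((f) => new Date(f.expiresAt).getTime() < now.getTime())
    .map((f) => f.name);
}
```

The same pattern extends to the other gaps: assert `defaultOn === false` for every flag, assert `owner` is non-empty, and assert evaluation is a pure local lookup with no network calls.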
### Elastic Schema (0 tests → needs ~8)
- Zero tests. Need: migration lint (no DROP/RENAME/TYPE), additive-only DynamoDB attribute changes, V1 code ignoring V2 attributes, sunset date enforcement, dual-write during migration.
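A minimal version of the migration lint could be a token check over migration text; the forbidden-token list below is taken from the tenet description, and the patterns are illustrative, not an actual lint config:

```typescript
// Additive-only migration lint sketch: flag destructive DDL tokens.
const FORBIDDEN: RegExp[] = [
  /\bDROP\b/i,                          // no dropping tables/attributes
  /\bRENAME\b/i,                        // renames break old readers
  /\bALTER\s+COLUMN\b[\s\S]*\bTYPE\b/i, // no in-place type changes
];

// Returns the patterns a migration violates; empty array means it passes.
function lintMigration(migrationText: string): string[] {
  return FORBIDDEN.filter((re) => re.test(migrationText)).map((re) => re.source);
}
```

For DynamoDB specifically the equivalent test is behavioral rather than textual: write an item with an unknown V2 attribute and assert the V1 reader ignores it instead of throwing.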
### Cognitive Durability (0 tests → needs ~5)
- Zero tests. Need: decision log schema validation, CI enforcement for scoring PRs, cyclomatic complexity gate, decision log presence check.
### Semantic Observability (0 tests → needs ~8)
- Zero tests. Need: OTEL span emission on every anomaly scoring decision, span attributes (cost.anomaly_score, cost.z_score, cost.baseline_days), PII protection (account ID hashing), fast-path span attributes.
### Configurable Autonomy (2 tests → needs ~8)
- The 14-day auto-promotion tests are good but incomplete. Missing: panic mode activation (<1s), panic mode stops all alerting, per-account governance override, policy decision logging, governance drift monitoring.
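The fake-clock approach recommended in Section 2 applies directly here. A sketch of the promotion/panic state machine with an injected clock; the 14-day window comes from the review, while the mode names and API are assumptions:

```typescript
type Mode = "observe" | "autonomous" | "panic";

// Hypothetical governance state machine; the injected clock makes the
// 14-day auto-promotion testable without waiting 14 real days.
class Governance {
  private mode: Mode = "observe";
  private readonly since: number;

  constructor(private readonly clock: () => number) {
    this.since = clock();
  }

  tick(): Mode {
    if (this.mode === "observe" && this.clock() - this.since >= 14 * 86_400_000) {
      this.mode = "autonomous"; // auto-promotion after 14 clean days
    }
    return this.mode;
  }

  panic(): void {
    this.mode = "panic"; // must halt all alerting, regardless of prior mode
  }

  get current(): Mode {
    return this.mode;
  }
}
```

A test then sets a fake `now`, advances it to day 13 (still `observe`), day 14 (`autonomous`), and calls `panic()` to assert the override wins; the <1s activation requirement becomes a latency assertion around that call.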
---
## 6. Performance Test Gaps
Section 6 has 2 tests. For a real-time cost monitoring product, this is inadequate:
- **Missing:** Burst ingestion (what happens when 1000 CloudTrail events arrive in 1 second during an auto-scaling event?)
- **Missing:** Baseline calculation performance with 90 days of historical data per account
- **Missing:** Anomaly scoring latency under concurrent multi-account evaluation
- **Missing:** DynamoDB hot partition detection (all events for one account hitting the same partition key)
- **Missing:** SQS FIFO throughput limits (300 msg/s per message group — what happens when a large account exceeds this?)
- **Missing:** Lambda cold start impact on end-to-end latency
---
## 7. Missing Test Scenarios
### Security
- **CloudTrail event forgery:** What if someone sends fake CloudTrail events to the EventBridge bus? HMAC/signature validation?
- **Slack interactive payload signature:** Slack sends a signing secret with interactive payloads. No test validates this.
- **Cross-account IAM role revocation:** Customer revokes the dd0c role between alert and remediation click.
- **Remediation authorization:** Who can click "Terminate"? No RBAC tests for remediation actions.
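For the Slack payload point: Slack's documented v0 scheme signs `v0:{timestamp}:{rawBody}` with the app's signing secret (HMAC-SHA256, hex, `v0=` prefix) and sends it in the `X-Slack-Signature` header. A testable verifier could look like this; the secret and header values in any test are made up, and `nowSeconds` is injected so the replay window is testable:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number
): boolean {
  // Reject replays: Slack recommends discarding requests older than 5 minutes.
  if (Math.abs(nowSeconds - Number(timestamp)) > 300) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  // Constant-time comparison to avoid leaking the signature via timing.
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The test suite should cover at least: a valid signature passes, a tampered body fails, and a stale timestamp fails even with a valid signature.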
### Data Integrity
- **CloudTrail event deduplication:** CloudTrail can send duplicate events. SQS FIFO dedup is mentioned in Epic 1.2 but has zero tests.
- **Baseline corruption recovery:** What if a DynamoDB write partially fails and corrupts the running mean/stddev? No recovery tests.
- **Pricing table staleness:** Static pricing tables will become stale. No test validates that the system handles unknown instance types gracefully beyond the single "fallback pricing" test.
- **Cost calculation precision:** Floating-point arithmetic on money. No tests for rounding behavior or currency precision.
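On precision: one common remedy is to keep money in integer micro-USD and pin the rounding rule down with tests. The unit choice and helper below are assumptions for illustration, not the product's actual convention:

```typescript
// Hypothetical proration helper: hourly price held as an integer count
// of micro-USD (1e-6 USD), rounded half-up per second of runtime.
function prorateMicroUsd(microUsdPerHour: number, seconds: number): number {
  return Math.round((microUsdPerHour * seconds) / 3600);
}
```

Tests should pin exact results for clean divisions and make the lossy cases explicit, e.g. that sub-micro-USD amounts round to zero, so the rounding behavior is a documented decision rather than an accident of `number` arithmetic.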
### Operational
- **DLQ overflow:** What happens when the DLQ fills up? No backpressure tests.
- **Multi-tenant isolation:** No tests ensuring one tenant's anomalies don't leak to another tenant's Slack channel.
- **Account onboarding race condition:** What if CloudTrail events arrive before the account is fully onboarded?
---
## 8. Top 5 Recommendations (Prioritized)
1. **Expand to cover all 10 epics.** 5 epics have zero tests. At minimum, add unit test stubs for Zombie Hunter (Epic 3), Onboarding (Epic 5), and Dashboard API (Epic 6). These are customer-facing features.
2. **Rewrite Section 8 (Transparent Factory) from scratch.** 3 tests across 2 tenets is unacceptable. Every tenet needs 5-10 tests. The Elastic Schema and Semantic Observability sections are completely empty.
3. **Add property-based testing for the anomaly math.** The Welford algorithm, Z-score calculation, and composite scoring are numerical — they need `fast-check` or equivalent, not just example-based tests. Test edge cases: zero stddev, single data point, NaN, Infinity, negative costs.
4. **Add security tests.** Slack payload signature validation, CloudTrail event authenticity, cross-account IAM revocation handling, remediation RBAC. This product executes `StopInstances` and `DeleteDBInstance` — security testing is non-negotiable.
5. **Expand performance section to 10+ tests.** Burst ingestion, baseline calculation at scale, DynamoDB hot partitions, SQS FIFO throughput limits, Lambda cold starts. The current 2 tests give zero confidence in production readiness.
---
*This document needs a complete rewrite before it can guide TDD implementation. The scoring engine tests are a good start, but everything else is a placeholder.*