Add Gemini TDD reviews for all 6 products
P1, P2, P3, P4, P6 reviewed by Gemini subagents. P5 reviewed manually (Gemini credential errors). All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
products/03-alert-intelligence/test-architecture/review.md
# TDD Consultant Review: dd0c/alert Test Architecture

**Reviewer:** Max Mayfield (Senior TDD Consultant)
**Date:** Feb 28, 2026
Alright, I read through the test architecture and the epics. You've got a lot of buzzwords in here, and the general structure isn't terrible, but there are some gaping holes you're casually ignoring. A safety-critical observability tool that "never eats alerts" shouldn't have blind spots this big.

Here is my brutal, unfiltered breakdown.

---
## 1. Coverage Gaps Per Epic

Let's compare your `test-architecture.md` to `epics.md`. You conveniently "forgot" to write tests for half the product.
* **Epic 1 (Webhook Ingestion) & Epic 2:** Story 1.4 requires saving the raw payload asynchronously to S3 for audit/replay. I see *zero* mention of S3 storage testing in your architecture. If that async save fails, you lose the audit trail.
* **Epic 5 (Slack Bot):** Story 5.3 introduces interactive feedback actions (clicking buttons in Slack). Section 3.6 tests the *formatting* of these buttons, but there are no tests for the API Gateway endpoint that receives the Slack interaction payload, validates it, and updates the DB.
* **Epic 6 (Dashboard API) & Epic 7 (Dashboard UI):** Completely AWOL. You have a whole API with JWT auth, pagination, and analytics endpoints (Stories 6.1 - 6.4), plus a React SPA (Stories 7.1 - 7.4). Your test architecture stops at Slack. How are you testing JWT validation? Tenant isolation on API reads? React component rendering? Total ghost town.
* **Epic 9 (Onboarding/PLG):** Story 9.4 mandates free tier limitations (max 10,000 alerts/month, purging data after 7 days). Where are the tests proving the ingestion API drops payloads at the quota limit? Missing.
## 2. TDD Workflow Critique

Your Section 1.2 "Red-Green-Refactor" adaptation is mostly fine until you start making excuses.
* **"When integration tests lead (test-after)"**: Classic trap. Writing parsers against real payloads and "locking them in" later means your design is tightly coupled to the implementation from day one. You should still write the failing unit test for the parser behavior *first*, using the fixture as the input.
* **"When E2E tests lead"**: Building backward from a 60-second TTV happy path isn't TDD, it's ATDD/BDD. That's fine, but call it what it is. Don't pretend you're doing strict TDD here.
* **Mocking Pure Functions:** You claim parsers are pure functions in 3.1, which is great. But then in your contract tests (4.1), you spin up LocalStack and fire HTTP requests just to see if the parser output hits SQS. You're bleeding domain logic into integration tests.
## 3. Test Pyramid Balance

You claim 70% Unit, 20% Integration, 10% E2E. For a "solo founder," 20 E2E tests and 100 Integration tests running on LocalStack + Testcontainers (Redis, DynamoDB, Timescale, WireMock) is going to be an anchor around your neck.
* Your CI pipeline (7.2) claims the PR Gate takes <5 mins with Testcontainers spinning up. I'll believe that when I see it. Java/Node + 4 heavy Docker containers pulling and starting will easily push you past 10 minutes unless your caching is god-tier.
* You are over-relying on heavy integration tests for things that should be tested in-memory with repository fakes. You have `ioredis-mock` in unit tests (3.4) but then test the exact same TTL logic in integration (4.2). Pick a lane.
## 4. Anti-Patterns

You've got a few nasty anti-patterns mixed in with your "best practices."
* **Mocking Pure Logic:** You say `ioredis-mock` is for "unit tests" (3.4). Redis *is* infrastructure. If you're unit testing the correlation engine with `ioredis-mock`, you aren't doing unit testing; you're doing flaky integration testing with a half-baked mock. A true unit test should use an in-memory repository fake implementing a `WindowStore` interface. Save Redis for the integration adapter test.
* **Time Manipulation with Sinon:** Section 3.4 explicitly calls out `sinon.useFakeTimers()` for the correlation engine. This is a disaster waiting to happen across async boundaries, especially with `ioredis-mock`. Pass a `Clock` interface or a `getCurrentTime()` function into your correlation engine instead of monkey-patching the Node.js event loop.
* **Testing Slack Rate Limits with WireMock:** In 4.5, you test Slack rate limits (1 msg/sec) using WireMock. WireMock won't accurately simulate Slack's complex rate limiting (bucket depletion and `Retry-After` headers) unless you script it perfectly. This will give you a false sense of security.
* **Table-Driven Tests:** You claim heavy use of table-driven tests (3.4), but your pseudo-code is just standard `describe/it` blocks. If you are doing TDD for scoring, you should be using `.each` arrays defining input state -> expected score.
## 5. Transparent Factory Tenet Gaps

Your TF Tenets (Section 8) look nice on paper, but I found the holes.
* **Atomic Flagging (8.1):** You test the circuit breaker, which is good. But where is the test for the OpenFeature provider itself? How do you guarantee the feature flags fall back to the safe default (off) when the JSON file provider or env vars are missing/malformed? Missing.
* **Elastic Schema (8.2):** Migration linting is cool, but what about the dual-write rollback scenarios? If V2 deployment fails, can V1 code still read the alerts written during the 5-minute canary? No tests for backward compatibility during the migration window.
* **Semantic Observability (8.4):** You assert spans are created, but you completely miss **Trace Context Propagation**. If the `trace_id` isn't passed from API Gateway -> Lambda -> SQS -> ECS Correlation Engine, your observability is dead on arrival. You need a test proving that the span parent-child relationship crosses the SQS boundary correctly.
## 6. Performance Test Gaps

You wrote tests for 1000 webhooks/second (6.1). Cool. But you forgot the real world.
* **Payload Size Spikes:** You're testing with `makeAlertPayload()`, which generates a tiny JSON. What happens when a Datadog monitor with 50 tags and a huge custom message drops a 2MB payload on your API Gateway? Does Lambda choke? Does SQS reject it (SQS limit is 256KB)? Missing test.
* **Database Growth Over Time:** Your TimescaleDB test (6.2) assumes a clean slate. What's the query latency after 30 days of continuous aggregates and millions of rows?
* **DynamoDB Burst Throttling:** No tests for DynamoDB throttling under burst. A 500-alert storm hitting a new tenant partition will get throttled.
## 7. Missing Scenarios (The Reality Check)

A system like dd0c/alert isn't going to break on the happy path. It's going to fail on edge cases you haven't considered.
* **HMAC Bypass (3.2):** What if the signature header is sent as `dd-webhook-signature` vs `DD-WEBHOOK-SIGNATURE`? What if it's completely omitted? You claim to test missing headers, but you missed testing timing attacks on the string comparison itself. Use `crypto.timingSafeEqual` and write a test that enforces it's not a generic `===` comparison.
* **Multi-Tenancy Bleed (4.3):** You test that Redis windows isolate per tenant (4.2), but you forgot to test DynamoDB queries. What happens if a bad `partition_key` logic leaks `tenant_A` incidents into `tenant_B`'s dashboard? You need a strict assertion enforcing Tenant ID injection on all Data Access Objects.
* **DLQ Overflow (8.1):** Your circuit breaker replays from the DLQ. What if the DLQ has 100,000 alerts? Replaying them all at once into the correlation engine will just trip the breaker again or OOM the ECS task. You need a backpressure or batching test for the replay mechanism.
* **Slack Rate Limits (4.5):** You test retries on 429, but what if Slack totally blocks your app due to sustained 429s (like 1,000 requests in a minute)? You don't have a test for a circuit breaker *to* Slack.
---

## 8. Top 5 Recommendations

Before you write another line of code, fix these.
1. **Stop Ignoring Half the App:** You need an Epic 6 and Epic 7 section in this architecture document. Test the API JWT auth, pagination, tenant isolation, and the React SPA. Also, test the OAuth signup and free tier limits from Epic 9. A solo founder can't afford a broken onboarding flow.
2. **Fix the Anti-Pattern in Correlation Engine Tests:** Drop `ioredis-mock` and `sinon.useFakeTimers()`. Create a `WindowStore` interface and use an in-memory implementation for unit tests. Inject a clock. Save the real Redis and Testcontainers for a single integration test class.
3. **Add Trace Context Propagation Tests:** If the trace ID gets lost between API Gateway, Lambda, SQS, and the ECS Fargate worker, your Semantic Observability is useless. Write a test asserting that the span parent-child relationship crosses the SQS boundary correctly.
4. **Handle the Edge Cases (HMAC & DLQ):** Write a test specifically enforcing `crypto.timingSafeEqual` for HMAC. For the DLQ replay, add a test that asserts backpressure—replay the alerts in manageable batches, not a 100k message firehose.
5. **Test Large Payloads (The 256KB SQS Limit):** SQS has a hard 256KB limit. Write a test where Datadog sends a 500KB payload (which happens with large stack traces or massive tag arrays). Your ingestion Lambda needs to compress it, chunk it, or strip unnecessary fields *before* hitting SQS.
You're trying to move fast, which is fine, but you're missing the exact things that will wake you up at 3 AM. Fix these, and you'll actually have a "safety-critical observability tool."

— Max