Max Mayfield 2fe0ed856e Add Gemini TDD reviews for all 6 products
P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
2026-03-01 00:29:24 +00:00


TDD Consultant Review: dd0c/alert Test Architecture

Reviewer: Max Mayfield (Senior TDD Consultant)
Date: Feb 28, 2026

Alright, I read through the test architecture and the epics. You've got a lot of buzzwords in here, and the general structure isn't terrible, but there are some gaping holes you're casually ignoring. A safety-critical observability tool that "never eats alerts" shouldn't have blind spots this big.

Here is my brutal, unfiltered breakdown.


1. Coverage Gaps Per Epic

Let's compare your test-architecture.md to epics.md. You conveniently "forgot" to write tests for half the product.

  • Epic 1 (Webhook Ingestion) & Epic 2: Story 1.4 requires saving the raw payload asynchronously to S3 for audit/replay. I see zero mention of S3 storage testing in your architecture. If that async save fails, you lose the audit trail.
  • Epic 5 (Slack Bot): Story 5.3 introduces interactive feedback actions (clicking buttons in Slack). Section 3.6 tests the formatting of these buttons, but there are no tests for the API Gateway endpoint that receives the Slack interaction payload, validates it, and updates the DB.
  • Epic 6 (Dashboard API) & Epic 7 (Dashboard UI): Completely AWOL. You have a whole API with JWT auth, pagination, and analytics endpoints (Stories 6.1 - 6.4), plus a React SPA (Stories 7.1 - 7.4). Your test architecture stops at Slack. How are you testing JWT validation? Tenant isolation on API reads? React component rendering? Total ghost town.
  • Epic 9 (Onboarding/PLG): Story 9.4 mandates free tier limitations (max 10,000 alerts/month, purging data after 7 days). Where are the tests proving the ingestion API drops payloads at the quota limit? Missing.

2. TDD Workflow Critique

Your Section 1.2 "Red-Green-Refactor" adaptation is mostly fine until you start making excuses.

  • "When integration tests lead (test-after)": Classic trap. Writing parsers against real payloads and "locking them in" later means your design is tightly coupled to the implementation from day one. You should still write the failing unit test for the parser behavior first, using the fixture as the input.
  • "When E2E tests lead": Building backward from a 60-second TTV happy path isn't TDD, it's ATDD/BDD. That's fine, but call it what it is. Don't pretend you're doing strict TDD here.
  • Mocking Pure Functions: You claim parsers are pure functions in 3.1, which is great. But then in your contract tests (4.1), you spin up LocalStack and fire HTTP requests just to see if the parser output hits SQS. You're bleeding domain logic into integration tests.

3. Test Pyramid Balance

You claim 70% Unit, 20% Integration, 10% E2E. For a "solo founder," 20 E2E tests and 100 Integration tests running on LocalStack + Testcontainers (Redis, DynamoDB, Timescale, WireMock) is going to be an anchor around your neck.

  • Your CI pipeline (7.2) claims the PR Gate takes <5 mins with Testcontainers spinning up. I'll believe that when I see it. Java/Node + 4 heavy Docker containers pulling and starting will easily push you past 10 minutes unless your caching is god-tier.
  • You are over-relying on heavy integration tests for things that should be tested in-memory with repository fakes. You have ioredis-mock in unit tests (3.4) but then test the exact same TTL logic in integration (4.2). Pick a lane.

4. Anti-Patterns

You've got a few nasty anti-patterns mixed in with your "best practices."

  • Mocking Pure Logic: You say ioredis-mock is for "unit tests" (3.4). Redis is infrastructure. If you're unit testing the correlation engine with ioredis-mock, you aren't doing unit testing; you're doing flaky integration testing with a half-baked mock. A true unit test should use an in-memory repository fake implementing a WindowStore interface. Save Redis for the integration adapter test.
  • Time Manipulation with Sinon: Section 3.4 explicitly calls out sinon.useFakeTimers() for the correlation engine. This is a disaster waiting to happen across async boundaries, especially with ioredis-mock. Pass a Clock interface or a getCurrentTime() function into your correlation engine instead of monkey-patching the Node.js event loop.
  • Testing Slack Rate Limits with WireMock: In 4.5, you test Slack rate limits (1 msg/sec) using WireMock. WireMock won't accurately simulate Slack's complex rate limiting (bucket depletion and Retry-After headers) unless you script it perfectly. This will give you a false sense of security.
  • Table-Driven Tests mention: You claim heavy use of table-driven tests (3.4), but your pseudo-code is just standard describe/it blocks. If you are doing TDD for scoring, you should be using .each arrays defining input state -> expected score.

5. Transparent Factory Tenet Gaps

Your TF Tenets (Section 8) look nice on paper, but I found the holes.

  • Atomic Flagging (8.1): You test the circuit breaker, which is good. But where is the test for the OpenFeature provider itself? How do you guarantee the feature flags fall back to the safe default (off) when the JSON file provider or env vars are missing/malformed? Missing.
  • Elastic Schema (8.2): Migration linting is cool, but what about the dual-write rollback scenarios? If V2 deployment fails, can V1 code still read the alerts written during the 5-minute canary? No tests for backward compatibility during the migration window.
  • Semantic Observability (8.4): You assert spans are created, but you completely miss Trace Context Propagation. If the trace_id isn't passed from API Gateway -> Lambda -> SQS -> ECS Correlation Engine, your observability is dead on arrival. You need a test proving that the span parent-child relationship crosses the SQS boundary correctly.

6. Performance Test Gaps

You wrote tests for 1000 webhooks/second (6.1). Cool. But you forgot the real world.

  • Payload Size Spikes: You're testing with makeAlertPayload(), which generates a tiny JSON. What happens when a Datadog monitor with 50 tags and a huge custom message drops a 2MB payload on your API Gateway? Does Lambda choke? Does SQS reject it (SQS limit is 256KB)? Missing test.
  • Database Growth Over Time: Your TimescaleDB test (6.2) assumes a clean slate. What's the query latency after 30 days of continuous aggregates and millions of rows?
  • DynamoDB Burst Throttling: No tests for DynamoDB throttling under burst traffic. A 500-alert storm hitting a fresh tenant partition will get rate-limited before adaptive capacity kicks in.

7. Missing Scenarios (The Reality Check)

A system like dd0c/alert isn't going to break on the happy path. It's going to fail on edge cases you haven't considered.

  • HMAC Bypass (3.2): What if the signature header is sent as dd-webhook-signature vs DD-WEBHOOK-SIGNATURE? What if it's completely omitted? You claim to test missing headers, but you missed testing timing attacks on the string comparison itself. Use crypto.timingSafeEqual and write a test that enforces it's not a generic === comparison.
  • Multi-Tenancy Bleed (4.3): You test that Redis windows isolate per tenant (4.2), but you forgot to test DynamoDB queries. What happens if bad partition_key logic leaks tenant_A incidents into tenant_B's dashboard? You need a strict assertion enforcing Tenant ID injection on all Data Access Objects.
  • DLQ Overflow (8.1): Your circuit breaker replays from the DLQ. What if the DLQ has 100,000 alerts? Replaying them all at once into the correlation engine will just trip the breaker again or OOM the ECS task. You need a backpressure or batching test for the replay mechanism.
  • Slack Rate Limits (4.5): You test retries on 429, but what if Slack totally blocks your app due to sustained 429s (like 1,000 requests in a minute)? You don't have a test for a circuit breaker to Slack.

8. Top 5 Recommendations

Before you write another line of code, fix these.

  1. Stop Ignoring Half the App: You need an Epic 6 and Epic 7 section in this architecture document. Test the API JWT auth, pagination, tenant isolation, and the React SPA. Also, test the OAuth signup and free tier limits from Epic 9. A solo founder can't afford a broken onboarding flow.
  2. Fix the Anti-Pattern in Correlation Engine Tests: Drop ioredis-mock and sinon.useFakeTimers(). Create a WindowStore interface and use an in-memory implementation for unit tests. Inject a clock. Save the real Redis and Testcontainers for a single integration test class.
  3. Add Trace Context Propagation Tests: If the trace ID gets lost between API Gateway, Lambda, SQS, and the ECS Fargate worker, your Semantic Observability is useless. Write a test asserting that the span parent-child relationship crosses the SQS boundary correctly.
  4. Handle the Edge Cases (HMAC & DLQ): Write a test specifically enforcing crypto.timingSafeEqual for HMAC. For the DLQ replay, add a test that asserts backpressure—replay the alerts in manageable batches, not a 100k message firehose.
  5. Test Large Payloads (The 256KB SQS Limit): SQS has a hard 256KB limit. Write a test where Datadog sends a 500KB payload (which happens with large stack traces or massive tag arrays). Your ingestion Lambda needs to compress it, chunk it, or strip unnecessary fields before hitting SQS.

You're trying to move fast, which is fine, but you're missing the exact things that will wake you up at 3 AM. Fix these, and you'll actually have a "safety-critical observability tool."

— Max