dd0c/products/02-iac-drift-detection/test-architecture/review.md
Max Mayfield 2fe0ed856e Add Gemini TDD reviews for all 6 products
P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
2026-03-01 00:29:24 +00:00


Test Architecture Review: dd0c/drift

Reviewer: Max Mayfield (Senior TDD Consultant)
Date: February 28, 2026

Alright, let's get real. I read through the test-architecture.md and the epics.md documents. You talk a big game about "correctness is non-negotiable" and the secret scrubber being untouchable. I respect that. But when we actually look at how this maps to the product epics, the cracks start showing. You're hiding behind "too many moving parts" to avoid unit testing the hard stuff, and your UI testing strategy is completely MIA.

Here is my reality check on your testing strategy.


1. Coverage Gaps Per Epic

I cross-referenced your architecture with the V1 Epics. Here's what you conveniently forgot to test:

  • Epic 6 (Dashboard UI): You wrote an entire testing architecture document and completely ignored the frontend. Where are the React Testing Library specs? Where is Playwright/Cypress for the SPA? You have zero UI test targets in Section 2.2. If the frontend can't render the diff correctly or the OAuth flow breaks, your pristine Go agent doesn't matter.
  • Epic 9 (Onboarding & PLG):
    • Story 9.4 (Stripe Billing): No mention of testing Stripe webhooks. How are you validating signature verification locally? Where is the stripe-cli mock in your Docker Compose setup?
    • Story 9.2 (drift init): How are you unit testing local filesystem traversal across Windows, Mac, and Linux without making it a flaky mess?
  • Epic 8 (Infrastructure): Story 8.1 defines your infrastructure in Terraform. Where is Terratest? You're building an IaC drift detection tool but not testing your own IaC. That's ironic.
  • Epic 2 (Agent Communication): Story 2.1 mentions mTLS certificate generation and exchange. You haven't documented a single test for certificate expiration, rotation, or revocation handling.

2. TDD Workflow Critique

Your adaptation of TDD is pragmatic, but you're giving yourself too many free passes.

  • "Too many moving parts to unit test first": You explicitly called out the Remediation Round-Trip (Slack button → agent apply) and the Onboarding flow as things you'll E2E test first. That's a classic excuse for coupled architecture. If an orchestrator has too many moving parts to unit test, decouple the orchestrator. You should be able to unit test the remediation dispatcher with stubs before firing up LocalStack.
  • CI/CD Pipeline Configuration: "Validate by running it, not by unit testing YAML." Fair, but at least use something like actionlint for GitHub Actions. Don't just push and pray.
  • Test-After for DynamoDB: Writing the schema and then integration testing against DynamoDB Local is fine, but you should still unit test your repository interfaces using mocks to verify domain logic independently of the DB.

3. Test Pyramid Balance

You've got a 10% E2E, 20% Integration, and 70% Unit split, targeting ~500 tests.

  • The Problem with 50 E2E Tests: Running 50 E2E tests against LocalStack and a real PostgreSQL DB inside your CI pipeline will be brutally slow. LocalStack is heavy. You won't hit that < 5 min PR promise with 50 E2E flows unless you heavily parallelize, and if you do, your LocalStack instance will probably buckle under the load or you'll hit state contention.
  • The Fix: You need a "Smoke" tier separated from E2E. You only need 5-10 E2E critical paths (like the Onboarding and Revert flows). The rest of those E2E tests should be pushed down to Integration or UI component tests. Stop counting tests as a vanity metric and focus on coverage.

4. Anti-patterns

I saw some code smells in your examples that need addressing before they become a nightmare:

  • Table-Driven Tests: Your Go table-driven pattern in Section 3.1 is standard, but you're missing t.Parallel() inside the t.Run() closure. Without it, those tests run sequentially, wasting time.
  • Shared LocalStack State: You noted go test -p 4 ./... (4 packages in parallel) but then said integration tests must not be parallelized within packages. If Package A and Package B both talk to LocalStack S3 at the same time, they will step on each other unless buckets and resources are dynamically named per-test. You haven't mentioned dynamic naming. That's a one-way ticket to flaky CI.
  • Time-Based Polling: waitForRemediationPlan(t, driftEvent.ID, 5*time.Second) — Polling is better than hard sleeping, but relying on real time in E2E tests is a smell. Use deterministic event triggers or WebSockets/SQS mocks to know exactly when the plan is ready.

5. Transparent Factory Tenet Gaps

You talked a lot about Epic 10 compliance, but you missed some of the hardest parts to test.

  • Atomic Flagging (10.1): You test that the circuit breaker trips on false positives, but how is the distributed state (Redis/Dynamo) of those dismissals tested across multiple agent instances? You only showed an in-memory OpenFeature provider.
  • Elastic Schema (10.2): Validating additive schema changes via CI linting is cute. But you aren't testing the actual agent reading V1 and V2 items concurrently. What happens when an old agent parses a new DynamoDB item with _v2 attributes? Where is the backward-compatibility serialization test?
  • Semantic Observability (10.4): You assert that OTEL spans are emitted per resource (good), but there's no trace continuity test. Does the drift_scan parent span correctly link to the POST /v1/drift-reports HTTP trace and then the SQS consumer trace? That's the entire point of distributed tracing, and your test architecture completely ignores the boundary crossings.
  • Configurable Autonomy (10.5): Panic mode is tested (good). But what about race conditions when panic mode triggers mid-remediation? Are there tests for in-flight command abortion?

6. Performance Test Gaps

Your section 6 metrics are aggressive (< 50ms for 100 resources), but you missed the boat on the backend.

  • Database Connection Pool Exhaustion: What happens to PostgreSQL when 1,000 agents heartbeat at the same time? You need a load test specifically for the RDS proxy or connection pooler.
  • SQS FIFO Limits: SQS FIFO caps at 300 messages/sec per API action without batching (3,000/sec with batching). Your k6 load test target is 50 concurrent agents. What happens at 500? You need a stress test that hits the SaaS with a thundering herd of drift reports, not just a gradual ramp-up.
  • SaaS API Memory Profile: You benchmarked the Go agent, but the Node.js/TypeScript Event Processor is a black box. Where are the V8 memory profiling tests to ensure it doesn't OOM while processing a 1MB drift report payload?

7. Missing Scenarios

You overlooked a few critical edge cases that keep SREs up at night:

  • Security (RBAC/IAM forgery): You test RLS for multi-tenancy (excellent). But what about an agent attempting to impersonate another agent by forging a stack_id in its payload, while using a valid API key? The API must cross-check the API key's org_id against the payload's stack_id ownership.
  • Security (Replay Attacks): What prevents a malicious actor from capturing a drift report payload and replaying it endlessly to DoS your Event Processor or skew the drift score?
  • Multi-tenancy Noisy Neighbor: If Org A has 10,000 drifted resources and Org B has 10, does Org A delay Org B's SQS processing? You need test cases for priority queueing or fair-share processing.
  • Race Conditions (Duplicate Processing): FIFO deduplication helps, but exactly-once processing logic inside the Event Processor DB transaction isn't explicitly tested for unique constraints. What if the same report_id makes it past SQS? Does PostgreSQL block it?
  • Race Conditions (Remediation vs. Polling): The agent starts a remediation apply (Epic 7.1), but the scheduled drift scan (Epic 1.2) kicks off concurrently. Who wins? Does the scan report half-applied state?

8. Top 5 Recommendations

This architecture is solid overall. It has a high bar for correctness, and the 100% scrubber coverage rule is perfect. But to fix the gaps, do this immediately:

  1. Stop Ignoring the UI: Add a dedicated UI testing section (Epic 6) to your architecture using React Testing Library for components and Playwright for the critical path (Auth → View Stack → Diff Viewer). If the dashboard is broken, your agent's perfect diff is useless.
  2. Mock the Orchestrator for TDD: Your excuse for not unit-testing the remediation round-trip and onboarding flow is weak. Use stubs for AWS and GitHub API boundaries so you can TDD the orchestrator logic before hitting the slow LocalStack E2E layer.
  3. Trim the Fat on E2E Tests: Cap your E2E suite at 10 high-value flows. If you push 50 LocalStack integration tests into every PR, your CI will crawl and your engineers will start ignoring failures. Push the rest down to integration tests with mock HTTP services.
  4. Test the Infrastructure Code: You are literally building an IaC drift detection tool. Epic 8.1 needs Terratest or an equivalent to validate your own Terraform. Dogfooding starts at home.
  5. Fix Distributed Tracing Gaps: Add cross-boundary assertions for Epic 10.4. Ensure the drift_scan parent span generated in Go correctly passes its W3C trace context to the TypeScript API, through SQS, and into the Event Processor.

This is Max, signing off. Make it work.