dd0c/products/01-llm-cost-router/test-architecture/review.md at main

Files

Max Mayfield 2fe0ed856e Add Gemini TDD reviews for all 6 products

P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.

2026-03-01 00:29:24 +00:00

5.8 KiB

Raw Permalink Blame History

4. Anti-Patterns Identified

Testing Implementation instead of Behavior (Over-Mocking):
- Section 3.1 & 3.4: MockKeyCache and MockKeyStore are used to test auth middleware. Mocks hide serialization overhead, connection pool limits, and timeout behavior. A Redis cache test should hit an ephemeral Redis instance via Testcontainers.
Brittle Security Tests:
- Section 3.4 (provider_credential_is_stored_encrypted): The assertion assert_ne!(stored.encrypted_key, b"sk-plaintext-key") and checking the length is a classic anti-pattern. This test passes if you accidentally overwrite the key with random garbage or an empty GCM nonce. You must write a test that decrypts the key and verifies it matches the original plaintext, using the actual KMS envelope decryption flow.
Missing Concurrency in Circuit Breaker Tests:
- Section 3.2 & 4.1: The Redis-backed circuit breaker test circuit_breaker_state_is_shared_across_two_proxy_instances tests manual force_open, but not concurrent threshold tripping. What if 50 requests fail simultaneously across 5 proxy instances? Does the circuit breaker transition cleanly or enter a race condition?

5. Transparent Factory Gaps

Section 8 attempts to map tests to the 5 tenets, but misses critical scenarios:

Atomic Flagging (Story 10.1): Tests check off/on/auto-disable. Gap: What happens when the flag provider is unreachable? Is the default fallback to passthrough safe? No test asserts this fail-open/fail-closed behavior.
Elastic Schema (Story 10.2): Tests DDL regex. Gap: No tests for long-running transaction locks during migrations. A test must verify that ALTER TABLE concurrent index creation doesn't block proxy auth reads.
Semantic Observability (Story 10.4): Tests span emission. Gap: Context propagation. Does the trace ID correctly propagate from the incoming request (e.g., traceparent header) to the downstream LLM provider? If the client passes a trace ID, the proxy must stitch it.
Configurable Autonomy (Story 10.5): Tests strict/audit/panic modes. Gap: Panic mode API security. Who can trigger panic mode? There is no test verifying that only an Owner/Admin can invoke the POST /admin/panic endpoint.

6. Performance Test Gaps

The latency budget (<5ms proxy overhead) is tested via criterion and k6, but the edge cases are completely ignored.

Missing Chaos Scenarios: Section 6.3 lists "Kill TimescaleDB" and "Kill Redis". It is missing "Slow DB". A dead DB fails fast (ECONNREFUSED). A slow DB (e.g., hanging connections, 5-second inserts) causes connection pool exhaustion and memory bloat. You need a tc netem delay test on the DB port to prove the bounded mpsc channel correctly drops telemetry without blocking the proxy hot-path.
Memory Fragmentation / Garbage Collection: The p99 latency target is highly sensitive to memory reallocation over time. A 60-second k6 test will not catch memory fragmentation. There must be a 24-hour soak test running at a low but sustained TPS to prove the <5ms SLA holds true in a long-lived process.

7. Missing Critical Test Scenarios

These specific scenarios are missing and represent severe risks to the product:

Strict Multi-Tenancy Cross-Talk: There is no explicit test proving Org A cannot access Org B's routing rules, API keys, or request history via the Dashboard API. An IDOR test suite is mandatory.
SSE Connection Drops (Billing Leak): What happens if a client drops the connection mid-stream? Does the proxy continue downloading from the provider, or does it abort the upstream request? Does the customer get billed for the partial tokens received so far? No test covers this financial risk.
KMS Key Rotation Verification: No test verifies that the KMS DEK envelope encryption actually rotates properly, or that old DEKs can still decrypt old keys after a rotation event.
Rate Limiter Burst/Race Condition: The Redis rate limiter test uses a single loop. It needs a concurrent load test to ensure the atomic INCR commands enforce limits correctly under a massive traffic burst without allowing overage.

8. Recommendations (Top 5 Prioritized Improvements)

Ditch Mockall for Hot-Path Integration Tests: Remove MockKeyCache and MockKeyStore from the proxy auth and routing tests. Replace them with Testcontainers Redis/Postgres. For a <5ms SLA proxy, testing the implementation of a mock is useless. You must test the real DB driver latency and connection pool behavior. (Reference: Section 3.1 & 4.1).
Add a "Slow Dependency" Chaos Test: Add a test that injects a 5-second artificial network delay to TimescaleDB and Redis. Verify that the proxy degrades gracefully (dropping telemetry, falling back to PG for auth) without blocking the main event loop or violating the <5ms SLA. (Reference: Section 6.3).
TDD Provider Translation via Fixtures: Mandate test-first development for the OpenAI/Anthropic translation layer using the official JSON schemas and recorded fixtures before implementation. Do not write translators "test-after". (Reference: Section 1.2 & Epic 1).
Implement an Explicit IDOR Test Suite: Add a test suite for the Dashboard API that specifically authenticates as Org A and attempts to fetch/mutate resources belonging to Org B (API keys, routing rules, telemetry). (Reference: Section 3.4 & Epic 4).
Test Decryption, Not Just Encryption: Rewrite the provider_credential_is_stored_encrypted test. It currently passes if the DB stores random garbage. It must encrypt the key, store it, fetch it, decrypt it via the KMS flow, and assert it matches the original plaintext. (Reference: Section 3.4).

Review generated by TDD Review Subagent. Ensure all critical paths are addressed before approving the V1 Architecture.

5.8 KiB Raw Permalink Blame History