P1 route: Gemini — 'Ship the proxy, stop writing tests for the tests' P2 drift: Gemini — mTLS revocation, state lock corruption, RLS pool leak P3 alert: Gemini — replay attacks, trace propagation, SQS claim-check P4 portal: Manual — discovery reliability is existential risk P5 cost: Manual — concurrent baselines, remediation RBAC, pricing staleness P6 run: Gemini — policy update loophole, AST parsing, audit streaming
3.3 KiB
dd0c/route — BMad Code Review
Reviewer: BMad Code Review Agent (Gemini) Date: March 1, 2026 Verdict: "Ship the proxy. Stop writing tests for the tests."
The Reality Check
Completeness
10 epics covered with property-based tests, but missing:
- Provider failovers and token math. What happens when OpenAI gives 529 and Anthropic takes 40s to timeout? Where are edge-case tokenization tests (weird unicode in
cl100k_basevs Anthropic's tokenizer) that will screw up billing?
Realism
Can a solo founder implement this? Absolutely not as written. 2,553 lines of test architecture for a V1 proxy is top-tier procrastination. Ruthlessly prioritize the hot path (routing & billing). Everything else is a distraction.
Anti-Patterns
- Replacing mocks with Testcontainers is correct, but Testcontainers are slow. If CI takes 20 minutes per PR, you'll stop running tests.
- Testing Dashboard UI via E2E is a trap. UI churns daily in V1. Those tests will flake and get ignored.
Severity-Rated Findings
🔴 Critical
-
Latency benchmarks (<5ms) in shared CI. GitHub Actions has variable CPU/IO. Latency tests will flake constantly. Move to dedicated runners or drop hard thresholds in CI.
-
No tests for API key redaction in logging/panic traces. The proxy handles raw OpenAI/Anthropic keys. A panic in axum handlers could dump the bearer token into TimescaleDB or application logs. This is a security incident waiting to happen.
🟡 Important
-
Token calculation edge cases and provider failover timeouts missing from proxy engine epic.
-
Self-hosted mode needs memory-constraint and custom CA cert validation. Enterprise users behind corporate firewalls with SSL MITM. Postgres and TimescaleDB will eat all RAM if not tuned — need a test verifying containers start with strict
deploy.resources.limits.memory. -
Transparent Factory tenet tests are checkbox theater. Checking
assert(span.name == "proxy_request")is useless. Tests must validate spans containtenant_idandcost_usdattributes needed for the billing pipeline.
🟢 Nice-to-Have
- Chaos tests (slow DB/Redis). Cool story, unnecessary for V1.
V1 Cut List (Skip for MVP)
- Dashboard UI E2E Tests — Axe them. Test the API thoroughly. If the UI breaks, you'll see it.
- Chaos Tests (Slow DB/Redis) — If Redis is slow, your proxy will be slow. Accept it for V1.
- OTEL Trace Propagation Tests — Visually verify in Jaeger/Honeycomb once and move on.
- 24-hour Soak Test — Your users are your soak test.
- Slack Integration Epic — Manually test it.
Must-Have Before Launch
- SSE Connection Drop Billing Leak Tests — If a user drops an SSE stream mid-generation and you don't bill for tokens already streamed, you go bankrupt. Biggest financial risk.
- IDOR & Tenant Isolation — Cross the streams on user data or billing, you're dead.
- Cost Calculation Property Tests — Random payloads, verify
prompt_tokens * rate + comp_tokens * ratealways holds. - Testcontainer Hot Path — Full integration:
Request → Axum → Redis (rate limit) → Provider → Axum → Postgres (billing)using real containers. - API Key Redaction — Prove that panics, error logs, and telemetry never contain raw provider API keys.
"Ship the proxy. Stop writing tests for the tests."