BMad code reviews complete for all 6 products
P1 route: Gemini — 'Ship the proxy, stop writing tests for the tests' P2 drift: Gemini — mTLS revocation, state lock corruption, RLS pool leak P3 alert: Gemini — replay attacks, trace propagation, SQS claim-check P4 portal: Manual — discovery reliability is existential risk P5 cost: Manual — concurrent baselines, remediation RBAC, pricing staleness P6 run: Gemini — policy update loophole, AST parsing, audit streaming
This commit is contained in:
64
products/01-llm-cost-router/test-architecture/bmad-review.md
Normal file
64
products/01-llm-cost-router/test-architecture/bmad-review.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# dd0c/route — BMad Code Review
|
||||
|
||||
**Reviewer:** BMad Code Review Agent (Gemini)
|
||||
**Date:** March 1, 2026
|
||||
**Verdict:** "Ship the proxy. Stop writing tests for the tests."
|
||||
|
||||
---
|
||||
|
||||
## The Reality Check
|
||||
|
||||
### Completeness
|
||||
10 epics covered with property-based tests, but missing:
|
||||
- **Provider failovers and token math.** What happens when OpenAI gives 529 and Anthropic takes 40s to timeout? Where are edge-case tokenization tests (weird unicode in `cl100k_base` vs Anthropic's tokenizer) that will screw up billing?
|
||||
|
||||
### Realism
|
||||
Can a solo founder implement this? **Absolutely not** as written. 2,553 lines of test architecture for a V1 proxy is top-tier procrastination. Ruthlessly prioritize the hot path (routing & billing). Everything else is a distraction.
|
||||
|
||||
### Anti-Patterns
|
||||
- Replacing mocks with Testcontainers is correct, but Testcontainers are slow. If CI takes 20 minutes per PR, you'll stop running tests.
|
||||
- Testing Dashboard UI via E2E is a trap. UI churns daily in V1. Those tests will flake and get ignored.
|
||||
|
||||
---
|
||||
|
||||
## Severity-Rated Findings
|
||||
|
||||
### 🔴 Critical
|
||||
|
||||
1. **Latency benchmarks (<5ms) in shared CI.** GitHub Actions has variable CPU/IO. Latency tests will flake constantly. Move to dedicated runners or drop hard thresholds in CI.
|
||||
|
||||
2. **No tests for API key redaction in logging/panic traces.** The proxy handles raw OpenAI/Anthropic keys. A panic in axum handlers could dump the bearer token into TimescaleDB or application logs. This is a security incident waiting to happen.
|
||||
|
||||
### 🟡 Important
|
||||
|
||||
3. **Token calculation edge cases and provider failover timeouts** missing from proxy engine epic.
|
||||
|
||||
4. **Self-hosted mode needs memory-constraint and custom CA cert validation.** Enterprise users behind corporate firewalls with SSL MITM. Postgres and TimescaleDB will eat all RAM if not tuned — need a test verifying containers start with strict `deploy.resources.limits.memory`.
|
||||
|
||||
5. **Transparent Factory tenet tests are checkbox theater.** Checking `assert(span.name == "proxy_request")` is useless. Tests must validate spans contain `tenant_id` and `cost_usd` attributes needed for the billing pipeline.
|
||||
|
||||
### 🟢 Nice-to-Have
|
||||
|
||||
6. **Chaos tests (slow DB/Redis).** Cool story, unnecessary for V1.
|
||||
|
||||
---
|
||||
|
||||
## V1 Cut List (Skip for MVP)
|
||||
|
||||
- **Dashboard UI E2E Tests** — Axe them. Test the API thoroughly. If the UI breaks, you'll see it.
|
||||
- **Chaos Tests (Slow DB/Redis)** — If Redis is slow, your proxy will be slow. Accept it for V1.
|
||||
- **OTEL Trace Propagation Tests** — Visually verify in Jaeger/Honeycomb once and move on.
|
||||
- **24-hour Soak Test** — Your users are your soak test.
|
||||
- **Slack Integration Epic** — Manually test it.
|
||||
|
||||
## Must-Have Before Launch
|
||||
|
||||
1. **SSE Connection Drop Billing Leak Tests** — If a user drops an SSE stream mid-generation and you don't bill for tokens already streamed, you go bankrupt. Biggest financial risk.
|
||||
2. **IDOR & Tenant Isolation** — Cross the streams on user data or billing, you're dead.
|
||||
3. **Cost Calculation Property Tests** — Random payloads, verify `prompt_tokens * rate + comp_tokens * rate` always holds.
|
||||
4. **Testcontainer Hot Path** — Full integration: `Request → Axum → Redis (rate limit) → Provider → Axum → Postgres (billing)` using real containers.
|
||||
5. **API Key Redaction** — Prove that panics, error logs, and telemetry never contain raw provider API keys.
|
||||
|
||||
---
|
||||
|
||||
*"Ship the proxy. Stop writing tests for the tests."*
|
||||
Reference in New Issue
Block a user