# dd0c/route — Test Architecture & TDD Strategy

**Product:** dd0c/route — LLM Cost Router & Optimization Dashboard

**Author:** Test Architecture Phase

**Date:** February 28, 2026

**Status:** V1 MVP — Solo Founder Scope

---

## Section 1: Testing Philosophy & TDD Workflow

### 1.1 Core Philosophy

dd0c/route is a **latency-sensitive proxy** with correctness requirements that compound: a wrong routing decision costs money, a wrong cost calculation misleads customers, and a wrong auth check is a security incident. Tests are not optional — they are the specification.

The guiding principle: **tests describe behavior, not implementation**. A test that breaks when you rename a private function is a bad test. A test that breaks when you accidentally route a complex request to a cheap model is a good test.

For a solo founder, the test suite is also the **second developer** — it catches regressions when Brian is moving fast and hasn't slept enough.

### 1.2 Red-Green-Refactor Adapted to dd0c

The standard TDD cycle applies, but with product-specific adaptations:

```
RED      → Write a failing test that describes the desired behavior
           (e.g., "a request tagged feature=classify should route to gpt-4o-mini")

GREEN    → Write the minimum code to make it pass
           (no premature optimization — just make it work)

REFACTOR → Clean up without breaking tests
           (extract the complexity classifier into its own module,
           add the proptest property suite, optimize the hot path)
```

**When to write tests first (strict TDD):**

- All Router Brain logic (routing rules, complexity classifier, cost calculations)
- All auth/security paths (key validation, JWT issuance, RBAC checks)
- All cost calculation formulas (`cost_saved = cost_original - cost_actual`)
- All circuit breaker state transitions
- All schema migration validators

**When integration tests lead (test-after, then harden):**

- Provider format translation (OpenAI ↔ Anthropic) — write the translator, then write contract tests against real response fixtures
- TimescaleDB continuous aggregate queries — create the schema, run queries against Testcontainers, then lock in the behavior
- SSE streaming passthrough — implement the stream relay, then write tests that assert chunk ordering and `[DONE]` handling

**When E2E tests lead:**

- The "first route" onboarding journey — define the happy path first, then build backward
- The Shadow Audit CLI output format — define the expected terminal output, then build the parser

### 1.3 Test Naming Conventions

All tests follow the **Given-When-Then** naming pattern expressed as a single descriptive string:

```rust
// Rust unit tests
#[test]
fn complexity_classifier_returns_low_for_short_extraction_prompts() { ... }

#[test]
fn router_selects_cheapest_model_when_strategy_is_cheapest_and_complexity_is_low() { ... }

#[test]
fn circuit_breaker_opens_after_threshold_error_rate_exceeded() { ... }

#[test]
fn cost_calculation_returns_zero_savings_when_requested_and_used_model_are_identical() { ... }
```

```rust
// Integration tests (in tests/ directory)
#[tokio::test]
async fn proxy_forwards_streaming_request_to_openai_and_returns_sse_chunks() { ... }

#[tokio::test]
async fn proxy_returns_401_when_api_key_is_revoked() { ... }
```

```typescript
// TypeScript (Dashboard UI / CLI)
describe('CostTreemap', () => {
  it('renders spend breakdown by feature tag when data is loaded', () => { ... });
  it('shows empty state when no requests exist for the selected period', () => { ... });
});

describe('dd0c-scan CLI', () => {
  it('detects gpt-4o usage in TypeScript files and estimates monthly cost', () => { ... });
});
```

**Rules:**

- No `test_` prefix in Rust (redundant inside `#[cfg(test)]`)
- No `should_` prefix (verbose, adds no information)
- Use `_` as word separator in Rust, camelCase in TypeScript
- Name describes the **observable outcome**, not the internal mechanism
- If you can't name the test without saying "and", split it into two tests

---

## Section 2: Test Pyramid

### 2.1 Recommended Ratio

```
          ┌─────────────────┐
          │   E2E / Smoke   │    ~5%  (~20 tests)
          │  (Playwright,   │
          │   k6 journeys)  │
        ──┴─────────────────┴──
        ┌───────────────────────┐
        │   Integration Tests   │   ~20%  (~80 tests)
        │   (Testcontainers,    │
        │    contract tests)    │
      ──┴───────────────────────┴──
      ┌─────────────────────────────┐
      │         Unit Tests          │  ~75% (~300 tests)
      │  (#[cfg(test)], proptest,   │
      │       mockall, vitest)      │
      └─────────────────────────────┘
```

**Target: ~400 tests at V1 launch.** A fast feedback loop is more valuable than exhaustive coverage at this stage. The entire suite must complete in CI in under 5 minutes.

### 2.2 Unit Test Targets (per component)

| Component | Target Test Count | Key Focus |
|-----------|-------------------|-----------|
| Router Brain (rule engine) | ~60 | Rule matching, strategy execution, edge cases |
| Complexity Classifier | ~40 | Token count thresholds, regex patterns, confidence scores |
| Cost Calculator | ~30 | Formula correctness, precision, zero-savings edge cases |
| Circuit Breaker | ~25 | State transitions, threshold logic, Redis key format |
| Auth (key validation, JWT) | ~30 | Valid/invalid/revoked keys, JWT claims, RBAC |
| Provider Translators | ~30 | OpenAI↔Anthropic format mapping, streaming chunks |
| Analytics Pipeline (batch logic) | ~20 | Batching thresholds, flush triggers, error handling |
| Dashboard API handlers | ~40 | Request validation, response shape, error codes |
| Shadow Audit CLI parser | ~25 | File detection, token estimation, report formatting |
| **Total** | **~300** | |

### 2.3 Integration Test Boundaries

| Boundary | Test Type | Tool |
|----------|-----------|------|
| Proxy → TimescaleDB | DB integration | Testcontainers (TimescaleDB image) |
| Proxy → Redis | Cache integration | Testcontainers (Redis image) |
| Proxy → PostgreSQL | DB integration | Testcontainers (PostgreSQL image) |
| Proxy → OpenAI API | Contract test | Recorded fixtures (no live calls in CI) |
| Proxy → Anthropic API | Contract test | Recorded fixtures |
| Dashboard API → PostgreSQL | DB integration | Testcontainers |
| Dashboard API → TimescaleDB | DB integration | Testcontainers |
| Worker → TimescaleDB | DB integration | Testcontainers |
| Worker → SES | Mock integration | `wiremock-rs` |
| Worker → Slack webhooks | Mock integration | `wiremock-rs` |

### 2.4 E2E / Smoke Test Scenarios

| Scenario | Priority | Tool |
|----------|----------|------|
| New user signs up via GitHub OAuth and gets API key | P0 | Playwright |
| Developer swaps base URL and first request routes correctly | P0 | curl / k6 |
| Routing rule created in UI takes effect on next proxy request | P0 | Playwright + k6 |
| Budget alert fires when threshold is crossed | P1 | k6 + webhook receiver |
| `npx dd0c-scan` runs on sample repo and produces report | P1 | Node.js test runner |
| Dashboard treemap renders after 100 synthetic requests | P1 | Playwright |
| Proxy continues routing when TimescaleDB is unavailable | P0 | Chaos (kill container) |

---

## Section 3: Unit Test Strategy (Per Component)

### 3.1 Proxy Engine (`crates/proxy`)

**What to test:**

- Request parsing: extraction of model, messages, headers, stream flag
- Auth middleware: Redis cache hit, cache miss → PG fallback, revoked key, malformed key
- Response header injection: `X-DD0C-Model`, `X-DD0C-Cost`, `X-DD0C-Saved` values
- SSE chunk passthrough: ordering, `[DONE]` detection, token count extraction from final chunk
- Graceful degradation: telemetry channel full → drop event, don't block request
- Rate limiting: per-key counter increment, 429 response when exceeded

**Key test cases:**

```rust
#[cfg(test)]
mod proxy_tests {
    use super::*;
    use mockall::predicate::*;

    #[test]
    fn parse_request_extracts_model_and_stream_flag() {
        let body = r#"{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}],"stream":true}"#;
        let req = ProxyRequest::parse(body).unwrap();
        assert_eq!(req.model, "gpt-4o");
        assert!(req.stream);
    }

    #[test]
    fn parse_request_extracts_dd0c_feature_tag_from_headers() {
        let headers = make_headers([("X-DD0C-Feature", "classify")]);
        let tags = extract_tags(&headers);
        assert_eq!(tags.feature, Some("classify".to_string()));
    }

    #[tokio::test]
    async fn auth_middleware_returns_401_for_unknown_key() {
        let mut mock_cache = MockKeyCache::new();
        mock_cache.expect_get().returning(|_| Ok(None));
        let mut mock_db = MockKeyStore::new();
        mock_db.expect_lookup().returning(|_| Ok(None));

        let result = validate_api_key("dd0c_sk_live_unknown", &mock_cache, &mock_db).await;
        assert_eq!(result, Err(AuthError::InvalidKey));
    }

    #[tokio::test]
    async fn auth_middleware_caches_valid_key_after_db_lookup() {
        let mut mock_cache = MockKeyCache::new();
        mock_cache.expect_get().returning(|_| Ok(None));
        mock_cache.expect_set().times(1).returning(|_, _| Ok(()));
        let mut mock_db = MockKeyStore::new();
        mock_db.expect_lookup().returning(|_| Ok(Some(make_api_key())));

        validate_api_key("dd0c_sk_live_valid", &mock_cache, &mock_db).await.unwrap();
    }

    #[test]
    fn telemetry_emitter_drops_event_when_channel_is_full_without_blocking() {
        let (tx, _rx) = tokio::sync::mpsc::channel(1);
        tx.try_send(make_event()).unwrap(); // fill the channel
        let result = try_emit_telemetry(&tx, make_event());
        assert!(result.is_ok()); // graceful drop, no panic
    }
}
```

**Mocking strategy:**

- `MockKeyCache` and `MockKeyStore` via `mockall` for auth tests
- `MockLlmProvider` for dispatch tests — returns canned responses without network
- Bounded `mpsc` channels to test backpressure behavior

**Property-based tests (`proptest`):**

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn api_key_hash_is_deterministic(key in "[a-zA-Z0-9]{32}") {
        let h1 = hash_api_key(&key);
        let h2 = hash_api_key(&key);
        prop_assert_eq!(h1, h2);
    }

    #[test]
    fn response_headers_never_contain_prompt_content(
        prompt in ".{1,500}",
        model in "gpt-4o|gpt-4o-mini|claude-3-haiku"
    ) {
        // The generated prompt must flow into the event under test,
        // otherwise this property is vacuous.
        let event = make_event_with_prompt(&prompt);
        let headers = build_response_headers(&make_routing_decision(&model), &event);
        for (_, value) in &headers {
            prop_assert!(!value.to_str().unwrap_or("").contains(&prompt));
        }
    }
}
```

---

### 3.2 Router Brain (`crates/shared/router`)

This is the highest-value test target. Routing logic directly affects customer savings — bugs here cost money.

**What to test:**

- Rule matching: first-match-wins, tag matching, model matching, complexity matching
- Strategy execution: `passthrough`, `cheapest`, `quality_first`, `cascading`
- Budget enforcement: hard limit reached → throttle to cheapest or reject
- Complexity classifier: token count thresholds, regex pattern matching, confidence output
- Cost calculation: formula correctness, floating-point precision, zero-savings case
- Circuit breaker: CLOSED→OPEN→HALF_OPEN→CLOSED transitions, Redis key format

**Key test cases:**

```rust
#[cfg(test)]
mod router_tests {
    use super::*;

    #[test]
    fn rule_engine_returns_first_matching_rule_by_priority() {
        let rules = vec![
            make_rule(0, "classify", RoutingStrategy::Cheapest),
            make_rule(1, "classify", RoutingStrategy::Passthrough),
        ];
        let req = make_request_with_feature("classify");
        let decision = evaluate_rules(&rules, &req, &cost_tables());
        assert_eq!(decision.strategy, RoutingStrategy::Cheapest);
    }

    #[test]
    fn rule_engine_falls_through_to_passthrough_when_no_rules_match() {
        let rules = vec![make_rule(0, "summarize", RoutingStrategy::Cheapest)];
        let req = make_request_with_feature("classify");
        let decision = evaluate_rules(&rules, &req, &cost_tables());
        assert_eq!(decision.strategy, RoutingStrategy::Passthrough);
        assert_eq!(decision.target_model, req.model);
    }

    #[test]
    fn cheapest_strategy_selects_lowest_cost_model_from_chain() {
        let chain = vec!["gpt-4o", "gpt-4o-mini", "claude-3-haiku"];
        let costs = cost_tables_with(&[
            ("gpt-4o",         2.50, 10.00),
            ("gpt-4o-mini",    0.15,  0.60),
            ("claude-3-haiku", 0.25,  1.25),
        ]);
        let model = select_cheapest(&chain, &costs, 500, 100);
        assert_eq!(model, "gpt-4o-mini");
    }

    #[test]
    fn classifier_returns_low_for_short_extraction_system_prompt() {
        let messages = vec![
            system("Extract the sentiment. Reply with one word."),
            user("The product is great!"),
        ];
        let result = classify_complexity(&messages, "gpt-4o");
        assert_eq!(result.level, ComplexityLevel::Low);
        assert!(result.confidence > 0.7);
    }

    #[test]
    fn classifier_returns_high_for_code_generation_prompt() {
        let messages = vec![
            system("You are an expert software engineer. Write production-quality code."),
            user("Implement a binary search tree with insertion, deletion, and traversal."),
        ];
        let result = classify_complexity(&messages, "gpt-4o");
        assert_eq!(result.level, ComplexityLevel::High);
    }

    #[test]
    fn cost_saved_is_zero_when_requested_and_used_model_are_identical() {
        let event = make_event("gpt-4o-mini", "gpt-4o-mini", 1000, 200);
        assert_eq!(calculate_cost_saved(&event, &cost_tables()), 0.0);
    }

    #[test]
    fn cost_saved_is_positive_when_routed_to_cheaper_model() {
        let costs = cost_tables_with(&[
            ("gpt-4o",      2.50, 10.00),
            ("gpt-4o-mini", 0.15,  0.60),
        ]);
        let event = make_event("gpt-4o", "gpt-4o-mini", 1_000_000, 200_000);
        let saved = calculate_cost_saved(&event, &costs);
        // (2.50 - 0.15) * 1 + (10.00 - 0.60) * 0.2 = 2.35 + 1.88 = 4.23
        assert!((saved - 4.23).abs() < 0.01);
    }

    #[test]
    fn circuit_breaker_transitions_to_open_after_error_threshold() {
        // args: error-rate threshold, sliding-window length
        let mut cb = CircuitBreaker::new(0.10, Duration::from_secs(60));
        for _ in 0..9 { cb.record_success(); }
        cb.record_failure(); // 10% error rate — exactly at threshold
        assert_eq!(cb.state(), CircuitState::Open);
    }

    #[test]
    fn circuit_breaker_transitions_to_half_open_after_cooldown() {
        let mut cb = CircuitBreaker::new(0.10, Duration::from_secs(60));
        cb.force_open();
        cb.advance_time(Duration::from_secs(31));
        assert_eq!(cb.state(), CircuitState::HalfOpen);
    }
}
```

**Property-based tests:**

```rust
proptest! {
    #[test]
    fn cheapest_strategy_never_selects_more_expensive_model(
        input_tokens in 1u32..1_000_000u32,
        output_tokens in 1u32..100_000u32,
    ) {
        let chain = vec!["gpt-4o", "gpt-4o-mini", "claude-3-haiku"];
        let costs = cost_tables();
        let selected = select_cheapest(&chain, &costs, input_tokens, output_tokens);
        let selected_cost = compute_cost(&selected, input_tokens, output_tokens, &costs);
        for model in &chain {
            let model_cost = compute_cost(model, input_tokens, output_tokens, &costs);
            prop_assert!(selected_cost <= model_cost);
        }
    }

    #[test]
    fn complexity_classifier_never_panics_on_arbitrary_input(
        system_prompt in ".*",
        user_message in ".*",
        model in "gpt-4o|gpt-4o-mini|claude-3-haiku",
    ) {
        let messages = vec![system(&system_prompt), user(&user_message)];
        let result = classify_complexity(&messages, &model);
        prop_assert!(result.confidence >= 0.0 && result.confidence <= 1.0);
    }

    #[test]
    fn cost_saved_is_never_negative(
        input_tokens in 1u32..1_000_000u32,
        output_tokens in 1u32..100_000u32,
    ) {
        let costs = cost_tables();
        for (requested, used) in routable_model_pairs() {
            let event = make_event(requested, used, input_tokens, output_tokens);
            prop_assert!(calculate_cost_saved(&event, &costs) >= 0.0);
        }
    }
}
```

---

### 3.3 Analytics Pipeline (telemetry worker)

**What to test:**

- Batch collector flushes at 100 events OR 1 second, whichever comes first
- Handles worker panic without losing buffered events (bounded channel survives)
- `RequestEvent` serializes correctly to PostgreSQL COPY format
- Graceful degradation: DB unavailable → events dropped, proxy continues unaffected

```rust
#[tokio::test]
async fn batch_collector_flushes_after_100_events_before_timeout() {
    let (tx, rx) = mpsc::channel(1000);
    let flush_count = Arc::new(AtomicU32::new(0));
    let mock_db = MockTelemetryDb::counting(flush_count.clone());
    let worker = spawn_batch_worker(rx, mock_db, 100, Duration::from_secs(10));

    for _ in 0..100 { tx.send(make_event()).await.unwrap(); }
    tokio::time::sleep(Duration::from_millis(50)).await;

    assert_eq!(flush_count.load(Ordering::SeqCst), 1);
    worker.abort();
}

#[tokio::test]
async fn batch_collector_flushes_partial_batch_after_interval() {
    let (tx, rx) = mpsc::channel(1000);
    let flush_count = Arc::new(AtomicU32::new(0));
    let mock_db = MockTelemetryDb::counting(flush_count.clone());
    let worker = spawn_batch_worker(rx, mock_db, 100, Duration::from_secs(1));

    tx.send(make_event()).await.unwrap(); // only 1 event
    tokio::time::sleep(Duration::from_millis(1100)).await;

    assert_eq!(flush_count.load(Ordering::SeqCst), 1);
    worker.abort();
}

#[tokio::test]
async fn proxy_continues_routing_when_telemetry_db_is_unavailable() {
    let failing_db = MockTelemetryDb::always_failing();
    let (tx, rx) = mpsc::channel(1000);
    spawn_batch_worker(rx, failing_db, 1, Duration::from_millis(10));

    // Proxy should still be able to send events without blocking
    for _ in 0..200 {
        let _ = tx.try_send(make_event()); // may drop when full — that's fine
    }
    // No panic, no deadlock
}
```

---

### 3.4 Dashboard API (`crates/api`)

**What to test:**

- GitHub OAuth: state parameter validation, code exchange, user upsert
- JWT issuance: claims (sub, org_id, role, exp), RS256 signature verification
- RBAC: member cannot modify routing rules, owner can do everything
- API key CRUD: create returns full key once, list returns prefix only, revoke invalidates
- Provider credential encryption: stored value differs from plaintext input
- Request inspector: pagination cursor, filter application, no prompt content in response

```rust
#[tokio::test]
async fn create_api_key_returns_full_key_only_once() {
    let app = test_app().await;
    let resp = app.post("/api/orgs/test-org/keys")
        .json(&json!({"name": "production"})).send().await;

    assert_eq!(resp.status(), 201);
    let body: Value = resp.json().await;
    assert!(body["key"].as_str().unwrap().starts_with("dd0c_sk_live_"));

    // Listing must NOT return the full key
    let list: Value = app.get("/api/orgs/test-org/keys").send().await.json().await;
    assert!(list["data"][0]["key"].is_null());
    assert!(list["data"][0]["key_prefix"].as_str().unwrap().len() < 20);
}

#[tokio::test]
async fn member_role_cannot_create_routing_rules() {
    let app = test_app_with_role(Role::Member).await;
    let resp = app.post("/api/orgs/test-org/routing/rules")
        .json(&make_rule_payload()).send().await;
    assert_eq!(resp.status(), 403);
}

#[tokio::test]
async fn request_inspector_never_returns_prompt_content() {
    let app = test_app_with_events(100).await;
    let body: Value = app.get("/api/orgs/test-org/requests").send().await.json().await;
    for event in body["data"].as_array().unwrap() {
        assert!(event.get("messages").is_none());
        assert!(event.get("prompt").is_none());
        assert!(event.get("content").is_none());
    }
}

#[tokio::test]
async fn provider_credential_is_stored_encrypted() {
    let app = test_app().await;
    app.put("/api/orgs/test-org/providers/openai")
        .json(&json!({"api_key": "sk-plaintext-key"})).send().await;

    let stored = fetch_raw_credential_from_db("test-org", "openai").await;
    assert_ne!(stored.encrypted_key, b"sk-plaintext-key");
    assert!(stored.encrypted_key.len() > 16); // has GCM nonce + ciphertext
}
```

---

### 3.5 Shadow Audit CLI (`cli/`)

**What to test:**

- File scanner detects OpenAI/Anthropic SDK usage in `.ts`, `.js`, `.py`
- Model extractor parses model string from SDK call arguments
- Token estimator produces non-zero estimate for non-empty prompts
- Report formatter includes savings percentage, top opportunities, sign-up CTA
- Offline mode works when pricing cache exists on disk

```typescript
describe('FileScanner', () => {
  it('detects openai SDK usage in TypeScript files', () => {
    const code = `const r = await client.chat.completions.create({ model: 'gpt-4o' })`;
    const calls = scanFile('service.ts', code);
    expect(calls).toHaveLength(1);
    expect(calls[0].model).toBe('gpt-4o');
  });

  it('detects anthropic SDK usage in Python files', () => {
    const code = `client.messages.create(model="claude-3-opus-20240229")`;
    const calls = scanFile('service.py', code);
    expect(calls[0].model).toBe('claude-3-opus-20240229');
  });

  it('ignores commented-out SDK calls', () => {
    const code = `// client.chat.completions.create({ model: 'gpt-4o' })`;
    expect(scanFile('service.ts', code)).toHaveLength(0);
  });
});

describe('SavingsReport', () => {
  it('calculates positive savings when cheaper model is available', () => {
    const calls = [{ model: 'gpt-4o', estimatedMonthlyTokens: 10_000_000 }];
    const report = generateReport(calls, mockPricingTable);
    expect(report.totalSavings).toBeGreaterThan(0);
    expect(report.savingsPercentage).toBeGreaterThan(0);
  });

  it('includes sign-up CTA in formatted output', () => {
    const output = formatReport(mockReport);
    expect(output).toContain('route.dd0c.dev');
  });
});
```

---

## Section 4: Integration Test Strategy

### 4.1 Service Boundary Tests

Integration tests live in `tests/` at the crate root and use **Testcontainers** to spin up real dependencies. No mocks at the service boundary — if it talks to a database, it talks to a real one.

**Dependency:** `testcontainers` crate + Docker daemon in CI.

```toml
# Cargo.toml (dev-dependencies)
[dev-dependencies]
testcontainers = "0.15"
testcontainers-modules = { version = "0.3", features = ["postgres", "redis"] }
tokio = { version = "1", features = ["full", "test-util"] }
wiremock = "0.6"
```

#### Proxy ↔ TimescaleDB

```rust
// tests/analytics_integration.rs
use testcontainers::clients::Cli;
use testcontainers_modules::postgres::Postgres;

#[tokio::test]
async fn batch_worker_inserts_events_into_timescaledb_hypertable() {
    let docker = Cli::default();
    // Note: the stock Postgres image covers schema-level assertions;
    // hypertable behavior requires a TimescaleDB-enabled image.
    let pg = docker.run(Postgres::default().with_tag("15-alpine"));
    let db_url = format!(
        "postgres://postgres:postgres@localhost:{}/postgres",
        pg.get_host_port_ipv4(5432)
    );

    run_migrations(&db_url).await;
    enable_timescaledb(&db_url).await;

    let (tx, rx) = mpsc::channel(100);
    let worker = spawn_batch_worker(rx, db_url.clone(), 10, Duration::from_millis(100));

    for _ in 0..10 {
        tx.send(make_event()).await.unwrap();
    }
    tokio::time::sleep(Duration::from_millis(200)).await;

    let count: i64 = sqlx::query_scalar("SELECT COUNT(*) FROM request_events")
        .fetch_one(&pool(&db_url).await).await.unwrap();
    assert_eq!(count, 10);
    worker.abort();
}

#[tokio::test]
async fn continuous_aggregate_reflects_inserted_events_after_refresh() {
    // ... setup TimescaleDB, insert 100 events, trigger aggregate refresh,
    // assert hourly_cost_summary has correct totals
}
```

#### Proxy ↔ Redis

```rust
// tests/cache_integration.rs
use testcontainers_modules::redis::Redis;

#[tokio::test]
async fn api_key_cache_stores_and_retrieves_key_within_ttl() {
    let docker = Cli::default();
    let redis = docker.run(Redis::default());
    let client = connect_redis(redis.get_host_port_ipv4(6379)).await;

    let key = make_api_key();
    cache_api_key(&client, &key, Duration::from_secs(60)).await.unwrap();

    let retrieved = get_cached_key(&client, &key.hash).await.unwrap();
    assert_eq!(retrieved.unwrap().org_id, key.org_id);
}

#[tokio::test]
async fn circuit_breaker_state_is_shared_across_two_proxy_instances() {
    let docker = Cli::default();
    let redis = docker.run(Redis::default());
    let client1 = connect_redis(redis.get_host_port_ipv4(6379)).await;
    let client2 = connect_redis(redis.get_host_port_ipv4(6379)).await;

    let cb1 = RedisCircuitBreaker::new("openai", client1);
    let cb2 = RedisCircuitBreaker::new("openai", client2);

    cb1.force_open().await.unwrap();

    // Instance 2 should see the open circuit set by instance 1
    assert_eq!(cb2.state().await.unwrap(), CircuitState::Open);
}

#[tokio::test]
async fn rate_limit_counter_increments_and_enforces_limit() {
    let docker = Cli::default();
    let redis = docker.run(Redis::default());
    let client = connect_redis(redis.get_host_port_ipv4(6379)).await;

    // args: limit = 5 requests per 60-second window
    let limiter = RateLimiter::new(client, 5, Duration::from_secs(60));
    for _ in 0..5 {
        assert!(limiter.check_and_increment("key_abc").await.unwrap());
    }
    // 6th request should be rejected
    assert!(!limiter.check_and_increment("key_abc").await.unwrap());
}
```

#### Dashboard API ↔ PostgreSQL

```rust
// tests/api_db_integration.rs

#[tokio::test]
async fn create_org_and_api_key_persists_to_postgres() {
    let docker = Cli::default();
    let pg = docker.run(Postgres::default());
    let pool = setup_test_db(pg.get_host_port_ipv4(5432)).await;

    let org = create_organization(&pool, "Acme Corp").await.unwrap();
    let (key, raw) = create_api_key(&pool, org.id, "production").await.unwrap();

    // Raw key is never stored
    let stored: ApiKey = sqlx::query_as("SELECT * FROM api_keys WHERE id = $1")
        .bind(key.id).fetch_one(&pool).await.unwrap();
    assert_ne!(stored.key_hash, raw); // hash != raw key
    assert!(stored.key_prefix.starts_with("dd0c_sk_"));
}

#[tokio::test]
async fn routing_rules_are_returned_in_priority_order() {
    let pool = test_pool().await;
    let org_id = seed_org(&pool).await;

    // args: (pool, org_id, priority, name)
    insert_rule(&pool, org_id, 10, "low priority").await;
    insert_rule(&pool, org_id, 1, "high priority").await;
    insert_rule(&pool, org_id, 5, "mid priority").await;

    let rules = get_routing_rules(&pool, org_id).await.unwrap();
    assert_eq!(rules[0].name, "high priority");
    assert_eq!(rules[1].name, "mid priority");
    assert_eq!(rules[2].name, "low priority");
}
```

---

### 4.2 Contract Tests for OpenAI API Compatibility

The proxy's core promise is drop-in OpenAI compatibility. Contract tests verify this using **recorded fixtures** — real OpenAI/Anthropic responses captured once and replayed in CI without live API calls.

**Fixture capture workflow:**

1. Run `cargo test --features=record-fixtures` once against live APIs (requires real keys)
2. Fixtures saved to `tests/fixtures/openai/` and `tests/fixtures/anthropic/`
3. CI always uses recorded fixtures — no live API calls, no flakiness, no cost
|
||
|
|
|
||
|
|
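The `load_fixture` / `load_sse_fixture` helpers used by the contract tests can stay trivial: resolve a path against this layout, read the file, and panic loudly so a forgotten `record-fixtures` run fails fast in CI. A minimal sketch, assuming the directory layout above (`fixture_root` and `load_fixture_raw` are illustrative names, not the shipped helpers; JSON parsing happens at the call site):

```rust
use std::fs;
use std::path::PathBuf;

/// Root of the recorded fixtures, relative to the crate being tested.
fn fixture_root() -> PathBuf {
    std::env::var("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("."))
        .join("tests")
        .join("fixtures")
}

/// Load a recorded fixture as raw text; callers parse JSON responses
/// or replay SSE bodies byte-for-byte. Panics on a missing file so a
/// forgotten capture run fails fast instead of silently skipping.
fn load_fixture_raw(relative: &str) -> String {
    let path = fixture_root().join(relative);
    fs::read_to_string(&path)
        .unwrap_or_else(|e| panic!("missing fixture {}: {e}", path.display()))
}
```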
```rust
// tests/contract_openai.rs
use serde_json::{json, Value};
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn proxy_response_matches_openai_response_schema() {
    let fixture = load_fixture("openai/chat_completions_non_streaming.json");
    let mock_provider = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/chat/completions"))
        .respond_with(ResponseTemplate::new(200).set_body_json(&fixture))
        .mount(&mock_provider).await;

    let proxy = start_test_proxy(mock_provider.uri()).await;
    let response = proxy.post("/v1/chat/completions")
        .header("Authorization", "Bearer dd0c_sk_live_test")
        .json(&standard_chat_request())
        .send().await;

    assert_eq!(response.status(), 200);
    let body: Value = response.json().await;
    // Assert OpenAI schema compliance
    assert!(body["id"].as_str().unwrap().starts_with("chatcmpl-"));
    assert_eq!(body["object"], "chat.completion");
    assert!(body["choices"][0]["message"]["content"].is_string());
    assert!(body["usage"]["prompt_tokens"].is_number());
}

#[tokio::test]
async fn proxy_preserves_sse_chunk_ordering_for_streaming_requests() {
    let fixture_chunks = load_sse_fixture("openai/chat_completions_streaming.txt");
    let mock_provider = MockServer::start().await;
    Mock::given(method("POST"))
        .respond_with(ResponseTemplate::new(200)
            .set_body_raw(fixture_chunks, "text/event-stream"))
        .mount(&mock_provider).await;

    let proxy = start_test_proxy(mock_provider.uri()).await;
    let chunks = collect_sse_chunks(proxy, streaming_chat_request()).await;

    // Verify chunk ordering and [DONE] termination
    assert!(chunks.last().unwrap().contains("[DONE]"));
    let content: String = chunks.iter()
        .filter_map(|c| extract_delta_content(c))
        .collect();
    assert!(!content.is_empty());
}

#[tokio::test]
async fn proxy_translates_anthropic_response_to_openai_format() {
    let anthropic_fixture = load_fixture("anthropic/messages_response.json");
    let mock_anthropic = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v1/messages"))
        .respond_with(ResponseTemplate::new(200).set_body_json(&anthropic_fixture))
        .mount(&mock_anthropic).await;

    let proxy = start_test_proxy_with_anthropic(mock_anthropic.uri()).await;
    let response: Value = proxy.post("/v1/chat/completions")
        .json(&chat_request_routed_to_anthropic())
        .send().await.json().await;

    // Response must look like OpenAI even though it came from Anthropic
    assert_eq!(response["object"], "chat.completion");
    assert!(response["choices"][0]["message"]["content"].is_string());
    assert!(response["usage"]["prompt_tokens"].is_number());
}

#[tokio::test]
async fn proxy_passes_through_provider_429_with_original_body() {
    let mock_provider = MockServer::start().await;
    Mock::given(method("POST"))
        .respond_with(ResponseTemplate::new(429)
            .set_body_json(&json!({"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}})))
        .mount(&mock_provider).await;

    let proxy = start_test_proxy(mock_provider.uri()).await;
    let response = proxy.post("/v1/chat/completions")
        .json(&standard_chat_request()).send().await;

    assert_eq!(response.status(), 429);
    assert_eq!(response.headers()["X-DD0C-Provider-Error"], "true");
    let body: Value = response.json().await;
    assert_eq!(body["error"]["type"], "rate_limit_error");
}
```
---

### 4.3 Worker Integration Tests

```rust
// tests/worker_integration.rs

#[tokio::test]
async fn weekly_digest_worker_queries_correct_date_range() {
    let pool = test_timescaledb_pool().await;
    seed_events_for_last_7_days(&pool, /* org_id */ "test-org", /* count */ 500).await;

    let mock_ses = MockServer::start().await;
    Mock::given(method("POST"))
        .and(path("/v2/email/outbound-emails"))
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_ses).await;

    run_weekly_digest("test-org", &pool, mock_ses.uri()).await.unwrap();

    let requests = mock_ses.received_requests().await.unwrap();
    assert_eq!(requests.len(), 1);
    let email_body: Value = serde_json::from_slice(&requests[0].body).unwrap();
    assert!(email_body["subject"].as_str().unwrap().contains("savings"));
}

#[tokio::test]
async fn budget_alert_fires_exactly_once_when_threshold_crossed() {
    let pool = test_pool().await;
    let alert = seed_alert(&pool, /* threshold */ 100.0).await;
    seed_spend(&pool, /* org_id */ alert.org_id, /* amount */ 105.0).await;

    let mock_slack = MockServer::start().await;
    Mock::given(method("POST")).respond_with(ResponseTemplate::new(200))
        .mount(&mock_slack).await;

    // Run evaluator twice — alert should only fire once
    evaluate_alerts(&pool, mock_slack.uri()).await.unwrap();
    evaluate_alerts(&pool, mock_slack.uri()).await.unwrap();

    let requests = mock_slack.received_requests().await.unwrap();
    assert_eq!(requests.len(), 1); // not 2 — deduplication works
}
```
---

## Section 5: E2E & Smoke Tests

### 5.1 Critical User Journeys

These are the flows that must work on every deploy. If any of these break, the product is broken.

#### Journey 1: First Route (P0)

```
1. Developer signs up via GitHub OAuth
2. Org + API key created automatically
3. Developer copies curl command from onboarding wizard
4. curl request hits proxy with dd0c key
5. Request routes to correct model per default rules
6. Response headers contain X-DD0C-Model, X-DD0C-Cost, X-DD0C-Saved
7. Request appears in dashboard request inspector within 5 seconds
```

**Playwright test:**

```typescript
test('first route onboarding journey completes in under 2 minutes', async ({ page }) => {
  await page.goto('https://staging.route.dd0c.dev');
  await page.click('[data-testid="github-signin"]');
  // ... OAuth mock in staging
  await expect(page.locator('[data-testid="api-key-display"]')).toBeVisible();

  const apiKey = await page.locator('[data-testid="api-key-value"]').textContent();
  expect(apiKey).toMatch(/^dd0c_sk_live_/);

  // Simulate the curl command
  const response = await fetch('https://proxy.staging.route.dd0c.dev/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'gpt-4o', messages: [{ role: 'user', content: 'Say hello' }] })
  });
  expect(response.status).toBe(200);
  expect(response.headers.get('X-DD0C-Model-Used')).toBeTruthy();

  // Request should appear in inspector
  await page.goto(`https://staging.route.dd0c.dev/dashboard/requests`);
  await expect(page.locator('[data-testid="request-row"]').first()).toBeVisible({ timeout: 10000 });
});
```
#### Journey 2: Routing Rule Takes Effect (P0)

```
1. User creates routing rule: feature=classify → cheapest from [gpt-4o-mini, claude-haiku]
2. Sends request with X-DD0C-Feature: classify header requesting gpt-4o
3. Proxy routes to gpt-4o-mini (cheapest)
4. Response header X-DD0C-Model-Used = gpt-4o-mini
5. Dashboard shows savings for this request
```
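Step 3's "cheapest from the candidate list" decision is a pure function, which keeps it unit-testable without the proxy in the loop. A minimal sketch under assumed names (`pick_cheapest` and the per-1M-token prices are illustrative, not the shipped rule engine):

```rust
use std::collections::HashMap;

/// Pick the cheapest candidate model by price per 1M tokens.
/// Returns None when no candidate has a known price, so the caller
/// can fall back to passthrough rather than guessing.
fn pick_cheapest<'a>(
    candidates: &[&'a str],
    cost_per_mtok: &HashMap<&'a str, f64>,
) -> Option<&'a str> {
    candidates
        .iter()
        .filter_map(|m| cost_per_mtok.get(m).map(|price| (*m, *price)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(model, _)| model)
}

fn main() {
    // Illustrative prices only; real values come from the live cost tables.
    let costs = HashMap::from([
        ("gpt-4o", 5.00),
        ("gpt-4o-mini", 0.30),
        ("claude-haiku", 0.50),
    ]);
    assert_eq!(pick_cheapest(&["gpt-4o-mini", "claude-haiku"], &costs), Some("gpt-4o-mini"));
    assert_eq!(pick_cheapest(&["unknown-model"], &costs), None);
}
```

Keeping the selection pure means the Journey 2 E2E run only has to verify wiring (header in, header out), not pricing arithmetic.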
#### Journey 3: Graceful Degradation (P0)

```
1. TimescaleDB container is killed
2. Proxy continues accepting and routing requests
3. Requests return 200 with correct routing
4. No 500 errors from proxy
5. When TimescaleDB recovers, telemetry resumes
```

**k6 chaos test:**

```javascript
// tests/e2e/chaos_timescaledb.js
import http from 'k6/http';
import { check } from 'k6';

export let options = { vus: 10, duration: '60s' };

export default function () {
  const res = http.post('https://proxy.staging.route.dd0c.dev/v1/chat/completions',
    JSON.stringify({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'ping' }] }),
    { headers: { 'Authorization': 'Bearer dd0c_sk_test_...', 'Content-Type': 'application/json' } }
  );
  check(res, {
    'status is 200': (r) => r.status === 200,
    'routing header present': (r) => r.headers['X-DD0C-Model-Used'] !== undefined,
  });
}
// Run this while: docker stop dd0c-timescaledb
```
### 5.2 Staging Environment Requirements

| Requirement | Detail |
|-------------|--------|
| Isolated AWS account | Separate from prod — no shared RDS, no shared Redis |
| GitHub OAuth app | Separate OAuth app pointing to staging callback URL |
| Synthetic LLM providers | `wiremock` or `mockoon` containers replacing real OpenAI/Anthropic |
| Seeded data | 10K synthetic `request_events` pre-loaded for dashboard testing |
| Feature flags | All flags default-off in staging; tests explicitly enable them |
| Teardown | Staging DB wiped and re-seeded on each E2E run |
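The seeded `request_events` are easiest to work with when generated deterministically, so each wipe-and-reseed reproduces identical dashboards and E2E assertions stay stable. A sketch of such a generator (the field names and xorshift PRNG are illustrative; the real seeder would write these rows through sqlx):

```rust
/// Minimal synthetic event shape for staging seeds (illustrative fields).
#[derive(Debug, PartialEq)]
struct SyntheticEvent {
    model_used: &'static str,
    prompt_tokens: u32,
    completion_tokens: u32,
}

/// Deterministic xorshift64 generator: the same seed always yields the
/// same events, so a wipe-and-reseed never shifts dashboard numbers.
fn seed_events(count: usize, mut state: u64) -> Vec<SyntheticEvent> {
    const MODELS: [&str; 3] = ["gpt-4o", "gpt-4o-mini", "claude-haiku"];
    (0..count)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            SyntheticEvent {
                model_used: MODELS[(state % 3) as usize],
                prompt_tokens: 50 + (state % 900) as u32,
                completion_tokens: 20 + (state % 400) as u32,
            }
        })
        .collect()
}

fn main() {
    let events = seed_events(10_000, 42);
    assert_eq!(events.len(), 10_000);
    // Re-seeding with the same seed reproduces identical data.
    assert_eq!(events[..5], seed_events(5, 42)[..]);
}
```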
### 5.3 Synthetic Traffic Generation

For dashboard and performance tests, a traffic generator seeds realistic request patterns:

```rust
// tools/traffic-gen/src/main.rs
// Generates realistic request distributions matching real usage patterns

struct TrafficProfile {
    requests_per_second: f64,
    feature_distribution: HashMap<String, f64>, // {"classify": 0.4, "summarize": 0.3, ...}
    model_distribution: HashMap<String, f64>,   // {"gpt-4o": 0.6, "gpt-4o-mini": 0.4}
    streaming_ratio: f64,                       // 0.3 = 30% streaming
}

// Usage: cargo run --bin traffic-gen -- --profile realistic --duration 60s --target staging
```
---

## Section 6: Performance & Load Testing

### 6.1 Latency Budget Tests (<5ms proxy overhead)

The <5ms overhead SLA is the product's core technical promise. It must be continuously validated.

**Benchmark setup:** Use `criterion` for micro-benchmarks on the hot path components.

```toml
# Cargo.toml
[[bench]]
name = "hot_path"
harness = false

[dev-dependencies]
criterion = { version = "0.5", features = ["async_tokio"] }
```
```rust
// benches/hot_path.rs
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_complexity_classifier(c: &mut Criterion) {
    let messages = vec![
        system("Extract the sentiment. Reply with one word."),
        user("The product is great!"),
    ];
    c.bench_function("complexity_classifier_short_prompt", |b| {
        b.iter(|| classify_complexity(&messages, "gpt-4o"))
    });
    // Target: <500µs (well within the 2ms budget)
}

fn bench_rule_engine_10_rules(c: &mut Criterion) {
    let rules = make_rules(10);
    let req = make_request_with_feature("classify");
    let costs = cost_tables();
    c.bench_function("rule_engine_10_rules", |b| {
        b.iter(|| evaluate_rules(&rules, &req, &costs))
    });
    // Target: <1ms
}

fn bench_api_key_hash_lookup(c: &mut Criterion) {
    let key = "dd0c_sk_live_a3f2b8c9d4e5f6a7b8c9d4e5f6a7b8c9";
    c.bench_function("api_key_sha256_hash", |b| {
        b.iter(|| hash_api_key(key))
    });
    // Target: <100µs
}

criterion_group!(benches, bench_complexity_classifier, bench_rule_engine_10_rules, bench_api_key_hash_lookup);
criterion_main!(benches);
```
**CI gate:** If any benchmark regresses by >20% vs. the baseline, the PR is blocked.

```yaml
# .github/workflows/bench.yml
- name: Run benchmarks
  run: cargo bench -- --output-format bencher | tee bench_output.txt
- name: Compare with baseline
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: cargo
    output-file-path: bench_output.txt
    alert-threshold: '120%'
    fail-on-alert: true
```
### 6.2 Throughput Benchmarks

**k6 load test — sustained throughput:**

```javascript
// tests/load/throughput.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const proxyOverhead = new Trend('proxy_overhead_ms');
const errorRate = new Rate('errors');

export let options = {
  stages: [
    { duration: '2m', target: 50 },  // ramp up
    { duration: '5m', target: 50 },  // sustained load
    { duration: '2m', target: 100 }, // peak load
    { duration: '1m', target: 0 },   // ramp down
  ],
  thresholds: {
    'proxy_overhead_ms': ['p(99)<5'],   // THE SLA
    'http_req_duration': ['p(99)<500'], // total including LLM
    'errors': ['rate<0.01'],            // <1% error rate
  },
};

export default function () {
  const res = http.post(
    `${__ENV.PROXY_URL}/v1/chat/completions`,
    JSON.stringify({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'ping' }] }),
    { headers: { 'Authorization': `Bearer ${__ENV.DD0C_KEY}`, 'Content-Type': 'application/json' } }
  );

  // Overhead comes from the proxy's own header, not wall-clock timing.
  const overhead = parseInt(res.headers['X-DD0C-Latency-Overhead-Ms'] || '999');
  proxyOverhead.add(overhead);
  errorRate.add(res.status !== 200);

  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(0.1);
}
```
**Targets:**

| Metric | Target | Blocking |
|--------|--------|---------|
| Proxy overhead P99 | <5ms | Yes — blocks deploy |
| Proxy overhead P50 | <2ms | No — informational |
| Total request P99 | <500ms (excl. LLM time) | Yes |
| Error rate | <1% | Yes |
| Throughput | >500 req/s per proxy task | No — informational |

### 6.3 Chaos & Fault Injection

| Scenario | Tool | Expected Behavior | Pass Criteria |
|----------|------|-------------------|---------------|
| Kill TimescaleDB | `docker stop` | Proxy continues routing, telemetry dropped | 0 proxy 5xx errors |
| Kill Redis | `docker stop` | Auth falls back to PG, rate limiting disabled | <10% latency increase |
| OpenAI returns 429 | WireMock | Fallback to Anthropic within 1 retry | Request succeeds, `was_fallback=true` |
| Anthropic returns 500 | WireMock | Circuit opens, fallback to gpt-4o | Request succeeds or 503 with header |
| All providers return 500 | WireMock | 503 with `X-DD0C-Fallback-Exhausted` | Correct error code, no panic |
| Network partition (50% packet loss) | `tc netem` | Increased latency, no crashes | P99 < 2x normal |
| Proxy OOM | `--memory 256m` Docker limit | ECS restarts task, ALB routes to healthy | <30s recovery |

```bash
#!/bin/bash
# tests/chaos/run_chaos.sh (chaos test runner script)

echo "=== Chaos Test: TimescaleDB Failure ==="
docker stop dd0c-timescaledb-test
sleep 5
k6 run --env PROXY_URL=http://localhost:8080 --duration 30s tests/load/throughput.js
docker start dd0c-timescaledb-test
echo "TimescaleDB recovered"
```
---

## Section 7: CI/CD Pipeline Integration

### 7.1 Test Stages

```
┌─────────────────────────────────────────────────────────────────┐
│ git commit (pre-commit hook)                                    │
│   ├─ cargo fmt --check                                          │
│   ├─ cargo clippy -- -D warnings                                │
│   ├─ grep for forbidden DDL keywords in new migration files     │
│   └─ check decision_log.json present if router/ files changed   │
└─────────────────────────────────────────────────────────────────┘
        │ push
┌─────────────────────────────────────────────────────────────────┐
│ PR / push to branch                                             │
│   ├─ cargo test --workspace (unit tests only, no Docker)        │
│   ├─ cargo bench (regression check vs. baseline)                │
│   ├─ vitest --run (UI unit tests)                               │
│   ├─ eslint + tsc --noEmit (UI type check)                      │
│   └─ cargo audit (dependency vulnerability scan)                │
│   Target: <3 minutes                                            │
└─────────────────────────────────────────────────────────────────┘
        │ PR approved
┌─────────────────────────────────────────────────────────────────┐
│ merge to main                                                   │
│   ├─ All PR checks (re-run)                                     │
│   ├─ Integration tests (Testcontainers — requires Docker)       │
│   ├─ Contract tests (fixture-based, no live APIs)               │
│   ├─ Coverage report (tarpaulin) — gate at 70%                  │
│   └─ Flag TTL audit (fail if any flag > 14 days at 100%)        │
│   Target: <8 minutes                                            │
└─────────────────────────────────────────────────────────────────┘
        │ tests pass
┌─────────────────────────────────────────────────────────────────┐
│ deploy to staging                                               │
│   ├─ docker build + push to ECR                                 │
│   ├─ sqlx migrate run (staging DB)                              │
│   ├─ ECS rolling deploy                                         │
│   ├─ Smoke tests (k6, 60s, 10 VUs)                              │
│   └─ Playwright E2E (critical journeys only)                    │
│   Target: <15 minutes total                                     │
└─────────────────────────────────────────────────────────────────┘
        │ staging green
┌─────────────────────────────────────────────────────────────────┐
│ deploy to production                                            │
│   ├─ ECS rolling deploy                                         │
│   ├─ Synthetic canary (1 req/min via CloudWatch Synthetics)     │
│   └─ Rollback trigger: error rate >5% for 3 minutes             │
└─────────────────────────────────────────────────────────────────┘
```
### 7.2 GitHub Actions Configuration

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: clippy, rustfmt
      - uses: Swatinem/rust-cache@v2
      - run: cargo fmt --check
      - run: cargo clippy --workspace -- -D warnings
      - run: cargo test --workspace --lib  # unit tests only (no integration)
      - run: cd ui && npm ci && npx vitest --run
      - run: cd cli && npm ci && npx vitest --run

  integration-tests:
    # Docker is preinstalled on ubuntu-latest; Testcontainers needs no extra service
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - run: cargo test --workspace --test '*'  # integration tests in tests/

  coverage:
    runs-on: ubuntu-latest
    needs: integration-tests
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install cargo-tarpaulin
      - run: cargo tarpaulin --workspace --out Xml --output-dir coverage/
      - uses: codecov/codecov-action@v4
        with:
          files: coverage/cobertura.xml
          fail_ci_if_error: true
          # the 70% merge gate itself lives in codecov.yml (project status threshold)

  benchmarks:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - run: cargo bench -- --output-format bencher | tee bench_output.txt
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: cargo
          output-file-path: bench_output.txt
          alert-threshold: '120%'
          fail-on-alert: true
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: ${{ github.ref == 'refs/heads/main' }}
```
### 7.3 Coverage Thresholds

| Crate | Minimum Coverage | Rationale |
|-------|-----------------|-----------|
| `crates/shared` (router, cost) | 85% | Core business logic — high confidence required |
| `crates/proxy` | 75% | Hot path — streaming paths are hard to unit test |
| `crates/api` | 75% | Auth and RBAC paths must be covered |
| `crates/worker` | 65% | Async scheduling is harder to test deterministically |
| `cli/` | 70% | Parser logic must be covered |
| `ui/` | 60% | UI components — visual testing supplements unit tests |

Coverage is measured by `cargo-tarpaulin` for Rust and `vitest --coverage` for TypeScript. Coverage gates block merges but do not block deploys (a deploy with lower coverage is better than a rollback).

### 7.4 Test Parallelization

```toml
# .cargo/config.toml
[test]
# Run unit tests in parallel (default), integration tests sequentially
# Integration tests use Testcontainers — each gets its own container
```
```yaml
# GitHub Actions matrix for parallel integration test suites
integration-tests:
  strategy:
    matrix:
      suite: [proxy, api, worker, analytics]
  steps:
    - run: cargo test --test ${{ matrix.suite }}_integration
```

Each integration test suite spins up its own Testcontainers instances — no shared state, no port conflicts, fully parallelizable.
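Within a single Postgres container, the same isolation can be had by giving every suite a uniquely named database; a sketch of the naming helper (the `dd0c_test_` convention is an assumption, not the shipped code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

static SUITE_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Produce a database name unique per suite, per run, and per call,
/// so parallel suites never observe each other's state.
fn isolated_db_name(suite: &str) -> String {
    let n = SUITE_COUNTER.fetch_add(1, Ordering::Relaxed);
    let ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before 1970")
        .as_millis();
    format!("dd0c_test_{suite}_{ms}_{n}")
}

fn main() {
    let a = isolated_db_name("proxy");
    let b = isolated_db_name("proxy");
    assert_ne!(a, b); // the counter breaks ties within the same millisecond
    assert!(a.starts_with("dd0c_test_proxy_"));
}
```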
---

## Section 8: Transparent Factory Tenet Testing

### 8.1 Atomic Flagging — Feature Flag Behavior (Story 10.1)

Every flag must be testable in three states: off (default), on, and auto-disabled (circuit tripped).

```rust
// tests/feature_flags.rs

#[tokio::test]
async fn routing_strategy_uses_passthrough_when_flag_is_off() {
    let flags = FlagProvider::from_json(json!({
        "cascading_routing": { "enabled": false }
    }));
    let req = make_request_with_feature("classify");
    let decision = route_with_flags(&req, &flags, &cost_tables()).await;
    assert_eq!(decision.strategy, RoutingStrategy::Passthrough);
}

#[tokio::test]
async fn routing_strategy_uses_cascading_when_flag_is_on() {
    let flags = FlagProvider::from_json(json!({
        "cascading_routing": { "enabled": true }
    }));
    let req = make_request_with_feature("classify");
    let decision = route_with_flags(&req, &flags, &cost_tables()).await;
    assert_eq!(decision.strategy, RoutingStrategy::Cascading);
}

#[tokio::test(start_paused = true)] // paused clock: the 30s health window elapses instantly
async fn flag_auto_disables_when_p99_latency_increases_by_more_than_5_percent() {
    let flags = Arc::new(Mutex::new(FlagProvider::from_json(json!({
        "new_complexity_classifier": { "enabled": true, "owner": "brian", "ttl_days": 7 }
    }))));

    let monitor = FlagHealthMonitor::new(flags.clone(), /* baseline_p99_ms */ 4.0);

    // Simulate latency spike
    for _ in 0..100 {
        monitor.record_latency(4.3); // 7.5% above baseline
    }

    tokio::time::advance(Duration::from_secs(31)).await;

    let current_flags = flags.lock().await;
    assert!(!current_flags.is_enabled("new_complexity_classifier"),
        "flag should have auto-disabled due to latency regression");
}

#[test]
fn flag_with_expired_ttl_fails_ci_audit() {
    let flags = vec![
        FlagDefinition {
            name: "old_feature".to_string(),
            rollout_pct: 100,
            created_at: Utc::now() - Duration::days(20),
            ttl_days: 14,
            owner: "brian".to_string(),
        }
    ];
    let violations = audit_flag_ttls(&flags);
    assert_eq!(violations.len(), 1);
    assert_eq!(violations[0].flag_name, "old_feature");
}
```

**Flag test matrix** — every flag must have tests for all three states:

| Flag | Off behavior | On behavior | Auto-disable trigger |
|------|-------------|-------------|---------------------|
| `cascading_routing` | Passthrough | Try cheapest, escalate on error | P99 >5% regression |
| `complexity_classifier_v2` | Use heuristic v1 | Use ML classifier | Error rate >2% |
| `provider_failover_anthropic` | No Anthropic fallback | Anthropic in fallback chain | Anthropic error rate >10% |
| `cost_table_auto_refresh` | Manual refresh only | Background 60s refresh | N/A |
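For rollouts between fully-off and fully-on, bucketing organizations deterministically keeps all three matrix states reproducible in tests. A sketch of such a check (the FNV-1a hash and bucketing scheme are illustrative, not the shipped `FlagProvider`):

```rust
/// Stable FNV-1a hash so the same org always lands in the same rollout bucket.
fn fnv1a(input: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in input.as_bytes() {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// rollout_pct=0 is fully off, 100 fully on; in between, the org's
/// bucket (0..100) decides. Deterministic, so a test can pin any org
/// to a known state without mocking randomness.
fn is_enabled(flag: &str, org_id: &str, rollout_pct: u8) -> bool {
    if rollout_pct == 0 {
        return false;
    }
    if rollout_pct >= 100 {
        return true;
    }
    (fnv1a(&format!("{flag}:{org_id}")) % 100) < rollout_pct as u64
}

fn main() {
    assert!(!is_enabled("cascading_routing", "org-1", 0));
    assert!(is_enabled("cascading_routing", "org-1", 100));
    // Same inputs always bucket the same way.
    assert_eq!(
        is_enabled("cascading_routing", "org-1", 50),
        is_enabled("cascading_routing", "org-1", 50)
    );
}
```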
---

### 8.2 Elastic Schema — Migration Validation (Story 10.2)

CI must reject any migration containing destructive DDL.

```rust
// tools/migration-lint/src/main.rs
use regex::Regex;

const FORBIDDEN_PATTERNS: &[&str] = &[
    r"DROP\s+TABLE",
    r"DROP\s+COLUMN",
    r"ALTER\s+TABLE\s+\w+\s+RENAME",
    r"ALTER\s+COLUMN\s+\w+\s+TYPE",
    r"TRUNCATE",
];

pub fn lint_migration(sql: &str) -> Vec<LintViolation> {
    FORBIDDEN_PATTERNS.iter()
        .filter_map(|pattern| {
            let re = Regex::new(pattern).unwrap();
            if re.is_match(&sql.to_uppercase()) {
                Some(LintViolation { pattern: pattern.to_string(), sql: sql.to_string() })
            } else {
                None
            }
        })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn lint_rejects_drop_table() {
        let sql = "DROP TABLE request_events;";
        assert!(!lint_migration(sql).is_empty());
    }

    #[test]
    fn lint_rejects_alter_column_type() {
        let sql = "ALTER TABLE request_events ALTER COLUMN latency_ms TYPE BIGINT;";
        assert!(!lint_migration(sql).is_empty());
    }

    #[test]
    fn lint_accepts_add_nullable_column() {
        let sql = "ALTER TABLE request_events ADD COLUMN cache_key VARCHAR(64) NULL;";
        assert!(lint_migration(sql).is_empty());
    }

    #[test]
    fn lint_accepts_create_index() {
        let sql = "CREATE INDEX CONCURRENTLY idx_re_model ON request_events(model_used);";
        assert!(lint_migration(sql).is_empty());
    }

    #[test]
    fn migration_file_includes_sunset_date_comment() {
        let sql = "-- sunset_date: 2026-03-30\nALTER TABLE orgs ADD COLUMN tier_v2 VARCHAR(20) NULL;";
        assert!(has_sunset_date_comment(sql));
    }

    #[test]
    fn migration_without_sunset_date_fails_lint() {
        let sql = "ALTER TABLE orgs ADD COLUMN tier_v2 VARCHAR(20) NULL;";
        assert!(!has_sunset_date_comment(sql));
    }
}
```
**Dual-write pattern test:**

```rust
#[tokio::test]
async fn dual_write_writes_to_both_old_and_new_schema_in_same_transaction() {
    let pool = test_pool().await;
    // Simulate migration window: both `plan` (old) and `plan_v2` (new) columns exist
    sqlx::query("ALTER TABLE organizations ADD COLUMN plan_v2 VARCHAR(30) NULL")
        .execute(&pool).await.unwrap();

    let org_id = create_org_dual_write(&pool, /* plan */ "pro").await.unwrap();

    let row = sqlx::query("SELECT plan, plan_v2 FROM organizations WHERE id = $1")
        .bind(org_id).fetch_one(&pool).await.unwrap();
    assert_eq!(row.get::<String, _>("plan"), "pro");
    assert_eq!(row.get::<String, _>("plan_v2"), "pro"); // written to both
}
```
---

### 8.3 Cognitive Durability — Decision Log Validation (Story 10.3)

CI enforces that PRs touching routing or cost logic include a `decision_log.json` entry.

```python
# tools/decision-log-check/check.py
# Run as: python check.py --changed-files <list>

import json
from pathlib import Path

GUARDED_PATHS = ["src/router/", "src/cost/", "migrations/"]
REQUIRED_FIELDS = ["prompt", "reasoning", "alternatives_considered", "confidence", "timestamp", "author"]

def check_decision_log(changed_files: list[str]) -> list[str]:
    errors = []
    touches_guarded = any(
        any(f.startswith(p) for p in GUARDED_PATHS)
        for f in changed_files
    )
    if not touches_guarded:
        return []

    log_files = list(Path("docs/decisions").glob("*.json"))
    if not log_files:
        return ["No decision_log.json found in docs/decisions/ for changes to guarded paths"]

    # Check the most recently modified log file
    latest = max(log_files, key=lambda p: p.stat().st_mtime)
    try:
        log = json.loads(latest.read_text())
        for field in REQUIRED_FIELDS:
            if field not in log:
                errors.append(f"decision_log missing required field: {field}")
    except json.JSONDecodeError as e:
        errors.append(f"decision_log.json is not valid JSON: {e}")

    return errors

# Tests for the checker itself
def test_check_passes_when_log_present_with_all_fields():
    ...  # test implementation

def test_check_fails_when_log_missing_reasoning_field():
    ...  # test implementation
```

**Cyclomatic complexity enforcement:**

```toml
# .clippy.toml
cognitive-complexity-threshold = 10
```

```yaml
# CI step
- run: cargo clippy --workspace -- -W clippy::cognitive_complexity -D warnings
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### 8.4 Semantic Observability — OTEL Span Assertion Tests (Story 10.4)
|
||
|
|
|
||
|
|
Tests verify that routing decisions emit correctly structured OpenTelemetry spans.
|
||
|
|
|
||
|
|
```rust
// tests/observability.rs
use opentelemetry_sdk::testing::trace::InMemorySpanExporter;

#[tokio::test]
async fn routing_decision_emits_ai_routing_decision_span() {
    let exporter = InMemorySpanExporter::default();
    let tracer = setup_test_tracer(exporter.clone());

    let req = make_request("gpt-4o", /* feature */ "classify");
    let _decision = route_request_with_tracing(&req, &tracer, &cost_tables()).await;

    let spans = exporter.get_finished_spans().unwrap();
    let routing_span = spans.iter()
        .find(|s| s.name == "ai_routing_decision")
        .expect("ai_routing_decision span must be emitted");

    // Assert required attributes
    let attrs = span_attrs_as_map(routing_span);
    assert!(attrs.contains_key("ai.model_selected"));
    assert!(attrs.contains_key("ai.model_alternatives"));
    assert!(attrs.contains_key("ai.cost_delta"));
    assert!(attrs.contains_key("ai.complexity_score"));
    assert!(attrs.contains_key("ai.routing_strategy"));
    assert!(attrs.contains_key("ai.prompt_hash"));
}

#[tokio::test]
async fn routing_span_never_contains_raw_prompt_content() {
    let exporter = InMemorySpanExporter::default();
    let tracer = setup_test_tracer(exporter.clone());

    let secret_prompt = "My secret quarterly revenue is $4.2M";
    let req = make_request_with_prompt("gpt-4o", secret_prompt);
    route_request_with_tracing(&req, &tracer, &cost_tables()).await;

    let spans = exporter.get_finished_spans().unwrap();
    for span in &spans {
        for (key, value) in span_attrs_as_map(span) {
            assert!(!format!("{:?}", value).contains(secret_prompt),
                "span attribute '{}' contains raw prompt content", key);
        }
        for event in &span.events {
            assert!(!event.name.contains(secret_prompt));
        }
    }
}

#[tokio::test]
async fn prompt_hash_is_sha256_of_first_500_chars_of_system_prompt() {
    let exporter = InMemorySpanExporter::default();
    let tracer = setup_test_tracer(exporter.clone());

    let system_prompt = "You are a helpful assistant.";
    let req = make_request_with_system_prompt("gpt-4o", system_prompt);
    route_request_with_tracing(&req, &tracer, &cost_tables()).await;

    let spans = exporter.get_finished_spans().unwrap();
    let routing_span = spans.iter().find(|s| s.name == "ai_routing_decision").unwrap();
    let attrs = span_attrs_as_map(routing_span);

    let expected_hash = sha256_hex(&system_prompt[..system_prompt.len().min(500)]);
    assert_eq!(attrs["ai.prompt_hash"].as_str().unwrap(), expected_hash);
}

#[tokio::test]
async fn routing_span_is_child_of_request_trace() {
    let exporter = InMemorySpanExporter::default();
    let tracer = setup_test_tracer(exporter.clone());

    route_request_with_tracing(&make_request("gpt-4o", /* feature */ "test"), &tracer, &cost_tables()).await;

    let spans = exporter.get_finished_spans().unwrap();
    let request_span = spans.iter().find(|s| s.name == "proxy_request").unwrap();
    let routing_span = spans.iter().find(|s| s.name == "ai_routing_decision").unwrap();

    assert_eq!(routing_span.parent_span_id, request_span.span_context.span_id());
}
```

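The hashing contract asserted in `prompt_hash_is_sha256_of_first_500_chars_of_system_prompt` is simple enough to pin down in a few lines. A sketch of the assumed rule (SHA-256 hex digest over the first 500 characters; `prompt_hash` here is an illustrative name, not the Rust helper):

```python
import hashlib

def prompt_hash(system_prompt: str) -> str:
    # Only the first 500 characters are hashed, so the attribute stays
    # stable for long prompts and never embeds full prompt content.
    return hashlib.sha256(system_prompt[:500].encode("utf-8")).hexdigest()

digest = prompt_hash("You are a helpful assistant.")
assert len(digest) == 64  # SHA-256 hex is always 64 chars
# Truncation means prompts identical in their first 500 chars collide by design:
assert prompt_hash("x" * 500) == prompt_hash("x" * 900)
```

The deliberate collision property is worth a test of its own: it documents that the hash identifies a prompt *family* for grouping, not the exact prompt.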
---

### 8.5 Configurable Autonomy — Governance Policy Tests (Story 10.5)

```rust
// tests/governance.rs

#[tokio::test]
async fn strict_mode_blocks_automatic_routing_rule_update() {
    let policy = Policy { governance_mode: GovernanceMode::Strict, panic_mode: false };
    let result = apply_routing_rule_update(&policy, make_rule_update()).await;
    assert_eq!(result, Err(GovernanceError::BlockedByStrictMode));
}

#[tokio::test]
async fn audit_mode_applies_change_and_logs_it() {
    let policy = Policy { governance_mode: GovernanceMode::Audit, panic_mode: false };
    let log = Arc::new(Mutex::new(vec![]));
    let result = apply_routing_rule_update_with_log(&policy, make_rule_update(), log.clone()).await;

    assert!(result.is_ok());
    let entries = log.lock().await;
    assert_eq!(entries.len(), 1);
    assert!(entries[0].contains("Allowed by audit mode"));
}

#[tokio::test]
async fn panic_mode_freezes_all_routing_to_hardcoded_provider() {
    let policy = Policy { governance_mode: GovernanceMode::Audit, panic_mode: true };
    let req = make_request_with_feature("classify"); // would normally route to gpt-4o-mini

    let decision = route_with_policy(&req, &policy, &cost_tables()).await;

    assert_eq!(decision.strategy, RoutingStrategy::Passthrough);
    assert_eq!(decision.target_provider, Provider::OpenAI); // hardcoded fallback
    assert!(decision.reason.contains("panic mode"));
}

#[tokio::test]
async fn panic_mode_disables_auto_failover() {
    let policy = Policy { governance_mode: GovernanceMode::Audit, panic_mode: true };
    // Even if the primary provider fails, panic mode should not auto-failover
    let mock_openai = MockServer::start().await;
    Mock::given(method("POST")).respond_with(ResponseTemplate::new(500))
        .mount(&mock_openai).await;

    let result = dispatch_with_policy(&policy, mock_openai.uri()).await;
    // Should return the provider error, not silently failover
    assert_eq!(result.unwrap_err(), DispatchError::ProviderError(500));
}

#[tokio::test]
async fn policy_file_changes_are_hot_reloaded_within_5_seconds() {
    let policy_path = temp_policy_file(GovernanceMode::Audit);
    let watcher = PolicyWatcher::new(&policy_path);

    // Change to strict mode
    write_policy_file(&policy_path, GovernanceMode::Strict);
    tokio::time::sleep(Duration::from_secs(5)).await;

    assert_eq!(watcher.current_mode(), GovernanceMode::Strict);
}
```

---

## Section 9: Test Data & Fixtures

### 9.1 Factory Patterns for Test Data

All test data is created via factory functions — no raw struct literals scattered across tests. Factories provide sensible defaults with override capability.

```rust
// crates/shared/src/testing/factories.rs
// Feature-gated: only compiled in test builds

#[cfg(any(test, feature = "test-utils"))]
pub mod factories {
    use crate::models::*;
    use uuid::Uuid;
    use chrono::{Duration, Utc};

    pub struct OrgFactory {
        name: String,
        plan: String,
        monthly_spend_limit: Option<f64>,
    }

    impl OrgFactory {
        pub fn new() -> Self {
            Self {
                name: format!("Test Org {}", &Uuid::new_v4().to_string()[..8]),
                plan: "free".to_string(),
                monthly_spend_limit: None,
            }
        }
        pub fn pro(mut self) -> Self { self.plan = "pro".to_string(); self }
        pub fn with_spend_limit(mut self, limit: f64) -> Self {
            self.monthly_spend_limit = Some(limit); self
        }
        pub fn build(self) -> Organization {
            Organization {
                id: Uuid::new_v4(),
                name: self.name,
                slug: slugify(&self.name),
                plan: self.plan,
                monthly_llm_spend_limit: self.monthly_spend_limit,
                created_at: Utc::now(),
                updated_at: Utc::now(),
                ..Default::default()
            }
        }
    }

    pub struct RequestEventFactory {
        org_id: Uuid,
        model_requested: String,
        model_used: String,
        feature_tag: Option<String>,
        input_tokens: u32,
        output_tokens: u32,
        cost_actual: f64,
        cost_original: f64,
        latency_ms: u32,
        status_code: u16,
    }

    impl RequestEventFactory {
        pub fn new(org_id: Uuid) -> Self {
            Self {
                org_id,
                model_requested: "gpt-4o".to_string(),
                model_used: "gpt-4o-mini".to_string(),
                feature_tag: Some("classify".to_string()),
                input_tokens: 500,
                output_tokens: 50,
                cost_actual: 0.000083,
                cost_original: 0.001375,
                latency_ms: 3,
                status_code: 200,
            }
        }
        pub fn with_model(mut self, requested: &str, used: &str) -> Self {
            self.model_requested = requested.to_string();
            self.model_used = used.to_string();
            self
        }
        pub fn with_feature(mut self, feature: &str) -> Self {
            self.feature_tag = Some(feature.to_string()); self
        }
        pub fn with_tokens(mut self, input: u32, output: u32) -> Self {
            self.input_tokens = input;
            self.output_tokens = output;
            self
        }
        pub fn failed(mut self) -> Self {
            self.status_code = 500; self
        }
        pub fn build(self) -> RequestEvent {
            RequestEvent {
                id: Uuid::new_v4(),
                org_id: self.org_id,
                timestamp: Utc::now(),
                model_requested: self.model_requested,
                model_used: self.model_used,
                feature_tag: self.feature_tag,
                input_tokens: self.input_tokens,
                output_tokens: self.output_tokens,
                cost_actual: self.cost_actual,
                cost_original: self.cost_original,
                cost_saved: self.cost_original - self.cost_actual,
                latency_ms: self.latency_ms,
                status_code: self.status_code,
                ..Default::default()
            }
        }
    }

    pub struct RoutingRuleFactory {
        org_id: Uuid,
        priority: i32,
        strategy: RoutingStrategy,
        match_feature: Option<String>,
        model_chain: Vec<String>,
    }

    impl RoutingRuleFactory {
        pub fn cheapest(org_id: Uuid) -> Self {
            Self {
                org_id,
                priority: 0,
                strategy: RoutingStrategy::Cheapest,
                match_feature: None,
                model_chain: vec!["gpt-4o-mini".to_string(), "claude-3-haiku".to_string()],
            }
        }
        pub fn for_feature(mut self, feature: &str) -> Self {
            self.match_feature = Some(feature.to_string()); self
        }
        pub fn with_priority(mut self, p: i32) -> Self { self.priority = p; self }
        pub fn build(self) -> RoutingRule { /* ... */ }
    }

    // Convenience helpers
    pub fn make_event(org_id: Uuid) -> RequestEvent {
        RequestEventFactory::new(org_id).build()
    }

    pub fn make_events(org_id: Uuid, count: usize) -> Vec<RequestEvent> {
        (0..count).map(|_| make_event(org_id)).collect()
    }

    pub fn make_events_spread_over_days(org_id: Uuid, count: usize, days: u32) -> Vec<RequestEvent> {
        (0..count).map(|i| {
            let mut event = make_event(org_id);
            event.timestamp = Utc::now() - Duration::days((i % days as usize) as i64);
            event
        }).collect()
    }
}
```

**TypeScript factories for UI and CLI tests:**

```typescript
// ui/src/testing/factories.ts

export const makeOrg = (overrides: Partial<Organization> = {}): Organization => ({
  id: crypto.randomUUID(),
  name: 'Test Org',
  slug: 'test-org',
  plan: 'free',
  createdAt: new Date().toISOString(),
  ...overrides,
});

export const makeDashboardSummary = (overrides: Partial<DashboardSummary> = {}): DashboardSummary => ({
  period: '7d',
  totalRequests: 42850,
  totalCost: 127.43,
  totalCostWithoutRouting: 891.20,
  totalSaved: 763.77,
  savingsPercentage: 85.7,
  avgLatencyMs: 4.2,
  ...overrides,
});

export const makeRequestEvent = (overrides: Partial<RequestEvent> = {}): RequestEvent => ({
  id: `req_${Math.random().toString(36).slice(2, 10)}`,
  timestamp: new Date().toISOString(),
  modelRequested: 'gpt-4o',
  modelUsed: 'gpt-4o-mini',
  provider: 'openai',
  featureTag: 'classify',
  inputTokens: 142,
  outputTokens: 8,
  cost: 0.000026,
  costWithoutRouting: 0.000435,
  saved: 0.000409,
  latencyMs: 245,
  complexity: 'LOW',
  status: 200,
  ...overrides,
});

export const makeTreemapData = (): TreemapNode[] => [
  { name: 'classify', value: 450.20, children: [
    { name: 'gpt-4o-mini', value: 320.10 },
    { name: 'claude-3-haiku', value: 130.10 },
  ]},
  { name: 'summarize', value: 280.50, children: [
    { name: 'gpt-4o', value: 280.50 },
  ]},
];
```

---

### 9.2 Provider Response Mocks (OpenAI & Anthropic)

Recorded fixtures live in `tests/fixtures/`. They are captured once from real APIs and committed to the repo.

```
tests/fixtures/
├── openai/
│   ├── chat_completions_non_streaming.json
│   ├── chat_completions_streaming.txt            # raw SSE stream
│   ├── chat_completions_streaming_with_usage.txt
│   ├── chat_completions_tool_call.json
│   ├── embeddings_response.json
│   ├── error_rate_limit_429.json
│   ├── error_invalid_api_key_401.json
│   └── error_server_error_500.json
├── anthropic/
│   ├── messages_response.json
│   ├── messages_streaming.txt
│   ├── error_overloaded_529.json
│   └── error_rate_limit_429.json
└── dd0c/
    ├── routing_decision_cheapest.json    # expected routing decision output
    ├── routing_decision_cascading.json
    └── request_event_full.json           # full RequestEvent with all fields
```

**OpenAI non-streaming fixture:**
```json
// tests/fixtures/openai/chat_completions_non_streaming.json
{
  "id": "chatcmpl-test123",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "This is a billing inquiry." },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 8,
    "total_tokens": 50
  }
}
```

**OpenAI streaming fixture:**
```
// tests/fixtures/openai/chat_completions_streaming.txt
data: {"id":"chatcmpl-test123","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-test123","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"delta":{"content":"This"},"finish_reason":null}]}

data: {"id":"chatcmpl-test123","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-test123","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":42,"completion_tokens":3,"total_tokens":45}}

data: [DONE]
```

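Tests that replay this fixture need to reassemble the stream. A minimal sketch of the assumed SSE handling (ignore non-`data:` lines, stop at `[DONE]`, read token usage off the final chunk), using an abbreviated transcript inline; `parse_sse_fixture` is an illustrative helper, not part of the codebase:

```python
import json

def parse_sse_fixture(raw: str) -> list[dict]:
    """Yield parsed chunk objects from an OpenAI-style SSE transcript."""
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks

raw = """
data: {"choices":[{"index":0,"delta":{"content":"This"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":42,"completion_tokens":3,"total_tokens":45}}

data: [DONE]
"""

chunks = parse_sse_fixture(raw)
text = "".join(c["choices"][0]["delta"].get("content", "") for c in chunks)
usage = chunks[-1].get("usage")
print(text, usage["total_tokens"])  # → This is 45
```

The last non-`[DONE]` chunk carrying `usage` is exactly what the proxy's cost calculation must latch onto when streaming.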
**Fixture loader utility:**
```rust
// tests/common/fixtures.rs
use std::fs;
use std::path::Path;

pub fn load_fixture(path: &str) -> serde_json::Value {
    let fixture_path = Path::new(env!("CARGO_MANIFEST_DIR"))
        .join("tests/fixtures")
        .join(path);
    let content = fs::read_to_string(&fixture_path)
        .unwrap_or_else(|_| panic!("fixture not found: {}", fixture_path.display()));
    serde_json::from_str(&content)
        .unwrap_or_else(|_| panic!("fixture is not valid JSON: {}", path))
}

pub fn load_sse_fixture(path: &str) -> Vec<u8> {
    let fixture_path = Path::new(env!("CARGO_MANIFEST_DIR"))
        .join("tests/fixtures")
        .join(path);
    fs::read(&fixture_path)
        .unwrap_or_else(|_| panic!("SSE fixture not found: {}", fixture_path.display()))
}
```

---

### 9.3 Cost Table Fixtures

```rust
// crates/shared/src/testing/cost_tables.rs

#[cfg(any(test, feature = "test-utils"))]
pub fn cost_tables() -> CostTables {
    CostTables::from_vec(vec![
        ModelCost {
            provider: Provider::OpenAI,
            model_id: "gpt-4o-2024-11-20".to_string(),
            model_alias: "gpt-4o".to_string(),
            input_cost_per_m: 2.50,
            output_cost_per_m: 10.00,
            quality_tier: QualityTier::Frontier,
            max_context: 128_000,
            supports_streaming: true,
            supports_tools: true,
            supports_vision: true,
        },
        ModelCost {
            provider: Provider::OpenAI,
            model_id: "gpt-4o-mini-2024-07-18".to_string(),
            model_alias: "gpt-4o-mini".to_string(),
            input_cost_per_m: 0.15,
            output_cost_per_m: 0.60,
            quality_tier: QualityTier::Economy,
            max_context: 128_000,
            supports_streaming: true,
            supports_tools: true,
            supports_vision: true,
        },
        ModelCost {
            provider: Provider::Anthropic,
            model_id: "claude-3-haiku-20240307".to_string(),
            model_alias: "claude-3-haiku".to_string(),
            input_cost_per_m: 0.25,
            output_cost_per_m: 1.25,
            quality_tier: QualityTier::Economy,
            max_context: 200_000,
            supports_streaming: true,
            supports_tools: true,
            supports_vision: false,
        },
        ModelCost {
            provider: Provider::Anthropic,
            model_id: "claude-3-5-sonnet-20241022".to_string(),
            model_alias: "claude-3-5-sonnet".to_string(),
            input_cost_per_m: 3.00,
            output_cost_per_m: 15.00,
            quality_tier: QualityTier::Frontier,
            max_context: 200_000,
            supports_streaming: true,
            supports_tools: true,
            supports_vision: true,
        },
    ])
}

/// Returns all valid (requested, used) pairs where routing makes sense
#[cfg(any(test, feature = "test-utils"))]
pub fn routable_model_pairs() -> Vec<(&'static str, &'static str)> {
    vec![
        ("gpt-4o", "gpt-4o-mini"),
        ("gpt-4o", "claude-3-haiku"),
        ("claude-3-5-sonnet", "gpt-4o-mini"),
        ("claude-3-5-sonnet", "claude-3-haiku"),
        // Same model (zero savings)
        ("gpt-4o-mini", "gpt-4o-mini"),
        ("gpt-4o", "gpt-4o"),
    ]
}
```

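These per-million rates make expected values in cost tests computable by hand. A sketch of the assumed cost formula (input and output tokens priced at their respective per-million rates; `RATES` and `request_cost` are illustrative names, not project code):

```python
RATES = {  # USD per million tokens, from the fixture table above
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    r = RATES[model]
    return (input_tokens / 1_000_000 * r["input"]
            + output_tokens / 1_000_000 * r["output"])

# Routing 500 input / 50 output tokens from gpt-4o down to gpt-4o-mini:
original = request_cost("gpt-4o", 500, 50)       # ≈ $0.001750
actual = request_cost("gpt-4o-mini", 500, 50)    # ≈ $0.000105
saved = original - actual
print(f"{original:.6f} {actual:.6f} {saved:.6f}")
```

Pinning a few such hand-computed pairs as unit test expectations guards the cost math against silent regressions when rates are updated.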
---

## Section 10: TDD Implementation Order

### 10.1 Bootstrap Sequence

Before writing any product tests, the test infrastructure itself must be bootstrapped. This is the meta-TDD step.

```
Week 0 (before Epic 1 code):

Day 1: Test infrastructure setup
├─ Add dev-dependencies: mockall, proptest, testcontainers, wiremock, criterion
├─ Create crates/shared/src/testing/ module (factories, cost_tables, helpers)
├─ Create tests/common/ (fixture loader, test app builder, DB setup helpers)
├─ Write and pass: "test infrastructure compiles and factories produce valid structs"
└─ Set up cargo-tarpaulin and confirm coverage reporting works

Day 2: CI pipeline skeleton
├─ Create .github/workflows/ci.yml with unit test job (no tests yet — just passes)
├─ Add benchmark job with baseline capture
├─ Add migration lint script and test it against a sample migration
└─ Confirm: `git push` → CI green (trivially)
```

### 10.2 Epic-by-Epic TDD Order

Tests must be written in dependency order — you can't test the Router Brain without the cost table fixtures, and you can't test the Analytics Pipeline without the proxy event schema.

```
Phase 1: Foundation (Epic 1 — Proxy Engine)
─────────────────────────────────────────────
WRITE FIRST (before any proxy code):
1. test: parse_request_extracts_model_and_stream_flag
2. test: auth_middleware_returns_401_for_unknown_key
3. test: auth_middleware_caches_valid_key_after_db_lookup
4. test: response_headers_contain_routing_metadata
5. test: telemetry_emitter_drops_event_when_channel_is_full

THEN implement proxy core to make them pass.

THEN add property tests:
6. proptest: api_key_hash_is_deterministic
7. proptest: response_headers_never_contain_prompt_content

THEN add integration tests (requires Docker):
8. integration: proxy_forwards_request_to_mock_openai_and_returns_200
9. integration: proxy_returns_401_for_revoked_key_after_cache_invalidation
10. contract: proxy_response_matches_openai_response_schema
11. contract: proxy_preserves_sse_chunk_ordering_for_streaming_requests
12. contract: proxy_translates_anthropic_response_to_openai_format

Phase 2: Intelligence (Epic 2 — Router Brain)
──────────────────────────────────────────────
WRITE FIRST:
13. test: rule_engine_returns_first_matching_rule_by_priority
14. test: rule_engine_falls_through_to_passthrough_when_no_rules_match
15. test: cheapest_strategy_selects_lowest_cost_model_from_chain
16. test: classifier_returns_low_for_short_extraction_system_prompt
17. test: classifier_returns_high_for_code_generation_prompt
18. test: cost_saved_is_zero_when_models_are_identical
19. test: cost_saved_is_positive_when_routed_to_cheaper_model
20. test: circuit_breaker_transitions_to_open_after_error_threshold
21. test: circuit_breaker_transitions_to_half_open_after_cooldown

THEN implement Router Brain.

THEN add property tests:
22. proptest: cheapest_strategy_never_selects_more_expensive_model
23. proptest: complexity_classifier_never_panics_on_arbitrary_input
24. proptest: cost_saved_is_never_negative

THEN add integration tests:
25. integration: circuit_breaker_state_is_shared_across_two_proxy_instances
26. integration: routing_rule_loaded_from_db_takes_effect_on_next_request

Phase 3: Data (Epic 3 — Analytics Pipeline)
─────────────────────────────────────────────
WRITE FIRST:
27. test: batch_collector_flushes_after_100_events_before_timeout
28. test: batch_collector_flushes_partial_batch_after_interval
29. test: proxy_continues_routing_when_telemetry_db_is_unavailable

THEN implement analytics worker.

THEN add integration tests:
30. integration: batch_worker_inserts_events_into_timescaledb_hypertable
31. integration: continuous_aggregate_reflects_inserted_events_after_refresh

Phase 4: Control Plane (Epic 4 — Dashboard API)
─────────────────────────────────────────────────
WRITE FIRST:
32. test: create_api_key_returns_full_key_only_once
33. test: member_role_cannot_create_routing_rules
34. test: request_inspector_never_returns_prompt_content
35. test: provider_credential_is_stored_encrypted
36. test: revoked_api_key_returns_401_on_next_proxy_request

THEN implement Dashboard API.

THEN add integration tests:
37. integration: create_org_and_api_key_persists_to_postgres
38. integration: routing_rules_are_returned_in_priority_order
39. integration: dashboard_summary_query_returns_correct_aggregates

Phase 5: Transparent Factory (Epic 10 — Cross-cutting)
────────────────────────────────────────────────────────
These tests are written ALONGSIDE the features they govern, not after.

40. test: routing_strategy_uses_passthrough_when_flag_is_off (with Epic 2)
41. test: flag_auto_disables_when_p99_latency_increases_by_5_percent (with Epic 2)
42. test: lint_rejects_drop_table (before any migration)
43. test: routing_decision_emits_ai_routing_decision_span (with Epic 2)
44. test: routing_span_never_contains_raw_prompt_content (with Epic 2)
45. test: strict_mode_blocks_automatic_routing_rule_update (with Epic 2)
46. test: panic_mode_freezes_all_routing_to_hardcoded_provider (with Epic 2)

Phase 6: UI & CLI (Epics 5 & 6)
─────────────────────────────────
47. vitest: CostTreemap renders spend breakdown by feature tag
48. vitest: RoutingRulesEditor allows drag-to-reorder priority
49. vitest: RequestInspector filters by feature tag
50. vitest: dd0c-scan detects gpt-4o usage in TypeScript files
51. vitest: SavingsReport calculates positive savings

Phase 7: E2E (after staging environment is live)
──────────────────────────────────────────────────
52. playwright: first_route_onboarding_journey_completes_in_under_2_minutes
53. playwright: routing_rule_created_in_ui_takes_effect_on_next_request
54. k6: proxy_overhead_p99_is_under_5ms_at_50_concurrent_users
55. k6: proxy_continues_routing_when_timescaledb_is_killed
```

### 10.3 Test Count Milestones

| Milestone | Tests Written | Coverage Target | Gate |
|-----------|--------------|-----------------|------|
| End of Epic 1 | ~50 | 60% proxy crate | CI green |
| End of Epic 2 | ~120 | 80% shared/router | CI green |
| End of Epic 3 | ~150 | 70% worker | CI green |
| End of Epic 4 | ~220 | 75% api crate | CI green |
| End of Epic 10 | ~280 | 80% overall | CI green |
| End of Epic 5+6 | ~320 | 75% overall | CI green |
| V1 Launch | ~400 | 75% overall | Deploy gate |

### 10.4 The "Test It First" Checklist

Before writing any new function, ask:

```
□ Does this function have a clear, testable contract?
  (If not, the function is probably doing too much — split it)

□ Can I write the test without knowing the implementation?
  (If not, the abstraction is wrong — redesign the interface)

□ Does this function touch the hot path?
  → Add a criterion benchmark

□ Does this function handle money (cost calculations)?
  → Add proptest property tests

□ Does this function touch auth or security?
  → Add tests for the invalid/revoked/malformed cases explicitly

□ Does this function emit telemetry or spans?
  → Add an OTEL span assertion test

□ Does this function change routing behavior?
  → Add a feature flag test (off/on/auto-disabled)

□ Does this function modify the database schema?
  → Add a migration lint test and a dual-write test
```

---

## Appendix: Test Toolchain Summary

| Tool | Language | Purpose | Config |
|------|----------|---------|--------|
| `cargo test` | Rust | Unit + integration test runner | `Cargo.toml` |
| `mockall` | Rust | Mock generation for traits | `#[automock]` attribute |
| `proptest` | Rust | Property-based testing | `proptest!` macro |
| `criterion` | Rust | Micro-benchmarks | `[[bench]]` in Cargo.toml |
| `testcontainers` | Rust | Real DB/Redis in tests | Docker required |
| `wiremock` | Rust | HTTP mock server | `MockServer::start().await` |
| `cargo-tarpaulin` | Rust | Code coverage | `cargo tarpaulin` |
| `cargo-audit` | Rust | Dependency vulnerability scan | `cargo audit` |
| `vitest` | TypeScript | Unit tests for UI + CLI | `vitest.config.ts` |
| `@testing-library/react` | TypeScript | React component tests | With vitest |
| `Playwright` | TypeScript | E2E browser tests | `playwright.config.ts` |
| `k6` | JavaScript | Load + chaos tests | `k6 run` |
| `migration-lint` | Python/Bash | DDL safety checks | Pre-commit + CI |
| `decision-log-check` | Python | Cognitive durability enforcement | CI only |
| `benchmark-action` | GitHub Actions | Benchmark regression detection | `.github/workflows/` |

---

*Test Architecture document generated for dd0c/route V1 MVP.*
*Total estimated test count at V1 launch: ~400 tests.*
*Target CI runtime: <8 minutes (unit + integration), <15 minutes (full pipeline with E2E).*