dd0c/products/01-llm-cost-router/acceptance-specs/acceptance-specs.md
Max Mayfield c1484426cc — Phase 3: BDD acceptance specs for P1 (route) and P5 (cost)
P1: 50+ scenarios across 10 epics, all stories covered
P5: 55+ scenarios across 10 epics, written manually (Sonnet credential failures)
Remaining P2/P3/P4/P6 in progress via subagents
2026-03-01 01:50:30 +00:00

dd0c/route — BDD Acceptance Test Specifications

Phase 3: Given/When/Then per Story
Date: March 1, 2026


Epic 1: Proxy Engine

Story 1.1: OpenAI SDK Drop-In Compatibility

Feature: OpenAI SDK Drop-In Compatibility

  Scenario: Non-streaming request proxied successfully
    Given a valid API key "sk-test-123" exists in Redis cache
    And the upstream OpenAI endpoint is healthy
    When I send POST /v1/chat/completions with model "gpt-4o" and stream=false
    Then the response status is 200
    And the response body matches the OpenAI ChatCompletion schema
    And the response includes usage.prompt_tokens and usage.completion_tokens

  Scenario: Request with invalid API key is rejected
    Given no API key "sk-invalid" exists in Redis or PostgreSQL
    When I send POST /v1/chat/completions with Authorization "Bearer sk-invalid"
    Then the response status is 401
    And the response body contains error "Invalid API key"

  Scenario: API key validated from Redis cache (fast path)
    Given API key "sk-cached" exists in Redis cache
    When I send POST /v1/chat/completions with Authorization "Bearer sk-cached"
    Then the key is validated without querying PostgreSQL
    And the request is forwarded to the upstream provider

  Scenario: API key falls back to PostgreSQL when Redis is unavailable
    Given API key "sk-db-only" exists in PostgreSQL but NOT in Redis
    When I send POST /v1/chat/completions with Authorization "Bearer sk-db-only"
    Then the key is validated via PostgreSQL
    And the key is written back to Redis cache for future requests

Story 1.2: SSE Streaming Support

Feature: SSE Streaming Passthrough

  Scenario: Streaming response is proxied chunk-by-chunk
    Given a valid API key and upstream OpenAI is healthy
    When I send POST /v1/chat/completions with stream=true
    Then the response Content-Type is "text/event-stream"
    And I receive multiple SSE chunks ending with "data: [DONE]"
    And each chunk matches the OpenAI streaming delta schema

  Scenario: Client disconnects mid-stream
    Given a streaming request is in progress
    When the client drops the TCP connection after 5 chunks
    Then the proxy aborts the upstream provider connection
    And no further chunks are buffered in memory

Story 1.3: <5ms Latency Overhead

Feature: Proxy Latency Budget

  Scenario: P99 latency overhead under 5ms
    Given 1000 non-streaming requests are sent sequentially
    When I measure the delta between proxy response time and upstream response time
    Then the P99 delta is less than 5 milliseconds

  Scenario: Telemetry emission does not block the hot path
    Given TimescaleDB is experiencing 5-second write latency
    When I send a request through the proxy
    Then the proxy responds within the 5ms overhead budget
    And the telemetry event is queued in the mpsc channel (not dropped)

Story 1.4: Transparent Error Passthrough

Feature: Provider Error Passthrough

  Scenario: Rate limit error passed through transparently
    Given the upstream provider returns HTTP 429 with Retry-After header
    When the proxy receives this response
    Then the proxy returns HTTP 429 to the client
    And the Retry-After header is preserved

  Scenario: Provider 500 error passed through
    Given the upstream provider returns HTTP 500
    When the proxy receives this response
    Then the proxy returns HTTP 500 to the client
    And the original error body is preserved

Epic 2: Router Brain

Story 2.1: Complexity Classification

Feature: Request Complexity Classification

  Scenario: Simple extraction task classified as low complexity
    Given a request with system prompt "Extract the name from this text"
    And token count is under 500
    When the complexity classifier evaluates the request
    Then the complexity score is "low"
    And classification completes in under 2ms

  Scenario: Multi-turn reasoning classified as high complexity
    Given a request with 10+ messages in the conversation
    And the system prompt contains "analyze", "reason", or "compare"
    When the complexity classifier evaluates the request
    Then the complexity score is "high"

  Scenario: Unknown pattern defaults to medium complexity
    Given a request with no recognizable task pattern
    When the complexity classifier evaluates the request
    Then the complexity score is "medium"

Story 2.2: Routing Rules

Feature: Configurable Routing Rules

  Scenario: First-match routing rule applied
    Given routing rule "if feature=classify -> cheapest from [gpt-4o-mini, claude-haiku]"
    And the request includes header X-DD0C-Feature: classify
    When the router evaluates the request
    Then the request is routed to the cheapest model in the rule set
    And the routing decision is logged with strategy "cheapest"

  Scenario: No matching rule falls through to default
    Given routing rules exist but none match the request headers
    When the router evaluates the request
    Then the request is routed using the "passthrough" strategy
    And the original model in the request body is used

Story 2.3: Automatic Fallback

Feature: Provider Fallback Chain

  Scenario: Primary provider fails, fallback succeeds
    Given routing rule specifies fallback chain [openai, anthropic]
    And OpenAI returns HTTP 503
    When the proxy processes the request
    Then the request is retried against Anthropic
    And the response is returned successfully
    And the routing decision log shows "fallback triggered"

  Scenario: Circuit breaker opens after sustained failures
    Given OpenAI error rate exceeds 10% over the last 60 seconds
    When a new request arrives targeting OpenAI
    Then the circuit breaker is OPEN for OpenAI
    And the request is immediately routed to the fallback provider
    And no request is sent to OpenAI

Story 2.4: Real-Time Cost Savings

Feature: Cost Savings Calculation

  Scenario: Savings calculated when model is downgraded
    Given the original request specified model "gpt-4o" ($15/1M input tokens)
    And the router downgraded to "gpt-4o-mini" ($0.15/1M input tokens)
    And the request used 1000 input tokens
    When the cost calculator runs
    Then cost_original is $0.015
    And cost_actual is $0.00015
    And cost_saved is $0.01485

  Scenario: Zero savings when passthrough (no routing)
    Given the request was routed with strategy "passthrough"
    When the cost calculator runs
    Then cost_saved is $0.00

Epic 3: Analytics Pipeline

Story 3.1: Non-Blocking Telemetry

Feature: Asynchronous Telemetry Emission

  Scenario: Telemetry emitted without blocking request
    Given the proxy processes a request successfully
    When the response is sent to the client
    Then a RequestEvent is emitted to the mpsc channel
    And the channel send completes in under 1ms

  Scenario: Bounded channel drops events when full
    Given the mpsc channel is at capacity (1000 events)
    When a new telemetry event is emitted
    Then the event is dropped (not blocking the proxy)
    And a "telemetry_dropped" counter is incremented

Story 3.2: Fast Dashboard Queries

Feature: Continuous Aggregates for Dashboard

  Scenario: Hourly cost summary is pre-calculated
    Given 10,000 request events exist in the last hour
    When I query GET /api/dashboard/summary?period=1h
    Then the response returns in under 200ms
    And total_cost, total_saved, and avg_latency are present

  Scenario: Treemap data returns cost breakdown by team and model
    Given request events tagged with team="payments" and team="search"
    When I query GET /api/dashboard/treemap?period=7d
    Then the response includes cost breakdowns per team per model

Story 3.3: Automatic Data Compression

Feature: TimescaleDB Compression

  Scenario: Chunks older than 7 days are compressed
    Given request_events data exists from 10 days ago
    When the compression policy runs
    Then chunks older than 7 days are compressed by 90%+
    And compressed data is still queryable via continuous aggregates

Epic 4: Dashboard API

Story 4.1: GitHub OAuth Signup

Feature: GitHub OAuth Authentication

  Scenario: New user signs up via GitHub OAuth
    Given the user has a valid GitHub account
    When they complete the /api/auth/github OAuth flow
    Then an organization is created automatically
    And a JWT access token is returned
    And a refresh token is stored in Redis

  Scenario: Existing user logs in via GitHub OAuth
    Given the user already has an organization
    When they complete the /api/auth/github OAuth flow
    Then a new JWT is issued for the existing organization
    And no duplicate organization is created

Story 4.2: Routing Rules & Provider Keys CRUD

Feature: Routing Rules Management

  Scenario: Create a new routing rule
    Given I am authenticated as an Owner
    When I POST /api/orgs/{id}/routing-rules with a valid rule body
    Then the rule is created and returned with an ID
    And the rule is loaded into the proxy within 60 seconds

  Scenario: Provider API key is encrypted at rest
    Given I POST /api/orgs/{id}/provider-keys with key "sk-live-abc123"
    When the key is stored in PostgreSQL
    Then the stored value is AES-256-GCM encrypted
    And decrypting with the correct KMS key returns "sk-live-abc123"

Story 4.3: Dashboard Data Endpoints

Feature: Dashboard Summary & Treemap

  Scenario: Summary endpoint returns aggregated metrics
    Given request events exist for the authenticated org
    When I GET /api/dashboard/summary?period=30d
    Then the response includes total_cost, total_saved, request_count, avg_latency

  Scenario: Request inspector with filters
    Given 500 request events exist for the org
    When I GET /api/requests?feature=classify&limit=20
    Then 20 results are returned matching feature="classify"
    And no prompt content is included in the response

Story 4.4: API Key Revocation

Feature: API Key Revocation

  Scenario: Revoked key is immediately blocked
    Given API key "sk-compromised" is active
    When I DELETE /api/orgs/{id}/api-keys/{key_id}
    Then the key is removed from Redis cache immediately
    And subsequent requests with "sk-compromised" return 401

Epic 5: Dashboard UI

Story 5.1: Cost Treemap

Feature: AI Spend Treemap Visualization

  Scenario: Treemap renders cost breakdown
    Given the user is logged into the dashboard
    And cost data exists for teams "payments" and "search"
    When the dashboard loads with period=7d
    Then a treemap visualization renders
    And the "payments" block is proportionally larger if it spent more

Story 5.2: Real-Time Savings Counter

Feature: Savings Counter

  Scenario: Weekly savings counter updates
    Given the user is on the dashboard
    And $127.50 has been saved this week
    When the dashboard loads
    Then the counter displays "You saved $127.50 this week"

Story 5.3: Routing Rules Editor

Feature: Routing Rules Editor UI

  Scenario: Create rule via drag-and-drop interface
    Given the user navigates to Settings > Routing Rules
    When they create a new rule with feature="classify" and strategy="cheapest"
    And they drag it to priority position 1
    Then the rule is saved via the API
    And the rule list reflects the new priority order

Story 5.4: Request Inspector

Feature: Request Inspector

  Scenario: Inspect routing decision for a specific request
    Given the user opens the Request Inspector
    When they click on a specific request row
    Then the detail panel shows: model_selected, model_alternatives, cost_delta, routing_strategy
    And no prompt content is displayed

Epic 6: Shadow Audit CLI

Story 6.1: Zero-Setup Codebase Scan

Feature: Shadow Audit CLI Scan

  Scenario: Scan TypeScript project for LLM usage
    Given a TypeScript project with 3 files calling openai.chat.completions.create()
    When I run npx dd0c-scan in the project root
    Then the CLI detects 3 LLM API call sites
    And estimates monthly cost based on heuristic token counts
    And displays projected savings with dd0c/route

  Scenario: Scan runs completely offline with cached pricing
    Given the pricing table was cached from a previous run
    And the network is unavailable
    When I run npx dd0c-scan
    Then the scan completes using cached pricing data
    And a warning is shown that pricing may be stale

Story 6.2: No Source Code Exfiltration

Feature: Local-Only Scan

  Scenario: No source code sent to server
    Given I run npx dd0c-scan without the --opt-in flag
    When the scan completes
    Then zero HTTP requests are made to any dd0c server
    And the report is rendered entirely in the terminal

Story 6.3: Terminal Report

Feature: Terminal Report Output

  Scenario: Top Opportunities report
    Given the scan found 5 files with LLM calls
    When the report renders
    Then it shows "Top Opportunities" sorted by potential savings
    And each entry includes: file path, current model, suggested model, estimated monthly savings

Epic 7: Slack Integration

Story 7.1: Weekly Savings Digest

Feature: Monday Morning Digest

  Scenario: Weekly digest sent via email
    Given it is Monday at 9:00 AM UTC
    And the org has cost data from the previous week
    When the dd0c-worker cron fires
    Then an email is sent via AWS SES to the org admin
    And the email includes: total_saved, top_model_savings, week_over_week_trend

  Scenario: No digest sent if no activity
    Given the org had zero requests last week
    When the cron fires
    Then no email is sent

Story 7.2: Budget Alert via Slack

Feature: Budget Threshold Alerts

  Scenario: Daily spend exceeds configured threshold
    Given the org configured alert threshold at $100/day
    And today's spend reaches $101
    When the dd0c-worker evaluates thresholds
    Then a Slack webhook is fired with the alert payload
    And the payload includes X-DD0C-Signature header
    And last_fired_at is updated to prevent duplicate alerts

  Scenario: Alert not re-fired for same incident
    Given an alert was already fired for today's threshold breach
    When the worker evaluates thresholds again
    Then no duplicate Slack webhook is sent

Epic 8: Infrastructure & DevOps

Story 8.1: ECS Fargate Deployment

Feature: Containerized Deployment

  Scenario: CDK deploys all services to ECS Fargate
    Given the CDK stack is synthesized
    When cdk deploy is executed
    Then ECS services are created for proxy, api, and worker
    And ALB routes /v1/* to proxy and /api/* to api
    And CloudFront serves static dashboard assets from S3

Story 8.2: CI/CD Pipeline

Feature: GitHub Actions CI/CD

  Scenario: Push to main triggers full pipeline
    Given code is pushed to the main branch
    When GitHub Actions triggers
    Then tests run (unit + integration + canary suite)
    And Docker images are built and pushed to ECR
    And ECS services are updated with rolling deployment
    And zero downtime is maintained during deployment

Story 8.3: CloudWatch Alarms

Feature: Monitoring & Alerting

  Scenario: P99 latency alarm fires
    Given proxy P99 latency exceeds 50ms for 5 minutes
    When CloudWatch evaluates the alarm
    Then a PagerDuty incident is created

Epic 9: Onboarding & PLG

Story 9.1: One-Click GitHub Signup

Feature: Frictionless Signup

  Scenario: New user completes signup in under 60 seconds
    Given the user clicks "Sign up with GitHub"
    When they authorize the OAuth app
    Then an org is created and an API key is generated
    And the user lands on the onboarding wizard
    And total elapsed time is under 60 seconds

Story 9.2: Free Tier Enforcement

Feature: Free Tier ($50/month routed spend)

  Scenario: Free tier user within limit
    Given the org is on the free tier
    And this month's routed spend is $45
    When a new request is proxied
    Then the request is processed normally

  Scenario: Free tier user exceeds limit
    Given the org is on the free tier
    And this month's routed spend is $50.01
    When a new request is proxied
    Then the proxy returns HTTP 429
    And the response body includes an upgrade CTA with Stripe Checkout link

  Scenario: Monthly counter resets on the 1st
    Given the org used $50 last month
    When the calendar rolls to the 1st of the new month
    Then the Redis counter is reset to $0
    And requests are processed normally again

Story 9.3: API Key Management

Feature: API Key CRUD

  Scenario: Generate new API key
    Given I am authenticated as an Owner
    When I POST /api/orgs/{id}/api-keys
    Then a new API key is returned in plaintext (shown once)
    And the key is stored as a bcrypt hash in PostgreSQL
    And the key is cached in Redis for fast auth

  Scenario: Rotate API key
    Given API key "sk-old" is active
    When I POST /api/orgs/{id}/api-keys/{key_id}/rotate
    Then a new key "sk-new" is returned
    And "sk-old" is immediately invalidated

Story 9.4: First Route Onboarding

Feature: 2-Minute First Route

  Scenario: User completes onboarding wizard
    Given the user just signed up
    When they copy the API key (step 1)
    And paste the provided curl command (step 2)
    And the request appears in the dashboard (step 3)
    Then the onboarding is marked complete
    And a PostHog event "routing.savings.first_dollar" is tracked (if savings occurred)

Story 9.5: Team Invites

Feature: Team Member Invites

  Scenario: Invite team member via email
    Given I am an Owner of org "acme"
    When I POST /api/orgs/{id}/invites with email "dev@acme.com"
    Then an email with a magic link is sent
    And the magic link JWT expires in 72 hours

  Scenario: Invited user joins existing org
    Given "dev@acme.com" received an invite magic link
    When they click the link and complete GitHub OAuth
    Then they are added to org "acme" as a Member (not Owner)
    And no new org is created

Epic 10: Transparent Factory Compliance

Story 10.1: Atomic Flagging

Feature: Feature Flag Infrastructure

  Scenario: New routing strategy behind a flag (default off)
    Given feature flag "enable_cascading_router" exists with default=off
    When a request arrives that would trigger cascading routing
    Then the passthrough strategy is used instead
    And the flag evaluation completes without network calls

  Scenario: Flag auto-disables on latency regression
    Given flag "enable_new_classifier" is at 50% rollout
    And P99 latency increased by 8% since the flag was enabled
    When the circuit breaker evaluates flag health
    Then the flag is auto-disabled within 30 seconds
    And an alert is fired

  Scenario: CI blocks deployment with expired flag
    Given flag "old_experiment" has TTL expired and rollout=100%
    When CI runs the flag audit
    Then the build fails with "Expired flag at full rollout: old_experiment"

Story 10.2: Elastic Schema

Feature: Additive-Only Schema Migrations

  Scenario: Migration with DROP COLUMN is rejected
    Given a migration file contains "ALTER TABLE request_events DROP COLUMN old_field"
    When CI runs the schema lint
    Then the build fails with "Destructive schema change detected"

  Scenario: V1 code ignores V2 fields
    Given a request_events row contains a new "routing_v2" column
    When V1 Rust code deserializes the row
    Then deserialization succeeds (unknown fields ignored)
    And no error is logged

Story 10.3: Cognitive Durability

Feature: Decision Logs for Routing Logic

  Scenario: PR touching router requires decision log
    Given a PR modifies files in src/router/
    And no decision_log.json is included
    When CI runs the decision log check
    Then the build fails with "Decision log required for routing changes"

  Scenario: Cyclomatic complexity exceeds cap
    Given a function in src/router/ has cyclomatic complexity 12
    When cargo clippy runs with cognitive_complexity threshold 10
    Then the lint fails

Story 10.4: Semantic Observability

Feature: AI Reasoning Spans

  Scenario: Routing decision emits OTEL span
    Given a request is routed from gpt-4o to gpt-4o-mini
    When the routing decision completes
    Then an "ai_routing_decision" span is created
    And span attributes include: ai.model_selected, ai.cost_delta, ai.complexity_score
    And ai.prompt_hash is a SHA-256 hash (not raw content)

  Scenario: No PII in any span
    Given a request with user email in the prompt
    When the span is emitted
    Then no span attribute contains the email address
    And ai.prompt_hash is the only prompt-related attribute

Story 10.5: Configurable Autonomy

Feature: Governance Policy

  Scenario: Strict mode blocks auto-applied config changes
    Given governance_mode is "strict"
    When the background task attempts to refresh routing rules
    Then the refresh is blocked
    And a log entry "Blocked by strict mode" is written

  Scenario: Panic mode freezes to last-known-good
    Given panic_mode is triggered via POST /admin/panic
    When a new request arrives
    Then routing uses the frozen last-known-good configuration
    And auto-failover is disabled
    And the response header includes "X-DD0C-Panic: active"

End of dd0c/route BDD Acceptance Specifications — 10 Epics, 50+ Scenarios