Phase 3: BDD acceptance specs for P1 (route) and P5 (cost)

P1: 50+ scenarios across 10 epics, all stories covered
P5: 55+ scenarios across 10 epics, written manually (Sonnet credential failures)
Remaining P2/P3/P4/P6 in progress via subagents
2026-03-01 01:50:30 +00:00
parent 03bfe931fc
commit c1484426cc
2 changed files with 1295 additions and 0 deletions


@@ -0,0 +1,685 @@
# dd0c/route — BDD Acceptance Test Specifications
**Phase 3: Given/When/Then per Story**
**Date:** March 1, 2026
---
## Epic 1: Proxy Engine
### Story 1.1: OpenAI SDK Drop-In Compatibility
```gherkin
Feature: OpenAI SDK Drop-In Compatibility
Scenario: Non-streaming request proxied successfully
Given a valid API key "sk-test-123" exists in Redis cache
And the upstream OpenAI endpoint is healthy
When I send POST /v1/chat/completions with model "gpt-4o" and stream=false
Then the response status is 200
And the response body matches the OpenAI ChatCompletion schema
And the response includes usage.prompt_tokens and usage.completion_tokens
Scenario: Request with invalid API key is rejected
Given no API key "sk-invalid" exists in Redis or PostgreSQL
When I send POST /v1/chat/completions with Authorization "Bearer sk-invalid"
Then the response status is 401
And the response body contains error "Invalid API key"
Scenario: API key validated from Redis cache (fast path)
Given API key "sk-cached" exists in Redis cache
When I send POST /v1/chat/completions with Authorization "Bearer sk-cached"
Then the key is validated without querying PostgreSQL
And the request is forwarded to the upstream provider
Scenario: API key falls back to PostgreSQL when Redis is unavailable
Given API key "sk-db-only" exists in PostgreSQL but NOT in Redis
When I send POST /v1/chat/completions with Authorization "Bearer sk-db-only"
Then the key is validated via PostgreSQL
And the key is written back to Redis cache for future requests
```
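
As a companion to the last two scenarios, here is a minimal sketch of the cache-aside lookup, assuming a `redis::aio::ConnectionManager` and an `sqlx::PgPool`. The `validate_api_key` name, key prefix, and `api_keys` table are illustrative, and per Story 9.3 the stored value would actually be a bcrypt hash rather than the plaintext compared here.
```rust
use redis::AsyncCommands;
use sqlx::PgPool;

/// Cache-aside validation: Redis first, PostgreSQL on miss, write-back so the
/// next request takes the fast path. (Sketch only; the real schema stores a
/// bcrypt hash, so the DB step would verify a hash instead of comparing plaintext.)
async fn validate_api_key(
    redis: &mut redis::aio::ConnectionManager,
    db: &PgPool,
    key: &str,
) -> Result<bool, Box<dyn std::error::Error>> {
    // Fast path: a cache hit means the key was validated recently.
    // Redis errors are treated like a miss, so an outage degrades to the DB path.
    if matches!(redis.exists::<_, bool>(format!("apikey:{key}")).await, Ok(true)) {
        return Ok(true);
    }
    // Slow path: fall back to PostgreSQL.
    let row: Option<(i64,)> =
        sqlx::query_as("SELECT id FROM api_keys WHERE key = $1 AND revoked_at IS NULL")
            .bind(key)
            .fetch_optional(db)
            .await?;
    if row.is_some() {
        // Best-effort write-back (short TTL); a failure here just means the
        // next request hits the database again.
        let _: Result<(), redis::RedisError> =
            redis.set_ex(format!("apikey:{key}"), 1, 300).await;
        return Ok(true);
    }
    Ok(false)
}
```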
### Story 1.2: SSE Streaming Support
```gherkin
Feature: SSE Streaming Passthrough
Scenario: Streaming response is proxied chunk-by-chunk
Given a valid API key and upstream OpenAI is healthy
When I send POST /v1/chat/completions with stream=true
Then the response Content-Type is "text/event-stream"
And I receive multiple SSE chunks ending with "data: [DONE]"
And each chunk matches the OpenAI streaming delta schema
Scenario: Client disconnects mid-stream
Given a streaming request is in progress
When the client drops the TCP connection after 5 chunks
Then the proxy aborts the upstream provider connection
And no further chunks are buffered in memory
```
### Story 1.3: <5ms Latency Overhead
```gherkin
Feature: Proxy Latency Budget
Scenario: P99 latency overhead under 5ms
Given 1000 non-streaming requests are sent sequentially
When I measure the delta between proxy response time and upstream response time
Then the P99 delta is less than 5 milliseconds
Scenario: Telemetry emission does not block the hot path
Given TimescaleDB is experiencing 5-second write latency
When I send a request through the proxy
Then the proxy responds within the 5ms overhead budget
And the telemetry event is queued in the mpsc channel (not dropped)
```
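
The P99 assertion in the first scenario reduces to a percentile over per-request deltas. A small helper like the following (a sketch, not the project's actual test harness) is enough to express the budget check:
```rust
use std::time::Duration;

/// Returns the P99 value from a set of per-request latency deltas
/// (proxy response time minus upstream response time).
fn p99(mut deltas: Vec<Duration>) -> Duration {
    assert!(!deltas.is_empty());
    deltas.sort();
    // Index of the 99th percentile, rounding up so small samples stay conservative.
    let idx = ((deltas.len() as f64) * 0.99).ceil() as usize - 1;
    deltas[idx.min(deltas.len() - 1)]
}

#[test]
fn p99_overhead_is_under_budget() {
    // In the real suite the deltas would come from 1000 measured requests;
    // this vector is a stand-in.
    let deltas = vec![Duration::from_micros(800); 1000];
    assert!(p99(deltas) < Duration::from_millis(5));
}
```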
### Story 1.4: Transparent Error Passthrough
```gherkin
Feature: Provider Error Passthrough
Scenario: Rate limit error passed through transparently
Given the upstream provider returns HTTP 429 with Retry-After header
When the proxy receives this response
Then the proxy returns HTTP 429 to the client
And the Retry-After header is preserved
Scenario: Provider 500 error passed through
Given the upstream provider returns HTTP 500
When the proxy receives this response
Then the proxy returns HTTP 500 to the client
And the original error body is preserved
```
---
## Epic 2: Router Brain
### Story 2.1: Complexity Classification
```gherkin
Feature: Request Complexity Classification
Scenario: Simple extraction task classified as low complexity
Given a request with system prompt "Extract the name from this text"
And token count is under 500
When the complexity classifier evaluates the request
Then the complexity score is "low"
And classification completes in under 2ms
Scenario: Multi-turn reasoning classified as high complexity
Given a request with 10+ messages in the conversation
And system prompt contains "analyze", "reason", or "compare"
When the complexity classifier evaluates the request
Then the complexity score is "high"
Scenario: Unknown pattern defaults to medium complexity
Given a request with no recognizable task pattern
When the complexity classifier evaluates the request
Then the complexity score is "medium"
```
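
One possible shape for the classifier implied by these scenarios is a pure keyword-and-size heuristic, which also keeps it comfortably inside the 2ms budget. The keyword list and thresholds below are assumptions lifted from the scenario text, not a fixed design:
```rust
#[derive(Debug, PartialEq)]
enum Complexity {
    Low,
    Medium,
    High,
}

/// Heuristic classification: pure CPU work over the prompt and message count.
/// Keywords and thresholds are illustrative, taken from the scenarios above.
fn classify(system_prompt: &str, message_count: usize, token_count: usize) -> Complexity {
    let prompt = system_prompt.to_lowercase();
    let reasoning = ["analyze", "reason", "compare"];

    if message_count >= 10 && reasoning.iter().any(|k| prompt.contains(k)) {
        Complexity::High
    } else if token_count < 500 && prompt.contains("extract") {
        Complexity::Low
    } else {
        // No recognizable pattern: default to medium.
        Complexity::Medium
    }
}
```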
### Story 2.2: Routing Rules
```gherkin
Feature: Configurable Routing Rules
Scenario: First-match routing rule applied
Given routing rule "if feature=classify -> cheapest from [gpt-4o-mini, claude-haiku]"
And the request includes header X-DD0C-Feature: classify
When the router evaluates the request
Then the request is routed to the cheapest model in the rule set
And the routing decision is logged with strategy "cheapest"
Scenario: No matching rule falls through to default
Given routing rules exist but none match the request headers
When the router evaluates the request
Then the request is routed using the "passthrough" strategy
And the original model in the request body is used
```
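
A sketch of first-match evaluation consistent with these scenarios follows; the struct fields and the `route` function are assumed names, not the crate's real API:
```rust
#[derive(Clone)]
enum Strategy {
    Cheapest,
    Passthrough,
}

/// A routing rule as implied by the scenarios; field names are illustrative.
struct RoutingRule {
    feature: String,               // matched against the X-DD0C-Feature header
    strategy: Strategy,
    candidate_models: Vec<String>, // e.g. ["gpt-4o-mini", "claude-haiku"]
}

struct Decision {
    strategy: Strategy,
    candidates: Vec<String>,
}

/// First-match evaluation: rules are checked in priority order and the first
/// one whose feature matches the request header wins; anything else falls
/// through to passthrough with the model named in the request body.
fn route(rules: &[RoutingRule], feature_header: Option<&str>) -> Decision {
    if let Some(feature) = feature_header {
        if let Some(rule) = rules.iter().find(|r| r.feature == feature) {
            return Decision {
                strategy: rule.strategy.clone(),
                candidates: rule.candidate_models.clone(),
            };
        }
    }
    Decision { strategy: Strategy::Passthrough, candidates: Vec::new() }
}
```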
### Story 2.3: Automatic Fallback
```gherkin
Feature: Provider Fallback Chain
Scenario: Primary provider fails, fallback succeeds
Given routing rule specifies fallback chain [openai, anthropic]
And OpenAI returns HTTP 503
When the proxy processes the request
Then the request is retried against Anthropic
And the response is returned successfully
And the routing decision log shows "fallback triggered"
Scenario: Circuit breaker opens after sustained failures
Given OpenAI error rate exceeds 10% over the last 60 seconds
When a new request arrives targeting OpenAI
Then the circuit breaker is OPEN for OpenAI
And the request is immediately routed to the fallback provider
And no request is sent to OpenAI
```
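
The circuit-breaker behaviour can be sketched as a sliding-window error-rate check per provider. Only the 10% threshold and 60-second window come from the scenario; the bookkeeping below is illustrative:
```rust
use std::time::{Duration, Instant};

/// Minimal sliding-window circuit breaker for a single provider.
struct CircuitBreaker {
    window: Duration,
    samples: Vec<(Instant, bool)>, // (when, was_error)
}

impl CircuitBreaker {
    fn new() -> Self {
        Self { window: Duration::from_secs(60), samples: Vec::new() }
    }

    fn record(&mut self, was_error: bool) {
        self.samples.push((Instant::now(), was_error));
    }

    /// OPEN when the error rate over the last 60 seconds exceeds 10%.
    fn is_open(&mut self) -> bool {
        let cutoff = Instant::now() - self.window;
        self.samples.retain(|(t, _)| *t >= cutoff);
        if self.samples.is_empty() {
            return false;
        }
        let errors = self.samples.iter().filter(|(_, e)| *e).count();
        errors as f64 / self.samples.len() as f64 > 0.10
    }
}
```
When `is_open` returns true for the primary provider, the router skips it and goes straight to the next entry in the fallback chain, which is exactly what the second scenario asserts.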
### Story 2.4: Real-Time Cost Savings
```gherkin
Feature: Cost Savings Calculation
Scenario: Savings calculated when model is downgraded
Given the original request specified model "gpt-4o" ($15/1M input tokens)
And the router downgraded to "gpt-4o-mini" ($0.15/1M input tokens)
And the request used 1000 input tokens
When the cost calculator runs
Then cost_original is $0.015
And cost_actual is $0.00015
And cost_saved is $0.01485
Scenario: Zero savings when passthrough (no routing)
Given the request was routed with strategy "passthrough"
When the cost calculator runs
Then cost_saved is $0.00
```
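
The arithmetic in the first scenario is a per-million-token price applied to the input token count. A sketch (using `f64` for readability, where a real implementation would likely prefer a fixed-point money type):
```rust
/// Cost of `input_tokens` at a per-million-token price.
fn input_cost(price_per_million: f64, input_tokens: u64) -> f64 {
    price_per_million * input_tokens as f64 / 1_000_000.0
}

fn main() {
    // Numbers from the scenario: gpt-4o at $15/1M vs gpt-4o-mini at $0.15/1M,
    // 1000 input tokens.
    let cost_original = input_cost(15.0, 1000); // 0.015
    let cost_actual = input_cost(0.15, 1000);   // 0.00015
    let cost_saved = cost_original - cost_actual; // 0.01485
    println!("original={cost_original} actual={cost_actual} saved={cost_saved}");
}
```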
---
## Epic 3: Analytics Pipeline
### Story 3.1: Non-Blocking Telemetry
```gherkin
Feature: Asynchronous Telemetry Emission
Scenario: Telemetry emitted without blocking request
Given the proxy processes a request successfully
When the response is sent to the client
Then a RequestEvent is emitted to the mpsc channel
And the channel send completes in under 1ms
Scenario: Bounded channel drops events when full
Given the mpsc channel is at capacity (1000 events)
When a new telemetry event is emitted
Then the event is dropped (not blocking the proxy)
And a "telemetry_dropped" counter is incremented
```
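
A non-blocking emit matching both scenarios can be built on `tokio::sync::mpsc::Sender::try_send` against a channel bounded at 1000: the call returns immediately, and a full channel only increments a counter. The `RequestEvent` shape and counter name are placeholders:
```rust
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::mpsc::{self, error::TrySendError};

static TELEMETRY_DROPPED: AtomicU64 = AtomicU64::new(0);

struct RequestEvent { /* model, tokens, cost, latency, ... */ }

/// Emit a telemetry event without ever blocking the request path:
/// try_send returns immediately, and a full (or closed) channel just bumps a counter.
fn emit(tx: &mpsc::Sender<RequestEvent>, event: RequestEvent) {
    match tx.try_send(event) {
        Ok(()) => {}
        Err(TrySendError::Full(_)) | Err(TrySendError::Closed(_)) => {
            TELEMETRY_DROPPED.fetch_add(1, Ordering::Relaxed);
        }
    }
}

// Channel creation, bounded at 1000 events as in the second scenario:
// let (tx, rx) = mpsc::channel::<RequestEvent>(1000);
```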
### Story 3.2: Fast Dashboard Queries
```gherkin
Feature: Continuous Aggregates for Dashboard
Scenario: Hourly cost summary is pre-calculated
Given 10,000 request events exist in the last hour
When I query GET /api/dashboard/summary?period=1h
Then the response returns in under 200ms
And total_cost, total_saved, and avg_latency are present
Scenario: Treemap data returns cost breakdown by team and model
Given request events tagged with team="payments" and team="search"
When I query GET /api/dashboard/treemap?period=7d
Then the response includes cost breakdowns per team per model
```
### Story 3.3: Automatic Data Compression
```gherkin
Feature: TimescaleDB Compression
Scenario: Chunks older than 7 days are compressed
Given request_events data exists from 10 days ago
When the compression policy runs
Then chunks older than 7 days are compressed by 90%+
And compressed data is still queryable via continuous aggregates
```
---
## Epic 4: Dashboard API
### Story 4.1: GitHub OAuth Signup
```gherkin
Feature: GitHub OAuth Authentication
Scenario: New user signs up via GitHub OAuth
Given the user has a valid GitHub account
When they complete the /api/auth/github OAuth flow
Then an organization is created automatically
And a JWT access token is returned
And a refresh token is stored in Redis
Scenario: Existing user logs in via GitHub OAuth
Given the user already has an organization
When they complete the /api/auth/github OAuth flow
Then a new JWT is issued for the existing organization
And no duplicate organization is created
```
### Story 4.2: Routing Rules & Provider Keys CRUD
```gherkin
Feature: Routing Rules Management
Scenario: Create a new routing rule
Given I am authenticated as an Owner
When I POST /api/orgs/{id}/routing-rules with a valid rule body
Then the rule is created and returned with an ID
And the rule is loaded into the proxy within 60 seconds
Scenario: Provider API key is encrypted at rest
Given I POST /api/orgs/{id}/provider-keys with key "sk-live-abc123"
When the key is stored in PostgreSQL
Then the stored value is AES-256-GCM encrypted
And decrypting with the correct KMS key returns "sk-live-abc123"
```
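
For the encryption scenario, a self-contained sketch with the `aes-gcm` crate looks like the following; in the product the data key would come from KMS, whereas here it is generated locally just to keep the example runnable:
```rust
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm,
};

fn main() -> Result<(), aes_gcm::Error> {
    // In production the data key would be supplied/unwrapped by KMS;
    // generating one locally keeps the sketch self-contained.
    let key = Aes256Gcm::generate_key(OsRng);
    let cipher = Aes256Gcm::new(&key);

    // A fresh random nonce per encryption; it is stored alongside the ciphertext.
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
    let ciphertext = cipher.encrypt(&nonce, b"sk-live-abc123".as_ref())?;

    // Decrypting with the same key and nonce returns the original provider key.
    let plaintext = cipher.decrypt(&nonce, ciphertext.as_ref())?;
    assert_eq!(plaintext, b"sk-live-abc123");
    Ok(())
}
```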
### Story 4.3: Dashboard Data Endpoints
```gherkin
Feature: Dashboard Summary & Treemap
Scenario: Summary endpoint returns aggregated metrics
Given request events exist for the authenticated org
When I GET /api/dashboard/summary?period=30d
Then the response includes total_cost, total_saved, request_count, avg_latency
Scenario: Request inspector with filters
Given 500 request events exist for the org
When I GET /api/requests?feature=classify&limit=20
Then 20 results are returned matching feature="classify"
And no prompt content is included in the response
```
### Story 4.4: API Key Revocation
```gherkin
Feature: API Key Revocation
Scenario: Revoked key is immediately blocked
Given API key "sk-compromised" is active
When I DELETE /api/orgs/{id}/api-keys/{key_id}
Then the key is removed from Redis cache immediately
And subsequent requests with "sk-compromised" return 401
```
---
## Epic 5: Dashboard UI
### Story 5.1: Cost Treemap
```gherkin
Feature: AI Spend Treemap Visualization
Scenario: Treemap renders cost breakdown
Given the user is logged into the dashboard
And cost data exists for teams "payments" and "search"
When the dashboard loads with period=7d
Then a treemap visualization renders
And the "payments" block is proportionally larger if it spent more
```
### Story 5.2: Real-Time Savings Counter
```gherkin
Feature: Savings Counter
Scenario: Weekly savings counter updates
Given the user is on the dashboard
And $127.50 has been saved this week
When the dashboard loads
Then the counter displays "You saved $127.50 this week"
```
### Story 5.3: Routing Rules Editor
```gherkin
Feature: Routing Rules Editor UI
Scenario: Create rule via drag-and-drop interface
Given the user navigates to Settings > Routing Rules
When they create a new rule with feature="classify" and strategy="cheapest"
And drag it to priority position 1
Then the rule is saved via the API
And the rule list reflects the new priority order
```
### Story 5.4: Request Inspector
```gherkin
Feature: Request Inspector
Scenario: Inspect routing decision for a specific request
Given the user opens the Request Inspector
When they click on a specific request row
Then the detail panel shows: model_selected, model_alternatives, cost_delta, routing_strategy
And no prompt content is displayed
```
---
## Epic 6: Shadow Audit CLI
### Story 6.1: Zero-Setup Codebase Scan
```gherkin
Feature: Shadow Audit CLI Scan
Scenario: Scan TypeScript project for LLM usage
Given a TypeScript project with 3 files calling openai.chat.completions.create()
When I run npx dd0c-scan in the project root
Then the CLI detects 3 LLM API call sites
And estimates monthly cost based on heuristic token counts
And displays projected savings with dd0c/route
Scenario: Scan runs completely offline with cached pricing
Given the pricing table was cached from a previous run
And the network is unavailable
When I run npx dd0c-scan
Then the scan completes using cached pricing data
And a warning is shown that pricing may be stale
```
### Story 6.2: No Source Code Exfiltration
```gherkin
Feature: Local-Only Scan
Scenario: No source code sent to server
Given I run npx dd0c-scan without --opt-in flag
When the scan completes
Then zero HTTP requests are made to any dd0c server
And the report is rendered entirely in the terminal
```
### Story 6.3: Terminal Report
```gherkin
Feature: Terminal Report Output
Scenario: Top Opportunities report
Given the scan found 5 files with LLM calls
When the report renders
Then it shows "Top Opportunities" sorted by potential savings
And each entry includes: file path, current model, suggested model, estimated monthly savings
```
---
## Epic 7: Slack Integration
### Story 7.1: Weekly Savings Digest
```gherkin
Feature: Monday Morning Digest
Scenario: Weekly digest sent via email
Given it is Monday at 9:00 AM UTC
And the org has cost data from the previous week
When the dd0c-worker cron fires
Then an email is sent via AWS SES to the org admin
And the email includes: total_saved, top_model_savings, week_over_week_trend
Scenario: No digest sent if no activity
Given the org had zero requests last week
When the cron fires
Then no email is sent
```
### Story 7.2: Budget Alert via Slack
```gherkin
Feature: Budget Threshold Alerts
Scenario: Daily spend exceeds configured threshold
Given the org configured alert threshold at $100/day
And today's spend reaches $101
When the dd0c-worker evaluates thresholds
Then a Slack webhook is fired with the alert payload
And the payload includes X-DD0C-Signature header
And last_fired_at is updated to prevent duplicate alerts
Scenario: Alert not re-fired for same incident
Given an alert was already fired for today's threshold breach
When the worker evaluates thresholds again
Then no duplicate Slack webhook is sent
```
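
The spec fixes the X-DD0C-Signature header name and the last_fired_at dedup field but not the signature scheme. The sketch below assumes HMAC-SHA256 over the raw JSON body (via the `hmac`, `sha2`, `hex`, and `chrono` crates) and a per-UTC-day dedup window:
```rust
use hmac::{Hmac, Mac};
use sha2::Sha256;

/// Sign the alert payload so the receiver can verify it came from dd0c.
/// HMAC-SHA256 over the raw JSON body is an assumption; only the
/// X-DD0C-Signature header name comes from the spec.
fn sign_payload(secret: &[u8], payload: &[u8]) -> String {
    let mut mac = Hmac::<Sha256>::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(payload);
    hex::encode(mac.finalize().into_bytes())
}

/// Fire the alert only once per incident: skip if last_fired_at already falls
/// inside the current breach window (treated here as the same UTC day).
fn should_fire(
    last_fired_at: Option<chrono::DateTime<chrono::Utc>>,
    now: chrono::DateTime<chrono::Utc>,
) -> bool {
    match last_fired_at {
        Some(prev) => prev.date_naive() != now.date_naive(),
        None => true,
    }
}
```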
---
## Epic 8: Infrastructure & DevOps
### Story 8.1: ECS Fargate Deployment
```gherkin
Feature: Containerized Deployment
Scenario: CDK deploys all services to ECS Fargate
Given the CDK stack is synthesized
When cdk deploy is executed
Then ECS services are created for proxy, api, and worker
And ALB routes /v1/* to proxy and /api/* to api
And CloudFront serves static dashboard assets from S3
```
### Story 8.2: CI/CD Pipeline
```gherkin
Feature: GitHub Actions CI/CD
Scenario: Push to main triggers full pipeline
Given code is pushed to the main branch
When GitHub Actions triggers
Then tests run (unit + integration + canary suite)
And Docker images are built and pushed to ECR
And ECS services are updated with rolling deployment
And zero downtime is maintained during deployment
```
### Story 8.3: CloudWatch Alarms
```gherkin
Feature: Monitoring & Alerting
Scenario: P99 latency alarm fires
Given proxy P99 latency exceeds 50ms for 5 minutes
When CloudWatch evaluates the alarm
Then a PagerDuty incident is created
```
---
## Epic 9: Onboarding & PLG
### Story 9.1: One-Click GitHub Signup
```gherkin
Feature: Frictionless Signup
Scenario: New user completes signup in under 60 seconds
Given the user clicks "Sign up with GitHub"
When they authorize the OAuth app
Then an org is created and an API key is generated
And the user lands on the onboarding wizard
And total elapsed time is under 60 seconds
```
### Story 9.2: Free Tier Enforcement
```gherkin
Feature: Free Tier ($50/month routed spend)
Scenario: Free tier user within limit
Given the org is on the free tier
And this month's routed spend is $45
When a new request is proxied
Then the request is processed normally
Scenario: Free tier user exceeds limit
Given the org is on the free tier
And this month's routed spend is $50.01
When a new request is proxied
Then the proxy returns HTTP 429
And the response body includes an upgrade CTA with Stripe Checkout link
Scenario: Monthly counter resets on the 1st
Given the org used $50 last month
When the calendar rolls to the 1st of the new month
Then the Redis counter is reset to $0
And requests are processed normally again
```
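
The enforcement itself is a comparison against the $50 limit; the sketch below also notes one assumed way to get the monthly reset by keying the Redis counter per calendar month:
```rust
/// Free-tier gate: compare this month's routed spend (tracked in Redis,
/// keyed per org) against the $50 limit.
const FREE_TIER_LIMIT_USD: f64 = 50.0;

enum TierDecision {
    Proceed,
    RejectWithUpgradeCta, // proxy returns HTTP 429 plus a Stripe Checkout link
}

fn check_free_tier(monthly_routed_spend_usd: f64) -> TierDecision {
    if monthly_routed_spend_usd > FREE_TIER_LIMIT_USD {
        TierDecision::RejectWithUpgradeCta
    } else {
        TierDecision::Proceed
    }
}

// A counter key such as "spend:{org_id}:{YYYY-MM}" would effectively reset on
// the 1st, because the new month reads a fresh key (scheme assumed, not from the spec).
```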
### Story 9.3: API Key Management
```gherkin
Feature: API Key CRUD
Scenario: Generate new API key
Given I am authenticated as an Owner
When I POST /api/orgs/{id}/api-keys
Then a new API key is returned in plaintext (shown once)
And the key is stored as a bcrypt hash in PostgreSQL
And the key is cached in Redis for fast auth
Scenario: Rotate API key
Given API key "sk-old" is active
When I POST /api/orgs/{id}/api-keys/{key_id}/rotate
Then a new key "sk-new" is returned
And "sk-old" is immediately invalidated
```
### Story 9.4: First Route Onboarding
```gherkin
Feature: 2-Minute First Route
Scenario: User completes onboarding wizard
Given the user just signed up
When they copy the API key (step 1)
And paste the provided curl command (step 2)
And the request appears in the dashboard (step 3)
Then the onboarding is marked complete
And a PostHog event "routing.savings.first_dollar" is tracked (if savings occurred)
```
### Story 9.5: Team Invites
```gherkin
Feature: Team Member Invites
Scenario: Invite team member via email
Given I am an Owner of org "acme"
When I POST /api/orgs/{id}/invites with email "dev@acme.com"
Then an email with a magic link is sent
And the magic link JWT expires in 72 hours
Scenario: Invited user joins existing org
Given "dev@acme.com" received an invite magic link
When they click the link and complete GitHub OAuth
Then they are added to org "acme" as a Member (not Owner)
And no new org is created
```
---
## Epic 10: Transparent Factory Compliance
### Story 10.1: Atomic Flagging
```gherkin
Feature: Feature Flag Infrastructure
Scenario: New routing strategy behind a flag (default off)
Given feature flag "enable_cascading_router" exists with default=off
When a request arrives that would trigger cascading routing
Then the passthrough strategy is used instead
And the flag evaluation completes without network calls
Scenario: Flag auto-disables on latency regression
Given flag "enable_new_classifier" is at 50% rollout
And P99 latency increased by 8% since flag was enabled
When the circuit breaker evaluates flag health
Then the flag is auto-disabled within 30 seconds
And an alert is fired
Scenario: CI blocks deployment with expired flag
Given flag "old_experiment" has TTL expired and rollout=100%
When CI runs the flag audit
Then the build fails with "Expired flag at full rollout: old_experiment"
```
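
The "no network calls" requirement in the first scenario implies flag evaluation is an in-process lookup over a store refreshed out of band. A minimal sketch (the store shape is assumed):
```rust
use std::collections::HashMap;

/// In-process flag store: evaluation is a map lookup, so it never makes a
/// network call on the request path (flags are refreshed out of band).
struct FlagStore {
    flags: HashMap<String, bool>,
}

impl FlagStore {
    /// Unknown flags evaluate to their default, which is off.
    fn is_enabled(&self, name: &str) -> bool {
        *self.flags.get(name).unwrap_or(&false)
    }
}

fn main() {
    let store = FlagStore { flags: HashMap::new() };
    // "enable_cascading_router" has not been turned on, so the proxy
    // falls back to the passthrough strategy.
    assert!(!store.is_enabled("enable_cascading_router"));
}
```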
### Story 10.2: Elastic Schema
```gherkin
Feature: Additive-Only Schema Migrations
Scenario: Migration with DROP COLUMN is rejected
Given a migration file contains "ALTER TABLE request_events DROP COLUMN old_field"
When CI runs the schema lint
Then the build fails with "Destructive schema change detected"
Scenario: V1 code ignores V2 fields
Given a request_events row contains a new "routing_v2" column
When V1 Rust code deserializes the row
Then deserialization succeeds (unknown fields ignored)
And no error is logged
```
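
The V1-ignores-V2 scenario relies on serde's default behaviour of skipping unknown fields unless `#[serde(deny_unknown_fields)]` is present. A small demonstration with assumed field names, shown over JSON (the same property applies to whichever deserialization path V1 actually uses):
```rust
use serde::Deserialize;

// V1 view of a request_events row. serde ignores unknown fields by default
// (no #[serde(deny_unknown_fields)]), so a payload carrying the newer
// "routing_v2" column still deserializes cleanly.
#[derive(Deserialize)]
struct RequestEventV1 {
    model: String,
    cost_usd: f64,
}

fn main() {
    let row = r#"{ "model": "gpt-4o-mini", "cost_usd": 0.00015,
                   "routing_v2": { "strategy": "cascade" } }"#;
    let event: RequestEventV1 = serde_json::from_str(row).expect("unknown fields are ignored");
    assert_eq!(event.model, "gpt-4o-mini");
}
```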
### Story 10.3: Cognitive Durability
```gherkin
Feature: Decision Logs for Routing Logic
Scenario: PR touching router requires decision log
Given a PR modifies files in src/router/
And no decision_log.json is included
When CI runs the decision log check
Then the build fails with "Decision log required for routing changes"
Scenario: Cyclomatic complexity exceeds cap
Given a function in src/router/ has cyclomatic complexity 12
When cargo clippy runs with cognitive_complexity threshold 10
Then the lint fails
```
### Story 10.4: Semantic Observability
```gherkin
Feature: AI Reasoning Spans
Scenario: Routing decision emits OTEL span
Given a request is routed from gpt-4o to gpt-4o-mini
When the routing decision completes
Then an "ai_routing_decision" span is created
And span attributes include: ai.model_selected, ai.cost_delta, ai.complexity_score
And ai.prompt_hash is a SHA-256 hash (not raw content)
Scenario: No PII in any span
Given a request with user email in the prompt
When the span is emitted
Then no span attribute contains the email address
And ai.prompt_hash is the only prompt-related attribute
```
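
A sketch of the span emission using `sha2`, `hex`, and `tracing`; the attribute names come from the scenario, while routing the span through `tracing` (with an OTLP layer) rather than the OpenTelemetry SDK directly is an assumption about the wiring:
```rust
use sha2::{Digest, Sha256};

/// Hash the prompt so spans carry a stable identifier but never raw content.
fn prompt_hash(prompt: &str) -> String {
    hex::encode(Sha256::digest(prompt.as_bytes()))
}

fn record_routing_decision(prompt: &str, selected: &str, cost_delta: f64, complexity: &str) {
    // Attribute names mirror the scenario; no raw prompt or PII is attached.
    let span = tracing::info_span!(
        "ai_routing_decision",
        ai.model_selected = selected,
        ai.cost_delta = cost_delta,
        ai.complexity_score = complexity,
        ai.prompt_hash = %prompt_hash(prompt),
    );
    let _guard = span.enter();
    // ... routing work happens inside the span ...
}
```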
### Story 10.5: Configurable Autonomy
```gherkin
Feature: Governance Policy
Scenario: Strict mode blocks auto-applied config changes
Given governance_mode is "strict"
When the background task attempts to refresh routing rules
Then the refresh is blocked
And a log entry "Blocked by strict mode" is written
Scenario: Panic mode freezes to last-known-good
Given panic_mode is triggered via POST /admin/panic
When a new request arrives
Then routing uses the frozen last-known-good configuration
And auto-failover is disabled
And the response header includes "X-DD0C-Panic: active"
```
---
*End of dd0c/route BDD Acceptance Specifications — 10 Epics, 50+ Scenarios*