dd0c/route — V1 MVP Epics
This document outlines the core Epics and User Stories for the V1 MVP of dd0c/route, designed for a solo founder to implement in 1-3 day chunks per story.
Epic 1: Proxy Engine
Description: Core Rust proxy that sits between the client application and LLM providers. Must maintain strict OpenAI API compatibility, support SSE streaming, and introduce <5ms latency overhead.
User Stories
- Story 1.1: As a developer, I want to swap my `OPENAI_BASE_URL` to the proxy endpoint, so that my existing OpenAI SDK works without code changes.
- Story 1.2: As a developer, I want streaming support (SSE) preserved, so that my chat applications remain responsive while using the proxy.
- Story 1.3: As a platform engineer, I want the proxy latency overhead to be <5ms, so that intelligent routing doesn't degrade our application's user experience.
- Story 1.4: As a developer, I want provider errors (e.g., rate limits) to be passed through transparently, so that my app's existing error handling continues to work.
Acceptance Criteria
- Implements `POST /v1/chat/completions` for both streaming (`stream: true`) and non-streaming requests.
- Validates the `Authorization: Bearer` header against a Redis cache (falling back to the DB).
- Successfully forwards requests to OpenAI and Anthropic, translating formats if necessary.
- Asynchronously emits telemetry events to an in-memory channel without blocking the hot path.
- P99 latency overhead is measured at <5ms.
Estimate: 13 points
Dependencies: None
Technical Notes:
- Stack: Rust, `tokio`, `hyper`, `axum`.
- Use connection pooling for upstream providers to eliminate TLS handshake overhead.
- For streaming, parse only the first chunk/headers to make a routing decision, then pass the rest through. Count tokens from the final usage-bearing SSE chunk (the last payload before `[DONE]`).
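The final-chunk accounting can be sketched with plain string handling. This is a std-only illustration; the helper name and the assumption that the last `data:` payload before `[DONE]` carries a `usage` object are mine, and production code would parse each chunk with `serde_json` instead of string search:

```rust
/// Minimal sketch: scan a buffered SSE body and pull `total_tokens` from the
/// last JSON payload before the `data: [DONE]` sentinel.
fn total_tokens_from_sse(body: &str) -> Option<u64> {
    let mut last_payload: Option<&str> = None;
    for line in body.lines() {
        let Some(payload) = line.strip_prefix("data: ") else { continue };
        if payload.trim() == "[DONE]" {
            break; // everything after the sentinel is ignored
        }
        last_payload = Some(payload);
    }
    // Naive extraction of `"total_tokens": N` from the final payload.
    let payload = last_payload?;
    let idx = payload.find("\"total_tokens\"")?;
    let rest = &payload[idx..];
    let colon = rest.find(':')?;
    let digits: String = rest[colon + 1..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

fn main() {
    let body = "data: {\"choices\":[]}\n\
                data: {\"usage\":{\"total_tokens\": 42}}\n\
                data: [DONE]\n";
    assert_eq!(total_tokens_from_sse(body), Some(42));
}
```

Because only the last payload is inspected, the hot path can stream chunks through untouched and do this work once per request.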
Epic 2: Router Brain
Description: The intelligence core of dd0c/route embedded within the proxy. It evaluates incoming requests against routing rules, classifies complexity heuristically, checks cost tables, and executes fallback chains.
User Stories
- Story 2.1: As an engineering manager, I want the router to classify the complexity of requests, so that simple extraction tasks are downgraded to cheaper models.
- Story 2.2: As an engineering manager, I want to configure routing rules (e.g., if feature=classify -> use cheapest from [gpt-4o-mini, claude-haiku]), so that I can automatically save money on predictable workloads.
- Story 2.3: As an application developer, I want the router to automatically fallback to an alternative model if the primary model fails or rate-limits, so that my application remains highly available.
- Story 2.4: As an engineering manager, I want cost savings calculated instantly based on up-to-date provider pricing, so that my dashboard data is immediately accurate.
Acceptance Criteria
- Heuristic complexity classifier runs in <2ms based on token count, task patterns (regex on system prompt), and model hints.
- Evaluates first-match routing rules based on request tags (`X-DD0C-Feature`, `X-DD0C-Team`).
- Executes "passthrough", "cheapest", "quality-first", and "cascading" routing strategies.
- Enforces circuit breakers on downstream providers (e.g., open circuit if error rate > 10%).
- Calculates `cost_saved = cost_original - cost_actual` on the fly using in-memory cost tables.
Estimate: 8 points
Dependencies: Epic 1 (Proxy Engine)
Technical Notes:
- Stack: Rust.
- Run purely in-memory on the proxy hot path. No DB queries per request.
- Cost tables and routing rules must be loaded at startup and refreshed via a background task every 60s.
- Use `serde_json` to inspect the `messages` array for complexity classification, but do not persist the prompt.
- Circuit breaker state must be shared via Redis so all proxy instances agree on provider health.
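A std-only sketch of the two hot-path pieces above: the heuristic complexity score and the `cost_saved` arithmetic. Thresholds, pattern lists, and prices are illustrative assumptions, not the shipped values:

```rust
use std::collections::HashMap;

/// Rough token estimate: ~4 chars per token for English-ish prompts.
fn estimate_tokens(text: &str) -> u64 {
    (text.len() as u64 + 3) / 4
}

/// Heuristic complexity in [0.0, 1.0]: long prompts and "reasoning" wording
/// score high; extraction/classification wording scores low.
fn complexity_score(system_prompt: &str, user_text: &str) -> f64 {
    let tokens = estimate_tokens(system_prompt) + estimate_tokens(user_text);
    let mut score = (tokens as f64 / 4000.0).min(0.5); // length component
    let p = system_prompt.to_ascii_lowercase();
    if p.contains("extract") || p.contains("classify") {
        score -= 0.2; // simple task pattern
    }
    if p.contains("reason") || p.contains("step by step") {
        score += 0.4; // hard task pattern
    }
    score.clamp(0.0, 1.0)
}

/// cost_saved = cost_original - cost_actual, from per-1K-token price tables.
fn cost_saved(prices_per_1k: &HashMap<&str, f64>, original: &str, actual: &str, tokens: u64) -> f64 {
    let k = tokens as f64 / 1000.0;
    (prices_per_1k[original] - prices_per_1k[actual]) * k
}

fn main() {
    let prices = HashMap::from([("gpt-4o", 0.0050), ("gpt-4o-mini", 0.0006)]);
    let score = complexity_score("Classify the sentiment.", "great product!");
    assert!(score < 0.5); // simple task: safe to downgrade
    let saved = cost_saved(&prices, "gpt-4o", "gpt-4o-mini", 2000);
    assert!((saved - 0.0088).abs() < 1e-9);
}
```

Everything here is in-memory lookups and string scans, which is what keeps the classifier under the 2 ms budget.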
Epic 3: Analytics Pipeline
Description: High-throughput logging and aggregation system using TimescaleDB. Focuses on ingesting asynchronous telemetry from the Proxy Engine without blocking request processing.
User Stories
- Story 3.1: As a platform engineer, I want the proxy to emit telemetry without blocking the main request thread, so that our application performance remains unaffected.
- Story 3.2: As an engineering manager, I want my dashboard queries to be lightning fast even with millions of rows, so that I can quickly slice and dice our AI spend.
- Story 3.3: As an engineering manager, I want historical telemetry to be compressed or aged out automatically, so that the database storage costs remain minimal.
Acceptance Criteria
- Proxy emits a `RequestEvent` over an in-memory `mpsc` channel via `tokio::spawn`.
- A background worker batches events and inserts them into TimescaleDB every 1s or 100 events using PostgreSQL's bulk `COPY`.
- Continuous aggregates (`hourly_cost_summary`, `daily_cost_summary`) are created and refreshed on schedule to pre-calculate `total_cost`, `total_saved`, and `avg_latency`.
- TimescaleDB compression policies compress chunks older than 7 days, targeting 90%+ storage reduction.
- The proxy must degrade gracefully if the analytics database is unavailable.
Estimate: 8 points
Dependencies: Epic 1 (Proxy Engine)
Technical Notes:
- Stack: Rust (worker), PostgreSQL/TimescaleDB.
- Write the TimescaleDB migration scripts for the `request_events` hypertable and the continuous aggregates.
- Batching must be robust to worker panics (use bounded channels).
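The flush-on-100-events-or-1-second batching can be sketched with std `mpsc`. The real worker is async (`tokio`) and bulk-inserts into TimescaleDB; here `flush` stands in for that insert, and the constants mirror the acceptance criteria:

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

const MAX_BATCH: usize = 100;
const MAX_WAIT: Duration = Duration::from_secs(1);

/// Collects events into batches; `flush` stands in for the bulk COPY insert.
/// Flushes when the batch hits MAX_BATCH or MAX_WAIT elapses, and drains the
/// remainder when the producer side disconnects.
fn run_batcher<T>(rx: mpsc::Receiver<T>, mut flush: impl FnMut(Vec<T>)) {
    let mut batch = Vec::with_capacity(MAX_BATCH);
    let mut deadline = Instant::now() + MAX_WAIT;
    loop {
        let timeout = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(timeout) {
            Ok(event) => batch.push(event),
            Err(mpsc::RecvTimeoutError::Timeout) => {}
            Err(mpsc::RecvTimeoutError::Disconnected) => {
                if !batch.is_empty() { flush(batch); }
                return; // producer gone: final flush, then exit
            }
        }
        if batch.len() >= MAX_BATCH || Instant::now() >= deadline {
            if !batch.is_empty() {
                flush(std::mem::take(&mut batch));
            }
            deadline = Instant::now() + MAX_WAIT;
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for i in 0..250 { tx.send(i).unwrap(); }
    drop(tx); // close the channel so the batcher drains and exits
    let mut batches = Vec::new();
    run_batcher(rx, |b| batches.push(b.len()));
    assert_eq!(batches, vec![100, 100, 50]);
}
```

Using a bounded channel for `tx` in the real system gives backpressure instead of unbounded memory growth if the database falls behind.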
Epic 4: Dashboard API
Description: Axum REST API providing authentication, org/team management, routing rule CRUD, and data endpoints for the frontend dashboard. Focuses on frictionless developer onboarding.
User Stories
- Story 4.1: As an engineering manager, I want to authenticate via GitHub OAuth, so that I can create an organization and get an API key in under 60 seconds without remembering a password.
- Story 4.2: As an engineering manager, I want to manage my organization's routing rules and provider API keys securely, so that dd0c/route can successfully broker requests to OpenAI and Anthropic.
- Story 4.3: As an engineering manager, I want an endpoint that provides my historical spend and savings summary, so that I can visualize it in the UI.
- Story 4.4: As a platform engineer, I want to revoke an active API key, so that compromised credentials are immediately blocked.
Acceptance Criteria
- Implements the `/api/auth/github` OAuth flow, issuing JWTs and refresh tokens.
- Implements `/api/orgs` CRUD for managing an organization and API keys.
- Implements `/api/dashboard/summary` and `/api/dashboard/treemap` queries hitting the TimescaleDB continuous aggregates.
- Implements `/api/requests` for the request inspector with filters (e.g., `model`, `feature`, `team`).
- Securely stores and encrypts provider API keys in PostgreSQL using an AES-256-GCM data encryption key.
- Enforces an RBAC model (Owner, Member) per organization.
Estimate: 13 points
Dependencies: Epic 3 (Analytics Pipeline)
Technical Notes:
- Stack: Rust (`axum`), PostgreSQL.
- Reuse the `tokio` runtime to minimize context switching for a solo founder.
- Use the `oauth2` crate for GitHub integration. JWTs are signed with RS256; refresh tokens live in Redis.
- Ensure API keys are hashed (SHA-256) before storage; raw keys are never stored.
Epic 5: Dashboard UI
Description: The React SPA serving the cost attribution dashboard. Visualizes the AI spend treemap, routing rules editor, real-time ticker, and request inspector. This is the product's primary visual "Aha" moment.
User Stories
- Story 5.1: As an engineering manager, I want to see a treemap of my organization's AI spend broken down by team, feature, and model, so that I can instantly identify the most expensive areas of my application.
- Story 5.2: As an engineering manager, I want a real-time counter showing "You saved $X this week," so that I feel confident the tool is paying for itself.
- Story 5.3: As a platform engineer, I want an interface to configure routing rules (e.g., drag-to-reorder priority), so that I can instruct the proxy without editing config files.
- Story 5.4: As a platform engineer, I want a request inspector that displays metadata, cost, latency, and the specific routing decision for every request, so that I can debug why a certain model was chosen.
Acceptance Criteria
- React + Vite SPA deployed as static assets to S3 + CloudFront.
- Treemap visualization renders cost aggregations dynamically over selected time periods (7d/30d/90d).
- A routing rules editor allows CRUD operations and priority reordering for a team's rules.
- Request Inspector table displays paginated, filterable (`feature`, `team`, `status`) lists of telemetry without showing prompt content.
- Allows an admin to securely input OpenAI and Anthropic API keys.
Estimate: 13 points
Dependencies: Epic 4 (Dashboard API)
Technical Notes:
- Stack: React, TypeScript, Vite, Tailwind CSS.
- No SSR required for V1 (keep it simple). Use `react-query` or similar for data fetching and caching.
- Build the treemap with a charting library like D3 or Recharts.
- Emphasize speed—data fetches should resolve from continuous aggregates in <200ms.
Epic 6: Shadow Audit CLI
Description: The PLG "Shadow Audit" command-line tool (npx dd0c-scan). It analyzes a local codebase for LLM API calls, estimates monthly cost based on prompt templates, and projects savings with dd0c/route.
User Stories
- Story 6.1: As a developer, I want a zero-setup CLI tool (`npx dd0c-scan`) that scans my codebase and estimates how much money I'm currently wasting on overqualified LLMs, so that I can convince my manager to use dd0c/route.
- Story 6.2: As an engineering manager, I want the CLI to run locally without sending my source code to a third party, so that I can securely audit my own projects.
- Story 6.3: As an engineering manager, I want a clean, visually appealing terminal report showing "Top Opportunities" for model downgrades, so that I immediately see the value of routing.
Acceptance Criteria
- Parses a local directory for OpenAI or Anthropic SDK usage in TypeScript/JavaScript/Python files.
- Identifies the models requested in the code and estimates token usage heuristically based on the strings passed to the SDK.
- Hits `/api/v1/pricing/current` to fetch the latest cost tables and calculates an estimated monthly bill and projected savings.
- Outputs a formatted terminal report showing total potential savings and a breakdown of the highest-impact files.
- Anonymized scan summary is sent to the server only if the user explicitly opts in.
Estimate: 8 points
Dependencies: Epic 4 (Dashboard API - Pricing Endpoint)
Technical Notes:
- Stack: Node.js, `commander`, `chalk`, simple regex parsers for the Python/JS SDKs.
- Keep the CLI lightweight, fast, and as dependency-free as possible. No actual tokenizer; use heuristics (string length/structure) for token estimates.
- Must run completely offline if the pricing table is cached.
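The CLI itself is Node, but the estimation heuristic is language-agnostic. A Rust sketch of the string-length approach; the chars-per-token ratio, the placeholder "tax", and the call-volume figure are all illustrative assumptions:

```rust
/// Sketch of the dd0c-scan heuristic: no tokenizer, just string length and
/// structure of the prompt template found in the code.
fn estimate_tokens(template: &str) -> u64 {
    // ~4 chars/token, plus a flat tax per interpolation placeholder since
    // runtime values are usually longer than the `${...}` literal.
    let base = (template.len() as u64 + 3) / 4;
    let placeholders = template.matches("${").count() as u64;
    base + placeholders * 50
}

/// Projected monthly cost for one call site, given calls/month and a
/// per-1K-token price for the model found in the code.
fn monthly_cost(template: &str, calls_per_month: u64, price_per_1k: f64) -> f64 {
    estimate_tokens(template) as f64 / 1000.0 * price_per_1k * calls_per_month as f64
}

fn main() {
    let prompt = "Summarize the following ticket: ${ticket_body}";
    let tokens = estimate_tokens(prompt);
    assert_eq!(tokens, 62); // 46 chars -> 12 base tokens + 50 placeholder tax
    // 100k calls/month at an assumed $0.005 / 1K tokens:
    let cost = monthly_cost(prompt, 100_000, 0.005);
    assert!((cost - 31.0).abs() < 1e-6);
}
```

Crude, but it only has to rank call sites by impact, not bill anyone.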
Epic 7: Slack Integration
Description: The primary retention mechanism and anomaly alerting system. An asynchronous worker task dispatches weekly savings digests and threshold-based budget alerts to Slack and Email.
User Stories
- Story 7.1: As an engineering manager, I want an automated weekly digest summarizing my team's AI savings, so that I can easily report to the CFO that our tooling investment is paying off.
- Story 7.2: As a platform engineer, I want to configure a budget limit (e.g., alert if daily spend > $100) and receive a Slack webhook notification immediately, so that I can stop a retry storm before the bill gets out of hand.
- Story 7.3: As an engineering manager, I want an email version of the weekly digest, so that I can forward it straight to my leadership team.
Acceptance Criteria
- A standalone asynchronous worker (`dd0c-worker`) evaluates continuous aggregates (via TimescaleDB) every hour.
- Generates a "Monday Morning Digest" email via AWS SES.
- Emits Slack webhook payloads when a threshold alert is triggered (`threshold_amount`, `threshold_pct`).
- Adds an `X-DD0C-Signature` header to outbound webhooks to prevent spoofing.
Estimate: 8 points
Dependencies: Epic 3 (Analytics Pipeline), Epic 4 (Dashboard API)
Technical Notes:
- Stack: Rust (`tokio-cron`), `reqwest` (for webhooks), AWS SES.
- The worker is a singleton container (one task) running alongside the proxy to avoid lock contention on cron tasks.
- Ensure alerts maintain state (using PostgreSQL `alert_configs` and `last_fired_at`) so users aren't spammed for the same incident.
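The `last_fired_at` dedup can be sketched as follows. In production the state lives in PostgreSQL; a `HashMap` stands in here, and the one-hour cool-down is an illustrative assumption:

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

const COOL_DOWN: Duration = Duration::from_secs(60 * 60); // 1 hour

struct AlertState {
    last_fired_at: HashMap<u64, SystemTime>, // alert_config_id -> last fire
}

impl AlertState {
    fn new() -> Self {
        Self { last_fired_at: HashMap::new() }
    }

    /// Returns true (and records the fire) only if the threshold is breached
    /// and the alert has not fired within the cool-down window.
    fn should_fire(&mut self, config_id: u64, spend: f64, threshold: f64, now: SystemTime) -> bool {
        if spend <= threshold {
            return false;
        }
        match self.last_fired_at.get(&config_id) {
            Some(&last) if now.duration_since(last).unwrap_or_default() < COOL_DOWN => false,
            _ => {
                self.last_fired_at.insert(config_id, now);
                true
            }
        }
    }
}

fn main() {
    let mut state = AlertState::new();
    let t0 = SystemTime::UNIX_EPOCH;
    assert!(state.should_fire(1, 150.0, 100.0, t0)); // first breach fires
    assert!(!state.should_fire(1, 160.0, 100.0, t0 + Duration::from_secs(600))); // suppressed
    assert!(state.should_fire(1, 160.0, 100.0, t0 + Duration::from_secs(7200))); // re-fires
}
```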
Epic 8: Infrastructure & DevOps
Description: Containerized ECS Fargate deployment, AWS native networking, basic monitoring, and fully automated CI/CD for the entire dd0c stack. Essential for a solo founder to deploy safely and frequently.
User Stories
- Story 8.1: As a solo founder, I want to use AWS ECS Fargate, so that I don't have to manage EC2 instances or worry about OS-level patching.
- Story 8.2: As a solo founder, I want a GitHub Actions CI/CD pipeline, so that `git push` automatically runs tests, builds containers, and deploys rolling updates with zero downtime.
- Story 8.3: As an operator, I want standard AWS CloudWatch alarms (e.g., P99 proxy latency > 50ms) connected to PagerDuty, so that I am only woken up when a critical threshold is breached.
- Story 8.4: As a solo founder, I want a strict separation between my configuration (PostgreSQL) and telemetry (TimescaleDB) stores, so that I can scale analytics independently from org/auth state.
Acceptance Criteria
- Full AWS infrastructure defined via CDK (TypeScript) or Terraform.
- ALB routes `/v1/*` to the proxy container and `/api/*` to the dashboard API container.
- Dashboard static assets deployed to an S3 bucket with CloudFront caching.
- `docker build` produces three optimized images from a single Rust workspace (`dd0c-proxy`, `dd0c-api`, `dd0c-worker`).
- CloudWatch dashboards and minimum alarms configured (CPU > 80%, proxy error rate > 5%, ALB 5xx rate).
- `git push` to `main` triggers a GitHub Action to test, lint, build, push to ECR, and update the ECS Fargate services.
Estimate: 13 points
Dependencies: Epic 1 (Proxy Engine), Epic 4 (Dashboard API)
Technical Notes:
- Stack: AWS ECS Fargate, ALB, CloudFront, S3, RDS (PostgreSQL/TimescaleDB), ElastiCache (Redis), GitHub Actions.
- Ensure the ALB utilizes path-based routing correctly and handles TLS termination.
- For cost optimization on AWS, explore consolidating NAT Gateways or utilizing VPC Endpoints for S3/ECR/CloudWatch.
Epic 9: Onboarding & PLG
Description: Self-serve signup, free tier, API key management, and a getting-started flow that gets users routing their first LLM call through dd0c/route in under 2 minutes. This is the growth engine.
User Stories
- Story 9.1: As a new user, I want to sign up with GitHub OAuth in one click, so that I can start using dd0c/route without filling out forms.
- Story 9.2: As a new user, I want a free tier (up to $50/month in routed LLM spend), so that I can evaluate the product with real traffic before committing.
- Story 9.3: As a developer, I want to generate and manage API keys from the dashboard, so that I can integrate dd0c/route into my applications.
- Story 9.4: As a new user, I want a guided "First Route" onboarding flow that gives me a working curl command, so that I see cost savings within 2 minutes of signing up.
- Story 9.5: As a team lead, I want to invite team members via email, so that my team can share a single org and see aggregated savings.
Acceptance Criteria
- GitHub OAuth signup creates org + first API key automatically.
- Free tier enforced at the proxy level — requests beyond $50/month routed spend return 429 with upgrade CTA.
- API key CRUD: create, list, revoke, rotate. Keys are hashed at rest (bcrypt), only shown once on creation.
- Onboarding wizard: 3 steps — (1) copy API key, (2) paste curl command, (3) see first request in dashboard. Completion rate tracked.
- Team invite sends email with magic link. Invited user joins existing org on signup.
- Stripe Checkout integration for upgrade from free → paid ($49/month base).
Estimate: 8 points
Dependencies: Epic 4 (Dashboard API), Epic 5 (Dashboard UI)
Technical Notes:
- Use Stripe Checkout Sessions for payment — no custom billing UI needed for V1.
- Free tier enforcement happens in the proxy hot path — must be O(1) lookup (Redis counter per org, reset monthly via cron).
- Onboarding completion events tracked via PostHog or simple DB events for funnel analysis.
- Magic link invites use signed JWTs with 72-hour expiry, stored in a `pending_invites` table.
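The O(1) free-tier gate might look like this sketch: a `HashMap` stands in for the per-org Redis counter, and keying by month gives the monthly reset without a cron job (field names and the per-request cost accounting are illustrative):

```rust
use std::collections::HashMap;

const FREE_TIER_LIMIT: f64 = 50.0; // $/month of routed LLM spend

struct TierGate {
    // key: (org_id, "YYYY-MM"); monthly reset = a new key each month
    spend: HashMap<(u64, String), f64>,
}

impl TierGate {
    fn new() -> Self {
        Self { spend: HashMap::new() }
    }

    /// Returns false (HTTP 429 with upgrade CTA upstream) once the org has
    /// exceeded the free tier for the given month.
    fn admit(&mut self, org_id: u64, month: &str, request_cost: f64) -> bool {
        let counter = self.spend.entry((org_id, month.to_string())).or_insert(0.0);
        if *counter >= FREE_TIER_LIMIT {
            return false;
        }
        *counter += request_cost; // Redis equivalent: INCRBYFLOAT, O(1)
        true
    }
}

fn main() {
    let mut gate = TierGate::new();
    // 50 requests at $1.00 each exhaust the free tier...
    for _ in 0..50 {
        assert!(gate.admit(7, "2025-01", 1.0));
    }
    // ...the 51st is rejected, but a new month starts a fresh counter.
    assert!(!gate.admit(7, "2025-01", 1.0));
    assert!(gate.admit(7, "2025-02", 1.0));
}
```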
Epic 10: Transparent Factory Compliance
Description: Cross-cutting epic ensuring dd0c/route adheres to the 5 Transparent Factory architectural tenets: Atomic Flagging, Elastic Schema, Cognitive Durability, Semantic Observability, and Configurable Autonomy. These stories are woven across the existing system — they don't add features, they add engineering discipline.
Story 10.1: Atomic Flagging — Feature Flag Infrastructure
As a solo founder, I want every new routing rule, cost threshold, and provider failover behavior wrapped in a feature flag (default: off), so that I can deploy code continuously without risking production traffic.
Acceptance Criteria:
- OpenFeature SDK integrated into the Rust proxy via a compatible provider (e.g., a `flagd` sidecar or an env-based provider for V1).
- All flags evaluate locally (in-memory or sidecar) — zero network calls on the hot path.
- Every flag has an `owner` field and a `ttl` (max 14 days). CI blocks deployment if any flag exceeds its TTL at 100% rollout.
- Automated circuit breaker: if a flagged code path increases P99 latency by >5% or error rate by >2%, the flag auto-disables within 30 seconds.
- Flags exist for: model routing strategies, complexity classifier thresholds, provider failover chains, new dashboard features.
Estimate: 5 points
Dependencies: Epic 1 (Proxy Engine), Epic 2 (Router Brain)
Technical Notes:
- Use OpenFeature Rust SDK. For V1, a simple JSON file or env-var provider is fine — no LaunchDarkly needed.
- Circuit breaker integration: extend the existing Redis-backed circuit breaker to also flip flags.
- Flag cleanup: add a `make flag-audit` target that lists expired flags.
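The `flag-audit` check could be sketched like this; the `Flag` fields mirror the acceptance criteria (`owner`, 14-day TTL cap), while the struct layout itself is an illustrative assumption:

```rust
use std::time::{Duration, SystemTime};

const MAX_TTL: Duration = Duration::from_secs(14 * 24 * 60 * 60); // 14 days

#[allow(dead_code)]
struct Flag {
    name: &'static str,
    owner: &'static str,
    created_at: SystemTime,
    ttl: Duration,
}

/// Returns the names of flags past their TTL (CI fails if this is non-empty
/// at 100% rollout). TTLs longer than 14 days are clamped to the maximum.
fn expired_flags(flags: &[Flag], now: SystemTime) -> Vec<&'static str> {
    flags
        .iter()
        .filter(|f| {
            let ttl = f.ttl.min(MAX_TTL);
            now.duration_since(f.created_at).unwrap_or_default() > ttl
        })
        .map(|f| f.name)
        .collect()
}

fn main() {
    let t0 = SystemTime::UNIX_EPOCH;
    let day = Duration::from_secs(24 * 60 * 60);
    let flags = [
        Flag { name: "cascading_routing", owner: "dc", created_at: t0, ttl: 7 * day },
        Flag { name: "new_treemap", owner: "dc", created_at: t0, ttl: 30 * day }, // clamped to 14d
    ];
    let now = t0 + 20 * day;
    assert_eq!(expired_flags(&flags, now), vec!["cascading_routing", "new_treemap"]);
}
```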
Story 10.2: Elastic Schema — Additive-Only Migration Discipline
As a solo founder, I want all TimescaleDB and Redis schema changes to be strictly additive, so that I can roll back any deployment instantly without data loss or broken readers.
Acceptance Criteria:
- CI lint step rejects any migration containing `DROP`, `ALTER ... TYPE`, or `RENAME` on existing columns.
- New fields use a `_v2` suffix or a new table when breaking changes are unavoidable.
- All Rust structs rely on serde's default of ignoring unknown fields (i.e., never add `#[serde(deny_unknown_fields)]`), so V1 code tolerates V2 fields.
- Dual-write pattern documented and enforced: during migration windows, the API writes to both old and new schema targets within the same DB transaction.
- Every migration file includes a `sunset_date` comment (max 30 days). A CI check warns if any migration is past sunset without cleanup.
Estimate: 3 points
Dependencies: Epic 3 (Analytics Pipeline)
Technical Notes:
- Use `sqlx` migration files. Add a pre-commit hook or CI step that greps for forbidden DDL keywords.
- Redis key schema: version keys with a prefix (e.g., `route:v1:config`, `route:v2:config`). Never rename keys.
- For the `request_events` hypertable, new columns are always nullable, with defaults.
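The forbidden-DDL lint can be sketched as a keyword scan. The keyword list and the `ADD COLUMN` carve-out are illustrative; a stricter check would parse the SQL rather than grep it:

```rust
const FORBIDDEN: [&str; 3] = ["DROP ", "ALTER TABLE", "RENAME "];
const ALLOWED_ALTER: &str = "ADD COLUMN"; // additive ALTERs stay legal

/// Returns the forbidden keywords found in a migration, ignoring additive
/// `ALTER TABLE ... ADD COLUMN` statements.
fn lint_migration(sql: &str) -> Vec<&'static str> {
    let upper = sql.to_ascii_uppercase();
    FORBIDDEN
        .iter()
        .copied()
        .filter(|kw| {
            upper.lines().any(|line| {
                line.contains(*kw) && !(*kw == "ALTER TABLE" && line.contains(ALLOWED_ALTER))
            })
        })
        .collect()
}

fn main() {
    let good = "ALTER TABLE request_events ADD COLUMN cost_saved NUMERIC NULL;";
    assert!(lint_migration(good).is_empty());

    let bad = "DROP COLUMN cost;\nALTER TABLE request_events RENAME cost TO price;";
    assert_eq!(lint_migration(bad), vec!["DROP ", "ALTER TABLE", "RENAME "]);
}
```

Wired into CI, a non-empty result fails the build before the migration can ship.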
Story 10.3: Cognitive Durability — Decision Logs for Routing Logic
As a future maintainer (or future me), I want every change to routing algorithms, cost models, or provider selection logic accompanied by a decision_log.json, so that I can understand why a decision was made months later in under 60 seconds.
Acceptance Criteria:
- `decision_log.json` schema defined: `{ prompt, reasoning, alternatives_considered, confidence, timestamp, author }`.
- CI requires a `decision_log.json` entry for any PR touching `src/router/`, `src/cost/`, or migration files.
- Cyclomatic complexity cap of 10 enforced via `cargo clippy` or a custom lint. PRs exceeding the cap are blocked.
- Decision logs are committed alongside code in a `docs/decisions/` directory, one file per significant change.
Estimate: 2 points
Dependencies: None
Technical Notes:
- Use a PR template that prompts for the decision log fields.
- For the complexity cap, use `cargo clippy -W clippy::cognitive_complexity` with the threshold set to 10 (via `clippy.toml`).
- Decision logs for cost table updates should include: source of pricing data, comparison with previous rates, expected savings impact.
Story 10.4: Semantic Observability — AI Reasoning Spans on Routing Decisions
As a platform engineer debugging a misrouted request, I want every proxy routing decision to emit an OpenTelemetry span with structured AI reasoning metadata, so that I can trace exactly which model was chosen, why, and what alternatives were rejected.
Acceptance Criteria:
- Every `/v1/chat/completions` request generates an `ai_routing_decision` span as a child of the request trace.
- Span attributes include: `ai.model_selected`, `ai.model_alternatives` (JSON array of rejected models plus reasons), `ai.cost_delta` (savings vs. default), `ai.complexity_score`, and `ai.routing_strategy` (passthrough/cheapest/quality-first/cascading).
- `ai.prompt_hash` (SHA-256 of the first 500 chars of the system prompt) included for correlation — never raw prompt content.
- Spans export to any OTLP-compatible backend (Grafana Cloud, Jaeger, etc.).
- No PII in any span attribute. Prompt content is hashed, not logged.
Estimate: 3 points
Dependencies: Epic 1 (Proxy Engine), Epic 2 (Router Brain)
Technical Notes:
- Use `tracing` plus the `opentelemetry` Rust crates with an OTLP exporter.
- The span should be created inside the router decision function, not as middleware — it needs access to the alternatives list.
- For V1, export to stdout in OTLP JSON format. Production: OTLP gRPC to a collector.
Story 10.5: Configurable Autonomy — Governance Policy for Automated Routing
As a solo founder, I want a policy.json governance file that controls what the system is allowed to do autonomously (e.g., switch models, update cost tables, add providers), so that I maintain human oversight as the system grows.
Acceptance Criteria:
- `policy.json` defines `governance_mode`: `strict` (all changes require manual approval) or `audit` (changes auto-apply but are logged).
- The proxy checks `governance_mode` before applying any runtime config change (routing rule update, cost table refresh, provider addition).
- `panic_mode` flag: when set to `true`, the proxy freezes all routing rules to their last-known-good state, disables auto-failover, and routes everything to a single hardcoded provider.
- Governance drift monitoring: a weekly cron job logs the ratio of auto-applied vs. manually-approved changes. If auto-applied changes exceed 80% in `strict` mode, an alert fires.
- All policy check decisions are logged: "Allowed by audit mode", "Blocked by strict mode", "Panic mode active — frozen".
Estimate: 3 points
Dependencies: Epic 2 (Router Brain)
Technical Notes:
- `policy.json` lives in the repo root and is loaded at startup, plus watched for changes via the `notify` crate.
- For V1 as a solo founder, start in `audit` mode. `strict` mode is for when you hire or add AI agents to the pipeline.
- Panic mode should be triggerable via a single API call (`POST /admin/panic`) or by setting an env var — whichever is faster in an emergency.
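The policy gate could be sketched as follows; the enum and decision names are illustrative, and in the real system the mode would come from the watched `policy.json`:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum GovernanceMode {
    Strict, // all changes require manual approval
    Audit,  // changes auto-apply but are logged
}

#[derive(Debug)]
enum Decision {
    Apply(&'static str), // applied, with an audit-log entry
    Queue(&'static str), // held for manual approval
    Frozen,              // panic mode: nothing changes
}

/// Consulted before any runtime config change (routing rule update, cost
/// table refresh, provider addition). Panic mode overrides everything.
fn check_policy(mode: GovernanceMode, panic_mode: bool, change: &'static str) -> Decision {
    if panic_mode {
        return Decision::Frozen; // "Panic mode active — frozen"
    }
    match mode {
        GovernanceMode::Audit => Decision::Apply(change),  // "Allowed by audit mode"
        GovernanceMode::Strict => Decision::Queue(change), // "Blocked by strict mode"
    }
}

fn main() {
    assert!(matches!(
        check_policy(GovernanceMode::Audit, false, "cost_table_refresh"),
        Decision::Apply(_)
    ));
    assert!(matches!(
        check_policy(GovernanceMode::Strict, false, "routing_rule_update"),
        Decision::Queue(_)
    ));
    assert!(matches!(
        check_policy(GovernanceMode::Audit, true, "provider_addition"),
        Decision::Frozen
    ));
}
```

Keeping the gate a pure function makes every "allowed/blocked/frozen" log line trivially testable.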
Epic 10 Summary
| Story | Tenet | Points |
|---|---|---|
| 10.1 | Atomic Flagging | 5 |
| 10.2 | Elastic Schema | 3 |
| 10.3 | Cognitive Durability | 2 |
| 10.4 | Semantic Observability | 3 |
| 10.5 | Configurable Autonomy | 3 |
| Total |  | 16 |