Max Mayfield 5ee95d8b13 dd0c: full product research pipeline - 6 products, 8 phases each
Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
        product-brief, architecture, epics (incl. Epic 10 TF compliance),
        test-architecture (TDD strategy)

Brand strategy and market research included.
2026-02-28 17:35:02 +00:00


🎷 dd0c/route — Design Thinking Session

Facilitator: Maya, Design Thinking Maestro
Date: February 28, 2026
Product: dd0c/route — LLM Cost Router & Optimization Dashboard
Brand: 0xDD0C — "All signal. Zero chaos."
Method: Full Design Thinking (Empathize → Define → Ideate → Prototype → Test → Iterate)


"Design is jazz. You learn the scales so you can forget them. You study the user so deeply that the solution plays itself. Today we're not designing a proxy or a dashboard — we're designing a feeling. The feeling of control in a world where AI spend is a black box. Let's riff."


Phase 1: EMPATHIZE — Meet the Humans

Before we sketch a single wireframe, we sit with the people. We shut up. We listen. We watch their hands — do they clench when they open the billing console? Do they sigh when they switch tabs to check which model is running? The body knows before the brain does.

Three humans. Three worlds. One shared frustration: AI costs are a fog, and nobody gave them a flashlight.


🧑‍💻 Persona 1: Priya Sharma — The ML Engineer

Age: 29 | Role: Senior ML Engineer | Company: Series B fintech, 80 engineers | Location: Austin, TX | Spotify: Lo-fi beats playlist on repeat | Slack status: 🔬 "in the prompt mines"

Priya builds the AI features that make the product magical. She picks models, writes prompts, tunes parameters. She's an artist working in tokens. She cares deeply about output quality — a bad summarization or a hallucinated number could cost a customer real money. She does NOT want to think about cost. That's someone else's problem. Except... it keeps becoming her problem.

Empathy Map

| Says | Thinks |
| --- | --- |
| "I just need GPT-4o for this — the output quality matters." | "Is there actually a cheaper model that could handle this? I don't have time to benchmark." |
| "Can someone tell me why our API bill doubled?" | "I bet it's that new RAG pipeline. But I can't prove it." |
| "I'll optimize the prompts later." | "Later never comes. There's always a new feature." |
| "We should probably test Claude for this use case." | "But switching means rewriting all my evaluation scripts. Not worth it right now." |

| Does | Feels |
| --- | --- |
| Defaults to GPT-4o for everything because it "just works" | Guilty about costs but not empowered to fix them |
| Copies system prompts from old projects, never audits token length | Anxious when the eng manager asks "why is AI so expensive?" |
| Runs expensive prompt experiments in production because staging has no model config | Frustrated that cost visibility requires digging through provider dashboards |
| Manually tracks model performance in a messy spreadsheet | Torn between quality perfectionism and budget pressure |

Pain Points

  1. No time to benchmark models — She knows GPT-4o is overkill for half her tasks, but testing alternatives takes days she doesn't have.
  2. Cost is invisible at the code level — She writes openai.chat.completions.create() and has zero idea what that line costs per invocation, per day, per month.
  3. Prompt bloat is technical debt nobody tracks — System prompts grow like kudzu. Nobody audits them. She knows they're wasteful but there's no tooling to measure it.
  4. Multi-model chaos — She wants to use Claude for some tasks, GPT for others, Gemini for yet others. But each has different SDKs, different quirks, different billing. It's a mess.
  5. Blame without data — When costs spike, engineering leadership asks "who did this?" and nobody has attribution data. It feels like a witch hunt.
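Pain point 2 is the most mechanical of the five: every chat completion response reports its token usage, so per-call cost is one price lookup and one multiplication away. A minimal sketch, with placeholder per-million-token prices (real rates vary by provider and change often):

```python
# Per-call cost from token usage. Prices are illustrative placeholders,
# not current list prices.
PRICE_PER_MTOK = {          # model: (input_price, output_price), USD per 1M tokens
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def call_cost(model, prompt_tokens, completion_tokens):
    """Estimate the dollar cost of a single completion call."""
    in_price, out_price = PRICE_PER_MTOK[model]
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000

# Priya's 8K-token system prompt with a 500-token answer on GPT-4o:
# call_cost("gpt-4o", 8_000, 500) → 0.025 (2.5 cents per request)
```

Surfacing that number next to the line of code that incurs it is the whole point of "cost visible at the code level."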

Current Workarounds

  • Uses GPT-4o for everything to avoid the cognitive overhead of model selection
  • Keeps a personal spreadsheet comparing model outputs for key tasks (updated sporadically)
  • Hardcodes model names in application code — no abstraction layer
  • Checks the OpenAI usage dashboard once a month, squints at the graph, shrugs

Jobs to Be Done (JTBD)

  • When I'm building a new AI feature, I want to pick the right model without spending days benchmarking, so that I can ship fast without wasting money.
  • When my manager asks why AI costs went up, I want to point to specific features and usage patterns, so that I'm not the scapegoat.
  • When a cheaper model drops (like a new Claude or Gemini version), I want to know instantly if it's good enough for my tasks, so that I can switch without risk.

Day-in-the-Life Scenario

7:45 AM — Priya opens her laptop at a coffee shop. Slack is already buzzing. The eng manager posted in #engineering: "Our OpenAI bill was $14K last month. That's 40% over budget. Can the ML team look into this?"

She sighs. She knows it's probably the new document summarization pipeline — it's using GPT-4o with 8K-token system prompts on every request. But she can't prove it. The OpenAI dashboard shows total spend by model, not by feature.

8:30 AM — She opens the codebase. The summarization service has model="gpt-4o" hardcoded in 14 places. She thinks about switching to GPT-4o-mini but worries about quality regression. She'd need to run her eval suite against both models, compare outputs, check edge cases. That's a two-day project minimum.

9:15 AM — She decides to "do it later" and starts working on the new feature instead. The prompt she's writing is 3,200 tokens. She knows it could be shorter but the deadline is Friday.

2:00 PM — A Slack DM from the eng manager: "Can you at least estimate which features are costing the most?" She spends 45 minutes trying to correlate OpenAI usage timestamps with their application logs. The data doesn't line up cleanly. She gives a rough guess.

5:30 PM — She commits code with model="gpt-4o" because it's the safe choice. She closes her laptop feeling vaguely guilty.


👔 Persona 2: Marcus Chen — The Engineering Manager

Age: 36 | Role: Engineering Manager, Platform & AI | Company: Series B fintech, 80 engineers (same company as Priya) | Location: Austin, TX | Slack status: 📊 "in budget review hell" | Calendar: Back-to-back from 9 to 4

Marcus manages the team that builds AI features AND the team that runs the infrastructure. He's the bridge between "what's technically possible" and "what the CFO will approve." He sees the AWS bill. He sees the OpenAI bill. He sees the Anthropic bill. He sees them all, and none of them tell him what he actually needs to know: which features are worth the cost, and which are burning money?

Empathy Map

| Says | Thinks |
| --- | --- |
| "We need to get AI costs under control before the board meeting." | "I have no idea if $14K/month is reasonable or insane for what we're doing." |
| "Can we get a breakdown by feature?" | "Why is this so hard? Every other infrastructure cost has attribution." |
| "I trust the ML team to pick the right models." | "But I can't defend their choices to the CFO without data." |
| "Let's set a budget for AI spend this quarter." | "How do I set a budget when I can't even measure current spend by category?" |

| Does | Feels |
| --- | --- |
| Exports CSVs from 3 different provider dashboards monthly | Overwhelmed by the opacity of AI costs |
| Builds manual spreadsheets to estimate per-feature AI cost | Exposed — he's accountable for a budget he can't control or measure |
| Asks engineers to "be mindful of costs" (vague, ineffective) | Frustrated that AI cost tooling is 5 years behind cloud cost tooling |
| Presents hand-wavy cost estimates to leadership | Anxious before every budget review — he's guessing, and he knows it |

Pain Points

  1. No attribution — The single biggest pain. He gets one bill from OpenAI that says "$14,000." He needs to know: chatbot = $4K, summarizer = $6K, code review = $2K, experiments = $2K. This data doesn't exist.
  2. No forecasting — "At current growth, what will AI cost us in 6 months?" He literally cannot answer this question.
  3. Multi-provider reconciliation nightmare — OpenAI bills by tokens. Anthropic bills differently. Google bills through GCP. Three billing models, three dashboards, zero unified view.
  4. Can't set meaningful budgets — Without attribution, budgets are arbitrary. Without alerts, budgets are unenforceable.
  5. The "justify AI" pressure — Leadership sees AI as expensive and wants ROI proof. Marcus needs to show "our AI chatbot saves $X in support costs" but has no cost-per-conversation metric.
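The forecasting gap can be made concrete with Marcus's own numbers: $9.8K to $14.2K is roughly 45% month-over-month growth, and compounding that forward is exactly the projection he cannot produce today. A naive sketch (constant growth rate is an assumption for illustration, not a model):

```python
# Naive compound-growth forecast of monthly AI spend.
# Assumes the ~45% month-over-month rate holds; purely illustrative.

def project(current_monthly, monthly_growth, months):
    """Project monthly spend `months` from now at a constant growth rate."""
    return current_monthly * (1 + monthly_growth) ** months

# From $14.2K/month at 45% growth, six months out:
six_months_out = project(14_200, 0.45, 6)  # ≈ $132K/month
```

Even this back-of-the-envelope curve is more than Marcus can currently defend, which is why "at current growth, what will AI cost in 6 months?" is unanswerable.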

Current Workarounds

  • Monthly manual spreadsheet reconciliation across providers (takes 3-4 hours)
  • Asks engineers to add comments in code estimating per-call cost (nobody does this)
  • Uses rough heuristics: "We have 5 AI features, bill is $14K, so ~$2.8K each" (wildly inaccurate)
  • Pads the AI budget by 50% to avoid surprises (CFO hates this)

Jobs to Be Done (JTBD)

  • When I'm preparing for a budget review, I want to show exactly where every AI dollar goes, so that leadership trusts my team's spending decisions.
  • When costs spike unexpectedly, I want to get an alert with root cause attribution, so that I can respond in hours, not days.
  • When planning next quarter, I want to forecast AI costs based on feature roadmap and growth projections, so that I can set realistic budgets.

Day-in-the-Life Scenario

Monday 9:00 AM — Marcus opens his week with a calendar invite he's been dreading: "Q2 AI Budget Review with CFO — Wednesday 2pm." He has two days to build a story around numbers he doesn't fully understand.

9:30 AM — He logs into the OpenAI dashboard. $14,200 last month. Up from $9,800 the month before. A 45% increase. He can see it's mostly GPT-4o usage, but he can't see WHY. Was it the new feature launch? A prompt engineering experiment? A retry bug? The dashboard doesn't say.

10:00 AM — He Slacks Priya: "Can you estimate which features are driving the cost increase?" He knows this is an unfair ask — she doesn't have the data either — but he needs something for the slide deck.

11:00 AM — He opens the Anthropic console. $2,100 last month. The Google AI Platform billing page shows another $800. He starts a spreadsheet. Three tabs. Manual data entry. He's doing FinOps with a calculator.

1:00 PM — He gets Priya's response: "Probably the summarization pipeline, but I can't be sure without better logging." He writes "Summarization pipeline optimization — estimated savings: $3-5K/month" on his slide. He's guessing. He knows the CFO will ask follow-up questions he can't answer.

Tuesday 4:00 PM — He finishes the deck. It has a pie chart with estimated cost breakdown. The slices are labeled "Chatbot (est.)", "Summarization (est.)", "Other (est.)." Every number has a margin of error he's not disclosing. He feels like a fraud.

Wednesday 2:00 PM — The CFO asks: "What's our cost per AI-assisted customer interaction?" Marcus doesn't know. The meeting goes poorly.


🛠️ Persona 3: Jordan Okafor — The Platform/DevOps Engineer

Age: 32 | Role: Senior Platform Engineer | Company: Mid-stage SaaS, 120 engineers | Location: Remote (Portland, OR) | Terminal: Always open, always tmux | Philosophy: "If it needs a manual, it's broken."

Jordan runs the infrastructure. Kubernetes clusters, CI/CD pipelines, observability stacks. They were handed the LLM proxy project six months ago because "it's infrastructure, right?" Now they maintain a fragile, hand-rolled proxy that routes between OpenAI and Anthropic based on a YAML config file that three different teams keep editing. They hate it. They want a proxy they can deploy, configure once, and never think about again. They especially hate vendor lock-in — they've been burned before (remember when Heroku died?).

Empathy Map

| Says | Thinks |
| --- | --- |
| "Just give me a Helm chart and I'll have it running in 20 minutes." | "If this thing adds more than 10ms of latency, I'm ripping it out." |
| "I don't care which model you use, just don't make me maintain the routing logic." | "Why am I, a platform engineer, debugging prompt routing? This isn't my job." |
| "We need to be able to run this in our VPC. Non-negotiable." | "I'm not sending our customers' data through some random startup's proxy." |
| "Can we please standardize on one API format?" | "OpenAI, Anthropic, Google — three different request formats. I'm writing adapters in my sleep." |

| Does | Feels |
| --- | --- |
| Maintains a hand-rolled LLM proxy (Node.js, ~2K lines, growing) | Resentful — this proxy is a time sink that shouldn't exist |
| Writes adapter layers for each new LLM provider | Paranoid about the proxy being a single point of failure |
| Gets paged when the proxy has issues (rate limits, timeouts) | Exhausted by the operational burden of something that should be a commodity |
| Reviews every PR that touches the routing config YAML | Protective of infrastructure reliability — won't adopt anything that feels fragile |

Pain Points

  1. Maintaining a hand-rolled proxy is soul-crushing — It started as 200 lines. It's now 2,000. Every new model, every API change, every edge case adds more code. It's becoming a product, and Jordan didn't sign up to build a product.
  2. Vendor lock-in anxiety — The codebase is riddled with OpenAI-specific patterns. If they need to switch providers (pricing change, outage, compliance), it's a multi-week migration.
  3. Reliability is on their head — If the proxy goes down, ALL AI features go down. They've built retry logic, circuit breakers, fallback chains — all by hand. It's fragile.
  4. No observability into LLM traffic — They have Datadog for everything else, but LLM requests are a black box. No metrics on tokens, latency per model, error rates by provider.
  5. Config sprawl — Three teams edit the routing YAML. No validation. Last month someone typo'd a model name and requests silently fell back to the most expensive model for 6 hours.

Current Workarounds

  • Hand-rolled Node.js proxy with growing technical debt
  • Manual YAML config for routing rules (no validation, no versioning)
  • Custom Prometheus metrics bolted on after the fact (incomplete)
  • A "break glass" runbook for when the proxy fails (switch all traffic to OpenAI direct)

Jobs to Be Done (JTBD)

  • When the team needs LLM routing, I want to deploy a battle-tested proxy with a Helm chart, so that I can stop maintaining custom code.
  • When a new LLM provider or model is added, I want to update a config file (not write code), so that the change takes minutes, not days.
  • When something goes wrong with LLM traffic, I want to see it in my existing observability stack, so that I don't need yet another dashboard.

Day-in-the-Life Scenario

6:30 AM — Jordan's phone buzzes. PagerDuty. "LLM Proxy — Error rate > 5% for 10 minutes." They grab their laptop from the nightstand (it's always there) and SSH into the proxy server.

6:35 AM — The logs show Anthropic is returning 529 (overloaded) errors. The proxy's fallback logic should route to OpenAI, but there's a bug — the fallback only triggers on 500 errors, not 529. They hotfix it, deploy, error rate drops.

6:50 AM — They open a PR to fix the fallback logic properly. While they're in the code, they notice the system prompt for the chatbot feature is 4,200 tokens. They Slack the ML team: "Hey, is this prompt supposed to be this long? It's costing us on every request." No response for 3 hours.

10:00 AM — Sprint planning. The ML team wants to add Google Gemini as a third provider. Jordan estimates 2 weeks to add the adapter, update the routing config schema, add monitoring, and test failover. The PM asks "Can't you just add it?" Jordan stares into the void.

2:00 PM — A junior engineer on another team submits a PR to the routing config. They want to route their new feature to Claude 3.5 Sonnet. The YAML is valid but the model name is wrong — it should be claude-3-5-sonnet-20241022, not claude-3.5-sonnet. Jordan catches it in review. They spend 30 minutes writing a config validation script that should have existed from day one.

5:00 PM — Jordan updates their internal "LLM Proxy Replacement" doc. It now has 47 requirements. They've evaluated LiteLLM (too many features, not enough reliability focus), Portkey (SaaS-only, can't self-host), and building something in Rust (no time). They close the doc and open a beer.


Phase 2: DEFINE — Frame the Problem

"Okay. We've sat with Priya, Marcus, and Jordan. We've felt their frustration in our bones. Now comes the hardest part of design thinking — and the part most people rush through. We need to DEFINE the problem so precisely that the solution becomes almost inevitable. A well-framed problem is a half-solved problem. Let's not design for 'AI cost management.' Let's design for the specific human moment where everything breaks down."


Point-of-View (POV) Statements

A POV statement isn't a feature request. It's a declaration of a human truth. It's the gap between the world as it is and the world as it should be.

Priya (ML Engineer):

Priya, a senior ML engineer who builds AI features under deadline pressure, needs a way to make smart model choices without stopping her workflow, because the current reality forces her to choose between shipping fast (expensive model, no research) and shipping cheap (days of benchmarking she doesn't have). She defaults to expensive every time, and the guilt accumulates like technical debt.

Marcus (Engineering Manager):

Marcus, an engineering manager accountable for a growing AI budget he can't decompose, needs real-time cost attribution by feature, team, and environment, because without it he's presenting estimated pie charts to a CFO who wants exact numbers. He's one bad budget review away from a hiring freeze on AI projects — not because AI isn't valuable, but because he can't prove it is.

Jordan (Platform Engineer):

Jordan, a platform engineer maintaining a hand-rolled LLM proxy that's become a second job, needs a production-grade, self-hostable routing layer that speaks one API format, because every new model and every new provider adds weeks of adapter code, and the proxy they built as a quick hack is now a critical single point of failure they're terrified of.


Key Insights

These are the truths that emerged from empathy. Not features. Not solutions. Truths.

  1. Cost is a team sport with no scoreboard. Everyone contributes to AI spend. Nobody can see their contribution. The result is collective guilt and zero accountability. It's like splitting a dinner check with no itemized receipt.

  2. Model selection is a high-stakes guess. Engineers pick models based on vibes, not data. "GPT-4o is good" is the entire decision framework. There's no feedback loop between model choice and actual cost/quality outcomes.

  3. The proxy is the unloved middle child. Every team that uses multiple LLM providers ends up building a proxy. Every proxy starts as 200 lines and ends as 2,000. Nobody wants to maintain it. It's infrastructure that shouldn't be custom.

  4. Attribution is the killer insight, not routing. Routing saves money. Attribution saves careers. Marcus doesn't just need lower costs — he needs to PROVE where costs go. The dashboard might matter more than the proxy.

  5. Trust is earned in milliseconds. Adding a proxy hop to every LLM request is a massive trust ask. If it adds latency, it's dead. If it touches prompt content unnecessarily, it's dead. If it goes down once in the first week, it's dead. The proxy must be invisible.

  6. "Deploy and forget" is the platinum standard. Jordan doesn't want a powerful tool. Jordan wants a boring tool. Boring means reliable. Boring means no surprises. Boring means they can go back to their actual job.

  7. The real competitor is inertia. The biggest threat isn't LiteLLM or Portkey. It's "we'll deal with AI costs later." The product must make the cost of NOT using it feel immediate and visceral.


Core Tension: The Quality-Cost-Speed Triangle

Every LLM decision lives inside a triangle of competing forces:

        QUALITY
       /       \
      /   THE    \
     /   TENSION  \
    /     ZONE     \
   /________________\
 COST              SPEED
  • Priya lives at the QUALITY vertex. She'll overpay to avoid a bad output.
  • Marcus lives at the COST vertex. He needs the budget to make sense.
  • Jordan lives at the SPEED vertex. Deploy fast, run fast, fail fast, recover fast.

The product doesn't resolve this tension — it makes the tension VISIBLE and NAVIGABLE. Right now, teams make tradeoffs blindly. dd0c/route gives them the instrument panel so they can make tradeoffs intentionally.


"How Might We" Questions

HMWs are the bridge between problem and solution. Each one is a door. Some lead to features. Some lead to entirely new product concepts. We open all the doors.

Attribution & Visibility:

  1. HMW make AI cost attribution as automatic as cloud resource tagging?
  2. HMW show an engineer the cost of their code at the moment they write it, not at the end of the month?
  3. HMW turn a single opaque LLM bill into a story that a CFO can understand in 30 seconds?

Model Selection & Routing:

  4. HMW remove the "model selection" decision from the engineer's workflow entirely — make it automatic and invisible?
  5. HMW let teams define quality thresholds in plain language ("good enough for classification," "must be perfect for customer-facing") and have the system pick the cheapest model that qualifies?
  6. HMW make trying a cheaper model feel as safe as trying a new route on Google Maps — low risk, easy to revert, with a clear comparison?

Trust & Adoption:

  7. HMW earn the trust of a paranoid platform engineer in the first 5 minutes of setup?
  8. HMW provide value (insights, savings estimates) BEFORE the user routes a single request through us?
  9. HMW make the proxy so invisible that engineers forget it's there — until they check the dashboard and see the savings?

Operational Excellence:

  10. HMW make adding a new LLM provider feel like adding a new DNS record — config change, not code change?
  11. HMW prevent a proxy failure from becoming a total AI outage — graceful degradation by default?
  12. HMW integrate LLM observability into the tools teams already use (Datadog, Grafana, PagerDuty) instead of forcing them into a new dashboard?

Behavioral & Cultural:

  13. HMW make cost optimization feel like a game (leaderboards, savings streaks) rather than a chore?
  14. HMW create a feedback loop where engineers SEE the impact of their model choices within hours, not months?
  15. HMW turn the monthly "why is AI so expensive?" meeting into a celebration of savings?


Phase 3: IDEATE — Generate Solutions

"Now we improvise. Phase 1 and 2 were the scales — we learned the key, the tempo, the feel. Phase 3 is the solo. No wrong notes. Every idea gets written down, even the ones that make you wince. Especially those. The idea that makes you uncomfortable is usually the one that's actually new. Let's fill the room with possibilities and sort them later."

I'm organizing ideas across six themes that map to the user journey: getting started, routing intelligence, the dashboard experience, staying safe, working as a team, and connecting to the world.


💡 Solution Ideas (26 ideas across 6 themes)

Theme A: Onboarding & First Value ("The First Five Minutes")

  1. The One-Line Setup — export OPENAI_BASE_URL=https://route.dd0c.dev/v1 and you're live. No SDK. No code change. No signup form with 14 fields. Just a URL swap.

  2. Pre-Route Audit Mode — Before routing a single request, dd0c/route runs in "shadow mode": it observes your existing LLM traffic (via log ingestion or a passive proxy) and generates a report: "Here's what you spent last month, here's what you WOULD have spent with smart routing." Value before commitment. Like a doctor showing you the X-ray before suggesting surgery.

  3. The 60-Second Cost Scan CLI — npx dd0c-scan ./src — scans your codebase, finds every LLM API call, estimates monthly cost based on typical usage patterns, and prints a savings estimate. No account needed. No data leaves your machine. Pure engineering-as-marketing.

  4. Interactive Setup Wizard (Terminal) — A dd0c init command that walks you through: Which providers do you use? → What are your API keys? → What's your cost priority (aggressive savings / balanced / quality-first)? → Generates a config file. Done. Helm chart optional.

  5. "Copy Your Neighbor" Templates — Pre-built routing configs for common architectures: "SaaS with chatbot + RAG," "AI code review pipeline," "Multi-agent workflow." Pick a template, customize, deploy. Nobody starts from zero.

Theme B: Routing Intelligence ("The Brain")

  6. Complexity Classifier — A tiny, fast model (<5ms) that reads each incoming prompt and classifies it: trivial (formatting, extraction) → cheap model, moderate (summarization, simple Q&A) → mid-tier model, complex (multi-step reasoning, code generation) → premium model. The user never picks a model. The router picks for them.

  7. Cascading Try-Cheap-First — Send every request to the cheapest viable model first. If the response confidence is below a threshold (measured by logprobs, output length heuristics, or a lightweight quality check), automatically escalate to the next tier. You only pay for expensive models when cheap ones actually fail.

  8. Semantic Response Cache — Hash incoming prompts by semantic similarity (not exact match). If a sufficiently similar prompt was answered in the last N hours, return the cached response. "What's the capital of France?" and "Tell me France's capital city" hit the same cache entry. 20-40% cost reduction for repetitive workloads.

  9. Quality Threshold Profiles — Named profiles that teams attach to their requests: "profile": "customer-facing" (high quality, premium models OK), "profile": "internal-tool" (good enough, optimize for cost), "profile": "batch-job" (cheapest possible, latency doesn't matter). The router maps profiles to routing strategies.

  10. A/B Model Testing — Automatically split traffic for a given task between two models. Measure cost, latency, and quality (via user feedback signals or automated evals). After N requests, recommend the winner. Continuous optimization without manual benchmarking — this is what Priya needs.

  11. Time-Aware Routing — Batch API pricing is 50% cheaper on OpenAI. Background jobs that don't need real-time responses automatically queue for batch processing during off-peak windows. The user sees the same API; the router handles the timing.

  12. Fallback Chain with Circuit Breakers — Provider A → Provider B → Provider C. If Provider A starts returning errors or high latency, the circuit breaker trips and traffic shifts automatically. Jordan's 529-error nightmare never happens again.
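The cascading try-cheap-first idea fits in a few lines. In this sketch the tier list, the ask() callback, and the 0.8 confidence threshold are illustrative assumptions, not dd0c/route's actual API:

```python
# Illustrative cascade: try cheap models first, escalate on low confidence.

TIERS = ["gpt-4o-mini", "claude-haiku", "gpt-4o"]  # cheapest → most capable

def cascade(prompt, ask, threshold=0.8):
    """Try each tier in order; escalate while confidence stays below threshold.

    ask(model, prompt) returns (answer, confidence), where confidence could
    come from logprobs or a lightweight quality check, as described above.
    """
    for model in TIERS:
        answer, confidence = ask(model, prompt)
        if confidence >= threshold:
            return model, answer  # a cheap model was good enough; stop here
    return model, answer  # nothing cleared the bar: keep the top tier's answer

# Stub backend: pretend only the premium model clears the bar.
def fake(model, prompt):
    return f"{model} says hi", (0.9 if model == "gpt-4o" else 0.5)

# cascade("hello", fake) → ("gpt-4o", "gpt-4o says hi")
```

The design choice worth noting: the caller never names a model, so swapping tiers (or adding one) is a config change, not a code change.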

Theme C: Dashboard & Insights ("The Scoreboard")

  13. Real-Time Cost Ticker — A live-updating number at the top of the dashboard: "Today's AI spend: $47.23" with a sparkline showing the last 24 hours. Like a stock ticker for your LLM budget. Visceral. Immediate. Marcus checks it once and is hooked.

  14. Attribution Treemap — A visual treemap: Company → Team → Feature → Endpoint → Model. Click to drill down. Each rectangle sized by cost. Instantly see that the summarization pipeline is 43% of total spend. The CFO slide deck writes itself.

  15. "You Could Have Saved" Counter — A persistent, slightly provocative number: "Estimated savings if you'd used dd0c/route's routing last month: $4,217." Shown during the audit/shadow mode phase. This is the number that converts free users to paid. It's the gap between their world and the better world.

  16. Model Comparison Scatter Plot — For each task type, plot every available model on a cost vs. quality chart. Highlight the Pareto frontier. Show where the user's current model sits. If they're above the frontier, they're overpaying. One glance, one insight.

  17. Prompt Efficiency Heatmap — Visualize every prompt template by tokens-per-useful-output. Red = bloated (4K system prompt for a yes/no classification). Green = lean. Engineers can see which prompts need a diet. Gamifies prompt optimization.

  18. Weekly Savings Digest Email — Every Monday: "Last week you routed 142K requests. dd0c/route saved you $1,847 vs. default routing. Top saving: switching classification from GPT-4o to Haiku (-$890)." Marcus forwards this to the CFO. It's his proof of value.
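The Pareto-frontier highlight from the scatter-plot idea is a simple dominance check: a model is on the frontier if no other model is both cheaper and at least as good. A sketch with made-up (cost, quality) numbers:

```python
# Pareto frontier over (cost, quality) points; data is illustrative.

def pareto_frontier(models):
    """models maps name -> (cost, quality); return the non-dominated names."""
    frontier = set()
    for name, (cost, quality) in models.items():
        dominated = any(
            c <= cost and q >= quality and (c, q) != (cost, quality)
            for other, (c, q) in models.items() if other != name
        )
        if not dominated:
            frontier.add(name)
    return frontier

MODELS = {
    "gpt-4o":       (2.50, 0.95),
    "gpt-4o-mini":  (0.15, 0.80),
    "claude-haiku": (0.25, 0.85),
    "overpriced-x": (3.00, 0.90),  # dominated: gpt-4o is cheaper AND better
}
# pareto_frontier(MODELS) → {"gpt-4o", "gpt-4o-mini", "claude-haiku"}
```

Any model that falls out of this set is, by construction, a worse deal than something else on the chart, which is the "one glance, one insight" payoff.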

Theme D: Guardrails & Safety ("The Seatbelts")

  19. Budget Guardrails with Soft/Hard Limits — Per-team, per-feature, per-day budgets. Soft limit = Slack alert ("Backend team is at 80% of daily AI budget"). Hard limit = throttle to cheaper models or queue requests. The "andon cord" from the brainstorm — anyone can see when spending is abnormal.

  20. Anomaly Detection Alerts — ML-based anomaly detection on spend patterns. "Your RAG pipeline cost 340% more than its 30-day average today." Fires to Slack, PagerDuty, email. Catches retry storms, prompt bugs, and runaway experiments before they become $3K surprises.

  21. Request Inspector / Debugger — Chrome DevTools for LLM requests. See every request: prompt tokens, completion tokens, model used, routing decision (and why), latency, cost. Filter by feature, team, time range. Jordan's observability gap, filled.
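The soft/hard guardrail logic above reduces to two threshold checks. In this sketch the 80% soft ratio and the action names are illustrative assumptions:

```python
# Soft/hard budget guardrail: map current spend to an action.

def guardrail(spent, budget, soft_ratio=0.8):
    """Return the action for a team's current daily spend against its budget."""
    if spent >= budget:
        return "throttle"  # hard limit: downgrade to cheaper models or queue
    if spent >= soft_ratio * budget:
        return "alert"     # soft limit: Slack alert before it becomes a problem
    return "ok"

# A team at $82 of a $100 daily budget trips the soft limit:
# guardrail(82, 100) → "alert"
```

The important property is that the hard limit degrades gracefully (cheaper models, queued requests) rather than failing requests outright, which keeps the guardrail from becoming an outage.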

Theme E: Team & Collaboration ("The Band")

  22. Savings Leaderboard — "Team Backend saved $3,200 this month by adopting cascading routing. Team ML saved $1,100 by compressing system prompts." Gamification that actually works because it's tied to real dollars. Friendly competition drives adoption across the org.

  23. Routing Policy as Code — Define routing rules in a version-controlled config file (YAML/TOML). PR review for routing changes. GitOps for model selection. Jordan's YAML chaos, replaced with validated, versioned, reviewable config.

  24. Role-Based Views — The ML engineer sees: model performance, prompt efficiency, quality metrics. The manager sees: cost attribution, forecasts, budget status. The platform engineer sees: proxy health, latency, error rates, provider status. Same data, different lenses. Each persona gets their instrument panel.
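"Validated" is the operative word in the routing-policy idea: the check that would have caught the claude-3.5-sonnet typo in Jordan's code review is a few lines. The known-model list and the config shape here are illustrative assumptions:

```python
# Validate a routing policy before it merges: reject unknown model names.

KNOWN_MODELS = {
    "gpt-4o", "gpt-4o-mini",
    "claude-3-5-sonnet-20241022", "claude-haiku",
}

def validate_routes(routes):
    """Return human-readable errors; an empty list means the config is valid."""
    errors = []
    for i, route in enumerate(routes):
        for model in route.get("models", []):
            if model not in KNOWN_MODELS:
                errors.append(f"route {i}: unknown model '{model}'")
    return errors

# The typo gets caught in CI instead of silently misrouting traffic:
bad = [{"match": {"tag": "chat"}, "models": ["claude-3.5-sonnet"]}]
# validate_routes(bad) → ["route 0: unknown model 'claude-3.5-sonnet'"]
```

Run in CI on every PR that touches the routing config, this turns Jordan's 30-minute review catch into an automatic gate.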

Theme F: Integrations & Ecosystem ("The Connections")

  25. OpenTelemetry Export — Push LLM telemetry (tokens, cost, latency, model, routing decision) as OTel spans. Fits into existing Datadog/Grafana/Honeycomb pipelines. Jordan doesn't need a new dashboard — the data flows into the dashboards they already have.

  26. GitHub Action: Cost Impact on PRs — A GitHub Action that comments on PRs: "This change adds a new GPT-4o call in the checkout flow. Estimated cost impact: +$1,200/month based on current traffic." Priya sees cost at the moment she writes code, not at the end of the month. Shifts cost awareness left.
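The per-request telemetry the OTel export idea describes is a small, fixed set of fields. This sketch shows them as a plain dict; in a real integration they would be span attributes emitted via the OpenTelemetry SDK, and these attribute names are assumptions, not a standard:

```python
# Per-request LLM telemetry fields, shown as a plain attribute dict.

def llm_span_attributes(model, prompt_tokens, completion_tokens,
                        latency_ms, cost_usd, decision):
    """Collect the per-request fields Grafana/Datadog would chart."""
    return {
        "llm.model": model,
        "llm.tokens.prompt": prompt_tokens,
        "llm.tokens.completion": completion_tokens,
        "llm.latency_ms": latency_ms,
        "llm.cost_usd": cost_usd,
        "llm.route.decision": decision,
    }

attrs = llm_span_attributes("gpt-4o-mini", 812, 96, 430.2, 0.00018, "cheapest-viable")
```

Because the fields ride on spans the team already collects, cost shows up next to latency and error rate in the same dashboards, with no new tool to learn.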


🎯 Idea Clusters

| Cluster | Ideas | Core Value |
| --- | --- | --- |
| Zero-Friction Adoption | 1, 2, 3, 4, 5 | Value in minutes, not weeks |
| Intelligent Routing | 6, 7, 8, 9, 10, 11, 12 | Automatic savings, no manual model selection |
| Cost Visibility & Attribution | 13, 14, 15, 16, 17, 18 | Turn opaque bills into actionable stories |
| Safety & Control | 19, 20, 21 | Prevent surprises, maintain trust |
| Team Dynamics | 22, 23, 24 | Make cost optimization a team sport |
| Developer Workflow | 25, 26 | Meet engineers where they already work |

🏆 Top 5 Concepts (with User Flow Sketches)

Concept 1: "The Invisible Router"

For Priya — she never thinks about model selection again.

Developer writes code:
  openai.chat.completions.create(model="gpt-4o", ...)
                    │
                    ▼
  Request hits dd0c/route proxy (base URL swap, zero code change)
                    │
                    ▼
  Complexity classifier runs (<5ms):
    ├── Trivial task → route to GPT-4o-mini ($0.15/M tokens)
    ├── Moderate task → route to Claude Haiku ($0.25/M tokens)  
    └── Complex task → route to GPT-4o ($2.50/M tokens) as requested
                    │
                    ▼
  Response returned with X-DD0C-Cost header
  (engineer can ignore it; dashboard captures it)

Why it wins: Zero behavior change for the engineer. The savings happen automatically. Priya keeps writing model="gpt-4o" and the router quietly downgrades when it's safe to do so.

Concept 2: "The CFO Slide Deck"

For Marcus — the dashboard that tells the cost story.

Marcus opens dd0c dashboard Monday morning:
                    │
                    ▼
  Landing view: Real-time spend ticker + weekly trend
    "This week: $2,847 | Last week: $3,291 | Saved: $444 (13%)"
                    │
                    ▼
  Attribution treemap: Click "Summarization" → see it's 43% of spend
    → Drill into: which endpoints, which prompts, which models
                    │
                    ▼
  Recommendations panel: "Switch classification to Haiku: -$890/mo"
    → One-click: "Apply this routing rule" 
                    │
                    ▼
  Export: PDF report for CFO | CSV for finance | Slack digest for team

Why it wins: Marcus walks into the budget review with exact numbers, not estimates. The dashboard IS the slide deck.

Concept 3: "The Boring Proxy"

For Jordan — deploy it, configure it, forget it.

Jordan runs:
  helm install dd0c-route dd0c/route --set apiKeys.openai=$KEY
                    │
                    ▼
  Proxy is live in their VPC. No data leaves the cluster.
  Health check: GET /health → 200 OK
                    │
                    ▼
  Routing config (YAML, version-controlled):
    routes:
      - match: { tag: "classification" }
        strategy: cheapest
        models: [gpt-4o-mini, claude-haiku]
      - match: { tag: "customer-facing" }
        strategy: quality-first
        fallback: [gpt-4o, claude-sonnet, gemini-pro]
                    │
                    ▼
  OTel metrics flow to their existing Grafana.
  Circuit breakers handle provider outages automatically.
  Jordan goes back to their actual job.

Why it wins: Helm chart. VPC-native. OTel export. Config-as-code. Everything Jordan already knows. Nothing new to learn.

Concept 4: "The Shadow Audit"

For the skeptic in everyone — prove value before asking for trust.

Team installs dd0c in "shadow mode":
  - Reads existing LLM logs (no proxy, no traffic interception)
  - OR runs as a passive sidecar that mirrors (not intercepts) requests
                    │
                    ▼
  After 7 days, generates a report:
    "You spent $11,400 on LLM calls last week.
     With dd0c routing, estimated spend: $6,800.
     Potential savings: $4,600/week ($19,900/month)."
                    │
                    ▼
  Breakdown by feature, by model, by waste category:
    - Overqualified model usage: $2,100 waste
    - Cache-eligible duplicate requests: $1,400 waste  
    - Prompt bloat (compressible tokens): $1,100 waste
                    │
                    ▼
  Team sees the number. Team activates routing.
  Trust earned through evidence, not promises.

Why it wins: Addresses the #1 adoption blocker — trust. No risk. No traffic interception. Just data. The savings number does the selling.
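The shadow-mode math is just a replay: price each logged request twice, once against the model actually used and once against the model the router would have chosen. A minimal sketch under stated assumptions; the prices (per million input tokens), the log shape, and the `would_route` stub are illustrative stand-ins for the real classifier.

```python
PRICE_PER_M = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}  # illustrative $/M tokens

def would_route(entry: dict) -> str:
    """Stand-in for the real classifier: downgrade small requests."""
    return "gpt-4o-mini" if entry["tokens"] < 1000 else entry["model"]

def audit(logs: list[dict]) -> tuple[float, float, float]:
    """Return (actual spend, hypothetical routed spend, savings)."""
    actual = sum(e["tokens"] / 1e6 * PRICE_PER_M[e["model"]] for e in logs)
    routed = sum(e["tokens"] / 1e6 * PRICE_PER_M[would_route(e)] for e in logs)
    return actual, routed, actual - routed

logs = [{"model": "gpt-4o", "tokens": 600}] * 1000   # toy week of logs
actual, routed, saved = audit(logs)
```

Because nothing is intercepted, the report can only ever be an estimate, which is exactly why it is safe enough to run on day one.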

Concept 5: "The Cost-Aware IDE"

For the future — cost visibility at the point of creation.

Priya writes code in VS Code:
                    │
  Inline annotation appears:
    openai.chat.completions.create(
      model="gpt-4o",  // ⚡ ~$0.003/call | ~$2,100/mo at current traffic
      ...               // 💡 GPT-4o-mini: ~$0.0002/call | saves $1,950/mo
    )                   //    (98.7% quality match for this task type)
                    │
                    ▼
  On PR submission, GitHub Action comments:
    "This PR adds 2 new LLM calls. Estimated monthly impact: +$340.
     Recommendation: Route via dd0c profile 'internal-tool' to save $290/mo."
                    │
                    ▼
  Cost becomes part of the code review conversation.
  Not an afterthought. Not a monthly surprise. A design decision.

Why it wins: Shifts cost awareness to the earliest possible moment — when the code is written. This is the long-term vision: cost as a first-class engineering concern, like performance or security.


Phase 4: PROTOTYPE — Define the MVP

"Here's where most teams blow it. They fall in love with the Top 5 concepts and try to build all of them for V1. That's not a prototype — that's a fantasy. A prototype is a question made tangible. What's the ONE question we need to answer first? It's this: 'Will engineers change their base URL and stay?' Everything in V1 exists to answer that question. Everything else waits."


The MVP: "Save Money in 5 Minutes"

The V1 product is ruthlessly scoped. It combines Concept 1 (Invisible Router), Concept 2 (CFO Slide Deck — stripped down), and Concept 3 (Boring Proxy). Concept 4 (Shadow Audit) is a fast-follow. Concept 5 (Cost-Aware IDE) is V2+.

The V1 promise in one sentence:

Change your OpenAI base URL. See where your money goes. Start saving automatically.

What V1 IS:

  • An OpenAI-compatible proxy that routes requests to cheaper models when safe
  • A dashboard that shows cost attribution by feature/team/model
  • Alerts when spending is abnormal

What V1 is NOT:

  • A multi-provider orchestration platform (V1 supports OpenAI + Anthropic only)
  • A prompt optimization engine
  • A model benchmarking suite
  • An enterprise platform with SSO and RBAC

Core User Flows

Flow 1: Setup → First Route (Jordan's flow — 5 minutes)

Step 1: Sign up (GitHub OAuth — one click)
            │
            ▼
Step 2: Get your proxy URL + API key
        "Your dd0c/route endpoint: https://route.dd0c.dev/v1"
        "Your dd0c API key: dd0c_sk_..."
            │
            ▼
Step 3: Swap your base URL
        Before: OPENAI_BASE_URL=https://api.openai.com/v1
        After:  OPENAI_BASE_URL=https://route.dd0c.dev/v1
        (Add dd0c API key as a header or env var)
            │
            ▼
Step 4: Send your first request (existing code, zero changes)
            │
            ▼
Step 5: See it in the dashboard — model used, tokens, cost,
        routing decision. "Your first request was routed to
        GPT-4o-mini. Saved $0.002 vs GPT-4o. You're live."

Time to first value: < 5 minutes. Code changes required: 1 environment variable.
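Step 3 really is a one-line change. Sketched with the env var names from the flow above; the key value is a placeholder, and how the dd0c key is transmitted (header vs. env var) is per Step 3's note:

```shell
# Before: requests go straight to OpenAI
export OPENAI_BASE_URL=https://api.openai.com/v1

# After: one variable changed, application code untouched
export OPENAI_BASE_URL=https://route.dd0c.dev/v1
export DD0C_API_KEY=dd0c_sk_placeholder   # placeholder; sent to the proxy per Step 3
```

Recent OpenAI SDKs read `OPENAI_BASE_URL` from the environment, so no client construction changes are needed.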

Flow 2: First Route → First Insight (Marcus's flow — day 1 to day 7)

Day 1: Proxy is live. Requests flow through.
       Dashboard shows real-time cost ticker.
            │
            ▼
Day 2-3: Attribution data accumulates.
         Treemap starts forming: which features cost what.
         Marcus can already see the summarization pipeline
         is 40% of spend.
            │
            ▼
Day 5: Enough data for recommendations.
       Dashboard shows: "Switch classification endpoint
       from GPT-4o to GPT-4o-mini. Estimated savings:
       $890/month. Quality impact: <1% based on task
       complexity analysis."
            │
            ▼
Day 7: Weekly digest email arrives.
       "Week 1 with dd0c/route:
        - 23,400 requests routed
        - $1,247 spent (vs. $1,890 estimated without routing)
        - $643 saved (34%)
        - Top recommendation: enable cascading for RAG pipeline"
            │
            ▼
       Marcus forwards the email to the CFO.
       The product has paid for itself.
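The day-7 digest is deliberately low-tech: a scheduled job filling a template from the telemetry store. A stdlib sketch rendering the Week 1 numbers from the flow above; the template wording mirrors the digest text, everything else is an assumption.

```python
from string import Template

# $$ renders a literal dollar sign; ${...} are substitution slots.
DIGEST = Template(
    "Week 1 with dd0c/route:\n"
    "- ${requests} requests routed\n"
    "- $$${spent} spent (vs. $$${baseline} estimated without routing)\n"
    "- $$${saved} saved (${pct}%)"
)

def render(requests: int, spent: int, baseline: int) -> str:
    saved = baseline - spent
    return DIGEST.substitute(
        requests=f"{requests:,}", spent=f"{spent:,}",
        baseline=f"{baseline:,}", saved=saved,
        pct=round(100 * saved / baseline),
    )

print(render(23_400, 1_247, 1_890))
```

The cron half is equally boring: any scheduler that can run this weekly and hand the string to an email API will do.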

Flow 3: First Insight → Ongoing Optimization (Priya's flow — week 2+)

Week 2: Priya checks the dashboard out of curiosity.
        Sees her summarization prompts are 3,800 tokens each.
        The prompt efficiency view shows 40% of those tokens
        are boilerplate that doesn't affect output quality.
            │
            ▼
        She trims the prompt to 2,200 tokens.
        Dashboard shows the cost drop in real-time.
            │
            ▼
Week 3: She sees the A/B test results (if enabled):
        "Claude Haiku handles 94% of your classification
        requests correctly at 1/10th the cost of GPT-4o."
        She enables the routing rule with one click.
            │
            ▼
Week 4: She's stopped thinking about model selection.
        The router handles it. She checks the dashboard
        occasionally to see the savings number grow.
        She feels good instead of guilty.

Key Screens / Views

Screen 1: Dashboard Home

┌─────────────────────────────────────────────────────┐
│  dd0c/route                          [Priya ▾] [⚙]  │
├─────────────────────────────────────────────────────┤
│                                                      │
│  TODAY: $47.23        THIS WEEK: $284.91             │
│  ▁▂▃▂▄▅▃▂▁▂▃▅▆▅▃    vs last week: -18% ↓           │
│                       saved this week: $62.40        │
│                                                      │
│  ┌─ COST BY FEATURE ──────────────────────────────┐ │
│  │ ┌──────────────┐┌────────┐┌──────┐┌───┐┌─┐    │ │
│  │ │ Summarization ││Chatbot ││ RAG  ││CRv││…│    │ │
│  │ │    43%        ││  28%   ││ 18%  ││8% ││ │    │ │
│  │ └──────────────┘└────────┘└──────┘└───┘└─┘    │ │
│  └────────────────────────────────────────────────┘ │
│                                                      │
│  💡 RECOMMENDATIONS                                  │
│  ┌────────────────────────────────────────────────┐ │
│  │ Switch classification → Haiku    Save $890/mo  │ │
│  │ Enable caching on FAQ endpoint   Save $340/mo  │ │
│  │ Compress summarization prompt    Save $210/mo  │ │
│  └────────────────────────────────────────────────┘ │
│                                                      │
└─────────────────────────────────────────────────────┘

Screen 2: Request Inspector

┌─────────────────────────────────────────────────────┐
│  Request Inspector                    [Filter ▾]     │
├─────────────────────────────────────────────────────┤
│  Time       Feature      Model Req'd  Model Used    │
│  ─────────  ───────────  ──────────── ────────────  │
│  14:23:01   /summarize   gpt-4o       gpt-4o-mini ↓ │
│  14:23:00   /chat        gpt-4o       gpt-4o      = │
│  14:22:58   /classify    gpt-4o       haiku       ↓ │
│  14:22:55   /summarize   gpt-4o       gpt-4o-mini ↓ │
│                                                      │
│  ▶ Request 14:23:01 — /summarize                     │
│  ┌────────────────────────────────────────────────┐ │
│  │ Requested: gpt-4o                               │ │
│  │ Routed to: gpt-4o-mini (complexity: LOW)        │ │
│  │ Reason: Task classified as extractive summary.  │ │
│  │         Confidence: 94%. Below complexity        │ │
│  │         threshold for premium routing.           │ │
│  │ Tokens: 1,247 in / 342 out                      │ │
│  │ Cost: $0.0002 (vs $0.0024 if gpt-4o)           │ │
│  │ Latency: 340ms (vs ~890ms est. for gpt-4o)     │ │
│  │ Saved: $0.0022                                  │ │
│  └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘

Screen 3: Routing Config

┌─────────────────────────────────────────────────────┐
│  Routing Rules                    [+ New Rule]       │
├─────────────────────────────────────────────────────┤
│                                                      │
│  Rule 1: Customer-Facing Chat          [ACTIVE] ✓   │
│  Match: tag = "chat" AND tag = "customer"            │
│  Strategy: Quality-first                             │
│  Models: gpt-4o → claude-sonnet → gpt-4o-mini       │
│  Budget: No limit                                    │
│                                                      │
│  Rule 2: Internal Classification       [ACTIVE] ✓   │
│  Match: tag = "classify"                             │
│  Strategy: Cheapest                                  │
│  Models: gpt-4o-mini → claude-haiku                  │
│  Budget: $50/day                                     │
│                                                      │
│  Rule 3: Default (catch-all)           [ACTIVE] ✓   │
│  Match: *                                            │
│  Strategy: Balanced (cascading)                      │
│  Models: gpt-4o-mini → gpt-4o                       │
│  Budget: Soft limit $200/day                         │
│                                                      │
│  ⚙ Advanced: Edit as YAML | Export | Version History │
└─────────────────────────────────────────────────────┘

What to FAKE vs. BUILD in V1

This is the most important decision in the prototype phase. Every hour spent building something that could be faked is an hour stolen from the core experience.

| Feature | V1 Approach | Why |
|---|---|---|
| OpenAI-compatible proxy | BUILD — this IS the product | Non-negotiable core |
| Complexity classifier | BUILD (simple) — rule-based heuristics first, ML later | Must work, but doesn't need to be perfect. Token count + keyword patterns get you 80% accuracy |
| Cascading routing | BUILD — try cheap model, escalate on low confidence | Core value prop. Must be real. |
| Cost attribution dashboard | BUILD — real-time, per-request cost tracking | The "aha moment" for Marcus. Must be real data. |
| Attribution treemap | BUILD (simple) — basic drill-down by tag/model | Can be a simple table in V1, treemap visualization in V1.1 |
| Anomaly detection | FAKE — simple threshold alerts (>2x daily average) | ML-based anomaly detection is V2. Static thresholds work for V1. |
| Semantic caching | FAKE — exact-match caching only in V1 | Semantic similarity matching is complex. Exact match captures the easy wins. |
| Recommendations engine | SEMI-FAKE — hand-crafted rules, not ML | "If >50% of requests to model X are low-complexity, suggest cheaper model." Rule-based is fine for V1. |
| Weekly digest email | BUILD — it's a cron job and a template | High impact, low effort. The email is the viral loop (Marcus forwards it). |
| Multi-provider support | BUILD for OpenAI + Anthropic only | Two providers cover 80% of the market. Google/Cohere/etc. are V1.1. |
| Helm chart / self-hosted | DEFER — SaaS-only for V1 | Self-hosted is a support burden. Validate the product before adding deployment complexity. |
| OTel export | DEFER — V1.1 | Important for Jordan but not for initial validation. |
| A/B model testing | DEFER — V1.1 | Powerful but complex. Manual model comparison in V1. |
| GitHub Action | DEFER — V2 | Long-term vision, not MVP. |
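The "FAKE" V1 cache above is just exact matching: hash the normalized `(model, messages)` pair, return the stored response on a hit. A hypothetical sketch; the key normalization (sorted-key JSON) and the wrapper shape are assumptions.

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(model: str, messages: list[dict]) -> str:
    """Deterministic key: sorted-key JSON of the request, SHA-256 hashed."""
    raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def cached_call(model: str, messages: list[dict], call) -> str:
    key = cache_key(model, messages)
    if key not in _cache:
        _cache[key] = call(model, messages)   # miss: pay for the request
    return _cache[key]                        # hit: free

calls = 0
def fake_llm(model, messages):                # stand-in for a provider call
    global calls
    calls += 1
    return "ok"

msg = [{"role": "user", "content": "hi"}]
cached_call("gpt-4o-mini", msg, fake_llm)
cached_call("gpt-4o-mini", msg, fake_llm)     # identical request: served from cache
```

Exact match only pays off on genuinely duplicated requests (FAQ bots, retried jobs), which is precisely the "easy wins" slice the table describes.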

Technical Approach

Architecture (V1):

┌─────────────┐     ┌──────────────────────────────────┐
│  Developer   │     │         dd0c/route SaaS           │
│  Application │     │                                    │
│              │     │  ┌──────────┐   ┌──────────────┐  │
│  OPENAI_     │────▶│  │  Proxy   │──▶│  Router      │  │
│  BASE_URL=   │     │  │  (Rust)  │   │  (classifier │  │
│  route.dd0c  │     │  │          │   │   + rules)   │  │
│  .dev/v1     │◀────│  │          │◀──│              │  │
│              │     │  └──────────┘   └──────────────┘  │
└─────────────┘     │        │                           │
                    │        ▼                           │
                    │  ┌──────────┐   ┌──────────────┐  │
                    │  │ Telemetry│──▶│  Dashboard    │  │
                    │  │ (events) │   │  (React)      │  │
                    │  └──────────┘   └──────────────┘  │
                    │        │                           │
                    │        ▼                           │
                    │  ┌──────────────────────────────┐  │
                    │  │  LLM Providers               │  │
                    │  │  ├── OpenAI API              │  │
                    │  │  ├── Anthropic API            │  │
                    │  │  └── (more in V1.1)          │  │
                    │  └──────────────────────────────┘  │
                    └──────────────────────────────────┘

Key Technical Decisions:
- Proxy in Rust (performance-critical, <5ms overhead target)
- Dashboard in React + Vite (fast, modern, dark mode default)
- Telemetry stored in ClickHouse (columnar, fast aggregation, cost-effective)
- Auth via GitHub OAuth (one-click, developer-friendly)
- Request tagging via HTTP headers (X-DD0C-Feature, X-DD0C-Team)
- Config via YAML file OR dashboard UI (both sync)
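The header-tagging decision above is what makes attribution possible: the app labels each request, the proxy buckets cost by label. A stdlib sketch of the aggregation side; the header names come from the list above, while the `(team, feature)` bucketing shape is an assumption.

```python
from collections import defaultdict

spend: dict[tuple[str, str], float] = defaultdict(float)

def record(headers: dict, cost_usd: float) -> None:
    """Attribute one request's cost to its (team, feature) bucket."""
    feature = headers.get("X-DD0C-Feature", "untagged")
    team = headers.get("X-DD0C-Team", "untagged")
    spend[(team, feature)] += cost_usd

record({"X-DD0C-Feature": "summarize", "X-DD0C-Team": "growth"}, 0.0024)
record({"X-DD0C-Feature": "summarize", "X-DD0C-Team": "growth"}, 0.0002)
record({}, 0.0010)   # untagged traffic still shows up, just unattributed
```

Falling back to an "untagged" bucket matters: the treemap should surface unlabeled spend as a visible slice rather than silently dropping it.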

Latency budget: The proxy MUST add <10ms p99 overhead. This is non-negotiable. If we can't hit this, the product is dead. The complexity classifier must run in <5ms. The routing decision must be <1ms. Network hop is the rest.

Privacy model: V1 SaaS sees prompt content (necessary for complexity classification). Roadmap: offer a mode where classification runs client-side and only telemetry (tokens, cost, model, latency) is sent to the dashboard. This addresses Jordan's "no data leaves our VPC" requirement.


Phase 5: TEST — Validation Plan

"A prototype without a test plan is just a demo. Demos impress. Tests teach. We need to learn, not impress. The question isn't 'do people like it?' — it's 'do people USE it, and do they STAY?' Let's design the experiment."


Beta User Acquisition Strategy

Target: 30 beta users in 3 cohorts of 10

| Cohort | Profile | Acquisition Channel | Why This Cohort |
|---|---|---|---|
| Cohort A: The Builders | ML engineers at Series A-C startups spending $1K-$10K/mo on LLMs | Hacker News "Show HN" post + r/MachineLearning + AI Twitter/X | They feel the pain daily. They'll try anything that saves time. Fast feedback loop. |
| Cohort B: The Managers | Engineering managers / tech leads at 50-200 person companies | Direct outreach via LinkedIn. Target people who've posted about AI costs. | They have budget authority. If they see the attribution dashboard, they'll champion it internally. |
| Cohort C: The Operators | Platform/DevOps engineers maintaining LLM infrastructure | DevOps Slack communities, CNCF channels, Kubernetes forums | They'll stress-test reliability, latency, and deployment. Harshest critics = best feedback. |

Acquisition Tactics:

  1. The Cost Scan Hook — Release the npx dd0c-scan CLI tool as a free, standalone utility. It scans codebases and estimates LLM spend. No account needed. Captures email for "get your full report." This is the top-of-funnel.
  2. "Show HN" Launch — Post the open-source proxy component. Lead with the savings number: "We saved $X in our own AI pipeline. Here's the tool." Developer credibility through transparency.
  3. Content Marketing — Publish "The State of LLM Costs in 2026" report using anonymized data from early users. Become the trusted source for AI cost benchmarks.
  4. Direct Outreach — Find 20 companies that have blogged or tweeted about AI costs. DM them: "We built a tool that would have caught that. Want early access?"

Success Metrics

Primary Metrics (The "Did It Work?" Metrics)

| Metric | Target (30-day) | Why It Matters |
|---|---|---|
| Activation Rate | >60% of signups route their first request within 24 hours | If they don't activate, the onboarding is broken |
| 7-Day Retention | >40% of activated users still routing traffic after 7 days | If they leave after trying it, the value prop is broken |
| 30-Day Retention | >25% of activated users still active after 30 days | If they stay a month, we have product-market fit signal |
| Measured Savings | Average user saves >20% on LLM costs | If savings are <10%, the routing intelligence isn't good enough |
| Time to First Insight | <24 hours from first routed request to first actionable dashboard insight | If insights take a week, Marcus loses interest |

Secondary Metrics (The "Is It Good?" Metrics)

| Metric | Target | Why It Matters |
|---|---|---|
| Proxy Latency Overhead | <10ms p99 | If we add latency, engineers will rip us out |
| Routing Accuracy | <2% of downgraded requests produce noticeably worse output | If cheap routing hurts quality, trust is destroyed |
| Dashboard Daily Active Usage | >3 sessions/week for managers | If Marcus doesn't check the dashboard, attribution isn't compelling enough |
| Alert Actionability | >50% of anomaly alerts lead to an action (config change, investigation) | If alerts are noise, they'll be muted |
| NPS | >40 | Standard SaaS benchmark for early-stage product-market fit |

Anti-Metrics (Things We Explicitly Do NOT Optimize For)

  • Number of features — More features ≠ better product. We optimize for depth of core features.
  • Number of supported providers — Two providers (OpenAI + Anthropic) done well beats five done poorly.
  • Enterprise sales — V1 is self-serve. If we're doing sales calls, we've lost focus.

Beta Interview Protocol

Week 1 Interview (Post-Activation) — 20 minutes

  1. Walk me through your setup experience. Where did you get stuck? (Watch for: friction points, confusion, moments of delight)
  2. What did you expect to see in the dashboard? What surprised you? (Watch for: mental model mismatches)
  3. Have you changed any routing rules from the defaults? Why or why not? (Watch for: trust level, desire for control vs. automation)
  4. If dd0c/route disappeared tomorrow, what would you miss most? (The answer to this IS your value prop)
  5. What's the one thing that would make you recommend this to a colleague? (Watch for: the "aha moment" — is it savings? attribution? ease of setup?)

Week 4 Interview (Retention Check) — 30 minutes

  1. How has your relationship with AI costs changed since using dd0c/route? (Watch for: emotional shift — guilt → control, anxiety → confidence)
  2. Show me how you use the dashboard. What do you check first? (Watch for: actual usage patterns vs. designed flows)
  3. Have you shared the dashboard or savings data with anyone? Who? Why? (Watch for: viral loops — Marcus forwarding the digest email)
  4. What's the most money dd0c/route has saved you on a single decision? (Watch for: concrete stories we can use in marketing)
  5. What would make you upgrade to a paid plan? What would make you leave? (Watch for: willingness to pay, churn risks)

The One Question That Matters Most:

"If I told you dd0c/route costs $X/month, would you pay for it today?"

If >40% of 30-day retained users say yes at our target price point, we have product-market fit.


Validation Milestones

Week 0:  Beta launch. 30 users invited.
         ✓ Success: >20 activate within 48 hours.
         ✗ Failure: <10 activate. → Revisit onboarding flow.

Week 1:  First interviews. First usage data.
         ✓ Success: Users report "aha moment" with attribution data.
         ✗ Failure: Users say "cool but I don't need this." → Revisit value prop.

Week 2:  Routing intelligence data accumulates.
         ✓ Success: Average savings >15%. Users trust the routing.
         ✗ Failure: Savings <5% OR quality complaints. → Revisit classifier.

Week 4:  Retention check. Payment intent survey.
         ✓ Success: >25% still active. >40% would pay.
         ✗ Failure: <15% active. → Major pivot or kill decision.

Week 6:  Decision point.
         → GO: Launch public beta with pricing.
         → ITERATE: Address top 3 feedback themes, re-run with new cohort.
         → KILL: If core value prop doesn't resonate, pivot to analytics-only
                 (no routing, just visibility). Test that instead.

Phase 6: ITERATE — The Road Ahead

"The first version of anything is a question. The second version is the beginning of an answer. The third version is where the music starts to play. Here's how the song develops — from a single note to a full arrangement."


V1 → V2 Progression

V1.0: "The Flashlight" (Months 1-3)

You can finally SEE where the money goes.

  • OpenAI-compatible proxy with basic routing
  • Cost attribution dashboard
  • Threshold-based alerts
  • Two providers (OpenAI + Anthropic)
  • SaaS-only deployment

Core metric: 20%+ average cost savings for active users.

V1.1: "The Autopilot" (Months 3-5)

The system starts making smart decisions for you.

  • ML-based complexity classifier (replaces rule-based heuristics)
  • Semantic response caching
  • A/B model testing
  • OTel export for existing observability stacks
  • Google Gemini as third provider
  • Budget guardrails with Slack integration

Core metric: 35%+ average cost savings. 50%+ 30-day retention.

V2.0: "The Platform" (Months 6-9)

From tool to infrastructure.

  • Self-hosted deployment (Helm chart, runs in customer VPC)
  • Prompt efficiency scoring and optimization suggestions
  • Team management with RBAC
  • Multi-provider bill reconciliation
  • GitHub Action for cost-aware PRs
  • API for programmatic access to cost data

Core metric: $10K MRR. 5+ teams with >$5K/month LLM spend.

V2.5: "The Intelligence Layer" (Months 9-12)

The data moat becomes the product.

  • Cross-customer benchmarking ("companies like yours save X by doing Y")
  • Automated prompt compression engine
  • Self-hosted model cost comparison ("deploy Llama 3 and save $8K/month")
  • Advanced forecasting (ML-based spend projections)
  • SOC 2 Type II certification

Core metric: $30K MRR. Routing intelligence measurably improves with each new customer (flywheel confirmed).

V3.0: "The AI FinOps Platform" (Year 2)

dd0c/route becomes the control plane for all AI spend.

  • Model distillation-as-a-service
  • Cooperative buying group for volume discounts
  • Agent cost management (track and optimize agentic AI workflows)
  • Carbon footprint tracking
  • Enterprise features (SSO, audit logs, dedicated support)
  • Integration marketplace (community-contributed routing strategies)

Core metric: $50K+ MRR. Category leadership in "AI FinOps."


Growth Loops

Three reinforcing loops that compound over time:

Loop 1: The Savings Loop (Product-Led Growth)

User activates → sees savings → forwards digest to colleague
→ colleague activates → more savings → more forwards

The weekly savings digest email is the viral mechanism. Marcus forwards it to the CFO. The CFO asks other teams to use it. Organic expansion within the org.

Loop 2: The Data Loop (Intelligence Flywheel)

More requests routed → better complexity classifier
→ smarter routing → more savings → more users → more requests

Every request teaches the router. More customers = better routing for everyone. This is the moat. A new competitor starting from zero can't match the routing intelligence of a system that's seen billions of requests across hundreds of customers.

Loop 3: The Content Loop (Category Creation)

Anonymized usage data → publish "State of AI Costs" reports
→ media coverage → brand authority → inbound signups
→ more data → better reports

dd0c becomes the trusted source for AI cost benchmarks. "According to dd0c's latest report, the average company wastes 47% of their LLM spend." This gets cited in blog posts, conference talks, VC pitch decks. The brand becomes synonymous with AI cost intelligence.


Key Metrics Dashboard (What We Track Ourselves)

| Category | Metric | V1 Target | V2 Target | V3 Target |
|---|---|---|---|---|
| Acquisition | Weekly signups | 50 | 200 | 500 |
| Activation | % who route first request in 24h | 60% | 70% | 80% |
| Retention | 30-day active rate | 25% | 40% | 55% |
| Revenue | MRR | $0 (free beta) | $10K | $50K |
| Savings | Avg % saved per user | 20% | 35% | 45% |
| Routing | Classifier accuracy | 80% | 92% | 97% |
| Reliability | Proxy uptime | 99.5% | 99.9% | 99.99% |
| Virality | % of users who refer a colleague | 10% | 20% | 30% |

The North Star

Every dollar of AI spend should be intentional.

Not minimized. Not eliminated. Intentional. The goal isn't to make AI cheap — it's to make AI spend a conscious, data-driven decision rather than an accidental, opaque one. When every engineer can see the cost of their model choices, and every manager can attribute spend to business value, the entire organization's relationship with AI transforms.

That's not a product. That's a movement. And movements don't start with platforms — they start with a single, undeniable insight.

For dd0c/route, that insight is the first time Marcus opens the dashboard and sees exactly where $14,000 went.

Everything else follows from that moment.


"And that's the session. Six phases. Three humans. One product. We started with empathy and ended with a roadmap — but the roadmap is just a hypothesis. The real design happens when the first beta user changes their base URL and we watch what they do next. That's when the jazz really starts."

— Maya 🎷