dd0c: full product research pipeline - 6 products, 8 phases each
Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
product-brief, architecture, epics (incl. Epic 10 TF compliance),
test-architecture (TDD strategy)
Brand strategy and market research included.
products/01-llm-cost-router/architecture/architecture.md — 1,881 lines (new file; diff suppressed because it is too large)

products/01-llm-cost-router/brainstorm/session.md — 324 lines (new file)

@@ -0,0 +1,324 @@
# 🧠 LLM Cost Router — Brainstorming Session

**Facilitator:** Carson (Elite Brainstorming Specialist)

**Date:** February 28, 2026

**Product:** LLM Cost Router & Optimization Dashboard

**Target:** Bootstrap SaaS, $5K–$50K MRR

**Session Goal:** 100+ ideas across problem space, solutions, differentiation, and risk

---
## Phase 1: Problem Space Exploration

*"Alright team, let's start with the PAIN. What hurts? What keeps the CFO up at night? What makes the engineer cringe when they open the billing console? No idea is too small — if it stings, say it!"*

### Direct Cost Pain Points (Classic Brainstorm — Free Association)

1. **"GPT-4 for everything" syndrome** — Teams default to the most powerful model for every request, including trivial ones like formatting JSON or generating slugs. 60%+ waste.

2. **No per-feature cost attribution** — The monthly OpenAI bill is $12K but nobody knows if it's the chatbot, the summarizer, or the code review tool burning cash.

3. **No per-team/per-developer attribution** — Engineering teams share API keys. Who's running expensive experiments? Nobody knows.

4. **Surprise billing spikes** — A prompt engineering experiment goes wrong, retries in a loop, $3K gone in an hour. No alerts.

5. **Token estimation is broken** — Developers can't predict cost before sending a request. Tokenizer counts are approximate and vary by model.

6. **Retry storms** — Failed requests retry with exponential backoff but still burn tokens on partial completions. The cost of failures is invisible.

7. **Prompt bloat over time** — System prompts grow as features are added. Nobody audits prompt length. A 4,000-token system prompt on every request adds up fast.

8. **Context window stuffing** — RAG pipelines stuff maximum context "just in case." Sending 100K tokens when 10K would suffice.

9. **Streaming cost opacity** — Streaming responses make it harder to track per-request cost in real time.

10. **Multi-provider bill reconciliation** — Teams using OpenAI + Anthropic + Google + Cohere get 4 separate bills with different billing models. No unified view.

### Hidden Costs Nobody Talks About (Reverse Brainstorm — "What costs are we pretending don't exist?")

11. **Latency cost** — Using GPT-4o when GPT-4o-mini would respond 3x faster. The user experience cost of slow responses is real but unmeasured.

12. **Developer time debugging model issues** — Hours spent figuring out why Claude gave a different answer than GPT. The human cost of multi-model chaos.

13. **Opportunity cost of model lock-in** — Teams build around one provider's API quirks. Switching costs grow silently.

14. **Compliance cost of untracked AI usage** — Shadow AI: developers using personal API keys. No audit trail. SOC 2 auditors will ask about this.

15. **Embedding re-computation cost** — Changing embedding models means re-embedding your entire corpus. Nobody budgets for this.

16. **Fine-tuning waste** — Teams fine-tune models that become obsolete in 3 months when a better base model drops.

17. **Testing/staging environment costs** — Running the same expensive models in dev/staging as production. Nobody sets up model tiers per environment.

18. **Prompt iteration cost** — The R&D cost of trying 50 prompt variations against GPT-4o to find the best one, when you could test against a cheap model first.

19. **Cache miss cost** — Identical or near-identical requests hitting the API repeatedly. Semantic caching could eliminate 20-40% of calls.

20. **Overprovisioned rate limits** — Paying for higher rate limit tiers "just in case" when actual usage is 10% of capacity.

### Workflow Waste (SCAMPER — Substitute, Combine, Adapt, Modify, Put to other use, Eliminate, Reverse)

21. **Summarize-then-analyze chains** — Step 1 summarizes with GPT-4o, Step 2 analyzes the summary with GPT-4o. Step 1 could use a tiny model.

22. **Classification tasks on large models** — Binary yes/no classification sent to a $15/M-token model when a $0.10/M-token model gets 98% accuracy.

23. **Batch jobs running synchronously** — Nightly batch processing using real-time API pricing instead of batch API discounts (OpenAI batch API is 50% cheaper).

24. **Redundant safety checks** — Multiple layers of content moderation each calling an LLM, when one dedicated moderation endpoint would suffice.

25. **Verbose output requests** — Asking for detailed explanations when the downstream consumer only needs a JSON object. Paying for output tokens nobody reads.

26. **Translation chains** — Translating content through English as an intermediary when direct translation would be cheaper and better.
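The waste in items 22–23 is easy to put numbers on. A back-of-envelope sketch in Python, using the illustrative $15/M and $0.10/M price points from item 22 (the `monthly_cost` helper and the traffic figures are hypothetical, not real product data):

```python
# Back-of-envelope: monthly cost of a classification workload on a premium
# model vs. a budget model. Prices are illustrative, in $ per 1M tokens.
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million: float, days: int = 30) -> float:
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million

premium = monthly_cost(50_000, 500, 15.00)  # premium-model pricing
budget = monthly_cost(50_000, 500, 0.10)    # small-model pricing

print(f"premium: ${premium:,.2f}/mo, budget: ${budget:,.2f}/mo")
# 50K requests/day at 500 tokens each: $11,250/mo vs. $75/mo at these rates
```

At 98% accuracy parity (item 22), the cheap model leaves roughly two orders of magnitude of spend on the table for this one workload.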
### "Zero Waste AI" Vision (Analogy Thinking — "What would a Toyota Production System for AI look like?")

27. **Just-in-time model selection** — Like JIT manufacturing: use exactly the right model at exactly the right time, no inventory (no over-provisioning).

28. **Kanban for AI requests** — Visualize the flow of requests, identify bottlenecks, limit work-in-progress to prevent cost spikes.

29. **Kaizen for prompts** — Continuous improvement: every prompt gets reviewed monthly for token efficiency.

30. **Andon cord for AI spend** — Any team member can pull the cord (trigger an alert) when they notice unusual AI spending.

31. **Value stream mapping for LLM pipelines** — Map every LLM call in a workflow, identify which ones add value vs. waste.

32. **Poka-yoke (mistake-proofing)** — Guardrails that prevent sending a 100K-token request to GPT-4o when the task is simple classification.

---

## Phase 2: Solution Space Explosion

*"NOW we're cooking! Phase 1 gave us the pain — Phase 2 is where we go WILD with solutions. I want quantity over quality. Bad ideas welcome. TERRIBLE ideas celebrated. The worst idea in the room often leads to the best one. Let's GO!"*

### Routing Strategies (Classic Brainstorm + SCAMPER)

33. **Complexity-based routing** — Analyze the prompt: simple extraction → cheap model, multi-step reasoning → expensive model. Use a tiny classifier to decide.

34. **Latency-based routing** — User-facing requests get fast models, background jobs get cheap-but-slow models.

35. **Quality-threshold routing** — Define acceptable quality per task type. Route to the cheapest model that meets the threshold.

36. **Cascading routing** — Try the cheapest model first. If confidence is low, escalate to the next tier. Only pay for expensive models when needed.

37. **Time-of-day routing** — Use batch APIs during off-peak hours. Route to providers with lower pricing during their off-peak.

38. **Geographic routing** — Route to the nearest/cheapest regional endpoint. EU requests to EU-hosted models for compliance + cost.

39. **Token-budget routing** — Set a per-request token budget. Router picks the best model that fits within budget.

40. **Ensemble routing** — For critical requests, send to 2 cheap models and compare. Only escalate to expensive model if they disagree.

41. **Historical performance routing** — Track which model performs best for each task type over time. Route based on empirical data, not assumptions.

42. **A/B test routing** — Automatically A/B test models for each task. Converge on the cheapest one that maintains quality metrics.

43. **Fallback chain routing** — Primary model → fallback 1 → fallback 2. Automatic failover on rate limits, outages, or quality drops.

44. **Priority queue routing** — High-priority requests get premium models immediately. Low-priority requests queue for batch processing.

45. **Semantic similarity routing** — If a similar prompt was answered recently, return cached result or route to cheapest model for minor variations.
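The cascading idea (#36) is simple enough to sketch. Everything below is illustrative: `call_model` is a stand-in for a real provider call, and the model names, confidence signal, and threshold are assumptions, not a real API:

```python
# Sketch of cascading routing (idea #36): try the cheapest model first and
# escalate only when confidence is low. Model names are placeholders.
from typing import Callable

# Ordered cheapest to most expensive; a real table would carry live pricing.
CASCADE = ["mini-model", "mid-model", "frontier-model"]

def route(prompt: str,
          call_model: Callable[[str, str], tuple[str, float]],
          min_confidence: float = 0.8) -> tuple[str, str]:
    """Return (model_used, answer), escalating the cascade on low confidence."""
    answer = ""
    for model in CASCADE:
        answer, confidence = call_model(model, prompt)
        if confidence >= min_confidence:
            return model, answer
    # Cascade exhausted: return the strongest model's answer anyway.
    return CASCADE[-1], answer

# Usage with a fake provider: the small model is only confident on short prompts.
def fake_call(model: str, prompt: str) -> tuple[str, float]:
    if model == "mini-model" and len(prompt) > 40:
        return "unsure", 0.3
    return f"{model}-answer", 0.95

print(route("What is 2+2?", fake_call))                       # stays on mini-model
print(route("Explain our Q3 churn drivers " * 4, fake_call))  # escalates to mid-model
```

A real confidence signal might be log-probabilities, a self-grading prompt, or a task-specific validator; the escalation loop itself does not change.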
### Dashboard Features (Mind Map Explosion)

46. **Real-time spend ticker** — Live-updating cost counter like a stock ticker. Per-model, per-team, per-feature.

47. **Cost attribution by feature/endpoint** — Tag each API call with metadata (feature, team, environment). Drill down in dashboard.

48. **Spend forecasting** — ML-based projection: "At current rate, you'll spend $X this month." With confidence intervals.

49. **Anomaly detection alerts** — "Your summarization pipeline cost 400% more than usual today." Slack/email/PagerDuty integration.

50. **Model comparison reports** — "Switching your classification task from GPT-4o to Claude Haiku would save $2,100/month with <1% quality drop."

51. **Prompt efficiency scoring** — Score each prompt template on tokens-per-useful-output. Identify bloated prompts.

52. **Savings leaderboard** — Gamify cost optimization. "Team Backend saved $3,200 this month by switching to cascading routing."

53. **Budget guardrails** — Set hard/soft limits per team, per feature, per day. Auto-throttle or alert when approaching limits.

54. **Invoice reconciliation** — Match your internal tracking against provider invoices. Flag discrepancies.

55. **Carbon footprint tracking** — Estimate CO2 per model per request. ESG reporting for AI usage.

56. **ROI calculator per AI feature** — "Your chatbot costs $4K/month and handles 10K conversations. That's $0.40/conversation vs. $8/conversation for human support."

57. **Token waste heatmap** — Visual heatmap showing where tokens are wasted: long system prompts, verbose outputs, unnecessary context.

58. **Provider health dashboard** — Real-time status of each LLM provider. Latency, error rates, rate limit utilization.

59. **Cost-per-quality scatter plot** — Plot each model's cost vs. quality score for your specific tasks. Find the Pareto frontier.
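The anomaly-detection idea (#49) can be sketched with nothing more than a trailing average. The window and multiplier are illustrative; a real system would keep per-pipeline baselines and fire a Slack/PagerDuty webhook rather than return a bool:

```python
# Sketch of spend anomaly detection (idea #49): flag a day whose spend is far
# above a trailing baseline. Window size and multiplier are illustrative.
from statistics import mean

def spend_anomaly(daily_spend: list[float], window: int = 7,
                  multiplier: float = 3.0) -> bool:
    """True if the latest day's spend exceeds `multiplier` x the trailing mean."""
    if len(daily_spend) <= window:
        return False  # not enough history to judge
    baseline = mean(daily_spend[-window - 1:-1])  # trailing window, excl. today
    return daily_spend[-1] > multiplier * baseline

history = [40.0, 42.0, 38.0, 41.0, 39.0, 43.0, 40.0]
print(spend_anomaly(history + [41.0]))   # ordinary day -> False
print(spend_anomaly(history + [190.0]))  # ~4.7x baseline -> True
```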
### Developer Experience (Random Word Association — "SDK" + "magic" + "invisible")

60. **OpenAI-compatible proxy** — Drop-in replacement. Change your base URL, everything else stays the same. Zero code changes.

61. **One-line SDK wrapper** — `import { llm } from 'costrouter'; llm.chat(...)` — wraps OpenAI/Anthropic/Google SDKs transparently.

62. **CLI tool** — `costrouter analyze` scans your codebase, finds all LLM calls, estimates monthly cost, suggests optimizations.

63. **VS Code extension** — Inline cost estimates next to LLM API calls. "This call costs ~$0.003 per invocation."

64. **Middleware/interceptor pattern** — Express/FastAPI middleware that automatically wraps outgoing LLM calls.

65. **Terraform/Pulumi provider** — Define routing rules as infrastructure code. Version-controlled cost policies.

66. **GitHub Action** — PR comment: "This change adds a new GPT-4o call in the hot path. Estimated cost impact: +$1,200/month."

67. **Playground/sandbox** — Test prompts against multiple models simultaneously. See cost, latency, and quality side-by-side before deploying.

68. **Auto-generated migration guides** — "To switch from OpenAI to Anthropic for this task, change these 3 lines."

69. **Prompt optimizer** — Automatically compress prompts to use fewer tokens while maintaining output quality.

70. **Request inspector/debugger** — Chrome DevTools-style inspector for LLM requests. See tokens, cost, latency, routing decision for each call.
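The wrapper idea (#61) combined with per-feature attribution (#47) can be sketched as a decorator that meters every call. The `chat` function, pricing table, and token counting here are fakes for illustration only, not a real SDK:

```python
# Sketch of the "one-line wrapper" (idea #61) feeding cost attribution
# (idea #47). Prices and the fake `chat` provider are illustrative.
import functools

PRICE_PER_1K = {"mini-model": 0.0002, "frontier-model": 0.01}  # $/1K tokens

USAGE_LOG: list[dict] = []  # in a real system: an async telemetry pipeline

def metered(fn):
    """Wrap an LLM call so every invocation records feature, tokens, cost."""
    @functools.wraps(fn)
    def wrapper(model: str, prompt: str, *, feature: str = "untagged"):
        result, tokens = fn(model, prompt)
        cost = tokens / 1000 * PRICE_PER_1K.get(model, 0.0)
        USAGE_LOG.append({"feature": feature, "model": model,
                          "tokens": tokens, "cost": cost})
        return result
    return wrapper

@metered
def chat(model: str, prompt: str) -> tuple[str, int]:
    # Fake provider: returns a reply and a made-up token count.
    return f"reply from {model}", len(prompt.split()) * 10

chat("mini-model", "summarize this doc", feature="summarizer")
chat("frontier-model", "review this diff", feature="code-review")
print({e["feature"]: e["cost"] for e in USAGE_LOG})
```

The same shape works as Express/FastAPI middleware (idea #64): intercept the outgoing call, record metadata, pass the response through untouched.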
### Business Model Variations (Worst Possible Idea → Invert)

71. **% of savings** — "We saved you $5K this month, we keep 20%." Aligned incentives. Risk: hard to prove counterfactual.

72. **Flat per-request fee** — $0.001 per routed request. Simple, predictable. Scales with usage.

73. **Freemium with usage cap** — Free for <$100/month LLM spend. Paid tiers for higher volume.

74. **Open-core** — OSS proxy (routing engine) + paid dashboard/analytics/team features.

75. **Seat-based** — $29/developer/month. Simple for procurement.

76. **Spend-tier pricing** — Free for <$500 LLM spend, $49/mo for <$5K, $199/mo for <$50K, custom for enterprise.

77. **Reverse auction model** — (Wild!) Providers bid for your traffic. You get the lowest price automatically.

78. **Insurance model** — "Pay us $X/month, we guarantee your LLM costs won't exceed $Y." We eat the risk.

79. **Worst idea → invert: Charge per dollar WASTED** — Actually... a "waste tax" that donates to charity could be a viral marketing hook.
### Integration Approaches (Analogy — "How do CDNs work? Apply that to LLM routing")

80. **CDN-style edge proxy** — Deploy routing logic at the edge (Cloudflare Workers). Lowest latency routing decisions.

81. **DNS-style resolution** — `gpt4o.costrouter.ai` resolves to the cheapest equivalent model. Change DNS, not code.

82. **Service mesh sidecar** — Kubernetes sidecar that intercepts LLM traffic. Zero application changes.

83. **Browser extension for AI tools** — Intercept ChatGPT/Claude web UI usage. Track and optimize even manual usage.

84. **Webhook-based** — Send us your LLM logs via webhook. We analyze and recommend. No proxy needed (analytics-only mode).

85. **LangChain/LlamaIndex plugin** — Native integration with the most popular AI frameworks.

86. **OpenTelemetry collector** — Export LLM telemetry via OTel. Fits into existing observability pipelines.

### Wild Ideas (Lateral Thinking — "What if we had no constraints?")

87. **AI agent that negotiates volume discounts** — Bot that contacts LLM providers, negotiates enterprise pricing based on your aggregated usage across customers.

88. **Semantic response cache** — Cache responses by semantic similarity, not exact match. "What's the capital of France?" and "France's capital city?" return the same cached response.

89. **Predictive pre-computation** — Analyze usage patterns, pre-generate likely responses during off-peak hours at batch pricing.

90. **Model distillation-as-a-service** — Automatically fine-tune a small model on your specific tasks using your GPT-4o outputs. Replace the expensive model with your custom cheap one.

91. **LLM futures market** — (Truly wild) Let companies buy/sell LLM compute futures. Lock in pricing for next quarter.

92. **Cooperative buying group** — Pool small companies' usage to negotiate enterprise pricing collectively.

93. **Response quality bounty** — Users flag bad responses. The system learns which model/prompt combos fail and routes around them.

94. **"Eco mode" for AI** — Like phone battery saver. One toggle: "optimize for cost." Automatically downgrades all non-critical AI calls.

95. **AI spend carbon offset marketplace** — Track AI carbon footprint, automatically purchase offsets. ESG compliance built-in.

96. **Prompt compression engine** — Automatically rewrite prompts to be shorter while preserving intent. Like gzip for prompts.

97. **Multi-turn conversation optimizer** — Detect when a conversation can be summarized and continued with a cheaper model mid-stream.

98. **Self-hosted model recommender** — "Based on your usage, hosting Llama 3 on a $2K/month GPU would save you $8K/month vs. API calls."
---

## Phase 3: Differentiation & Moat

*"Okay beautiful people, we've got a MOUNTAIN of ideas. Now let's get strategic. What makes this thing DEFENSIBLE? What stops someone from cloning it in a weekend? What makes customers STAY? This is where we separate a feature from a business."*

### Data Moats (Analogy — "What's Waze's moat? User-generated traffic data.")

99. **Routing intelligence network effect** — Every request teaches the router which model is best for which task. More customers = better routing = more savings = more customers. Flywheel.

100. **Cross-customer benchmarking** — "Companies like yours typically save 40% by routing classification to Haiku." Anonymized aggregate intelligence.

101. **Task-type performance database** — The world's largest dataset of "model X performs Y% on task type Z at cost W." Nobody else has this.

102. **Prompt efficiency corpus** — Anonymized library of optimized prompts. "Here's a 40% shorter version of your system prompt that performs identically."

### Switching Costs (SCAMPER — What can we Combine to increase stickiness?)

103. **Historical analytics lock-in** — 6 months of cost data, trends, and forecasts. Leaving means losing your analytics history.

104. **Custom routing rules** — Teams invest time configuring routing policies. That configuration is valuable and non-portable.

105. **Team workflows built around alerts** — Budget alerts, anomaly detection, Slack integrations — all wired into team processes.

106. **Compliance audit trail** — SOC 2 auditors accept your cost attribution reports. Switching means rebuilding compliance evidence.

### Technical Moats

107. **Proprietary complexity classifier** — A fast, accurate model that classifies prompt complexity in <5ms. Hard to replicate without massive training data.

108. **Real-time model benchmarking** — Continuously benchmark all models on standardized tasks. Know within hours when a model's quality changes (post-update regressions).

109. **Provider relationship advantages** — Early access to new models, volume discounts passed to customers, beta features.

110. **Multi-cloud routing optimization** — Optimize across AWS Bedrock, Azure OpenAI, Google Vertex, and direct APIs simultaneously. Complex to build, easy to use.

### Brand & Community Moats

111. **"AI FinOps" category creation** — Own the category name. Be the Datadog of AI cost management.

112. **Open-source proxy as top-of-funnel** — OSS routing engine gets adoption. Paid dashboard converts power users. Community contributes routing strategies.

113. **Public AI cost benchmarks** — Publish monthly "State of AI Costs" reports. Become the trusted source. Media coverage → brand → customers.

114. **Developer community & marketplace** — Community-contributed routing strategies, prompt optimizers, integrations. Ecosystem lock-in.

115. **Integration partnerships** — Official partner with LangChain, LlamaIndex, Vercel AI SDK. "Recommended cost optimization tool."

---
## Phase 4: Anti-Ideas & Red Team

*"Time to put on our black hats. I want you to DESTROY this idea. Be ruthless. Be the VC who says no. Be the competitor who wants to crush us. Be the customer who churns. If we can survive this gauntlet, we've got something real."*

### Why This Could FAIL (Reverse Brainstorm — "How do we guarantee failure?")

116. **Race to zero pricing** — LLM providers keep cutting prices. If GPT-4o becomes as cheap as GPT-4o-mini, routing adds no value. The savings disappear.

117. **Provider lock-in by design** — OpenAI, Anthropic, and Google actively discourage multi-provider usage. Proprietary features (function calling formats, vision capabilities) make routing harder.

118. **"Good enough" built-in solutions** — OpenAI launches their own cost dashboard and routing. They have all the data already. Why would they let a third party capture this value?

119. **Latency overhead kills adoption** — Adding a proxy hop adds latency. For real-time chat applications, even 50ms matters. Developers won't accept the tradeoff.

120. **Trust barrier** — "You want me to route ALL my LLM traffic through your proxy? Including my proprietary prompts and customer data?" Security/compliance teams will block this.

121. **Small market initially** — Only companies spending >$1K/month on LLMs care about optimization. That's a smaller market than it seems in 2026.

122. **Open-source competition** — LiteLLM already exists as an OSS proxy. A well-funded OSS project could eat the market before a SaaS gains traction.

123. **Model convergence** — If all models become equally good and equally priced, routing intelligence has no value.

### Biggest Risks

124. **Single point of failure risk** — If the router goes down, ALL LLM calls fail. Customers won't accept this for production workloads without extreme reliability guarantees.

125. **Data privacy liability** — Routing means seeing all prompts and responses. One data breach and the company is dead. GDPR, HIPAA, SOC 2 all apply.

126. **Accuracy of complexity classification** — If the router sends a complex task to a cheap model and it fails, the customer blames you, not the model.

127. **Provider API changes** — OpenAI changes their API format, your proxy breaks. You're now maintaining compatibility layers for 5+ providers. Operational burden grows fast.

### Competitor Kill Strategies

128. **OpenAI launches "Smart Routing"** — Built into their API. Free. Game over for the routing value prop.

129. **Datadog acquires Helicone** — Adds LLM cost tracking to their existing observability platform. Instant distribution to 26K+ customers.

130. **LiteLLM raises $50M** — Goes from OSS project to well-funded SaaS competitor with 10x your engineering team.

131. **AWS Bedrock adds native routing** — Brian's own employer could build this as a platform feature. Free for Bedrock customers.

132. **Price war** — A VC-funded competitor offers the same product for free to gain market share. Burns cash to kill bootstrapped competitors.

### Assumptions That Might Be Wrong

133. **"Teams want multi-provider"** — Maybe most teams are happy with one provider. The multi-provider routing value prop only matters if teams actually use multiple models.

134. **"Cost is the primary concern"** — Maybe quality and reliability matter 10x more than cost. Teams might prefer to overpay for consistency.

135. **"A proxy is the right architecture"** — Maybe an analytics-only approach (no routing, just visibility) is what the market actually wants first.

136. **"Small teams will pay"** — Maybe only enterprises have enough LLM spend to justify a cost optimization tool. The bootstrap-friendly market might be too small.

137. **"Routing decisions can be automated"** — Maybe the task complexity is too nuanced for automated classification. Maybe humans need to define routing rules manually, which reduces the magic.

---
## Phase 5: Synthesis

*"What a session! We generated a LOT of signal. Let me pull together the themes, rank the winners, and highlight the wild cards that could change everything."*

### Top 10 Most Promising Ideas (Ranked)

| Rank | Idea | Why It Wins |
|------|------|-------------|
| 1 | **OpenAI-compatible proxy with zero-code setup** (#60) | Lowest adoption barrier. Change one URL, start saving. This IS the product. |
| 2 | **Cascading routing — try cheap first, escalate on low confidence** (#36) | Elegant, automatic, measurable savings. The core routing innovation. |
| 3 | **Cost attribution by feature/team/environment** (#47) | The dashboard killer feature. Nobody else does this well. Solves the "who's spending?" problem. |
| 4 | **Open-core model — OSS proxy + paid dashboard** (#74) | De-risks adoption, builds community, creates top-of-funnel. LiteLLM proves the model works. |
| 5 | **Semantic response cache** (#88) | 20-40% cost reduction with zero quality impact. Immediate, provable ROI. |
| 6 | **Anomaly detection with Slack/PagerDuty alerts** (#49) | Prevents the "$3K surprise bill" story. Emotional resonance + clear value. |
| 7 | **Spend-tier pricing model** (#76) | Aligns with customer growth. Free tier drives adoption. Simple to understand. |
| 8 | **Routing intelligence flywheel / data moat** (#99) | The strategic moat. More traffic = better routing = more savings = more traffic. |
| 9 | **Model comparison reports with savings estimates** (#50) | "Switch this task to Haiku, save $2,100/month." Actionable, specific, compelling. |
| 10 | **Prompt efficiency scoring & optimization** (#51, #96) | Unique differentiator. Nobody else helps you write cheaper prompts. |
### 3 Wild Card Ideas That Could Be Game-Changers

🃏 **Wild Card 1: Model Distillation-as-a-Service (#90)**

Automatically fine-tune a small, cheap model on your specific tasks using your expensive model's outputs. This turns a cost optimization tool into an AI platform play. If it works, customers save 90%+ and are locked in forever because the distilled model is trained on THEIR data. Massive moat.

🃏 **Wild Card 2: Cooperative Buying Group (#92)**

Pool hundreds of small companies' LLM usage to negotiate enterprise-tier pricing from providers. Like a credit union for AI compute. This creates a network effect that's nearly impossible to replicate and positions the company as the "collective bargaining agent" for the long tail of AI-using startups.

🃏 **Wild Card 3: Self-Hosted Model Recommender (#98)**

"Based on your usage patterns, deploying Llama 3 70B on 2x A100s would save you $14K/month vs. API calls." This extends the value prop beyond routing to infrastructure advisory. It's the natural evolution: first optimize API costs, then help customers graduate to self-hosting when it makes sense. Counter-intuitive (you lose the routing revenue) but builds massive trust and opens up a consulting/managed-service revenue stream.
### Key Themes That Emerged

1. **"Invisible by default"** — The winning product requires zero code changes. Proxy architecture with OpenAI-compatible API is non-negotiable. Adoption friction kills.

2. **"Show me the money"** — Every feature must connect to a dollar amount. Not "better observability" but "you saved $4,200 this month." The dashboard is a savings scoreboard.

3. **"Trust is the bottleneck"** — Routing all LLM traffic through a third party is a massive trust ask. The product needs SOC 2 from day one, data residency options, and an analytics-only mode for cautious adopters.

4. **"The moat is in the data"** — The routing intelligence flywheel is the only sustainable competitive advantage. Everything else can be cloned. The cross-customer performance database cannot.

5. **"Start narrow, expand wide"** — Start as a cost router. Expand to prompt optimization, model distillation, self-hosted recommendations. The wedge is cost savings; the platform is AI operations.

6. **"Open source is a feature, not a threat"** — LiteLLM proves OSS proxy works. Don't fight it — embrace it. Open-source the proxy, monetize the intelligence layer.
### Recommended Focus Areas for Product Brief

**Must-Have (V1 — "Save money in 5 minutes"):**

- OpenAI-compatible proxy (drop-in replacement)
- Complexity-based routing with cascading fallback
- Real-time cost dashboard with per-feature attribution
- Anomaly detection + Slack alerts
- Semantic response caching
- Free tier for <$500/month LLM spend

**Should-Have (V1.1 — "Prove the ROI"):**

- Model comparison reports with savings recommendations
- Prompt efficiency scoring
- Budget guardrails (soft/hard limits per team)
- Multi-provider bill reconciliation

**Could-Have (V2 — "Platform play"):**

- A/B test routing for model evaluation
- Prompt compression/optimization engine
- Self-hosted model cost comparison
- OpenTelemetry export for existing observability stacks

**Future Vision (V3+ — "AI FinOps platform"):**

- Model distillation-as-a-service
- Cooperative buying group for volume discounts
- AI agent policy engine (guardrails for agentic workflows)
- Carbon footprint tracking and offset marketplace

---
## Appendix: Idea Count by Phase

| Phase | Target | Actual |
|-------|--------|--------|
| Phase 1: Problem Space | 20+ | 32 |
| Phase 2: Solution Space | 30+ | 66 |
| Phase 3: Differentiation | 15+ | 17 |
| Phase 4: Anti-Ideas | 10+ | 22 |
| **Total unique ideas** | **100+** | **137** |

## Techniques Used

- **Classic Brainstorm (Free Association):** Ideas 1–10, 33–45
- **Reverse Brainstorm:** Ideas 11–20, 116–123
- **SCAMPER:** Ideas 21–26, 103–106
- **Analogy Thinking:** Ideas 27–32 (Toyota Production System), 80–86 (CDN architecture), 99–102 (Waze data moat)
- **Mind Map Explosion:** Ideas 46–59
- **Random Word Association:** Ideas 60–70
- **Worst Possible Idea → Invert:** Ideas 71–79
- **Lateral Thinking:** Ideas 87–98
- **Red Team / Black Hat:** Ideas 124–137

---

*Session complete. 137 ideas generated. The signal is strong: this product has legs. The proxy-first, open-core approach with a data-driven routing moat is the play. Now let's turn this into a product brief.* 🎯
products/01-llm-cost-router/design-thinking/session.md — 1,013 lines (new file; diff suppressed because it is too large)

products/01-llm-cost-router/epics/epics.md — 340 lines (new file)

@@ -0,0 +1,340 @@
# dd0c/route — V1 MVP Epics

This document outlines the core Epics and User Stories for the V1 MVP of dd0c/route, designed for a solo founder to implement in 1-3 day chunks per story.

---

## Epic 1: Proxy Engine

**Description:** Core Rust proxy that sits between the client application and LLM providers. Must maintain strict OpenAI API compatibility, support SSE streaming, and introduce <5ms latency overhead.

### User Stories

- **Story 1.1:** As a developer, I want to swap my `OPENAI_BASE_URL` to the proxy endpoint, so that my existing OpenAI SDK works without code changes.
- **Story 1.2:** As a developer, I want streaming support (SSE) preserved, so that my chat applications remain responsive while using the proxy.
- **Story 1.3:** As a platform engineer, I want the proxy latency overhead to be <5ms, so that intelligent routing doesn't degrade our application's user experience.
- **Story 1.4:** As a developer, I want provider errors (e.g., rate limits) to be passed through transparently, so that my app's existing error handling continues to work.

### Acceptance Criteria

- Implements `POST /v1/chat/completions` for both streaming (`stream: true`) and non-streaming requests.
- Validates the `Authorization: Bearer` header against a Redis cache (falling back to DB).
- Successfully forwards requests to OpenAI and Anthropic, translating formats if necessary.
- Asynchronously emits telemetry events to an in-memory channel without blocking the hot path.
- P99 latency overhead is measured at <5ms.
|
||||
### Estimate: 13 points
|
||||
### Dependencies: None
|
||||
### Technical Notes:
|
||||
- Stack: Rust, `tokio`, `hyper`, `axum`.
|
||||
- Use connection pooling for upstream providers to eliminate TLS handshake overhead.
|
||||
- For streaming, parse only the first chunk/headers to make a routing decision, then passthrough. Count tokens from the final SSE chunk (e.g., `[DONE]`).
|
||||
|
||||
|
||||
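The last technical note can be sketched as a plain string scan over a buffered SSE body. This is a hypothetical helper, not the shipped proxy code: the real implementation would deserialize each chunk with `serde_json`; here the usage field is pulled out with substring matching to keep the sketch dependency-free.

```rust
/// Extract `total_tokens` from the last usage-bearing SSE data line,
/// stopping at the `[DONE]` terminator. String-scan only; a real proxy
/// would parse each chunk as JSON instead.
fn total_tokens_from_sse(body: &str) -> Option<u64> {
    let mut total = None;
    for line in body.lines() {
        let Some(data) = line.strip_prefix("data: ") else { continue };
        if data.trim() == "[DONE]" {
            break; // stream terminator carries no token counts
        }
        if let Some(idx) = data.find("\"total_tokens\":") {
            let rest = &data[idx + "\"total_tokens\":".len()..];
            let digits: String = rest
                .trim_start()
                .chars()
                .take_while(|c| c.is_ascii_digit())
                .collect();
            if let Ok(n) = digits.parse() {
                total = Some(n); // keep the last value seen before [DONE]
            }
        }
    }
    total
}
```

In the proxy this value would feed the asynchronous telemetry event after the stream completes, so the hot path never blocks on it.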
---

## Epic 2: Router Brain

**Description:** The intelligence core of dd0c/route embedded within the proxy. It evaluates incoming requests against routing rules, classifies complexity heuristically, checks cost tables, and executes fallback chains.

### User Stories

- **Story 2.1:** As an engineering manager, I want the router to classify the complexity of requests, so that simple extraction tasks are downgraded to cheaper models.
- **Story 2.2:** As an engineering manager, I want to configure routing rules (e.g., `if feature=classify -> use cheapest from [gpt-4o-mini, claude-haiku]`), so that I can automatically save money on predictable workloads.
- **Story 2.3:** As an application developer, I want the router to automatically fall back to an alternative model if the primary model fails or rate-limits, so that my application remains highly available.
- **Story 2.4:** As an engineering manager, I want cost savings calculated instantly based on up-to-date provider pricing, so that my dashboard data is immediately accurate.

### Acceptance Criteria

- Heuristic complexity classifier runs in <2ms based on token count, task patterns (regex on the system prompt), and model hints.
- Evaluates first-match routing rules based on request tags (`X-DD0C-Feature`, `X-DD0C-Team`).
- Executes "passthrough", "cheapest", "quality-first", and "cascading" routing strategies.
- Enforces circuit breakers on downstream providers (e.g., open circuit if error rate > 10%).
- Calculates `cost_saved = cost_original - cost_actual` on the fly using in-memory cost tables.

### Estimate: 8 points

### Dependencies: Epic 1 (Proxy Engine)

### Technical Notes:

- Stack: Rust.
- Run purely in-memory on the proxy hot path. No DB queries per request.
- Cost tables and routing rules must be loaded at startup and refreshed via a background task every 60s.
- Use `serde_json` to inspect the `messages` array for complexity classification but do not persist the prompt.
- Circuit breaker state must be shared via Redis so all proxy instances agree on provider health.

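A minimal sketch of the heuristic classifier described above. The keyword list, the chars/4 token approximation, and the thresholds are illustrative assumptions, not the shipped rules; the real classifier would use compiled regexes on the system prompt and tuned cutoffs.

```rust
#[derive(Debug, PartialEq)]
enum Complexity {
    Simple,
    Moderate,
    Complex,
}

/// Heuristic-only classification: no model call, no allocation-heavy parsing,
/// so it comfortably fits the <2ms budget on the hot path.
fn classify(system_prompt: &str, user_chars: usize) -> Complexity {
    // crude token estimate: ~4 characters per token
    let approx_tokens = user_chars / 4;
    let p = system_prompt.to_ascii_lowercase();
    // task patterns that usually indicate a downgradable workload
    let simple_task = ["extract", "classify", "format", "translate"]
        .iter()
        .any(|kw| p.contains(kw));
    match (simple_task, approx_tokens) {
        (true, t) if t < 2_000 => Complexity::Simple,
        (_, t) if t < 500 => Complexity::Moderate,
        _ => Complexity::Complex,
    }
}
```

The router would map `Simple` onto the "cheapest" strategy and `Complex` onto "quality-first", with the cascading strategy as the middle ground.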
---

## Epic 3: Analytics Pipeline

**Description:** High-throughput logging and aggregation system using TimescaleDB. Focuses on ingesting asynchronous telemetry from the Proxy Engine without blocking request processing.

### User Stories

- **Story 3.1:** As a platform engineer, I want the proxy to emit telemetry without blocking the main request thread, so that our application performance remains unaffected.
- **Story 3.2:** As an engineering manager, I want my dashboard queries to be lightning fast even with millions of rows, so that I can quickly slice and dice our AI spend.
- **Story 3.3:** As an engineering manager, I want historical telemetry to be compressed or aged out automatically, so that database storage costs remain minimal.

### Acceptance Criteria

- Proxy emits a `RequestEvent` over an in-memory `mpsc` channel via `tokio::spawn`.
- A background worker batches events and inserts them into TimescaleDB every 1s or 100 events using PostgreSQL's bulk `COPY ... FROM STDIN`.
- Continuous aggregates (`hourly_cost_summary`, `daily_cost_summary`) are created and refreshed on schedule to pre-calculate `total_cost`, `total_saved`, and `avg_latency`.
- TimescaleDB compression policies achieve 90%+ compression on chunks older than 7 days.
- The proxy must degrade gracefully if the analytics database is unavailable.

### Estimate: 8 points

### Dependencies: Epic 1 (Proxy Engine)

### Technical Notes:

- Stack: Rust (worker), PostgreSQL/TimescaleDB.
- Write the TimescaleDB migration scripts for the hypertable `request_events` and the continuous aggregates.
- Batching must apply backpressure and survive worker stalls: use bounded channels so a slow or panicked worker drops telemetry instead of blocking the proxy.

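The batch/flush loop above can be sketched with the standard library's `mpsc` channel. This is a simplified, synchronous stand-in: the real worker runs on `tokio` and bulk-copies into TimescaleDB, and the batch size and window here are the acceptance-criteria values, not tuned numbers.

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

/// Drain the channel into a batch of up to `max` events or `window` of
/// waiting, whichever comes first. A timeout or disconnected sender
/// flushes whatever has accumulated.
fn next_batch(rx: &mpsc::Receiver<String>, max: usize, window: Duration) -> Vec<String> {
    let mut batch = Vec::with_capacity(max);
    let deadline = Instant::now() + window;
    while batch.len() < max {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(ev) => batch.push(ev),
            Err(_) => break, // timeout or channel closed: flush partial batch
        }
    }
    batch
}
```

Each returned batch would become one bulk insert, so the database sees at most one write per second per worker rather than one per request.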
---

## Epic 4: Dashboard API

**Description:** Axum REST API providing authentication, org/team management, routing rule CRUD, and data endpoints for the frontend dashboard. Focuses on frictionless developer onboarding.

### User Stories

- **Story 4.1:** As an engineering manager, I want to authenticate via GitHub OAuth, so that I can create an organization and get an API key in under 60 seconds without remembering a password.
- **Story 4.2:** As an engineering manager, I want to manage my organization's routing rules and provider API keys securely, so that dd0c/route can successfully broker requests to OpenAI and Anthropic.
- **Story 4.3:** As an engineering manager, I want an endpoint that provides my historical spend and savings summary, so that I can visualize it in the UI.
- **Story 4.4:** As a platform engineer, I want to revoke an active API key, so that compromised credentials are immediately blocked.

### Acceptance Criteria

- Implements the `/api/auth/github` OAuth flow issuing JWTs and refresh tokens.
- Implements `/api/orgs` CRUD for managing an organization and API keys.
- Implements `/api/dashboard/summary` and `/api/dashboard/treemap` queries hitting the TimescaleDB continuous aggregates.
- Implements `/api/requests` for the request inspector with filters (e.g., `model`, `feature`, `team`).
- Securely stores and encrypts provider API keys in PostgreSQL using an AES-256-GCM Data Encryption Key.
- Enforces an RBAC model (Owner, Member) per organization.

### Estimate: 13 points

### Dependencies: Epic 3 (Analytics Pipeline)

### Technical Notes:

- Stack: Rust (`axum`), PostgreSQL.
- Share the same `tokio` runtime and Rust workspace as the proxy to minimize context switching for a solo founder.
- Use the `oauth2` crate for GitHub integration. JWTs are signed with RS256; refresh tokens live in Redis.
- Ensure API keys are hashed (SHA-256) before storage; raw keys are never stored.

---

## Epic 5: Dashboard UI

**Description:** The React SPA serving the cost attribution dashboard. Visualizes the AI spend treemap, routing rules editor, real-time ticker, and request inspector. This is the product's primary visual "Aha" moment.

### User Stories

- **Story 5.1:** As an engineering manager, I want to see a treemap of my organization's AI spend broken down by team, feature, and model, so that I can instantly identify the most expensive areas of my application.
- **Story 5.2:** As an engineering manager, I want a real-time counter showing "You saved $X this week," so that I feel confident the tool is paying for itself.
- **Story 5.3:** As a platform engineer, I want an interface to configure routing rules (e.g., drag-to-reorder priority), so that I can instruct the proxy without editing config files.
- **Story 5.4:** As a platform engineer, I want a request inspector that displays metadata, cost, latency, and the specific routing decision for every request, so that I can debug why a certain model was chosen.

### Acceptance Criteria

- React + Vite SPA deployed as static assets to S3 + CloudFront.
- Treemap visualization renders cost aggregations dynamically over selected time periods (7d/30d/90d).
- A routing rules editor allows CRUD operations and priority reordering for a team's rules.
- Request Inspector table displays paginated, filterable (`feature`, `team`, `status`) lists of telemetry without showing prompt content.
- Allows an admin to securely input OpenAI and Anthropic API keys.

### Estimate: 13 points

### Dependencies: Epic 4 (Dashboard API)

### Technical Notes:

- Stack: React, TypeScript, Vite, Tailwind CSS.
- No SSR required for V1 (keep it simple). Use `react-query` or similar for data fetching and caching.
- Build the treemap with a charting library like D3 or Recharts.
- Emphasize speed: data fetches should resolve from continuous aggregates in <200ms.

---

## Epic 6: Shadow Audit CLI

**Description:** The PLG "Shadow Audit" command-line tool (`npx dd0c-scan`). It analyzes a local codebase for LLM API calls, estimates monthly cost based on prompt templates, and projects savings with dd0c/route.

### User Stories

- **Story 6.1:** As a developer, I want a zero-setup CLI tool (`npx dd0c-scan`) that scans my codebase and estimates how much money I'm currently wasting on overqualified LLMs, so that I can convince my manager to use dd0c/route.
- **Story 6.2:** As an engineering manager, I want the CLI to run locally without sending my source code to a third party, so that I can securely audit my own projects.
- **Story 6.3:** As an engineering manager, I want a clean, visually appealing terminal report showing "Top Opportunities" for model downgrades, so that I immediately see the value of routing.

### Acceptance Criteria

- Parses a local directory for OpenAI or Anthropic SDK usage in TypeScript/JavaScript/Python files.
- Identifies the models requested in the code and estimates token usage heuristically from the strings passed to the SDK.
- Hits `/api/v1/pricing/current` to fetch the latest cost tables, then calculates an estimated monthly bill and projected savings.
- Outputs a formatted terminal report showing total potential savings and a breakdown of the highest-impact files.
- An anonymized scan summary is sent to the server only if the user explicitly opts in.

### Estimate: 8 points

### Dependencies: Epic 4 (Dashboard API - Pricing Endpoint)

### Technical Notes:

- Stack: Node.js, `commander`, `chalk`, simple regex parsers for Python/JS SDKs.
- Keep the CLI lightweight, fast, and as dependency-free as possible. No actual LLM parsing; use heuristics (string length/structure) for token estimates.
- Must run completely offline if the pricing table is cached.

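The scanner's estimation heuristic can be sketched as below. Everything here is an illustrative assumption: the model list, the "string-literal characters divided by 4" token rule, and the pricing inputs are placeholders, and the shipped CLI is Node.js rather than Rust.

```rust
/// Crude shadow-audit estimate: detect known model ids in source text,
/// approximate tokens per call from characters inside string literals,
/// and project a monthly bill. Returns USD per month.
fn estimate_monthly_cost(source: &str, calls_per_month: u64, usd_per_1k_tokens: f64) -> f64 {
    // crude model detection: any quoted model id we know about
    let models = ["gpt-4o", "gpt-4o-mini", "claude-3-5-haiku"];
    if !models.iter().any(|m| source.contains(m)) {
        return 0.0; // no LLM usage found in this file
    }
    // crude token estimate: characters inside double-quoted literals / 4
    let literal_chars: usize = source.split('"').skip(1).step_by(2).map(str::len).sum();
    let tokens_per_call = (literal_chars / 4) as f64;
    calls_per_month as f64 * tokens_per_call / 1000.0 * usd_per_1k_tokens
}
```

The report would then diff this number against the same workload priced at a cheaper model to produce the "Top Opportunities" savings figure.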
---

## Epic 7: Slack Integration

**Description:** The primary retention mechanism and anomaly alerting system. An asynchronous worker task dispatches weekly savings digests and threshold-based budget alerts to Slack and email.

### User Stories

- **Story 7.1:** As an engineering manager, I want an automated weekly digest summarizing my team's AI savings, so that I can easily report to the CFO that our tooling investment is paying off.
- **Story 7.2:** As a platform engineer, I want to configure a budget limit (e.g., alert if daily spend > $100) and receive a Slack webhook notification immediately, so that I can stop a retry storm before the bill gets out of hand.
- **Story 7.3:** As an engineering manager, I want an email version of the weekly digest, so that I can forward it straight to my leadership team.

### Acceptance Criteria

- A standalone asynchronous worker (`dd0c-worker`) evaluates continuous aggregates (via TimescaleDB) every hour.
- Generates a "Monday Morning Digest" email via AWS SES.
- Emits Slack webhook payloads when a threshold alert is triggered (`threshold_amount`, `threshold_pct`).
- Adds an `X-DD0C-Signature` header to outbound webhooks to prevent spoofing.

### Estimate: 8 points

### Dependencies: Epic 3 (Analytics Pipeline), Epic 4 (Dashboard API)

### Technical Notes:

- Stack: Rust (`tokio-cron`), `reqwest` (for webhooks), AWS SES.
- Worker is a singleton container (1 task) running alongside the proxy to avoid lock contention on cron tasks.
- Ensure alerts maintain state (using PostgreSQL `alert_configs` and `last_fired_at`) so users aren't spammed for the same incident.

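The "don't spam the same incident" rule from the last technical note reduces to a small pure function. A sketch, assuming a single cooldown window per alert config; the real worker would read `last_fired_at` from the `alert_configs` table and write it back after firing.

```rust
use std::time::{Duration, SystemTime};

/// Fire only if the threshold is actually crossed AND the alert has not
/// fired within the cooldown window.
fn should_fire(
    spend_today: f64,
    threshold: f64,
    last_fired_at: Option<SystemTime>,
    cooldown: Duration,
    now: SystemTime,
) -> bool {
    if spend_today <= threshold {
        return false; // under budget: nothing to report
    }
    match last_fired_at {
        None => true, // never fired before
        Some(t) => now.duration_since(t).map_or(true, |age| age >= cooldown),
    }
}
```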
---

## Epic 8: Infrastructure & DevOps

**Description:** Containerized ECS Fargate deployment, AWS native networking, basic monitoring, and fully automated CI/CD for the entire dd0c stack. Essential for a solo founder to deploy safely and frequently.

### User Stories

- **Story 8.1:** As a solo founder, I want to use AWS ECS Fargate, so that I don't have to manage EC2 instances or worry about OS-level patching.
- **Story 8.2:** As a solo founder, I want a GitHub Actions CI/CD pipeline, so that `git push` automatically runs tests, builds containers, and deploys rolling updates with zero downtime.
- **Story 8.3:** As an operator, I want standard AWS CloudWatch alarms (e.g., P99 proxy latency > 50ms) connected to PagerDuty, so that I am only woken up when a critical threshold is breached.
- **Story 8.4:** As a solo founder, I want a strict separation between my configuration (PostgreSQL) and telemetry (TimescaleDB) stores, so that I can scale analytics independently from org/auth state.

### Acceptance Criteria

- Full AWS infrastructure defined via CDK (TypeScript) or Terraform.
- ALB routes `/v1/*` to the proxy container and `/api/*` to the dashboard API container.
- Dashboard static assets deployed to an S3 bucket with CloudFront caching.
- `docker build` produces three optimized images from a single Rust workspace (`dd0c-proxy`, `dd0c-api`, `dd0c-worker`).
- CloudWatch dashboards and minimum alarms configured (CPU >80%, Proxy Error Rate >5%, ALB 5xx Rate).
- `git push main` triggers a GitHub Action to test, lint, build, push to ECR, and update the ECS Fargate services.

### Estimate: 13 points

### Dependencies: Epic 1 (Proxy Engine), Epic 4 (Dashboard API)

### Technical Notes:

- Stack: AWS ECS Fargate, ALB, CloudFront, S3, RDS (PostgreSQL/TimescaleDB), ElastiCache (Redis), GitHub Actions.
- Ensure the ALB utilizes path-based routing correctly and handles TLS termination.
- For cost optimization on AWS, explore consolidating NAT Gateways or utilizing VPC Endpoints for S3/ECR/CloudWatch.

---

## Epic 9: Onboarding & PLG

**Description:** Self-serve signup, free tier, API key management, and a getting-started flow that gets users routing their first LLM call through dd0c/route in under 2 minutes. This is the growth engine.

### User Stories

- **Story 9.1:** As a new user, I want to sign up with GitHub OAuth in one click, so that I can start using dd0c/route without filling out forms.
- **Story 9.2:** As a new user, I want a free tier (up to $50/month in routed LLM spend), so that I can evaluate the product with real traffic before committing.
- **Story 9.3:** As a developer, I want to generate and manage API keys from the dashboard, so that I can integrate dd0c/route into my applications.
- **Story 9.4:** As a new user, I want a guided "First Route" onboarding flow that gives me a working curl command, so that I see cost savings within 2 minutes of signing up.
- **Story 9.5:** As a team lead, I want to invite team members via email, so that my team can share a single org and see aggregated savings.

### Acceptance Criteria

- GitHub OAuth signup creates the org + first API key automatically.
- Free tier enforced at the proxy level — requests beyond $50/month routed spend return 429 with an upgrade CTA.
- API key CRUD: create, list, revoke, rotate. Keys are hashed at rest (SHA-256, consistent with Epic 4) and only shown once on creation.
- Onboarding wizard: 3 steps — (1) copy API key, (2) paste curl command, (3) see first request in dashboard. Completion rate tracked.
- Team invite sends an email with a magic link. The invited user joins the existing org on signup.
- Stripe Checkout integration for upgrade from free → paid ($49/month base).

### Estimate: 8 points

### Dependencies: Epic 4 (Dashboard API), Epic 5 (Dashboard UI)

### Technical Notes:

- Use Stripe Checkout Sessions for payment — no custom billing UI needed for V1.
- Free tier enforcement happens in the proxy hot path — must be an O(1) lookup (Redis counter per org, reset monthly via cron).
- Onboarding completion events tracked via PostHog or simple DB events for funnel analysis.
- Magic link invites use signed JWTs with 72-hour expiry, stored in a `pending_invites` table.

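The O(1) free-tier check can be sketched as a per-org spend counter. A `HashMap` stands in for the production Redis counter, and charging the projected cost up front is an illustrative simplification; a rejected request would surface as the 429 described above.

```rust
use std::collections::HashMap;

/// Per-org spend counter for free-tier enforcement. Redis in production;
/// an in-process map here for the sketch.
struct FreeTier {
    limit_usd: f64,
    spent: HashMap<String, f64>,
}

impl FreeTier {
    /// Returns true if the request may proceed, recording its cost.
    /// Over-budget orgs are refused (the proxy maps this to a 429).
    fn try_charge(&mut self, org: &str, cost_usd: f64) -> bool {
        let spent = self.spent.entry(org.to_string()).or_insert(0.0);
        if *spent + cost_usd > self.limit_usd {
            return false; // over the $50/month free tier
        }
        *spent += cost_usd;
        true
    }
}
```

A monthly cron would reset the counters, matching the "reset monthly" note above.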
---

## Epic 10: Transparent Factory Compliance

**Description:** Cross-cutting epic ensuring dd0c/route adheres to the 5 Transparent Factory architectural tenets: Atomic Flagging, Elastic Schema, Cognitive Durability, Semantic Observability, and Configurable Autonomy. These stories are woven across the existing system — they don't add features, they add engineering discipline.

### Story 10.1: Atomic Flagging — Feature Flag Infrastructure

**As a** solo founder, **I want** every new routing rule, cost threshold, and provider failover behavior wrapped in a feature flag (default: off), **so that** I can deploy code continuously without risking production traffic.

**Acceptance Criteria:**

- OpenFeature SDK integrated into the Rust proxy via a compatible provider (e.g., `flagd` sidecar or env-based provider for V1).
- All flags evaluate locally (in-memory or sidecar) — zero network calls on the hot path.
- Every flag has an `owner` field and a `ttl` (max 14 days). CI blocks deployment if any flag exceeds its TTL at 100% rollout.
- Automated circuit breaker: if a flagged code path increases P99 latency by >5% or error rate by >2%, the flag auto-disables within 30 seconds.
- Flags exist for: model routing strategies, complexity classifier thresholds, provider failover chains, new dashboard features.

**Estimate:** 5 points

**Dependencies:** Epic 1 (Proxy Engine), Epic 2 (Router Brain)

**Technical Notes:**

- Use the OpenFeature Rust SDK. For V1, a simple JSON file or env-var provider is fine — no LaunchDarkly needed.
- Circuit breaker integration: extend the existing Redis-backed circuit breaker to also flip flags.
- Flag cleanup: add a `make flag-audit` target that lists expired flags.

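The `flag-audit` check reduces to comparing each flag's age against its TTL. A sketch under stated assumptions: the `Flag` struct and its fields are hypothetical, since the story only mandates that an `owner` and `ttl` exist per flag.

```rust
use std::time::{Duration, SystemTime};

/// Minimal flag record for the audit sketch (hypothetical shape).
struct Flag {
    name: &'static str,
    owner: &'static str,
    created: SystemTime,
    ttl: Duration,
}

/// List the names of flags past their TTL; CI fails the build if this
/// returns anything at 100% rollout.
fn expired<'a>(flags: &'a [Flag], now: SystemTime) -> Vec<&'a str> {
    flags
        .iter()
        .filter(|f| now.duration_since(f.created).map_or(false, |age| age > f.ttl))
        .map(|f| f.name)
        .collect()
}
```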

### Story 10.2: Elastic Schema — Additive-Only Migration Discipline

**As a** solo founder, **I want** all TimescaleDB and Redis schema changes to be strictly additive, **so that** I can roll back any deployment instantly without data loss or broken readers.

**Acceptance Criteria:**

- CI lint step rejects any migration containing `DROP`, `ALTER ... TYPE`, or `RENAME` on existing columns.
- New fields use a `_v2` suffix or a new table when breaking changes are unavoidable.
- Rust structs rely on serde's default of ignoring unknown fields (i.e., never add `#[serde(deny_unknown_fields)]`), so V1 code safely skips V2 fields.
- Dual-write pattern documented and enforced: during migration windows, the API writes to both old and new schema targets within the same DB transaction.
- Every migration file includes a `sunset_date` comment (max 30 days). A CI check warns if any migration is past sunset without cleanup.

**Estimate:** 3 points

**Dependencies:** Epic 3 (Analytics Pipeline)

**Technical Notes:**

- Use `sqlx` migration files. Add a pre-commit hook or CI step that greps for forbidden DDL keywords.
- Redis key schema: version keys with a prefix (e.g., `route:v1:config`, `route:v2:config`). Never rename keys.
- For the `request_events` hypertable, new columns are always `NULLABLE` with defaults.

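The grep-style CI lint from the technical notes can be sketched as a keyword scan. The `" TYPE "` pattern is a rough stand-in for `ALTER ... TYPE` and would false-positive on some legitimate SQL; a real lint would parse the DDL instead of matching substrings.

```rust
/// Return the forbidden destructive-DDL keywords found in a migration.
/// An empty result means the migration is additive and passes CI.
fn forbidden_ddl(sql: &str) -> Vec<&'static str> {
    let upper = sql.to_ascii_uppercase();
    let mut hits = Vec::new();
    for kw in ["DROP ", "RENAME ", " TYPE "] {
        // " TYPE " approximates `ALTER ... TYPE`; a real lint would parse
        if upper.contains(kw) {
            hits.push(kw.trim());
        }
    }
    hits
}
```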

### Story 10.3: Cognitive Durability — Decision Logs for Routing Logic

**As a** future maintainer (or future me), **I want** every change to routing algorithms, cost models, or provider selection logic accompanied by a `decision_log.json`, **so that** I can understand *why* a decision was made months later in under 60 seconds.

**Acceptance Criteria:**

- `decision_log.json` schema defined: `{ prompt, reasoning, alternatives_considered, confidence, timestamp, author }`.
- CI requires a `decision_log.json` entry for any PR touching `src/router/`, `src/cost/`, or migration files.
- Cognitive complexity cap of 10 enforced via `cargo clippy` or a custom lint. PRs exceeding this are blocked.
- Decision logs are committed alongside code in a `docs/decisions/` directory, one file per significant change.

**Estimate:** 2 points

**Dependencies:** None

**Technical Notes:**

- Use a PR template that prompts for the decision log fields.
- For the complexity cap, enable `clippy::cognitive_complexity` and set `cognitive-complexity-threshold = 10` in `clippy.toml`.
- Decision logs for cost table updates should include: source of pricing data, comparison with previous rates, expected savings impact.

### Story 10.4: Semantic Observability — AI Reasoning Spans on Routing Decisions
|
||||
**As a** platform engineer debugging a misrouted request, **I want** every proxy routing decision to emit an OpenTelemetry span with structured AI reasoning metadata, **so that** I can trace exactly which model was chosen, why, and what alternatives were rejected.
|
||||
|
||||
**Acceptance Criteria:**
|
||||
- Every `/v1/chat/completions` request generates an `ai_routing_decision` span as a child of the request trace.
|
||||
- Span attributes include: `ai.model_selected`, `ai.model_alternatives` (JSON array of rejected models + reasons), `ai.cost_delta` (savings vs. default), `ai.complexity_score`, `ai.routing_strategy` (passthrough/cheapest/quality-first/cascading).
|
||||
- `ai.prompt_hash` (SHA-256 of first 500 chars of system prompt) included for correlation — never raw prompt content.
|
||||
- Spans export to any OTLP-compatible backend (Grafana Cloud, Jaeger, etc.).
|
||||
- No PII in any span attribute. Prompt content is hashed, not logged.
|
||||
|
||||
**Estimate:** 3 points
|
||||
**Dependencies:** Epic 1 (Proxy Engine), Epic 2 (Router Brain)
|
||||
**Technical Notes:**
|
||||
- Use `tracing` + `opentelemetry-rust` crate with OTLP exporter.
|
||||
- The span should be created *inside* the router decision function, not as middleware — it needs access to the alternatives list.
|
||||
- For V1, export to stdout in OTLP JSON format. Production: OTLP gRPC to a collector.
|
||||
|
||||
### Story 10.5: Configurable Autonomy — Governance Policy for Automated Routing
|
||||
**As a** solo founder, **I want** a `policy.json` governance file that controls what the system is allowed to do autonomously (e.g., switch models, update cost tables, add providers), **so that** I maintain human oversight as the system grows.
|
||||
|
||||
**Acceptance Criteria:**
|
||||
- `policy.json` defines `governance_mode`: `strict` (all changes require manual approval) or `audit` (changes auto-apply but are logged).
|
||||
- The proxy checks `governance_mode` before applying any runtime config change (routing rule update, cost table refresh, provider addition).
|
||||
- `panic_mode` flag: when set to `true`, the proxy freezes all routing rules to their last-known-good state, disables auto-failover, and routes everything to a single hardcoded provider.
|
||||
- Governance drift monitoring: a weekly cron job logs the ratio of auto-applied vs. manually-approved changes. If auto-applied changes exceed 80% in `strict` mode, an alert fires.
|
||||
- All policy check decisions logged: "Allowed by audit mode", "Blocked by strict mode", "Panic mode active — frozen".
|
||||
|
||||
**Estimate:** 3 points
|
||||
**Dependencies:** Epic 2 (Router Brain)
|
||||
**Technical Notes:**
|
||||
- `policy.json` lives in the repo root and is loaded at startup + watched for changes via `notify` crate.
|
||||
- For V1 as a solo founder, start in `audit` mode. `strict` mode is for when you hire or add AI agents to the pipeline.
|
||||
- Panic mode should be triggerable via a single API call (`POST /admin/panic`) or by setting an env var — whichever is faster in an emergency.
|
||||
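The policy gate consulted before any runtime config change can be sketched as one function. The function name and return shape are illustrative assumptions; only the mode semantics (strict blocks, audit applies-and-logs, panic freezes everything) come from the acceptance criteria.

```rust
#[derive(PartialEq)]
enum Mode {
    Strict,
    Audit,
}

/// Decide whether a runtime config change may be applied.
/// Some(log line) => apply and log; None => blocked (manual approval
/// required, or panic mode has frozen the system).
fn check_policy(mode: &Mode, panic_mode: bool, change: &str) -> Option<String> {
    if panic_mode {
        return None; // panic mode: routing rules frozen at last-known-good
    }
    match mode {
        Mode::Audit => Some(format!("Allowed by audit mode: {change}")),
        Mode::Strict => None, // requires manual approval out of band
    }
}
```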

### Epic 10 Summary

| Story | Tenet | Points |
|-------|-------|--------|
| 10.1 | Atomic Flagging | 5 |
| 10.2 | Elastic Schema | 3 |
| 10.3 | Cognitive Durability | 2 |
| 10.4 | Semantic Observability | 3 |
| 10.5 | Configurable Autonomy | 3 |
| **Total** | | **16** |
1122
products/01-llm-cost-router/innovation-strategy/session.md
Normal file
File diff suppressed because it is too large
Load Diff
121
products/01-llm-cost-router/party-mode/session.md
Normal file
@@ -0,0 +1,121 @@

# 🎉 dd0c/route — Advisory Board "Party Mode" Review

**Date:** February 28, 2026
**Product Under Review:** dd0c/route — LLM Cost Router & Optimization Dashboard
**Format:** BMad Creative Intelligence Suite — "Party Mode" (5 Expert Panelists)

---

## Round 1: INDIVIDUAL REVIEWS

### 💸 The VC

**What excites me:** The market math is undeniable. Inference is eating the world, and companies are bleeding cash because developers are lazy and just use `gpt-4o` for everything. The wedge—changing one base URL—is brilliant. If the "Shadow Audit" can actually show a CFO they're wasting $10K a month before they even adopt the tool, that's a PLG motion that prints money.

**What worries me:** You have zero structural moat on day one, and you're competing against the hyperscalers' own roadmaps. Why won't OpenAI just release "Smart Tier" routing tomorrow? Why won't AWS Bedrock bake this into their console? Plus, LiteLLM already has the open-source community mindshare. You're telling me a solo bootstrapped founder is going to out-execute YC-backed teams and hyperscalers based on a "data flywheel" that takes 12 months to spin up?

**Vote: CONDITIONAL GO.** You need to prove the data network effect actually exists. If you can't show that your routing gets demonstrably better with scale by month 6, you're just a wrapper that's waiting to get sherlocked.

### 🏗️ The CTO

**What excites me:** The architectural discipline. Targeting <10ms latency in Rust and integrating with OpenTelemetry right out of the gate shows you actually understand production environments. The fact that you're treating LLM calls like standard infrastructure that needs circuit breakers and fallback chains is exactly how grown-up engineering teams think.

**What worries me:** Trust and scale. You're asking me to take my company's most sensitive data—customer prompts, PII, proprietary business logic—and pipe it through a side-project proxy run by one guy? Absolutely not. Even if you say you don't log the prompts, my compliance team will laugh you out of the room. If this proxy goes down, my entire AI product goes down.

**Vote: CONDITIONAL GO.** The V1 SaaS-only proxy is a non-starter for serious teams. You must offer a VPC-deployable data plane (where you only phone home the telemetry to your SaaS dashboard) from Day 1. If you do that, I'm in.

### 🚀 The Bootstrap Founder

**What excites me:** The pricing and the GTM. $49/month is the magic number—it's an expense report, not a procurement cycle. The "Weekly Savings Digest" is an incredible retention hook. If you actually save a team $500/month, they will never churn. 200 customers gets you to $10K MRR. I've built businesses on much flimsier value props than "I will literally hand you back your own money."

**What worries me:** Scope creep. Victor's strategy doc tells you to build a Rust proxy, a ClickHouse analytics dashboard, a CLI tool, and write weekly thought-leadership content. Bro, you have a day job. You're going to burn out in month two. You cannot fight LiteLLM on features while fighting Portkey on enterprise dashboards as a solo founder.

**Vote: GO.** But only if you aggressively cut scope. Drop the CLI. Drop the ML classifier for V1. Use dumb heuristics. Launch the proxy and the dashboard in 30 days. Get to $1K MRR before you write a single line of ML code.

### 📟 The DevOps Practitioner

**What excites me:** The "Boring Proxy" concept. I am so tired of my team maintaining a janky Node.js script to balance OpenAI rate limits. A drop-in replacement that handles fallbacks, retries, and gives me standard Prometheus/OTel metrics is a dream.

**What worries me:** The maintenance nightmare. OpenAI changes their API schema. Anthropic introduces a new prompt caching header. Google deprecates a model. You, a solo dev, have to patch the proxy within hours or my production traffic breaks. I'm taking a hard dependency on your weekend availability. That's a massive operational risk I don't want to absorb.

**Vote: NO-GO.** LiteLLM has hundreds of contributors fixing API breakages the second they happen. A solo founder cannot keep up with the churn of LLM provider APIs without sacrificing reliability. I wouldn't switch to this.

### 🃏 The Contrarian

**What excites me:** Everyone is entirely focused on the "routing" aspect, but that's actually the least interesting part. The real genius here is the attribution treemap. Nobody cares about saving $400 on API calls if the dashboard can solve the internal political problem of "who is spending all this AI budget?" This is a FinOps tool disguised as a router.

**What worries me:** The core premise is that LLMs will remain expensive enough to care about. They won't. Prices dropped 90% last year. They'll drop another 90%. When 1M tokens cost a penny, nobody will pay $49/month to optimize it. You're building a highly sophisticated coupon-clipper for a world that's moving toward post-scarcity intelligence.

**Vote: GO (BUT PIVOT).** Ditch the proxy entirely. Forget routing. Just ingest existing logs and give Marcus his CFO slide deck. Charge $99/mo for pure AI cost attribution. It removes the latency risk, removes the "proxy going down" risk, and solves the real human pain point (looking stupid in a budget meeting).

---

### Round 2: CROSS-EXAMINATION

**The VC:** Bootstrap, you're delusional if you think 200 customers at $49/mo is a defensible business in this space. If Datadog or Helicone turns this on for free, your 200 customers churn overnight. You need enterprise contracts to survive the hyperscaler onslaught.

**The Bootstrap Founder:** VC, that's why you lose money on 99% of your bets. I don't need a $1B exit. $10K MRR pays the mortgage. Datadog charges $23/host minimum; they aren't giving LLM cost tracking away for free. And enterprise deals take 9 months to close. Brian needs cash flow on day 30, not a 60-page vendor security questionnaire next year.

**The CTO:** Contrarian, your "post-scarcity intelligence" theory is cute, but mathematically illiterate. Yes, per-token prices drop, but usage is exploding. Agentic workflows use 100x the tokens of standard RAG. The bill isn't going away, it's just shifting from "expensive models" to "massive volume." The routing still matters tremendously.

**The Contrarian:** If volume explodes 100x, CTO, then latency matters 100x more. Do you really think teams will add a third-party Rust proxy hop to an agentic loop running 50 times a second? They'll build the routing logic directly into their clients. The proxy is a dead-end architecture for high-volume agents.

**The DevOps Practitioner:** Exactly. The proxy is a dead end because the second Anthropic changes their API schema on a Friday night, Brian is asleep, and my agentic loop is throwing 500s. I'm the one getting paged at 2 AM.

**The Bootstrap Founder:** DevOps, you're projecting your own operational PTSD onto the product. Brian isn't supporting 1,600 models like Portkey. He's supporting OpenAI and Anthropic. Two APIs. They don't make breaking schema changes every Friday. It's totally manageable.

**The DevOps Practitioner:** They literally just introduced prompt caching headers, structured outputs, and vision payloads in the last 6 months! If the proxy doesn't support the new feature on day one, my ML engineers scream at me that they can't use the new toy. I am not putting a bottleneck in front of the fastest-moving APIs in tech.

**The VC:** DevOps is right. The maintenance overhead is brutal. This is why I said there's no moat. You're building a feature, not a platform. The moment OpenAI releases "Smart Tier" routing, you have zero differentiation. Why won't Sam Altman just eat your lunch?

**The Contrarian:** VC, you're missing the point again. Sam Altman doesn't care about attribution by feature, team, and environment. He just wants your total API spend to go up. OpenAI's dashboard will never tell you "Team Backend wasted $400 on the summarizer." That's why I say ditch the proxy and just do the analytics. You own the FinOps dashboard, not the pipe.

**The CTO:** Ditching the proxy kills the "Shadow Audit" and the real-time cost prevention, Contrarian. If you only do analytics, you're just looking in the rearview mirror. You're telling Marcus he crashed the car *after* the bill arrives. The proxy is what stops the bleeding *before* the invoice hits.

**The Bootstrap Founder:** CTO hits the nail on the head. The value prop is "Change this one URL and stop bleeding cash today." You can't do that with a log parser. You need the proxy. Brian just needs to self-host the data plane so DevOps stops hyperventilating about PII and latency.

**The DevOps Practitioner:** I'll stop hyperventilating when Brian provides a certified Helm chart, an OTel collector, and a signed SLA. Until then, it's a weekend toy masquerading as infrastructure.

---

### Round 3: STRESS TEST

#### Stress Test 1: What if OpenAI drops prices 90% tomorrow?

**The Contrarian:** "This is inevitable. GPT-4o-mini is already basically free. When GPT-5 drops, the older models will go to zero. If the delta between the 'expensive' model and the 'cheap' model is pennies, nobody is paying you $49 to route it."

- **Severity (1-10):** 8/10. It fundamentally breaks the core value prop of "save $500/month."
- **Can Brian pivot?** Yes. If per-token cost is negligible, the pain point shifts to *token efficiency* (context window stuffing, prompt bloat, latency).
- **Mitigation:** Shift the positioning from "we pick the cheapest model" to "we optimize your payload." Build semantic caching, prompt compression, and attribution. The dashboard tracking "Who is running these 100K token prompts?" remains valuable even if the tokens are cheap, because 100K tokens still adds massive latency.
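
The "token efficiency" pivot above can be sketched in a few lines: even with cheap tokens, a dashboard can flag the callers whose prompts are wildly larger than the fleet norm. This is a minimal illustration assuming a simple `(caller, prompt_tokens)` log shape; the schema and threshold are hypothetical.

```python
# Hypothetical sketch: flag callers whose average prompt size far exceeds the
# median call size across the fleet. Even when tokens are cheap, a 100K-token
# prompt is a latency problem worth surfacing. Log schema is an assumption.
from statistics import median

def flag_prompt_bloat(calls, ratio=3.0):
    """Return callers whose mean prompt size exceeds `ratio` x the fleet median."""
    by_caller = {}
    for caller, prompt_tokens in calls:
        by_caller.setdefault(caller, []).append(prompt_tokens)
    fleet_median = median(t for _, t in calls)
    return sorted(
        caller for caller, sizes in by_caller.items()
        if sum(sizes) / len(sizes) > ratio * fleet_median
    )

calls = [
    ("summarizer", 900), ("summarizer", 1100),
    ("slug-gen", 120), ("slug-gen", 140),
    ("rag-agent", 98_000), ("rag-agent", 104_000),  # the 100K-token prompts
]
print(flag_prompt_bloat(calls))  # → ['rag-agent']
```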

#### Stress Test 2: What if a well-funded competitor (Helicone, Portkey) copies the exact feature set?

**The VC:** "Helicone has YC money and mindshare. Portkey has $3M. If you prove the 'Shadow Audit' and the 'Attribution Treemap' work, they will build them in two weeks with a team of 10 engineers."

- **Severity (1-10):** 6/10. It's a risk, but incumbents usually move slower than expected and over-complicate simple features.
- **Can Brian pivot?** Yes. Move down-market. Portkey targets enterprise; Helicone targets observability power users. Brian can own the "Series A Bootstrap" niche.
- **Mitigation:** Rely on the unique GTM. Brian's advantage isn't the feature, it's the lack of friction. If Portkey builds a treemap but still requires a 30-minute sales call to get an API key, Brian wins the developer who just wants to run `npx dd0c-scan` on a Saturday night. Double down on PLG and community trust.

#### Stress Test 3: What if enterprises won't trust a proxy with their API keys and prompts?

**The CTO:** "I've said it already. You cannot send PII through a third-party startup. It's an automatic hard-stop from Infosec."

- **Severity (1-10):** 9/10. This is the single biggest adoption blocker. The "change your base URL" trick only works if you aren't violating GDPR, HIPAA, or SOC 2.
- **Can Brian pivot?** Yes, by splitting the control plane and data plane.
- **Mitigation:** For V1, accept that you won't close enterprise or healthcare customers. Stick to the beachhead: small SaaS startups who don't have compliance teams yet. For V2, build a self-hosted data plane. The Rust proxy runs inside the customer's VPC, strips the prompt payloads, and only sends telemetry (latency, token counts, cost) back to the dd0c SaaS dashboard.
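
The control-plane/data-plane split described above reduces to one invariant: prompt and response bodies never leave the VPC, only metering fields do. A minimal sketch, assuming illustrative field names rather than a defined wire format:

```python
# Sketch of the "telemetry only" data plane: the in-VPC proxy drops payload
# fields (prompt text, responses, any PII) and forwards only metering fields
# to the SaaS control plane. Field names here are illustrative assumptions.
TELEMETRY_FIELDS = {"model", "prompt_tokens", "completion_tokens",
                    "cost_usd", "latency_ms", "team", "feature"}

def to_telemetry(record: dict) -> dict:
    """Keep only the safe metering fields before anything leaves the VPC."""
    return {k: v for k, v in record.items() if k in TELEMETRY_FIELDS}

record = {
    "model": "claude-haiku", "prompt": "Summarize: <customer PII>...",
    "response": "...", "prompt_tokens": 812, "completion_tokens": 96,
    "cost_usd": 0.0011, "latency_ms": 420, "team": "backend",
    "feature": "summarizer",
}
out = to_telemetry(record)
assert "prompt" not in out and "response" not in out  # payload never leaves
```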

---

### Round 4: FINAL VERDICT

**The Panel Decision:** A SPLIT DECISION leaning heavily toward **GO (WITH MAJOR SCOPE REDUCTIONS)**.

**Reasoning:**

- **The VC (Go):** The market timing is perfect. AI FinOps is the next category.
- **The Bootstrap Founder (Go):** It's a textbook SaaS play with a clear wedge.
- **The Contrarian (Go-Pivot):** The value is in attribution, not routing.
- **The CTO (Conditional Go):** Only if there's a clear path to self-hosted proxies.
- **The DevOps Practitioner (No-Go):** The maintenance overhead of third-party API churn is a trap for a solo dev.

**Revised Priority Ranking:**

This is still **#1 in the dd0c lineup**, assuming Brian focuses on the PLG "Shadow Audit" and simple attribution dashboard rather than trying to build a complex ML routing classifier immediately. It solves a real problem *today* for people with budget authority.

**Top 3 Things Brian MUST Get Right:**

1. **The "Shadow Audit" Wedge:** The CLI that proves they are wasting $1,000s *before* asking them to change their base URL. It's the ultimate sales tool.
2. **The 5-Minute Setup:** Changing the base URL and adding an API key must be flawless. If it takes 2 hours to configure YAML, the PLG motion dies.
3. **The "Weekly Savings Digest":** Marcus needs an email every Monday showing his CFO why the product is paying for itself. This is the only retention moat in Year 1.
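
The digest itself is just an aggregation over a week of routing decisions. A minimal sketch, assuming a hypothetical row shape of `(feature, cost_at_requested_model, cost_at_routed_model)`:

```python
# Illustrative sketch of the Monday digest: roll a week of routing decisions
# up into per-feature savings, biggest first. The input row shape is an
# assumption for illustration, not the product's actual log format.
from collections import defaultdict

def weekly_digest(rows):
    saved = defaultdict(float)
    for feature, requested_cost, routed_cost in rows:
        saved[feature] += requested_cost - routed_cost
    total = sum(saved.values())
    lines = [f"You saved ${total:,.2f} this week:"]
    for feature, amount in sorted(saved.items(), key=lambda kv: -kv[1]):
        lines.append(f"  - {feature}: ${amount:,.2f}")
    return "\n".join(lines)

rows = [("summarizer", 310.0, 42.0),
        ("chatbot", 120.0, 95.0),
        ("summarizer", 95.0, 12.0)]
print(weekly_digest(rows))
```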

**The ONE Thing That Kills This If Wrong:**

**Adding latency or breaking APIs.** If the Rust proxy adds 100ms instead of 10ms, or if an Anthropic API change breaks production workloads for 12 hours while Brian is at his day job, trust is permanently destroyed. Infrastructure must be invisible.

**Final Recommendation:**

**GO (V1: Analytics First, Router Second).**

Brian should build the "Boring Proxy" in Rust (optimizing purely for latency and reliability), but rely heavily on *heuristics* for routing initially. The primary marketing and retention lever should be the **Dashboard Attribution** and the **Shadow Audit**. Launch the cost scan CLI in 30 days to validate demand before building the ML classifier.

---

*products/01-llm-cost-router/product-brief/brief.md (new file, 437 lines)*

# dd0c/route — Product Brief

**Product:** dd0c/route — LLM Cost Router & Optimization Dashboard
**Brand:** 0xDD0C — "All signal. Zero chaos."
**Author:** Product Brief synthesized from BMad Creative Intelligence Suite (Phases 1–4)
**Date:** February 28, 2026
**Status:** Investor-Ready Draft

---

## 1. EXECUTIVE SUMMARY

### Elevator Pitch

dd0c/route is an OpenAI-compatible proxy that sits between your application and LLM providers, intelligently routing each API request to the cheapest model that meets quality requirements — saving engineering teams 30–50% on AI costs with a single environment variable change. It pairs this routing engine with an attribution dashboard that answers the question no existing tool can: "Who is spending our AI budget, on what, and is it worth it?"

### Problem Statement

Enterprise and startup LLM spending is exploding — the global LLM market is projected to reach $36.1B by 2030 (Straits Research), with inference costs representing the fastest-growing line item on engineering budgets. Yet the tooling for managing this spend is stuck in 2022:

- **60%+ of LLM API calls use overqualified models.** Teams default to GPT-4o for everything — including trivial tasks like JSON formatting, classification, and extraction — because benchmarking alternatives takes days nobody has.
- **Zero cost attribution exists at the feature level.** Engineering managers receive a single monthly invoice from OpenAI ("$14,000") with no breakdown by feature, team, or environment. Cloud cost tooling solved this for AWS a decade ago. AI cost tooling hasn't caught up.
- **Multi-provider billing is a manual nightmare.** Teams using OpenAI + Anthropic + Google get three separate bills with three different billing models. Reconciliation is a monthly spreadsheet exercise that takes 3–4 hours.
- **Cost spikes are invisible until the invoice arrives.** A retry storm, a prompt engineering experiment gone wrong, or a new feature launch can burn $3K in an hour with no alert.

The result: engineering managers present estimated pie charts to CFOs, ML engineers feel guilty about costs they can't measure, and platform engineers maintain hand-rolled proxy scripts that started as 200 lines and grew to 2,000.

### Solution Overview

dd0c/route is a drop-in proxy (change one environment variable: `OPENAI_BASE_URL`) that provides:

1. **Intelligent Routing:** A complexity classifier analyzes each request and routes it to the cheapest adequate model — GPT-4o requests that are simple extractions get silently downgraded to GPT-4o-mini or Claude Haiku, saving 80–95% per request with negligible quality impact.
2. **Cost Attribution Dashboard:** Real-time treemap visualization showing spend by team → feature → endpoint → model. The "CFO slide deck" that writes itself.
3. **Weekly Savings Digest:** An automated Monday morning email showing exactly how much dd0c/route saved, broken down by category — the retention mechanism and viral loop (managers forward it to leadership).
4. **Budget Guardrails & Anomaly Alerts:** Per-team, per-feature spending limits with Slack/PagerDuty integration. Catches the $3K retry storm before it becomes a $3K invoice.
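
The "one environment variable" integration can be shown in miniature: the application keeps calling an OpenAI-shaped endpoint, and only the base URL decides whether traffic goes direct or through the router. The endpoint path below is OpenAI's public one; the dd0c host name is a placeholder, not a real deployment.

```python
# Minimal sketch of the drop-in integration: the app's request URL is derived
# from OPENAI_BASE_URL, so pointing that variable at the router is the only
# change. "proxy.dd0c.example" is a placeholder host, not a real endpoint.
import os

DEFAULT_BASE = "https://api.openai.com/v1"

def chat_completions_url() -> str:
    base = os.environ.get("OPENAI_BASE_URL", DEFAULT_BASE).rstrip("/")
    return f"{base}/chat/completions"

# Without the variable set, requests go straight to OpenAI:
os.environ.pop("OPENAI_BASE_URL", None)
assert chat_completions_url() == "https://api.openai.com/v1/chat/completions"

# The one-line change that routes everything through dd0c/route:
os.environ["OPENAI_BASE_URL"] = "https://proxy.dd0c.example/v1"
assert chat_completions_url() == "https://proxy.dd0c.example/v1/chat/completions"
```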

### Target Customer Profile

**Primary Beachhead:** Series A–B SaaS startups with 10–50 engineers, spending $2K–$15K/month on LLM APIs, with no dedicated ML infrastructure team. The CTO or VP Engineering can approve a $49/month tool via expense report — no procurement process, no 6-month evaluation cycle.

**Why this segment:** They feel the pain acutely ($2K–$15K/month hurts but doesn't justify hiring ML ops), they're technically sophisticated enough to adopt in minutes (they understand API proxies and environment variables), and they talk to each other (startup CTOs share tools in the same Slack communities, meetups, and newsletters).

### Key Differentiators

1. **5-Minute Setup, Zero Code Changes.** Change one environment variable. No SDK migration, no code refactor, no YAML configuration marathon. The fastest time-to-value in the category.
2. **Attribution-First Design.** Competitors focus on observability (what happened). dd0c/route focuses on attribution (who spent what, on which feature, and was it worth it). The treemap dashboard is the product's signature.
3. **"Shadow Audit" Pre-Sale Wedge.** A CLI tool (`npx dd0c-scan`) and passive log analysis mode that proves savings potential *before* asking the customer to route traffic. Value before trust. Evidence before commitment.
4. **Transparent, Flat Pricing.** $49/month Pro tier — an expense-report purchase. No per-seat fees that punish adoption, no usage-based billing that recreates the unpredictability problem we're solving.
5. **Open-Source Proxy Core.** The routing engine is open-source and self-hostable. The SaaS monetizes the intelligence layer (dashboard, analytics, digest, recommendations). Trust through transparency.
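
The attribution-first design reduces to a rollup: flat request logs aggregated into a team → feature → model cost hierarchy, which is exactly the data a treemap renders. A minimal sketch, assuming a hypothetical per-request log schema:

```python
# Sketch of the rollup behind the attribution treemap: flatten request logs
# into nested team -> feature -> model cost totals. The record schema is an
# illustrative assumption, not the product's actual data model.
def rollup(requests):
    tree = {}
    for r in requests:
        node = tree.setdefault(r["team"], {}).setdefault(r["feature"], {})
        node[r["model"]] = node.get(r["model"], 0.0) + r["cost_usd"]
    return tree

requests = [
    {"team": "backend", "feature": "summarizer", "model": "gpt-4o", "cost_usd": 300.0},
    {"team": "backend", "feature": "summarizer", "model": "gpt-4o", "cost_usd": 100.0},
    {"team": "growth",  "feature": "chatbot",    "model": "gpt-4o-mini", "cost_usd": 40.0},
]
tree = rollup(requests)
assert tree["backend"]["summarizer"]["gpt-4o"] == 400.0  # "Team Backend spent $400 on the summarizer"
```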

---

## 2. MARKET OPPORTUNITY

### TAM / SAM / SOM

| Metric | Value | Basis |
|--------|-------|-------|
| **TAM** | $36.1B by 2030 | Global LLM market (Straits Research). Inference costs are the fastest-growing segment. |
| **SAM** | ~$5.4B | LLM API spend by companies with $1K–$100K/month bills — the segment where third-party cost optimization is viable (not too small to care, not large enough to build in-house). Estimated at ~15% of TAM. |
| **SOM (Year 1)** | $0.2M–$1.4M ARR | 300–600 paying customers at $49–$199/month. Achievable via PLG in the Series A–B SaaS beachhead. |

The FinOps Foundation's 2026 report identifies AI workload cost management as the #1 emerging challenge. Cloud FinOps is a mature $3B+ category; AI FinOps is its greenfield successor with no dominant player.

### Competitive Landscape

| Competitor | Positioning | Strengths | Weaknesses | dd0c/route Advantage |
|-----------|-------------|-----------|------------|---------------------|
| **LiteLLM** | Open-source LLM proxy framework | 15K+ GitHub stars, broad model support (1,600+), active community | No intelligence layer, no attribution dashboard, no SaaS product — it's a framework, not a solution | Product completeness: proxy + dashboard + digest + attribution |
| **Portkey** | Enterprise AI gateway | $3M funding, enterprise features, broad provider support | Enterprise sales motion, complex setup, overkill for small teams | 5-minute PLG setup vs. enterprise procurement cycle |
| **Helicone** | LLM observability platform | YC-backed, strong developer brand, good logging/tracing | Observability-focused (what happened), not optimization-focused (what to do). No intelligent routing. | Attribution + routing + actionable recommendations vs. passive logging |
| **Martian** | AI model router | Smart routing technology, usage-based pricing | Opaque pricing, routing-only (no dashboard/attribution), limited transparency | Transparent routing + full cost attribution dashboard |
| **OpenRouter** | Multi-model API gateway | Simple unified API, broad model access | 5% markup on all requests, no cost optimization intelligence, no attribution | Flat pricing + intelligent routing that reduces spend |

### Timing Thesis

Three converging forces make Q1 2026 the optimal launch window:

1. **The "AI in Production" Transition.** Companies are moving from experimentation to production deployment. Production AI costs are operational expenses that demand optimization tooling — creating the "tooling gap" dd0c/route fills.
2. **Multi-Model Reality.** The era of "just use OpenAI" is ending. Teams now use OpenAI + Anthropic + Google + open-source models. Multi-provider complexity creates demand for a unified routing and attribution layer.
3. **Agentic AI Volume Explosion.** Agentic workflows make 10–100x more API calls than simple chat. Even as per-token prices drop, total spend increases. The bill isn't going away — it's shifting from "expensive models" to "massive volume."

### Market Trends

- LLM inference costs dropped ~90% in 2024–2025, but total enterprise AI spend increased 3x due to volume growth
- "AI FinOps" is an emerging category with no category leader — the FinOps discipline is expanding from cloud infrastructure to AI workloads
- Developer tooling is consolidating around PLG motions — enterprise sales cycles are shortening for sub-$500/month tools
- Open-source AI infrastructure (LiteLLM, vLLM, Ollama) has normalized the concept of proxy layers between applications and LLM providers

---

## 3. PRODUCT DEFINITION

### Core Value Proposition

> Change one environment variable. See where every AI dollar goes. Start saving automatically.

dd0c/route transforms AI cost management from a monthly guessing game into a real-time, automated discipline. It's the "Linear for AI FinOps" — fast, opinionated, and built for practitioners, not procurement committees.

### User Personas

**Persona 1: Priya Sharma — The ML Engineer (Age 29, Series B fintech)**

- Defaults to GPT-4o for everything because benchmarking alternatives takes days she doesn't have
- Feels guilty about costs but isn't empowered to fix them — no per-call cost visibility exists
- Needs: automatic model selection without workflow disruption, cost feedback at the code level
- dd0c/route value: "I keep writing `model='gpt-4o'` and the router quietly downgrades when it's safe. I stopped feeling guilty."

**Persona 2: Marcus Chen — The Engineering Manager (Age 36, same fintech)**

- Gets one opaque bill from OpenAI ("$14,000") with zero breakdown by feature, team, or environment
- Spends 3–4 hours monthly on manual spreadsheet reconciliation across providers
- Presents estimated pie charts to the CFO and feels like a fraud
- dd0c/route value: "The attribution treemap IS my slide deck. Monday morning digest goes straight to the CFO."

**Persona 3: Jordan Okafor — The Platform Engineer (Age 32, mid-stage SaaS)**

- Maintains a hand-rolled Node.js LLM proxy that started as 200 lines and grew to 2,000
- Gets paged when the proxy breaks; paranoid about it being a single point of failure
- Wants a Helm chart, OTel export, and config-as-code — then to never think about it again
- dd0c/route value: "I deployed it with Helm, pointed the env var, and went back to my actual job."

### Key Features by Release

#### MVP (Month 1–3)

- OpenAI-compatible proxy (Rust, <10ms overhead at p99)
- Rule-based routing with heuristic complexity classifier (token count + keyword patterns)
- Cascading try-cheap-first routing (cheap model → escalate on low confidence)
- Cost attribution dashboard: real-time ticker, treemap by feature/team/model
- Request inspector (tokens, cost, latency, routing decision per call)
- Weekly Savings Digest email (automated Monday morning)
- Budget guardrails with threshold-based anomaly alerts (Slack integration)
- OpenAI + Anthropic support only
- SaaS-hosted proxy
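
The MVP's heuristic classifier and try-cheap-first cascade can be sketched concretely. This is a minimal illustration of the mechanism, not the product's implementation: the thresholds, keyword list, confidence signal, and model names are all assumptions.

```python
# Sketch of the MVP routing heuristic: token count plus keyword patterns pick
# a tier, and a low-confidence answer from the cheap model escalates to the
# frontier model. All thresholds and keywords are illustrative assumptions.
CHEAP, FRONTIER = "gpt-4o-mini", "gpt-4o"
SIMPLE_HINTS = ("extract", "classify", "format", "slug", "json")

def classify(prompt: str) -> str:
    short = len(prompt.split()) < 200            # crude proxy for token count
    looks_simple = any(h in prompt.lower() for h in SIMPLE_HINTS)
    return CHEAP if (short and looks_simple) else FRONTIER

def route(prompt, call_model, confidence_threshold=0.7):
    """Try-cheap-first cascade: escalate when the cheap model reports low confidence."""
    model = classify(prompt)
    answer, confidence = call_model(model, prompt)
    if model == CHEAP and confidence < confidence_threshold:
        model = FRONTIER
        answer, confidence = call_model(model, prompt)
    return model, answer

def fake_llm(model, prompt):                     # stand-in for a provider call
    return f"{model}-answer", (0.9 if model == FRONTIER else 0.5)

chosen, _ = route("Extract the invoice date as JSON", fake_llm)
assert chosen == FRONTIER   # cheap attempt escalated on low confidence
```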

#### V2 (Month 4–6)

- Self-hosted data plane (Rust proxy in customer VPC, only telemetry to SaaS)
- Semantic response cache (exact-match V1, semantic similarity V2)
- A/B model testing (split traffic, measure cost/quality/latency, recommend winner)
- OTel export (Datadog, Grafana, Honeycomb integration)
- Google Gemini + Mistral provider support
- Quality threshold profiles ("customer-facing" vs. "internal-tool" vs. "batch-job")
- Prompt efficiency heatmap and optimization recommendations
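
The exact-match form of the response cache (the V1 stage; semantic similarity comes later) is simple enough to sketch: key on a hash of model plus prompt and skip the API call on a repeat. A minimal illustration under those assumptions:

```python
# Sketch of the exact-match response cache: identical (model, prompt) pairs
# are served from memory instead of hitting the provider again. A real cache
# would add TTLs and eviction; this shows only the mechanism.
import hashlib

class ExactMatchCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call):
        k = self._key(model, prompt)
        if k in self._store:
            self.hits += 1            # cost of this request: $0
            return self._store[k]
        self._store[k] = call(model, prompt)
        return self._store[k]

cache = ExactMatchCache()
api_calls = []
def fake_call(model, prompt):         # stand-in for a provider call
    api_calls.append(prompt)
    return f"answer({prompt})"

cache.get_or_call("gpt-4o-mini", "ping", fake_call)
cache.get_or_call("gpt-4o-mini", "ping", fake_call)   # served from cache
assert len(api_calls) == 1 and cache.hits == 1
```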

#### V3 (Month 7–12)

- ML-based complexity classifier (trained on routing data flywheel)
- GitHub Action: cost impact comments on PRs
- Spend forecasting with confidence intervals
- VS Code extension with inline cost annotations
- SOC 2 Type II certification
- Enterprise features: SSO, RBAC, role-based dashboard views
- Model distillation recommendations ("hosting Llama 3 on a $2K/month GPU would save you $8K/month")

### User Journey

```
AWARENESS                 ACTIVATION                RETENTION                   EXPANSION
─────────────────────────────────────────────────────────────────────────────────────────

npx dd0c-scan ./src       Change OPENAI_BASE_URL    Weekly Savings Digest       Team-wide rollout
         ↓                          ↓                         ↓                         ↓
"You're wasting $4K/mo"   First request routed      Marcus forwards to CFO      Budget guardrails
         ↓                          ↓                         ↓                         ↓
Show HN / blog post       Dashboard shows savings   Routing rules refined       Pro → Business tier
         ↓                          ↓                         ↓                         ↓
Free signup               "Aha" in <5 minutes       Attribution data compounds  dd0c/cost cross-sell
```

### Pricing Model

| Tier | Price | Includes | Target |
|------|-------|----------|--------|
| **Free** | $0/month | Up to $500/month LLM spend routed, basic dashboard, 7-day data retention | Individual devs, evaluation |
| **Pro** | $49/month | Up to $15K/month LLM spend, full attribution treemap, weekly digest, 90-day retention, Slack alerts | Series A–B startups (beachhead) |
| **Business** | $199/month | Unlimited spend, self-hosted proxy option, OTel export, RBAC, 1-year retention, priority support | Growth-stage companies |
| **Enterprise** | Custom | SSO, SOC 2 compliance, dedicated support, SLA, custom integrations | Large organizations (V3+) |

**Pricing rationale (resolving Party Mode debate):** The Bootstrap Founder panelist argued for $49 flat; the VC argued for enterprise contracts. Resolution: $49 Pro tier captures the beachhead via expense-report purchases. $199 Business tier captures expansion revenue as teams grow. Enterprise tier deferred to V3 — closing enterprise deals takes 9 months and requires SOC 2, which a solo founder can't prioritize in Year 1. The Contrarian's suggestion to charge $99 for pure analytics (no proxy) is captured in the Free tier's shadow audit mode — prove value first, convert to routing later.

---

## 4. GO-TO-MARKET PLAN

### Launch Strategy

**Phase 1: Engineering-as-Marketing (Days 1–30)**

- Build and ship `npx dd0c-scan` — the CLI that scans a codebase, estimates LLM spend, and shows savings potential. No account needed. No data leaves the machine. This is the top-of-funnel viral tool.
- Dogfood dd0c/route on Brian's own projects. If the founder doesn't use it daily, it's not ready.

**Phase 2: Private Beta (Days 31–60)**

- Invite 10–20 people from Brian's network: AWS colleagues, startup CTO friends, Twitter mutuals.
- Free access in exchange for 15 minutes of weekly feedback.
- Track: time to first route, first "aha" moment, first complaint.
- Milestone: 5+ beta users who say "I would pay for this" unprompted.

**Phase 3: Public Launch (Days 61–90)**

- Show HN post (Tuesday/Wednesday morning US time — highest traffic days)
- First comment: technical architecture, honest limitations, roadmap
- Simultaneous posts: Twitter/X, Reddit (r/MachineLearning, r/devops), relevant Slack communities
- "Why I Built dd0c/route" blog post (personal story, technical architecture, honest tradeoffs)
- "State of AI Costs Q1 2026" report (anonymized data from beta users)
- Target: 500+ signups in week 1, 10–20 paying customers by day 90

### Beachhead Market

Series A–B SaaS startups in the US, spending $2K–$15K/month on LLM APIs, with 10–50 engineers. Specifically:

- Companies building AI-powered features (chatbots, summarization, code review, RAG pipelines)
- No dedicated ML infrastructure team — the platform engineer or CTO manages LLM infrastructure as a side responsibility
- CTO/VP Eng can approve $49/month without procurement
- Active in developer communities (Hacker News, Twitter/X, Discord, Slack groups)

Estimated beachhead size: 5,000–10,000 companies in the US alone.

### Growth Loops & Viral Mechanics

1. **The Savings Digest Loop:** dd0c/route sends a Monday morning email → Marcus (eng manager) sees "$1,847 saved this week" → forwards to CFO → CFO mandates team-wide adoption → more teams onboard → more savings → bigger digest number → more forwards.
2. **The Shadow Audit Loop:** Developer runs `npx dd0c-scan` → sees "$4,200/month wasted" → shares screenshot on Twitter/Slack → other developers try it → some convert to paid.
3. **The "You Could Have Saved" Loop:** Free tier users see a persistent counter: "Estimated savings if you'd used dd0c routing: $X" → the number grows daily → conversion pressure increases naturally.
4. **The Open-Source Loop:** OSS proxy gets GitHub stars → developers discover the project → some self-host (free marketing) → power users convert to SaaS for the dashboard/digest/analytics.
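
The "You Could Have Saved" counter in loop 3 is a fold over observed traffic: for each request, the foregone saving is the actual cost minus the cost of the cheapest model the classifier judged adequate. A minimal sketch; the per-million-token prices and log shape are illustrative assumptions:

```python
# Sketch of the free-tier "you could have saved" counter. For each logged
# request: foregone saving = cost at the model used - cost at the cheapest
# adequate model. Prices per 1M input tokens are illustrative assumptions.
PRICE_PER_M = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

def foregone_savings(log):
    """log rows: (tokens, model_used, cheapest_adequate_model)."""
    total = 0.0
    for tokens, used, adequate in log:
        total += tokens / 1e6 * (PRICE_PER_M[used] - PRICE_PER_M[adequate])
    return round(total, 2)

log = [
    (400_000, "gpt-4o", "gpt-4o-mini"),  # over-qualified call: money left on the table
    (250_000, "gpt-4o", "gpt-4o"),       # frontier model genuinely needed: $0 foregone
]
print(foregone_savings(log))  # → 0.94
```

Because the counter only ever grows, it doubles as the conversion nudge the loop describes.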

### Content & Community Strategy

- **Weekly newsletter:** "This Week in AI Pricing" — model price changes, benchmark updates, cost optimization tips
- **Monthly report:** "State of AI Costs" — anonymized aggregate data from dd0c/route users. Becomes the industry reference.
- **SEO targets:** High-intent, low-competition keywords first ("LiteLLM alternative," "reduce OpenAI costs," "LLM cost attribution")
- **Guest posts:** The New Stack, Dev.to, InfoQ — backlinks + immediate traffic while SEO compounds
- **Community:** Discord server for users. The best first hire will come from this community.

### Partnership Opportunities

- **Framework integrations:** Official LangChain / LlamaIndex / Vercel AI SDK partner — "recommended cost optimization tool"
- **Cloud marketplaces:** AWS Marketplace listing (Brian's AWS expertise is an unfair advantage here)
- **FinOps community:** FinOps Foundation membership, conference talks, co-authored reports
- **Complementary tools:** Integrate with Datadog, Grafana, PagerDuty — be the AI cost data source that feeds existing observability stacks

### 90-Day Launch Timeline

| Week | Focus | Deliverable |
|------|-------|-------------|
| 1–2 | Build proxy | Working Rust proxy, OpenAI + Anthropic, <10ms overhead |
| 2–3 | Build dashboard | Cost overview, treemap, request inspector |
| 3–4 | Build digest | Automated Monday email with savings breakdown |
| 5–6 | Private beta | 10–20 users routing traffic, collecting feedback |
| 6–7 | Build CLI | `npx dd0c-scan` — the viral top-of-funnel tool |
| 7–8 | Iterate | Fix top 3 complaints, polish onboarding to <5 min |
| 9 | Pre-launch content | Blog post, AI costs report, Show HN draft, landing page |
| 10 | Show HN launch | All-day in comments. Simultaneous Twitter/Reddit/Slack |
| 11–12 | Post-launch | Analyze funnel, fix biggest drop-off, reach out to every paying customer |

---

## 5. BUSINESS MODEL

### Revenue Model & Unit Economics

| Metric | Value | Notes |
|--------|-------|-------|
| **Average Revenue Per Account (ARPA)** | $75/month (blended) | Mix of $49 Pro and $199 Business customers |
| **Gross Margin** | ~85% | Infrastructure cost is minimal — proxy + ClickHouse + API on AWS, ~$150/month total at scale |
| **Monthly infrastructure cost** | $65–$185/month | AWS (proxy + API + analytics), email (Resend), analytics (PostHog free tier) |
| **Marginal cost per customer** | ~$0.50–$2/month | Proxy compute + telemetry storage. Near-zero marginal cost. |

### CAC / LTV Projections

| Metric | Target | Basis |
|--------|--------|-------|
| **CAC (organic/PLG)** | <$50 | Content marketing + Show HN + CLI virality. No paid ads in Year 1. |
| **Average customer lifetime** | 10+ months | Weekly digest drives retention; savings are visible and ongoing |
| **LTV** | >$750 | $75 ARPA × 10 months |
| **LTV:CAC ratio** | >15:1 | Best-in-class for PLG SaaS |
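
The targets above are internally consistent, which a two-line arithmetic check makes explicit (LTV is ARPA times lifetime, and the ratio follows from the organic CAC ceiling):

```python
# Sanity-check of the unit-economics targets stated in the table above.
arpa = 75        # blended $/month
lifetime = 10    # months
cac = 50         # organic CAC ceiling, $

ltv = arpa * lifetime
assert ltv == 750               # matches the >$750 LTV target
assert ltv / cac == 15.0        # matches the >15:1 LTV:CAC claim at the CAC ceiling
```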

### Path to Revenue Milestones

| Milestone | Customers Needed | Timeline | What It Means |
|-----------|-----------------|----------|---------------|
| **$1K MRR** | ~20 Pro | Month 3–4 | Product-market fit signal |
| **$5K MRR** | ~80 Pro + 5 Business | Month 6–9 | Sustainable side project. "Should I keep going?" → Yes. |
| **$10K MRR** | ~165 Pro + 10 Business | Month 9–12 | "Should I quit my day job?" territory |
| **$25K MRR** | ~330 Pro + 45 Business | Month 12–18 | Quit the day job. This is a business. |
| **$50K MRR** | ~650 Pro + 90 Business | Month 18–24 | Hire first engineer. |
| **$100K MRR** | ~1,300 Pro + 180 Business | Month 24–36 | Series A optionality (or stay bootstrapped and profitable) |

### Resource Requirements (Solo Founder Constraints)

**Time budget:** 15 hours/week maximum until $5K MRR. This is a side project until the numbers say otherwise.

| Phase | Product Dev | Content | Community | Customer |
|-------|------------|---------|-----------|----------|
| Months 1–3 | 10 hrs (67%) | 3 hrs (20%) | 1.5 hrs (10%) | 0.5 hrs (3%) |
| Months 4–6 | 7 hrs (47%) | 4 hrs (27%) | 2 hrs (13%) | 2 hrs (13%) |
| Months 7–12 | 5 hrs (33%) | 4 hrs (27%) | 3 hrs (20%) | 3 hrs (20%) |

**Infrastructure budget:** $65–$185/month. Brian's AWS expertise keeps this minimal. The burn rate is essentially zero — patience is a competitive advantage funded startups don't have.

### Key Assumptions & Dependencies

1. **Engineers will route production traffic through a third-party proxy if savings are visible, immediate, and undeniable.** This is the core bet. Probability: 60/40 favorable.
2. **The cost delta between "expensive" and "cheap" models persists.** Frontier models will always command premium pricing; the spread between frontier and commodity persists even as absolute prices drop.
3. **Agentic AI drives volume growth that offsets per-token price declines.** Total LLM spend continues to increase even as unit costs decrease.
4. **PLG distribution works for this category.** The $49 price point and 5-minute setup enable self-serve adoption without a sales team.
5. **Brian can sustain 15 hours/week for 9–12 months.** The discipline of time-boxing is critical to avoiding burnout.

---

## 6. RISKS & MITIGATIONS

### Top 5 Risks

| # | Risk | Severity | Probability | Source |
|---|------|----------|-------------|--------|
| 1 | **OpenAI builds native smart routing** — "Smart Tier" that auto-routes within their models | 8/10 | Medium | VC + Innovation Strategy |
| 2 | **Trust barrier blocks adoption** — Security/compliance teams refuse to route prompts through a startup's proxy | 9/10 | Medium-High | CTO + DevOps panelists |
| 3 | **LLM price race-to-zero** — Cost delta between models shrinks to the point where optimization saves <$100/month | 8/10 | Low-Medium | Contrarian panelist |
| 4 | **Solo founder burnout** — 15 hrs/week + day job + support burden exceeds sustainable capacity | 7/10 | Medium | Bootstrap Founder panelist |
| 5 | **Well-funded competitor copies features** — Helicone/Portkey builds Shadow Audit + Attribution Treemap with a 10-engineer team | 6/10 | Medium | VC panelist |
|
||||
|
||||
**Mitigations:**
|
||||
|
||||
1. **OpenAI routing:** OpenAI's incentive is to sell the MOST expensive model, not the cheapest — smart routing cannibalizes their revenue. Even if they add it, dd0c/route routes ACROSS providers (OpenAI won't route you to Anthropic). Worst case: pivot to pure "AI FinOps analytics" — the attribution dashboard is valuable even without the proxy.

2. **Trust barrier:** V1 accepts this limitation — stick to the beachhead (startups without compliance teams). V1.5 (month 4–5): self-hosted data plane where the Rust proxy runs in the customer's VPC and only telemetry leaves their environment. Open-source the proxy core so customers can read every line of code. *Resolution note: The Party Mode CTO demanded VPC-deployable from Day 1. The Bootstrap Founder argued SaaS-only for V1 to reduce scope. Resolution: SaaS-only V1 for the beachhead, self-hosted V1.5 for expansion. The beachhead doesn't have compliance teams; the expansion market does.*

3. **Price race-to-zero:** Reposition from "use cheaper models" to "optimize AI spend" — framing survives price changes. Build semantic caching (saves money regardless of per-token pricing). Build prompt optimization features ("your average prompt is 40% longer than necessary"). The attribution dashboard remains valuable even if tokens cost a penny — "who is running these 100K token prompts?" is a latency and efficiency question, not just a cost question.

4. **Solo founder burnout:** Hard rule: no more than 15 hours/week until $5K MRR. Automate everything — zero-ops proxy, static dashboard, Discord community for support (not a ticket queue). The $5K MRR milestone is the "quit or don't" decision point. Build in public — the community becomes unpaid QA, feature prioritization, and emotional support.

5. **Competitor copies:** Rely on GTM speed, not feature moats. If Portkey builds a treemap but still requires a 30-minute sales call, Brian wins the developer who just wants to run `npx dd0c-scan` on a Saturday night. Double down on PLG friction advantage and community trust. Incumbents move slower than expected and over-complicate simple features.

### Kill Criteria

| Criterion | Threshold | Timeline |
|-----------|-----------|----------|
| No product-market fit signal | <50 free signups after Show HN launch | Month 1 |
| No conversion | <5 paying customers after 3 months of availability | Month 4 |
| Revenue plateau | <$2K MRR after 6 months | Month 7 |
| Churn exceeds growth | Net revenue retention <80% for 3 consecutive months | Month 6+ |
| Existential competitor launches | OpenAI/AWS launches free native routing covering 80%+ of dd0c/route's value | Any time |
| Burnout | >20 hrs/week AND below $5K MRR AND affecting day job/health | Month 6+ |
| Market thesis invalidated | Optimization saves <$100/month for the average customer | Any time |

**Walk-away rule:** If 2+ kill criteria are met simultaneously, stop. Not pivot. Stop. Pivoting a side project is how founders waste years.

**Exception:** If qualitative signals are strong (NPS >50, organic word-of-mouth) but quantitative metrics are below threshold, extend by 3 months.

### Pivot Options

1. **Pure AI FinOps Analytics (no proxy):** Ingest existing logs, provide attribution dashboard and CFO reports. Removes latency risk, proxy trust barrier, and API maintenance burden. Charge $99/month. (The Contrarian's recommendation.)
2. **Open-source everything, monetize consulting:** If the SaaS doesn't convert, release the full product as OSS and sell implementation consulting to enterprises at $200–$400/hour.
3. **Vertical specialization:** Instead of horizontal "all AI costs," specialize in one vertical (e.g., "AI cost optimization for healthcare" with HIPAA compliance built in). Smaller market, higher willingness to pay.

---

## 7. SUCCESS METRICS

### North Star Metric

**Monthly Recurring Revenue (MRR).** Everything else is a leading indicator. MRR is the truth.

### Leading Indicators (Track Weekly)

| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Signups | 50+/week post-launch | Top of funnel health |
| Activation rate (signup → first routed request) | >40% | Onboarding quality |
| Time to first route | <5 minutes median | The core adoption thesis |
| Weekly active routers | Growing 10%+ week-over-week | Product engagement |
| Savings per customer per month | >$100 average | Value delivery (must exceed subscription cost) |
| Digest email open rate | >50% | Retention mechanism health |
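The value-delivery target implies a simple break-even check: routed savings must clear the subscription price. A back-of-envelope sketch (the token volume, per-1K prices, and routed share below are illustrative assumptions, not measured dd0c/route data):

```python
def monthly_savings(tokens_per_month: int, frontier_per_1k: float,
                    commodity_per_1k: float, routed_share: float) -> float:
    """Estimated savings from routing a share of traffic to a cheaper model."""
    delta = frontier_per_1k - commodity_per_1k   # price gap per 1K tokens
    return (tokens_per_month / 1000) * routed_share * delta

# Hypothetical customer: 20M tokens/month, $0.03 vs $0.002 per 1K tokens,
# 40% of traffic safely routable to the commodity model.
savings = monthly_savings(20_000_000, 0.03, 0.002, 0.40)
print(f"${savings:,.0f}/month saved")  # prints "$224/month saved", well above $49
```

Even at a conservative routed share, the frontier-to-commodity price gap makes the >$100/month average plausible for mid-volume customers.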

### Lagging Indicators (Track Monthly)

| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Logo churn | <5%/month | Retention health |
| Revenue churn | <3%/month | Revenue health (expansion offsets logo churn) |
| CAC | <$50 (organic) | Acquisition efficiency |
| LTV | >$500 (10+ month lifetime) | Business viability |
| LTV:CAC ratio | >10:1 | PLG efficiency |
| Net revenue retention | >100% | Expansion > churn |
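These targets are mutually consistent. Under the standard simplification that expected customer lifetime is 1/churn, the LTV and LTV:CAC figures follow directly from the $49 Pro price and the churn ceiling (a sketch assuming constant monthly churn and no expansion revenue):

```python
price = 49.0            # Pro plan, $/month, from this brief's pricing
monthly_churn = 0.05    # logo churn target ceiling: <5%/month
cac = 50.0              # organic CAC target

lifetime_months = 1 / monthly_churn   # 20 months at the churn ceiling
ltv = price * lifetime_months         # $980, clears the $500 target
ratio = ltv / cac                     # 19.6:1, clears the 10:1 target

print(f"lifetime = {lifetime_months:.0f} mo, LTV = ${ltv:,.0f}, LTV:CAC = {ratio:.1f}:1")
```

At the tighter 3% revenue-churn target the implied lifetime stretches past 33 months, which is why net revenue retention above 100% compounds so strongly for this model.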

### Milestones

**30 Days:**

- Working proxy + dashboard deployed and dogfooded on Brian's own projects
- `npx dd0c-scan` CLI shipped and tested
- 10–20 private beta users routing traffic

**60 Days:**

- 5+ beta users who would pay unprompted
- Top 3 beta complaints fixed
- Onboarding polished to <5 minutes
- All launch content written

**90 Days:**

- Show HN launched
- 500+ signups
- 10–20 paying customers
- $1K–$2K MRR
- Clear understanding of who converts and why

**Month 6:**

- $5K MRR (80 Pro + 5 Business customers)
- Self-hosted proxy option shipped (V1.5)
- Weekly newsletter established with growing subscriber base
- Content/SEO generating measurable organic traffic
- Data flywheel showing early signs (routing accuracy improving with scale)

**Month 12:**

- $10K–$15K MRR (200 Pro + 20 Business customers)
- Decision point: go full-time or maintain as profitable side project
- OTel integration, A/B testing, and semantic caching shipped
- "State of AI Costs" report established as industry reference
- Community of 500+ developers in Discord

---

## APPENDIX: The Unfair Bet

Every startup has one core belief that, if true, makes everything else work. If false, nothing else matters.

> **"Engineering teams will route production LLM traffic through a third-party proxy if the savings are visible, immediate, and undeniable."**

**Assessment: 60/40 favorable.**

Evidence FOR: Cloudflare/CDNs normalized third-party traffic routing. LiteLLM's 15K+ stars prove developers accept proxy layers. 30–50% savings are a powerful motivator. Multi-model usage makes a routing layer increasingly necessary.

Evidence AGAINST: LLM prompts contain proprietary data — more sensitive than typical web traffic. Security teams are increasingly paranoid about AI data flows. "Just use the cheap model" is free and requires zero trust.

**The mitigation that tips the odds:** The open-source, self-hosted proxy. If it runs in the customer's VPC and only telemetry leaves their environment, the trust barrier drops dramatically.

---

*This brief synthesizes insights from four BMad Creative Intelligence Suite phases: Brainstorming (Carson), Design Thinking (Maya), Innovation Strategy (Victor), and Party Mode Advisory Board Review. Contradictions between phases have been noted and resolved inline.*

*The LLM cost optimization market will produce a $100M+ company in the next 5 years. The question is whether a bootstrapped solo founder can capture enough of that market to build a meaningful business before funded players consolidate. This brief argues yes — if Brian moves fast, stays focused, and lets the savings numbers do the selling.*