# dd0c/route — Technical Architecture **Product:** dd0c/route — LLM Cost Router & Optimization Dashboard **Author:** Architecture Phase (BMad Phase 6) **Date:** February 28, 2026 **Status:** V1 MVP Architecture — Solo Founder Scope --- ## Section 1: SYSTEM OVERVIEW ### 1.1 High-Level Architecture ```mermaid graph TB subgraph Clients["Client Applications"] APP1[App Service A] APP2[App Service B] CLI[dd0c-scan CLI] end subgraph DD0C["dd0c/route Platform (AWS us-east-1)"] subgraph ProxyTier["Proxy Tier (ECS Fargate)"] PROXY1[Rust Proxy Instance 1] PROXY2[Rust Proxy Instance N] end subgraph ControlPlane["Control Plane (ECS Fargate)"] API[Dashboard API
Axum/Rust] WORKER[Async Worker
Digest + Alerts] end subgraph DataTier["Data Tier"] PG[(PostgreSQL RDS
Config + Auth)] TS[(TimescaleDB RDS
Request Telemetry)] REDIS[(ElastiCache Redis
Rate Limits + Cache)] end end subgraph Providers["LLM Providers"] OAI[OpenAI API] ANT[Anthropic API] end subgraph External["External Services"] GH[GitHub OAuth] SES[AWS SES
Digest Emails] SLACK[Slack Webhooks] end APP1 -->|HTTPS / OpenAI-compat| PROXY1 APP2 -->|HTTPS / OpenAI-compat| PROXY2 PROXY1 --> OAI PROXY1 --> ANT PROXY2 --> OAI PROXY2 --> ANT PROXY1 -->|async telemetry| TS PROXY2 -->|async telemetry| TS PROXY1 --> REDIS PROXY2 --> REDIS API --> PG API --> TS WORKER --> TS WORKER --> SES WORKER --> SLACK CLI -->|log analysis| APP1 ``` ### 1.2 Component Inventory | Component | Language/Runtime | Responsibility | Criticality | |-----------|-----------------|----------------|-------------| | **Proxy Engine** | Rust (tokio + hyper) | Request interception, complexity classification, model routing, response passthrough, telemetry emission | P0 — the product IS this | | **Router Brain** | Rust (embedded in proxy) | Rule evaluation, cost table lookups, fallback chain execution, cascading try-cheap-first logic | P0 — routing decisions | | **Dashboard API** | Rust (axum) | REST API for dashboard UI, config management, auth, org/team CRUD | P0 — the "aha moment" | | **Dashboard UI** | TypeScript (React + Vite) | Cost treemap, request inspector, routing config editor, real-time ticker | P0 — what Marcus sees | | **Async Worker** | Rust (tokio-cron) | Weekly digest generation, anomaly detection (threshold-based), alert dispatch | P1 — retention mechanism | | **PostgreSQL** | AWS RDS (db.t4g.micro) | Organizations, API keys, routing rules, user accounts | P0 — config store | | **TimescaleDB** | AWS RDS (db.t4g.small) | Request telemetry, cost events, token counts — time-series optimized | P0 — analytics backbone | | **Redis** | AWS ElastiCache (t4g.micro) | Rate limiting, exact-match response cache, session tokens | P1 — performance layer | ### 1.3 Technology Choices & Justification | Choice | Alternative Considered | Why This One | |--------|----------------------|--------------| | **Rust (proxy)** | Go, Node.js | <10ms p99 overhead is non-negotiable. Rust's zero-cost abstractions and tokio async runtime give us predictable tail latency. 
Go would add GC pauses. Node.js adds event loop overhead. Portkey's 20-40ms overhead in Node.js is the cautionary tale. | | **Rust (API)** | Node.js (Express), Python (FastAPI) | Single language across the stack reduces cognitive overhead for a solo founder. Axum is production-ready and shares the tokio runtime. One `cargo build` produces the proxy AND the API. | | **TimescaleDB** | ClickHouse, plain PostgreSQL | TimescaleDB is PostgreSQL with time-series superpowers — hypertables, continuous aggregates, compression. Brian already knows PostgreSQL. ClickHouse is faster for analytics but adds operational complexity (separate cluster, different query language, different backup strategy). For a solo founder, "it's just Postgres" wins. Continuous aggregates handle the dashboard rollups. Compression handles storage costs. | | **PostgreSQL (config)** | SQLite, DynamoDB | RDS PostgreSQL is Brian's home turf (AWS architect). Managed backups, failover, IAM auth. DynamoDB would work but adds a second data model to reason about. SQLite doesn't scale past a single instance. | | **Redis (cache)** | In-process LRU, DynamoDB DAX | Shared cache across proxy instances for exact-match response dedup. ElastiCache is managed, cheap at t4g.micro ($0.016/hr). In-process cache doesn't share across instances. | | **React + Vite (UI)** | Next.js, SvelteKit, HTMX | React has the largest hiring pool if Brian ever hires. Vite is fast. The dashboard is a SPA — no SSR needed, no SEO needed. Keep it simple. | | **AWS SES (email)** | Resend, SendGrid | Brian has AWS credits and expertise. SES is $0.10/1000 emails. The digest email is plain HTML — no fancy template engine needed. | | **GitHub OAuth** | Auth0, Clerk, email/password | One-click signup for the developer audience. No password management burden. GitHub is where the users live. Implemented via `oauth2` Rust crate — ~200 lines of code. | ### 1.4 Deployment Model **V1: Containerized services on ECS Fargate. Not Lambda. 
Not a single binary.** Rationale: - **Why not Lambda:** The proxy needs persistent connections to LLM providers (connection pooling, keep-alive). Lambda cold starts (100-500ms) violate the <10ms latency budget. Lambda's 15-minute timeout conflicts with streaming responses. Lambda per-invocation pricing gets expensive at 100K+ requests/day. - **Why not single binary:** The proxy and the dashboard API have different scaling profiles. The proxy scales horizontally with request volume. The API scales with dashboard users (much lower). Coupling them wastes money. - **Why ECS Fargate:** No EC2 instances to manage. Auto-scaling built in. Brian knows ECS. Task definitions are the deployment unit. ALB handles TLS termination and health checks. **Container topology:** | Service | Container | vCPU | Memory | Min Instances | Auto-Scale Trigger | |---------|-----------|------|--------|---------------|-------------------| | Proxy | `dd0c-proxy` | 0.25 | 512MB | 2 | CPU > 60% or request count | | Dashboard API | `dd0c-api` | 0.25 | 512MB | 1 | CPU > 70% | | Async Worker | `dd0c-worker` | 0.25 | 512MB | 1 | None (singleton) | | Dashboard UI | S3 + CloudFront | — | — | — | CDN-managed | **Build artifact:** `docker build` produces three images from a single Rust workspace (`cargo workspace`). The UI is a static build deployed to S3/CloudFront. ``` dd0c-route/ ├── Cargo.toml (workspace root) ├── crates/ │ ├── proxy/ (the proxy engine + router brain) │ ├── api/ (dashboard REST API) │ ├── worker/ (digest + alerts) │ └── shared/ (models, DB queries, cost tables) ├── ui/ (React dashboard) ├── cli/ (dd0c-scan — separate npm package) └── infra/ (CDK or Terraform) ``` --- ## Section 2: CORE COMPONENTS ### 2.1 Proxy Engine (Rust — `crates/proxy`) The proxy is the hot path. Every design decision optimizes for one thing: don't add latency. **Request lifecycle:** ``` Client Request (OpenAI-compat) │ ├─ 1. TLS termination (ALB — not our problem) ├─ 2. 
Auth validation (API key lookup — Redis cache, PG fallback) ........... <1ms ├─ 3. Request parsing (extract model, messages, metadata) ................... <0.5ms ├─ 4. Tag extraction (X-DD0C-Feature, X-DD0C-Team headers) ................. <0.1ms ├─ 5. Router Brain evaluation (complexity + rules → target model) ........... <2ms ├─ 6. Provider dispatch (connection-pooled HTTPS to OpenAI/Anthropic) ....... network ├─ 7. Response passthrough (streaming SSE or buffered JSON) ................. passthrough ├─ 8. Telemetry emission (async, non-blocking — tokio::spawn) ............... 0ms on hot path └─ 9. Response headers injected (X-DD0C-Model, X-DD0C-Cost, X-DD0C-Saved) ``` **Latency budget breakdown:** | Stage | Budget | Implementation | |-------|--------|----------------| | Auth | <1ms | Redis `GET dd0c_key:{hash}` with 60s TTL. Cache miss → PG lookup + cache set. | | Parse | <0.5ms | `serde_json` zero-copy deserialization. No full body buffering for streaming requests — parse headers + first chunk only. | | Route | <2ms | In-memory rule engine. Cost tables loaded at startup, refreshed every 60s via background task. No DB call on hot path. | | Dispatch | 0ms overhead | `hyper` connection pool to each provider. Pre-warmed connections. HTTP/2 multiplexing. | | Telemetry | 0ms on hot path | `tokio::spawn` fires a telemetry event to an in-memory channel. Background task batch-inserts to TimescaleDB every 1s or 100 events (whichever comes first). | | **Total overhead** | **<5ms p99** | Target is <10ms p99 with margin. | **Streaming support:** The proxy MUST support Server-Sent Events (SSE) streaming — this is how most chat applications consume LLM responses. The proxy operates as a transparent stream relay: 1. Client sends request with `"stream": true` 2. Proxy makes routing decision based on headers + first message content (no need to buffer full body) 3. Proxy opens streaming connection to target provider 4. 
Each SSE chunk is forwarded to the client immediately (`Transfer-Encoding: chunked`)
5. Token counting happens on-the-fly by parsing `usage` from the final SSE `[DONE]` chunk (OpenAI) or `message_stop` event (Anthropic)
6. If the provider doesn't return usage in the stream, the proxy counts tokens from accumulated chunks using `tiktoken-rs`

**Provider abstraction:**

```rust
// Simplified — the actual trait is more detailed
#[async_trait]
trait LlmProvider: Send + Sync {
    fn name(&self) -> &str;
    fn supports_model(&self, model: &str) -> bool;
    fn translate_request(&self, req: &ProxyRequest) -> ProviderRequest;
    fn translate_response(&self, resp: ProviderResponse) -> ProxyResponse;
    async fn send(&self, req: ProviderRequest) -> Result<ProviderResponse>;
    async fn send_stream(&self, req: ProviderRequest) -> Result<BoxStream<'static, ProviderChunk>>;
}
```

V1 ships two implementations: `OpenAiProvider` and `AnthropicProvider`. Adding a new provider means implementing this trait — no proxy core changes. The `translate_request` / `translate_response` methods handle the format differences (Anthropic's `messages` API vs OpenAI's `chat/completions`).

**Connection pooling:**

Each proxy instance maintains a `hyper` connection pool per provider:

- Max 100 connections to `api.openai.com`
- Max 50 connections to `api.anthropic.com`
- Keep-alive: 90s
- Connection timeout: 5s
- Request timeout: 300s (LLM responses can be slow for long completions)

### 2.2 Router Brain (`crates/shared/router`)

The Router Brain is embedded in the proxy process — no network hop, no RPC. It's a pure function: `(request, rules, cost_tables) → routing_decision`.

**Decision pipeline:**

```
Input: ProxyRequest + RoutingConfig
│
├─ 1. Rule matching: find first rule where all match conditions are true
│     Match on: request tags, model requested, token count estimate, time of day
│
├─ 2. Strategy execution (per matched rule):
│     ├─ "passthrough"   → use requested model, no routing
│     ├─ "cheapest"      → pick cheapest model from rule's model list
│     ├─ "quality-first" → pick highest-quality model, fall back down on error
│     └─ "cascading"     → try cheapest first, escalate on low confidence
│
├─ 3. Budget check: if org/team/feature has hit a hard budget limit → throttle to cheapest or reject
│
└─ 4. Output: RoutingDecision { target_model, target_provider, reason, confidence }
```

**Complexity classifier (V1 — heuristic, not ML):**

The V1 classifier is deliberately simple. It uses three signals:

| Signal | Weight | Logic |
|--------|--------|-------|
| **Token count** | 30% | Short prompts (<500 tokens) with short expected outputs are likely simple tasks. |
| **Task pattern** | 50% | Regex/keyword matching on system prompt: "classify", "extract", "format JSON", "yes or no" → LOW complexity. "analyze", "reason step by step", "write code" → HIGH complexity. |
| **Model requested** | 20% | If the user explicitly requests a frontier model AND the task looks complex, respect the request. Don't downgrade a code generation request from GPT-4o. |

Output: `ComplexityScore { level: Low|Medium|High, confidence: f32 }`

This gets 70-80% accuracy. Good enough for V1. The ML classifier (V2) trains on the telemetry data: for each routed request, did the user complain? Did they retry with a different model? Did the downstream application error? That feedback loop is the data flywheel.

**Cost tables:**

```rust
struct ModelCost {
    provider: Provider,
    model_id: String,          // "gpt-4o-2024-11-20"
    model_alias: String,       // "gpt-4o"
    input_cost_per_m: f64,     // $/million input tokens
    output_cost_per_m: f64,    // $/million output tokens
    quality_tier: QualityTier, // Frontier, Standard, Economy
    max_context: u32,          // 128000
    supports_streaming: bool,
    supports_tools: bool,
    supports_vision: bool,
    updated_at: DateTime<Utc>,
}
```

Cost tables are stored in PostgreSQL and loaded into memory at proxy startup.
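The load-at-startup pattern can be sketched with a shared `RwLock` around the in-memory table. This is a sketch with hypothetical simplified types; a production proxy might prefer a lock-free swap (e.g. the `arc-swap` crate) to keep hot-path readers uncontended:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified cost entry: $/million input and output tokens.
#[derive(Clone, Debug)]
struct ModelCost {
    input_cost_per_m: f64,
    output_cost_per_m: f64,
}

/// Shared, hot-swappable cost table keyed by model alias.
type CostTable = Arc<RwLock<HashMap<String, ModelCost>>>;

/// Hot-path lookup: a brief read lock, never a DB call.
fn lookup(table: &CostTable, model: &str) -> Option<ModelCost> {
    table.read().unwrap().get(model).cloned()
}

/// Background refresh: build the replacement map off to the side,
/// then swap it in under a short write lock.
fn refresh(table: &CostTable, fresh: HashMap<String, ModelCost>) {
    *table.write().unwrap() = fresh;
}
```

On the hot path, `lookup` clones one small struct under a read lock; the refresh task only holds the write lock for the duration of a pointer-sized swap.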
A background task polls for updates every 60 seconds. When a provider changes pricing (happens ~monthly), Brian updates one row in the DB and all proxy instances pick it up within 60s. No redeploy.

**Fallback chains with circuit breakers:**

```
Primary: gpt-4o-mini (OpenAI)
   │
   │ ── if error rate > 10% in last 60s ──→ circuit OPEN
   ▼
Fallback 1: claude-3-haiku (Anthropic)
   │
   │ ── if error rate > 10% in last 60s ──→ circuit OPEN
   ▼
Fallback 2: gpt-4o (OpenAI)   ← expensive but reliable last resort
   │
   ▼
Final fallback: return 503 with X-DD0C-Fallback-Exhausted header
```

Circuit breaker state is stored in Redis (shared across proxy instances). State transitions: CLOSED → OPEN (on threshold breach) → HALF-OPEN (after 30s cooldown, allow 1 probe request) → CLOSED (if probe succeeds).

### 2.3 Analytics Pipeline

Telemetry flows from the proxy to TimescaleDB asynchronously. The proxy never blocks on analytics.

**Event schema (what the proxy emits per request):**

```rust
struct RequestEvent {
    id: Uuid,
    org_id: Uuid,
    api_key_id: Uuid,
    timestamp: DateTime<Utc>,

    // Request metadata
    model_requested: String,
    model_used: String,
    provider: String,
    feature_tag: Option<String>,
    team_tag: Option<String>,
    environment_tag: Option<String>,

    // Tokens & cost
    input_tokens: u32,
    output_tokens: u32,
    cost_actual: f64,   // what they paid (routed model)
    cost_original: f64, // what they would have paid (requested model)
    cost_saved: f64,    // delta

    // Performance
    latency_ms: u32,
    ttfb_ms: u32, // time to first byte (streaming)

    // Routing
    complexity_score: f32,
    complexity_level: String, // LOW, MEDIUM, HIGH
    routing_reason: String,
    was_cached: bool,
    was_fallback: bool,

    // Status
    status_code: u16,
    error_type: Option<String>,
}
```

**Batch insert pipeline:**

```
Proxy hot path                         Background task
──────────────                         ───────────────
request completes
  │
  ├─ tokio::spawn ──→ mpsc channel ──→ batch collector
  │                                      ├─ accumulate events
  │                                      ├─ flush every 1s OR 100 events
  │                                      └─ COPY INTO request_events (bulk insert)
```

`COPY` (PostgreSQL bulk insert) handles 10K+
rows/second on a db.t4g.small. At 100K requests/day (~1.2 req/s average), this is trivially within capacity. **Continuous aggregates (TimescaleDB):** Pre-computed rollups for dashboard queries: ```sql -- Hourly rollup by org, feature, model CREATE MATERIALIZED VIEW hourly_cost_summary WITH (timescaledb.continuous) AS SELECT time_bucket('1 hour', timestamp) AS bucket, org_id, feature_tag, team_tag, model_used, provider, COUNT(*) AS request_count, SUM(input_tokens) AS total_input_tokens, SUM(output_tokens) AS total_output_tokens, SUM(cost_actual) AS total_cost, SUM(cost_saved) AS total_saved, AVG(latency_ms) AS avg_latency, PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99_latency FROM request_events GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider; ``` Dashboard queries hit the continuous aggregate, not the raw events table. This keeps dashboard response times <200ms even with millions of rows. **Savings calculation:** ``` cost_saved = cost_original - cost_actual where: cost_original = (input_tokens × requested_model.input_cost_per_m / 1_000_000) + (output_tokens × requested_model.output_cost_per_m / 1_000_000) cost_actual = (input_tokens × used_model.input_cost_per_m / 1_000_000) + (output_tokens × used_model.output_cost_per_m / 1_000_000) ``` This is computed at request time in the proxy (cost tables are in memory) and stored with the event. No post-hoc recalculation needed. ### 2.4 Dashboard API (`crates/api`) **Framework:** Axum (Rust). Same tokio runtime as the proxy. Shares the `crates/shared` library for DB models and queries. **Why not a separate language (Node/Python)?** Solo founder. One language. One build system. One deployment pipeline. The API is not performance-critical (dashboard users, not proxy traffic), but keeping it in Rust means Brian debugs one ecosystem, not two. 
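The per-request savings computation from Section 2.3 reduces to a few lines of arithmetic. A dependency-free sketch; the types are simplified, and the prices in the usage example are illustrative rather than live pricing:

```rust
/// $/million-token prices for one model (illustrative, not live pricing).
struct ModelCost {
    input_cost_per_m: f64,
    output_cost_per_m: f64,
}

/// Cost of one request under a given model's pricing.
fn request_cost(input_tokens: u32, output_tokens: u32, m: &ModelCost) -> f64 {
    (input_tokens as f64 * m.input_cost_per_m
        + output_tokens as f64 * m.output_cost_per_m)
        / 1_000_000.0
}

/// cost_saved = cost_original - cost_actual, computed at request time
/// from the in-memory cost tables and stored with the event.
fn cost_saved(
    input_tokens: u32,
    output_tokens: u32,
    requested: &ModelCost,
    used: &ModelCost,
) -> f64 {
    request_cost(input_tokens, output_tokens, requested)
        - request_cost(input_tokens, output_tokens, used)
}
```

For example, a 1,000-input / 500-output-token request priced at $2.50/$10.00 per million (requested) versus $0.15/$0.60 per million (routed) yields a saving of $0.00705.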
**Key endpoint groups (detailed in Section 7):** | Group | Purpose | |-------|---------| | `/api/auth/*` | GitHub OAuth flow, session management | | `/api/orgs/*` | Organization CRUD, team management | | `/api/dashboard/*` | Cost summaries, treemap data, time-series | | `/api/requests/*` | Request inspector — paginated, filterable | | `/api/routing/*` | Routing rules CRUD, cost tables | | `/api/alerts/*` | Alert configuration, budget limits | | `/api/keys/*` | API key management (dd0c keys + encrypted provider keys) | **Auth model:** JWT tokens issued after GitHub OAuth. Short-lived access tokens (15min) + refresh tokens (7 days) stored in Redis. API keys for programmatic access (prefixed `dd0c_sk_`). ### 2.5 Shadow Audit Mode (The PLG Wedge) Shadow Audit is the product-led growth engine. It provides value before the customer routes a single request through the proxy. **Two modes:** **Mode A: CLI Scan (`npx dd0c-scan`)** - Scans a local codebase for LLM API calls - Parses model names, estimates token counts from prompt templates - Applies current pricing to estimate monthly cost - Applies dd0c routing logic to estimate savings - Outputs a report to stdout — no data leaves the machine - Captures email (optional) for follow-up ``` $ npx dd0c-scan ./src dd0c/route — Cost Scan Report ───────────────────────────── Found 14 LLM API calls across 8 files Current estimated monthly cost: $4,217 With dd0c/route routing: $1,890 Potential monthly savings: $2,327 (55%) Top opportunities: ┌─────────────────────────────────────────────────────┐ │ src/services/classify.ts gpt-4o → gpt-4o-mini │ │ Est. savings: $890/mo Confidence: HIGH │ │ │ │ src/services/summarize.ts gpt-4o → claude-haiku │ │ Est. savings: $670/mo Confidence: MEDIUM │ │ │ │ src/services/extract.ts gpt-4o → gpt-4o-mini │ │ Est. 
savings: $440/mo Confidence: HIGH │ └─────────────────────────────────────────────────────┘ → Sign up at route.dd0c.dev to start saving ``` **Mode B: Log Ingestion (V1.1)** - Customer points dd0c at their existing LLM provider logs (OpenAI usage export CSV, or application logs with token counts) - dd0c processes the logs offline and generates a retrospective savings report - "Here's what you spent last month. Here's what you WOULD have spent." - This is the enterprise conversion tool — show the CFO real numbers from their own data --- ## Section 3: DATA ARCHITECTURE ### 3.1 Database Schema Two databases, clear separation of concerns: - **PostgreSQL (RDS):** Configuration, auth, organizational data. Low-write, high-read. Relational integrity matters. - **TimescaleDB (RDS):** Request telemetry, cost events. High-write, time-series queries. Compression and retention policies matter. #### PostgreSQL — Configuration Store ```sql -- Organizations (multi-tenant root) CREATE TABLE organizations ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name VARCHAR(255) NOT NULL, slug VARCHAR(63) NOT NULL UNIQUE, -- used in URLs plan VARCHAR(20) NOT NULL DEFAULT 'free', -- free, pro, business stripe_customer_id VARCHAR(255), monthly_llm_spend_limit NUMERIC(10,2), -- plan-based cap on routed spend created_at TIMESTAMPTZ NOT NULL DEFAULT now(), updated_at TIMESTAMPTZ NOT NULL DEFAULT now() ); -- Users (GitHub OAuth) CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), github_id BIGINT NOT NULL UNIQUE, github_login VARCHAR(255) NOT NULL, email VARCHAR(255), avatar_url VARCHAR(512), created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); -- Org membership CREATE TABLE org_members ( org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE, user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, role VARCHAR(20) NOT NULL DEFAULT 'member', -- owner, admin, member created_at TIMESTAMPTZ NOT NULL DEFAULT now(), PRIMARY KEY (org_id, user_id) ); -- dd0c API keys (what 
customers use to auth with the proxy) CREATE TABLE api_keys ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE, key_hash VARCHAR(64) NOT NULL UNIQUE, -- SHA-256 of the key; raw key never stored key_prefix VARCHAR(12) NOT NULL, -- "dd0c_sk_a3f..." for display name VARCHAR(255), -- human label: "production", "staging" environment VARCHAR(50) DEFAULT 'production', is_active BOOLEAN NOT NULL DEFAULT true, last_used_at TIMESTAMPTZ, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); CREATE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE is_active = true; -- Customer's LLM provider credentials (encrypted at rest) CREATE TABLE provider_credentials ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE, provider VARCHAR(50) NOT NULL, -- 'openai', 'anthropic' encrypted_key BYTEA NOT NULL, -- AES-256-GCM encrypted API key key_suffix VARCHAR(8), -- last 4 chars for display: "...a3f2" is_active BOOLEAN NOT NULL DEFAULT true, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), UNIQUE(org_id, provider) ); -- Routing rules (ordered, first-match-wins) CREATE TABLE routing_rules ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE, priority INTEGER NOT NULL DEFAULT 0, -- lower = higher priority name VARCHAR(255) NOT NULL, is_active BOOLEAN NOT NULL DEFAULT true, -- Match conditions (all must be true) match_tags JSONB DEFAULT '{}', -- {"feature": "classify", "team": "backend"} match_models TEXT[], -- models this rule applies to, NULL = all match_complexity VARCHAR(20), -- LOW, MEDIUM, HIGH, NULL = all -- Routing strategy strategy VARCHAR(20) NOT NULL, -- passthrough, cheapest, quality_first, cascading model_chain TEXT[] NOT NULL, -- ordered list of models to try -- Budget constraints daily_budget NUMERIC(10,2), -- hard limit per day for this rule -- Metadata created_at TIMESTAMPTZ NOT NULL DEFAULT 
now(), updated_at TIMESTAMPTZ NOT NULL DEFAULT now() ); CREATE INDEX idx_routing_rules_org ON routing_rules(org_id, priority) WHERE is_active = true; -- Model cost table (the source of truth for pricing) CREATE TABLE model_costs ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), provider VARCHAR(50) NOT NULL, model_id VARCHAR(100) NOT NULL, -- "gpt-4o-2024-11-20" model_alias VARCHAR(100) NOT NULL, -- "gpt-4o" input_cost_per_m NUMERIC(10,4) NOT NULL, -- $/million input tokens output_cost_per_m NUMERIC(10,4) NOT NULL, -- $/million output tokens quality_tier VARCHAR(20) NOT NULL, -- frontier, standard, economy max_context INTEGER NOT NULL, supports_streaming BOOLEAN DEFAULT true, supports_tools BOOLEAN DEFAULT false, supports_vision BOOLEAN DEFAULT false, is_active BOOLEAN NOT NULL DEFAULT true, updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), UNIQUE(provider, model_id) ); -- Alert configurations CREATE TABLE alert_configs ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE, name VARCHAR(255) NOT NULL, alert_type VARCHAR(50) NOT NULL, -- spend_threshold, anomaly, budget_warning -- Conditions threshold_amount NUMERIC(10,2), -- dollar amount trigger threshold_pct NUMERIC(5,2), -- percentage above baseline scope_tags JSONB DEFAULT '{}', -- scope to specific feature/team -- Notification notify_slack_webhook VARCHAR(512), notify_email VARCHAR(255), -- State is_active BOOLEAN NOT NULL DEFAULT true, last_fired_at TIMESTAMPTZ, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); ``` #### TimescaleDB — Telemetry Store ```sql -- Raw request events (hypertable — partitioned by time automatically) CREATE TABLE request_events ( id UUID NOT NULL DEFAULT gen_random_uuid(), org_id UUID NOT NULL, api_key_id UUID NOT NULL, timestamp TIMESTAMPTZ NOT NULL, -- Request model_requested VARCHAR(100) NOT NULL, model_used VARCHAR(100) NOT NULL, provider VARCHAR(50) NOT NULL, feature_tag VARCHAR(100), team_tag VARCHAR(100), 
environment_tag VARCHAR(50), -- Tokens & cost input_tokens INTEGER NOT NULL, output_tokens INTEGER NOT NULL, cost_actual NUMERIC(12,8) NOT NULL, cost_original NUMERIC(12,8) NOT NULL, cost_saved NUMERIC(12,8) NOT NULL, -- Performance latency_ms INTEGER NOT NULL, ttfb_ms INTEGER, -- Routing complexity_score REAL, complexity_level VARCHAR(10), routing_reason VARCHAR(255), was_cached BOOLEAN DEFAULT false, was_fallback BOOLEAN DEFAULT false, -- Status status_code SMALLINT NOT NULL, error_type VARCHAR(100) ); -- Convert to hypertable (TimescaleDB magic) SELECT create_hypertable('request_events', 'timestamp', chunk_time_interval => INTERVAL '1 day' ); -- Indexes for common query patterns CREATE INDEX idx_re_org_time ON request_events(org_id, timestamp DESC); CREATE INDEX idx_re_org_feature ON request_events(org_id, feature_tag, timestamp DESC); CREATE INDEX idx_re_org_team ON request_events(org_id, team_tag, timestamp DESC); -- Compression policy: compress chunks older than 7 days (90%+ space savings) ALTER TABLE request_events SET ( timescaledb.compress, timescaledb.compress_segmentby = 'org_id', timescaledb.compress_orderby = 'timestamp DESC' ); SELECT add_compression_policy('request_events', INTERVAL '7 days'); -- Retention policy: drop raw data older than plan retention (90 days for Pro) -- Applied per-org via the worker, not a global policy -- Business tier gets 1 year; continuous aggregates survive raw data deletion -- Continuous aggregate: hourly rollup CREATE MATERIALIZED VIEW hourly_cost_summary WITH (timescaledb.continuous) AS SELECT time_bucket('1 hour', timestamp) AS bucket, org_id, feature_tag, team_tag, model_used, provider, COUNT(*) AS request_count, SUM(input_tokens)::BIGINT AS total_input_tokens, SUM(output_tokens)::BIGINT AS total_output_tokens, SUM(cost_actual) AS total_cost, SUM(cost_saved) AS total_saved, AVG(latency_ms)::INTEGER AS avg_latency_ms, MAX(latency_ms) AS max_latency_ms FROM request_events GROUP BY bucket, org_id, feature_tag, team_tag, 
model_used, provider WITH NO DATA; -- Refresh policy: keep hourly aggregates up to date SELECT add_continuous_aggregate_policy('hourly_cost_summary', start_offset => INTERVAL '3 hours', end_offset => INTERVAL '1 hour', schedule_interval => INTERVAL '1 hour' ); -- Daily rollup (for long-range dashboard views) CREATE MATERIALIZED VIEW daily_cost_summary WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', timestamp) AS bucket, org_id, feature_tag, team_tag, model_used, provider, COUNT(*) AS request_count, SUM(cost_actual) AS total_cost, SUM(cost_saved) AS total_saved FROM request_events GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider WITH NO DATA; SELECT add_continuous_aggregate_policy('daily_cost_summary', start_offset => INTERVAL '3 days', end_offset => INTERVAL '1 day', schedule_interval => INTERVAL '1 day' ); ``` ### 3.2 Data Flow Diagram ```mermaid flowchart LR subgraph Client APP[Application] end subgraph Proxy["Proxy Engine"] AUTH[Auth Check] PARSE[Parse Request] ROUTE[Router Brain] DISPATCH[Provider Dispatch] TEL[Telemetry Emitter] end subgraph Async["Async Pipeline"] CHAN[mpsc Channel] BATCH[Batch Collector] end subgraph Storage REDIS[(Redis)] TSDB[(TimescaleDB)] PG[(PostgreSQL)] end subgraph Aggregation HOURLY[Hourly Aggregate] DAILY[Daily Aggregate] end subgraph Consumers DASH[Dashboard API] DIGEST[Weekly Digest Worker] ALERTS[Alert Evaluator] end APP -->|1. HTTPS request| AUTH AUTH -->|key lookup| REDIS REDIS -.->|cache miss| PG AUTH --> PARSE PARSE --> ROUTE ROUTE -->|rules from memory| ROUTE ROUTE --> DISPATCH DISPATCH -->|2. to LLM provider| LLM[OpenAI / Anthropic] LLM -->|3. response| DISPATCH DISPATCH -->|4. 
response to client| APP DISPATCH --> TEL TEL -->|fire & forget| CHAN CHAN --> BATCH BATCH -->|COPY bulk insert| TSDB TSDB --> HOURLY TSDB --> DAILY HOURLY --> DASH DAILY --> DASH HOURLY --> DIGEST HOURLY --> ALERTS ALERTS -->|Slack / Email| EXT[External Notifications] ``` ### 3.3 Storage Strategy | Tier | Data | Store | Retention | Compression | |------|------|-------|-----------|-------------| | **Hot** | Raw request events (last 7 days) | TimescaleDB — uncompressed chunks | 7 days uncompressed | None — fast queries | | **Warm** | Raw request events (8–90 days) | TimescaleDB — compressed chunks | Up to 90 days (Pro) / 365 days (Business) | TimescaleDB native compression (~90% reduction) | | **Cold** | Continuous aggregates (hourly/daily) | TimescaleDB — materialized views | Indefinite (survives raw data deletion) | Inherently compact (aggregated) | | **Config** | Orgs, keys, rules, users | PostgreSQL | Indefinite | N/A | | **Ephemeral** | Auth sessions, rate limits, cache | Redis | TTL-based (15min–24hr) | N/A | **Storage estimates at scale:** | Scale | Requests/Day | Raw Event Size | Daily Raw Storage | Monthly (compressed) | |-------|-------------|----------------|-------------------|---------------------| | 1K | 1,000 | ~500 bytes/row | ~0.5 MB | ~1.5 MB | | 10K | 10,000 | ~500 bytes/row | ~5 MB | ~15 MB | | 100K | 100,000 | ~500 bytes/row | ~50 MB | ~150 MB | At 100K requests/day with 90-day retention and 90% compression: ~500 MB total. A db.t4g.small with 20GB gp3 storage handles this trivially. Storage is not a concern at V1 scale. ### 3.4 Privacy & Data Handling This is the section that matters most for trust. The proxy sits in the middle of every LLM request. Customers need to know exactly what we see, store, and can access. 
**What the proxy sees (in memory, during request processing):** | Data | Seen | Stored | Purpose | |------|------|--------|---------| | Full prompt content (system + user messages) | ✅ Yes — in memory during routing | ❌ No — never persisted | Complexity classification reads the system prompt to detect task patterns | | Full response content | ✅ Yes — streamed through | ❌ No — never persisted | Token counting on stream completion | | Model name (requested + used) | ✅ Yes | ✅ Yes | Core telemetry | | Token counts (input + output) | ✅ Yes | ✅ Yes | Cost calculation | | Customer's LLM API keys | ✅ Yes — decrypted in memory for provider dispatch | ✅ Encrypted at rest (AES-256-GCM) | Forwarding requests to providers | | dd0c API key | ✅ Yes — hash compared | ✅ Hash only (SHA-256) | Authentication | | Request tags (feature, team) | ✅ Yes | ✅ Yes | Attribution | | IP address | ✅ Yes | ❌ No | Rate limiting only | | Latency, status code | ✅ Yes | ✅ Yes | Performance telemetry | **Critical privacy guarantees:** 1. **Prompt content is NEVER stored.** Not in the database. Not in logs. Not in error reports. The proxy processes prompts in memory and discards them. This is the #1 trust requirement. 2. **Customer LLM API keys are encrypted at rest** using AES-256-GCM with a per-org encryption key derived from AWS KMS. The proxy decrypts them in memory only for the duration of the provider request. 3. **Telemetry contains metadata, not content.** We store: "this request used 1,247 input tokens on gpt-4o-mini and cost $0.0002." We do NOT store: "the user asked about quarterly revenue projections for Q3." 4. **No cross-org data leakage.** Every query is scoped by `org_id`. TimescaleDB chunks are segmented by `org_id` for compression. There is no query path that returns data from multiple orgs. 
**V1.5 enhancement — client-side classification:** For customers who can't accept prompt content transiting through a third-party proxy (Jordan's VPC requirement), V1.5 ships a lightweight WASM classifier that runs client-side. The proxy receives only the routing hint (`complexity: LOW`) and the encrypted request body, which it forwards to the provider without inspection. Telemetry still flows to the dashboard, but prompt content never leaves the customer's infrastructure. --- ## Section 4: INFRASTRUCTURE ### 4.1 AWS Architecture Single region: `us-east-1` (Virginia). Lowest latency to OpenAI and Anthropic API endpoints (both hosted in US East). Multi-region is a V2 concern — the beachhead is US startups. ```mermaid graph TB subgraph Internet CLIENT[Client Apps] USER[Dashboard Users] end subgraph AWS["AWS us-east-1"] subgraph Edge["Edge Layer"] CF[CloudFront CDN
Dashboard UI + API cache] ALB[Application Load Balancer
TLS termination, path routing] end subgraph Compute["ECS Fargate Cluster"] SVC_PROXY[Service: dd0c-proxy
2–10 tasks, 0.25 vCPU / 512MB] SVC_API[Service: dd0c-api
1–3 tasks, 0.25 vCPU / 512MB] SVC_WORKER[Service: dd0c-worker
1 task, 0.25 vCPU / 512MB] end subgraph Data["Data Layer (Private Subnets)"] RDS_PG[RDS PostgreSQL 16
db.t4g.micro, 20GB gp3
Config Store] RDS_TS[RDS PostgreSQL 16 + TimescaleDB
db.t4g.small, 50GB gp3
Telemetry Store] ELASTICACHE[ElastiCache Redis 7
cache.t4g.micro
Cache + Rate Limits] end subgraph Security["Security"] KMS[KMS
Encryption keys] SM[Secrets Manager
DB creds, signing keys] WAF[WAF v2
Rate limiting, geo-blocking] end subgraph Ops["Operations"] CW[CloudWatch
Logs + Metrics + Alarms] ECR[ECR
Container Registry] S3_UI[S3 Bucket
Dashboard static assets] SES_SVC[SES
Digest emails] end end CLIENT -->|HTTPS :443| ALB USER -->|HTTPS| CF CF --> S3_UI CF --> ALB ALB -->|/v1/*| SVC_PROXY ALB -->|/api/*| SVC_API SVC_PROXY --> RDS_TS SVC_PROXY --> ELASTICACHE SVC_PROXY --> KMS SVC_API --> RDS_PG SVC_API --> RDS_TS SVC_API --> ELASTICACHE SVC_WORKER --> RDS_TS SVC_WORKER --> SES_SVC SVC_WORKER --> SM ``` **Network topology:** - VPC with 2 AZs (cost-conscious — 3 AZs is overkill for V1) - Public subnets: ALB only - Private subnets: ECS tasks, RDS, ElastiCache - NAT Gateway: 1 (not 2 — single NAT saves ~$32/month; acceptable risk for V1) - VPC endpoints for ECR, S3, CloudWatch, KMS (avoid NAT charges for AWS service traffic) **ALB routing rules:** | Path Pattern | Target Group | Notes | |-------------|-------------|-------| | `/v1/chat/completions` | dd0c-proxy | OpenAI-compatible proxy endpoint | | `/v1/completions` | dd0c-proxy | Legacy completions | | `/v1/embeddings` | dd0c-proxy | Embedding passthrough (no routing — just telemetry) | | `/api/*` | dd0c-api | Dashboard REST API | | `/*` (default) | 404 fixed response | Reject unknown paths | Dashboard UI is served from S3 via CloudFront — never hits the ALB. ### 4.2 Cost Estimate Real numbers. No hand-waving. 
#### At 1K requests/day (~$129/month infrastructure)

| Service | Spec | Monthly Cost |
|---------|------|-------------|
| ECS Fargate (proxy) | 2 tasks × 0.25 vCPU × 512MB × 730hrs | $14.60 |
| ECS Fargate (api) | 1 task × 0.25 vCPU × 512MB × 730hrs | $7.30 |
| ECS Fargate (worker) | 1 task × 0.25 vCPU × 512MB × 730hrs | $7.30 |
| RDS PostgreSQL | db.t4g.micro, 20GB gp3, single-AZ | $12.41 |
| RDS TimescaleDB | db.t4g.small, 50GB gp3, single-AZ | $24.82 |
| ElastiCache Redis | cache.t4g.micro, single-AZ | $8.35 |
| ALB | 1 ALB + minimal LCUs | $16.20 |
| NAT Gateway | 1 gateway + ~5GB data | $33.48 |
| CloudFront | <1GB transfer | $0.00 (free tier) |
| S3 | <1GB static assets | $0.02 |
| SES | <1000 emails/month | $0.10 |
| KMS | 1 key + ~10K requests | $1.03 |
| CloudWatch | Logs + basic metrics | $3.00 |
| **Total** | | **~$129/month** |

**Optimization note:** The NAT Gateway at $33/month is the biggest single line item. Alternative: replace with a NAT instance on a t4g.nano ($3/month) or use VPC endpoints aggressively to eliminate NAT for AWS service traffic. With VPC endpoints for ECR/S3/CW/KMS, the only NAT traffic is outbound to LLM providers — which could go through a public subnet proxy task instead. Realistic optimized cost: **~$95/month**.
#### At 10K requests/day (~$155/month infrastructure) | Change from 1K | Impact | |----------------|--------| | Proxy scales to 3-4 tasks | +$15-22 | | TimescaleDB storage grows to ~15MB/month compressed | Negligible | | ALB LCU usage increases | +$5 | | SES volume increases (more digest recipients) | +$1 | | **Total** | **~$155/month** | #### At 100K requests/day (~$320/month infrastructure) | Change from 10K | Impact | |-----------------|--------| | Proxy scales to 6-10 tasks | +$45-75 | | API scales to 2-3 tasks | +$7-15 | | TimescaleDB upgrade to db.t4g.medium (more IOPS) | +$25 | | ElastiCache upgrade to cache.t4g.small | +$8 | | ALB LCU usage | +$15 | | NAT data transfer (~50GB/month) | +$25 | | **Total** | **~$320/month** | **Gross margin at each scale:** | Scale | Requests/Day | Est. Customers | Est. MRR | Infra Cost | Gross Margin | |-------|-------------|----------------|----------|------------|-------------| | 1K | 1,000 | 5-10 | $375-750 | $129 | 66-83% | | 10K | 10,000 | 50-100 | $3,750-7,500 | $155 | 96-98% | | 100K | 100,000 | 200-500 | $15,000-37,500 | $320 | 98-99% | The unit economics are absurd. Near-zero marginal cost per customer. This is the beauty of a proxy — it adds almost no compute to the request path. ### 4.3 Scaling Strategy **Proxy horizontal scaling (the only thing that needs to scale):** ECS Service Auto Scaling with two policies: 1. **Target tracking:** CPU utilization target 60%. Scale out when sustained above 60%, scale in when below 40%. 2. **Step scaling:** Request count per target (from ALB). Scale out aggressively at >500 req/min/task. Min tasks: 2 (availability). Max tasks: 20 (cost cap — revisit at $10K MRR). **Database scaling:** TimescaleDB is the bottleneck candidate. Scaling path: 1. **V1 (1K-10K req/day):** db.t4g.small, single-AZ. Continuous aggregates handle dashboard query load. 2. **V1.5 (10K-100K req/day):** db.t4g.medium, add a read replica for dashboard API queries. 
Proxy writes to primary, API reads from replica. 3. **V2 (100K+ req/day):** If TimescaleDB hits limits, evaluate: - Upgrade to db.r6g.large (more memory for hot data) - Or migrate telemetry to ClickHouse (better for high-cardinality analytics at scale) - Decision point: when continuous aggregate refresh takes >5 minutes PostgreSQL (config store) stays on db.t4g.micro indefinitely. Config data is tiny. **Redis scaling:** cache.t4g.micro handles ~12K ops/sec. At 100K requests/day (~1.2 req/sec average, ~10 req/sec peak), Redis is at <0.1% capacity. Redis is not a scaling concern until 1M+ requests/day. ### 4.4 CI/CD Pipeline **GitHub Actions. No Jenkins. No CodePipeline. Keep it simple.** ```yaml # .github/workflows/deploy.yml (simplified) name: Build & Deploy on: push: branches: [main] jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: dtolnay/rust-toolchain@stable - run: cargo test --workspace - run: cargo clippy --workspace -- -D warnings - run: cargo fmt --check build-and-push: needs: test runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: aws-actions/configure-aws-credentials@v4 with: role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }} - uses: aws-actions/amazon-ecr-login@v2 - run: | docker build -t dd0c-proxy -f crates/proxy/Dockerfile . docker build -t dd0c-api -f crates/api/Dockerfile . docker build -t dd0c-worker -f crates/worker/Dockerfile . 
          # Tag and push to ECR. Also move the :latest tag, which the ECS
          # task definitions pin — without it, --force-new-deployment in the
          # deploy job would simply redeploy the old image.
          for svc in proxy api worker; do
            docker tag dd0c-$svc $ECR_REGISTRY/dd0c-$svc:$GITHUB_SHA
            docker tag dd0c-$svc $ECR_REGISTRY/dd0c-$svc:latest
            docker push $ECR_REGISTRY/dd0c-$svc:$GITHUB_SHA
            docker push $ECR_REGISTRY/dd0c-$svc:latest
          done

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - run: |
          # Update ECS services with the new image
          for svc in proxy api worker; do
            aws ecs update-service \
              --cluster dd0c-prod \
              --service dd0c-$svc \
              --force-new-deployment
          done

  deploy-ui:
    needs: test
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ui
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci && npm run build
      - run: |
          aws s3 sync dist/ s3://dd0c-dashboard-ui/ --delete
          aws cloudfront create-invalidation --distribution-id $CF_DIST_ID --paths "/*"
```

**Deployment strategy:** Rolling update via ECS (default). No blue/green for V1 — adds complexity. The proxy is stateless; rolling updates cause zero downtime. If a bad deploy ships, re-tagging the previous image SHA as `:latest` and running `aws ecs update-service --force-new-deployment` rolls back in <2 minutes.

**Database migrations:** `sqlx migrate run` executed as a pre-deploy step in the API container's entrypoint. Migrations are forward-only and backward-compatible (add columns, don't rename/drop), so old code can run against the new schema during rolling deploys.

### 4.5 Monitoring & Alerting

**Eat your own dogfood:** dd0c/route monitors its own LLM provider calls through itself. If any future feature makes LLM calls, they route through the same proxy and routing engine as customer traffic.
**CloudWatch metrics (custom + built-in):** | Metric | Source | Alarm Threshold | |--------|--------|----------------| | `dd0c.proxy.request_count` | Proxy (StatsD → CW) | N/A (dashboard only) | | `dd0c.proxy.latency_p99` | Proxy | >50ms for 5 minutes | | `dd0c.proxy.error_rate` | Proxy | >5% for 3 minutes | | `dd0c.proxy.provider_error_rate` | Proxy (per provider) | >10% for 2 minutes | | `dd0c.proxy.circuit_breaker_open` | Proxy | Any open → alert | | `dd0c.telemetry.batch_lag` | Proxy | >1000 events queued | | ECS CPU/Memory | CloudWatch built-in | CPU >80% sustained 5min | | RDS CPU/Connections/IOPS | CloudWatch built-in | CPU >70%, connections >80% of max | | ALB 5xx rate | CloudWatch built-in | >1% for 3 minutes | | ALB target response time | CloudWatch built-in | p99 >200ms for 5 minutes | **Alerting channels:** | Severity | Channel | Response | |----------|---------|----------| | P0 (proxy down, >5% error rate) | PagerDuty → phone call | Wake up Brian | | P1 (high latency, circuit breaker, DB issues) | Slack #dd0c-alerts | Check within 1 hour | | P2 (capacity warnings, cost anomalies) | Email digest | Review next morning | **Structured logging:** All services emit JSON logs to CloudWatch Logs: ```json { "timestamp": "2026-03-15T14:22:33.456Z", "level": "info", "service": "proxy", "trace_id": "abc123", "org_id": "org_456", "event": "request_routed", "model_requested": "gpt-4o", "model_used": "gpt-4o-mini", "latency_ms": 3, "cost_saved": 0.0018 } ``` **No prompt content in logs. Ever.** The `tracing` crate with custom `Layer` implementation strips any field named `prompt`, `messages`, `content`, or `system` before emission. Defense in depth — even if a developer accidentally logs request content, the layer redacts it. **Uptime monitoring:** External health check via UptimeRobot (free tier, 5-minute intervals) hitting `GET /health` on the ALB. If the proxy is unreachable from the internet, Brian gets a text. 
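The redaction rule can be illustrated independently of the `tracing` crate. A minimal sketch — a hypothetical standalone helper, not the actual `Layer` implementation — that applies the same field-name blocklist:

```rust
/// Field names that must never reach CloudWatch Logs.
const SENSITIVE_FIELDS: &[&str] = &["prompt", "messages", "content", "system"];

/// Replace the value of any sensitive field before emission.
/// In the real proxy this filtering lives inside a custom `tracing` Layer;
/// this standalone function only demonstrates the rule.
fn redact(fields: Vec<(String, String)>) -> Vec<(String, String)> {
    fields
        .into_iter()
        .map(|(name, value)| {
            if SENSITIVE_FIELDS.contains(&name.as_str()) {
                (name, "[REDACTED]".to_string())
            } else {
                (name, value)
            }
        })
        .collect()
}
```

Even if a developer logs `prompt` by accident, the emitted record carries `[REDACTED]` in its place — the defense-in-depth property described above.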
**Solo founder operational reality:** Brian can realistically monitor: - 1 Slack channel (#dd0c-alerts) — glance at it 3x/day - 1 PagerDuty rotation — himself, 24/7 (this is the solo founder life) - 1 CloudWatch dashboard — check it during weekly review - UptimeRobot — set it and forget it Everything else must be automated. No manual log tailing. No daily metric reviews. Alerts fire when something is wrong. Silence means everything is fine. --- ## Section 5: SECURITY ### 5.1 API Key Management — The Trust Problem This is the #1 adoption barrier. Customers must give dd0c/route their OpenAI/Anthropic API keys so the proxy can forward requests. If they don't trust us with their keys, the product is dead. **How customer LLM API keys are handled:** ``` Customer enters API key in dashboard │ ├─ 1. Key transmitted over TLS 1.3 (HTTPS only, HSTS enforced) ├─ 2. API server receives key in memory ├─ 3. Key encrypted with AES-256-GCM using org-specific DEK │ DEK (Data Encryption Key) is itself encrypted by AWS KMS CMK │ Envelope encryption: KMS never sees the API key ├─ 4. Encrypted key stored in PostgreSQL (provider_credentials.encrypted_key) ├─ 5. Plaintext key zeroed from memory (Rust: zeroize crate) │ └─ At request time: ├─ Proxy fetches encrypted key from PG (cached in Redis, encrypted, 5min TTL) ├─ Decrypts with DEK (DEK cached in proxy memory, rotated hourly) ├─ Uses plaintext key for provider API call └─ Plaintext key held only for request duration, then dropped ``` **Key security properties:** | Property | Implementation | |----------|---------------| | Encryption at rest | AES-256-GCM, envelope encryption via AWS KMS | | Encryption in transit | TLS 1.3 (ALB terminates, internal traffic in VPC) | | Key isolation | Per-org DEK — compromising one org's DEK doesn't expose others | | Key rotation | KMS CMK auto-rotates annually. DEKs can be rotated per-org on demand. | | Access logging | Every KMS `Decrypt` call logged in CloudTrail. 
Anomalous decryption patterns trigger alerts. | | Zero-knowledge option (V1.5) | Customer runs proxy in their VPC. Keys never leave their infrastructure. dd0c SaaS only receives telemetry. | | Key revocation | Customer can delete their provider credentials from the dashboard instantly. Cached copies expire within 5 minutes (Redis TTL). | **Trust mitigation strategy (layered):** 1. **Transparency:** Open-source the proxy core. Customers can read every line of code that touches their API keys. "Don't trust us — read the code." 2. **Minimization:** The proxy only needs the key for the duration of the API call. It doesn't store it in logs, doesn't include it in telemetry, doesn't transmit it anywhere except to the LLM provider. 3. **Bring-your-own-proxy (V1.5):** For customers who won't send keys to a third party, ship a Docker image they run in their VPC. The proxy connects outbound to dd0c SaaS for config and sends telemetry. Keys never leave the customer's network. 4. **Audit trail:** Every API key usage is logged (not the key itself — the key_id and timestamp). Customers can see when their keys were last used in the dashboard. 5. **Insurance:** If a key is compromised through dd0c, we'll cover the cost of any unauthorized API usage. (This is a marketing commitment, not a legal one — but it signals confidence.) 
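The envelope-encryption flow from 5.1 can be sketched structurally. XOR stands in here for both AES-256-GCM and the KMS CMK operation — a toy placeholder, never a real cipher; a production implementation would use a vetted AEAD plus the AWS KMS `Encrypt`/`Decrypt` API. The sketch shows why KMS never sees the provider key:

```rust
/// Toy placeholder cipher: XOR with a repeating key. NOT a real cipher —
/// it exists only so the two-layer envelope structure is runnable.
fn toy_cipher(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// What lands in `provider_credentials`: the provider key encrypted with
/// the org's DEK, and the DEK encrypted by the KMS CMK.
struct StoredCredential {
    encrypted_key: Vec<u8>,
    encrypted_dek: Vec<u8>,
}

fn store(provider_key: &[u8], dek: &[u8], cmk: &[u8]) -> StoredCredential {
    StoredCredential {
        encrypted_key: toy_cipher(provider_key, dek),
        encrypted_dek: toy_cipher(dek, cmk),
    }
}

fn load(cred: &StoredCredential, cmk: &[u8]) -> Vec<u8> {
    // In production this first step is a KMS Decrypt call (CloudTrail-logged).
    let dek = toy_cipher(&cred.encrypted_dek, cmk);
    toy_cipher(&cred.encrypted_key, &dek)
}
```

Note the separation: only the encrypted DEK ever goes to KMS, so compromising one org's DEK exposes nothing about any other org's credentials.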
### 5.2 Authentication & Authorization Model **Three auth contexts:** | Context | Method | Token Type | Lifetime | |---------|--------|-----------|----------| | Dashboard UI | GitHub OAuth → JWT | Access token (short) + Refresh token | 15min / 7 days | | Proxy API | dd0c API key | Bearer token (hashed, never expires unless revoked) | Until revoked | | Dashboard API (programmatic) | dd0c API key | Same as proxy | Until revoked | **GitHub OAuth flow:** ``` Browser → /api/auth/github → redirect to GitHub GitHub → /api/auth/callback?code=xxx │ ├─ Exchange code for GitHub access token ├─ Fetch GitHub user profile (id, login, email, avatar) ├─ Upsert user in PostgreSQL ├─ Issue JWT access token (15min, signed with RS256) ├─ Issue refresh token (7 days, stored in Redis, httpOnly cookie) └─ Redirect to dashboard with access token ``` **Authorization model (V1 — simple RBAC):** | Role | Permissions | |------|------------| | Owner | Everything. Billing. Delete org. Manage members. | | Admin | Manage routing rules, API keys, alerts. View all data. Cannot delete org or manage billing. | | Member | View dashboard, view request inspector. Cannot modify config. | V1 ships with Owner + Member only. Admin role added when the first customer asks for it. **API key format:** ``` dd0c_sk_live_a3f2b8c9d4e5f6a7b8c9d4e5f6a7b8c9 Prefix: dd0c_sk_ Environment: live_ or test_ Random: 32 hex chars (128 bits of entropy) ``` The full key is shown once at creation. Only the SHA-256 hash is stored. The prefix (`dd0c_sk_live_a3f2...`) is stored for display in the dashboard ("Which key is this?"). 
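The fixed key format allows a cheap shape check before any hash lookup. A sketch of that validation (assumed logic, not the shipped code):

```rust
/// Check that a string has the dd0c API key shape:
/// `dd0c_sk_` + (`live_` | `test_`) + 32 lowercase hex chars.
fn has_key_shape(key: &str) -> bool {
    let Some(rest) = key.strip_prefix("dd0c_sk_") else { return false };
    let Some(hex) = rest.strip_prefix("live_").or_else(|| rest.strip_prefix("test_")) else {
        return false;
    };
    hex.len() == 32 && hex.chars().all(|c| c.is_ascii_digit() || ('a'..='f').contains(&c))
}
```

Only when the shape check passes would the proxy hash the key with SHA-256 and compare against the stored hash — malformed keys get rejected without touching the database or cache.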
### 5.3 Data Encryption | Layer | Method | Key Management | |-------|--------|---------------| | In transit (client → ALB) | TLS 1.3 via ACM certificate | AWS Certificate Manager auto-renewal | | In transit (ALB → ECS) | TLS 1.2+ (ALB → target group HTTPS) | Self-signed certs in containers, rotated on deploy | | In transit (ECS → RDS) | TLS 1.2 (RDS `require_ssl`) | RDS CA certificate | | In transit (ECS → ElastiCache) | TLS 1.2 (in-transit encryption enabled) | ElastiCache managed | | At rest (RDS) | AES-256 via RDS encryption | AWS KMS (RDS default key) | | At rest (provider API keys) | AES-256-GCM application-level | AWS KMS CMK (dd0c-managed) | | At rest (S3) | AES-256 (SSE-S3) | AWS managed | | At rest (CloudWatch Logs) | AES-256 | AWS KMS (CW default key) | ### 5.4 SOC 2 Readiness SOC 2 Type II is a V3 milestone (month 7-12). But V1 architecture decisions should not create SOC 2 blockers. **V1 decisions that are SOC 2 forward-compatible:** | SOC 2 Requirement | V1 Implementation | |-------------------|-------------------| | Access control | GitHub OAuth + RBAC. No shared accounts. | | Audit logging | CloudTrail for AWS API calls. Application-level audit log for config changes (who changed what routing rule, when). | | Encryption | All data encrypted in transit and at rest (see 5.3). | | Change management | GitHub PRs required for main branch. CI/CD pipeline enforces tests. | | Incident response | PagerDuty alerting. Documented runbook (even if it's just a README). | | Vendor management | Only AWS + GitHub + Stripe as vendors. All SOC 2 certified themselves. | | Data retention | Configurable per plan. Deletion is automated via TimescaleDB retention policies. | | Availability | Multi-AZ ALB. ECS tasks across 2 AZs. RDS single-AZ (upgrade to multi-AZ for SOC 2). | **SOC 2 blockers to address before certification:** 1. RDS must be multi-AZ (adds ~$25/month per instance) 2. Formal security policy documentation 3. 
Background checks for employees (just Brian for now — easy) 4. Penetration test (budget ~$5K) 5. Auditor engagement (~$20-30K for Type II) Total SOC 2 cost: ~$30-40K. Only pursue at $10K+ MRR when enterprise customers demand it. ### 5.5 Trust Barrier Mitigation (The #1 Risk) The product brief identifies trust as the highest-severity risk. Here's the technical architecture's answer: **Phase 1 (V1 launch): Transparency + Beachhead** - Open-source the proxy core on GitHub. MIT license. - Publish a security whitepaper: "How dd0c/route handles your API keys" — detailed, technical, honest. - Target startups without compliance teams. They evaluate tools by reading code, not requesting SOC 2 reports. - Shadow Audit mode proves value without requiring key trust. Convert skeptics with their own savings data. **Phase 2 (V1.5, month 4-5): Self-Hosted Data Plane** - Ship `dd0c-proxy` as a Docker image customers run in their own VPC/infrastructure. - The proxy connects outbound to `api.route.dd0c.dev` for: - Routing rule configuration (pull) - Telemetry data (push — metadata only, no prompt content) - Cost table updates (pull) - Customer's LLM API keys stay in their infrastructure. Period. - dd0c SaaS provides the dashboard, digest, and analytics. The proxy is the customer's. **Phase 3 (V2+): Compliance Certifications** - SOC 2 Type II - GDPR DPA (Data Processing Agreement) - Optional: HIPAA BAA for healthcare vertical The architecture is designed so that Phase 2 is a deployment topology change, not a rewrite. The proxy binary is the same — it just reads config from a different source (local file vs. API) and sends telemetry to a different endpoint (local collector vs. SaaS). --- ## Section 6: MVP SCOPE ### 6.1 What Ships in V1 (4-6 Week Build) The V1 is ruthlessly scoped. Every feature must answer: "Does this help a customer save money on LLM calls within 5 minutes of signup?" 
**Week 1-2: Proxy Core** | Deliverable | Details | Done When | |-------------|---------|-----------| | OpenAI-compatible proxy | `POST /v1/chat/completions` with streaming support | A client can swap `api.openai.com` for `proxy.route.dd0c.dev` and get identical responses | | Auth layer | dd0c API key validation (Redis-cached hash lookup) | Unauthorized requests get 401. Valid keys route correctly. | | Provider dispatch | OpenAI + Anthropic providers with connection pooling | Requests forward to the correct provider with <5ms overhead | | Telemetry emission | Async batch insert to TimescaleDB | Every request produces a `request_event` row within 2 seconds | | Health endpoint | `GET /health` returns 200 with version + uptime | ALB health checks pass | **Week 2-3: Router Brain + Cost Engine** | Deliverable | Details | Done When | |-------------|---------|-----------| | Heuristic complexity classifier | Token count + task pattern + model hint → LOW/MEDIUM/HIGH | Classifier runs in <2ms and agrees with human judgment ~75% of the time on a test set of 100 prompts | | Rule engine | First-match rule evaluation with passthrough/cheapest/cascading strategies | A routing rule like "if feature=classify, use cheapest from [gpt-4o-mini, claude-haiku]" works | | Cost tables | Seeded with current OpenAI + Anthropic pricing | `model_costs` table populated, proxy loads into memory | | Fallback chains | Circuit breaker per provider/model | If gpt-4o-mini returns 5xx, request automatically retries on claude-haiku | | Response headers | `X-DD0C-Model`, `X-DD0C-Cost`, `X-DD0C-Saved` on every response | Client can programmatically read routing decisions | **Week 3-4: Dashboard API + UI** | Deliverable | Details | Done When | |-------------|---------|-----------| | GitHub OAuth | Sign up / sign in with GitHub | New user can create an org and get an API key in <60 seconds | | Cost overview page | Real-time cost ticker, 7/30-day spend chart, savings counter | Marcus sees "You saved $X this 
week" on the dashboard | | Cost treemap | Spend breakdown by feature tag, team tag, model | Marcus can identify which feature is the most expensive | | Request inspector | Paginated table of recent requests with model, cost, routing decision | Marcus can drill into individual requests to understand routing | | Routing config UI | CRUD for routing rules with drag-to-reorder priority | Marcus can create a rule "route all classify requests to gpt-4o-mini" | | API key management | Create/revoke dd0c API keys, add provider credentials | Marcus can set up his org without touching a CLI | **Week 4-5: Retention Mechanics** | Deliverable | Details | Done When | |-------------|---------|-----------| | Weekly savings digest | Monday 9am email: "Last week you saved $X. Breakdown by feature/model." | Email renders correctly in Gmail/Outlook. Unsubscribe works. | | Budget alerts | Threshold-based: "Alert me when daily spend exceeds $100" | Slack webhook fires when threshold is crossed | | Shadow Audit CLI | `npx dd0c-scan ./src` scans codebase for LLM calls and estimates savings | CLI runs on a sample Node.js project and produces a plausible savings report | **Week 5-6: Hardening + Launch Prep** | Deliverable | Details | Done When | |-------------|---------|-----------| | Rate limiting | Per-key rate limits (1000 req/min default) via Redis | Burst traffic doesn't take down the proxy | | Error handling | Graceful degradation: if TimescaleDB is down, proxy still routes (telemetry dropped) | Proxy availability is independent of analytics availability | | Monitoring | CloudWatch dashboards, PagerDuty alerts for P0/P1 | Brian gets woken up if the proxy is down | | Documentation | API docs (OpenAPI spec), quickstart guide, "How we handle your keys" page | A developer can integrate in <5 minutes by reading the docs | | Landing page | route.dd0c.dev — value prop, pricing, "Try the CLI" CTA | Visitors understand what dd0c/route does in 10 seconds | | Infrastructure | CDK/Terraform for 
the full AWS stack, CI/CD pipeline | `git push main` deploys to production | ### 6.2 What's Explicitly Deferred to V2 | Feature | Why Deferred | V2 Timeline | |---------|-------------|-------------| | ML-based complexity classifier | Needs training data from V1 telemetry. Heuristic is good enough to prove the value prop. | Month 3-4 | | Google/Gemini provider | Two providers cover 80%+ of the market. Adding Gemini is a weekend of work once the provider trait is proven. | Month 2-3 | | Self-hosted proxy (BYOP) | Critical for enterprise trust, but V1 targets startups who are less paranoid. | Month 4-5 | | WASM client-side classifier | Requires the self-hosted proxy architecture. | Month 5-6 | | GitHub Action (PR cost comments) | Cool PLG feature but not core. Needs the CLI to be stable first. | Month 3-4 | | VS Code extension | Same — derivative of the CLI. | Month 4-5 | | Log ingestion (Mode B shadow audit) | Requires building a log parser for multiple formats. CLI scan is simpler and ships first. | Month 2-3 | | Multi-region deployment | us-east-1 covers the beachhead. EU region when EU customers appear. | Month 6+ | | SSO / SAML | Enterprise feature. GitHub OAuth is fine for startups. | Month 6+ (with SOC 2) | | Prompt caching (semantic dedup) | Technically complex (embedding similarity). Exact-match cache in Redis is V1. Semantic cache is V2. | Month 4-5 | | Carbon tracking | Interesting differentiator but not a V1 priority. | Month 6+ | | Cascading try-cheap-first with quality feedback | Needs the ML classifier to evaluate response quality. V1 cascading is based on error codes only. | Month 4-5 | | Stripe billing integration | V1 is free tier only (up to 10K requests/day). Billing ships when there are paying customers. | Month 2-3 | | Team/seat management | V1 orgs have one owner. Multi-user orgs are a V1.5 feature. | Month 2-3 | ### 6.3 Technical Debt Budget V1 will accumulate debt. That's fine. 
Here's what we're consciously accepting: | Debt Item | Severity | Why It's Acceptable | Payoff Trigger | |-----------|----------|-------------------|----------------| | Single-AZ RDS instances | Medium | Saves ~$50/month. Acceptable downtime risk for <100 customers. | First enterprise customer or SOC 2 prep | | No database connection pooling (PgBouncer) | Low | Direct connections are fine at <50 concurrent proxy tasks. | >50 proxy tasks or connection count warnings | | Hardcoded cost tables (seeded, not auto-updated) | Low | Model pricing changes monthly. Manual DB update is fine at V1 scale. | When Brian forgets to update and a customer notices | | No request body validation beyond auth | Medium | The proxy trusts that the client sends valid OpenAI-format requests. Invalid requests get a provider error, not a dd0c error. | When support tickets about confusing errors pile up | | No end-to-end encryption tests | Medium | Unit tests + integration tests cover the critical paths. E2E is expensive to maintain for a solo founder. | First hire or first security incident | | Monolithic continuous aggregate | Low | One hourly aggregate serves all dashboard queries. May need feature-specific aggregates at scale. | Dashboard queries exceed 500ms | | No graceful shutdown / drain | Medium | ECS rolling update kills tasks. In-flight requests may fail. At low traffic, this is rare. | When a customer reports a failed request during deploy | **Total acceptable debt: ~2 weeks of cleanup work.** Schedule a "debt sprint" at month 3 (after V1 launch stabilizes). ### 6.4 Solo Founder Operational Considerations Brian is one person. The architecture must respect that constraint. **What one person can realistically operate:** | Responsibility | Time Budget | Automation | |---------------|-------------|------------| | Incident response | <2 hrs/week (target: 0) | PagerDuty + automated restarts (ECS health checks) | | Deploys | 1 deploy/day, <5 min each | Fully automated CI/CD. 
`git push` = deploy. | | Database maintenance | <1 hr/week | RDS automated backups, TimescaleDB automated compression/retention | | Cost monitoring | 15 min/week | AWS Budgets alert at $150, $200, $300 thresholds | | Customer support | 2-4 hrs/week (at <100 customers) | GitHub Issues + email. No live chat. No phone. | | Security patches | 1 hr/week | Dependabot for Rust crates + npm. Automated PR creation. | | Feature development | 20-30 hrs/week | Everything else is automated so Brian can code | **Things Brian should NOT do manually:** - ❌ SSH into servers (there are no servers — Fargate) - ❌ Run database queries to answer customer questions (build it into the dashboard) - ❌ Manually rotate secrets (KMS auto-rotation + Secrets Manager) - ❌ Monitor logs in real-time (alerts handle this) - ❌ Manually scale infrastructure (auto-scaling handles this) - ❌ Process refunds or billing changes (Stripe self-serve portal) **On-call reality:** Brian is on-call 24/7. The architecture minimizes pages by: 1. Making the proxy stateless and self-healing (ECS restarts failed tasks) 2. Making telemetry failure non-fatal (proxy works without TimescaleDB) 3. Using circuit breakers to handle provider outages automatically 4. Setting alert thresholds high enough to avoid noise, low enough to catch real problems If Brian gets paged more than twice a week, something is architecturally wrong and needs fixing — not more monitoring. --- ## Section 7: API DESIGN ### 7.1 OpenAI-Compatible Proxy Endpoint The proxy endpoint is a drop-in replacement for `api.openai.com`. Customers change one environment variable and everything works. ``` # Before OPENAI_API_BASE=https://api.openai.com/v1 # After OPENAI_API_BASE=https://proxy.route.dd0c.dev/v1 ``` **Supported endpoints (V1):** #### `POST /v1/chat/completions` The primary endpoint. Handles both streaming and non-streaming requests. 
**Request (identical to OpenAI):** ```json { "model": "gpt-4o", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Classify this support ticket: ..."} ], "temperature": 0.3, "max_tokens": 100, "stream": true } ``` **dd0c-specific request headers (all optional):** | Header | Type | Description | |--------|------|-------------| | `Authorization` | `Bearer dd0c_sk_live_...` | Required. dd0c API key. | | `X-DD0C-Feature` | string | Tag this request with a feature name for cost attribution. E.g., `classify`, `summarize`, `chat`. | | `X-DD0C-Team` | string | Tag with team name. E.g., `backend`, `ml-team`, `support`. | | `X-DD0C-Environment` | string | `production`, `staging`, `development`. Defaults to key's environment. | | `X-DD0C-Routing` | `auto` \| `passthrough` | Override routing. `passthrough` = use the requested model, no routing. Default: `auto`. | | `X-DD0C-Budget-Id` | string | Associate with a specific budget for limit enforcement. | **Response (identical to OpenAI, plus dd0c headers):** ```http HTTP/1.1 200 OK Content-Type: application/json X-DD0C-Request-Id: req_a1b2c3d4e5f6 X-DD0C-Model-Requested: gpt-4o X-DD0C-Model-Used: gpt-4o-mini X-DD0C-Provider: openai X-DD0C-Cost: 0.000150 X-DD0C-Cost-Without-Routing: 0.002500 X-DD0C-Saved: 0.002350 X-DD0C-Complexity: LOW X-DD0C-Complexity-Confidence: 0.92 X-DD0C-Latency-Overhead-Ms: 3 { "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1709251200, "model": "gpt-4o-mini-2024-07-18", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "This is a billing inquiry." }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 42, "completion_tokens": 8, "total_tokens": 50 } } ``` The response body is untouched — it's exactly what the LLM provider returned. dd0c metadata lives exclusively in response headers. This means existing client code that parses the response body works without modification. 
**Streaming response:** The SSE stream is passed through transparently. dd0c headers are on the initial HTTP response. The final `data: [DONE]` chunk is forwarded as-is.

#### `POST /v1/completions`

Legacy completions endpoint. Same routing logic applies. Included for backward compatibility with older OpenAI SDK versions.

#### `POST /v1/embeddings`

Passthrough only — no routing (embedding models aren't interchangeable like chat models). Telemetry is still captured for cost attribution.

#### `GET /v1/models`

Returns the union of models available across all configured providers for this org, enriched with dd0c cost data:

```json
{
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai",
      "dd0c": {
        "input_cost_per_m": 2.50,
        "output_cost_per_m": 10.00,
        "quality_tier": "frontier",
        "routing_eligible": true
      }
    },
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "openai",
      "dd0c": {
        "input_cost_per_m": 0.15,
        "output_cost_per_m": 0.60,
        "quality_tier": "economy",
        "routing_eligible": true
      }
    }
  ]
}
```

#### `GET /health`

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 86400,
  "providers": {
    "openai": {"status": "healthy", "latency_ms": 45},
    "anthropic": {"status": "healthy", "latency_ms": 52}
  }
}
```

**Error responses:** dd0c errors use standard OpenAI error format so client SDKs handle them correctly:

```json
{
  "error": {
    "message": "Invalid dd0c API key",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "dd0c_code": "DD0C_AUTH_001"
  }
}
```

| HTTP Status | dd0c_code | Meaning |
|-------------|-----------|---------|
| 401 | DD0C_AUTH_001 | Invalid or revoked API key |
| 403 | DD0C_AUTH_002 | API key doesn't have permission for this org |
| 429 | DD0C_RATE_001 | dd0c rate limit exceeded (not provider rate limit) |
| 429 | DD0C_BUDGET_001 | Budget limit reached for this key/feature/team |
| 502 | DD0C_PROVIDER_001 | All providers in fallback chain returned errors |
| 503 | DD0C_PROXY_001 | Proxy is overloaded or shutting down |

Provider errors (OpenAI 429, Anthropic 529, etc.) are passed through with original status codes and bodies, plus an `X-DD0C-Provider-Error: true` header so clients can distinguish dd0c errors from provider errors.

### 7.2 Shadow Audit API

The Shadow Audit CLI (`npx dd0c-scan`) is primarily offline, but it calls two API endpoints:

#### `GET /api/v1/pricing/current`

Public endpoint (no auth required). Returns current model pricing for the CLI's savings calculations.

```json
{
  "updated_at": "2026-03-01T00:00:00Z",
  "models": [
    {
      "provider": "openai",
      "model": "gpt-4o",
      "input_cost_per_m": 2.50,
      "output_cost_per_m": 10.00,
      "quality_tier": "frontier"
    },
    {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "input_cost_per_m": 0.15,
      "output_cost_per_m": 0.60,
      "quality_tier": "economy"
    }
  ]
}
```

#### `POST /api/v1/scan/report` (optional, with user consent)

If the user opts in (`--share-report`), the CLI sends an anonymized scan summary for lead generation:

```json
{
  "email": "marcus@example.com",
  "scan_summary": {
    "total_llm_calls_found": 14,
    "models_detected": ["gpt-4o", "gpt-4"],
    "estimated_monthly_cost": 4217.00,
    "estimated_monthly_savings": 2327.00,
    "savings_percentage": 55.2,
    "language": "typescript",
    "framework": "express"
  }
}
```

No source code, no prompt content, no file paths. Just aggregate numbers for the sales funnel.

### 7.3 Dashboard API Endpoints

All dashboard endpoints require authentication (JWT or dd0c API key). All responses are JSON. All list endpoints support pagination via `?cursor=xxx&limit=50`.
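To make the CLI's savings arithmetic concrete, here is an illustrative sketch of a calculation against the `/api/v1/pricing/current` payload shape. `call_cost` and `estimate_savings` are assumptions for this document, not the actual `dd0c-scan` implementation:

```python
# Sketch: estimate savings by repricing each detected call at the cheapest
# economy-tier model from the pricing feed. Pure arithmetic, no network I/O.
def call_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """USD cost of one call; *_cost_per_m prices are per 1M tokens."""
    return (input_tokens * price["input_cost_per_m"]
            + output_tokens * price["output_cost_per_m"]) / 1_000_000


def estimate_savings(calls: list[dict], pricing: list[dict]) -> float:
    """Savings if every call were routed to the cheapest economy model."""
    by_model = {p["model"]: p for p in pricing}
    cheapest = min(
        (p for p in pricing if p["quality_tier"] == "economy"),
        key=lambda p: p["input_cost_per_m"] + p["output_cost_per_m"],
    )
    saved = 0.0
    for c in calls:
        current = call_cost(c["input_tokens"], c["output_tokens"],
                            by_model[c["model"]])
        routed = call_cost(c["input_tokens"], c["output_tokens"], cheapest)
        saved += max(current - routed, 0.0)  # never count negative savings
    return saved
```

With the example pricing above, one million input tokens on `gpt-4o` ($2.50) repriced to `gpt-4o-mini` ($0.15) yields $2.35 in estimated savings. The real CLI presumably also weights complexity and quality tiers; this sketch shows only the cost-table lookup.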
#### Auth

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/auth/github` | Initiate GitHub OAuth flow |
| `GET` | `/api/auth/callback` | GitHub OAuth callback |
| `POST` | `/api/auth/refresh` | Refresh access token |
| `POST` | `/api/auth/logout` | Invalidate refresh token |

#### Organizations

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/orgs` | Create organization |
| `GET` | `/api/orgs/:org_id` | Get org details |
| `PATCH` | `/api/orgs/:org_id` | Update org settings |
| `GET` | `/api/orgs/:org_id/members` | List members |
| `POST` | `/api/orgs/:org_id/members` | Invite member (V1.5) |

#### API Keys

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/orgs/:org_id/keys` | List API keys (prefix + metadata only) |
| `POST` | `/api/orgs/:org_id/keys` | Create API key (returns full key once) |
| `DELETE` | `/api/orgs/:org_id/keys/:key_id` | Revoke API key |

#### Provider Credentials

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/orgs/:org_id/providers` | List configured providers (suffix only, never the key) |
| `PUT` | `/api/orgs/:org_id/providers/:provider` | Set/update provider API key |
| `DELETE` | `/api/orgs/:org_id/providers/:provider` | Remove provider credential |
| `POST` | `/api/orgs/:org_id/providers/:provider/test` | Test provider credential (makes a minimal API call) |

#### Dashboard (Analytics)

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/orgs/:org_id/dashboard/summary` | Current period cost summary (total spend, total saved, request count) |
| `GET` | `/api/orgs/:org_id/dashboard/timeseries` | Cost over time. Query params: `period=7d\|30d\|90d`, `granularity=hour\|day` |
| `GET` | `/api/orgs/:org_id/dashboard/treemap` | Cost breakdown by feature/team/model for treemap visualization |
| `GET` | `/api/orgs/:org_id/dashboard/top-savings` | Top 10 features/endpoints by savings opportunity |
| `GET` | `/api/orgs/:org_id/dashboard/model-usage` | Model usage distribution (pie chart data) |

**Example: `/api/orgs/:org_id/dashboard/summary`**

```json
{
  "period": "7d",
  "total_requests": 42850,
  "total_cost": 127.43,
  "total_cost_without_routing": 891.20,
  "total_saved": 763.77,
  "savings_percentage": 85.7,
  "avg_latency_ms": 4.2,
  "top_model": "gpt-4o-mini",
  "top_feature": "classify",
  "cache_hit_rate": 0.12
}
```

#### Request Inspector

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/orgs/:org_id/requests` | Paginated request list. Filters: `model`, `feature`, `team`, `status`, `date_from`, `date_to`, `min_cost`, `was_routed` |
| `GET` | `/api/orgs/:org_id/requests/:request_id` | Single request detail (routing decision, timing breakdown) |

**Example: `/api/orgs/:org_id/requests?feature=classify&limit=20`**

```json
{
  "data": [
    {
      "id": "req_a1b2c3",
      "timestamp": "2026-03-15T14:22:33Z",
      "model_requested": "gpt-4o",
      "model_used": "gpt-4o-mini",
      "provider": "openai",
      "feature_tag": "classify",
      "input_tokens": 142,
      "output_tokens": 8,
      "cost": 0.000026,
      "cost_without_routing": 0.000435,
      "saved": 0.000409,
      "latency_ms": 245,
      "complexity": "LOW",
      "status": 200
    }
  ],
  "cursor": "eyJpZCI6InJlcV...",
  "has_more": true
}
```

Note: No prompt content in the response. Ever. The request inspector shows metadata only.
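The cursor pagination contract used by the list endpoints (`data`, `cursor`, `has_more`) can be consumed with a small loop. A minimal sketch, where `fetch_page` stands in for whatever HTTP client actually performs the `GET` with `?cursor=...`:

```python
# Sketch: drain a cursor-paginated dd0c list endpoint. `fetch_page` is any
# callable mapping a cursor (None for the first page) to the documented
# {"data": [...], "cursor": ..., "has_more": ...} envelope.
from typing import Callable, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page.get("has_more"):
            break  # last page reached
        cursor = page["cursor"]  # opaque token; pass back verbatim
```

Treating the cursor as an opaque token (rather than decoding the base64 payload) keeps clients insulated from changes to the server's internal keyset encoding.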
#### Routing Rules

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/orgs/:org_id/routing/rules` | List routing rules (ordered by priority) |
| `POST` | `/api/orgs/:org_id/routing/rules` | Create routing rule |
| `PATCH` | `/api/orgs/:org_id/routing/rules/:rule_id` | Update rule |
| `DELETE` | `/api/orgs/:org_id/routing/rules/:rule_id` | Delete rule |
| `POST` | `/api/orgs/:org_id/routing/rules/reorder` | Reorder rules (accepts array of rule IDs in new order) |
| `GET` | `/api/orgs/:org_id/routing/models` | List available models with current pricing |

**Example: Create a routing rule**

```http
POST /api/orgs/:org_id/routing/rules

{
  "name": "Route classification to economy models",
  "match_tags": {"feature": "classify"},
  "match_complexity": null,
  "strategy": "cheapest",
  "model_chain": ["gpt-4o-mini", "claude-3-haiku"],
  "daily_budget": 50.00
}
```

#### Alerts

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/orgs/:org_id/alerts` | List alert configurations |
| `POST` | `/api/orgs/:org_id/alerts` | Create alert |
| `PATCH` | `/api/orgs/:org_id/alerts/:alert_id` | Update alert |
| `DELETE` | `/api/orgs/:org_id/alerts/:alert_id` | Delete alert |
| `GET` | `/api/orgs/:org_id/alerts/history` | Alert firing history |

### 7.4 Webhook & Notification API

V1 supports outbound webhooks for two events:

#### Budget Alert Webhook

Fires when a spend threshold is crossed.

```http
POST {customer_webhook_url}
Content-Type: application/json
X-DD0C-Signature: sha256=abc123...

{
  "event": "budget.threshold_reached",
  "timestamp": "2026-03-15T14:22:33Z",
  "org_id": "org_456",
  "alert": {
    "id": "alert_789",
    "name": "Daily spend limit",
    "threshold": 100.00,
    "current_spend": 102.47,
    "period": "daily"
  },
  "scope": {
    "feature": "summarize",
    "team": null
  }
}
```

#### Slack Integration

Native Slack webhook support (no Slack app — just incoming webhooks for V1):

```json
{
  "text": "🚨 *dd0c/route Budget Alert*\nDaily spend for `summarize` reached $102.47 (limit: $100.00)\n"
}
```

**Webhook security:** All outbound webhooks include an `X-DD0C-Signature` header containing an HMAC-SHA256 signature of the request body, using a per-org webhook secret. Customers can verify the signature to ensure the webhook came from dd0c.

### 7.5 SDK Considerations

**V1: No SDK. Use the OpenAI SDK.**

The entire point of OpenAI compatibility is that customers don't need a dd0c SDK. They use the official OpenAI Python/Node/Go SDK and change the base URL. Done.

```python
# Python — using official OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="dd0c_sk_live_a3f2b8c9...",
    base_url="https://proxy.route.dd0c.dev/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",  # dd0c may route this to a cheaper model
    messages=[{"role": "user", "content": "Classify: ..."}],
    extra_headers={
        "X-DD0C-Feature": "classify",
        "X-DD0C-Team": "backend"
    }
)

# Read routing metadata from response headers
# (requires accessing the raw httpx response)
```

```typescript
// TypeScript — using official OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'dd0c_sk_live_a3f2b8c9...',
  baseURL: 'https://proxy.route.dd0c.dev/v1',
  defaultHeaders: {
    'X-DD0C-Feature': 'classify',
    'X-DD0C-Team': 'backend',
  },
});
```

**V1.5: Thin wrapper SDK (optional convenience)**

If customers want easier access to dd0c response headers and routing metadata, ship a thin wrapper:

```python
# dd0c Python SDK (V1.5) — wraps OpenAI SDK
from dd0c import DD0CClient

client = DD0CClient(
    dd0c_key="dd0c_sk_live_...",
    # Inherits all OpenAI SDK options
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    feature="classify",  # convenience param → X-DD0C-Feature header
    team="backend",
)

# Easy access to routing metadata
print(response.dd0c.model_used)   # "gpt-4o-mini"
print(response.dd0c.cost)         # 0.000150
print(response.dd0c.saved)        # 0.002350
print(response.dd0c.complexity)   # "LOW"
```

The SDK is a convenience, not a requirement. The proxy works with any HTTP client that can set headers and parse JSON.

---

## Appendix: Decision Log

| Decision | Options Considered | Chosen | Rationale |
|----------|-------------------|--------|-----------|
| Proxy language | Rust, Go, Node.js | Rust | <10ms latency requirement eliminates GC languages. Rust's ownership model prevents memory leaks in long-running proxy. |
| API language | Node.js, Python, Rust | Rust (Axum) | Single-language stack for solo founder. Shared crate library. One build system. |
| Telemetry store | PostgreSQL, ClickHouse, TimescaleDB | TimescaleDB | "It's just Postgres" — Brian knows it. Continuous aggregates solve the dashboard query problem. Compression solves storage. |
| Config store | SQLite, DynamoDB, PostgreSQL | PostgreSQL (RDS) | Relational integrity for org/key/rule relationships. RDS is managed. Brian's home turf. |
| Cache | In-process, Memcached, Redis | Redis (ElastiCache) | Shared state across proxy instances (circuit breakers, rate limits). ElastiCache is managed. |
| Compute | Lambda, EC2, ECS Fargate | ECS Fargate | No cold starts (Lambda). No server management (EC2). Right abstraction for stateless containers. |
| Auth | Auth0, Clerk, Custom | Custom (GitHub OAuth + JWT) | ~200 lines of code. No vendor dependency. No per-MAU pricing. GitHub is where the users are. |
| UI framework | Next.js, SvelteKit, React+Vite | React + Vite | Largest ecosystem. SPA is sufficient (no SSR/SEO needed). Vite is fast. |
| Email | Resend, SendGrid, SES | AWS SES | Brian has AWS credits. $0.10/1K emails. Plain HTML digest — no template engine needed. |
| IaC | Terraform, CDK, Pulumi | CDK (TypeScript) or Terraform | Brian's choice. Both work. CDK if he wants to stay in AWS-native tooling. Terraform if he wants portability. |
| Deployment | Blue/green, Canary, Rolling | Rolling (ECS default) | Simplest. Proxy is stateless. Rolling update = zero downtime. Rollback = redeploy previous SHA. |
| Monitoring | Datadog, Grafana Cloud, CloudWatch | CloudWatch | Already included with AWS. No additional vendor. Good enough for V1. Migrate to Grafana Cloud at $5K MRR if CW becomes limiting. |

---

*Architecture document generated as Phase 6 of the BMad product development pipeline for dd0c/route.*

*Next phase: Implementation planning and sprint breakdown.*