Max Mayfield 5ee95d8b13 dd0c: full product research pipeline - 6 products, 8 phases each
Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
        product-brief, architecture, epics (incl. Epic 10 TF compliance),
        test-architecture (TDD strategy)

Brand strategy and market research included.
2026-02-28 17:35:02 +00:00


dd0c/route — Technical Architecture

Product: dd0c/route — LLM Cost Router & Optimization Dashboard
Author: Architecture Phase (BMad Phase 6)
Date: February 28, 2026
Status: V1 MVP Architecture — Solo Founder Scope


Section 1: SYSTEM OVERVIEW

1.1 High-Level Architecture

graph TB
    subgraph Clients["Client Applications"]
        APP1[App Service A]
        APP2[App Service B]
        CLI[dd0c-scan CLI]
    end

    subgraph DD0C["dd0c/route Platform (AWS us-east-1)"]
        subgraph ProxyTier["Proxy Tier (ECS Fargate)"]
            PROXY1[Rust Proxy Instance 1]
            PROXY2[Rust Proxy Instance N]
        end

        subgraph ControlPlane["Control Plane (ECS Fargate)"]
            API[Dashboard API<br/>Axum/Rust]
            WORKER[Async Worker<br/>Digest + Alerts]
        end

        subgraph DataTier["Data Tier"]
            PG[(PostgreSQL RDS<br/>Config + Auth)]
            TS[(TimescaleDB RDS<br/>Request Telemetry)]
            REDIS[(ElastiCache Redis<br/>Rate Limits + Cache)]
        end
    end

    subgraph Providers["LLM Providers"]
        OAI[OpenAI API]
        ANT[Anthropic API]
    end

    subgraph External["External Services"]
        GH[GitHub OAuth]
        SES[AWS SES<br/>Digest Emails]
        SLACK[Slack Webhooks]
    end

    APP1 -->|HTTPS / OpenAI-compat| PROXY1
    APP2 -->|HTTPS / OpenAI-compat| PROXY2
    PROXY1 --> OAI
    PROXY1 --> ANT
    PROXY2 --> OAI
    PROXY2 --> ANT
    PROXY1 -->|async telemetry| TS
    PROXY2 -->|async telemetry| TS
    PROXY1 --> REDIS
    PROXY2 --> REDIS
    API --> PG
    API --> TS
    WORKER --> TS
    WORKER --> SES
    WORKER --> SLACK
    CLI -->|log analysis| APP1

1.2 Component Inventory

  • Proxy Engine — Rust (tokio + hyper). Request interception, complexity classification, model routing, response passthrough, telemetry emission. Criticality: P0 — the product IS this.
  • Router Brain — Rust (embedded in proxy). Rule evaluation, cost table lookups, fallback chain execution, cascading try-cheap-first logic. Criticality: P0 — routing decisions.
  • Dashboard API — Rust (axum). REST API for dashboard UI, config management, auth, org/team CRUD. Criticality: P0 — the "aha moment".
  • Dashboard UI — TypeScript (React + Vite). Cost treemap, request inspector, routing config editor, real-time ticker. Criticality: P0 — what Marcus sees.
  • Async Worker — Rust (tokio-cron). Weekly digest generation, threshold-based anomaly detection, alert dispatch. Criticality: P1 — retention mechanism.
  • PostgreSQL — AWS RDS (db.t4g.micro). Organizations, API keys, routing rules, user accounts. Criticality: P0 — config store.
  • TimescaleDB — AWS RDS (db.t4g.small). Request telemetry, cost events, token counts; time-series optimized. Criticality: P0 — analytics backbone.
  • Redis — AWS ElastiCache (t4g.micro). Rate limiting, exact-match response cache, session tokens. Criticality: P1 — performance layer.

1.3 Technology Choices & Justification

  • Rust (proxy), over Go or Node.js: <10ms p99 overhead is non-negotiable. Rust's zero-cost abstractions and tokio async runtime give us predictable tail latency. Go would add GC pauses; Node.js adds event loop overhead. Portkey's 20-40ms overhead in Node.js is the cautionary tale.
  • Rust (API), over Node.js (Express) or Python (FastAPI): a single language across the stack reduces cognitive overhead for a solo founder. Axum is production-ready and shares the tokio runtime. One cargo build produces the proxy AND the API.
  • TimescaleDB, over ClickHouse or plain PostgreSQL: TimescaleDB is PostgreSQL with time-series superpowers — hypertables, continuous aggregates, compression. Brian already knows PostgreSQL. ClickHouse is faster for analytics but adds operational complexity (separate cluster, different query dialect, different backup strategy). For a solo founder, "it's just Postgres" wins. Continuous aggregates handle the dashboard rollups; compression handles storage costs.
  • PostgreSQL (config), over SQLite or DynamoDB: RDS PostgreSQL is Brian's home turf (AWS architect). Managed backups, failover, IAM auth. DynamoDB would work but adds a second data model to reason about. SQLite doesn't scale past a single instance.
  • Redis (cache), over an in-process LRU or DynamoDB DAX: a shared cache across proxy instances for exact-match response dedup. ElastiCache is managed and cheap at t4g.micro ($0.016/hr). An in-process cache doesn't share across instances.
  • React + Vite (UI), over Next.js, SvelteKit, or HTMX: React has the largest hiring pool if Brian ever hires. Vite is fast. The dashboard is a SPA — no SSR needed, no SEO needed. Keep it simple.
  • AWS SES (email), over Resend or SendGrid: Brian has AWS credits and expertise. SES is $0.10/1000 emails. The digest email is plain HTML — no fancy template engine needed.
  • GitHub OAuth, over Auth0, Clerk, or email/password: one-click signup for the developer audience. No password management burden. GitHub is where the users live. Implemented via the oauth2 Rust crate — ~200 lines of code.

1.4 Deployment Model

V1: Containerized services on ECS Fargate. Not Lambda. Not a single binary.

Rationale:

  • Why not Lambda: The proxy needs persistent connections to LLM providers (connection pooling, keep-alive). Lambda cold starts (100-500ms) violate the <10ms latency budget. Lambda's 15-minute timeout conflicts with streaming responses. Lambda per-invocation pricing gets expensive at 100K+ requests/day.
  • Why not single binary: The proxy and the dashboard API have different scaling profiles. The proxy scales horizontally with request volume. The API scales with dashboard users (much lower). Coupling them wastes money.
  • Why ECS Fargate: No EC2 instances to manage. Auto-scaling built in. Brian knows ECS. Task definitions are the deployment unit. ALB handles TLS termination and health checks.

Container topology:

Service        Container     vCPU  Memory  Min Instances  Auto-Scale Trigger
Proxy          dd0c-proxy    0.25  512MB   2              CPU > 60% or request count
Dashboard API  dd0c-api      0.25  512MB   1              CPU > 70%
Async Worker   dd0c-worker   0.25  512MB   1              None (singleton)
Dashboard UI   S3 + CloudFront (static assets)            CDN-managed

Build artifact: docker build produces three images from a single Rust workspace (cargo workspace). The UI is a static build deployed to S3/CloudFront.

dd0c-route/
├── Cargo.toml          (workspace root)
├── crates/
│   ├── proxy/          (the proxy engine + router brain)
│   ├── api/            (dashboard REST API)
│   ├── worker/         (digest + alerts)
│   └── shared/         (models, DB queries, cost tables)
├── ui/                 (React dashboard)
├── cli/                (dd0c-scan — separate npm package)
└── infra/              (CDK or Terraform)

Section 2: CORE COMPONENTS

2.1 Proxy Engine (Rust — crates/proxy)

The proxy is the hot path. Every design decision optimizes for one thing: don't add latency.

Request lifecycle:

Client Request (OpenAI-compat)
    │
    ├─ 1. TLS termination (ALB — not our problem)
    ├─ 2. Auth validation (API key lookup — Redis cache, PG fallback) ........... <1ms
    ├─ 3. Request parsing (extract model, messages, metadata) ................... <0.5ms
    ├─ 4. Tag extraction (X-DD0C-Feature, X-DD0C-Team headers) ................. <0.1ms
    ├─ 5. Router Brain evaluation (complexity + rules → target model) ........... <2ms
    ├─ 6. Provider dispatch (connection-pooled HTTPS to OpenAI/Anthropic) ....... network
    ├─ 7. Response passthrough (streaming SSE or buffered JSON) ................. passthrough
    ├─ 8. Telemetry emission (async, non-blocking — tokio::spawn) ............... 0ms on hot path
    └─ 9. Response headers injected (X-DD0C-Model, X-DD0C-Cost, X-DD0C-Saved)

Latency budget breakdown:

  • Auth (<1ms): Redis GET dd0c_key:{hash} with 60s TTL. Cache miss → PG lookup + cache set.
  • Parse (<0.5ms): serde_json zero-copy deserialization. No full body buffering for streaming requests — parse headers + first chunk only.
  • Route (<2ms): in-memory rule engine. Cost tables loaded at startup, refreshed every 60s via background task. No DB call on hot path.
  • Dispatch (0ms overhead): hyper connection pool to each provider. Pre-warmed connections. HTTP/2 multiplexing.
  • Telemetry (0ms on hot path): tokio::spawn fires a telemetry event to an in-memory channel. Background task batch-inserts to TimescaleDB every 1s or 100 events (whichever comes first).
  • Total: <5ms p99 overhead, against a <10ms p99 target with margin.
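The auth stage's read-through caching can be sketched with a std-only TTL map; the actual Redis GET and PG fallback are elided, and the type and method names here are illustrative, not the shipped API:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal read-through cache with per-entry TTL, standing in for the
/// Redis `dd0c_key:{hash}` lookup with 60s expiry and PG fallback.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return the cached value if fresh, otherwise compute it via `load`
    /// (the PG lookup) and cache the result.
    fn get_or_load(&mut self, key: &str, load: impl FnOnce() -> V) -> V {
        if let Some((at, v)) = self.entries.get(key) {
            if at.elapsed() < self.ttl {
                return v.clone();
            }
        }
        let v = load();
        self.entries.insert(key.to_string(), (Instant::now(), v.clone()));
        v
    }
}

fn main() {
    let mut cache: TtlCache<bool> = TtlCache::new(Duration::from_secs(60));
    let mut pg_lookups = 0;
    // First call misses and "hits PG"; second is served from cache.
    let a = cache.get_or_load("dd0c_key:abc", || { pg_lookups += 1; true });
    let b = cache.get_or_load("dd0c_key:abc", || { pg_lookups += 1; true });
    assert!(a && b);
    assert_eq!(pg_lookups, 1);
}
```

The same pattern generalizes to any hot-path lookup that must avoid a DB round trip per request.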

Streaming support:

The proxy MUST support Server-Sent Events (SSE) streaming — this is how most chat applications consume LLM responses. The proxy operates as a transparent stream relay:

  1. Client sends request with "stream": true
  2. Proxy makes routing decision based on headers + first message content (no need to buffer full body)
  3. Proxy opens streaming connection to target provider
  4. Each SSE chunk is forwarded to client immediately (Transfer-Encoding: chunked)
  5. Token counting happens on-the-fly by parsing usage from the final data chunk before the [DONE] sentinel (OpenAI, when stream_options.include_usage is set) or from Anthropic's message_delta / message_stop events
  6. If the provider doesn't return usage in the stream, the proxy counts tokens from accumulated chunks using tiktoken-rs
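The relay in step 4 implies splitting the provider byte stream on SSE framing. A std-only sketch of the `data:` line extraction (a real relay works incrementally on byte chunks and would use a proper SSE parser; this assumes a complete UTF-8 segment for clarity):

```rust
/// Extract the payload of each `data:` line from a buffered SSE segment.
fn sse_data_payloads(segment: &str) -> Vec<&str> {
    segment
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim_start)
        .collect()
}

fn main() {
    let segment = "data: {\"choices\":[]}\n\ndata: [DONE]\n";
    let payloads = sse_data_payloads(segment);
    // The trailing `[DONE]` sentinel marks stream end; usage arrives in the
    // last JSON payload before it (OpenAI) or a message_stop event (Anthropic).
    assert_eq!(payloads, vec!["{\"choices\":[]}", "[DONE]"]);
}
```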

Provider abstraction:

// Simplified — the actual trait is more detailed
use futures::stream::BoxStream;

#[async_trait]
trait LlmProvider: Send + Sync {
    fn name(&self) -> &str;
    fn supports_model(&self, model: &str) -> bool;
    fn translate_request(&self, req: &ProxyRequest) -> ProviderRequest;
    fn translate_response(&self, resp: ProviderResponse) -> ProxyResponse;
    async fn send(&self, req: ProviderRequest) -> Result<ProviderResponse>;
    // async_trait methods can't return `impl Trait`, so the stream is boxed
    async fn send_stream(&self, req: ProviderRequest) -> Result<BoxStream<'static, SseChunk>>;
}

V1 ships two implementations: OpenAiProvider and AnthropicProvider. Adding a new provider means implementing this trait — no proxy core changes. The translate_request / translate_response methods handle the format differences (Anthropic's messages API vs OpenAI's chat/completions).

Connection pooling:

Each proxy instance maintains a hyper connection pool per provider:

  • Max 100 connections to api.openai.com
  • Max 50 connections to api.anthropic.com
  • Keep-alive: 90s
  • Connection timeout: 5s
  • Request timeout: 300s (LLM responses can be slow for long completions)

2.2 Router Brain (crates/shared/router)

The Router Brain is embedded in the proxy process — no network hop, no RPC. It's a pure function: (request, rules, cost_tables) → routing_decision.

Decision pipeline:

Input: ProxyRequest + RoutingConfig
    │
    ├─ 1. Rule matching: find first rule where all match conditions are true
    │     Match on: request tags, model requested, token count estimate, time of day
    │
    ├─ 2. Strategy execution (per matched rule):
    │     ├─ "passthrough"  → use requested model, no routing
    │     ├─ "cheapest"     → pick cheapest model from rule's model list
    │     ├─ "quality-first"→ pick highest-quality model, fallback down on error
    │     └─ "cascading"    → try cheapest first, escalate on low confidence
    │
    ├─ 3. Budget check: if org/team/feature has hit a hard budget limit → throttle to cheapest or reject
    │
    └─ 4. Output: RoutingDecision { target_model, target_provider, reason, confidence }
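Step 1's first-match-wins evaluation can be sketched in a few lines; the rule and request fields below are a reduced subset of the real schema, with names chosen for illustration:

```rust
#[derive(Clone)]
struct Rule {
    name: &'static str,
    // None = rule matches any value for that dimension.
    match_feature: Option<&'static str>,
    match_model: Option<&'static str>,
    model_chain: Vec<&'static str>,
}

struct Request<'a> {
    feature_tag: Option<&'a str>,
    model: &'a str,
}

/// Rules are pre-sorted by priority; the first rule whose conditions
/// all hold decides the route.
fn match_rule<'r>(rules: &'r [Rule], req: &Request) -> Option<&'r Rule> {
    rules.iter().find(|r| {
        r.match_feature.map_or(true, |f| req.feature_tag == Some(f))
            && r.match_model.map_or(true, |m| req.model == m)
    })
}

fn main() {
    let rules = vec![
        Rule { name: "classify-cheap", match_feature: Some("classify"),
               match_model: None, model_chain: vec!["gpt-4o-mini"] },
        Rule { name: "default-passthrough", match_feature: None,
               match_model: None, model_chain: vec![] },
    ];
    let req = Request { feature_tag: Some("classify"), model: "gpt-4o" };
    assert_eq!(match_rule(&rules, &req).unwrap().name, "classify-cheap");
}
```

Keeping the matcher a pure function over in-memory rules is what makes the <2ms routing budget realistic.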

Complexity classifier (V1 — heuristic, not ML):

The V1 classifier is deliberately simple. It uses three signals:

  • Token count (30%): short prompts (<500 tokens) with short expected outputs are likely simple tasks.
  • Task pattern (50%): regex/keyword matching on the system prompt. "classify", "extract", "format JSON", "yes or no" → LOW complexity; "analyze", "reason step by step", "write code" → HIGH complexity.
  • Model requested (20%): if the user explicitly requests a frontier model AND the task looks complex, respect the request. Don't downgrade a code generation request from GPT-4o.

Output: ComplexityScore { level: Low|Medium|High, confidence: f32 }

This gets 70-80% accuracy. Good enough for V1. The ML classifier (V2) trains on the telemetry data: for each routed request, did the user complain? Did they retry with a different model? Did the downstream application error? That feedback loop is the data flywheel.
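A minimal sketch of the weighted heuristic, using only the token-count and task-pattern signals (the model-requested signal, the exact weights, and the keyword lists are illustrative, not the shipped tables):

```rust
#[derive(Debug, PartialEq)]
enum Level { Low, Medium, High }

struct ComplexityScore { level: Level, confidence: f32 }

/// V1 heuristic: keyword and length signals combined with fixed weights.
fn classify(system_prompt: &str, est_tokens: u32) -> ComplexityScore {
    let p = system_prompt.to_lowercase();
    let low_kw = ["classify", "extract", "format json", "yes or no"];
    let high_kw = ["analyze", "reason step by step", "write code"];

    // Task-pattern signal (50%): -1.0 simple, +1.0 complex, 0.0 unknown.
    let pattern: f32 = if low_kw.iter().any(|k| p.contains(k)) { -1.0 }
                       else if high_kw.iter().any(|k| p.contains(k)) { 1.0 }
                       else { 0.0 };
    // Token-count signal (30%): short prompts lean simple.
    let length: f32 = if est_tokens < 500 { -1.0 } else { 1.0 };

    let score = 0.5 * pattern + 0.3 * length;
    let level = if score < -0.25 { Level::Low }
                else if score > 0.25 { Level::High }
                else { Level::Medium };
    // Confidence grows as the signals agree.
    ComplexityScore { level, confidence: score.abs().min(1.0) }
}

fn main() {
    let s = classify("Classify the sentiment. Answer yes or no.", 120);
    assert_eq!(s.level, Level::Low);
    let s = classify("Reason step by step, then write code.", 2000);
    assert_eq!(s.level, Level::High);
}
```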

Cost tables:

struct ModelCost {
    provider: Provider,
    model_id: String,          // "gpt-4o-2024-11-20"
    model_alias: String,       // "gpt-4o"
    input_cost_per_m: f64,     // $/million input tokens
    output_cost_per_m: f64,    // $/million output tokens
    quality_tier: QualityTier, // Frontier, Standard, Economy
    max_context: u32,          // 128000
    supports_streaming: bool,
    supports_tools: bool,
    supports_vision: bool,
    updated_at: DateTime<Utc>,
}

Cost tables are stored in PostgreSQL and loaded into memory at proxy startup. A background task polls for updates every 60 seconds. When a provider changes pricing (happens ~monthly), Brian updates one row in the DB and all proxy instances pick it up within 60s. No redeploy.
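With the cost table in memory, the "cheapest" strategy reduces to a lookup over the rule's model chain. A sketch (the prices and the blended token estimate are illustrative assumptions, not the shipped pricing):

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct ModelCost { input_per_m: f64, output_per_m: f64 }

/// Pick the cheapest model in a rule's chain for an estimated request shape,
/// using the in-memory cost table.
fn cheapest<'a>(chain: &[&'a str], table: &HashMap<&str, ModelCost>,
                in_tok: f64, out_tok: f64) -> Option<&'a str> {
    chain.iter().copied().min_by(|a, b| {
        let est = |m: &str| {
            let c = &table[m];
            (in_tok * c.input_per_m + out_tok * c.output_per_m) / 1e6
        };
        est(a).partial_cmp(&est(b)).unwrap()
    })
}

fn main() {
    let mut table = HashMap::new();
    // Illustrative prices, $/million tokens.
    table.insert("gpt-4o", ModelCost { input_per_m: 2.50, output_per_m: 10.00 });
    table.insert("gpt-4o-mini", ModelCost { input_per_m: 0.15, output_per_m: 0.60 });
    let pick = cheapest(&["gpt-4o", "gpt-4o-mini"], &table, 1000.0, 500.0);
    assert_eq!(pick, Some("gpt-4o-mini"));
}
```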

Fallback chains with circuit breakers:

Primary: gpt-4o-mini (OpenAI)
    │ ── if error rate > 10% in last 60s ──→ circuit OPEN
    │
    ▼
Fallback 1: claude-3-haiku (Anthropic)
    │ ── if error rate > 10% in last 60s ──→ circuit OPEN
    │
    ▼
Fallback 2: gpt-4o (OpenAI) ← expensive but reliable last resort
    │
    ▼
Final fallback: return 503 with X-DD0C-Fallback-Exhausted header

Circuit breaker state is stored in Redis (shared across proxy instances). State transitions: CLOSED → OPEN (on threshold breach) → HALF-OPEN (after 30s cooldown, allow 1 probe request) → CLOSED (if probe succeeds).
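The state machine above can be sketched as a single-instance breaker; in production the counters and state live in Redis so all proxy instances share them, and the rolling 60s window is simplified here to a fixed sample:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Clone, Copy)]
enum State { Closed, Open, HalfOpen }

struct CircuitBreaker {
    state: State,
    opened_at: Option<Instant>,
    errors: u32,
    total: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(cooldown: Duration) -> Self {
        Self { state: State::Closed, opened_at: None, errors: 0, total: 0, cooldown }
    }

    /// May this request go to the primary? An OPEN circuit admits one
    /// probe after the cooldown by moving to HALF-OPEN.
    fn allow(&mut self) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open => {
                if self.opened_at.map_or(false, |t| t.elapsed() >= self.cooldown) {
                    self.state = State::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record(&mut self, ok: bool) {
        match self.state {
            State::HalfOpen => {
                // Probe result decides: success closes, failure re-opens.
                self.state = if ok { State::Closed } else { State::Open };
                self.opened_at = (!ok).then(Instant::now);
                self.errors = 0; self.total = 0;
            }
            _ => {
                self.total += 1;
                if !ok { self.errors += 1; }
                // >10% errors over a minimum sample trips the breaker.
                if self.total >= 10 && self.errors * 10 > self.total {
                    self.state = State::Open;
                    self.opened_at = Some(Instant::now());
                }
            }
        }
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(Duration::from_secs(30));
    for _ in 0..8 { cb.record(true); }
    for _ in 0..2 { cb.record(false); } // 2/10 = 20% error rate trips it
    assert!(!cb.allow());
}
```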

2.3 Analytics Pipeline

Telemetry flows from the proxy to TimescaleDB asynchronously. The proxy never blocks on analytics.

Event schema (what the proxy emits per request):

struct RequestEvent {
    id: Uuid,
    org_id: Uuid,
    api_key_id: Uuid,
    timestamp: DateTime<Utc>,
    // Request metadata
    model_requested: String,
    model_used: String,
    provider: String,
    feature_tag: Option<String>,
    team_tag: Option<String>,
    environment_tag: Option<String>,
    // Tokens & cost
    input_tokens: u32,
    output_tokens: u32,
    cost_actual: f64,        // what they paid (routed model)
    cost_original: f64,      // what they would have paid (requested model)
    cost_saved: f64,         // delta
    // Performance
    latency_ms: u32,
    ttfb_ms: u32,            // time to first byte (streaming)
    // Routing
    complexity_score: f32,
    complexity_level: String, // LOW, MEDIUM, HIGH
    routing_reason: String,
    was_cached: bool,
    was_fallback: bool,
    // Status
    status_code: u16,
    error_type: Option<String>,
}

Batch insert pipeline:

Proxy hot path                    Background task
─────────────                     ───────────────
request completes
    │
    ├─ tokio::spawn ──→ mpsc channel ──→ batch collector
                                            │
                                            ├─ accumulate events
                                            ├─ flush every 1s OR 100 events
                                            └─ COPY INTO request_events (bulk insert)

COPY (PostgreSQL bulk insert) handles 10K+ rows/second on a db.t4g.small. At 100K requests/day (~1.2 req/s average), this is trivially within capacity.
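The flush policy of the batch collector (1s OR 100 events, whichever comes first) can be sketched std-only; the channel plumbing and the COPY call are elided, and the names are illustrative:

```rust
use std::time::{Duration, Instant};

/// Accumulates telemetry events; `push` hands back a full batch whenever a
/// flush condition is met (size or age). In the real worker the batch goes
/// to a `COPY` bulk insert.
struct BatchCollector<T> {
    buf: Vec<T>,
    last_flush: Instant,
    max_len: usize,
    max_age: Duration,
}

impl<T> BatchCollector<T> {
    fn new(max_len: usize, max_age: Duration) -> Self {
        Self { buf: Vec::new(), last_flush: Instant::now(), max_len, max_age }
    }

    fn push(&mut self, ev: T) -> Option<Vec<T>> {
        self.buf.push(ev);
        if self.buf.len() >= self.max_len || self.last_flush.elapsed() >= self.max_age {
            self.last_flush = Instant::now();
            Some(std::mem::take(&mut self.buf))
        } else {
            None
        }
    }
}

fn main() {
    let mut c = BatchCollector::new(100, Duration::from_secs(1));
    let mut flushed = None;
    for i in 0..100u32 {
        if let Some(batch) = c.push(i) { flushed = Some(batch); }
    }
    assert_eq!(flushed.unwrap().len(), 100);
}
```

A production version would also flush on a timer tick even when idle, so a trickle of events never waits longer than `max_age`.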

Continuous aggregates (TimescaleDB):

Pre-computed rollups for dashboard queries:

-- Hourly rollup by org, feature, model
CREATE MATERIALIZED VIEW hourly_cost_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    org_id,
    feature_tag,
    team_tag,
    model_used,
    provider,
    COUNT(*) AS request_count,
    SUM(input_tokens) AS total_input_tokens,
    SUM(output_tokens) AS total_output_tokens,
    SUM(cost_actual) AS total_cost,
    SUM(cost_saved) AS total_saved,
    AVG(latency_ms) AS avg_latency,
    -- Note: ordered-set aggregates like PERCENTILE_CONT aren't supported in
    -- continuous aggregates; approximate p99 needs timescaledb-toolkit's percentile_agg
    MAX(latency_ms) AS max_latency
FROM request_events
GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider;

Dashboard queries hit the continuous aggregate, not the raw events table. This keeps dashboard response times <200ms even with millions of rows.

Savings calculation:

cost_saved = cost_original - cost_actual

where:
  cost_original = (input_tokens × requested_model.input_cost_per_m / 1_000_000)
                + (output_tokens × requested_model.output_cost_per_m / 1_000_000)

  cost_actual   = (input_tokens × used_model.input_cost_per_m / 1_000_000)
                + (output_tokens × used_model.output_cost_per_m / 1_000_000)

This is computed at request time in the proxy (cost tables are in memory) and stored with the event. No post-hoc recalculation needed.
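The per-request arithmetic is trivial but worth pinning down, since it runs on every request. A sketch with illustrative prices (not a maintained price list):

```rust
/// Cost of one request at a given model's pricing, $/million tokens.
fn request_cost(in_tok: u64, out_tok: u64, in_per_m: f64, out_per_m: f64) -> f64 {
    (in_tok as f64 * in_per_m + out_tok as f64 * out_per_m) / 1_000_000.0
}

fn main() {
    // Requested gpt-4o (example: $2.50/M in, $10/M out); routed to
    // gpt-4o-mini (example: $0.15/M in, $0.60/M out). 1,247 in / 300 out.
    let cost_original = request_cost(1_247, 300, 2.50, 10.00);
    let cost_actual   = request_cost(1_247, 300, 0.15, 0.60);
    let cost_saved = cost_original - cost_actual;
    assert!(cost_saved > 0.0 && cost_saved < cost_original);
    println!("saved ${:.6} of ${:.6}", cost_saved, cost_original);
}
```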

2.4 Dashboard API (crates/api)

Framework: Axum (Rust). Same tokio runtime as the proxy. Shares the crates/shared library for DB models and queries.

Why not a separate language (Node/Python)? Solo founder. One language. One build system. One deployment pipeline. The API is not performance-critical (dashboard users, not proxy traffic), but keeping it in Rust means Brian debugs one ecosystem, not two.

Key endpoint groups (detailed in Section 7):

Group Purpose
/api/auth/* GitHub OAuth flow, session management
/api/orgs/* Organization CRUD, team management
/api/dashboard/* Cost summaries, treemap data, time-series
/api/requests/* Request inspector — paginated, filterable
/api/routing/* Routing rules CRUD, cost tables
/api/alerts/* Alert configuration, budget limits
/api/keys/* API key management (dd0c keys + encrypted provider keys)

Auth model: JWT tokens issued after GitHub OAuth. Short-lived access tokens (15min) + refresh tokens (7 days) stored in Redis. API keys for programmatic access (prefixed dd0c_sk_).

2.5 Shadow Audit Mode (The PLG Wedge)

Shadow Audit is the product-led growth engine. It provides value before the customer routes a single request through the proxy.

Two modes:

Mode A: CLI Scan (npx dd0c-scan)

  • Scans a local codebase for LLM API calls
  • Parses model names, estimates token counts from prompt templates
  • Applies current pricing to estimate monthly cost
  • Applies dd0c routing logic to estimate savings
  • Outputs a report to stdout — no data leaves the machine
  • Captures email (optional) for follow-up
$ npx dd0c-scan ./src

  dd0c/route — Cost Scan Report
  ─────────────────────────────
  Found 14 LLM API calls across 8 files

  Current estimated monthly cost:    $4,217
  With dd0c/route routing:           $1,890
  Potential monthly savings:          $2,327 (55%)

  Top opportunities:
  ┌─────────────────────────────────────────────────────┐
  │ src/services/classify.ts    gpt-4o → gpt-4o-mini   │
  │   Est. savings: $890/mo     Confidence: HIGH        │
  │                                                     │
  │ src/services/summarize.ts   gpt-4o → claude-haiku   │
  │   Est. savings: $670/mo     Confidence: MEDIUM      │
  │                                                     │
  │ src/services/extract.ts     gpt-4o → gpt-4o-mini   │
  │   Est. savings: $440/mo     Confidence: HIGH        │
  └─────────────────────────────────────────────────────┘

  → Sign up at route.dd0c.dev to start saving

Mode B: Log Ingestion (V1.1)

  • Customer points dd0c at their existing LLM provider logs (OpenAI usage export CSV, or application logs with token counts)
  • dd0c processes the logs offline and generates a retrospective savings report
  • "Here's what you spent last month. Here's what you WOULD have spent."
  • This is the enterprise conversion tool — show the CFO real numbers from their own data

Section 3: DATA ARCHITECTURE

3.1 Database Schema

Two databases, clear separation of concerns:

  • PostgreSQL (RDS): Configuration, auth, organizational data. Low-write, high-read. Relational integrity matters.
  • TimescaleDB (RDS): Request telemetry, cost events. High-write, time-series queries. Compression and retention policies matter.

PostgreSQL — Configuration Store

-- Organizations (multi-tenant root)
CREATE TABLE organizations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    slug            VARCHAR(63) NOT NULL UNIQUE,  -- used in URLs
    plan            VARCHAR(20) NOT NULL DEFAULT 'free',  -- free, pro, business
    stripe_customer_id VARCHAR(255),
    monthly_llm_spend_limit NUMERIC(10,2),  -- plan-based cap on routed spend
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Users (GitHub OAuth)
CREATE TABLE users (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    github_id       BIGINT NOT NULL UNIQUE,
    github_login    VARCHAR(255) NOT NULL,
    email           VARCHAR(255),
    avatar_url      VARCHAR(512),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Org membership
CREATE TABLE org_members (
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role            VARCHAR(20) NOT NULL DEFAULT 'member',  -- owner, admin, member
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (org_id, user_id)
);

-- dd0c API keys (what customers use to auth with the proxy)
CREATE TABLE api_keys (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    key_hash        VARCHAR(64) NOT NULL UNIQUE,  -- SHA-256 of the key; raw key never stored
    key_prefix      VARCHAR(12) NOT NULL,          -- "dd0c_sk_a3f..." for display
    name            VARCHAR(255),                   -- human label: "production", "staging"
    environment     VARCHAR(50) DEFAULT 'production',
    is_active       BOOLEAN NOT NULL DEFAULT true,
    last_used_at    TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE is_active = true;

-- Customer's LLM provider credentials (encrypted at rest)
CREATE TABLE provider_credentials (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    provider        VARCHAR(50) NOT NULL,           -- 'openai', 'anthropic'
    encrypted_key   BYTEA NOT NULL,                 -- AES-256-GCM encrypted API key
    key_suffix      VARCHAR(8),                     -- trailing chars for display: "...a3f2"
    is_active       BOOLEAN NOT NULL DEFAULT true,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(org_id, provider)
);

-- Routing rules (ordered, first-match-wins)
CREATE TABLE routing_rules (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    priority        INTEGER NOT NULL DEFAULT 0,     -- lower = higher priority
    name            VARCHAR(255) NOT NULL,
    is_active       BOOLEAN NOT NULL DEFAULT true,
    -- Match conditions (all must be true)
    match_tags      JSONB DEFAULT '{}',             -- {"feature": "classify", "team": "backend"}
    match_models    TEXT[],                          -- models this rule applies to, NULL = all
    match_complexity VARCHAR(20),                    -- LOW, MEDIUM, HIGH, NULL = all
    -- Routing strategy
    strategy        VARCHAR(20) NOT NULL,            -- passthrough, cheapest, quality_first, cascading
    model_chain     TEXT[] NOT NULL,                  -- ordered list of models to try
    -- Budget constraints
    daily_budget    NUMERIC(10,2),                   -- hard limit per day for this rule
    -- Metadata
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_routing_rules_org ON routing_rules(org_id, priority) WHERE is_active = true;

-- Model cost table (the source of truth for pricing)
CREATE TABLE model_costs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    provider        VARCHAR(50) NOT NULL,
    model_id        VARCHAR(100) NOT NULL,           -- "gpt-4o-2024-11-20"
    model_alias     VARCHAR(100) NOT NULL,           -- "gpt-4o"
    input_cost_per_m  NUMERIC(10,4) NOT NULL,        -- $/million input tokens
    output_cost_per_m NUMERIC(10,4) NOT NULL,        -- $/million output tokens
    quality_tier    VARCHAR(20) NOT NULL,             -- frontier, standard, economy
    max_context     INTEGER NOT NULL,
    supports_streaming BOOLEAN DEFAULT true,
    supports_tools  BOOLEAN DEFAULT false,
    supports_vision BOOLEAN DEFAULT false,
    is_active       BOOLEAN NOT NULL DEFAULT true,
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(provider, model_id)
);

-- Alert configurations
CREATE TABLE alert_configs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    name            VARCHAR(255) NOT NULL,
    alert_type      VARCHAR(50) NOT NULL,            -- spend_threshold, anomaly, budget_warning
    -- Conditions
    threshold_amount NUMERIC(10,2),                  -- dollar amount trigger
    threshold_pct   NUMERIC(5,2),                    -- percentage above baseline
    scope_tags      JSONB DEFAULT '{}',              -- scope to specific feature/team
    -- Notification
    notify_slack_webhook VARCHAR(512),
    notify_email    VARCHAR(255),
    -- State
    is_active       BOOLEAN NOT NULL DEFAULT true,
    last_fired_at   TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

TimescaleDB — Telemetry Store

-- Raw request events (hypertable — partitioned by time automatically)
CREATE TABLE request_events (
    id              UUID NOT NULL DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL,
    api_key_id      UUID NOT NULL,
    timestamp       TIMESTAMPTZ NOT NULL,
    -- Request
    model_requested VARCHAR(100) NOT NULL,
    model_used      VARCHAR(100) NOT NULL,
    provider        VARCHAR(50) NOT NULL,
    feature_tag     VARCHAR(100),
    team_tag        VARCHAR(100),
    environment_tag VARCHAR(50),
    -- Tokens & cost
    input_tokens    INTEGER NOT NULL,
    output_tokens   INTEGER NOT NULL,
    cost_actual     NUMERIC(12,8) NOT NULL,
    cost_original   NUMERIC(12,8) NOT NULL,
    cost_saved      NUMERIC(12,8) NOT NULL,
    -- Performance
    latency_ms      INTEGER NOT NULL,
    ttfb_ms         INTEGER,
    -- Routing
    complexity_score REAL,
    complexity_level VARCHAR(10),
    routing_reason  VARCHAR(255),
    was_cached      BOOLEAN DEFAULT false,
    was_fallback    BOOLEAN DEFAULT false,
    -- Status
    status_code     SMALLINT NOT NULL,
    error_type      VARCHAR(100)
);

-- Convert to hypertable (TimescaleDB magic)
SELECT create_hypertable('request_events', 'timestamp',
    chunk_time_interval => INTERVAL '1 day'
);

-- Indexes for common query patterns
CREATE INDEX idx_re_org_time ON request_events(org_id, timestamp DESC);
CREATE INDEX idx_re_org_feature ON request_events(org_id, feature_tag, timestamp DESC);
CREATE INDEX idx_re_org_team ON request_events(org_id, team_tag, timestamp DESC);

-- Compression policy: compress chunks older than 7 days (90%+ space savings)
ALTER TABLE request_events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'org_id',
    timescaledb.compress_orderby = 'timestamp DESC'
);
SELECT add_compression_policy('request_events', INTERVAL '7 days');

-- Retention policy: drop raw data older than plan retention (90 days for Pro)
-- Applied per-org via the worker, not a global policy
-- Business tier gets 1 year; continuous aggregates survive raw data deletion

-- Continuous aggregate: hourly rollup
CREATE MATERIALIZED VIEW hourly_cost_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    org_id,
    feature_tag,
    team_tag,
    model_used,
    provider,
    COUNT(*) AS request_count,
    SUM(input_tokens)::BIGINT AS total_input_tokens,
    SUM(output_tokens)::BIGINT AS total_output_tokens,
    SUM(cost_actual) AS total_cost,
    SUM(cost_saved) AS total_saved,
    AVG(latency_ms)::INTEGER AS avg_latency_ms,
    MAX(latency_ms) AS max_latency_ms
FROM request_events
GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider
WITH NO DATA;

-- Refresh policy: keep hourly aggregates up to date
SELECT add_continuous_aggregate_policy('hourly_cost_summary',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);

-- Daily rollup (for long-range dashboard views)
CREATE MATERIALIZED VIEW daily_cost_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', timestamp) AS bucket,
    org_id,
    feature_tag,
    team_tag,
    model_used,
    provider,
    COUNT(*) AS request_count,
    SUM(cost_actual) AS total_cost,
    SUM(cost_saved) AS total_saved
FROM request_events
GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider
WITH NO DATA;

SELECT add_continuous_aggregate_policy('daily_cost_summary',
    start_offset => INTERVAL '3 days',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 day'
);

3.2 Data Flow Diagram

flowchart LR
    subgraph Client
        APP[Application]
    end

    subgraph Proxy["Proxy Engine"]
        AUTH[Auth Check]
        PARSE[Parse Request]
        ROUTE[Router Brain]
        DISPATCH[Provider Dispatch]
        TEL[Telemetry Emitter]
    end

    subgraph Async["Async Pipeline"]
        CHAN[mpsc Channel]
        BATCH[Batch Collector]
    end

    subgraph Storage
        REDIS[(Redis)]
        TSDB[(TimescaleDB)]
        PG[(PostgreSQL)]
    end

    subgraph Aggregation
        HOURLY[Hourly Aggregate]
        DAILY[Daily Aggregate]
    end

    subgraph Consumers
        DASH[Dashboard API]
        DIGEST[Weekly Digest Worker]
        ALERTS[Alert Evaluator]
    end

    APP -->|1. HTTPS request| AUTH
    AUTH -->|key lookup| REDIS
    REDIS -.->|cache miss| PG
    AUTH --> PARSE
    PARSE --> ROUTE
    ROUTE -->|rules from memory| ROUTE
    ROUTE --> DISPATCH
    DISPATCH -->|2. to LLM provider| LLM[OpenAI / Anthropic]
    LLM -->|3. response| DISPATCH
    DISPATCH -->|4. response to client| APP
    DISPATCH --> TEL
    TEL -->|fire & forget| CHAN
    CHAN --> BATCH
    BATCH -->|COPY bulk insert| TSDB
    TSDB --> HOURLY
    TSDB --> DAILY
    HOURLY --> DASH
    DAILY --> DASH
    HOURLY --> DIGEST
    HOURLY --> ALERTS
    ALERTS -->|Slack / Email| EXT[External Notifications]

3.3 Storage Strategy

  • Hot: raw request events (last 7 days). TimescaleDB, uncompressed chunks. Retention: 7 days uncompressed. No compression — fast queries.
  • Warm: raw request events (older than 7 days). TimescaleDB, compressed chunks. Retention: up to 90 days (Pro) / 365 days (Business). TimescaleDB native compression (~90% reduction).
  • Cold: continuous aggregates (hourly/daily). TimescaleDB materialized views. Retention: indefinite (survives raw data deletion). Inherently compact (aggregated).
  • Config: orgs, keys, rules, users. PostgreSQL. Retention: indefinite.
  • Ephemeral: auth sessions, rate limits, cache. Redis. TTL-based (15 min to 24 hr).

Storage estimates at scale:

Scale  Requests/Day  Raw Event Size  Daily Raw Storage  Monthly (compressed)
1K     1,000         ~500 bytes/row  ~0.5 MB            ~1.5 MB
10K    10,000        ~500 bytes/row  ~5 MB              ~15 MB
100K   100,000       ~500 bytes/row  ~50 MB             ~150 MB

At 100K requests/day with 90-day retention: ~350 MB hot (7 uncompressed days at ~50 MB/day) plus ~420 MB warm (83 compressed days at ~90% reduction), under 1 GB total. A db.t4g.small with 20GB gp3 storage handles this trivially. Storage is not a concern at V1 scale.

3.4 Privacy & Data Handling

This is the section that matters most for trust. The proxy sits in the middle of every LLM request. Customers need to know exactly what we see, store, and can access.

What the proxy sees (in memory, during request processing):

| Data | Seen | Stored | Purpose |
|---|---|---|---|
| Full prompt content (system + user messages) | Yes — in memory during routing | No — never persisted | Complexity classification reads the system prompt to detect task patterns |
| Full response content | Yes — streamed through | No — never persisted | Token counting on stream completion |
| Model name (requested + used) | Yes | Yes | Core telemetry |
| Token counts (input + output) | Yes | Yes | Cost calculation |
| Customer's LLM API keys | Yes — decrypted in memory for provider dispatch | Encrypted at rest (AES-256-GCM) | Forwarding requests to providers |
| dd0c API key | Yes — hash compared | Hash only (SHA-256) | Authentication |
| Request tags (feature, team) | Yes | Yes | Attribution |
| IP address | Yes | No | Rate limiting only |
| Latency, status code | Yes | Yes | Performance telemetry |

Critical privacy guarantees:

  1. Prompt content is NEVER stored. Not in the database. Not in logs. Not in error reports. The proxy processes prompts in memory and discards them. This is the #1 trust requirement.
  2. Customer LLM API keys are encrypted at rest using AES-256-GCM with a per-org encryption key derived from AWS KMS. The proxy decrypts them in memory only for the duration of the provider request.
  3. Telemetry contains metadata, not content. We store: "this request used 1,247 input tokens on gpt-4o-mini and cost $0.0002." We do NOT store: "the user asked about quarterly revenue projections for Q3."
  4. No cross-org data leakage. Every query is scoped by org_id. TimescaleDB chunks are segmented by org_id for compression. There is no query path that returns data from multiple orgs.

V1.5 enhancement — client-side classification:

For customers who can't accept prompt content transiting through a third-party proxy (Jordan's VPC requirement), V1.5 ships a lightweight WASM classifier that runs client-side. The proxy receives only the routing hint (complexity: LOW) and the encrypted request body, which it forwards to the provider without inspection. Telemetry still flows to the dashboard, but prompt content never leaves the customer's infrastructure.


Section 4: INFRASTRUCTURE

4.1 AWS Architecture

Single region: us-east-1 (Virginia). Lowest latency to OpenAI and Anthropic API endpoints (both hosted in US East). Multi-region is a V2 concern — the beachhead is US startups.

graph TB
    subgraph Internet
        CLIENT[Client Apps]
        USER[Dashboard Users]
    end

    subgraph AWS["AWS us-east-1"]
        subgraph Edge["Edge Layer"]
            CF[CloudFront CDN<br/>Dashboard UI + API cache]
            ALB[Application Load Balancer<br/>TLS termination, path routing]
        end

        subgraph Compute["ECS Fargate Cluster"]
            SVC_PROXY[Service: dd0c-proxy<br/>2-10 tasks, 0.25 vCPU / 512MB]
            SVC_API[Service: dd0c-api<br/>1-3 tasks, 0.25 vCPU / 512MB]
            SVC_WORKER[Service: dd0c-worker<br/>1 task, 0.25 vCPU / 512MB]
        end

        subgraph Data["Data Layer (Private Subnets)"]
            RDS_PG[RDS PostgreSQL 16<br/>db.t4g.micro, 20GB gp3<br/>Config Store]
            RDS_TS[RDS PostgreSQL 16 + TimescaleDB<br/>db.t4g.small, 50GB gp3<br/>Telemetry Store]
            ELASTICACHE[ElastiCache Redis 7<br/>cache.t4g.micro<br/>Cache + Rate Limits]
        end

        subgraph Security["Security"]
            KMS[KMS<br/>Encryption keys]
            SM[Secrets Manager<br/>DB creds, signing keys]
            WAF[WAF v2<br/>Rate limiting, geo-blocking]
        end

        subgraph Ops["Operations"]
            CW[CloudWatch<br/>Logs + Metrics + Alarms]
            ECR[ECR<br/>Container Registry]
            S3_UI[S3 Bucket<br/>Dashboard static assets]
            SES_SVC[SES<br/>Digest emails]
        end
    end

    CLIENT -->|HTTPS :443| ALB
    USER -->|HTTPS| CF
    CF --> S3_UI
    CF --> ALB
    ALB -->|/v1/*| SVC_PROXY
    ALB -->|/api/*| SVC_API
    SVC_PROXY --> RDS_TS
    SVC_PROXY --> ELASTICACHE
    SVC_PROXY --> KMS
    SVC_API --> RDS_PG
    SVC_API --> RDS_TS
    SVC_API --> ELASTICACHE
    SVC_WORKER --> RDS_TS
    SVC_WORKER --> SES_SVC
    SVC_WORKER --> SM

Network topology:

  • VPC with 2 AZs (cost-conscious — 3 AZs is overkill for V1)
  • Public subnets: ALB only
  • Private subnets: ECS tasks, RDS, ElastiCache
  • NAT Gateway: 1 (not 2 — single NAT saves ~$32/month; acceptable risk for V1)
  • VPC endpoints for ECR, S3, CloudWatch, KMS (avoid NAT charges for AWS service traffic)

ALB routing rules:

| Path Pattern | Target Group | Notes |
|---|---|---|
| /v1/chat/completions | dd0c-proxy | OpenAI-compatible proxy endpoint |
| /v1/completions | dd0c-proxy | Legacy completions |
| /v1/embeddings | dd0c-proxy | Embedding passthrough (no routing — just telemetry) |
| /api/* | dd0c-api | Dashboard REST API |
| /* (default) | 404 fixed response | Reject unknown paths |

Dashboard UI is served from S3 via CloudFront — never hits the ALB.

4.2 Cost Estimate

Real numbers. No hand-waving.

At 1K requests/day (~$129/month infrastructure)

| Service | Spec | Monthly Cost |
|---|---|---|
| ECS Fargate (proxy) | 2 tasks × 0.25 vCPU × 512MB × 730hrs | $14.60 |
| ECS Fargate (api) | 1 task × 0.25 vCPU × 512MB × 730hrs | $7.30 |
| ECS Fargate (worker) | 1 task × 0.25 vCPU × 512MB × 730hrs | $7.30 |
| RDS PostgreSQL | db.t4g.micro, 20GB gp3, single-AZ | $12.41 |
| RDS TimescaleDB | db.t4g.small, 50GB gp3, single-AZ | $24.82 |
| ElastiCache Redis | cache.t4g.micro, single-AZ | $8.35 |
| ALB | 1 ALB + minimal LCUs | $16.20 |
| NAT Gateway | 1 gateway + ~5GB data | $33.48 |
| CloudFront | <1GB transfer | $0.00 (free tier) |
| S3 | <1GB static assets | $0.02 |
| SES | <1000 emails/month | $0.10 |
| KMS | 1 key + ~10K requests | $1.03 |
| CloudWatch | Logs + basic metrics | $3.00 |
| Total | | ~$129/month |

Optimization note: The NAT Gateway at $33/month is the biggest single line item. Alternative: replace with a NAT instance on a t4g.nano ($3/month) or use VPC endpoints aggressively to eliminate NAT for AWS service traffic. With VPC endpoints for ECR/S3/CW/KMS, the only NAT traffic is outbound to LLM providers — which could go through a public subnet proxy task instead. Realistic optimized cost: ~$95/month.

At 10K requests/day (~$155/month infrastructure)

| Change from 1K | Impact |
|---|---|
| Proxy scales to 3-4 tasks | +$15-22 |
| TimescaleDB storage grows to ~15MB/month compressed | Negligible |
| ALB LCU usage increases | +$5 |
| SES volume increases (more digest recipients) | +$1 |
| Total | ~$155/month |

At 100K requests/day (~$320/month infrastructure)

| Change from 10K | Impact |
|---|---|
| Proxy scales to 6-10 tasks | +$45-75 |
| API scales to 2-3 tasks | +$7-15 |
| TimescaleDB upgrade to db.t4g.medium (more IOPS) | +$25 |
| ElastiCache upgrade to cache.t4g.small | +$8 |
| ALB LCU usage | +$15 |
| NAT data transfer (~50GB/month) | +$25 |
| Total | ~$320/month |

Gross margin at each scale:

| Scale | Requests/Day | Est. Customers | Est. MRR | Infra Cost | Gross Margin |
|---|---|---|---|---|---|
| 1K | 1,000 | 5-10 | $375-750 | $129 | 66-83% |
| 10K | 10,000 | 50-100 | $3,750-7,500 | $155 | 96-98% |
| 100K | 100,000 | 200-500 | $15,000-37,500 | $320 | 98-99% |

The unit economics are absurd. Near-zero marginal cost per customer. This is the beauty of a proxy — it adds almost no compute to the request path.
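The margin column above is just (MRR − infra cost) / MRR; for example, at the 1K tier:

```rust
// Gross margin = (MRR - infra cost) / MRR, matching the table above.
fn gross_margin_pct(mrr: f64, infra: f64) -> f64 {
    (mrr - infra) / mrr * 100.0
}

fn main() {
    // 1K req/day tier: $375-750 MRR against ~$129/month infrastructure
    println!("{:.0}%", gross_margin_pct(375.0, 129.0)); // low end: 66%
    println!("{:.0}%", gross_margin_pct(750.0, 129.0)); // high end: 83%
}
```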

4.3 Scaling Strategy

Proxy horizontal scaling (the only thing that needs to scale):

ECS Service Auto Scaling with two policies:

  1. Target tracking: CPU utilization target 60%. Scale out when sustained above 60%, scale in when below 40%.
  2. Step scaling: Request count per target (from ALB). Scale out aggressively at >500 req/min/task.

Min tasks: 2 (availability). Max tasks: 20 (cost cap — revisit at $10K MRR).

Database scaling:

TimescaleDB is the bottleneck candidate. Scaling path:

  1. V1 (1K-10K req/day): db.t4g.small, single-AZ. Continuous aggregates handle dashboard query load.
  2. V1.5 (10K-100K req/day): db.t4g.medium, add a read replica for dashboard API queries. Proxy writes to primary, API reads from replica.
  3. V2 (100K+ req/day): If TimescaleDB hits limits, evaluate:
    • Upgrade to db.r6g.large (more memory for hot data)
    • Or migrate telemetry to ClickHouse (better for high-cardinality analytics at scale)
    • Decision point: when continuous aggregate refresh takes >5 minutes

PostgreSQL (config store) stays on db.t4g.micro indefinitely. Config data is tiny.

Redis scaling:

cache.t4g.micro handles ~12K ops/sec. At 100K requests/day (~1.2 req/sec average, ~10 req/sec peak), Redis is at <0.1% capacity. Redis is not a scaling concern until 1M+ requests/day.

4.4 CI/CD Pipeline

GitHub Actions. No Jenkins. No CodePipeline. Keep it simple.

# .github/workflows/deploy.yml (simplified)
name: Build & Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo test --workspace
      - run: cargo clippy --workspace -- -D warnings
      - run: cargo fmt --check

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - uses: aws-actions/amazon-ecr-login@v2
      - run: |
          docker build -t dd0c-proxy -f crates/proxy/Dockerfile .
          docker build -t dd0c-api -f crates/api/Dockerfile .
          docker build -t dd0c-worker -f crates/worker/Dockerfile .
          # Tag and push to ECR. The `latest` tag is moved as well so that
          # `--force-new-deployment` in the deploy job picks up the new image;
          # the SHA tag exists for pinned rollbacks.
          for svc in proxy api worker; do
            docker tag dd0c-$svc $ECR_REGISTRY/dd0c-$svc:$GITHUB_SHA
            docker tag dd0c-$svc $ECR_REGISTRY/dd0c-$svc:latest
            docker push $ECR_REGISTRY/dd0c-$svc:$GITHUB_SHA
            docker push $ECR_REGISTRY/dd0c-$svc:latest
          done

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - run: |
          # --force-new-deployment redeploys the image tag the task definition
          # already references; for exact rollbacks, register a task-definition
          # revision pinned to a specific SHA tag instead.
          for svc in proxy api worker; do
            aws ecs update-service \
              --cluster dd0c-prod \
              --service dd0c-$svc \
              --force-new-deployment
          done

  deploy-ui:
    needs: test
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ui
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci && npm run build
      - run: |
          aws s3 sync dist/ s3://dd0c-dashboard-ui/ --delete
          aws cloudfront create-invalidation --distribution-id $CF_DIST_ID --paths "/*"

Deployment strategy: Rolling update via ECS (default). No blue/green for V1 — adds complexity. The proxy is stateless; rolling updates cause zero downtime. If a bad deploy ships, aws ecs update-service --force-new-deployment with the previous image SHA rolls back in <2 minutes.

Database migrations: sqlx migrate run executed as a pre-deploy step in the API container's entrypoint. Migrations are forward-only, backward-compatible (add columns, don't rename/drop). This means the old code can run against the new schema during rolling deploys.

4.5 Monitoring & Alerting

Eat your own dog food: dd0c/route monitors itself. If any future dd0c features make LLM calls, they route through the same routing engine customers use, so the platform's own provider traffic shows up in its own dashboard.

CloudWatch metrics (custom + built-in):

| Metric | Source | Alarm Threshold |
|---|---|---|
| dd0c.proxy.request_count | Proxy (StatsD → CW) | N/A (dashboard only) |
| dd0c.proxy.latency_p99 | Proxy | >50ms for 5 minutes |
| dd0c.proxy.error_rate | Proxy | >5% for 3 minutes |
| dd0c.proxy.provider_error_rate | Proxy (per provider) | >10% for 2 minutes |
| dd0c.proxy.circuit_breaker_open | Proxy | Any open → alert |
| dd0c.telemetry.batch_lag | Proxy | >1000 events queued |
| ECS CPU/Memory | CloudWatch built-in | CPU >80% sustained 5min |
| RDS CPU/Connections/IOPS | CloudWatch built-in | CPU >70%, connections >80% of max |
| ALB 5xx rate | CloudWatch built-in | >1% for 3 minutes |
| ALB target response time | CloudWatch built-in | p99 >200ms for 5 minutes |

Alerting channels:

| Severity | Channel | Response |
|---|---|---|
| P0 (proxy down, >5% error rate) | PagerDuty → phone call | Wake up Brian |
| P1 (high latency, circuit breaker, DB issues) | Slack #dd0c-alerts | Check within 1 hour |
| P2 (capacity warnings, cost anomalies) | Email digest | Review next morning |

Structured logging:

All services emit JSON logs to CloudWatch Logs:

{
  "timestamp": "2026-03-15T14:22:33.456Z",
  "level": "info",
  "service": "proxy",
  "trace_id": "abc123",
  "org_id": "org_456",
  "event": "request_routed",
  "model_requested": "gpt-4o",
  "model_used": "gpt-4o-mini",
  "latency_ms": 3,
  "cost_saved": 0.0018
}

No prompt content in logs. Ever. The tracing crate with custom Layer implementation strips any field named prompt, messages, content, or system before emission. Defense in depth — even if a developer accidentally logs request content, the layer redacts it.
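A minimal sketch of the redaction idea, using a toy value type in place of real `tracing` fields (the sensitive-field list mirrors the one above; the actual Layer implementation against the `tracing` crate differs):

```rust
use std::collections::BTreeMap;

// Minimal stand-in for a structured log value; the real code would walk
// `tracing` span/event fields (or a serde_json::Value) instead.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

// Field names that must never reach CloudWatch, per the policy above.
const SENSITIVE: &[&str] = &["prompt", "messages", "content", "system"];

// Recursively replace sensitive fields before the log line is emitted.
// Defense in depth: runs even on fields a developer logs by accident.
fn redact(v: Value) -> Value {
    match v {
        Value::Map(m) => Value::Map(
            m.into_iter()
                .map(|(k, val)| {
                    if SENSITIVE.contains(&k.as_str()) {
                        (k, Value::Str("[REDACTED]".into()))
                    } else {
                        (k, redact(val)) // recurse into nested maps
                    }
                })
                .collect(),
        ),
        other => other,
    }
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("event".to_string(), Value::Str("request_routed".into()));
    fields.insert("prompt".to_string(), Value::Str("quarterly revenue...".into()));
    println!("{:?}", redact(Value::Map(fields)));
}
```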

Uptime monitoring: External health check via UptimeRobot (free tier, 5-minute intervals) hitting GET /health on the ALB. If the proxy is unreachable from the internet, Brian gets a text.

Solo founder operational reality:

Brian can realistically monitor:

  • 1 Slack channel (#dd0c-alerts) — glance at it 3x/day
  • 1 PagerDuty rotation — himself, 24/7 (this is the solo founder life)
  • 1 CloudWatch dashboard — check it during weekly review
  • UptimeRobot — set it and forget it

Everything else must be automated. No manual log tailing. No daily metric reviews. Alerts fire when something is wrong. Silence means everything is fine.


Section 5: SECURITY

5.1 API Key Management — The Trust Problem

This is the #1 adoption barrier. Customers must give dd0c/route their OpenAI/Anthropic API keys so the proxy can forward requests. If they don't trust us with their keys, the product is dead.

How customer LLM API keys are handled:

Customer enters API key in dashboard
    │
    ├─ 1. Key transmitted over TLS 1.3 (HTTPS only, HSTS enforced)
    ├─ 2. API server receives key in memory
    ├─ 3. Key encrypted with AES-256-GCM using org-specific DEK
    │     DEK (Data Encryption Key) is itself encrypted by AWS KMS CMK
    │     Envelope encryption: KMS never sees the API key
    ├─ 4. Encrypted key stored in PostgreSQL (provider_credentials.encrypted_key)
    ├─ 5. Plaintext key zeroed from memory (Rust: zeroize crate)
    │
    └─ At request time:
         ├─ Proxy fetches encrypted key from PG (cached in Redis, encrypted, 5min TTL)
         ├─ Decrypts with DEK (DEK cached in proxy memory, rotated hourly)
         ├─ Uses plaintext key for provider API call
         └─ Plaintext key held only for request duration, then dropped
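To illustrate only the envelope structure, the following toy substitutes XOR for AES-256-GCM and fakes the KMS calls. Nothing here is real cryptography; it exists to show why KMS never sees the customer's API key:

```rust
// TOY ONLY: XOR stands in for AES-256-GCM, and kms_wrap/kms_unwrap stand
// in for AWS KMS Encrypt/Decrypt. Never use XOR for real encryption.
fn xor_cipher(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

// "KMS" wraps the DEK under a master key that never leaves KMS.
fn kms_wrap(dek: &[u8], master: &[u8]) -> Vec<u8> {
    xor_cipher(dek, master)
}
fn kms_unwrap(wrapped: &[u8], master: &[u8]) -> Vec<u8> {
    xor_cipher(wrapped, master)
}

fn main() {
    let master = b"kms-master-key";           // exists only inside KMS
    let dek = b"per-org-data-encryption-key"; // generated per org
    let api_key = b"sk-proj-abc123";          // customer's provider key

    // At rest: ciphertext + wrapped DEK are stored. KMS only ever
    // handles the DEK, never the API key itself.
    let ciphertext = xor_cipher(api_key, dek);
    let wrapped_dek = kms_wrap(dek, master);

    // At request time: unwrap the DEK, decrypt, use, then drop
    // (the real code zeroizes the plaintext buffer).
    let dek_again = kms_unwrap(&wrapped_dek, master);
    let plaintext = xor_cipher(&ciphertext, &dek_again);
    assert_eq!(plaintext, api_key);
    println!("envelope roundtrip ok");
}
```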

Key security properties:

| Property | Implementation |
|---|---|
| Encryption at rest | AES-256-GCM, envelope encryption via AWS KMS |
| Encryption in transit | TLS 1.3 (ALB terminates, internal traffic in VPC) |
| Key isolation | Per-org DEK — compromising one org's DEK doesn't expose others |
| Key rotation | KMS CMK auto-rotates annually. DEKs can be rotated per-org on demand. |
| Access logging | Every KMS Decrypt call logged in CloudTrail. Anomalous decryption patterns trigger alerts. |
| Zero-knowledge option (V1.5) | Customer runs proxy in their VPC. Keys never leave their infrastructure. dd0c SaaS only receives telemetry. |
| Key revocation | Customer can delete their provider credentials from the dashboard instantly. Cached copies expire within 5 minutes (Redis TTL). |

Trust mitigation strategy (layered):

  1. Transparency: Open-source the proxy core. Customers can read every line of code that touches their API keys. "Don't trust us — read the code."
  2. Minimization: The proxy only needs the key for the duration of the API call. It doesn't store it in logs, doesn't include it in telemetry, doesn't transmit it anywhere except to the LLM provider.
  3. Bring-your-own-proxy (V1.5): For customers who won't send keys to a third party, ship a Docker image they run in their VPC. The proxy connects outbound to dd0c SaaS for config and sends telemetry. Keys never leave the customer's network.
  4. Audit trail: Every API key usage is logged (not the key itself — the key_id and timestamp). Customers can see when their keys were last used in the dashboard.
  5. Insurance: If a key is compromised through dd0c, we'll cover the cost of any unauthorized API usage. (This is a marketing commitment, not a legal one — but it signals confidence.)

5.2 Authentication & Authorization Model

Three auth contexts:

| Context | Method | Token Type | Lifetime |
|---|---|---|---|
| Dashboard UI | GitHub OAuth → JWT | Access token (short) + Refresh token | 15min / 7 days |
| Proxy API | dd0c API key | Bearer token (hashed, never expires unless revoked) | Until revoked |
| Dashboard API (programmatic) | dd0c API key | Same as proxy | Until revoked |

GitHub OAuth flow:

Browser → /api/auth/github → redirect to GitHub
GitHub → /api/auth/callback?code=xxx
    │
    ├─ Exchange code for GitHub access token
    ├─ Fetch GitHub user profile (id, login, email, avatar)
    ├─ Upsert user in PostgreSQL
    ├─ Issue JWT access token (15min, signed with RS256)
    ├─ Issue refresh token (7 days, stored in Redis, httpOnly cookie)
    └─ Redirect to dashboard with access token

Authorization model (V1 — simple RBAC):

| Role | Permissions |
|---|---|
| Owner | Everything. Billing. Delete org. Manage members. |
| Admin | Manage routing rules, API keys, alerts. View all data. Cannot delete org or manage billing. |
| Member | View dashboard, view request inspector. Cannot modify config. |

V1 ships with Owner + Member only. Admin role added when the first customer asks for it.

API key format:

dd0c_sk_live_a3f2b8c9d4e5f6a7b8c9d4e5f6a7b8c9

Prefix: dd0c_sk_
Environment: live_ or test_
Random: 32 hex chars (128 bits of entropy)

The full key is shown once at creation. Only the SHA-256 hash is stored. The prefix (dd0c_sk_live_a3f2...) is stored for display in the dashboard ("Which key is this?").
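A sketch of how the proxy might parse and display keys in this format (function names are illustrative; the stored-hash lookup itself is omitted):

```rust
// Sketch of parsing/display for the dd0c key format described above.
#[derive(Debug, PartialEq)]
enum Env {
    Live,
    Test,
}

// Validate the structure: dd0c_sk_ + live_/test_ + 32 hex chars.
fn parse_key(key: &str) -> Option<(Env, &str)> {
    let rest = key.strip_prefix("dd0c_sk_")?;
    let (env, random) = if let Some(r) = rest.strip_prefix("live_") {
        (Env::Live, r)
    } else if let Some(r) = rest.strip_prefix("test_") {
        (Env::Test, r)
    } else {
        return None;
    };
    // 32 hex chars = 128 bits of entropy
    if random.len() == 32 && random.chars().all(|c| c.is_ascii_hexdigit()) {
        Some((env, random))
    } else {
        None
    }
}

// Short prefix stored for display in the dashboard ("Which key is this?").
fn display_prefix(key: &str) -> String {
    format!("{}...", &key[..key.len().min(17)])
}

fn main() {
    let key = "dd0c_sk_live_a3f2b8c9d4e5f6a7b8c9d4e5f6a7b8c9";
    println!("{:?} -> {}", parse_key(key), display_prefix(key));
}
```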

5.3 Data Encryption

| Layer | Method | Key Management |
|---|---|---|
| In transit (client → ALB) | TLS 1.3 via ACM certificate | AWS Certificate Manager auto-renewal |
| In transit (ALB → ECS) | TLS 1.2+ (ALB → target group HTTPS) | Self-signed certs in containers, rotated on deploy |
| In transit (ECS → RDS) | TLS 1.2 (RDS require_ssl) | RDS CA certificate |
| In transit (ECS → ElastiCache) | TLS 1.2 (in-transit encryption enabled) | ElastiCache managed |
| At rest (RDS) | AES-256 via RDS encryption | AWS KMS (RDS default key) |
| At rest (provider API keys) | AES-256-GCM application-level | AWS KMS CMK (dd0c-managed) |
| At rest (S3) | AES-256 (SSE-S3) | AWS managed |
| At rest (CloudWatch Logs) | AES-256 | AWS KMS (CW default key) |

5.4 SOC 2 Readiness

SOC 2 Type II is a V3 milestone (month 7-12). But V1 architecture decisions should not create SOC 2 blockers.

V1 decisions that are SOC 2 forward-compatible:

| SOC 2 Requirement | V1 Implementation |
|---|---|
| Access control | GitHub OAuth + RBAC. No shared accounts. |
| Audit logging | CloudTrail for AWS API calls. Application-level audit log for config changes (who changed what routing rule, when). |
| Encryption | All data encrypted in transit and at rest (see 5.3). |
| Change management | GitHub PRs required for main branch. CI/CD pipeline enforces tests. |
| Incident response | PagerDuty alerting. Documented runbook (even if it's just a README). |
| Vendor management | Only AWS + GitHub + Stripe as vendors. All SOC 2 certified themselves. |
| Data retention | Configurable per plan. Deletion is automated via TimescaleDB retention policies. |
| Availability | Multi-AZ ALB. ECS tasks across 2 AZs. RDS single-AZ (upgrade to multi-AZ for SOC 2). |

SOC 2 blockers to address before certification:

  1. RDS must be multi-AZ (adds ~$25/month per instance)
  2. Formal security policy documentation
  3. Background checks for employees (just Brian for now — easy)
  4. Penetration test (budget ~$5K)
  5. Auditor engagement (~$20-30K for Type II)

Total SOC 2 cost: ~$30-40K. Only pursue at $10K+ MRR when enterprise customers demand it.

5.5 Trust Barrier Mitigation (The #1 Risk)

The product brief identifies trust as the highest-severity risk. Here's the technical architecture's answer:

Phase 1 (V1 launch): Transparency + Beachhead

  • Open-source the proxy core on GitHub. MIT license.
  • Publish a security whitepaper: "How dd0c/route handles your API keys" — detailed, technical, honest.
  • Target startups without compliance teams. They evaluate tools by reading code, not requesting SOC 2 reports.
  • Shadow Audit mode proves value without requiring key trust. Convert skeptics with their own savings data.

Phase 2 (V1.5, month 4-5): Self-Hosted Data Plane

  • Ship dd0c-proxy as a Docker image customers run in their own VPC/infrastructure.
  • The proxy connects outbound to api.route.dd0c.dev for:
    • Routing rule configuration (pull)
    • Telemetry data (push — metadata only, no prompt content)
    • Cost table updates (pull)
  • Customer's LLM API keys stay in their infrastructure. Period.
  • dd0c SaaS provides the dashboard, digest, and analytics. The proxy is the customer's.

Phase 3 (V2+): Compliance Certifications

  • SOC 2 Type II
  • GDPR DPA (Data Processing Agreement)
  • Optional: HIPAA BAA for healthcare vertical

The architecture is designed so that Phase 2 is a deployment topology change, not a rewrite. The proxy binary is the same — it just reads config from a different source (local file vs. API) and sends telemetry to a different endpoint (local collector vs. SaaS).


Section 6: MVP SCOPE

6.1 What Ships in V1 (4-6 Week Build)

The V1 is ruthlessly scoped. Every feature must answer: "Does this help a customer save money on LLM calls within 5 minutes of signup?"

Week 1-2: Proxy Core

| Deliverable | Details | Done When |
|---|---|---|
| OpenAI-compatible proxy | POST /v1/chat/completions with streaming support | A client can swap api.openai.com for proxy.route.dd0c.dev and get identical responses |
| Auth layer | dd0c API key validation (Redis-cached hash lookup) | Unauthorized requests get 401. Valid keys route correctly. |
| Provider dispatch | OpenAI + Anthropic providers with connection pooling | Requests forward to the correct provider with <5ms overhead |
| Telemetry emission | Async batch insert to TimescaleDB | Every request produces a request_event row within 2 seconds |
| Health endpoint | GET /health returns 200 with version + uptime | ALB health checks pass |

Week 2-3: Router Brain + Cost Engine

| Deliverable | Details | Done When |
|---|---|---|
| Heuristic complexity classifier | Token count + task pattern + model hint → LOW/MEDIUM/HIGH | Classifier runs in <2ms and agrees with human judgment ~75% of the time on a test set of 100 prompts |
| Rule engine | First-match rule evaluation with passthrough/cheapest/cascading strategies | A routing rule like "if feature=classify, use cheapest from [gpt-4o-mini, claude-haiku]" works |
| Cost tables | Seeded with current OpenAI + Anthropic pricing | model_costs table populated, proxy loads into memory |
| Fallback chains | Circuit breaker per provider/model | If gpt-4o-mini returns 5xx, request automatically retries on claude-haiku |
| Response headers | X-DD0C-Model, X-DD0C-Cost, X-DD0C-Saved on every response | Client can programmatically read routing decisions |
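The heuristic classifier deliverable above (token count + task pattern → LOW/MEDIUM/HIGH) could look roughly like this. The cue lists, token heuristic, and thresholds are illustrative, not the shipped values:

```rust
// Illustrative complexity tiers, matching the LOW/MEDIUM/HIGH routing hint.
#[derive(Debug, PartialEq)]
enum Complexity {
    Low,
    Medium,
    High,
}

// Rough token estimate: ~4 characters per token for English text.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

// Heuristic: cheap keyword cues plus prompt size. Runs in microseconds,
// which is how the <2ms budget is met without an ML model.
fn classify(system_prompt: &str, user_prompt: &str) -> Complexity {
    let text = format!("{} {}", system_prompt, user_prompt).to_lowercase();
    let tokens = estimate_tokens(&text);

    let simple_cues = ["classify", "extract", "translate", "categorize"];
    let complex_cues = ["reason", "analyze", "multi-step", "write code"];

    if complex_cues.iter().any(|c| text.contains(c)) || tokens > 2000 {
        Complexity::High
    } else if simple_cues.iter().any(|c| text.contains(c)) && tokens < 500 {
        Complexity::Low
    } else {
        Complexity::Medium
    }
}

fn main() {
    let tier = classify("You are a support bot.", "Classify this ticket: refund request");
    println!("{:?}", tier);
}
```

The obvious weakness of substring cues (e.g. "improve" containing "prove") is why the table only targets ~75% agreement with human judgment; the ML classifier that replaces this is explicitly deferred to V2.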

Week 3-4: Dashboard API + UI

| Deliverable | Details | Done When |
|---|---|---|
| GitHub OAuth | Sign up / sign in with GitHub | New user can create an org and get an API key in <60 seconds |
| Cost overview page | Real-time cost ticker, 7/30-day spend chart, savings counter | Marcus sees "You saved $X this week" on the dashboard |
| Cost treemap | Spend breakdown by feature tag, team tag, model | Marcus can identify which feature is the most expensive |
| Request inspector | Paginated table of recent requests with model, cost, routing decision | Marcus can drill into individual requests to understand routing |
| Routing config UI | CRUD for routing rules with drag-to-reorder priority | Marcus can create a rule "route all classify requests to gpt-4o-mini" |
| API key management | Create/revoke dd0c API keys, add provider credentials | Marcus can set up his org without touching a CLI |

Week 4-5: Retention Mechanics

| Deliverable | Details | Done When |
|---|---|---|
| Weekly savings digest | Monday 9am email: "Last week you saved $X. Breakdown by feature/model." | Email renders correctly in Gmail/Outlook. Unsubscribe works. |
| Budget alerts | Threshold-based: "Alert me when daily spend exceeds $100" | Slack webhook fires when threshold is crossed |
| Shadow Audit CLI | npx dd0c-scan ./src scans codebase for LLM calls and estimates savings | CLI runs on a sample Node.js project and produces a plausible savings report |

Week 5-6: Hardening + Launch Prep

| Deliverable | Details | Done When |
|---|---|---|
| Rate limiting | Per-key rate limits (1000 req/min default) via Redis | Burst traffic doesn't take down the proxy |
| Error handling | Graceful degradation: if TimescaleDB is down, proxy still routes (telemetry dropped) | Proxy availability is independent of analytics availability |
| Monitoring | CloudWatch dashboards, PagerDuty alerts for P0/P1 | Brian gets woken up if the proxy is down |
| Documentation | API docs (OpenAPI spec), quickstart guide, "How we handle your keys" page | A developer can integrate in <5 minutes by reading the docs |
| Landing page | route.dd0c.dev — value prop, pricing, "Try the CLI" CTA | Visitors understand what dd0c/route does in 10 seconds |
| Infrastructure | CDK/Terraform for the full AWS stack, CI/CD pipeline | git push main deploys to production |
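The per-key rate limiting above can be sketched as a sliding window. This in-process version is illustrative; production keeps the window in Redis so the limit is shared across all proxy tasks:

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

// Sliding-window limiter: allow at most `max_requests` per key per `window`.
struct RateLimiter {
    window: Duration,
    max_requests: usize,
    hits: HashMap<String, VecDeque<Instant>>,
}

impl RateLimiter {
    fn new(window: Duration, max_requests: usize) -> Self {
        Self { window, max_requests, hits: HashMap::new() }
    }

    // Returns true if the request is allowed; false maps to 429 DD0C_RATE_001.
    fn check(&mut self, key: &str, now: Instant) -> bool {
        let q = self.hits.entry(key.to_string()).or_default();
        // Evict timestamps that have aged out of the window.
        while q.front().map_or(false, |t| now.duration_since(*t) >= self.window) {
            q.pop_front();
        }
        if q.len() < self.max_requests {
            q.push_back(now);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Default from the table above: 1000 req/min per key.
    let mut limiter = RateLimiter::new(Duration::from_secs(60), 1000);
    let allowed = limiter.check("dd0c_sk_live_a3f2", Instant::now());
    println!("allowed={}", allowed);
}
```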

6.2 What's Explicitly Deferred to V2

| Feature | Why Deferred | V2 Timeline |
|---|---|---|
| ML-based complexity classifier | Needs training data from V1 telemetry. Heuristic is good enough to prove the value prop. | Month 3-4 |
| Google/Gemini provider | Two providers cover 80%+ of the market. Adding Gemini is a weekend of work once the provider trait is proven. | Month 2-3 |
| Self-hosted proxy (BYOP) | Critical for enterprise trust, but V1 targets startups who are less paranoid. | Month 4-5 |
| WASM client-side classifier | Requires the self-hosted proxy architecture. | Month 5-6 |
| GitHub Action (PR cost comments) | Cool PLG feature but not core. Needs the CLI to be stable first. | Month 3-4 |
| VS Code extension | Same — derivative of the CLI. | Month 4-5 |
| Log ingestion (Mode B shadow audit) | Requires building a log parser for multiple formats. CLI scan is simpler and ships first. | Month 2-3 |
| Multi-region deployment | us-east-1 covers the beachhead. EU region when EU customers appear. | Month 6+ |
| SSO / SAML | Enterprise feature. GitHub OAuth is fine for startups. | Month 6+ (with SOC 2) |
| Prompt caching (semantic dedup) | Technically complex (embedding similarity). Exact-match cache in Redis is V1. Semantic cache is V2. | Month 4-5 |
| Carbon tracking | Interesting differentiator but not a V1 priority. | Month 6+ |
| Cascading try-cheap-first with quality feedback | Needs the ML classifier to evaluate response quality. V1 cascading is based on error codes only. | Month 4-5 |
| Stripe billing integration | V1 is free tier only (up to 10K requests/day). Billing ships when there are paying customers. | Month 2-3 |
| Team/seat management | V1 orgs have one owner. Multi-user orgs are a V1.5 feature. | Month 2-3 |

6.3 Technical Debt Budget

V1 will accumulate debt. That's fine. Here's what we're consciously accepting:

| Debt Item | Severity | Why It's Acceptable | Payoff Trigger |
|---|---|---|---|
| Single-AZ RDS instances | Medium | Saves ~$50/month. Acceptable downtime risk for <100 customers. | First enterprise customer or SOC 2 prep |
| No database connection pooling (PgBouncer) | Low | Direct connections are fine at <50 concurrent proxy tasks. | >50 proxy tasks or connection count warnings |
| Hardcoded cost tables (seeded, not auto-updated) | Low | Model pricing changes monthly. Manual DB update is fine at V1 scale. | When Brian forgets to update and a customer notices |
| No request body validation beyond auth | Medium | The proxy trusts that the client sends valid OpenAI-format requests. Invalid requests get a provider error, not a dd0c error. | When support tickets about confusing errors pile up |
| No end-to-end encryption tests | Medium | Unit tests + integration tests cover the critical paths. E2E is expensive to maintain for a solo founder. | First hire or first security incident |
| Monolithic continuous aggregate | Low | One hourly aggregate serves all dashboard queries. May need feature-specific aggregates at scale. | Dashboard queries exceed 500ms |
| No graceful shutdown / drain | Medium | ECS rolling update kills tasks. In-flight requests may fail. At low traffic, this is rare. | When a customer reports a failed request during deploy |
Total acceptable debt: ~2 weeks of cleanup work. Schedule a "debt sprint" at month 3 (after V1 launch stabilizes).

6.4 Solo Founder Operational Considerations

Brian is one person. The architecture must respect that constraint.

What one person can realistically operate:

| Responsibility | Time Budget | Automation |
|---|---|---|
| Incident response | <2 hrs/week (target: 0) | PagerDuty + automated restarts (ECS health checks) |
| Deploys | 1 deploy/day, <5 min each | Fully automated CI/CD. git push = deploy. |
| Database maintenance | <1 hr/week | RDS automated backups, TimescaleDB automated compression/retention |
| Cost monitoring | 15 min/week | AWS Budgets alert at $150, $200, $300 thresholds |
| Customer support | 2-4 hrs/week (at <100 customers) | GitHub Issues + email. No live chat. No phone. |
| Security patches | 1 hr/week | Dependabot for Rust crates + npm. Automated PR creation. |
| Feature development | 20-30 hrs/week | Everything else is automated so Brian can code |

Things Brian should NOT do manually:

  • SSH into servers (there are no servers — Fargate)
  • Run database queries to answer customer questions (build it into the dashboard)
  • Manually rotate secrets (KMS auto-rotation + Secrets Manager)
  • Monitor logs in real-time (alerts handle this)
  • Manually scale infrastructure (auto-scaling handles this)
  • Process refunds or billing changes (Stripe self-serve portal)

On-call reality: Brian is on-call 24/7. The architecture minimizes pages by:

  1. Making the proxy stateless and self-healing (ECS restarts failed tasks)
  2. Making telemetry failure non-fatal (proxy works without TimescaleDB)
  3. Using circuit breakers to handle provider outages automatically
  4. Setting alert thresholds high enough to avoid noise, low enough to catch real problems

If Brian gets paged more than twice a week, something is architecturally wrong and needs fixing — not more monitoring.


Section 7: API DESIGN

7.1 OpenAI-Compatible Proxy Endpoint

The proxy endpoint is a drop-in replacement for api.openai.com. Customers change one environment variable and everything works.

# Before
OPENAI_API_BASE=https://api.openai.com/v1

# After
OPENAI_API_BASE=https://proxy.route.dd0c.dev/v1

Supported endpoints (V1):

POST /v1/chat/completions

The primary endpoint. Handles both streaming and non-streaming requests.

Request (identical to OpenAI):

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify this support ticket: ..."}
  ],
  "temperature": 0.3,
  "max_tokens": 100,
  "stream": true
}

dd0c-specific request headers (all optional):

| Header | Type | Description |
|---|---|---|
| Authorization | Bearer dd0c_sk_live_... | Required. dd0c API key. |
| X-DD0C-Feature | string | Tag this request with a feature name for cost attribution. E.g., classify, summarize, chat. |
| X-DD0C-Team | string | Tag with team name. E.g., backend, ml-team, support. |
| X-DD0C-Environment | string | production, staging, development. Defaults to key's environment. |
| X-DD0C-Routing | auto \| passthrough | Override routing. passthrough = use the requested model, no routing. Default: auto. |
| X-DD0C-Budget-Id | string | Associate with a specific budget for limit enforcement. |

Response (identical to OpenAI, plus dd0c headers):

HTTP/1.1 200 OK
Content-Type: application/json
X-DD0C-Request-Id: req_a1b2c3d4e5f6
X-DD0C-Model-Requested: gpt-4o
X-DD0C-Model-Used: gpt-4o-mini
X-DD0C-Provider: openai
X-DD0C-Cost: 0.000150
X-DD0C-Cost-Without-Routing: 0.002500
X-DD0C-Saved: 0.002350
X-DD0C-Complexity: LOW
X-DD0C-Complexity-Confidence: 0.92
X-DD0C-Latency-Overhead-Ms: 3

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is a billing inquiry."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 8,
    "total_tokens": 50
  }
}

The response body is untouched — it's exactly what the LLM provider returned. dd0c metadata lives exclusively in response headers. This means existing client code that parses the response body works without modification.
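The cost headers are derived from per-million-token prices like those in the /v1/models payload. A sketch of the arithmetic, with illustrative token counts (not the ones behind the example header values above):

```rust
// Prices are per million tokens, as in the /v1/models `dd0c` block.
struct ModelCost {
    input_per_m: f64,
    output_per_m: f64,
}

// cost = prompt_tokens * input_price/1M + completion_tokens * output_price/1M
fn request_cost(c: &ModelCost, prompt_tokens: u64, completion_tokens: u64) -> f64 {
    prompt_tokens as f64 * c.input_per_m / 1e6
        + completion_tokens as f64 * c.output_per_m / 1e6
}

fn main() {
    let gpt_4o = ModelCost { input_per_m: 2.50, output_per_m: 10.00 };
    let gpt_4o_mini = ModelCost { input_per_m: 0.15, output_per_m: 0.60 };

    // A routed request: 42 prompt tokens, 8 completion tokens on gpt-4o-mini.
    let cost = request_cost(&gpt_4o_mini, 42, 8);
    // X-DD0C-Cost-Without-Routing: same tokens priced at the requested model.
    let without = request_cost(&gpt_4o, 42, 8);
    println!(
        "cost={:.6} without_routing={:.6} saved={:.6}",
        cost,
        without,
        without - cost
    );
}
```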

Streaming response:

SSE stream is passed through transparently. dd0c headers are on the initial HTTP response. The final data: [DONE] chunk is forwarded as-is.

POST /v1/completions

Legacy completions endpoint. Same routing logic applies. Included for backward compatibility with older OpenAI SDK versions.

POST /v1/embeddings

Passthrough only — no routing (embedding models aren't interchangeable like chat models). Telemetry is still captured for cost attribution.

GET /v1/models

Returns the union of models available across all configured providers for this org, enriched with dd0c cost data:

```json
{
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai",
      "dd0c": {
        "input_cost_per_m": 2.50,
        "output_cost_per_m": 10.00,
        "quality_tier": "frontier",
        "routing_eligible": true
      }
    },
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "openai",
      "dd0c": {
        "input_cost_per_m": 0.15,
        "output_cost_per_m": 0.60,
        "quality_tier": "economy",
        "routing_eligible": true
      }
    }
  ]
}
```

GET /health

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 86400,
  "providers": {
    "openai": {"status": "healthy", "latency_ms": 45},
    "anthropic": {"status": "healthy", "latency_ms": 52}
  }
}
```

Error responses:

dd0c errors use standard OpenAI error format so client SDKs handle them correctly:

```json
{
  "error": {
    "message": "Invalid dd0c API key",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "dd0c_code": "DD0C_AUTH_001"
  }
}
```

| HTTP Status | dd0c_code | Meaning |
|---|---|---|
| 401 | `DD0C_AUTH_001` | Invalid or revoked API key |
| 403 | `DD0C_AUTH_002` | API key doesn't have permission for this org |
| 429 | `DD0C_RATE_001` | dd0c rate limit exceeded (not a provider rate limit) |
| 429 | `DD0C_BUDGET_001` | Budget limit reached for this key/feature/team |
| 502 | `DD0C_PROVIDER_001` | All providers in the fallback chain returned errors |
| 503 | `DD0C_PROXY_001` | Proxy is overloaded or shutting down |

Provider errors (OpenAI 429, Anthropic 529, etc.) are passed through with original status codes and bodies, plus an X-DD0C-Provider-Error: true header so clients can distinguish dd0c errors from provider errors.
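A client-side sketch of that distinction (the helper name is hypothetical; it assumes the response body has already been decoded to JSON):

```python
def classify_error(status: int, headers: dict, body: dict) -> str:
    """Tell dd0c-originated errors apart from passed-through provider errors.

    Provider errors arrive with X-DD0C-Provider-Error: true; dd0c's own
    errors carry a dd0c_code inside the standard OpenAI error envelope.
    """
    if headers.get("X-DD0C-Provider-Error") == "true":
        return f"provider error (status {status})"
    dd0c_code = body.get("error", {}).get("dd0c_code")
    if dd0c_code:
        return f"dd0c error {dd0c_code}"
    return "ok" if status < 400 else f"unrecognized error (status {status})"
```

This matters for retry logic: a provider 429 may succeed on a different provider, while a `DD0C_BUDGET_001` will keep failing until the budget resets.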

7.2 Shadow Audit API

The Shadow Audit CLI (npx dd0c-scan) is primarily offline, but it calls two API endpoints:

GET /api/v1/pricing/current

Public endpoint (no auth required). Returns current model pricing for the CLI's savings calculations.

```json
{
  "updated_at": "2026-03-01T00:00:00Z",
  "models": [
    {
      "provider": "openai",
      "model": "gpt-4o",
      "input_cost_per_m": 2.50,
      "output_cost_per_m": 10.00,
      "quality_tier": "frontier"
    },
    {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "input_cost_per_m": 0.15,
      "output_cost_per_m": 0.60,
      "quality_tier": "economy"
    }
  ]
}
```
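A rough sketch of the savings arithmetic the CLI can run against this pricing data. The helper name, request volume, and token counts below are illustrative assumptions, not dd0c-scan internals:

```python
def monthly_cost(requests, input_tokens, output_tokens, model, pricing):
    """Monthly spend for one workload: per-request token cost times volume."""
    p = pricing[model]
    per_request = (input_tokens / 1_000_000) * p["input_cost_per_m"] + (
        output_tokens / 1_000_000
    ) * p["output_cost_per_m"]
    return requests * per_request


# Pricing mirrors the endpoint's example payload.
pricing = {
    "gpt-4o": {"input_cost_per_m": 2.50, "output_cost_per_m": 10.00},
    "gpt-4o-mini": {"input_cost_per_m": 0.15, "output_cost_per_m": 0.60},
}

# Assumed workload: 1M requests/month, 150 input / 20 output tokens each.
frontier = monthly_cost(1_000_000, 150, 20, "gpt-4o", pricing)      # ~ $575
economy = monthly_cost(1_000_000, 150, 20, "gpt-4o-mini", pricing)  # ~ $34.50
saved = frontier - economy                                          # ~ $540.50
```

The CLI's real estimate additionally depends on which calls it judges routable, but the per-token math is the same shape.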

POST /api/v1/scan/report (optional, with user consent)

If the user opts in (--share-report), the CLI sends an anonymized scan summary for lead generation:

```json
{
  "email": "marcus@example.com",
  "scan_summary": {
    "total_llm_calls_found": 14,
    "models_detected": ["gpt-4o", "gpt-4"],
    "estimated_monthly_cost": 4217.00,
    "estimated_monthly_savings": 2327.00,
    "savings_percentage": 55.2,
    "language": "typescript",
    "framework": "express"
  }
}
```

No source code, no prompt content, no file paths. Just aggregate numbers for the sales funnel.

7.3 Dashboard API Endpoints

All dashboard endpoints require authentication (JWT or dd0c API key). All responses are JSON. All list endpoints support pagination via ?cursor=xxx&limit=50.

Auth

| Method | Path | Description |
|---|---|---|
| GET | `/api/auth/github` | Initiate GitHub OAuth flow |
| GET | `/api/auth/callback` | GitHub OAuth callback |
| POST | `/api/auth/refresh` | Refresh access token |
| POST | `/api/auth/logout` | Invalidate refresh token |

Organizations

| Method | Path | Description |
|---|---|---|
| POST | `/api/orgs` | Create organization |
| GET | `/api/orgs/:org_id` | Get org details |
| PATCH | `/api/orgs/:org_id` | Update org settings |
| GET | `/api/orgs/:org_id/members` | List members |
| POST | `/api/orgs/:org_id/members` | Invite member (V1.5) |

API Keys

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/keys` | List API keys (prefix + metadata only) |
| POST | `/api/orgs/:org_id/keys` | Create API key (returns the full key once) |
| DELETE | `/api/orgs/:org_id/keys/:key_id` | Revoke API key |

Provider Credentials

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/providers` | List configured providers (suffix only, never the key) |
| PUT | `/api/orgs/:org_id/providers/:provider` | Set/update provider API key |
| DELETE | `/api/orgs/:org_id/providers/:provider` | Remove provider credential |
| POST | `/api/orgs/:org_id/providers/:provider/test` | Test provider credential (makes a minimal API call) |

Dashboard (Analytics)

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/dashboard/summary` | Current period cost summary (total spend, total saved, request count) |
| GET | `/api/orgs/:org_id/dashboard/timeseries` | Cost over time. Query params: `period=7d\|30d\|90d`, `granularity=hour\|day` |
| GET | `/api/orgs/:org_id/dashboard/treemap` | Cost breakdown by feature/team/model for treemap visualization |
| GET | `/api/orgs/:org_id/dashboard/top-savings` | Top 10 features/endpoints by savings opportunity |
| GET | `/api/orgs/:org_id/dashboard/model-usage` | Model usage distribution (pie chart data) |

Example: /api/orgs/:org_id/dashboard/summary

```json
{
  "period": "7d",
  "total_requests": 42850,
  "total_cost": 127.43,
  "total_cost_without_routing": 891.20,
  "total_saved": 763.77,
  "savings_percentage": 85.7,
  "avg_latency_ms": 4.2,
  "top_model": "gpt-4o-mini",
  "top_feature": "classify",
  "cache_hit_rate": 0.12
}
```
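The derived fields in this payload appear to relate as follows — inferred from the example numbers, not a documented contract:

```python
# Fields from the example summary payload above.
total_cost = 127.43
total_cost_without_routing = 891.20

# What you were saved is the counterfactual spend minus actual spend...
total_saved = total_cost_without_routing - total_cost  # = 763.77

# ...and the percentage is savings over the counterfactual, not over spend.
savings_percentage = round(100 * total_saved / total_cost_without_routing, 1)  # = 85.7
```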

Request Inspector

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/requests` | Paginated request list. Filters: `model`, `feature`, `team`, `status`, `date_from`, `date_to`, `min_cost`, `was_routed` |
| GET | `/api/orgs/:org_id/requests/:request_id` | Single request detail (routing decision, timing breakdown) |

Example: /api/orgs/:org_id/requests?feature=classify&limit=20

```json
{
  "data": [
    {
      "id": "req_a1b2c3",
      "timestamp": "2026-03-15T14:22:33Z",
      "model_requested": "gpt-4o",
      "model_used": "gpt-4o-mini",
      "provider": "openai",
      "feature_tag": "classify",
      "input_tokens": 142,
      "output_tokens": 8,
      "cost": 0.000026,
      "cost_without_routing": 0.000435,
      "saved": 0.000409,
      "latency_ms": 245,
      "complexity": "LOW",
      "status": 200
    }
  ],
  "cursor": "eyJpZCI6InJlcV...",
  "has_more": true
}
```

Note: No prompt content in the response. Ever. The request inspector shows metadata only.
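All list endpoints share the `data`/`cursor`/`has_more` envelope, so a single pagination loop covers them. A sketch, where `fetch_page(cursor)` is a stand-in for an authenticated HTTP GET with `?cursor=...&limit=50`:

```python
def fetch_all(fetch_page):
    """Walk a cursor-paginated dd0c list endpoint to exhaustion.

    `fetch_page(cursor)` must return the documented envelope:
    {"data": [...], "cursor": "...", "has_more": bool}.
    """
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["data"])
        if not page.get("has_more"):
            return items
        cursor = page["cursor"]
```

For large date ranges, prefer narrowing with the `date_from`/`date_to` filters over paging through everything.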

Routing Rules

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/routing/rules` | List routing rules (ordered by priority) |
| POST | `/api/orgs/:org_id/routing/rules` | Create routing rule |
| PATCH | `/api/orgs/:org_id/routing/rules/:rule_id` | Update rule |
| DELETE | `/api/orgs/:org_id/routing/rules/:rule_id` | Delete rule |
| POST | `/api/orgs/:org_id/routing/rules/reorder` | Reorder rules (accepts an array of rule IDs in the new order) |
| GET | `/api/orgs/:org_id/routing/models` | List available models with current pricing |

Example: Create a routing rule

POST /api/orgs/:org_id/routing/rules

```json
{
  "name": "Route classification to economy models",
  "match_tags": {"feature": "classify"},
  "match_complexity": null,
  "strategy": "cheapest",
  "model_chain": ["gpt-4o-mini", "claude-3-haiku"],
  "daily_budget": 50.00
}
```

Alerts

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/alerts` | List alert configurations |
| POST | `/api/orgs/:org_id/alerts` | Create alert |
| PATCH | `/api/orgs/:org_id/alerts/:alert_id` | Update alert |
| DELETE | `/api/orgs/:org_id/alerts/:alert_id` | Delete alert |
| GET | `/api/orgs/:org_id/alerts/history` | Alert firing history |

7.4 Webhook & Notification API

V1 supports outbound webhooks for two events:

Budget Alert Webhook

Fires when a spend threshold is crossed.

```http
POST {customer_webhook_url}
Content-Type: application/json
X-DD0C-Signature: sha256=abc123...

{
  "event": "budget.threshold_reached",
  "timestamp": "2026-03-15T14:22:33Z",
  "org_id": "org_456",
  "alert": {
    "id": "alert_789",
    "name": "Daily spend limit",
    "threshold": 100.00,
    "current_spend": 102.47,
    "period": "daily"
  },
  "scope": {
    "feature": "summarize",
    "team": null
  }
}
```

Slack Integration

Native Slack webhook support (no Slack app — just incoming webhooks for V1):

```json
{
  "text": "🚨 *dd0c/route Budget Alert*\nDaily spend for `summarize` reached $102.47 (limit: $100.00)\n<https://route.dd0c.dev/dashboard|View Dashboard>"
}
```

Webhook security: All outbound webhooks include an X-DD0C-Signature header containing an HMAC-SHA256 signature of the request body, using a per-org webhook secret. Customers can verify the signature to ensure the webhook came from dd0c.
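A minimal receiver-side verification sketch, assuming the per-org webhook secret is available as a string (the helper name is illustrative; the `sha256=<hex>` header format matches the example above):

```python
import hashlib
import hmac


def verify_dd0c_signature(body: bytes, signature_header: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in
    constant time against the X-DD0C-Signature header value."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Verify against the raw bytes as received — re-serializing the parsed JSON can change whitespace or key order and break the signature.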

7.5 SDK Considerations

V1: No SDK. Use the OpenAI SDK.

The entire point of OpenAI compatibility is that customers don't need a dd0c SDK. They use the official OpenAI Python/Node/Go SDK and change the base URL. Done.

```python
# Python — using the official OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="dd0c_sk_live_a3f2b8c9...",
    base_url="https://proxy.route.dd0c.dev/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",  # dd0c may route this to a cheaper model
    messages=[{"role": "user", "content": "Classify: ..."}],
    extra_headers={
        "X-DD0C-Feature": "classify",
        "X-DD0C-Team": "backend"
    }
)

# Read routing metadata from response headers
# (requires accessing the raw httpx response)
```
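One way to reach the headers with the official Python SDK is its raw-response interface (e.g. `client.chat.completions.with_raw_response.create(...)` in openai-python v1 — check your SDK version). An SDK-agnostic helper for collecting the `X-DD0C-*` headers from any headers mapping might look like this (hypothetical helper; the best-effort numeric parsing is an assumption, not a dd0c guarantee):

```python
def parse_dd0c_headers(headers: dict) -> dict:
    """Collect X-DD0C-* response headers into a snake_case dict.

    Values that look numeric (e.g. cost, latency) are parsed; everything
    else is kept as a string.
    """
    out = {}
    for name, value in headers.items():
        if not name.lower().startswith("x-dd0c-"):
            continue
        key = name.lower().removeprefix("x-dd0c-").replace("-", "_")
        try:
            out[key] = float(value) if "." in value else int(value)
        except ValueError:
            out[key] = value
    return out
```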

```typescript
// TypeScript — using the official OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'dd0c_sk_live_a3f2b8c9...',
  baseURL: 'https://proxy.route.dd0c.dev/v1',
  defaultHeaders: {
    'X-DD0C-Feature': 'classify',
    'X-DD0C-Team': 'backend',
  },
});
```

V1.5: Thin wrapper SDK (optional convenience)

If customers want easier access to dd0c response headers and routing metadata, ship a thin wrapper:

```python
# dd0c Python SDK (V1.5) — wraps the OpenAI SDK
from dd0c import DD0CClient

client = DD0CClient(
    dd0c_key="dd0c_sk_live_...",
    # Inherits all OpenAI SDK options
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    feature="classify",  # convenience param → X-DD0C-Feature header
    team="backend",
)

# Easy access to routing metadata
print(response.dd0c.model_used)       # "gpt-4o-mini"
print(response.dd0c.cost)             # 0.000150
print(response.dd0c.saved)            # 0.002350
print(response.dd0c.complexity)       # "LOW"
```

The SDK is a convenience, not a requirement. The proxy works with any HTTP client that can set headers and parse JSON.


Appendix: Decision Log

| Decision | Options Considered | Chosen | Rationale |
|---|---|---|---|
| Proxy language | Rust, Go, Node.js | Rust | <10ms latency requirement eliminates GC languages. Rust's ownership model prevents memory leaks in a long-running proxy. |
| API language | Node.js, Python, Rust | Rust (Axum) | Single-language stack for a solo founder. Shared crate library. One build system. |
| Telemetry store | PostgreSQL, ClickHouse, TimescaleDB | TimescaleDB | "It's just Postgres" — Brian knows it. Continuous aggregates solve the dashboard query problem. Compression solves storage. |
| Config store | SQLite, DynamoDB, PostgreSQL | PostgreSQL (RDS) | Relational integrity for org/key/rule relationships. RDS is managed. Brian's home turf. |
| Cache | In-process, Memcached, Redis | Redis (ElastiCache) | Shared state across proxy instances (circuit breakers, rate limits). ElastiCache is managed. |
| Compute | Lambda, EC2, ECS Fargate | ECS Fargate | No cold starts (Lambda). No server management (EC2). The right abstraction for stateless containers. |
| Auth | Auth0, Clerk, Custom | Custom (GitHub OAuth + JWT) | ~200 lines of code. No vendor dependency. No per-MAU pricing. GitHub is where the users are. |
| UI framework | Next.js, SvelteKit, React+Vite | React + Vite | Largest ecosystem. SPA is sufficient (no SSR/SEO needed). Vite is fast. |
| Email | Resend, SendGrid, SES | AWS SES | Brian has AWS credits. $0.10/1K emails. Plain HTML digest — no template engine needed. |
| IaC | Terraform, CDK, Pulumi | CDK (TypeScript) or Terraform | Brian's choice. Both work. CDK if he wants to stay in AWS-native tooling; Terraform if he wants portability. |
| Deployment | Blue/green, Canary, Rolling | Rolling (ECS default) | Simplest. The proxy is stateless. Rolling update = zero downtime. Rollback = redeploy the previous SHA. |
| Monitoring | Datadog, Grafana Cloud, CloudWatch | CloudWatch | Already included with AWS. No additional vendor. Good enough for V1. Migrate to Grafana Cloud at $5K MRR if CloudWatch becomes limiting. |

Architecture document generated as Phase 6 of the BMad product development pipeline for dd0c/route. Next phase: Implementation planning and sprint breakdown.