Max Mayfield 5ee95d8b13 dd0c: full product research pipeline - 6 products, 8 phases each
Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
        product-brief, architecture, epics (incl. Epic 10 TF compliance),
        test-architecture (TDD strategy)

Brand strategy and market research included.
2026-02-28 17:35:02 +00:00


dd0c/route — Technical Architecture

Product: dd0c/route — LLM Cost Router & Optimization Dashboard
Author: Architecture Phase (BMad Phase 6)
Date: February 28, 2026
Status: V1 MVP Architecture — Solo Founder Scope


Section 1: SYSTEM OVERVIEW

1.1 High-Level Architecture

graph TB
    subgraph Clients["Client Applications"]
        APP1[App Service A]
        APP2[App Service B]
        CLI[dd0c-scan CLI]
    end

    subgraph DD0C["dd0c/route Platform (AWS us-east-1)"]
        subgraph ProxyTier["Proxy Tier (ECS Fargate)"]
            PROXY1[Rust Proxy Instance 1]
            PROXY2[Rust Proxy Instance N]
        end

        subgraph ControlPlane["Control Plane (ECS Fargate)"]
            API[Dashboard API<br/>Axum/Rust]
            WORKER[Async Worker<br/>Digest + Alerts]
        end

        subgraph DataTier["Data Tier"]
            PG[(PostgreSQL RDS<br/>Config + Auth)]
            TS[(TimescaleDB RDS<br/>Request Telemetry)]
            REDIS[(ElastiCache Redis<br/>Rate Limits + Cache)]
        end
    end

    subgraph Providers["LLM Providers"]
        OAI[OpenAI API]
        ANT[Anthropic API]
    end

    subgraph External["External Services"]
        GH[GitHub OAuth]
        SES[AWS SES<br/>Digest Emails]
        SLACK[Slack Webhooks]
    end

    APP1 -->|HTTPS / OpenAI-compat| PROXY1
    APP2 -->|HTTPS / OpenAI-compat| PROXY2
    PROXY1 --> OAI
    PROXY1 --> ANT
    PROXY2 --> OAI
    PROXY2 --> ANT
    PROXY1 -->|async telemetry| TS
    PROXY2 -->|async telemetry| TS
    PROXY1 --> REDIS
    PROXY2 --> REDIS
    API --> PG
    API --> TS
    WORKER --> TS
    WORKER --> SES
    WORKER --> SLACK
    CLI -->|log analysis| APP1

1.2 Component Inventory

  • Proxy Engine — Rust (tokio + hyper). Request interception, complexity classification, model routing, response passthrough, telemetry emission. Criticality: P0 — the product IS this.
  • Router Brain — Rust (embedded in proxy). Rule evaluation, cost table lookups, fallback chain execution, cascading try-cheap-first logic. Criticality: P0 — routing decisions.
  • Dashboard API — Rust (axum). REST API for dashboard UI, config management, auth, org/team CRUD. Criticality: P0 — the "aha moment".
  • Dashboard UI — TypeScript (React + Vite). Cost treemap, request inspector, routing config editor, real-time ticker. Criticality: P0 — what Marcus sees.
  • Async Worker — Rust (tokio-cron). Weekly digest generation, threshold-based anomaly detection, alert dispatch. Criticality: P1 — retention mechanism.
  • PostgreSQL — AWS RDS (db.t4g.micro). Organizations, API keys, routing rules, user accounts. Criticality: P0 — config store.
  • TimescaleDB — AWS RDS (db.t4g.small). Request telemetry, cost events, token counts; time-series optimized. Criticality: P0 — analytics backbone.
  • Redis — AWS ElastiCache (t4g.micro). Rate limiting, exact-match response cache, session tokens. Criticality: P1 — performance layer.

1.3 Technology Choices & Justification

  • Rust (proxy), over Go or Node.js: <10ms p99 overhead is non-negotiable. Rust's zero-cost abstractions and tokio async runtime give us predictable tail latency. Go would add GC pauses; Node.js adds event loop overhead. Portkey's 20-40ms overhead in Node.js is the cautionary tale.
  • Rust (API), over Node.js (Express) or Python (FastAPI): a single language across the stack reduces cognitive overhead for a solo founder. Axum is production-ready and shares the tokio runtime. One cargo build produces the proxy AND the API.
  • TimescaleDB, over ClickHouse or plain PostgreSQL: TimescaleDB is PostgreSQL with time-series superpowers — hypertables, continuous aggregates, compression. Brian already knows PostgreSQL. ClickHouse is faster for analytics but adds operational complexity (separate cluster, different query dialect, different backup strategy). For a solo founder, "it's just Postgres" wins. Continuous aggregates handle the dashboard rollups; compression handles storage costs.
  • PostgreSQL (config), over SQLite or DynamoDB: RDS PostgreSQL is Brian's home turf (AWS architect). Managed backups, failover, IAM auth. DynamoDB would work but adds a second data model to reason about. SQLite doesn't scale past a single instance.
  • Redis (cache), over an in-process LRU or DynamoDB DAX: a shared cache across proxy instances for exact-match response dedup. ElastiCache is managed and cheap at t4g.micro ($0.016/hr). An in-process cache doesn't share across instances.
  • React + Vite (UI), over Next.js, SvelteKit, or HTMX: React has the largest hiring pool if Brian ever hires. Vite is fast. The dashboard is a SPA — no SSR needed, no SEO needed. Keep it simple.
  • AWS SES (email), over Resend or SendGrid: Brian has AWS credits and expertise. SES is $0.10/1000 emails. The digest email is plain HTML — no fancy template engine needed.
  • GitHub OAuth, over Auth0, Clerk, or email/password: one-click signup for the developer audience. No password management burden. GitHub is where the users live. Implemented via the oauth2 Rust crate — ~200 lines of code.

1.4 Deployment Model

V1: Containerized services on ECS Fargate. Not Lambda. Not a single binary.

Rationale:

  • Why not Lambda: The proxy needs persistent connections to LLM providers (connection pooling, keep-alive). Lambda cold starts (100-500ms) violate the <10ms latency budget. Lambda's 15-minute timeout conflicts with streaming responses. Lambda per-invocation pricing gets expensive at 100K+ requests/day.
  • Why not single binary: The proxy and the dashboard API have different scaling profiles. The proxy scales horizontally with request volume. The API scales with dashboard users (much lower). Coupling them wastes money.
  • Why ECS Fargate: No EC2 instances to manage. Auto-scaling built in. Brian knows ECS. Task definitions are the deployment unit. ALB handles TLS termination and health checks.

Container topology:

Service        Container     vCPU  Memory  Min Instances  Auto-Scale Trigger
Proxy          dd0c-proxy    0.25  512MB   2              CPU > 60% or request count
Dashboard API  dd0c-api      0.25  512MB   1              CPU > 70%
Async Worker   dd0c-worker   0.25  512MB   1              None (singleton)
Dashboard UI   S3 + CloudFront (static assets)            CDN-managed

Build artifact: docker build produces three images from a single Rust workspace (cargo workspace). The UI is a static build deployed to S3/CloudFront.

dd0c-route/
├── Cargo.toml          (workspace root)
├── crates/
│   ├── proxy/          (the proxy engine + router brain)
│   ├── api/            (dashboard REST API)
│   ├── worker/         (digest + alerts)
│   └── shared/         (models, DB queries, cost tables)
├── ui/                 (React dashboard)
├── cli/                (dd0c-scan — separate npm package)
└── infra/              (CDK or Terraform)

Section 2: CORE COMPONENTS

2.1 Proxy Engine (Rust — crates/proxy)

The proxy is the hot path. Every design decision optimizes for one thing: don't add latency.

Request lifecycle:

Client Request (OpenAI-compat)
    │
    ├─ 1. TLS termination (ALB — not our problem)
    ├─ 2. Auth validation (API key lookup — Redis cache, PG fallback) ........... <1ms
    ├─ 3. Request parsing (extract model, messages, metadata) ................... <0.5ms
    ├─ 4. Tag extraction (X-DD0C-Feature, X-DD0C-Team headers) ................. <0.1ms
    ├─ 5. Router Brain evaluation (complexity + rules → target model) ........... <2ms
    ├─ 6. Provider dispatch (connection-pooled HTTPS to OpenAI/Anthropic) ....... network
    ├─ 7. Response passthrough (streaming SSE or buffered JSON) ................. passthrough
    ├─ 8. Telemetry emission (async, non-blocking — tokio::spawn) ............... 0ms on hot path
    └─ 9. Response headers injected (X-DD0C-Model, X-DD0C-Cost, X-DD0C-Saved)

Latency budget breakdown:

  • Auth (<1ms): Redis GET dd0c_key:{hash} with 60s TTL. Cache miss → PG lookup + cache set.
  • Parse (<0.5ms): serde_json zero-copy deserialization. No full body buffering for streaming requests — parse headers + first chunk only.
  • Route (<2ms): in-memory rule engine. Cost tables loaded at startup, refreshed every 60s via background task. No DB call on hot path.
  • Dispatch (0ms overhead): hyper connection pool to each provider. Pre-warmed connections. HTTP/2 multiplexing.
  • Telemetry (0ms on hot path): tokio::spawn fires a telemetry event to an in-memory channel. Background task batch-inserts to TimescaleDB every 1s or 100 events (whichever comes first).
  • Total: <5ms p99 overhead, against a <10ms p99 target with margin.
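The auth stage's read-through caching can be sketched with a std-only TTL map; the actual Redis GET and PG fallback are elided, and the type and method names here are illustrative, not the shipped API:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal read-through cache with per-entry TTL, standing in for the
/// Redis `dd0c_key:{hash}` lookup with 60s expiry and PG fallback.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return the cached value if fresh, otherwise compute it via `load`
    /// (the PG lookup) and cache the result.
    fn get_or_load(&mut self, key: &str, load: impl FnOnce() -> V) -> V {
        if let Some((at, v)) = self.entries.get(key) {
            if at.elapsed() < self.ttl {
                return v.clone();
            }
        }
        let v = load();
        self.entries.insert(key.to_string(), (Instant::now(), v.clone()));
        v
    }
}

fn main() {
    let mut cache: TtlCache<bool> = TtlCache::new(Duration::from_secs(60));
    let mut pg_lookups = 0;
    // First call misses and "hits PG"; second is served from cache.
    let a = cache.get_or_load("dd0c_key:abc", || { pg_lookups += 1; true });
    let b = cache.get_or_load("dd0c_key:abc", || { pg_lookups += 1; true });
    assert!(a && b);
    assert_eq!(pg_lookups, 1);
}
```

The same pattern generalizes to any hot-path lookup that must avoid a DB round trip per request.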

Streaming support:

The proxy MUST support Server-Sent Events (SSE) streaming — this is how most chat applications consume LLM responses. The proxy operates as a transparent stream relay:

  1. Client sends request with "stream": true
  2. Proxy makes routing decision based on headers + first message content (no need to buffer full body)
  3. Proxy opens streaming connection to target provider
  4. Each SSE chunk is forwarded to client immediately (Transfer-Encoding: chunked)
  5. Token counting happens on-the-fly by parsing usage from the final data chunk before the [DONE] sentinel (OpenAI, when stream_options.include_usage is set) or from Anthropic's message_delta / message_stop events
  6. If the provider doesn't return usage in the stream, the proxy counts tokens from accumulated chunks using tiktoken-rs
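The relay in step 4 implies splitting the provider byte stream on SSE framing. A std-only sketch of the `data:` line extraction (a real relay works incrementally on byte chunks and would use a proper SSE parser; this assumes a complete UTF-8 segment for clarity):

```rust
/// Extract the payload of each `data:` line from a buffered SSE segment.
fn sse_data_payloads(segment: &str) -> Vec<&str> {
    segment
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim_start)
        .collect()
}

fn main() {
    let segment = "data: {\"choices\":[]}\n\ndata: [DONE]\n";
    let payloads = sse_data_payloads(segment);
    // The trailing `[DONE]` sentinel marks stream end; usage arrives in the
    // last JSON payload before it (OpenAI) or a message_stop event (Anthropic).
    assert_eq!(payloads, vec!["{\"choices\":[]}", "[DONE]"]);
}
```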

Provider abstraction:

// Simplified — the actual trait is more detailed
use futures::stream::BoxStream;

#[async_trait]
trait LlmProvider: Send + Sync {
    fn name(&self) -> &str;
    fn supports_model(&self, model: &str) -> bool;
    fn translate_request(&self, req: &ProxyRequest) -> ProviderRequest;
    fn translate_response(&self, resp: ProviderResponse) -> ProxyResponse;
    async fn send(&self, req: ProviderRequest) -> Result<ProviderResponse>;
    // async_trait methods can't return `impl Trait`, so the stream is boxed
    async fn send_stream(&self, req: ProviderRequest) -> Result<BoxStream<'static, SseChunk>>;
}

V1 ships two implementations: OpenAiProvider and AnthropicProvider. Adding a new provider means implementing this trait — no proxy core changes. The translate_request / translate_response methods handle the format differences (Anthropic's messages API vs OpenAI's chat/completions).

Connection pooling:

Each proxy instance maintains a hyper connection pool per provider:

  • Max 100 connections to api.openai.com
  • Max 50 connections to api.anthropic.com
  • Keep-alive: 90s
  • Connection timeout: 5s
  • Request timeout: 300s (LLM responses can be slow for long completions)

2.2 Router Brain (crates/shared/router)

The Router Brain is embedded in the proxy process — no network hop, no RPC. It's a pure function: (request, rules, cost_tables) → routing_decision.

Decision pipeline:

Input: ProxyRequest + RoutingConfig
    │
    ├─ 1. Rule matching: find first rule where all match conditions are true
    │     Match on: request tags, model requested, token count estimate, time of day
    │
    ├─ 2. Strategy execution (per matched rule):
    │     ├─ "passthrough"  → use requested model, no routing
    │     ├─ "cheapest"     → pick cheapest model from rule's model list
    │     ├─ "quality-first"→ pick highest-quality model, fallback down on error
    │     └─ "cascading"    → try cheapest first, escalate on low confidence
    │
    ├─ 3. Budget check: if org/team/feature has hit a hard budget limit → throttle to cheapest or reject
    │
    └─ 4. Output: RoutingDecision { target_model, target_provider, reason, confidence }
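Step 1's first-match-wins evaluation can be sketched in a few lines; the rule and request fields below are a reduced subset of the real schema, with names chosen for illustration:

```rust
#[derive(Clone)]
struct Rule {
    name: &'static str,
    // None = rule matches any value for that dimension.
    match_feature: Option<&'static str>,
    match_model: Option<&'static str>,
    model_chain: Vec<&'static str>,
}

struct Request<'a> {
    feature_tag: Option<&'a str>,
    model: &'a str,
}

/// Rules are pre-sorted by priority; the first rule whose conditions
/// all hold decides the route.
fn match_rule<'r>(rules: &'r [Rule], req: &Request) -> Option<&'r Rule> {
    rules.iter().find(|r| {
        r.match_feature.map_or(true, |f| req.feature_tag == Some(f))
            && r.match_model.map_or(true, |m| req.model == m)
    })
}

fn main() {
    let rules = vec![
        Rule { name: "classify-cheap", match_feature: Some("classify"),
               match_model: None, model_chain: vec!["gpt-4o-mini"] },
        Rule { name: "default-passthrough", match_feature: None,
               match_model: None, model_chain: vec![] },
    ];
    let req = Request { feature_tag: Some("classify"), model: "gpt-4o" };
    assert_eq!(match_rule(&rules, &req).unwrap().name, "classify-cheap");
}
```

Keeping the matcher a pure function over in-memory rules is what makes the <2ms routing budget realistic.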

Complexity classifier (V1 — heuristic, not ML):

The V1 classifier is deliberately simple. It uses three signals:

  • Token count (30%): short prompts (<500 tokens) with short expected outputs are likely simple tasks.
  • Task pattern (50%): regex/keyword matching on the system prompt. "classify", "extract", "format JSON", "yes or no" → LOW complexity; "analyze", "reason step by step", "write code" → HIGH complexity.
  • Model requested (20%): if the user explicitly requests a frontier model AND the task looks complex, respect the request. Don't downgrade a code generation request from GPT-4o.

Output: ComplexityScore { level: Low|Medium|High, confidence: f32 }

This gets 70-80% accuracy. Good enough for V1. The ML classifier (V2) trains on the telemetry data: for each routed request, did the user complain? Did they retry with a different model? Did the downstream application error? That feedback loop is the data flywheel.
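A minimal sketch of the weighted heuristic, using only the token-count and task-pattern signals (the model-requested signal, the exact weights, and the keyword lists are illustrative, not the shipped tables):

```rust
#[derive(Debug, PartialEq)]
enum Level { Low, Medium, High }

struct ComplexityScore { level: Level, confidence: f32 }

/// V1 heuristic: keyword and length signals combined with fixed weights.
fn classify(system_prompt: &str, est_tokens: u32) -> ComplexityScore {
    let p = system_prompt.to_lowercase();
    let low_kw = ["classify", "extract", "format json", "yes or no"];
    let high_kw = ["analyze", "reason step by step", "write code"];

    // Task-pattern signal (50%): -1.0 simple, +1.0 complex, 0.0 unknown.
    let pattern: f32 = if low_kw.iter().any(|k| p.contains(k)) { -1.0 }
                       else if high_kw.iter().any(|k| p.contains(k)) { 1.0 }
                       else { 0.0 };
    // Token-count signal (30%): short prompts lean simple.
    let length: f32 = if est_tokens < 500 { -1.0 } else { 1.0 };

    let score = 0.5 * pattern + 0.3 * length;
    let level = if score < -0.25 { Level::Low }
                else if score > 0.25 { Level::High }
                else { Level::Medium };
    // Confidence grows as the signals agree.
    ComplexityScore { level, confidence: score.abs().min(1.0) }
}

fn main() {
    let s = classify("Classify the sentiment. Answer yes or no.", 120);
    assert_eq!(s.level, Level::Low);
    let s = classify("Reason step by step, then write code.", 2000);
    assert_eq!(s.level, Level::High);
}
```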

Cost tables:

struct ModelCost {
    provider: Provider,
    model_id: String,          // "gpt-4o-2024-11-20"
    model_alias: String,       // "gpt-4o"
    input_cost_per_m: f64,     // $/million input tokens
    output_cost_per_m: f64,    // $/million output tokens
    quality_tier: QualityTier, // Frontier, Standard, Economy
    max_context: u32,          // 128000
    supports_streaming: bool,
    supports_tools: bool,
    supports_vision: bool,
    updated_at: DateTime<Utc>,
}

Cost tables are stored in PostgreSQL and loaded into memory at proxy startup. A background task polls for updates every 60 seconds. When a provider changes pricing (happens ~monthly), Brian updates one row in the DB and all proxy instances pick it up within 60s. No redeploy.
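With the cost table in memory, the "cheapest" strategy reduces to a lookup over the rule's model chain. A sketch (the prices and the blended token estimate are illustrative assumptions, not the shipped pricing):

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct ModelCost { input_per_m: f64, output_per_m: f64 }

/// Pick the cheapest model in a rule's chain for an estimated request shape,
/// using the in-memory cost table.
fn cheapest<'a>(chain: &[&'a str], table: &HashMap<&str, ModelCost>,
                in_tok: f64, out_tok: f64) -> Option<&'a str> {
    chain.iter().copied().min_by(|a, b| {
        let est = |m: &str| {
            let c = &table[m];
            (in_tok * c.input_per_m + out_tok * c.output_per_m) / 1e6
        };
        est(a).partial_cmp(&est(b)).unwrap()
    })
}

fn main() {
    let mut table = HashMap::new();
    // Illustrative prices, $/million tokens.
    table.insert("gpt-4o", ModelCost { input_per_m: 2.50, output_per_m: 10.00 });
    table.insert("gpt-4o-mini", ModelCost { input_per_m: 0.15, output_per_m: 0.60 });
    let pick = cheapest(&["gpt-4o", "gpt-4o-mini"], &table, 1000.0, 500.0);
    assert_eq!(pick, Some("gpt-4o-mini"));
}
```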

Fallback chains with circuit breakers:

Primary: gpt-4o-mini (OpenAI)
    │ ── if error rate > 10% in last 60s ──→ circuit OPEN
    │
    ▼
Fallback 1: claude-3-haiku (Anthropic)
    │ ── if error rate > 10% in last 60s ──→ circuit OPEN
    │
    ▼
Fallback 2: gpt-4o (OpenAI) ← expensive but reliable last resort
    │
    ▼
Final fallback: return 503 with X-DD0C-Fallback-Exhausted header

Circuit breaker state is stored in Redis (shared across proxy instances). State transitions: CLOSED → OPEN (on threshold breach) → HALF-OPEN (after 30s cooldown, allow 1 probe request) → CLOSED (if probe succeeds).
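The state machine above can be sketched as a single-instance breaker; in production the counters and state live in Redis so all proxy instances share them, and the rolling 60s window is simplified here to a fixed sample:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Clone, Copy)]
enum State { Closed, Open, HalfOpen }

struct CircuitBreaker {
    state: State,
    opened_at: Option<Instant>,
    errors: u32,
    total: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(cooldown: Duration) -> Self {
        Self { state: State::Closed, opened_at: None, errors: 0, total: 0, cooldown }
    }

    /// May this request go to the primary? An OPEN circuit admits one
    /// probe after the cooldown by moving to HALF-OPEN.
    fn allow(&mut self) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open => {
                if self.opened_at.map_or(false, |t| t.elapsed() >= self.cooldown) {
                    self.state = State::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record(&mut self, ok: bool) {
        match self.state {
            State::HalfOpen => {
                // Probe result decides: success closes, failure re-opens.
                self.state = if ok { State::Closed } else { State::Open };
                self.opened_at = (!ok).then(Instant::now);
                self.errors = 0; self.total = 0;
            }
            _ => {
                self.total += 1;
                if !ok { self.errors += 1; }
                // >10% errors over a minimum sample trips the breaker.
                if self.total >= 10 && self.errors * 10 > self.total {
                    self.state = State::Open;
                    self.opened_at = Some(Instant::now());
                }
            }
        }
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(Duration::from_secs(30));
    for _ in 0..8 { cb.record(true); }
    for _ in 0..2 { cb.record(false); } // 2/10 = 20% error rate trips it
    assert!(!cb.allow());
}
```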

2.3 Analytics Pipeline

Telemetry flows from the proxy to TimescaleDB asynchronously. The proxy never blocks on analytics.

Event schema (what the proxy emits per request):

struct RequestEvent {
    id: Uuid,
    org_id: Uuid,
    api_key_id: Uuid,
    timestamp: DateTime<Utc>,
    // Request metadata
    model_requested: String,
    model_used: String,
    provider: String,
    feature_tag: Option<String>,
    team_tag: Option<String>,
    environment_tag: Option<String>,
    // Tokens & cost
    input_tokens: u32,
    output_tokens: u32,
    cost_actual: f64,        // what they paid (routed model)
    cost_original: f64,      // what they would have paid (requested model)
    cost_saved: f64,         // delta
    // Performance
    latency_ms: u32,
    ttfb_ms: u32,            // time to first byte (streaming)
    // Routing
    complexity_score: f32,
    complexity_level: String, // LOW, MEDIUM, HIGH
    routing_reason: String,
    was_cached: bool,
    was_fallback: bool,
    // Status
    status_code: u16,
    error_type: Option<String>,
}

Batch insert pipeline:

Proxy hot path                    Background task
─────────────                     ───────────────
request completes
    │
    ├─ tokio::spawn ──→ mpsc channel ──→ batch collector
                                            │
                                            ├─ accumulate events
                                            ├─ flush every 1s OR 100 events
                                            └─ COPY INTO request_events (bulk insert)

COPY (PostgreSQL bulk insert) handles 10K+ rows/second on a db.t4g.small. At 100K requests/day (~1.2 req/s average), this is trivially within capacity.
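The flush policy of the batch collector (1s OR 100 events, whichever comes first) can be sketched std-only; the channel plumbing and the COPY call are elided, and the names are illustrative:

```rust
use std::time::{Duration, Instant};

/// Accumulates telemetry events; `push` hands back a full batch whenever a
/// flush condition is met (size or age). In the real worker the batch goes
/// to a `COPY` bulk insert.
struct BatchCollector<T> {
    buf: Vec<T>,
    last_flush: Instant,
    max_len: usize,
    max_age: Duration,
}

impl<T> BatchCollector<T> {
    fn new(max_len: usize, max_age: Duration) -> Self {
        Self { buf: Vec::new(), last_flush: Instant::now(), max_len, max_age }
    }

    fn push(&mut self, ev: T) -> Option<Vec<T>> {
        self.buf.push(ev);
        if self.buf.len() >= self.max_len || self.last_flush.elapsed() >= self.max_age {
            self.last_flush = Instant::now();
            Some(std::mem::take(&mut self.buf))
        } else {
            None
        }
    }
}

fn main() {
    let mut c = BatchCollector::new(100, Duration::from_secs(1));
    let mut flushed = None;
    for i in 0..100u32 {
        if let Some(batch) = c.push(i) { flushed = Some(batch); }
    }
    assert_eq!(flushed.unwrap().len(), 100);
}
```

A production version would also flush on a timer tick even when idle, so a trickle of events never waits longer than `max_age`.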

Continuous aggregates (TimescaleDB):

Pre-computed rollups for dashboard queries:

-- Hourly rollup by org, feature, model
CREATE MATERIALIZED VIEW hourly_cost_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    org_id,
    feature_tag,
    team_tag,
    model_used,
    provider,
    COUNT(*) AS request_count,
    SUM(input_tokens) AS total_input_tokens,
    SUM(output_tokens) AS total_output_tokens,
    SUM(cost_actual) AS total_cost,
    SUM(cost_saved) AS total_saved,
    AVG(latency_ms) AS avg_latency,
    -- Note: ordered-set aggregates like PERCENTILE_CONT aren't supported in
    -- continuous aggregates; approximate p99 needs timescaledb-toolkit's percentile_agg
    MAX(latency_ms) AS max_latency
FROM request_events
GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider;

Dashboard queries hit the continuous aggregate, not the raw events table. This keeps dashboard response times <200ms even with millions of rows.

Savings calculation:

cost_saved = cost_original - cost_actual

where:
  cost_original = (input_tokens × requested_model.input_cost_per_m / 1_000_000)
                + (output_tokens × requested_model.output_cost_per_m / 1_000_000)

  cost_actual   = (input_tokens × used_model.input_cost_per_m / 1_000_000)
                + (output_tokens × used_model.output_cost_per_m / 1_000_000)

This is computed at request time in the proxy (cost tables are in memory) and stored with the event. No post-hoc recalculation needed.
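The per-request arithmetic is trivial but worth pinning down, since it runs on every request. A sketch with illustrative prices (not a maintained price list):

```rust
/// Cost of one request at a given model's pricing, $/million tokens.
fn request_cost(in_tok: u64, out_tok: u64, in_per_m: f64, out_per_m: f64) -> f64 {
    (in_tok as f64 * in_per_m + out_tok as f64 * out_per_m) / 1_000_000.0
}

fn main() {
    // Requested gpt-4o (example: $2.50/M in, $10/M out); routed to
    // gpt-4o-mini (example: $0.15/M in, $0.60/M out). 1,247 in / 300 out.
    let cost_original = request_cost(1_247, 300, 2.50, 10.00);
    let cost_actual   = request_cost(1_247, 300, 0.15, 0.60);
    let cost_saved = cost_original - cost_actual;
    assert!(cost_saved > 0.0 && cost_saved < cost_original);
    println!("saved ${:.6} of ${:.6}", cost_saved, cost_original);
}
```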

2.4 Dashboard API (crates/api)

Framework: Axum (Rust). Same tokio runtime as the proxy. Shares the crates/shared library for DB models and queries.

Why not a separate language (Node/Python)? Solo founder. One language. One build system. One deployment pipeline. The API is not performance-critical (dashboard users, not proxy traffic), but keeping it in Rust means Brian debugs one ecosystem, not two.

Key endpoint groups (detailed in Section 7):

Group Purpose
/api/auth/* GitHub OAuth flow, session management
/api/orgs/* Organization CRUD, team management
/api/dashboard/* Cost summaries, treemap data, time-series
/api/requests/* Request inspector — paginated, filterable
/api/routing/* Routing rules CRUD, cost tables
/api/alerts/* Alert configuration, budget limits
/api/keys/* API key management (dd0c keys + encrypted provider keys)

Auth model: JWT tokens issued after GitHub OAuth. Short-lived access tokens (15min) + refresh tokens (7 days) stored in Redis. API keys for programmatic access (prefixed dd0c_sk_).

2.5 Shadow Audit Mode (The PLG Wedge)

Shadow Audit is the product-led growth engine. It provides value before the customer routes a single request through the proxy.

Two modes:

Mode A: CLI Scan (npx dd0c-scan)

  • Scans a local codebase for LLM API calls
  • Parses model names, estimates token counts from prompt templates
  • Applies current pricing to estimate monthly cost
  • Applies dd0c routing logic to estimate savings
  • Outputs a report to stdout — no data leaves the machine
  • Captures email (optional) for follow-up
$ npx dd0c-scan ./src

  dd0c/route — Cost Scan Report
  ─────────────────────────────
  Found 14 LLM API calls across 8 files

  Current estimated monthly cost:    $4,217
  With dd0c/route routing:           $1,890
  Potential monthly savings:          $2,327 (55%)

  Top opportunities:
  ┌─────────────────────────────────────────────────────┐
  │ src/services/classify.ts    gpt-4o → gpt-4o-mini   │
  │   Est. savings: $890/mo     Confidence: HIGH        │
  │                                                     │
  │ src/services/summarize.ts   gpt-4o → claude-haiku   │
  │   Est. savings: $670/mo     Confidence: MEDIUM      │
  │                                                     │
  │ src/services/extract.ts     gpt-4o → gpt-4o-mini   │
  │   Est. savings: $440/mo     Confidence: HIGH        │
  └─────────────────────────────────────────────────────┘

  → Sign up at route.dd0c.dev to start saving

Mode B: Log Ingestion (V1.1)

  • Customer points dd0c at their existing LLM provider logs (OpenAI usage export CSV, or application logs with token counts)
  • dd0c processes the logs offline and generates a retrospective savings report
  • "Here's what you spent last month. Here's what you WOULD have spent."
  • This is the enterprise conversion tool — show the CFO real numbers from their own data

Section 3: DATA ARCHITECTURE

3.1 Database Schema

Two databases, clear separation of concerns:

  • PostgreSQL (RDS): Configuration, auth, organizational data. Low-write, high-read. Relational integrity matters.
  • TimescaleDB (RDS): Request telemetry, cost events. High-write, time-series queries. Compression and retention policies matter.

PostgreSQL — Configuration Store

-- Organizations (multi-tenant root)
CREATE TABLE organizations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    slug            VARCHAR(63) NOT NULL UNIQUE,  -- used in URLs
    plan            VARCHAR(20) NOT NULL DEFAULT 'free',  -- free, pro, business
    stripe_customer_id VARCHAR(255),
    monthly_llm_spend_limit NUMERIC(10,2),  -- plan-based cap on routed spend
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Users (GitHub OAuth)
CREATE TABLE users (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    github_id       BIGINT NOT NULL UNIQUE,
    github_login    VARCHAR(255) NOT NULL,
    email           VARCHAR(255),
    avatar_url      VARCHAR(512),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Org membership
CREATE TABLE org_members (
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role            VARCHAR(20) NOT NULL DEFAULT 'member',  -- owner, admin, member
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (org_id, user_id)
);

-- dd0c API keys (what customers use to auth with the proxy)
CREATE TABLE api_keys (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    key_hash        VARCHAR(64) NOT NULL UNIQUE,  -- SHA-256 of the key; raw key never stored
    key_prefix      VARCHAR(12) NOT NULL,          -- "dd0c_sk_a3f..." for display
    name            VARCHAR(255),                   -- human label: "production", "staging"
    environment     VARCHAR(50) DEFAULT 'production',
    is_active       BOOLEAN NOT NULL DEFAULT true,
    last_used_at    TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE is_active = true;

-- Customer's LLM provider credentials (encrypted at rest)
CREATE TABLE provider_credentials (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    provider        VARCHAR(50) NOT NULL,           -- 'openai', 'anthropic'
    encrypted_key   BYTEA NOT NULL,                 -- AES-256-GCM encrypted API key
    key_suffix      VARCHAR(8),                     -- trailing chars for display: "...a3f2"
    is_active       BOOLEAN NOT NULL DEFAULT true,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(org_id, provider)
);

-- Routing rules (ordered, first-match-wins)
CREATE TABLE routing_rules (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    priority        INTEGER NOT NULL DEFAULT 0,     -- lower = higher priority
    name            VARCHAR(255) NOT NULL,
    is_active       BOOLEAN NOT NULL DEFAULT true,
    -- Match conditions (all must be true)
    match_tags      JSONB DEFAULT '{}',             -- {"feature": "classify", "team": "backend"}
    match_models    TEXT[],                          -- models this rule applies to, NULL = all
    match_complexity VARCHAR(20),                    -- LOW, MEDIUM, HIGH, NULL = all
    -- Routing strategy
    strategy        VARCHAR(20) NOT NULL,            -- passthrough, cheapest, quality_first, cascading
    model_chain     TEXT[] NOT NULL,                  -- ordered list of models to try
    -- Budget constraints
    daily_budget    NUMERIC(10,2),                   -- hard limit per day for this rule
    -- Metadata
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_routing_rules_org ON routing_rules(org_id, priority) WHERE is_active = true;

-- Model cost table (the source of truth for pricing)
CREATE TABLE model_costs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    provider        VARCHAR(50) NOT NULL,
    model_id        VARCHAR(100) NOT NULL,           -- "gpt-4o-2024-11-20"
    model_alias     VARCHAR(100) NOT NULL,           -- "gpt-4o"
    input_cost_per_m  NUMERIC(10,4) NOT NULL,        -- $/million input tokens
    output_cost_per_m NUMERIC(10,4) NOT NULL,        -- $/million output tokens
    quality_tier    VARCHAR(20) NOT NULL,             -- frontier, standard, economy
    max_context     INTEGER NOT NULL,
    supports_streaming BOOLEAN DEFAULT true,
    supports_tools  BOOLEAN DEFAULT false,
    supports_vision BOOLEAN DEFAULT false,
    is_active       BOOLEAN NOT NULL DEFAULT true,
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(provider, model_id)
);

-- Alert configurations
CREATE TABLE alert_configs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    name            VARCHAR(255) NOT NULL,
    alert_type      VARCHAR(50) NOT NULL,            -- spend_threshold, anomaly, budget_warning
    -- Conditions
    threshold_amount NUMERIC(10,2),                  -- dollar amount trigger
    threshold_pct   NUMERIC(5,2),                    -- percentage above baseline
    scope_tags      JSONB DEFAULT '{}',              -- scope to specific feature/team
    -- Notification
    notify_slack_webhook VARCHAR(512),
    notify_email    VARCHAR(255),
    -- State
    is_active       BOOLEAN NOT NULL DEFAULT true,
    last_fired_at   TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

TimescaleDB — Telemetry Store

-- Raw request events (hypertable — partitioned by time automatically)
CREATE TABLE request_events (
    id              UUID NOT NULL DEFAULT gen_random_uuid(),
    org_id          UUID NOT NULL,
    api_key_id      UUID NOT NULL,
    timestamp       TIMESTAMPTZ NOT NULL,
    -- Request
    model_requested VARCHAR(100) NOT NULL,
    model_used      VARCHAR(100) NOT NULL,
    provider        VARCHAR(50) NOT NULL,
    feature_tag     VARCHAR(100),
    team_tag        VARCHAR(100),
    environment_tag VARCHAR(50),
    -- Tokens & cost
    input_tokens    INTEGER NOT NULL,
    output_tokens   INTEGER NOT NULL,
    cost_actual     NUMERIC(12,8) NOT NULL,
    cost_original   NUMERIC(12,8) NOT NULL,
    cost_saved      NUMERIC(12,8) NOT NULL,
    -- Performance
    latency_ms      INTEGER NOT NULL,
    ttfb_ms         INTEGER,
    -- Routing
    complexity_score REAL,
    complexity_level VARCHAR(10),
    routing_reason  VARCHAR(255),
    was_cached      BOOLEAN DEFAULT false,
    was_fallback    BOOLEAN DEFAULT false,
    -- Status
    status_code     SMALLINT NOT NULL,
    error_type      VARCHAR(100)
);

-- Convert to hypertable (TimescaleDB magic)
SELECT create_hypertable('request_events', 'timestamp',
    chunk_time_interval => INTERVAL '1 day'
);

-- Indexes for common query patterns
CREATE INDEX idx_re_org_time ON request_events(org_id, timestamp DESC);
CREATE INDEX idx_re_org_feature ON request_events(org_id, feature_tag, timestamp DESC);
CREATE INDEX idx_re_org_team ON request_events(org_id, team_tag, timestamp DESC);

-- Compression policy: compress chunks older than 7 days (90%+ space savings)
ALTER TABLE request_events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'org_id',
    timescaledb.compress_orderby = 'timestamp DESC'
);
SELECT add_compression_policy('request_events', INTERVAL '7 days');

-- Retention policy: drop raw data older than plan retention (90 days for Pro)
-- Applied per-org via the worker, not a global policy
-- Business tier gets 1 year; continuous aggregates survive raw data deletion

-- Continuous aggregate: hourly rollup
CREATE MATERIALIZED VIEW hourly_cost_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', timestamp) AS bucket,
    org_id,
    feature_tag,
    team_tag,
    model_used,
    provider,
    COUNT(*) AS request_count,
    SUM(input_tokens)::BIGINT AS total_input_tokens,
    SUM(output_tokens)::BIGINT AS total_output_tokens,
    SUM(cost_actual) AS total_cost,
    SUM(cost_saved) AS total_saved,
    AVG(latency_ms)::INTEGER AS avg_latency_ms,
    MAX(latency_ms) AS max_latency_ms
FROM request_events
GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider
WITH NO DATA;

-- Refresh policy: keep hourly aggregates up to date
SELECT add_continuous_aggregate_policy('hourly_cost_summary',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);

-- Daily rollup (for long-range dashboard views)
CREATE MATERIALIZED VIEW daily_cost_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', timestamp) AS bucket,
    org_id,
    feature_tag,
    team_tag,
    model_used,
    provider,
    COUNT(*) AS request_count,
    SUM(cost_actual) AS total_cost,
    SUM(cost_saved) AS total_saved
FROM request_events
GROUP BY bucket, org_id, feature_tag, team_tag, model_used, provider
WITH NO DATA;

SELECT add_continuous_aggregate_policy('daily_cost_summary',
    start_offset => INTERVAL '3 days',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 day'
);

3.2 Data Flow Diagram

flowchart LR
    subgraph Client
        APP[Application]
    end

    subgraph Proxy["Proxy Engine"]
        AUTH[Auth Check]
        PARSE[Parse Request]
        ROUTE[Router Brain]
        DISPATCH[Provider Dispatch]
        TEL[Telemetry Emitter]
    end

    subgraph Async["Async Pipeline"]
        CHAN[mpsc Channel]
        BATCH[Batch Collector]
    end

    subgraph Storage
        REDIS[(Redis)]
        TSDB[(TimescaleDB)]
        PG[(PostgreSQL)]
    end

    subgraph Aggregation
        HOURLY[Hourly Aggregate]
        DAILY[Daily Aggregate]
    end

    subgraph Consumers
        DASH[Dashboard API]
        DIGEST[Weekly Digest Worker]
        ALERTS[Alert Evaluator]
    end

    APP -->|1. HTTPS request| AUTH
    AUTH -->|key lookup| REDIS
    REDIS -.->|cache miss| PG
    AUTH --> PARSE
    PARSE --> ROUTE
    ROUTE -->|rules from memory| ROUTE
    ROUTE --> DISPATCH
    DISPATCH -->|2. to LLM provider| LLM[OpenAI / Anthropic]
    LLM -->|3. response| DISPATCH
    DISPATCH -->|4. response to client| APP
    DISPATCH --> TEL
    TEL -->|fire & forget| CHAN
    CHAN --> BATCH
    BATCH -->|COPY bulk insert| TSDB
    TSDB --> HOURLY
    TSDB --> DAILY
    HOURLY --> DASH
    DAILY --> DASH
    HOURLY --> DIGEST
    HOURLY --> ALERTS
    ALERTS -->|Slack / Email| EXT[External Notifications]

3.3 Storage Strategy

  • Hot: raw request events (last 7 days). TimescaleDB, uncompressed chunks. Retention: 7 days uncompressed. No compression — fast queries.
  • Warm: raw request events (older than 7 days). TimescaleDB, compressed chunks. Retention: up to 90 days (Pro) / 365 days (Business). TimescaleDB native compression (~90% reduction).
  • Cold: continuous aggregates (hourly/daily). TimescaleDB materialized views. Retention: indefinite (survives raw data deletion). Inherently compact (aggregated).
  • Config: orgs, keys, rules, users. PostgreSQL. Retention: indefinite.
  • Ephemeral: auth sessions, rate limits, cache. Redis. TTL-based (15 min to 24 hr).

Storage estimates at scale:

Scale  Requests/Day  Raw Event Size  Daily Raw Storage  Monthly (compressed)
1K     1,000         ~500 bytes/row  ~0.5 MB            ~1.5 MB
10K    10,000        ~500 bytes/row  ~5 MB              ~15 MB
100K   100,000       ~500 bytes/row  ~50 MB             ~150 MB

At 100K requests/day with 90-day retention: ~350 MB hot (7 uncompressed days at ~50 MB/day) plus ~420 MB warm (83 compressed days at ~90% reduction), under 1 GB total. A db.t4g.small with 20GB gp3 storage handles this trivially. Storage is not a concern at V1 scale.

3.4 Privacy & Data Handling

This is the section that matters most for trust. The proxy sits in the middle of every LLM request. Customers need to know exactly what we see, store, and can access.

What the proxy sees (in memory, during request processing):

| Data | Seen | Stored | Purpose |
|---|---|---|---|
| Full prompt content (system + user messages) | Yes — in memory during routing | No — never persisted | Complexity classification reads the system prompt to detect task patterns |
| Full response content | Yes — streamed through | No — never persisted | Token counting on stream completion |
| Model name (requested + used) | Yes | Yes | Core telemetry |
| Token counts (input + output) | Yes | Yes | Cost calculation |
| Customer's LLM API keys | Yes — decrypted in memory for provider dispatch | Encrypted at rest (AES-256-GCM) | Forwarding requests to providers |
| dd0c API key | Yes — hash compared | Hash only (SHA-256) | Authentication |
| Request tags (feature, team) | Yes | Yes | Attribution |
| IP address | Yes | No | Rate limiting only |
| Latency, status code | Yes | Yes | Performance telemetry |

Critical privacy guarantees:

  1. Prompt content is NEVER stored. Not in the database. Not in logs. Not in error reports. The proxy processes prompts in memory and discards them. This is the #1 trust requirement.
  2. Customer LLM API keys are encrypted at rest using AES-256-GCM with a per-org encryption key derived from AWS KMS. The proxy decrypts them in memory only for the duration of the provider request.
  3. Telemetry contains metadata, not content. We store: "this request used 1,247 input tokens on gpt-4o-mini and cost $0.0002." We do NOT store: "the user asked about quarterly revenue projections for Q3."
  4. No cross-org data leakage. Every query is scoped by org_id. TimescaleDB chunks are segmented by org_id for compression. There is no query path that returns data from multiple orgs.

V1.5 enhancement — client-side classification:

For customers who can't accept prompt content transiting through a third-party proxy (Jordan's VPC requirement), V1.5 ships a lightweight WASM classifier that runs client-side. The proxy receives only the routing hint (complexity: LOW) and the encrypted request body, which it forwards to the provider without inspection. Telemetry still flows to the dashboard, but prompt content never leaves the customer's infrastructure.


Section 4: INFRASTRUCTURE

4.1 AWS Architecture

Single region: us-east-1 (Virginia). Lowest latency to OpenAI and Anthropic API endpoints (both hosted in US East). Multi-region is a V2 concern — the beachhead is US startups.

graph TB
    subgraph Internet
        CLIENT[Client Apps]
        USER[Dashboard Users]
    end

    subgraph AWS["AWS us-east-1"]
        subgraph Edge["Edge Layer"]
            CF[CloudFront CDN<br/>Dashboard UI + API cache]
            ALB[Application Load Balancer<br/>TLS termination, path routing]
        end

        subgraph Compute["ECS Fargate Cluster"]
            SVC_PROXY[Service: dd0c-proxy<br/>2-10 tasks, 0.25 vCPU / 512MB]
            SVC_API[Service: dd0c-api<br/>1-3 tasks, 0.25 vCPU / 512MB]
            SVC_WORKER[Service: dd0c-worker<br/>1 task, 0.25 vCPU / 512MB]
        end

        subgraph Data["Data Layer (Private Subnets)"]
            RDS_PG[RDS PostgreSQL 16<br/>db.t4g.micro, 20GB gp3<br/>Config Store]
            RDS_TS[RDS PostgreSQL 16 + TimescaleDB<br/>db.t4g.small, 50GB gp3<br/>Telemetry Store]
            ELASTICACHE[ElastiCache Redis 7<br/>cache.t4g.micro<br/>Cache + Rate Limits]
        end

        subgraph Security["Security"]
            KMS[KMS<br/>Encryption keys]
            SM[Secrets Manager<br/>DB creds, signing keys]
            WAF[WAF v2<br/>Rate limiting, geo-blocking]
        end

        subgraph Ops["Operations"]
            CW[CloudWatch<br/>Logs + Metrics + Alarms]
            ECR[ECR<br/>Container Registry]
            S3_UI[S3 Bucket<br/>Dashboard static assets]
            SES_SVC[SES<br/>Digest emails]
        end
    end

    CLIENT -->|HTTPS :443| ALB
    USER -->|HTTPS| CF
    CF --> S3_UI
    CF --> ALB
    ALB -->|/v1/*| SVC_PROXY
    ALB -->|/api/*| SVC_API
    SVC_PROXY --> RDS_TS
    SVC_PROXY --> ELASTICACHE
    SVC_PROXY --> KMS
    SVC_API --> RDS_PG
    SVC_API --> RDS_TS
    SVC_API --> ELASTICACHE
    SVC_WORKER --> RDS_TS
    SVC_WORKER --> SES_SVC
    SVC_WORKER --> SM

Network topology:

  • VPC with 2 AZs (cost-conscious — 3 AZs is overkill for V1)
  • Public subnets: ALB only
  • Private subnets: ECS tasks, RDS, ElastiCache
  • NAT Gateway: 1 (not 2 — single NAT saves ~$32/month; acceptable risk for V1)
  • VPC endpoints for ECR, S3, CloudWatch, KMS (avoid NAT charges for AWS service traffic)

ALB routing rules:

| Path Pattern | Target Group | Notes |
|---|---|---|
| /v1/chat/completions | dd0c-proxy | OpenAI-compatible proxy endpoint |
| /v1/completions | dd0c-proxy | Legacy completions |
| /v1/embeddings | dd0c-proxy | Embedding passthrough (no routing — just telemetry) |
| /api/* | dd0c-api | Dashboard REST API |
| /* (default) | 404 fixed response | Reject unknown paths |

Dashboard UI is served from S3 via CloudFront — never hits the ALB.

4.2 Cost Estimate

Real numbers. No hand-waving.

At 1K requests/day (~$129/month infrastructure)

| Service | Spec | Monthly Cost |
|---|---|---|
| ECS Fargate (proxy) | 2 tasks × 0.25 vCPU × 512MB × 730hrs | $14.60 |
| ECS Fargate (api) | 1 task × 0.25 vCPU × 512MB × 730hrs | $7.30 |
| ECS Fargate (worker) | 1 task × 0.25 vCPU × 512MB × 730hrs | $7.30 |
| RDS PostgreSQL | db.t4g.micro, 20GB gp3, single-AZ | $12.41 |
| RDS TimescaleDB | db.t4g.small, 50GB gp3, single-AZ | $24.82 |
| ElastiCache Redis | cache.t4g.micro, single-AZ | $8.35 |
| ALB | 1 ALB + minimal LCUs | $16.20 |
| NAT Gateway | 1 gateway + ~5GB data | $33.48 |
| CloudFront | <1GB transfer | $0.00 (free tier) |
| S3 | <1GB static assets | $0.02 |
| SES | <1000 emails/month | $0.10 |
| KMS | 1 key + ~10K requests | $1.03 |
| CloudWatch | Logs + basic metrics | $3.00 |
| Total | | ~$129/month |

Optimization note: The NAT Gateway at $33/month is the biggest single line item. Alternative: replace with a NAT instance on a t4g.nano ($3/month) or use VPC endpoints aggressively to eliminate NAT for AWS service traffic. With VPC endpoints for ECR/S3/CW/KMS, the only NAT traffic is outbound to LLM providers — which could go through a public subnet proxy task instead. Realistic optimized cost: ~$95/month.

At 10K requests/day (~$155/month infrastructure)

| Change from 1K | Impact |
|---|---|
| Proxy scales to 3-4 tasks | +$15-22 |
| TimescaleDB storage grows to ~15MB/month compressed | Negligible |
| ALB LCU usage increases | +$5 |
| SES volume increases (more digest recipients) | +$1 |
| Total | ~$155/month |

At 100K requests/day (~$320/month infrastructure)

| Change from 10K | Impact |
|---|---|
| Proxy scales to 6-10 tasks | +$45-75 |
| API scales to 2-3 tasks | +$7-15 |
| TimescaleDB upgrade to db.t4g.medium (more IOPS) | +$25 |
| ElastiCache upgrade to cache.t4g.small | +$8 |
| ALB LCU usage | +$15 |
| NAT data transfer (~50GB/month) | +$25 |
| Total | ~$320/month |

Gross margin at each scale:

| Scale | Requests/Day | Est. Customers | Est. MRR | Infra Cost | Gross Margin |
|---|---|---|---|---|---|
| 1K | 1,000 | 5-10 | $375-750 | $129 | 66-83% |
| 10K | 10,000 | 50-100 | $3,750-7,500 | $155 | 96-98% |
| 100K | 100,000 | 200-500 | $15,000-37,500 | $320 | 98-99% |

The unit economics are absurd. Near-zero marginal cost per customer. This is the beauty of a proxy — it adds almost no compute to the request path.
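The margin column above is just (MRR − infra cost) / MRR; for example, at the 1K tier:

```rust
// Gross margin = (MRR - infra cost) / MRR, matching the table above.
fn gross_margin_pct(mrr: f64, infra: f64) -> f64 {
    (mrr - infra) / mrr * 100.0
}

fn main() {
    // 1K req/day tier: $375-750 MRR against ~$129/month infrastructure
    println!("{:.0}%", gross_margin_pct(375.0, 129.0)); // low end: 66%
    println!("{:.0}%", gross_margin_pct(750.0, 129.0)); // high end: 83%
}
```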

4.3 Scaling Strategy

Proxy horizontal scaling (the only thing that needs to scale):

ECS Service Auto Scaling with two policies:

  1. Target tracking: CPU utilization target 60%. Scale out when sustained above 60%, scale in when below 40%.
  2. Step scaling: Request count per target (from ALB). Scale out aggressively at >500 req/min/task.

Min tasks: 2 (availability). Max tasks: 20 (cost cap — revisit at $10K MRR).

Database scaling:

TimescaleDB is the bottleneck candidate. Scaling path:

  1. V1 (1K-10K req/day): db.t4g.small, single-AZ. Continuous aggregates handle dashboard query load.
  2. V1.5 (10K-100K req/day): db.t4g.medium, add a read replica for dashboard API queries. Proxy writes to primary, API reads from replica.
  3. V2 (100K+ req/day): If TimescaleDB hits limits, evaluate:
    • Upgrade to db.r6g.large (more memory for hot data)
    • Or migrate telemetry to ClickHouse (better for high-cardinality analytics at scale)
    • Decision point: when continuous aggregate refresh takes >5 minutes

PostgreSQL (config store) stays on db.t4g.micro indefinitely. Config data is tiny.

Redis scaling:

cache.t4g.micro handles ~12K ops/sec. At 100K requests/day (~1.2 req/sec average, ~10 req/sec peak), Redis is at <0.1% capacity. Redis is not a scaling concern until 1M+ requests/day.

4.4 CI/CD Pipeline

GitHub Actions. No Jenkins. No CodePipeline. Keep it simple.

# .github/workflows/deploy.yml (simplified)
name: Build & Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo test --workspace
      - run: cargo clippy --workspace -- -D warnings
      - run: cargo fmt --check

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - uses: aws-actions/amazon-ecr-login@v2
      - run: |
          docker build -t dd0c-proxy -f crates/proxy/Dockerfile .
          docker build -t dd0c-api -f crates/api/Dockerfile .
          docker build -t dd0c-worker -f crates/worker/Dockerfile .
          # Tag and push to ECR. The `latest` tag is moved as well so that
          # `--force-new-deployment` in the deploy job picks up the new image;
          # the SHA tag exists for pinned rollbacks.
          for svc in proxy api worker; do
            docker tag dd0c-$svc $ECR_REGISTRY/dd0c-$svc:$GITHUB_SHA
            docker tag dd0c-$svc $ECR_REGISTRY/dd0c-$svc:latest
            docker push $ECR_REGISTRY/dd0c-$svc:$GITHUB_SHA
            docker push $ECR_REGISTRY/dd0c-$svc:latest
          done

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - run: |
          # --force-new-deployment redeploys the image tag the task definition
          # already references; for exact rollbacks, register a task-definition
          # revision pinned to a specific SHA tag instead.
          for svc in proxy api worker; do
            aws ecs update-service \
              --cluster dd0c-prod \
              --service dd0c-$svc \
              --force-new-deployment
          done

  deploy-ui:
    needs: test
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ui
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE }}
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci && npm run build
      - run: |
          aws s3 sync dist/ s3://dd0c-dashboard-ui/ --delete
          aws cloudfront create-invalidation --distribution-id $CF_DIST_ID --paths "/*"

Deployment strategy: Rolling update via ECS (default). No blue/green for V1 — adds complexity. The proxy is stateless; rolling updates cause zero downtime. If a bad deploy ships, aws ecs update-service --force-new-deployment with the previous image SHA rolls back in <2 minutes.

Database migrations: sqlx migrate run executed as a pre-deploy step in the API container's entrypoint. Migrations are forward-only, backward-compatible (add columns, don't rename/drop). This means the old code can run against the new schema during rolling deploys.

4.5 Monitoring & Alerting

Eat your own dog food: dd0c/route monitors itself. If any future dd0c features make LLM calls, they route through the same routing engine customers use, so the platform's own provider traffic shows up in its own dashboard.

CloudWatch metrics (custom + built-in):

| Metric | Source | Alarm Threshold |
|---|---|---|
| dd0c.proxy.request_count | Proxy (StatsD → CW) | N/A (dashboard only) |
| dd0c.proxy.latency_p99 | Proxy | >50ms for 5 minutes |
| dd0c.proxy.error_rate | Proxy | >5% for 3 minutes |
| dd0c.proxy.provider_error_rate | Proxy (per provider) | >10% for 2 minutes |
| dd0c.proxy.circuit_breaker_open | Proxy | Any open → alert |
| dd0c.telemetry.batch_lag | Proxy | >1000 events queued |
| ECS CPU/Memory | CloudWatch built-in | CPU >80% sustained 5min |
| RDS CPU/Connections/IOPS | CloudWatch built-in | CPU >70%, connections >80% of max |
| ALB 5xx rate | CloudWatch built-in | >1% for 3 minutes |
| ALB target response time | CloudWatch built-in | p99 >200ms for 5 minutes |

Alerting channels:

| Severity | Channel | Response |
|---|---|---|
| P0 (proxy down, >5% error rate) | PagerDuty → phone call | Wake up Brian |
| P1 (high latency, circuit breaker, DB issues) | Slack #dd0c-alerts | Check within 1 hour |
| P2 (capacity warnings, cost anomalies) | Email digest | Review next morning |

Structured logging:

All services emit JSON logs to CloudWatch Logs:

{
  "timestamp": "2026-03-15T14:22:33.456Z",
  "level": "info",
  "service": "proxy",
  "trace_id": "abc123",
  "org_id": "org_456",
  "event": "request_routed",
  "model_requested": "gpt-4o",
  "model_used": "gpt-4o-mini",
  "latency_ms": 3,
  "cost_saved": 0.0018
}

No prompt content in logs. Ever. The tracing crate with custom Layer implementation strips any field named prompt, messages, content, or system before emission. Defense in depth — even if a developer accidentally logs request content, the layer redacts it.
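A minimal sketch of the redaction idea, using a toy value type in place of real `tracing` fields (the sensitive-field list mirrors the one above; the actual Layer implementation against the `tracing` crate differs):

```rust
use std::collections::BTreeMap;

// Minimal stand-in for a structured log value; the real code would walk
// `tracing` span/event fields (or a serde_json::Value) instead.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

// Field names that must never reach CloudWatch, per the policy above.
const SENSITIVE: &[&str] = &["prompt", "messages", "content", "system"];

// Recursively replace sensitive fields before the log line is emitted.
// Defense in depth: runs even on fields a developer logs by accident.
fn redact(v: Value) -> Value {
    match v {
        Value::Map(m) => Value::Map(
            m.into_iter()
                .map(|(k, val)| {
                    if SENSITIVE.contains(&k.as_str()) {
                        (k, Value::Str("[REDACTED]".into()))
                    } else {
                        (k, redact(val)) // recurse into nested maps
                    }
                })
                .collect(),
        ),
        other => other,
    }
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("event".to_string(), Value::Str("request_routed".into()));
    fields.insert("prompt".to_string(), Value::Str("quarterly revenue...".into()));
    println!("{:?}", redact(Value::Map(fields)));
}
```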

Uptime monitoring: External health check via UptimeRobot (free tier, 5-minute intervals) hitting GET /health on the ALB. If the proxy is unreachable from the internet, Brian gets a text.

Solo founder operational reality:

Brian can realistically monitor:

  • 1 Slack channel (#dd0c-alerts) — glance at it 3x/day
  • 1 PagerDuty rotation — himself, 24/7 (this is the solo founder life)
  • 1 CloudWatch dashboard — check it during weekly review
  • UptimeRobot — set it and forget it

Everything else must be automated. No manual log tailing. No daily metric reviews. Alerts fire when something is wrong. Silence means everything is fine.


Section 5: SECURITY

5.1 API Key Management — The Trust Problem

This is the #1 adoption barrier. Customers must give dd0c/route their OpenAI/Anthropic API keys so the proxy can forward requests. If they don't trust us with their keys, the product is dead.

How customer LLM API keys are handled:

Customer enters API key in dashboard
    │
    ├─ 1. Key transmitted over TLS 1.3 (HTTPS only, HSTS enforced)
    ├─ 2. API server receives key in memory
    ├─ 3. Key encrypted with AES-256-GCM using org-specific DEK
    │     DEK (Data Encryption Key) is itself encrypted by AWS KMS CMK
    │     Envelope encryption: KMS never sees the API key
    ├─ 4. Encrypted key stored in PostgreSQL (provider_credentials.encrypted_key)
    ├─ 5. Plaintext key zeroed from memory (Rust: zeroize crate)
    │
    └─ At request time:
         ├─ Proxy fetches encrypted key from PG (cached in Redis, encrypted, 5min TTL)
         ├─ Decrypts with DEK (DEK cached in proxy memory, rotated hourly)
         ├─ Uses plaintext key for provider API call
         └─ Plaintext key held only for request duration, then dropped
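To illustrate only the envelope structure, the following toy substitutes XOR for AES-256-GCM and fakes the KMS calls. Nothing here is real cryptography; it exists to show why KMS never sees the customer's API key:

```rust
// TOY ONLY: XOR stands in for AES-256-GCM, and kms_wrap/kms_unwrap stand
// in for AWS KMS Encrypt/Decrypt. Never use XOR for real encryption.
fn xor_cipher(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

// "KMS" wraps the DEK under a master key that never leaves KMS.
fn kms_wrap(dek: &[u8], master: &[u8]) -> Vec<u8> {
    xor_cipher(dek, master)
}
fn kms_unwrap(wrapped: &[u8], master: &[u8]) -> Vec<u8> {
    xor_cipher(wrapped, master)
}

fn main() {
    let master = b"kms-master-key";           // exists only inside KMS
    let dek = b"per-org-data-encryption-key"; // generated per org
    let api_key = b"sk-proj-abc123";          // customer's provider key

    // At rest: ciphertext + wrapped DEK are stored. KMS only ever
    // handles the DEK, never the API key itself.
    let ciphertext = xor_cipher(api_key, dek);
    let wrapped_dek = kms_wrap(dek, master);

    // At request time: unwrap the DEK, decrypt, use, then drop
    // (the real code zeroizes the plaintext buffer).
    let dek_again = kms_unwrap(&wrapped_dek, master);
    let plaintext = xor_cipher(&ciphertext, &dek_again);
    assert_eq!(plaintext, api_key);
    println!("envelope roundtrip ok");
}
```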

Key security properties:

| Property | Implementation |
|---|---|
| Encryption at rest | AES-256-GCM, envelope encryption via AWS KMS |
| Encryption in transit | TLS 1.3 (ALB terminates, internal traffic in VPC) |
| Key isolation | Per-org DEK — compromising one org's DEK doesn't expose others |
| Key rotation | KMS CMK auto-rotates annually. DEKs can be rotated per-org on demand. |
| Access logging | Every KMS Decrypt call logged in CloudTrail. Anomalous decryption patterns trigger alerts. |
| Zero-knowledge option (V1.5) | Customer runs proxy in their VPC. Keys never leave their infrastructure. dd0c SaaS only receives telemetry. |
| Key revocation | Customer can delete their provider credentials from the dashboard instantly. Cached copies expire within 5 minutes (Redis TTL). |

Trust mitigation strategy (layered):

  1. Transparency: Open-source the proxy core. Customers can read every line of code that touches their API keys. "Don't trust us — read the code."
  2. Minimization: The proxy only needs the key for the duration of the API call. It doesn't store it in logs, doesn't include it in telemetry, doesn't transmit it anywhere except to the LLM provider.
  3. Bring-your-own-proxy (V1.5): For customers who won't send keys to a third party, ship a Docker image they run in their VPC. The proxy connects outbound to dd0c SaaS for config and sends telemetry. Keys never leave the customer's network.
  4. Audit trail: Every API key usage is logged (not the key itself — the key_id and timestamp). Customers can see when their keys were last used in the dashboard.
  5. Insurance: If a key is compromised through dd0c, we'll cover the cost of any unauthorized API usage. (This is a marketing commitment, not a legal one — but it signals confidence.)

5.2 Authentication & Authorization Model

Three auth contexts:

| Context | Method | Token Type | Lifetime |
|---|---|---|---|
| Dashboard UI | GitHub OAuth → JWT | Access token (short) + Refresh token | 15min / 7 days |
| Proxy API | dd0c API key | Bearer token (hashed, never expires unless revoked) | Until revoked |
| Dashboard API (programmatic) | dd0c API key | Same as proxy | Until revoked |

GitHub OAuth flow:

Browser → /api/auth/github → redirect to GitHub
GitHub → /api/auth/callback?code=xxx
    │
    ├─ Exchange code for GitHub access token
    ├─ Fetch GitHub user profile (id, login, email, avatar)
    ├─ Upsert user in PostgreSQL
    ├─ Issue JWT access token (15min, signed with RS256)
    ├─ Issue refresh token (7 days, stored in Redis, httpOnly cookie)
    └─ Redirect to dashboard with access token

Authorization model (V1 — simple RBAC):

| Role | Permissions |
|---|---|
| Owner | Everything. Billing. Delete org. Manage members. |
| Admin | Manage routing rules, API keys, alerts. View all data. Cannot delete org or manage billing. |
| Member | View dashboard, view request inspector. Cannot modify config. |

V1 ships with Owner + Member only. Admin role added when the first customer asks for it.

API key format:

dd0c_sk_live_a3f2b8c9d4e5f6a7b8c9d4e5f6a7b8c9

Prefix: dd0c_sk_
Environment: live_ or test_
Random: 32 hex chars (128 bits of entropy)

The full key is shown once at creation. Only the SHA-256 hash is stored. The prefix (dd0c_sk_live_a3f2...) is stored for display in the dashboard ("Which key is this?").
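A sketch of how the proxy might parse and display keys in this format (function names are illustrative; the stored-hash lookup itself is omitted):

```rust
// Sketch of parsing/display for the dd0c key format described above.
#[derive(Debug, PartialEq)]
enum Env {
    Live,
    Test,
}

// Validate the structure: dd0c_sk_ + live_/test_ + 32 hex chars.
fn parse_key(key: &str) -> Option<(Env, &str)> {
    let rest = key.strip_prefix("dd0c_sk_")?;
    let (env, random) = if let Some(r) = rest.strip_prefix("live_") {
        (Env::Live, r)
    } else if let Some(r) = rest.strip_prefix("test_") {
        (Env::Test, r)
    } else {
        return None;
    };
    // 32 hex chars = 128 bits of entropy
    if random.len() == 32 && random.chars().all(|c| c.is_ascii_hexdigit()) {
        Some((env, random))
    } else {
        None
    }
}

// Short prefix stored for display in the dashboard ("Which key is this?").
fn display_prefix(key: &str) -> String {
    format!("{}...", &key[..key.len().min(17)])
}

fn main() {
    let key = "dd0c_sk_live_a3f2b8c9d4e5f6a7b8c9d4e5f6a7b8c9";
    println!("{:?} -> {}", parse_key(key), display_prefix(key));
}
```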

5.3 Data Encryption

| Layer | Method | Key Management |
|---|---|---|
| In transit (client → ALB) | TLS 1.3 via ACM certificate | AWS Certificate Manager auto-renewal |
| In transit (ALB → ECS) | TLS 1.2+ (ALB → target group HTTPS) | Self-signed certs in containers, rotated on deploy |
| In transit (ECS → RDS) | TLS 1.2 (RDS require_ssl) | RDS CA certificate |
| In transit (ECS → ElastiCache) | TLS 1.2 (in-transit encryption enabled) | ElastiCache managed |
| At rest (RDS) | AES-256 via RDS encryption | AWS KMS (RDS default key) |
| At rest (provider API keys) | AES-256-GCM application-level | AWS KMS CMK (dd0c-managed) |
| At rest (S3) | AES-256 (SSE-S3) | AWS managed |
| At rest (CloudWatch Logs) | AES-256 | AWS KMS (CW default key) |

5.4 SOC 2 Readiness

SOC 2 Type II is a V3 milestone (month 7-12). But V1 architecture decisions should not create SOC 2 blockers.

V1 decisions that are SOC 2 forward-compatible:

| SOC 2 Requirement | V1 Implementation |
|---|---|
| Access control | GitHub OAuth + RBAC. No shared accounts. |
| Audit logging | CloudTrail for AWS API calls. Application-level audit log for config changes (who changed what routing rule, when). |
| Encryption | All data encrypted in transit and at rest (see 5.3). |
| Change management | GitHub PRs required for main branch. CI/CD pipeline enforces tests. |
| Incident response | PagerDuty alerting. Documented runbook (even if it's just a README). |
| Vendor management | Only AWS + GitHub + Stripe as vendors. All SOC 2 certified themselves. |
| Data retention | Configurable per plan. Deletion is automated via TimescaleDB retention policies. |
| Availability | Multi-AZ ALB. ECS tasks across 2 AZs. RDS single-AZ (upgrade to multi-AZ for SOC 2). |

SOC 2 blockers to address before certification:

  1. RDS must be multi-AZ (adds ~$25/month per instance)
  2. Formal security policy documentation
  3. Background checks for employees (just Brian for now — easy)
  4. Penetration test (budget ~$5K)
  5. Auditor engagement (~$20-30K for Type II)

Total SOC 2 cost: ~$30-40K. Only pursue at $10K+ MRR when enterprise customers demand it.

5.5 Trust Barrier Mitigation (The #1 Risk)

The product brief identifies trust as the highest-severity risk. Here's the technical architecture's answer:

Phase 1 (V1 launch): Transparency + Beachhead

  • Open-source the proxy core on GitHub. MIT license.
  • Publish a security whitepaper: "How dd0c/route handles your API keys" — detailed, technical, honest.
  • Target startups without compliance teams. They evaluate tools by reading code, not requesting SOC 2 reports.
  • Shadow Audit mode proves value without requiring key trust. Convert skeptics with their own savings data.

Phase 2 (V1.5, month 4-5): Self-Hosted Data Plane

  • Ship dd0c-proxy as a Docker image customers run in their own VPC/infrastructure.
  • The proxy connects outbound to api.route.dd0c.dev for:
    • Routing rule configuration (pull)
    • Telemetry data (push — metadata only, no prompt content)
    • Cost table updates (pull)
  • Customer's LLM API keys stay in their infrastructure. Period.
  • dd0c SaaS provides the dashboard, digest, and analytics. The proxy is the customer's.

Phase 3 (V2+): Compliance Certifications

  • SOC 2 Type II
  • GDPR DPA (Data Processing Agreement)
  • Optional: HIPAA BAA for healthcare vertical

The architecture is designed so that Phase 2 is a deployment topology change, not a rewrite. The proxy binary is the same — it just reads config from a different source (local file vs. API) and sends telemetry to a different endpoint (local collector vs. SaaS).


Section 6: MVP SCOPE

6.1 What Ships in V1 (4-6 Week Build)

The V1 is ruthlessly scoped. Every feature must answer: "Does this help a customer save money on LLM calls within 5 minutes of signup?"

Week 1-2: Proxy Core

| Deliverable | Details | Done When |
|---|---|---|
| OpenAI-compatible proxy | POST /v1/chat/completions with streaming support | A client can swap api.openai.com for proxy.route.dd0c.dev and get identical responses |
| Auth layer | dd0c API key validation (Redis-cached hash lookup) | Unauthorized requests get 401. Valid keys route correctly. |
| Provider dispatch | OpenAI + Anthropic providers with connection pooling | Requests forward to the correct provider with <5ms overhead |
| Telemetry emission | Async batch insert to TimescaleDB | Every request produces a request_event row within 2 seconds |
| Health endpoint | GET /health returns 200 with version + uptime | ALB health checks pass |

Week 2-3: Router Brain + Cost Engine

| Deliverable | Details | Done When |
|---|---|---|
| Heuristic complexity classifier | Token count + task pattern + model hint → LOW/MEDIUM/HIGH | Classifier runs in <2ms and agrees with human judgment ~75% of the time on a test set of 100 prompts |
| Rule engine | First-match rule evaluation with passthrough/cheapest/cascading strategies | A routing rule like "if feature=classify, use cheapest from [gpt-4o-mini, claude-haiku]" works |
| Cost tables | Seeded with current OpenAI + Anthropic pricing | model_costs table populated, proxy loads into memory |
| Fallback chains | Circuit breaker per provider/model | If gpt-4o-mini returns 5xx, request automatically retries on claude-haiku |
| Response headers | X-DD0C-Model, X-DD0C-Cost, X-DD0C-Saved on every response | Client can programmatically read routing decisions |
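The heuristic classifier deliverable above (token count + task pattern → LOW/MEDIUM/HIGH) could look roughly like this. The cue lists, token heuristic, and thresholds are illustrative, not the shipped values:

```rust
// Illustrative complexity tiers, matching the LOW/MEDIUM/HIGH routing hint.
#[derive(Debug, PartialEq)]
enum Complexity {
    Low,
    Medium,
    High,
}

// Rough token estimate: ~4 characters per token for English text.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

// Heuristic: cheap keyword cues plus prompt size. Runs in microseconds,
// which is how the <2ms budget is met without an ML model.
fn classify(system_prompt: &str, user_prompt: &str) -> Complexity {
    let text = format!("{} {}", system_prompt, user_prompt).to_lowercase();
    let tokens = estimate_tokens(&text);

    let simple_cues = ["classify", "extract", "translate", "categorize"];
    let complex_cues = ["reason", "analyze", "multi-step", "write code"];

    if complex_cues.iter().any(|c| text.contains(c)) || tokens > 2000 {
        Complexity::High
    } else if simple_cues.iter().any(|c| text.contains(c)) && tokens < 500 {
        Complexity::Low
    } else {
        Complexity::Medium
    }
}

fn main() {
    let tier = classify("You are a support bot.", "Classify this ticket: refund request");
    println!("{:?}", tier);
}
```

The obvious weakness of substring cues (e.g. "improve" containing "prove") is why the table only targets ~75% agreement with human judgment; the ML classifier that replaces this is explicitly deferred to V2.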

Week 3-4: Dashboard API + UI

| Deliverable | Details | Done When |
|---|---|---|
| GitHub OAuth | Sign up / sign in with GitHub | New user can create an org and get an API key in <60 seconds |
| Cost overview page | Real-time cost ticker, 7/30-day spend chart, savings counter | Marcus sees "You saved $X this week" on the dashboard |
| Cost treemap | Spend breakdown by feature tag, team tag, model | Marcus can identify which feature is the most expensive |
| Request inspector | Paginated table of recent requests with model, cost, routing decision | Marcus can drill into individual requests to understand routing |
| Routing config UI | CRUD for routing rules with drag-to-reorder priority | Marcus can create a rule "route all classify requests to gpt-4o-mini" |
| API key management | Create/revoke dd0c API keys, add provider credentials | Marcus can set up his org without touching a CLI |

Week 4-5: Retention Mechanics

| Deliverable | Details | Done When |
|---|---|---|
| Weekly savings digest | Monday 9am email: "Last week you saved $X. Breakdown by feature/model." | Email renders correctly in Gmail/Outlook. Unsubscribe works. |
| Budget alerts | Threshold-based: "Alert me when daily spend exceeds $100" | Slack webhook fires when threshold is crossed |
| Shadow Audit CLI | npx dd0c-scan ./src scans codebase for LLM calls and estimates savings | CLI runs on a sample Node.js project and produces a plausible savings report |

Week 5-6: Hardening + Launch Prep

| Deliverable | Details | Done When |
|---|---|---|
| Rate limiting | Per-key rate limits (1000 req/min default) via Redis | Burst traffic doesn't take down the proxy |
| Error handling | Graceful degradation: if TimescaleDB is down, proxy still routes (telemetry dropped) | Proxy availability is independent of analytics availability |
| Monitoring | CloudWatch dashboards, PagerDuty alerts for P0/P1 | Brian gets woken up if the proxy is down |
| Documentation | API docs (OpenAPI spec), quickstart guide, "How we handle your keys" page | A developer can integrate in <5 minutes by reading the docs |
| Landing page | route.dd0c.dev — value prop, pricing, "Try the CLI" CTA | Visitors understand what dd0c/route does in 10 seconds |
| Infrastructure | CDK/Terraform for the full AWS stack, CI/CD pipeline | git push main deploys to production |
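The per-key rate limiting above can be sketched as a sliding window. This in-process version is illustrative; production keeps the window in Redis so the limit is shared across all proxy tasks:

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

// Sliding-window limiter: allow at most `max_requests` per key per `window`.
struct RateLimiter {
    window: Duration,
    max_requests: usize,
    hits: HashMap<String, VecDeque<Instant>>,
}

impl RateLimiter {
    fn new(window: Duration, max_requests: usize) -> Self {
        Self { window, max_requests, hits: HashMap::new() }
    }

    // Returns true if the request is allowed; false maps to 429 DD0C_RATE_001.
    fn check(&mut self, key: &str, now: Instant) -> bool {
        let q = self.hits.entry(key.to_string()).or_default();
        // Evict timestamps that have aged out of the window.
        while q.front().map_or(false, |t| now.duration_since(*t) >= self.window) {
            q.pop_front();
        }
        if q.len() < self.max_requests {
            q.push_back(now);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Default from the table above: 1000 req/min per key.
    let mut limiter = RateLimiter::new(Duration::from_secs(60), 1000);
    let allowed = limiter.check("dd0c_sk_live_a3f2", Instant::now());
    println!("allowed={}", allowed);
}
```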

6.2 What's Explicitly Deferred to V2

| Feature | Why Deferred | V2 Timeline |
|---|---|---|
| ML-based complexity classifier | Needs training data from V1 telemetry. Heuristic is good enough to prove the value prop. | Month 3-4 |
| Google/Gemini provider | Two providers cover 80%+ of the market. Adding Gemini is a weekend of work once the provider trait is proven. | Month 2-3 |
| Self-hosted proxy (BYOP) | Critical for enterprise trust, but V1 targets startups who are less paranoid. | Month 4-5 |
| WASM client-side classifier | Requires the self-hosted proxy architecture. | Month 5-6 |
| GitHub Action (PR cost comments) | Cool PLG feature but not core. Needs the CLI to be stable first. | Month 3-4 |
| VS Code extension | Same — derivative of the CLI. | Month 4-5 |
| Log ingestion (Mode B shadow audit) | Requires building a log parser for multiple formats. CLI scan is simpler and ships first. | Month 2-3 |
| Multi-region deployment | us-east-1 covers the beachhead. EU region when EU customers appear. | Month 6+ |
| SSO / SAML | Enterprise feature. GitHub OAuth is fine for startups. | Month 6+ (with SOC 2) |
| Prompt caching (semantic dedup) | Technically complex (embedding similarity). Exact-match cache in Redis is V1. Semantic cache is V2. | Month 4-5 |
| Carbon tracking | Interesting differentiator but not a V1 priority. | Month 6+ |
| Cascading try-cheap-first with quality feedback | Needs the ML classifier to evaluate response quality. V1 cascading is based on error codes only. | Month 4-5 |
| Stripe billing integration | V1 is free tier only (up to 10K requests/day). Billing ships when there are paying customers. | Month 2-3 |
| Team/seat management | V1 orgs have one owner. Multi-user orgs are a V1.5 feature. | Month 2-3 |

6.3 Technical Debt Budget

V1 will accumulate debt. That's fine. Here's what we're consciously accepting:

| Debt Item | Severity | Why It's Acceptable | Payoff Trigger |
|---|---|---|---|
| Single-AZ RDS instances | Medium | Saves ~$50/month. Acceptable downtime risk for <100 customers. | First enterprise customer or SOC 2 prep |
| No database connection pooling (PgBouncer) | Low | Direct connections are fine at <50 concurrent proxy tasks. | >50 proxy tasks or connection count warnings |
| Hardcoded cost tables (seeded, not auto-updated) | Low | Model pricing changes monthly. Manual DB update is fine at V1 scale. | When Brian forgets to update and a customer notices |
| No request body validation beyond auth | Medium | The proxy trusts that the client sends valid OpenAI-format requests. Invalid requests get a provider error, not a dd0c error. | When support tickets about confusing errors pile up |
| No end-to-end encryption tests | Medium | Unit tests + integration tests cover the critical paths. E2E is expensive to maintain for a solo founder. | First hire or first security incident |
| Monolithic continuous aggregate | Low | One hourly aggregate serves all dashboard queries. May need feature-specific aggregates at scale. | Dashboard queries exceed 500ms |
| No graceful shutdown / drain | Medium | ECS rolling update kills tasks. In-flight requests may fail. At low traffic, this is rare. | When a customer reports a failed request during deploy |
Total acceptable debt: ~2 weeks of cleanup work. Schedule a "debt sprint" at month 3 (after V1 launch stabilizes).

6.4 Solo Founder Operational Considerations

Brian is one person. The architecture must respect that constraint.

What one person can realistically operate:

| Responsibility | Time Budget | Automation |
|---|---|---|
| Incident response | <2 hrs/week (target: 0) | PagerDuty + automated restarts (ECS health checks) |
| Deploys | 1 deploy/day, <5 min each | Fully automated CI/CD. git push = deploy. |
| Database maintenance | <1 hr/week | RDS automated backups, TimescaleDB automated compression/retention |
| Cost monitoring | 15 min/week | AWS Budgets alert at $150, $200, $300 thresholds |
| Customer support | 2-4 hrs/week (at <100 customers) | GitHub Issues + email. No live chat. No phone. |
| Security patches | 1 hr/week | Dependabot for Rust crates + npm. Automated PR creation. |
| Feature development | 20-30 hrs/week | Everything else is automated so Brian can code |

Things Brian should NOT do manually:

  • SSH into servers (there are no servers — Fargate)
  • Run database queries to answer customer questions (build it into the dashboard)
  • Manually rotate secrets (KMS auto-rotation + Secrets Manager)
  • Monitor logs in real-time (alerts handle this)
  • Manually scale infrastructure (auto-scaling handles this)
  • Process refunds or billing changes (Stripe self-serve portal)

On-call reality: Brian is on-call 24/7. The architecture minimizes pages by:

  1. Making the proxy stateless and self-healing (ECS restarts failed tasks)
  2. Making telemetry failure non-fatal (proxy works without TimescaleDB)
  3. Using circuit breakers to handle provider outages automatically
  4. Setting alert thresholds high enough to avoid noise, low enough to catch real problems

If Brian gets paged more than twice a week, something is architecturally wrong and needs fixing — not more monitoring.


Section 7: API DESIGN

7.1 OpenAI-Compatible Proxy Endpoint

The proxy endpoint is a drop-in replacement for api.openai.com. Customers change one environment variable and everything works.

# Before
OPENAI_API_BASE=https://api.openai.com/v1

# After
OPENAI_API_BASE=https://proxy.route.dd0c.dev/v1

Supported endpoints (V1):

POST /v1/chat/completions

The primary endpoint. Handles both streaming and non-streaming requests.

Request (identical to OpenAI):

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify this support ticket: ..."}
  ],
  "temperature": 0.3,
  "max_tokens": 100,
  "stream": true
}

dd0c-specific request headers (all optional):

| Header | Type | Description |
|---|---|---|
| Authorization | Bearer dd0c_sk_live_... | Required. dd0c API key. |
| X-DD0C-Feature | string | Tag this request with a feature name for cost attribution. E.g., classify, summarize, chat. |
| X-DD0C-Team | string | Tag with team name. E.g., backend, ml-team, support. |
| X-DD0C-Environment | string | production, staging, development. Defaults to key's environment. |
| X-DD0C-Routing | auto \| passthrough | Override routing. passthrough = use the requested model, no routing. Default: auto. |
| X-DD0C-Budget-Id | string | Associate with a specific budget for limit enforcement. |

Response (identical to OpenAI, plus dd0c headers):

HTTP/1.1 200 OK
Content-Type: application/json
X-DD0C-Request-Id: req_a1b2c3d4e5f6
X-DD0C-Model-Requested: gpt-4o
X-DD0C-Model-Used: gpt-4o-mini
X-DD0C-Provider: openai
X-DD0C-Cost: 0.000150
X-DD0C-Cost-Without-Routing: 0.002500
X-DD0C-Saved: 0.002350
X-DD0C-Complexity: LOW
X-DD0C-Complexity-Confidence: 0.92
X-DD0C-Latency-Overhead-Ms: 3

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is a billing inquiry."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 8,
    "total_tokens": 50
  }
}

The response body is untouched — it's exactly what the LLM provider returned. dd0c metadata lives exclusively in response headers. This means existing client code that parses the response body works without modification.
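The cost headers are derived from per-million-token prices like those in the /v1/models payload. A sketch of the arithmetic, with illustrative token counts (not the ones behind the example header values above):

```rust
// Prices are per million tokens, as in the /v1/models `dd0c` block.
struct ModelCost {
    input_per_m: f64,
    output_per_m: f64,
}

// cost = prompt_tokens * input_price/1M + completion_tokens * output_price/1M
fn request_cost(c: &ModelCost, prompt_tokens: u64, completion_tokens: u64) -> f64 {
    prompt_tokens as f64 * c.input_per_m / 1e6
        + completion_tokens as f64 * c.output_per_m / 1e6
}

fn main() {
    let gpt_4o = ModelCost { input_per_m: 2.50, output_per_m: 10.00 };
    let gpt_4o_mini = ModelCost { input_per_m: 0.15, output_per_m: 0.60 };

    // A routed request: 42 prompt tokens, 8 completion tokens on gpt-4o-mini.
    let cost = request_cost(&gpt_4o_mini, 42, 8);
    // X-DD0C-Cost-Without-Routing: same tokens priced at the requested model.
    let without = request_cost(&gpt_4o, 42, 8);
    println!(
        "cost={:.6} without_routing={:.6} saved={:.6}",
        cost,
        without,
        without - cost
    );
}
```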

Streaming response:

SSE stream is passed through transparently. dd0c headers are on the initial HTTP response. The final data: [DONE] chunk is forwarded as-is.

POST /v1/completions

Legacy completions endpoint. Same routing logic applies. Included for backward compatibility with older OpenAI SDK versions.

POST /v1/embeddings

Passthrough only — no routing (embedding models aren't interchangeable like chat models). Telemetry is still captured for cost attribution.

GET /v1/models

Returns the union of models available across all configured providers for this org, enriched with dd0c cost data:

```json
{
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai",
      "dd0c": {
        "input_cost_per_m": 2.50,
        "output_cost_per_m": 10.00,
        "quality_tier": "frontier",
        "routing_eligible": true
      }
    },
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "openai",
      "dd0c": {
        "input_cost_per_m": 0.15,
        "output_cost_per_m": 0.60,
        "quality_tier": "economy",
        "routing_eligible": true
      }
    }
  ]
}
```

GET /health

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 86400,
  "providers": {
    "openai": {"status": "healthy", "latency_ms": 45},
    "anthropic": {"status": "healthy", "latency_ms": 52}
  }
}
```

Error responses:

dd0c errors use standard OpenAI error format so client SDKs handle them correctly:

```json
{
  "error": {
    "message": "Invalid dd0c API key",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "dd0c_code": "DD0C_AUTH_001"
  }
}
```

| HTTP Status | dd0c_code | Meaning |
|---|---|---|
| 401 | `DD0C_AUTH_001` | Invalid or revoked API key |
| 403 | `DD0C_AUTH_002` | API key doesn't have permission for this org |
| 429 | `DD0C_RATE_001` | dd0c rate limit exceeded (not a provider rate limit) |
| 429 | `DD0C_BUDGET_001` | Budget limit reached for this key/feature/team |
| 502 | `DD0C_PROVIDER_001` | All providers in the fallback chain returned errors |
| 503 | `DD0C_PROXY_001` | Proxy is overloaded or shutting down |

Provider errors (OpenAI 429, Anthropic 529, etc.) are passed through with original status codes and bodies, plus an X-DD0C-Provider-Error: true header so clients can distinguish dd0c errors from provider errors.
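A client-side sketch of that distinction (the helper name is hypothetical; it assumes the response body has already been decoded to JSON):

```python
def classify_error(status: int, headers: dict, body: dict) -> str:
    """Tell dd0c-originated errors apart from passed-through provider errors.

    Provider errors arrive with X-DD0C-Provider-Error: true; dd0c's own
    errors carry a dd0c_code inside the standard OpenAI error envelope.
    """
    if headers.get("X-DD0C-Provider-Error") == "true":
        return f"provider error (status {status})"
    dd0c_code = body.get("error", {}).get("dd0c_code")
    if dd0c_code:
        return f"dd0c error {dd0c_code}"
    return "ok" if status < 400 else f"unrecognized error (status {status})"
```

This matters for retry logic: a provider 429 may succeed on a different provider, while a `DD0C_BUDGET_001` will keep failing until the budget resets.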

7.2 Shadow Audit API

The Shadow Audit CLI (npx dd0c-scan) is primarily offline, but it calls two API endpoints:

GET /api/v1/pricing/current

Public endpoint (no auth required). Returns current model pricing for the CLI's savings calculations.

```json
{
  "updated_at": "2026-03-01T00:00:00Z",
  "models": [
    {
      "provider": "openai",
      "model": "gpt-4o",
      "input_cost_per_m": 2.50,
      "output_cost_per_m": 10.00,
      "quality_tier": "frontier"
    },
    {
      "provider": "openai",
      "model": "gpt-4o-mini",
      "input_cost_per_m": 0.15,
      "output_cost_per_m": 0.60,
      "quality_tier": "economy"
    }
  ]
}
```
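A rough sketch of the savings arithmetic the CLI can run against this pricing data. The helper name, request volume, and token counts below are illustrative assumptions, not dd0c-scan internals:

```python
def monthly_cost(requests, input_tokens, output_tokens, model, pricing):
    """Monthly spend for one workload: per-request token cost times volume."""
    p = pricing[model]
    per_request = (input_tokens / 1_000_000) * p["input_cost_per_m"] + (
        output_tokens / 1_000_000
    ) * p["output_cost_per_m"]
    return requests * per_request


# Pricing mirrors the endpoint's example payload.
pricing = {
    "gpt-4o": {"input_cost_per_m": 2.50, "output_cost_per_m": 10.00},
    "gpt-4o-mini": {"input_cost_per_m": 0.15, "output_cost_per_m": 0.60},
}

# Assumed workload: 1M requests/month, 150 input / 20 output tokens each.
frontier = monthly_cost(1_000_000, 150, 20, "gpt-4o", pricing)      # ~ $575
economy = monthly_cost(1_000_000, 150, 20, "gpt-4o-mini", pricing)  # ~ $34.50
saved = frontier - economy                                          # ~ $540.50
```

The CLI's real estimate additionally depends on which calls it judges routable, but the per-token math is the same shape.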

POST /api/v1/scan/report (optional, with user consent)

If the user opts in (--share-report), the CLI sends an anonymized scan summary for lead generation:

```json
{
  "email": "marcus@example.com",
  "scan_summary": {
    "total_llm_calls_found": 14,
    "models_detected": ["gpt-4o", "gpt-4"],
    "estimated_monthly_cost": 4217.00,
    "estimated_monthly_savings": 2327.00,
    "savings_percentage": 55.2,
    "language": "typescript",
    "framework": "express"
  }
}
```

No source code, no prompt content, no file paths. Just aggregate numbers for the sales funnel.

7.3 Dashboard API Endpoints

All dashboard endpoints require authentication (JWT or dd0c API key). All responses are JSON. All list endpoints support pagination via ?cursor=xxx&limit=50.

Auth

| Method | Path | Description |
|---|---|---|
| GET | `/api/auth/github` | Initiate GitHub OAuth flow |
| GET | `/api/auth/callback` | GitHub OAuth callback |
| POST | `/api/auth/refresh` | Refresh access token |
| POST | `/api/auth/logout` | Invalidate refresh token |

Organizations

| Method | Path | Description |
|---|---|---|
| POST | `/api/orgs` | Create organization |
| GET | `/api/orgs/:org_id` | Get org details |
| PATCH | `/api/orgs/:org_id` | Update org settings |
| GET | `/api/orgs/:org_id/members` | List members |
| POST | `/api/orgs/:org_id/members` | Invite member (V1.5) |

API Keys

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/keys` | List API keys (prefix + metadata only) |
| POST | `/api/orgs/:org_id/keys` | Create API key (returns the full key once) |
| DELETE | `/api/orgs/:org_id/keys/:key_id` | Revoke API key |

Provider Credentials

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/providers` | List configured providers (suffix only, never the key) |
| PUT | `/api/orgs/:org_id/providers/:provider` | Set/update provider API key |
| DELETE | `/api/orgs/:org_id/providers/:provider` | Remove provider credential |
| POST | `/api/orgs/:org_id/providers/:provider/test` | Test provider credential (makes a minimal API call) |

Dashboard (Analytics)

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/dashboard/summary` | Current period cost summary (total spend, total saved, request count) |
| GET | `/api/orgs/:org_id/dashboard/timeseries` | Cost over time. Query params: `period=7d\|30d\|90d`, `granularity=hour\|day` |
| GET | `/api/orgs/:org_id/dashboard/treemap` | Cost breakdown by feature/team/model for treemap visualization |
| GET | `/api/orgs/:org_id/dashboard/top-savings` | Top 10 features/endpoints by savings opportunity |
| GET | `/api/orgs/:org_id/dashboard/model-usage` | Model usage distribution (pie chart data) |

Example: /api/orgs/:org_id/dashboard/summary

```json
{
  "period": "7d",
  "total_requests": 42850,
  "total_cost": 127.43,
  "total_cost_without_routing": 891.20,
  "total_saved": 763.77,
  "savings_percentage": 85.7,
  "avg_latency_ms": 4.2,
  "top_model": "gpt-4o-mini",
  "top_feature": "classify",
  "cache_hit_rate": 0.12
}
```
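The derived fields in this payload appear to relate as follows — inferred from the example numbers, not a documented contract:

```python
# Fields from the example summary payload above.
total_cost = 127.43
total_cost_without_routing = 891.20

# What you were saved is the counterfactual spend minus actual spend...
total_saved = total_cost_without_routing - total_cost  # = 763.77

# ...and the percentage is savings over the counterfactual, not over spend.
savings_percentage = round(100 * total_saved / total_cost_without_routing, 1)  # = 85.7
```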

Request Inspector

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/requests` | Paginated request list. Filters: `model`, `feature`, `team`, `status`, `date_from`, `date_to`, `min_cost`, `was_routed` |
| GET | `/api/orgs/:org_id/requests/:request_id` | Single request detail (routing decision, timing breakdown) |

Example: /api/orgs/:org_id/requests?feature=classify&limit=20

```json
{
  "data": [
    {
      "id": "req_a1b2c3",
      "timestamp": "2026-03-15T14:22:33Z",
      "model_requested": "gpt-4o",
      "model_used": "gpt-4o-mini",
      "provider": "openai",
      "feature_tag": "classify",
      "input_tokens": 142,
      "output_tokens": 8,
      "cost": 0.000026,
      "cost_without_routing": 0.000435,
      "saved": 0.000409,
      "latency_ms": 245,
      "complexity": "LOW",
      "status": 200
    }
  ],
  "cursor": "eyJpZCI6InJlcV...",
  "has_more": true
}
```

Note: No prompt content in the response. Ever. The request inspector shows metadata only.
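All list endpoints share the `data`/`cursor`/`has_more` envelope, so a single pagination loop covers them. A sketch, where `fetch_page(cursor)` is a stand-in for an authenticated HTTP GET with `?cursor=...&limit=50`:

```python
def fetch_all(fetch_page):
    """Walk a cursor-paginated dd0c list endpoint to exhaustion.

    `fetch_page(cursor)` must return the documented envelope:
    {"data": [...], "cursor": "...", "has_more": bool}.
    """
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["data"])
        if not page.get("has_more"):
            return items
        cursor = page["cursor"]
```

For large date ranges, prefer narrowing with the `date_from`/`date_to` filters over paging through everything.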

Routing Rules

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/routing/rules` | List routing rules (ordered by priority) |
| POST | `/api/orgs/:org_id/routing/rules` | Create routing rule |
| PATCH | `/api/orgs/:org_id/routing/rules/:rule_id` | Update rule |
| DELETE | `/api/orgs/:org_id/routing/rules/:rule_id` | Delete rule |
| POST | `/api/orgs/:org_id/routing/rules/reorder` | Reorder rules (accepts an array of rule IDs in the new order) |
| GET | `/api/orgs/:org_id/routing/models` | List available models with current pricing |

Example: Create a routing rule

POST /api/orgs/:org_id/routing/rules

```json
{
  "name": "Route classification to economy models",
  "match_tags": {"feature": "classify"},
  "match_complexity": null,
  "strategy": "cheapest",
  "model_chain": ["gpt-4o-mini", "claude-3-haiku"],
  "daily_budget": 50.00
}
```

Alerts

| Method | Path | Description |
|---|---|---|
| GET | `/api/orgs/:org_id/alerts` | List alert configurations |
| POST | `/api/orgs/:org_id/alerts` | Create alert |
| PATCH | `/api/orgs/:org_id/alerts/:alert_id` | Update alert |
| DELETE | `/api/orgs/:org_id/alerts/:alert_id` | Delete alert |
| GET | `/api/orgs/:org_id/alerts/history` | Alert firing history |

7.4 Webhook & Notification API

V1 supports outbound webhooks for two events:

Budget Alert Webhook

Fires when a spend threshold is crossed.

```http
POST {customer_webhook_url}
Content-Type: application/json
X-DD0C-Signature: sha256=abc123...

{
  "event": "budget.threshold_reached",
  "timestamp": "2026-03-15T14:22:33Z",
  "org_id": "org_456",
  "alert": {
    "id": "alert_789",
    "name": "Daily spend limit",
    "threshold": 100.00,
    "current_spend": 102.47,
    "period": "daily"
  },
  "scope": {
    "feature": "summarize",
    "team": null
  }
}
```

Slack Integration

Native Slack webhook support (no Slack app — just incoming webhooks for V1):

```json
{
  "text": "🚨 *dd0c/route Budget Alert*\nDaily spend for `summarize` reached $102.47 (limit: $100.00)\n<https://route.dd0c.dev/dashboard|View Dashboard>"
}
```

Webhook security: All outbound webhooks include an X-DD0C-Signature header containing an HMAC-SHA256 signature of the request body, using a per-org webhook secret. Customers can verify the signature to ensure the webhook came from dd0c.
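A minimal receiver-side verification sketch, assuming the per-org webhook secret is available as a string (the helper name is illustrative; the `sha256=<hex>` header format matches the example above):

```python
import hashlib
import hmac


def verify_dd0c_signature(body: bytes, signature_header: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in
    constant time against the X-DD0C-Signature header value."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Verify against the raw bytes as received — re-serializing the parsed JSON can change whitespace or key order and break the signature.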

7.5 SDK Considerations

V1: No SDK. Use the OpenAI SDK.

The entire point of OpenAI compatibility is that customers don't need a dd0c SDK. They use the official OpenAI Python/Node/Go SDK and change the base URL. Done.

```python
# Python — using the official OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="dd0c_sk_live_a3f2b8c9...",
    base_url="https://proxy.route.dd0c.dev/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",  # dd0c may route this to a cheaper model
    messages=[{"role": "user", "content": "Classify: ..."}],
    extra_headers={
        "X-DD0C-Feature": "classify",
        "X-DD0C-Team": "backend"
    }
)

# Read routing metadata from response headers
# (requires accessing the raw httpx response)
```
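One way to reach the headers with the official Python SDK is its raw-response interface (e.g. `client.chat.completions.with_raw_response.create(...)` in openai-python v1 — check your SDK version). An SDK-agnostic helper for collecting the `X-DD0C-*` headers from any headers mapping might look like this (hypothetical helper; the best-effort numeric parsing is an assumption, not a dd0c guarantee):

```python
def parse_dd0c_headers(headers: dict) -> dict:
    """Collect X-DD0C-* response headers into a snake_case dict.

    Values that look numeric (e.g. cost, latency) are parsed; everything
    else is kept as a string.
    """
    out = {}
    for name, value in headers.items():
        if not name.lower().startswith("x-dd0c-"):
            continue
        key = name.lower().removeprefix("x-dd0c-").replace("-", "_")
        try:
            out[key] = float(value) if "." in value else int(value)
        except ValueError:
            out[key] = value
    return out
```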

```typescript
// TypeScript — using the official OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'dd0c_sk_live_a3f2b8c9...',
  baseURL: 'https://proxy.route.dd0c.dev/v1',
  defaultHeaders: {
    'X-DD0C-Feature': 'classify',
    'X-DD0C-Team': 'backend',
  },
});
```

V1.5: Thin wrapper SDK (optional convenience)

If customers want easier access to dd0c response headers and routing metadata, ship a thin wrapper:

```python
# dd0c Python SDK (V1.5) — wraps the OpenAI SDK
from dd0c import DD0CClient

client = DD0CClient(
    dd0c_key="dd0c_sk_live_...",
    # Inherits all OpenAI SDK options
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    feature="classify",  # convenience param → X-DD0C-Feature header
    team="backend",
)

# Easy access to routing metadata
print(response.dd0c.model_used)       # "gpt-4o-mini"
print(response.dd0c.cost)             # 0.000150
print(response.dd0c.saved)            # 0.002350
print(response.dd0c.complexity)       # "LOW"
```

The SDK is a convenience, not a requirement. The proxy works with any HTTP client that can set headers and parse JSON.


Appendix: Decision Log

| Decision | Options Considered | Chosen | Rationale |
|---|---|---|---|
| Proxy language | Rust, Go, Node.js | Rust | <10ms latency requirement eliminates GC languages. Rust's ownership model prevents memory leaks in a long-running proxy. |
| API language | Node.js, Python, Rust | Rust (Axum) | Single-language stack for a solo founder. Shared crate library. One build system. |
| Telemetry store | PostgreSQL, ClickHouse, TimescaleDB | TimescaleDB | "It's just Postgres" — Brian knows it. Continuous aggregates solve the dashboard query problem. Compression solves storage. |
| Config store | SQLite, DynamoDB, PostgreSQL | PostgreSQL (RDS) | Relational integrity for org/key/rule relationships. RDS is managed. Brian's home turf. |
| Cache | In-process, Memcached, Redis | Redis (ElastiCache) | Shared state across proxy instances (circuit breakers, rate limits). ElastiCache is managed. |
| Compute | Lambda, EC2, ECS Fargate | ECS Fargate | No cold starts (Lambda). No server management (EC2). The right abstraction for stateless containers. |
| Auth | Auth0, Clerk, Custom | Custom (GitHub OAuth + JWT) | ~200 lines of code. No vendor dependency. No per-MAU pricing. GitHub is where the users are. |
| UI framework | Next.js, SvelteKit, React+Vite | React + Vite | Largest ecosystem. SPA is sufficient (no SSR/SEO needed). Vite is fast. |
| Email | Resend, SendGrid, SES | AWS SES | Brian has AWS credits. $0.10/1K emails. Plain HTML digest — no template engine needed. |
| IaC | Terraform, CDK, Pulumi | CDK (TypeScript) or Terraform | Brian's choice. Both work. CDK if he wants to stay in AWS-native tooling; Terraform if he wants portability. |
| Deployment | Blue/green, Canary, Rolling | Rolling (ECS default) | Simplest. The proxy is stateless. Rolling update = zero downtime. Rollback = redeploy the previous SHA. |
| Monitoring | Datadog, Grafana Cloud, CloudWatch | CloudWatch | Already included with AWS. No additional vendor. Good enough for V1. Migrate to Grafana Cloud at $5K MRR if CloudWatch becomes limiting. |

Architecture document generated as Phase 6 of the BMad product development pipeline for dd0c/route. Next phase: Implementation planning and sprint breakdown.