dd0c: full product research pipeline - 6 products, 8 phases each
Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
product-brief, architecture, epics (incl. Epic 10 TF compliance),
test-architecture (TDD strategy)
Brand strategy and market research included.
products/01-llm-cost-router/epics/epics.md

# dd0c/route — V1 MVP Epics

This document outlines the core Epics and User Stories for the V1 MVP of dd0c/route, designed for a solo founder to implement in 1-3 day chunks per story.

---

## Epic 1: Proxy Engine

**Description:** Core Rust proxy that sits between the client application and LLM providers. Must maintain strict OpenAI API compatibility, support SSE streaming, and introduce <5ms latency overhead.

### User Stories

- **Story 1.1:** As a developer, I want to swap my `OPENAI_BASE_URL` to the proxy endpoint, so that my existing OpenAI SDK works without code changes.
- **Story 1.2:** As a developer, I want streaming support (SSE) preserved, so that my chat applications remain responsive while using the proxy.
- **Story 1.3:** As a platform engineer, I want the proxy latency overhead to be <5ms, so that intelligent routing doesn't degrade our application's user experience.
- **Story 1.4:** As a developer, I want provider errors (e.g., rate limits) to be passed through transparently, so that my app's existing error handling continues to work.

### Acceptance Criteria

- Implements `POST /v1/chat/completions` for both streaming (`stream: true`) and non-streaming requests.
- Validates the `Authorization: Bearer` header against a Redis cache (falling back to the DB).
- Successfully forwards requests to OpenAI and Anthropic, translating formats where necessary.
- Asynchronously emits telemetry events to an in-memory channel without blocking the hot path.
- P99 latency overhead is measured at <5ms.

### Estimate: 13 points

### Dependencies: None

### Technical Notes:

- Stack: Rust, `tokio`, `hyper`, `axum`.
- Use connection pooling for upstream providers to eliminate TLS handshake overhead.
- For streaming, parse only the first chunk/headers to make a routing decision, then pass the stream through. Count tokens from the final usage chunk that precedes the `[DONE]` sentinel.
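
The token-counting note above can be sketched in a few lines. This is an illustrative, stdlib-only scan for the usage figure in a final SSE chunk; the field names follow the OpenAI streaming format, but `extract_total_tokens` is a hypothetical helper, not the production parser.

```rust
// Hypothetical sketch: pull `total_tokens` out of a final SSE usage chunk
// with a plain string scan, keeping the hot path allocation-light.
fn extract_total_tokens(sse_line: &str) -> Option<u64> {
    // SSE data lines look like: data: {"usage":{"total_tokens":57,...},...}
    let payload = sse_line.strip_prefix("data: ")?;
    if payload.trim() == "[DONE]" {
        return None; // the sentinel chunk carries no usage data
    }
    let idx = payload.find("\"total_tokens\":")?;
    let rest = &payload[idx + "\"total_tokens\":".len()..];
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

fn main() {
    let chunk = r#"data: {"id":"x","usage":{"prompt_tokens":12,"completion_tokens":45,"total_tokens":57}}"#;
    assert_eq!(extract_total_tokens(chunk), Some(57));
    assert_eq!(extract_total_tokens("data: [DONE]"), None);
    println!("usage parsing ok");
}
```

A real implementation would parse the JSON properly (e.g., with `serde_json`), but the scan shows why the passthrough path never needs to buffer the whole stream.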

---

## Epic 2: Router Brain

**Description:** The intelligence core of dd0c/route, embedded within the proxy. It evaluates incoming requests against routing rules, classifies complexity heuristically, checks cost tables, and executes fallback chains.

### User Stories

- **Story 2.1:** As an engineering manager, I want the router to classify the complexity of requests, so that simple extraction tasks are downgraded to cheaper models.
- **Story 2.2:** As an engineering manager, I want to configure routing rules (e.g., if feature=classify -> use cheapest from [gpt-4o-mini, claude-haiku]), so that I can automatically save money on predictable workloads.
- **Story 2.3:** As an application developer, I want the router to automatically fall back to an alternative model if the primary model fails or rate-limits, so that my application remains highly available.
- **Story 2.4:** As an engineering manager, I want cost savings calculated instantly from up-to-date provider pricing, so that my dashboard data is immediately accurate.

### Acceptance Criteria

- Heuristic complexity classifier runs in <2ms based on token count, task patterns (regex on the system prompt), and model hints.
- Evaluates first-match routing rules based on request tags (`X-DD0C-Feature`, `X-DD0C-Team`).
- Executes "passthrough", "cheapest", "quality-first", and "cascading" routing strategies.
- Enforces circuit breakers on downstream providers (e.g., open the circuit if the error rate exceeds 10%).
- Calculates `cost_saved = cost_original - cost_actual` on the fly using in-memory cost tables.

### Estimate: 8 points

### Dependencies: Epic 1 (Proxy Engine)

### Technical Notes:

- Stack: Rust.
- Runs purely in-memory on the proxy hot path. No DB queries per request.
- Cost tables and routing rules are loaded at startup and refreshed by a background task every 60s.
- Use `serde_json` to inspect the `messages` array for complexity classification, but do not persist the prompt.
- Circuit breaker state is shared via Redis so all proxy instances agree on provider health.
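
The "cheapest" strategy and the on-the-fly savings formula above can be sketched minimally. The cost table maps model name to USD per 1K tokens; the prices below are placeholders, not real provider rates.

```rust
use std::collections::HashMap;

// Minimal sketch of the "cheapest" routing strategy over an in-memory
// cost table, plus the cost_saved = cost_original - cost_actual formula.
fn pick_cheapest<'a>(candidates: &[&'a str], cost_per_1k: &HashMap<&str, f64>) -> Option<&'a str> {
    candidates
        .iter()
        .filter(|m| cost_per_1k.contains_key(*m))
        .min_by(|a, b| cost_per_1k[*a].partial_cmp(&cost_per_1k[*b]).unwrap())
        .copied()
}

// Savings for one request: price delta scaled by tokens actually used.
fn cost_saved(tokens: f64, original_per_1k: f64, actual_per_1k: f64) -> f64 {
    (original_per_1k - actual_per_1k) * tokens / 1000.0
}

fn main() {
    let mut table = HashMap::new();
    table.insert("gpt-4o", 5.00);        // placeholder pricing
    table.insert("gpt-4o-mini", 0.15);
    table.insert("claude-haiku", 0.25);
    let chosen = pick_cheapest(&["gpt-4o-mini", "claude-haiku"], &table).unwrap();
    assert_eq!(chosen, "gpt-4o-mini");
    // 10K tokens routed to the cheaper model instead of the requested gpt-4o:
    let saved = cost_saved(10_000.0, table["gpt-4o"], table[chosen]);
    assert!((saved - 48.5).abs() < 1e-6);
}
```

Because the table lives in memory and is refreshed by the 60s background task, this decision adds no I/O to the hot path.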

---

## Epic 3: Analytics Pipeline

**Description:** High-throughput logging and aggregation system using TimescaleDB. Focuses on ingesting asynchronous telemetry from the Proxy Engine without blocking request processing.

### User Stories

- **Story 3.1:** As a platform engineer, I want the proxy to emit telemetry without blocking the main request thread, so that our application performance remains unaffected.
- **Story 3.2:** As an engineering manager, I want my dashboard queries to be lightning fast even with millions of rows, so that I can quickly slice and dice our AI spend.
- **Story 3.3:** As an engineering manager, I want historical telemetry to be compressed or aged out automatically, so that database storage costs remain minimal.

### Acceptance Criteria

- Proxy emits a `RequestEvent` over an in-memory `mpsc` channel via `tokio::spawn`.
- A background worker batches events and inserts them into TimescaleDB every 1s or 100 events using a bulk `COPY`.
- Continuous aggregates (`hourly_cost_summary`, `daily_cost_summary`) are created and refreshed on schedule to pre-calculate `total_cost`, `total_saved`, and `avg_latency`.
- TimescaleDB compression policies compress chunks older than 7 days by 90%+.
- The proxy must degrade gracefully if the analytics database is unavailable.

### Estimate: 8 points

### Dependencies: Epic 1 (Proxy Engine)

### Technical Notes:

- Stack: Rust (worker), PostgreSQL/TimescaleDB.
- Write the TimescaleDB migration scripts for the `request_events` hypertable and the continuous aggregates.
- Batching must be robust to worker panics and backpressure (use bounded channels so a slow or crashed worker cannot exhaust proxy memory).
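
The "flush every 1s or 100 events" rule can be sketched with a standard channel. This is an illustrative, stdlib-only batcher (`std::sync::mpsc` standing in for the `tokio` channel); `drain_batch` and the `String` event type are our own names.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Illustrative batcher for the telemetry worker: collect events until either
// `max` are buffered or the time window elapses, then hand the batch to the
// bulk writer.
fn drain_batch(rx: &Receiver<String>, max: usize, window: Duration) -> Vec<String> {
    let deadline = Instant::now() + window;
    let mut batch = Vec::with_capacity(max);
    while batch.len() < max {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) => batch.push(event),
            // Window elapsed or all senders dropped: flush what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}

fn main() {
    let (tx, rx) = channel();
    for i in 0..3 {
        tx.send(format!("event-{i}")).unwrap();
    }
    drop(tx); // sender gone: drain returns whatever is buffered
    let batch = drain_batch(&rx, 100, Duration::from_millis(50));
    assert_eq!(batch.len(), 3);
    println!("flushed {} events", batch.len());
}
```

The async version with `tokio::select!` over a ticker has the same shape; the point is that the proxy side only ever does a non-blocking send.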

---

## Epic 4: Dashboard API

**Description:** Axum REST API providing authentication, org/team management, routing rule CRUD, and data endpoints for the frontend dashboard. Focuses on frictionless developer onboarding.

### User Stories

- **Story 4.1:** As an engineering manager, I want to authenticate via GitHub OAuth, so that I can create an organization and get an API key in under 60 seconds without remembering a password.
- **Story 4.2:** As an engineering manager, I want to manage my organization's routing rules and provider API keys securely, so that dd0c/route can successfully broker requests to OpenAI and Anthropic.
- **Story 4.3:** As an engineering manager, I want an endpoint that provides my historical spend and savings summary, so that I can visualize it in the UI.
- **Story 4.4:** As a platform engineer, I want to revoke an active API key, so that compromised credentials are immediately blocked.

### Acceptance Criteria

- Implements the `/api/auth/github` OAuth flow, issuing JWTs and refresh tokens.
- Implements `/api/orgs` CRUD for managing an organization and its API keys.
- Implements `/api/dashboard/summary` and `/api/dashboard/treemap` queries hitting the TimescaleDB continuous aggregates.
- Implements `/api/requests` for the request inspector with filters (e.g., `model`, `feature`, `team`).
- Securely stores provider API keys in PostgreSQL, encrypted with an AES-256-GCM Data Encryption Key.
- Enforces an RBAC model (Owner, Member) per organization.

### Estimate: 13 points

### Dependencies: Epic 3 (Analytics Pipeline)

### Technical Notes:

- Stack: Rust (`axum`), PostgreSQL.
- Reuse the same `tokio` async stack across services to minimize context-switching for a solo founder.
- Use the `oauth2` crate for GitHub integration. JWTs are signed with RS256; refresh tokens live in Redis.
- Ensure API keys are hashed (SHA-256) before storage; raw keys are never stored.
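
The two-role RBAC model is small enough to show whole. This is a sketch; the `Action` list is illustrative, not a complete permission matrix.

```rust
// Tiny sketch of the per-org RBAC check. Only two roles exist in V1.
#[derive(PartialEq)]
enum Role { Owner, Member }

// Illustrative actions; the real set would cover every mutating endpoint.
enum Action { ViewDashboard, ManageApiKeys, ManageOrg }

fn is_allowed(role: &Role, action: &Action) -> bool {
    match action {
        Action::ViewDashboard => true, // both roles can read
        Action::ManageApiKeys | Action::ManageOrg => *role == Role::Owner,
    }
}

fn main() {
    assert!(is_allowed(&Role::Member, &Action::ViewDashboard));
    assert!(!is_allowed(&Role::Member, &Action::ManageApiKeys));
    assert!(is_allowed(&Role::Owner, &Action::ManageOrg));
}
```

Keeping the check a pure function makes it trivial to call from an `axum` middleware layer and to unit-test exhaustively.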

---

## Epic 5: Dashboard UI

**Description:** The React SPA serving the cost attribution dashboard. Visualizes the AI spend treemap, routing rules editor, real-time ticker, and request inspector. This is the product's primary visual "Aha" moment.

### User Stories

- **Story 5.1:** As an engineering manager, I want to see a treemap of my organization's AI spend broken down by team, feature, and model, so that I can instantly identify the most expensive areas of my application.
- **Story 5.2:** As an engineering manager, I want a real-time counter showing "You saved $X this week," so that I feel confident the tool is paying for itself.
- **Story 5.3:** As a platform engineer, I want an interface to configure routing rules (e.g., drag-to-reorder priority), so that I can instruct the proxy without editing config files.
- **Story 5.4:** As a platform engineer, I want a request inspector that displays metadata, cost, latency, and the specific routing decision for every request, so that I can debug why a certain model was chosen.

### Acceptance Criteria

- React + Vite SPA deployed as static assets to S3 + CloudFront.
- Treemap visualization renders cost aggregations dynamically over selected time periods (7d/30d/90d).
- A routing rules editor allows CRUD operations and priority reordering for a team's rules.
- Request inspector table displays paginated, filterable (`feature`, `team`, `status`) lists of telemetry without showing prompt content.
- Allows an admin to securely input OpenAI and Anthropic API keys.

### Estimate: 13 points

### Dependencies: Epic 4 (Dashboard API)

### Technical Notes:

- Stack: React, TypeScript, Vite, Tailwind CSS.
- No SSR required for V1 (keep it simple). Use `react-query` or similar for data fetching and caching.
- Build the treemap with a charting library like D3 or Recharts.
- Emphasize speed: data fetches should resolve from continuous aggregates in <200ms.
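
The server-side rollup that feeds the treemap is simple to sketch. This is an illustration of the shape of the data, not the actual `/api/dashboard/treemap` handler; `CostRow` and `rollup` are hypothetical names, and the real query would come from the TimescaleDB continuous aggregates rather than in-process aggregation.

```rust
use std::collections::HashMap;

// Flat telemetry rows aggregated into (team, feature) cost buckets —
// the nesting the treemap renders. Field layout is illustrative.
struct CostRow {
    team: String,
    feature: String,
    cost_usd: f64,
}

fn rollup(rows: &[CostRow]) -> HashMap<(String, String), f64> {
    let mut buckets: HashMap<(String, String), f64> = HashMap::new();
    for r in rows {
        *buckets.entry((r.team.clone(), r.feature.clone())).or_insert(0.0) += r.cost_usd;
    }
    buckets
}

fn main() {
    let rows = vec![
        CostRow { team: "search".into(), feature: "classify".into(), cost_usd: 1.25 },
        CostRow { team: "search".into(), feature: "classify".into(), cost_usd: 0.75 },
        CostRow { team: "support".into(), feature: "summarize".into(), cost_usd: 3.00 },
    ];
    let buckets = rollup(&rows);
    let key = ("search".to_string(), "classify".to_string());
    assert_eq!(buckets[&key], 2.0);
    assert_eq!(buckets.len(), 2);
}
```

The UI only ever receives these pre-summed buckets, which is what keeps the <200ms target realistic at millions of raw rows.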

---

## Epic 6: Shadow Audit CLI

**Description:** The PLG "Shadow Audit" command-line tool (`npx dd0c-scan`). It analyzes a local codebase for LLM API calls, estimates monthly cost based on prompt templates, and projects savings with dd0c/route.

### User Stories

- **Story 6.1:** As a developer, I want a zero-setup CLI tool (`npx dd0c-scan`) that scans my codebase and estimates how much money I'm currently wasting on overqualified LLMs, so that I can convince my manager to use dd0c/route.
- **Story 6.2:** As an engineering manager, I want the CLI to run locally without sending my source code to a third party, so that I can securely audit my own projects.
- **Story 6.3:** As an engineering manager, I want a clean, visually appealing terminal report showing "Top Opportunities" for model downgrades, so that I immediately see the value of routing.

### Acceptance Criteria

- Parses a local directory for OpenAI or Anthropic SDK usage in TypeScript/JavaScript/Python files.
- Identifies the models requested in the code and estimates token usage heuristically from the strings passed to the SDK.
- Hits `/api/v1/pricing/current` to fetch the latest cost tables, then calculates an estimated monthly bill and projected savings.
- Outputs a formatted terminal report showing total potential savings and a breakdown of the highest-impact files.
- Anonymized scan summary is sent to the server only if the user explicitly opts in.

### Estimate: 8 points

### Dependencies: Epic 4 (Dashboard API - Pricing Endpoint)

### Technical Notes:

- Stack: Node.js, `commander`, `chalk`, simple regex parsers for the Python/JS SDKs.
- Keep the CLI lightweight, fast, and as dependency-light as possible. No actual LLM parsing; use heuristics (string length/structure) for token estimates.
- Must run completely offline if the pricing table is cached.
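
The string-length heuristic mentioned above can be made concrete. Shown in Rust for consistency with the rest of this document (the CLI itself is Node.js); the ~4-characters-per-token rule of thumb, the prices, and the call volume are all illustrative assumptions.

```rust
// Heuristic token estimate: ~4 characters per token, rounded up.
// This is a rule-of-thumb approximation, not a real tokenizer.
fn estimate_tokens(prompt_template: &str) -> u64 {
    (prompt_template.chars().count() as u64 + 3) / 4
}

// Projected monthly bill, given a per-1M-token price from the pricing table.
fn monthly_cost_usd(tokens_per_call: u64, calls_per_month: u64, usd_per_1m_tokens: f64) -> f64 {
    (tokens_per_call * calls_per_month) as f64 / 1_000_000.0 * usd_per_1m_tokens
}

fn main() {
    let tokens = estimate_tokens("Classify the sentiment of the following review:");
    // A simple classification prompt at high volume, premium vs. budget model
    // (prices are placeholders):
    let premium = monthly_cost_usd(tokens, 1_000_000, 5.00);
    let budget = monthly_cost_usd(tokens, 1_000_000, 0.15);
    assert!(premium > budget);
    println!("projected monthly savings: ${:.2}", premium - budget);
}
```

This is exactly the level of precision the "Shadow Audit" needs: good enough to rank files by impact, with no tokenizer dependency.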

---

## Epic 7: Slack Integration

**Description:** The primary retention mechanism and anomaly alerting system. An asynchronous worker task dispatches weekly savings digests and threshold-based budget alerts to Slack and email.

### User Stories

- **Story 7.1:** As an engineering manager, I want an automated weekly digest summarizing my team's AI savings, so that I can easily report to the CFO that our tooling investment is paying off.
- **Story 7.2:** As a platform engineer, I want to configure a budget limit (e.g., alert if daily spend > $100) and receive a Slack webhook notification immediately, so that I can stop a retry storm before the bill gets out of hand.
- **Story 7.3:** As an engineering manager, I want an email version of the weekly digest, so that I can forward it straight to my leadership team.

### Acceptance Criteria

- A standalone asynchronous worker (`dd0c-worker`) evaluates the TimescaleDB continuous aggregates every hour.
- Generates a "Monday Morning Digest" email via AWS SES.
- Emits Slack webhook payloads when a threshold alert is triggered (`threshold_amount`, `threshold_pct`).
- Adds an `X-DD0C-Signature` header to outbound webhooks to prevent spoofing.

### Estimate: 8 points

### Dependencies: Epic 3 (Analytics Pipeline), Epic 4 (Dashboard API)

### Technical Notes:

- Stack: Rust (`tokio-cron`), `reqwest` (for webhooks), AWS SES.
- Worker is a singleton container (1 task) running alongside the proxy to avoid lock contention on cron tasks.
- Ensure alerts maintain state (using PostgreSQL `alert_configs` and `last_fired_at`) so users aren't spammed for the same incident.
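
The "don't spam the same incident" rule above reduces to a small predicate: fire only when spend crosses the threshold and the previous firing is outside a cooldown window. A minimal sketch, with epoch-second timestamps and a hypothetical `should_fire` helper standing in for the worker's check against `last_fired_at`:

```rust
// Threshold alert with cooldown. `last_fired_at` comes from the
// alert_configs row; `now` is the current epoch time in seconds.
fn should_fire(spend: f64, threshold: f64, last_fired_at: Option<u64>, now: u64, cooldown_s: u64) -> bool {
    if spend <= threshold {
        return false;
    }
    match last_fired_at {
        None => true, // first breach always fires
        Some(t) => now.saturating_sub(t) >= cooldown_s,
    }
}

fn main() {
    // Daily spend $120 against a $100 limit, 1h cooldown:
    assert!(should_fire(120.0, 100.0, None, 10_000, 3_600));
    // Already fired 10 minutes ago -> suppressed:
    assert!(!should_fire(120.0, 100.0, Some(9_400), 10_000, 3_600));
    // Cooldown elapsed -> fire again:
    assert!(should_fire(120.0, 100.0, Some(5_000), 10_000, 3_600));
}
```

On a fire, the worker would update `last_fired_at` in the same transaction that enqueues the Slack payload, so a crash cannot double-send.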

---

## Epic 8: Infrastructure & DevOps

**Description:** Containerized ECS Fargate deployment, AWS-native networking, basic monitoring, and fully automated CI/CD for the entire dd0c stack. Essential for a solo founder to deploy safely and frequently.

### User Stories

- **Story 8.1:** As a solo founder, I want to use AWS ECS Fargate, so that I don't have to manage EC2 instances or worry about OS-level patching.
- **Story 8.2:** As a solo founder, I want a GitHub Actions CI/CD pipeline, so that `git push` automatically runs tests, builds containers, and deploys rolling updates with zero downtime.
- **Story 8.3:** As an operator, I want standard AWS CloudWatch alarms (e.g., P99 proxy latency > 50ms) connected to PagerDuty, so that I am only woken up when a critical threshold is breached.
- **Story 8.4:** As a solo founder, I want a strict separation between my configuration (PostgreSQL) and telemetry (TimescaleDB) stores, so that I can scale analytics independently from org/auth state.

### Acceptance Criteria

- Full AWS infrastructure defined via CDK (TypeScript) or Terraform.
- ALB routes `/v1/*` to the proxy container and `/api/*` to the dashboard API container.
- Dashboard static assets deployed to an S3 bucket with CloudFront caching.
- `docker build` produces three optimized images from a single Rust workspace (`dd0c-proxy`, `dd0c-api`, `dd0c-worker`).
- CloudWatch dashboards and a minimum set of alarms configured (CPU >80%, proxy error rate >5%, ALB 5xx rate).
- `git push main` triggers a GitHub Action to test, lint, build, push to ECR, and update the ECS Fargate services.

### Estimate: 13 points

### Dependencies: Epic 1 (Proxy Engine), Epic 4 (Dashboard API)

### Technical Notes:

- Stack: AWS ECS Fargate, ALB, CloudFront, S3, RDS (PostgreSQL/TimescaleDB), ElastiCache (Redis), GitHub Actions.
- Ensure the ALB uses path-based routing correctly and handles TLS termination.
- For AWS cost optimization, explore consolidating NAT Gateways or using VPC Endpoints for S3/ECR/CloudWatch.

---

## Epic 9: Onboarding & PLG

**Description:** Self-serve signup, free tier, API key management, and a getting-started flow that gets users routing their first LLM call through dd0c/route in under 2 minutes. This is the growth engine.

### User Stories

- **Story 9.1:** As a new user, I want to sign up with GitHub OAuth in one click, so that I can start using dd0c/route without filling out forms.
- **Story 9.2:** As a new user, I want a free tier (up to $50/month in routed LLM spend), so that I can evaluate the product with real traffic before committing.
- **Story 9.3:** As a developer, I want to generate and manage API keys from the dashboard, so that I can integrate dd0c/route into my applications.
- **Story 9.4:** As a new user, I want a guided "First Route" onboarding flow that gives me a working curl command, so that I see cost savings within 2 minutes of signing up.
- **Story 9.5:** As a team lead, I want to invite team members via email, so that my team can share a single org and see aggregated savings.

### Acceptance Criteria

- GitHub OAuth signup creates the org and its first API key automatically.
- Free tier enforced at the proxy level — requests beyond $50/month of routed spend return 429 with an upgrade CTA.
- API key CRUD: create, list, revoke, rotate. Keys are hashed at rest (SHA-256, matching Epic 4 — a fast hash is required for per-request validation) and only shown once on creation.
- Onboarding wizard: 3 steps — (1) copy API key, (2) paste curl command, (3) see the first request in the dashboard. Completion rate is tracked.
- Team invite sends an email with a magic link. The invited user joins the existing org on signup.
- Stripe Checkout integration for the upgrade from free → paid ($49/month base).

### Estimate: 8 points

### Dependencies: Epic 4 (Dashboard API), Epic 5 (Dashboard UI)

### Technical Notes:

- Use Stripe Checkout Sessions for payment — no custom billing UI needed for V1.
- Free tier enforcement happens in the proxy hot path — it must be an O(1) lookup (Redis counter per org, reset monthly via cron).
- Onboarding completion events tracked via PostHog or simple DB events for funnel analysis.
- Magic-link invites use signed JWTs with a 72-hour expiry, stored in a `pending_invites` table.
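
The O(1) free-tier check above can be sketched with an in-process map standing in for the per-org Redis counter (in production this would be a Redis `INCRBYFLOAT` + compare; the function name and the `u16` status return are our own shorthand):

```rust
use std::collections::HashMap;

// Free-tier enforcement sketch. A HashMap stands in for the Redis
// counter per org; the cron reset is out of scope here.
const FREE_TIER_LIMIT_USD: f64 = 50.0;

fn record_and_check(spend: &mut HashMap<String, f64>, org: &str, cost_usd: f64) -> u16 {
    let total = spend.entry(org.to_string()).or_insert(0.0);
    *total += cost_usd;
    if *total > FREE_TIER_LIMIT_USD { 429 } else { 200 }
}

fn main() {
    let mut spend = HashMap::new();
    assert_eq!(record_and_check(&mut spend, "org-1", 49.0), 200);
    assert_eq!(record_and_check(&mut spend, "org-1", 2.0), 429); // over $50 -> upgrade CTA
}
```

The 429 response would carry the upgrade CTA in its body; the counter itself is the only state touched on the hot path.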

---

## Epic 10: Transparent Factory Compliance

**Description:** Cross-cutting epic ensuring dd0c/route adheres to the 5 Transparent Factory architectural tenets: Atomic Flagging, Elastic Schema, Cognitive Durability, Semantic Observability, and Configurable Autonomy. These stories are woven across the existing system — they don't add features, they add engineering discipline.

### Story 10.1: Atomic Flagging — Feature Flag Infrastructure

**As a** solo founder, **I want** every new routing rule, cost threshold, and provider failover behavior wrapped in a feature flag (default: off), **so that** I can deploy code continuously without risking production traffic.

**Acceptance Criteria:**

- OpenFeature SDK integrated into the Rust proxy via a compatible provider (e.g., `flagd` sidecar or an env-based provider for V1).
- All flags evaluate locally (in-memory or sidecar) — zero network calls on the hot path.
- Every flag has an `owner` field and a `ttl` (max 14 days). CI blocks deployment if any flag exceeds its TTL at 100% rollout.
- Automated circuit breaker: if a flagged code path increases P99 latency by >5% or error rate by >2%, the flag auto-disables within 30 seconds.
- Flags exist for: model routing strategies, complexity classifier thresholds, provider failover chains, new dashboard features.

**Estimate:** 5 points

**Dependencies:** Epic 1 (Proxy Engine), Epic 2 (Router Brain)

**Technical Notes:**

- Use the OpenFeature Rust SDK. For V1, a simple JSON file or env-var provider is fine — no LaunchDarkly needed.
- Circuit breaker integration: extend the existing Redis-backed circuit breaker to also flip flags.
- Flag cleanup: add a `make flag-audit` target that lists expired flags.
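
The `flag-audit` check above reduces to filtering the flag registry by age. A minimal sketch — the `Flag` struct and day-granularity timestamps are our own simplification; a real registry would come from the OpenFeature provider's config:

```rust
// Sketch of the flag-audit check: every flag carries an owner and a TTL;
// flags past their TTL are reported so CI can block the deploy.
struct Flag {
    name: &'static str,
    owner: &'static str,
    created_day: u32, // day number, stand-in for a real timestamp
    ttl_days: u32,
}

fn expired<'a>(flags: &'a [Flag], today: u32) -> Vec<&'a str> {
    flags
        .iter()
        .filter(|f| today.saturating_sub(f.created_day) > f.ttl_days)
        .map(|f| f.name)
        .collect()
}

fn main() {
    let flags = [
        Flag { name: "cascading_routing", owner: "founder", created_day: 100, ttl_days: 14 },
        Flag { name: "new_treemap", owner: "founder", created_day: 110, ttl_days: 14 },
    ];
    // Day 120: the first flag is 20 days old (> 14), the second only 10.
    assert_eq!(expired(&flags, 120), vec!["cascading_routing"]);
}
```

Printing the `owner` alongside each expired flag name is what makes the CI failure actionable.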

### Story 10.2: Elastic Schema — Additive-Only Migration Discipline

**As a** solo founder, **I want** all TimescaleDB and Redis schema changes to be strictly additive, **so that** I can roll back any deployment instantly without data loss or broken readers.

**Acceptance Criteria:**

- CI lint step rejects any migration containing `DROP`, `ALTER ... TYPE`, or `RENAME` on existing columns.
- New fields use a `_v2` suffix or a new table when breaking changes are unavoidable.
- Rust structs avoid `#[serde(deny_unknown_fields)]` (serde's default already skips unknown fields), so V1 code ignores V2 fields.
- Dual-write pattern documented and enforced: during migration windows, the API writes to both old and new schema targets within the same DB transaction.
- Every migration file includes a `sunset_date` comment (max 30 days). A CI check warns if any migration is past sunset without cleanup.

**Estimate:** 3 points

**Dependencies:** Epic 3 (Analytics Pipeline)

**Technical Notes:**

- Use `sqlx` migration files. Add a pre-commit hook or CI step that greps for forbidden DDL keywords.
- Redis key schema: version keys with a prefix (e.g., `route:v1:config`, `route:v2:config`). Never rename keys.
- For the `request_events` hypertable, new columns are always nullable or added with a default.
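
The forbidden-DDL grep from the acceptance criteria can be sketched as a substring scan. This is a deliberately naive first pass (`forbidden_ddl` is a hypothetical helper); a real check would parse the SQL to avoid false positives in comments and identifiers:

```rust
// CI lint sketch: reject migrations containing destructive DDL.
// Returns the first matching keyword so CI can print a useful error.
fn forbidden_ddl(sql: &str) -> Option<&'static str> {
    let upper = sql.to_uppercase();
    for kw in ["DROP ", "RENAME ", "ALTER COLUMN", " TYPE "] {
        if upper.contains(kw) {
            return Some(kw);
        }
    }
    None
}

fn main() {
    // Additive change passes:
    assert_eq!(forbidden_ddl("ALTER TABLE request_events ADD COLUMN team_v2 TEXT;"), None);
    // Destructive change is rejected:
    assert!(forbidden_ddl("ALTER TABLE request_events DROP COLUMN team;").is_some());
}
```

Running this over every new file under the `sqlx` migrations directory in CI is enough to enforce the additive-only rule mechanically.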

### Story 10.3: Cognitive Durability — Decision Logs for Routing Logic

**As a** future maintainer (or future me), **I want** every change to routing algorithms, cost models, or provider selection logic accompanied by a `decision_log.json`, **so that** I can understand *why* a decision was made months later in under 60 seconds.

**Acceptance Criteria:**

- `decision_log.json` schema defined: `{ prompt, reasoning, alternatives_considered, confidence, timestamp, author }`.
- CI requires a `decision_log.json` entry for any PR touching `src/router/`, `src/cost/`, or migration files.
- Cyclomatic complexity cap of 10 enforced via `cargo clippy` or a custom lint. PRs exceeding this are blocked.
- Decision logs are committed alongside code in a `docs/decisions/` directory, one file per significant change.

**Estimate:** 2 points

**Dependencies:** None

**Technical Notes:**

- Use a PR template that prompts for the decision log fields.
- For the complexity cap, use `cargo clippy -W clippy::cognitive_complexity` with a threshold of 10.
- Decision logs for cost table updates should include: the source of pricing data, a comparison with previous rates, and the expected savings impact.

### Story 10.4: Semantic Observability — AI Reasoning Spans on Routing Decisions

**As a** platform engineer debugging a misrouted request, **I want** every proxy routing decision to emit an OpenTelemetry span with structured AI reasoning metadata, **so that** I can trace exactly which model was chosen, why, and what alternatives were rejected.

**Acceptance Criteria:**

- Every `/v1/chat/completions` request generates an `ai_routing_decision` span as a child of the request trace.
- Span attributes include: `ai.model_selected`, `ai.model_alternatives` (JSON array of rejected models + reasons), `ai.cost_delta` (savings vs. default), `ai.complexity_score`, and `ai.routing_strategy` (passthrough/cheapest/quality-first/cascading).
- `ai.prompt_hash` (SHA-256 of the first 500 chars of the system prompt) is included for correlation — never raw prompt content.
- Spans export to any OTLP-compatible backend (Grafana Cloud, Jaeger, etc.).
- No PII in any span attribute. Prompt content is hashed, not logged.

**Estimate:** 3 points

**Dependencies:** Epic 1 (Proxy Engine), Epic 2 (Router Brain)

**Technical Notes:**

- Use `tracing` + the `opentelemetry` Rust crates with an OTLP exporter.
- The span should be created *inside* the router decision function, not as middleware — it needs access to the alternatives list.
- For V1, export to stdout in OTLP JSON format. Production: OTLP gRPC to a collector.

### Story 10.5: Configurable Autonomy — Governance Policy for Automated Routing

**As a** solo founder, **I want** a `policy.json` governance file that controls what the system is allowed to do autonomously (e.g., switch models, update cost tables, add providers), **so that** I maintain human oversight as the system grows.

**Acceptance Criteria:**

- `policy.json` defines `governance_mode`: `strict` (all changes require manual approval) or `audit` (changes auto-apply but are logged).
- The proxy checks `governance_mode` before applying any runtime config change (routing rule update, cost table refresh, provider addition).
- `panic_mode` flag: when set to `true`, the proxy freezes all routing rules to their last-known-good state, disables auto-failover, and routes everything to a single hardcoded provider.
- Governance drift monitoring: a weekly cron job logs the ratio of auto-applied vs. manually approved changes. If auto-applied changes exceed 80% in `strict` mode, an alert fires.
- All policy check decisions are logged: "Allowed by audit mode", "Blocked by strict mode", "Panic mode active — frozen".

**Estimate:** 3 points

**Dependencies:** Epic 2 (Router Brain)

**Technical Notes:**

- `policy.json` lives in the repo root and is loaded at startup, then watched for changes via the `notify` crate.
- For V1 as a solo founder, start in `audit` mode. `strict` mode is for when you hire or add AI agents to the pipeline.
- Panic mode should be triggerable via a single API call (`POST /admin/panic`) or by setting an env var — whichever is faster in an emergency.
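
The governance gate described above is a small decision function: panic mode trumps everything, then the mode decides. A minimal sketch using the decision strings from the acceptance criteria (the enum and `policy_check` names are ours):

```rust
// Governance gate sketch: mode + panic flag decide whether a runtime
// config change applies, with the log strings from the acceptance criteria.
enum GovernanceMode { Strict, Audit }

fn policy_check(mode: &GovernanceMode, panic_mode: bool, approved: bool) -> (bool, &'static str) {
    if panic_mode {
        return (false, "Panic mode active — frozen");
    }
    match mode {
        GovernanceMode::Audit => (true, "Allowed by audit mode"),
        GovernanceMode::Strict if approved => (true, "Approved in strict mode"),
        GovernanceMode::Strict => (false, "Blocked by strict mode"),
    }
}

fn main() {
    assert_eq!(policy_check(&GovernanceMode::Audit, false, false).1, "Allowed by audit mode");
    assert_eq!(policy_check(&GovernanceMode::Strict, false, false).1, "Blocked by strict mode");
    assert!(!policy_check(&GovernanceMode::Audit, true, false).0);
}
```

Returning the log string with the verdict keeps the audit trail and the enforcement decision from ever drifting apart.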

### Epic 10 Summary

| Story | Tenet | Points |
|-------|-------|--------|
| 10.1 | Atomic Flagging | 5 |
| 10.2 | Elastic Schema | 3 |
| 10.3 | Cognitive Durability | 2 |
| 10.4 | Semantic Observability | 3 |
| 10.5 | Configurable Autonomy | 3 |
| **Total** | | **16** |