Commit Graph

140 Commits

Author SHA1 Message Date
bbbea3519e Add unit tests for P2 SaaS, P3 notifications, P4 search, P5 ingestion, P6 API
- P2: nonce validation, severity levels, RLS withTenant
- P3: notification dispatcher severity gating, Slack Block Kit emoji mapping
- P4: Meilisearch fallback, service CRUD validation, staged update actions
- P5: cost ingestion validation, snooze range, optimistic locking
- P6: runbook API validation, approval decisions, execution status machine, Slack signature
2026-03-01 03:15:31 +00:00
3326d9a714 Add .gitignore files for P2-P6 2026-03-01 03:14:37 +00:00
6d66fff1bd Add root README with architecture diagram, .env.example for all products 2026-03-01 03:14:11 +00:00
b41cdd1db9 Fix P6 agent: add serde_yaml dep, make modules public for integration tests 2026-03-01 03:13:26 +00:00
829e408e1e Add notification dispatchers (P3 Slack/Email/Webhook, P5 Slack), full YAML parser for P6
- P3 alert: NotificationDispatcher with Slack Block Kit, Resend email, generic webhook; severity-gated dispatch
- P5 cost: CostSlackNotifier with anomaly Block Kit (score, deviation, snooze/expected buttons)
- P6 run: Full YAML runbook parser with serde_yaml, variable substitution ({{var}}), failure actions, 7 tests
- P6 parser: validates non-empty steps, default timeout (300s), default abort on failure
2026-03-01 03:13:06 +00:00
f2e0a32cc7 Wire auth middleware into all products, add docker-compose and init-db script
- Auth middleware (JWT + API key + RBAC) copied into P3/P4/P5/P6
- All server entry points now register auth hooks + auth routes
- Webhook and Slack endpoints skip JWT auth (use HMAC/signature)
- docker-compose.yml: shared Postgres + Redis + Meilisearch, all 4 Node products as services
- init-db.sh: creates per-product databases and runs migrations
- P1 (Rust) and P2 (Go agent) run standalone, not in compose
2026-03-01 03:10:35 +00:00
762e2db9df Add shared auth middleware (JWT + API key + RBAC) and canonical withTenant() helper 2026-03-01 03:09:01 +00:00
2bbaa1efde Add missing configs: CI workflows, tsconfigs, data layers for P4/P5/P6 2026-03-01 03:07:33 +00:00
4957946d29 Flesh out dd0c/cost: ingestion with Welford optimistic locking, anomaly API, governance, baselines
- Ingestion API: batch cost events, Welford baseline update with optimistic locking (version column), anomaly detection inline
- Anomaly API: list (filtered), acknowledge, snooze (1-168h), mark expected, dashboard summary with hourly trend
- Governance API: mode status, promotion eligibility check with FP rate calculation
- Baseline API: list with computed stddev, reset per resource
- Data layer: withTenant() RLS wrapper, Zod config with ANOMALY_THRESHOLD
- Fastify server entry point
2026-03-01 03:07:02 +00:00
a17527dfa4 Flesh out dd0c/portal: service CRUD, discovery API, Meilisearch search, data layer
- Service API: list (filtered by type/owner/lifecycle/tier), detail, upsert, delete, ownership summary
- Discovery API: trigger AWS/GitHub scans, scan history, staged update review (apply/reject)
- Search: Meilisearch full-text with PG ILIKE fallback, reindex endpoint
- Data layer: withTenant() RLS wrapper, Zod config with MEILI_URL/MEILI_KEY
- Fastify server entry point
2026-03-01 03:05:55 +00:00
d85cdaa3e7 Flesh out dd0c/alert: webhook routes, incident API, notification config, data layer
- Webhook routes: Datadog, PagerDuty, OpsGenie, Grafana with per-tenant HMAC/token auth
- Incident API: list (filtered), detail with alerts, acknowledge/resolve/suppress, dashboard summary
- Notification config: CRUD with upsert, test endpoint, Slack/email/webhook channels
- Grafana normalizer: severity mapping (critical/warning/info)
- Data layer: withTenant() RLS wrapper, Zod config validation
- Fastify server entry point with cors/helmet
2026-03-01 03:04:57 +00:00
57e7083986 Scaffold dd0c/run: Rust agent (classifier, executor, audit) + TypeScript SaaS
- Rust agent: clap CLI, command classifier (read-only/modifying/destructive), executor with approval gates, audit log entries
- Classifier: pattern-based safety classification for shell, AWS, kubectl, terraform/tofu commands
- 6 Rust tests: read-only, destructive, modifying, empty, terraform apply, tofu destroy
- SaaS backend: Fastify server, runbook CRUD API, approval API, Slack interactive handler
- Slack integration: signature verification, block_actions for approve/reject buttons
- PostgreSQL schema with RLS: runbooks, executions, audit_entries (append-only), agents
- Dual Dockerfiles: Rust multi-stage (agent), Node multi-stage (SaaS)
- Gitea Actions CI: Rust test+clippy, Node typecheck+test
- Fly.io config for SaaS
2026-03-01 03:03:29 +00:00
6f692fc5ef Scaffold dd0c/cost: Welford baseline, anomaly scorer, governance engine, tests
- Welford online algorithm for running mean/stddev baselines
- Anomaly scorer: z-score → 0-100 mapping, property-based tests (10K runs, fast-check)
- Governance engine: 14-day auto-promotion with FP rate gate, injectable Clock
- Panic mode: defaults to active (safe) when Redis unreachable
- Tests: 12 scorer cases (incl 2x 10K property-based), 9 governance cases, 3 panic mode cases
- PostgreSQL schema with RLS: baselines (optimistic locking), anomalies, remediation_actions
- Fly.io config, Dockerfile
2026-03-01 02:52:53 +00:00
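The Welford baseline and z-score scorer named in the commit above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the `Baseline` field names, the `maxZ` parameter, and the linear clamp used for the 0-100 mapping are all assumptions, since the commit message does not say how the mapping is shaped.

```typescript
// Running baseline state. Field names are illustrative assumptions.
interface Baseline {
  count: number; // number of samples folded in so far
  mean: number;  // running mean
  m2: number;    // running sum of squared deviations from the mean
}

// One Welford update step: folds a new sample into the baseline
// without storing the sample history.
function welfordUpdate(b: Baseline, x: number): Baseline {
  const count = b.count + 1;
  const delta = x - b.mean;
  const mean = b.mean + delta / count;
  const m2 = b.m2 + delta * (x - mean);
  return { count, mean, m2 };
}

// Sample standard deviation; 0 until at least two samples exist.
function stddev(b: Baseline): number {
  return b.count > 1 ? Math.sqrt(b.m2 / (b.count - 1)) : 0;
}

// Assumed mapping: |z| scaled linearly so that maxZ or above yields 100.
function anomalyScore(b: Baseline, x: number, maxZ = 5): number {
  const sd = stddev(b);
  if (sd === 0) return 0; // no spread yet, so no meaningful z-score
  const z = Math.abs(x - b.mean) / sd;
  return Math.min(100, Math.round((z / maxZ) * 100));
}
```

Under these assumptions, a caller would start from `{ count: 0, mean: 0, m2: 0 }`, apply `welfordUpdate` per cost event, and score each new value against the baseline before folding it in.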
23db74b306 Scaffold dd0c/portal: AWS+GitHub discovery, catalog service, ownership resolution
- AWS scanner: ECS/Lambda/RDS discovery with partial failure handling
- GitHub scanner: CODEOWNERS parsing, commit-based heuristic ownership, rate limit resilience
- Catalog service: ownership resolution (config > codeowners > aws-tag > heuristic), staged updates for partial scans
- Ownership tests: 6 cases covering full priority chain
- PostgreSQL schema with RLS: services, staged_updates, scan_history, free tier (50 services)
- Fly.io config, Dockerfile
2026-03-01 02:51:02 +00:00
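The ownership priority chain listed in the commit above (config > codeowners > aws-tag > heuristic) might look something like the sketch below. The type and function names are hypothetical, and the rule that the first source in priority order wins outright is an assumption; the actual catalog service may merge or weight sources differently.

```typescript
// Source labels mirror the priority chain in the commit message.
type OwnershipSource = "config" | "codeowners" | "aws-tag" | "heuristic";

// Hypothetical candidate shape: one proposed owner per discovery source.
interface OwnershipCandidate {
  source: OwnershipSource;
  owner: string;
}

// Highest priority first.
const PRIORITY: OwnershipSource[] = ["config", "codeowners", "aws-tag", "heuristic"];

// Returns the candidate from the highest-priority source present,
// or undefined when no source proposed an owner.
function resolveOwner(candidates: OwnershipCandidate[]): OwnershipCandidate | undefined {
  for (const source of PRIORITY) {
    for (const candidate of candidates) {
      if (candidate.source === source) return candidate;
    }
  }
  return undefined;
}
```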
ccc4cd1c32 Scaffold dd0c/alert: ingestion, correlation engine, HMAC validation, tests
- Webhook ingestion: HMAC validation for Datadog/PagerDuty/OpsGenie with 5-min timestamp freshness
- Payload normalizers: canonical alert schema with severity mapping per provider
- Correlation engine: time-window grouping, late-alert attachment (2x window), FakeClock for testing
- InMemoryWindowStore for unit tests
- Tests: 12 HMAC validation cases, 5 normalizer cases, 7 correlation engine cases
- PostgreSQL schema with RLS: tenants, incidents, alerts, webhook_secrets, notification_configs
- Free tier enforcement columns (alert_count_month, reset_at)
- Fly.io config, Dockerfile, Gitea Actions CI
2026-03-01 02:49:14 +00:00
5d67de6486 Add dd0c/drift notifications, infra, CI: Slack Block Kit, Dockerfiles, Gitea Actions
- Notification service: Slack Block Kit (remediate/accept buttons), webhook delivery, rate limit handling
- Dispatcher with severity-based channel filtering
- Agent Dockerfile: multi-stage Go build, static binary
- SaaS Dockerfile: multi-stage Node build
- Fly.io config: scale-to-zero, shared-cpu
- Gitea Actions: Go test+vet, Node typecheck+test, cross-compile agent (linux/darwin/windows)
2026-03-01 02:46:47 +00:00
e67cef518e Scaffold dd0c/drift SaaS backend: Fastify, RLS, ingestion, dashboard API
- Fastify server with Zod validation, pino logging, CORS/helmet
- Drift report ingestion endpoint with nonce replay prevention
- Dashboard API: stacks list, drift history, report detail, summary stats
- PostgreSQL schema with RLS: tenants, users, agent_keys, drift_reports, remediation_actions
- withTenant() helper for safe connection pool tenant context management
- Config via Zod-validated env vars
2026-03-01 02:45:33 +00:00
31cb36fb77 Scaffold dd0c/drift Go agent: CLI, scanner, scrubber, reporter, models
- cobra CLI: check (one-shot), watch (SQS consumer), version
- models: DriftReport, DriftedResource, severity classification (critical/high/medium/low)
- scanner: Terraform v4 state parser, resource counter
- scrubber: regex + Shannon entropy secret detection (strict/permissive/off modes)
- reporter: mTLS HTTP client with nonce replay prevention
- tests: severity classification (8 cases), scrubber (AWS keys, RSA, entropy, attributes)
2026-03-01 02:42:53 +00:00
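The entropy half of the scrubber described in the commit above can be illustrated with a short sketch. The agent itself is written in Go; TypeScript is used here only for illustration. The function names, the minimum length of 20, and the threshold of 4.0 bits per character are assumptions and not values taken from the repository.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const key of Object.keys(counts)) {
    const p = counts[key] / s.length;
    entropy -= p * (Math.log(p) / Math.LN2); // log base 2
  }
  return entropy;
}

// Flags long, high-entropy tokens as likely secrets.
// minLength and threshold are illustrative defaults only.
function looksLikeSecret(token: string, minLength = 20, threshold = 4.0): boolean {
  return token.length >= minLength && shannonEntropy(token) > threshold;
}
```

A regex pass for known key formats (as the commit mentions) would normally run alongside a check like this, since entropy alone produces both false positives and false negatives.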
e626608535 Add proxy latency benchmark (criterion, 1000 samples, 1/5/10 msg variants) 2026-03-01 02:40:45 +00:00
e882f181d5 Add dd0c/route integration tests: proxy engine with wiremock
- Forward request to upstream and verify response passthrough
- Telemetry event emission (org_id, model, latency, status)
- Low-complexity routing: gpt-4o → gpt-4o-mini with strategy=cheapest
- Upstream error passthrough (429 rate limit)
- Invalid JSON → 400 Bad Request
- Health endpoint returns 200
2026-03-01 02:40:09 +00:00
c5ef45e69b Add dd0c/route unit tests: router, middleware, config, digest
- Router tests: complexity classification (low/medium/high), routing decisions, cost delta
- Middleware tests: API key redaction (OpenAI, Anthropic, Bearer), JSON bodies, telemetry safety
- Config tests: defaults, unknown provider fallbacks
- Digest tests: next_monday_9am scheduling edge cases
- Anomaly tests: threshold logic, divide-by-zero guard
2026-03-01 02:39:01 +00:00
8a4c7c256d Add V1 infrastructure: Gitea Actions CI/CD + Fly.io + Cloudflare Pages
- Gitea Actions workflows: ci.yml (tests+clippy+fmt), benchmark.yml (P99 gate), deploy.yml (Fly+CF)
- Fly.io configs: proxy (shared-cpu, 256MB, min 1 machine), API (scale-to-zero)
- Dockerfiles: multi-stage Rust builds for proxy and API binaries
- INFRASTRUCTURE.md: full V1 stack (~$5/mo), AWS migration path, Gitea runner setup, DNS plan
- Stack: Fly.io + Cloudflare Pages + Neon + Upstash + Gitea Actions
2026-03-01 02:37:48 +00:00
a486373d93 Add dd0c/route Dashboard UI: React + Vite + Tailwind SPA
- Layout with sidebar navigation (Dashboard, Rules, Keys, Settings)
- Dashboard page: stat cards, cost savings area chart (Recharts), model usage table
- Rules page: routing rules CRUD with modal editor, strategy/complexity/model matching
- Keys page: API key generation, copy-once reveal, revocation, quick-start code snippet
- Settings page: org info, provider config, danger zone
- API client (SWR + fetch wrapper) with full TypeScript types
- dd0c dark theme: indigo primary, cyan accent, dark surfaces
- Vite proxy config for local dev against API on :3000
2026-03-01 02:36:32 +00:00
0fe25b8aa6 Add dd0c/route worker: weekly digest generation + hourly anomaly detection
- digest.rs: Weekly cost savings digest per org, top models, top routing savings
- anomaly.rs: Threshold-based anomaly detection (3x hourly average = alert)
- main.rs: Periodic task scheduler (hourly anomaly, weekly digest, daily cost refresh)
- next_monday_9am() with unit tests for scheduling
2026-03-01 02:32:28 +00:00
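The threshold rule in the commit above (3x hourly average = alert) can be sketched as below. The worker is written in Rust (anomaly.rs); this TypeScript version is only an illustration. The function names are hypothetical, and both the inclusive comparison and the choice to report no anomaly when there is no baseline are assumptions.

```typescript
// Mean of the historical hourly costs; returns 0 for an empty history
// so that callers never divide by zero.
function hourlyAverage(hourlyCosts: number[]): number {
  if (hourlyCosts.length === 0) return 0;
  let sum = 0;
  for (const cost of hourlyCosts) sum += cost;
  return sum / hourlyCosts.length;
}

// True when the current hour's cost reaches multiplier x the average.
// With no usable baseline (average <= 0) nothing is flagged; this is an assumption.
function isAnomalous(currentHourCost: number, history: number[], multiplier = 3): boolean {
  const average = hourlyAverage(history);
  if (average <= 0) return false;
  return currentHourCost >= multiplier * average;
}
```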
e234f66b9b Add dd0c/route Dashboard API: analytics, routing rules CRUD, API keys, providers
- GET /api/v1/analytics/summary — 7-day cost savings, latency, routing breakdown
- GET /api/v1/analytics/timeseries — hourly/daily rollups from TimescaleDB continuous aggregates
- GET /api/v1/analytics/models — per-model cost and token breakdown
- CRUD /api/v1/rules — routing rules with priority, match conditions, strategies
- CRUD /api/v1/keys — API key generation (dd0c_ prefix), bcrypt hashing, revocation
- CRUD /api/v1/providers — provider config upsert with encrypted key storage
- GET /api/v1/org — org info
- Role-based access: Owner required for mutations
2026-03-01 02:31:28 +00:00
311a834228 Add dd0c/route project scaffolding: migrations, docker-compose, README
- PostgreSQL schema: orgs, users, api_keys, provider_configs, routing_rules, cost_tables, feature_flags
- TimescaleDB schema: request_events hypertable, hourly/daily continuous aggregates, compression, retention
- docker-compose.yml: postgres, timescaledb, redis for local dev
- README with quick start, architecture overview, pricing tiers
- .env.example, .gitignore
2026-03-01 02:29:23 +00:00
72a0f26a7b Add BMad review epic addendums for all 6 products
Per-product surgical additions to existing epics (not cross-cutting):
- P1 route: 8pts (key redaction, SSE billing, token math, CI runner)
- P2 drift: 12pts (mTLS revocation, state lock recovery, pgmq visibility, RLS leak, entropy scrubber)
- P3 alert: 10pts (HMAC replay, claim-check, out-of-order correlation, free tier, tenant isolation)
- P4 portal: 9pts (partial scan recovery, ownership conflicts, Meilisearch rebuild, VCR freshness, free tier)
- P5 cost: 7pts (concurrent baselines, remediation RBAC, Clock interface, property tests, Redis fallback)
- P6 run: 15pts (shell AST parsing, canary suite, intervention TTL, streaming audit, crypto signatures)

Total: 61 story points across 30 new stories
2026-03-01 02:27:55 +00:00
cc003cbb1c Scaffold dd0c/route core proxy engine (handler, router, auth, config) 2026-03-01 02:23:27 +00:00
d038cd9c5c Implement BMad Must-Have Before Launch fixes for all 6 products
P1: API key redaction, SSE billing leak, token math edge cases, CI runner config
P2: mTLS revocation lockout, terraform state lock recovery, RLS pool leak, entropy scrubber, pgmq visibility
P3: HMAC replay prevention, cross-tenant negative tests, correlation window edge cases, SQS claim-check, free tier
P4: Discovery partial failure recovery, ownership conflict integration test, VCR freshness CI, Meilisearch rebuild, Cmd+K latency
P5: Concurrent baseline conflicts, remediation RBAC, Clock interface for governance, 10K property-based runs, Redis panic fallback
P6: Cryptographic agent update signatures, streaming audit logs with WAL, shell AST parsing (mvdan/sh), intervention deadlock TTL, canary suite CI gate
2026-03-01 02:14:04 +00:00
b24cfa7c0d BMad code reviews complete for all 6 products
P1 route: Gemini — 'Ship the proxy, stop writing tests for the tests'
P2 drift: Gemini — mTLS revocation, state lock corruption, RLS pool leak
P3 alert: Gemini — replay attacks, trace propagation, SQS claim-check
P4 portal: Manual — discovery reliability is existential risk
P5 cost: Manual — concurrent baselines, remediation RBAC, pricing staleness
P6 run: Gemini — policy update loophole, AST parsing, audit streaming
2026-03-01 02:09:19 +00:00
9cc5aeaa03 BMad code reviews for P4 (portal) and P5 (cost) — manual
P4: Discovery reliability flagged as existential risk, VCR cassette staleness,
    ownership conflict race condition, Step Functions→cron gap
P5: Concurrent baseline update risk, remediation RBAC gap, pricing staleness,
    property-based tests need 10K runs, Clock interface needed for governance
2026-03-01 02:06:06 +00:00
b7cce013ed Phase 3: BDD acceptance specs for P4 (portal) — partial
P4: 1,177 lines (subagent still running, may have more output pending)
All 6 products now have acceptance specs committed.
2026-03-01 02:01:04 +00:00
c3bafa238a Add dual-mode deployment addendums for all 6 products
P1 route: 16 pts (template, full docker-compose + install script)
P2 drift: 17 pts (pgmq, local CA for mTLS)
P3 alert: 19 pts (Lambda→Fastify, DynamoDB→PG JSONB)
P4 portal: 18 pts (Step Functions→cron, Aurora→PG+pgvector)
P5 cost: 19 pts (EventBridge→agent/polling, DynamoDB→PG JSONB)
P6 run: 15 pts (easiest — already PG-native, no AWS deps in core)

Total self-hosted effort: ~104 story points across all 6 products
2026-03-01 02:00:00 +00:00
96e51054ae Add dual-mode deployment architecture addendum for P1 (route)
Docker Compose self-hosted mode, install script, auth abstraction,
data layer abstraction (SQS→pgmq, Cognito→local JWT, S3→local FS),
Caddy auto-TLS, upgrade path, self-hosted BDD specs.
16 story points additional effort. Template for all 6 products.
2026-03-01 01:58:15 +00:00
4938674c20 Phase 3: BDD acceptance specs for P2 (drift), P3 (alert), P6 (run)
P2: 2,245 lines, 10 epics — Sonnet subagent (8min)
P3: 1,653 lines, 10 epics — Sonnet subagent (6min)
P6: 2,303 lines, 262 scenarios, 10 epics — Sonnet subagent (7min)
P4 (portal) still in progress
2026-03-01 01:54:35 +00:00
c1484426cc Phase 3: BDD acceptance specs for P1 (route) and P5 (cost)
P1: 50+ scenarios across 10 epics, all stories covered
P5: 55+ scenarios across 10 epics, written manually (Sonnet credential failures)
Remaining P2/P3/P4/P6 in progress via subagents
2026-03-01 01:50:30 +00:00
03bfe931fc Implement review remediation + PLG analytics SDK
- All 6 test architectures patched with Section 11 addendums
- P5 (cost) fully rewritten from 232 to ~600 lines
- PLG brainstorm + party mode advisory board results
- Analytics SDK v2 (PostHog Cloud, Zod strict, Lambda-safe)
- Analytics tests v2 (safeParse, no , no timestamp, no PII)
- Addresses all Gemini review findings across P1-P6
2026-03-01 01:42:49 +00:00
2fe0ed856e Add Gemini TDD reviews for all 6 products
P1, P2, P3, P4, P6 reviewed by Gemini subagents.
P5 reviewed manually (Gemini credential errors).
All reviews flag coverage gaps, anti-patterns, and Transparent Factory tenet gaps.
2026-03-01 00:29:24 +00:00
1101fef096 Update test architectures for P3, P4, P5 2026-02-28 23:33:07 +00:00
5ee95d8b13 dd0c: full product research pipeline - 6 products, 8 phases each
Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
        product-brief, architecture, epics (incl. Epic 10 TF compliance),
        test-architecture (TDD strategy)

Brand strategy and market research included.
2026-02-28 17:35:02 +00:00