Commit Graph

14 Commits

Author SHA1 Message Date
eb953cdea5 Security hardening: auth encapsulation, pool restriction, rate limiting, invites, async webhooks
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 43s
CI — P2 Drift (Go + Node) / saas (push) Failing after 5s
CI — P3 Alert / test (push) Failing after 4s
CI — P4 Portal / test (push) Failing after 4s
CI — P5 Cost / test (push) Failing after 4s
CI — P6 Run / saas (push) Failing after 5s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 7s
CI — P3 Alert / build-push (push) Has been skipped
CI — P4 Portal / build-push (push) Has been skipped
CI — P5 Cost / build-push (push) Has been skipped
CI — P6 Run / build-push (push) Failing after 5s
Phase 1 (Security Critical):
- Auth plugin encapsulation: replaced global addHook with Fastify plugin scope
- Removed startsWith URL matching; public routes registered outside auth scope
- JWT verify now enforces algorithms: ['HS256'] (prevents algorithm confusion)
- Raw pool no longer exported from db.ts; systemQuery() + getPoolForAuth() instead
- withTenant() remains primary tenant-scoped query path

Phase 2 (Infrastructure):
- docker-compose.yml: all secrets via env var substitution (${VAR:-default})
- Per-service Postgres users (dd0c_drift, dd0c_alert, etc.) in docker-init-db.sh
- .env.example with all configurable secrets
- build-push.sh uses $REGISTRY_PASSWORD instead of hardcoded
- .gitignore excludes .env files
- @fastify/rate-limit: 100 req/min global, 5/min login, 3/min signup
- CORS_ORIGIN default changed from '*' to 'http://localhost:5173'

Phase 3 (Product):
- Team invite flow: tenant_invites table, POST /invite, GET /invites, DELETE /invites/:id
- Signup accepts optional invite_token to join existing tenant
- Async webhook ingestion (P3): LPUSH to Redis, BRPOP worker, dead-letter queue

Console:
- All 5 product modules wired: drift, alert, portal, cost, run
- PageHeader accepts children prop
- 71 modules, 70KB gzipped production build

All 6 projects compile clean (tsc --noEmit).
2026-03-02 23:53:55 +00:00
be3f37cfdd Fix CRITICAL auth bypass: exact match for login/signup paths
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 45s
CI — P2 Drift (Go + Node) / saas (push) Successful in 28s
CI — P3 Alert / test (push) Successful in 24s
CI — P4 Portal / test (push) Successful in 27s
CI — P5 Cost / test (push) Successful in 26s
CI — P6 Run / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 46s
CI — P3 Alert / build-push (push) Failing after 38s
CI — P4 Portal / build-push (push) Failing after 50s
CI — P5 Cost / build-push (push) Failing after 22s
CI — P6 Run / build-push (push) Failing after 1m3s
startsWith('/api/v1/auth/login') allowed any path with that prefix
to bypass authentication (e.g. /api/v1/auth/login-anything).
Changed to exact path match with query string stripping.
Fixed across all 5 products + shared/auth.ts.
2026-03-02 20:35:28 +00:00
3be37d1293 Skip auth on /version endpoint (same as /health)
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 9s
CI — P3 Alert / test (push) Successful in 21s
CI — P2 Drift (Go + Node) / saas (push) Successful in 35s
CI — P4 Portal / test (push) Successful in 25s
CI — P5 Cost / test (push) Successful in 37s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 3s
CI — P6 Run / saas (push) Successful in 22s
CI — P3 Alert / build-push (push) Failing after 3s
CI — P4 Portal / build-push (push) Failing after 2s
CI — P5 Cost / build-push (push) Failing after 2s
CI — P6 Run / build-push (push) Failing after 3s
2026-03-02 13:54:46 +00:00
5bad2481ae Add /version endpoint to all products + BUILD_SHA/BUILD_TIME in Dockerfiles
Some checks failed
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 4s
CI — P3 Alert / build-push (push) Failing after 3s
CI — P6 Run / saas (push) Successful in 23s
CI — P4 Portal / build-push (push) Failing after 2s
CI — P2 Drift (Go + Node) / agent (push) Successful in 17s
CI — P3 Alert / test (push) Successful in 21s
CI — P5 Cost / test (push) Successful in 24s
CI — P4 Portal / test (push) Successful in 38s
CI — P5 Cost / build-push (push) Failing after 3s
CI — P6 Run / build-push (push) Failing after 2s
2026-03-02 13:53:15 +00:00
81d03c1735 Fix tenant slug collision: append random hex suffix to prevent 23505 on duplicate tenant names
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P2 Drift (Go + Node) / agent (push) Successful in 1m6s
CI — P3 Alert / test (push) Successful in 37s
CI — P5 Cost / test (push) Successful in 29s
CI — P4 Portal / test (push) Successful in 48s
CI — P6 Run / saas (push) Successful in 25s
2026-03-01 22:36:21 +00:00
27a89ee2b7 Trigger CI with tsc fix
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Failing after 3s
CI — P2 Drift (Go + Node) / saas (push) Successful in 29s
CI — P3 Alert / test (push) Successful in 40s
CI — P4 Portal / test (push) Successful in 32s
CI — P6 Run / saas (push) Successful in 30s
CI — P5 Cost / test (push) Successful in 46s
2026-03-01 06:56:00 +00:00
3e68e8871d Trigger CI for P2-SaaS, P4, P5, P6
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Failing after 1s
CI — P4 Portal / test (push) Failing after 17s
CI — P5 Cost / test (push) Failing after 15s
CI — P6 Run / saas (push) Failing after 15s
CI — P2 Drift (Go + Node) / saas (push) Successful in 43s
2026-03-01 06:52:14 +00:00
68140881e0 Trigger CI for P3-P6 Node products
Some checks failed
CI — P3 Alert / test (push) Failing after 15s
CI — P4 Portal / test (push) Failing after 19s
CI — P5 Cost / test (push) Failing after 17s
CI — P6 Run / saas (push) Failing after 18s
2026-03-01 06:43:58 +00:00
4146f1c4d0 Fix TypeScript compilation errors across P3-P6
- jwt.sign: explicit SignOptions cast for expiresIn (all 4 products)
- ioredis: use named import { Redis } instead of default (P4, P6)
- P4 catalog/service: fix import paths for aws-scanner and github-scanner
- P4 discovery: pass pool to ScheduledDiscovery constructor
- P6 agent-bridge: add explicit types for Redis message callback params
- All 4 Node products now compile cleanly with tsc --noEmit
2026-03-01 06:06:31 +00:00
5ee869b9d8 Implement auth: login/signup (scrypt), API key generation, shared migration
- Login: email + password lookup, scrypt verify, JWT token
- Signup: create tenant + owner user in transaction, slug generation
- API key: dd0c_ prefix, SHA-256 hash (not bcrypt — faster for API key lookups), prefix index
- Scrypt over bcrypt: zero native deps, Node.js built-in crypto
- Auth routes skip JWT middleware (login/signup are public)
- 002_auth.sql: users + api_keys tables with RLS, copied to all products
- Synced auth middleware to P3/P4/P5/P6
2026-03-01 03:19:18 +00:00
829e408e1e Add notification dispatchers (P3 Slack/Email/Webhook, P5 Slack), full YAML parser for P6
- P3 alert: NotificationDispatcher with Slack Block Kit, Resend email, generic webhook; severity-gated dispatch
- P5 cost: CostSlackNotifier with anomaly Block Kit (score, deviation, snooze/expected buttons)
- P6 run: Full YAML runbook parser with serde_yaml, variable substitution ({{var}}), failure actions, 7 tests
- P6 parser: validates non-empty steps, default timeout (300s), default abort on failure
2026-03-01 03:13:06 +00:00
f2e0a32cc7 Wire auth middleware into all products, add docker-compose and init-db script
- Auth middleware (JWT + API key + RBAC) copied into P3/P4/P5/P6
- All server entry points now register auth hooks + auth routes
- Webhook and Slack endpoints skip JWT auth (use HMAC/signature)
- docker-compose.yml: shared Postgres + Redis + Meilisearch, all 4 Node products as services
- init-db.sh: creates per-product databases and runs migrations
- P1 (Rust) and P2 (Go agent) run standalone, not in compose
2026-03-01 03:10:35 +00:00
4957946d29 Flesh out dd0c/cost: ingestion with Welford optimistic locking, anomaly API, governance, baselines
- Ingestion API: batch cost events, Welford baseline update with optimistic locking (version column), anomaly detection inline
- Anomaly API: list (filtered), acknowledge, snooze (1-168h), mark expected, dashboard summary with hourly trend
- Governance API: mode status, promotion eligibility check with FP rate calculation
- Baseline API: list with computed stddev, reset per resource
- Data layer: withTenant() RLS wrapper, Zod config with ANOMALY_THRESHOLD
- Fastify server entry point
2026-03-01 03:07:02 +00:00
6f692fc5ef Scaffold dd0c/cost: Welford baseline, anomaly scorer, governance engine, tests
- Welford online algorithm for running mean/stddev baselines
- Anomaly scorer: z-score → 0-100 mapping, property-based tests (10K runs, fast-check)
- Governance engine: 14-day auto-promotion with FP rate gate, injectable Clock
- Panic mode: defaults to active (safe) when Redis unreachable
- Tests: 12 scorer cases (incl 2x 10K property-based), 9 governance cases, 3 panic mode cases
- PostgreSQL schema with RLS: baselines (optimistic locking), anomalies, remediation_actions
- Fly.io config, Dockerfile
2026-03-01 02:52:53 +00:00