Commit Graph

20 Commits

Author SHA1 Message Date
5792f95d7c Fix BMad adversarial security review findings
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 47s
CI — P2 Drift (Go + Node) / saas (push) Successful in 36s
CI — P3 Alert / test (push) Successful in 36s
CI — P4 Portal / build-push (push) Failing after 49s
CI — P5 Cost / build-push (push) Failing after 4s
CI — P6 Run / build-push (push) Failing after 4s
CI — P4 Portal / test (push) Successful in 35s
CI — P5 Cost / test (push) Successful in 40s
CI — P6 Run / saas (push) Successful in 36s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 17s
CI — P3 Alert / build-push (push) Failing after 15s
Resolves 11 of the 13 findings:
- [CRITICAL] SQLi in RLS: replaced SET LOCAL with parameterized set_config()
- [CRITICAL] Rate Limiting: installed and registered @fastify/rate-limit in all 5 apps
- [CRITICAL] Invite Hijacking: added email verification check to invite lookup
- [HIGH] Webhook HMAC: added Fastify rawBody parser to fix JSON.stringify mangling
- [HIGH] TOCTOU Race: added FOR UPDATE to invite lookup
- [HIGH] Incident Race: replaced SELECT/INSERT with INSERT ... ON CONFLICT
- [MEDIUM] Grafana Timing Attack: replaced === with crypto.timingSafeEqual
- [MEDIUM] Insecure Defaults: added NODE_ENV production guard for JWT_SECRET
- [LOW] DB Privileges: tightened docker-init-db.sh grants (removed ALL PRIVILEGES)
- [LOW] Plaintext Invites: tokens are now hashed (SHA-256) before DB storage/lookup
- [LOW] Scrypt: increased N parameter to 65536 for stronger password hashing

Note:
- Finding #4 (Fragmented Identity) requires a unified auth database architecture.
- Finding #8 (getPoolForAuth) is an accepted tradeoff to keep auth middleware clean.
2026-03-03 00:14:39 +00:00
eb953cdea5 Security hardening: auth encapsulation, pool restriction, rate limiting, invites, async webhooks
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 43s
CI — P2 Drift (Go + Node) / saas (push) Failing after 5s
CI — P3 Alert / test (push) Failing after 4s
CI — P4 Portal / test (push) Failing after 4s
CI — P5 Cost / test (push) Failing after 4s
CI — P6 Run / saas (push) Failing after 5s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 7s
CI — P3 Alert / build-push (push) Has been skipped
CI — P4 Portal / build-push (push) Has been skipped
CI — P5 Cost / build-push (push) Has been skipped
CI — P6 Run / build-push (push) Failing after 5s
Phase 1 (Security Critical):
- Auth plugin encapsulation: replaced global addHook with Fastify plugin scope
- Removed startsWith URL matching; public routes registered outside auth scope
- JWT verify now enforces algorithms: ['HS256'] (prevents algorithm confusion)
- Raw pool no longer exported from db.ts; systemQuery() + getPoolForAuth() instead
- withTenant() remains primary tenant-scoped query path

Phase 2 (Infrastructure):
- docker-compose.yml: all secrets via env var substitution (${VAR:-default})
- Per-service Postgres users (dd0c_drift, dd0c_alert, etc.) in docker-init-db.sh
- .env.example with all configurable secrets
- build-push.sh uses $REGISTRY_PASSWORD instead of hardcoded
- .gitignore excludes .env files
- @fastify/rate-limit: 100 req/min global, 5/min login, 3/min signup
- CORS_ORIGIN default changed from '*' to 'http://localhost:5173'

Phase 3 (Product):
- Team invite flow: tenant_invites table, POST /invite, GET /invites, DELETE /invites/:id
- Signup accepts optional invite_token to join existing tenant
- Async webhook ingestion (P3): LPUSH to Redis, BRPOP worker, dead-letter queue

Console:
- All 5 product modules wired: drift, alert, portal, cost, run
- PageHeader accepts children prop
- 71 modules, 70KB gzipped production build

All 6 projects compile clean (tsc --noEmit).
2026-03-02 23:53:55 +00:00
be3f37cfdd Fix CRITICAL auth bypass: exact match for login/signup paths
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 45s
CI — P2 Drift (Go + Node) / saas (push) Successful in 28s
CI — P3 Alert / test (push) Successful in 24s
CI — P4 Portal / test (push) Successful in 27s
CI — P5 Cost / test (push) Successful in 26s
CI — P6 Run / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 46s
CI — P3 Alert / build-push (push) Failing after 38s
CI — P4 Portal / build-push (push) Failing after 50s
CI — P5 Cost / build-push (push) Failing after 22s
CI — P6 Run / build-push (push) Failing after 1m3s
startsWith('/api/v1/auth/login') allowed any path with that prefix
to bypass authentication (e.g. /api/v1/auth/login-anything).
Changed to exact path match with query string stripping.
Fixed across all 5 products + shared/auth.ts.
2026-03-02 20:35:28 +00:00
3be37d1293 Skip auth on /version endpoint (same as /health)
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 9s
CI — P3 Alert / test (push) Successful in 21s
CI — P2 Drift (Go + Node) / saas (push) Successful in 35s
CI — P4 Portal / test (push) Successful in 25s
CI — P5 Cost / test (push) Successful in 37s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 3s
CI — P6 Run / saas (push) Successful in 22s
CI — P3 Alert / build-push (push) Failing after 3s
CI — P4 Portal / build-push (push) Failing after 2s
CI — P5 Cost / build-push (push) Failing after 2s
CI — P6 Run / build-push (push) Failing after 3s
2026-03-02 13:54:46 +00:00
5bad2481ae Add /version endpoint to all products + BUILD_SHA/BUILD_TIME in Dockerfiles
Some checks failed
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 4s
CI — P3 Alert / build-push (push) Failing after 3s
CI — P6 Run / saas (push) Successful in 23s
CI — P4 Portal / build-push (push) Failing after 2s
CI — P2 Drift (Go + Node) / agent (push) Successful in 17s
CI — P3 Alert / test (push) Successful in 21s
CI — P5 Cost / test (push) Successful in 24s
CI — P4 Portal / test (push) Successful in 38s
CI — P5 Cost / build-push (push) Failing after 3s
CI — P6 Run / build-push (push) Failing after 2s
2026-03-02 13:53:15 +00:00
2c9408b1df Fix indentation on registerWebhookSecretRoutes call
All checks were successful
CI — P3 Alert / test (push) Successful in 23s
2026-03-02 05:11:37 +00:00
9f46b84257 Add API to manage webhook secrets for alert integrations
All checks were successful
CI — P3 Alert / test (push) Successful in 15s
2026-03-02 05:00:40 +00:00
81d03c1735 Fix tenant slug collision: append random hex suffix to prevent 23505 on duplicate tenant names
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P2 Drift (Go + Node) / agent (push) Successful in 1m6s
CI — P3 Alert / test (push) Successful in 37s
CI — P5 Cost / test (push) Successful in 29s
CI — P4 Portal / test (push) Successful in 48s
CI — P6 Run / saas (push) Successful in 25s
2026-03-01 22:36:21 +00:00
27a89ee2b7 Trigger CI with tsc fix
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Failing after 3s
CI — P2 Drift (Go + Node) / saas (push) Successful in 29s
CI — P3 Alert / test (push) Successful in 40s
CI — P4 Portal / test (push) Successful in 32s
CI — P6 Run / saas (push) Successful in 30s
CI — P5 Cost / test (push) Successful in 46s
2026-03-01 06:56:00 +00:00
bfc599da52 Trigger CI after workflow rewrite
All checks were successful
CI — P3 Alert / test (push) Successful in 1m9s
2026-03-01 06:47:59 +00:00
68140881e0 Trigger CI for P3-P6 Node products
Some checks failed
CI — P3 Alert / test (push) Failing after 15s
CI — P4 Portal / test (push) Failing after 19s
CI — P5 Cost / test (push) Failing after 17s
CI — P6 Run / saas (push) Failing after 18s
2026-03-01 06:43:58 +00:00
4534f0aeba Fix test failures: HMAC length check (P3), fast-check fround (P5)
Some checks failed
CI — P3 Alert / test (push) Failing after 15s
CI — P5 Cost / test (push) Failing after 15s
- P3: timingSafeEqual requires equal-length buffers; add length guard before compare
- P5: fast-check fc.float requires 32-bit floats; wrap min with Math.fround()
- All 5 Node products: 83 tests passing across 13 test files
2026-03-01 06:24:46 +00:00
4146f1c4d0 Fix TypeScript compilation errors across P3-P6
- jwt.sign: explicit SignOptions cast for expiresIn (all 4 products)
- ioredis: use named import { Redis } instead of default (P4, P6)
- P4 catalog/service: fix import paths for aws-scanner and github-scanner
- P4 discovery: pass pool to ScheduledDiscovery constructor
- P6 agent-bridge: add explicit types for Redis message callback params
- All 4 Node products now compile cleanly with tsc --noEmit
2026-03-01 06:06:31 +00:00
e1b22e5309 Wire up remaining TODO stubs: P3 test notifications, P2 drift notification trigger
- P3: test notification endpoint now instantiates real Slack/Email/Webhook notifiers
- P2: drift processor triggers notification service when drift_score > 0 (non-fatal on failure)
2026-03-01 04:14:26 +00:00
5ee869b9d8 Implement auth: login/signup (scrypt), API key generation, shared migration
- Login: email + password lookup, scrypt verify, JWT token
- Signup: create tenant + owner user in transaction, slug generation
- API key: dd0c_ prefix, SHA-256 hash (not bcrypt — faster for API key lookups), prefix index
- Scrypt over bcrypt: zero native deps, Node.js built-in crypto
- Auth routes skip JWT middleware (login/signup are public)
- 002_auth.sql: users + api_keys tables with RLS, copied to all products
- Synced auth middleware to P3/P4/P5/P6
2026-03-01 03:19:18 +00:00
bdaa732ce1 Implement TODO stubs: webhook secret lookup, alert→incident wiring, catalog upsert/stage
- P3: getWebhookSecret() now queries DB; ingestAlert() creates/attaches incidents, auto-resolves on resolved status
- P4: stageUpdates() writes to staged_updates table; upsertService() with ON CONFLICT; getService/updateOwner implemented
2026-03-01 03:18:05 +00:00
829e408e1e Add notification dispatchers (P3 Slack/Email/Webhook, P5 Slack), full YAML parser for P6
- P3 alert: NotificationDispatcher with Slack Block Kit, Resend email, generic webhook; severity-gated dispatch
- P5 cost: CostSlackNotifier with anomaly Block Kit (score, deviation, snooze/expected buttons)
- P6 run: Full YAML runbook parser with serde_yaml, variable substitution ({{var}}), failure actions, 7 tests
- P6 parser: validates non-empty steps, default timeout (300s), default abort on failure
2026-03-01 03:13:06 +00:00
f2e0a32cc7 Wire auth middleware into all products, add docker-compose and init-db script
- Auth middleware (JWT + API key + RBAC) copied into P3/P4/P5/P6
- All server entry points now register auth hooks + auth routes
- Webhook and Slack endpoints skip JWT auth (use HMAC/signature)
- docker-compose.yml: shared Postgres + Redis + Meilisearch, all 4 Node products as services
- init-db.sh: creates per-product databases and runs migrations
- P1 (Rust) and P2 (Go agent) run standalone, not in compose
2026-03-01 03:10:35 +00:00
d85cdaa3e7 Flesh out dd0c/alert: webhook routes, incident API, notification config, data layer
- Webhook routes: Datadog, PagerDuty, OpsGenie, Grafana with per-tenant HMAC/token auth
- Incident API: list (filtered), detail with alerts, acknowledge/resolve/suppress, dashboard summary
- Notification config: CRUD with upsert, test endpoint, Slack/email/webhook channels
- Grafana normalizer: severity mapping (critical/warning/info)
- Data layer: withTenant() RLS wrapper, Zod config validation
- Fastify server entry point with cors/helmet
2026-03-01 03:04:57 +00:00
ccc4cd1c32 Scaffold dd0c/alert: ingestion, correlation engine, HMAC validation, tests
- Webhook ingestion: HMAC validation for Datadog/PagerDuty/OpsGenie with 5-min timestamp freshness
- Payload normalizers: canonical alert schema with severity mapping per provider
- Correlation engine: time-window grouping, late-alert attachment (2x window), FakeClock for testing
- InMemoryWindowStore for unit tests
- Tests: 12 HMAC validation cases, 5 normalizer cases, 7 correlation engine cases
- PostgreSQL schema with RLS: tenants, incidents, alerts, webhook_secrets, notification_configs
- Free tier enforcement columns (alert_count_month, reset_at)
- Fly.io config, Dockerfile, Gitea Actions CI
2026-03-01 02:49:14 +00:00