Commit Graph

140 Commits

Author SHA1 Message Date
Max
f133ca8ff6 feat(drift): add normalizer, chunk assembly, daily digest, Slack interactions, analytics
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 49s
CI — P2 Drift (Go + Node) / saas (push) Successful in 29s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 48s
- Canonical schema normalizer: cross-provider resource type mapping
- Chunked report reassembly via Redis (10min TTL, out-of-order safe)
- Daily drift digest worker with Slack Block Kit summary
- Slack interactive handler: remediate + accept drift actions
- Analytics API: drift trends and health summary
- 005_drift_features.sql migration (remediations, acceptances, indexes)
2026-03-03 06:56:44 +00:00
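The out-of-order-safe chunk reassembly described in this commit can be sketched as follows. This is a minimal illustration only: an in-memory Map stands in for Redis, and the chunk fields (reportId, index, total, data) and the addChunk helper are hypothetical names, not taken from the repository.

```typescript
// Hypothetical chunk shape; the real wire format is not shown in the commit.
interface ReportChunk {
  reportId: string;
  index: number; // zero-based position of this chunk
  total: number; // total number of chunks in the report
  data: string;
}

const TTL_MS = 10 * 60 * 1000; // 10-minute TTL, as stated in the commit

interface PendingReport {
  parts: Map<number, string>;
  expiresAt: number;
}

// In-memory stand-in for Redis: reportId -> chunks received so far.
const pending = new Map<string, PendingReport>();

// Stores a chunk and returns the full report once every chunk has arrived,
// or null while chunks are still missing. Arrival order does not matter.
function addChunk(chunk: ReportChunk, now: number = Date.now()): string | null {
  if (chunk.index < 0 || chunk.index >= chunk.total) {
    throw new RangeError("chunk index out of range");
  }
  let entry = pending.get(chunk.reportId);
  if (!entry || entry.expiresAt <= now) {
    // First chunk, or the earlier chunks expired: start a fresh entry.
    entry = { parts: new Map<number, string>(), expiresAt: now + TTL_MS };
    pending.set(chunk.reportId, entry);
  }
  entry.parts.set(chunk.index, chunk.data);
  if (entry.parts.size < chunk.total) return null;

  // All chunks present: join them by index, not by arrival order.
  let report = "";
  for (let i = 0; i < chunk.total; i++) {
    report += entry.parts.get(i) ?? "";
  }
  pending.delete(chunk.reportId);
  return report;
}
```

Keying the parts by index is what makes the sketch order-independent; the expiry check mimics a TTL by discarding stale partial reports when the next chunk arrives.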
Max
ef3d00f124 feat(run): add runbook parser, safety classifier, audit hash chain, trust levels
Some checks failed
CI — P6 Run / saas (push) Successful in 29s
CI — P6 Run / build-push (push) Failing after 50s
- Multi-format parser: YAML, Markdown, Confluence HTML
- Deterministic safety scanner: destructive commands, privilege escalation, network changes
- Immutable audit trail with SHA-256 hash chain + verification endpoint
- Trust level enforcement: sandbox/restricted/standard/elevated
- 004_classifier_audit.sql migration
2026-03-03 06:48:56 +00:00
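A SHA-256 hash chain with a verification step, as named in this commit, might look roughly like the sketch below. The AuditEntry shape, the all-zero genesis value, and the function names are assumptions for illustration; the repository's actual schema is not shown in the log.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape: each entry commits to the hash of the one before it.
interface AuditEntry {
  payload: string;
  prevHash: string;
  hash: string;
}

// Assumed starting value for the first entry's prevHash.
const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, payload: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(payload).digest("hex");
}

// Returns a new chain with the payload appended; existing entries are not modified.
function appendEntry(chain: AuditEntry[], payload: string): AuditEntry[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...chain, { payload, prevHash, hash: entryHash(prevHash, payload) }];
}

// Recomputes every hash from the start; any edited or reordered entry breaks the chain.
function verifyChain(chain: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== prev) return false;
    if (entry.hash !== entryHash(prev, entry.payload)) return false;
    prev = entry.hash;
  }
  return true;
}
```

Because each hash covers the previous hash, changing one payload invalidates that entry and every entry after it, which is what a verification endpoint can detect.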
Max
093890503c feat(alert): add analytics, PagerDuty escalation, Slack interactions, daily noise report
Some checks failed
CI — P6 Run / saas (push) Successful in 48s
CI — P6 Run / build-push (push) Failing after 48s
- Analytics API: MTTR by severity, noise reduction stats, incident trends
- PagerDuty auto-escalation for unacknowledged critical incidents
- Slack interactive handler: acknowledge, resolve, mark noise/helpful
- Daily noise report worker with Slack summary
- 005_analytics.sql migration (resolved_at, time-series indexes)
2026-03-03 06:41:25 +00:00
Max
f1f4dee7ab feat(cost): add zombie hunter, Slack interactions, composite scoring
Some checks failed
CI — P3 Alert / test (push) Successful in 28s
CI — P5 Cost / test (push) Successful in 42s
CI — P6 Run / saas (push) Successful in 41s
CI — P6 Run / build-push (push) Has been cancelled
CI — P3 Alert / build-push (push) Failing after 53s
CI — P5 Cost / build-push (push) Failing after 5s
- Zombie resource hunter: detects idle EC2/RDS/EBS/EIP/NAT resources
- Slack interactive handler: acknowledge, snooze, create-ticket actions
- Composite anomaly scorer: Z-Score + rate-of-change + pattern + novelty
- Cold-start fast path for new resources (<7 days data)
- 005_zombies.sql migration
2026-03-03 06:39:20 +00:00
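As a rough illustration of composite scoring, the sketch below combines only two of the four signals named in this commit (Z-score and rate of change). The equal weights, the function names, and the cold-start behaviour (falling back to rate of change alone when fewer than 7 daily points exist) are assumptions; the commit does not say how the real scorer weighs its inputs or what the fast path does.

```typescript
const MIN_HISTORY_DAYS = 7; // cold-start threshold from the commit message
const Z_WEIGHT = 0.5; // hypothetical weight
const ROC_WEIGHT = 0.5; // hypothetical weight

// Standard score of `value` against the history (population standard deviation).
function zScore(history: number[], value: number): number {
  if (history.length === 0) return 0;
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance = history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / history.length;
  const stdDev = Math.sqrt(variance);
  return stdDev === 0 ? 0 : (value - mean) / stdDev;
}

// Relative change from the previous data point.
function rateOfChange(previous: number, value: number): number {
  return previous === 0 ? 0 : (value - previous) / Math.abs(previous);
}

function anomalyScore(history: number[], value: number): number {
  const last = history[history.length - 1];
  const previous = last === undefined ? value : last;
  const roc = Math.abs(rateOfChange(previous, value));
  // Assumed cold-start path: too little history for a meaningful Z-score.
  if (history.length < MIN_HISTORY_DAYS) return roc;
  return Z_WEIGHT * Math.abs(zScore(history, value)) + ROC_WEIGHT * roc;
}
```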
Max
cfe269a031 feat(portal): add CloudFormation/APIGateway scanners, analytics endpoints, search caching
Some checks failed
CI — P4 Portal / test (push) Failing after 32s
CI — P4 Portal / build-push (push) Has been skipped
- CloudFormation scanner: discovers stacks and maps resources to services
- API Gateway scanner: discovers REST/HTTP APIs and routes
- Analytics API: ownership coverage, health scorecards, tech debt indicators
- Redis prefix cache for Cmd+K search (60s TTL)
- 005_analytics.sql migration for aggregation helpers
2026-03-03 06:36:24 +00:00
Max
47a64d53fd fix: align backend API routes with console frontend contract
Some checks failed
CI — P3 Alert / test (push) Successful in 34s
CI — P4 Portal / test (push) Successful in 37s
CI — P5 Cost / test (push) Successful in 35s
CI — P6 Run / saas (push) Successful in 33s
CI — P5 Cost / build-push (push) Failing after 5s
CI — P6 Run / build-push (push) Failing after 4s
CI — P2 Drift (Go + Node) / agent (push) Successful in 1m5s
CI — P2 Drift (Go + Node) / saas (push) Successful in 37s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 16s
CI — P3 Alert / build-push (push) Failing after 14s
CI — P4 Portal / build-push (push) Failing after 27s
2026-03-03 06:09:41 +00:00
Protocol dd0c Agent
76715d169e fix: RLS auth bypass for signup/login flows
Some checks failed
CI — P2 Drift (Go + Node) / saas (push) Successful in 26s
CI — P3 Alert / test (push) Successful in 23s
CI — P6 Run / build-push (push) Failing after 15s
CI — P2 Drift (Go + Node) / agent (push) Successful in 38s
CI — P4 Portal / test (push) Successful in 34s
CI — P5 Cost / test (push) Successful in 35s
CI — P6 Run / saas (push) Successful in 33s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 50s
CI — P3 Alert / build-push (push) Failing after 5s
CI — P4 Portal / build-push (push) Failing after 51s
CI — P5 Cost / build-push (push) Failing after 15s
- Add set_config('app.tenant_id') before user INSERT in signup tx
- Add 004_auth_rls_fix.sql: permissive SELECT on users/api_keys for
  auth lookups, INSERT on users with tenant context check
- db-setup now runs migrations on every up (idempotent)
2026-03-03 05:38:25 +00:00
Protocol dd0c Agent
1d068c3f75 fix: add maxmem to scrypt params (128MB)
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 38s
CI — P2 Drift (Go + Node) / saas (push) Successful in 25s
CI — P3 Alert / test (push) Successful in 25s
CI — P4 Portal / test (push) Successful in 32s
CI — P5 Cost / test (push) Successful in 35s
CI — P6 Run / saas (push) Successful in 32s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 16s
CI — P3 Alert / build-push (push) Failing after 15s
CI — P4 Portal / build-push (push) Failing after 40s
CI — P5 Cost / build-push (push) Failing after 41s
CI — P6 Run / build-push (push) Failing after 42s
Node's crypto.scrypt defaults to a 32 MiB memory limit (maxmem), but
N=65536/r=8/p=1 needs ~64 MiB (128 × N × r bytes). Adds
maxmem: 128*1024*1024 to all 5 services' hash and verify functions.
2026-03-03 05:11:37 +00:00
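The fix in this commit can be illustrated with Node's built-in scryptSync. The parameters N, r, p and maxmem come from the commit message; the salt length, key length, "salt:key" storage format, and function names are assumptions made for this sketch.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// N/r/p and maxmem as stated in the commit. Without maxmem, Node's default
// limit is too small for these parameters and scrypt throws.
const SCRYPT_PARAMS = { N: 65536, r: 8, p: 1, maxmem: 128 * 1024 * 1024 };
const KEY_LEN = 64; // assumed derived-key length

// Returns "saltHex:keyHex" (assumed storage format).
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, KEY_LEN, SCRYPT_PARAMS);
  return `${salt.toString("hex")}:${key.toString("hex")}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, keyHex] = stored.split(":");
  if (!saltHex || !keyHex) return false;
  const expected = Buffer.from(keyHex, "hex");
  if (expected.length !== KEY_LEN) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_LEN, SCRYPT_PARAMS);
  return timingSafeEqual(actual, expected);
}
```

The same options object is passed to both the hash and the verify path, which matches the commit's note that both functions needed the change.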
Protocol dd0c Agent
e16f322869 add db-setup service: idempotent per-service Postgres user bootstrap
Runs on every 'docker compose up -d' before product services start.
Creates dd0c_{drift,alert,portal,cost,run} users with least-privilege
grants if they don't exist. Fixes auth failures on existing PG volumes
that predate the security hardening.
2026-03-03 05:04:42 +00:00
0bf91e07eb Fix console nginx: use variable upstreams for resilient DNS resolution
nginx crashes at startup if upstream hosts aren't resolvable yet.
Using 'set $upstream' + Docker's internal resolver (127.0.0.11)
defers DNS resolution to request time, so console starts even if
backends are still booting.
2026-03-03 00:52:00 +00:00
322a8d6a91 Console nginx reverse proxy: route API calls to backend services
Console on :3010 now proxies all /api/v1/* requests to the correct
backend service via Docker Compose service names (drift, alert, portal,
cost, run). No CORS issues, no client-side port config needed.
2026-03-03 00:37:40 +00:00
5a1e287ab6 Add console + marketing site to Docker Compose and build-push
- Console: nginx SPA on port 3010, image reg.dd0c.net/dd0c-console
- Marketing: nginx static on port 3011, image reg.dd0c.net/dd0c-marketing
- Dockerfiles + .dockerignore for both
- build-push.sh updated to include console + marketing targets
2026-03-03 00:36:48 +00:00
5792f95d7c Fix BMad adversarial security review findings
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 47s
CI — P2 Drift (Go + Node) / saas (push) Successful in 36s
CI — P3 Alert / test (push) Successful in 36s
CI — P4 Portal / build-push (push) Failing after 49s
CI — P5 Cost / build-push (push) Failing after 4s
CI — P6 Run / build-push (push) Failing after 4s
CI — P4 Portal / test (push) Successful in 35s
CI — P5 Cost / test (push) Successful in 40s
CI — P6 Run / saas (push) Successful in 36s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 17s
CI — P3 Alert / build-push (push) Failing after 15s
Resolves 11 of the 13 findings:
- [CRITICAL] SQLi in RLS: replaced SET LOCAL with parameterized set_config()
- [CRITICAL] Rate Limiting: installed and registered @fastify/rate-limit in all 5 apps
- [CRITICAL] Invite Hijacking: added email verification check to invite lookup
- [HIGH] Webhook HMAC: added Fastify rawBody parser to fix JSON.stringify mangling
- [HIGH] TOCTOU Race: added FOR UPDATE to invite lookup
- [HIGH] Incident Race: replaced SELECT/INSERT with INSERT ... ON CONFLICT
- [MEDIUM] Grafana Timing Attack: replaced === with crypto.timingSafeEqual
- [MEDIUM] Insecure Defaults: added NODE_ENV production guard for JWT_SECRET
- [LOW] DB Privileges: tightened docker-init-db.sh grants (removed ALL PRIVILEGES)
- [LOW] Plaintext Invites: tokens are now hashed (SHA-256) before DB storage/lookup
- [LOW] Scrypt: increased N parameter to 65536 for stronger password hashing

Note:
- Finding #4 (Fragmented Identity) requires a unified auth database architecture.
- Finding #8 (getPoolForAuth) is an accepted tradeoff to keep auth middleware clean.
2026-03-03 00:14:39 +00:00
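Two of the findings above (HMAC over the raw body, and constant-time comparison) can be shown together in one small sketch. It assumes a hex-encoded HMAC-SHA256 signature and a hypothetical verifyWebhookSignature helper; the actual header names and signature format used by the services are not given in the commit.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a signature against the exact bytes that were received.
// Re-serializing the parsed JSON can change whitespace and so change the HMAC.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

For example, a body sent as `{"b": 1,  "a": 2}` round-trips through JSON.parse and JSON.stringify as `{"b":1,"a":2}`, so a signature computed by the sender over the original text no longer matches the re-serialized form.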
eb953cdea5 Security hardening: auth encapsulation, pool restriction, rate limiting, invites, async webhooks
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 43s
CI — P2 Drift (Go + Node) / saas (push) Failing after 5s
CI — P3 Alert / test (push) Failing after 4s
CI — P4 Portal / test (push) Failing after 4s
CI — P5 Cost / test (push) Failing after 4s
CI — P6 Run / saas (push) Failing after 5s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 7s
CI — P3 Alert / build-push (push) Has been skipped
CI — P4 Portal / build-push (push) Has been skipped
CI — P5 Cost / build-push (push) Has been skipped
CI — P6 Run / build-push (push) Failing after 5s
Phase 1 (Security Critical):
- Auth plugin encapsulation: replaced global addHook with Fastify plugin scope
- Removed startsWith URL matching; public routes registered outside auth scope
- JWT verify now enforces algorithms: ['HS256'] (prevents algorithm confusion)
- Raw pool no longer exported from db.ts; systemQuery() + getPoolForAuth() instead
- withTenant() remains primary tenant-scoped query path

Phase 2 (Infrastructure):
- docker-compose.yml: all secrets via env var substitution (${VAR:-default})
- Per-service Postgres users (dd0c_drift, dd0c_alert, etc.) in docker-init-db.sh
- .env.example with all configurable secrets
- build-push.sh uses $REGISTRY_PASSWORD instead of a hardcoded password
- .gitignore excludes .env files
- @fastify/rate-limit: 100 req/min global, 5/min login, 3/min signup
- CORS_ORIGIN default changed from '*' to 'http://localhost:5173'

Phase 3 (Product):
- Team invite flow: tenant_invites table, POST /invite, GET /invites, DELETE /invites/:id
- Signup accepts optional invite_token to join existing tenant
- Async webhook ingestion (P3): LPUSH to Redis, BRPOP worker, dead-letter queue

Console:
- All 5 product modules wired: drift, alert, portal, cost, run
- PageHeader accepts children prop
- 71 modules, 70KB gzipped production build

All 6 projects compile clean (tsc --noEmit).
2026-03-02 23:53:55 +00:00
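The LPUSH/BRPOP ingestion pattern with a dead-letter queue from Phase 3 can be sketched without Redis. Here an array stands in for the Redis list, and the class name, retry limit, and job shape are hypothetical; a real worker would block on BRPOP rather than poll.

```typescript
interface Job {
  payload: string;
  attempts: number;
}

const MAX_ATTEMPTS = 3; // hypothetical retry limit

class WebhookQueue {
  private queue: Job[] = [];
  readonly deadLetter: Job[] = [];

  // LPUSH equivalent: new jobs go to the head of the list.
  enqueue(payload: string): void {
    this.queue.unshift({ payload, attempts: 0 });
  }

  // RPOP equivalent: take the oldest job from the tail and process it.
  // Returns false when the queue is empty.
  processNext(handler: (payload: string) => void): boolean {
    const job = this.queue.pop();
    if (!job) return false;
    try {
      handler(job.payload);
    } catch {
      job.attempts += 1;
      if (job.attempts >= MAX_ATTEMPTS) {
        this.deadLetter.push(job); // give up and keep the job for inspection
      } else {
        this.queue.unshift(job); // retry later
      }
    }
    return true;
  }
}
```

Pushing at the head and popping from the tail gives first-in, first-out ordering, so the HTTP handler can acknowledge a webhook immediately and leave processing to the worker.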
be3f37cfdd Fix CRITICAL auth bypass: exact match for login/signup paths
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 45s
CI — P2 Drift (Go + Node) / saas (push) Successful in 28s
CI — P3 Alert / test (push) Successful in 24s
CI — P4 Portal / test (push) Successful in 27s
CI — P5 Cost / test (push) Successful in 26s
CI — P6 Run / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 46s
CI — P3 Alert / build-push (push) Failing after 38s
CI — P4 Portal / build-push (push) Failing after 50s
CI — P5 Cost / build-push (push) Failing after 22s
CI — P6 Run / build-push (push) Failing after 1m3s
startsWith('/api/v1/auth/login') allowed any path with that prefix
to bypass authentication (e.g. /api/v1/auth/login-anything).
Changed to exact path match with query string stripping.
Fixed across all 5 products + shared/auth.ts.
2026-03-02 20:35:28 +00:00
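The difference between the prefix check and the exact match described in this commit can be sketched as below. The isPublicPath name and the signup path are assumptions for illustration; only /api/v1/auth/login appears in the commit message.

```typescript
// Paths that skip authentication. The signup path is assumed by analogy.
const PUBLIC_PATHS = new Set<string>(["/api/v1/auth/login", "/api/v1/auth/signup"]);

// Strips the query string, then requires an exact match.
// A startsWith check would also accept "/api/v1/auth/login-anything".
function isPublicPath(url: string): boolean {
  const queryStart = url.indexOf("?");
  const path = queryStart === -1 ? url : url.slice(0, queryStart);
  return PUBLIC_PATHS.has(path);
}
```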
dac6376fb2 Add dd0c Console — modular React dashboard with drift module
- Vite + React + TypeScript + Tailwind CSS
- Shell: auth provider, entitlement gate, dynamic sidebar
- Shared components: Button, Card, Table, Badge, Modal, EmptyState, PageHeader
- Drift module: dashboard, detail view, report viewer
- Module manifest pattern for pluggable product UIs
- Dockerfile: multi-stage node:22-slim → nginx:alpine
- 189KB JS + 17KB CSS (65KB gzipped)
2026-03-02 20:30:33 +00:00
3be37d1293 Skip auth on /version endpoint (same as /health)
Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 9s
CI — P3 Alert / test (push) Successful in 21s
CI — P2 Drift (Go + Node) / saas (push) Successful in 35s
CI — P4 Portal / test (push) Successful in 25s
CI — P5 Cost / test (push) Successful in 37s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 3s
CI — P6 Run / saas (push) Successful in 22s
CI — P3 Alert / build-push (push) Failing after 3s
CI — P4 Portal / build-push (push) Failing after 2s
CI — P5 Cost / build-push (push) Failing after 2s
CI — P6 Run / build-push (push) Failing after 3s
2026-03-02 13:54:46 +00:00
5bad2481ae Add /version endpoint to all products + BUILD_SHA/BUILD_TIME in Dockerfiles
Some checks failed
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 4s
CI — P3 Alert / build-push (push) Failing after 3s
CI — P6 Run / saas (push) Successful in 23s
CI — P4 Portal / build-push (push) Failing after 2s
CI — P2 Drift (Go + Node) / agent (push) Successful in 17s
CI — P3 Alert / test (push) Successful in 21s
CI — P5 Cost / test (push) Successful in 24s
CI — P4 Portal / test (push) Successful in 38s
CI — P5 Cost / build-push (push) Failing after 3s
CI — P6 Run / build-push (push) Failing after 2s
2026-03-02 13:53:15 +00:00
c4ec43cb76 Add CI build-push jobs targeting reg.dd0c.net with docker login + deploy
Some checks failed
CI — P2 Drift (Go + Node) / saas (push) Successful in 26s
CI — P2 Drift (Go + Node) / agent (push) Successful in 53s
CI — P3 Alert / test (push) Successful in 26s
CI — P5 Cost / test (push) Successful in 22s
CI — P4 Portal / test (push) Successful in 38s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 4s
CI — P3 Alert / build-push (push) Failing after 2s
CI — P6 Run / saas (push) Successful in 22s
CI — P5 Cost / build-push (push) Failing after 2s
CI — P4 Portal / build-push (push) Failing after 3s
CI — P6 Run / build-push (push) Failing after 2s
2026-03-02 13:48:10 +00:00
6b045637be Switch registry to reg.dd0c.net (HTTPS on 443)
2026-03-02 13:31:11 +00:00
09eb22af62 Add comprehensive API reference doc for all 6 products
2026-03-02 05:42:27 +00:00
6181727406 Add docker login to build-push.sh and watch-loop.sh for registry auth (HTTP secret)
2026-03-02 05:40:02 +00:00
18d476f7a0 Target NAS runner (ubuntu-24.04) for build-push jobs; sandbox lacks Docker
Some checks failed
CI — P2 Drift (Go + Node) / saas (push) Successful in 24s
CI — P2 Drift (Go + Node) / agent (push) Successful in 53s
CI — P3 Alert / test (push) Successful in 27s
CI — P5 Cost / test (push) Successful in 23s
CI — P4 Portal / test (push) Successful in 37s
CI — P6 Run / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 17s
CI — P3 Alert / build-push (push) Failing after 17s
CI — P5 Cost / build-push (push) Failing after 11s
CI — P4 Portal / build-push (push) Failing after 14s
CI — P6 Run / build-push (push) Failing after 13s
2026-03-02 05:32:04 +00:00
2df0ce2fff Trigger CI build+push to populate registry at 192.168.86.11:30095
Some checks failed
CI — P4 Portal / test (push) Successful in 36s
CI — P6 Run / saas (push) Successful in 22s
CI — P3 Alert / build-push (push) Failing after 1s
CI — P5 Cost / build-push (push) Failing after 0s
CI — P6 Run / build-push (push) Failing after 0s
CI — P2 Drift (Go + Node) / saas (push) Successful in 27s
CI — P2 Drift (Go + Node) / agent (push) Successful in 52s
CI — P3 Alert / test (push) Successful in 26s
CI — P5 Cost / test (push) Successful in 24s
CI — P4 Portal / build-push (push) Failing after 0s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 41s
2026-03-02 05:29:03 +00:00
6b79d3cbc9 Switch to Brian's registry at 192.168.86.11:30095, add CI build+push+deploy jobs
- All services pull from 192.168.86.11:30095 instead of localhost:5000
- Removed self-hosted registry container (Brian runs his own)
- CI workflows: test → build → push to registry → deploy
- build-push.sh and watch-loop.sh updated with new registry address
2026-03-02 05:28:35 +00:00
1ea42bbb87 Restore build: directives alongside image: tags; allows both local build and registry pull
2026-03-02 05:21:33 +00:00
17ec444dcd Update README: local registry workflow, smart watch-loop, test commands, fix cost port
2026-03-02 05:18:49 +00:00
21bf967e76 Add smart watch-loop.sh: detects changed products, builds only what changed, pushes to registry
2026-03-02 05:16:58 +00:00
41e016e9a6 Add local Docker registry: registry:2 on :5000, build-push.sh, CI auto-deploy
- docker-compose services now pull from localhost:5000 instead of building locally
- build-push.sh builds + pushes all 5 Node images to local registry
- CI workflows get build-push job: test → build → push → deploy
- Deploy becomes: docker compose pull && docker compose up -d
- Eliminates silent git pull + stale Docker cache issues
2026-03-02 05:15:37 +00:00
2c9408b1df Fix indentation on registerWebhookSecretRoutes call
All checks were successful
CI — P3 Alert / test (push) Successful in 23s
2026-03-02 05:11:37 +00:00
571a93953d Fix integration test ordering: move new tests before exit, add runbook execution flow
2026-03-02 05:07:59 +00:00
aea88065b4 Extend integration tests: webhook secrets CRUD + drift report submission/history/deletion
2026-03-02 05:04:11 +00:00
c537022fa8 Add drift report submission + stack deletion endpoints to P2
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / agent (push) Successful in 43s
2026-03-02 05:03:34 +00:00
9f46b84257 Add API to manage webhook secrets for alert integrations
All checks were successful
CI — P3 Alert / test (push) Successful in 15s
2026-03-02 05:00:40 +00:00
9d33c536d5 Add Fly.io deployment script with shared Postgres/Redis, custom domains, auto-stop
2026-03-02 04:57:50 +00:00
380a1b44d5 Fix integration-test.sh payloads to match correct Zod schemas and Fastify empty body requirements
2026-03-02 04:49:30 +00:00
4eda9d7be3 Add .dockerignore to all Node products (skip node_modules/dist/tests in build context)
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / agent (push) Successful in 52s
CI — P3 Alert / test (push) Successful in 29s
CI — P5 Cost / test (push) Successful in 23s
CI — P4 Portal / test (push) Successful in 36s
CI — P6 Run / saas (push) Successful in 21s
2026-03-02 04:45:57 +00:00
d175c3a6e7 Clean up drift: restore Dockerfile name, remove cache bust artifacts
All checks were successful
CI — P2 Drift (Go + Node) / agent (push) Successful in 14s
CI — P2 Drift (Go + Node) / saas (push) Successful in 28s
2026-03-02 04:45:12 +00:00
e0b84f5481 Fix drift SET LOCAL: use string interpolation with UUID validation (SET doesn't support params)
All checks were successful
CI — P2 Drift (Go + Node) / agent (push) Successful in 15s
CI — P2 Drift (Go + Node) / saas (push) Successful in 29s
2026-03-02 03:59:26 +00:00
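Because SET LOCAL cannot take bind parameters, this commit validates the tenant ID as a UUID before interpolating it. A minimal sketch of that guard follows; the tenantSetLocal name and the exact regular expression are assumptions, while the app.tenant_id setting name appears elsewhere in this log.

```typescript
// Accepts only the canonical 8-4-4-4-12 hexadecimal UUID form.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Builds the SET LOCAL statement only after the value is known to be a UUID,
// so quotes or SQL fragments can never reach the interpolated string.
function tenantSetLocal(tenantId: string): string {
  if (!UUID_RE.test(tenantId)) {
    throw new Error("invalid tenant id");
  }
  return `SET LOCAL app.tenant_id = '${tenantId}'`;
}
```

A later commit in this log (5792f95d7c) replaced this approach with a parameterized set_config() call, which removes the need for interpolation altogether.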
d55162a047 Force drift rebuild: explicit image tag dd0c-drift:v2
2026-03-02 00:21:12 +00:00
364e411e69 Nuclear cache bust: rename drift Dockerfile to Dockerfile.v2
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 25s
CI — P2 Drift (Go + Node) / agent (push) Successful in 42s
2026-03-02 00:14:43 +00:00
00aaf1a941 Force drift rebuild: add CACHE_BUST build arg to Dockerfile + docker-compose
All checks were successful
CI — P2 Drift (Go + Node) / agent (push) Successful in 10s
CI — P2 Drift (Go + Node) / saas (push) Successful in 27s
2026-03-01 23:06:19 +00:00
cbc9e01807 Cache bust: force drift image rebuild to pick up auth middleware
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 26s
CI — P2 Drift (Go + Node) / agent (push) Successful in 44s
2026-03-01 22:59:34 +00:00
0da2f77035 Fix smoke-test.sh: correct port parsing, webhook slug path, 404 as valid response, arithmetic safety
2026-03-01 22:40:29 +00:00
81d03c1735 Fix tenant slug collision: append random hex suffix to prevent 23505 on duplicate tenant names
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P2 Drift (Go + Node) / agent (push) Successful in 1m6s
CI — P3 Alert / test (push) Successful in 37s
CI — P5 Cost / test (push) Successful in 29s
CI — P4 Portal / test (push) Successful in 48s
CI — P6 Run / saas (push) Successful in 25s
2026-03-01 22:36:21 +00:00
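The slug fix in this commit (23505 is Postgres's unique_violation error code) can be sketched as follows. The slugification rules, the 3-byte suffix length, and the tenantSlug name are assumptions; the commit only states that a random hex suffix is appended.

```typescript
import { randomBytes } from "node:crypto";

// Lowercases the name, collapses non-alphanumerics to "-", and appends a
// random hex suffix so two tenants with the same name get different slugs.
function tenantSlug(name: string): string {
  const base =
    name
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "") || "tenant"; // fallback for names with no usable characters
  const suffix = randomBytes(3).toString("hex"); // 6 hex characters (assumed length)
  return `${base}-${suffix}`;
}
```

A random suffix makes collisions unlikely rather than impossible, so a unique constraint on the slug column is still worth keeping as the final safeguard.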
e0d3a3c043 Add auth middleware to P2 Drift (signup/login/API keys), remove pino-pretty dev transport
All checks were successful
CI — P2 Drift (Go + Node) / agent (push) Successful in 53s
CI — P2 Drift (Go + Node) / saas (push) Successful in 52s
2026-03-01 22:24:18 +00:00
23fda04854 Add waitlist modal to marketing site (all 7 pages): triggers on #waitlist links, Formspree-ready
2026-03-01 22:09:35 +00:00
580872f059 Fix docker-compose: add NODE_ENV=production to all services (drift crashes on pino-pretty in dev mode)
2026-03-01 20:41:47 +00:00
d6d8de16db Fix docker-compose: remap Postgres from :5432 to :5433 (5432 already in use on NAS)
2026-03-01 20:34:49 +00:00
362c94af33 Fix Node Dockerfiles: npm ci --include=dev so tsc is available in builder stage
All checks were successful
CI — P2 Drift (Go + Node) / saas (push) Successful in 34s
CI — P3 Alert / test (push) Successful in 38s
CI — P4 Portal / test (push) Successful in 38s
CI — P6 Run / saas (push) Successful in 39s
CI — P2 Drift (Go + Node) / agent (push) Successful in 1m15s
CI — P5 Cost / test (push) Successful in 1m7s
2026-03-01 19:31:44 +00:00