Add dual-mode deployment addendums for all 6 products

P1 route: 16 pts (template, full docker-compose + install script)
P2 drift: 17 pts (pgmq, local CA for mTLS)
P3 alert: 19 pts (Lambda→Fastify, DynamoDB→PG JSONB)
P4 portal: 18 pts (Step Functions→cron, Aurora→PG+pgvector)
P5 cost: 19 pts (EventBridge→agent/polling, DynamoDB→PG JSONB)
P6 run: 15 pts (easiest: already PG-native, no AWS deps in core)

Total self-hosted effort: ~104 story points across all 6 products
@@ -0,0 +1,61 @@
# dd0c/drift: Dual-Mode Deployment Addendum

**Template:** Based on dd0c/route dual-mode pattern (`01-llm-cost-router/architecture/dual-mode-addendum.md`)

---

## Cloud → Self-Hosted Service Mapping

| Cloud Service | Self-Hosted Replacement | Notes |
|---------------|-------------------------|-------|
| SQS FIFO | PostgreSQL pgmq | Agent pushes drift reports to pgmq instead of SQS |
| RDS PostgreSQL | PostgreSQL container | Same schema, same RLS |
| Cognito | Local JWT (HS256) | Same AuthProvider trait pattern |
| S3 (drift report archive) | MinIO or local FS | Configurable via ObjectStore trait |
| CloudWatch | Prometheus + Grafana | Bundled in compose |
| SES | SMTP relay | For email notifications |
| KMS | Local AES-256-GCM | Key file mounted as volume |

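The Cognito row above swaps hosted auth for locally signed HS256 JWTs. As a rough illustration of what that involves, here is a minimal standard-library sketch of HS256 signing and verification. The function names (`sign_hs256`, `verify_hs256`) and the `tenant_id` claim are hypothetical and not taken from the dd0c codebase; a real `LocalAuthProvider` would more likely wrap a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Return a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check algorithm, signature and optional `exp`; return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

In this sketch the signing secret would be generated once at install time and shared only by the services that issue or verify tokens. Rejecting any `alg` other than `HS256` is deliberate, since accepting whatever algorithm the header names is a common source of JWT bugs.
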
## Self-Hosted Compose Services

```yaml
services:
  agent-gateway:  # gRPC endpoint for agents (replaces SQS ingestion)
    image: ghcr.io/dd0c/drift-gateway:latest
  event-processor:  # Normalizes drift reports, scores severity
    image: ghcr.io/dd0c/drift-processor:latest
  api:  # Dashboard API
    image: ghcr.io/dd0c/drift-api:latest
  dashboard:  # React SPA
    image: ghcr.io/dd0c/drift-dashboard:latest
  postgres:  # Config + drift data (with RLS)
    image: postgres:16-alpine
  redis:  # mTLS cert cache, circuit breakers
    image: redis:7-alpine
  caddy:  # Reverse proxy + auto-TLS
    image: caddy:2-alpine
```

## Agent Changes

The Go agent already connects via gRPC, so it only needs a configurable endpoint:
- Cloud: `grpcs://ingest.drift.dd0c.dev`
- Self-hosted: `grpc://localhost:50051` (or the user's own domain with Caddy TLS)

mTLS certs: self-hosted mode uses a local CA (generated during install) instead of ACM.

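To make the configurable endpoint concrete, the sketch below parses the two URL forms listed above into a host, port and TLS flag. It is written in Python for brevity even though the agent is Go. The `GrpcEndpoint` type, the `parse_endpoint` name and the default ports (443 for `grpcs`, 50051 for `grpc`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class GrpcEndpoint:
    host: str
    port: int
    use_tls: bool


def parse_endpoint(url: str) -> GrpcEndpoint:
    """Turn a grpc:// or grpcs:// URL into connection settings."""
    parts = urlsplit(url)
    if parts.scheme not in ("grpc", "grpcs"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("endpoint is missing a host")
    use_tls = parts.scheme == "grpcs"
    # Assumed defaults: 443 when TLS is on, 50051 for plaintext.
    port = parts.port or (443 if use_tls else 50051)
    return GrpcEndpoint(parts.hostname, port, use_tls)
```

With this shape, the cloud and self-hosted modes differ only in the string passed in, so the rest of the agent would not need to know which mode it is running in.
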
## Epic Impact

| Epic | Change | Effort |
|------|--------|--------|
| 1 (Agent) | Add configurable gRPC endpoint | 1 pt |
| 2 (Communication) | Local CA for mTLS, pgmq instead of SQS | 3 pts |
| 3 (Event Processor) | Already PostgreSQL, no change | 0 |
| 4 (Notifications) | SMTP fallback | 1 pt |
| 5 (Remediation) | No change (agent-side) | 0 |
| 6 (Dashboard UI) | Local login form | 2 pts |
| 7 (Dashboard API) | LocalAuthProvider | 2 pts |
| 8 (Infrastructure) | docker-compose.yml + install.sh | 5 pts |
| 9 (Onboarding) | Local signup, remove Stripe req | 3 pts |
| 10 (TF Tenets) | No change | 0 |
| **Total** | | **17 pts** |
@@ -0,0 +1,77 @@
# dd0c/alert: Dual-Mode Deployment Addendum

**Template:** Based on dd0c/route dual-mode pattern

---

## Cloud → Self-Hosted Service Mapping

| Cloud Service | Self-Hosted Replacement | Notes |
|---------------|-------------------------|-------|
| API Gateway + Lambda | Fastify/Express container | Webhook ingestion endpoint |
| SQS | PostgreSQL pgmq | Between ingestion and correlation |
| ECS Fargate (Correlation) | Docker container | Same Node.js code |
| DynamoDB | PostgreSQL (JSONB) | Incidents stored as JSONB with GIN indexes |
| TimescaleDB on RDS | TimescaleDB container | Analytics unchanged |
| Cognito | Local JWT (HS256) | AuthProvider pattern |
| S3 (raw payload archive) | Local FS or MinIO | ObjectStore trait |
| SES | SMTP relay | Email notifications |
| EventBridge | Cron container | Scheduled tasks |

## Self-Hosted Compose Services

```yaml
services:
  ingestion:  # Webhook endpoint (replaces API Gateway + Lambda)
    image: ghcr.io/dd0c/alert-ingestion:latest
  correlation:  # Correlation engine (replaces ECS Fargate)
    image: ghcr.io/dd0c/alert-correlation:latest
  api:  # Dashboard API
    image: ghcr.io/dd0c/alert-api:latest
  dashboard:  # React SPA
    image: ghcr.io/dd0c/alert-dashboard:latest
  postgres:  # Incidents (JSONB) + config
    image: postgres:16-alpine
  timescaledb:  # Analytics
    image: timescale/timescaledb:latest-pg16
  redis:  # Sliding windows for correlation, circuit breakers
    image: redis:7-alpine
  caddy:  # Reverse proxy + auto-TLS
    image: caddy:2-alpine
```

## Key Difference: DynamoDB → PostgreSQL JSONB

The DynamoDB single-table design maps to a PostgreSQL table with a JSONB column and partition-like indexes:

```sql
CREATE TABLE incidents (
  id UUID PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  data JSONB NOT NULL,
  severity TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'open',
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_incidents_tenant ON incidents(tenant_id);
CREATE INDEX idx_incidents_data ON incidents USING GIN(data);
-- RLS policy: each session only sees rows for the tenant set in app.tenant_id
ALTER TABLE incidents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON incidents
  USING (tenant_id = current_setting('app.tenant_id'));
```

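One way to picture the mapping is as a function that splits a single-table item into the typed columns of `incidents` plus a JSONB payload. The sketch below assumes a key layout of `PK = "TENANT#<tenant>"` and `SK = "INCIDENT#<uuid>"`; that layout and the `item_to_row` name are hypothetical, since the addendum does not specify the DynamoDB key schema.

```python
import json
import uuid

# Attributes promoted to real columns; everything else stays in JSONB.
_COLUMN_KEYS = ("PK", "SK", "severity", "status")


def item_to_row(item: dict) -> dict:
    """Convert a single-table style item into an `incidents` row."""
    # Assumed key layout: PK = "TENANT#<tenant>", SK = "INCIDENT#<uuid>".
    tenant_id = item["PK"].split("#", 1)[1]
    incident_id = str(uuid.UUID(item["SK"].split("#", 1)[1]))
    data = {k: v for k, v in item.items() if k not in _COLUMN_KEYS}
    return {
        "id": incident_id,
        "tenant_id": tenant_id,
        "severity": item["severity"],
        # Mirrors the column default in the table definition.
        "status": item.get("status", "open"),
        "data": json.dumps(data, sort_keys=True),
    }
```

Promoting `tenant_id`, `severity` and `status` to columns lets the B-tree index and the RLS policy work on plain values, while the GIN index covers ad hoc lookups inside `data`.
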
## Epic Impact

| Epic | Change | Effort |
|------|--------|--------|
| 1 (Webhook Ingestion) | Lambda → Fastify container | 3 pts |
| 2 (Normalization) | No change (pure logic) | 0 |
| 3 (Correlation) | pgmq instead of SQS, same Redis | 2 pts |
| 4 (Notifications) | SMTP fallback | 1 pt |
| 5 (Slack Bot) | No change | 0 |
| 6 (Dashboard API) | LocalAuthProvider, DynamoDB → PG | 3 pts |
| 7 (Dashboard UI) | Local login form | 2 pts |
| 8 (Infrastructure) | docker-compose.yml + install.sh | 5 pts |
| 9 (Onboarding) | Local signup, remove Stripe req | 3 pts |
| 10 (TF Tenets) | No change | 0 |
| **Total** | | **19 pts** |
@@ -0,0 +1,65 @@
# dd0c/portal: Dual-Mode Deployment Addendum

**Template:** Based on dd0c/route dual-mode pattern

---

## Cloud → Self-Hosted Service Mapping

| Cloud Service | Self-Hosted Replacement | Notes |
|---------------|-------------------------|-------|
| Aurora Serverless v2 | PostgreSQL container | Same schema, pgvector extension |
| Step Functions | Temporal or simple cron | Discovery orchestration |
| Lambda (scanners) | Scanner containers | Same code, containerized |
| Cognito | Local JWT (HS256) | AuthProvider pattern |
| Meilisearch (managed) | Meilisearch container | Already self-hostable |
| S3 | Local FS or MinIO | Discovery artifacts |
| SES | SMTP relay | Notifications |
| CloudWatch | Prometheus + Grafana | Bundled |

## Self-Hosted Compose Services

```yaml
services:
  api:  # Dashboard API + discovery orchestrator
    image: ghcr.io/dd0c/portal-api:latest
  scanner-aws:  # AWS discovery scanner
    image: ghcr.io/dd0c/portal-scanner-aws:latest
  scanner-github:  # GitHub discovery scanner
    image: ghcr.io/dd0c/portal-scanner-github:latest
  dashboard:  # React SPA with Cmd+K search
    image: ghcr.io/dd0c/portal-dashboard:latest
  postgres:  # Catalog + pgvector
    image: pgvector/pgvector:pg16
  meilisearch:  # Full-text search
    image: getmeili/meilisearch:latest
    volumes:
      - meili_data:/meili_data
  redis:  # Prefix cache for Cmd+K
    image: redis:7-alpine
  caddy:
    image: caddy:2-alpine

# Named volumes must be declared at the top level
volumes:
  meili_data:
```

## Key Difference: Step Functions → Cron

Self-hosted mode replaces Step Functions with a simple cron scheduler inside the API container:
- AWS scan: every 6 hours
- GitHub scan: every 4 hours
- Scans run as background tasks, with progress streamed via WebSocket (same as cloud)

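The schedule above can be sketched as a small interval table plus a check for which scanners are due. This is only an illustration in Python: the names `SCAN_INTERVALS`, `next_run` and `due_scanners` are hypothetical, and the real scheduler inside the API container may well use cron expressions instead of fixed intervals.

```python
from datetime import datetime, timedelta, timezone

# Intervals taken from the list above.
SCAN_INTERVALS = {
    "aws": timedelta(hours=6),
    "github": timedelta(hours=4),
}


def next_run(scanner: str, last_run: datetime) -> datetime:
    """Return when the given scanner should run next."""
    return last_run + SCAN_INTERVALS[scanner]


def due_scanners(last_runs: dict, now: datetime) -> list:
    """Return the names of scanners whose interval has elapsed, sorted."""
    return sorted(
        name for name, last in last_runs.items() if next_run(name, last) <= now
    )
```

A background loop in the API process could call `due_scanners` once a minute and start each returned scan as a background task, recording the new `last_run` when the scan begins.
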
## Epic Impact

| Epic | Change | Effort |
|------|--------|--------|
| 1 (AWS Scanner) | Lambda → container, Step Functions → cron | 3 pts |
| 2 (GitHub Scanner) | Lambda → container | 2 pts |
| 3 (Service Catalog) | Aurora → PostgreSQL container (same schema) | 1 pt |
| 4 (Search) | Already Meilisearch, no change | 0 |
| 5 (Dashboard UI) | Local login form | 2 pts |
| 6 (Analytics) | No change | 0 |
| 7 (Dashboard API) | LocalAuthProvider | 2 pts |
| 8 (Infrastructure) | docker-compose.yml + install.sh | 5 pts |
| 9 (Onboarding) | Local signup, remove Stripe req, WebSocket same | 3 pts |
| 10 (TF Tenets) | No change | 0 |
| **Total** | | **18 pts** |
@@ -0,0 +1,88 @@
# dd0c/cost: Dual-Mode Deployment Addendum

**Template:** Based on dd0c/route dual-mode pattern

---

## Cloud → Self-Hosted Service Mapping

| Cloud Service | Self-Hosted Replacement | Notes |
|---------------|-------------------------|-------|
| EventBridge | Webhook polling + cron | Customer pushes CloudTrail logs or dd0c polls |
| SQS FIFO | PostgreSQL pgmq | Event queue |
| Lambda (normalizer) | Container process | Same TypeScript code |
| DynamoDB | PostgreSQL (JSONB) | Single-table → JSONB with GIN indexes |
| Cognito | Local JWT (HS256) | AuthProvider pattern |
| STS (cross-account) | Direct IAM credentials | Customer provides access key or role ARN |
| S3 | Local FS or MinIO | Raw event archive |
| SES | SMTP relay | Digest emails |

## Self-Hosted Compose Services

```yaml
services:
  ingestion:  # CloudTrail event normalizer
    image: ghcr.io/dd0c/cost-ingestion:latest
  scorer:  # Anomaly detection (Z-score, Welford, novelty)
    image: ghcr.io/dd0c/cost-scorer:latest
  zombie-hunter:  # Daily idle resource scanner
    image: ghcr.io/dd0c/cost-zombie:latest
  api:  # Dashboard API
    image: ghcr.io/dd0c/cost-api:latest
  dashboard:  # React SPA
    image: ghcr.io/dd0c/cost-dashboard:latest
  postgres:  # All data (JSONB), baselines, config
    image: postgres:16-alpine
  redis:  # Panic mode, governance flags, circuit breakers
    image: redis:7-alpine
  caddy:
    image: caddy:2-alpine
```

## Key Difference: EventBridge → Polling/Push

Self-hosted mode can't use EventBridge cross-account rules. Two alternatives:
1. **Push mode:** The customer configures CloudTrail to deliver logs to an S3 bucket, and dd0c polls that bucket.
2. **Agent mode:** A lightweight Go agent in the customer VPC forwards CloudTrail events via gRPC (same pattern as dd0c/drift).

Agent mode is recommended because it reuses the dd0c/drift agent pattern.

## Key Difference: DynamoDB → PostgreSQL JSONB

Same pattern as dd0c/alert:

```sql
CREATE TABLE cost_events (
  id UUID PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  account_id TEXT NOT NULL,
  data JSONB NOT NULL,
  severity TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE baselines (
  tenant_id TEXT NOT NULL,
  account_id TEXT NOT NULL,
  resource_type TEXT NOT NULL,
  mean_hourly_cost NUMERIC(12,4),
  stddev NUMERIC(12,4),
  event_count INTEGER DEFAULT 0,
  observed_actors JSONB DEFAULT '[]',
  PRIMARY KEY (tenant_id, account_id, resource_type)
);
```

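The `baselines` columns (`mean_hourly_cost`, `stddev`, `event_count`) line up with the running statistics that Welford's algorithm maintains, and the scorer service is described as using Z-score and Welford. The sketch below shows how one baseline row could be updated incrementally and used to score a new observation. The `Baseline` class and its `m2` field are assumptions for illustration; the schema stores `stddev` rather than `m2`, which could be recovered as `stddev**2 * (event_count - 1)` when a row is loaded.

```python
import math
from dataclasses import dataclass


@dataclass
class Baseline:
    event_count: int = 0
    mean_hourly_cost: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    @property
    def stddev(self) -> float:
        # Sample standard deviation; undefined below two observations.
        if self.event_count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.event_count - 1))

    def update(self, cost: float) -> None:
        """Welford's online update for mean and variance."""
        self.event_count += 1
        delta = cost - self.mean_hourly_cost
        self.mean_hourly_cost += delta / self.event_count
        self.m2 += delta * (cost - self.mean_hourly_cost)

    def z_score(self, cost: float) -> float:
        """Distance of `cost` from the baseline mean, in standard deviations."""
        sd = self.stddev
        if sd == 0.0:
            return 0.0
        return (cost - self.mean_hourly_cost) / sd
```

Because the update needs only the three stored numbers, the scorer does not have to re-read historical events, which fits the "pure math, no change" note for the anomaly detection epic.
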
## Epic Impact

| Epic | Change | Effort |
|------|--------|--------|
| 1 (CloudTrail Ingestion) | EventBridge → agent/polling, SQS → pgmq | 4 pts |
| 2 (Anomaly Detection) | No change (pure math) | 0 |
| 3 (Zombie Hunter) | Direct AWS API calls (same) | 0 |
| 4 (Notifications) | SMTP fallback | 1 pt |
| 5 (Onboarding) | No CFN quick-create; manual IAM setup guide | 3 pts |
| 6 (Dashboard API) | LocalAuthProvider, DynamoDB → PG | 3 pts |
| 7 (Dashboard UI) | Local login form | 2 pts |
| 8 (Infrastructure) | docker-compose.yml + install.sh | 5 pts |
| 9 (Multi-Account) | Same, just different credential input | 1 pt |
| 10 (TF Tenets) | No change | 0 |
| **Total** | | **19 pts** |
@@ -0,0 +1,69 @@
# dd0c/run: Dual-Mode Deployment Addendum

**Template:** Based on dd0c/route dual-mode pattern

---

## Cloud → Self-Hosted Service Mapping

| Cloud Service | Self-Hosted Replacement | Notes |
|---------------|-------------------------|-------|
| RDS PostgreSQL | PostgreSQL container | Same schema, same RLS, same audit trail |
| Cognito | Local JWT (HS256) | AuthProvider pattern |
| S3 (compliance exports) | Local FS or MinIO | ObjectStore trait |
| SES | SMTP relay | Notifications |
| CloudWatch | Prometheus + Grafana | Bundled |
| KMS (audit encryption) | Local AES-256-GCM | Key file mounted as volume |

## Self-Hosted Compose Services

```yaml
services:
  engine:  # Parser + Classifier + Execution Engine (Rust)
    image: ghcr.io/dd0c/run-engine:latest
  api:  # Dashboard API
    image: ghcr.io/dd0c/run-api:latest
  dashboard:  # React SPA (parse preview, execution timeline)
    image: ghcr.io/dd0c/run-dashboard:latest
  postgres:  # Config + audit trail (RLS, hash chain)
    image: postgres:16-alpine
  redis:  # Panic mode, execution locks
    image: redis:7-alpine
  caddy:
    image: caddy:2-alpine
```

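The `postgres` comment above mentions a hash-chained audit trail. For readers unfamiliar with the idea, here is a generic sketch of a hash chain in which each entry commits to the previous entry's hash, so editing any earlier record invalidates everything after it. The record fields, the all-zero genesis value and the function names are assumptions; the addendum does not describe dd0c/run's actual chain format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous hash together with a canonical form of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"record": record, "prev_hash": prev, "hash": entry_hash(prev, record)}
    )


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Nothing in this scheme depends on AWS, which is consistent with the audit trail carrying over to the PostgreSQL container unchanged.
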
## Key Advantage: dd0c/run is Already Self-Host Friendly

dd0c/run has the simplest self-hosted story of all 6 products:
- The Go agent already runs in customer VPCs
- The SaaS is already PostgreSQL-native (no DynamoDB)
- gRPC between agent and SaaS works the same locally
- No EventBridge/SQS/Step Functions dependencies

The main changes are auth and the install script.

## Agent Connection

- Cloud: `grpcs://engine.run.dd0c.dev`
- Self-hosted: `grpc://localhost:50051` (or Caddy TLS)

The agent binary is the same; only the `--server` flag differs.

## Epic Impact

| Epic | Change | Effort |
|------|--------|--------|
| 1 (Parser) | No change (pure Rust) | 0 |
| 2 (Classifier) | No change (pure Rust) | 0 |
| 3 (Execution Engine) | No change (pure Rust) | 0 |
| 4 (Agent) | Configurable gRPC endpoint | 1 pt |
| 5 (Audit Trail) | KMS → local AES-256-GCM | 2 pts |
| 6 (Dashboard API) | LocalAuthProvider | 2 pts |
| 7 (Dashboard UI) | Local login form | 2 pts |
| 8 (Infrastructure) | docker-compose.yml + install.sh | 5 pts |
| 9 (Onboarding) | Local signup, remove Stripe req | 3 pts |
| 10 (TF Tenets) | No change | 0 |
| **Total** | | **15 pts** |

*dd0c/run is the easiest product to self-host. Recommend it as the second self-hosted release after dd0c/route.*