Some checks failed
CI — P2 Drift (Go + Node) / agent (push) Successful in 47s
CI — P2 Drift (Go + Node) / saas (push) Successful in 36s
CI — P3 Alert / test (push) Successful in 36s
CI — P4 Portal / build-push (push) Failing after 49s
CI — P5 Cost / build-push (push) Failing after 4s
CI — P6 Run / build-push (push) Failing after 4s
CI — P4 Portal / test (push) Successful in 35s
CI — P5 Cost / test (push) Successful in 40s
CI — P6 Run / saas (push) Successful in 36s
CI — P2 Drift (Go + Node) / build-push (push) Failing after 17s
CI — P3 Alert / build-push (push) Failing after 15s
Resolves 11 of the 13 findings: - [CRITICAL] SQLi in RLS: replaced SET LOCAL with parameterized set_config() - [CRITICAL] Rate Limiting: installed and registered @fastify/rate-limit in all 5 apps - [CRITICAL] Invite Hijacking: added email verification check to invite lookup - [HIGH] Webhook HMAC: added Fastify rawBody parser to fix JSON.stringify mangling - [HIGH] TOCTOU Race: added FOR UPDATE to invite lookup - [HIGH] Incident Race: replaced SELECT/INSERT with INSERT ... ON CONFLICT - [MEDIUM] Grafana Timing Attack: replaced === with crypto.timingSafeEqual - [MEDIUM] Insecure Defaults: added NODE_ENV production guard for JWT_SECRET - [LOW] DB Privileges: tightened docker-init-db.sh grants (removed ALL PRIVILEGES) - [LOW] Plaintext Invites: tokens are now hashed (SHA-256) before DB storage/lookup - [LOW] Scrypt: increased N parameter to 65536 for stronger password hashing Note: - Finding #4 (Fragmented Identity) requires a unified auth database architecture. - Finding #8 (getPoolForAuth) is an accepted tradeoff to keep auth middleware clean.
70 lines
6.0 KiB
Markdown
70 lines
6.0 KiB
Markdown
# Adversarial Security & Architecture Review
|
|
**Commit:** eb953cd
|
|
**Focus:** Security Hardening Changes
|
|
|
|
## Overview
|
|
The security hardening pass successfully implemented a few surface-level mitigations, but the implementation is fundamentally flawed. In several cases, the "fixes" are incomplete, introduced new critical vulnerabilities, or suffer from severe architectural oversights. The system is still highly vulnerable.
|
|
|
|
## PASS
|
|
- **JWT Algorithms:** Hardcoding `{ algorithms: ['HS256'] }` correctly mitigates algorithm confusion attacks.
|
|
- **Plugin Encapsulation:** Registering the `authHook` locally inside `protectedApp.addHook` correctly prevents bypasses compared to global URL prefix matching.
|
|
- **CORS:** Locked down correctly to `http://localhost:5173`.
|
|
- **Async Webhooks:** Redis `LPUSH`/`BRPOP` architecture and dead-letter queue provide a reasonable foundation for async processing.
|
|
|
|
## FAIL
|
|
|
|
### 1. [CRITICAL] SQL Injection in RLS Wrapper (`db.ts`)
|
|
**File:** `03-alert-intelligence/src/data/db.ts:13`
|
|
**Issue:** `await client.query(\`SET LOCAL app.tenant_id = '${tenantId}'\`);`
|
|
The `tenantId` is string-interpolated directly into the query instead of being parameterized. While `tenantId` comes from a signed JWT, if the JWT secret is compromised (see issue #10) or user input ever leaks into this function, it allows trivial SQL injection.
|
|
**Fix:** Use `await client.query('SELECT set_config($1, $2, true)', ['app.tenant_id', tenantId]);`
|
|
|
|
### 2. [CRITICAL] Rate Limiting is a Mirage (`index.ts` & `middleware.ts`)
|
|
**File:** `03-alert-intelligence/src/index.ts`
|
|
**Issue:** The route definitions proudly declare `{ config: { rateLimit: { max: 5, timeWindow: '1 minute' } } }`, but the `@fastify/rate-limit` plugin is **never actually registered** on the Fastify app. Furthermore, the planned "100 req/min global" limit is entirely missing. Rate limiting is 100% bypassable right now.
|
|
|
|
### 3. [CRITICAL] Invite Token Hijacking (`middleware.ts`)
|
|
**File:** `03-alert-intelligence/src/auth/middleware.ts:168`
|
|
**Issue:** During signup via an `invite_token`, the server pulls the invite record but *never verifies that `body.email` matches the invited email*. Anyone who obtains the invite link can register an account using their own email address, completely bypassing intended access controls.
|
|
|
|
### 4. [HIGH] Fragmented Identity Architecture (Microservice Anti-Pattern)
|
|
**File:** `03-alert`, `05-cost`, `04-portal` (all products)
|
|
**Issue:** The authentication routes, invite logic, and `users`/`tenants` DB tables have been copy-pasted into *every* microservice. Since each service uses its own separate database (`dd0c_alert`, `dd0c_cost`, etc.), identity is completely siloed. A user who signs up in the Portal doesn't exist in the Alert database. This breaks the unified console experience and means foreign keys will fail if a valid JWT from one service is used against another.
|
|
|
|
### 5. [HIGH] Webhook HMAC Verification Broken on Valid Payloads
|
|
**File:** `03-alert-intelligence/src/api/webhooks.ts`
|
|
**Issue:** Reconstructing the raw body via `const rawBody = JSON.stringify(body);` for HMAC validation is disastrous. Fastify parses the JSON, and stringifying it back will reorder keys or alter whitespace, meaning perfectly valid Datadog/PagerDuty webhooks will fail signature verification randomly. You must use a raw body parser plugin.
|
|
|
|
### 6. [HIGH] Token TOCTOU Race Condition
|
|
**File:** `03-alert-intelligence/src/auth/middleware.ts:175`
|
|
**Issue:** When checking and updating an invite (`SELECT ... FROM tenant_invites` followed by `UPDATE ... SET accepted_at = NOW()`), there is no row-level locking (e.g., `FOR UPDATE`). An attacker can send concurrent requests to use the same invite token multiple times before the first request marks it accepted.
|
|
|
|
### 7. [HIGH] Webhook Incident Creation Race Condition
|
|
**File:** `03-alert-intelligence/src/workers/webhook-processor.ts:51`
|
|
**Issue:** The worker queries `SELECT id FROM incidents WHERE fingerprint = $1`. If no incident exists, it inserts one. If two webhooks for the same fingerprint are popped by parallel workers, both will see no incident and insert duplicate incidents.
|
|
|
|
## WARN
|
|
|
|
### 8. [MEDIUM] `getPoolForAuth()` Defeats Pool Encapsulation
|
|
**File:** `03-alert-intelligence/src/data/db.ts`
|
|
**Issue:** The PR claimed to restrict access to the raw DB pool, but merely added an export named `getPoolForAuth()` which returns the raw `pg.Pool`. This allows any developer to easily bypass the `withTenant` RLS controls by importing the raw pool.
|
|
|
|
### 9. [MEDIUM] Timing Attack Vector in Grafana Webhooks
|
|
**File:** `03-alert-intelligence/src/api/webhooks.ts:80`
|
|
**Issue:** `token !== secret.secret` uses standard string equality instead of `crypto.timingSafeEqual`. While low risk for standard APIs, webhook secrets should be validated securely to prevent timing attacks.
|
|
|
|
### 10. [MEDIUM] Insecure Default Secrets in Infrastructure
|
|
**File:** `docker-compose.yml:51`
|
|
**Issue:** The compose file falls back to `${JWT_SECRET:-dev-secret-change-me-in-production!!}`. If operations forgets to configure `JWT_SECRET` in a production environment, the app fails open with a globally known secret, leading to total environment compromise.
|
|
|
|
### 11. [LOW] "Least Privilege" Misnomer
|
|
**File:** `docker-init-db.sh:15`
|
|
**Issue:** The script claims to create "least-privilege" users, but immediately runs `GRANT ALL PRIVILEGES ON DATABASE` and `GRANT ALL ON SCHEMA public`. The service users have full DDL rights and can DROP or TRUNCATE tables.
|
|
|
|
### 12. [LOW] Plaintext Invite Tokens in Database
|
|
**File:** `shared/003_invites.sql`
|
|
**Issue:** `tenant_invites.token` is stored as plaintext. If the database is dumped or compromised, the attacker has a list of active invite links. They should be stored as SHA-256 hashes, similar to API keys.
|
|
|
|
### 13. [LOW] Weak Password Hashing Defaults
|
|
**File:** `03-alert-intelligence/src/auth/middleware.ts:84`
|
|
**Issue:** `crypto.scrypt` is called without explicit options, defaulting to Node.js's native `N=16384`. This is considered too low for modern offline cracking resistance. Explicit parameters (e.g., `N=65536`) should be provided. |