Files
dd0c/products/SECURITY-REVIEW-POST-HARDENING.md

70 lines
6.0 KiB
Markdown
Raw Permalink Normal View History

# Adversarial Security & Architecture Review
**Commit:** eb953cd
**Focus:** Security Hardening Changes
## Overview
The security hardening pass successfully implemented a few surface-level mitigations, but the implementation is fundamentally flawed. In several cases, the "fixes" are incomplete, introduced new critical vulnerabilities, or suffer from severe architectural oversights. The system is still highly vulnerable.
## PASS
- **JWT Algorithms:** Hardcoding `{ algorithms: ['HS256'] }` correctly mitigates algorithm confusion attacks.
- **Plugin Encapsulation:** Registering the `authHook` locally inside `protectedApp.addHook` correctly prevents bypasses compared to global URL prefix matching.
- **CORS:** Locked down correctly to `http://localhost:5173`.
- **Async Webhooks:** Redis `LPUSH`/`BRPOP` architecture and dead-letter queue provide a reasonable foundation for async processing.
## FAIL
### 1. [CRITICAL] SQL Injection in RLS Wrapper (`db.ts`)
**File:** `03-alert-intelligence/src/data/db.ts:13`
**Issue:** `await client.query(\`SET LOCAL app.tenant_id = '${tenantId}'\`);`
The `tenantId` is string-interpolated directly into the query instead of being parameterized. While `tenantId` comes from a signed JWT, if the JWT secret is compromised (see issue #10) or user input ever leaks into this function, it allows trivial SQL injection.
**Fix:** Use `await client.query('SELECT set_config($1, $2, true)', ['app.tenant_id', tenantId]);`
### 2. [CRITICAL] Rate Limiting is a Mirage (`index.ts` & `middleware.ts`)
**File:** `03-alert-intelligence/src/index.ts`
**Issue:** The route definitions proudly declare `{ config: { rateLimit: { max: 5, timeWindow: '1 minute' } } }`, but the `@fastify/rate-limit` plugin is **never actually registered** on the Fastify app. Furthermore, the planned "100 req/min global" limit is entirely missing. Rate limiting is 100% bypassable right now.
### 3. [CRITICAL] Invite Token Hijacking (`middleware.ts`)
**File:** `03-alert-intelligence/src/auth/middleware.ts:168`
**Issue:** During signup via an `invite_token`, the server pulls the invite record but *never verifies that `body.email` matches the invited email*. Anyone who obtains the invite link can register an account using their own email address, completely bypassing intended access controls.
### 4. [HIGH] Fragmented Identity Architecture (Microservice Anti-Pattern)
**File:** `03-alert`, `05-cost`, `04-portal` (all products)
**Issue:** The authentication routes, invite logic, and `users`/`tenants` DB tables have been copy-pasted into *every* microservice. Since each service uses its own separate database (`dd0c_alert`, `dd0c_cost`, etc.), identity is completely siloed. A user who signs up in the Portal doesn't exist in the Alert database. This breaks the unified console experience and means foreign keys will fail if a valid JWT from one service is used against another.
### 5. [HIGH] Webhook HMAC Verification Broken on Valid Payloads
**File:** `03-alert-intelligence/src/api/webhooks.ts`
**Issue:** Reconstructing the raw body via `const rawBody = JSON.stringify(body);` for HMAC validation is disastrous. Fastify parses the JSON, and stringifying it back will reorder keys or alter whitespace, meaning perfectly valid Datadog/PagerDuty webhooks will fail signature verification randomly. You must use a raw body parser plugin.
### 6. [HIGH] Token TOCTOU Race Condition
**File:** `03-alert-intelligence/src/auth/middleware.ts:175`
**Issue:** When checking and updating an invite (`SELECT ... FROM tenant_invites` followed by `UPDATE ... SET accepted_at = NOW()`), there is no row-level locking (e.g., `FOR UPDATE`). An attacker can send concurrent requests to use the same invite token multiple times before the first request marks it accepted.
### 7. [HIGH] Webhook Incident Creation Race Condition
**File:** `03-alert-intelligence/src/workers/webhook-processor.ts:51`
**Issue:** The worker queries `SELECT id FROM incidents WHERE fingerprint = $1`. If no incident exists, it inserts one. If two webhooks for the same fingerprint are popped by parallel workers, both will see no incident and insert duplicate incidents.
## WARN
### 8. [MEDIUM] `getPoolForAuth()` Defeats Pool Encapsulation
**File:** `03-alert-intelligence/src/data/db.ts`
**Issue:** The PR claimed to restrict access to the raw DB pool, but merely added an export named `getPoolForAuth()` which returns the raw `pg.Pool`. This allows any developer to easily bypass the `withTenant` RLS controls by importing the raw pool.
### 9. [MEDIUM] Timing Attack Vector in Grafana Webhooks
**File:** `03-alert-intelligence/src/api/webhooks.ts:80`
**Issue:** `token !== secret.secret` uses standard string equality instead of `crypto.timingSafeEqual`. While low risk for standard APIs, webhook secrets should be validated securely to prevent timing attacks.
### 10. [MEDIUM] Insecure Default Secrets in Infrastructure
**File:** `docker-compose.yml:51`
**Issue:** The compose file falls back to `${JWT_SECRET:-dev-secret-change-me-in-production!!}`. If operations forgets to configure `JWT_SECRET` in a production environment, the app fails open with a globally known secret, leading to total environment compromise.
### 11. [LOW] "Least Privilege" Misnomer
**File:** `docker-init-db.sh:15`
**Issue:** The script claims to create "least-privilege" users, but immediately runs `GRANT ALL PRIVILEGES ON DATABASE` and `GRANT ALL ON SCHEMA public`. The service users have full DDL rights and can DROP or TRUNCATE tables.
### 12. [LOW] Plaintext Invite Tokens in Database
**File:** `shared/003_invites.sql`
**Issue:** `tenant_invites.token` is stored as plaintext. If the database is dumped or compromised, the attacker has a list of active invite links. They should be stored as SHA-256 hashes, similar to API keys.
### 13. [LOW] Weak Password Hashing Defaults
**File:** `03-alert-intelligence/src/auth/middleware.ts:84`
**Issue:** `crypto.scrypt` is called without explicit options, defaulting to Node.js's native `N=16384`. This is considered too low for modern offline cracking resistance. Explicit parameters (e.g., `N=65536`) should be provided.