Implement review remediation + PLG analytics SDK

- All 6 test architectures patched with Section 11 addendums
- P5 (cost) fully rewritten from 232 to ~600 lines
- PLG brainstorm + party mode advisory board results
- Analytics SDK v2 (PostHog Cloud, Zod strict, Lambda-safe)
- Analytics tests v2 (safeParse, no $set, no timestamp, no PII)
- Addresses all Gemini review findings across P1-P6
2026-03-01 01:42:49 +00:00
parent 2fe0ed856e
commit 03bfe931fc
9 changed files with 2950 additions and 85 deletions


@@ -0,0 +1,138 @@
import { PostHog } from 'posthog-node';
import { z } from 'zod';
// ---------------------------------------------------------
// 1. Unified Event Taxonomy (Zod Enforced, Strictly Typed)
// ---------------------------------------------------------
export enum EventName {
SignupCompleted = 'account.signup.completed',
FirstDollarSaved = 'routing.savings.first_dollar',
UpgradeCompleted = 'billing.upgrade.completed',
}
// Per-event property schemas — no z.any() PII loophole
const SignupProperties = z.object({
method: z.enum(['github_sso', 'google_sso', 'email']),
}).strict();
const ActivationProperties = z.object({
savings_amount: z.number().nonnegative(),
}).strict();
const UpgradeProperties = z.object({
plan: z.enum(['pro', 'business']),
mrr_increase: z.number().nonnegative(),
}).strict();
const PropertiesMap = {
[EventName.SignupCompleted]: SignupProperties,
[EventName.FirstDollarSaved]: ActivationProperties,
[EventName.UpgradeCompleted]: UpgradeProperties,
} as const;
export const EventSchema = z.object({
name: z.nativeEnum(EventName),
tenant_id: z.string().min(1, 'tenant_id is required'),
product: z.literal('route'),
properties: z.record(z.unknown()).optional().default({}),
});
// z.input (not z.infer) so callers may omit `properties`; the schema default fills it in.
export type AnalyticsEvent = z.input<typeof EventSchema>;
// ---------------------------------------------------------
// 2. NoOp Client for local/test environments
// ---------------------------------------------------------
class NoOpPostHog {
capture() {}
identify() {}
async flushAsync() {}
async shutdown() {}
}
// ---------------------------------------------------------
// 3. Analytics SDK (PostHog Cloud, Lambda-Safe)
// ---------------------------------------------------------
export class Analytics {
private client: PostHog | NoOpPostHog;
public readonly isSessionReplayEnabled = false;
constructor(client?: PostHog) {
if (client) {
this.client = client;
} else {
const apiKey = process.env.POSTHOG_API_KEY;
if (!apiKey) {
// No key = NoOp. Never silently send to a mock key.
console.warn('[Analytics] POSTHOG_API_KEY not set — using NoOp client');
this.client = new NoOpPostHog();
} else {
this.client = new PostHog(apiKey, {
host: 'https://us.i.posthog.com',
flushAt: 20, // Batch up to 20 events
flushInterval: 5000, // Or flush every 5s
});
}
}
}
/**
* Identify a tenant once (on signup). Sets $set properties.
* Call this instead of embedding $set in every track() call.
*/
public identify(tenantId: string, properties?: Record<string, unknown>): void {
this.client.identify({
distinctId: tenantId,
properties: { tenant_id: tenantId, ...properties },
});
}
/**
* Track an event. Uses safeParse — never crashes the caller.
* Does NOT flush. Call flush() at Lambda teardown.
*/
public track(event: AnalyticsEvent): boolean {
// 1. Base schema validation
const baseResult = EventSchema.safeParse(event);
if (!baseResult.success) {
console.error('[Analytics] Invalid event (base):', baseResult.error.format());
return false;
}
// 2. Per-event property validation (strict, no PII loophole)
const propSchema = PropertiesMap[baseResult.data.name];
if (propSchema) {
const propResult = propSchema.safeParse(baseResult.data.properties);
if (!propResult.success) {
console.error('[Analytics] Invalid properties:', propResult.error.format());
return false;
}
}
// 3. Capture — let PostHog assign the timestamp (avoids clock skew)
this.client.capture({
distinctId: baseResult.data.tenant_id,
event: baseResult.data.name,
properties: {
product: baseResult.data.product,
...baseResult.data.properties,
},
});
return true;
}
/**
* Flush all queued events. Call once at Lambda teardown
* (e.g., in a Middy middleware or handler's finally block).
*/
public async flush(): Promise<void> {
await this.client.flushAsync();
}
public async shutdown(): Promise<void> {
await this.client.shutdown();
}
}
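
The `flush()` docstring above recommends flushing once at Lambda teardown, for example in a handler's `finally` block. A minimal sketch of that pattern follows. It assumes a hypothetical `withAnalyticsFlush` wrapper that is not part of this commit, and it accepts any object exposing `flush(): Promise<void>`, which the `Analytics` class satisfies.

```typescript
// Hypothetical helper, not part of this commit: wraps a handler so queued
// analytics events are flushed once per invocation, even when the handler throws.
interface Flushable {
  flush(): Promise<void>;
}

type Handler<E, R> = (event: E) => Promise<R>;

function withAnalyticsFlush<E, R>(
  analytics: Flushable,
  handler: Handler<E, R>,
): Handler<E, R> {
  return async (event: E): Promise<R> => {
    try {
      return await handler(event);
    } finally {
      try {
        await analytics.flush();
      } catch (err) {
        // A failed flush should not mask the handler's own result or error.
        console.error('[Analytics] flush failed:', err);
      }
    }
  };
}
```

Swallowing flush errors is a design choice in this sketch: analytics delivery is treated as best-effort so that it cannot turn a successful request into a failure.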


@@ -2239,3 +2239,315 @@ Before writing any new function, ask:
*Test Architecture document generated for dd0c/route V1 MVP.*
*Total estimated test count at V1 launch: ~400 tests.*
*Target CI runtime: <8 minutes (unit + integration), <15 minutes (full pipeline with E2E).*
---
## 11. Review Remediation Addendum (Post-Gemini Review)
### 11.1 Replace MockKeyCache/MockKeyStore with Testcontainers
```rust
// BEFORE (anti-pattern — mocks hide real latency):
// let cache = MockKeyCache::new();
// let store = MockKeyStore::new();
// AFTER: Use Testcontainers for hot-path auth tests
#[tokio::test]
async fn auth_middleware_validates_key_under_5ms_with_real_redis() {
let redis = TestcontainersRedis::start().await;
let pg = TestcontainersPostgres::start().await;
let cache = RedisKeyCache::new(redis.connection_string());
let store = PgKeyStore::new(pg.connection_string());
let start = Instant::now();
let result = auth_middleware(&cache, &store, "sk-valid-key").await;
assert!(start.elapsed() < Duration::from_millis(5));
assert!(result.is_ok());
}
#[tokio::test]
async fn auth_middleware_handles_redis_connection_pool_exhaustion() {
// Exhaust all connections, verify fallback to PG
let redis = TestcontainersRedis::start().await;
let pg = TestcontainersPostgres::start().await;
let cache = RedisKeyCache::with_pool_size(redis.connection_string(), 1);
let pg_store = PgKeyStore::new(pg.connection_string());
// Hold the single connection
let _held = cache.raw_connection().await;
// Auth must still work via PG fallback
let result = auth_middleware(&cache, &pg_store, "sk-valid-key").await;
assert!(result.is_ok());
}
```
### 11.2 Fix Encryption Test (Decrypt, Don't Just Assert Non-Plaintext)
```rust
// BEFORE (anti-pattern — passes if stored as random garbage):
// assert_ne!(stored.encrypted_key, b"sk-plaintext-key");
// AFTER: Full round-trip encryption test
#[tokio::test]
async fn provider_credential_encrypts_and_decrypts_correctly() {
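// Real Postgres for the credentials table (same Testcontainers helper as 11.1)
let pg = TestcontainersPostgres::start().await;
let kms = LocalStackKMS::start().await;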
let key_id = kms.create_key().await;
let store = CredentialStore::new(pg.pool(), kms.client(), key_id);
let original = "sk-live-abc123xyz";
store.save_credential("org-1", "openai", original).await.unwrap();
// Read raw from DB — must NOT be plaintext
let raw = pg.query_raw("SELECT encrypted_key FROM credentials LIMIT 1").await;
assert!(!String::from_utf8_lossy(&raw).contains(original));
// Decrypt via the store — must match original
let decrypted = store.get_credential("org-1", "openai").await.unwrap();
assert_eq!(decrypted, original);
}
#[tokio::test]
async fn kms_key_rotation_old_deks_still_decrypt_old_credentials() {
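// Real Postgres for the credentials table (same Testcontainers helper as 11.1)
let pg = TestcontainersPostgres::start().await;
let kms = LocalStackKMS::start().await;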
let key_id = kms.create_key().await;
let store = CredentialStore::new(pg.pool(), kms.client(), key_id);
// Save with original key
store.save_credential("org-1", "openai", "sk-old").await.unwrap();
// Rotate KMS key
kms.rotate_key(key_id).await;
// Old credential must still decrypt
let decrypted = store.get_credential("org-1", "openai").await.unwrap();
assert_eq!(decrypted, "sk-old");
// New credential uses new DEK
store.save_credential("org-1", "anthropic", "sk-new").await.unwrap();
let decrypted_new = store.get_credential("org-1", "anthropic").await.unwrap();
assert_eq!(decrypted_new, "sk-new");
}
```
### 11.3 Slow Dependency Chaos Test
```rust
#[tokio::test]
async fn chaos_slow_db_does_not_block_proxy_hot_path() {
let stack = E2EStack::start().await;
// Inject 5-second network delay on TimescaleDB port via tc netem
stack.inject_latency("timescaledb", Duration::from_secs(5)).await;
// Proxy must still route requests within SLA
let start = Instant::now();
let resp = stack.proxy()
.post("/v1/chat/completions")
.header("Authorization", "Bearer sk-valid")
.json(&chat_request())
.send().await;
let latency = start.elapsed();
assert_eq!(resp.status(), 200);
// Telemetry is dropped, but routing works
assert!(latency < Duration::from_millis(50),
"Proxy blocked by slow DB: {:?}", latency);
}
#[tokio::test]
async fn chaos_slow_redis_falls_back_to_pg_for_auth() {
let stack = E2EStack::start().await;
stack.inject_latency("redis", Duration::from_secs(3)).await;
let resp = stack.proxy()
.post("/v1/chat/completions")
.header("Authorization", "Bearer sk-valid")
.json(&chat_request())
.send().await;
assert_eq!(resp.status(), 200);
}
```
### 11.4 IDOR / Cross-Tenant Test Suite
```rust
// tests/integration/idor_test.rs
#[tokio::test]
async fn idor_org_a_cannot_read_org_b_routing_rules() {
let stack = E2EStack::start().await;
let org_a_token = stack.create_org_and_token("org-a").await;
let org_b_token = stack.create_org_and_token("org-b").await;
// Org B creates a routing rule
let rule = stack.api()
.post("/v1/routing-rules")
.bearer_auth(&org_b_token)
.json(&json!({ "name": "secret-rule", "model": "gpt-4" }))
.send().await.json::<RoutingRule>().await;
// Org A tries to read it
let resp = stack.api()
.get(&format!("/v1/routing-rules/{}", rule.id))
.bearer_auth(&org_a_token)
.send().await;
assert_eq!(resp.status(), 404); // Not 403 — don't leak existence
}
#[tokio::test]
async fn idor_org_a_cannot_read_org_b_api_keys() {
// Same pattern — create key as org B, attempt read as org A
}
#[tokio::test]
async fn idor_org_a_cannot_read_org_b_telemetry() {}
#[tokio::test]
async fn idor_org_a_cannot_mutate_org_b_routing_rules() {}
```
### 11.5 SSE Connection Drop / Billing Leak Test
```rust
#[tokio::test]
async fn sse_client_disconnect_aborts_upstream_provider_request() {
let stack = E2EStack::start().await;
let mock_provider = stack.mock_provider();
// Configure provider to stream slowly (1 token/sec for 60 tokens)
mock_provider.configure_slow_stream(60, Duration::from_secs(1));
// Start streaming request
let mut stream = stack.proxy()
.post("/v1/chat/completions")
.json(&json!({ "stream": true, "model": "gpt-4" }))
.send().await
.bytes_stream();
// Read 5 tokens then drop the connection
for _ in 0..5 {
stream.next().await;
}
drop(stream);
// Wait briefly for cleanup
tokio::time::sleep(Duration::from_millis(500)).await;
// Provider connection must be aborted — not still streaming
assert_eq!(mock_provider.active_connections(), 0);
// Billing: customer should only be charged for 5 tokens, not 60
let usage = stack.get_last_usage_record().await;
assert!(usage.completion_tokens <= 10); // Some buffer for in-flight
}
```
### 11.6 Concurrent Circuit Breaker Race Condition
```rust
#[tokio::test]
async fn circuit_breaker_handles_50_concurrent_failures_cleanly() {
let redis = TestcontainersRedis::start().await;
let breaker = RedisCircuitBreaker::new(redis.connection_string(), "openai", 10);
let mut handles = vec![];
for _ in 0..50 {
let b = breaker.clone();
handles.push(tokio::spawn(async move {
b.record_failure().await;
}));
}
futures::future::join_all(handles).await;
// Breaker must be open — no race condition leaving it closed
assert_eq!(breaker.state().await, CircuitState::Open);
// Failure count must be exactly 50 (atomic increments)
assert_eq!(breaker.failure_count().await, 50);
}
```
### 11.7 Trace Context Propagation
```rust
#[tokio::test]
async fn otel_trace_propagates_from_client_through_proxy_to_provider() {
let stack = E2EStack::start().await;
let tracer = stack.in_memory_tracer();
let resp = stack.proxy()
.post("/v1/chat/completions")
.header("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
.json(&chat_request())
.send().await;
let spans = tracer.finished_spans();
let proxy_span = spans.iter().find(|s| s.name == "proxy.route").unwrap();
// Proxy span must be child of the incoming trace
assert_eq!(proxy_span.trace_id, "4bf92f3577b34da6a3ce929d0e0e4736");
// Provider request must carry the same trace_id
let provider_req = stack.mock_provider().last_request();
assert!(provider_req.headers["traceparent"].contains("4bf92f3577b34da6a3ce929d0e0e4736"));
}
```
### 11.8 Flag Provider Fallback Test
```rust
#[test]
fn flag_provider_unreachable_falls_back_to_safe_default() {
// Simulate missing/corrupt flag config file
let provider = JsonFileProvider::new("/nonexistent/flags.json");
let result = provider.evaluate("enable_new_router", false);
// Must return the safe default (false), not panic or error
assert_eq!(result, false);
}
#[test]
fn flag_provider_malformed_json_falls_back_to_safe_default() {
let provider = JsonFileProvider::from_string("{ invalid json }}}");
let result = provider.evaluate("enable_new_router", false);
assert_eq!(result, false);
}
```
### 11.9 24-Hour Soak Test Spec
```rust
// tests/soak/long_running_latency.rs
// Run manually: cargo test --test soak -- --ignored
#[tokio::test]
#[ignore] // Skipped by default; nightly CI (or a manual run) opts in with --ignored
async fn soak_24h_proxy_latency_stays_under_5ms_p99() {
// k6 config: 10 RPS sustained for 24 hours
// Assert: p99 < 5ms, no memory growth > 50MB, no connection leaks
// This catches memory fragmentation and connection pool exhaustion
}
```
### 11.10 Panic Mode Authorization
```rust
#[tokio::test]
async fn panic_mode_requires_owner_role() {
let stack = E2EStack::start().await;
let viewer_token = stack.create_token_with_role("org-1", Role::Viewer).await;
let resp = stack.api()
.post("/admin/panic")
.bearer_auth(&viewer_token)
.send().await;
assert_eq!(resp.status(), 403);
}
#[tokio::test]
async fn panic_mode_allowed_for_owner_role() {
let stack = E2EStack::start().await;
let owner_token = stack.create_token_with_role("org-1", Role::Owner).await;
let resp = stack.api()
.post("/admin/panic")
.bearer_auth(&owner_token)
.send().await;
assert_eq!(resp.status(), 200);
}
```
*End of P1 Review Remediation Addendum*


@@ -0,0 +1,204 @@
import { describe, it, expect, vi, beforeEach, type Mocked } from 'vitest';
import { Analytics, EventSchema, EventName } from '../../src/analytics';
import { PostHog } from 'posthog-node';
vi.mock('posthog-node');
describe('Analytics SDK (PostHog Cloud — v2 Post-Review)', () => {
let analytics: Analytics;
let mockPostHog: Mocked<PostHog>;
beforeEach(() => {
vi.clearAllMocks();
mockPostHog = new PostHog('phc_test_key', { host: 'https://us.i.posthog.com' }) as any;
analytics = new Analytics(mockPostHog);
});
// ── Schema Validation (Zod) ──────────────────────────────
describe('Event Taxonomy Validation', () => {
it('accepts valid account.signup.completed event', () => {
const event = {
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route' as const,
properties: { method: 'github_sso' },
};
expect(() => EventSchema.parse(event)).not.toThrow();
});
it('rejects events missing tenant_id', () => {
const event = {
name: EventName.SignupCompleted,
product: 'route',
properties: { method: 'email' },
};
expect(() => EventSchema.parse(event as any)).toThrow(/tenant_id/);
});
it('accepts valid activation event', () => {
const event = {
name: EventName.FirstDollarSaved,
tenant_id: 'tenant-123',
product: 'route' as const,
properties: { savings_amount: 1.50 },
};
expect(() => EventSchema.parse(event)).not.toThrow();
});
it('accepts valid upgrade event', () => {
const event = {
name: EventName.UpgradeCompleted,
tenant_id: 'tenant-123',
product: 'route' as const,
properties: { plan: 'pro', mrr_increase: 49 },
};
expect(() => EventSchema.parse(event)).not.toThrow();
});
});
// ── track() Behavior ─────────────────────────────────────
describe('track()', () => {
it('captures valid events via PostHog client', () => {
const result = analytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'email' },
});
expect(result).toBe(true);
expect(mockPostHog.capture).toHaveBeenCalledWith(
expect.objectContaining({
distinctId: 'tenant-123',
event: 'account.signup.completed',
properties: expect.objectContaining({
product: 'route',
method: 'email',
}),
})
);
});
it('does NOT include $set in track calls (use identify instead)', () => {
analytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'github_sso' },
});
const captureCall = mockPostHog.capture.mock.calls[0][0];
expect(captureCall.properties).not.toHaveProperty('$set');
});
it('does NOT pass timestamp (let PostHog handle it to avoid clock skew)', () => {
analytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'email' },
});
const captureCall = mockPostHog.capture.mock.calls[0][0];
expect(captureCall).not.toHaveProperty('timestamp');
});
it('returns false and does NOT call PostHog if base validation fails', () => {
const result = analytics.track({
name: 'invalid.event' as any,
tenant_id: 'tenant-123',
product: 'route',
});
expect(result).toBe(false);
expect(mockPostHog.capture).not.toHaveBeenCalled();
});
it('returns false if per-event property validation fails (strict schema)', () => {
const result = analytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'invalid_method' }, // Not in enum
});
expect(result).toBe(false);
expect(mockPostHog.capture).not.toHaveBeenCalled();
});
it('rejects unknown properties (strict mode — no PII loophole)', () => {
const result = analytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'email', email: 'user@example.com' }, // PII leak attempt
});
expect(result).toBe(false);
expect(mockPostHog.capture).not.toHaveBeenCalled();
});
it('does NOT flush after each track call (Lambda batching)', () => {
analytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'email' },
});
expect(mockPostHog.flushAsync).not.toHaveBeenCalled();
});
});
// ── identify() ───────────────────────────────────────────
describe('identify()', () => {
it('calls PostHog identify with tenant_id as distinctId', () => {
analytics.identify('tenant-123', { company: 'Acme' });
expect(mockPostHog.identify).toHaveBeenCalledWith(
expect.objectContaining({
distinctId: 'tenant-123',
properties: expect.objectContaining({
tenant_id: 'tenant-123',
company: 'Acme',
}),
})
);
});
});
// ── flush() ──────────────────────────────────────────────
describe('flush()', () => {
it('calls flushAsync on the PostHog client', async () => {
await analytics.flush();
expect(mockPostHog.flushAsync).toHaveBeenCalledTimes(1);
});
});
// ── NoOp Client ──────────────────────────────────────────
describe('NoOp Client (missing API key)', () => {
it('does not throw when tracking without API key', () => {
const noopAnalytics = new Analytics(); // No client, no env var
const result = noopAnalytics.track({
name: EventName.SignupCompleted,
tenant_id: 'tenant-123',
product: 'route',
properties: { method: 'email' },
});
expect(result).toBe(true); // Event passes validation; the NoOp client discards it silently
});
});
// ── Session Replay ───────────────────────────────────────
describe('Security', () => {
it('session replay is disabled', () => {
expect(analytics.isSessionReplayEnabled).toBe(false);
});
});
});


@@ -1727,3 +1727,370 @@ Before any code ships to production, these tests must be green:
---
*Document complete. Total estimated test count at V1 launch: ~500 tests. Target by month 3: ~1,000 tests.*
---
## 11. Review Remediation Addendum (Post-Gemini Review)
### 11.1 Missing Epic Coverage
#### Epic 6: Dashboard UI (React Testing Library + Playwright)
```typescript
// tests/ui/components/DiffViewer.test.tsx
describe('DiffViewer Component', () => {
it('renders added lines in green', () => {});
it('renders removed lines in red', () => {});
it('renders unchanged lines in default color', () => {});
it('collapses large diffs with "Show more" toggle', () => {});
it('highlights HCL syntax in diff blocks', () => {});
it('shows resource type icon next to each drift item', () => {});
});
describe('StackOverview Component', () => {
it('renders drift count badge per stack', () => {});
it('sorts stacks by drift severity (critical first)', () => {});
it('shows last scan timestamp', () => {});
it('shows agent health indicator (green/yellow/red)', () => {});
});
// tests/e2e/ui/dashboard.spec.ts (Playwright)
test('OAuth login redirects to Cognito and back', async ({ page }) => {
await page.goto('/dashboard');
await expect(page).toHaveURL(/cognito/);
});
test('stack list renders with drift counts', async ({ page }) => {
await page.goto('/dashboard/stacks');
await expect(page.locator('[data-testid="stack-card"]').first()).toBeVisible(); // at least one card rendered
});
test('diff viewer renders inline diff for Terraform resource', async ({ page }) => {
await page.goto('/dashboard/stacks/stack-1/drifts/drift-1');
await expect(page.locator('[data-testid="diff-viewer"]')).toBeVisible();
await expect(page.locator('.diff-added').first()).toBeVisible(); // at least one added line rendered
});
test('revert button triggers confirmation modal', async ({ page }) => {
await page.goto('/dashboard/stacks/stack-1/drifts/drift-1');
await page.click('[data-testid="revert-btn"]');
await expect(page.locator('[data-testid="confirm-modal"]')).toBeVisible();
});
```
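The component tests above are stubs. As an illustration of what "sorts stacks by drift severity (critical first)" could assert against, here is a minimal sketch of the ordering logic. The `StackSummary` shape, the four severity levels, and the `sortStacksBySeverity` name are assumptions made for this example; the real `StackOverview` props are not shown in this document.

```typescript
// Hypothetical types: only "critical first" comes from the test name above.
type Severity = 'critical' | 'high' | 'medium' | 'low';

interface StackSummary {
  id: string;
  severity: Severity;
  driftCount: number;
}

// Lower rank sorts earlier.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

// Returns a new array: most severe first, ties broken by higher drift count.
function sortStacksBySeverity(stacks: readonly StackSummary[]): StackSummary[] {
  return [...stacks].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      b.driftCount - a.driftCount,
  );
}
```

Keeping the ordering in a pure function like this would let the unit test check the sort without rendering the component.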
#### Epic 9: Onboarding & PLG (Stripe + drift init)
```go
// pkg/onboarding/stripe_test.go
func TestStripeWebhookCheckoutCompleted_UpgradesTenant(t *testing.T) {}
func TestStripeWebhookSubscriptionDeleted_DowngradesTenant(t *testing.T) {}
func TestStripeWebhookInvalidSignature_Returns401(t *testing.T) {}
func TestStripeWebhookReplayedEvent_IsIdempotent(t *testing.T) {}
// pkg/agent/init_test.go
func TestDriftInit_DetectsTerraformInCurrentDir(t *testing.T) {}
func TestDriftInit_DetectsCloudFormationInCurrentDir(t *testing.T) {}
func TestDriftInit_DetectsPulumiInCurrentDir(t *testing.T) {}
func TestDriftInit_GeneratesValidYAMLConfig(t *testing.T) {}
func TestDriftInit_HandlesWindowsPaths(t *testing.T) {}
func TestDriftInit_HandlesMacPaths(t *testing.T) {}
func TestDriftInit_HandlesLinuxPaths(t *testing.T) {}
func TestDriftInit_FailsGracefullyOnEmptyDir(t *testing.T) {}
```
#### Epic 8: Infrastructure (Terratest)
```go
// tests/infra/terraform_test.go
func TestTerraformPlan_CreatesExpectedResources(t *testing.T) {
terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
TerraformDir: "../../infra/terraform",
})
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndPlan(t, terraformOptions)
}
func TestTerraformApply_SQSFIFOQueueCreated(t *testing.T) {}
func TestTerraformApply_RDSInstanceCreated(t *testing.T) {}
func TestTerraformApply_IAMRolesHaveLeastPrivilege(t *testing.T) {
// Verify no IAM policy has Action: "*"
}
func TestTerraformApply_VPCSecurityGroupsRestrictIngress(t *testing.T) {}
```
#### Epic 2: mTLS Certificate Lifecycle
```go
// pkg/agent/mtls_test.go
func TestMTLS_CertificateGeneration_ValidX509(t *testing.T) {}
func TestMTLS_CertificateExpiration_AgentRejectsExpiredCert(t *testing.T) {}
func TestMTLS_CertificateRotation_NewCertAcceptedMidConnection(t *testing.T) {}
func TestMTLS_CertificateRevocation_RevokedCertRejected(t *testing.T) {}
func TestMTLS_SelfSignedCert_RejectedBySaaS(t *testing.T) {}
func TestMTLS_CertificateChain_IntermediateCAValidated(t *testing.T) {}
```
### 11.2 Add t.Parallel() to Table-Driven Tests
```go
// BEFORE (sequential — wastes CI time):
func TestSecretScrubber(t *testing.T) {
tests := []struct{ name, input, expected string }{...}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// runs sequentially
})
}
}
// AFTER (parallel):
func TestSecretScrubber(t *testing.T) {
t.Parallel()
tests := []struct{ name, input, expected string }{...}
for _, tt := range tests {
tt := tt // capture range variable
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
// runs in parallel
})
}
}
```
### 11.3 Dynamic Resource Naming for LocalStack
```go
// BEFORE (shared state — flaky):
// bucket := "drift-reports"
// AFTER (per-test isolation):
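func uniqueBucket(t *testing.T) string {
// S3 bucket names must be lowercase, at most 63 chars, with no "/" or "_",
// so sanitize t.Name() and keep the suffix short (needs the "strings" import).
name := strings.ToLower(strings.NewReplacer("/", "-", "_", "-").Replace(t.Name()))
if len(name) > 30 {
name = name[:30]
}
return fmt.Sprintf("drift-%s-%d", name, time.Now().UnixNano()%1e9)
}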
func TestDriftReportUpload(t *testing.T) {
t.Parallel()
bucket := uniqueBucket(t)
s3Client.CreateBucket(ctx, &s3.CreateBucketInput{Bucket: &bucket})
// Test uses isolated bucket — no cross-test contamination
}
```
### 11.4 Distributed Tracing Cross-Boundary Tests
```go
// tests/integration/trace_propagation_test.go
func TestTraceContext_AgentToSaaS_SpanParentChain(t *testing.T) {
// Agent generates drift_scan span with trace_id
// POST /v1/drift-reports carries traceparent header
// SaaS Event Processor creates child span
// Verify parent-child relationship across HTTP boundary
exporter := tracetest.NewInMemoryExporter()
// Fire drift report with traceparent
traceID := "4bf92f3577b34da6a3ce929d0e0e4736"
resp := postDriftReport(t, stack, traceID)
assert.Equal(t, 200, resp.StatusCode)
spans := exporter.GetSpans()
eventProcessorSpan := findSpan(spans, "drift_report.process")
assert.Equal(t, traceID, eventProcessorSpan.SpanContext().TraceID().String())
}
func TestTraceContext_SQSBoundary_PreservesTraceID(t *testing.T) {
// Verify SQS message attributes contain traceparent
// Verify consumer extracts and continues the trace
}
func TestTraceContext_AgentScan_CreatesParentSpan(t *testing.T) {
// Verify agent drift_scan span has correct attributes:
// drift.stack_id, drift.resource_count, drift.duration_ms
}
```
### 11.5 Backward Compatibility Serialization (Elastic Schema)
```go
// tests/schema/backward_compat_test.go
func TestOldAgent_ParsesNewDynamoDBItem_WithV2Attributes(t *testing.T) {
// Simulate V2 DynamoDB item with new _v2 fields
item := map[string]types.AttributeValue{
"PK": &types.AttributeValueMemberS{Value: "STACK#123"},
"drift_score": &types.AttributeValueMemberN{Value: "85"},
"drift_score_v2": &types.AttributeValueMemberN{Value: "92"}, // New field
"remediation_v2": &types.AttributeValueMemberS{Value: "auto"}, // New field
}
// V1 parser must ignore unknown fields
result, err := ParseDriftItem(item)
assert.NoError(t, err)
assert.Equal(t, 85, result.DriftScore) // Uses V1 field
}
func TestV1Code_ReadsV2Writes_DuringMigrationWindow(t *testing.T) {
// V2 writes both drift_score and drift_score_v2
// V1 reads drift_score (ignores _v2)
// Verify no data loss
}
```
### 11.6 Security: RBAC Forgery & Replay Attacks
```go
// tests/integration/security_test.go
func TestAgentCannotForgeStackID(t *testing.T) {
// Agent with API key for org-A sends drift report claiming stack belongs to org-B
orgAKey := createAPIKey(t, "org-a")
report := makeDriftReport("org-b-stack-id") // Wrong org
resp := postDriftReportWithKey(t, report, orgAKey)
assert.Equal(t, 403, resp.StatusCode)
}
func TestReplayAttack_DuplicateReportID_Rejected(t *testing.T) {
report := makeDriftReport("stack-1")
resp1 := postDriftReport(t, report)
assert.Equal(t, 200, resp1.StatusCode)
// Replay exact same report
resp2 := postDriftReport(t, report)
assert.Equal(t, 409, resp2.StatusCode) // Conflict — already processed
}
func TestReplayAttack_OldTimestamp_Rejected(t *testing.T) {
report := makeDriftReport("stack-1")
report.Timestamp = time.Now().Add(-10 * time.Minute) // 10 min old
resp := postDriftReport(t, report)
assert.Equal(t, 400, resp.StatusCode) // Stale report
}
```
### 11.7 Noisy Neighbor & Fair-Share Processing
```go
// tests/integration/fair_share_test.go
func TestNoisyNeighbor_LargeOrgDoesNotStarveSmallOrg(t *testing.T) {
// Org A: 10,000 drifted resources
// Org B: 10 drifted resources
// Both submit reports simultaneously
seedDriftReports(t, "org-a", 10000)
seedDriftReports(t, "org-b", 10)
// Org B's reports must be processed within 30 seconds
// (not queued behind all 10K of Org A's)
start := time.Now()
waitForProcessed(t, "org-b", 10, 30*time.Second)
assert.Less(t, time.Since(start), 30*time.Second)
}
```
### 11.8 Panic Mode Mid-Remediation Race Condition
```go
// tests/integration/panic_remediation_test.go
func TestPanicMode_AbortsInFlightRemediation(t *testing.T) {
// Start a remediation (terraform apply)
execID := startRemediation(t, "stack-1", "drift-1")
waitForState(t, execID, "applying")
// Trigger panic mode
triggerPanicMode(t)
// Remediation must be aborted, not completed
state := waitForState(t, execID, "aborted")
assert.Equal(t, "aborted", state)
// Verify terraform state is not corrupted
// (agent should have run terraform state pull to verify)
}
func TestPanicMode_DoesNotAbortReadOnlyScans(t *testing.T) {
// Drift scans (read-only) should continue during panic
// Only write operations (remediation) are halted
scanID := startDriftScan(t, "stack-1")
triggerPanicMode(t)
state := waitForState(t, scanID, "completed")
assert.Equal(t, "completed", state) // Scan finishes normally
}
```
### 11.9 Remediation vs. Concurrent Scan Race Condition
```go
func TestConcurrentScanDuringRemediation_DoesNotReportHalfAppliedState(t *testing.T) {
// Start remediation (terraform apply — takes ~30s)
execID := startRemediation(t, "stack-1", "drift-1")
waitForState(t, execID, "applying")
// Trigger a drift scan while remediation is in progress
scanID := startDriftScan(t, "stack-1")
// Scan must either:
// a) Wait for remediation to complete, OR
// b) Skip the stack with "remediation in progress" status
scanResult := waitForScanComplete(t, scanID)
assert.NotEqual(t, "half-applied", scanResult.Status)
// Must be either "skipped_remediation_in_progress" or show post-remediation state
}
```
### 11.10 SaaS API Memory Profiling
```go
// tests/load/memory_profile_test.go
func TestEventProcessor_DoesNotOOM_On1MB_DriftReport(t *testing.T) {
// Generate a 1MB drift report (1000 resources with large diffs)
report := makeLargeDriftReport(1000)
assert.Greater(t, len(report), 1024*1024)
var memBefore, memAfter runtime.MemStats
runtime.ReadMemStats(&memBefore)
processReport(t, report)
runtime.ReadMemStats(&memAfter)
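// Alloc can shrink if the GC runs, so use signed arithmetic to avoid uint64 underflow
growth := int64(memAfter.Alloc) - int64(memBefore.Alloc)
assert.Less(t, growth, int64(50*1024*1024)) // <50MB growth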
}
```
### 11.11 Trim E2E to Smoke Tier
Per the review recommendation, E2E is capped at 10 critical paths. The remaining 40 tests are demoted to integration:
| E2E (Keep — 10 max) | Demoted to Integration |
|---------------------|----------------------|
| Onboarding: init → connect → first scan | Agent heartbeat variations |
| First drift detected → Slack alert | Individual parser format tests |
| Revert flow: Slack → agent apply → verify | Secret scrubber edge cases |
| Panic mode halts remediation | DynamoDB access pattern tests |
| Cross-tenant isolation | Individual webhook format tests |
| OAuth login → dashboard → view diff | Notification batching |
| Free tier limit enforcement | Agent config reload |
| Agent disconnect → reconnect → resume | Baseline score calculations |
| mTLS cert rotation mid-scan | Individual API endpoint tests |
| Stripe upgrade → unlock features | Cache invalidation patterns |
### 11.12 Updated Test Pyramid (Post-Review)
| Level | Original | Revised | Rationale |
|-------|----------|---------|-----------|
| Unit | 70% (~350) | 65% (~350) | Add t.Parallel(), keep count but add UI component tests |
| Integration | 20% (~100) | 28% (~150) | Terratest, mTLS, trace propagation, fair-share, security |
| E2E/Smoke | 10% (~50) | 7% (~35) | Capped at 10 true E2E + 25 Playwright UI tests |
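
The revised percentages follow from the revised counts in the table. A short sketch of the arithmetic, using only the numbers above (the variable and function names are illustrative):

```typescript
// Revised test counts taken from the "Revised" column of the table.
const revised = { unit: 350, integration: 150, e2eSmoke: 35 };

// Total revised test count across all three levels.
const total = revised.unit + revised.integration + revised.e2eSmoke;

// Share of the pyramid for one level, rounded to the nearest whole percent.
function share(count: number): number {
  return Math.round((count / total) * 100);
}

const shares = {
  unit: share(revised.unit),
  integration: share(revised.integration),
  e2eSmoke: share(revised.e2eSmoke),
};
```

With a total of 535 tests, this gives 65% unit, 28% integration, and 7% E2E/smoke, matching the table.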
*End of P2 Review Remediation Addendum*


@@ -1409,3 +1409,459 @@ Before any release, these tests must pass:
---
*End of dd0c/alert Test Architecture*
---
## 11. Review Remediation Addendum (Post-Gemini Review)
### 11.1 Missing Epic Coverage
#### Epic 6: Dashboard API
```typescript
describe('Dashboard API', () => {
describe('Authentication', () => {
it('returns 401 for missing Cognito JWT', async () => {});
it('returns 401 for expired JWT', async () => {});
it('returns 401 for JWT signed by wrong issuer', async () => {});
it('extracts tenantId from JWT claims', async () => {});
});
describe('Incident Listing (GET /v1/incidents)', () => {
it('returns paginated incidents for authenticated tenant', async () => {});
it('supports cursor-based pagination', async () => {});
it('filters by status (open, acknowledged, resolved)', async () => {});
it('filters by severity (critical, warning, info)', async () => {});
it('filters by time range (since, until)', async () => {});
it('returns empty array for tenant with no incidents', async () => {});
});
describe('Incident Detail (GET /v1/incidents/:id)', () => {
it('returns full incident with correlated alerts', async () => {});
it('returns 404 for incident belonging to different tenant', async () => {});
it('includes timeline of state transitions', async () => {});
});
describe('Analytics (GET /v1/analytics)', () => {
it('returns MTTR for last 7/30/90 days', async () => {});
it('returns alert volume by source', async () => {});
it('returns noise reduction percentage', async () => {});
it('scopes all analytics to authenticated tenant', async () => {});
});
describe('Tenant Isolation', () => {
it('tenant A cannot read tenant B incidents via API', async () => {});
it('tenant A cannot read tenant B analytics', async () => {});
it('all DynamoDB queries include tenantId partition key', async () => {});
});
});
```
#### Epic 7: Dashboard UI (Playwright)
```typescript
// tests/e2e/ui/dashboard.spec.ts
test('login redirects to Cognito hosted UI', async ({ page }) => {
await page.goto('/dashboard');
await expect(page).toHaveURL(/cognito/);
});
test('incident list renders with correct severity badges', async ({ page }) => {
await page.goto('/dashboard/incidents');
await expect(page.locator('[data-testid="incident-card"]')).toHaveCount(5);
await expect(page.locator('.severity-critical')).toBeVisible();
});
test('incident detail shows correlated alert timeline', async ({ page }) => {
await page.goto('/dashboard/incidents/inc-123');
await expect(page.locator('[data-testid="alert-timeline"]')).toBeVisible();
// Playwright has no toHaveCountGreaterThan; asserting the second item exists proves count > 1
await expect(page.locator('.timeline-event').nth(1)).toBeVisible();
});
test('MTTR chart renders with real data', async ({ page }) => {
await page.goto('/dashboard/analytics');
await expect(page.locator('[data-testid="mttr-chart"]')).toBeVisible();
});
test('noise reduction percentage displays correctly', async ({ page }) => {
await page.goto('/dashboard/analytics');
const noise = page.locator('[data-testid="noise-reduction"]');
await expect(noise).toContainText('%');
});
test('webhook setup wizard generates correct URL', async ({ page }) => {
await page.goto('/dashboard/settings/integrations');
await page.click('[data-testid="add-datadog"]');
const url = await page.locator('[data-testid="webhook-url"]').textContent();
expect(url).toMatch(/\/v1\/webhooks\/ingest\/.+/);
});
```
#### Epic 9: Onboarding & PLG
```typescript
describe('Free Tier Enforcement', () => {
it('allows up to 10,000 alerts/month on free tier', async () => {});
it('returns 429 with upgrade prompt at 10,001st alert', async () => {});
it('resets counter on first of each month', async () => {});
it('purges alert data older than 7 days on free tier', async () => {});
it('retains alert data for 90 days on pro tier', async () => {});
});
describe('OAuth Signup', () => {
it('creates tenant record on first Cognito login', async () => {});
it('assigns free tier by default', async () => {});
it('generates unique webhook URL per tenant', async () => {});
});
describe('Stripe Integration', () => {
it('creates checkout session with correct pricing', async () => {});
it('upgrades tenant on checkout.session.completed webhook', async () => {});
it('downgrades tenant on subscription.deleted webhook', async () => {});
it('validates Stripe webhook signature', async () => {});
});
```
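The free tier rules above (10,000 alerts per month, a 429 on the 10,001st, a reset on the first of the month) can be sketched as a per-tenant, per-month counter. This is a minimal in-memory illustration, not the product's implementation: `FreeTierCounter`, `FREE_TIER_MONTHLY_LIMIT`, and the prompt text are hypothetical names, and the use of UTC month boundaries is an assumption.

```typescript
// Hypothetical sketch: names and the UTC month boundary are assumptions.
const FREE_TIER_MONTHLY_LIMIT = 10000;

type IngestDecision =
  | { status: 200 }
  | { status: 429; upgradePrompt: string };

class FreeTierCounter {
  private counts = new Map<string, number>();

  // Keying by tenant and UTC calendar month makes the counter reset on the 1st.
  private key(tenantId: string, at: Date): string {
    return `${tenantId}:${at.getUTCFullYear()}-${at.getUTCMonth() + 1}`;
  }

  record(tenantId: string, at: Date): IngestDecision {
    const k = this.key(tenantId, at);
    const used = this.counts.get(k) ?? 0;
    if (used >= FREE_TIER_MONTHLY_LIMIT) {
      // Rejected alerts are not counted.
      return {
        status: 429,
        upgradePrompt: 'Free tier limit reached. Upgrade to keep ingesting alerts.',
      };
    }
    this.counts.set(k, used + 1);
    return { status: 200 };
  }
}
```

A production version would presumably keep the counter in a shared store with an atomic increment, since an in-process map is not shared across Lambda invocations.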
#### Epic 5.3: Slack Feedback Endpoint
```typescript
describe('Slack Interactive Actions Endpoint', () => {
it('validates Slack request signature (HMAC-SHA256)', async () => {});
it('rejects request with invalid signature', async () => {});
it('handles "helpful" feedback — updates incident quality score', async () => {});
it('handles "noise" feedback — adds to suppression training data', async () => {});
it('handles "escalate" action — triggers PagerDuty/OpsGenie', async () => {});
it('updates original Slack message after action', async () => {});
it('scopes action to correct tenant', async () => {});
});
```
#### Epic 1.4: S3 Raw Payload Archival
```typescript
describe('Raw Payload Archival', () => {
it('saves raw webhook payload to S3 asynchronously', async () => {});
it('S3 key includes tenantId, source, and timestamp', async () => {});
it('archival failure does not block alert processing', async () => {});
it('archived payload is retrievable for replay', async () => {});
it('S3 lifecycle policy deletes after retention period', async () => {});
});
```
### 11.2 Anti-Pattern Fixes
#### Replace ioredis-mock with WindowStore Interface
```typescript
// BEFORE (anti-pattern):
// import RedisMock from 'ioredis-mock';
// const engine = new CorrelationEngine(new RedisMock());
// AFTER (correct):
interface WindowStore {
addEvent(tenantId: string, key: string, event: Alert, ttlMs: number): Promise<void>;
getWindow(tenantId: string, key: string): Promise<Alert[]>;
clearWindow(tenantId: string, key: string): Promise<void>;
}
class InMemoryWindowStore implements WindowStore {
private store = new Map<string, { events: Alert[]; expiresAt: number }>();
async addEvent(tenantId: string, key: string, event: Alert, ttlMs: number) {
const fullKey = `${tenantId}:${key}`;
const current = this.store.get(fullKey);
// Start a fresh window when none exists or the previous one has expired
const existing = current && current.expiresAt >= Date.now()
? current
: { events: [] as Alert[], expiresAt: Date.now() + ttlMs };
existing.events.push(event);
this.store.set(fullKey, existing);
}
async getWindow(tenantId: string, key: string): Promise<Alert[]> {
const fullKey = `${tenantId}:${key}`;
const entry = this.store.get(fullKey);
if (!entry || entry.expiresAt < Date.now()) return [];
return entry.events;
}
// Required by the WindowStore interface
async clearWindow(tenantId: string, key: string): Promise<void> {
this.store.delete(`${tenantId}:${key}`);
}
}
// Unit tests use InMemoryWindowStore — no Redis dependency
// Integration tests use RedisWindowStore with Testcontainers
```
#### Replace sinon.useFakeTimers with Clock Interface
```typescript
// BEFORE (anti-pattern):
// sinon.useFakeTimers(new Date('2026-03-01T00:00:00Z'));
// AFTER (correct):
interface Clock {
now(): number;
advanceBy(ms: number): void;
}
class FakeClock implements Clock {
private current: number;
constructor(start: Date = new Date()) { this.current = start.getTime(); }
now() { return this.current; }
advanceBy(ms: number) { this.current += ms; }
}
class SystemClock implements Clock {
now() { return Date.now(); }
advanceBy() { throw new Error('Cannot advance system clock'); }
}
// Inject into CorrelationEngine:
const engine = new CorrelationEngine(new InMemoryWindowStore(), new FakeClock());
```
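To show why the injected clock matters, here is a small sketch of a window store that reads time only through the clock, so a test can expire a window by advancing a fake clock instead of sleeping. `ManualClock`, `ClockedWindowStore`, and the one-field `Alert` shape are hypothetical stand-ins for illustration; the store is synchronous to keep the sketch short.

```typescript
// Hypothetical sketch: ManualClock, ClockedWindowStore and this Alert shape are assumptions.
interface Clock {
  now(): number;
}

interface Alert {
  id: string;
}

type WindowEntry = { events: Alert[]; expiresAt: number };

class ManualClock implements Clock {
  private current: number;
  constructor(startMs: number) {
    this.current = startMs;
  }
  now(): number {
    return this.current;
  }
  advanceBy(ms: number): void {
    this.current += ms;
  }
}

class ClockedWindowStore {
  private store = new Map<string, WindowEntry>();
  private clock: Clock;

  constructor(clock: Clock) {
    this.clock = clock;
  }

  addEvent(tenantId: string, key: string, event: Alert, ttlMs: number): void {
    const fullKey = `${tenantId}:${key}`;
    const now = this.clock.now();
    const existing = this.store.get(fullKey);
    // Reuse the live window, or start a new one if none exists or it has expired.
    const entry: WindowEntry =
      existing && existing.expiresAt >= now
        ? existing
        : { events: [], expiresAt: now + ttlMs };
    entry.events.push(event);
    this.store.set(fullKey, entry);
  }

  getWindow(tenantId: string, key: string): Alert[] {
    const entry = this.store.get(`${tenantId}:${key}`);
    if (!entry || entry.expiresAt < this.clock.now()) return [];
    return entry.events;
  }
}
```

With this shape, a unit test can add an event, call `advanceBy` past the TTL, and assert that the window is empty, all without real timers.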
### 11.3 Trace Context Propagation Tests
```typescript
describe('Trace Context Propagation', () => {
it('API Gateway passes trace_id to Lambda via X-Amzn-Trace-Id', async () => {});
it('Lambda propagates trace_id into SQS message attributes', async () => {
// Verify SQS message has MessageAttribute 'traceparent' with W3C format
const msg = await getLastSQSMessage(localstack, 'alert-queue');
expect(msg.MessageAttributes.traceparent).toBeDefined();
expect(msg.MessageAttributes.traceparent.StringValue).toMatch(
/^00-[0-9a-f]{32}-[0-9a-f]{16}-0[01]$/
);
});
it('ECS Correlation Engine extracts trace_id from SQS message', async () => {
// Verify the correlation span has the correct parent from SQS
const spans = inMemoryExporter.getFinishedSpans();
const correlationSpan = spans.find(s => s.name === 'alert.correlation');
const ingestSpan = spans.find(s => s.name === 'webhook.ingest');
expect(correlationSpan.parentSpanId).toBeDefined();
// Parent chain must trace back to the original ingest span
expect(correlationSpan.spanContext().traceId).toBe(ingestSpan.spanContext().traceId);
});
it('end-to-end trace spans webhook → SQS → correlation → notification', async () => {
// Fire a webhook, wait for Slack notification, verify all spans share trace_id
const traceId = await fireWebhookAndGetTraceId();
const spans = await getSpansByTraceId(traceId);
const spanNames = spans.map(s => s.name);
expect(spanNames).toContain('webhook.ingest');
expect(spanNames).toContain('alert.normalize');
expect(spanNames).toContain('alert.correlation');
expect(spanNames).toContain('notification.slack');
});
});
```
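The `traceparent` assertion above matches the W3C Trace Context layout (`version-traceid-parentid-flags`). A minimal parser and formatter for that header might look like the following; `parseTraceparent` and `formatTraceparent` are hypothetical helper names, not functions from the codebase or from an OpenTelemetry package.

```typescript
// Hypothetical helpers for the W3C traceparent header; names are assumptions.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, version = '', traceId = '', parentId = '', flags = ''] = m;
  // Version "ff" and all-zero trace or parent ids are invalid per the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

function formatTraceparent(traceId: string, parentId: string, sampled: boolean): string {
  return `00-${traceId}-${parentId}-${sampled ? '01' : '00'}`;
}
```

In practice the OpenTelemetry propagators handle this for you; the sketch only makes explicit what the regex in the SQS test is checking.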
### 11.4 HMAC Security Hardening
```typescript
describe('HMAC Signature Validation (Hardened)', () => {
it('uses crypto.timingSafeEqual, not === comparison', () => {
// Inspect the source to verify timing-safe comparison
const source = fs.readFileSync('src/ingestion/hmac.ts', 'utf8');
expect(source).toContain('timingSafeEqual');
expect(source).not.toMatch(/signature\s*===\s*/);
});
it('handles case-insensitive header names (dd-webhook-signature vs DD-WEBHOOK-SIGNATURE)', async () => {
const payload = makeAlertPayload('datadog');
const sig = computeHMAC(payload, DATADOG_SECRET);
// Lowercase header
const resp1 = await ingest(payload, { 'dd-webhook-signature': sig });
expect(resp1.status).toBe(200);
// Uppercase header
const resp2 = await ingest(payload, { 'DD-WEBHOOK-SIGNATURE': sig });
expect(resp2.status).toBe(200);
});
it('rejects completely missing signature header', async () => {
const resp = await ingest(makeAlertPayload('datadog'), {});
expect(resp.status).toBe(401);
});
it('rejects empty signature header', async () => {
const resp = await ingest(makeAlertPayload('datadog'), { 'dd-webhook-signature': '' });
expect(resp.status).toBe(401);
});
});
```
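The hardening tests above expect a constant-time comparison and case-insensitive header lookup. One way that could look, using Node's built-in `crypto` module, is sketched below. `computeSignature` and `verifySignature` are hypothetical names, and the hex encoding of the signature and the default header name are assumptions taken from the test file rather than from any vendor documentation.

```typescript
import { Buffer } from 'node:buffer';
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical sketch: function names, hex encoding and header name are assumptions.
function computeSignature(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload, 'utf8').digest('hex');
}

function verifySignature(
  payload: string,
  secret: string,
  headers: Record<string, string | undefined>,
  headerName: string = 'dd-webhook-signature',
): boolean {
  // HTTP header names are case-insensitive, so normalise before lookup.
  const wanted = headerName.toLowerCase();
  let provided: string | undefined;
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === wanted) provided = headers[name];
  }
  if (!provided) return false; // missing or empty header

  const expected = Buffer.from(computeSignature(payload, secret), 'hex');
  const actual = Buffer.from(provided, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (actual.length !== expected.length) return false;
  return timingSafeEqual(actual, expected);
}
```

Returning `false` for a missing, empty, or malformed signature lets the caller map every failure to the same 401 response.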
### 11.5 SQS 256KB Payload Limit
```typescript
describe('Large Payload Handling', () => {
it('compresses payloads >200KB before sending to SQS', async () => {
const largePayload = makeLargeAlertPayload(300 * 1024); // 300KB
const resp = await ingest(largePayload);
expect(resp.status).toBe(200);
const msg = await getLastSQSMessage(localstack, 'alert-queue');
// Payload must be compressed or use S3 pointer
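// Measure bytes, not UTF-16 code units: multi-byte characters count more than once
expect(Buffer.byteLength(msg.Body, 'utf8')).toBeLessThan(256 * 1024);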
});
it('uses S3 pointer for payloads >256KB after compression', async () => {
const hugePayload = makeLargeAlertPayload(500 * 1024); // 500KB
const resp = await ingest(hugePayload);
expect(resp.status).toBe(200);
const msg = await getLastSQSMessage(localstack, 'alert-queue');
const body = JSON.parse(msg.Body);
expect(body.s3Pointer).toBeDefined();
expect(body.s3Pointer).toMatch(/^s3:\/\/dd0c-alert-overflow\//);
});
it('strips unnecessary fields from Datadog payload before SQS', async () => {
const payload = makeDatadogPayloadWithLargeTags(100); // 100 tags
const resp = await ingest(payload);
expect(resp.status).toBe(200);
const msg = await getLastSQSMessage(localstack, 'alert-queue');
const normalized = JSON.parse(msg.Body);
// Only essential fields should remain
expect(normalized.tags.length).toBeLessThanOrEqual(20);
});
it('rejects payloads >2MB at API Gateway level', async () => {
const massive = makeLargeAlertPayload(3 * 1024 * 1024);
const resp = await ingest(massive);
expect(resp.status).toBe(413);
});
});
```
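The four tests above imply a size-based routing decision. A pure function makes that decision easy to unit test without LocalStack. This is a sketch under assumptions: `choosePayloadRoute` and the route names are hypothetical, and the thresholds are simply the 200KB, 256KB, and 2MB figures used in the tests.

```typescript
// Hypothetical sketch: route names and function name are assumptions;
// thresholds mirror the figures used in the tests above.
const COMPRESS_THRESHOLD_BYTES = 200 * 1024;
const QUEUE_MAX_BYTES = 256 * 1024;
const API_MAX_BYTES = 2 * 1024 * 1024;

type PayloadRoute = 'inline' | 'compressed' | 's3-pointer' | 'reject-413';

// rawBytes: size of the incoming payload.
// compressedBytes: size of the compressed body as it would actually be sent,
// including any text encoding overhead.
function choosePayloadRoute(rawBytes: number, compressedBytes: number): PayloadRoute {
  if (rawBytes > API_MAX_BYTES) return 'reject-413';
  if (rawBytes <= COMPRESS_THRESHOLD_BYTES) return 'inline';
  if (compressedBytes < QUEUE_MAX_BYTES) return 'compressed';
  return 's3-pointer';
}
```

Keeping the decision separate from the compression and upload code means the boundary values can be covered exhaustively in unit tests.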
### 11.6 DLQ Backpressure & Replay
```typescript
describe('DLQ Replay with Backpressure', () => {
it('replays DLQ messages in batches of 100', async () => {
await seedDLQ(10000); // 10K messages
const replayer = new DLQReplayer({ batchSize: 100, delayBetweenBatchesMs: 500 });
await replayer.start();
// Verify batched processing
expect(replayer.batchesProcessed).toBeGreaterThan(0);
expect(replayer.maxConcurrentMessages).toBeLessThanOrEqual(100);
});
it('pauses replay if correlation engine error rate exceeds 10%', async () => {
await seedDLQ(1000);
const replayer = new DLQReplayer({ batchSize: 100, errorThreshold: 0.1 });
// Simulate correlation engine returning errors
mockCorrelationEngine.failRate = 0.15;
await replayer.start();
expect(replayer.state).toBe('paused');
expect(replayer.pauseReason).toContain('error rate exceeded');
});
it('does not replay if circuit breaker is currently tripped', async () => {
await seedDLQ(100);
await tripCircuitBreaker();
const replayer = new DLQReplayer();
await replayer.start();
expect(replayer.messagesReplayed).toBe(0);
expect(replayer.state).toBe('blocked_by_circuit_breaker');
});
it('tracks replay progress for resumability', async () => {
await seedDLQ(500);
const replayer = new DLQReplayer({ batchSize: 50 });
// Process 3 batches then stop
await replayer.processNBatches(3);
expect(replayer.checkpoint).toBe(150);
// Resume from checkpoint
const replayer2 = new DLQReplayer({ resumeFrom: replayer.checkpoint });
await replayer2.start();
expect(replayer2.startedFrom).toBe(150);
});
});
```
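The batching, pause, and checkpoint behaviour these tests describe can be sketched as a single loop. The version below is deliberately simplified and synchronous: `replayInBatches` and its option names are hypothetical, the delay between batches and the circuit breaker check are omitted, and failures below the threshold are simply counted rather than re-queued.

```typescript
// Hypothetical, simplified sketch: names are assumptions; no delay, no re-queue.
interface ReplayOptions {
  batchSize: number;
  errorThreshold: number; // e.g. 0.1 pauses when more than 10% of a batch fails
  resumeFrom?: number;
}

interface ReplayResult {
  state: 'completed' | 'paused';
  checkpoint: number; // index of the next message to process
  batchesProcessed: number;
  pauseReason?: string;
}

function replayInBatches<T>(
  messages: T[],
  handler: (msg: T) => boolean, // returns false when processing fails
  opts: ReplayOptions,
): ReplayResult {
  if (opts.batchSize <= 0) throw new Error('batchSize must be positive');
  let checkpoint = opts.resumeFrom ?? 0;
  let batchesProcessed = 0;

  while (checkpoint < messages.length) {
    const batch = messages.slice(checkpoint, checkpoint + opts.batchSize);
    let failures = 0;
    for (const msg of batch) {
      if (!handler(msg)) failures++;
    }
    batchesProcessed++;

    if (failures / batch.length > opts.errorThreshold) {
      // The checkpoint is not advanced, so the failed batch is retried on resume.
      return {
        state: 'paused',
        checkpoint,
        batchesProcessed,
        pauseReason: `error rate exceeded ${opts.errorThreshold}`,
      };
    }
    checkpoint += batch.length;
  }
  return { state: 'completed', checkpoint, batchesProcessed };
}
```

Persisting `checkpoint` after each batch is what would make the replay resumable across process restarts.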
### 11.7 Multi-Tenancy Isolation (DynamoDB)
```typescript
describe('DynamoDB Tenant Isolation', () => {
it('all DAO methods require tenantId parameter', () => {
// Compile-time check: DAO interface has tenantId as first param
const daoSource = fs.readFileSync('src/data/incident-dao.ts', 'utf8');
const methods = extractPublicMethods(daoSource);
for (const method of methods) {
expect(method.params[0].name).toBe('tenantId');
}
});
it('query for tenant A returns zero results for tenant B data', async () => {
const dao = new IncidentDAO(dynamoClient);
await dao.create('tenant-A', makeIncident());
await dao.create('tenant-B', makeIncident());
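const results = await dao.list('tenant-A');
// Guard against a vacuous pass: every() is true for an empty array
expect(results.length).toBeGreaterThan(0);
expect(results.every(r => r.tenantId === 'tenant-A')).toBe(true);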
});
it('partition key always includes tenantId prefix', async () => {
const dao = new IncidentDAO(dynamoClient);
await dao.create('tenant-X', makeIncident());
// Read raw DynamoDB item
const item = await dynamoClient.scan({ TableName: 'dd0c-alert-main' });
// toStartWith is a jest-extended matcher; toMatch works with core Jest
expect(item.Items[0].PK.S).toMatch(/^TENANT#tenant-X/);
});
});
```
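The partition key test above assumes a `TENANT#<tenantId>` prefix. Centralising key construction in one helper is one way to make that invariant hard to bypass. In this sketch `tenantPartitionKey`, `incidentKey`, the `INCIDENT#` sort key prefix, and the allowed character set are all assumptions for illustration.

```typescript
// Hypothetical sketch: helper names, the SK format and the id character set are assumptions.
const TENANT_ID_RE = /^[A-Za-z0-9_-]+$/;

function tenantPartitionKey(tenantId: string): string {
  // Rejecting '#' (and anything else unexpected) prevents a crafted id
  // from producing a key that overlaps another tenant's key space.
  if (!TENANT_ID_RE.test(tenantId)) {
    throw new Error(`invalid tenantId: ${JSON.stringify(tenantId)}`);
  }
  return `TENANT#${tenantId}`;
}

function incidentKey(tenantId: string, incidentId: string): { PK: string; SK: string } {
  return { PK: tenantPartitionKey(tenantId), SK: `INCIDENT#${incidentId}` };
}
```

Note that a prefix check such as "starts with `TENANT#tenant-X`" would also match a tenant named `tenant-XY`, so an isolation test is stricter if it compares the full partition key.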
### 11.8 Slack Circuit Breaker
```typescript
describe('Slack Notification Circuit Breaker', () => {
it('opens circuit after 10 consecutive 429s from Slack', async () => {
const slackClient = new SlackClient({ circuitBreakerThreshold: 10 });
for (let i = 0; i < 10; i++) {
mockSlack.respondWith(429);
await slackClient.send(makeMessage()).catch(() => {});
}
expect(slackClient.circuitState).toBe('open');
});
it('queues notifications while circuit is open', async () => {
slackClient.openCircuit();
await slackClient.send(makeMessage());
expect(slackClient.queuedMessages).toBe(1);
});
it('half-opens circuit after 60 seconds', async () => {
slackClient.openCircuit();
clock.advanceBy(61000);
expect(slackClient.circuitState).toBe('half-open');
});
it('drains queue on successful half-open probe', async () => {
slackClient.openCircuit();
slackClient.queue(makeMessage());
slackClient.queue(makeMessage());
clock.advanceBy(61000);
mockSlack.respondWith(200);
await slackClient.probe();
expect(slackClient.circuitState).toBe('closed');
expect(slackClient.queuedMessages).toBe(0);
});
});
```
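The state transitions exercised above (closed, open after 10 consecutive failures, half-open after 60 seconds, closed again on a successful probe) fit in a small state machine driven by an injected clock. This is a sketch, not the `SlackClient` from the tests: `CircuitBreaker` and its method names are hypothetical, and message queueing is left out.

```typescript
// Hypothetical sketch: class and method names are assumptions; queueing is omitted.
type CircuitState = 'closed' | 'open' | 'half-open';

interface Clock {
  now(): number;
}

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private clock: Clock;
  private threshold: number;
  private cooldownMs: number;

  constructor(clock: Clock, threshold: number = 10, cooldownMs: number = 60000) {
    this.clock = clock;
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  get state(): CircuitState {
    if (this.openedAt === null) return 'closed';
    // After the cooldown a single probe request is allowed through.
    return this.clock.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open';
  }

  recordFailure(): void {
    if (this.state !== 'closed') {
      // A failed probe (or any failure while open) restarts the cooldown.
      this.openedAt = this.clock.now();
      return;
    }
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= this.threshold) {
      this.openedAt = this.clock.now();
    }
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }
}
```

Because the state is derived from the clock rather than from a timer callback, a test only needs to move the fake clock forward to observe the half-open state.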
### 11.9 Updated Test Pyramid (Post-Review)
| Level | Original | Revised | Rationale |
|-------|----------|---------|-----------|
| Unit | 70% (~140) | 65% (~180) | More tests total, but integration share grows |
| Integration | 20% (~40) | 25% (~70) | Dashboard API, tenant isolation, trace propagation |
| E2E | 10% (~20) | 10% (~28) | Dashboard UI (Playwright), onboarding flow |
*End of P3 Review Remediation Addendum*


@@ -1107,3 +1107,161 @@ Phase 7: E2E Validation
---
*End of dd0c/portal Test Architecture*
---
## 11. Review Remediation Addendum (Post-Gemini Review)
### 11.1 Resolve Database Misalignment (PostgreSQL vs DynamoDB)
Epic 10.2 specified DynamoDB Single-Table, but the Architecture and Test Architecture are fundamentally built around PostgreSQL (Aurora Serverless v2) with pgvector.
**Resolution:** The IDP requires relational joins and vector search. PostgreSQL is the definitive catalog database. DynamoDB references are removed.
```rust
// tests/schema/migration_validation_test.rs
#[tokio::test]
async fn elastic_schema_postgres_migration_is_additive_only() {
let migrations = read_sql_migrations("./migrations");
for migration in migrations {
assert!(!migration.contains("DROP COLUMN"), "Destructive schema change detected");
assert!(!migration.contains("ALTER COLUMN"), "Type modification detected");
assert!(!migration.contains("RENAME COLUMN"), "Column rename detected");
}
}
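#[tokio::test]
async fn migration_does_not_hold_exclusive_locks_on_reads() {
// Every index must be built with CREATE INDEX CONCURRENTLY
let migrations = read_sql_migrations("./migrations");
for migration in migrations {
let sql = migration.to_uppercase();
if sql.contains("CREATE INDEX") || sql.contains("CREATE UNIQUE INDEX") {
assert!(sql.contains("CONCURRENTLY"),
"Indexes must be created concurrently to avoid locking the catalog");
}
}
}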
```
### 11.2 Invert the Test Pyramid (Integration Honeycomb)
Shift from a 70% unit-test pyramid (with heavy moto/responses mocking) to a 30/60/10 unit/integration/E2E split backed by VCR cassettes and LocalStack.
```python
# tests/integration/scanners/test_aws_scanner.py
@pytest.mark.vcr()
def test_aws_scanner_discovers_ecs_services_and_api_gateways(vcr_cassette):
# Uses real recorded AWS API responses, not moto mocks
# Validates actual boto3 parsing against real-world AWS shapes
scanner = AWSDiscoveryScanner(account_id="123456789012", region="us-east-1")
services = scanner.scan()
assert len(services) > 0
assert any(s.type == "ecs_service" for s in services)
@pytest.mark.vcr()
def test_github_scanner_handles_graphql_pagination(vcr_cassette):
# Validates real GitHub GraphQL paginated responses
scanner = GitHubDiscoveryScanner(org_name="dd0c")
repos = scanner.scan()
assert len(repos) > 100 # Proves pagination logic works
```
### 11.3 Missing Epic Coverage
#### Epic 3.4: PagerDuty & OpsGenie Integrations
```python
# tests/integration/test_pagerduty_sync.py
@pytest.mark.vcr()
def test_pagerduty_sync_maps_schedules_to_catalog_teams():
sync = PagerDutySyncer(api_key="sk-test-key")
teams = sync.fetch_oncall_schedules()
assert teams[0].oncall_email is not None
def test_pagerduty_credentials_are_encrypted_at_rest():
# Verify KMS envelope encryption for 3rd party API keys
pass
```
#### Epic 4.3: Redis Prefix Caching for Cmd+K
```python
# tests/integration/test_search_cache.py
def test_cmd_k_search_hits_redis_cache_before_postgres():
redis_client.set("search:auth", json.dumps([{"name": "auth-service"}]))
# Must return < 5ms from Redis, skipping DB
result = search_api.query("auth")
assert result[0]['name'] == "auth-service"
def test_catalog_update_invalidates_search_cache():
# Create new service
catalog_api.create_service("billing-api")
# Prefix cache must be purged
assert redis_client.keys("search:*") == []
```
#### Epics 5 & 6: UI and Dashboards (Playwright)
```typescript
// tests/e2e/ui/catalog.spec.ts
test('service catalog renders progressive disclosure UI', async ({ page }) => {
await page.goto('/catalog');
// Click expands details instead of navigating away
await page.click('[data-testid="service-row-auth-api"]');
await expect(page.locator('[data-testid="service-drawer"]')).toBeVisible();
});
test('dashboard KPI aggregation shows total services and ownership coverage', async ({ page }) => {
await page.goto('/dashboard');
await expect(page.locator('[data-testid="kpi-total-services"]')).toHaveText("150");
await expect(page.locator('[data-testid="kpi-ownership"]')).toHaveText("85%");
});
```
#### Epic 9: Onboarding & Stripe
```python
# tests/integration/test_stripe_webhooks.py
def test_stripe_checkout_completed_upgrades_tenant_tier():
payload = load_fixture("stripe_checkout_completed.json")
signature = generate_stripe_signature(payload, secret)
response = api_client.post("/webhooks/stripe", data=payload, headers={"Stripe-Signature": signature})
assert response.status_code == 200
tenant = db.get_tenant("t-123")
assert tenant.tier == "pro"
def test_websocket_streams_discovery_progress_during_onboarding():
# Connect WS client, trigger discovery, assert WS receives "discovering AWS...", "found 50 resources..."
pass
```
### 11.4 Scaled Performance Benchmarks
```python
# tests/performance/test_discovery_scale.py
def test_discovery_pipeline_handles_10000_aws_resources_without_step_functions_payload_limit():
# Simulate an AWS environment with 10k resources
# Must chunk state machine transitions to stay under 256KB Step Functions limit
pass
def test_discovery_pipeline_handles_1000_github_repos():
# Verify GraphQL batching and rate limit backoff
pass
```
### 11.5 Edge Case Resilience
```python
def test_github_graphql_concurrent_rate_limiting():
# If 5 tenants scan concurrently, respect Retry-After headers across workers
pass
def test_partial_discovery_scan_does_not_corrupt_catalog():
# If GitHub scan times out halfway, existing services must NOT be marked stale
pass
def test_ownership_conflict_resolution():
# If two discovery sources claim the same repo, prioritize Explicit (Config) over Implicit (Tags)
pass
def test_meilisearch_index_rebuild_does_not_drop_search():
# Verify zero-downtime index swapping during mapping updates
pass
```


@@ -1,8 +1,8 @@
# dd0c/cost — Test Architecture & TDD Strategy
**Product:** dd0c/cost — AWS Cost Anomaly Detective
**Author:** Test Architecture Phase
**Date:** February 28, 2026
**Author:** Test Architecture Phase (v2 — Post-Review Rewrite)
**Date:** March 1, 2026
**Status:** V1 MVP — Solo Founder Scope
---
@@ -13,7 +13,9 @@
dd0c/cost sits at the intersection of **money and infrastructure**. A false negative means a customer loses thousands of dollars. A false positive means alert fatigue and churn. The test suite's primary job is to mathematically prove the anomaly scoring engine works across edge cases.
Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively unit-tested with synthetic data before any AWS APIs are mocked.
Guiding principle: **Test the math first, test the infrastructure second.** The Z-score and novelty algorithms must be exhaustively tested with property-based testing before any AWS APIs are mocked.
Second principle: **Every dollar matters.** Cost calculations involve floating-point arithmetic on money. Rounding errors, precision loss, and currency handling must be tested with the same rigor as a financial system.
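To make the rounding concern concrete, here is a sketch of round-half-to-even (banker's rounding), which avoids the upward bias that accumulates when many half-way values are always rounded up. `roundHalfEven` is a hypothetical helper and the epsilon used to detect ties is an assumption; a real financial path might prefer integer minor units or a decimal library.

```typescript
// Hypothetical sketch: helper name and tie-detection epsilon are assumptions.
function roundHalfEven(value: number, decimals: number): number {
  const factor = Math.pow(10, decimals);
  const scaled = value * factor;
  const floor = Math.floor(scaled);
  const fraction = scaled - floor;
  const EPSILON = 1e-9;

  let rounded: number;
  if (Math.abs(fraction - 0.5) < EPSILON) {
    // Exactly half-way: pick the even neighbour.
    rounded = floor % 2 === 0 ? floor : floor + 1;
  } else {
    rounded = Math.round(scaled);
  }
  return rounded / factor;
}
```

For example, `roundHalfEven(0.125, 2)` and `roundHalfEven(0.375, 2)` round in opposite directions (down to 0.12, up to 0.38), so the errors tend to cancel when costs are summed.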
### 1.2 Red-Green-Refactor Adapted to dd0c/cost
@@ -28,33 +30,52 @@ REFACTOR → Optimize the baseline lookup, extract novelty checks,
```
**When to write tests first (strict TDD):**
- Anomaly scoring engine (Z-scores, novelty checks, composite severity)
- Cold-start heuristics (fast-path for >$5/hr resources)
- Baseline calculation (moving averages, standard deviation)
- Governance policy (strict vs. audit mode, 14-day promotion)
- All anomaly scoring (Z-scores, novelty checks, composite severity)
- All cold-start heuristics (fast-path for >$5/hr resources)
- All baseline calculation (Welford algorithm, maturity transitions)
- All governance policy (strict vs. audit mode, 14-day auto-promotion, panic mode)
- All Slack signature validation (security-critical)
- All cost calculations (pricing lookup, hourly cost estimation)
- All feature flag circuit breakers
**When integration tests lead:**
- CloudTrail ingestion (implement against LocalStack EventBridge, then lock in)
- DynamoDB Single-Table schema (build access patterns, then integration test)
- Cross-account STS role assumption (test against LocalStack)
**When E2E tests lead:**
- The Slack alert interaction (format block kit, test the "Snooze/Terminate" buttons)
- Slack alert interaction (format block kit, test "Snooze/Terminate" buttons)
- Onboarding wizard (CloudFormation quick-create → role validation → first alert)
### 1.3 Test Naming Conventions
```typescript
// Unit tests
describe('AnomalyScorer', () => {
it('assigns critical severity when Z-score > 3 and hourly cost > $1', () => {});
it('flags actor novelty when IAM role has never launched this service', () => {});
it('assigns critical severity when Z-score exceeds 3 and hourly cost exceeds $1', () => {});
it('flags actor novelty when IAM role has never launched this service type', () => {});
it('bypasses baseline and triggers fast-path critical for $10/hr instance', () => {});
});
describe('CloudTrailNormalizer', () => {
it('extracts instance type and region from RunInstances event', () => {});
it('looks up correct on-demand pricing for us-east-1 r6g.xlarge', () => {});
describe('BaselineCalculator', () => {
it('updates running mean using Welford online algorithm', () => {});
it('handles zero standard deviation without division by zero', () => {});
});
// Property-based tests
describe('AnomalyScorer (property-based)', () => {
it('always returns severity between 0 and 100 for any valid input', () => {});
it('monotonically increases score as Z-score increases', () => {});
it('never assigns critical to events below $0.50/hr regardless of Z-score', () => {});
});
```
**Rules:**
- Describe the observable outcome, not the implementation
- Use present tense
- If you need "and" in the name, split into two tests
- Property-based tests explicitly state the invariant
---
## Section 2: Test Pyramid
@@ -63,93 +84,441 @@ describe('CloudTrailNormalizer', () => {
| Level | Target | Count (V1) | Runtime |
|-------|--------|------------|---------|
| Unit | 70% | ~250 tests | <20s |
| Integration | 20% | ~80 tests | <3min |
| E2E/Smoke | 10% | ~15 tests | <5min |
| Unit | 80% | ~350 tests | <25s |
| Integration | 15% | ~65 tests | <4min |
| E2E/Smoke | 5% | ~15 tests | <8min |
Higher unit ratio than other dd0c products because the core value is pure math (scoring, baselines, Z-scores).
### 2.2 Unit Test Targets
| Component | Key Behaviors | Est. Tests |
|-----------|--------------|------------|
| Event Normalizer | CloudTrail parsing, pricing lookup, deduplication | 40 |
| Baseline Engine | Running mean/stddev calculation, maturity checks | 35 |
| Anomaly Scorer | Z-score math, novelty detection, composite scoring | 50 |
| Remediation Handler | Stop/Terminate payload parsing, IAM role assumption logic | 20 |
| Notification Engine | Slack formatting, daily digest aggregation | 30 |
| Governance Policy | Mode enforcement, 14-day auto-promotion | 25 |
| Feature Flags | Circuit breaker on alert volume, flag metadata | 15 |
| CloudTrail Normalizer | Event parsing, pricing lookup, dedup, field extraction | 40 |
| Baseline Engine | Welford algorithm, maturity transitions, feedback loop | 45 |
| Anomaly Scorer | Z-score, novelty, composite scoring, cold-start fast-path | 60 |
| Zombie Hunter | Idle resource detection, cost estimation, age calculation | 25 |
| Notification Formatter | Slack Block Kit, daily digest, CLI command generation | 30 |
| Slack Bot | Command parsing, signature validation, action handling | 25 |
| Remediation Handler | Stop/Terminate logic, IAM role assumption, snooze/dismiss | 20 |
| Dashboard API | CRUD, tenant isolation, pagination, filtering | 25 |
| Governance Policy | Mode enforcement, 14-day promotion, panic mode | 30 |
| Feature Flags | Circuit breaker, flag lifecycle, local evaluation | 15 |
| Onboarding | CFN template validation, role validation, free tier enforcement | 20 |
| Cost Calculations | Pricing precision, rounding, fallback pricing, currency | 15 |
### 2.3 Integration Test Boundaries
| Boundary | What's Tested | Infrastructure |
|----------|--------------|----------------|
| EventBridge → SQS FIFO | Cross-account event routing, dedup, ordering | LocalStack |
| SQS → Event Processor Lambda | Batch processing, error handling, DLQ routing | LocalStack |
| Event Processor → DynamoDB | CostEvent writes, baseline updates, transactions | Testcontainers DynamoDB Local |
| Anomaly Scorer → DynamoDB | Baseline reads, anomaly record writes | Testcontainers DynamoDB Local |
| Notifier → Slack API | Block Kit delivery, rate limiting, message updates | WireMock |
| API Gateway → Lambda | Auth (Cognito JWT), routing, throttling | LocalStack |
| STS → Customer Account | Cross-account role assumption, ExternalId validation | LocalStack |
| CDK Synth | Infrastructure snapshot, resource policy validation | CDK assertions |
### 2.4 E2E/Smoke Scenarios
1. **Real-Time Anomaly Detection**: CloudTrail event → scoring → Slack alert (<30s)
2. **Interactive Remediation**: Slack button click → StopInstances → message update
3. **Onboarding Flow**: Signup → CFN deploy → role validation → first alert
4. **14-Day Auto-Promotion**: Simulate 14 days → verify strict→audit transition
5. **Zombie Hunter**: Daily scan → detect idle EC2 → Slack digest
6. **Panic Mode**: Enable panic → all alerting stops → anomalies still logged
---
## Section 3: Unit Test Strategy
### 3.1 Cost Ingestion & Normalization
### 3.1 CloudTrail Normalizer
```typescript
describe('CloudTrailNormalizer', () => {
it('normalizes EC2 RunInstances event to CostEvent schema', () => {});
it('normalizes RDS CreateDBInstance event to CostEvent schema', () => {});
it('extracts assumed role ARN as actor instead of base STS role', () => {});
it('applies fallback pricing when instance type is not in static table', () => {});
it('ignores non-cost-generating events (e.g., DescribeInstances)', () => {});
describe('Event Parsing', () => {
it('normalizes EC2 RunInstances to CostEvent schema', () => {});
it('normalizes RDS CreateDBInstance to CostEvent schema', () => {});
it('normalizes Lambda CreateFunction to CostEvent schema', () => {});
it('extracts assumed role ARN as actor (not base STS role)', () => {});
it('extracts instance type, region, and AZ from event detail', () => {});
it('handles batched RunInstances (multiple instances in one call)', () => {});
it('ignores non-cost-generating events (DescribeInstances, ListBuckets)', () => {});
it('handles malformed CloudTrail JSON without crashing', () => {});
it('handles missing optional fields gracefully', () => {});
});
describe('Pricing Lookup', () => {
it('looks up correct on-demand price for us-east-1 m5.xlarge', () => {});
it('looks up correct on-demand price for us-west-2 r6g.2xlarge', () => {});
it('applies fallback pricing when instance type not in static table', () => {});
it('returns $0 for instance types with no pricing data and logs warning', () => {});
it('handles GPU instances (p4d, g5) with correct pricing', () => {});
});
describe('Deduplication', () => {
it('generates deterministic fingerprint from eventID', () => {});
it('detects duplicate CloudTrail events by eventID', () => {});
it('allows same resource type from different events', () => {});
});
describe('Cost Precision', () => {
it('calculates hourly cost with 4 decimal places', () => {});
it('rounds consistently (banker rounding) to avoid accumulation errors', () => {});
it('handles sub-cent costs for Lambda invocations', () => {});
});
});
```
### 3.2 Anomaly Engine (The Math)
### 3.2 Anomaly Scorer
The most critical component. Uses property-based testing via `fast-check`.
```typescript
describe('AnomalyScorer', () => {
describe('Statistical Scoring (Z-Score)', () => {
it('returns score=0 when event cost exactly matches baseline mean', () => {});
describe('Z-Score Calculation', () => {
it('returns 0 when event cost exactly matches baseline mean', () => {});
it('returns proportional score for Z-scores between 1.0 and 3.0', () => {});
it('caps Z-score contribution at max threshold', () => {});
it('caps Z-score contribution at configurable max threshold', () => {});
it('handles zero standard deviation without division by zero', () => {});
it('handles single data point baseline (stddev undefined)', () => {});
it('handles extremely large values without float overflow', () => {});
it('handles negative cost delta (cost decrease) as non-anomalous', () => {});
});
describe('Novelty Scoring', () => {
it('adds novelty penalty when instance type is first seen for account', () => {});
it('adds novelty penalty when IAM user has never provisioned this service', () => {});
it('adds instance novelty penalty when type first seen for account', () => {});
it('adds actor novelty penalty when IAM role is new', () => {});
it('does not penalize known instance type + known actor', () => {});
it('weights instance novelty higher than actor novelty', () => {});
});
describe('Composite Scoring', () => {
it('combines Z-score + novelty into composite severity', () => {});
it('classifies composite < 30 as info', () => {});
it('classifies composite 30-60 as warning', () => {});
it('classifies composite > 60 as critical', () => {});
it('never assigns critical to events below $0.50/hr', () => {});
});
describe('Cold-Start Fast Path', () => {
it('flags $5/hr instance as warning when baseline < 14 days', () => {});
it('flags $25/hr instance as critical immediately, bypassing baseline', () => {});
it('ignores $0.10/hr instances during cold-start learning period', () => {});
it('ignores $0.10/hr instances during cold-start learning', () => {});
it('fast-path is always on — not behind a feature flag', () => {});
it('transitions from fast-path to statistical scoring at maturity', () => {});
});
describe('Feedback Loop', () => {
it('reduces score for resources marked as expected', () => {});
it('adds actor to expected list after mark-as-expected', () => {});
it('still flags expected actor if cost is 10x above baseline', () => {});
});
describe('Property-Based Tests (fast-check)', () => {
it('score is always between 0 and 100 for any valid input', () => {
// fc.assert(fc.property(
// fc.record({ cost: fc.float({min: 0, noNaN: true}), mean: fc.float({min: 0, noNaN: true}), stddev: fc.float({min: 0, noNaN: true}) }),
// (input) => { const score = scorer.score(input); return score >= 0 && score <= 100; }
// ))
});
it('score monotonically increases as cost increases (baseline fixed)', () => {});
it('score monotonically increases as Z-score increases', () => {});
it('cold-start fast-path always triggers for cost > $25/hr', () => {});
it('mature baseline never uses fast-path thresholds', () => {});
});
});
```
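The scoring rules named in these specs (capped Z-score, novelty penalties, severity bands, the $0.50/hr floor for critical) can be sketched as below. This is a minimal illustration, not the project's scorer: the weights `Z_WEIGHT`, `INSTANCE_NOVELTY`, and `ACTOR_NOVELTY`, the cap `MAX_Z`, and the choice to treat a zero standard deviation as a maximal deviation are all assumptions made here so the composite stays within 0 to 100.

```typescript
type Severity = 'info' | 'warning' | 'critical';

interface ScoreInput {
  cost: number;            // hourly cost of the new event (USD)
  mean: number;            // baseline mean hourly cost
  stddev: number;          // baseline standard deviation
  newInstanceType: boolean;
  newActor: boolean;
}

// Hypothetical weights, chosen only so the maximum composite is 100.
const MAX_Z = 3.0;
const Z_WEIGHT = 70;
const INSTANCE_NOVELTY = 20; // weighted higher than actor novelty
const ACTOR_NOVELTY = 10;
const CRITICAL_COST_FLOOR = 0.5; // $/hr

export function cappedZScore(cost: number, mean: number, stddev: number): number {
  const delta = cost - mean;
  if (!(delta > 0)) return 0;      // cost at or below the mean is non-anomalous
  if (!(stddev > 0)) return MAX_Z; // assumption: with no spread, any increase counts as maximal
  return Math.min(delta / stddev, MAX_Z);
}

export function compositeScore(input: ScoreInput): number {
  const z = cappedZScore(input.cost, input.mean, input.stddev);
  const novelty =
    (input.newInstanceType ? INSTANCE_NOVELTY : 0) + (input.newActor ? ACTOR_NOVELTY : 0);
  const raw = (z / MAX_Z) * Z_WEIGHT + novelty;
  return Math.min(100, Math.max(0, raw)); // clamp to the 0..100 range
}

export function classify(score: number, cost: number): Severity {
  if (score < 30) return 'info';
  if (score <= 60) return 'warning';
  // Events cheaper than the floor are never escalated to critical.
  return cost < CRITICAL_COST_FLOOR ? 'warning' : 'critical';
}
```

Keeping the Z-score, composite, and classification steps as separate pure functions makes each `describe` block above testable in isolation and keeps the property tests cheap to run.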
### 3.3 Baseline Learning
### 3.3 Baseline Engine
```typescript
describe('BaselineCalculator', () => {
it('updates running mean and stddev using Welford algorithm', () => {});
it('adds new actor to observed_actors set', () => {});
it('marks baseline as mature when event_count > 20 and age_days > 14', () => {});
describe('Welford Online Algorithm', () => {
it('updates running mean correctly after each observation', () => {});
it('updates running variance correctly after each observation', () => {});
it('produces correct stddev after 100 observations', () => {});
it('handles first observation (count=1, stddev=0)', () => {});
it('handles identical observations (stddev=0)', () => {});
it('handles catastrophic cancellation with large values', () => {
// Welford is numerically stable — verify this property
});
});
describe('Maturity Transitions', () => {
it('starts in cold-start state', () => {});
it('transitions to learning after 5 events', () => {});
it('transitions to mature after 20 events AND 14 days', () => {});
it('does not mature with 100 events but only 3 days', () => {});
it('does not mature with 14 days but only 5 events', () => {});
});
describe('Actor & Instance Tracking', () => {
it('adds new actor to observed_actors set', () => {});
it('adds new instance type to observed_types set', () => {});
it('does not duplicate existing actors', () => {});
});
describe('Property-Based Tests', () => {
it('mean converges to true mean as observations increase', () => {});
it('variance is always non-negative', () => {});
it('stddev equals sqrt(variance) within float tolerance', () => {});
});
});
```
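For reference, the Welford update that these specs exercise can be written in a few lines. The sketch below is an illustration under assumptions: the `Baseline` shape and function names are hypothetical, and it reports the sample variance (dividing by `count - 1`), which the document does not specify.

```typescript
interface Baseline {
  count: number; // number of observations so far
  mean: number;  // running mean
  m2: number;    // running sum of squared deviations from the mean
}

export const emptyBaseline = (): Baseline => ({ count: 0, mean: 0, m2: 0 });

// One Welford step: folds a single observation into the running statistics.
export function observe(b: Baseline, x: number): Baseline {
  const count = b.count + 1;
  const delta = x - b.mean;
  const mean = b.mean + delta / count;
  const m2 = b.m2 + delta * (x - mean); // uses the updated mean
  return { count, mean, m2 };
}

// Sample variance; defined as 0 for fewer than two observations.
export function variance(b: Baseline): number {
  return b.count > 1 ? b.m2 / (b.count - 1) : 0;
}

export function stddev(b: Baseline): number {
  return Math.sqrt(variance(b));
}
```

Because the update only ever subtracts the running mean from the new value, it avoids the large intermediate sums that make the naive sum-of-squares formula lose precision on large, closely spaced costs.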
### 3.4 Zombie Hunter
```typescript
describe('ZombieHunter', () => {
it('detects EC2 instance running >7 days with <5% CPU utilization', () => {});
it('detects RDS instance with 0 connections for >3 days', () => {});
it('detects unattached EBS volumes older than 7 days', () => {});
it('calculates cumulative waste cost for each zombie', () => {});
it('excludes instances tagged dd0c:ignore', () => {});
it('handles API pagination for accounts with 500+ instances', () => {});
it('respects read-only IAM permissions (never modifies resources)', () => {});
});
```
### 3.5 Notification Formatter
```typescript
describe('NotificationFormatter', () => {
describe('Slack Block Kit', () => {
it('formats EC2 anomaly with resource type, region, cost, actor', () => {});
it('formats RDS anomaly with engine, storage, multi-AZ status', () => {});
it('includes "Why this alert" section with anomaly signals', () => {});
it('includes suggested CLI commands for remediation', () => {});
it('includes Snooze/Mark Expected/Stop Instance buttons', () => {});
it('generates correct aws ec2 stop-instances command', () => {});
it('generates correct aws rds stop-db-instance command', () => {});
});
describe('Daily Digest', () => {
it('aggregates 24h of anomalies into summary stats', () => {});
it('includes total estimated spend across all accounts', () => {});
it('highlights top 3 costliest anomalies', () => {});
it('includes zombie resource count and waste estimate', () => {});
it('shows baseline learning progress for new accounts', () => {});
});
});
```
### 3.6 Slack Bot
```typescript
describe('SlackBot', () => {
describe('Signature Validation', () => {
it('validates correct Slack request signature (HMAC-SHA256)', () => {});
it('rejects request with invalid signature', () => {});
it('rejects request with missing X-Slack-Signature header', () => {});
it('rejects request with expired timestamp (>5 min)', () => {});
it('uses timing-safe comparison to prevent timing attacks', () => {});
});
describe('Command Parsing', () => {
it('routes /dd0c status to status handler', () => {});
it('routes /dd0c anomalies to anomaly list handler', () => {});
it('routes /dd0c digest to digest handler', () => {});
it('returns help text for unknown commands', () => {});
it('responds within 3 seconds or defers with 200 OK', () => {});
});
describe('Interactive Actions', () => {
it('validates interactive payload signature', () => {});
it('handles mark_expected action and updates baseline', () => {});
it('handles snooze_1h action and sets snoozeUntil', () => {});
it('handles snooze_24h action', () => {});
it('updates original Slack message after action', () => {});
it('rejects action from user not in authorized workspace', () => {});
});
});
```
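The signature-validation cases above follow Slack's documented signing scheme: an HMAC-SHA256 over `v0:{timestamp}:{raw body}`, hex-encoded and prefixed with `v0=`. A minimal sketch using Node's built-in `crypto` module is shown below; the function names and the injectable `nowSeconds` parameter are assumptions added here for testability.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

const MAX_SKEW_SECONDS = 5 * 60; // reject requests older than 5 minutes

// Computes the value expected in the X-Slack-Signature header.
export function signSlackRequest(secret: string, timestamp: string, rawBody: string): string {
  const base = `v0:${timestamp}:${rawBody}`;
  return 'v0=' + createHmac('sha256', secret).update(base).digest('hex');
}

export function verifySlackRequest(
  secret: string,
  timestamp: string | undefined, // X-Slack-Request-Timestamp
  signature: string | undefined, // X-Slack-Signature
  rawBody: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  if (!timestamp || !signature) return false;
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;

  const expected = Buffer.from(signSlackRequest(secret, timestamp, rawBody), 'utf8');
  const received = Buffer.from(signature, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

The verification must run against the raw request body, before any JSON or form parsing, since re-serialized payloads generally do not reproduce the signed bytes.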
### 3.7 Governance Policy Engine
```typescript
describe('GovernancePolicy', () => {
describe('Mode Enforcement', () => {
it('strict mode: logs anomaly but does not send Slack alert', () => {});
it('audit mode: sends Slack alert with full logging', () => {});
it('defaults new accounts to strict mode', () => {});
});
describe('14-Day Auto-Promotion', () => {
it('does not promote account with <14 days of baseline', () => {});
it('does not promote account with >10% false-positive rate', () => {});
it('promotes account on day 15 if FP rate <10%', () => {});
it('calculates false-positive rate from mark-as-expected actions', () => {});
it('auto-promotion check runs daily via cron', () => {});
});
describe('Panic Mode', () => {
it('stops all alerting when panic=true', () => {});
it('continues scoring and logging during panic', () => {});
it('activates in <1 second via Redis key', () => {});
it('activatable via POST /admin/panic', () => {});
it('dashboard API returns "alerting paused" header during panic', () => {});
});
describe('Per-Account Override', () => {
it('account can set stricter mode than system default', () => {});
it('account cannot downgrade from system strict to audit', () => {});
it('merge logic: max_restrictive(system, account)', () => {});
});
describe('Policy Decision Logging', () => {
it('logs "suppressed by strict mode" with anomaly context', () => {});
it('logs "auto-promoted to audit mode" with baseline stats', () => {});
it('logs "panic mode active — alerting paused"', () => {});
});
});
```
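The `max_restrictive(system, account)` merge rule and the panic override can be illustrated with a small pure function. This is a sketch under assumptions: the `GovernanceMode` type, the `decide` helper, and the reason strings are hypothetical names chosen to mirror the log messages in the specs above.

```typescript
type GovernanceMode = 'strict' | 'audit';

// Higher rank means more restrictive: 'strict' only logs, 'audit' also alerts.
const RANK: Record<GovernanceMode, number> = { audit: 0, strict: 1 };

// An account may tighten the system mode but never loosen it.
export function maxRestrictive(system: GovernanceMode, account?: GovernanceMode): GovernanceMode {
  if (account === undefined) return system;
  return RANK[account] > RANK[system] ? account : system;
}

interface PolicyInput {
  system: GovernanceMode;
  account?: GovernanceMode;
  panic: boolean;
}

interface Decision {
  sendAlert: boolean;
  reason: string;
}

export function decide(input: PolicyInput): Decision {
  // Panic mode pauses alerting regardless of mode; scoring and logging continue elsewhere.
  if (input.panic) return { sendAlert: false, reason: 'panic mode active' };
  const mode = maxRestrictive(input.system, input.account);
  return mode === 'strict'
    ? { sendAlert: false, reason: 'suppressed by strict mode' }
    : { sendAlert: true, reason: 'audit mode' };
}
```

Returning a reason alongside the boolean keeps the policy decision and its log line in one place, which is what the Policy Decision Logging specs check.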
### 3.8 Dashboard API
```typescript
describe('DashboardAPI', () => {
describe('Account Management', () => {
it('GET /v1/accounts returns connected accounts for tenant', () => {});
it('DELETE /v1/accounts/:id marks account as disconnecting', () => {});
it('returns 401 without valid Cognito JWT', () => {});
it('scopes all queries to authenticated tenantId', () => {});
});
describe('Anomaly Listing', () => {
it('GET /v1/anomalies returns recent anomalies', () => {});
it('supports since, status, severity filters', () => {});
it('implements cursor-based pagination', () => {});
it('includes slackMessageUrl when alert was sent', () => {});
});
describe('Baseline Overrides', () => {
it('PATCH /v1/accounts/:id/baselines/:service/:type updates sensitivity', () => {});
it('rejects invalid sensitivity values', () => {});
});
describe('Tenant Isolation', () => {
it('never returns anomalies from another tenant', () => {});
it('never returns accounts from another tenant', () => {});
it('enforces tenantId on all DynamoDB queries', () => {});
});
});
```
### 3.9 Onboarding & PLG
```typescript
describe('Onboarding', () => {
describe('CloudFormation Template', () => {
it('generates valid CFN YAML with correct IAM permissions', () => {});
it('includes ExternalId parameter', () => {});
it('includes EventBridge rule for cost-relevant CloudTrail events', () => {});
it('quick-create URL contains correct template URL and parameters', () => {});
});
describe('Role Validation', () => {
it('successfully assumes role with correct ExternalId', () => {});
it('returns clear error on role not found', () => {});
it('returns clear error on ExternalId mismatch', () => {});
it('triggers zombie scan on successful connection', () => {});
});
describe('Free Tier Enforcement', () => {
it('allows first account connection on free tier', () => {});
it('rejects second account with 403 and upgrade prompt', () => {});
it('allows multiple accounts on pro tier', () => {});
});
describe('Stripe Integration', () => {
it('creates Stripe Checkout session with correct pricing', () => {});
it('handles checkout.session.completed webhook', () => {});
it('handles customer.subscription.deleted webhook', () => {});
it('validates Stripe webhook signature', () => {});
it('updates tenant tier to pro on successful payment', () => {});
it('downgrades tenant on subscription cancellation', () => {});
});
});
```
### 3.10 Feature Flag Circuit Breaker
```typescript
describe('AlertVolumeCircuitBreaker', () => {
it('allows alerting when volume is within 3x baseline', () => {});
it('trips breaker when alerts exceed 3x baseline over 1 hour', () => {});
it('auto-disables the scoring flag when breaker trips', () => {});
it('buffers suppressed alerts in DLQ for review', () => {});
it('tracks alert-per-account rate in Redis sliding window', () => {});
it('resets breaker after manual flag re-enable', () => {});
it('fast-path alerts are exempt from circuit breaker', () => {});
});
```
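The breaker behaviour described by these specs can be sketched with an in-memory sliding window. This is only an illustration: the specs call for a Redis-backed window and a feature-flag toggle, whereas the class below keeps timestamps in an array, and the `allow` and `reset` method names are assumptions.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const TRIP_MULTIPLIER = 3;        // trip when volume exceeds 3x baseline

export class AlertVolumeCircuitBreaker {
  private baselineAlertsPerHour: number;
  private timestamps: number[] = [];
  private tripped = false;

  constructor(baselineAlertsPerHour: number) {
    this.baselineAlertsPerHour = baselineAlertsPerHour;
  }

  // Returns true if the alert may be sent at time nowMs.
  allow(nowMs: number, fastPath = false): boolean {
    if (fastPath) return true;       // fast-path alerts are exempt
    if (this.tripped) return false;  // stays open until manually reset

    // Drop entries that have slid out of the window, then record this alert.
    this.timestamps = this.timestamps.filter((t) => nowMs - t < WINDOW_MS);
    this.timestamps.push(nowMs);

    if (this.timestamps.length > TRIP_MULTIPLIER * this.baselineAlertsPerHour) {
      this.tripped = true;
      return false;
    }
    return true;
  }

  isTripped(): boolean {
    return this.tripped;
  }

  // Models the manual flag re-enable.
  reset(): void {
    this.tripped = false;
    this.timestamps = [];
  }
}
```

Passing the clock in as `nowMs` rather than calling `Date.now()` inside the class lets the unit tests drive the window deterministically.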
---
## Section 4: Integration Test Strategy
### 4.1 DynamoDB Data Layer (Testcontainers)
```typescript
describe('DynamoDB Single-Table Patterns', () => {
it('writes CostEvent and updates Baseline in single transaction', async () => {});
it('queries all anomalies for tenant within time range', async () => {});
it('fetches tenant config and Slack tokens securely', async () => {});
describe('DynamoDB Integrations', () => {
let dynamodb: StartedTestContainer;
beforeAll(async () => {
dynamodb = await new GenericContainer('amazon/dynamodb-local:latest')
.withExposedPorts(8000).start();
// Create dd0c-cost-main table with GSIs
});
describe('Transactional Writes', () => {
it('writes CostEvent and updates Baseline in single TransactWriteItem', async () => {});
it('fails gracefully if TransactWriteItem encounters ConditionalCheckFailed', async () => {});
it('handles partial failure recovery when Baseline update conflicts', async () => {});
});
describe('Access Patterns', () => {
it('queries all anomalies for tenant within time range (GSI3)', async () => {});
it('fetches tenant config and Slack tokens securely', async () => {});
it('retrieves accurate Baseline snapshot by resource type', async () => {});
});
});
```
### 4.2 AWS API Contract Tests
### 4.2 Cross-Account STS & AWS APIs (LocalStack)
```typescript
describe('AWS Cross-Account Actions', () => {
// Uses LocalStack to simulate target account
it('assumes target account remediation role successfully', async () => {});
it('executes ec2:StopInstances when remediation approved', async () => {});
it('executes rds:DeleteDBInstance with skip-final-snapshot', async () => {});
describe('AWS Cross-Account Integrations', () => {
let localstack: StartedTestContainer;
beforeAll(async () => {
localstack = await new GenericContainer('localstack/localstack:3')
.withEnv('SERVICES', 'sts,ec2,rds')
.withExposedPorts(4566).start();
});
describe('Role Assumption', () => {
it('successfully assumes target account remediation role via STS', async () => {});
it('fails when ExternalId does not match (Security)', async () => {});
it('handles STS credential expiration gracefully', async () => {});
});
describe('Remediation Actions', () => {
it('executes ec2:StopInstances when remediation approved', async () => {});
it('executes rds:StopDBInstance when remediation approved', async () => {});
it('fails safely when target IAM role lacks StopInstances permission', async () => {});
});
});
```
### 4.3 Slack API Contract (WireMock)
```typescript
describe('Slack Integration', () => {
it('formats and delivers Block Kit message successfully', async () => {});
it('handles 429 Rate Limit by throwing retryable error for SQS visibility timeout', async () => {});
it('updates existing Slack message when anomaly is snoozed', async () => {});
});
```
@@ -159,24 +528,65 @@ describe('AWS Cross-Account Actions', () => {
### 5.1 Critical User Journeys
**Journey 1: Real-Time Anomaly Detection**
1. Send synthetic `RunInstances` event to EventBridge (p9.16xlarge, $40/hr).
2. Verify system processes event and triggers fast-path (no baseline).
3. Verify Slack alert is generated with correct cost estimate.
**Journey 1: Real-Time Anomaly Detection (The Golden Path)**
```typescript
describe('E2E: Anomaly Detection', () => {
it('detects anomaly and alerts Slack within 30 seconds', async () => {
// 1. Inject synthetic CloudTrail `RunInstances` event (p4d.24xlarge) into SQS Ingestion Queue
// 2. Poll DynamoDB to ensure CostEvent was recorded
// 3. Poll DynamoDB to ensure AnomalyRecord was created (fast-path triggered)
// 4. Assert WireMock received the Slack chat.postMessage call with Block Kit
});
});
```
**Journey 2: Interactive Remediation**
1. Send webhook simulating user clicking "Stop Instance" in Slack.
2. Verify API Gateway → Lambda executes `StopInstances` against LocalStack.
3. Verify Slack message updates to "Remediation Successful".
```typescript
describe('E2E: Interactive Remediation', () => {
it('stops EC2 instance when user clicks Stop in Slack', async () => {
// 1. Simulate Slack sending interactive webhook payload for "Stop Instance"
// 2. Validate HMAC signature in API Gateway lambda
// 3. Verify LocalStack EC2 mock receives StopInstances call
// 4. Verify Slack message is updated to "Remediation Successful"
});
});
```
**Journey 3: Onboarding & First Scan**
```typescript
describe('E2E: Onboarding', () => {
it('validates IAM role and triggers initial zombie scan', async () => {
// 1. Trigger POST /v1/accounts with new role ARN
// 2. Verify account marked active
// 3. Verify EventBridge Scheduler creates cron for Zombie Hunter
});
});
```
---
## Section 6: Performance & Load Testing
### 6.1 Ingestion & Scoring Throughput
```typescript
describe('Ingestion Throughput', () => {
it('processes 500 CloudTrail events/second via SQS FIFO', async () => {});
it('DynamoDB baseline updates complete in <20ms p95', async () => {});
describe('Performance: Alert Storm', () => {
it('processes 1000 CloudTrail events/sec without SQS DLQ overflow', async () => {
// k6 load test hitting SQS directly
});
it('DynamoDB baseline updates complete in <20ms p95 under load', async () => {
// Ensure Single-Table schema does not create hot partitions
});
it('Anomaly Scorer Lambda consumes <256MB memory during burst', async () => {});
});
```
### 6.2 Data Scale Tests
```typescript
describe('Performance: Baseline Scale', () => {
it('calculates Z-score in <5ms even when observed_actors set exceeds 1000', async () => {});
it('handles accounts with 100,000+ daily CostEvents without throttling DynamoDB (On-Demand scaling)', async () => {});
});
```
@@ -184,49 +594,119 @@ describe('Ingestion Throughput', () => {
## Section 7: CI/CD Pipeline Integration
- **PR Gate:** Unit tests (<2min), Coverage >85% (Scoring engine >95%).
- **Merge:** Integration tests with LocalStack & Testcontainers DynamoDB.
- **Staging:** E2E journeys against isolated staging AWS account.
### 7.1 Pipeline Stages
```
┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Pre-Commit  │───▶│ PR Gate  │───▶│  Merge   │───▶│ Staging  │───▶│   Prod   │
│   (local)   │    │   (CI)   │    │   (CI)   │    │   (CD)   │    │   (CD)   │
└─────────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
 lint + type       unit tests      integration     E2E + perf      canary
 <10s              math prop       Testcontainers  LocalStack      <5 mins
                   tests <1m       <4 mins         <10 mins
```
### 7.2 Coverage Gates
| Component | Threshold |
|-----------|-----------|
| Anomaly Scorer (Math) | 100% |
| CloudTrail Normalizer | 95% |
| Governance Policy | 95% |
| Slack Signature Auth | 100% |
| Overall Pipeline | 85% |
---
## Section 8: Transparent Factory Tenet Testing
### 8.1 Atomic Flagging (Circuit Breaker)
### 8.1 Atomic Flagging
```typescript
it('auto-disables scoring rule if it generates >10 alerts/hour for single tenant', () => {});
describe('Atomic Flagging', () => {
it('auto-disables scoring rule flag if alert volume exceeds 3x baseline in 1hr', () => {});
it('buffers suppressed anomalies in SQS DLQ while flag is off', () => {});
it('fails CI if any flag TTL exceeds 14 days', () => {});
it('evaluates flags strictly locally (in-memory provider)', () => {});
});
```
### 8.2 Configurable Autonomy (14-Day Auto-Promotion)
### 8.2 Elastic Schema
```typescript
it('keeps new tenant in strict mode (log-only) for first 14 days', () => {});
it('auto-promotes to audit mode (auto-alert) on day 15 if false-positive rate < 10%', () => {});
describe('Elastic Schema', () => {
it('rejects DynamoDB table definition modifications that alter key schemas', () => {});
it('requires all DynamoDB item updates to use ADD/SET (additive only)', () => {});
it('ignores unknown attributes (V2 fields) in V1 CostEvent decoders', () => {});
});
```
### 8.3 Cognitive Durability
```typescript
describe('Cognitive Durability', () => {
it('requires decision_log.json for any PR modifying Z-score thresholds or weights', () => {});
it('enforces cyclomatic complexity < 10 for all AnomalyScorer math functions', () => {});
});
```
### 8.4 Semantic Observability
```typescript
describe('Semantic Observability', () => {
it('emits OTEL span for every Anomaly Scoring decision', () => {});
it('includes attributes: cost.z_score, cost.anomaly_score, cost.baseline_days', () => {});
it('includes cost.fast_path_triggered flag when baseline is bypassed', () => {});
it('hashes AWS Account ID in spans to protect PII/tenant identity', () => {});
});
```
### 8.5 Configurable Autonomy
```typescript
describe('Configurable Autonomy', () => {
it('keeps new tenant in Strict Mode (log-only) for first 14 days', () => {});
it('auto-promotes to Audit Mode on day 15 if false-positive rate < 10%', () => {});
it('Panic Mode halts ALL Slack alerts in <1 second via Redis check', () => {});
it('Panic Mode does NOT halt baseline recording (read-only tracking continues)', () => {});
});
```
---
## Section 9: Test Data & Fixtures
```
fixtures/
cloudtrail/
ec2-runinstances.json
rds-create-db.json
lambda-create-function.json
baselines/
mature-steady-spend.json
volatile-dev-account.json
cold-start.json
### 9.1 Data Factories
```typescript
export const makeCloudTrailEvent = (overrides: Record<string, unknown> = {}) => ({
eventVersion: '1.08',
userIdentity: { type: 'AssumedRole', arn: 'arn:aws:sts::123:assumed-role/user' },
eventTime: new Date().toISOString(),
eventSource: 'ec2.amazonaws.com',
eventName: 'RunInstances',
requestParameters: { instanceType: 'm5.large' },
...overrides
});
export const makeBaseline = (overrides: Record<string, unknown> = {}) => ({
meanHourlyCost: 1.25,
stdDev: 0.15,
eventCount: 45,
ageDays: 16,
observedActors: ['arn:aws:iam::123:role/ci'],
observedInstanceTypes: ['t3.medium', 'm5.large'],
...overrides
});
```
---
## Section 10: TDD Implementation Order
1. **Phase 1:** Anomaly math + Unit tests (Strict TDD).
2. **Phase 2:** CloudTrail normalizer + Pricing tables.
3. **Phase 3:** DynamoDB single-table implementation (Integration led).
4. **Phase 4:** Slack formatting + Remediation Lambda.
5. **Phase 5:** Governance policies (14-day promotion logic).
1. **Phase 1: Math & Core Logic (Strict TDD)**
- Welford algorithm, Z-score math, Novelty scoring, `fast-check` property tests.
2. **Phase 2: Ingestion & Normalization**
- CloudTrail parsers, pricing static tables, event deduplication.
3. **Phase 3: Data Persistence (Integration Led)**
- DynamoDB Single-Table setup, TransactWriteItems, Testcontainers tests.
4. **Phase 4: Notifications & Slack Actions**
- Block Kit formatting, Slack signature validation, API Gateway endpoints.
5. **Phase 5: Governance & Tenets**
- 14-day promotion logic, Panic mode, OTEL tracing.
6. **Phase 6: E2E Pipeline**
- CDK definitions, LocalStack event injection, wire everything together.
*End of dd0c/cost Test Architecture*
*End of dd0c/cost Test Architecture (v2)*


@@ -1760,3 +1760,527 @@ Before writing the `impl ExecutionEngine { pub async fn execute(...) }` function
5. `engine_pauses_in_flight_execution_when_panic_mode_set`
Only once these tests are defined can the state machine be implemented to make them pass (Green phase). This ensures no execution path can bypass the Trust Gradient.
---
## 11. Review Remediation Addendum (Post-Gemini Review)
The following sections address all gaps identified in the TDD review. These are net-new test specifications that must be integrated into the relevant sections above during implementation.
### 11.1 Missing Epic Coverage
#### Epic 3.4: Divergence Analysis
```rust
// pkg/executor/divergence/tests.rs
#[test] fn divergence_detects_extra_command_not_in_runbook() {}
#[test] fn divergence_detects_modified_command_vs_prescribed() {}
#[test] fn divergence_detects_skipped_step_not_marked_as_skipped() {}
#[test] fn divergence_report_includes_diff_of_prescribed_vs_actual() {}
#[test] fn divergence_flags_env_var_changes_made_during_execution() {}
#[test] fn divergence_ignores_whitespace_differences_in_commands() {}
#[test] fn divergence_analysis_runs_automatically_after_execution_completes() {}
#[test] fn divergence_report_written_to_audit_trail() {}
#[tokio::test]
async fn integration_divergence_analysis_detects_agent_side_extra_commands() {
// Agent executes an extra `whoami` not in the runbook
// Divergence analyzer must flag it
}
```
#### Epic 5.3: Compliance Export
```rust
// pkg/audit/export/tests.rs
#[tokio::test] async fn export_generates_valid_csv_for_date_range() {}
#[tokio::test] async fn export_generates_valid_pdf_with_execution_summary() {}
#[tokio::test] async fn export_uploads_to_s3_and_returns_presigned_url() {}
#[tokio::test] async fn export_presigned_url_expires_after_24_hours() {}
#[tokio::test] async fn export_scoped_to_tenant_via_rls() {}
#[tokio::test] async fn export_includes_hash_chain_verification_status() {}
#[tokio::test] async fn export_redacts_command_output_but_includes_hashes() {}
```
#### Epic 6.4: Classification Query API Rate Limiting
```rust
// tests/integration/api_rate_limit_test.rs
#[tokio::test]
async fn api_rate_limit_30_requests_per_minute_per_tenant() {
let stack = E2EStack::start().await;
for _ in 0..30 {
let resp = stack.api().get("/v1/run/classifications").send().await;
assert_eq!(resp.status(), 200);
}
// 31st request must be rate-limited
let resp = stack.api().get("/v1/run/classifications").send().await;
assert_eq!(resp.status(), 429);
}
#[tokio::test]
async fn api_rate_limit_resets_after_60_seconds() {}
#[tokio::test]
async fn api_rate_limit_is_per_tenant_not_global() {
// Tenant A hitting limit must not affect Tenant B
}
#[tokio::test]
async fn api_rate_limit_returns_retry_after_header() {}
```
#### Epic 7: Dashboard UI (Playwright)
```typescript
// tests/e2e/ui/dashboard.spec.ts
test('parse preview renders within 5 seconds of paste', async ({ page }) => {
await page.goto('/dashboard/runbooks/new');
await page.fill('[data-testid="runbook-input"]', FIXTURE_RUNBOOK);
const preview = page.locator('[data-testid="parse-preview"]');
await expect(preview).toBeVisible({ timeout: 5000 });
await expect(preview.locator('.step-card')).toHaveCount(4);
});
test('trust level visualization shows correct colors per step', async ({ page }) => {
// 🟢 safe = green, 🟡 caution = yellow, 🔴 dangerous = red
});
test('MTTR dashboard loads and displays chart', async ({ page }) => {
await page.goto('/dashboard/analytics');
await expect(page.locator('[data-testid="mttr-chart"]')).toBeVisible();
});
test('execution timeline shows real-time step progress', async ({ page }) => {});
test('approval modal requires typed confirmation for dangerous steps', async ({ page }) => {});
test('panic mode banner appears when panic is active', async ({ page }) => {});
```
#### Epic 9: Onboarding & PLG
```rust
// pkg/onboarding/tests.rs
#[test] fn free_tier_allows_5_runbooks() {}
#[test] fn free_tier_allows_50_executions_per_month() {}
#[test] fn free_tier_rejects_6th_runbook_with_upgrade_prompt() {}
#[test] fn free_tier_rejects_51st_execution_with_upgrade_prompt() {}
#[test] fn free_tier_counter_resets_monthly() {}
#[test] fn agent_install_snippet_includes_correct_api_key() {}
#[test] fn agent_install_snippet_includes_correct_gateway_url() {}
#[test] fn agent_install_snippet_is_valid_bash() {}
#[tokio::test] async fn stripe_checkout_creates_session_with_correct_pricing() {}
#[tokio::test] async fn stripe_webhook_checkout_completed_upgrades_tenant() {}
#[tokio::test] async fn stripe_webhook_subscription_deleted_downgrades_tenant() {}
#[tokio::test] async fn stripe_webhook_validates_signature() {}
```
### 11.2 Agent-Side Security Tests (Zero-Trust Environment)
The Agent runs in customer VPCs — untrusted territory. These tests prove the Agent defends itself independently of the SaaS backend.
```rust
// pkg/agent/security/tests.rs
// Agent-side deterministic blocking (mirrors SaaS scanner)
#[test] fn agent_scanner_blocks_rm_rf_independently_of_saas() {}
#[test] fn agent_scanner_blocks_kubectl_delete_namespace_independently() {}
#[test] fn agent_scanner_blocks_drop_table_independently() {}
#[test] fn agent_scanner_rejects_command_even_if_saas_says_safe() {
// Simulates compromised SaaS sending a "safe" classification for rm -rf
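// Assumes Classification implements Default; the SaaS verdict is intentionally ignored by the agent.
let _saas_classification = Classification { risk: RiskLevel::Safe, ..Default::default() };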
let agent_result = agent_scanner.classify("rm -rf /");
assert_eq!(agent_result.risk, RiskLevel::Dangerous);
// Agent MUST override SaaS classification
}
// Binary integrity
#[test] fn agent_validates_binary_checksum_on_startup() {}
#[test] fn agent_refuses_to_start_if_checksum_mismatch() {}
// Payload tampering
#[tokio::test] async fn agent_rejects_grpc_payload_with_invalid_hmac() {}
#[tokio::test] async fn agent_rejects_grpc_payload_with_expired_timestamp() {}
#[tokio::test] async fn agent_rejects_grpc_payload_with_mismatched_execution_id() {}
// Local fallback when SaaS is unreachable
#[tokio::test] async fn agent_falls_back_to_scanner_only_when_saas_disconnected() {}
#[tokio::test] async fn agent_in_fallback_mode_treats_all_unknowns_as_caution() {}
#[tokio::test] async fn agent_reconnects_automatically_when_saas_returns() {}
```
### 11.3 Realistic Sandbox Matrix
Replace Alpine-only sandbox with a matrix of realistic execution targets.
```rust
// tests/integration/sandbox_matrix_test.rs
#[rstest]
#[case("ubuntu:22.04")]
#[case("amazonlinux:2023")]
#[case("alpine:3.19")]
#[tokio::test]
async fn sandbox_safe_command_executes_on_all_targets(#[case] image: &str) {
let sandbox = SandboxContainer::start(image).await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("ls /tmp").await.unwrap();
assert_eq!(result.exit_code, 0);
}
#[rstest]
#[case("ubuntu:22.04")]
#[case("amazonlinux:2023")]
#[tokio::test]
async fn sandbox_dangerous_command_blocked_on_all_targets(#[case] image: &str) {
let sandbox = SandboxContainer::start(image).await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("rm -rf /").await;
assert!(result.is_err());
}
// Non-root execution
#[tokio::test]
async fn sandbox_agent_runs_as_non_root_user() {
let sandbox = SandboxContainer::start_as_user("ubuntu:22.04", "dd0c-agent").await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("whoami").await.unwrap();
assert_eq!(result.stdout.trim(), "dd0c-agent");
}
#[tokio::test]
async fn sandbox_non_root_agent_cannot_escalate_to_root() {
let sandbox = SandboxContainer::start_as_user("ubuntu:22.04", "dd0c-agent").await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("sudo cat /etc/shadow").await;
assert!(result.is_err() || result.unwrap().exit_code != 0);
}
// RBAC-restricted K3s
#[tokio::test]
async fn sandbox_k3s_rbac_denies_kubectl_delete_namespace() {
let k3s = K3sContainer::start_with_rbac("read-only-role").await;
let agent = TestAgent::with_kubeconfig(k3s.kubeconfig()).await;
let result = agent.execute("kubectl delete namespace default").await;
// Should be blocked by BOTH scanner AND K8s RBAC
assert!(result.is_err());
}
```
### 11.4 Advanced Command Injection Tests
```rust
// pkg/classifier/scanner/injection_tests.rs
// Semicolon injection
#[test] fn scanner_semicolon_rm_rf_is_dangerous() {
assert_dangerous("echo hello; rm -rf /");
}
// Pipe injection
#[test] fn scanner_pipe_to_rm_is_dangerous() {
assert_dangerous("find / -name '*.log' | xargs rm -rf");
}
// Backtick injection
#[test] fn scanner_backtick_rm_is_dangerous() {
assert_dangerous("echo `rm -rf /`");
}
// $() substitution (already tested, but more variants)
#[test] fn scanner_nested_substitution_is_dangerous() {
assert_dangerous("echo $(echo $(rm -rf /))");
}
// Newline injection
#[test] fn scanner_newline_injection_is_dangerous() {
assert_dangerous("echo safe\nrm -rf /");
}
// Null byte injection
#[test] fn scanner_null_byte_injection_is_dangerous() {
assert_dangerous("echo safe\0rm -rf /");
}
// Unicode homoglyph attack
#[test] fn scanner_unicode_homoglyph_rm_is_dangerous() {
// Cyrillic 'м' (U+043C) substituted for Latin 'm'
assert_dangerous("rм -rf /"); // Should still catch this
}
// Base64 encoded payload
#[test] fn scanner_base64_decode_pipe_bash_is_dangerous() {
assert_dangerous("echo cm0gLXJmIC8= | base64 -d | bash");
}
// Heredoc injection
#[test] fn scanner_heredoc_with_destructive_is_dangerous() {
assert_dangerous("cat << EOF | bash\nrm -rf /\nEOF");
}
// Environment variable expansion
#[test] fn scanner_env_var_expansion_to_rm_is_dangerous() {
assert_dangerous("$CMD"); // Unknown variable expansion = unknown, not safe
}
```
### 11.5 Privilege Escalation Tests
```rust
// pkg/classifier/scanner/escalation_tests.rs
#[test] fn scanner_sudo_anything_is_at_least_caution() {
assert_at_least_caution("sudo systemctl restart nginx");
}
#[test] fn scanner_sudo_rm_is_dangerous() {
assert_dangerous("sudo rm -rf /var/log");
}
#[test] fn scanner_su_root_is_dangerous() {
assert_dangerous("su - root -c 'rm -rf /'");
}
#[test] fn scanner_chmod_suid_is_dangerous() {
assert_dangerous("chmod u+s /usr/bin/find");
}
#[test] fn scanner_chown_root_is_caution() {
assert_at_least_caution("chown root:root /tmp/exploit");
}
#[test] fn scanner_nsenter_is_dangerous() {
assert_dangerous("nsenter --target 1 --mount --uts --ipc --net --pid");
}
#[test] fn scanner_docker_run_privileged_is_dangerous() {
assert_dangerous("docker run --privileged -v /:/host ubuntu");
}
#[test] fn scanner_kubectl_exec_as_root_is_caution() {
assert_at_least_caution("kubectl exec -it pod -- /bin/bash");
}
```
### 11.6 Rollback Failure & Nested Failure Tests
```rust
// pkg/executor/rollback/tests.rs
#[test] fn rollback_failure_transitions_to_manual_intervention() {
let mut engine = ExecutionEngine::new();
engine.transition(State::RollingBack);
engine.report_rollback_failure("rollback command timed out");
assert_eq!(engine.state(), State::ManualIntervention);
}
#[test] fn rollback_failure_does_not_retry_automatically() {
// Rollback failures are terminal — no auto-retry
}
#[test] fn rollback_timeout_kills_rollback_process_after_300s() {}
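// Async test with a paused clock so tokio::time::advance can be used (requires tokio's test-util feature).
#[tokio::test(start_paused = true)]
async fn rollback_hanging_indefinitely_triggers_manual_intervention_after_timeout() {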
let mut engine = ExecutionEngine::with_rollback_timeout(Duration::from_secs(5));
engine.transition(State::RollingBack);
// Simulate rollback that never completes
tokio::time::advance(Duration::from_secs(6)).await;
assert_eq!(engine.state(), State::ManualIntervention);
}
#[test] fn manual_intervention_state_sends_slack_alert_to_oncall() {}
#[test] fn manual_intervention_state_logs_full_context_to_audit() {}
```
### 11.7 Double Execution & Network Partition Tests
```rust
// pkg/executor/idempotency/tests.rs
#[tokio::test]
async fn agent_reconnect_after_partition_resyncs_already_executed_step() {
    let stack = E2EStack::start().await;
    let execution = stack.start_execution().await;
    // Assumption: the execution handle exposes its step IDs; adjust to the real fixture API
    let step_id = execution.step_ids[0].clone();
    // Agent executes step successfully
    stack.wait_for_step_state(&execution.id, &step_id, "executing").await;
    // Network partition AFTER execution but BEFORE ACK
    stack.partition_agent().await;
    // Agent reconnects
    stack.heal_partition().await;
    // Engine must recognize step was already executed; no double execution
    let step = stack.get_step(&execution.id, &step_id).await;
    assert_eq!(step.execution_count, 1); // Exactly once
}
#[tokio::test]
async fn engine_does_not_re_send_command_after_agent_reconnect_if_step_completed() {}
#[tokio::test]
async fn engine_re_sends_command_if_agent_never_started_execution_before_partition() {}
```
### 11.8 Slack Payload Forgery Tests
```rust
// tests/integration/slack_security_test.rs
#[tokio::test]
async fn slack_approval_webhook_rejects_missing_signature() {
    let stack = E2EStack::start().await;
    let resp = stack.api()
        .post("/v1/run/slack/actions")
        .json(&fixture_approval_payload())
        // No X-Slack-Signature header
        .send().await;
    assert_eq!(resp.status(), 401);
}
#[tokio::test]
async fn slack_approval_webhook_rejects_invalid_signature() {
    let stack = E2EStack::start().await;
    let resp = stack.api()
        .post("/v1/run/slack/actions")
        .header("X-Slack-Signature", "v0=invalid_hmac")
        .header("X-Slack-Request-Timestamp", &now_timestamp())
        .json(&fixture_approval_payload())
        .send().await;
    assert_eq!(resp.status(), 401);
}
#[tokio::test]
async fn slack_approval_webhook_rejects_replayed_timestamp() {
    let stack = E2EStack::start().await;
    // Timestamp older than 5 minutes
    let resp = stack.api()
        .post("/v1/run/slack/actions")
        .header("X-Slack-Signature", &valid_signature_for_old_timestamp())
        .header("X-Slack-Request-Timestamp", &five_minutes_ago())
        .json(&fixture_approval_payload())
        .send().await;
    assert_eq!(resp.status(), 401);
}
#[tokio::test]
async fn slack_approval_webhook_rejects_cross_tenant_approval() {
    // Tenant A's user trying to approve Tenant B's execution
}
```
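The replayed-timestamp test above depends on a freshness check on `X-Slack-Request-Timestamp`, which Slack sends as Unix epoch seconds. The following is a minimal sketch of that check only, using the standard library; the function name `is_timestamp_fresh` and the constant `MAX_SKEW_SECS` are hypothetical, and the HMAC-SHA256 signature comparison is deliberately left out because it needs a crypto crate.

```rust
/// Maximum tolerated difference between the request timestamp and server time.
/// Assumption: a five-minute window, matching the replay test.
const MAX_SKEW_SECS: u64 = 5 * 60;

/// Returns true when the header value parses as Unix seconds and lies within
/// `MAX_SKEW_SECS` of `now_secs` in either direction.
fn is_timestamp_fresh(header_value: &str, now_secs: u64) -> bool {
    match header_value.trim().parse::<u64>() {
        Ok(ts) => now_secs.abs_diff(ts) <= MAX_SKEW_SECS,
        // Missing or malformed timestamps are treated as stale
        Err(_) => false,
    }
}
```

In this sketch a timestamp exactly 300 seconds old still passes, so a replay fixture should be clearly older than the window to avoid a boundary-dependent test.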
### 11.9 Audit Log Encryption Tests
```rust
// tests/integration/audit_encryption_test.rs
#[tokio::test]
async fn audit_log_command_field_is_encrypted_at_rest() {
    let db = TestDb::start().await;
    // Insert an audit event with a command
    insert_audit_event(&db, "kubectl get pods").await;
    // Read raw bytes from PostgreSQL; they must NOT contain the plaintext command
    let raw = db.query_raw_bytes("SELECT command FROM audit_events LIMIT 1").await;
    assert!(!String::from_utf8_lossy(&raw).contains("kubectl get pods"),
        "Command stored in plaintext; must be encrypted");
}
#[tokio::test]
async fn audit_log_output_field_is_encrypted_at_rest() {
    let db = TestDb::start().await;
    insert_audit_event_with_output(&db, "sensitive output data").await;
    let raw = db.query_raw_bytes("SELECT output FROM audit_events LIMIT 1").await;
    assert!(!String::from_utf8_lossy(&raw).contains("sensitive output data"));
}
#[tokio::test]
async fn audit_log_decryption_requires_kms_key() {
    // Verify the app role can decrypt using the KMS key
    let db = TestDb::start().await;
    insert_audit_event(&db, "kubectl get pods").await;
    let decrypted = db.as_app_role()
        .query("SELECT decrypt_command(command) FROM audit_events LIMIT 1").await;
    assert_eq!(decrypted, "kubectl get pods");
}
```
### 11.10 gRPC Output Buffer Limits
```rust
// pkg/agent/streaming/tests.rs
#[tokio::test]
async fn agent_truncates_stdout_at_10mb() {
    let sandbox = SandboxContainer::start("ubuntu:22.04").await;
    let agent = TestAgent::connect_to(sandbox.socket_path()).await;
    // Generate far more than 10MB: 50MB of random bytes, roughly 67MB once base64-encoded
    let result = agent.execute("dd if=/dev/urandom bs=1M count=50 | base64").await.unwrap();
    // Agent must truncate, not OOM
    assert!(result.stdout.len() <= 10 * 1024 * 1024);
    assert!(result.truncated);
}
#[tokio::test]
async fn agent_streams_output_in_chunks_not_buffered() {
    // Verify output arrives incrementally, not all at once after completion
}
#[tokio::test]
async fn agent_memory_stays_under_256mb_during_large_output() {
    // Memory profiling test: agent must not OOM on `cat /dev/urandom`
}
#[tokio::test]
async fn engine_handles_truncated_output_gracefully() {
    // Engine receives truncated flag and logs warning
}
```
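The truncation behaviour these tests expect can be illustrated with a bounded accumulator that stops copying at a byte limit and records that data was dropped. This is a sketch under assumptions, not the agent's actual streaming code: the type `BoundedOutput` and its fields are hypothetical, and a real agent would forward chunks over gRPC rather than hold them in one buffer.

```rust
/// Accumulates output chunks up to a fixed byte limit and records truncation.
struct BoundedOutput {
    buf: Vec<u8>,
    limit: usize,
    truncated: bool,
}

impl BoundedOutput {
    fn new(limit: usize) -> Self {
        Self { buf: Vec::new(), limit, truncated: false }
    }

    /// Appends as much of `chunk` as still fits; bytes past the limit are dropped.
    fn push(&mut self, chunk: &[u8]) {
        // `buf.len()` never exceeds `limit`, so this subtraction cannot underflow
        let remaining = self.limit - self.buf.len();
        let take = remaining.min(chunk.len());
        self.buf.extend_from_slice(&chunk[..take]);
        if take < chunk.len() {
            self.truncated = true;
        }
    }
}
```

With a limit of `10 * 1024 * 1024`, memory use for captured stdout stays bounded regardless of how much the command prints, which is the property `agent_truncates_stdout_at_10mb` asserts.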
### 11.11 Parse SLA End-to-End Benchmark
```rust
// benches/parse_sla_bench.rs
use serde_json::json;
use std::time::{Duration, Instant};
#[tokio::test]
async fn parse_plus_classify_pipeline_under_5s_p95() {
    let stack = E2EStack::start().await;
    let mut latencies = vec![];
    for _ in 0..100 {
        let start = Instant::now();
        let resp = stack.api()
            .post("/v1/run/runbooks/parse-preview")
            .json(&json!({ "raw_text": FIXTURE_RUNBOOK_10_STEPS }))
            .send().await;
        let elapsed = start.elapsed();
        // Only successful requests count; a fast error response must not pass the SLA check
        assert_eq!(resp.status(), 200);
        latencies.push(elapsed);
    }
    let p95 = percentile(&latencies, 95);
    assert!(p95 < Duration::from_secs(5),
        "Parse+Classify p95 latency: {:?} exceeds 5s SLA", p95);
}
```
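The benchmark calls a `percentile` helper that is not shown in this addendum. One way to write it is sketched below, assuming the nearest-rank definition (the smallest sample such that at least `pct` percent of samples are at or below it) and a signature inferred from the call site; both are assumptions rather than the project's confirmed helper.

```rust
use std::time::Duration;

/// Nearest-rank percentile over a set of latency samples.
/// Panics on an empty slice or a `pct` outside 1..=100.
fn percentile(samples: &[Duration], pct: u32) -> Duration {
    assert!(!samples.is_empty(), "percentile of an empty sample set is undefined");
    assert!((1..=100).contains(&pct), "pct must be in 1..=100");
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based rank = ceil(pct / 100 * n), computed with integer arithmetic
    let rank = (pct as usize * sorted.len() + 99) / 100;
    sorted[rank - 1]
}
```

With 100 samples and `pct = 95`, the rank is 95, so the helper returns the 95th smallest latency and the five slowest requests do not affect the result.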
### 11.12 Updated Test Pyramid (Post-Review)
The Execution Engine ratio shifts from 80/15/5 to 60/30/10 per review recommendation:
| Component | Unit | Integration | E2E |
|-----------|------|-------------|-----|
| Safety Scanner | 80% | 15% | 5% |
| Merge Engine | 90% | 10% | 0% |
| Execution Engine | **60%** | **30%** | **10%** |
| Parser | 50% | 40% | 10% |
| Approval Workflow | 70% | 20% | 10% |
| Audit Trail | 60% | 35% | 5% |
| Agent | 50% | 35% | 15% |
| Dashboard API | 40% | 50% | 10% |
*End of Review Remediation Addendum*


@@ -0,0 +1,226 @@
# dd0c Platform — PLG Instrumentation Brainstorm
**Session:** Carson (Brainstorming Coach) — Cross-Product PLG Analytics
**Date:** March 1, 2026
**Scope:** All 6 dd0c products
---
## The Problem
We built 6 products with onboarding flows, free tiers, and Stripe billing — but zero product analytics. We can't answer:
- How many users hit "aha moment" vs. bounce?
- Where in the funnel do free users drop off before upgrading?
- Which features drive retention vs. which are ignored?
- Are users churning because of alert fatigue, false positives, or just not getting value?
- What's our time-to-first-value per product?
Without instrumentation, PLG iteration is guesswork.
---
## Brainstorm: What to Instrument
### 1. Unified Event Taxonomy
Every dd0c product shares a common event naming convention:
```
<domain>.<object>.<action>
Examples:
account.signup.completed
account.aws.connected
anomaly.alert.sent
anomaly.alert.snoozed
slack.bot.installed
billing.checkout.started
billing.upgrade.completed
feature.flag.evaluated
```
**Rules:**
- Past tense for completed actions (`completed`, `sent`, `clicked`)
- State names for state changes (`active`, `learning`, `paused`)
- Always include `tenant_id`, `timestamp`, `product` (route/drift/alert/portal/cost/run)
- Never include PII — hash emails, account IDs
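A naming convention like this is easier to keep consistent when it is checked mechanically. The sketch below is one possible validator and is not part of the SDK: the function name `is_valid_event_name` and the allowed character set (lowercase ASCII letters, digits, underscores) are assumptions made for illustration.

```rust
/// Checks that a name has exactly three non-empty, dot-separated segments
/// (`<domain>.<object>.<action>`), each made of lowercase ASCII letters,
/// digits, or underscores. The character set is an assumption.
fn is_valid_event_name(name: &str) -> bool {
    let segments: Vec<&str> = name.split('.').collect();
    segments.len() == 3
        && segments.iter().all(|segment| {
            !segment.is_empty()
                && segment
                    .chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
        })
}
```

Under this check, `account.signup.completed` passes, while a two-segment name such as `alert.ingested` or a four-segment name such as `routing.shadow.audit.run` is rejected, so either such names or the rule would need adjusting before enforcement.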
### 2. Per-Product Activation Metrics
The "aha moment" is different for each product:
| Product | Aha Moment | Metric | Target |
|---------|-----------|--------|--------|
| dd0c/route | First dollar saved by model routing | `routing.savings.first_dollar` | <24hr from signup |
| dd0c/drift | First drift detected in real stack | `drift.detection.first_found` | <1hr from agent install |
| dd0c/alert | First alert correlated (not just forwarded) | `alert.correlation.first_match` | <60sec from first alert |
| dd0c/portal | First service auto-discovered | `portal.discovery.first_service` | <5min from install |
| dd0c/cost | First anomaly detected in real account | `cost.anomaly.first_detected` | <24hr from AWS connect |
| dd0c/run | First runbook executed successfully | `run.execution.first_success` | <10min from setup |
### 3. Conversion Funnel (Universal)
Every product shares this funnel shape:
```
Signup → Connect (AWS/Slack/Git) → First Value → Habit → Upgrade
```
Events per stage:
**Stage 1: Signup**
- `account.signup.started` — landed on signup page
- `account.signup.completed` — account created
- `method` property on `account.signup.completed`: github_sso / google_sso / email
**Stage 2: Connect**
- `account.integration.started` — began connecting external service
- `account.integration.completed` — connection verified
- `account.integration.failed` — connection failed (include `error_type`)
- Product-specific: `account.aws.connected`, `account.slack.installed`, `account.git.connected`
**Stage 3: First Value**
- Product-specific aha moment event (see table above)
- `onboarding.wizard.step_completed` — which step, how long
- `onboarding.wizard.abandoned` — which step they quit on
**Stage 4: Habit**
- `session.daily.active` — DAU ping
- `session.weekly.active` — WAU ping
- `feature.<name>.used` — per-feature usage
- `notification.digest.opened` — are they reading digests?
- `slack.command.used` — which slash commands, how often
**Stage 5: Upgrade**
- `billing.checkout.started`
- `billing.checkout.completed`
- `billing.checkout.abandoned`
- `billing.plan.changed` — upgrade/downgrade
- `billing.churn.detected` — subscription cancelled
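Once these events are flowing, the funnel reduces to a list of per-stage tenant counts, and step-to-step conversion is each stage's count divided by the count of the stage before it. A minimal sketch of that arithmetic, assuming the counts have already been aggregated elsewhere; the function name `step_conversion` and the sample numbers in the note are hypothetical.

```rust
/// Conversion rate from each funnel stage to the next.
/// `counts[i]` is the number of tenants that reached stage `i`, in funnel order.
/// A step whose starting stage has zero tenants yields `None`.
fn step_conversion(counts: &[u64]) -> Vec<Option<f64>> {
    counts
        .windows(2)
        .map(|pair| {
            if pair[0] == 0 {
                None
            } else {
                Some(pair[1] as f64 / pair[0] as f64)
            }
        })
        .collect()
}
```

For illustrative counts of 1000 signups, 400 connected, 200 reaching first value, 100 forming a habit, and 10 upgrading, the four step rates are 0.4, 0.5, 0.5, and 0.1, which points at the final step as the largest drop-off.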
### 4. Feature Usage Events (Per Product)
**dd0c/route (LLM Cost Router)**
- `routing.request.processed` — model selected, latency, cost
- `routing.override.manual` — user forced a specific model
- `routing.savings.calculated` — weekly savings digest generated
- `routing.shadow.audit.run` — shadow mode comparison completed
- `dashboard.cost.viewed` — opened cost dashboard
**dd0c/drift (IaC Drift Detection)**
- `drift.scan.completed` — scan finished, drifts found count
- `drift.remediation.clicked` — user clicked "fix drift"
- `drift.remediation.applied` — drift actually fixed
- `drift.false_positive.marked` — user dismissed a drift
- `drift.agent.heartbeat` — agent is alive and scanning
**dd0c/alert (Alert Intelligence)**
- `alert.ingested` — raw alert received
- `alert.correlated` — alerts grouped into incident
- `alert.suppressed` — duplicate/noise suppressed
- `alert.escalated` — sent to on-call
- `alert.feedback.helpful` / `alert.feedback.noise` — user feedback
- `alert.mttr.measured` — time from alert to resolution
**dd0c/portal (Lightweight IDP)**
- `portal.service.discovered` — auto-discovery found a service
- `portal.service.claimed` — team claimed ownership
- `portal.scorecard.viewed` — someone checked service health
- `portal.scorecard.action_taken` — acted on a recommendation
- `portal.search.performed` — searched the catalog
**dd0c/cost (AWS Cost Anomaly)**
- `cost.event.ingested` — CloudTrail event processed
- `cost.anomaly.scored` — anomaly scoring completed
- `cost.anomaly.alerted` — Slack alert sent
- `cost.anomaly.snoozed` — user snoozed alert
- `cost.anomaly.expected` — user marked as expected
- `cost.remediation.clicked` — user clicked Stop/Terminate
- `cost.remediation.executed` — remediation completed
- `cost.zombie.detected` — idle resource found
- `cost.digest.sent` — daily digest delivered
**dd0c/run (Runbook Automation)**
- `run.runbook.created` — new runbook authored
- `run.execution.started` — runbook execution began
- `run.execution.completed` — execution finished (include `success`/`failed`)
- `run.execution.approval_requested` — human approval needed
- `run.execution.approval_granted` — human approved
- `run.execution.rolled_back` — rollback triggered
- `run.sandbox.test.run` — dry-run in sandbox
### 5. Health Scoring (Churn Prediction)
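Composite health score per tenant, updated daily. Each input is assumed to be normalized to the range 0 to 1 (for example, `weekly_active_days` as active days divided by 7); the weights sum to 1.0, so the score also falls between 0 and 1:
```
health_score = (
    0.3  * activation_complete +   // did they hit aha moment?
    0.2  * weekly_active_days +    // how many days active this week?
    0.2  * feature_breadth +       // how many features used?
    0.15 * integration_depth +     // how many integrations connected?
    0.15 * feedback_sentiment      // positive vs negative actions
)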
```
Thresholds:
- `health_score > 0.7` → Healthy (green)
- `0.4 ≤ health_score ≤ 0.7` → At Risk (yellow) → trigger re-engagement email
- `health_score < 0.4` → Churning (red) → trigger founder outreach
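The formula and thresholds translate directly into a weighted sum and a three-way classification. The sketch below assumes every input has already been normalized to the range 0.0 to 1.0 and that the boundary values 0.4 and 0.7 both count as At Risk; the type and function names (`HealthInputs`, `HealthStatus`, `health_score`, `classify`) are hypothetical.

```rust
/// Inputs to the health score, each assumed to be normalized to 0.0..=1.0.
struct HealthInputs {
    activation_complete: f64,
    weekly_active_days: f64,
    feature_breadth: f64,
    integration_depth: f64,
    feedback_sentiment: f64,
}

#[derive(Debug, PartialEq)]
enum HealthStatus {
    Healthy,
    AtRisk,
    Churning,
}

/// Weighted sum using the weights from the brainstorm formula.
fn health_score(inputs: &HealthInputs) -> f64 {
    0.3 * inputs.activation_complete
        + 0.2 * inputs.weekly_active_days
        + 0.2 * inputs.feature_breadth
        + 0.15 * inputs.integration_depth
        + 0.15 * inputs.feedback_sentiment
}

/// Maps a score to a band; 0.4 and 0.7 themselves fall in `AtRisk`.
fn classify(score: f64) -> HealthStatus {
    if score > 0.7 {
        HealthStatus::Healthy
    } else if score >= 0.4 {
        HealthStatus::AtRisk
    } else {
        HealthStatus::Churning
    }
}
```

As an example, a tenant that has activated (1.0), was active 3 of 7 days, and sits at 0.5 on the remaining three inputs scores about 0.64 and lands in the At Risk band.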
### 6. Analytics Stack Recommendation
**PostHog** (self-hosted on AWS):
- Open source, self-hostable → no vendor lock-in
- Free tier: unlimited events self-hosted
- Built-in: funnels, retention, feature flags, session replay
- Supports custom events via REST API or JS/Python SDK
- Can run on a single t3.medium for V1 traffic
**Why not Segment/Amplitude/Mixpanel:**
- Segment: $120/mo minimum, overkill for solo founder
- Amplitude: free tier is generous but cloud-only, data leaves your infra
- Mixpanel: same cloud-only concern
- PostHog self-hosted: $0/mo, data stays in your AWS account, GDPR-friendly
**Integration pattern:**
```
Lambda/API → PostHog REST API (async, fire-and-forget)
Next.js UI → PostHog JS SDK (auto-captures pageviews, clicks)
Slack Bot → PostHog Python SDK (command usage, action clicks)
```
### 7. Cross-Product Flywheel Metrics
dd0c is a platform — users on one product should discover others:
- `platform.cross_sell.impression` — "Try dd0c/alert" banner shown
- `platform.cross_sell.clicked` — user clicked cross-sell
- `platform.cross_sell.activated` — user activated second product
- `platform.products.active_count` — how many dd0c products per tenant
**Flywheel hypothesis:** Users who activate 2+ dd0c products have 3x lower churn than single-product users. We need data to prove/disprove this.
---
## Epic 11 Proposal: PLG Instrumentation
### Scope
Cross-cutting epic added to all 6 products. Shared analytics SDK, per-product event implementations, funnel dashboards, health scoring.
### Stories (Draft)
1. **PostHog Infrastructure** — CDK stack for self-hosted PostHog on ECS Fargate
2. **Analytics SDK** — Shared TypeScript/Python wrapper with standard event schema
3. **Funnel Dashboard** — PostHog dashboard template per product
4. **Activation Tracking** — Per-product aha moment detection and logging
5. **Health Scoring Engine** — Daily cron that computes tenant health scores
6. **Cross-Sell Instrumentation** — Platform-level cross-product discovery events
7. **Churn Alert Pipeline** — Health score → Slack alert to founder when tenant goes red
### Estimate
~25 story points across all products (shared infrastructure + per-product event wiring)
---
*This brainstorm establishes the "what" and "why." Party Mode advisory board should stress-test: Is PostHog the right choice? Is the event taxonomy too granular? Should health scoring be V1 or V2? Is 25 points realistic?*