Implement review remediation + PLG analytics SDK

- All 6 test architectures patched with Section 11 addendums
- P5 (cost) fully rewritten from 232 to ~600 lines
- PLG brainstorm + party mode advisory board results
- Analytics SDK v2 (PostHog Cloud, Zod strict, Lambda-safe)
- Analytics tests v2 (safeParse, no , no timestamp, no PII)
- Addresses all Gemini review findings across P1-P6
2026-03-01 01:42:49 +00:00
parent 2fe0ed856e
commit 03bfe931fc
9 changed files with 2950 additions and 85 deletions
--- a/products/06-runbook-automation/test-architecture/test-architecture.md
+++ b/products/06-runbook-automation/test-architecture/test-architecture.md
@@ -1760,3 +1760,527 @@ Before writing the `impl ExecutionEngine { pub async fn execute(...) }` function
 5. `engine_pauses_in_flight_execution_when_panic_mode_set`

 Only once these tests are defined can the state machine be implemented to make them pass (Green phase). This ensures no execution path can bypass the Trust Gradient.
+
+---
+
+## 11. Review Remediation Addendum (Post-Gemini Review)
+
+The following sections address all gaps identified in the TDD review. These are net-new test specifications that must be integrated into the relevant sections above during implementation.
+
+### 11.1 Missing Epic Coverage
+
+#### Epic 3.4: Divergence Analysis
+
+```rust
+// pkg/executor/divergence/tests.rs
+
+#[test] fn divergence_detects_extra_command_not_in_runbook() {}
+#[test] fn divergence_detects_modified_command_vs_prescribed() {}
+#[test] fn divergence_detects_skipped_step_not_marked_as_skipped() {}
+#[test] fn divergence_report_includes_diff_of_prescribed_vs_actual() {}
+#[test] fn divergence_flags_env_var_changes_made_during_execution() {}
+#[test] fn divergence_ignores_whitespace_differences_in_commands() {}
+#[test] fn divergence_analysis_runs_automatically_after_execution_completes() {}
+#[test] fn divergence_report_written_to_audit_trail() {}
+
+#[tokio::test]
+async fn integration_divergence_analysis_detects_agent_side_extra_commands() {
+    // Agent executes an extra `whoami` not in the runbook
+    // Divergence analyzer must flag it
+}
+```
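The divergence checks listed above could be backed by a comparison along the following lines. This is a minimal sketch, not the project's analyzer: `Divergence`, `normalize`, and `diverge` are hypothetical names, and the comparison is set-based, so it ignores ordering and repeated commands. A modified command shows up as one `Skipped` plus one `Extra` entry.

```rust
/// Collapse runs of whitespace so `ls   -la` and `ls -la` compare equal.
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// One difference between the prescribed runbook and what actually ran.
#[derive(Debug, PartialEq)]
pub enum Divergence {
    /// Command was executed but is not in the runbook.
    Extra(String),
    /// Command is in the runbook but was never executed.
    Skipped(String),
}

/// Hypothetical helper: compare prescribed and actual commands after
/// whitespace normalization. Order and duplicates are ignored.
pub fn diverge(prescribed: &[&str], actual: &[&str]) -> Vec<Divergence> {
    let want: Vec<String> = prescribed.iter().map(|c| normalize(c)).collect();
    let got: Vec<String> = actual.iter().map(|c| normalize(c)).collect();
    let mut out = Vec::new();
    for cmd in &got {
        if !want.contains(cmd) {
            out.push(Divergence::Extra(cmd.clone()));
        }
    }
    for cmd in &want {
        if !got.contains(cmd) {
            out.push(Divergence::Skipped(cmd.clone()));
        }
    }
    out
}
```

Calling `diverge(&["ls /tmp"], &["ls   /tmp", "whoami"])` would report only the extra `whoami`, which mirrors the integration test above.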
+
+#### Epic 5.3: Compliance Export
+
+```rust
+// pkg/audit/export/tests.rs
+
+#[tokio::test] async fn export_generates_valid_csv_for_date_range() {}
+#[tokio::test] async fn export_generates_valid_pdf_with_execution_summary() {}
+#[tokio::test] async fn export_uploads_to_s3_and_returns_presigned_url() {}
+#[tokio::test] async fn export_presigned_url_expires_after_24_hours() {}
+#[tokio::test] async fn export_scoped_to_tenant_via_rls() {}
+#[tokio::test] async fn export_includes_hash_chain_verification_status() {}
+#[tokio::test] async fn export_redacts_command_output_but_includes_hashes() {}
+```
+
+#### Epic 6.4: Classification Query API Rate Limiting
+
+```rust
+// tests/integration/api_rate_limit_test.rs
+
+#[tokio::test]
+async fn api_rate_limit_30_requests_per_minute_per_tenant() {
+    let stack = E2EStack::start().await;
+    for i in 0..30 {
+        let resp = stack.api().get("/v1/run/classifications").send().await;
+        assert_eq!(resp.status(), 200);
+    }
+    // 31st request must be rate-limited
+    let resp = stack.api().get("/v1/run/classifications").send().await;
+    assert_eq!(resp.status(), 429);
+}
+
+#[tokio::test]
+async fn api_rate_limit_resets_after_60_seconds() {}
+
+#[tokio::test]
+async fn api_rate_limit_is_per_tenant_not_global() {
+    // Tenant A hitting limit must not affect Tenant B
+}
+
+#[tokio::test]
+async fn api_rate_limit_returns_retry_after_header() {}
+```
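For orientation, the behavior these tests pin down (30 requests per minute, per tenant, with a `Retry-After` value on rejection) could be modeled with a fixed-window counter. The sketch below is an assumption about one possible design, not the service's limiter: `TenantRateLimiter` and `Decision` are hypothetical names, and the clock is passed in so the reset can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Outcome of a rate-limit check for one request.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allowed,
    /// Rejected; the caller would answer 429 with this Retry-After value.
    Limited { retry_after: Duration },
}

/// Hypothetical fixed-window limiter keyed by tenant ID.
pub struct TenantRateLimiter {
    limit: u32,
    window: Duration,
    // tenant -> (window start, requests counted in that window)
    windows: HashMap<String, (Instant, u32)>,
}

impl TenantRateLimiter {
    pub fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, windows: HashMap::new() }
    }

    /// `now` is injected so tests can move time forward deterministically.
    pub fn check(&mut self, tenant: &str, now: Instant) -> Decision {
        let entry = self.windows.entry(tenant.to_string()).or_insert((now, 0));
        // Start a fresh window once the previous one has fully elapsed.
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            Decision::Allowed
        } else {
            let elapsed = now.duration_since(entry.0);
            Decision::Limited { retry_after: self.window - elapsed }
        }
    }
}
```

A fixed window is the simplest model that satisfies the four tests; a sliding window or token bucket would smooth bursts at window boundaries but is not required by anything stated here.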
+
+#### Epic 7: Dashboard UI (Playwright)
+
+```typescript
+// tests/e2e/ui/dashboard.spec.ts
+
+test('parse preview renders within 5 seconds of paste', async ({ page }) => {
+  await page.goto('/dashboard/runbooks/new');
+  await page.fill('[data-testid="runbook-input"]', FIXTURE_RUNBOOK);
+  const preview = page.locator('[data-testid="parse-preview"]');
+  await expect(preview).toBeVisible({ timeout: 5000 });
+  await expect(preview.locator('.step-card')).toHaveCount(4);
+});
+
+test('trust level visualization shows correct colors per step', async ({ page }) => {
+  // 🟢 safe = green, 🟡 caution = yellow, 🔴 dangerous = red
+});
+
+test('MTTR dashboard loads and displays chart', async ({ page }) => {
+  await page.goto('/dashboard/analytics');
+  await expect(page.locator('[data-testid="mttr-chart"]')).toBeVisible();
+});
+
+test('execution timeline shows real-time step progress', async ({ page }) => {});
+test('approval modal requires typed confirmation for dangerous steps', async ({ page }) => {});
+test('panic mode banner appears when panic is active', async ({ page }) => {});
+```
+
+#### Epic 9: Onboarding & PLG
+
+```rust
+// pkg/onboarding/tests.rs
+
+#[test] fn free_tier_allows_5_runbooks() {}
+#[test] fn free_tier_allows_50_executions_per_month() {}
+#[test] fn free_tier_rejects_6th_runbook_with_upgrade_prompt() {}
+#[test] fn free_tier_rejects_51st_execution_with_upgrade_prompt() {}
+#[test] fn free_tier_counter_resets_monthly() {}
+
+#[test] fn agent_install_snippet_includes_correct_api_key() {}
+#[test] fn agent_install_snippet_includes_correct_gateway_url() {}
+#[test] fn agent_install_snippet_is_valid_bash() {}
+
+#[tokio::test] async fn stripe_checkout_creates_session_with_correct_pricing() {}
+#[tokio::test] async fn stripe_webhook_checkout_completed_upgrades_tenant() {}
+#[tokio::test] async fn stripe_webhook_subscription_deleted_downgrades_tenant() {}
+#[tokio::test] async fn stripe_webhook_validates_signature() {}
+```
+
+### 11.2 Agent-Side Security Tests (Zero-Trust Environment)
+
+The Agent runs in customer VPCs — untrusted territory. These tests prove the Agent defends itself independently of the SaaS backend.
+
+```rust
+// pkg/agent/security/tests.rs
+
+// Agent-side deterministic blocking (mirrors SaaS scanner)
+#[test] fn agent_scanner_blocks_rm_rf_independently_of_saas() {}
+#[test] fn agent_scanner_blocks_kubectl_delete_namespace_independently() {}
+#[test] fn agent_scanner_blocks_drop_table_independently() {}
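+#[test] fn agent_scanner_rejects_command_even_if_saas_says_safe() {
+    // Simulates a compromised SaaS sending a "safe" classification for rm -rf
+    // (assumes `Classification: Default` and an `AgentScanner::new()` constructor)
+    let saas_classification = Classification { risk: RiskLevel::Safe, ..Default::default() };
+    let agent_scanner = AgentScanner::new();
+    let agent_result = agent_scanner.classify("rm -rf /");
+    assert_eq!(saas_classification.risk, RiskLevel::Safe);
+    assert_eq!(agent_result.risk, RiskLevel::Dangerous);
+    // Agent MUST override the SaaS classification
+}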
+
+// Binary integrity
+#[test] fn agent_validates_binary_checksum_on_startup() {}
+#[test] fn agent_refuses_to_start_if_checksum_mismatch() {}
+
+// Payload tampering
+#[tokio::test] async fn agent_rejects_grpc_payload_with_invalid_hmac() {}
+#[tokio::test] async fn agent_rejects_grpc_payload_with_expired_timestamp() {}
+#[tokio::test] async fn agent_rejects_grpc_payload_with_mismatched_execution_id() {}
+
+// Local fallback when SaaS is unreachable
+#[tokio::test] async fn agent_falls_back_to_scanner_only_when_saas_disconnected() {}
+#[tokio::test] async fn agent_in_fallback_mode_treats_all_unknowns_as_caution() {}
+#[tokio::test] async fn agent_reconnects_automatically_when_saas_returns() {}
+```
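The override and fallback rules in these tests can be read as "take the stricter verdict, and never default to safe". The sketch below illustrates that reading under stated assumptions: `effective_risk` is a hypothetical function, `None` on the SaaS side means the backend is unreachable, and `None` on the local side means no scanner pattern matched. Trusting the SaaS verdict when the local scanner has no opinion is an assumption, not something the tests above specify.

```rust
/// Risk levels ordered from least to most severe. Deriving `Ord` makes
/// `Safe < Caution < Dangerous` follow declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RiskLevel {
    Safe,
    Caution,
    Dangerous,
}

/// Hypothetical merge of the SaaS verdict with the Agent's local scanner.
pub fn effective_risk(saas: Option<RiskLevel>, local: Option<RiskLevel>) -> RiskLevel {
    match (saas, local) {
        // Both sides have a verdict: the stricter one wins, so a
        // compromised SaaS cannot downgrade a locally dangerous command.
        (Some(s), Some(l)) => s.max(l),
        // Local scanner has no opinion: fall back to the SaaS verdict (assumption).
        (Some(s), None) => s,
        // SaaS unreachable: scanner-only mode.
        (None, Some(l)) => l,
        // SaaS unreachable and command unknown locally: treat as caution.
        (None, None) => RiskLevel::Caution,
    }
}
```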
+
+### 11.3 Realistic Sandbox Matrix
+
+Replace the Alpine-only sandbox with a matrix of realistic execution targets.
+
+```rust
+// tests/integration/sandbox_matrix_test.rs
+
+#[rstest]
+#[case("ubuntu:22.04")]
+#[case("amazonlinux:2023")]
+#[case("alpine:3.19")]
+#[tokio::test] // async rstest cases need a runtime attribute
+async fn sandbox_safe_command_executes_on_all_targets(#[case] image: &str) {
+    let sandbox = SandboxContainer::start(image).await;
+    let agent = TestAgent::connect_to(sandbox.socket_path()).await;
+    let result = agent.execute("ls /tmp").await.unwrap();
+    assert_eq!(result.exit_code, 0);
+}
+
+#[rstest]
+#[case("ubuntu:22.04")]
+#[case("amazonlinux:2023")]
+#[case("alpine:3.19")]
+#[tokio::test] // async rstest cases need a runtime attribute
+async fn sandbox_dangerous_command_blocked_on_all_targets(#[case] image: &str) {
+    let sandbox = SandboxContainer::start(image).await;
+    let agent = TestAgent::connect_to(sandbox.socket_path()).await;
+    let result = agent.execute("rm -rf /").await;
+    assert!(result.is_err());
+}
+
+// Non-root execution
+#[tokio::test]
+async fn sandbox_agent_runs_as_non_root_user() {
+    let sandbox = SandboxContainer::start_as_user("ubuntu:22.04", "dd0c-agent").await;
+    let agent = TestAgent::connect_to(sandbox.socket_path()).await;
+    let result = agent.execute("whoami").await.unwrap();
+    assert_eq!(result.stdout.trim(), "dd0c-agent");
+}
+
+#[tokio::test]
+async fn sandbox_non_root_agent_cannot_escalate_to_root() {
+    let sandbox = SandboxContainer::start_as_user("ubuntu:22.04", "dd0c-agent").await;
+    let agent = TestAgent::connect_to(sandbox.socket_path()).await;
+    let result = agent.execute("sudo cat /etc/shadow").await;
+    assert!(result.is_err() || result.unwrap().exit_code != 0);
+}
+
+// RBAC-restricted K3s
+#[tokio::test]
+async fn sandbox_k3s_rbac_denies_kubectl_delete_namespace() {
+    let k3s = K3sContainer::start_with_rbac("read-only-role").await;
+    let agent = TestAgent::with_kubeconfig(k3s.kubeconfig()).await;
+    let result = agent.execute("kubectl delete namespace default").await;
+    // Should be blocked by BOTH scanner AND K8s RBAC
+    assert!(result.is_err());
+}
+```
+
+### 11.4 Advanced Command Injection Tests
+
+```rust
+// pkg/classifier/scanner/injection_tests.rs
+
+// Semicolon injection
+#[test] fn scanner_semicolon_rm_rf_is_dangerous() {
+    assert_dangerous("echo hello; rm -rf /");
+}
+
+// Pipe injection
+#[test] fn scanner_pipe_to_rm_is_dangerous() {
+    assert_dangerous("find / -name '*.log' | xargs rm -rf");
+}
+
+// Backtick injection
+#[test] fn scanner_backtick_rm_is_dangerous() {
+    assert_dangerous("echo `rm -rf /`");
+}
+
+// $() substitution (already tested, but more variants)
+#[test] fn scanner_nested_substitution_is_dangerous() {
+    assert_dangerous("echo $(echo $(rm -rf /))");
+}
+
+// Newline injection
+#[test] fn scanner_newline_injection_is_dangerous() {
+    assert_dangerous("echo safe\nrm -rf /");
+}
+
+// Null byte injection
+#[test] fn scanner_null_byte_injection_is_dangerous() {
+    assert_dangerous("echo safe\0rm -rf /");
+}
+
+// Unicode homoglyph attack
+#[test] fn scanner_unicode_homoglyph_rm_is_dangerous() {
+    // Cyrillic 'м' (U+043C) stands in for Latin 'm'
+    assert_dangerous("rм -rf /"); // Must still be caught
+}
+
+// Base64 encoded payload
+#[test] fn scanner_base64_decode_pipe_bash_is_dangerous() {
+    assert_dangerous("echo cm0gLXJmIC8= | base64 -d | bash");
+}
+
+// Heredoc injection
+#[test] fn scanner_heredoc_with_destructive_is_dangerous() {
+    assert_dangerous("cat << EOF | bash\nrm -rf /\nEOF");
+}
+
+// Environment variable expansion
+#[test] fn scanner_env_var_expansion_to_rm_is_dangerous() {
+    assert_dangerous("$CMD"); // Unresolved variable expansion cannot be proven safe, so it is treated as dangerous
+}
+```
+
+### 11.5 Privilege Escalation Tests
+
+```rust
+// pkg/classifier/scanner/escalation_tests.rs
+
+#[test] fn scanner_sudo_anything_is_at_least_caution() {
+    assert_at_least_caution("sudo systemctl restart nginx");
+}
+
+#[test] fn scanner_sudo_rm_is_dangerous() {
+    assert_dangerous("sudo rm -rf /var/log");
+}
+
+#[test] fn scanner_su_root_is_dangerous() {
+    assert_dangerous("su - root -c 'rm -rf /'");
+}
+
+#[test] fn scanner_chmod_suid_is_dangerous() {
+    assert_dangerous("chmod u+s /usr/bin/find");
+}
+
+#[test] fn scanner_chown_root_is_caution() {
+    assert_at_least_caution("chown root:root /tmp/exploit");
+}
+
+#[test] fn scanner_nsenter_is_dangerous() {
+    assert_dangerous("nsenter --target 1 --mount --uts --ipc --net --pid");
+}
+
+#[test] fn scanner_docker_run_privileged_is_dangerous() {
+    assert_dangerous("docker run --privileged -v /:/host ubuntu");
+}
+
+#[test] fn scanner_kubectl_exec_as_root_is_caution() {
+    assert_at_least_caution("kubectl exec -it pod -- /bin/bash");
+}
+```
+
+### 11.6 Rollback Failure & Nested Failure Tests
+
+```rust
+// pkg/executor/rollback/tests.rs
+
+#[test] fn rollback_failure_transitions_to_manual_intervention() {
+    let mut engine = ExecutionEngine::new();
+    engine.transition(State::RollingBack);
+    engine.report_rollback_failure("rollback command timed out");
+    assert_eq!(engine.state(), State::ManualIntervention);
+}
+
+#[test] fn rollback_failure_does_not_retry_automatically() {
+    // Rollback failures are terminal — no auto-retry
+}
+
+#[test] fn rollback_timeout_kills_rollback_process_after_300s() {}
+
+// `tokio::time::advance` needs an async test with a paused clock (tokio `test-util` feature)
+#[tokio::test(start_paused = true)]
+async fn rollback_hanging_indefinitely_triggers_manual_intervention_after_timeout() {
+    let mut engine = ExecutionEngine::with_rollback_timeout(Duration::from_secs(5));
+    engine.transition(State::RollingBack);
+    // Simulate rollback that never completes
+    tokio::time::advance(Duration::from_secs(6)).await;
+    assert_eq!(engine.state(), State::ManualIntervention);
+}
+
+#[test] fn manual_intervention_state_sends_slack_alert_to_oncall() {}
+#[test] fn manual_intervention_state_logs_full_context_to_audit() {}
+```
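The two escalation paths exercised above (an explicit rollback failure, and a rollback that outlives its timeout) both end in the same terminal state. The following is a minimal sketch of that state machine, assuming a hypothetical `RollbackEngine` that is told how long the rollback has been running rather than reading a clock itself. It is not the `ExecutionEngine` from the spec.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Running,
    RollingBack,
    ManualIntervention,
}

/// Hypothetical, clock-free model of the rollback portion of the engine.
pub struct RollbackEngine {
    state: State,
    rollback_timeout: Duration,
}

impl RollbackEngine {
    pub fn new(rollback_timeout: Duration) -> Self {
        Self { state: State::Running, rollback_timeout }
    }

    pub fn state(&self) -> State {
        self.state
    }

    pub fn start_rollback(&mut self) {
        self.state = State::RollingBack;
    }

    /// A failed rollback is terminal: escalate to a human, never retry.
    pub fn report_rollback_failure(&mut self) {
        if self.state == State::RollingBack {
            self.state = State::ManualIntervention;
        }
    }

    /// Called periodically with the time spent rolling back so far.
    pub fn tick(&mut self, elapsed: Duration) {
        if self.state == State::RollingBack && elapsed > self.rollback_timeout {
            self.state = State::ManualIntervention;
        }
    }
}
```

Passing elapsed time in keeps the transition logic testable as a plain unit test, which fits the unit-heavy end of the pyramid; the paused-clock tokio test above would then cover the wiring.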
+
+### 11.7 Double Execution & Network Partition Tests
+
+```rust
+// pkg/executor/idempotency/tests.rs
+
+#[tokio::test]
+async fn agent_reconnect_after_partition_resyncs_already_executed_step() {
+    let stack = E2EStack::start().await;
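+    let execution = stack.start_execution().await;
+    let step_id = execution.steps[0].id.clone(); // assumes the execution exposes its steps
+
+    // Wait until the Agent has picked up the step and is executing it
+    stack.wait_for_step_state(&execution.id, &step_id, "executing").await;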
+
+    // Network partition AFTER execution but BEFORE ACK
+    stack.partition_agent().await;
+
+    // Agent reconnects
+    stack.heal_partition().await;
+
+    // Engine must recognize step was already executed — no double execution
+    let step = stack.get_step(&execution.id, &step_id).await;
+    assert_eq!(step.execution_count, 1); // Exactly once
+}
+
+#[tokio::test]
+async fn engine_does_not_re_send_command_after_agent_reconnect_if_step_completed() {}
+
+#[tokio::test]
+async fn engine_re_sends_command_if_agent_never_started_execution_before_partition() {}
+```
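One way to get the exactly-once behavior these tests describe is for the Agent to keep a small ledger of step status and report it when it reconnects, so the engine can decide whether to re-send. The sketch below assumes that design. `StepLedger`, `StepStatus`, and `Resync` are hypothetical names, and waiting for the result of a step that was started but not finished is an assumed policy.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepStatus {
    Started,
    Completed,
}

/// What the engine should do for a step after the Agent reconnects.
#[derive(Debug, PartialEq, Eq)]
pub enum Resync {
    /// Agent never saw the command: safe to send it again.
    Resend,
    /// Agent started the step: wait for its result instead of re-sending.
    AwaitResult,
    /// Agent finished the step: record the result, do not execute again.
    AlreadyDone,
}

/// Hypothetical Agent-side record keyed by (execution ID, step ID).
#[derive(Default)]
pub struct StepLedger {
    steps: HashMap<(String, String), StepStatus>,
}

impl StepLedger {
    pub fn record(&mut self, execution_id: &str, step_id: &str, status: StepStatus) {
        self.steps
            .insert((execution_id.to_string(), step_id.to_string()), status);
    }

    pub fn on_reconnect(&self, execution_id: &str, step_id: &str) -> Resync {
        match self.steps.get(&(execution_id.to_string(), step_id.to_string())) {
            None => Resync::Resend,
            Some(StepStatus::Started) => Resync::AwaitResult,
            Some(StepStatus::Completed) => Resync::AlreadyDone,
        }
    }
}
```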
+
+### 11.8 Slack Payload Forgery Tests
+
+```rust
+// tests/integration/slack_security_test.rs
+
+#[tokio::test]
+async fn slack_approval_webhook_rejects_missing_signature() {
+    let stack = E2EStack::start().await;
+    let resp = stack.api()
+        .post("/v1/run/slack/actions")
+        .json(&fixture_approval_payload())
+        // No X-Slack-Signature header
+        .send().await;
+    assert_eq!(resp.status(), 401);
+}
+
+#[tokio::test]
+async fn slack_approval_webhook_rejects_invalid_signature() {
+    let stack = E2EStack::start().await;
+    let resp = stack.api()
+        .post("/v1/run/slack/actions")
+        .header("X-Slack-Signature", "v0=invalid_hmac")
+        .header("X-Slack-Request-Timestamp", &now_timestamp())
+        .json(&fixture_approval_payload())
+        .send().await;
+    assert_eq!(resp.status(), 401);
+}
+
+#[tokio::test]
+async fn slack_approval_webhook_rejects_replayed_timestamp() {
+    let stack = E2EStack::start().await;
+    // Timestamp older than 5 minutes
+    let resp = stack.api()
+        .post("/v1/run/slack/actions")
+        .header("X-Slack-Signature", &valid_signature_for_old_timestamp())
+        .header("X-Slack-Request-Timestamp", &five_minutes_ago())
+        .json(&fixture_approval_payload())
+        .send().await;
+    assert_eq!(resp.status(), 401);
+}
+
+#[tokio::test]
+async fn slack_approval_webhook_rejects_cross_tenant_approval() {
+    // Tenant A's user trying to approve Tenant B's execution
+}
+```
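The pieces of Slack request verification that do not need a crypto library can be sketched with the standard library alone. Slack signs the string `v0:<timestamp>:<body>` with HMAC-SHA256 and recommends rejecting requests more than five minutes old. The function names below are hypothetical, the HMAC step itself is omitted, and the comparison helper is illustrative only; production code would normally rely on a vetted constant-time comparison.

```rust
/// Replay window recommended by Slack: five minutes.
const MAX_SKEW_SECS: i64 = 60 * 5;

/// Build the string that is fed to HMAC-SHA256 with the signing secret.
pub fn signature_base_string(timestamp: i64, body: &str) -> String {
    format!("v0:{}:{}", timestamp, body)
}

/// Reject requests whose timestamp is more than five minutes from `now_ts`
/// in either direction. Both values are Unix seconds.
pub fn is_fresh(request_ts: i64, now_ts: i64) -> bool {
    (now_ts - request_ts).abs() <= MAX_SKEW_SECS
}

/// Compare two signatures without returning early on the first mismatch.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Note that with this boundary a timestamp exactly 300 seconds old still counts as fresh, so a replay test is more robust if its fixture is clearly past the window.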
+
+### 11.9 Audit Log Encryption Tests
+
+```rust
+// tests/integration/audit_encryption_test.rs
+
+#[tokio::test]
+async fn audit_log_command_field_is_encrypted_at_rest() {
+    let db = TestDb::start().await;
+    // Insert an audit event with a command
+    insert_audit_event(&db, "kubectl get pods").await;
+
+    // Read raw bytes from PostgreSQL — must NOT contain plaintext command
+    let raw = db.query_raw_bytes("SELECT command FROM audit_events LIMIT 1").await;
+    assert!(!String::from_utf8_lossy(&raw).contains("kubectl get pods"),
+        "Command stored in plaintext — must be encrypted");
+}
+
+#[tokio::test]
+async fn audit_log_output_field_is_encrypted_at_rest() {
+    let db = TestDb::start().await;
+    insert_audit_event_with_output(&db, "sensitive output data").await;
+
+    let raw = db.query_raw_bytes("SELECT output FROM audit_events LIMIT 1").await;
+    assert!(!String::from_utf8_lossy(&raw).contains("sensitive output data"));
+}
+
+#[tokio::test]
+async fn audit_log_decryption_requires_kms_key() {
+    // Verify the app role can decrypt using the KMS key
+    let db = TestDb::start().await;
+    insert_audit_event(&db, "kubectl get pods").await;
+
+    let decrypted = db.as_app_role()
+        .query("SELECT decrypt_command(command) FROM audit_events LIMIT 1").await;
+    assert_eq!(decrypted, "kubectl get pods");
+}
+```
+
+### 11.10 gRPC Output Buffer Limits
+
+```rust
+// pkg/agent/streaming/tests.rs
+
+#[tokio::test]
+async fn agent_truncates_stdout_at_10mb() {
+    let sandbox = SandboxContainer::start("ubuntu:22.04").await;
+    let agent = TestAgent::connect_to(sandbox.socket_path()).await;
+
+    // Generate roughly 67 MiB of output (50 MiB of random bytes, base64-encoded)
+    let result = agent.execute("dd if=/dev/urandom bs=1M count=50 | base64").await.unwrap();
+
+    // Agent must truncate, not OOM
+    assert!(result.stdout.len() <= 10 * 1024 * 1024);
+    assert!(result.truncated);
+}
+
+#[tokio::test]
+async fn agent_streams_output_in_chunks_not_buffered() {
+    // Verify output arrives incrementally, not all at once after completion
+}
+
+#[tokio::test]
+async fn agent_memory_stays_under_256mb_during_large_output() {
+    // Memory profiling test — agent must not OOM on `cat /dev/urandom`
+}
+
+#[tokio::test]
+async fn engine_handles_truncated_output_gracefully() {
+    // Engine receives truncated flag and logs warning
+}
+```
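The truncation behavior asserted above (cap the captured output, set a flag, keep memory bounded) can be illustrated with a small capped buffer. This is a sketch under the assumption that output arrives in chunks; `BoundedOutput` is a hypothetical type, not the Agent's streaming implementation.

```rust
/// Hypothetical capped buffer for command output.
pub struct BoundedOutput {
    buf: Vec<u8>,
    cap: usize,
    truncated: bool,
}

impl BoundedOutput {
    pub fn new(cap: usize) -> Self {
        Self { buf: Vec::new(), cap, truncated: false }
    }

    /// Append as much of `chunk` as still fits; drop the rest and
    /// remember that something was dropped.
    pub fn push(&mut self, chunk: &[u8]) {
        let room = self.cap - self.buf.len();
        if chunk.len() > room {
            self.truncated = true;
        }
        let take = room.min(chunk.len());
        self.buf.extend_from_slice(&chunk[..take]);
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }

    pub fn truncated(&self) -> bool {
        self.truncated
    }
}
```

With `BoundedOutput::new(10 * 1024 * 1024)`, memory use for captured stdout stays at or below the 10 MB limit no matter how much the command prints.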
+
+### 11.11 Parse SLA End-to-End Benchmark
+
+```rust
+// benches/parse_sla_bench.rs
+
+#[tokio::test]
+async fn parse_plus_classify_pipeline_under_5s_p95() {
+    let stack = E2EStack::start().await;
+    let mut latencies = vec![];
+
+    for _ in 0..100 {
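+        let start = Instant::now();
+        let resp = stack.api()
+            .post("/v1/run/runbooks/parse-preview")
+            .json(&json!({ "raw_text": FIXTURE_RUNBOOK_10_STEPS }))
+            .send().await;
+        latencies.push(start.elapsed());
+        // A fast failure must not count toward meeting the SLA
+        assert_eq!(resp.status(), 200);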
+    }
+
+    let p95 = percentile(&latencies, 95);
+    assert!(p95 < Duration::from_secs(5),
+        "Parse+Classify p95 latency: {:?} — exceeds 5s SLA", p95);
+}
+```
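The benchmark above calls a `percentile` helper that is not defined in this addendum. A minimal sketch using the nearest-rank method follows; the signature is inferred from the call site and is an assumption.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least
/// `pct` percent of all samples are less than or equal to it.
pub fn percentile(samples: &[Duration], pct: usize) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    assert!((1..=100).contains(&pct), "pct must be in 1..=100");
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based rank = ceil(pct / 100 * n), computed with integer arithmetic.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted[rank - 1]
}
```

With 100 samples, `percentile(&latencies, 95)` returns the 95th smallest latency, so up to five slow outliers are tolerated before the SLA assertion fails.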
+
+### 11.12 Updated Test Pyramid (Post-Review)
+
+The Execution Engine ratio shifts from 80/15/5 to 60/30/10 per review recommendation:
+
+| Component | Unit | Integration | E2E |
+|-----------|------|-------------|-----|
+| Safety Scanner | 80% | 15% | 5% |
+| Merge Engine | 90% | 10% | 0% |
+| Execution Engine | **60%** | **30%** | **10%** |
+| Parser | 50% | 40% | 10% |
+| Approval Workflow | 70% | 20% | 10% |
+| Audit Trail | 60% | 35% | 5% |
+| Agent | 50% | 35% | 15% |
+| Dashboard API | 40% | 50% | 10% |
+
+*End of Review Remediation Addendum*