Implement review remediation + PLG analytics SDK

- All 6 test architectures patched with Section 11 addendums
- P5 (cost) fully rewritten from 232 to ~600 lines
- PLG brainstorm + party mode advisory board results
- Analytics SDK v2 (PostHog Cloud, Zod strict, Lambda-safe)
- Analytics tests v2 (safeParse, no , no timestamp, no PII)
- Addresses all Gemini review findings across P1-P6
2026-03-01 01:42:49 +00:00
parent 2fe0ed856e
commit 03bfe931fc
9 changed files with 2950 additions and 85 deletions


@@ -1760,3 +1760,527 @@ Before writing the `impl ExecutionEngine { pub async fn execute(...) }` function
5. `engine_pauses_in_flight_execution_when_panic_mode_set`
Only once these tests are defined can the state machine be implemented to make them pass (Green phase). This ensures no execution path can bypass the Trust Gradient.
---
## 11. Review Remediation Addendum (Post-Gemini Review)
The following sections address all gaps identified in the TDD review. These are net-new test specifications that must be integrated into the relevant sections above during implementation.
### 11.1 Missing Epic Coverage
#### Epic 3.4: Divergence Analysis
```rust
// pkg/executor/divergence/tests.rs
#[test] fn divergence_detects_extra_command_not_in_runbook() {}
#[test] fn divergence_detects_modified_command_vs_prescribed() {}
#[test] fn divergence_detects_skipped_step_not_marked_as_skipped() {}
#[test] fn divergence_report_includes_diff_of_prescribed_vs_actual() {}
#[test] fn divergence_flags_env_var_changes_made_during_execution() {}
#[test] fn divergence_ignores_whitespace_differences_in_commands() {}
#[test] fn divergence_analysis_runs_automatically_after_execution_completes() {}
#[test] fn divergence_report_written_to_audit_trail() {}
#[tokio::test]
async fn integration_divergence_analysis_detects_agent_side_extra_commands() {
// Agent executes an extra `whoami` not in the runbook
// Divergence analyzer must flag it
}
```
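As a rough illustration of what the divergence tests above could exercise, here is a minimal positional comparison. The `Divergence` enum, `normalize`, and `analyze` are hypothetical names, not part of the existing codebase, and a real analyzer would likely need a proper sequence diff, since an extra command in the middle of a run shows up here as a chain of modifications.

```rust
/// Kinds of divergence between the runbook and what actually ran (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Divergence {
    /// The runbook prescribed a command at this position but nothing ran.
    Skipped { step: usize, prescribed: String },
    /// The command that ran at this position differs from the runbook.
    Modified { step: usize, prescribed: String, actual: String },
    /// A command ran after the last prescribed step.
    Extra { actual: String },
}

/// Collapse runs of whitespace so formatting differences are not flagged.
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Compare prescribed and actual commands position by position.
pub fn analyze(prescribed: &[&str], actual: &[&str]) -> Vec<Divergence> {
    let mut report = Vec::new();
    for (step, want) in prescribed.iter().enumerate() {
        let want = normalize(want);
        match actual.get(step).map(|got| normalize(got)) {
            None => report.push(Divergence::Skipped { step, prescribed: want }),
            Some(got) if got != want => report.push(Divergence::Modified {
                step,
                prescribed: want,
                actual: got,
            }),
            Some(_) => {}
        }
    }
    // Anything the agent ran beyond the prescribed steps is an extra command.
    for got in actual.iter().skip(prescribed.len()) {
        report.push(Divergence::Extra { actual: normalize(got) });
    }
    report
}
```

With this sketch, a run of `["ls /tmp", "whoami"]` against a runbook containing only `"ls   /tmp"` would report a single `Extra` entry for `whoami`, and the whitespace difference would be ignored.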
#### Epic 5.3: Compliance Export
```rust
// pkg/audit/export/tests.rs
#[tokio::test] async fn export_generates_valid_csv_for_date_range() {}
#[tokio::test] async fn export_generates_valid_pdf_with_execution_summary() {}
#[tokio::test] async fn export_uploads_to_s3_and_returns_presigned_url() {}
#[tokio::test] async fn export_presigned_url_expires_after_24_hours() {}
#[tokio::test] async fn export_scoped_to_tenant_via_rls() {}
#[tokio::test] async fn export_includes_hash_chain_verification_status() {}
#[tokio::test] async fn export_redacts_command_output_but_includes_hashes() {}
```
#### Epic 6.4: Classification Query API Rate Limiting
```rust
// tests/integration/api_rate_limit_test.rs
#[tokio::test]
async fn api_rate_limit_30_requests_per_minute_per_tenant() {
let stack = E2EStack::start().await;
for _ in 0..30 {
let resp = stack.api().get("/v1/run/classifications").send().await;
assert_eq!(resp.status(), 200);
}
// 31st request must be rate-limited
let resp = stack.api().get("/v1/run/classifications").send().await;
assert_eq!(resp.status(), 429);
}
#[tokio::test]
async fn api_rate_limit_resets_after_60_seconds() {}
#[tokio::test]
async fn api_rate_limit_is_per_tenant_not_global() {
// Tenant A hitting limit must not affect Tenant B
}
#[tokio::test]
async fn api_rate_limit_returns_retry_after_header() {}
```
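The rate-limit tests above imply a per-tenant counter with a 60-second window and a `Retry-After` value. A minimal fixed-window sketch follows; `TenantRateLimiter` and `Decision` are hypothetical names, and the real service might use a sliding window or a shared store such as Redis instead. Time is passed in explicitly so the reset behaviour can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Outcome of a rate-limit check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allowed,
    /// The caller should wait this long; maps naturally to a Retry-After header.
    Limited { retry_after: Duration },
}

/// Fixed-window limiter keyed by tenant ID.
pub struct TenantRateLimiter {
    limit: u32,
    window: Duration,
    // tenant -> (window start, requests seen in that window)
    windows: HashMap<String, (Instant, u32)>,
}

impl TenantRateLimiter {
    pub fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, windows: HashMap::new() }
    }

    /// Record one request for `tenant` at time `now`.
    pub fn check(&mut self, tenant: &str, now: Instant) -> Decision {
        let entry = self.windows.entry(tenant.to_string()).or_insert((now, 0));
        // Start a fresh window once the previous one has fully elapsed.
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            Decision::Allowed
        } else {
            let elapsed = now.duration_since(entry.0);
            Decision::Limited { retry_after: self.window - elapsed }
        }
    }
}
```

Constructed as `TenantRateLimiter::new(30, Duration::from_secs(60))`, the 31st call for one tenant within the window returns `Limited`, while a different tenant is unaffected.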
#### Epic 7: Dashboard UI (Playwright)
```typescript
// tests/e2e/ui/dashboard.spec.ts
test('parse preview renders within 5 seconds of paste', async ({ page }) => {
await page.goto('/dashboard/runbooks/new');
await page.fill('[data-testid="runbook-input"]', FIXTURE_RUNBOOK);
const preview = page.locator('[data-testid="parse-preview"]');
await expect(preview).toBeVisible({ timeout: 5000 });
await expect(preview.locator('.step-card')).toHaveCount(4);
});
test('trust level visualization shows correct colors per step', async ({ page }) => {
// 🟢 safe = green, 🟡 caution = yellow, 🔴 dangerous = red
});
test('MTTR dashboard loads and displays chart', async ({ page }) => {
await page.goto('/dashboard/analytics');
await expect(page.locator('[data-testid="mttr-chart"]')).toBeVisible();
});
test('execution timeline shows real-time step progress', async ({ page }) => {});
test('approval modal requires typed confirmation for dangerous steps', async ({ page }) => {});
test('panic mode banner appears when panic is active', async ({ page }) => {});
```
#### Epic 9: Onboarding & PLG
```rust
// pkg/onboarding/tests.rs
#[test] fn free_tier_allows_5_runbooks() {}
#[test] fn free_tier_allows_50_executions_per_month() {}
#[test] fn free_tier_rejects_6th_runbook_with_upgrade_prompt() {}
#[test] fn free_tier_rejects_51st_execution_with_upgrade_prompt() {}
#[test] fn free_tier_counter_resets_monthly() {}
#[test] fn agent_install_snippet_includes_correct_api_key() {}
#[test] fn agent_install_snippet_includes_correct_gateway_url() {}
#[test] fn agent_install_snippet_is_valid_bash() {}
#[tokio::test] async fn stripe_checkout_creates_session_with_correct_pricing() {}
#[tokio::test] async fn stripe_webhook_checkout_completed_upgrades_tenant() {}
#[tokio::test] async fn stripe_webhook_subscription_deleted_downgrades_tenant() {}
#[tokio::test] async fn stripe_webhook_validates_signature() {}
```
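The free-tier tests above could be driven by a small quota tracker like the one below. This is only a sketch: `FreeTierUsage`, `UpgradeRequired`, and the `(year, month)` period key are assumptions made for illustration, and the limits are the ones named in the test titles (5 runbooks, 50 executions per month).

```rust
pub const FREE_RUNBOOKS: u32 = 5;
pub const FREE_EXECUTIONS_PER_MONTH: u32 = 50;

/// Error carrying enough context to render an upgrade prompt (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct UpgradeRequired {
    pub resource: &'static str,
    pub limit: u32,
}

/// Per-tenant usage counters for the free tier.
#[derive(Debug, Default)]
pub struct FreeTierUsage {
    runbooks: u32,
    executions: u32,
    // (year, month) of the billing period the execution counter belongs to
    period: Option<(i32, u32)>,
}

impl FreeTierUsage {
    /// Count a new runbook, rejecting the sixth.
    pub fn add_runbook(&mut self) -> Result<(), UpgradeRequired> {
        if self.runbooks >= FREE_RUNBOOKS {
            return Err(UpgradeRequired { resource: "runbooks", limit: FREE_RUNBOOKS });
        }
        self.runbooks += 1;
        Ok(())
    }

    /// Count an execution in `period`, resetting the counter when the month changes.
    pub fn record_execution(&mut self, period: (i32, u32)) -> Result<(), UpgradeRequired> {
        if self.period != Some(period) {
            self.period = Some(period);
            self.executions = 0;
        }
        if self.executions >= FREE_EXECUTIONS_PER_MONTH {
            return Err(UpgradeRequired {
                resource: "executions",
                limit: FREE_EXECUTIONS_PER_MONTH,
            });
        }
        self.executions += 1;
        Ok(())
    }
}
```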
### 11.2 Agent-Side Security Tests (Zero-Trust Environment)
The Agent runs in customer VPCs — untrusted territory. These tests prove the Agent defends itself independently of the SaaS backend.
```rust
// pkg/agent/security/tests.rs
// Agent-side deterministic blocking (mirrors SaaS scanner)
#[test] fn agent_scanner_blocks_rm_rf_independently_of_saas() {}
#[test] fn agent_scanner_blocks_kubectl_delete_namespace_independently() {}
#[test] fn agent_scanner_blocks_drop_table_independently() {}
#[test] fn agent_scanner_rejects_command_even_if_saas_says_safe() {
// Simulates compromised SaaS sending a "safe" classification for rm -rf
// Assumes Classification implements Default for the remaining fields
let _saas_classification = Classification { risk: RiskLevel::Safe, ..Default::default() };
let agent_result = agent_scanner.classify("rm -rf /");
assert_eq!(agent_result.risk, RiskLevel::Dangerous);
// Agent MUST override SaaS classification
}
// Binary integrity
#[test] fn agent_validates_binary_checksum_on_startup() {}
#[test] fn agent_refuses_to_start_if_checksum_mismatch() {}
// Payload tampering
#[tokio::test] async fn agent_rejects_grpc_payload_with_invalid_hmac() {}
#[tokio::test] async fn agent_rejects_grpc_payload_with_expired_timestamp() {}
#[tokio::test] async fn agent_rejects_grpc_payload_with_mismatched_execution_id() {}
// Local fallback when SaaS is unreachable
#[tokio::test] async fn agent_falls_back_to_scanner_only_when_saas_disconnected() {}
#[tokio::test] async fn agent_in_fallback_mode_treats_all_unknowns_as_caution() {}
#[tokio::test] async fn agent_reconnects_automatically_when_saas_returns() {}
```
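One way to read the override and fallback tests above is that the agent combines its own scan with the SaaS verdict and never lets the result drop below what it found locally. The sketch below assumes a three-level `RiskLevel` ordered from safe to dangerous; `local_scan` and `effective_risk` are hypothetical helpers, and the pattern lists are toy examples rather than the real scanner rules.

```rust
/// Ordered so that derived `Ord` gives Safe < Caution < Dangerous.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RiskLevel {
    Safe,
    Caution,
    Dangerous,
}

/// Toy local scanner: Some(level) for known patterns, None for unknown commands.
fn local_scan(cmd: &str) -> Option<RiskLevel> {
    const DENY: [&str; 3] = ["rm -rf", "kubectl delete namespace", "drop table"];
    let lower = cmd.to_lowercase();
    if DENY.iter().any(|pattern| lower.contains(*pattern)) {
        Some(RiskLevel::Dangerous)
    } else if lower.starts_with("ls ") || lower.starts_with("kubectl get ") {
        Some(RiskLevel::Safe)
    } else {
        None
    }
}

/// Combine the local verdict with the SaaS verdict (None when disconnected).
pub fn effective_risk(cmd: &str, saas: Option<RiskLevel>) -> RiskLevel {
    match (local_scan(cmd), saas) {
        // Both have an opinion: the stricter one wins, so SaaS cannot downgrade.
        (Some(local), Some(remote)) => local.max(remote),
        (Some(local), None) => local,
        (None, Some(remote)) => remote,
        // Fallback mode with an unknown command: treat as caution.
        (None, None) => RiskLevel::Caution,
    }
}
```

Under these assumptions, `effective_risk("rm -rf /", Some(RiskLevel::Safe))` evaluates to `RiskLevel::Dangerous`, which is the behaviour `agent_scanner_rejects_command_even_if_saas_says_safe` asks for.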
### 11.3 Realistic Sandbox Matrix
Replace the Alpine-only sandbox with a matrix of realistic execution targets.
```rust
// tests/integration/sandbox_matrix_test.rs
#[rstest]
#[case("ubuntu:22.04")]
#[case("amazonlinux:2023")]
#[case("alpine:3.19")]
#[tokio::test]
async fn sandbox_safe_command_executes_on_all_targets(#[case] image: &str) {
let sandbox = SandboxContainer::start(image).await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("ls /tmp").await.unwrap();
assert_eq!(result.exit_code, 0);
}
#[rstest]
#[case("ubuntu:22.04")]
#[case("amazonlinux:2023")]
#[tokio::test]
async fn sandbox_dangerous_command_blocked_on_all_targets(#[case] image: &str) {
let sandbox = SandboxContainer::start(image).await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("rm -rf /").await;
assert!(result.is_err());
}
// Non-root execution
#[tokio::test]
async fn sandbox_agent_runs_as_non_root_user() {
let sandbox = SandboxContainer::start_as_user("ubuntu:22.04", "dd0c-agent").await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("whoami").await.unwrap();
assert_eq!(result.stdout.trim(), "dd0c-agent");
}
#[tokio::test]
async fn sandbox_non_root_agent_cannot_escalate_to_root() {
let sandbox = SandboxContainer::start_as_user("ubuntu:22.04", "dd0c-agent").await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
let result = agent.execute("sudo cat /etc/shadow").await;
assert!(result.is_err() || result.unwrap().exit_code != 0);
}
// RBAC-restricted K3s
#[tokio::test]
async fn sandbox_k3s_rbac_denies_kubectl_delete_namespace() {
let k3s = K3sContainer::start_with_rbac("read-only-role").await;
let agent = TestAgent::with_kubeconfig(k3s.kubeconfig()).await;
let result = agent.execute("kubectl delete namespace default").await;
// Should be blocked by BOTH scanner AND K8s RBAC
assert!(result.is_err());
}
```
### 11.4 Advanced Command Injection Tests
```rust
// pkg/classifier/scanner/injection_tests.rs
// Semicolon injection
#[test] fn scanner_semicolon_rm_rf_is_dangerous() {
assert_dangerous("echo hello; rm -rf /");
}
// Pipe injection
#[test] fn scanner_pipe_to_rm_is_dangerous() {
assert_dangerous("find / -name '*.log' | xargs rm -rf");
}
// Backtick injection
#[test] fn scanner_backtick_rm_is_dangerous() {
assert_dangerous("echo `rm -rf /`");
}
// $() substitution (already tested, but more variants)
#[test] fn scanner_nested_substitution_is_dangerous() {
assert_dangerous("echo $(echo $(rm -rf /))");
}
// Newline injection
#[test] fn scanner_newline_injection_is_dangerous() {
assert_dangerous("echo safe\nrm -rf /");
}
// Null byte injection
#[test] fn scanner_null_byte_injection_is_dangerous() {
assert_dangerous("echo safe\0rm -rf /");
}
// Unicode homoglyph attack
#[test] fn scanner_unicode_homoglyph_rm_is_dangerous() {
// Uses Cyrillic 'м' (U+043C) in place of Latin 'm'
assert_dangerous("rм -rf /"); // Scanner must fold homoglyphs before matching
}
// Base64 encoded payload
#[test] fn scanner_base64_decode_pipe_bash_is_dangerous() {
assert_dangerous("echo cm0gLXJmIC8= | base64 -d | bash");
}
// Heredoc injection
#[test] fn scanner_heredoc_with_destructive_is_dangerous() {
assert_dangerous("cat << EOF | bash\nrm -rf /\nEOF");
}
// Environment variable expansion
#[test] fn scanner_env_var_expansion_to_rm_is_dangerous() {
assert_dangerous("$CMD"); // Unknown variable expansion = unknown, not safe
}
```
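To make the injection cases above concrete, the following sketch shows one possible shape for the check behind `assert_dangerous`: fold look-alike characters, split the input on shell metacharacters, then inspect each segment. The function names (`fold_homoglyphs`, `has_recursive_rm`, `is_dangerous`) are hypothetical, the homoglyph table covers only two characters, and the rules deliberately over-approximate. A production scanner would need a real shell parser and a full confusables table.

```rust
/// Fold a couple of Cyrillic look-alikes to ASCII (illustrative, not exhaustive).
fn fold_homoglyphs(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{043C}' => 'm', // Cyrillic small em
            '\u{0435}' => 'e', // Cyrillic small ie
            other => other,
        })
        .collect()
}

/// True if the tokens contain `rm` followed by a recursive flag.
fn has_recursive_rm(tokens: &[&str]) -> bool {
    match tokens.iter().position(|t| *t == "rm") {
        Some(i) => tokens[i + 1..].iter().any(|t| {
            *t == "--recursive"
                || (t.starts_with('-')
                    && !t.starts_with("--")
                    && t.contains(|c: char| c == 'r' || c == 'R'))
        }),
        None => false,
    }
}

/// Flag a command if any segment looks destructive or cannot be resolved.
pub fn is_dangerous(cmd: &str) -> bool {
    let folded = fold_homoglyphs(cmd);
    // Characters that can begin a new command or a substitution.
    let segments = folded
        .split(|c: char| matches!(c, ';' | '|' | '&' | '\n' | '\0' | '`' | '(' | ')'));
    for (index, segment) in segments.enumerate() {
        let tokens: Vec<&str> = segment.split_whitespace().collect();
        let first = match tokens.first() {
            Some(first) => *first,
            None => continue,
        };
        // A bare variable as the command name cannot be classified statically.
        let unknown_expansion = first.starts_with('$') && first.len() > 1;
        // A shell appearing after a separator may execute arbitrary piped input.
        let piped_shell = index > 0 && matches!(first, "sh" | "bash" | "zsh");
        if unknown_expansion || piped_shell || has_recursive_rm(&tokens) {
            return true;
        }
    }
    false
}
```

The base64 case is caught by the piped-shell rule rather than by decoding the payload, which is a design choice of this sketch: it avoids having to anticipate every encoding an attacker might use.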
### 11.5 Privilege Escalation Tests
```rust
// pkg/classifier/scanner/escalation_tests.rs
#[test] fn scanner_sudo_anything_is_at_least_caution() {
assert_at_least_caution("sudo systemctl restart nginx");
}
#[test] fn scanner_sudo_rm_is_dangerous() {
assert_dangerous("sudo rm -rf /var/log");
}
#[test] fn scanner_su_root_is_dangerous() {
assert_dangerous("su - root -c 'rm -rf /'");
}
#[test] fn scanner_chmod_suid_is_dangerous() {
assert_dangerous("chmod u+s /usr/bin/find");
}
#[test] fn scanner_chown_root_is_caution() {
assert_at_least_caution("chown root:root /tmp/exploit");
}
#[test] fn scanner_nsenter_is_dangerous() {
assert_dangerous("nsenter --target 1 --mount --uts --ipc --net --pid");
}
#[test] fn scanner_docker_run_privileged_is_dangerous() {
assert_dangerous("docker run --privileged -v /:/host ubuntu");
}
#[test] fn scanner_kubectl_exec_as_root_is_caution() {
assert_at_least_caution("kubectl exec -it pod -- /bin/bash");
}
```
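The "at least caution" wording in the tests above suggests a floor: `sudo` raises whatever the wrapped command would score to a minimum of caution, while some commands are dangerous outright. A minimal sketch of that idea follows. `escalation_risk` and the rule set are assumptions made for illustration, not the scanner's actual rules.

```rust
/// Ordered so that derived `Ord` gives Safe < Caution < Dangerous.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RiskLevel {
    Safe,
    Caution,
    Dangerous,
}

/// Classify a single command for privilege-escalation risk (toy rules).
pub fn escalation_risk(cmd: &str) -> RiskLevel {
    let tokens: Vec<&str> = cmd.split_whitespace().collect();
    let first = tokens.first().copied().unwrap_or("");
    let rest: &[&str] = if tokens.is_empty() { &[] } else { &tokens[1..] };

    match first {
        // sudo never lowers risk: classify the wrapped command, then apply a floor.
        "sudo" => escalation_risk(&rest.join(" ")).max(RiskLevel::Caution),
        "su" | "nsenter" => RiskLevel::Dangerous,
        // Setting setuid/setgid bits, e.g. `chmod u+s`.
        "chmod" if rest.iter().any(|t| t.contains("+s")) => RiskLevel::Dangerous,
        // Any short or long flag containing 'r' is treated as recursive (over-approximate).
        "rm" if rest.iter().any(|t| t.starts_with('-') && t.contains('r')) => {
            RiskLevel::Dangerous
        }
        "docker" if rest.contains(&"--privileged") => RiskLevel::Dangerous,
        "chown" => RiskLevel::Caution,
        "kubectl" if rest.first() == Some(&"exec") => RiskLevel::Caution,
        _ => RiskLevel::Safe,
    }
}
```

A helper such as `assert_at_least_caution(cmd)` could then be written as `assert!(escalation_risk(cmd) >= RiskLevel::Caution)`, relying on the derived ordering.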
### 11.6 Rollback Failure & Nested Failure Tests
```rust
// pkg/executor/rollback/tests.rs
#[test] fn rollback_failure_transitions_to_manual_intervention() {
let mut engine = ExecutionEngine::new();
engine.transition(State::RollingBack);
engine.report_rollback_failure("rollback command timed out");
assert_eq!(engine.state(), State::ManualIntervention);
}
#[test] fn rollback_failure_does_not_retry_automatically() {
// Rollback failures are terminal — no auto-retry
}
#[test] fn rollback_timeout_kills_rollback_process_after_300s() {}
#[tokio::test(start_paused = true)]
async fn rollback_hanging_indefinitely_triggers_manual_intervention_after_timeout() {
let mut engine = ExecutionEngine::with_rollback_timeout(Duration::from_secs(5));
engine.transition(State::RollingBack);
// Simulate rollback that never completes
tokio::time::advance(Duration::from_secs(6)).await;
assert_eq!(engine.state(), State::ManualIntervention);
}
#[test] fn manual_intervention_state_sends_slack_alert_to_oncall() {}
#[test] fn manual_intervention_state_logs_full_context_to_audit() {}
```
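The rollback tests above describe a small state machine in which both an explicit failure and a timeout end in manual intervention, with no automatic retry. The sketch below models only that slice. `RollbackTracker` is a hypothetical stand-in for the relevant part of `ExecutionEngine`, and elapsed time is fed in through `tick` so the timeout path can be tested deterministically.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    RollingBack,
    RolledBack,
    ManualIntervention,
}

/// Tracks a single rollback attempt (hypothetical type).
pub struct RollbackTracker {
    state: State,
    timeout: Duration,
    elapsed: Duration,
}

impl RollbackTracker {
    pub fn start(timeout: Duration) -> Self {
        Self { state: State::RollingBack, timeout, elapsed: Duration::ZERO }
    }

    pub fn state(&self) -> State {
        self.state
    }

    pub fn report_success(&mut self) {
        if self.state == State::RollingBack {
            self.state = State::RolledBack;
        }
    }

    /// A failed rollback is terminal: escalate to a human, never retry.
    pub fn report_failure(&mut self, _reason: &str) {
        if self.state == State::RollingBack {
            self.state = State::ManualIntervention;
        }
    }

    /// Advance the clock; a rollback that outlives its timeout is escalated.
    pub fn tick(&mut self, dt: Duration) {
        if self.state != State::RollingBack {
            return;
        }
        self.elapsed += dt;
        if self.elapsed >= self.timeout {
            self.state = State::ManualIntervention;
        }
    }
}
```

Because every transition is guarded by `state == RollingBack`, a late success report cannot move the tracker out of `ManualIntervention`.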
### 11.7 Double Execution & Network Partition Tests
```rust
// pkg/executor/idempotency/tests.rs
#[tokio::test]
async fn agent_reconnect_after_partition_resyncs_already_executed_step() {
let stack = E2EStack::start().await;
let execution = stack.start_execution().await;
let step_id = execution.steps[0].id.clone(); // assumes the execution exposes its steps
// Agent executes step successfully
stack.wait_for_step_state(&execution.id, &step_id, "executing").await;
// Network partition AFTER execution but BEFORE ACK
stack.partition_agent().await;
// Agent reconnects
stack.heal_partition().await;
// Engine must recognize step was already executed — no double execution
let step = stack.get_step(&execution.id, &step_id).await;
assert_eq!(step.execution_count, 1); // Exactly once
}
#[tokio::test]
async fn engine_does_not_re_send_command_after_agent_reconnect_if_step_completed() {}
#[tokio::test]
async fn engine_re_sends_command_if_agent_never_started_execution_before_partition() {}
```
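Exactly-once behaviour across a partition is commonly achieved by having the agent remember which step IDs it has already run and replay the stored result when the same step is sent again. The sketch below shows that idea under stated assumptions. `AgentLedger` and `StepResult` are hypothetical, the ledger lives in memory only, and a real agent would need to persist it to survive restarts.

```rust
use std::collections::HashMap;

/// Outcome of running one step (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StepResult {
    pub exit_code: i32,
}

/// Agent-side record of completed steps, keyed by step ID.
#[derive(Default)]
pub struct AgentLedger {
    completed: HashMap<String, StepResult>,
}

impl AgentLedger {
    /// Run `run` only if `step_id` has not completed before; otherwise replay the result.
    pub fn execute_once<F>(&mut self, step_id: &str, run: F) -> StepResult
    where
        F: FnOnce() -> StepResult,
    {
        if let Some(previous) = self.completed.get(step_id) {
            return previous.clone();
        }
        let result = run();
        self.completed.insert(step_id.to_string(), result.clone());
        result
    }
}
```

If the engine re-sends a command after reconnect, the second `execute_once` call with the same step ID returns the cached result without invoking the closure, so the step's side effects happen once.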
### 11.8 Slack Payload Forgery Tests
```rust
// tests/integration/slack_security_test.rs
#[tokio::test]
async fn slack_approval_webhook_rejects_missing_signature() {
let stack = E2EStack::start().await;
let resp = stack.api()
.post("/v1/run/slack/actions")
.json(&fixture_approval_payload())
// No X-Slack-Signature header
.send().await;
assert_eq!(resp.status(), 401);
}
#[tokio::test]
async fn slack_approval_webhook_rejects_invalid_signature() {
let stack = E2EStack::start().await;
let resp = stack.api()
.post("/v1/run/slack/actions")
.header("X-Slack-Signature", "v0=invalid_hmac")
.header("X-Slack-Request-Timestamp", &now_timestamp())
.json(&fixture_approval_payload())
.send().await;
assert_eq!(resp.status(), 401);
}
#[tokio::test]
async fn slack_approval_webhook_rejects_replayed_timestamp() {
let stack = E2EStack::start().await;
// Timestamp older than 5 minutes
let resp = stack.api()
.post("/v1/run/slack/actions")
.header("X-Slack-Signature", &valid_signature_for_old_timestamp())
.header("X-Slack-Request-Timestamp", &five_minutes_ago())
.json(&fixture_approval_payload())
.send().await;
assert_eq!(resp.status(), 401);
}
#[tokio::test]
async fn slack_approval_webhook_rejects_cross_tenant_approval() {
// Tenant A's user trying to approve Tenant B's execution
}
```
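The forgery tests above rest on Slack's request-signing scheme: the server rebuilds the base string `v0:{timestamp}:{body}`, computes an HMAC-SHA256 over it with the signing secret, and compares the result with the `X-Slack-Signature` header, after first rejecting stale timestamps. The sketch below covers only the parts that need no crypto dependency. The function names are hypothetical, the HMAC step itself is left to whichever crate the project uses, and the strict `<` comparison is an assumption chosen so that a timestamp exactly five minutes old is rejected, as the replay test expects.

```rust
/// Maximum accepted age of a request, in seconds (five minutes).
pub const MAX_SKEW_SECS: u64 = 300;

/// Build the string that is fed to HMAC-SHA256 with the signing secret.
pub fn signature_base_string(timestamp: &str, body: &str) -> String {
    format!("v0:{timestamp}:{body}")
}

/// Reject unparsable timestamps and anything five minutes or more from `now_unix`.
pub fn is_fresh(timestamp: &str, now_unix: i64) -> bool {
    match timestamp.parse::<i64>() {
        Ok(ts) => now_unix.abs_diff(ts) < MAX_SKEW_SECS,
        Err(_) => false,
    }
}

/// Compare two signatures without returning early on the first differing byte.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

A handler would typically call `is_fresh` first, then compute the expected signature over `signature_base_string(...)` and check it with `constant_time_eq`, returning 401 on any failure.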
### 11.9 Audit Log Encryption Tests
```rust
// tests/integration/audit_encryption_test.rs
#[tokio::test]
async fn audit_log_command_field_is_encrypted_at_rest() {
let db = TestDb::start().await;
// Insert an audit event with a command
insert_audit_event(&db, "kubectl get pods").await;
// Read raw bytes from PostgreSQL — must NOT contain plaintext command
let raw = db.query_raw_bytes("SELECT command FROM audit_events LIMIT 1").await;
assert!(!String::from_utf8_lossy(&raw).contains("kubectl get pods"),
"Command stored in plaintext — must be encrypted");
}
#[tokio::test]
async fn audit_log_output_field_is_encrypted_at_rest() {
let db = TestDb::start().await;
insert_audit_event_with_output(&db, "sensitive output data").await;
let raw = db.query_raw_bytes("SELECT output FROM audit_events LIMIT 1").await;
assert!(!String::from_utf8_lossy(&raw).contains("sensitive output data"));
}
#[tokio::test]
async fn audit_log_decryption_requires_kms_key() {
// Verify the app role can decrypt using the KMS key
let db = TestDb::start().await;
insert_audit_event(&db, "kubectl get pods").await;
let decrypted = db.as_app_role()
.query("SELECT decrypt_command(command) FROM audit_events LIMIT 1").await;
assert_eq!(decrypted, "kubectl get pods");
}
```
### 11.10 gRPC Output Buffer Limits
```rust
// pkg/agent/streaming/tests.rs
#[tokio::test]
async fn agent_truncates_stdout_at_10mb() {
let sandbox = SandboxContainer::start("ubuntu:22.04").await;
let agent = TestAgent::connect_to(sandbox.socket_path()).await;
// Generate well over 10MB of output (50MB of random bytes, base64-encoded)
let result = agent.execute("dd if=/dev/urandom bs=1M count=50 | base64").await.unwrap();
// Agent must truncate, not OOM
assert!(result.stdout.len() <= 10 * 1024 * 1024);
assert!(result.truncated);
}
#[tokio::test]
async fn agent_streams_output_in_chunks_not_buffered() {
// Verify output arrives incrementally, not all at once after completion
}
#[tokio::test]
async fn agent_memory_stays_under_256mb_during_large_output() {
// Memory profiling test — agent must not OOM on `cat /dev/urandom`
}
#[tokio::test]
async fn engine_handles_truncated_output_gracefully() {
// Engine receives truncated flag and logs warning
}
```
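The truncation tests above can be satisfied by a capture buffer that stops accepting bytes once it reaches its cap and records that it did so. A minimal sketch follows, with `BoundedOutput` as a hypothetical name. The 10 MB figure from the tests would be passed in as `cap`.

```rust
/// Output buffer that never grows beyond `cap` bytes (hypothetical type).
pub struct BoundedOutput {
    buf: Vec<u8>,
    cap: usize,
    truncated: bool,
}

impl BoundedOutput {
    pub fn new(cap: usize) -> Self {
        Self { buf: Vec::new(), cap, truncated: false }
    }

    /// Append as much of `chunk` as still fits; remember if anything was dropped.
    pub fn push(&mut self, chunk: &[u8]) {
        let room = self.cap - self.buf.len();
        if chunk.len() > room {
            self.truncated = true;
        }
        let take = room.min(chunk.len());
        self.buf.extend_from_slice(&chunk[..take]);
    }

    pub fn bytes(&self) -> &[u8] {
        &self.buf
    }

    pub fn truncated(&self) -> bool {
        self.truncated
    }
}
```

Because `push` is called per chunk, the agent can keep draining the child process's pipe after the cap is hit, so the command does not block on a full pipe while memory stays bounded.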
### 11.11 Parse SLA End-to-End Benchmark
```rust
// benches/parse_sla_bench.rs
#[tokio::test]
async fn parse_plus_classify_pipeline_under_5s_p95() {
let stack = E2EStack::start().await;
let mut latencies = vec![];
for _ in 0..100 {
let start = Instant::now();
let resp = stack.api()
.post("/v1/run/runbooks/parse-preview")
.json(&json!({ "raw_text": FIXTURE_RUNBOOK_10_STEPS }))
.send().await;
latencies.push(start.elapsed());
// A fast error response must not count toward the SLA
assert_eq!(resp.status(), 200);
}
let p95 = percentile(&latencies, 95);
assert!(p95 < Duration::from_secs(5),
"Parse+Classify p95 latency: {:?} — exceeds 5s SLA", p95);
}
```
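The benchmark above calls a `percentile` helper that is not defined in this section. One plausible definition, using the nearest-rank method, is sketched here. The signature is inferred from the call site `percentile(&latencies, 95)` and is an assumption.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least
/// `pct` percent of all samples are less than or equal to it.
pub fn percentile(samples: &[Duration], pct: u32) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    assert!((1..=100).contains(&pct), "pct must be in 1..=100");
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based rank = ceil(pct / 100 * n), computed in integer arithmetic.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    sorted[rank - 1]
}
```

With 100 samples, the 95th percentile under this definition is the 95th smallest latency, so up to five slow outliers can exceed the SLA without failing the assertion.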
### 11.12 Updated Test Pyramid (Post-Review)
The Execution Engine ratio shifts from 80/15/5 to 60/30/10 per review recommendation:
| Component | Unit | Integration | E2E |
|-----------|------|-------------|-----|
| Safety Scanner | 80% | 15% | 5% |
| Merge Engine | 90% | 10% | 0% |
| Execution Engine | **60%** | **30%** | **10%** |
| Parser | 50% | 40% | 10% |
| Approval Workflow | 70% | 20% | 10% |
| Audit Trail | 60% | 35% | 5% |
| Agent | 50% | 35% | 15% |
| Dashboard API | 40% | 50% | 10% |
*End of Review Remediation Addendum*