# dd0c/run - Test Architecture Review

**Reviewer:** Max Mayfield (Senior TDD Consultant)
**Status:** Requires Remediation 🔴

Look, this is the highest-risk product in the suite. You are literally executing commands in production environments. The test architecture is a solid start, and the "Safety-First TDD" mandate is great in theory, but there are massive blind spots here. If this ships as-is, you're going to have an existential incident. Here is the rigorous breakdown of where this architecture falls short and what needs fixing immediately.

## 1. Coverage Gaps Per Epic

You've focused heavily on the core state machine but completely ghosted several critical epics:

* **Epic 3 (Execution Engine):** Story 3.4 (Divergence Analysis) is completely missing. No tests for the post-execution analyzer comparing prescribed vs. executed commands or flagging unlisted actions.
* **Epic 5 (Audit Trail):** Story 5.3 (Compliance Export) is absent. Where are the tests for generating the SOC 2 PDF/CSV exports and verifying the S3 links?
* **Epic 6 (Dashboard API):** Story 6.4 (Classification Query API) requires rate limiting at 30 req/min per tenant. There are zero rate-limiting integration tests specified.
* **Epic 7 (Dashboard UI):** SPA testing is totally ignored. You need Cypress or Playwright tests for the "5-second wow moment" parse preview, Trust Level visualizations, and the MTTR dashboard.
* **Epic 9 (Onboarding & PLG):** The entire self-serve tenant provisioning flow, free-tier limit enforcement (5 runbooks / 50 executions), and Agent installation snippet generation lack test coverage.

## 2. "Safety-First" TDD Enforcement

The policy is strong (Canary suite, red-safety, 95% threshold), but it's overly centralized on the SaaS side.

* **The Gap:** The Agent runs in customer VPCs (untrusted territory). If the gRPC payload is intercepted or the API gateway is compromised, the SaaS scanner means nothing.
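To make that gap concrete, here is a minimal sketch of the kind of Agent-local, deterministic blocking test that should exist: no network, no LLM, evaluated entirely inside the customer VPC. All names here (`agent_scan`, `Verdict`, the pattern list) are illustrative assumptions, not the real codebase.

```python
# Hypothetical sketch: a deterministic Agent-side scanner that must block
# destructive commands even when the SaaS scanner is unreachable or bypassed.
import re
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"

# Deterministic denylist evaluated locally on the Agent (illustrative patterns).
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),       # recursive deletion from root-ish paths
    re.compile(r"\bmkfs(\.\w+)?\b"),     # filesystem re-creation
    re.compile(r"\bdd\b.*\bof=/dev/"),   # raw-device overwrite
]
# Shell tokens that can smuggle a second command into a prescribed one.
INJECTION_TOKENS = [";", "|", "`", "$(", "\n"]

def agent_scan(command: str) -> Verdict:
    """Block destructive commands and chained injections, with zero SaaS calls."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return Verdict.BLOCK
    # A prescribed runbook step should never need shell chaining; treat chaining
    # tokens combined with a destructive verb as hostile.
    if any(tok in command for tok in INJECTION_TOKENS) and "rm" in command:
        return Verdict.BLOCK
    return Verdict.ALLOW

# The tests that are missing today: tampered payloads must still be blocked locally.
assert agent_scan("systemctl restart nginx") is Verdict.ALLOW
assert agent_scan("echo ok; rm -rf /") is Verdict.BLOCK
assert agent_scan("echo $(rm data)") is Verdict.BLOCK
```

The point of the sketch is the test shape, not the pattern list: the assertions must pass with the SaaS connection mocked as dead.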
You mention the Agent-Side Scanner in the thresholds, but there is no explicit TDD mandate for testing Agent-side deterministic blocking, binary integrity, or defense against payload tampering.

## 3. Test Pyramid Evaluation

Your execution engine ratio is `80% Unit, 15% Integration, 5% E2E`.

* **The Reality Check:** That unit ratio is too high for the *orchestration* layer. The in-memory state machine is pure logic, but distributed state is where things break. Shift the Execution Engine to at least `60% Unit, 30% Integration, 10% E2E`, with more integration tests proving state persistence during RDS failovers, Redis evictions, and gRPC disconnects. Pure unit tests won't catch a distributed race condition.

## 4. Anti-Patterns & Sandboxing

You specified an Alpine DinD container for sandboxed execution tests. This is a massive anti-pattern.

* **Why it's bad:** Alpine doesn't reflect real-world AWS AMIs, Ubuntu instances, or RBAC-restricted Kubernetes environments. Testing `echo $(rm -rf /)` in Alpine proves nothing about how your agent handles missing binaries, shell variations (bash vs. zsh vs. sh), or Kubernetes permission-denied errors.
* **The Fix:** Sandbox tests must use realistic targets: mocked EC2 AMIs, RBAC-configured K3s clusters (not just a blanket k3s mock), and varying user privileges (root vs. limited user).

## 5. Transparent Factory Gaps

You hit most of the tenets, but you missed some critical enforcement mechanics:

* **Atomic Flagging:** You test the 48-hour bake logic, but does the CI script *actually* fail the build if a destructive flag bypasses it? You need an integration test against the CI validation script itself.
* **Elastic Schema:** You test that app roles can't `UPDATE`/`DELETE` the audit log. But Semantic Observability requires "encrypted audit logs." There are **zero tests** verifying data-at-rest encryption (KMS or pg_crypto) for the execution logs.
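One way to write that missing check without coupling it to a specific scheme (pg_crypto or KMS envelope): bypass the application's decryption path, read the raw column bytes, and assert the plaintext never appears. A minimal sketch, where `encrypted_blob` stands in for a raw `bytea` read (e.g. via a direct psycopg query in the real test):

```python
# Hypothetical sketch of the missing data-at-rest test. Whatever the cipher,
# a raw read of the audit column must never contain the plaintext command.
import os

def assert_ciphertext(raw: bytes, plaintexts: list[str]) -> None:
    """Fail if any known plaintext command appears verbatim in the raw bytes."""
    for secret in plaintexts:
        assert secret.encode() not in raw, f"plaintext leaked at rest: {secret!r}"

commands = ["kubectl delete pod api-7f9", "psql -c 'TRUNCATE sessions'"]

# Stand-in for a correctly encrypted column read: opaque bytes, no plaintext.
encrypted_blob = os.urandom(256)
assert_ciphertext(encrypted_blob, commands)  # passes

# Stand-in for the failure mode this review flags: command stored in the clear.
leaky_blob = b"header|" + commands[0].encode() + b"|footer"
try:
    assert_ciphertext(leaky_blob, commands)
    raise SystemExit("expected the leak to be caught")
except AssertionError:
    pass  # the check correctly fails when the audit log is unencrypted
```

This is a leakage check, not a cryptographic proof; the real suite should also assert which key (KMS key ID or pg_crypto secret) successfully decrypts the column.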
* **Configurable Autonomy:** You test the Redis logic for Panic Mode, but Story 10.5 requires a webhook endpoint `POST /admin/panic` that works in <1s. You must test the *entire* path from HTTP request to gRPC stream pause, not just the Redis key state.

## 6. Performance Gaps

* **Parse SLA:** Epic 1 dictates a 3.5s SLA for LLM extraction, but your performance benchmarks only cover the Rust normalizer (<10ms). You need an end-to-end integration benchmark proving the 5s total SLA (Parser + Classifier) using recorded LLM latency percentiles.
* **Agent Output Buffering:** You test Slack rate limiting, but what if a user runs `cat /var/log/syslog` and dumps 500MB of text? There are no memory-profiling tests or truncation-enforcement tests on the Agent side before it attempts to stream over gRPC.

## 7. Missing Threat Scenarios

Your chaos and security tests are missing these highly probable vectors:

* **Advanced Command Injection:** Testing `echo $(rm)` is elementary. Where are the tests for `; rm -rf /`, `| rm -rf /`, backticks, and newline injections?
* **Privilege Escalation:** You block `sudo`, but what if the Agent is run as root by the customer? Tests must cover execution under different user contexts to prevent horizontal escalation.
* **Rollback Failures:** You test mid-execution failure triggering a rollback. What happens if the *rollback command itself* fails or hangs indefinitely? You need a test for nested/fatal failure states.
* **Double Execution (Partial Failure):** The Agent executes a step successfully, but the network partitions *before* it sends the success ACK to the engine. Your idempotency test covers duplicate IDs, but does it handle agent reconnections re-syncing an already-executed state?
* **Slack Payload Forgery:** You test that approval timeouts mark as stalled, but what if someone curls the Slack approval webhook directly with a forged payload? You must test Slack signature verification.

## 8. Top 5 Recommendations

1. **Mandate Agent-Side Threat Testing:** The Agent is in a zero-trust environment. Write explicit E2E tests for agent binary tampering, payload forgery, and local fallback scanning if the SaaS connection drops.
2. **Upgrade the Execution Sandbox:** Ditch the plain Alpine container. Implement a matrix of execution sandboxes (Ubuntu, Amazon Linux 2023, RBAC-restricted K3s) and test execution with non-root users.
3. **Close the Epic Coverage Black Holes:** Write test architecture specs for Divergence Analysis (Story 3.4), Compliance Exports (Story 5.3), API Rate Limiting (Story 6.4), and PLG Onboarding limits (Epic 9).
4. **Test the Unknowns of Rollbacks & Double Executions:** Add explicit chaos tests for when rollback commands fail, and simulate network drops occurring exactly between command completion and gRPC acknowledgment.
5. **Enforce Cryptographic Audit Tests:** Add tests verifying that raw commands and outputs in the PostgreSQL audit log are actually encrypted at rest using pg_crypto or AWS KMS, as mandated by the observability tenets.

Fix these gaps before you write a single line of execution code, or you're asking for a breach.