# dd0c/drift — Test Architecture & TDD Strategy

**Author:** Max Mayfield (Test Architect)

**Date:** February 28, 2026

**Product:** dd0c/drift — IaC Drift Detection & Remediation SaaS

**Status:** Test Architecture Design Document

---

## Section 1: Testing Philosophy & TDD Workflow

### 1.1 Core Philosophy

dd0c/drift is a security-critical product. A missed drift event or a false positive in the remediation engine can cause real infrastructure damage. The testing strategy reflects this: **correctness is non-negotiable; speed is a constraint, not a goal**.

Three principles guide every testing decision:

1. **Tests are the first customer.** Before writing a single line of production code, the test defines the contract. If you can't write a test for it, you don't understand the requirement well enough to build it.

2. **The secret scrubber and RLS are untouchable.** These two components — the agent's secret scrubbing engine and the SaaS's PostgreSQL Row-Level Security — have 100% test coverage requirements. No exceptions. A bug in either is a trust-destroying incident.

3. **Drift detection logic is built from pure functions.** The comparator, scorer, and classifier take inputs and return outputs with no side effects. This makes them trivially testable and keeps the suite fast even at high coverage.

### 1.2 Red-Green-Refactor Adapted for dd0c/drift

The standard TDD cycle applies, with domain-specific adaptations:

```
RED      → Write a failing test that describes a drift scenario
           e.g., "security group ingress rule added to 0.0.0.0/0 → severity: critical"

GREEN    → Write the minimum code to make it pass
           e.g., add the classification rule to the YAML config + evaluator

REFACTOR → Clean up without breaking the test
           e.g., extract the CIDR check into a reusable predicate
```

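The REFACTOR step above ends with a reusable CIDR predicate. A minimal sketch of what that might look like, assuming the classifier receives ingress CIDRs as plain strings (the name `isPublicIngress` is hypothetical). It uses Go's standard `net/netip` package and treats any prefix of length 0, IPv4 `0.0.0.0/0` or IPv6 `::/0`, as public. Because the predicate is pure, the RED-phase test for the critical-severity rule should keep passing unchanged once the inline check is replaced by a call to it.

```go
package main

import "net/netip"

// isPublicIngress reports whether any CIDR in an ingress rule opens it to the
// entire IPv4 or IPv6 internet, i.e. a prefix length of 0 (0.0.0.0/0, ::/0).
// Unparseable entries are skipped in this sketch; a production classifier
// might surface them as errors instead.
func isPublicIngress(cidrs []string) bool {
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			continue
		}
		if p.Bits() == 0 {
			return true
		}
	}
	return false
}
```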
**When to write tests first (strict TDD):**

- All drift detection logic (comparator, classifier, scorer)
- Secret scrubbing engine — write tests for every secret pattern BEFORE writing the regex
- API request/response contracts — write schema validation tests before implementing handlers
- Remediation policy evaluation — write policy enforcement tests before the engine
- Feature flag evaluation logic (Epic 10, Story 10.1)

**When integration tests lead (test-after acceptable):**

- AWS SDK wiring (agent ↔ EC2/IAM/RDS describe calls) — mock the SDK first; the integration test confirms the wiring
- DynamoDB persistence — write the schema, then integration tests against DynamoDB Local
- Slack Block Kit formatting — render the block, verify it visually, then snapshot test
- CI/CD pipeline configuration — validate by running it, not by unit testing YAML

**When E2E tests lead:**

- Onboarding flow (`drift init` → `drift check` → Slack alert) — the happy path must work end-to-end before any unit tests are written for the CLI
- Remediation round-trip (Slack button → agent apply → resolution) — too many moving parts to unit test first

### 1.3 Test Naming Conventions

**Go (Agent, State Manager):**

```go
// Pattern: Test<Component>_<Scenario>_<ExpectedOutcome>
func TestDriftComparator_SecurityGroupIngressAdded_ReturnsCriticalDrift(t *testing.T) {}
func TestSecretScrubber_PasswordAttribute_ReturnsRedacted(t *testing.T) {}
func TestStateParser_V4Format_ExtractsManagedResources(t *testing.T) {}

// Table-driven test naming: use a descriptive name field
func TestDriftClassifier_Classify(t *testing.T) {
	tests := []struct {
		name string
		// ...
	}{
		{name: "security group with public CIDR → critical"},
		{name: "tag-only change → low severity"},
		{name: "IAM policy document changed → high severity"},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) { /* ... */ })
	}
}
```

**TypeScript (SaaS, Dashboard API):**

```typescript
// Pattern: describe("<Component>") > describe("<method/scenario>") > it("<expected behavior>")
describe("DriftClassifier", () => {
  describe("classify()", () => {
    it.todo("returns critical severity for security group with 0.0.0.0/0 ingress")
    it.todo("returns low severity for tag-only changes")
    it.todo("falls back to medium/configuration for unmatched resource types")
  })
})
```

**Integration & E2E:**

```
// File naming: <component>_integration_test.go / <flow>_e2e_test.go
agent_dynamodb_integration_test.go
drift_report_ingestion_integration_test.go
onboarding_flow_e2e_test.go
remediation_roundtrip_e2e_test.go
```

---

## Section 2: Test Pyramid

### 2.1 Recommended Ratio

```
┌─────────────────┐
│   E2E / Smoke   │  ~10% (~50 tests)
│  (LocalStack,   │
│   real flows)   │
├─────────────────┤
│   Integration   │  ~20% (~100 tests)
│  (boundaries,   │
│   real DBs)     │
├─────────────────┤
│   Unit Tests    │  ~70% (~350 tests)
│  (pure logic,   │
│  fast, mocked)  │
└─────────────────┘
```

Target: **~500 tests total at V1 launch**, growing to ~1,000 by month 3.

### 2.2 Unit Test Targets (Per Component)

| Component | Language | Target Coverage | Key Test Count |
|---|---|---|---|
| State Parser (TF v4) | Go | 95% | ~40 tests |
| Drift Comparator | Go | 95% | ~60 tests |
| Drift Classifier | Go | 90% | ~30 tests |
| Secret Scrubber | Go | 100% | ~50 tests |
| Drift Scorer | Go/TS | 90% | ~20 tests |
| Event Processor (ingestion) | TypeScript | 85% | ~30 tests |
| Notification Formatter | TypeScript | 85% | ~25 tests |
| Remediation Engine | TypeScript | 85% | ~30 tests |
| Dashboard API handlers | TypeScript | 80% | ~40 tests |
| Feature Flag evaluator | Go | 90% | ~20 tests |
| Policy engine | Go/TS | 95% | ~30 tests |

### 2.3 Integration Test Boundaries

| Boundary | Test Type | Infrastructure |
|---|---|---|
| Agent ↔ AWS EC2/IAM/RDS APIs | Integration | LocalStack or recorded HTTP fixtures |
| Agent ↔ SaaS API (drift report POST) | Integration | Real HTTP server (test instance) |
| Event Processor ↔ DynamoDB | Integration | DynamoDB Local (Testcontainers) |
| Event Processor ↔ PostgreSQL | Integration | PostgreSQL (Testcontainers) |
| Event Processor ↔ SQS | Integration | LocalStack SQS |
| Notification Service ↔ Slack API | Integration | Slack API mock server |
| Remediation Engine ↔ Agent | Integration | Agent stub server |
| Dashboard API ↔ PostgreSQL (RLS) | Integration | PostgreSQL (Testcontainers) — multi-tenant isolation tests |

### 2.4 E2E / Smoke Test Scenarios

| Scenario | Priority | Infrastructure |
|---|---|---|
| Install agent → run `drift check` → detect drift → Slack alert | P0 | LocalStack + Slack mock |
| Agent heartbeat → SaaS records it → dashboard shows "online" | P0 | LocalStack |
| Click [Revert] in Slack → agent executes terraform apply → event resolved | P0 | LocalStack + agent stub |
| Click [Accept] → GitHub PR created with code patch | P1 | GitHub API mock |
| Free tier stack limit enforcement (register 2nd stack → 403) | P1 | Real SaaS test env |
| Secret scrubbing end-to-end (state with password → report has [REDACTED]) | P0 | Agent + SaaS test env |
| Multi-tenant isolation (org A cannot see org B drift events) | P0 | PostgreSQL + RLS |
| Agent offline detection (no heartbeat → Slack "agent offline" alert) | P1 | LocalStack |

---

## Section 3: Unit Test Strategy (Per Component)

### 3.1 State Parser (Go — Epic 1, Story 1.1)

**What to test:**

- Correct extraction of `managed` resources (skip `data` sources)
- Module-prefixed addresses (`module.vpc.aws_security_group.api`)
- Multi-instance resources (`aws_instance.worker[0]`, `aws_instance.worker[1]`)
- Graceful handling of unknown/future resource types
- Rejection of non-v4 state format versions
- Empty state file (zero resources)
- State file with only data sources (zero managed resources)
- `private` field stripped from all instances before returning

**Key test cases:**

```go
func TestStateParser_V4Format_ExtractsManagedResources(t *testing.T) {}
func TestStateParser_DataSourceResources_AreExcluded(t *testing.T) {}
func TestStateParser_ModulePrefixedAddress_ParsedCorrectly(t *testing.T) {}
func TestStateParser_MultiInstanceResource_AllInstancesExtracted(t *testing.T) {}
func TestStateParser_UnsupportedVersion_ReturnsError(t *testing.T) {}
func TestStateParser_EmptyState_ReturnsEmptyResourceList(t *testing.T) {}
func TestStateParser_PrivateField_IsStrippedFromAttributes(t *testing.T) {}
```

**Mocking strategy:** None — pure function over a JSON byte slice. Fixtures live in `testdata/states/`.

**Table-driven pattern:**

```go
package state

import (
	"os"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestStateParser_ResourceExtraction(t *testing.T) {
	tests := []struct {
		name          string
		fixtureFile   string
		wantCount     int
		wantAddresses []string
		wantErr       bool
	}{
		{name: "single managed resource", fixtureFile: "testdata/states/single_sg.tfstate", wantCount: 1},
		{name: "state v3 format", fixtureFile: "testdata/states/v3_format.tfstate", wantErr: true},
		{name: "module-nested resources", fixtureFile: "testdata/states/module_nested.tfstate", wantCount: 5},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			data, err := os.ReadFile(tt.fixtureFile)
			require.NoError(t, err, "missing fixture") // never silently parse an empty slice
			got, err := ParseState(data)
			if tt.wantErr {
				require.Error(t, err)
				return
			}
			require.NoError(t, err)
			assert.Len(t, got.Resources, tt.wantCount)
			if tt.wantAddresses != nil { // assumes each parsed resource exposes an Address field
				addrs := make([]string, 0, len(got.Resources))
				for _, r := range got.Resources {
					addrs = append(addrs, r.Address)
				}
				assert.ElementsMatch(t, tt.wantAddresses, addrs)
			}
		})
	}
}
```

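For orientation, here is one way `ParseState` could satisfy the tests above. This is a sketch, not the agent's implementation: the JSON field names (`version`, `resources`, `module`, `mode`, `type`, `name`, `instances`, `index_key`, `attributes`) follow Terraform's v4 state layout, while the Go types and the `Address` field are assumptions. The `private` field is simply never decoded, so it cannot leak into the result. Addresses are rebuilt from `module`, `type`, `name`, and `index_key` (numeric keys come from `count`, string keys from `for_each`); quoting string keys with `%q` matches Terraform's syntax for simple keys, though keys with unusual escape sequences may need extra care.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Raw decoding types mirror the v4 state JSON; the "private" field is
// intentionally absent, so it is never read into memory.
type rawState struct {
	Version   int           `json:"version"`
	Resources []rawResource `json:"resources"`
}

type rawResource struct {
	Module    string        `json:"module"` // e.g. "module.vpc"; empty at root
	Mode      string        `json:"mode"`   // "managed" or "data"
	Type      string        `json:"type"`
	Name      string        `json:"name"`
	Instances []rawInstance `json:"instances"`
}

type rawInstance struct {
	IndexKey   interface{}            `json:"index_key"` // float64 for count, string for for_each
	Attributes map[string]interface{} `json:"attributes"`
}

// Resource is one managed resource instance with its full address (hypothetical type).
type Resource struct {
	Address    string
	Attributes map[string]interface{}
}

type State struct {
	Resources []Resource
}

// ParseState extracts managed resource instances from a v4 state file.
func ParseState(data []byte) (*State, error) {
	var raw rawState
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, fmt.Errorf("parse state: %w", err)
	}
	if raw.Version != 4 {
		return nil, fmt.Errorf("unsupported state version %d (want 4)", raw.Version)
	}
	st := &State{Resources: []Resource{}}
	for _, r := range raw.Resources {
		if r.Mode != "managed" {
			continue // data sources are read-only lookups, never drift targets
		}
		base := r.Type + "." + r.Name
		if r.Module != "" {
			base = r.Module + "." + base
		}
		for _, inst := range r.Instances {
			addr := base
			switch k := inst.IndexKey.(type) {
			case float64:
				addr = fmt.Sprintf("%s[%d]", base, int(k))
			case string:
				addr = fmt.Sprintf("%s[%q]", base, k)
			}
			st.Resources = append(st.Resources, Resource{Address: addr, Attributes: inst.Attributes})
		}
	}
	return st, nil
}
```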
---

### 3.2 Drift Comparator (Go — Epic 1, Story 1.3)

**What to test:**

- Attribute added in cloud (not in state) → drift detected
- Attribute removed from cloud (in state, not in cloud) → drift detected
- Attribute value changed → correct old/new values in diff
- Attribute unchanged → no drift
- Nested attribute changes (ingress rules array)
- Ignored attributes (AWS-generated IDs, timestamps, computed fields) → no drift
- Null vs. empty string → treated as no drift
- Boolean drift (`true` → `false`)
- Numeric drift (port numbers, counts)

**Key test cases:**

```go
func TestDriftComparator_AttributeAdded_ReturnsDrift(t *testing.T) {}
func TestDriftComparator_AttributeRemoved_ReturnsDrift(t *testing.T) {}
func TestDriftComparator_AttributeUnchanged_ReturnsNoDrift(t *testing.T) {}
func TestDriftComparator_NestedIngressRuleAdded_ReturnsDrift(t *testing.T) {}
func TestDriftComparator_IgnoredAttribute_ReturnsNoDrift(t *testing.T) {}
func TestDriftComparator_NullVsEmptyString_TreatedAsNoDrift(t *testing.T) {}
func TestDriftComparator_ComputedTimestamp_IsIgnored(t *testing.T) {}
```

**Mocking strategy:** None — pure function. State and cloud attributes are both `map[string]interface{}`.

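A sketch of a comparator that satisfies the cases listed above, assuming both attribute maps were decoded the same way (for example with `encoding/json`, so numbers are `float64` on both sides). The names `Compare` and `AttributeDrift`, and the contents of `ignoredAttrs`, are illustrative. Note that `reflect.DeepEqual` is order-sensitive, so unordered collections such as ingress rule sets may need canonical sorting before comparison to avoid false positives.

```go
package main

import (
	"reflect"
	"sort"
)

// AttributeDrift records one attribute whose cloud value differs from state.
type AttributeDrift struct {
	Key      string
	StateVal interface{}
	CloudVal interface{}
}

// ignoredAttrs is an illustrative list of computed attributes that never count as drift.
var ignoredAttrs = map[string]bool{"arn": true, "id": true, "last_modified": true}

// normalize maps "" to nil so null and empty string compare equal.
func normalize(v interface{}) interface{} {
	if s, ok := v.(string); ok && s == "" {
		return nil
	}
	return v
}

// Compare returns drifted attributes sorted by key, so output is deterministic.
// A key missing from either map reads as nil, which covers added and removed attributes.
func Compare(state, cloud map[string]interface{}) []AttributeDrift {
	seen := map[string]bool{}
	var keys []string
	for _, m := range []map[string]interface{}{state, cloud} {
		for k := range m {
			if !seen[k] && !ignoredAttrs[k] {
				seen[k] = true
				keys = append(keys, k)
			}
		}
	}
	sort.Strings(keys)

	var drifts []AttributeDrift
	for _, k := range keys {
		// reflect.DeepEqual also compares nested values such as ingress rule lists.
		if !reflect.DeepEqual(normalize(state[k]), normalize(cloud[k])) {
			drifts = append(drifts, AttributeDrift{Key: k, StateVal: state[k], CloudVal: cloud[k]})
		}
	}
	return drifts
}
```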
---

### 3.3 Drift Classifier (Go — Epic 3, Story 3.2)

**What to test:**

- Security group with `0.0.0.0/0` ingress → `critical/security`
- IAM role policy document changed → `high/security`
- RDS parameter group changed → `high/configuration`
- Tag-only change → `low/tags`
- Unmatched resource type → `medium/configuration` (default fallback)
- Customer override rules take precedence over defaults
- Rule evaluation order (first match wins)
- Invalid YAML config → error at startup, not at classification time

**Key test cases:**

```go
func TestDriftClassifier_PublicCIDRIngress_ReturnsCriticalSecurity(t *testing.T) {}
func TestDriftClassifier_IAMPolicyChanged_ReturnsHighSecurity(t *testing.T) {}
func TestDriftClassifier_TagOnlyChange_ReturnsLowTags(t *testing.T) {}
func TestDriftClassifier_UnmatchedResource_ReturnsMediumConfiguration(t *testing.T) {}
func TestDriftClassifier_CustomerOverride_TakesPrecedence(t *testing.T) {}
func TestDriftClassifier_InvalidYAML_ReturnsErrorOnLoad(t *testing.T) {}
```

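The first-match semantics and override precedence described above can be sketched with an ordered list of predicate rules. Everything here is hypothetical: the `Drift` shape, the rule names, and the attribute-key heuristics are placeholders, and loading rules from YAML (where the invalid-config error would surface at startup) is left out.

```go
package main

import "strings"

// Drift is a hypothetical classifier input.
type Drift struct {
	ResourceType string
	ChangedKeys  []string
	PublicCIDR   bool // e.g. computed by an isPublicIngress-style predicate
}

// Result is a severity/category pair such as critical/security.
type Result struct {
	Severity string
	Category string
}

// Rule pairs a predicate with the result it assigns.
type Rule struct {
	Name   string
	Match  func(Drift) bool
	Result Result
}

var fallback = Result{Severity: "medium", Category: "configuration"}

var defaultRules = []Rule{
	{
		Name:   "public-sg-ingress",
		Match:  func(d Drift) bool { return d.ResourceType == "aws_security_group" && d.PublicCIDR },
		Result: Result{"critical", "security"},
	},
	{
		Name:   "iam-policy-changed",
		Match:  func(d Drift) bool { return strings.HasPrefix(d.ResourceType, "aws_iam_") && anyKey(d.ChangedKeys, "policy") },
		Result: Result{"high", "security"},
	},
	{
		Name:   "tags-only",
		Match:  func(d Drift) bool { return len(d.ChangedKeys) > 0 && allKeysAreTags(d.ChangedKeys) },
		Result: Result{"low", "tags"},
	},
}

// anyKey reports whether any changed key contains the fragment,
// so both "policy" and "assume_role_policy" match "policy".
func anyKey(keys []string, fragment string) bool {
	for _, k := range keys {
		if strings.Contains(k, fragment) {
			return true
		}
	}
	return false
}

func allKeysAreTags(keys []string) bool {
	for _, k := range keys {
		if k != "tags" && k != "tags_all" && !strings.HasPrefix(k, "tags.") {
			return false
		}
	}
	return true
}

// Classify evaluates customer rules before the defaults; the first match wins.
func Classify(customer []Rule, d Drift) Result {
	for _, rules := range [][]Rule{customer, defaultRules} {
		for _, r := range rules {
			if r.Match(d) {
				return r.Result
			}
		}
	}
	return fallback
}
```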
---

### 3.4 Secret Scrubber (Go — Epic 1, Story 1.4) — **100% Coverage Required**

Every secret pattern is a security requirement. No table-driven shortcuts — each pattern gets its own named test.

**Key test cases:**

```go
func TestSecretScrubber_PasswordKey_RedactsValue(t *testing.T) {}
func TestSecretScrubber_SecretKey_RedactsValue(t *testing.T) {}
func TestSecretScrubber_TokenKey_RedactsValue(t *testing.T) {}
func TestSecretScrubber_PrivateKeyKey_RedactsValue(t *testing.T) {}
func TestSecretScrubber_ConnectionStringKey_RedactsValue(t *testing.T) {}
func TestSecretScrubber_AWSAccessKeyPattern_RedactsValue(t *testing.T) {}
func TestSecretScrubber_PostgresURIPattern_RedactsValue(t *testing.T) {}
func TestSecretScrubber_PEMPrivateKeyPattern_RedactsValue(t *testing.T) {}
func TestSecretScrubber_JWTTokenPattern_RedactsValue(t *testing.T) {}
func TestSecretScrubber_SensitiveFlag_RedactsValue(t *testing.T) {}
func TestSecretScrubber_PrivateField_IsStrippedEntirely(t *testing.T) {}
func TestSecretScrubber_NonSensitiveAttribute_PreservesValue(t *testing.T) {}
func TestSecretScrubber_NestedSensitiveKey_RedactsNestedValue(t *testing.T) {}
func TestSecretScrubber_ArrayWithSensitiveValues_AllElementsChecked(t *testing.T) {}
func TestSecretScrubber_RedactedPlaceholder_IsLiteralREDACTEDString(t *testing.T) {}
func TestSecretScrubber_DiffStructureIntact_AfterScrubbing(t *testing.T) {}
```

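A minimal sketch of a recursive scrubber covering key-based and value-pattern redaction, which is the behavior most of the tests above pin down. The key fragments and regular expressions are illustrative assumptions, not the production pattern list from Story 1.4, and handling of the `sensitive` flag and the `private` field is omitted. The function returns a new structure, so the input is never mutated and the diff shape stays intact.

```go
package main

import (
	"regexp"
	"strings"
)

const redacted = "[REDACTED]"

// Illustrative key fragments; matched case-insensitively as substrings.
var sensitiveKeyParts = []string{"password", "secret", "token", "private_key", "connection_string"}

// Illustrative value patterns checked on every string, whatever its key.
var sensitiveValuePatterns = []*regexp.Regexp{
	regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`),                                // AWS access key ID
	regexp.MustCompile(`postgres(?:ql)?://[^:\s/]+:[^@\s]+@`),                 // Postgres URI with credentials
	regexp.MustCompile(`-----BEGIN [A-Z ]*PRIVATE KEY-----`),                  // PEM private key
	regexp.MustCompile(`\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+`), // JWT
}

func isSensitiveKey(k string) bool {
	lk := strings.ToLower(k)
	for _, part := range sensitiveKeyParts {
		if strings.Contains(lk, part) {
			return true
		}
	}
	return false
}

// Scrub returns a copy of v with sensitive values replaced by the literal
// "[REDACTED]". Maps and slices are walked recursively, so nested values and
// every array element are checked.
func Scrub(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			if isSensitiveKey(k) && val != nil {
				out[k] = redacted
				continue
			}
			out[k] = Scrub(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = Scrub(val)
		}
		return out
	case string:
		for _, re := range sensitiveValuePatterns {
			if re.MatchString(t) {
				return redacted
			}
		}
		return t
	default:
		return v
	}
}
```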
---

### 3.5 Drift Scorer (TypeScript — Epic 3, Story 3.4)

```typescript
describe("DriftScorer", () => {
  it.todo("returns 100 for a stack with no drift")
  it.todo("applies heavy penalty for critical severity drift")
  it.todo("applies minimal penalty for low severity drift")
  it.todo("produces weighted score for mixed severity drift")
  it.todo("recalculates upward when drift event is resolved")
  it.todo("handles zero-resource stack without divide-by-zero")
  it.todo("floors score at 0 (never negative) for catastrophically drifted stacks")
})
```

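The scorer is specified in TypeScript; to keep the new examples in one language, here is a Go sketch of the same rules. The severity weights and the normalization by resource count are assumptions chosen only to satisfy the behaviors listed above: a critical drift counts as one fully drifted resource, a low drift barely moves the score, resolved events are excluded, and the result is floored at 0.

```go
package main

// Illustrative per-severity weights (assumed, not from the spec).
var severityWeight = map[string]float64{
	"critical": 1.0,
	"high":     0.5,
	"medium":   0.25,
	"low":      0.05,
}

// DriftEvent is a hypothetical minimal event shape.
type DriftEvent struct {
	Severity string
	Resolved bool
}

// Score returns a 0-100 drift score for a stack.
func Score(resourceCount int, events []DriftEvent) float64 {
	if resourceCount <= 0 {
		return 100 // nothing managed, nothing drifted; also avoids dividing by zero
	}
	weighted := 0.0
	for _, e := range events {
		if e.Resolved {
			continue // resolved events no longer count, so the score recovers
		}
		weighted += severityWeight[e.Severity]
	}
	score := 100 - 100*weighted/float64(resourceCount)
	if score < 0 {
		return 0 // floor, so a catastrophically drifted stack bottoms out at 0
	}
	return score
}
```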
---

### 3.6 Event Processor — Ingestion & Validation (TypeScript — Epic 3, Story 3.1)

**What to test:**

- Valid drift report → accepted, returns 202
- Missing `stack_id` → 400 `DRIFT_REPORT_INVALID`
- Duplicate `report_id` → 409 `DRIFT_REPORT_DUPLICATE`
- Payload > 1MB → 400 `DRIFT_REPORT_TOO_LARGE`
- Invalid severity value → 400
- Unknown agent ID → 404 `AGENT_NOT_FOUND`
- Revoked agent API key → 403 `AGENT_REVOKED`
- SQS message group ID equals `stack_id`
- SQS deduplication ID equals `report_id`

**Mocking strategy:** Mock `@aws-sdk/client-sqs`. Mock PostgreSQL pool. Use `zod` schema directly in tests.

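A sketch of the stateless part of ingestion validation, written in Go although the Event Processor is TypeScript. The `drifts` field name, reading "1MB" as 1 MiB, requiring `report_id`, and reusing `DRIFT_REPORT_INVALID` for a bad severity are all assumptions. Duplicate detection and agent lookups need the database and are not shown. The FIFO keys follow the last two bullets: grouping by stack keeps each stack's reports in order, and deduplicating by report ID drops agent retries that arrive within SQS's deduplication interval.

```go
package main

import "encoding/json"

// maxReportBytes assumes "1MB" means 1 MiB.
const maxReportBytes = 1 << 20

// DriftReport is a hypothetical subset of the ingestion payload.
type DriftReport struct {
	ReportID string `json:"report_id"`
	StackID  string `json:"stack_id"`
	Drifts   []struct {
		Severity string `json:"severity"`
	} `json:"drifts"`
}

var validSeverities = map[string]bool{"critical": true, "high": true, "medium": true, "low": true}

// validateReport applies the stateless checks and returns the HTTP status and
// error code the handler would send, plus the decoded report on success.
func validateReport(body []byte) (int, string, *DriftReport) {
	if len(body) > maxReportBytes {
		return 400, "DRIFT_REPORT_TOO_LARGE", nil
	}
	var rep DriftReport
	if err := json.Unmarshal(body, &rep); err != nil || rep.StackID == "" || rep.ReportID == "" {
		return 400, "DRIFT_REPORT_INVALID", nil
	}
	for _, d := range rep.Drifts {
		if !validSeverities[d.Severity] {
			return 400, "DRIFT_REPORT_INVALID", nil
		}
	}
	return 202, "", &rep
}

// fifoKeys returns the SQS FIFO MessageGroupId and MessageDeduplicationId.
func fifoKeys(r *DriftReport) (groupID, dedupID string) {
	return r.StackID, r.ReportID
}
```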
---

### 3.7 Notification Formatter (TypeScript — Epic 4, Story 4.1)

**What to test:**

- Critical drift → header `🔴 Critical Drift Detected`
- Diff block truncated to fit Slack's 3,000-character limit on section text
- CloudTrail attribution present → "Changed by: <IAM ARN>"
- CloudTrail attribution absent → "Changed by: Unknown (scheduled scan)"
- All four action buttons present (`drift_revert`, `drift_accept`, `drift_snooze`, `drift_assign`)
- `[REDACTED]` values rendered as-is
- Low severity digest format → no `[Revert]` button

**Mocking strategy:** None — pure function. Use snapshot tests for Block Kit JSON output.

---

### 3.8 Remediation Engine (TypeScript — Epic 7, Stories 7.1–7.2)

**What to test:**

- Revert: generates correct `terraform apply -target=<address>` command
- Blast radius: resource with 3 dependents → `blast_radius = 3`
- Blast radius: isolated resource → `blast_radius = 0`
- `require-approval` policy → status `pending`, not `executing`
- `auto-revert` policy for critical → executes without approval gate
- Accept: generates correct code patch for changed attribute
- Accept: creates PR with correct branch name and description
- Agent heartbeat stale → `REMEDIATION_AGENT_OFFLINE`
- Concurrent revert on same resource → `REMEDIATION_IN_PROGRESS`
- Panic mode active → all remediation blocked

**Mocking strategy:** Mock agent command dispatcher. Mock GitHub API client (`@octokit/rest`). Mock PostgreSQL for plan persistence.

---

### 3.9 Feature Flag Evaluator (Go — Epic 10, Story 10.1)

```go
func TestFeatureFlag_EnabledFlag_ExecutesFeature(t *testing.T) {}
func TestFeatureFlag_DisabledFlag_SkipsFeatureWithNoSideEffects(t *testing.T) {}
func TestFeatureFlag_UnknownFlag_ReturnsDefaultOff(t *testing.T) {}
func TestFeatureFlag_EnvVarOverride_TakesPrecedenceOverJSONFile(t *testing.T) {}
func TestFeatureFlag_CircuitBreaker_DisablesFlagOnFalsePositiveSpike(t *testing.T) {}
func TestFeatureFlag_ExpiredTTL_CILintDetectsIt(t *testing.T) {} // lint test, not runtime
```

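A minimal evaluator sketch with the precedence these tests describe: environment variable first, then the JSON file, then default off. The `DRIFT_FLAG_<NAME>` variable naming and the injected `lookupEnv` function (pass `os.LookupEnv` in production) are assumptions made for testability; a malformed environment value is ignored so the file value still applies.

```go
package main

import (
	"encoding/json"
	"strconv"
	"strings"
)

// Evaluator resolves flags from JSON file contents with env var overrides.
type Evaluator struct {
	fileFlags map[string]bool
	lookupEnv func(string) (string, bool) // os.LookupEnv in production
}

// NewEvaluator parses a flat JSON object of flag name → bool.
func NewEvaluator(fileJSON []byte, lookupEnv func(string) (string, bool)) (*Evaluator, error) {
	flags := map[string]bool{}
	if len(fileJSON) > 0 {
		if err := json.Unmarshal(fileJSON, &flags); err != nil {
			return nil, err
		}
	}
	return &Evaluator{fileFlags: flags, lookupEnv: lookupEnv}, nil
}

// Enabled applies precedence: env override > JSON file > default off.
func (e *Evaluator) Enabled(name string) bool {
	envKey := "DRIFT_FLAG_" + strings.ToUpper(name)
	if v, ok := e.lookupEnv(envKey); ok {
		if b, err := strconv.ParseBool(v); err == nil {
			return b
		}
	}
	return e.fileFlags[name] // unknown flags read as false
}
```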
---

### 3.10 Policy Engine (Go — Epic 10, Story 10.5)

```go
func TestPolicyEngine_StrictMode_BlocksAllRemediation(t *testing.T) {}
func TestPolicyEngine_AuditMode_ExecutesAndLogs(t *testing.T) {}
func TestPolicyEngine_CustomerMoreRestrictive_CustomerPolicyWins(t *testing.T) {}
func TestPolicyEngine_CustomerLessRestrictive_SystemPolicyWins(t *testing.T) {}
func TestPolicyEngine_PanicMode_HaltsAllScans(t *testing.T) {}
func TestPolicyEngine_PanicMode_SendsSingleNotification(t *testing.T) {}
func TestPolicyEngine_PolicyDecision_IsLogged(t *testing.T) {}
func TestPolicyEngine_FileReload_NewPolicyTakesEffect(t *testing.T) {}
```

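The "stricter policy wins" rule in the two `Customer...Restrictive` tests reduces to taking the maximum of two ordered modes. In this sketch `Audit` and `Strict` come from the test names, while the intermediate `RequireApproval` mode and the `Decision` shape are assumptions.

```go
package main

// Mode is a remediation policy mode; a higher value is more restrictive.
type Mode int

const (
	Audit           Mode = iota // execute and log
	RequireApproval             // wait for a human (assumed intermediate mode)
	Strict                      // block all remediation
)

// Decision is what the engine would log for every evaluation.
type Decision struct {
	Effective Mode
	Blocked   bool
	Reason    string
}

// Evaluate combines system and customer policy. The stricter mode always wins,
// so a customer can tighten but never loosen the system policy; panic mode
// overrides both.
func Evaluate(system, customer Mode, panicMode bool) Decision {
	if panicMode {
		return Decision{Effective: Strict, Blocked: true, Reason: "panic mode active"}
	}
	if customer > system {
		return Decision{Effective: customer, Blocked: customer == Strict, Reason: "customer policy is stricter"}
	}
	return Decision{Effective: system, Blocked: system == Strict, Reason: "system policy"}
}
```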
---

## Section 4: Integration Test Strategy

### 4.1 Agent ↔ Cloud Provider APIs

**Goal:** Verify the agent correctly maps Terraform resource types to AWS describe calls and handles API responses.

**Approach:** Use recorded HTTP fixtures (via `go-vcr` or `httpmock`) for unit-speed integration tests. Use LocalStack for full integration runs in CI.

**Key test cases:**

```go
// pkg/agent/integration/aws_polling_test.go
func TestAWSPolling_SecurityGroup_MapsToDescribeSecurityGroups(t *testing.T) {}
func TestAWSPolling_IAMRole_MapsToGetRole(t *testing.T) {}
func TestAWSPolling_RDSInstance_MapsToDescribeDBInstances(t *testing.T) {}
func TestAWSPolling_ResourceNotFound_ReturnsUnknownDriftState(t *testing.T) {}
func TestAWSPolling_RateLimitResponse_RetriesWithBackoff(t *testing.T) {}
func TestAWSPolling_CredentialError_ReturnsDescriptiveError(t *testing.T) {}
func TestAWSPolling_RegionScopedRequest_UsesConfiguredRegion(t *testing.T) {}
```

**Fixture strategy:**

```
testdata/
  aws-responses/
    ec2_describe_security_groups_clean.json       # cloud matches state
    ec2_describe_security_groups_drifted.json     # ingress rule added
    iam_get_role_policy_changed.json
    rds_describe_db_instances_clean.json
    ec2_describe_security_groups_not_found.json   # resource deleted from cloud
```

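A sketch of the type-to-API mapping the first three tests pin down, limited to the resource types they name. Unmapped types return a typed error, so a caller could record an unknown drift state rather than abort the whole scan; the map, `operationFor`, and `ErrUnsupportedType` are hypothetical names.

```go
package main

import "fmt"

// describeCall maps a Terraform resource type to the AWS operation the agent polls.
var describeCall = map[string]string{
	"aws_security_group": "ec2:DescribeSecurityGroups",
	"aws_iam_role":       "iam:GetRole",
	"aws_db_instance":    "rds:DescribeDBInstances",
}

// ErrUnsupportedType marks a resource type with no mapped describe call.
type ErrUnsupportedType struct {
	Type string
}

func (e ErrUnsupportedType) Error() string {
	return fmt.Sprintf("no describe call mapped for resource type %q", e.Type)
}

// operationFor looks up the AWS operation for a Terraform resource type.
func operationFor(tfType string) (string, error) {
	op, ok := describeCall[tfType]
	if !ok {
		return "", ErrUnsupportedType{Type: tfType}
	}
	return op, nil
}
```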
---

### 4.2 Agent ↔ SaaS API (Drift Report Submission)

**Goal:** Verify the agent correctly serializes and transmits `DriftReport` payloads, handles auth errors, and respects rate limit responses.

**Setup:** Spin up a lightweight HTTP test server in Go (`httptest.NewServer`) that mimics the SaaS ingestion endpoint.

```go
func TestTransmitter_ValidReport_Returns202(t *testing.T) {}
func TestTransmitter_InvalidAPIKey_Returns401AndStopsRetrying(t *testing.T) {}
func TestTransmitter_RevokedAPIKey_Returns403AndStopsRetrying(t *testing.T) {}
func TestTransmitter_RateLimited_RespectsRetryAfterHeader(t *testing.T) {}
func TestTransmitter_ServerError_RetriesWithExponentialBackoff(t *testing.T) {}
func TestTransmitter_PayloadCompressed_WhenOverThreshold(t *testing.T) {}
func TestTransmitter_mTLSCertPresented_OnEveryRequest(t *testing.T) {}
func TestTransmitter_NetworkTimeout_RetriesUpToMaxAttempts(t *testing.T) {}
```

---

### 4.3 Event Processor ↔ DynamoDB (Testcontainers)

**Goal:** Verify event sourcing writes, TTL attribute setting, and checksum generation against a real DynamoDB Local instance.

**Setup:**

```go
// Use testcontainers-go to spin up DynamoDB Local; the container is
// terminated automatically when the test finishes.
func setupDynamoDBLocal(t *testing.T) *dynamodb.Client {
	ctx := context.Background()
	container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "amazon/dynamodb-local:latest",
			ExposedPorts: []string{"8000/tcp"},
			WaitingFor:   wait.ForListeningPort("8000/tcp"),
		},
		Started: true,
	})
	require.NoError(t, err)
	t.Cleanup(func() { _ = container.Terminate(ctx) })

	endpoint, err := container.PortEndpoint(ctx, "8000/tcp", "http")
	require.NoError(t, err)

	// DynamoDB Local accepts any static credentials.
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider("local", "local", "")),
	)
	require.NoError(t, err)
	return dynamodb.NewFromConfig(cfg, func(o *dynamodb.Options) {
		o.BaseEndpoint = aws.String(endpoint)
	})
}
```

**Key test cases:**

```go
func TestDynamoDBEventStore_AppendDriftEvent_PersistsWithCorrectPK(t *testing.T) {}
func TestDynamoDBEventStore_AppendDriftEvent_SetsChecksumAttribute(t *testing.T) {}
func TestDynamoDBEventStore_AppendDriftEvent_SetsTTLPerTier(t *testing.T) {}
func TestDynamoDBEventStore_QueryByStackID_ReturnsChronologicalOrder(t *testing.T) {}
func TestDynamoDBEventStore_DuplicateEventID_IsIdempotent(t *testing.T) {}
func TestDynamoDBEventStore_FreeTier_TTL90Days(t *testing.T) {}
func TestDynamoDBEventStore_EnterpriseTier_TTL7Years(t *testing.T) {}
```

---

### 4.4 Event Processor ↔ PostgreSQL (Testcontainers + RLS)

**Goal:** Verify multi-tenant data isolation via Row-Level Security. This is the most critical integration test suite.

**Setup:**

```typescript
// Use Testcontainers for Node.js to spin up PostgreSQL 16
// Apply full schema migrations before each test suite
// Create two test orgs: orgA and orgB
// Connect as a non-owner application role: superusers always bypass RLS,
// and table owners bypass it unless the table uses FORCE ROW LEVEL SECURITY
```

**Key test cases:**

```typescript
describe("PostgreSQL RLS Integration", () => {
  it.todo("org A cannot read org B drift events via direct query")
  it.todo("org A cannot read org B stacks via direct query")
  it.todo("setting app.current_org_id scopes all queries correctly")
  it.todo("missing app.current_org_id returns zero rows (not an error)")
  it.todo("drift event insert without org_id fails FK constraint")
  it.todo("drift score update is scoped to correct org's stack")
  it.todo("concurrent inserts from two orgs do not cross-contaminate")
})
```

**Critical test — cross-tenant isolation:**

```typescript
it("org A cannot read org B drift events", async () => {
  // Insert drift event for orgB
  await insertDriftEvent(orgBPool, orgBEvent)

  // Query as orgA on one checked-out client: SET cannot take bind parameters,
  // and pool.query() may run each statement on a different connection
  const client = await orgAPool.connect()
  try {
    await client.query("BEGIN")
    // Transaction-local setting (is_local = true) cannot leak into other tests
    await client.query("SELECT set_config('app.current_org_id', $1, true)", [orgA.id])
    const result = await client.query("SELECT * FROM drift_events")
    // Should return empty, not orgB's data
    expect(result.rows).toHaveLength(0)
  } finally {
    await client.query("ROLLBACK")
    client.release()
  }
})
```

---

### 4.5 IaC State File Parsing — Multi-Backend Integration

**Goal:** Verify the agent correctly reads state files from different backends (S3, local file, Terraform Cloud).

**Setup:** LocalStack S3 for S3 backend tests. Real file system for local backend. WireMock for Terraform Cloud API.

```go
func TestStateBackend_S3_ReadsStateFileFromBucket(t *testing.T) {}
func TestStateBackend_S3_HandlesVersionedBucket(t *testing.T) {}
func TestStateBackend_LocalFile_ReadsFromFilesystem(t *testing.T) {}
func TestStateBackend_TerraformCloud_AuthenticatesAndFetchesState(t *testing.T) {}
func TestStateBackend_S3_AccessDenied_ReturnsDescriptiveError(t *testing.T) {}
func TestStateBackend_S3_BucketNotFound_ReturnsDescriptiveError(t *testing.T) {}
```

---

### 4.6 Notification Service ↔ Slack API

**Goal:** Verify Slack message delivery, request signature validation, and interactive callback handling.

**Setup:** WireMock or a custom Go HTTP mock server simulating the Slack API.

```typescript
describe("Slack Integration", () => {
  it.todo("delivers Block Kit message to configured channel")
  it.todo("falls back to org default channel when stack channel not set")
  it.todo("validates Slack request signature on interaction callbacks")
  it.todo("rejects interaction callback with invalid signature → 401")
  it.todo("updates original message after [Revert] button click")
  it.todo("handles Slack API rate limit (429) with retry")
  it.todo("handles Slack API 500 — logs error, does not crash Lambda")
})
```

---

### 4.7 Terraform State File Parsing — Real Fixture Files

**Goal:** Verify the parser handles real-world Terraform state files from different provider versions and configurations.

Fixture files sourced from:

- Terraform AWS provider v4.x, v5.x state outputs
- OpenTofu state files (expected to use the same format)
- State files with modules, count, for_each
- State files with workspace prefixes

```go
func TestStateParser_RealWorldAWSProviderV5_ParsesCorrectly(t *testing.T) {}
func TestStateParser_OpenTofuStateFile_ParsesCorrectly(t *testing.T) {}
func TestStateParser_ForEachResources_AllInstancesExtracted(t *testing.T) {}
func TestStateParser_WorkspacePrefixedState_ParsesCorrectly(t *testing.T) {}
func TestStateParser_LargeStateFile_500Resources_CompletesUnder2Seconds(t *testing.T) {}
```

---

## Section 5: E2E & Smoke Tests

### 5.1 Infrastructure Setup

All E2E tests run against LocalStack (AWS service simulation), a real PostgreSQL instance, and WireMock stand-ins for the Slack and GitHub APIs. The test environment is defined as a Docker Compose stack:

```yaml
# docker-compose.test.yml
services:
  localstack:
    image: localstack/localstack:3  # pin to the 3.x release line
    environment:
      SERVICES: s3,sqs,dynamodb,iam,ec2,lambda,eventbridge
      DEBUG: 0
    ports:
      - "4566:4566"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: drift_test
      POSTGRES_USER: drift
      POSTGRES_PASSWORD: test
    ports:
      - "5432:5432"

  slack-mock:
    image: wiremock/wiremock:latest
    volumes:
      - ./testdata/wiremock/slack:/home/wiremock/mappings
    ports:
      - "8080:8080"

  github-mock:
    image: wiremock/wiremock:latest
    volumes:
      - ./testdata/wiremock/github:/home/wiremock/mappings
    ports:
      - "8081:8080"
```

**Synthetic drift generation:** A helper CLI tool (`testdata/tools/drift-injector`) modifies LocalStack EC2/IAM resources to simulate real drift scenarios without touching real AWS.

---

### 5.2 Critical User Journey: Install → Detect → Notify

**Journey:** Agent installed → `drift check` run → drift detected → Slack alert delivered

```go
// e2e/onboarding_flow_e2e_test.go
func TestE2E_OnboardingFlow_InstallToFirstSlackAlert(t *testing.T) {
	// 1. Register org and agent via API
	org := createTestOrg(t)
	agent := registerAgent(t, org.APIKey)

	// 2. Upload a Terraform state file to LocalStack S3
	uploadStateFixture(t, "testdata/states/prod_networking.tfstate", org.StateBucket)

	// 3. Inject drift into LocalStack EC2 (add 0.0.0.0/0 ingress rule)
	injectSecurityGroupDrift(t, "sg-abc123")

	// 4. Run drift check
	result := runDriftCheck(t, agent, org.StateBucket)
	require.Equal(t, 1, result.DriftedResourceCount)
	require.Equal(t, "critical", result.DriftedResources[0].Severity)

	// 5. Verify Slack mock received the Block Kit message
	slackRequests := getSlackMockRequests(t)
	require.Len(t, slackRequests, 1)
	assert.Contains(t, slackRequests[0].Body, "Critical Drift Detected")
	assert.Contains(t, slackRequests[0].Body, "aws_security_group")

	// 6. Verify drift event persisted in PostgreSQL
	event := getDriftEvent(t, org.ID, result.DriftedResources[0].Address)
	assert.Equal(t, "open", event.Status)
	assert.Equal(t, "critical", event.Severity)
}
```

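Helpers such as `getSlackMockRequests` can read WireMock's request journal from its admin API (`GET /__admin/requests`). A sketch with hypothetical names that decodes only the fields the assertions use; decoding is split out from fetching so it can be tested without a running WireMock instance.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// LoggedRequest holds the journal fields the E2E assertions need.
type LoggedRequest struct {
	Method string `json:"method"`
	URL    string `json:"url"`
	Body   string `json:"body"`
}

// parseJournal decodes WireMock's request journal JSON.
func parseJournal(data []byte) ([]LoggedRequest, error) {
	var journal struct {
		Requests []struct {
			Request LoggedRequest `json:"request"`
		} `json:"requests"`
	}
	if err := json.Unmarshal(data, &journal); err != nil {
		return nil, fmt.Errorf("decode wiremock journal: %w", err)
	}
	out := make([]LoggedRequest, 0, len(journal.Requests))
	for _, r := range journal.Requests {
		out = append(out, r.Request)
	}
	return out, nil
}

// fetchMockRequests reads the journal from a running WireMock instance,
// e.g. http://localhost:8080 for slack-mock in docker-compose.test.yml.
func fetchMockRequests(client *http.Client, baseURL string) ([]LoggedRequest, error) {
	resp, err := client.Get(baseURL + "/__admin/requests")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("wiremock journal: unexpected status %d", resp.StatusCode)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseJournal(data)
}
```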
---

### 5.3 Critical User Journey: Revert Workflow

**Journey:** Slack [Revert] button clicked → remediation engine queues command → agent executes → event resolved

```go
func TestE2E_RemediationRevert_SlackButtonToResolution(t *testing.T) {
	// Setup: existing open drift event
	org, driftEvent := setupOpenDriftEvent(t, "critical")

	// 1. Simulate Slack [Revert] button click
	payload := buildSlackInteractionPayload("drift_revert", driftEvent.ID, org.SlackUserID)
	resp := postSlackInteraction(t, payload, validSlackSignature(payload))
	assert.Equal(t, 200, resp.StatusCode)

	// 2. Verify Slack message updated to "Reverting..."
	slackUpdates := getSlackMockUpdates(t)
	require.NotEmpty(t, slackUpdates, "expected at least one Slack message update")
	assert.Contains(t, slackUpdates[0].Body, "Reverting")

	// 3. Verify remediation plan created in DB
	plan := waitForRemediationPlan(t, driftEvent.ID, 5*time.Second)
	assert.Equal(t, "executing", plan.Status)
	assert.Contains(t, plan.TargetResources, driftEvent.ResourceAddress)

	// 4. Simulate agent completing the apply
	reportRemediationComplete(t, plan.ID, "success")

	// 5. Verify drift event resolved
	event := getDriftEvent(t, org.ID, driftEvent.ResourceAddress)
	assert.Equal(t, "resolved", event.Status)
	assert.Equal(t, "reverted", event.ResolutionType)

	// 6. Verify final Slack message shows success
	finalUpdate := getLastSlackUpdate(t)
	assert.Contains(t, finalUpdate.Body, "reverted to declared state")
}
```

---

### 5.4 Critical User Journey: Secret Scrubbing End-to-End

**Journey:** State file with secrets → agent processes → drift report transmitted → NO secrets in SaaS database

```go
func TestE2E_SecretScrubbing_NoSecretsReachSaaS(t *testing.T) {
	// Setup: same helpers as the onboarding journey
	org := createTestOrg(t)
	agent := registerAgent(t, org.APIKey)

	// State file contains: master_password = "supersecret123", db_endpoint = "postgres://..."
	uploadStateFixture(t, "testdata/states/rds_with_secrets.tfstate", org.StateBucket)

	// Inject RDS drift (instance class changed)
	injectRDSDrift(t, "mydb", "db.t3.medium", "db.t3.large")

	// Run drift check
	runDriftCheck(t, agent, org.StateBucket)

	// Verify the persisted drift event contains no secret values
	event := getDriftEventByResource(t, org.ID, "aws_db_instance.mydb")
	diffJSON, err := json.Marshal(event.Diff)
	require.NoError(t, err)
	assert.NotContains(t, string(diffJSON), "supersecret123")
	assert.NotContains(t, string(diffJSON), "postgres://")
	assert.Contains(t, string(diffJSON), "[REDACTED]")

	// Verify instance class drift IS present (non-secret attribute preserved)
	assert.Contains(t, string(diffJSON), "db.t3.large")
}
```

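The assertions above depend on the agent redacting both by attribute name (`master_password`) and by value shape (a connection string stored under an innocuous key such as `db_endpoint`). A minimal sketch of that two-rule approach, assuming diffs are decoded JSON (`map[string]any`, `[]any`, strings); the patterns are illustrative rather than the production rule set, and `scrubSecrets` is a hypothetical stand-in for the agent's `ScrubSecrets`:

```go
package main

import (
	"regexp"
	"strings"
)

const redacted = "[REDACTED]"

// Compiled once at package initialisation, never per call.
var (
	// Attribute names whose values are always treated as secrets.
	sensitiveKey = regexp.MustCompile(`(?i)(password|secret|token|private_key|connection_string)`)
	// Values that look like credentials regardless of key: database URIs.
	sensitiveValue = regexp.MustCompile(`(?i)^(postgres(ql)?|mysql|mongodb(\+srv)?|redis)://`)
)

// scrubSecrets returns a deep copy of v with secret values replaced by
// "[REDACTED]". Non-secret attributes are preserved and the input is not mutated.
func scrubSecrets(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			if sensitiveKey.MatchString(k) {
				out[k] = redacted
				continue
			}
			out[k] = scrubSecrets(val)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = scrubSecrets(val)
		}
		return out
	case string:
		if sensitiveValue.MatchString(strings.TrimSpace(t)) {
			return redacted
		}
		return t
	default:
		return t
	}
}
```

Returning a copy rather than redacting in place keeps the scrubber a pure function, which fits the pure-function testing approach used for the comparator and classifier.
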
---

### 5.5 Smoke Tests (Post-Deploy)

Smoke tests run after every production deployment. They hit real endpoints with minimal side effects.

```
smoke/
  health_check_test.go            # GET /health → 200 on all services
  agent_registration_test.go      # Register a smoke-test agent → 200
  heartbeat_test.go               # Send heartbeat → 200
  drift_report_ingestion_test.go  # POST minimal drift report → 202
  dashboard_api_test.go           # GET /v1/stacks (smoke org) → 200
  slack_connectivity_test.go      # Verify Slack OAuth token still valid
```

Smoke tests use a dedicated `smoke-test` organization in production with a pre-provisioned API key. They never write to real customer data.

---

## Section 6: Performance & Load Testing

### 6.1 Scan Duration Benchmarks

**Tool:** Go's built-in `testing.B` for agent benchmarks. `k6` for SaaS API load tests.

**Targets:**

| Scenario | Stack Size | Target Duration | Kill Threshold |
|---|---|---|---|
| Full state parse | 100 resources | < 50ms | > 200ms |
| Full state parse | 500 resources | < 200ms | > 1s |
| Full drift check (parse + poll + compare) | 20 resources | < 5s | > 30s |
| Full drift check | 100 resources | < 30s | > 120s |
| Drift report ingestion (SaaS) | single report | < 200ms p99 | > 1s p99 |
| Drift report ingestion (SaaS) | 100 concurrent | < 500ms p99 | > 2s p99 |

**Go benchmark tests:**

```go
// pkg/agent/bench_test.go
func BenchmarkStateParser_100Resources(b *testing.B) {
	data, err := os.ReadFile("testdata/states/100_resources.tfstate")
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs() // report B/op and allocs/op alongside ns/op
	b.ResetTimer()   // exclude fixture loading from the measurement
	for i := 0; i < b.N; i++ {
		_, _ = ParseState(data)
	}
}

func BenchmarkDriftComparator_100Resources(b *testing.B) {
	stateResources := loadStateFixture("testdata/states/100_resources.tfstate")
	cloudResources := loadCloudFixture("testdata/cloud/100_resources_clean.json")
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = CompareDrift(stateResources, cloudResources)
	}
}

func BenchmarkSecretScrubber_LargeDiff(b *testing.B) {
	diff := loadDiffFixture("testdata/diffs/large_diff_50_attributes.json")
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = ScrubSecrets(diff)
	}
}
```

---

### 6.2 Memory & CPU Profiling

**Goal:** Ensure the agent stays within its ECS task allocation (0.25 vCPU, 512MB) even for large state files.

**Profile targets:**
- State parser memory allocation for 500-resource state files
- Drift comparator heap usage during deep JSON comparison
- Secret scrubber regex compilation (should be compiled once, not per call)

```go
// Run with: go test -memprofile=mem.out -cpuprofile=cpu.out -bench=.
// Analyze with: go tool pprof mem.out

func TestMemoryProfile_LargeStateFile_Under100MB(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping memory profile in short mode")
	}
	// Do not mark this test t.Parallel(): allocations made by concurrently
	// running tests would be counted as well.

	data, err := os.ReadFile("testdata/states/500_resources.tfstate")
	require.NoError(t, err)

	// TotalAlloc is cumulative and never decreases, so a GC cycle during
	// parsing cannot make the delta underflow (HeapAlloc can). The delta is the
	// total bytes ParseState allocated, an upper bound on its heap growth.
	var m runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m)
	before := m.TotalAlloc

	_, err = ParseState(data)
	require.NoError(t, err)

	runtime.ReadMemStats(&m)
	allocatedMB := float64(m.TotalAlloc-before) / 1024 / 1024
	assert.Less(t, allocatedMB, 100.0, "state parser should allocate < 100MB for 500 resources")
}
```

**Regex pre-compilation check:**

```go
func TestSecretScrubber_RegexPrecompiled_NotCompiledPerCall(t *testing.T) {
	// Call the scrubber 1000 times: if regexes are compiled per call, this will be slow
	diff := map[string]interface{}{"password": "test123"}
	start := time.Now()
	for i := 0; i < 1000; i++ {
		ScrubSecrets(diff)
	}
	elapsed := time.Since(start)
	assert.Less(t, elapsed, 100*time.Millisecond, "1000 scrub calls should complete in < 100ms")
}
```

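Wall-clock budgets like the 100ms one above can be noisy on shared CI runners, and the race detector slows everything further. A deterministic alternative is to count heap allocations with the standard library's `testing.AllocsPerRun`: compiling a regexp allocates on every call, while reusing a package-level `*regexp.Regexp` should allocate far less. This sketch assumes the scrubber's key matching can be exercised as a plain function; `passwordKey`, `matchPrecompiled` and `matchCompiledPerCall` are hypothetical names.

```go
package main

import (
	"regexp"
	"testing"
)

// Hypothetical scrubber key pattern, compiled once when the package loads.
var passwordKey = regexp.MustCompile(`(?i)password|secret|token`)

// matchPrecompiled is the intended shape: reuse the package-level regexp.
func matchPrecompiled(key string) bool {
	return passwordKey.MatchString(key)
}

// matchCompiledPerCall is the anti-pattern the check must catch.
func matchCompiledPerCall(key string) bool {
	return regexp.MustCompile(`(?i)password|secret|token`).MatchString(key)
}

// allocsPerCall reports the average number of heap allocations of one call
// of f. testing.AllocsPerRun runs f once as a warm-up before measuring.
func allocsPerCall(f func(string) bool) float64 {
	return testing.AllocsPerRun(100, func() { f("master_password") })
}
```

In the real test suite the assertion would compare `allocsPerCall` for `ScrubSecrets` against a small fixed ceiling, which stays stable regardless of runner load.
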
---

### 6.3 Concurrent Scan Stress Tests

**Goal:** Verify the agent handles concurrent scans (multiple stacks) without race conditions or goroutine leaks.

```go
func TestConcurrentScans_MultipleStacks_NoRaceConditions(t *testing.T) {
	// Run with: go test -race ./...
	// mockAWSClient is shared by every goroutine, so it must itself be safe for concurrent use.
	const numStacks = 10
	var wg sync.WaitGroup
	errCh := make(chan error, numStacks) // buffered so no goroutine blocks on send

	for i := 0; i < numStacks; i++ {
		wg.Add(1)
		go func(stackIdx int) {
			defer wg.Done()
			stateFile := fmt.Sprintf("testdata/states/stack_%d.tfstate", stackIdx)
			if _, err := RunDriftCheck(stateFile, mockAWSClient); err != nil {
				errCh <- fmt.Errorf("stack %d: %w", stackIdx, err)
			}
		}(i)
	}

	wg.Wait()
	close(errCh)
	for err := range errCh {
		t.Errorf("concurrent scan error: %v", err)
	}
}
```

**SaaS load test (k6):**

```javascript
// load-tests/drift-report-ingestion.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 50 }, // ramp up to 50 concurrent agents
    { duration: '60s', target: 50 }, // hold
    { duration: '10s', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(99)<500'], // 99th percentile < 500ms
    http_req_failed: ['rate<0.01'],   // < 1% error rate
  },
};

// buildDriftReport() is a local helper (not shown) that returns a minimal valid drift report object.
export default function () {
  const payload = JSON.stringify(buildDriftReport());
  const res = http.post(`${__ENV.API_URL}/v1/drift-reports`, payload, {
    headers: { 'Authorization': `Bearer ${__ENV.API_KEY}`, 'Content-Type': 'application/json' },
  });
  check(res, { 'status is 202': (r) => r.status === 202 });
}
```

---

## Section 7: CI/CD Pipeline Integration

### 7.1 Test Stages

```
┌─────────────────────────────────────────────────────────────────┐
│ PRE-COMMIT (local, < 30s)                                       │
│ • golangci-lint (Go)                                            │
│ • eslint + tsc --noEmit (TypeScript)                            │
│ • go test -short ./... (unit tests only, no I/O)                │
│ • Feature flag TTL audit (make flag-audit)                      │
│ • Decision log presence check (PRs touching pkg/detection/)     │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ PR (GitHub Actions, < 5 min)                                    │
│ • Full unit test suite (Go + TypeScript)                        │
│ • go test -race ./... (race detector)                           │
│ • Coverage gate: fail if < 80% overall, < 100% on scrubber      │
│ • Schema migration lint (no destructive changes)                │
│ • Snapshot test diff check (Block Kit formatter)                │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ MERGE TO MAIN (GitHub Actions, < 10 min)                        │
│ • All unit tests                                                │
│ • Integration tests (Testcontainers: PostgreSQL + DynamoDB)     │
│ • LocalStack integration tests (S3, SQS, EC2 mock)              │
│ • RLS isolation tests (multi-tenant)                            │
│ • Docker build + Trivy scan                                     │
│ • Go benchmark regression check (fail if > 20% slower)          │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGING DEPLOY (< 15 min)                                       │
│ • E2E test suite against staging environment                    │
│ • Smoke tests (all health endpoints)                            │
│ • Secret scrubbing E2E test                                     │
│ • Multi-tenant isolation E2E test                               │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ (manual approval gate)
┌─────────────────────────────────────────────────────────────────┐
│ PRODUCTION DEPLOY                                               │
│ • Smoke tests post-deploy                                       │
│ • Canary: route 5% traffic to new version for 10 min            │
│ • Auto-rollback if smoke tests fail                             │
└─────────────────────────────────────────────────────────────────┘
```

---

### 7.2 Coverage Thresholds & Gates

```yaml
# .github/workflows/test.yml (coverage gate step)
- name: Check coverage thresholds
  run: |
    # Go agent
    go test -coverprofile=coverage.out ./...
    go tool cover -func=coverage.out | grep "total:" | awk '{print $3}' | \
      awk -F'%' '{if ($1 < 80) {print "FAIL: Go coverage " $1 "% < 80%"; exit 1}}'

    # Secret scrubber must be 100%, checked per function ($NF is the percentage column).
    # Also fails if no scrubber functions appear in the report at all.
    go tool cover -func=coverage.out | \
      awk '/scrubber/ { found = 1; pct = $NF; sub(/%/, "", pct); if (pct + 0 < 100) { print "FAIL: " $1 " " $2 " coverage " pct "% < 100%"; bad = 1 } } END { if (!found) { print "FAIL: no scrubber functions in coverage report"; exit 1 } exit bad }'

    # TypeScript SaaS
    npx vitest run --coverage
    # vitest.config.ts enforces: lines 80, branches 75, functions 80, statements 80
```

**`vitest.config.ts` coverage config:**

```typescript
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 80,
        branches: 75,
        functions: 80,
        statements: 80,
        // Enforce the thresholds on every file individually, not only on the aggregate
        perFile: true,
      },
    },
  },
})
```

---

### 7.3 Test Parallelization Strategy

**Go:** Tests are parallelized at the package level by default (`go test ./...` runs up to `GOMAXPROCS` test binaries at once; tune with `-p`). Mark individual tests with `t.Parallel()` where safe. Integration tests that share LocalStack state must NOT run in parallel; use build tags to separate them from unit tests.

```go
// Unit tests: always parallel
func TestDriftComparator_AttributeAdded_ReturnsDrift(t *testing.T) {
	t.Parallel()
	// ...
}

// Integration tests: sequential within package, parallel across packages
// go test -p 4 ./... (4 packages in parallel)
```

**Build tags for test separation:**

```go
//go:build integration
// +build integration

// The constraint must precede the package clause and be followed by a blank line.
// The legacy "// +build" line is only needed for Go versions older than 1.17.
// Run with: go test -tags=integration ./...
// Unit only: go test ./... (no tag)
```

**GitHub Actions matrix:**

```yaml
strategy:
  matrix:
    test-suite:
      - unit-go
      - unit-ts
      - integration-go
      - integration-ts
      - e2e
  fail-fast: false  # don't cancel other suites on first failure
```

---

## Section 8: Transparent Factory Tenet Testing

### 8.1 Feature Flag Behavior (Epic 10, Story 10.1)

**Testing OpenFeature Go SDK integration:**

```go
// pkg/flags/flags_test.go
//
// These tests replace the global OpenFeature provider, so they must not use t.Parallel().
// The in-memory provider lives in github.com/open-feature/go-sdk/openfeature/memprovider.

// Test 1: Flag gates new detection rule
func TestFeatureFlag_NewDetectionRule_GatedByFlag(t *testing.T) {
	// Set up: flag "pulumi-support" = false
	provider := memprovider.NewInMemoryProvider(map[string]memprovider.InMemoryFlag{
		"pulumi-support": {
			Key:            "pulumi-support",
			State:          memprovider.Enabled,
			DefaultVariant: "off",
			Variants:       map[string]any{"off": false, "on": true},
		},
	})
	openfeature.SetProvider(provider)

	result := RunDriftCheck(pulumiStateFixture)
	assert.ErrorIs(t, result.Err, ErrIaCToolNotSupported)
	assert.Equal(t, 0, result.DriftedResourceCount)
}

// Test 2: Flag enabled, feature executes
func TestFeatureFlag_NewDetectionRule_ExecutesWhenEnabled(t *testing.T) {
	provider := memprovider.NewInMemoryProvider(map[string]memprovider.InMemoryFlag{
		"pulumi-support": {
			Key:            "pulumi-support",
			State:          memprovider.Enabled,
			DefaultVariant: "on",
			Variants:       map[string]any{"off": false, "on": true},
		},
	})
	openfeature.SetProvider(provider)

	result := RunDriftCheck(pulumiStateFixture)
	require.NoError(t, result.Err)
	assert.Greater(t, result.ResourceCount, 0)
}

// Test 3: Circuit breaker disables flag on false-positive spike
func TestFeatureFlag_CircuitBreaker_TripsOnFalsePositiveSpike(t *testing.T) {
	flag := NewFeatureFlag("new-sg-rule", circuitBreakerConfig{Threshold: 3.0, Window: time.Hour})

	// Simulate 10 dismissals in 1 hour (more than 3x the baseline of ~3)
	for i := 0; i < 10; i++ {
		flag.RecordDismissal()
	}

	assert.False(t, flag.IsEnabled(), "circuit breaker should have tripped")
}
```

**TTL lint test (CI enforcement):**

```go
func TestFeatureFlags_NoExpiredTTLs(t *testing.T) {
	flags := LoadAllFlags("../../config/flags.json")
	for _, flag := range flags {
		if flag.Rollout == 100 {
			assert.True(t, time.Now().Before(flag.TTL),
				"flag %q is at 100%% rollout and past TTL %v: clean it up", flag.Name, flag.TTL)
		}
	}
}
```

---

### 8.2 Schema Migration Validation (Epic 10, Story 10.2)

**Goal:** CI blocks any migration that removes, renames, or changes the type of existing DynamoDB attributes.

```go
// tools/schema-lint/main_test.go

func TestSchemaMigration_AddNewAttribute_IsAllowed(t *testing.T) {
	migration := Migration{
		Changes: []SchemaChange{
			{Type: ChangeTypeAdd, AttributeName: "new_field_v2", AttributeType: "S"},
		},
	}
	err := ValidateMigration(migration, currentSchema)
	assert.NoError(t, err)
}

func TestSchemaMigration_RemoveAttribute_IsRejected(t *testing.T) {
	migration := Migration{
		Changes: []SchemaChange{
			{Type: ChangeTypeRemove, AttributeName: "event_type"},
		},
	}
	err := ValidateMigration(migration, currentSchema)
	assert.ErrorContains(t, err, "destructive schema change: cannot remove attribute 'event_type'")
}

func TestSchemaMigration_RenameAttribute_IsRejected(t *testing.T) {
	migration := Migration{
		Changes: []SchemaChange{
			{Type: ChangeTypeRename, OldName: "payload", NewName: "event_payload"},
		},
	}
	err := ValidateMigration(migration, currentSchema)
	assert.ErrorContains(t, err, "destructive schema change: cannot rename attribute")
}

func TestSchemaMigration_ChangeAttributeType_IsRejected(t *testing.T) {
	migration := Migration{
		Changes: []SchemaChange{
			{Type: ChangeTypeModify, AttributeName: "timestamp", OldType: "S", NewType: "N"},
		},
	}
	err := ValidateMigration(migration, currentSchema)
	assert.ErrorContains(t, err, "destructive schema change: cannot change type of attribute 'timestamp'")
}
```

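The tests above imply a small rule set: additive changes pass, while removals, renames and type changes fail with a `destructive schema change:` prefix. A minimal validator consistent with those assertions, assuming the current schema is available as a map from attribute name to DynamoDB type. The type definitions are reconstructed from the test literals for illustration and are not the repository's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

type ChangeType int

const (
	ChangeTypeAdd ChangeType = iota
	ChangeTypeRemove
	ChangeTypeRename
	ChangeTypeModify
)

type SchemaChange struct {
	Type          ChangeType
	AttributeName string
	AttributeType string // DynamoDB type for adds: "S", "N" or "B"
	OldName       string // renames
	NewName       string
	OldType       string // type modifications
	NewType       string
}

type Migration struct {
	Changes []SchemaChange
}

// ValidateMigration allows only additive changes: every existing attribute
// keeps its name and type, so readers on the previous version keep working
// during a rollout. All violations are reported together.
func ValidateMigration(m Migration, current map[string]string) error {
	var errs []error
	for _, c := range m.Changes {
		switch c.Type {
		case ChangeTypeAdd:
			if _, exists := current[c.AttributeName]; exists {
				errs = append(errs, fmt.Errorf("attribute '%s' already exists", c.AttributeName))
			}
		case ChangeTypeRemove:
			errs = append(errs, fmt.Errorf("destructive schema change: cannot remove attribute '%s'", c.AttributeName))
		case ChangeTypeRename:
			errs = append(errs, fmt.Errorf("destructive schema change: cannot rename attribute '%s' to '%s'", c.OldName, c.NewName))
		case ChangeTypeModify:
			if c.OldType != c.NewType {
				errs = append(errs, fmt.Errorf("destructive schema change: cannot change type of attribute '%s' (%s -> %s)", c.AttributeName, c.OldType, c.NewType))
			}
		default:
			errs = append(errs, fmt.Errorf("unknown change type %d", c.Type))
		}
	}
	return errors.Join(errs...) // nil when there are no violations
}
```

A rename can still be done safely as add-new, dual-write, backfill, then stop reading the old attribute; the lint only forbids collapsing that into one step.
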
---

### 8.3 Decision Log Format Validation (Epic 10, Story 10.3)

```go
// tools/decision-log-lint/main_test.go

func TestDecisionLog_ValidFormat_PassesValidation(t *testing.T) {
	log := DecisionLog{
		Prompt:                 "Why is security group drift classified as critical?",
		Reasoning:              "SG drift is the #1 vector for cloud breaches...",
		AlternativesConsidered: []string{"classify as high", "require manual review"},
		Confidence:             0.9,
		Timestamp:              time.Now(),
		Author:                 "max@dd0c.dev",
	}
	assert.NoError(t, ValidateDecisionLog(log))
}

func TestDecisionLog_MissingReasoning_FailsValidation(t *testing.T) {
	log := DecisionLog{Prompt: "Why?", Confidence: 0.8}
	err := ValidateDecisionLog(log)
	assert.ErrorContains(t, err, "reasoning is required")
}

func TestDecisionLog_ConfidenceOutOfRange_FailsValidation(t *testing.T) {
	log := DecisionLog{Prompt: "Why?", Reasoning: "Because.", Confidence: 1.5}
	err := ValidateDecisionLog(log)
	assert.ErrorContains(t, err, "confidence must be between 0 and 1")
}

// CI check: PRs touching pkg/detection/ must include a decision log
func TestCI_DetectionPackageChange_RequiresDecisionLog(t *testing.T) {
	changedFiles := getChangedFilesInPR()
	touchesDetection := slices.ContainsFunc(changedFiles, func(f string) bool {
		return strings.HasPrefix(f, "pkg/detection/")
	})
	if touchesDetection {
		decisionLogs := findDecisionLogsInPR()
		assert.NotEmpty(t, decisionLogs, "PRs touching pkg/detection/ require a decision log entry")
	}
}
```

---

### 8.4 OTEL Span Assertion Tests (Epic 10, Story 10.4)

**Goal:** Verify that drift classification emits the correct OpenTelemetry spans with required attributes.

```go
// pkg/observability/spans_test.go

func TestOTELSpans_DriftScan_EmitsParentSpan(t *testing.T) {
	// In-memory exporter behind a synchronous processor: spans are recorded
	// as soon as they end, with no batching delay.
	exporter := tracetest.NewInMemoryExporter()
	tp := sdktrace.NewTracerProvider(sdktrace.WithSyncer(exporter))
	otel.SetTracerProvider(tp)

	RunDriftScan(testStateFixture, mockAWSClient)

	spans := exporter.GetSpans()
	parentSpans := filterSpansByName(spans, "drift_scan")
	require.Len(t, parentSpans, 1)
}

func TestOTELSpans_DriftClassification_EmitsChildSpanPerResource(t *testing.T) {
	exporter := tracetest.NewInMemoryExporter()
	// ... tracer provider setup as above ...

	RunDriftScan(stateWith3Resources, mockAWSClient)

	classificationSpans := filterSpansByName(exporter.GetSpans(), "drift_classification")
	assert.Len(t, classificationSpans, 3) // one per resource
}

func TestOTELSpans_ClassificationSpan_HasRequiredAttributes(t *testing.T) {
	// ... exporter setup and scan as above ...
	span := getClassificationSpan(exporter, "aws_security_group.api")

	// tracetest.SpanStub exposes Name and Attributes as fields, not methods
	attrs := span.Attributes
	assert.Equal(t, "aws_security_group", getAttr(attrs, "drift.resource_type"))
	assert.NotEmpty(t, getAttr(attrs, "drift.severity_score"))
	assert.NotEmpty(t, getAttr(attrs, "drift.classification_reason"))
	// No PII: resource ARN must be hashed, not raw
	assert.NotContains(t, getAttr(attrs, "drift.resource_id"), "arn:aws:")
}

func TestOTELSpans_NoCustomerPII_InAnySpan(t *testing.T) {
	// ... exporter setup as above ...
	// Run scan with a state file containing real-looking ARNs
	RunDriftScan(stateWithRealARNs, mockAWSClient)

	// Partition, service (may contain digits, e.g. ec2), region (empty for IAM), 12-digit account ID
	arnPattern := `arn:aws[a-z-]*:[a-z0-9-]+:[a-z0-9-]*:\d{12}:`
	for _, span := range exporter.GetSpans() {
		for _, attr := range span.Attributes {
			// Emit renders any attribute type as a string, not only STRING values
			assert.NotRegexp(t, arnPattern, attr.Value.Emit(),
				"span %q contains unhashed ARN in attribute %q", span.Name, attr.Key)
		}
	}
}
```

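The PII assertions require `drift.resource_id` to carry a hashed form of the ARN, but the document does not specify the hashing scheme. One reasonable sketch, assuming a secret per-deployment key is available from configuration, uses HMAC-SHA-256 from the standard library: identical ARNs still correlate across spans, raw ARNs (which embed the 12-digit account ID) never leave the agent, and the key prevents the dictionary guessing a plain SHA-256 of a predictable ARN would allow. `hashResourceID` and the `h:` prefix are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// hashResourceID maps a cloud resource ID (such as an ARN) to a stable,
// non-reversible token suitable for span attributes. The 16 hex characters
// (64 bits) are plenty for correlating spans while keeping attributes short.
func hashResourceID(key []byte, resourceID string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(resourceID))
	return "h:" + hex.EncodeToString(mac.Sum(nil))[:16]
}
```

Hashing at the point where the classification span is created, rather than in an exporter, keeps raw ARNs out of every backend the spans are sent to.
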
---

### 8.5 Governance Policy Enforcement Tests (Epic 10, Story 10.5)

```go
func TestGovernance_StrictMode_RemediationNeverExecutes(t *testing.T) {
	engine := NewRemediationEngine(Policy{GovernanceMode: "strict"})

	result, err := engine.Revert(criticalDriftEvent)

	require.NoError(t, err) // not an error: the action is blocked by policy
	assert.Equal(t, "blocked_by_policy", result.Status)
	assert.Contains(t, result.Log, "Remediation blocked by strict mode")
	assert.False(t, mockAgentDispatcher.WasCalled())
}

func TestGovernance_CustomerCannotEscalateAboveSystemPolicy(t *testing.T) {
	systemPolicy := Policy{GovernanceMode: "strict"}
	customerPolicy := Policy{GovernanceMode: "audit"} // customer wants less restriction

	merged := MergePolicies(systemPolicy, customerPolicy)
	assert.Equal(t, "strict", merged.GovernanceMode, "customer cannot override system to be less restrictive")
}

func TestGovernance_PanicMode_HaltsAllScansImmediately(t *testing.T) {
	agent := NewDriftAgent(Policy{PanicMode: true})

	result := agent.RunScan(testStateFixture)

	assert.ErrorIs(t, result.Err, ErrPanicModeActive)
	assert.False(t, mockAWSClient.WasCalled(), "no AWS API calls should be made in panic mode")
}

func TestGovernance_PanicMode_SendsExactlyOneNotification(t *testing.T) {
	agent := NewDriftAgent(Policy{PanicMode: true})

	// Run scan 3 times: should only notify once
	for i := 0; i < 3; i++ {
		agent.RunScan(testStateFixture)
	}

	assert.Equal(t, 1, mockNotifier.CallCount(), "panic mode should send exactly one notification")
}
```

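`MergePolicies` is specified only by its test: a customer policy may tighten but never loosen the system policy. One way to make that property hold by construction is to rank modes by restrictiveness and keep the stricter one. This sketch assumes only the two modes that appear in this document (`strict` above `audit`), ignores unknown customer modes rather than trusting them, and uses a stripped-down `Policy` as a stand-in for the real type.

```go
package main

// Policy mirrors only the fields the governance tests use (hypothetical shape).
type Policy struct {
	GovernanceMode string // "strict" or "audit" in this document
	PanicMode      bool
}

// restrictiveness ranks known modes; higher means more restrictive.
var restrictiveness = map[string]int{
	"audit":  1,
	"strict": 2,
}

// MergePolicies combines a system policy with a customer override. The more
// restrictive mode wins, unknown customer modes are ignored, and panic mode
// is active if either side enables it.
func MergePolicies(system, customer Policy) Policy {
	merged := system
	if r, ok := restrictiveness[customer.GovernanceMode]; ok && r > restrictiveness[system.GovernanceMode] {
		merged.GovernanceMode = customer.GovernanceMode
	}
	merged.PanicMode = system.PanicMode || customer.PanicMode
	return merged
}
```

Because the merge can only move toward the stricter end, adding a new mode later means placing it in the ranking rather than adding special cases to every caller.
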
---

## Section 9: Test Data & Fixtures

### 9.1 Directory Structure

```
testdata/
  states/                               # Terraform state v4 fixtures
    single_sg.tfstate                   # 1 resource: aws_security_group
    single_rds.tfstate                  # 1 resource: aws_db_instance (with secrets)
    prod_networking.tfstate             # 23 resources: VPC, SGs, subnets, routes
    prod_compute.tfstate                # 47 resources: EC2, IAM, Lambda, ECS
    100_resources.tfstate               # benchmark fixture
    500_resources.tfstate               # benchmark fixture
    module_nested.tfstate               # module-prefixed addresses
    for_each_resources.tfstate          # for_each instances
    v3_format.tfstate                   # invalid: old format (should error)
    rds_with_secrets.tfstate            # contains master_password, connection strings
    opentofu_state.tfstate              # OpenTofu-generated state

  aws-responses/                        # Recorded AWS API responses (go-vcr cassettes)
    ec2/
      describe_sg_clean.json            # cloud matches state
      describe_sg_ingress_added.json    # 0.0.0.0/0 rule added
      describe_sg_ingress_removed.json  # rule removed
      describe_sg_not_found.json        # resource deleted from cloud
    iam/
      get_role_clean.json
      get_role_policy_changed.json
      get_role_not_found.json
    rds/
      describe_db_instances_clean.json
      describe_db_instances_class_changed.json
      describe_db_instances_publicly_accessible.json  # critical: made public

  diffs/                                # Pre-computed drift diff fixtures
    sg_ingress_added_critical.json
    iam_policy_changed_high.json
    rds_class_changed_high.json
    tag_only_change_low.json
    large_diff_50_attributes.json       # benchmark fixture

  wiremock/
    slack/
      post_message_success.json
      post_message_rate_limited.json
      post_message_channel_not_found.json
      interactions_revert_payload.json
    github/
      create_branch_success.json
      create_pr_success.json
      create_pr_repo_not_found.json

  policies/
    strict_mode.json
    audit_mode.json
    auto_revert_critical.json
    require_approval_iam.json
```

---

### 9.2 State File Factory (Go)

A factory package generates synthetic Terraform state files for tests. This avoids brittle fixture files that break when the state format changes.

```go
// testutil/statefactory/factory.go

type StateFactory struct {
	version          int
	terraformVersion string
	resources        []StateResource
}

func NewStateFactory() *StateFactory {
	return &StateFactory{version: 4, terraformVersion: "1.7.0"}
}

func (f *StateFactory) WithSecurityGroup(name, vpcID string, ingress []IngressRule) *StateFactory {
	f.resources = append(f.resources, StateResource{
		Mode:     "managed",
		Type:     "aws_security_group",
		Name:     name,
		Provider: "registry.terraform.io/hashicorp/aws",
		Instances: []ResourceInstance{{
			Attributes: map[string]interface{}{
				"id":      fmt.Sprintf("sg-%s", randID()),
				"name":    name,
				"vpc_id":  vpcID,
				"ingress": ingress,
				"egress":  defaultEgressRules(),
				"tags":    map[string]string{"ManagedBy": "terraform"},
			},
		}},
	})
	return f
}

func (f *StateFactory) WithIAMRole(name, assumeRolePolicy string) *StateFactory { /* ... */ }
func (f *StateFactory) WithRDSInstance(id, instanceClass string) *StateFactory { /* ... */ }
func (f *StateFactory) WithSecret(key, value string) *StateFactory { /* injects secret into last resource */ }
func (f *StateFactory) Build() []byte { /* marshals to JSON */ }

// Usage in tests:
state := NewStateFactory().
	WithSecurityGroup("api", "vpc-abc123", []IngressRule{{Port: 443, CIDR: "10.0.0.0/8"}}).
	WithIAMRole("lambda-exec", assumeRolePolicyJSON).
	Build()
```

---

### 9.3 Cloud Response Factory (Go)

Mirrors the state factory but for AWS API responses. Used to simulate clean vs. drifted cloud state.

```go
// testutil/cloudfactory/factory.go
package cloudfactory

import (
	"github.com/aws/aws-sdk-go-v2/aws"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// SGOption mutates a security group response, e.g. to inject drift.
type SGOption func(*ec2types.SecurityGroup)

type CloudResponseFactory struct{}

func (f *CloudResponseFactory) SecurityGroup(id string, opts ...SGOption) *ec2types.SecurityGroup {
	sg := &ec2types.SecurityGroup{GroupId: aws.String(id) /* defaults */}
	for _, opt := range opts {
		opt(sg)
	}
	return sg
}

// Options for injecting drift:
func WithPublicIngress(port int) SGOption {
	return func(sg *ec2types.SecurityGroup) {
		sg.IpPermissions = append(sg.IpPermissions, ec2types.IpPermission{
			IpProtocol: aws.String("tcp"),
			FromPort:   aws.Int32(int32(port)),
			ToPort:     aws.Int32(int32(port)),
			IpRanges:   []ec2types.IpRange{{CidrIp: aws.String("0.0.0.0/0")}},
		})
	}
}

func WithInstanceClassChanged(newClass string) RDSOption { /* ... */ }
func WithPolicyDocumentChanged(newPolicy string) IAMOption { /* ... */ }
```

---

### 9.4 Drift Scenario Fixtures
|
|||
|
|
|
|||
|
|
Pre-built scenarios covering the most common real-world drift patterns. Each scenario includes: state file, cloud response, expected diff, expected severity.
|
|||
|
|
|
|||
|
|
| Scenario | State Fixture | Cloud Response | Expected Severity | Category |
|
|||
|
|
|---|---|---|---|---|
|
|||
|
|
| Security group: public HTTPS ingress added | `sg_private.tfstate` | `sg_public_443.json` | critical | security |
|
|||
|
|
| Security group: SSH port opened to world | `sg_no_ssh.tfstate` | `sg_ssh_open.json` | critical | security |
|
|||
|
|
| IAM role: `*:*` policy attached | `iam_role_scoped.tfstate` | `iam_role_star_star.json` | critical | security |
|
|||
|
|
| S3 bucket: public access enabled | `s3_private.tfstate` | `s3_public.json` | critical | security |
|
|||
|
|
| RDS: made publicly accessible | `rds_private.tfstate` | `rds_public.json` | critical | security |
|
|||
|
|
| Lambda: runtime changed (python3.8 → python3.12) | `lambda_py38.tfstate` | `lambda_py312.json` | high | configuration |
|
|||
|
|
| ECS service: task count changed (2 → 5) | `ecs_2tasks.tfstate` | `ecs_5tasks.json` | low | scaling |
|
|||
|
|
| EC2 instance: instance type changed | `ec2_t3medium.tfstate` | `ec2_t3large.json` | high | configuration |
|
|||
|
|
| Route53: TTL changed (300 → 60) | `r53_ttl300.tfstate` | `r53_ttl60.json` | medium | configuration |
|
|||
|
|
| Tags: Environment tag changed | `tags_prod.tfstate` | `tags_staging.json` | low | tags |
|
|||
|
|
| Resource deleted from cloud | `sg_exists.tfstate` | `sg_not_found.json` | high | configuration |
---

### 9.5 TypeScript Test Helpers

```typescript
// test/helpers/factories.ts
import { randomUUID } from 'node:crypto'
// Assumed location of the domain types; adjust to the real module path.
import type { DriftEvent, Organization, Stack } from '../../src/types'

export const buildDriftEvent = (overrides: Partial<DriftEvent> = {}): DriftEvent => ({
  id: `evt_${randomUUID()}`,
  orgId: 'org_test_001',
  stackId: 'stack_prod_networking',
  resourceAddress: 'aws_security_group.api',
  resourceType: 'aws_security_group',
  severity: 'critical',
  category: 'security',
  status: 'open',
  diff: {
    ingress: {
      old: [{ from_port: 443, cidr_blocks: ['10.0.0.0/8'] }],
      new: [{ from_port: 443, cidr_blocks: ['10.0.0.0/8', '0.0.0.0/0'] }],
    },
  },
  attribution: {
    // AWS account IDs are 12 digits; keep the fixture ARN well-formed.
    principal: 'arn:aws:iam::123456789012:user/jsmith',
    sourceIp: '192.168.1.1',
    eventName: 'AuthorizeSecurityGroupIngress',
    attributedAt: new Date().toISOString(),
  },
  createdAt: new Date().toISOString(),
  ...overrides,
})

export const buildOrg = (overrides: Partial<Organization> = {}): Organization => ({
  id: `org_${randomUUID()}`,
  name: 'Test Org',
  slug: 'test-org',
  plan: 'starter',
  maxStacks: 10,
  pollIntervalS: 300,
  ...overrides,
})

export const buildStack = (orgId: string, overrides: Partial<Stack> = {}): Stack => ({
  id: `stack_${randomUUID()}`,
  orgId,
  name: 'prod-networking',
  backendType: 's3',
  backendHash: 'abc123def456',
  iacTool: 'terraform',
  environment: 'prod',
  driftScore: 100.0,
  resourceCount: 23,
  driftedCount: 0,
  ...overrides,
})
```
---

## Section 10: TDD Implementation Order

### 10.1 Bootstrap Sequence (Test Infrastructure First)

Before writing a single product test, the test infrastructure itself must be bootstrapped. This is the meta-TDD step.

```
Week 0 — Test Infrastructure Bootstrap
────────────────────────────────────────
1. Set up Go test project structure
   • testutil/ package with state factory, cloud factory
   • testdata/ directory with initial fixture files
   • golangci-lint config (.golangci.yml)
   • go test -race baseline (should pass with zero tests)

2. Set up TypeScript test project
   • vitest.config.ts with coverage thresholds
   • test/helpers/factories.ts with builder functions
   • ESLint + tsc --noEmit in CI

3. Set up Docker Compose test environment
   • docker-compose.test.yml (LocalStack, PostgreSQL, WireMock)
   • Makefile targets: make test-unit, make test-integration, make test-e2e

4. Set up CI pipeline skeleton
   • GitHub Actions workflow with test stages
   • Coverage reporting (codecov or similar)
   • Feature flag TTL lint check
```
---

### 10.2 Epic-by-Epic TDD Order

The implementation order follows epic dependencies. Tests are written before code at each step.

```
Phase 1: Agent Core (Weeks 1–2)
────────────────────────────────
Write tests first, then implement:

 1. TestStateParser_* (Epic 1, Story 1.1)
    → Implement StateParser
    → Fixture: single_sg.tfstate, module_nested.tfstate

 2. TestDriftComparator_* (Epic 1, Story 1.3)
    → Implement DriftComparator
    → Depends on: StateParser (need parsed state to compare)

 3. TestSecretScrubber_* (Epic 1, Story 1.4) ← ALL 16 tests before any code
    → Implement SecretScrubber
    → This is the highest-risk component. Write every test case first.

 4. TestDriftClassifier_* (Epic 3, Story 3.2)
    → Implement DriftClassifier with YAML rules
    → Depends on: DriftComparator output format

 5. TestAWSPolling_* (Epic 1, Story 1.2) ← Integration tests lead here
    → Implement AWS resource polling for top 5 resource types
    → Use recorded HTTP fixtures (go-vcr)
    → Add remaining 15 resource types iteratively

Phase 2: Agent Communication (Week 2)
───────────────────────────────────────
 6. TestTransmitter_* (Epic 2, Story 2.2)
    → Implement HTTPS transmitter with mTLS
    → Depends on: SecretScrubber (scrub before transmit)

 7. TestAgentRegistration_* (Epic 2, Story 2.1)
    → Implement agent registration flow
    → Depends on: Transmitter

 8. TestHeartbeat_* (Epic 2, Story 2.3)
    → Implement heartbeat goroutine
    → Depends on: AgentRegistration

Phase 3: SaaS Ingestion Pipeline (Weeks 2–3)
──────────────────────────────────────────────
 9. TestEventProcessor_Validation_* (Epic 3, Story 3.1)
    → Implement zod schema validation
    → Write tests for every invalid payload shape

10. TestDynamoDBEventStore_* (Epic 3, Story 3.3) ← Integration tests with Testcontainers
    → Implement DynamoDB persistence
    → Depends on: DynamoDB Local container running

11. TestPostgreSQL_RLS_* (Epic 3, Story 3.3) ← Integration tests with Testcontainers
    → Apply schema migrations
    → Write multi-tenant isolation tests BEFORE any API handlers

12. TestDriftScorer_* (Epic 3, Story 3.4)
    → Implement drift score calculation
    → Depends on: PostgreSQL schema (reads/writes stacks table)

Phase 4: Notifications (Week 3)
─────────────────────────────────
13. TestNotificationFormatter_* (Epic 4, Story 4.1)
    → Implement Block Kit formatter
    → Snapshot tests for output JSON

14. TestSlackDelivery_* (Epic 4, Story 4.2) ← Integration with WireMock
    → Implement Slack API client
    → Depends on: Formatter output

15. TestNotificationBatching_* (Epic 4, Story 4.4)
    → Implement digest queue logic
    → Depends on: Slack delivery working

Phase 5: Dashboard API (Weeks 3–4)
────────────────────────────────────
16. TestDashboardAuth_* (Epic 5, Story 5.1)
    → Implement Cognito JWT middleware
    → Implement RLS context-setting middleware
    → Write auth tests before any route handlers

17. TestStackEndpoints_* (Epic 5, Story 5.2)
    → Implement GET/PATCH /v1/stacks
    → Depends on: Auth middleware + PostgreSQL

18. TestDriftEventEndpoints_* (Epic 5, Story 5.3)
    → Implement GET /v1/drift-events with filters
    → Depends on: Stack endpoints

Phase 6: Slack Bot & Remediation (Week 4)
───────────────────────────────────────────
19. TestSlackInteraction_SignatureValidation_* (Epic 7, Story 7.1)
    → Implement signature verification FIRST
    → Write tests for valid and invalid signatures before any callback logic

20. TestRemediationEngine_* (Epic 7, Stories 7.1–7.2)
    → Implement revert and accept workflows
    → Depends on: Slack interaction handler, PostgreSQL remediation_plans table

21. TestPolicyEngine_* (Epic 10, Story 10.5)
    → Implement governance policy enforcement
    → Wrap remediation engine with policy checks

Phase 7: Transparent Factory Tenets (Week 4, parallel)
────────────────────────────────────────────────────────
22. TestFeatureFlag_* (Epic 10, Story 10.1)
    → Integrate OpenFeature SDK
    → Write flag tests alongside each new feature (not at the end)

23. TestOTELSpans_* (Epic 10, Story 10.4)
    → Add OTEL instrumentation to drift scan
    → Write span assertion tests

24. TestSchemaMigration_* (Epic 10, Story 10.2)
    → Implement schema lint tool
    → Add to CI pipeline

25. TestDecisionLog_* (Epic 10, Story 10.3)
    → Implement decision log validator
    → Add PR template check to CI

Phase 8: E2E & Performance (Weeks 4–5)
────────────────────────────────────────
26. E2E: Onboarding flow (install → detect → notify)
    → Requires all Phase 1–4 components working
    → First E2E test written after unit + integration tests pass

27. E2E: Remediation round-trip (Slack → apply → resolve)
    → Requires Phase 5–6 components

28. Performance benchmarks
    → Run after correctness is established
    → Fail CI if any benchmark regresses > 20% vs. baseline
```
---

### 10.3 Test Dependency Graph

```
StateParser ──────────────────────────────────────────────────┐
    │                                                         │
    ▼                                                         ▼
DriftComparator ──► SecretScrubber ──► Transmitter ──► E2E: Onboarding
    │
    ▼
DriftClassifier ──► DriftScorer ──► DynamoDB EventStore ──► Dashboard API
    │
    ▼
PostgreSQL RLS ──► Auth Middleware
    │
    ▼
Slack Formatter
    │
    ▼
Slack Delivery
    │
    ▼
Remediation Engine ──► E2E: Revert
    │
    ▼
Policy Engine
```
---

### 10.4 "Never Ship Without" Checklist

Before any code ships to production, these tests must be green:

```
□ TestSecretScrubber_* — all 16 tests passing (100% coverage)
□ TestPostgreSQL_RLS_CrossTenantIsolation — org A cannot read org B data
□ TestTransmitter_mTLSCertPresented_OnEveryRequest
□ TestGovernance_StrictMode_RemediationNeverExecutes
□ TestE2E_SecretScrubbing_NoSecretsReachSaaS
□ TestE2E_MultiTenantIsolation_OrgACannotSeeOrgBEvents
□ go test -race ./... — zero race conditions
□ Coverage gate: ≥ 80% overall, 100% on scrubber
□ Schema migration lint: no destructive changes
□ Feature flag TTL audit: no expired flags at 100% rollout
```
---

*Document complete. Total estimated test count at V1 launch: ~500 tests. Target by month 3: ~1,000 tests.*