Implement review remediation + PLG analytics SDK

- All 6 test architectures patched with Section 11 addendums
- P5 (cost) fully rewritten from 232 to ~600 lines
- PLG brainstorm + party mode advisory board results
- Analytics SDK v2 (PostHog Cloud, Zod strict, Lambda-safe)
- Analytics tests v2 (safeParse, no , no timestamp, no PII)
- Addresses all Gemini review findings across P1-P6
2026-03-01 01:42:49 +00:00
parent 2fe0ed856e
commit 03bfe931fc
9 changed files with 2950 additions and 85 deletions
--- a/products/04-lightweight-idp/test-architecture/test-architecture.md
+++ b/products/04-lightweight-idp/test-architecture/test-architecture.md
@@ -1107,3 +1107,161 @@ Phase 7: E2E Validation
 ---

 *End of dd0c/portal Test Architecture*
+
+---
+
+## 11. Review Remediation Addendum (Post-Gemini Review)
+
+### 11.1 Resolve Database Misalignment (PostgreSQL vs DynamoDB)
+
+Epic 10.2 specified DynamoDB Single-Table, but the Architecture and Test Architecture are fundamentally built around PostgreSQL (Aurora Serverless v2) with pgvector. 
+**Resolution:** The IDP requires relational joins and vector search. PostgreSQL is the definitive catalog database. DynamoDB references are removed.
+
+```rust
+// tests/schema/migration_validation_test.rs
+
+#[tokio::test]
+async fn elastic_schema_postgres_migration_is_additive_only() {
+    let migrations = read_sql_migrations("./migrations");
+    for migration in migrations {
+        // SQL keywords are case-insensitive, so normalize before matching
+        let sql = migration.to_uppercase();
+        assert!(!sql.contains("DROP COLUMN"), "Destructive schema change detected");
+        assert!(!sql.contains("ALTER COLUMN"), "Type modification detected");
+        assert!(!sql.contains("RENAME COLUMN"), "Column rename detected");
+    }
+}
+
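+#[tokio::test]
+async fn migration_does_not_hold_exclusive_locks_on_reads() {
+    // A plain CREATE INDEX blocks writes while it builds; require CONCURRENTLY on every index.
+    for migration in read_sql_migrations("./migrations") {
+        for statement in migration.to_uppercase().split(';') {
+            if statement.contains("CREATE INDEX") || statement.contains("CREATE UNIQUE INDEX") {
+                assert!(statement.contains("CONCURRENTLY"),
+                    "Indexes must be created concurrently to avoid locking the catalog");
+            }
+        }
+    }
+}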
+```
+
+### 11.2 Invert the Test Pyramid (Integration Honeycomb)
+
+Shift from 70% Unit (with heavy moto/responses mocking) to 30/60/10 with VCR and LocalStack.
+
+```python
+# tests/integration/scanners/test_aws_scanner.py
+
+@pytest.mark.vcr()
+def test_aws_scanner_discovers_ecs_services_and_api_gateways(vcr_cassette):
+    # Uses real recorded AWS API responses, not moto mocks
+    # Validates actual boto3 parsing against real-world AWS shapes
+    scanner = AWSDiscoveryScanner(account_id="123456789012", region="us-east-1")
+    services = scanner.scan()
+    assert len(services) > 0
+    assert any(s.type == "ecs_service" for s in services)
+
+@pytest.mark.vcr()
+def test_github_scanner_handles_graphql_pagination(vcr_cassette):
+    # Validates real GitHub GraphQL paginated responses
+    scanner = GitHubDiscoveryScanner(org_name="dd0c")
+    repos = scanner.scan()
+    assert len(repos) > 100 # Proves pagination logic works
+```
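Review note: the pagination test only proves something if the scanner actually follows cursors until the last page. A minimal sketch of that loop follows, assuming a hypothetical `fetch_page(cursor)` callable that returns a dict shaped like GitHub's GraphQL connection (`nodes` plus `pageInfo` with `hasNextPage` and `endCursor`). `collect_all_repos` and `make_fake_source` are illustrative names, not part of the dd0c scanner.

```python
from typing import Callable, Optional


def collect_all_repos(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Follow endCursor until hasNextPage is false, accumulating nodes."""
    repos = []
    cursor = None  # None requests the first page
    while True:
        page = fetch_page(cursor)
        repos.extend(page["nodes"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return repos
        cursor = info["endCursor"]


def make_fake_source(total: int, page_size: int = 100):
    """Hypothetical in-memory stand-in for a paginated API (not a real client)."""
    names = [f"repo-{i}" for i in range(total)]

    def fetch_page(cursor: Optional[str]) -> dict:
        start = int(cursor) if cursor else 0
        end = min(start + page_size, total)
        return {
            "nodes": names[start:end],
            "pageInfo": {"hasNextPage": end < total, "endCursor": str(end)},
        }

    return fetch_page
```

A cassette recorded against an organization with more than 100 repositories would exercise the same loop with real responses.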
+
+### 11.3 Missing Epic Coverage
+
+#### Epic 3.4: PagerDuty & OpsGenie Integrations
+```python
+# tests/integration/test_pagerduty_sync.py
+
+@pytest.mark.vcr()
+def test_pagerduty_sync_maps_schedules_to_catalog_teams():
+    sync = PagerDutySyncer(api_key="sk-test-key")
+    teams = sync.fetch_oncall_schedules()
+    assert teams[0].oncall_email is not None
+
+def test_pagerduty_credentials_are_encrypted_at_rest():
+    # Verify KMS envelope encryption for 3rd party API keys
+    pass
+```
+
+#### Epic 4.3: Redis Prefix Caching for Cmd+K
+```python
+# tests/integration/test_search_cache.py
+
+def test_cmd_k_search_hits_redis_cache_before_postgres():
+    redis_client.set("search:auth", json.dumps([{"name": "auth-service"}]))
+    # Must return < 5ms from Redis, skipping DB
+    result = search_api.query("auth")
+    assert result[0]['name'] == "auth-service"
+
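+def test_catalog_update_invalidates_search_cache():
+    # Seed a cached prefix so the final assertion cannot pass vacuously
+    redis_client.set("search:bill", json.dumps([{"name": "billing-old"}]))
+    # Create new service
+    catalog_api.create_service("billing-api")
+    # Prefix cache must be purged
+    assert redis_client.keys("search:*") == []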
+```
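Review note: the two cache tests describe a cache-aside lookup keyed by `search:<prefix>` with a full purge on catalog writes. A minimal sketch of that behavior, assuming a plain dict as a stand-in for Redis and a hypothetical `db_search` callable in place of the Postgres query. `PrefixSearchCache` is an illustrative name, not the project's implementation.

```python
import json


class PrefixSearchCache:
    """Hypothetical cache-aside wrapper; `store` is a dict standing in for Redis."""

    def __init__(self, store: dict, db_search):
        self.store = store
        self.db_search = db_search  # callable: prefix -> list of result dicts

    def query(self, prefix: str) -> list:
        key = f"search:{prefix}"
        cached = self.store.get(key)
        if cached is not None:
            # Cache hit: the database is never consulted
            return json.loads(cached)
        results = self.db_search(prefix)
        self.store[key] = json.dumps(results)
        return results

    def invalidate_all(self) -> None:
        """Purge every cached prefix, as a catalog write would require."""
        for key in [k for k in self.store if k.startswith("search:")]:
            del self.store[key]
```

Purging every prefix is the simplest correct policy; invalidating only the prefixes of the changed service name would be a possible refinement.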
+
+#### Epics 5 & 6: UI and Dashboards (Playwright)
+```typescript
+// tests/e2e/ui/catalog.spec.ts
+
+test('service catalog renders progressive disclosure UI', async ({ page }) => {
+  await page.goto('/catalog');
+  // Click expands details instead of navigating away
+  await page.click('[data-testid="service-row-auth-api"]');
+  await expect(page.locator('[data-testid="service-drawer"]')).toBeVisible();
+});
+
+test('dashboard KPI aggregation shows total services and ownership coverage', async ({ page }) => {
+  await page.goto('/dashboard');
+  await expect(page.locator('[data-testid="kpi-total-services"]')).toHaveText("150");
+  await expect(page.locator('[data-testid="kpi-ownership"]')).toHaveText("85%");
+});
+```
+
+#### Epic 9: Onboarding & Stripe
+```python
+# tests/integration/test_stripe_webhooks.py
+
+def test_stripe_checkout_completed_upgrades_tenant_tier():
+    payload = load_fixture("stripe_checkout_completed.json")
+    secret = "whsec_test_secret"  # test-only webhook signing secret
+    signature = generate_stripe_signature(payload, secret)
+    
+    response = api_client.post("/webhooks/stripe", data=payload, headers={"Stripe-Signature": signature})
+    assert response.status_code == 200
+    
+    tenant = db.get_tenant("t-123")
+    assert tenant.tier == "pro"
+
+def test_websocket_streams_discovery_progress_during_onboarding():
+    # Connect WS client, trigger discovery, assert WS receives "discovering AWS...", "found 50 resources..."
+    pass
+```
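Review note: the webhook test depends on a `generate_stripe_signature` helper that the diff does not define. One possible shape for it is sketched below, based on Stripe's documented scheme: the `Stripe-Signature` header carries `t=<timestamp>,v1=<hex HMAC-SHA256>` computed over `"<timestamp>.<payload>"` with the endpoint secret. The `verify_stripe_signature` counterpart and the 300-second tolerance are assumptions for illustration; production code would normally use the official Stripe library.

```python
import hashlib
import hmac
import time
from typing import Optional


def generate_stripe_signature(payload: str, secret: str,
                              timestamp: Optional[int] = None) -> str:
    """Build a Stripe-Signature header value for a test payload."""
    ts = int(time.time()) if timestamp is None else timestamp
    signed_payload = f"{ts}.{payload}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), signed_payload,
                      hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"


def verify_stripe_signature(payload: str, header: str, secret: str,
                            tolerance_seconds: int = 300,
                            now: Optional[int] = None) -> bool:
    """Hypothetical verifier: recompute the HMAC and reject stale timestamps."""
    fields = dict(part.split("=", 1) for part in header.split(","))
    ts = int(fields["t"])
    expected = generate_stripe_signature(payload, secret, ts).split("v1=", 1)[1]
    current = int(time.time()) if now is None else now
    fresh = abs(current - ts) <= tolerance_seconds
    # compare_digest avoids leaking match position through timing
    return fresh and hmac.compare_digest(expected, fields.get("v1", ""))
```

A negative test that posts a tampered payload and expects a 4xx response would complement the happy-path test.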
+
+### 11.4 Scaled Performance Benchmarks
+```python
+# tests/performance/test_discovery_scale.py
+
+def test_discovery_pipeline_handles_10000_aws_resources_without_step_functions_payload_limit():
+    # Simulate an AWS environment with 10k resources
+    # Must chunk state machine transitions to stay under 256KB Step Functions limit
+    pass
+
+def test_discovery_pipeline_handles_1000_github_repos():
+    # Verify GraphQL batching and rate limit backoff
+    pass
+```
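Review note: the 10k-resource benchmark hinges on chunking discovery results so that no single state transition exceeds the 256 KB Step Functions payload limit. A minimal sketch of size-aware batching follows, assuming resources are JSON-serializable dicts passed as a compact JSON array. `chunk_by_payload_size` is a hypothetical helper, and a real pipeline would likely leave headroom for the rest of the state input.

```python
import json

STEP_FUNCTIONS_PAYLOAD_LIMIT = 256 * 1024  # bytes


def chunk_by_payload_size(resources, max_bytes=STEP_FUNCTIONS_PAYLOAD_LIMIT):
    """Greedily pack resources into batches whose compact JSON array fits max_bytes."""
    batches = []
    current = []
    current_size = 2  # the enclosing "[" and "]"
    for resource in resources:
        encoded = json.dumps(resource, separators=(",", ":")).encode("utf-8")
        item_size = len(encoded)
        if item_size + 2 > max_bytes:
            raise ValueError("single resource exceeds payload limit")
        extra = item_size + (1 if current else 0)  # comma between items
        if current_size + extra > max_bytes:
            # Close the current batch and start a new one with this resource
            batches.append(current)
            current = []
            current_size = 2
            extra = item_size
        current.append(resource)
        current_size += extra
    if current:
        batches.append(current)
    return batches
```

Each batch can then be passed to its own iteration of the state machine, or written to S3 and passed by reference if even a batch of one proves too large.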
+
+### 11.5 Edge Case Resilience
+```python
+def test_github_graphql_concurrent_rate_limiting():
+    # If 5 tenants scan concurrently, respect Retry-After headers across workers
+    pass
+
+def test_partial_discovery_scan_does_not_corrupt_catalog():
+    # If GitHub scan times out halfway, existing services must NOT be marked stale
+    pass
+
+def test_ownership_conflict_resolution():
+    # If two discovery sources claim the same repo, prioritize Explicit (Config) over Implicit (Tags)
+    pass
+
+def test_meilisearch_index_rebuild_does_not_drop_search():
+    # Verify zero-downtime index swapping during mapping updates
+    pass
+```
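Review note: the ownership conflict stub states the rule (Explicit config beats Implicit tags) without showing it. A minimal sketch of that precedence follows, assuming each discovery source emits a claim tagged with its origin. `OwnershipClaim`, `SOURCE_PRIORITY`, and `resolve_ownership` are hypothetical names for illustration only.

```python
from dataclasses import dataclass

# Assumed ordering: lower number wins, so explicit config outranks inferred tags
SOURCE_PRIORITY = {"config": 0, "tags": 1}


@dataclass(frozen=True)
class OwnershipClaim:
    repo: str
    team: str
    source: str  # "config" (explicit) or "tags" (implicit)


def resolve_ownership(claims):
    """Return {repo: team}, keeping the highest-priority claim per repo."""
    winners = {}
    for claim in claims:
        best = winners.get(claim.repo)
        # Strict comparison means the first claim wins a tie between equal sources
        if best is None or SOURCE_PRIORITY[claim.source] < SOURCE_PRIORITY[best.source]:
            winners[claim.repo] = claim
    return {repo: claim.team for repo, claim in winners.items()}
```

The test body could build two conflicting claims for one repo and assert that the config-sourced team is the one stored in the catalog.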