
dd0c Platform — PLG Instrumentation Brainstorm

Session: Carson (Brainstorming Coach) — Cross-Product PLG Analytics
Date: March 1, 2026
Scope: All 6 dd0c products


The Problem

We built 6 products with onboarding flows, free tiers, and Stripe billing — but zero product analytics. We can't answer:

  • How many users hit "aha moment" vs. bounce?
  • Where in the funnel do free users drop off before upgrading?
  • Which features drive retention vs. which are ignored?
  • Are users churning because of alert fatigue, false positives, or just not getting value?
  • What's our time-to-first-value per product?

Without instrumentation, PLG iteration is guesswork.


Brainstorm: What to Instrument

1. Unified Event Taxonomy

Every dd0c product shares a common event naming convention:

<domain>.<object>.<action>

Examples:
  account.signup.completed
  account.aws.connected
  anomaly.alert.sent
  anomaly.alert.snoozed
  slack.bot.installed
  billing.checkout.started
  billing.upgrade.completed
  feature.flag.evaluated

Rules:

  • Past tense for completed actions (completed, sent, clicked)
  • Present tense for state changes (active, learning, paused)
  • Always include tenant_id, timestamp, product (route/drift/alert/portal/cost/run)
  • Never include PII — hash emails, account IDs
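As a sketch of how the shared SDK might enforce these rules (the helper names here are illustrative, not an existing API), a small TypeScript module can validate the `<domain>.<object>.<action>` convention, stamp the required fields, and one-way hash anything PII-shaped before it leaves the service:

```typescript
import { createHash } from "node:crypto";

// <domain>.<object>.<action>: three lowercase snake_case segments.
const EVENT_NAME = /^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$/;

export function isValidEventName(name: string): boolean {
  return EVENT_NAME.test(name);
}

export type Product = "route" | "drift" | "alert" | "portal" | "cost" | "run";

// Standard envelope every product attaches to every event.
export interface EventEnvelope {
  event: string;
  tenant_id: string;
  product: Product;
  timestamp: string; // ISO 8601, set at emit time
  properties: Record<string, unknown>;
}

// One-way hash: emails/account IDs stay joinable across events
// but are not recoverable from the analytics store.
export function hashPii(value: string): string {
  return createHash("sha256").update(value.trim().toLowerCase()).digest("hex");
}

export function makeEvent(
  event: string,
  tenantId: string,
  product: Product,
  properties: Record<string, unknown> = {},
): EventEnvelope {
  if (!isValidEventName(event)) {
    throw new Error(`event name violates <domain>.<object>.<action>: ${event}`);
  }
  return {
    event,
    tenant_id: tenantId,
    product,
    timestamp: new Date().toISOString(),
    properties,
  };
}
```

Rejecting bad names at the SDK boundary keeps the taxonomy clean without relying on code review to catch drift.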

2. Per-Product Activation Metrics

The "aha moment" is different for each product:

| Product | Aha Moment | Metric | Target |
|---|---|---|---|
| dd0c/route | First dollar saved by model routing | routing.savings.first_dollar | <24hr from signup |
| dd0c/drift | First drift detected in real stack | drift.detection.first_found | <1hr from agent install |
| dd0c/alert | First alert correlated (not just forwarded) | alert.correlation.first_match | <60sec from first alert |
| dd0c/portal | First service auto-discovered | portal.discovery.first_service | <5min from install |
| dd0c/cost | First anomaly detected in real account | cost.anomaly.first_detected | <24hr from AWS connect |
| dd0c/run | First runbook executed successfully | run.execution.first_success | <10min from setup |
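Activation tracking then reduces to "time from signup to first aha event, compared against the target." A minimal sketch (the function names and the target map are assumptions for illustration, not part of an existing SDK):

```typescript
interface TrackedEvent {
  event: string;
  tenant_id: string;
  timestamp: string; // ISO 8601
}

// Targets from the table above, converted to milliseconds.
const ACTIVATION_TARGET_MS: Record<string, number> = {
  "routing.savings.first_dollar": 24 * 60 * 60 * 1000,
  "drift.detection.first_found": 60 * 60 * 1000,
  "alert.correlation.first_match": 60 * 1000,
  "portal.discovery.first_service": 5 * 60 * 1000,
  "cost.anomaly.first_detected": 24 * 60 * 60 * 1000,
  "run.execution.first_success": 10 * 60 * 1000,
};

// Time-to-first-value in ms, measured from signup to the first
// occurrence of the aha event; null if the tenant never activated.
export function timeToFirstValue(
  events: TrackedEvent[],
  tenantId: string,
  ahaEvent: string,
): number | null {
  const mine = events
    .filter((e) => e.tenant_id === tenantId)
    .slice()
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const signup = mine.find((e) => e.event === "account.signup.completed");
  const aha = mine.find((e) => e.event === ahaEvent);
  if (!signup || !aha) return null;
  return Date.parse(aha.timestamp) - Date.parse(signup.timestamp);
}

export function hitTarget(ttfvMs: number | null, ahaEvent: string): boolean {
  return ttfvMs !== null && ttfvMs <= (ACTIVATION_TARGET_MS[ahaEvent] ?? Infinity);
}
```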

3. Conversion Funnel (Universal)

Every product shares this funnel shape:

Signup → Connect (AWS/Slack/Git) → First Value → Habit → Upgrade

Events per stage:

Stage 1: Signup

  • account.signup.started — landed on signup page
  • account.signup.completed — account created
  • account.signup.method — github_sso / google_sso / email

Stage 2: Connect

  • account.integration.started — began connecting external service
  • account.integration.completed — connection verified
  • account.integration.failed — connection failed (include error_type)
  • Product-specific: account.aws.connected, account.slack.installed, account.git.connected

Stage 3: First Value

  • Product-specific aha moment event (see table above)
  • onboarding.wizard.step_completed — which step, how long
  • onboarding.wizard.abandoned — which step they quit on

Stage 4: Habit

  • session.daily.active — DAU ping
  • session.weekly.active — WAU ping
  • feature.<name>.used — per-feature usage
  • notification.digest.opened — are they reading digests?
  • slack.command.used — which slash commands, how often

Stage 5: Upgrade

  • billing.checkout.started
  • billing.checkout.completed
  • billing.checkout.abandoned
  • billing.plan.changed — upgrade/downgrade
  • billing.churn.detected — subscription cancelled
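The five stages above can be turned into a strict funnel computation: a tenant counts at stage N only if it has also passed every earlier stage. A sketch (stage event names are the ones above; "activation.aha" is a placeholder for the per-product aha event, and the function names are hypothetical):

```typescript
// Ordered funnel stages, earliest first.
const FUNNEL_STAGES = [
  "account.signup.completed",
  "account.integration.completed",
  "activation.aha", // placeholder: substitute the per-product aha event
  "session.weekly.active",
  "billing.checkout.completed",
];

// events: tenant_id -> set of event names that tenant has ever emitted.
// Returns the count of tenants who have reached each stage (and all prior ones).
export function funnelCounts(
  events: Map<string, Set<string>>,
  stages: string[] = FUNNEL_STAGES,
): number[] {
  return stages.map((_, i) =>
    [...events.values()].filter((seen) =>
      stages.slice(0, i + 1).every((s) => seen.has(s)),
    ).length,
  );
}

// Stage-to-stage conversion; index 0 is always 1 (everyone who signed up).
export function conversionRates(counts: number[]): number[] {
  return counts.map((c, i) =>
    i === 0 ? 1 : counts[i - 1] === 0 ? 0 : c / counts[i - 1],
  );
}
```

PostHog's built-in funnels give you this out of the box; the point of the sketch is that the event taxonomy makes every stage boundary a single, queryable event name.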

4. Feature Usage Events (Per Product)

dd0c/route (LLM Cost Router)

  • routing.request.processed — model selected, latency, cost
  • routing.override.manual — user forced a specific model
  • routing.savings.calculated — weekly savings digest generated
  • routing.shadow.audit.run — shadow mode comparison completed
  • dashboard.cost.viewed — opened cost dashboard

dd0c/drift (IaC Drift Detection)

  • drift.scan.completed — scan finished, drifts found count
  • drift.remediation.clicked — user clicked "fix drift"
  • drift.remediation.applied — drift actually fixed
  • drift.false_positive.marked — user dismissed a drift
  • drift.agent.heartbeat — agent is alive and scanning

dd0c/alert (Alert Intelligence)

  • alert.ingested — raw alert received
  • alert.correlated — alerts grouped into incident
  • alert.suppressed — duplicate/noise suppressed
  • alert.escalated — sent to on-call
  • alert.feedback.helpful / alert.feedback.noise — user feedback
  • alert.mttr.measured — time from alert to resolution

dd0c/portal (Lightweight IDP)

  • portal.service.discovered — auto-discovery found a service
  • portal.service.claimed — team claimed ownership
  • portal.scorecard.viewed — someone checked service health
  • portal.scorecard.action_taken — acted on a recommendation
  • portal.search.performed — searched the catalog

dd0c/cost (AWS Cost Anomaly)

  • cost.event.ingested — CloudTrail event processed
  • cost.anomaly.scored — anomaly scoring completed
  • cost.anomaly.alerted — Slack alert sent
  • cost.anomaly.snoozed — user snoozed alert
  • cost.anomaly.expected — user marked as expected
  • cost.remediation.clicked — user clicked Stop/Terminate
  • cost.remediation.executed — remediation completed
  • cost.zombie.detected — idle resource found
  • cost.digest.sent — daily digest delivered

dd0c/run (Runbook Automation)

  • run.runbook.created — new runbook authored
  • run.execution.started — runbook execution began
  • run.execution.completed — execution finished (include success/failed)
  • run.execution.approval_requested — human approval needed
  • run.execution.approval_granted — human approved
  • run.execution.rolled_back — rollback triggered
  • run.sandbox.test.run — dry-run in sandbox

5. Health Scoring (Churn Prediction)

Composite health score per tenant, updated daily:

health_score = (
  0.3 * activation_complete +    // did they hit aha moment?
  0.2 * weekly_active_days +     // how many days active this week?
  0.2 * feature_breadth +        // how many features used?
  0.15 * integration_depth +     // how many integrations connected?
  0.15 * feedback_sentiment       // positive vs negative actions
)

Thresholds:

  • health > 0.7 → Healthy (green)
  • health 0.4-0.7 → At Risk (yellow) → trigger re-engagement email
  • health < 0.4 → Churning (red) → trigger founder outreach
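The formula and thresholds translate directly into code. A sketch, assuming each input has already been normalized by the daily cron (input names and the sentiment scale are assumptions; the boundary handling at exactly 0.4 and 0.7 is a choice the formula above leaves open):

```typescript
export interface HealthInputs {
  activationComplete: boolean; // did they hit the aha moment?
  weeklyActiveDays: number;    // 0..7
  featuresUsed: number;        // distinct features used
  totalFeatures: number;
  integrationsConnected: number;
  totalIntegrations: number;
  feedbackSentiment: number;   // -1..1, positive minus negative actions
}

export type HealthBand = "green" | "yellow" | "red";

export function healthScore(h: HealthInputs): number {
  const clamp = (x: number) => Math.max(0, Math.min(1, x));
  return (
    0.3 * (h.activationComplete ? 1 : 0) +
    0.2 * clamp(h.weeklyActiveDays / 7) +
    0.2 * clamp(h.totalFeatures ? h.featuresUsed / h.totalFeatures : 0) +
    0.15 * clamp(h.totalIntegrations ? h.integrationsConnected / h.totalIntegrations : 0) +
    0.15 * clamp((h.feedbackSentiment + 1) / 2)
  );
}

// Boundary choice: 0.4 and 0.7 land in "yellow".
export function healthBand(score: number): HealthBand {
  if (score > 0.7) return "green";
  if (score >= 0.4) return "yellow";
  return "red";
}
```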

6. Analytics Stack Recommendation

PostHog (self-hosted on AWS):

  • Open source, self-hostable → no vendor lock-in
  • Free tier: unlimited events self-hosted
  • Built-in: funnels, retention, feature flags, session replay
  • Supports custom events via REST API or JS/Python SDK
  • Can run on a single t3.medium for V1 traffic

Why not Segment/Amplitude/Mixpanel:

  • Segment: $120/mo minimum, overkill for solo founder
  • Amplitude: free tier is generous but cloud-only, data leaves your infra
  • Mixpanel: same cloud-only concern
  • PostHog self-hosted: $0/mo, data stays in your AWS account, GDPR-friendly

Integration pattern:

Lambda/API → PostHog REST API (async, fire-and-forget)
Next.js UI → PostHog JS SDK (auto-captures pageviews, clicks)
Slack Bot → PostHog Python SDK (command usage, action clicks)
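The Lambda side of this pattern could look like the sketch below: build a body for PostHog's `/capture/` endpoint and POST it without awaiting the result (host and key here are placeholder config, not real endpoints). One caveat worth stating: in Lambda, an un-awaited request can be frozen along with the sandbox, so if event loss matters, flush before returning or route events through a queue instead.

```typescript
// Placeholder deployment config; substitute your self-hosted PostHog URL and project key.
const POSTHOG_HOST = process.env.POSTHOG_HOST ?? "https://posthog.internal.example.com";
const POSTHOG_API_KEY = process.env.POSTHOG_API_KEY ?? "phc_test";

// Body shape accepted by PostHog's /capture/ endpoint.
export function buildCapturePayload(
  event: string,
  distinctId: string,
  properties: Record<string, unknown> = {},
) {
  return {
    api_key: POSTHOG_API_KEY,
    event,
    distinct_id: distinctId,
    properties,
    timestamp: new Date().toISOString(),
  };
}

// Fire-and-forget: analytics must never block or fail the product path.
export function capture(
  event: string,
  distinctId: string,
  props: Record<string, unknown> = {},
): void {
  fetch(`${POSTHOG_HOST}/capture/`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildCapturePayload(event, distinctId, props)),
  }).catch(() => {
    // Swallow errors: a down analytics stack should be invisible to users.
  });
}
```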

7. Cross-Product Flywheel Metrics

dd0c is a platform — users on one product should discover others:

  • platform.cross_sell.impression — "Try dd0c/alert" banner shown
  • platform.cross_sell.clicked — user clicked cross-sell
  • platform.cross_sell.activated — user activated second product
  • platform.products.active_count — how many dd0c products per tenant

Flywheel hypothesis: Users who activate 2+ dd0c products have 3x lower churn than single-product users. We need data to prove/disprove this.
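Testing the hypothesis is a one-liner once products-per-tenant and churn are tracked. A sketch of the comparison (types and function names are illustrative; the real version would query PostHog rather than take an in-memory array):

```typescript
interface Tenant {
  id: string;
  activeProducts: number; // distinct dd0c products activated
  churned: boolean;
}

// Churn rate for a cohort; null when the cohort is empty.
function churnRate(tenants: Tenant[]): number | null {
  if (tenants.length === 0) return null;
  return tenants.filter((t) => t.churned).length / tenants.length;
}

// Single-product churn divided by multi-product churn.
// The flywheel hypothesis predicts a value around 3.
export function flywheelRatio(tenants: Tenant[]): number | null {
  const single = churnRate(tenants.filter((t) => t.activeProducts === 1));
  const multi = churnRate(tenants.filter((t) => t.activeProducts >= 2));
  if (single === null || multi === null || multi === 0) return null;
  return single / multi;
}
```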


Epic 11 Proposal: PLG Instrumentation

Scope

Cross-cutting epic added to all 6 products. Shared analytics SDK, per-product event implementations, funnel dashboards, health scoring.

Stories (Draft)

  1. PostHog Infrastructure — CDK stack for self-hosted PostHog on ECS Fargate
  2. Analytics SDK — Shared TypeScript/Python wrapper with standard event schema
  3. Funnel Dashboard — PostHog dashboard template per product
  4. Activation Tracking — Per-product aha moment detection and logging
  5. Health Scoring Engine — Daily cron that computes tenant health scores
  6. Cross-Sell Instrumentation — Platform-level cross-product discovery events
  7. Churn Alert Pipeline — Health score → Slack alert to founder when tenant goes red

Estimate

~25 story points across all products (shared infrastructure + per-product event wiring)


This brainstorm establishes the "what" and "why." Party Mode advisory board should stress-test: Is PostHog the right choice? Is the event taxonomy too granular? Should health scoring be V1 or V2? Is 25 points realistic?