
dd0c Platform — PLG Instrumentation Brainstorm

Session: Carson (Brainstorming Coach) — Cross-Product PLG Analytics
Date: March 1, 2026
Scope: All 6 dd0c products


The Problem

We built 6 products with onboarding flows, free tiers, and Stripe billing — but zero product analytics. We can't answer:

  • How many users hit "aha moment" vs. bounce?
  • Where in the funnel do free users drop off before upgrading?
  • Which features drive retention vs. which are ignored?
  • Are users churning because of alert fatigue, false positives, or just not getting value?
  • What's our time-to-first-value per product?

Without instrumentation, PLG iteration is guesswork.


Brainstorm: What to Instrument

1. Unified Event Taxonomy

Every dd0c product shares a common event naming convention:

<domain>.<object>.<action>

Examples:
  account.signup.completed
  account.aws.connected
  anomaly.alert.sent
  anomaly.alert.snoozed
  slack.bot.installed
  billing.checkout.started
  billing.upgrade.completed
  feature.flag.evaluated

Rules:

  • Past tense for completed actions (completed, sent, clicked)
  • Present tense for state changes (active, learning, paused)
  • Always include tenant_id, timestamp, product (route/drift/alert/portal/cost/run)
  • Never include PII — hash emails, account IDs
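As a sketch of how the shared SDK might enforce these rules (the helper names here are illustrative, not an existing API), a small TypeScript module can validate the `<domain>.<object>.<action>` convention, stamp the required fields, and one-way hash anything PII-shaped before it leaves the service:

```typescript
import { createHash } from "node:crypto";

// <domain>.<object>.<action>: three lowercase snake_case segments.
const EVENT_NAME = /^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$/;

export function isValidEventName(name: string): boolean {
  return EVENT_NAME.test(name);
}

export type Product = "route" | "drift" | "alert" | "portal" | "cost" | "run";

// Standard envelope every product attaches to every event.
export interface EventEnvelope {
  event: string;
  tenant_id: string;
  product: Product;
  timestamp: string; // ISO 8601, set at emit time
  properties: Record<string, unknown>;
}

// One-way hash: emails/account IDs stay joinable across events
// but are not recoverable from the analytics store.
export function hashPii(value: string): string {
  return createHash("sha256").update(value.trim().toLowerCase()).digest("hex");
}

export function makeEvent(
  event: string,
  tenantId: string,
  product: Product,
  properties: Record<string, unknown> = {},
): EventEnvelope {
  if (!isValidEventName(event)) {
    throw new Error(`event name violates <domain>.<object>.<action>: ${event}`);
  }
  return {
    event,
    tenant_id: tenantId,
    product,
    timestamp: new Date().toISOString(),
    properties,
  };
}
```

Rejecting bad names at the SDK boundary keeps the taxonomy clean without relying on code review to catch drift.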

2. Per-Product Activation Metrics

The "aha moment" is different for each product:

| Product | Aha Moment | Metric | Target |
|---|---|---|---|
| dd0c/route | First dollar saved by model routing | routing.savings.first_dollar | <24hr from signup |
| dd0c/drift | First drift detected in real stack | drift.detection.first_found | <1hr from agent install |
| dd0c/alert | First alert correlated (not just forwarded) | alert.correlation.first_match | <60sec from first alert |
| dd0c/portal | First service auto-discovered | portal.discovery.first_service | <5min from install |
| dd0c/cost | First anomaly detected in real account | cost.anomaly.first_detected | <24hr from AWS connect |
| dd0c/run | First runbook executed successfully | run.execution.first_success | <10min from setup |
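Activation tracking then reduces to "time from signup to first aha event, compared against the target." A minimal sketch (the function names and the target map are assumptions for illustration, not part of an existing SDK):

```typescript
interface TrackedEvent {
  event: string;
  tenant_id: string;
  timestamp: string; // ISO 8601
}

// Targets from the table above, converted to milliseconds.
const ACTIVATION_TARGET_MS: Record<string, number> = {
  "routing.savings.first_dollar": 24 * 60 * 60 * 1000,
  "drift.detection.first_found": 60 * 60 * 1000,
  "alert.correlation.first_match": 60 * 1000,
  "portal.discovery.first_service": 5 * 60 * 1000,
  "cost.anomaly.first_detected": 24 * 60 * 60 * 1000,
  "run.execution.first_success": 10 * 60 * 1000,
};

// Time-to-first-value in ms, measured from signup to the first
// occurrence of the aha event; null if the tenant never activated.
export function timeToFirstValue(
  events: TrackedEvent[],
  tenantId: string,
  ahaEvent: string,
): number | null {
  const mine = events
    .filter((e) => e.tenant_id === tenantId)
    .slice()
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const signup = mine.find((e) => e.event === "account.signup.completed");
  const aha = mine.find((e) => e.event === ahaEvent);
  if (!signup || !aha) return null;
  return Date.parse(aha.timestamp) - Date.parse(signup.timestamp);
}

export function hitTarget(ttfvMs: number | null, ahaEvent: string): boolean {
  return ttfvMs !== null && ttfvMs <= (ACTIVATION_TARGET_MS[ahaEvent] ?? Infinity);
}
```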

3. Conversion Funnel (Universal)

Every product shares this funnel shape:

Signup → Connect (AWS/Slack/Git) → First Value → Habit → Upgrade

Events per stage:

Stage 1: Signup

  • account.signup.started — landed on signup page
  • account.signup.completed — account created
  • account.signup.method — github_sso / google_sso / email

Stage 2: Connect

  • account.integration.started — began connecting external service
  • account.integration.completed — connection verified
  • account.integration.failed — connection failed (include error_type)
  • Product-specific: account.aws.connected, account.slack.installed, account.git.connected

Stage 3: First Value

  • Product-specific aha moment event (see table above)
  • onboarding.wizard.step_completed — which step, how long
  • onboarding.wizard.abandoned — which step they quit on

Stage 4: Habit

  • session.daily.active — DAU ping
  • session.weekly.active — WAU ping
  • feature.<name>.used — per-feature usage
  • notification.digest.opened — are they reading digests?
  • slack.command.used — which slash commands, how often

Stage 5: Upgrade

  • billing.checkout.started
  • billing.checkout.completed
  • billing.checkout.abandoned
  • billing.plan.changed — upgrade/downgrade
  • billing.churn.detected — subscription cancelled
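The five stages above can be turned into a strict funnel computation: a tenant counts at stage N only if it has also passed every earlier stage. A sketch (stage event names are the ones above; "activation.aha" is a placeholder for the per-product aha event, and the function names are hypothetical):

```typescript
// Ordered funnel stages, earliest first.
const FUNNEL_STAGES = [
  "account.signup.completed",
  "account.integration.completed",
  "activation.aha", // placeholder: substitute the per-product aha event
  "session.weekly.active",
  "billing.checkout.completed",
];

// events: tenant_id -> set of event names that tenant has ever emitted.
// Returns the count of tenants who have reached each stage (and all prior ones).
export function funnelCounts(
  events: Map<string, Set<string>>,
  stages: string[] = FUNNEL_STAGES,
): number[] {
  return stages.map((_, i) =>
    [...events.values()].filter((seen) =>
      stages.slice(0, i + 1).every((s) => seen.has(s)),
    ).length,
  );
}

// Stage-to-stage conversion; index 0 is always 1 (everyone who signed up).
export function conversionRates(counts: number[]): number[] {
  return counts.map((c, i) =>
    i === 0 ? 1 : counts[i - 1] === 0 ? 0 : c / counts[i - 1],
  );
}
```

PostHog's built-in funnels give you this out of the box; the point of the sketch is that the event taxonomy makes every stage boundary a single, queryable event name.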

4. Feature Usage Events (Per Product)

dd0c/route (LLM Cost Router)

  • routing.request.processed — model selected, latency, cost
  • routing.override.manual — user forced a specific model
  • routing.savings.calculated — weekly savings digest generated
  • routing.shadow.audit.run — shadow mode comparison completed
  • dashboard.cost.viewed — opened cost dashboard

dd0c/drift (IaC Drift Detection)

  • drift.scan.completed — scan finished, drifts found count
  • drift.remediation.clicked — user clicked "fix drift"
  • drift.remediation.applied — drift actually fixed
  • drift.false_positive.marked — user dismissed a drift
  • drift.agent.heartbeat — agent is alive and scanning

dd0c/alert (Alert Intelligence)

  • alert.ingested — raw alert received
  • alert.correlated — alerts grouped into incident
  • alert.suppressed — duplicate/noise suppressed
  • alert.escalated — sent to on-call
  • alert.feedback.helpful / alert.feedback.noise — user feedback
  • alert.mttr.measured — time from alert to resolution

dd0c/portal (Lightweight IDP)

  • portal.service.discovered — auto-discovery found a service
  • portal.service.claimed — team claimed ownership
  • portal.scorecard.viewed — someone checked service health
  • portal.scorecard.action_taken — acted on a recommendation
  • portal.search.performed — searched the catalog

dd0c/cost (AWS Cost Anomaly)

  • cost.event.ingested — CloudTrail event processed
  • cost.anomaly.scored — anomaly scoring completed
  • cost.anomaly.alerted — Slack alert sent
  • cost.anomaly.snoozed — user snoozed alert
  • cost.anomaly.expected — user marked as expected
  • cost.remediation.clicked — user clicked Stop/Terminate
  • cost.remediation.executed — remediation completed
  • cost.zombie.detected — idle resource found
  • cost.digest.sent — daily digest delivered

dd0c/run (Runbook Automation)

  • run.runbook.created — new runbook authored
  • run.execution.started — runbook execution began
  • run.execution.completed — execution finished (include success/failed)
  • run.execution.approval_requested — human approval needed
  • run.execution.approval_granted — human approved
  • run.execution.rolled_back — rollback triggered
  • run.sandbox.test.run — dry-run in sandbox

5. Health Scoring (Churn Prediction)

Composite health score per tenant, updated daily:

health_score = (
  0.3 * activation_complete +    // did they hit aha moment?
  0.2 * weekly_active_days +     // how many days active this week?
  0.2 * feature_breadth +        // how many features used?
  0.15 * integration_depth +     // how many integrations connected?
  0.15 * feedback_sentiment       // positive vs negative actions
)

Thresholds:

  • health > 0.7 → Healthy (green)
  • health 0.4-0.7 → At Risk (yellow) → trigger re-engagement email
  • health < 0.4 → Churning (red) → trigger founder outreach
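The formula and thresholds translate directly into code. A sketch, assuming each input has already been normalized by the daily cron (input names and the sentiment scale are assumptions; the boundary handling at exactly 0.4 and 0.7 is a choice the formula above leaves open):

```typescript
export interface HealthInputs {
  activationComplete: boolean; // did they hit the aha moment?
  weeklyActiveDays: number;    // 0..7
  featuresUsed: number;        // distinct features used
  totalFeatures: number;
  integrationsConnected: number;
  totalIntegrations: number;
  feedbackSentiment: number;   // -1..1, positive minus negative actions
}

export type HealthBand = "green" | "yellow" | "red";

export function healthScore(h: HealthInputs): number {
  const clamp = (x: number) => Math.max(0, Math.min(1, x));
  return (
    0.3 * (h.activationComplete ? 1 : 0) +
    0.2 * clamp(h.weeklyActiveDays / 7) +
    0.2 * clamp(h.totalFeatures ? h.featuresUsed / h.totalFeatures : 0) +
    0.15 * clamp(h.totalIntegrations ? h.integrationsConnected / h.totalIntegrations : 0) +
    0.15 * clamp((h.feedbackSentiment + 1) / 2)
  );
}

// Boundary choice: 0.4 and 0.7 land in "yellow".
export function healthBand(score: number): HealthBand {
  if (score > 0.7) return "green";
  if (score >= 0.4) return "yellow";
  return "red";
}
```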

6. Analytics Stack Recommendation

PostHog (self-hosted on AWS):

  • Open source, self-hostable → no vendor lock-in
  • Free tier: unlimited events self-hosted
  • Built-in: funnels, retention, feature flags, session replay
  • Supports custom events via REST API or JS/Python SDK
  • Can run on a single t3.medium for V1 traffic

Why not Segment/Amplitude/Mixpanel:

  • Segment: $120/mo minimum, overkill for solo founder
  • Amplitude: free tier is generous but cloud-only, data leaves your infra
  • Mixpanel: same cloud-only concern
  • PostHog self-hosted: $0/mo, data stays in your AWS account, GDPR-friendly

Integration pattern:

Lambda/API → PostHog REST API (async, fire-and-forget)
Next.js UI → PostHog JS SDK (auto-captures pageviews, clicks)
Slack Bot → PostHog Python SDK (command usage, action clicks)
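The Lambda side of this pattern could look like the sketch below: build a body for PostHog's `/capture/` endpoint and POST it without awaiting the result (host and key here are placeholder config, not real endpoints). One caveat worth stating: in Lambda, an un-awaited request can be frozen along with the sandbox, so if event loss matters, flush before returning or route events through a queue instead.

```typescript
// Placeholder deployment config; substitute your self-hosted PostHog URL and project key.
const POSTHOG_HOST = process.env.POSTHOG_HOST ?? "https://posthog.internal.example.com";
const POSTHOG_API_KEY = process.env.POSTHOG_API_KEY ?? "phc_test";

// Body shape accepted by PostHog's /capture/ endpoint.
export function buildCapturePayload(
  event: string,
  distinctId: string,
  properties: Record<string, unknown> = {},
) {
  return {
    api_key: POSTHOG_API_KEY,
    event,
    distinct_id: distinctId,
    properties,
    timestamp: new Date().toISOString(),
  };
}

// Fire-and-forget: analytics must never block or fail the product path.
export function capture(
  event: string,
  distinctId: string,
  props: Record<string, unknown> = {},
): void {
  fetch(`${POSTHOG_HOST}/capture/`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildCapturePayload(event, distinctId, props)),
  }).catch(() => {
    // Swallow errors: a down analytics stack should be invisible to users.
  });
}
```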

7. Cross-Product Flywheel Metrics

dd0c is a platform — users on one product should discover others:

  • platform.cross_sell.impression — "Try dd0c/alert" banner shown
  • platform.cross_sell.clicked — user clicked cross-sell
  • platform.cross_sell.activated — user activated second product
  • platform.products.active_count — how many dd0c products per tenant

Flywheel hypothesis: Users who activate 2+ dd0c products have 3x lower churn than single-product users. We need data to prove/disprove this.
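Testing the hypothesis is a one-liner once products-per-tenant and churn are tracked. A sketch of the comparison (types and function names are illustrative; the real version would query PostHog rather than take an in-memory array):

```typescript
interface Tenant {
  id: string;
  activeProducts: number; // distinct dd0c products activated
  churned: boolean;
}

// Churn rate for a cohort; null when the cohort is empty.
function churnRate(tenants: Tenant[]): number | null {
  if (tenants.length === 0) return null;
  return tenants.filter((t) => t.churned).length / tenants.length;
}

// Single-product churn divided by multi-product churn.
// The flywheel hypothesis predicts a value around 3.
export function flywheelRatio(tenants: Tenant[]): number | null {
  const single = churnRate(tenants.filter((t) => t.activeProducts === 1));
  const multi = churnRate(tenants.filter((t) => t.activeProducts >= 2));
  if (single === null || multi === null || multi === 0) return null;
  return single / multi;
}
```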


Epic 11 Proposal: PLG Instrumentation

Scope

Cross-cutting epic added to all 6 products. Shared analytics SDK, per-product event implementations, funnel dashboards, health scoring.

Stories (Draft)

  1. PostHog Infrastructure — CDK stack for self-hosted PostHog on ECS Fargate
  2. Analytics SDK — Shared TypeScript/Python wrapper with standard event schema
  3. Funnel Dashboard — PostHog dashboard template per product
  4. Activation Tracking — Per-product aha moment detection and logging
  5. Health Scoring Engine — Daily cron that computes tenant health scores
  6. Cross-Sell Instrumentation — Platform-level cross-product discovery events
  7. Churn Alert Pipeline — Health score → Slack alert to founder when tenant goes red

Estimate

~25 story points across all products (shared infrastructure + per-product event wiring)


This brainstorm establishes the "what" and "why." Party Mode advisory board should stress-test: Is PostHog the right choice? Is the event taxonomy too granular? Should health scoring be V1 or V2? Is 25 points realistic?