dd0c: full product research pipeline - 6 products, 8 phases each

Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
        product-brief, architecture, epics (incl. Epic 10 TF compliance),
        test-architecture (TDD strategy)

Brand strategy and market research included.
2026-02-28 17:35:02 +00:00
commit 5ee95d8b13
51 changed files with 36935 additions and 0 deletions


@@ -0,0 +1,245 @@
# dd0c/portal — Brainstorm Session
**Product:** Lightweight Internal Developer Portal ("The Anti-Backstage")
**Facilitator:** Carson, Elite Brainstorming Specialist
**Date:** 2026-02-28
> *Every idea gets a seat at the table. We filter later. Let's GO.*
---
## Phase 1: Problem Space (25 ideas)
### Why Does Backstage Suck?
1. **YAML Cemetery** — Backstage requires hand-written `catalog-info.yaml` in every repo. Engineers write it once, never update it. Within 6 months your catalog is a graveyard of lies.
2. **Plugin Roulette** — Backstage plugins break on every upgrade. The plugin ecosystem is wide but shallow — half-maintained community plugins that rot.
3. **Dedicated Platform Team Required** — You need 1-2 full-time engineers just to keep Backstage running. For a 30-person team, that's 3-7% of your engineering headcount babysitting a developer portal.
4. **React Monolith From Hell** — Backstage is a massive React app you have to fork, customize, build, and deploy yourself. It's not a product, it's a framework. Spotify built it for Spotify.
5. **Upgrade Treadmill** — Backstage releases constantly. Each upgrade risks breaking your custom plugins and templates. Teams fall behind and get stuck on ancient versions.
6. **Cold Start Problem** — Day 1 of Backstage: empty catalog. You have to manually register every service. Nobody does it. The portal launches to crickets.
7. **No Opinions** — Backstage is infinitely configurable, which means it ships with zero useful defaults. You have to decide everything: what metadata to track, what plugins to install, how to organize the catalog.
8. **Search Is Terrible** — Backstage's built-in search is basic. Finding "who owns the payment service" requires navigating a clunky UI tree.
9. **Authentication Nightmare** — Setting up auth (Okta, GitHub, Google) in Backstage requires custom provider configuration that's poorly documented.
10. **No Auto-Discovery** — Backstage doesn't discover anything. It's a static registry that depends entirely on humans keeping it current. Humans don't.
### What Do Engineers Actually Need? (The 80/20)
11. **"Who owns this?"** — The #1 question. When something breaks at 3 AM, you need to know who to page. That's it. That's the killer feature.
12. **"What does this service do?"** — A one-paragraph description, its dependencies, and its API docs. Not a 40-page Confluence novel.
13. **"Is it healthy right now?"** — Green/yellow/red. Deployment status. Last deploy time. Current error rate. One glance.
14. **"Where's the runbook?"** — When the service is on fire, where do I go? Link to the runbook, the dashboard, the logs.
15. **"What depends on this?"** — Dependency graph. If I change this service, what breaks?
16. **"How do I set up my dev environment for this?"** — README, setup scripts, required env vars. Onboarding in 10 minutes, not 10 days.
### The Pain of NOT Having an IDP
17. **Tribal Knowledge Monopoly** — "Ask Dave, he built that service 3 years ago." Dave left 6 months ago. Now nobody knows.
18. **Confluence Graveyard** — Teams document services in Confluence pages that are 2 years stale. New engineers follow outdated instructions and waste days.
19. **Slack Archaeology** — "I think someone posted the architecture diagram in #platform-eng last March?" Engineers spend hours searching Slack history for institutional knowledge.
20. **Incident Response Roulette** — Alert fires → nobody knows who owns the service → 30-minute delay finding the right person → MTTR doubles.
21. **Onboarding Black Hole** — New engineer joins. Spends first 2 weeks asking "what is this service?" and "who do I talk to about X?" in Slack. Productivity = zero.
22. **Duplicate Services** — Without a catalog, Team A builds a notification service. Team B doesn't know it exists. Team B builds another notification service. Now you have two.
23. **Zombie Services** — Services that nobody owns, nobody uses, but nobody is brave enough to turn off. They accumulate like barnacles, costing money and creating security risk.
24. **Compliance Panic** — Auditor asks "show me all services that handle PII and their owners." Without an IDP, this is a multi-week scavenger hunt.
25. **Shadow Architecture** — The actual architecture diverges from every diagram ever drawn. Nobody has a true picture of what's running in production.
---
## Phase 2: Solution Space (59 ideas)
### Auto-Discovery Approaches
26. **AWS Resource Tagger** — Scan AWS accounts via read-only IAM role. Discover EC2, ECS, Lambda, RDS, S3, API Gateway. Map them to services using tags, naming conventions, and CloudFormation stack associations.
27. **GitHub/GitLab Repo Scanner** — Scan org repos. Infer services from repo names, `Dockerfile` presence, CI/CD configs, deployment manifests. Extract README descriptions automatically.
28. **Kubernetes Label Harvester** — Connect to K8s clusters. Discover deployments, services, ingresses. Map labels (`app`, `team`, `owner`) to catalog entries.
29. **Terraform State Reader** — Parse Terraform state files (S3 backends). Build infrastructure graph from resource relationships. Know exactly what infra each service uses.
30. **CI/CD Pipeline Analyzer** — Read GitHub Actions / GitLab CI / Jenkins configs. Infer deployment targets, environments, and service relationships from pipeline definitions.
31. **DNS/Route53 Reverse Map** — Scan DNS records to discover all public-facing services and map them back to infrastructure.
32. **CloudFormation Stack Walker** — Parse CF stacks to understand resource groupings and cross-stack references. Build dependency graphs automatically.
33. **Package.json / go.mod / pom.xml Dependency Inference** — Read dependency files to infer internal service-to-service relationships (shared libraries = likely communication).
34. **Git Blame Ownership** — Infer service ownership from git commit history. Who commits most to this repo? That's probably the owner (or at least knows who is).
35. **PagerDuty/OpsGenie Schedule Import** — Pull on-call schedules to auto-populate "who to page" for each service.
36. **OpenAPI/Swagger Auto-Ingest** — Detect and index API specs from repos. Surface them in the portal as live, searchable API documentation.
37. **Docker Compose Graph** — Parse `docker-compose.yml` files to understand local development service topologies.
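
To make the git-history ownership idea (#34) concrete, here is a minimal sketch. It assumes the author emails have already been collected (for example from `git log --format=%ae`); the function name `infer_owner` is hypothetical, and "most frequent committer" is only a first guess at ownership, not a verdict.

```python
from collections import Counter
from typing import List, Optional, Tuple


def infer_owner(commit_authors: List[str]) -> Optional[Tuple[str, float]]:
    """Return (most frequent author, share of commits), or None if there are no commits.

    `commit_authors` holds one entry per commit, e.g. author emails.
    """
    if not commit_authors:
        return None
    counts = Counter(commit_authors)
    author, commits = counts.most_common(1)[0]
    return author, commits / len(commit_authors)


if __name__ == "__main__":
    authors = ["a@example.com", "b@example.com", "a@example.com", "a@example.com"]
    print(infer_owner(authors))
```

The share of commits can double as a rough input to the confidence score: a repo where one person wrote 75% of the commits is a stronger signal than one split evenly across ten people.
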
### Service Catalog Features
38. **One-Line Service Card** — Every service gets a card: name, owner, health, last deploy, language, repo link. Scannable in 2 seconds.
39. **Dependency Graph Visualization** — Interactive graph showing service-to-service dependencies. Click a node to see details. Highlight blast radius.
40. **Health Dashboard** — Aggregate health from multiple sources (CloudWatch, Datadog, Grafana, custom health endpoints). Show unified red/yellow/green.
41. **Ownership Registry** — Team → services mapping. Click a team, see everything they own. Click a service, see the team and on-call rotation.
42. **Runbook Linker** — Auto-detect runbooks in repos (markdown files in `/runbooks`, `/docs`, or linked in README). Surface them on the service card.
43. **Environment Matrix** — Show all environments (dev, staging, prod) for each service. Current version deployed in each. Drift between environments highlighted.
44. **SLO Tracker** — Define SLOs per service. Show current burn rate. Alert when SLO budget is burning too fast. Simple — not a full SLO platform, just visibility.
45. **Cost Attribution** — Pull from dd0c/cost. Show monthly AWS cost per service. "This service costs $847/month." Engineers never see this data today.
46. **Tech Radar Integration** — Tag services with their tech stack. Surface org-wide technology adoption. "We have 47 services on Node 18, 3 still on Node 14."
47. **README Renderer** — Pull and render the repo README directly in the portal. No context switching to GitHub.
48. **Changelog Feed** — Show recent deployments, config changes, and incidents per service. "What happened to this service this week?"
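
One possible shape for the service card (#38) and the unified health status (#40), as a sketch only. The field names and the "worst status wins" rule are assumptions made for illustration; nothing in this session fixes the schema or the aggregation rule.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class Health(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class ServiceCard:
    name: str
    owner: Optional[str] = None
    language: Optional[str] = None
    repo_url: Optional[str] = None
    last_deploy: Optional[str] = None  # ISO 8601 timestamp, if known
    # One status per monitoring source, e.g. {"cloudwatch": Health.GREEN}
    signals: Dict[str, Health] = field(default_factory=dict)

    @property
    def health(self) -> Optional[Health]:
        """Worst status across all sources; None when no source reports."""
        return max(self.signals.values(), default=None)


if __name__ == "__main__":
    card = ServiceCard(
        name="payment-service",
        owner="@payments-team",
        signals={"cloudwatch": Health.GREEN, "datadog": Health.YELLOW},
    )
    print(card.name, card.health.name)
```

Taking the worst status keeps the card honest: one red source turns the whole card red, which matches the "one glance" goal better than averaging would.
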
### Developer Experience
49. **Instant Search (Cmd+K)** — Algolia-fast search across all services, teams, APIs, runbooks. The portal IS the search bar.
50. **Slack Bot** — `/dd0c who owns payment-service` → instant answer in Slack. No need to open the portal.
51. **CLI Tool** — `dd0c portal search "payment"` → results in terminal. For engineers who live in the terminal.
52. **Browser New Tab** — dd0c/portal as the browser new tab page. Every time an engineer opens a tab, they see their team's services, recent incidents, and deployment status.
53. **VS Code Extension** — Right-click a service import → "View in dd0c/portal" → opens service card. See ownership and docs without leaving the editor.
54. **GitHub PR Enrichment** — Bot comments on PRs with service context: "This PR affects payment-service (owned by @payments-team, 99.9% SLO, last incident 3 days ago)."
55. **Mobile-Friendly View** — When you're on-call and get paged on your phone, the portal should be usable on mobile. Backstage is not.
56. **Deep Links** — Every service, team, runbook, and API has a stable URL. Paste it in Slack, Jira, anywhere. It just works.
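
A rough sketch of how the Slack bot (#50) could answer a "who owns" query. It assumes the handler receives only the text typed after `/dd0c` and that an ownership mapping already exists; the `answer` function and the reply wording are hypothetical.

```python
import re
from typing import Dict

# Matches "who owns <service>", with an optional trailing question mark.
WHO_OWNS = re.compile(r"^\s*who\s+owns\s+(?P<service>[\w.-]+)\s*\??\s*$", re.IGNORECASE)


def answer(text: str, owners: Dict[str, str]) -> str:
    """Turn the text after the slash command into a one-line reply."""
    match = WHO_OWNS.match(text)
    if not match:
        return "Try: who owns <service>"
    service = match.group("service")
    owner = owners.get(service)
    if owner is None:
        return f"No owner on record for {service}."
    return f"{service} is owned by {owner}."


if __name__ == "__main__":
    owners = {"payment-service": "@payments-team"}
    print(answer("who owns payment-service", owners))
```

The same function could back the CLI (#51) unchanged, since both surfaces reduce to "text in, one line out".
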
### Zero-Config Magic
57. **Convention Over Configuration** — If your repo is named `payment-service`, the service is named `payment-service`. If it has a `Dockerfile`, it's a deployable service. If `CODEOWNERS` names an owner, that's the owner. Zero YAML needed.
58. **Smart Defaults** — First run: connect AWS account + GitHub org. Portal auto-populates with everything it finds. Engineer reviews and corrects, not creates from scratch.
59. **Progressive Enhancement** — Start with auto-discovered data (maybe 60% accurate). Let teams enrich over time. Never require manual entry as a prerequisite.
60. **Confidence Scores** — Show "we're 85% sure @payments-team owns this" based on git history and AWS tags. Let humans confirm or correct. Learn from corrections.
61. **Ghost Service Detection** — Find AWS resources that don't map to any known repo or team. Surface them as "orphaned infrastructure" — potential zombie services or cost waste.
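
The conventions in #57 and the confidence scores in #60 could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the choice to read only the catch-all (`*`) rule from `CODEOWNERS`, and especially the confidence values 0.9 and 0.5, which are placeholders rather than measured numbers.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class InferredService:
    name: str
    deployable: bool
    owner: Optional[str]
    owner_confidence: float  # 0.0 to 1.0, placeholder values only


def default_owner(codeowners_text: str) -> Optional[str]:
    """First owner on the last catch-all (`*`) rule of a CODEOWNERS file."""
    owner = None
    for line in codeowners_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(parts) >= 2 and parts[0] == "*":
            owner = parts[1]
    return owner


def infer_service(
    repo_name: str,
    file_paths: Set[str],
    codeowners_text: str = "",
    top_committer: Optional[str] = None,
) -> InferredService:
    owner = default_owner(codeowners_text)
    if owner is not None:
        confidence = 0.9  # assumed: an explicit CODEOWNERS rule is a strong signal
    elif top_committer is not None:
        owner, confidence = top_committer, 0.5  # assumed: git history is weaker
    else:
        confidence = 0.0
    return InferredService(
        name=repo_name,  # convention: repo name is the service name
        deployable="Dockerfile" in file_paths,  # convention: Dockerfile means deployable
        owner=owner,
        owner_confidence=confidence,
    )


if __name__ == "__main__":
    print(infer_service("payment-service", {"Dockerfile", "README.md"}, "* @payments-team\n"))
```

Keeping the confidence next to the owner is what lets the UI say "we're fairly sure" instead of asserting a guess as fact, and gives human corrections a number to update.
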
### Scorecard / Maturity Model
62. **Production Readiness Score** — Does this service have: health check? Logging? Alerting? Runbook? Score it 0-100. Gamify production readiness.
63. **Documentation Coverage** — Does the repo have a README? API docs? Architecture decision records? Score it.
64. **Security Posture** — Are dependencies up to date? Any known CVEs? Is the Docker image scanned? Secrets in env vars vs. secrets manager?
65. **On-Call Readiness** — Is there an on-call rotation defined? Is the runbook current? Has the team done a recent incident drill?
66. **Leaderboard** — Team-level maturity scores. Friendly competition. "Platform team is at 92%, payments team is at 67%." Gamification drives adoption.
67. **Improvement Suggestions** — "Your service is missing a health check endpoint. Here's a template for Express/FastAPI/Go." Actionable, not just a score.
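
A minimal sketch of the production readiness score (#62), assuming the four checks named there are weighted equally. The check identifiers and the equal weighting are illustrative assumptions; a real scorecard would likely weight checks differently.

```python
from typing import Dict, List, Tuple

# The four checks listed in idea #62; identifiers are hypothetical.
CHECKS = ("health_check", "logging", "alerting", "runbook")


def readiness_score(results: Dict[str, bool]) -> Tuple[int, List[str]]:
    """Return (score from 0 to 100, list of missing checks).

    A check that is absent from `results` counts as failed.
    """
    passed = [check for check in CHECKS if results.get(check, False)]
    missing = [check for check in CHECKS if check not in passed]
    score = round(100 * len(passed) / len(CHECKS))
    return score, missing


if __name__ == "__main__":
    score, missing = readiness_score({"health_check": True, "logging": True, "alerting": False})
    print(score, missing)
```

Returning the missing checks alongside the number is what makes #67 possible: each missing item maps to a concrete suggestion rather than just a lower score.
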
### dd0c Module Integration
68. **Alert → Owner Routing (dd0c/alert)** — Alert fires → portal knows the owner → alert routes directly to the right person. No more generic #alerts channel.
69. **Drift Visibility (dd0c/drift)** — Service card shows "⚠️ 3 infrastructure drifts detected." Click to see details in dd0c/drift.
70. **Cost Per Service (dd0c/cost)** — Service card shows monthly AWS cost. "This Lambda costs $234/month." Engineers finally see the money.
71. **Runbook Execution (dd0c/run)** — Runbook linked in portal is executable via dd0c/run. "Service is down → click runbook → AI walks you through recovery."
72. **LLM Cost Per Service (dd0c/route)** — If the service uses LLM APIs, show the AI spend. "This service spent $1,200 on GPT-4o last month."
73. **Unified Incident View** — When an incident happens, the portal becomes the war room: service health, owner, runbook, recent changes, cost impact — all on one screen.
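
The alert routing in #68 reduces to two lookups, sketched here under the assumption that the portal holds a service-to-team map and a team-to-on-call map. The function name, argument names, and the `#alerts` fallback are hypothetical.

```python
from typing import Dict


def route_alert(
    service: str,
    service_team: Dict[str, str],
    team_on_call: Dict[str, str],
    fallback: str = "#alerts",
) -> str:
    """Pick a destination: on-call person, else the owning team, else the fallback channel."""
    team = service_team.get(service)
    if team is None:
        return fallback  # unowned service: the generic channel is all we have
    return team_on_call.get(team, team)


if __name__ == "__main__":
    service_team = {"payment-service": "@payments-team"}
    team_on_call = {"@payments-team": "@alice"}
    print(route_alert("payment-service", service_team, team_on_call))
```

Every alert that lands on the fallback is also a catalog gap worth surfacing, since it names a service with no known owner.
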
### Wild Ideas 🔥
74. **The IDP Is Just a Search Engine** — Forget the catalog UI. The entire product is a search bar. Type anything: service name, team name, API endpoint, error message. Get instant answers. Google for your infrastructure.
75. **AI Agent: "Ask Your Infra"** — Natural language queries: "Who owns the service that handles Stripe webhooks?" "What changed in production in the last 24 hours?" "Which services don't have runbooks?" The AI queries the catalog and answers.
76. **Auto-Generated Architecture Diagrams** — From discovered services and dependencies, auto-generate C4 / system context diagrams. Always up-to-date because they're generated from reality, not drawn by hand.
77. **"New Engineer" Mode** — A guided tour for new hires. "Here are the 10 most important services. Here's who owns what. Here's how to set up your dev environment." Onboarding in 1 hour, not 1 week.
78. **Service DNA** — Every service gets a unique fingerprint: its tech stack, dependencies, deployment pattern, cost profile, health history. Use this to find similar services, suggest best practices, detect anomalies.
79. **Incident Replay** — After an incident, the portal shows a timeline: what changed, what broke, who responded, how it was fixed. Auto-generated post-mortem skeleton.
80. **"What If" Simulator** — "What if we deprecate service X?" Show the blast radius: which services depend on it, which teams are affected, estimated migration effort.
81. **Service Lifecycle Tracker** — Track services from creation → active → deprecated → decommissioned. Prevent zombie services by making the lifecycle visible.
82. **Auto-PR for Missing Metadata** — Portal detects a service is missing an owner tag. Automatically opens a PR to the repo adding a `CODEOWNERS` file with a suggested owner based on git history.
83. **Ambient Dashboard (TV Mode)** — Full-screen mode for office TVs. Show team services, health status, recent deploys, SLO burn rates. The engineering floor's heartbeat monitor.
84. **Service Comparison** — Side-by-side comparison of two services: tech stack, cost, health, maturity score. Useful for migration planning or standardization.
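
The blast radius behind #39 and the "What If" simulator (#80) can be sketched as a breadth-first walk over reversed dependency edges. This assumes the dependency graph is already known as a "service depends on" mapping; the function name and the sample services are made up.

```python
from collections import deque
from typing import Dict, Set


def blast_radius(depends_on: Dict[str, Set[str]], target: str) -> Set[str]:
    """All services that depend on `target`, directly or transitively."""
    # Invert the edges: for each service, who depends on it?
    dependents: Dict[str, Set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != target and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    graph = {
        "checkout": {"payment"},
        "payment": {"ledger"},
        "search": {"catalog"},
    }
    print(sorted(blast_radius(graph, "ledger")))
```

The visited check also makes the walk safe on circular dependencies, which real service graphs tend to have.
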
---
## Phase 3: Differentiation & Moat (18 ideas)
### How to Beat Backstage
85. **Time-to-Value: 5 Minutes vs. 5 Months** — Backstage takes months to set up. dd0c/portal takes 5 minutes (connect AWS + GitHub, auto-discover, done). This is the entire pitch. Speed kills.
86. **Zero Maintenance** — Backstage is self-hosted and requires constant upgrades. dd0c/portal is SaaS. We handle upgrades, scaling, and plugin compatibility. Your platform team can go back to building platforms.
87. **Auto-Discovery vs. Manual Entry** — Backstage requires humans to write YAML. dd0c/portal discovers everything automatically. The catalog is always current because it's generated from reality, not maintained by humans.
88. **Opinionated > Configurable** — Backstage gives you a blank canvas. dd0c/portal gives you a finished painting. We make the decisions so you don't have to. Convention over configuration.
89. **"Backstage Migrator"** — One-click import from existing Backstage `catalog-info.yaml` files. Lower the switching cost to zero. Eat their lunch.
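
For the Backstage migrator (#89), the core step is mapping one `catalog-info.yaml` entity onto a portal record. The sketch below assumes the YAML has already been parsed into a dict by a YAML library, and reads the standard `kind`, `metadata.name`, `metadata.description`, `spec.owner`, `spec.lifecycle`, and `spec.type` fields. The output record shape and the function name are hypothetical.

```python
from typing import Any, Dict, Optional


def from_backstage_entity(entity: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map a parsed Backstage Component entity to a portal service record.

    Returns None for non-Component kinds or entities without a name.
    """
    if entity.get("kind") != "Component":
        return None
    metadata = entity.get("metadata") or {}
    spec = entity.get("spec") or {}
    name = metadata.get("name")
    if not name:
        return None
    return {
        "name": name,
        "description": metadata.get("description"),
        "owner": spec.get("owner"),
        "lifecycle": spec.get("lifecycle"),
        "type": spec.get("type"),
    }


if __name__ == "__main__":
    entity = {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "payment-service", "description": "Handles payments"},
        "spec": {"type": "service", "lifecycle": "production", "owner": "payments-team"},
    }
    print(from_backstage_entity(entity))
```

Imported values are best treated as one more signal rather than ground truth, given the "YAML Cemetery" problem: a stale `spec.owner` can be cross-checked against what discovery finds.
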
### How to Beat Port/Cortex/OpsLevel
90. **Price: $10/eng vs. $200+/eng** — Port, Cortex, and OpsLevel charge enterprise prices ($20K+/year). dd0c/portal is $10/engineer/month with self-serve signup. No sales calls. No procurement process.
91. **Self-Serve vs. Sales-Led** — You can start using dd0c/portal today. Port requires a demo call, a POC, and a 6-week evaluation. By the time their sales cycle completes, you've been using dd0c for 2 months.
92. **Simplicity as Feature** — Port and Cortex have massive feature sets designed for 1000+ engineer orgs. dd0c/portal has 20% of the features for 80% of the value. For a 30-person team, less is more.
93. **dd0c Platform Integration** — Port is a standalone IDP. dd0c/portal is part of a unified platform (cost, alerts, drift, runbooks). The IDP that knows your costs, routes your alerts, and executes your runbooks. Nobody else can do this.
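
A quick worked check of the pricing claim in #90, using the 30-person team from #92 as the reference size. The function names are hypothetical, and the $20K figure is simply the "enterprise prices" floor quoted in #90, not a price from any vendor's list.

```python
def annual_cost(engineers: int, price_per_engineer_month: float = 10.0) -> float:
    """Yearly spend at a flat per-engineer monthly price."""
    return engineers * price_per_engineer_month * 12


def breakeven_headcount(annual_floor: float = 20_000, price_per_engineer_month: float = 10.0) -> float:
    """Team size at which the flat price reaches a given annual floor."""
    return annual_floor / (price_per_engineer_month * 12)


if __name__ == "__main__":
    print(annual_cost(30))          # 30 engineers at $10/month each
    print(breakeven_headcount())    # headcount where $10/eng/month reaches $20K/year
```

At 30 engineers the bill is $3,600 a year, and the flat price does not reach the $20K floor until a team is somewhere past 166 engineers.
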
### The Moat
94. **Data Network Effect** — The more services discovered, the better the dependency graph, the smarter the ownership inference, the more accurate the health aggregation. Data compounds.
95. **Platform Lock-In (The Good Kind)** — Once dd0c/portal is the browser homepage for every engineer, switching costs are enormous. It's the operating system for your engineering org.
96. **Cross-Module Flywheel** — Portal makes alerts smarter (route to owner). Alerts make portal stickier (engineers open it during incidents). Cost data makes portal indispensable (engineers check service costs). Each module reinforces the others.
97. **AI-Powered Inference Engine** — Over time, dd0c learns patterns across all customers (anonymized): common service architectures, typical ownership structures, standard tech stacks. The AI gets smarter with scale. New customers get better auto-discovery on day 1.
98. **Community Catalog Templates** — Open-source library of service templates (Express API, Lambda function, ECS service). New services created from templates are automatically portal-ready. The community builds the ecosystem.
99. **"Agent Control Plane" Positioning** — As agentic AI grows, AI agents need a source of truth about services. dd0c/portal becomes the registry that AI agents query. "Which service handles payments?" The IDP isn't just for humans anymore — it's for AI agents too.
100. **Compliance Moat** — Once dd0c/portal is the system of record for service ownership and maturity, it becomes compliance infrastructure. SOC 2 auditors love it. Ripping it out means losing your compliance evidence.
101. **Integration Depth** — Build deep integrations with the tools teams already use (GitHub, Slack, PagerDuty, Datadog, AWS). Each integration makes dd0c/portal harder to replace.
102. **Open-Source Discovery Agent** — Open-source the discovery agent (runs in their VPC). Proprietary SaaS dashboard. The OSS agent builds trust and community. The dashboard is the business.
---
## Phase 4: Anti-Ideas & Red Team (14 ideas)
### Why Would This Fail?
103. **"Lightweight" = "Toy"** — Teams might dismiss dd0c/portal as too simple. "We need Backstage because we're a serious engineering org." Perception problem: lightweight sounds like it can't scale.
104. **GitHub Ships a Built-In Catalog** — GitHub already has repository topics, CODEOWNERS, and dependency graphs. If they add a "Service Catalog" tab, dd0c/portal's value proposition evaporates overnight.
105. **Backstage Gets Easy** — Roadie (managed Backstage) is improving. If Backstage 2.0 ships with auto-discovery and zero-config setup, the "Anti-Backstage" positioning dies.
106. **AWS Ships a Good IDP** — AWS has Service Catalog, but it's terrible. If they build a real IDP integrated with their ecosystem, they have distribution dd0c can't match.
107. **Discovery Accuracy Problem** — Auto-discovery sounds magical but might be 60% accurate. If engineers open the portal and see wrong data, they'll never trust it again. First impressions are everything.
108. **Small Teams Don't Need an IDP** — A 15-person team might genuinely not need a service catalog. They all sit in the same room. The TAM might be smaller than expected.
109. **Enterprise Gravity** — As teams grow past 100 engineers, they'll "graduate" to Port or Cortex. dd0c/portal might be a stepping stone, not a destination. High churn at the top end.
110. **Solo Founder Risk** — Building an IDP requires integrations with dozens of tools (AWS, GCP, Azure, GitHub, GitLab, Bitbucket, PagerDuty, OpsGenie, Datadog, Grafana...). That's a massive surface area for one person.
111. **The "Free" Competitor Problem** — Backstage is free. Convincing teams to pay $10/eng/month when a free option exists requires the value gap to be enormous and obvious.
112. **Data Sensitivity** — The portal needs read access to AWS accounts and GitHub orgs. Security teams at larger companies will block this. The trust barrier is real.
113. **Agentic AI Makes IDPs Obsolete** — If AI agents can answer "who owns this service?" by reading git history and Slack in real-time, do you need a static catalog at all?
114. **Platform Engineering Fatigue** — Teams are tired of adopting new tools. "We just finished setting up Backstage, we're not switching." Migration fatigue is real.
115. **The "Good Enough" Spreadsheet** — Many teams track services in a Google Sheet or Notion database. It's ugly but it works. Convincing them to pay for a dedicated tool is harder than it sounds.
116. **Churn from Simplicity** — If the product is truly lightweight, there's less surface area for stickiness. Users might churn because they feel they've "outgrown" it.
---
## Phase 5: Synthesis
### Top 10 Ideas (Ranked)
| Rank | Idea | Why It Wins |
|------|------|-------------|
| 1 | **5-Minute Auto-Discovery Setup** (#57, #58) | THE differentiator. Connect AWS + GitHub → catalog populated. Zero YAML. This is the entire pitch against Backstage. |
| 2 | **Cmd+K Instant Search** (#49, #74) | The portal IS the search bar. "Who owns X?" answered in 2 seconds. This is the daily-use hook that makes it the browser homepage. |
| 3 | **AI "Ask Your Infra" Agent** (#75) | Natural language queries against your service catalog. "What changed in prod today?" This is the 2026 differentiator that no competitor has. |
| 4 | **Ownership Registry + PagerDuty Sync** (#41, #35) | The #1 use case: who owns this, who's on-call. Auto-populated from PagerDuty/OpsGenie + git history. Solves the 3 AM problem. |
| 5 | **dd0c Cross-Module Integration** (#68-73, #96) | Alerts route to owners. Costs attributed to services. Runbooks linked and executable. The platform flywheel that standalone IDPs can't match. |
| 6 | **Production Readiness Scorecard** (#62-67) | Gamified maturity model. Teams compete to improve scores. Drives adoption AND improves engineering practices. Two birds, one stone. |
| 7 | **Slack Bot** (#50) | `/dd0c who owns payment-service` — meet engineers where they already are. Reduces friction to zero. Drives organic adoption. |
| 8 | **Auto-Generated Dependency Graphs** (#39, #76) | Visual blast radius. "If this service goes down, these 12 services are affected." Always current because it's generated from reality. |
| 9 | **Backstage Migrator** (#89) | One-click import from Backstage YAML. Lowers switching cost to zero. Directly targets the frustrated Backstage user base. |
| 10 | **$10/eng Self-Serve Pricing** (#90, #91) | No sales calls. No procurement. Credit card signup. This alone disqualifies Port/Cortex/OpsLevel for 80% of the market. |
### 3 Wild Cards 🃏
| # | Wild Card | Why It's Wild |
|---|-----------|---------------|
| 🃏1 | **"New Engineer" Guided Onboarding Mode** (#77) | Turns the IDP into an onboarding tool. "Welcome to Acme Corp. Here are your 47 services in 5 minutes." HR teams would champion this. Completely different buyer persona. |
| 🃏2 | **Agent Control Plane** (#99) | Position dd0c/portal as the registry that AI agents query, not just humans. As agentic DevOps explodes in 2026, this could be the defining use case. The IDP becomes infrastructure for AI. |
| 🃏3 | **Auto-PR for Missing Metadata** (#82) | The portal doesn't just show gaps — it fixes them. Detects missing CODEOWNERS, opens a PR with suggested owners. The catalog improves itself. Self-healing metadata. |
### Recommended V1 Scope
**Core (Must Ship):**
- AWS auto-discovery (EC2, ECS, Lambda, RDS, API Gateway via read-only IAM role)
- GitHub org scan (repos, languages, CODEOWNERS, README)
- Service cards (name, owner, description, repo, health, last deploy)
- Cmd+K instant search
- Team → services ownership mapping
- PagerDuty/OpsGenie on-call schedule import
- Slack bot (`/dd0c who owns X`)
- Self-serve signup, $10/engineer/month
**V1.1 (Fast Follow):**
- Production readiness scorecard
- Dependency graph visualization
- Kubernetes discovery
- Backstage YAML importer
- dd0c/alert integration (route alerts to service owners)
- dd0c/cost integration (show cost per service)
**V1.2 (Differentiator):**
- AI "Ask Your Infra" natural language queries
- Auto-PR for missing metadata
- New engineer onboarding mode
- Terraform state parsing
**Explicitly NOT V1:**
- Custom plugins/extensions (that's Backstage's trap)
- GCP/Azure support (AWS-first, expand later)
- Software templates / scaffolding (stay focused on catalog)
- Full SLO management (just show basic health)
- Self-hosted option (SaaS only to start)
---
> **Total ideas generated: 116**
> *Session complete. The Anti-Backstage has a blueprint. Now go build it.* 🔥