Products: route, drift, alert, portal, cost, run
Phases: brainstorm, design-thinking, innovation-strategy, party-mode,
product-brief, architecture, epics (incl. Epic 10 TF compliance),
test-architecture (TDD strategy)
Brand strategy and market research included.
dd0c/portal — Brainstorm Session
Product: Lightweight Internal Developer Portal ("The Anti-Backstage")
Facilitator: Carson, Elite Brainstorming Specialist
Date: 2026-02-28
Every idea gets a seat at the table. We filter later. Let's GO.
Phase 1: Problem Space (25 ideas)
Why Does Backstage Suck?
- YAML Cemetery — Backstage requires a hand-written `catalog-info.yaml` in every repo. Engineers write it once, never update it. Within 6 months your catalog is a graveyard of lies.
- Plugin Roulette — Backstage plugins break on every upgrade. The plugin ecosystem is wide but shallow — half-maintained community plugins that rot.
- Dedicated Platform Team Required — You need 1-2 full-time engineers just to keep Backstage running. For a 30-person team, that's 3-7% of your engineering headcount babysitting a developer portal.
- React Monolith From Hell — Backstage is a massive React app you have to fork, customize, build, and deploy yourself. It's not a product, it's a framework. Spotify built it for Spotify.
- Upgrade Treadmill — Backstage releases constantly. Each upgrade risks breaking your custom plugins and templates. Teams fall behind and get stuck on ancient versions.
- Cold Start Problem — Day 1 of Backstage: empty catalog. You have to manually register every service. Nobody does it. The portal launches to crickets.
- No Opinions — Backstage is infinitely configurable, which means it ships with zero useful defaults. You have to decide everything: what metadata to track, what plugins to install, how to organize the catalog.
- Search Is Terrible — Backstage's built-in search is basic. Finding "who owns the payment service" requires navigating a clunky UI tree.
- Authentication Nightmare — Setting up auth (Okta, GitHub, Google) in Backstage requires custom provider configuration that's poorly documented.
- No Auto-Discovery — Backstage doesn't discover anything. It's a static registry that depends entirely on humans keeping it current. Humans don't.
What Do Engineers Actually Need? (The 80/20)
- "Who owns this?" — The #1 question. When something breaks at 3 AM, you need to know who to page. That's it. That's the killer feature.
- "What does this service do?" — A one-paragraph description, its dependencies, and its API docs. Not a 40-page Confluence novel.
- "Is it healthy right now?" — Green/yellow/red. Deployment status. Last deploy time. Current error rate. One glance.
- "Where's the runbook?" — When the service is on fire, where do I go? Link to the runbook, the dashboard, the logs.
- "What depends on this?" — Dependency graph. If I change this service, what breaks?
- "How do I set up my dev environment for this?" — README, setup scripts, required env vars. Onboarding in 10 minutes, not 10 days.
The Pain of NOT Having an IDP
- Tribal Knowledge Monopoly — "Ask Dave, he built that service 3 years ago." Dave left 6 months ago. Now nobody knows.
- Confluence Graveyard — Teams document services in Confluence pages that are 2 years stale. New engineers follow outdated instructions and waste days.
- Slack Archaeology — "I think someone posted the architecture diagram in #platform-eng last March?" Engineers spend hours searching Slack history for institutional knowledge.
- Incident Response Roulette — Alert fires → nobody knows who owns the service → 30-minute delay finding the right person → MTTR doubles.
- Onboarding Black Hole — New engineer joins. Spends first 2 weeks asking "what is this service?" and "who do I talk to about X?" in Slack. Productivity = zero.
- Duplicate Services — Without a catalog, Team A builds a notification service. Team B doesn't know it exists. Team B builds another notification service. Now you have two.
- Zombie Services — Services that nobody owns, nobody uses, but nobody is brave enough to turn off. They accumulate like barnacles, costing money and creating security risk.
- Compliance Panic — Auditor asks "show me all services that handle PII and their owners." Without an IDP, this is a multi-week scavenger hunt.
- Shadow Architecture — The actual architecture diverges from every diagram ever drawn. Nobody has a true picture of what's running in production.
Phase 2: Solution Space (42 ideas)
Auto-Discovery Approaches
- AWS Resource Tagger — Scan AWS accounts via read-only IAM role. Discover EC2, ECS, Lambda, RDS, S3, API Gateway. Map them to services using tags, naming conventions, and CloudFormation stack associations.
- GitHub/GitLab Repo Scanner — Scan org repos. Infer services from repo names, `Dockerfile` presence, CI/CD configs, deployment manifests. Extract README descriptions automatically.
- Kubernetes Label Harvester — Connect to K8s clusters. Discover deployments, services, ingresses. Map labels (`app`, `team`, `owner`) to catalog entries.
- Terraform State Reader — Parse Terraform state files (S3 backends). Build infrastructure graph from resource relationships. Know exactly what infra each service uses.
- CI/CD Pipeline Analyzer — Read GitHub Actions / GitLab CI / Jenkins configs. Infer deployment targets, environments, and service relationships from pipeline definitions.
- DNS/Route53 Reverse Map — Scan DNS records to discover all public-facing services and map them back to infrastructure.
- CloudFormation Stack Walker — Parse CF stacks to understand resource groupings and cross-stack references. Build dependency graphs automatically.
- Package.json / go.mod / pom.xml Dependency Inference — Read dependency files to infer internal service-to-service relationships (shared libraries = likely communication).
- Git Blame Ownership — Infer service ownership from git commit history. Who commits most to this repo? That's probably the owner (or at least knows who is).
- PagerDuty/OpsGenie Schedule Import — Pull on-call schedules to auto-populate "who to page" for each service.
- OpenAPI/Swagger Auto-Ingest — Detect and index API specs from repos. Surface them in the portal as live, searchable API documentation.
- Docker Compose Graph — Parse `docker-compose.yml` files to understand local development service topologies.
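The tag-and-convention mapping behind the AWS Resource Tagger idea can be sketched in a few lines. This is a minimal illustration, not a real dd0c implementation: the resource dict shape, the `service` tag name, and the `<name>-service` naming convention are all assumptions.

```python
import re
from collections import defaultdict

def map_resources_to_services(resources):
    """Group discovered cloud resources into candidate services.

    `resources` is a list of dicts like {"arn": ..., "name": ..., "tags": {...}}
    (an illustrative shape). Prefer an explicit `service` tag; fall back to a
    naming convention like `payment-service-worker` -> `payment-service`.
    """
    services = defaultdict(list)
    for res in resources:
        tags = res.get("tags", {})
        if "service" in tags:
            key = tags["service"]
        else:
            # Naming-convention fallback: take the `<name>-service` prefix if present.
            m = re.match(r"^(.+?-service)", res.get("name", ""))
            key = m.group(1) if m else "unmapped"
        services[key].append(res["arn"])
    return dict(services)
```

In a real deployment the `resources` list would come from read-only API calls under the scoped IAM role described above; the grouping logic stays the same.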
Service Catalog Features
- One-Line Service Card — Every service gets a card: name, owner, health, last deploy, language, repo link. Scannable in 2 seconds.
- Dependency Graph Visualization — Interactive graph showing service-to-service dependencies. Click a node to see details. Highlight blast radius.
- Health Dashboard — Aggregate health from multiple sources (CloudWatch, Datadog, Grafana, custom health endpoints). Show unified red/yellow/green.
- Ownership Registry — Team → services mapping. Click a team, see everything they own. Click a service, see the team and on-call rotation.
- Runbook Linker — Auto-detect runbooks in repos (markdown files in `/runbooks`, `/docs`, or linked in README). Surface them on the service card.
- Environment Matrix — Show all environments (dev, staging, prod) for each service. Current version deployed in each. Drift between environments highlighted.
- SLO Tracker — Define SLOs per service. Show current burn rate. Alert when SLO budget is burning too fast. Simple — not a full SLO platform, just visibility.
- Cost Attribution — Pull from dd0c/cost. Show monthly AWS cost per service. "This service costs $847/month." Engineers never see this data today.
- Tech Radar Integration — Tag services with their tech stack. Surface org-wide technology adoption. "We have 47 services on Node 18, 3 still on Node 14."
- README Renderer — Pull and render the repo README directly in the portal. No context switching to GitHub.
- Changelog Feed — Show recent deployments, config changes, and incidents per service. "What happened to this service this week?"
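The "one-line service card" above implies a small, fixed data model. A hedged sketch of what that card might look like — field names and the health traffic-light values are assumptions, not a committed schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServiceCard:
    """The 2-second-scannable card: one service, one line of truth."""
    name: str
    owner: str              # team slug, e.g. "payments-team"
    health: str             # "green" | "yellow" | "red"
    last_deploy: str        # ISO 8601 timestamp
    language: str
    repo_url: str
    monthly_cost_usd: float = 0.0
    runbook_url: Optional[str] = None

    def one_liner(self) -> str:
        # The scannable summary rendered in the catalog list view.
        return f"{self.name} [{self.health}] | {self.owner} | deployed {self.last_deploy}"
```

Keeping the card this small is the point: everything else (README, changelog, cost detail) hangs off it rather than living in it.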
Developer Experience
- Instant Search (Cmd+K) — Algolia-fast search across all services, teams, APIs, runbooks. The portal IS the search bar.
- Slack Bot — `/dd0c who owns payment-service` → instant answer in Slack. No need to open the portal.
- CLI Tool — `dd0c portal search "payment"` → results in terminal. For engineers who live in the terminal.
- Browser New Tab — dd0c/portal as the browser new tab page. Every time an engineer opens a tab, they see their team's services, recent incidents, and deployment status.
- VS Code Extension — Right-click a service import → "View in dd0c/portal" → opens service card. See ownership and docs without leaving the editor.
- GitHub PR Enrichment — Bot comments on PRs with service context: "This PR affects payment-service (owned by @payments-team, 99.9% SLO, last incident 3 days ago)."
- Mobile-Friendly View — When you're on-call and get paged on your phone, the portal should be usable on mobile. Backstage is not.
- Deep Links — Every service, team, runbook, and API has a stable URL. Paste it in Slack, Jira, anywhere. It just works.
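The Slack bot's core is just catalog lookup with a forgiving fallback. A minimal sketch, assuming an in-memory catalog dict; the query syntax matches the proposed `/dd0c who owns X` command, and none of this reflects an existing Slack integration:

```python
def answer_who_owns(query, catalog):
    """Resolve a `who owns <service>` query against the catalog.

    `catalog` maps service name -> {"owner": ..., "oncall": ...}
    (illustrative shape; a real bot would call the portal API).
    """
    service = query.removeprefix("who owns ").strip()
    entry = catalog.get(service)
    if entry is None:
        # Fuzzy fallback: substring match so "payment" finds "payment-service".
        matches = [name for name in catalog if service in name]
        if len(matches) == 1:
            service, entry = matches[0], catalog[matches[0]]
        else:
            return f"No unique match for '{service}'."
    return f"{service} is owned by {entry['owner']} (on-call: {entry['oncall']})"
```

The fuzzy fallback matters: on-call engineers at 3 AM type partial names, and "no result" is the fastest way to lose their trust.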
Zero-Config Magic
- Convention Over Configuration — If your repo is named `payment-service`, the service is named `payment-service`. If it has a `Dockerfile`, it's a deployable service. If it has an owner in CODEOWNERS, that's the owner. Zero YAML needed.
- Smart Defaults — First run: connect AWS account + GitHub org. Portal auto-populates with everything it finds. Engineer reviews and corrects, not creates from scratch.
- Progressive Enhancement — Start with auto-discovered data (maybe 60% accurate). Let teams enrich over time. Never require manual entry as a prerequisite.
- Confidence Scores — Show "we're 85% sure @payments-team owns this" based on git history and AWS tags. Let humans confirm or correct. Learn from corrections.
- Ghost Service Detection — Find AWS resources that don't map to any known repo or team. Surface them as "orphaned infrastructure" — potential zombie services or cost waste.
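Confidence-scored ownership inference could combine the git-history and AWS-tag signals like this. The 0.6/0.4 weighting and the conflict penalty are illustrative assumptions, not a calibrated model:

```python
def infer_owner(commit_counts, aws_tag_owner=None):
    """Combine git history and AWS tags into an (owner, confidence) pair.

    `commit_counts` maps team -> commits to the service's repo.
    Weights here are placeholders a real system would learn from corrections.
    """
    total = sum(commit_counts.values()) or 1
    git_owner = max(commit_counts, key=commit_counts.get) if commit_counts else None
    git_conf = commit_counts.get(git_owner, 0) / total if git_owner else 0.0

    if aws_tag_owner and aws_tag_owner == git_owner:
        # Both signals agree: boost confidence toward certainty.
        return git_owner, min(1.0, 0.6 * git_conf + 0.4)
    if aws_tag_owner and git_owner:
        # Conflicting signals: surface the explicit tag, but flag low confidence.
        return aws_tag_owner, 0.4
    return git_owner or aws_tag_owner, 0.6 * git_conf if git_owner else 0.5
```

Whatever the weights, the key product move is showing the number ("85% sure") and letting a human click confirm — each confirmation is training data.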
Scorecard / Maturity Model
- Production Readiness Score — Does this service have: health check? Logging? Alerting? Runbook? Score it 0-100. Gamify production readiness.
- Documentation Coverage — Does the repo have a README? API docs? Architecture decision records? Score it.
- Security Posture — Are dependencies up to date? Any known CVEs? Is the Docker image scanned? Secrets in env vars vs. secrets manager?
- On-Call Readiness — Is there an on-call rotation defined? Is the runbook current? Has the team done a recent incident drill?
- Leaderboard — Team-level maturity scores. Friendly competition. "Platform team is at 92%, payments team is at 67%." Gamification drives adoption.
- Improvement Suggestions — "Your service is missing a health check endpoint. Here's a template for Express/FastAPI/Go." Actionable, not just a score.
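The 0-100 readiness score reduces to a weighted checklist. The checks and weights below are made-up placeholders; a real rollout would tune them with platform teams:

```python
# Illustrative weights only -- the five checks sum to 100.
CHECKS = {
    "health_check": 25,
    "logging": 15,
    "alerting": 20,
    "runbook": 25,
    "readme": 15,
}

def readiness_score(service_facts):
    """Score a service 0-100 from boolean facts, and list the gaps.

    Returning the gaps alongside the score is what makes it actionable
    ("missing: runbook") rather than just a number on a leaderboard.
    """
    score = sum(w for check, w in CHECKS.items() if service_facts.get(check))
    missing = [check for check in CHECKS if not service_facts.get(check)]
    return score, missing
```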
dd0c Module Integration
- Alert → Owner Routing (dd0c/alert) — Alert fires → portal knows the owner → alert routes directly to the right person. No more generic #alerts channel.
- Drift Visibility (dd0c/drift) — Service card shows "⚠️ 3 infrastructure drifts detected." Click to see details in dd0c/drift.
- Cost Per Service (dd0c/cost) — Service card shows monthly AWS cost. "This Lambda costs $234/month." Engineers finally see the money.
- Runbook Execution (dd0c/run) — Runbook linked in portal is executable via dd0c/run. "Service is down → click runbook → AI walks you through recovery."
- LLM Cost Per Service (dd0c/route) — If the service uses LLM APIs, show the AI spend. "This service spent $1,200 on GPT-4o last month."
- Unified Incident View — When an incident happens, the portal becomes the war room: service health, owner, runbook, recent changes, cost impact — all on one screen.
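The alert-to-owner routing above is conceptually a single lookup with a safe fallback. A sketch with an assumed in-memory ownership map; the field names are illustrative, not a real dd0c/alert schema:

```python
def route_alert(alert, ownership, fallback_channel="#alerts"):
    """Pick a routing target for an alert using the portal's ownership data.

    `ownership` maps service name -> {"oncall": ..., "channel": ...}.
    Unknown services fall back to the generic channel the doc complains about,
    so routing degrades gracefully while the catalog fills in.
    """
    entry = ownership.get(alert.get("service"))
    if entry is None:
        return {"target": fallback_channel, "reason": "no owner found"}
    return {"target": entry["oncall"], "channel": entry["channel"], "reason": "owner match"}
```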
Wild Ideas 🔥
- The IDP Is Just a Search Engine — Forget the catalog UI. The entire product is a search bar. Type anything: service name, team name, API endpoint, error message. Get instant answers. Google for your infrastructure.
- AI Agent: "Ask Your Infra" — Natural language queries: "Who owns the service that handles Stripe webhooks?" "What changed in production in the last 24 hours?" "Which services don't have runbooks?" The AI queries the catalog and answers.
- Auto-Generated Architecture Diagrams — From discovered services and dependencies, auto-generate C4 / system context diagrams. Always up-to-date because they're generated from reality, not drawn by hand.
- "New Engineer" Mode — A guided tour for new hires. "Here are the 10 most important services. Here's who owns what. Here's how to set up your dev environment." Onboarding in 1 hour, not 1 week.
- Service DNA — Every service gets a unique fingerprint: its tech stack, dependencies, deployment pattern, cost profile, health history. Use this to find similar services, suggest best practices, detect anomalies.
- Incident Replay — After an incident, the portal shows a timeline: what changed, what broke, who responded, how it was fixed. Auto-generated post-mortem skeleton.
- "What If" Simulator — "What if we deprecate service X?" Show the blast radius: which services depend on it, which teams are affected, estimated migration effort.
- Service Lifecycle Tracker — Track services from creation → active → deprecated → decommissioned. Prevent zombie services by making the lifecycle visible.
- Auto-PR for Missing Metadata — Portal detects a service is missing an owner tag. Automatically opens a PR to the repo adding a `CODEOWNERS` file with a suggested owner based on git history.
- Ambient Dashboard (TV Mode) — Full-screen mode for office TVs. Show team services, health status, recent deploys, SLO burn rates. The engineering floor's heartbeat monitor.
- Service Comparison — Side-by-side comparison of two services: tech stack, cost, health, maturity score. Useful for migration planning or standardization.
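Both the blast-radius highlight and the "What If" simulator reduce to a traversal over reverse dependency edges. A minimal sketch, assuming the discovery pass has already produced a service -> dependents map:

```python
from collections import deque

def blast_radius(dependents, service):
    """All services transitively affected if `service` goes down.

    `dependents` maps a service to the services that depend on it
    (reverse dependency edges). Plain BFS; cycles are handled by the
    visited set.
    """
    affected, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected
```

The same traversal answers "what if we deprecate X?": the returned set is the list of teams to notify and the migration surface to estimate.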
Phase 3: Differentiation & Moat (18 ideas)
How to Beat Backstage
- Time-to-Value: 5 Minutes vs. 5 Months — Backstage takes months to set up. dd0c/portal takes 5 minutes (connect AWS + GitHub, auto-discover, done). This is the entire pitch. Speed kills.
- Zero Maintenance — Backstage is self-hosted and requires constant upgrades. dd0c/portal is SaaS. We handle upgrades, scaling, and plugin compatibility. Your platform team can go back to building platforms.
- Auto-Discovery vs. Manual Entry — Backstage requires humans to write YAML. dd0c/portal discovers everything automatically. The catalog is always current because it's generated from reality, not maintained by humans.
- Opinionated > Configurable — Backstage gives you a blank canvas. dd0c/portal gives you a finished painting. We make the decisions so you don't have to. Convention over configuration.
- "Backstage Migrator" — One-click import from existing Backstage
catalog-info.yamlfiles. Lower the switching cost to zero. Eat their lunch.
How to Beat Port/Cortex/OpsLevel
- Price: $10/eng vs. $200+/eng — Port, Cortex, and OpsLevel charge enterprise prices ($20K+/year). dd0c/portal is $10/engineer/month with self-serve signup. No sales calls. No procurement process.
- Self-Serve vs. Sales-Led — You can start using dd0c/portal today. Port requires a demo call, a POC, and a 6-week evaluation. By the time their sales cycle completes, you've been using dd0c for 2 months.
- Simplicity as Feature — Port and Cortex have massive feature sets designed for 1000+ engineer orgs. dd0c/portal has 20% of the features for 80% of the value. For a 30-person team, less is more.
- dd0c Platform Integration — Port is a standalone IDP. dd0c/portal is part of a unified platform (cost, alerts, drift, runbooks). The IDP that knows your costs, routes your alerts, and executes your runbooks. Nobody else can do this.
The Moat
- Data Network Effect — The more services discovered, the better the dependency graph, the smarter the ownership inference, the more accurate the health aggregation. Data compounds.
- Platform Lock-In (The Good Kind) — Once dd0c/portal is the browser homepage for every engineer, switching costs are enormous. It's the operating system for your engineering org.
- Cross-Module Flywheel — Portal makes alerts smarter (route to owner). Alerts make portal stickier (engineers open it during incidents). Cost data makes portal indispensable (engineers check service costs). Each module reinforces the others.
- AI-Powered Inference Engine — Over time, dd0c learns patterns across all customers (anonymized): common service architectures, typical ownership structures, standard tech stacks. The AI gets smarter with scale. New customers get better auto-discovery on day 1.
- Community Catalog Templates — Open-source library of service templates (Express API, Lambda function, ECS service). New services created from templates are automatically portal-ready. The community builds the ecosystem.
- "Agent Control Plane" Positioning — As agentic AI grows, AI agents need a source of truth about services. dd0c/portal becomes the registry that AI agents query. "Which service handles payments?" The IDP isn't just for humans anymore — it's for AI agents too.
- Compliance Moat — Once dd0c/portal is the system of record for service ownership and maturity, it becomes compliance infrastructure. SOC 2 auditors love it. Ripping it out means losing your compliance evidence.
- Integration Depth — Build deep integrations with the tools teams already use (GitHub, Slack, PagerDuty, Datadog, AWS). Each integration makes dd0c/portal harder to replace.
- Open-Source Discovery Agent — Open-source the discovery agent (runs in their VPC). Proprietary SaaS dashboard. The OSS agent builds trust and community. The dashboard is the business.
Phase 4: Anti-Ideas & Red Team (14 ideas)
Why Would This Fail?
- "Lightweight" = "Toy" — Teams might dismiss dd0c/portal as too simple. "We need Backstage because we're a serious engineering org." Perception problem: lightweight sounds like it can't scale.
- GitHub Ships a Built-In Catalog — GitHub already has repository topics, CODEOWNERS, and dependency graphs. If they add a "Service Catalog" tab, dd0c/portal's value proposition evaporates overnight.
- Backstage Gets Easy — Roadie (managed Backstage) is improving. If Backstage 2.0 ships with auto-discovery and zero-config setup, the "Anti-Backstage" positioning dies.
- AWS Ships a Good IDP — AWS has Service Catalog, but it's terrible. If they build a real IDP integrated with their ecosystem, they have distribution dd0c can't match.
- Discovery Accuracy Problem — Auto-discovery sounds magical but might be 60% accurate. If engineers open the portal and see wrong data, they'll never trust it again. First impressions are everything.
- Small Teams Don't Need an IDP — A 15-person team might genuinely not need a service catalog. They all sit in the same room. The TAM might be smaller than expected.
- Enterprise Gravity — As teams grow past 100 engineers, they'll "graduate" to Port or Cortex. dd0c/portal might be a stepping stone, not a destination. High churn at the top end.
- Solo Founder Risk — Building an IDP requires integrations with dozens of tools (AWS, GCP, Azure, GitHub, GitLab, Bitbucket, PagerDuty, OpsGenie, Datadog, Grafana...). That's a massive surface area for one person.
- The "Free" Competitor Problem — Backstage is free. Convincing teams to pay $10/eng/month when a free option exists requires the value gap to be enormous and obvious.
- Data Sensitivity — The portal needs read access to AWS accounts and GitHub orgs. Security teams at larger companies will block this. The trust barrier is real.
- Agentic AI Makes IDPs Obsolete — If AI agents can answer "who owns this service?" by reading git history and Slack in real-time, do you need a static catalog at all?
- Platform Engineering Fatigue — Teams are tired of adopting new tools. "We just finished setting up Backstage, we're not switching." Migration fatigue is real.
- The "Good Enough" Spreadsheet — Many teams track services in a Google Sheet or Notion database. It's ugly but it works. Convincing them to pay for a dedicated tool is harder than it sounds.
- Churn from Simplicity — If the product is truly lightweight, there's less surface area for stickiness. Users might churn because they feel they've "outgrown" it.
Phase 5: Synthesis
Top 10 Ideas (Ranked)
| Rank | Idea | Why It Wins |
|---|---|---|
| 1 | 5-Minute Auto-Discovery Setup (#57, #58) | THE differentiator. Connect AWS + GitHub → catalog populated. Zero YAML. This is the entire pitch against Backstage. |
| 2 | Cmd+K Instant Search (#49, #74) | The portal IS the search bar. "Who owns X?" answered in 2 seconds. This is the daily-use hook that makes it the browser homepage. |
| 3 | AI "Ask Your Infra" Agent (#75) | Natural language queries against your service catalog. "What changed in prod today?" This is the 2026 differentiator that no competitor has. |
| 4 | Ownership Registry + PagerDuty Sync (#41, #35) | The #1 use case: who owns this, who's on-call. Auto-populated from PagerDuty/OpsGenie + git history. Solves the 3 AM problem. |
| 5 | dd0c Cross-Module Integration (#68-73, #96) | Alerts route to owners. Costs attributed to services. Runbooks linked and executable. The platform flywheel that standalone IDPs can't match. |
| 6 | Production Readiness Scorecard (#62-67) | Gamified maturity model. Teams compete to improve scores. Drives adoption AND improves engineering practices. Two birds, one stone. |
| 7 | Slack Bot (#50) | /dd0c who owns payment-service — meet engineers where they already are. Reduces friction to zero. Drives organic adoption. |
| 8 | Auto-Generated Dependency Graphs (#39, #76) | Visual blast radius. "If this service goes down, these 12 services are affected." Always current because it's generated from reality. |
| 9 | Backstage Migrator (#89) | One-click import from Backstage YAML. Lowers switching cost to zero. Directly targets the frustrated Backstage user base. |
| 10 | $10/eng Self-Serve Pricing (#90, #91) | No sales calls. No procurement. Credit card signup. This alone disqualifies Port/Cortex/OpsLevel for 80% of the market. |
3 Wild Cards 🃏
| # | Wild Card | Why It's Wild |
|---|---|---|
| 🃏1 | "New Engineer" Guided Onboarding Mode (#77) | Turns the IDP into an onboarding tool. "Welcome to Acme Corp. Here are your 47 services in 5 minutes." HR teams would champion this. Completely different buyer persona. |
| 🃏2 | Agent Control Plane (#99) | Position dd0c/portal as the registry that AI agents query, not just humans. As agentic DevOps explodes in 2026, this could be the defining use case. The IDP becomes infrastructure for AI. |
| 🃏3 | Auto-PR for Missing Metadata (#82) | The portal doesn't just show gaps — it fixes them. Detects missing CODEOWNERS, opens a PR with suggested owners. The catalog improves itself. Self-healing metadata. |
Recommended V1 Scope
Core (Must Ship):
- AWS auto-discovery (EC2, ECS, Lambda, RDS, API Gateway via read-only IAM role)
- GitHub org scan (repos, languages, CODEOWNERS, README)
- Service cards (name, owner, description, repo, health, last deploy)
- Cmd+K instant search
- Team → services ownership mapping
- PagerDuty/OpsGenie on-call schedule import
- Slack bot (`/dd0c who owns X`)
- Self-serve signup, $10/engineer/month
V1.1 (Fast Follow):
- Production readiness scorecard
- Dependency graph visualization
- Kubernetes discovery
- Backstage YAML importer
- dd0c/alert integration (route alerts to service owners)
- dd0c/cost integration (show cost per service)
V1.2 (Differentiator):
- AI "Ask Your Infra" natural language queries
- Auto-PR for missing metadata
- New engineer onboarding mode
- Terraform state parsing
Explicitly NOT V1:
- Custom plugins/extensions (that's Backstage's trap)
- GCP/Azure support (AWS-first, expand later)
- Software templates / scaffolding (stay focused on catalog)
- Full SLO management (just show basic health)
- Self-hosted option (SaaS only to start)
Total ideas generated: 116
Session complete. The Anti-Backstage has a blueprint. Now go build it. 🔥