dd0c/products/02-iac-drift-detection/design-thinking/session.md
# 🎷 dd0c/drift — Design Thinking Session
**Facilitator:** Maya, Design Thinking Maestro
**Date:** February 28, 2026
**Product:** dd0c/drift — IaC Drift Detection & Auto-Remediation
**Method:** Full Design Thinking (Empathize → Define → Ideate → Prototype → Test → Iterate)
---
> *"Drift is the jazz of infrastructure — unplanned improvisation that sounds beautiful to nobody. Our job isn't to kill the music. It's to give the musicians a score they actually want to follow."* — Maya
---
## Phase 1: EMPATHIZE 🎭
*Before we design a single pixel, we sit in their chairs. We wear their headphones. We feel the 2am vibration of a PagerDuty alert in our bones. Design is about THEM, not us.*
---
### Persona 1: The Infrastructure Engineer — "Ravi"
**Name:** Ravi Krishnamurthy
**Title:** Senior Infrastructure Engineer
**Company:** Mid-stage SaaS startup, 120 employees, Series B
**Experience:** 6 years in infra, 4 years writing Terraform daily
**Tools:** Terraform, AWS, GitHub Actions, Slack, Datadog, PagerDuty
**Stacks managed:** 23 (across dev, staging, prod, and 3 microservice clusters)
#### Empathy Map
**SAYS:**
- "I'll import that resource into state tomorrow." *(Tomorrow never comes.)*
- "Who changed the security group? Was it you? Was it me? I honestly don't remember."
- "Don't touch prod through the console. I'm serious. PLEASE."
- "I need to run `terraform plan` across all stacks before the release, give me... four hours."
- "The state file is the source of truth." *(Said with the conviction of someone who knows it's a lie.)*
**THINKS:**
- "If I run `terraform apply` right now, will it destroy something that someone manually created last month?"
- "I'm mass-producing YAML and HCL for a living. Is this what I went to school for?"
- "There are at least 15 resources in prod that aren't in any state file. I know it. I can feel it."
- "If this next plan shows 40+ changes I didn't expect, I'm calling in sick."
- "I should automate drift checks. I've been saying that for 8 months."
**DOES:**
- Runs `terraform plan` manually before every apply, scanning the output like a bomb technician reading a wire diagram
- Maintains a mental map of "things that have drifted but I haven't fixed yet" — a cognitive debt ledger that never gets paid down
- Writes Slack messages like "PSA: DO NOT modify anything in the `prod-networking` stack via console" every few weeks
- Spends 30% of debugging time figuring out whether the issue is in the code, the state, or reality
- Keeps a personal spreadsheet of "resources created via console during incidents that need to be imported"
**FEELS:**
- **Anxiety** before every `terraform apply` — the gap between what the code says and what exists is a minefield
- **Guilt** about the drift he knows exists but hasn't fixed — it's technical debt he can see but can't prioritize
- **Frustration** when colleagues make console changes — it feels like someone scribbling in the margins of a book you're trying to keep clean
- **Loneliness** at 2am when the pager goes off and nothing matches the code — you're debugging a ghost
- **Imposter syndrome** when drift causes an incident — "I should have caught this. Why didn't I catch this?"
#### Pain Points
1. **No continuous visibility** — `terraform plan` is a flashlight, not a security camera. Drift happens between plans.
2. **State file trust erosion** — Every discovered drift chips away at confidence in IaC as a practice.
3. **Manual reconciliation is soul-crushing** — Running plan across 23 stacks, reading output, triaging changes, fixing one by one. This is a full workday that produces zero new value.
4. **Blast radius fear** — Applying to a drifted stack is Russian roulette. Will it fix the drift or destroy the workaround someone built at 3am last Tuesday?
5. **No attribution** — WHO drifted this? WHEN? WHY? CloudTrail exists but correlating it to Terraform resources requires a PhD in AWS forensics.
6. **Context switching tax** — Drift discovery interrupts real work. You're building a new module and suddenly you're spelunking through state files.
#### Current Workarounds
- **Cron job running `terraform plan`** — Fragile bash script that emails output. Nobody reads the emails. The cron job broke 3 weeks ago. Nobody noticed.
- **"Drift Fridays"** — Dedicated time to reconcile state. Gets cancelled when there's a release. Which is every Friday.
- **Console access restrictions** — Tried to remove console access. Got overruled because "we need it for emergencies." The emergency is now permanent.
- **Mental model** — Ravi literally remembers which stacks are "clean" and which are "dirty." This knowledge lives in his head. When he's on vacation, nobody knows.
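The fragile cron-job workaround above can be sketched in a few lines. This is a minimal illustration, not the product: it assumes Terraform is installed, the stack paths are hypothetical, and it relies on `terraform plan`'s documented `-detailed-exitcode` behavior (exit 0 = no changes, 2 = changes pending, anything else = error). Nothing here solves the "nobody reads the emails" problem — which is exactly the point.

```python
import subprocess

# terraform plan -detailed-exitcode: 0 = clean, 2 = pending changes (drift),
# any other code = an error running the plan.
EXIT_STATUS = {0: "clean", 2: "drifted"}

def status_from_exit(returncode: int) -> str:
    """Classify a terraform plan exit code into a drift status."""
    return EXIT_STATUS.get(returncode, "error")

def check_stack(stack_dir: str) -> str:
    """Run `terraform plan` in one (already-initialized) stack directory.

    `stack_dir` is a hypothetical path to a Terraform working directory.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"],
        cwd=stack_dir,
        capture_output=True,
        text=True,
    )
    return status_from_exit(result.returncode)

# A cron entry would loop check_stack over every stack and mail the results —
# the "flashlight, not a security camera" approach described above.
```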
#### Jobs To Be Done (JTBD)
1. **When** I'm about to run `terraform apply`, **I want to** know exactly what has drifted since my last apply, **so I can** apply with confidence instead of fear.
2. **When** someone makes a console change to a resource I manage, **I want to** be notified immediately with context (who, what, when), **so I can** decide whether to revert or codify it before it becomes invisible debt.
3. **When** I'm debugging a production issue at 2am, **I want to** instantly see whether the resource in question matches its declared state, **so I can** eliminate "drift" as a variable in 30 seconds instead of 30 minutes.
4. **When** it's audit season, **I want to** generate a report showing all drift events and their resolutions, **so I can** prove our infrastructure matches our code without spending a week on manual reconciliation.
#### Day-in-the-Life Scenario: "The 2am Discovery"
*It's 2:17am. Ravi's phone buzzes. PagerDuty. `CRITICAL: API latency > 5s on prod-api-cluster`.*
*He opens his laptop, eyes half-closed. Checks Datadog. The API pods are healthy. Load balancer looks fine. But wait — the target group health checks are failing for two instances. He checks the security group. It should allow traffic on port 8080 from the ALB.*
*It doesn't.*
*The security group has been modified. Port 8080 is restricted to a specific CIDR that... isn't the ALB's subnet. Someone changed this. When? He doesn't know. Who? He doesn't know. Why? He doesn't know.*
*He opens the Terraform code. The code says port 8080 should be open to the ALB security group. The code is right. Reality is wrong. But he can't just `terraform apply` — what if there are OTHER changes in this stack he doesn't know about? What if applying reverts something else that's keeping prod alive?*
*He runs `terraform plan`. It shows 12 changes. TWELVE. He expected one. He doesn't recognize 8 of them. His stomach drops.*
*He spends the next 47 minutes reading plan output, cross-referencing with CloudTrail (which has 4,000 events in the last 24 hours for this account), trying to figure out which of these 12 changes are safe to apply and which will make things worse.*
*At 3:04am, he manually fixes the security group via the console. Adding more drift to fix drift. The irony isn't lost on him. He makes a mental note to "clean this up tomorrow."*
*Tomorrow, he won't. He'll be too tired. And the drift will compound.*
*He goes back to sleep at 3:22am. The alarm is set for 6:30am. He lies awake until 4.*
---
### Persona 2: The Security/Compliance Lead — "Diana"
**Name:** Diana Okafor
**Title:** Head of Security & Compliance
**Company:** B2B SaaS, 200 employees, SOC 2 Type II certified, pursuing HIPAA
**Experience:** 10 years in security, 3 years in cloud compliance
**Tools:** AWS Config, Prisma Cloud, Jira, Confluence, spreadsheets (so many spreadsheets)
**Responsibility:** Ensuring infrastructure matches approved configurations across 4 AWS accounts
#### Empathy Map
**SAYS:**
- "Can you prove that production matches the approved Terraform configuration? I need evidence for the auditor."
- "When was the last time someone verified there's no drift in the PCI-scoped environment?"
- "I need a change log. Not CloudTrail — I need something a human can read."
- "If we fail this audit, we lose the Acme contract. That's $2.3 million ARR."
- "I don't care HOW you fix the drift. I care that it's fixed, documented, and won't happen again."
**THINKS:**
- "I'm building compliance evidence on a foundation of sand. The Terraform code says one thing. I have no idea if the cloud matches."
- "The engineering team says 'everything is in code.' I've been in this industry long enough to know that's never 100% true."
- "If an auditor asks me to demonstrate real-time drift detection and I show them a cron job... we're done."
- "I'm spending 60% of my time on evidence collection that should be automated."
- "One undetected IAM policy change could be the difference between 'compliant' and 'breach notification.'"
**DOES:**
- Maintains a 200-row spreadsheet mapping compliance controls to infrastructure resources — updated manually, always slightly out of date
- Requests quarterly "drift audits" from the infra team — which take 2 weeks and produce a PDF that's outdated by the time it's delivered
- Reviews CloudTrail logs for unauthorized changes — drowning in noise, looking for signal
- Writes compliance narratives that say "all infrastructure changes are made through version-controlled IaC" while knowing there are exceptions
- Schedules monthly meetings with engineering to review "infrastructure hygiene" — attendance drops every month
**FEELS:**
- **Exposed** — she knows there are gaps between declared and actual state, but can't quantify them
- **Dependent** — she relies entirely on the infra team to tell her whether things have drifted, and they're too busy to check regularly
- **Anxious before audits** — the two weeks before an audit are a scramble to reconcile state, fix drift, and generate evidence
- **Frustrated by tooling** — AWS Config gives her compliance rules but not IaC drift. Prisma Cloud gives her security posture but not Terraform state comparison. Nothing connects the dots.
- **Professionally vulnerable** — if a breach happens because of undetected drift, it's her name on the incident report
#### Pain Points
1. **No continuous compliance evidence** — compliance is proven in snapshots, not streams. Between audits, she's flying blind.
2. **IaC drift is invisible to security tools** — Prisma Cloud sees the current state. It doesn't know what the INTENDED state is. Drift is the gap between intent and reality, and no security tool measures it.
3. **Manual evidence collection** — generating audit evidence requires coordinating with 3 engineering teams, running plans, collecting outputs, formatting reports. It's a part-time job.
4. **Change attribution is archaeological** — figuring out who changed what, when, and whether it was approved requires cross-referencing CloudTrail, git history, Jira tickets, and Slack messages.
5. **Compliance theater** — she suspects some "evidence" is aspirational rather than factual. The narrative says "no manual changes" but she can't verify it.
#### Current Workarounds
- **Quarterly manual audits** — engineering runs `terraform plan` across all stacks, documents drift, fixes it, re-runs. Takes 2 weeks. Results are stale immediately.
- **AWS Config rules** — catches some configuration drift but doesn't compare against Terraform intent. Generates hundreds of findings, most irrelevant.
- **Honor system** — relies on engineers to report console changes. They don't. Not maliciously — they just forget, or they think "I'll fix it in code later."
- **Pre-audit fire drill** — 2 weeks before every audit, the entire infra team drops everything to reconcile state. Productivity crater.
#### Jobs To Be Done (JTBD)
1. **When** an auditor asks for evidence that infrastructure matches declared state, **I want to** generate a real-time compliance report in one click, **so I can** demonstrate continuous compliance instead of point-in-time snapshots.
2. **When** a change is made outside of IaC, **I want to** be alerted immediately with full attribution (who, what, when, from where), **so I can** assess the compliance impact and initiate remediation before it becomes a finding.
3. **When** preparing for SOC 2 / HIPAA audits, **I want to** have an automatically maintained audit trail of all drift events and their resolutions, **so I can** eliminate the 2-week pre-audit scramble.
4. **When** evaluating our security posture, **I want to** see a real-time "drift score" across all environments, **so I can** quantify infrastructure hygiene and track improvement over time.
#### Day-in-the-Life Scenario: "The Audit Scramble"
*It's Monday morning, 14 days before the SOC 2 Type II audit. Diana opens her laptop to 3 Slack messages from the auditor: "Please provide evidence for Control CC6.1 — logical access controls match approved configurations."*
*She messages the infra team lead: "I need a full drift report across all production stacks by Wednesday." The response: "We're in the middle of a release. Can it wait until next week?" It cannot wait until next week.*
*She compromises: "Can you at least run plans on the PCI-scoped stacks?" Two days later, she gets a Confluence page with `terraform plan` output pasted in. It shows 7 drifted resources. Three are tagged "known — will fix later." Two are "not sure what this is." Two are "probably fine."*
*"Probably fine" is not a compliance posture.*
*She spends Thursday manually cross-referencing the drifted resources with CloudTrail to build an attribution timeline. The CloudTrail console search is painfully slow. She exports to CSV. Opens it in Excel. 47,000 rows for the last 30 days. She filters by resource ID. Finds the change. It was made by a service role, not a human. Which service? She doesn't know. More digging.*
*By Friday, she has a draft evidence document that says "7 drift events detected, 5 remediated, 2 accepted with justification." The justifications are thin. She knows the auditor will push back. She rewrites them three times.*
*The audit passes. Barely. The auditor notes "opportunity for improvement in continuous configuration monitoring." Diana knows that means "fix this before next year or you'll get a finding."*
*She adds "evaluate drift detection tooling" to her Q2 OKRs. It's February. Q2 starts in April. The drift continues.*
---
### Persona 3: The DevOps Team Lead — "Marcus"
**Name:** Marcus Chen
**Title:** DevOps Team Lead
**Company:** E-commerce platform, 400 employees, multi-region AWS deployment
**Experience:** 12 years in ops/DevOps, managing a team of 4 infra engineers
**Tools:** Terraform, AWS (3 accounts), GitHub, Slack, Linear, Datadog, PagerDuty
**Stacks managed (team total):** 67 across dev, staging, prod, and disaster recovery
#### Empathy Map
**SAYS:**
- "How much time did we spend on drift remediation this sprint? I need to report that to leadership."
- "I need visibility across all 67 stacks. Not 'run plan on each one.' A dashboard. One screen."
- "We're spending 30% of our time firefighting things that shouldn't have changed. That's not engineering, that's janitorial work."
- "I can't hire another engineer just to babysit state files. The budget isn't there."
- "If I could show leadership a metric — 'drift incidents per week' trending down — I could justify the tooling investment."
**THINKS:**
- "My team is burning out. Ravi hasn't taken a real vacation in 8 months because he's the only one who knows which stacks are 'safe' to apply."
- "I have 67 stacks and zero visibility into which ones have drifted. I'm managing by hope."
- "If we had drift detection, I could turn reactive firefighting into proactive maintenance. That's the difference between a team that ships and a team that survives."
- "Spacelift would solve this but it's $6,000/year and requires migrating our entire workflow. I can't justify that for drift detection alone."
- "One of my engineers is going to quit if I don't reduce the on-call burden. The 2am pages for drift-related issues are killing morale."
**DOES:**
- Runs weekly "stack health" meetings where each engineer reports on their stacks — this is verbal, tribal knowledge transfer disguised as a meeting
- Maintains a Linear board of "known drift" issues that never gets prioritized over feature work
- Advocates to leadership for drift detection tooling — gets told "can't you just write a script?"
- Manually assigns drift remediation during "quiet sprints" — which don't exist
- Reviews every `terraform apply` in prod personally because he doesn't trust the state
#### Pain Points
1. **No aggregate visibility** — he can't see drift across 67 stacks without asking each engineer to run plans individually. There's no "single pane of glass."
2. **Team capacity drain** — drift remediation is unplanned work that displaces planned work. He can't forecast sprint velocity because drift is unpredictable.
3. **Knowledge silos** — each engineer knows their stacks. When someone is sick or on vacation, their stacks are black boxes. Drift knowledge is tribal.
4. **Can't quantify the problem** — leadership asks "how bad is drift?" and he can't answer with data. He has anecdotes and gut feelings. That doesn't unlock budget.
5. **Tool sprawl fatigue** — his team already uses 8+ tools. Adding another platform (Spacelift, env0) means migration, training, and ongoing maintenance. He wants a tool, not a platform.
6. **On-call burnout** — drift-related incidents inflate on-call burden. His team is on a 4-person rotation. One more quit and it's unsustainable.
#### Current Workarounds
- **Weekly manual checks** — each engineer runs `terraform plan` on their stacks Monday morning. Results shared in Slack. Nobody reads each other's results.
- **"Drift budget"** — allocates 20% of each sprint to "infrastructure hygiene." In practice, it's 5% because feature work always wins.
- **Tribal knowledge** — Marcus keeps a mental model of which stacks are "high risk" for drift. He assigns on-call accordingly. This doesn't scale.
- **Post-incident drift audits** — after every drift-related incident, the team does a full audit of related stacks. Reactive, not proactive.
#### Jobs To Be Done (JTBD)
1. **When** planning sprint capacity, **I want to** see a real-time count of drift events across all stacks, **so I can** allocate remediation time accurately instead of guessing.
2. **When** reporting to engineering leadership, **I want to** show drift metrics trending over time (drift rate, mean time to remediation, stacks affected), **so I can** justify tooling investment with data instead of anecdotes.
3. **When** an engineer is on vacation, **I want to** have automated drift monitoring on their stacks, **so I can** eliminate the "bus factor" of tribal knowledge.
4. **When** evaluating new tools, **I want to** adopt something that integrates with our existing workflow (Terraform + GitHub + Slack) without requiring a platform migration, **so I can** get value in days, not months.
5. **When** a drift event occurs, **I want to** route it to the right engineer automatically based on stack ownership, **so I can** reduce my role as a human router of drift information.
#### Day-in-the-Life Scenario: "The Visibility Gap"
*It's Wednesday standup. Marcus asks the team: "Any drift issues this week?"*
*Ravi: "I found 3 drifted resources in prod-api yesterday. Fixed two, the third is complicated — someone changed the RDS parameter group and I'm not sure if reverting will restart the database."*
*Priya: "I haven't checked my stacks yet this week. I've been heads-down on the Kubernetes migration."*
*Jordan: "My stacks are clean... I think. I ran plan on Monday. But that was before the incident on Tuesday where someone opened a port via console."*
*Sam: "I'm still fixing drift from last month's audit prep. I have 4 stacks with known drift that I haven't gotten to."*
*Marcus writes in his notebook: "3 confirmed drifts (Ravi), unknown (Priya), possibly new drift (Jordan), 4 known unresolved (Sam)." He has 67 stacks. He has visibility into maybe 15 of them. The other 52 are Schrödinger's infrastructure — simultaneously drifted and not drifted until someone opens the box.*
*After standup, his manager pings him: "The VP of Engineering wants to know our 'infrastructure reliability score' for the board deck. Can you get me a number by Friday?"*
*Marcus stares at the message. He has no number. He has a notebook with question marks. He opens a spreadsheet and starts making one up — educated guesses based on the last time each stack was checked. He knows it's fiction. The VP will present it as fact.*
*He adds "evaluate drift detection tools" to his personal to-do list. It's been there for 5 months. It keeps getting bumped by the next fire.*
---
## Phase 2: DEFINE 🔍
*Now we take all that raw empathy — the 2am dread, the audit scramble, the standup question marks — and we distill it into sharp, actionable problem statements. This is where the jazz improvisation becomes a composed melody. We're not solving everything. We're finding the ONE note that resonates across all three personas.*
---
### Point-of-View (POV) Statements
**POV 1 — Ravi (Infrastructure Engineer):**
Ravi, a senior infrastructure engineer managing 23 Terraform stacks, needs a way to continuously know when his infrastructure has diverged from its declared state because the gap between code and reality is an invisible minefield that turns every `terraform apply` into an act of faith and every 2am incident into a forensic investigation against phantom configurations.
**POV 2 — Diana (Security/Compliance Lead):**
Diana, a head of security responsible for SOC 2 compliance across 4 AWS accounts, needs a way to continuously prove that infrastructure matches approved configurations because her current evidence is built on quarterly snapshots and engineer self-reporting — a house of cards that one undetected IAM policy change could collapse, taking a $2.3M customer contract with it.
**POV 3 — Marcus (DevOps Team Lead):**
Marcus, a DevOps lead managing 67 stacks through a team of 4 engineers, needs a way to see aggregate drift status across all stacks in real time because without it, he's managing infrastructure health through tribal knowledge and standup anecdotes — a system that breaks the moment someone takes a vacation, and that produces fiction when leadership asks for a number.
**POV 4 — The Composite (The Organization):**
Engineering organizations that practice Infrastructure as Code need a way to close the loop between declared state and actual state continuously because the current model — periodic manual checks, tribal knowledge, and reactive firefighting — erodes trust in IaC itself, burns out the engineers who maintain it, and creates compliance gaps that compound silently until they become audit failures or security incidents.
---
### Key Insights
**Insight 1: Drift is a trust problem, not a technical problem.**
Every undetected drift event erodes trust — trust in the state file, trust in IaC as a practice, trust in teammates not to make console changes. When trust erodes far enough, teams abandon IaC discipline entirely. "Why bother with Terraform if reality doesn't match anyway?" dd0c/drift doesn't just detect configuration changes. It restores faith in the system.
**Insight 2: The absence of data is the biggest pain point.**
Ravi can't quantify his anxiety. Diana can't prove her compliance. Marcus can't report his team's infrastructure health. All three personas suffer from the same root cause: there is no continuous, automated measurement of the gap between intent and reality. The first product that provides this number — a simple drift score — wins all three personas simultaneously.
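To make Insight 2 concrete, here is one illustrative way a drift score could be computed — the percentage of stacks with zero drifted resources. The formula and the stack names are assumptions for the sketch; choosing the real metric (resource-weighted? severity-weighted?) is a product decision, not settled here.

```python
def drift_score(drifted_counts: dict) -> float:
    """Illustrative drift score: percent of stacks with zero drifted resources.

    `drifted_counts` maps stack name -> number of drifted resources.
    (Hypothetical formula; the product's actual metric is an open design question.)
    """
    if not drifted_counts:
        return 100.0
    clean = sum(1 for count in drifted_counts.values() if count == 0)
    return round(100.0 * clean / len(drifted_counts), 1)

# Marcus's Wednesday standup, as numbers instead of notebook question marks:
statuses = {"prod-api": 3, "prod-networking": 0, "staging-api": 0, "dr-east": 4}
```

A single number like `drift_score(statuses)` is what turns "managing by hope" into a trend line Marcus can put in a board deck.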
**Insight 3: Remediation without context is dangerous.**
"Just revert it" sounds simple until you realize the drift might be a 3am hotfix that's keeping production alive. The product must present drift with CONTEXT — who changed it, when, what else depends on it, and what happens if you revert. One-click remediation is the killer feature, but one-click destruction is the killer bug.
**Insight 4: The tool/platform divide is real and exploitable.**
Every competitor (Spacelift, env0, Terraform Cloud) bundles drift detection inside a platform that requires workflow migration. Our personas don't want to change how they work. They want to ADD visibility to how they already work. The market gap isn't "better drift detection." It's "drift detection that doesn't require you to change anything else."
**Insight 5: Three buyers, one product, three stories.**
- Ravi buys with his credit card because it eliminates his 2am dread. (Bottom-up, individual pain.)
- Diana approves the budget because it generates audit evidence. (Middle-out, compliance justification.)
- Marcus champions it to leadership because it produces metrics. (Top-down, organizational visibility.)
The product is the same. The value proposition changes per persona. This is the dd0c/drift GTM unlock.
**Insight 6: Slack is the control plane, not the dashboard.**
For Ravi, the Slack alert with action buttons IS the product 80% of the time. He doesn't want to open another dashboard. He wants to see "Security group sg-abc123 drifted. [Revert] [Accept] [Snooze]" in the channel he's already in. The web dashboard exists for Diana and Marcus. Ravi lives in Slack.
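The "[Revert] [Accept] [Snooze]" alert above maps naturally onto Slack's Block Kit (`section` and `actions` blocks with `button` elements are real Slack API concepts); the action IDs, wording, and function shape here are illustrative assumptions, not a spec.

```python
def drift_alert_blocks(resource_id: str, change_summary: str) -> list:
    """Build a hypothetical Slack Block Kit payload for a drift alert.

    `section` + `actions` blocks and `button` elements are real Block Kit
    primitives; the action_ids and copy are placeholders for this sketch.
    """
    return [
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f":warning: *{resource_id}* drifted\n{change_summary}",
            },
        },
        {
            "type": "actions",
            "elements": [
                {"type": "button", "action_id": "revert",
                 "text": {"type": "plain_text", "text": "Revert"},
                 "style": "danger"},
                {"type": "button", "action_id": "accept",
                 "text": {"type": "plain_text", "text": "Accept"}},
                {"type": "button", "action_id": "snooze",
                 "text": {"type": "plain_text", "text": "Snooze"}},
            ],
        },
    ]

# The whole product, for Ravi, 80% of the time:
blocks = drift_alert_blocks("sg-abc123", "Port 8080 restricted to unknown CIDR")
```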
---
### Core Tension: Automation vs. Control 🎸
*Here's the jazz tension — the dissonance that makes the music interesting:*
**The Automation Pull:**
- Engineers want drift to be fixed automatically. "If someone opens a port to 0.0.0.0/0, just close it. Don't ask me. I'm sleeping."
- Compliance wants continuous enforcement. "Infrastructure should ALWAYS match declared state. Period."
- Leadership wants zero drift. "Why do we have drift at all? Automate it away."
**The Control Pull:**
- Engineers fear auto-remediation will revert a hotfix that's keeping prod alive. "What if the drift is INTENTIONAL?"
- Security wants approval workflows. "We can't auto-revert IAM changes without understanding the blast radius."
- Operations wants change windows. "Don't auto-remediate during peak traffic. That's a recipe for a different kind of outage."
**The Resolution:**
The product must offer a SPECTRUM of automation, not a binary switch. Per-resource-type policies:
- **Auto-revert:** Security groups opened to 0.0.0.0/0. Always. No questions.
- **Alert + one-click:** IAM policy changes. Show me, let me decide, make it easy.
- **Digest only:** Tag drift. Tell me in the morning summary. I'll get to it.
- **Ignore:** Auto-scaling instance count changes. That's not drift, that's the system working.
This spectrum IS the product's sophistication. It's what separates dd0c/drift from a cron job running `terraform plan`.
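The four-tier spectrum above can be sketched as a tiny policy function. Everything here is a hypothetical illustration — the resource-type names, the `change` dict shape, and the rule ordering are assumptions chosen to mirror the four examples, not the product's real policy engine.

```python
def policy_action(resource_type: str, change: dict) -> str:
    """Map a drifted resource to one of the four response tiers above.

    `change` is a hypothetical dict describing the drifted attribute,
    e.g. {"attribute": "ingress", "cidr": "0.0.0.0/0"}.
    """
    # Auto-revert: security groups opened to the world. Always. No questions.
    if resource_type == "aws_security_group" and change.get("cidr") == "0.0.0.0/0":
        return "auto_revert"
    # Alert + one-click: IAM changes — show me, let me decide.
    if resource_type.startswith("aws_iam_"):
        return "alert"
    # Digest only: tag drift waits for the morning summary.
    if change.get("attribute") == "tags":
        return "digest"
    # Ignore: autoscaling instance counts are the system working as intended.
    if change.get("attribute") == "desired_capacity":
        return "ignore"
    # Safe default: when in doubt, tell a human.
    return "alert"
```

The binary alternative — auto-remediate everything or nothing — is exactly the cron job this spectrum is meant to replace.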
---
### How Might We (HMW) Questions
**Detection:**
1. **HMW** detect drift in real-time without requiring engineers to manually run `terraform plan`?
2. **HMW** distinguish between intentional changes (hotfixes, emergency responses) and unintentional drift (mistakes, forgotten console changes)?
3. **HMW** detect drift across 67+ stacks without requiring individual stack-by-stack checks?
4. **HMW** attribute drift to a specific person, time, and action without requiring engineers to become CloudTrail forensics experts?
**Remediation:**
5. **HMW** make drift remediation a 10-second action instead of a 30-minute investigation?
6. **HMW** give engineers confidence that reverting drift won't break something else?
7. **HMW** allow teams to define per-resource automation policies (auto-revert vs. alert vs. ignore) without complex configuration?
8. **HMW** offer "accept the drift" as a first-class workflow — updating code to match reality when the change was intentional?
**Visibility & Reporting:**
9. **HMW** give team leads a single number ("drift score") that represents infrastructure health across all stacks?
10. **HMW** generate SOC 2 / HIPAA compliance evidence automatically from drift detection data?
11. **HMW** show drift trends over time so teams can measure whether their IaC discipline is improving or degrading?
12. **HMW** route drift alerts to the right engineer automatically based on stack ownership?
**Adoption & Trust:**
13. **HMW** get an engineer from zero to first drift alert in under 60 seconds?
14. **HMW** build trust with security-conscious teams who won't give a SaaS tool IAM access to their AWS accounts?
15. **HMW** make drift detection feel like a natural extension of existing workflows (Terraform + GitHub + Slack) rather than a new tool to learn?
16. **HMW** make the free tier valuable enough to create habit and word-of-mouth without giving away the business?
---