Control Chart & KPI Diagnostic Guide
How to separate signal from noise, catch problems early, and diagnose root causes correctly instead of guessing.
The Core Problem
ROAS drops from 50% to 44%. Is this a crisis or just noise? Without control charts, you're guessing. Teams either panic over random variance or miss real problems until they've cost thousands. Control charts tell you exactly when to act.
What is a Control Chart?
A control chart separates normal day-to-day variance (noise) from actual structural changes (signal). You calculate a baseline, set limits at ±2 standard deviations around the baseline mean, and watch for points that break through.
Points inside the limits = normal. Points outside = investigate immediately. This removes gut-feel decisions and gives you an objective trigger for action.
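As a minimal sketch of that calculation, assuming your daily metric values sit in a plain Python list (the function names and sample numbers are illustrative, not from a real dataset):

```python
from statistics import mean, stdev


def control_limits(baseline, k=2.0):
    """Return (lower, center, upper): baseline mean +/- k sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, lower, upper):
    """Return the indices of points that fall outside the limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily ROAS (%) from a stable baseline period.
baseline_roas = [50, 51, 49, 52, 48, 50, 51, 49, 50, 50]
lower, center, upper = control_limits(baseline_roas)

# Hypothetical new observations checked against those limits.
new_roas = [49, 51, 44, 50]
flagged = out_of_control(new_roas, lower, upper)
print(f"limits: {lower:.2f} to {upper:.2f}, flagged indices: {flagged}")
```

With this sample data, only the 44% day falls outside the limits; the 49% and 51% days are treated as noise.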
Leading vs. Lagging Indicators
You need both types of metrics. Leading indicators catch problems fast. Lagging indicators confirm severity.
Leading Indicators
D1 Retention or D1 ROAS
If a bad update ships, these break first. Your smoke detector.
Lagging Indicators
D7 ROAS (source of truth)
Confirms profitability impact. The data point for "today" is actually the cohort from 7 days ago.
⚠️ Critical Timing Note
A product change today won't show in D7 ROAS until next week. If you only watch D7, you're reacting to problems that started 7 days ago. Always monitor D1 metrics daily.
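The lag is simple date arithmetic, sketched here with hypothetical helper names and an illustrative date:

```python
from datetime import date, timedelta


def d7_cohort_date(report_date, maturity_days=7):
    """Install date of the cohort whose D7 ROAS is reported on report_date."""
    return report_date - timedelta(days=maturity_days)


def first_d7_visibility(change_date, maturity_days=7):
    """Earliest report date on which a cohort installed on change_date appears in D7 ROAS."""
    return change_date + timedelta(days=maturity_days)


# Illustrative date only: a change shipped on this day...
change = date(2024, 8, 30)
# ...first shows up in the D7 ROAS chart a week later.
visible_on = first_d7_visibility(change)
print(change, "->", visible_on)
```

D1 metrics for the same cohort are available the day after install, which is why they belong on the daily chart.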
The KPI Decomposition Tree
When ROAS drops, you need to find which input changed. Never act on a top-level metric without decomposing it first.
ROAS = Revenue / Spend. If ROAS drops, either revenue went down, spend went up, or both. Decompose first, then drill into the branch that explains the change.
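One way to put numbers on the decomposition is a log split. Because ROAS = Revenue / Spend, the log of the ROAS ratio equals the log of the revenue ratio minus the log of the spend ratio, so each input gets a share of the total change. This is an assumed attribution method, not the only one, and the figures are hypothetical:

```python
import math


def decompose_roas_change(rev_before, spend_before, rev_after, spend_after):
    """Split a ROAS change into revenue and spend shares using log ratios."""
    roas_before = rev_before / spend_before
    roas_after = rev_after / spend_after
    total = math.log(roas_after / roas_before)
    if total == 0:
        raise ValueError("ROAS did not change; nothing to decompose")
    revenue_effect = math.log(rev_after / rev_before)
    spend_effect = -math.log(spend_after / spend_before)  # higher spend lowers ROAS
    return {
        "roas_before": roas_before,
        "roas_after": roas_after,
        "revenue_share": revenue_effect / total,
        "spend_share": spend_effect / total,
    }


# Hypothetical week-over-week totals: revenue down 24%, spend flat.
result = decompose_roas_change(5000, 10000, 3800, 10000)
print(f"ROAS {result['roas_before']:.0%} -> {result['roas_after']:.0%}")
print(f"revenue share: {result['revenue_share']:.0%}, spend share: {result['spend_share']:.0%}")
```

The two shares always sum to 1, so a revenue share near 1 tells you to drill into the revenue branch and leave spend alone.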
The 7-Step Diagnostic
When a control chart signals a problem, follow this exact sequence. Don't skip steps.
Confirm the Break
Is it real? Check for data gaps, one-time events, or day-of-week effects. If the signal persists for 2-3 days, proceed.
Timestamp It
Identify the exact date the break began. "D1 Retention dropped from 42% to 34% starting Aug 30."
Decompose
Calculate which component drove the change. Revenue down 24%, spend stable? Revenue explains 100% of the decline.
Correlate with Events
Check for product releases, channel mix shifts, or creative changes near the break date. A 0-day lag between an event and the break is the strongest signal.
Drill Down
Follow the decomposition to its end. Revenue down → Payers down → Conversion down → Why?
Form Hypothesis
Good: "v1.8.4 physics changes reduced D7 conversion by 15%." Bad: "ROAS is down, let's try another network."
Remediate Root Cause
Product issue → escalate to product. Creative fatigue → refresh assets. Market pressure → adjust bids.
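The first steps of the sequence (confirm, timestamp, correlate) can be partly automated. The sketch below assumes you already have control limits and a version log; the function names, limits, dates, and version labels are all made up for illustration:

```python
from datetime import date, timedelta


def find_break_start(series, lower, upper, min_days=3):
    """Return the first date of a run of at least min_days consecutive
    out-of-limit points, or None. series is (date, value) pairs sorted by date."""
    run_start = None
    run_len = 0
    for day, value in series:
        if value < lower or value > upper:
            if run_len == 0:
                run_start = day
            run_len += 1
            if run_len >= min_days:
                return run_start
        else:
            run_len = 0
    return None


def nearest_release(break_date, releases):
    """Return (version, lag_days) for the latest release on or before break_date."""
    prior = [(v, d) for v, d in releases.items() if d <= break_date]
    if not prior:
        return None
    version, released = max(prior, key=lambda item: item[1])
    return version, (break_date - released).days


# Hypothetical D1 retention (%) with assumed limits of 39 to 45.
start = date(2024, 3, 1)
values = [42, 41, 43, 34, 35, 33]
series = [(start + timedelta(days=i), v) for i, v in enumerate(values)]
break_start = find_break_start(series, lower=39, upper=45)

# Hypothetical version log: version label -> release date.
releases = {"1.0.0": date(2024, 2, 10), "1.1.0": date(2024, 3, 4)}
match = nearest_release(break_start, releases)
print(break_start, match)
```

Here the break is timestamped to the fourth day, and the nearest release has a 0-day lag, so the product branch is where to look first. Data gaps and one-time events still need a human check before you trust the result.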
Case Study: The Wrong Diagnosis
Slam Clash (Aug-Nov 2024)
The smoking gun: Zero-day lag between release and break. Channel didn't change. Product broke.
What went wrong
Without decomposing or checking version correlation, the team blamed the channel. The secondary network showed the same bad ROAS, because the product was broken, not the channel. Misdiagnosis delayed the fix by months.
Quick Reference
DO
- Monitor D1 metrics daily
- Decompose before acting
- Timestamp breaks precisely
- Check product releases first
- Use 7-day rolling averages
- Wait 2-3 days to confirm signal
DON'T
- React to single-day variance
- Change channels to fix product
- Skip the decomposition step
- Wait for D7 to act on D1 signals
- Assume channel is the problem
- Ignore version release dates
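The 7-day rolling average from the DO list smooths out day-of-week effects, since every window contains each weekday exactly once. A minimal sketch, with a hypothetical function name and sample values:

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


# Hypothetical daily ROAS (%) with a repeating weekly pattern.
daily_roas = [48, 50, 52, 51, 49, 47, 53, 48, 50, 52]
smoothed = rolling_mean(daily_roas)
print(smoothed)
```

The raw values swing between 47 and 53, but the smoothed series stays flat, so a real shift stands out more clearly against it.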
Getting Started
- Set up two control charts: D1 Retention (leading) and D7 ROAS (lagging)
- Establish your baseline: Use 4-8 weeks of stable data to calculate mean and ±2σ limits
- Create a version log: Track all product releases with dates
- Check daily: Review D1 metrics every morning. If outside limits, start the diagnostic.
- Document findings: Build institutional knowledge of what breaks what
The time you invest in this system pays off massively. You'll catch problems in 24 hours instead of weeks, and fix the right thing instead of chasing ghosts.
