Why this guide exists
This is the playbook we wish someone had handed us the first time we built this capability. It assumes you know your domain — what we add is structure, sequencing, and the unglamorous details that decide whether a program lands.
How to read this
You can read it linearly. You can also use it as a reference — each section is meant to stand alone. Skip what you've already mastered.
1. The setup
Define the goal in writing. Pick one target metric and get sign-off from the person whose bonus depends on that metric. Document the guardrails: margin floors, parity rules, brand constraints.
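Guardrails are easier to enforce when they live in code as well as in a document. Here is a minimal sketch of what that could look like. The names (Guardrails, margin_floor, max_parity_gap, violations) and the thresholds are our own illustrative assumptions, not a prescribed schema, and brand constraints are left out because they rarely reduce to a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    # Hypothetical fields; adapt them to whatever your sign-off document says.
    margin_floor: float     # minimum gross margin, as a fraction of price
    max_parity_gap: float   # max relative gap versus a reference channel price


def violations(price: float, unit_cost: float, reference_price: float,
               rails: Guardrails) -> list[str]:
    """Return the guardrails a proposed price would break (empty list means OK)."""
    problems = []
    margin = (price - unit_cost) / price
    if margin < rails.margin_floor:
        problems.append("margin_floor")
    if abs(price - reference_price) / reference_price > rails.max_parity_gap:
        problems.append("parity")
    return problems


# Example thresholds are assumptions for illustration only.
rails = Guardrails(margin_floor=0.20, max_parity_gap=0.05)
print(violations(price=10.0, unit_cost=9.0, reference_price=10.0, rails=rails))
# prints ['margin_floor'] because the margin is 10%, below the 20% floor
```

Running a check like this before a variant ships turns the guardrails from a promise into a gate.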
2. The first experiment
Start with the smallest variant that could plausibly move the needle. Pre-register the metric. Pick a cohort with enough traffic to converge in days, not weeks.
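Whether an experiment converges in days or weeks is something you can estimate before launch. The sketch below uses the standard normal-approximation sample size for comparing two conversion rates. It assumes a conversion-rate metric, a two-sided test, and an even traffic split; the function names, the 5% significance level, and the 80% power default are our assumptions, not requirements.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per arm to detect a relative lift over a baseline rate."""
    p1 = baseline
    p2 = baseline * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_converge(n_per_arm: int, daily_visitors: int, arms: int = 2) -> int:
    """Days of cohort traffic needed to fill every arm."""
    return math.ceil(n_per_arm * arms / daily_visitors)


# Illustrative inputs only: 5% baseline conversion, hoping for a 20% relative lift,
# on a cohort that sees 3,000 visitors a day.
n = sample_size_per_arm(baseline=0.05, lift=0.20)
print(n, "visitors per arm,", days_to_converge(n, daily_visitors=3000), "days")
```

If the answer comes back in weeks, either widen the cohort or test a bolder variant; a smaller detectable lift always costs more traffic.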
3. The compounding loop
The first experiment matters less than the rate at which you can run the next one. Build for velocity. Most of the lift comes from the cumulative effect of forty small wins, not from finding one big lever.
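The arithmetic behind the forty small wins is worth seeing once. In this sketch the 1% lift per win is purely an assumption for illustration, and it treats the wins as independent and multiplicative, which real programs only approximate.

```python
def compound_lift(per_win: float, wins: int) -> float:
    """Total multiplier from a number of equal, multiplicative wins."""
    return (1 + per_win) ** wins


# Assumed for illustration: each win is worth 1%.
total = compound_lift(per_win=0.01, wins=40)
print(f"{total - 1:.1%}")  # prints 48.9%
```

Under those assumptions, forty 1% wins add up to roughly a 49% lift, which is why the speed of the loop matters more than the size of any single result.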
4. Governance you'll actually use
Version-control every variant. Sunset old experiments aggressively. Keep a running log of what worked, what didn't, and what you'd test differently. Future-you will thank present-you.
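A running log does not need tooling to start with. Here is a minimal sketch of a log entry and an aggressive sunset rule. The record fields, the outcome labels, and the 30-day cutoff are hypothetical choices of ours; use whatever your team will actually fill in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExperimentRecord:
    # Hypothetical fields for illustration.
    variant_id: str            # should match the tag or commit of the variant
    started: date
    outcome: str = "running"   # "running", "won", "lost", or "sunset"
    notes: str = ""            # what you would test differently next time


def sunset_stale(log: list[ExperimentRecord], today: date,
                 max_age_days: int = 30) -> list[str]:
    """Mark still-running experiments older than max_age_days as sunset."""
    retired = []
    for record in log:
        age_days = (today - record.started).days
        if record.outcome == "running" and age_days > max_age_days:
            record.outcome = "sunset"
            retired.append(record.variant_id)
    return retired


log = [
    ExperimentRecord("banner-v2", date(2024, 1, 2)),
    ExperimentRecord("bundle-v1", date(2024, 2, 20)),
]
print(sunset_stale(log, today=date(2024, 3, 1)))  # prints ['banner-v2']
```

Keeping the log in the same repository as the variants means the history and the code are versioned together.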
5. The metrics review
A weekly merchandising standup is the single highest-leverage operational change most teams can make. Bring the data. Argue about the numbers, not about opinions.