Product · 7 min read

Three reasons your merchandising experiments don't converge.

If your A/B tests run for weeks and never reach significance, the problem is rarely traffic. Here are the three most common patterns we see.

Naomi Park · CPO · March 31, 2026

We've audited a lot of merchandising experimentation programs. The same three patterns explain almost every 'we ran it for six weeks and it never converged' story.

First: contaminated controls. The control variant is rarely a true control — it's last quarter's variant, plus whatever the merchandising team has nudged in flight. If you can't draw a clear line between control and treatment at any single moment in the experiment window, you don't have an experiment.
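To make that concrete, here is a minimal sketch of what clean variant management can look like: both arms pinned to immutable config versions at launch, and each traveler assigned deterministically by hash so neither arm drifts mid-run. The names and version-ID scheme are illustrative, not any particular platform's API.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Experiment:
    name: str
    control_version: str    # pinned config snapshot ID, frozen at launch
    treatment_version: str  # same: pinned, never edited in flight
    treatment_share: float  # fraction of traffic in the treatment arm

def assign(exp: Experiment, traveler_id: str) -> str:
    """Deterministic assignment: same traveler, same arm, for the whole run."""
    digest = hashlib.sha256(f"{exp.name}:{traveler_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return exp.treatment_version if bucket < exp.treatment_share else exp.control_version

exp = Experiment("seat-upsell-q2",
                 control_version="cfg-2026-03-01-a1b2c3",
                 treatment_version="cfg-2026-03-01-d4e5f6",
                 treatment_share=0.5)
print(assign(exp, "traveler-42"))  # stable across the experiment window
```

The point of the frozen dataclass and the hash is that nobody can nudge the control without creating a visibly new experiment.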

Second: target-metric misalignment. Teams set up experiments to lift attach rate but evaluate them on conversion, or design for conversion and evaluate on per-passenger revenue. The right metric is the one your bonus depends on. Pick that one, declare it before the experiment starts, and stop renegotiating it mid-run.
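One cheap way to enforce pre-registration is to make the readout path refuse anything but the declared metric. A sketch, with hypothetical names and an assumed attach-rate metric:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Registration:
    experiment: str
    primary_metric: str  # e.g. "attach_rate", the one the bonus depends on
    registered_at: str   # timestamp recorded before the first exposure

def evaluate(reg: Registration, metric: str, results: dict[str, float]) -> float:
    """Score the experiment, but only on the metric declared at registration."""
    if metric != reg.primary_metric:
        raise ValueError(
            f"{reg.experiment} was registered on {reg.primary_metric!r}; "
            f"refusing mid-run renegotiation to {metric!r}"
        )
    return results[metric]

reg = Registration("seat-upsell-q2", primary_metric="attach_rate",
                   registered_at="2026-03-01T00:00:00Z")
print(evaluate(reg, "attach_rate", {"attach_rate": 0.043}))  # fine
# evaluate(reg, "conversion", {...})  # raises: metric was not pre-registered
```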

Third: cohort dilution. The experiment runs across all traffic, but the treatment was designed for a specific cohort (leisure family travelers, say). Across all traffic the lift is diluted and never reaches significance; across the target cohort, it would have converged in days.
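The arithmetic behind this is brutal: required sample size for a two-proportion test scales roughly with 1/lift², so a +2pp lift concentrated in 20% of traffic shows up as +0.4pp across all traffic and needs on the order of 25x the sample. A back-of-envelope sketch using the standard pooled approximation (the baseline attach rate and cohort numbers here are illustrative):

```python
from statistics import NormalDist

def n_per_arm(p_control: float, lift: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size to detect p_control -> p_control + lift."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = p_control + lift / 2  # pooled rate under the alternative
    return round(((z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar)) / lift ** 2)

cohort_lift = 0.02    # +2pp attach lift among leisure family travelers
cohort_share = 0.20   # that cohort is 20% of total traffic
diluted_lift = cohort_lift * cohort_share  # what an all-traffic test must detect

print(n_per_arm(0.10, cohort_lift))   # ~3,800 per arm for the targeted test
print(n_per_arm(0.10, diluted_lift))  # ~90,000 per arm across all traffic, ~23x more
```

At typical route-level traffic, that 23x is exactly the difference between converging in days and never converging at all.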

The fix in each case is unglamorous: clean variant management, pre-registered metrics, and disciplined cohort targeting. Tooling matters less than discipline. But good tooling (version-controlled experiments, pre-registered metric definitions, and cohort targeting built into the merchandising layer) makes discipline cheaper.
