Product · 7 min read

Three reasons your merchandising experiments don't converge.

If your A/B tests run for weeks and never reach significance, the problem is rarely traffic. Here are the three most common patterns we see.

Naomi Park · CPO · March 31, 2026

We've audited a lot of merchandising experimentation programs. The same three patterns explain almost every 'we ran it for six weeks and it never converged' story.

First: contaminated controls. The control variant is rarely a true control — it's last quarter's variant, plus whatever the merchandising team has nudged in flight. If you can't draw a clear line between control and treatment at any single moment in the experiment window, you don't have an experiment.
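To make that concrete, here is a minimal sketch of what clean variant management can look like: both arms pinned to immutable config versions at launch, and each traveler assigned deterministically by hash so neither arm drifts mid-run. The names and version-ID scheme are illustrative, not any particular platform's API.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Experiment:
    name: str
    control_version: str    # pinned config snapshot ID, frozen at launch
    treatment_version: str  # same: pinned, never edited in flight
    treatment_share: float  # fraction of traffic in the treatment arm

def assign(exp: Experiment, traveler_id: str) -> str:
    """Deterministic assignment: same traveler, same arm, for the whole run."""
    digest = hashlib.sha256(f"{exp.name}:{traveler_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return exp.treatment_version if bucket < exp.treatment_share else exp.control_version

exp = Experiment("seat-upsell-q2",
                 control_version="cfg-2026-03-01-a1b2c3",
                 treatment_version="cfg-2026-03-01-d4e5f6",
                 treatment_share=0.5)
print(assign(exp, "traveler-42"))  # stable across the experiment window
```

The point of the frozen dataclass and the hash is that nobody can nudge the control without creating a visibly new experiment.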

Second: target-metric misalignment. Teams set up experiments to lift attach rate but evaluate them on conversion, or design for conversion and evaluate on per-passenger revenue. The right metric is the one your bonus depends on. Pick that one, declare it before the experiment starts, and stop renegotiating it mid-run.
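One cheap way to enforce pre-registration is to make the readout path refuse anything but the declared metric. A sketch, with hypothetical names and an assumed attach-rate metric:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Registration:
    experiment: str
    primary_metric: str  # e.g. "attach_rate", the one the bonus depends on
    registered_at: str   # timestamp recorded before the first exposure

def evaluate(reg: Registration, metric: str, results: dict[str, float]) -> float:
    """Score the experiment, but only on the metric declared at registration."""
    if metric != reg.primary_metric:
        raise ValueError(
            f"{reg.experiment} was registered on {reg.primary_metric!r}; "
            f"refusing mid-run renegotiation to {metric!r}"
        )
    return results[metric]

reg = Registration("seat-upsell-q2", primary_metric="attach_rate",
                   registered_at="2026-03-01T00:00:00Z")
print(evaluate(reg, "attach_rate", {"attach_rate": 0.043}))  # fine
# evaluate(reg, "conversion", {...})  # raises: metric was not pre-registered
```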

Third: cohort dilution. The experiment runs across all traffic, but the treatment was designed for a specific cohort (leisure family travelers, say). Across all traffic the lift is diluted and never reaches significance; across the target cohort, it would have converged in days.
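The arithmetic behind this is brutal: required sample size for a two-proportion test scales roughly with 1/lift², so a +2pp lift concentrated in 20% of traffic shows up as +0.4pp across all traffic and needs on the order of 25x the sample. A back-of-envelope sketch using the standard pooled approximation (the baseline attach rate and cohort numbers here are illustrative):

```python
from statistics import NormalDist

def n_per_arm(p_control: float, lift: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size to detect p_control -> p_control + lift."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = p_control + lift / 2  # pooled rate under the alternative
    return round(((z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar)) / lift ** 2)

cohort_lift = 0.02    # +2pp attach lift among leisure family travelers
cohort_share = 0.20   # that cohort is 20% of total traffic
diluted_lift = cohort_lift * cohort_share  # what an all-traffic test must detect

print(n_per_arm(0.10, cohort_lift))   # ~3,800 per arm for the targeted test
print(n_per_arm(0.10, diluted_lift))  # ~90,000 per arm across all traffic, ~23x more
```

At typical route-level traffic, that 23x is exactly the difference between converging in days and never converging at all.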

The fix in each case is unglamorous: clean variant management, pre-registered metrics, and disciplined cohort targeting. Tooling matters less than discipline. But good tooling (version-controlled experiments, pre-registered metric definitions, and cohort targeting built into the merchandising layer) makes discipline cheaper.
