Engineering · 14 min read

How we keep p95 offer composition under 200ms.

An engineering deep-dive on the architectural choices that let us compose multi-leg offers in under 200ms — even with personalization in the loop.

Rafael Costa · CTO · April 9, 2026

When we tell airline architects that p95 offer-composition latency is under 200ms, the most common reaction is polite skepticism. Sub-200ms isn't unheard of — but for offer composition that involves multi-leg journeys, ancillary catalogs, traveler personalization, and pricing inference, it's a different problem class.

This post walks through the architectural commitments that make those numbers possible: a strict latency budget per stage, eager fan-out with deadline-aware cancellation, a feature store designed for the read pattern, and a pricing inference path treated as a fixed-cost operation rather than as if it were free.

Latency budget per stage. We start by making latency a first-class property of the pipeline: every stage (content fetch, eligibility, pricing, ranking, rendering) has a published budget, carried alongside its signature. Stages that overrun their budget return a 'best-known' result and emit a tracing event. The pipeline as a whole is latency-bounded by design, not by hope.
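As a rough illustration, here is a minimal Go sketch of what a per-stage budget wrapper can look like. The Stage and Result types, the BestKnown fallback, and the slog tracing call are illustrative assumptions, not our production API.

```go
package pipeline

import (
	"context"
	"log/slog"
	"time"
)

type Result struct {
	Value    any
	Degraded bool // true when the stage fell back to its best-known result
}

type Stage struct {
	Name      string
	Budget    time.Duration                      // published per-stage budget
	Run       func(context.Context) (any, error) // expected to respect ctx cancellation
	BestKnown func() any                         // fallback when the budget is exceeded
}

// Execute runs the stage under its budget. On overrun it returns the
// best-known result and emits a tracing event instead of failing the request.
func (s Stage) Execute(ctx context.Context) Result {
	ctx, cancel := context.WithTimeout(ctx, s.Budget)
	defer cancel()

	done := make(chan any, 1)
	go func() {
		if v, err := s.Run(ctx); err == nil {
			done <- v
		}
	}()

	select {
	case v := <-done:
		return Result{Value: v}
	case <-ctx.Done():
		slog.Warn("stage budget exceeded", "stage", s.Name, "budget", s.Budget)
		return Result{Value: s.BestKnown(), Degraded: true}
	}
}
```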

Eager fan-out with deadline-aware cancellation. Most of the work in offer composition is independent: fetch ancillary content, fetch loyalty status, score upgrades, evaluate bundle eligibility. We dispatch all of these the moment we have a request, then collect with deadlines. If a slow upstream is going to miss its budget, we cancel it and degrade gracefully — better to ship a slightly less personalized offer than to miss the latency target.
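A sketch of that dispatch-then-collect loop, using Go's standard context package for deadline-aware cancellation. The composeOffer function and its map of fetch callbacks are hypothetical names chosen for illustration.

```go
package compose

import (
	"context"
	"time"
)

type partial struct {
	name  string
	value any
}

// composeOffer dispatches all independent lookups the moment the request
// arrives, then collects whatever has landed by the deadline. Slow upstreams
// are cancelled and the offer degrades gracefully instead of missing the budget.
func composeOffer(ctx context.Context, deadline time.Duration,
	fetches map[string]func(context.Context) (any, error)) map[string]any {

	ctx, cancel := context.WithTimeout(ctx, deadline)
	defer cancel() // cancels any upstream still running when we return

	results := make(chan partial, len(fetches))
	for name, fetch := range fetches {
		go func(name string, fetch func(context.Context) (any, error)) {
			if v, err := fetch(ctx); err == nil {
				results <- partial{name: name, value: v}
			}
		}(name, fetch)
	}

	collected := make(map[string]any, len(fetches))
	for range fetches {
		select {
		case p := <-results:
			collected[p.name] = p.value
		case <-ctx.Done():
			return collected // ship what we have; the rest is cancelled
		}
	}
	return collected
}
```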

A feature store designed for the read pattern. Most travel feature stores are bolted-on analytical infrastructure that happens to also serve features. We built ours around the offer-composition read pattern — point lookups by traveler-and-context keys, with a hot tier for traffic-shape features and a warm tier for full personalization features. The read budget is single-digit milliseconds.
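The read path reduces to a handful of point lookups against two tiers. A minimal sketch, assuming hypothetical hot and warm store interfaces and an illustrative key layout rather than our actual schema:

```go
package features

import "context"

// FeatureKey is a point-lookup key: the traveler plus the request context.
type FeatureKey struct {
	TravelerID string
	Market     string // e.g. origin-destination pair
	Channel    string
}

type store interface {
	Get(ctx context.Context, key FeatureKey) (map[string]float64, bool)
}

type TieredStore struct {
	hot  store // in-memory tier for traffic-shape features
	warm store // key-value tier for full personalization features
}

// Lookup checks the hot tier first, then the warm tier, keeping the read
// path to a small, predictable number of point lookups.
func (t TieredStore) Lookup(ctx context.Context, key FeatureKey) map[string]float64 {
	if f, ok := t.hot.Get(ctx, key); ok {
		return f
	}
	if f, ok := t.warm.Get(ctx, key); ok {
		return f
	}
	return nil // compose without personalization features rather than block
}
```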

Pricing inference as a fixed-cost operation. Pricing models are a latency hot-spot. We chose to constrain inference to a fixed-cost operation: gradient-boosted trees and linear models, with a tightly-controlled feature vector. Deep models are restricted to offline scoring with online lookup. The result: pricing inference adds 8-12ms p95.
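Fixed-cost here means the scoring loop is bounded by the size of the model, not by the request. A minimal sketch of tree-ensemble scoring over a bounded feature vector, assuming a hypothetical in-memory model representation rather than our production format:

```go
package pricing

// node is one split in a gradient-boosted tree; leaves carry the value.
type node struct {
	feature   int // index into the fixed-size feature vector
	threshold float64
	left      *node
	right     *node
	leaf      bool
	value     float64
}

type Ensemble struct {
	trees []*node // fixed number of trees, fixed maximum depth
	bias  float64
}

// Score walks each tree once: cost is bounded by trees x depth, independent
// of the traveler or itinerary, which keeps inference at a predictable
// single-digit-millisecond budget.
func (e Ensemble) Score(features []float64) float64 {
	score := e.bias
	for _, t := range e.trees {
		n := t
		for !n.leaf {
			if features[n.feature] < n.threshold {
				n = n.left
			} else {
				n = n.right
			}
		}
		score += n.value
	}
	return score
}
```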

What we don't do. We don't precompute every possible offer. We don't cache offer objects opaquely — they invalidate too quickly. We don't trust upstream caches we don't own. Each of these is tempting, and each leaks correctness in ways that make ancillary revenue accounting unreliable.

Ready to retail like a real retailer?

A 30-minute walkthrough — your routes, your channels, your metrics.