When we tell airline architects that p95 offer-composition latency is under 200ms, the most common reaction is polite skepticism. Sub-200ms isn't unheard of — but for offer composition that involves multi-leg journeys, ancillary catalogs, traveler personalization, and pricing inference, it's a different problem class.
This post walks through the architectural commitments that make those numbers possible: a strict latency budget per stage, eager fan-out with deadline-aware cancellation, a feature store designed around the read pattern, and a pricing inference path constrained to a fixed cost rather than left open-ended.
Latency budget per stage. We start by making latency budgets first-class values in the system. Every stage — content fetch, eligibility, pricing, ranking, rendering — carries a published budget. A stage that overruns its budget returns a 'best-known' result and emits a tracing event. The pipeline as a whole is latency-bounded by design, not by hope.
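A minimal sketch of the per-stage budget pattern, assuming a thread-pool execution model; `StageBudget`, `run_stage`, and the trace shape are illustrative names, not our actual API. The key move is that the caller supplies the best-known fallback up front, so an overrun never blocks the pipeline:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError
from dataclasses import dataclass
from typing import Any, Callable

_pool = ThreadPoolExecutor(max_workers=8)  # shared pool; sketch only


@dataclass(frozen=True)
class StageBudget:
    name: str
    budget_ms: float  # the published per-stage budget


def run_stage(budget: StageBudget, work: Callable[[], Any],
              best_known: Any, trace: list) -> Any:
    """Run one pipeline stage; on overrun, return best_known and emit a trace event."""
    future = _pool.submit(work)
    try:
        return future.result(timeout=budget.budget_ms / 1000.0)
    except TimeoutError:
        # The worker keeps running in the background; real code would also
        # propagate cancellation to the upstream call.
        trace.append({"stage": budget.name, "event": "budget_overrun"})
        return best_known


trace: list = []
fast = run_stage(StageBudget("content", 50.0), lambda: "full", "fallback", trace)
slow = run_stage(StageBudget("pricing", 10.0),
                 lambda: time.sleep(0.2) or "late", "cached_price", trace)
```

Note that the fallback is a real result, not an error: downstream stages never need to know whether they got the fresh value or the best-known one.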
Eager fan-out with deadline-aware cancellation. Most of the work in offer composition is independent: fetch ancillary content, fetch loyalty status, score upgrades, evaluate bundle eligibility. We dispatch all of these the moment we have a request, then collect with deadlines. If a slow upstream is going to miss its budget, we cancel it and degrade gracefully — better to ship a slightly less personalized offer than to miss the latency target.
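The dispatch-then-collect shape can be sketched with `asyncio` — a simplified model, with stand-in upstreams and a single shared deadline rather than our real per-stage budgets:

```python
import asyncio


async def fetch(delay_s: float, value):
    # Stand-in for an upstream call: ancillary content, loyalty status,
    # upgrade scoring, bundle eligibility.
    await asyncio.sleep(delay_s)
    return value


async def compose(deadline_s: float = 0.05) -> dict:
    # Dispatch every independent fetch the moment we have a request.
    tasks = {
        "ancillaries": asyncio.create_task(fetch(0.01, ["bag", "seat"])),
        "loyalty": asyncio.create_task(fetch(0.2, "gold")),  # will miss the deadline
    }
    # Collect with a deadline rather than waiting for everything.
    done, _pending = await asyncio.wait(tasks.values(), timeout=deadline_s)
    results = {}
    for name, task in tasks.items():
        if task in done:
            results[name] = task.result()
        else:
            task.cancel()         # deadline-aware cancellation of the slow upstream
            results[name] = None  # degrade gracefully: less personalized, on time
    return results


offer = asyncio.run(compose())
```

The offer ships with `loyalty` unset rather than waiting 200ms for it — exactly the trade described above.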
A feature store designed for the read pattern. Most travel feature stores are bolted-on analytical infrastructure that happens to also serve features. We built ours around the offer-composition read pattern — point lookups by traveler-and-context keys, with a hot tier for traffic-shape features and a warm tier for full personalization features. The read budget is single-digit milliseconds.
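The read path reduces to tiered point lookups. A toy in-memory sketch of the contract (the class and tier names are illustrative; in production each tier is a separate store with its own latency profile):

```python
class TieredFeatureStore:
    """Point lookups keyed by (traveler_id, context): hot tier first, then warm."""

    def __init__(self):
        self.hot = {}   # small, fastest: traffic-shape features
        self.warm = {}  # larger, slower: full personalization features

    def put(self, key, features: dict, tier: str = "warm") -> None:
        (self.hot if tier == "hot" else self.warm)[key] = features

    def get(self, traveler_id: str, context: str) -> dict:
        key = (traveler_id, context)
        # One point lookup per tier — no scans, no joins, so the read
        # budget stays in single-digit milliseconds.
        hit = self.hot.get(key)
        if hit is not None:
            return hit
        return self.warm.get(key, {})  # a miss degrades to empty features


store = TieredFeatureStore()
store.put(("t1", "route:SFO-JFK"), {"click_rate": 0.2}, tier="hot")
store.put(("t1", "route:SFO-JFK"), {"click_rate": 0.2, "bundle_affinity": 0.7})
```

The important property is that a miss is cheap and well-defined: composition proceeds with empty features instead of blocking on a backfill.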
Pricing inference as a fixed-cost operation. Pricing models are a latency hotspot. We chose to constrain online inference to fixed-cost model families: gradient-boosted trees and linear models, with a tightly controlled feature vector. Deep models are restricted to offline scoring with online lookup. The result: pricing inference adds 8-12ms at p95.
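What "fixed-cost" means concretely: the work per request is one comparison per tree plus one dot product, with no data-dependent loops. A toy sketch with depth-1 trees (the features, trees, and weights below are invented for illustration; real ensembles are deeper and trained offline):

```python
FEATURES = ("fare_base", "days_to_departure", "loyalty_tier")  # the controlled vector

# Toy "gradient-boosted" ensemble: each tree is
# (feature_index, threshold, left_value, right_value).
TREES = [
    (1, 14.0, 5.0, -2.0),  # closer to departure -> higher adjustment
    (2, 1.0, 0.0, -1.5),   # loyalty tier >= 1 -> small discount
]
LINEAR_WEIGHTS = (0.1, 0.0, 0.0)


def price_adjustment(x: tuple) -> float:
    # The feature-vector shape is part of the contract: reject anything else
    # rather than silently realigning features.
    if len(x) != len(FEATURES):
        raise ValueError("feature vector shape is part of the contract")
    # Fixed cost: one branch per tree plus one dot product.
    total = sum(left if x[f] < t else right for f, t, left, right in TREES)
    total += sum(w * v for w, v in zip(LINEAR_WEIGHTS, x))
    return total
```

Because every request does the same amount of work, the p95 and p50 of this stage sit close together — which is what makes an 8-12ms budget dependable.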
What we don't do. We don't precompute every possible offer. We don't cache offer objects opaquely — they invalidate too quickly. We don't trust upstream caches we don't own. Each of these is tempting, and each erodes correctness in ways that make ancillary revenue accounting unreliable.