In this two-part series, we describe how we scaled a complex airline planning model using cloud-native tools. Specifically, we walk through how we deployed large EC2 instances on Amazon EKS (Elastic Kubernetes Service) and a highly available Redis cluster (via Amazon ElastiCache) to reduce solve times during a large-scale sensitivity analysis of an airline passenger spill and recapture (SR) model. This effort was a collaboration between the Operations Research (OR) and DevOps teams at United Airlines.
Why Airlines Need Scale in Operations Research — And When Cloud Infrastructure Helps
If you're passionate about large-scale optimization and real-world logistics, few industries are more challenging—or more rewarding—than commercial aviation. Airlines are large, complex systems. Beyond flying planes, they orchestrate an intricate ballet of aircraft, personnel, schedules, regulations, customer service, and infrastructure. Each of these areas presents an optimization problem, and collectively they form an interconnected and vast network of operational decisions.
From the outside, airline operations may seem limited to scheduling flights, selling tickets, and checking bags. In reality, airlines manage everything from catering logistics and crew assignments to airframe maintenance, regulatory compliance, customer engagement, and revenue forecasting. These systems are deeply interdependent, and the “glue” that binds them together is operations research (OR).
To navigate this complexity, airlines employ teams of OR specialists who model business problems as mathematical programs. These models drive automation and optimize resource allocation across different branches of the organization. However, the models’ effectiveness often hinges on the ability to solve them at scale—quickly, reliably, and repeatedly.
Consider an airline operating thousands of domestic and international flights per day. Each flight must satisfy regulatory, business, and capacity constraints. The resulting models are massive mixed-integer linear programs (MILPs) with millions of variables and constraints. Even state-of-the-art commercial solvers struggle to produce optimal solutions within practical timeframes, so these models are often solved piecewise rather than in one shot.
This is where scale matters. Airline models are frequently run on a daily or intra-day cadence, and operational disruptions like weather events can require rapid re-optimization. While OR professionals employ heuristics and decomposition techniques to speed up computation, there comes a point where better software isn't enough. We need better hardware and infrastructure.
That’s when we turn to cloud computing—to accelerate OR models using distributed processing, scalable memory, and modern parallelization techniques.
What Is the Spill and Recapture (SR) Model?
The Spill and Recapture (SR) model is a key tool in airline revenue management. Its purpose is to minimize the amount of passenger demand that is "spilled"—i.e., lost to other airlines due to seat capacity limitations—and to "recapture" as much of that demand as possible by adjusting aircraft assignments.
In this model, demand is unconstrained—it includes all passengers who would book an itinerary if capacity allowed them to do so, not just those who managed to reserve a seat. The SR model therefore helps planners assess true market demand and guides strategies for maximizing retained revenue.
Conceptual Overview:
Core Constraints:
Typically, airlines publish flight schedules in advance, at which point ticket prices and aircraft assignments are fixed. As departure dates approach, more accurate demand signals emerge. When demand exceeds capacity on certain flight-legs, the SR model simulates potential changes—like reassigning aircraft with more seats—to retain revenue that would otherwise be lost.
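To make the idea concrete, here is a minimal single-leg sketch in Python. It is a deliberate simplification and not the SR model itself: the real model is a network-wide MILP, while this toy treats one flight-leg in isolation. The function names `spill` and `recaptured_by_swap`, and all demand and seat figures, are hypothetical and chosen only for illustration.

```python
def spill(demand: float, seats: int) -> float:
    """Unconstrained demand that does not fit on the assigned aircraft."""
    return max(0.0, demand - seats)


def recaptured_by_swap(demand: float, seats_before: int, seats_after: int) -> float:
    """Demand retained by moving the leg to an aircraft with a different seat count."""
    return spill(demand, seats_before) - spill(demand, seats_after)


demand = 190.0  # hypothetical unconstrained demand for one leg

print(spill(demand, 166))                    # 24.0 passengers spilled on a 166-seat aircraft
print(recaptured_by_swap(demand, 166, 179))  # 13.0 recaptured by swapping to 179 seats
```

In the toy, swapping to the larger aircraft still spills 11 passengers, so only 13 of the original 24 are recaptured. The full model has to weigh such swaps across the whole network, since the larger aircraft must come from somewhere.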
Why SR Models Must Scale
The challenge isn’t just in solving the SR model once. To make informed decisions, airlines need to run sensitivity analyses—examining how changes to individual flight-legs or aircraft assignments affect spilled demand and recapture opportunities.
For example, for a major US carrier:
To understand the impact of adjusting seating capacity on each flight-leg, the model must be re-run thousands of times, once for each leg. Each run is an independent solve, and the space of potential configurations and revenue outcomes becomes combinatorially large once changes to several legs are considered together.
This is why the SR model needs to scale horizontally, with thousands of model variations distributed across compute nodes for parallel execution. In Part II, we’ll dive into how we built such a system using IBM CPLEX as the underlying optimization engine, Redis as a distributed queue and lightweight data store for maintaining the state of the sensitivity analysis, and Kubernetes to orchestrate scalable workloads across AWS instances.
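The shape of a per-leg sensitivity analysis can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the system described in Part II: a local thread pool stands in for Kubernetes workers pulling from a Redis queue, and a simple spill sum stands in for a CPLEX solve. The leg identifiers, demand figures, seat counts, and the `EXTRA_SEATS` perturbation are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: leg id -> (unconstrained demand, seats). Not real airline figures.
LEGS = {"LEG-A": (190.0, 166), "LEG-B": (150.0, 166), "LEG-C": (210.0, 179)}
EXTRA_SEATS = 10  # hypothetical capacity change applied to one leg at a time


def total_spill(seats_by_leg):
    """Stand-in for a full solve: sums per-leg spill instead of calling a MILP solver."""
    return sum(max(0.0, LEGS[leg][0] - seats) for leg, seats in seats_by_leg.items())


def solve_variant(perturbed_leg):
    """One unit of work: change a single leg's capacity and re-evaluate the model."""
    seats = {leg: cap for leg, (_, cap) in LEGS.items()}
    seats[perturbed_leg] += EXTRA_SEATS
    return perturbed_leg, total_spill(seats)


def run_sensitivity(max_workers=4):
    """Evaluate every one-leg variant in parallel and report spill reduction per leg."""
    baseline = total_spill({leg: cap for leg, (_, cap) in LEGS.items()})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(solve_variant, LEGS))
    return {leg: baseline - variant_spill for leg, variant_spill in results.items()}


if __name__ == "__main__":
    print(run_sensitivity())
```

Because each variant depends only on the baseline data and the leg being perturbed, the runs share no state and can be handed to any worker in any order. That independence is what makes the workload a good fit for horizontal scaling.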
Closing Thoughts
The SR model offers a powerful lens into how airlines can use optimization not just to improve operations but to directly influence revenue retention. However, like many modern OR applications, it is constrained not by ideas but by infrastructure. Scalable, cloud-native architectures are essential to bring these models to life at the speed and scale required by real-world airline operations.
Stay tuned for Part II, where we reveal the full architecture behind our high-throughput SR solver system.