key: cord-0171224-ivvf8wk0
authors: Garg, Abhinav; Shukla, Naman; Marla, Lavanya; Somanchi, Sriram
title: Distribution Shift in Airline Customer Behavior during COVID-19
date: 2021-11-29
sha: 34dffee7d71141357ed2281c8955733bf377e13a
doc_id: 171224
cord_uid: ivvf8wk0

Traditional AI approaches in customized (personalized) contextual pricing applications assume that the data distribution at the time of online pricing is similar to that observed during training. However, this assumption may be violated in practice because of the dynamic nature of customer buying patterns, particularly due to unanticipated system shocks such as COVID-19. We study the changes in customer behavior for a major airline during the COVID-19 pandemic by framing it as a covariate shift and concept drift detection problem. We identify which customers changed their travel and purchase behavior, and the attributes affecting that change, using (i) Fast Generalized Subset Scanning and (ii) Causal Forests. In our experiments with simulated and real-world data, we show through qualitative analysis how these two techniques can be used.

The novel coronavirus pandemic has had a seismic impact on many industries, including travel. During these unprecedented times, customer behavior changed drastically along with industrial operations [11]. As a result, machine learning systems built on sequential decision making were affected the most. Machine learning applications often implicitly or explicitly assume that data sets are drawn from stationary distributions, so a sudden shift in the underlying data makes such models prone to failure. Pricing based on context is one such application that is prominently driven by customers' behaviour [14, 12]. The motivation of this work stems from examining the performance of machine learning models deployed to price add-on (ancillary) products for an international airline.
To illustrate, the performance of one of the deployed models, which predicts the probability of ancillary purchase, dropped from 75% during training to 50% during testing despite exhaustive offline analysis with K-fold cross-validation, as shown in Figure 1. Since the onset of COVID-19, one of the features used in the model started experiencing a distribution change that needed further investigation.

Demand forecasting is frequently used for price determination given customers' context [13, 5]. Pricing applications that rely on forecasting need distribution shift detection, because the price that a customer is willing to pay can be affected by unanticipated events, such as COVID-19, that alter the system dynamics and change the data-generating process.

In this work, we (a) investigate the problem of covariate shift and concept drift for a forecasting model used for dynamic pricing, where the marginal distribution of the covariates, P(X), and the conditional distribution, P(Y|X), differ between the training and test sets [10]; (b) explore two potential techniques for detecting distribution shift in an online use case, specifically Fast Generalized Subset Scan [9] to detect covariate shift in terms of P(X) and Causal Forests [2] to identify concept drift, that is, change in P(Y|X); and (c) discuss a proposal for a robust framework for contextual pricing applications in a real-world setting.

In this section, we describe how Fast Generalized Subset Scan [9] and Causal Forests [2] can be used for detecting distribution shift due to shocks, such as COVID-19, experienced by a system. Fast Generalized Subset Scan (FGSS) is an unsupervised anomalous pattern detection technique proposed by McFowland et al. [9]. We use FGSS for covariate shift detection (change in P(X)) with the objective of finding shifted patterns in the test set.
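For reference, the two kinds of shift named above can be written side by side. The subscripts 0 (training) and 1 (test) are a notational assumption here, chosen to match the P_0 and P_1 used later in the Figure 2 caption; the definitions themselves are the standard ones from the dataset shift literature [10], not a formulation specific to this paper.

```latex
\begin{align*}
P_t(X, Y) &= P_t(Y \mid X)\, P_t(X), \qquad t \in \{0, 1\},\\
\text{covariate shift:} \quad & P_0(X) \neq P_1(X),\\
\text{concept drift:} \quad & P_0(Y \mid X) \neq P_1(Y \mid X).
\end{align*}
```

Because the joint distribution factorizes this way, a change in either factor changes the data the pricing model sees, which is why the two factors are monitored separately (FGSS for the first, Causal Forests for the second).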
Given a set of observations R_1 ... R_N and features A_1 ... A_M in a test set, under the null hypothesis that there is no anomalous pattern in the test set, we use FGSS to find a subset S* = R* × A* of self-similar groups that are anomalous, where R* ⊆ {R_1 ... R_N} and A* ⊆ {A_1 ... A_M}, using a scoring function F(S) that defines the anomalousness of the subset S. If the null hypothesis is true, then the test set is generated from the same distribution as the training set. Otherwise, the training and test distributions are different, with a significantly higher score F(S) for that subset of observations in the test set. The score is defined as

$$F(S) = \max_{\alpha} \phi(\alpha, N_\alpha(S), N(S)),$$

where N(S) represents the size of the subset S and N_α(S) represents the total number of p-values (obtained by passing the observed values in the test set through the inverse eCDF of the training set) that are significant at level α in S. To efficiently find the subset S* = arg max F(S), we need the function φ(α, N_α(S), N(S)) to be monotonically increasing w.r.t. N_α, monotonically decreasing w.r.t. N and α, and convex [9]. We use the Berk-Jones statistic [3] as φ in our experiments, which satisfies these properties.

Causal forests are a supervised method from Generalized Random Forests [2] that estimates heterogeneity in treatment effects. A treatment effect refers to the causal effect of a treatment or intervention on an outcome variable. Causal forests can be used to estimate the Conditional Average Treatment Effect (CATE). This is useful in identifying the observations for which the treatment effect is positive and that benefit the most from a treatment; essentially, it is an estimation of optimal policy assignment [1]. CATE cannot be directly observed for a unit because of the "fundamental problem of causal inference" [7]: we can never directly observe the counterfactual condition of a unit of observation, so unit-level causal effects are unobservable.
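Returning to the FGSS scoring function described above, the following is a minimal sketch of how a fixed subset could be scored with the Berk-Jones statistic. It is not the authors' implementation: the two-sided empirical p-value with add-one smoothing, the grid of α values, and all function names are assumptions made for illustration, and the search over subsets (the arg max over S) is omitted entirely.

```python
import math

def empirical_p_value(train_values, v):
    """Two-sided empirical p-value of v against the training sample.
    The add-one smoothing and two-sided form are illustrative assumptions."""
    n = len(train_values)
    lower = (sum(t <= v for t in train_values) + 1) / (n + 1)
    upper = (sum(t >= v for t in train_values) + 1) / (n + 1)
    return min(1.0, 2.0 * min(lower, upper))

def berk_jones(alpha, n_alpha, n):
    """Berk-Jones score: n * KL(n_alpha / n || alpha), or 0 when the observed
    fraction of significant p-values does not exceed alpha."""
    if n == 0:
        return 0.0
    frac = n_alpha / n
    if frac <= alpha:
        return 0.0
    kl = frac * math.log(frac / alpha)
    if frac < 1.0:
        kl += (1.0 - frac) * math.log((1.0 - frac) / (1.0 - alpha))
    return n * kl

def score_subset(p_values, alphas=(0.01, 0.05, 0.1)):
    """F(S) = max over alpha of phi(alpha, N_alpha(S), N(S)) for a fixed S."""
    n = len(p_values)
    return max(berk_jones(a, sum(p <= a for p in p_values), n) for a in alphas)

# Hypothetical one-feature example: training values 0..99.
train = list(range(100))
in_range = [empirical_p_value(train, v) for v in (40, 50, 60)]
shifted = [empirical_p_value(train, v) for v in (150, 160, 170, -20)]
print(score_subset(in_range), score_subset(shifted))
```

Test values that fall inside the bulk of the training sample yield large p-values and a zero score, while values outside the training range yield small p-values and a positive score, which is the behavior the monotonicity conditions on φ are meant to guarantee.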
For each observation X ∈ R^m, where m is the number of covariates, there are two potential outcomes Y_1 and Y_0 corresponding to the binary treatment variable D ∈ {0, 1}, but only one of them is observed. The conditional expectation of an outcome for the treatment or control, µ_d(x), is defined as

$$\mu_d(x) = \mathbb{E}[Y_d \mid X = x], \qquad d \in \{0, 1\},$$

and the CATE, τ(x), is the difference in expectation of the potential outcomes given x,

$$\tau(x) = \mathbb{E}[Y_1 - Y_0 \mid X = x] = \mu_1(x) - \mu_0(x).$$

We use causal forests to estimate the causal effect of the COVID-19 intervention and identify the concept drift, that is, the change in P(Y|X), in the system. This requires some data observed post-intervention to be used for training so as to learn the unit-level interventional change in treatment.

Ancillaries are optional products or services sold by businesses to complement their primary product [4]. In this work, we utilized the following datasets from the airline industry for an ancillary market: (1) simulated interactions of ancillary pricing, and (2) real-world ancillary booking request data from a large airline containing price variability.

To test our approach, we generated a simulated dataset of customer and flight seat (ancillary) interactions using an open-source flight simulator¹. In this dataset, we artificially varied the arrival rates of the customers in the train and test data. The details of the data-generation process can be found in Appendix A.1.

Figure 2: [...] significance α), and similarly negative shift represents the customers that are more likely to purchase in the control period (P_1(Y|X = x_i) < P_0(Y|X = x_i) with significance α), while no shift represents the customers whose probability of purchase in the train and test sets is similar (P_1(Y|X = x_i) ≈ P_0(Y|X = x_i)). (c) Density plot of customers identified as shifted (P_1(X = x_i) ≠ P_0(X = x_i) with significance α) and not shifted (P_1(X = x_i) ≈ P_0(X = x_i) with significance α) by FGSS in the test set. FGSS only tells whether an observation has a covariate shift or not, and says nothing about the concept drift.
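To illustrate the CATE definition τ(x) = µ_1(x) − µ_0(x) and the positive / negative / no shift labels used in the figure caption, here is a deliberately simple sketch. It swaps the causal forest for a stratified difference in means over a single discrete covariate, and it replaces the significance test at level α with a plain tolerance; the strata names, the tolerance, and the data are hypothetical. It should be read as a toy instance of the estimand, not as the method used in the paper.

```python
from collections import defaultdict

def stratified_cate(records):
    """records: iterable of (stratum, d, y), with d in {0, 1} marking the
    control (pre-shock) or treatment (post-shock) period and y the outcome.
    Returns {stratum: mean(y | d=1) - mean(y | d=0)}."""
    sums = defaultdict(lambda: [[0.0, 0], [0.0, 0]])  # per d: [sum of y, count]
    for stratum, d, y in records:
        sums[stratum][d][0] += y
        sums[stratum][d][1] += 1
    cate = {}
    for stratum, ((y0, n0), (y1, n1)) in sums.items():
        if n0 == 0 or n1 == 0:
            continue  # the counterfactual arm is empty, so no estimate
        cate[stratum] = y1 / n1 - y0 / n0
    return cate

def label_shift(tau, tol=0.1):
    """Hypothetical tolerance-based stand-in for a significance test."""
    if tau > tol:
        return "positive shift"
    if tau < -tol:
        return "negative shift"
    return "no shift"

# Hypothetical purchase outcomes (1 = bought the ancillary).
records = (
    [("short_advance", 0, y) for y in (1, 1, 0, 0)]
    + [("short_advance", 1, y) for y in (1, 0, 0, 0)]
    + [("long_advance", 0, y) for y in (1, 0, 0, 0)]
    + [("long_advance", 1, y) for y in (1, 1, 1, 0)]
)
cate = stratified_cate(records)
labels = {s: label_shift(t) for s, t in cate.items()}
print(cate, labels)
```

A causal forest plays the same role with continuous, multi-dimensional covariates: instead of fixed strata it learns adaptive neighborhoods in which the treatment and control outcomes are compared.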
The real-world dataset consists of customer booking requests from March 2019 to September 2020. We use this dataset to evaluate the performance of FGSS and Causal Forests in detecting distribution shift due to COVID-19 in a real-world setting. The airline identifies the time period starting March 2020 as the "COVID" era because a large portion of scheduled flights started getting canceled or rescheduled and ticket sales decreased. We use five features in our experiments: AdvancedPurchase, LengthOfStay, GroupSize, TotalDuration and TripType. The description of the features can be found in Table 1 in the Appendix.

Figures 3 and 4 show the results obtained on the hold-out test set. Both the Causal Forests and FGSS results indicate that there is a significant shift in all of the features. The LengthOfStay feature has a shift towards value 0 (indefinite stay at destination), indicating that most customers are not opting for vacation or business travel but for essential movement only. The same conclusion can be drawn from the shift observed in the TripType feature, with customers preferring to travel one-way. The GroupSize feature has a shift towards value 1, indicating that customers are traveling alone and not in large groups during the pandemic. AdvancedPurchase has a shift towards smaller values, indicating that customers are not booking tickets far ahead of the travel date due to the uncertain nature of the pandemic. TotalDuration has a shift towards smaller values as well, reflecting the operational changes made by the airline to operate on shorter routes during the pandemic.

We explored two techniques, Fast Generalized Subset Scan and Causal Forests, for distribution shift detection when a system experiences a shock. We applied these techniques to an airline ancillary purchase use case and saw that some of the features used for ancillary pricing experienced a covariate shift while the probability of purchase experienced a concept drift during COVID-19.
We identified the observations in the test set that follow a significantly different distribution compared to the training set. Causal forests, while able to detect a concept drift in real time, need a significant amount of post-intervention data for training. FGSS, on the other hand, can only detect covariate shift on a batch of data but does not need post-intervention data for training. Hence, the two approaches could be combined for unified and robust distribution shift detection. At the same time, the construction of a good testing pipeline, along with agreement among multiple models and visual inspection, can jointly help identify shift patterns. These patterns can then be cross-checked against designers' hypotheses and domain experts' knowledge for validation and potential changes to the system before being corrected for enhanced model performance. In the future, we aim to extend this work to other applications, test multiple shift correction approaches, and provide recommendations for adapting to a shift induced by sudden shocks to the system (see Appendix A.2 for details).

The simulated dataset consists of interactions for selecting a flight seat after the ticket is purchased. Customer arrivals are simulated using a non-homogeneous Poisson process [8]. A multinomial logit is used as the customer choice model in the simulator [6]. The offered prices are randomly sampled between the minimum and maximum allowed price for that flight. The intervention is artificially created by changing the customer arrival rate. We simulated two datasets with different parameters of the Poisson process for 10 long-distance flights as train and test (Fig. 2(a)). We use the arrival rate (AdvancedPurchase) and the number of seats sold (Sold) as features, with the AdvancedPurchase feature having an apparent covariate shift. For Causal Forests, we leak 40% of the points, randomly sampled from the test set, into training.
Table 1 (partial). Feature: TripType. Type: Categorical. Description: Indicates whether the customer has booked "One-Way" or "Round-Trip".

Shukla et al. [12] proposed a two-stage pricing model that uses the purchase probability prediction for price recommendation. Figure 5 shows the proposed pricing framework with shift detection before the supervised model prediction. The shift detection layer first checks whether there is a shift in the system. Causal Forests, being a supervised technique, can be used to detect shift "on-line". However, Causal Forests require data observed after the intervention for training, and identifying the start of a shift induced by an intervention to the system is usually a difficult task. We propose to take advantage of the unsupervised nature of FGSS to achieve that. FGSS can learn the normal behavior from historical data and run in batch mode to detect anomalous patterns during testing. If the percentage of observed anomalous patterns breaches a user-defined threshold for a significant period of time, Causal Forests can use that data as treatment for training. Once the shift has been detected in "real-time", it can be handled in multiple ways: a single model trained after reweighting shifted points; multiple models, one for each shifted pattern; or heuristics based on domain knowledge to better price the product. We aim to perform further experiments in the future to test the feasibility of these three approaches and propose an efficient strategy to handle covariate shift for decision-making under uncertainty.

[1] Machine learning methods that economists should know about
[2] Generalized random forests. The Annals of Statistics
[3] Goodness-of-fit test statistics that dominate the Kolmogorov statistics
[4] Incorporating ancillary services in airline passenger choice models
[5] Analytics for an online retailer: Demand forecasting and price optimization
[6] Specification tests for the multinomial logit model
[7] Statistics and causal inference
[8] Simulation of nonhomogeneous Poisson processes by thinning
[9] Fast generalized subset scan for anomalous pattern detection
[10] Dataset Shift in Machine Learning
[11] Impact of COVID-19 on consumer behavior: Will the old habits return or die
[12] Dynamic pricing for airline ancillaries with customer context
[13] The theory and practice of revenue management
[14] Customized regression model for Airbnb dynamic pricing

We sincerely and gratefully acknowledge our airline partners for their continuing support. The academic partners are also thankful to deepair (www.deepair.io) for funding this research.