Estimating Customer Lifetime Value Using Machine Learning Techniques

With the rapid development of the civil aviation industry, high-quality customer resources have become a significant measure of an airline's competitiveness. Competition for high-value customers has become central to airline profits. Research on airline customer lifetime value can help airlines identify high-value, medium-value and low-value travellers. Moreover, it allows an airline to allocate resources more rationally, seeking the maximum profit return from the least resource investment. However, the models used to calculate customer lifetime value remain controversial, and how to design a model suited to an airline still needs to be explored. In this paper, the author proposes the optimised China Eastern Airlines passenger network value assessment model and examines how well it fits the TravelSky value score. In addition, the author combines the score from the China Eastern Airlines passenger network value assessment model with the loss model score to help airlines find their most significant customers.


Introduction
In the context of customer relationship management, customer lifetime value (CLV) or customer equity (CE) becomes important because it is a disaggregate metric for evaluating marketing decisions [1], which can be used to allocate resources appropriately and identify profitable consumers [2]. Companies are looking for better approaches to create value and optimise their market offerings to appeal to customers and make profits [3]. Many firms use CLV regularly to control and monitor marketing strategies and to evaluate business success. Companies want to know how much net benefit they can expect from their customers. It is recognised that CLV has become a significant component of companies' central strategy [4,5]. The CLV of current and future customers can be a good proxy for overall corporate value [6]. Meanwhile, at each point in a customer's lifetime with the firm, the firm would like to form some expectation of the lifetime value of that customer.

Definition of CLV
Customer valuation has been discussed by several papers in the customer relationship management literature, for example, Dwyer [7], Berger and Nasr [8], Rust et al. [9] and Blattberg and Malthouse [10].
Dwyer [7] and Berger and Nasr [8] first provided a framework using the lifetime value of a customer. Gupta and his colleagues [6] then found that the earnings of a company, and hence its value, are a function of the total customer lifetime value (CLV), defined as the discounted value of the future profits yielded by customers to the company; in other words, the value of a customer is the expected sum of discounted future earnings, where a customer generates a profit margin in each period. Moreover, CLV stands for the present value of expected benefits [7] and underlies customer equity approaches to marketing [11,12]. CLV also plays a major role in relationship marketing [13], where relationships with customers can be regarded as capital assets that require proper management [14].

Related work
In measuring customer lifetime value, a standard approach is to estimate the present value of the net benefit to the firm from the customer over time [1]. Researchers have suggested various methods that use customer-level data to measure CLV [8,9,15-17]. However, the relationship between customer purchase behaviour and customer lifetime is not clear-cut [15-19]; when firms observe customer defections, a longer customer lifetime implies a higher customer lifetime value [20-22]. Models for measuring CLV differ in how they estimate expected future customer purchase behaviour.

CLV model
CLV is typically defined and estimated at an individual customer or segment level. This allows us to differentiate between customers who are more profitable than others rather than merely examining average profitability. The issue is to predict future profits when the timing and the benefit of future transactions are unknown, as discussed in Mulhern [23] and Bell et al. [24]. It is proposed by Gupta and other scholars [25] that CLV for a customer is [6,19]:

$$\mathrm{CLV} = \sum_{t=0}^{T} \frac{(p_t - c_t)\, r_t}{(1 + i)^t} - AC$$

where:
p_t = price paid by a consumer at time t.
c_t = direct cost of servicing the customer at time t.
i = discount rate or cost of capital for the firm.
r_t = probability of customer repeat buying or being 'alive' at time t.
AC = acquisition cost.
T = time horizon for estimating CLV.
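The calculation can be illustrated with a minimal sketch. It assumes the standard discounted-margin form of this definition: for each period t from 0 to T, the margin (price minus direct cost) is weighted by the probability that the customer is still 'alive', discounted at the firm's cost of capital, and the acquisition cost is subtracted from the sum. The function and argument names are illustrative and do not come from the paper.

```python
from typing import Sequence


def customer_lifetime_value(
    prices: Sequence[float],
    costs: Sequence[float],
    retention_probs: Sequence[float],
    discount_rate: float,
    acquisition_cost: float = 0.0,
) -> float:
    """Discounted expected margin over periods t = 0..T, minus acquisition cost."""
    if not (len(prices) == len(costs) == len(retention_probs)):
        raise ValueError("all per-period sequences must have the same length")
    total = 0.0
    for t, (price, cost, alive_prob) in enumerate(zip(prices, costs, retention_probs)):
        # Expected margin in period t, discounted back to period 0.
        total += (price - cost) * alive_prob / (1.0 + discount_rate) ** t
    return total - acquisition_cost
```

For example, a customer who pays 100 and costs 60 to serve in each of two periods, with a 50% chance of still being active in the second period, a 10% discount rate and an acquisition cost of 10, has a CLV of 40 + 20/1.1 - 10.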
For another review of CLV models, see Jain and Singh [26]. The first regression approach is linear regression with a variance-stabilising transformation, estimated by ordinary least squares (OLS).
A suitable variance-stabilising transformation can be chosen with the help of residual plots [27]. The second regression approach, as shown by Neter et al. [28], is linear regression estimated by iteratively reweighted least squares (IRLS). IRLS is another means of addressing heteroscedasticity.

RFM model
For the sake of simplicity, the only predictor variables in these models are of the recency, frequency and monetary (RFM) type, as in Buckinx and Van den Poel [29]; RFM variables are sound predictors of CLV [15,16].
RFM models have been used in direct marketing for three decades. They were developed to target marketing programmes at specific customers with the objective of improving response rates. Studies show that customers' response rates vary the most by their recency, followed by their purchase frequency and monetary value [30]. Before these models, companies typically used demographic profiles of customers for targeting purposes. However, research strongly suggests that consumers' past purchases are better predictors of their future purchase behaviour than demographics.
Although RFM and other scoring models attempt to forecast customers' future behaviour and are therefore implicitly associated with CLV, they have several limitations [15,16,31]. First, these models predict behaviour only in the next period. However, to estimate CLV, we need to estimate customers' purchase behaviour not only in Period 2 but also in Periods 3, 4, 5 and so on. Second, RFM variables are imperfect indicators of true underlying behaviour; that is, they are drawn from a true distribution, an aspect that RFM models ignore. Third, customers' past behaviour may be a result of the company's past marketing activities, which these models also ignore. Despite these limitations, RFM models remain a mainstay of the industry because they are easy to implement in practice.
One fundamental limitation of RFM models is that they are scoring models and do not explicitly provide a number for customer value. However, RFM variables are essentially past-purchase variables that should be good predictors of customers' future purchase behaviour. Fader et al. [15,16] showed how RFM variables could be used to build a CLV model that overcomes many of these limitations.
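As an illustration of what the RFM variables are, the sketch below derives them from a transaction log. The record layout (customer id, purchase date, amount), the choice of days as the recency unit and all names are assumptions made for the example; the paper does not prescribe a data format.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class RFM(NamedTuple):
    recency_days: int  # days between the last purchase and the reference date
    frequency: int     # number of purchases up to the reference date
    monetary: float    # total amount spent up to the reference date


def rfm_table(
    transactions: Iterable[Tuple[str, date, float]], as_of: date
) -> Dict[str, RFM]:
    """Summarise each customer's purchase history as (R, F, M)."""
    last_purchase: Dict[str, date] = {}
    frequency: Dict[str, int] = {}
    monetary: Dict[str, float] = {}
    for customer_id, purchase_date, amount in transactions:
        if purchase_date > as_of:
            continue  # ignore purchases made after the reference date
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] = frequency.get(customer_id, 0) + 1
        monetary[customer_id] = monetary.get(customer_id, 0.0) + amount
    return {
        cid: RFM((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }
```

The output is a set of descriptive variables only; as noted above, turning them into a customer value still requires a model of future purchase behaviour.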

NBD-Pareto model
A popular method is the negative binomial distribution (NBD)-Pareto model introduced by Schmittlein et al. [32], which several authors [23,26,33] refer to as a powerful technique for situations where past customer purchase behaviour is used to predict the future probability of a customer remaining in business with the firm.
Because the NBD-Pareto model estimates only a customer's probability of being active and number of transactions, some adaptations are needed to incorporate transaction profits and forecast CLV. An essential assumption made when using the NBD-Pareto model for forecasting is that a customer's number of transactions is independent of the profit per transaction. Most papers therefore take a two-step approach to CLV modelling [16,17,34]. First, the number of future transactions is forecast for every customer. Second, the mean profit per transaction is forecast. Both values are predicted at the customer level. Once the future number of transactions and the mean profit per transaction are known, they yield a CLV approximation for every customer.
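The two-step scheme can be sketched as follows. This is not an NBD-Pareto implementation: the expected number of future transactions is taken as an input that would come from a fitted model, and the mean profit per transaction is forecast by the customer's historical average, which is a placeholder assumption. Multiplying the two relies on the independence assumption stated above. All names are hypothetical.

```python
from typing import Sequence


def mean_profit_per_transaction(past_profits: Sequence[float]) -> float:
    """Step 2 (placeholder): forecast mean profit as the historical average."""
    if not past_profits:
        raise ValueError("at least one past transaction is required")
    return sum(past_profits) / len(past_profits)


def two_step_clv(expected_transactions: float, past_profits: Sequence[float]) -> float:
    """Combine step 1 (expected transaction count) with step 2 (mean profit)."""
    # Valid only if transaction count and profit per transaction are independent.
    return expected_transactions * mean_profit_per_transaction(past_profits)
```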
In Fader and Hardie [15,16], maximum likelihood estimation (MLE) for an individual with a given purchase history is used to estimate the NBD-Pareto submodel. The method of moments is an alternative to MLE and generates similar results [19]. Once the parameters have been estimated, one can forecast the number of transactions a customer will make in the future or predict the probability that he or she is still alive. As discussed by Schmittlein and Peterson [17], the NBD-Pareto model has limitations and is not suitable in situations where customer lifetimes are observed.
Another approach that can naturally incorporate past behavioural outcomes into future expectations is a Bayesian approach [35]. Bayesian approaches can integrate prior data and information into the model's structure via the prior distribution of the CLV drivers.

Computer science models
The vast computer science literature in data mining, machine learning and nonparametric statistics has generated many approaches that emphasise predictive ability. These include projection pursuit models; neural network models [36]; decision tree models; spline-based models such as generalised additive models, multivariate adaptive regression splines and classification and regression trees; and support vector machines. Many of these methods may be applicable to research on customer lifetime value.
In a recent study, Cui and Curry [37] conducted extensive Monte Carlo simulations to compare predictions based on the multinomial logit model with those of the support vector machine. In addition, Giuffrida et al. [38] reported that a multivariate decision tree induction algorithm outperformed a logit model in identifying the best customer targets for cross-selling purposes.
Because marketing academics place a strong emphasis on interpretability and a parametric setup, these approaches remain little known in the marketing literature. However, given the importance of prediction in CLV, these methods deserve a closer look in the future.

RFMc model
Individual passenger value is the value of a particular traveller to the airline company, calculated from the passenger's consumption data. It also refers to the passenger's profit contribution to the airline company.
Based on the characteristics of civil aviation, the fare discount corresponding to booking class c is introduced to represent the level of value that a passenger's consumption contributes to the airline. The RFMc model is proposed to calculate the individual value of civil aviation passengers, where R is the closeness coefficient of flight time, F is the total number of flights in a period of time and Mc is the passenger's relative total amount spent on flights, weighted by the class of each flight.
(1) Mc: the passenger's relative total amount spent on flights.
The relative total consumption Mc is calculated from the fare weight of class c (the corresponding fare discount); see formula (2). In formula (2), c_i represents the fare discount on the traveller's ith flight, m_i is the fare of the traveller's ith flight, and k is the number of tickets purchased.
(2) R: the closeness coefficient of flight time.
The latest flight time t: the interval between the last flight time and the current time (the time at which the model is used to calculate the passenger's value).
The average flight turnaround time t_0: the average time interval between two adjacent flights; see formula (3). In formula (3), n is the passenger's total number of flights, t_i is the time interval between the passenger's ith and (i + 1)th flights, and t_s is the precalculated average turnaround time of the whole passenger set.
The closeness coefficient of flight time R: the possibility that the passenger flies again; see Eq. (4). The average flight turnaround time t_0 reflects the expected interval between a passenger's two consecutive flights. When the latest flight time t is less than or equal to the average turnaround time t_0, the value of R is 1; when t is greater than t_0, the possibility of the passenger flying again gradually falls, and R slowly decreases.
The passenger's flight frequency F reflects the passenger's activity and loyalty, both of which affect the CLV to the airline company. The greater the flight frequency, the higher the activity and loyalty, and hence the greater the passenger's value to the airline. Overall, the passenger's relative total amount Mc, the closeness coefficient of flight time R and the flight frequency F are combined in a weighted sum to obtain the passenger's value v; see Eq. (5). In formula (5), ω_1, ω_2 and ω_3 are the weight coefficients of the indicators. Because the indicators are measured on different scales, Mc, R and F should be standardised before the weighted summation.
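The RFMc calculation can be sketched as below. Since formulas (2) to (5) are not reproduced in the text, the concrete forms are assumptions inferred from the descriptions: Mc is taken as the sum of fares weighted by fare discount, t_0 falls back to the population average t_s when a passenger has fewer than two flights, R decays as t_0 / t once t exceeds t_0, and min-max scaling is used as the standardisation. Only the behaviour "R = 1 when t ≤ t_0" and the weighted sum itself come directly from the description; all names and default weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def relative_consumption(discounts: Sequence[float], fares: Sequence[float]) -> float:
    """Mc (assumed form): each fare m_i weighted by its fare discount c_i."""
    return sum(c * m for c, m in zip(discounts, fares))


def mean_turnaround(intervals: Sequence[float], population_mean: float) -> float:
    """t_0: mean gap between adjacent flights, or t_s if no gap is available."""
    return sum(intervals) / len(intervals) if intervals else population_mean


def closeness(t: float, t0: float) -> float:
    """R: 1 while t <= t_0; the decay t_0 / t beyond that is an assumption."""
    return 1.0 if t <= t0 else t0 / t


def min_max(values: Sequence[float]) -> List[float]:
    """Standardise to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def passenger_values(
    mc: Sequence[float],
    r: Sequence[float],
    f: Sequence[float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """v = w1*Mc + w2*R + w3*F after standardising each indicator."""
    w1, w2, w3 = weights
    return [
        w1 * a + w2 * b + w3 * c
        for a, b, c in zip(min_max(mc), min_max(r), min_max(f))
    ]
```

The weights are left as parameters because the text does not state how ω_1, ω_2 and ω_3 are chosen.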

MRE model
Passenger co-occurrence relationships include the explicit co-occurrence relationship within the same order and the implicit relationship between passengers in different orders. The multi-relational evaluation (MRE) model combines order data and departure data, quantifies the explicit and implicit relationships between passengers and incorporates time to produce a comprehensive multi-relational evaluation.
(1) The same order co-occurrence relationship.
The same order co-occurrence relationship refers to the relationship between passengers in the same order. Its attributes include the number of passengers in the order, the difference between the passengers' classes and the order generation date. The same order relationships of all passengers are established from PNR data, and P_ij denotes the sequence of same order relationships between passenger i and passenger j.
P_ij[k] is the kth record in the sequence, which holds the data from the kth order shared by passenger i and passenger j, where s[k] is the number of passengers in the order, t_p[k] is the order generation date and c_i[k] is the class of passenger i in the order (corresponding to the fare discount).
The same order relationship score is calculated from this sequence. P'_ij denotes the total score of the same order relationship between passenger i and passenger j; see formula (6). In formula (6), s_p[k] is the score of the kth same order between passenger i and passenger j.

Data Mining
(2) Company relationship: the author defines the company relationship as the relationship between passengers who travel on the same flight, which includes both coincidental company and appointed company. A company relationship record includes the date of flight departure, passenger seat distance, check-in sequence number distance, class rank difference and other attributes.
The company relationships of all passengers are established from the departure data, and D_ij denotes the sequence of company relationships between passenger i and passenger j.
D_ij[k] is the kth record in the sequence, which represents the data of the kth flight that passenger i and passenger j took together. Here t_d[k] is the flight departure date, d_ci[k] is the check-in sequence number distance between passenger i and passenger j, d_seat[k] is the Euclidean distance between their seats and d_class[k] is the class difference between them. The passenger-company relationship score is calculated from the processed sequence, where D'_ij denotes the total company relationship score of passenger i and passenger j; see formulas (7) and (8). In formulas (7) and (8), ω_1, ω_2 and ω_3 are the impact factors of check-in sequence number distance, seat distance and class difference on the passenger-company relationship score, and s_d[k] is the kth company relationship score between passenger i and passenger j.
(3) Time-involved multi-relational comprehensive evaluation.
Passenger value is distributed unevenly according to edge weight. Accurate calculation of the edge weight directly affects the resulting passenger value, because the closer the passenger relationship is, the higher the value distributed. The RFM model predicts the possibility of a customer repurchasing on the basis of the recency R of the customer's consumption. Similarly, we consider that the relationship between civil aviation passengers also depends on time: passengers who have flown together in the last few days are more likely to travel together again and have a closer relationship. In contrast, if two passengers have travelled together many times but have no record of company in the past 2 years, we have to consider whether the relationship has disappeared. For these reasons, we set an observation time window for the passenger relationship and introduce a time attenuation factor τ to make the relationship time-aware. Let t be the time of the last same order (or same flight) of passenger i and passenger j; the time attenuation factor τ of the same order (or company) relationship between passenger i and passenger j is given by formula (9), where T − t' is the length of the observation time window, T is the end time of the time window and t' is the beginning time of the time window. If t ≤ t', the passengers have no same order (or company) relationship in the observation time window; the relationship is then considered to have disappeared, and τ = 0. After introducing the time attenuation factor, the same order relationship score is given by formula (10) and the passenger-company relationship score by formula (11), where τ_Pij is the time attenuation factor of the same order relationship between passenger i and passenger j and τ_Dij is the time attenuation factor of the company relationship between passenger i and passenger j.
The company relationship scores and the same order relationship scores are standardised and then combined in a weighted sum to give the total passenger relationship score; see formula (12), where W_ij represents the total score of the relationship between passenger i and passenger j, and ω_p and ω_d are the same order relationship weight and the company relationship weight, respectively, with ω_p < ω_d.
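The time-aware scoring can be sketched as follows. The formulas themselves are not reproduced in the text, so the forms used here are assumptions consistent with the description: τ rises linearly from 0 at the start of the observation window to 1 at its end, each relationship score is the sum of its per-record scores multiplied by its τ, and the total score is a weighted sum with the same order weight smaller than the company weight. Times are plain numbers (for example day counts), the per-record scores are assumed to be already standardised, and all names and default weights are hypothetical.

```python
from typing import Sequence


def time_attenuation(t_last: float, window_start: float, window_end: float) -> float:
    """tau: 0 if the last co-occurrence is not inside the window, else linear in time."""
    if window_end <= window_start:
        raise ValueError("window_end must be later than window_start")
    if t_last <= window_start:
        return 0.0  # relationship considered to have disappeared
    return min(1.0, (t_last - window_start) / (window_end - window_start))


def relationship_score(
    order_scores: Sequence[float],
    last_order_time: float,
    company_scores: Sequence[float],
    last_flight_time: float,
    window_start: float,
    window_end: float,
    w_order: float = 0.4,
    w_company: float = 0.6,
) -> float:
    """W_ij: weighted sum of the attenuated same order and company scores."""
    p = time_attenuation(last_order_time, window_start, window_end) * sum(order_scores)
    d = time_attenuation(last_flight_time, window_start, window_end) * sum(company_scores)
    return w_order * p + w_company * d
```

With a window from day 0 to day 100, a pair whose last shared order was on day 100 keeps its full same order score, while a last shared flight on day 50 contributes half of its company score.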

CLV prediction accuracy
Fit is the criterion suggested in the data-mining literature [39-41] for problems where the primary objective is making predictions that are as accurate as possible. As measures of prediction accuracy, Glady et al. [42] used the mean absolute error (MAE) and root mean square error (RMSE) between the actual and forecast customer lifetime value. A 1% trimming can be applied to the MAE and RMSE to make them more robust to potential outliers in the data set.
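A minimal sketch of the two accuracy measures with trimming is given below. The text does not say how the 1% trimming is applied; the sketch assumes that the largest 1% of absolute errors are discarded before averaging. The function name and signature are illustrative.

```python
import math
from typing import Sequence, Tuple


def trimmed_mae_rmse(
    actual: Sequence[float], predicted: Sequence[float], trim: float = 0.01
) -> Tuple[float, float]:
    """Return (MAE, RMSE) after dropping the largest `trim` share of absolute errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sorted(abs(a - p) for a, p in zip(actual, predicted))
    n_drop = int(len(errors) * trim)  # number of largest errors to discard
    kept = errors[: len(errors) - n_drop]
    mae = sum(kept) / len(kept)
    rmse = math.sqrt(sum(e * e for e in kept) / len(kept))
    return mae, rmse
```

Setting `trim=0.0` gives the untrimmed MAE and RMSE, which makes the effect of a single extreme forecast error easy to compare.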

Passenger network value assessment model

4.1. Model description
Based on dimensions such as flying frequency, discount level, amount level, total flight mileage and number of international flights in the past year, TravelSky makes a comprehensive assessment of the value of passengers every month and forms a value score on a scale of 0-100, which is called the TravelSky Value Score. The passenger network value assessment is based on the internal data of China Eastern Airlines: it uses the personal attributes of the airline's frequent flyers and their behaviour within the airline's internal flight network to estimate the TravelSky Value Score with an advanced machine learning model. Fitting an XGBoost model to the TravelSky Value Score yields a high fitting accuracy, which helps the airline to evaluate passenger network value in a timely and cost-effective way and to provide follow-up passenger segmentation and precision marketing services.

Data collection: frequent airline passenger portrait
First, collect data in different formats from relational databases, log systems, file systems, documents, pictures, video, voice and other sources; analyse and identify the data. Then, focus on the business to identify and understand the information in the data. After that, extract the valuable data and fuse it into the data platform. The frequent airline passenger portrait involves more than 300 variables covering booking, flight, consumption, journal, e-commerce, add-ons and co-branded cards.

The inputs of the model
The input of the model is the relevant data or information supplied for computer processing. More specifically, when the model is applied, the input consists of data on people and their behavioural characteristics. In the case of China Eastern Airlines, the inputs of the passenger network value assessment model include 300+ variables, such as the member's current level, the highest consumption points in the last 3 months, the average delay time and the number of ticket changes and endorsements. In general, however, the 300+ variables can be categorised into booking, flight, consumption, journal, e-commerce, add-ons and co-branded cards.

The outputs of the model
The output of the CEA passenger network value assessment model is the estimated CEA passenger value score.

The mechanism of the model
XGBoost is adopted as the mechanism in the paper.

(1) The introduction of XGBoost
XGBoost is a scalable machine learning system for tree boosting. The system is available as an open-source package.
The most prominent feature of XGBoost is that it can automatically use the CPU's multithreading for parallel computation while improving the algorithm to enhance accuracy. It made its debut in the Kaggle Higgs Boson Machine Learning Challenge, where its superior efficiency and high predictive accuracy caught the attention of contestants in the competition forum.
(2) The objective function of the optimisation model is

$$\mathrm{Obj}(\theta) = L(\theta) + \Omega(\theta)$$

where L(θ) is the error function, which measures how well the model fits the data, and Ω(θ) is the regularisation term, which penalises complex models [43].
The error function encourages the model to fit the training data, while the regularisation term favours simpler models. When the model is simple, the fit obtained from finite data is less affected by randomness and less prone to overfitting, which makes the predictions of the final model more stable.
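The trade-off between the two terms can be made concrete with a small sketch. It assumes a squared-error loss and the tree complexity penalty commonly used in XGBoost, γ times the number of leaves plus λ/2 times the sum of squared leaf weights; the paper does not state which loss or penalty settings were used, so these choices and all names are illustrative. Each tree is represented only by its list of leaf weights.

```python
from typing import Sequence


def squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Error term L: how far the predictions are from the targets."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def tree_complexity(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Regularisation term for one tree: penalise many leaves and large weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    trees: Sequence[Sequence[float]],
    gamma: float = 0.0,
    lam: float = 1.0,
) -> float:
    """Objective = error term + sum of per-tree complexity penalties."""
    penalty = sum(tree_complexity(tree, gamma, lam) for tree in trees)
    return squared_error(y_true, y_pred) + penalty
```

Two models with the same training error are ranked by the penalty alone, so the one with fewer leaves or smaller leaf weights has the lower objective.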

The optimisation objective function in this case is
In this function, ŷ_i is the estimated passenger network value score and y_i is the TravelSky value score.

Conclusion
In this paper, the author first described the definition of customer lifetime value (CLV) and demonstrated approaches to estimating it by presenting various customer lifetime value models and the criteria for assessing the accuracy of CLV predictions. The aim is to provide a theoretical basis for research on airline customer lifetime value estimation. After that, a numerical case of China Eastern Airlines was given to show the practicability and accuracy of the China Eastern Airlines passenger network value assessment model by assessing how accurately it fits the TravelSky value score. The further ambition is to combine the forecast value score calculated by the China Eastern Airlines passenger network value assessment model with the loss model score to select the key population.

4.5. Model evaluation report: TravelSky value fit report
Using more than 300 features of the CEA loss model and the data of 240,000 passengers from the loss model, an XGBoost model is fitted to the TravelSky value score [44,45].

4.5.1. Cross-contrast the TravelSky value score and the forecast value
Cross-contrast the TravelSky value score with the forecast value, visualise the data and present it in chart form. Pivot table: the horizontal axis represents the 10-point range in which the value score fitted with CEA data is located. For example, 1 indicates [0, 10], 2 indicates [11, 20], and similarly, 10 indicates [91, 100]. The vertical axis represents the 10-point interval in which the TravelSky value score is located.
Observed separately, both sets of scores are concentrated in the high segment. In particular, 63.13% of the passengers have a TravelSky Value Score of 100.
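The cross-contrast table described above can be sketched as a 10 x 10 count matrix. The binning follows the stated convention (1 for scores up to 10, 2 for scores above 10 and up to 20, and so on up to 10); rows index the TravelSky score bin and columns the fitted score bin. The function names are hypothetical, and the example scores in the usage note are made up for illustration rather than taken from the report.

```python
import math
from collections import Counter
from typing import List, Sequence


def score_bin(score: float) -> int:
    """Map a 0-100 score to its 10-point bin: 1 for [0, 10], ..., 10 for (90, 100]."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    return max(1, math.ceil(score / 10))


def cross_table(
    travelsky_scores: Sequence[float], fitted_scores: Sequence[float]
) -> List[List[int]]:
    """Count passengers per (TravelSky bin, fitted bin); rows and columns are bins 1..10."""
    counts = Counter(
        (score_bin(actual), score_bin(fitted))
        for actual, fitted in zip(travelsky_scores, fitted_scores)
    )
    return [[counts[(row, col)] for col in range(1, 11)] for row in range(1, 11)]
```

For instance, `cross_table([100, 95, 15], [98, 85, 20])` places the first passenger on the diagonal of the top bin, the second one column to its left, and the third on the diagonal of bin 2. A well-fitting model concentrates the counts on or near the diagonal.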
Given the accuracy of the passenger network value assessment model, combined with the prediction of passenger loss probability in the next 6 months, priority should be given to reaching passenger groups with 'high network value and high risk of loss in the next 6 months'. This helps marketing to target accurately. The resulting model is fitted to the entire dataset, and the relative importance of each variable can be viewed with importance_xgb() or, more simply, summary() as above.

4.6.2. Loss model score combines the forecast value score to select the key population
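One simple way to combine the two scores is sketched below: keep the passengers whose forecast value score and predicted loss probability both exceed a threshold, then rank them by loss risk. The thresholds (80 points and 0.5), the record layout and the function name are hypothetical; the paper does not state how 'high network value' and 'high risk of loss' are cut off.

```python
from typing import Iterable, List, Tuple


def select_key_population(
    passengers: Iterable[Tuple[str, float, float]],
    value_threshold: float = 80.0,
    loss_threshold: float = 0.5,
) -> List[str]:
    """Return ids of passengers with high forecast value and high 6-month loss risk.

    Each record is (passenger_id, forecast_value_score, loss_probability).
    """
    selected = [
        p for p in passengers
        if p[1] >= value_threshold and p[2] >= loss_threshold
    ]
    # Contact the passengers most likely to be lost first; break ties by value.
    selected.sort(key=lambda p: (-p[2], -p[1]))
    return [p[0] for p in selected]
```

In practice the thresholds would be tuned to the size of the marketing budget, since lowering either one enlarges the target group.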