## 1. Introduction

Net Promoter Score (NPS) is a popular metric used in a variety of industries for measuring customer advocacy. Introduced by Reichheld (2003), NPS measures the likelihood that an existing customer will recommend a company to another prospective customer. NPS is derived from a single question that may be included as part of a larger customer survey. The single question asks the customer to use a scale of 0 to 10 to rate their willingness and intention to recommend the company to another person. Ratings of 9 and 10 are used to characterize so-called ‘promoters,’ ratings of 0 through 6 characterize ‘detractors,’ and ratings of 7 and 8 characterize ‘passives.’ The NPS is calculated as the percentage of respondents that are promoters minus the percentage of respondents that are detractors.

The idea behind the labels given to customers is as follows. Promoters are thought to be extremely satisfied customers that see little to no room for improvement, and consequently would offer persuasive recommendations that could lead to new revenue. The passive ratings, on the other hand, begin to hint at room for improvement and consequently the effectiveness of a recommendation from a Passive may be muted by explicit or implied caveats. Ratings at the low end are thought to be associated with negative experiences that might cloud a recommendation and likely scare off prospective new customers. Additional discussion on the long history of NPS can be found in Hayes (2008).

Some implementations of NPS methodology use reduced 5-point or 7-point scales that align with traditional Likert scales. However it is implemented, the hope is that movements in NPS are positively correlated with revenue growth for the company. While Reichheld’s research presented some evidence of that, other findings are not as corroborative (Kenningham et al., 2007). Regardless of whether there is a predictive relationship between NPS and revenue growth, implementing policies and programs within a company that improve NPS is an intuitively sensible thing to do [see, for example, Vavra (1997)]. A difficult and important question, however, is how to identify key drivers of NPS. *Calculating* NPS alone does not do this.

This chapter is an illustrative tutorial that demonstrates how a statistical classification model can be used to identify key drivers of NPS. Our premise is that the classification model, the data it operates on, and the analyses it provides could usefully form components of a Decision Support System that can not only provide both snapshot and longitudinal analyses of NPS performance, but also enable analyses that can help suggest company initiatives aimed toward lifting the NPS.

We assume that the NPS question was asked as part of larger survey that also probed customer satisfaction levels with respect to various dimensions of the company’s services. We develop a predictive classification model for customer advocacy (promoter, passive or detractor) as a function of these service dimensions. A novelty associated with our classification model is the optional use of constraints on the parameter estimates to enforce a monotonic property. We provide a detailed explanation of how to fit the model using the SAS software package and show how the fitted model can be used to develop company policies that have promise for improving the NPS. Our primary objective is to teach an interested practitioner how to use customer survey data together with a statistical classifier to identify key drivers of NPS. We present a case study that is based on a real-life data collection and analysis project to illustrate the step-by-step process of building the linkage between customer satisfaction data and NPS.

## 2. Logistic regression

In this section we provide a brief review of logistic and multinomial regression. Allen and Rao (2000) is a good reference that contains more detail than we provide, and additionally has example applications pertaining to customer satisfaction modeling.

### 2.1. Binomial logistic regression

The binomial logistic regression model assumes that the response variable is binary (0/1). This could be the case, for example, if a customer is simply asked the question “Would you recommend us to a friend?” Let
*n* customers, assigning a “1” for Yes and “0” for No. Suppose a number of other data items (covariates) are polled from the customer on the same survey instrument. These items might measure the satisfaction of the customer across a wide variety of service dimensions and might be measured on a traditional Likert scale. We let
*i*-th sampled customer and note that it reflects the use of dummy variable coding for covariates that are categorical scale. For example, if the first covariate is measured on a 5-point Likert scale, its value is encoded into
*j*.

The binomial logistic regression model posits that

Model fitting for the binomial logistic regression model entails estimating the parameters

and the maximum likelihood estimate (MLE) of

### 2.2. Multinomial logistic regression

Suppose now that the outcome variable has more than two categories. For example, suppose the responses
*each* response as a function of the covariates. Since each response is an ordinal categorical variable taking on values in the set

where
*j*) of the right hand sides of the link functions, the constraint

and the MLE of

## 3. Case study

Carestream Health, Inc. (CSH) was formed in 2007 when Onex Corporation of Toronto, Canada purchased Eastman Kodak Company’s Health Group and renamed the business as Carestream Health. They are an annual $2.5B company and a world leader in medical imaging (digital and film), healthcare information systems, dental imaging and dental practice management software, molecular imaging and non-destructive testing. Their customers include medical and dental doctors and staff and healthcare IT professionals in small offices and clinics to large hospitals and regional and national healthcare programs. A major company initiative is to create a sustainable competitive advantage by delivering the absolute best customer experience in the industry. Customer recommendations are key to growth in the digital medical space and no one has been able to do it consistently well. The foundation for taking advantage of this opportunity is to understand what's important to customers, measure their satisfaction and likelihood to recommend based on their experiences, and drive improvement.

While descriptive statistics such as trend charts, bar charts, averages and listings of customer verbatim comments are helpful in identifying improvement opportunities to improve the Net Promoter Score (NPS), they are limited in their power. First, they lack quantitative measurements of correlation between elements of event satisfaction and NPS. As a consequence, it is not clear what impact a given process improvement will have on a customer’s likelihood to recommend. Second, they lack the ability to view multi-dimensional relationships – they are limited to single factor inferences which may not sufficiently describe the complex relationships between elements of a customer’s experience and their likelihood to recommend.

This section summarizes the use of multinomial logistic regression analyses that were applied to 5056 independent customer experience surveys from Jan 2009 – Jan 2010. Each survey included a question that measured (on a 5-point Likert scale) how likely it would be for the customer to recommend colleagues to purchase imaging solutions from CSH. Five other questions measured the satisfaction level (on a 7-point Likert scale) of the customer with CSH services obtained in response to an equipment or software problem. Key NPS drivers are revealed through the multinomial logistic regression analyses, and improvement scenarios for specific geographic and business combinations are mapped out. The ability to develop a quantitative model to measure the impact on NPS of potential process improvements significantly enhances the value of the survey data.

### 3.1. CSH Customer survey data

The 5-point Likert response to the question about willingness to recommend is summarized in Table 1 below. CSH calculates a unique net promoter score from responses on this variable using the formula
*i*. Two interesting characteristics of the weight vector are, first, the penalty for a 1 (2) exceeds the benefit of a 5 (4), and second, the negative weight for a neutral score is meant to drive policies toward delighting customers.

Let
*Y*. That is,
*Y* such that their influence on the values

The demographic covariates include the (global) region code, country code, business code and the customer job title. The demographic covariates are coded using the standard dummy variable coding technique. For example, region code utilizes 7 binary variables

Country code utilizes similar dummy variables, but because countries are nested within regions we use the notation

In the data set we have

Finally, job title utilizes 10 dummy variables

The customer satisfaction covariates are also coded using the dummy variable scheme. The data on these covariates are the responses to the survey questions identified as q79, q82a, q82b, q82d and q82f. These questions survey the customer satisfaction on ‘Overall satisfaction with the service event,’ ‘Satisfaction with CSH knowledge of customer business and operations,’ ‘Satisfaction with meeting customer service response time requirements,’ ‘Satisfaction with overall service communications,’ and ‘Satisfaction with skills of CSH employees,’ respectively.

Survey questions q82c and q82e, which survey satisfaction with ‘Time it took to resolve the problem once work was started,’ and ‘Attitude of CSH employees’ were also considered as covariates, but they did not show themselves to be statistically significant in the model. Their absence from the model does not necessarily imply they are not important drivers of their overall satisfaction with CSH, but more likely that their influence is correlated with the other dimensions of overall satisfaction that are in the model. Each customer satisfaction covariate is scored by customers using a 7-point Likert scale (where ‘1’ indicates the customer is “extremely dissatisfied” and ‘7’ indicates “extremely satisfied”), and thus each utilizes 7 dummy variables in the coding scheme. We denote these dummy variables as

Assembling all of the covariates together, we then have a total of 77 covariates in

### 3.2. Model fitting and interpretation

The SAS code for obtaining maximum likelihood estimates (MLEs) for the model parameters

The section of the PROC LOGISTIC output entitled ‘Type-3 Analysis of Effects’ characterizes the statistical significance of the covariates through p-values obtained by referencing a Wald chi-square test statistic to a corresponding null chi-square distribution. Table 3 shows the chi-square tests and the corresponding p-values, and it is seen that all covariate groups are highly significant contributors in the model.

One way to assess model adequacy for multinomial logistic regression is to use the model to predict Y and then examine how well the predicted values match the true values of Y. Since the output of the model for each customer is an estimated probability distribution for Y, a natural predictor of Y is the mode of this distribution. We note that this predictor considers equal cost for all forms of prediction errors. More elaborate predictors could be derived by assuming a more complex cost model where, for example, the cost of predicting 5 when the actual value is 1 is higher than the cost of predicting 5 when the actual value is 4. Table 4, the so-called confusion matrix of the predictions, displays the cross classification of all 5056 customers based on their actual value of Y and the model-predicted value of Y.

Covariate Group | Degrees of Freedom | Wald Statistic | p-value |

RC | 6 | 41.2 | < .0001 |

CC | 16 | 40.9 | < .01 |

BC | 1 | 7.9 | < .01 |

JT | 9 | 43.7 | < .0001 |

q79 | 6 | 84.8 | < .0001 |

q82a | 6 | 56.5 | < .0001 |

q82b | 6 | 34.4 | < .0001 |

q82d | 6 | 34.8 | < .0001 |

q82f | 6 | 39.9 | < .0001 |

Actual Y | Predicted Y | Total | ||||

1 | 2 | 3 | 4 | 5 | ||

1 | 3 | 2 | 7 | 4 | 4 | 20 |

2 | 3 | 8 | 48 | 22 | 7 | 88 |

3 | 2 | 3 | 342 | 486 | 133 | 966 |

4 | 0 | 0 | 126 | 1233 | 723 | 2082 |

5 | 0 | 0 | 39 | 705 | 1156 | 1900 |

Total | 8 | 13 | 562 | 2450 | 2023 | 5056 |

A perfect model would have a confusion matrix that is diagonal indicating the predicted value for each customer coincided identically with the true value. Consider the rows of Table 4 corresponding to Y=4 and Y=5. These two rows account for almost 80% of the customers in the sample. It can be seen that in both cases, the predicted value coincides with the actual value about 60% of the time. Neither of these two cases predicts Y=1 or Y=2, and only 4% of the time is Y=3 predicted. The mean values of the predicted Y when Y=4 and Y=5 are 4.28 and 4.59, respectively. The 7% positive bias for the case Y=4 is roughly offset by the 11.8% negative bias for the case Y=5.

Looking at the row of Table 4 corresponding to Y=3, we see that 86% of the time the predicted Y is within 1 of the actual Y. The mean value of the predicted Y is 3.77, indicating a 26% positive bias. Considering the rows corresponding to Y=1 and Y=2, where only about 2% of the customers reside, we see the model struggles to make accurate predictions, often over-estimating the actual value of Y. A hint as to the explanation for the noticeable over-estimation associated with the Y=1, Y=2 and Y=3 customers is revealed by examining their responses to the covariate questions. As just one example, the respective mean scores on question q79 (“Overall satisfaction with the service event”) are 3.8, 4.1 and 5.2. It seems a relatively large number of customers that give a low response to Y are inclined to simultaneously give favorable responses to the covariate questions on the survey. Although this might be unexpected, it can possibly be explained by the fact that the covariate questions are relevant to the most recent service event whereas Y is based on a customer’s cumulative experience.

Overall, Table 4 reflects significant lift afforded by the multinomial logistic regression model for predicting Y. For example, a model that utilized no covariate information would have a confusion matrix whose rows were constant, summing to the row total. In sum, we feel the accuracy of the model is sufficient to learn something about what drives customers to give high responses to Y, though perhaps not sufficient to learn as much about what drives customers to give low responses to Y.

Figure 1 is a graphical display of the slopes for each of the customer satisfaction covariates. The larger the coefficient value, the more detrimental the response level is to NPS. The y-axis is therefore labeled as ‘demerits.’

In view of the ordinal nature of the customer satisfaction covariates, the slopes, which represent the effect of the Likert scale levels, should decrease monotonically. That is, the penalty for a ‘satisfied’ covariate value should be less than or equal to that of a ‘dissatisfied’ covariate value. As such, it would be logical to have the estimated values of the slopes display the monotone decreasing trend as the response level of the covariates ascends. Figure 1 shows that the unconstrained MLEs for the slopes associated with the customer satisfaction covariates nearly satisfy the desired monotone property, but not exactly. The aberrations are due to data deficiencies or minor model inadequacies and can be resolved by using a constrained logistic regression model introduced in the next section.

### 3.3. Constrained logistic regression

Consider the situation where the *i*-th covariate is ordinal in nature, perhaps because it is measured on a *k*-point Likert scale. The CSH data is a good illustration of this situation, since all the customer satisfaction covariates are ordinal variables measured on 7-point Likert scale. Let the corresponding group of *k* slopes for this covariate be denoted by

In order to simplify our use of PROC NLP, it is convenient to work with a full-rank parameterization of the logistic regression model. Because countries are nested within regions, a linear dependency exists between the dummy variables corresponding to regions and countries within regions. We can eliminate the linear dependency by removing region from the model and specifying country to be non-nested factor. The result of this model reparameterization is that instead of 6 degrees of freedom in the model for regions and 16 degrees of freedom for countries nested within regions, we equivalently have 22 degrees of freedom for countries. For the same purpose, we also redefine the dummy variable coding used for other categorical and ordinal covariates by using a full rank parameterization scheme. In particular, we use *k*-1 dummy variables (rather than *k*) to represent a *k*-level categorical or ordinal variable. With the full rank parameterization, the highest level of customer satisfaction has a slope parameter that is fixed to be 0. Lines 3-10 in the SAS code shown in Appendix B are used to set up the full rank parameterization of the logistic regression model.

Beginning with line 12 in the SAS code, PROC NLP is used to derive the MLEs of the parameters under the constrained parameter space. The ‘max’ statement (line 13) indicates the objective function is the log-likelihood function of the model and that it is to be maximized. The maximization is carried out using a Newton-Raphson algorithm, and the ‘parms’ statement (line 14) specifies initial values for the intercept and slope parameters. The SAS variables bq*j*, ba*j,* bb*j,* bd*j* and bf*j* are used to symbolize the slope parameters corresponding to the *j*-th response level of the customer satisfaction covariates q79, q82a, q82b, q82d and q82f. Similarly, bcc*j*, bbc*j*, and b*j*
*j* are used to denote the slopes associated with different countries, business codes and job titles. The ‘bounds’ and ‘lincon’ statements (lines 15-21) jointly specify the monotone constraints associated with the intercept parameters and the slopes of the customer satisfaction covariates. Lines 22-29 define the log likelihood for each customer which, for the *i*-th customer, is given by

Table 5 provides a side-by-side comparison of the constrained and unconstrained MLEs for the slopes of the customer satisfaction covariates, and Figure 2 is a plot that shows the monotone behavior of the constrained estimates. There is very little difference between the unconstrained and constrained MLEs for the demographic covariates. Recall that for the unconstrained MLEs, the zero for the slope of the last level of each covariate is a structural zero resulting from the non-full rank dummy variable coding used when fitting the model. In the case of the constrained MLEs, the slopes of the last levels of the covariates are implied zeros resulting from the full-rank dummy variable coding used when fitting the model. Table 5 shows that incorporating the constraints do not lead to a substantial change in the estimated slopes. In an indirect way, this provides a sanity check of the proposed model. We will use the constrained estimates for the remainder of the case study.

### 3.4. Model utility

A purely empirical way to compute NPS is to use the observed distribution (based on all 5,056 survey responses) of *Y* for

We illustrated potential pathways to improve the overall NPS score, but this can also be done with specific sub-populations in mind. For example, if the first region was under study, then one could simply adjust the demographic covariates as illustrated in section 3.4.2 before implementing scenarios adjustments.

## 4. Discussion

Alternative measures to NPS of customer advocacy include customer satisfaction (CSAT) and Customer Effort Score (CES) (Dixon et al., 2010). CES is measured on a 5-point scale and is intended to capture the effort required by a customer to resolve an issue through a contact-center or self-service channel. (Dixon et al., 2010) compared the predictive power of CSAT, NPS and CES on service customers' intention to do repeat business, increase their spending, and speak positively about the company. They concluded that CSAT was a relatively poor predictor, while CES was the strongest. NPS ranked in the middle.

The choice of which customer advocacy measure to use depends on many factors such as the type of company-to-customer relationship, the degree to which recommendations (for or against a company) influence a purchase decision, and whether the measures will be complemented by other customer feedback. To gain an in-depth understanding of customers' experiences and how to improve them may require multiple indicators. In the end, it is the action taken to drive improvements that customers value that is most critical.

Our case study validates the feasibility for using a multinomial logistic regression model as a means to identify key drivers of NPS, though it is clear that the same methodology could be employed with alternative measures of customer advocacy. Improvement teams at CSH have used this model to prioritize projects relative to their expected impacts on NPS. A novel aspect of our model development was the implementation of monotone constraints on the slope parameters of the ordinal covariates. Our illustrative SAS code showing how to impose the constraints on the maximum likelihood estimates should be of significant help to practitioners interested in doing the same thing.

## 5. Appendix A

## 6. Appendix B

data indata;

set indata;

array cc{23} cc1-cc23; do i=1 to 23; if CC=i then cc{i}=1; else cc{i}=0;end;

if BC=1 then bc1=1;else bc1=0;

array jt{9} jt1-jt9; do i=1 to 9; if JT=i then jt{i}=1; else jt{i}=0;end;

array q{6} q1-q6; do i=1 to 6; if q79=i then q{i}=1; else q{i}=0;end;

array a{6} a1-a6; do i=1 to 6; if q82a=i then a{i}=1; else a{i}=0;end;

array b{6} b1-b6; do i=1 to 6; if q82b=i then b{i}=1; else b{i}=0;end;

array d{6} d1-d6; do i=1 to 6; if q82d=i then d{i}=1; else d{i}=0;end;

array f{6} f1-f6; do i=1 to 6; if q82f=i then f{i}=1; else f{i}=0;end;

run;

proc nlp data=indata;

max loglik;

parms alp1=-7, alp2=-5, alp3=-2, alp4=-1, bcc1-bcc10=0, bcc12-bcc23=0, bbc1=0, bj1-bj9=0, bq1-bq6=0, ba1-ba6=0, bb1-bb6=0,bd1-bd6=0, bf1-bf6=0;

bounds 0 <= bq6,0 <= ba6,0 <= bb6,0 <= bd6,0 <= bf6;

lincon 0<=alp4-alp3, 0<=alp3-alp2, 0<=alp2-alp1;

lincon 0<= bq5-bq6, 0<= bq4-bq5, 0<= bq3-bq4, 0<= bq2-bq3, 0<= bq1-bq2;

lincon 0<= ba5-ba6, 0<= ba4-ba5, 0<= ba3-ba4, 0<= ba2-ba3, 0<= ba1-ba2;

lincon 0<= bb5-bb6, 0<= bb4-bb5, 0<= bb3-bb4, 0<= bb2-bb3, 0<= bb1-bb2;

lincon 0<= bd5-bd6, 0<= bd4-bd5, 0<= bd3-bd4, 0<= bd2-bd3, 0<= bd1-bd2;

lincon 0<= bf5-bf6, 0<= bf4-bf5, 0<= bf3-bf4, 0<= bf2-bf3, 0<= bf1-bf2;

tp=cc1*bcc1+cc2*bcc2+cc3*bcc3+cc4*bcc4+cc5*bcc5+cc6*bcc6+cc7*bcc7+cc8*bcc8+cc9*bcc9+cc10*bcc10+cc12*bcc12+cc13*bcc13+cc14*bcc14+cc15*bcc15+cc16*bcc16+cc17*bcc17+cc18*bcc18+cc19*bcc19+cc20*bcc20+cc21*bcc21+cc22*bcc22+cc23*bcc23+bc1*bbc1+jt1*bj1+jt2*bj2+jt3*bj3+jt4*bj4+jt5*bj5+jt6*bj6+jt7*bj7+jt8*bj8+jt9*bj9+q1*bq1+q2*bq2+q3*bq3+q4*bq4+q5*bq5+q6*bq6+a1*ba1+a2*ba2+a3*ba3+a4*ba4+a5*ba5+a6*ba6+b1*bb1+b2*bb2+b3*bb3+b4*bb4+b5*bb5+b6*bb6+d1*bd1+d2*bd2+d3*bd3+d4*bd4+d5*bd5+d6*bd6+f1*bf1++f2*bf2+f3*bf3+f4*bf4+f5*bf5+f6*bf6;

pi1=exp(alp1+tp)/(1+exp(alp1+tp));pi2=exp(alp2+tp)/(1+exp(alp2+tp));

pi3=exp(alp3+tp)/(1+exp(alp3+tp));pi4=exp(alp4+tp)/(1+exp(alp4+tp));

if Y=1 then loglik=log(pi1);

if Y=2 then loglik=log(pi2-pi1);

if Y=3 then loglik=log(pi3-pi2);

if Y=4 then loglik=log(pi4-pi3);

if Y=5 then loglik=log(1-pi4);

- run;