Customer churn

Telecom companies are very interested in when customers leave their company (this is called churn). A dataset made publicly available by IBM on customer churn from a telecom company is available in the BDgraph R package and are called churn.

Construct a Kaplan-Meier estimate of the survival function for time-to-churn and plot the estimate of the survival function.

The values of a survival function \(\hat{S}(t)\) can be estimated from: \[\hat{S}(t_j) = \prod_{k=1}^j \left( 1 - \frac{d_k}{n_k} \right)\]

Where,

The calculations can be done using a specified R package (e.g. survival) or manually.

par(mfrow=c(1, 2))

## Manual Approach
t <- d$Account.Length # time to event
ev <- d$ev            # event

es <- ev[order(t)]
ts <- t[order(t)]

di <- table(ts[es==1])
ni <- sapply(unique(ts[es==1]), function(x) sum(ts>=x))
lambda <- di/ni
S <- cumprod(1-lambda)

plot(as.integer(names(S)), S, type="s", lwd=3, col="red", 
     xlab="Time [days]", ylab="Survival Rate", 
     ylim=c(0, 1), xlim=c(0, 250))
legend("bottomleft", "Manual Computation", lwd=3, col="red", cex=.7)
mtext("Kaplan-Meier Survival Function Estimate", cex=.7)

## survival package
fit1 <- survfit(Surv(time = d$Account.Length, event=ev)~1)
plot(fit1, lwd=1, col="blue",
     xlab="Time [days]", ylab="Survival Rate")
legend("bottomleft", c("survival package", "95% CI"), lty=c(1, 3), 
       col="blue", cex=.7)
mtext("Kaplan-Meier Survival Function Estimate", cex=.7)

Kaplan-Meier Survival Function Estimates

Estimate the median time-to-churn

The median time-to-churn is the time when the value of the survival function is equal to 0.5.

# Manual
median.idx <- which.min(abs(S - 0.5))
print(names(median.idx))

## [1] "201"

# survival package
print(fit1)

## Call: survfit(formula = Surv(time = d$Account.Length, event = ev) ~ 
##     1)
## 
##       n  events  median 0.95LCL 0.95UCL 
##    3333     483     201     188      NA

We can get a more accurate result if we fit the model using one of the time-to-event probability density functions, and use the analytical solution to find the median value. To fit the survival model based on the Weibull distribution, we can use the survival package, or for educational reasons, we can try to replicate the algorithms manually. To do it, we need to construct the likelihood function where each of the observations contributes in the following ways, under different censoring conditions:

For brevity, let’s assume that the event at a certain time is fully observed (0), right censored (1) or left-censored (2). In that case, the log-likelihood of the j^th tuple \((time_j, event_j) = (t_j, y_j)\) can be expressed as:

\[ l_j = (a+by_j + cy_j^2)\log{f(t_j)} + (d+ey_j + fy_j^2)\log(1-F(t_j))+(g+hy_j+iy_j^2)\log{F(t_j)} \]

The coefficients \(a, b, c, d, e, f, g, h, i\) used to select the corresponding term can be calculated by solving:

\[ \left[\begin{matrix} a & b & c \\ d & e & f \\ g & h & i \end{matrix}\right] \times \left[\begin{matrix} 0^0 & 1^0 & 2^0 \\ 0^1 & 1^1 & 2^1 \\ 0^2 & 1^2 & 2^2 \end{matrix}\right]= \left[\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}\right] \]

The solution is:

\[ \left[\begin{matrix} a & b & c \\ d & e & f \\ g & h & i \end{matrix}\right] = \left[\begin{matrix} 1 & -1.5 & 0.5 \\ 0 & 2 & -1 \\ 0 & -0.5 & 0.5 \end{matrix}\right] \]

The log likelihood function can be implemented in the following way:

# Log-likelihood
llk <- function(theta) {
  alpha <- theta[1]
  beta <- theta[2]
  
  # fully observed
  f <- dweibull(t, alpha, beta, log=TRUE)
  # f <- log(alpha) - alpha*log(beta) + (alpha - 1)*log(t) - (t/beta)^alpha
    
  # right censored
  rc <- log(1 - pweibull(t, alpha, beta, log=FALSE))
  # rc <- -(t/beta)^alpha
  
  # left censored
  lc <- pweibull(t, alpha, beta, log=TRUE)
  # lc <- log( 1 - exp(-(t/beta)^alpha))
  
  sum( (1 - 1.5*ev + 0.5*ev^2)*rc + (2*ev-ev^2)*f + (-0.5*ev + 0.5*ev^2)*lc)
}

More computationally efficient version written in C++ using the Rcpp library:

double llkC(NumericVector theta, NumericVector time, NumericVector event)
{
  NumericVector params = theta;
  NumericVector t = time;
  NumericVector e = event;
  
  double alpha = params(0);
  double beta = params(1);
  
  int N = t.size();
  
  NumericVector f(N);
  NumericVector rc(N);
  NumericVector lc(N);
  
  NumericVector e_sq(N);
  NumericVector pow_t_beta(N);
  
  pow_t_beta = pow(t/beta, alpha);
  e_sq = pow(e, 2);

  f = log(alpha) - alpha*log(beta) + (alpha - 1)*log(t) - pow_t_beta;
  rc = -pow_t_beta;
  lc = log(1 - exp(-pow_t_beta));
  
  double llk = sum((1-1.5*e+0.5*e_sq)*rc + (2*e - e_sq)*f + (-0.5*e + 0.5*e_sq)*lc);

  return(llk);
}

The parameter estimates which maximise the log-likelihood of the observed data can be obtained from the optim function.

# alpha0 = 1, beta0 = 100
theta0 <- c(1, 100)

#optimise
fit <- optim(theta0, llkC.wrap, method="BFGS", control=list(fnscale=-1), hessian = TRUE)

alpha <- fit$par[1]
beta <- fit$par[2]

# median
m.churn <- beta * (log(2))^(1/alpha)
print(m.churn)

## [1] 198.6128

df.par <- cbind(fit$par, diag(sqrt(-1/fit$hessian)))
colnames(df.par) <- c("MLE", "sd")
rownames(df.par) <- c("$\\hat{\\alpha}$", "$\\hat{\\beta}$")
knitr::kable(df.par, digits=3, caption="Weibull model MLE estimates")

Weibull model MLE estimates
	MLE	sd
\(\hat{\alpha}\)	2.811	0.065
\(\hat{\beta}\)	226.268	3.662

The survival package includes the survreg function, which automates all the calculations. Its results are nearly identical:

sr.fit <- survreg(Surv(t, ev) ~ 1, dis="weibull")
sr.vec <- c(1/sr.fit$scale, exp(sr.fit$coefficients))
names(sr.vec) <- c("alpha", "beta")
sr.vec

##      alpha       beta 
##   2.811443 226.268415

Once we know the shape and rate parameters of the Weibull distribution, we can calculate the median from the closed-form equation:

\[\text{median} = \beta \log(2)^{1/\alpha} \approx \hat{\beta} \log(2)^{1/\hat{\alpha}} = 198.6128\]

Use the Cox proportianal hazards method to study the impact of the customer covariates on the time-to-churn and develop a parsimonious model.

The dataset contains 19 covariates. Three of them are factorised, and the rest are numerical. One of the factorised covariates (State) has 51 levels and can be removed if we consider the national scope. Let \(x_i = (x_{i1}, x_{i2}, \ldots, x_{i18})\) be the vector of covariates recorded for survival time \(t_i\). Then, the Cox Proportional Hazards Model defines the hazard function for the i^th observation as:

\[ \lambda(t_j|x_i) = \lambda_0(t) exp \left( \sum_{j=1}^{18}\beta_j x_{ij}\right) \] where \(\beta\) is a vector or regression parameters.

Function coxph from the survival package allows us to check, which covariates have a statistically significant effect on the hazard.

cx <- coxph(Surv(time=Account.Length, event=ev) ~ . - State - Churn, data=d)
summary(cx, conf.int=FALSE)

## Call:
## coxph(formula = Surv(time = Account.Length, event = ev) ~ . - 
##     State - Churn, data = d)
## 
##   n= 3333, number of events= 483 
## 
##                      coef  exp(coef)   se(coef)      z Pr(>|z|)    
## Area.Code       4.256e-04  1.000e+00  1.077e-03  0.395 0.692768    
## Int.l.Planyes   1.201e+00  3.323e+00  1.038e-01 11.573  < 2e-16 ***
## VMail.Planyes  -2.142e+00  1.174e-01  4.845e-01 -4.421 9.81e-06 ***
## VMail.Message   4.356e-02  1.045e+00  1.495e-02  2.914 0.003572 ** 
## Day.Mins       -9.006e-01  4.063e-01  2.667e+00 -0.338 0.735576    
## Day.Calls       1.323e-03  1.001e+00  2.242e-03  0.590 0.555070    
## Day.Charge      5.351e+00  2.108e+02  1.569e+01  0.341 0.732999    
## Eve.Mins        8.643e-01  2.373e+00  1.344e+00  0.643 0.520337    
## Eve.Calls       1.562e-03  1.002e+00  2.342e-03  0.667 0.504804    
## Eve.Charge     -1.012e+01  4.035e-05  1.582e+01 -0.640 0.522412    
## Night.Mins     -1.230e-02  9.878e-01  7.374e-01 -0.017 0.986693    
## Night.Calls     1.335e-03  1.001e+00  2.416e-03  0.552 0.580670    
## Night.Charge    3.415e-01  1.407e+00  1.639e+01  0.021 0.983371    
## Intl.Mins      -4.326e+00  1.322e-02  4.402e+00 -0.983 0.325763    
## Intl.Calls     -7.365e-02  9.290e-01  2.121e-02 -3.473 0.000514 ***
## Intl.Charge     1.622e+01  1.110e+07  1.630e+01  0.995 0.319730    
## CustServ.Calls  3.075e-01  1.360e+00  2.745e-02 11.201  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Concordance= 0.772  (se = 0.012 )
## Likelihood ratio test= 443.1  on 17 df,   p=<2e-16
## Wald test            = 468.6  on 17 df,   p=<2e-16
## Score (logrank) test = 498.6  on 17 df,   p=<2e-16

The only statistically significant results (at \(\alpha=0.05\)) are:

Cox Proportional Hazards. Statistically significant covariates.
	Description	exp(Coefficient)
Int.l.Planyes	Categorical, yes or no, international plan.	3.323
VMail.Planyes	Categorical, yes or no, voice mail plan.	0.117
VMail.Message	Count, number of voice mail messages.	1.045
Intl.Calls	Count, total number of international calls.	0.929
CustServ.Calls	Count, number of calls to customer service.	1.360

The parsimonious model is relying only on those 5 covariates.

fit.pars <- survreg(Surv(time=Account.Length, event=ev) ~ 
                      Int.l.Plan + VMail.Plan + VMail.Message + Intl.Calls + CustServ.Calls, 
                    data=d)
summary(fit.pars)

## 
## Call:
## survreg(formula = Surv(time = Account.Length, event = ev) ~ Int.l.Plan + 
##     VMail.Plan + VMail.Message + Intl.Calls + CustServ.Calls, 
##     data = d)
##                   Value Std. Error      z       p
## (Intercept)     5.55418    0.04889 113.61 < 2e-16
## Int.l.Planyes  -0.44955    0.03952 -11.38 < 2e-16
## VMail.Planyes   0.63196    0.16890   3.74 0.00018
## VMail.Message  -0.01287    0.00524  -2.45 0.01411
## Intl.Calls      0.02264    0.00742   3.05 0.00227
## CustServ.Calls -0.10816    0.01045 -10.35 < 2e-16
## Log(scale)     -1.03396    0.03577 -28.91 < 2e-16
## 
## Scale= 0.356 
## 
## Weibull distribution
## Loglik(model)= -3244   Loglik(intercept only)= -3383.9
##  Chisq= 279.77 on 5 degrees of freedom, p= 2.2e-58 
## Number of Newton-Raphson Iterations: 8 
## n= 3333

The baseline Weibull parameters of the parsimonious model are slightly different from the full model.

pars.vec <- c(1/fit.pars$scale, exp(fit.pars$coefficients[1]))
names(pars.vec) <- c("alpha", "beta")
pars.vec

##      alpha       beta 
##   2.812171 258.314657

Exponentiated coefficients either increase the hazard (if they are \(>1\)) or decrease it (if they are \(<1\)).

Summarize the results of your analysis

The Weibull distribution proved to be a good fit for the model. The scale parameter is equal \(\alpha = 2.811 \pm 0.065\) and is very unlikely to be \(1\). Hence, the Weibull distribution fits the data better than the exponential.

The median time-to-churn is 201 days. This period is slightly larger than 6 months (182 days). Considering the fact that most of the mobile network operators offer 12 or 24 months contract, it could mean that half of the customers will leave before it ends.

A phone I cannot afford

There are 5 covariates that can be used in the parsimonious model. International Plan increases the hazard of the customer leaving three-fold. It is an interesting observation because people who make more International Calls are more likely to stay longer. One explanation of this paradox may be that some customers go for more expensive plans (which include international calls), to get better phones for less. Later, they are either unable or unwilling to pay, and the contract is terminated more quickly.

The fact may partially support this hypothesis, that similar thing is happening with Voice Mail Plan and Voice Mail Message. Even though customers who have the voice mail plan activated are 88.3% less likely to go at any time, every message left in their voice mail increases the hazard by 4.5%. Voice mail is targeted towards people, who cannot answer the phone calls at any given moment, probably because they are busy making money. If a customer has many voice mail messages, it could mean that they have either shared the number with someone they would like to avoid, or they never intended to use that number. In the latter case, messages are most likely left by the debt collectors.

Four times a charm

Unsurprisingly, contacting the customer service increases the hazard of leaving by 36% for every phone call made. What is surprising is how many chances the customers give to the operators before they decide to go. Culturally dictated common sense would give the operator three opportunities before the issue is sorted. Still, here we see strong evidence that they give four chances before they decide to leave. After the fourth phone call to the customer service, every subsequent call increases the hazard of going five-fold.

d$cs <- d$CustServ.Calls > 4
coxph(Surv(t, ev) ~ d$cs)

## Call:
## coxph(formula = Surv(t, ev) ~ d$cs)
## 
##            coef exp(coef) se(coef)     z      p
## d$csTRUE 1.6135    5.0205   0.1365 11.82 <2e-16
## 
## Likelihood ratio test=95.03  on 1 df, p=< 2.2e-16
## n= 3333, number of events= 483

Customer churn

Pawel Piela

2020-08-02

A phone I cannot afford

Four times a charm