100 Celsius AI

Predictive customer analytics

Improving campaign ROI with uplift modeling

In our last blog post, we explained how to identify high-risk customers and the key drivers of churn. In this post, we want to motivate the need for uplift modeling and explain how it works.


Why uplift modeling

With the help of the churn-risk model, we get a reasonable initial target list for a retention campaign. If we run the campaign on the whole target list at once, however, we will find that only a subset of the customers respond well to the campaign, i.e. refrain from churning. Furthermore, we don't know which campaign options work best.

Fig. 1: Churn categories - the goal is to target only the salvageable customers and to avoid the sleeping dogs at all costs.


Uplift modeling tackles exactly this problem by identifying the subsegments for which the treatment (in this case, sending a specific campaign) causes a positive outcome that would not have occurred without it. Customers can be grouped into four categories regarding churn, as depicted in Fig. 1.
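
One way to read the four categories is as the four combinations of what a customer would do with and without the campaign. The following is a minimal sketch of that reading; the function name is hypothetical, and the labels are the ones used in this post.

```python
def churn_category(churns_if_contacted: bool, churns_if_not_contacted: bool) -> str:
    """Map a customer's two hypothetical outcomes to a churn category."""
    if churns_if_contacted and churns_if_not_contacted:
        return "lost cause"      # leaves regardless of the campaign
    if not churns_if_contacted and not churns_if_not_contacted:
        return "sure thing"      # stays regardless of the campaign
    if churns_if_not_contacted:
        return "salvageable"     # stays only because of the campaign
    return "sleeping dog"        # leaves only because of the campaign


for contacted in (False, True):
    for not_contacted in (False, True):
        print(contacted, not_contacted, churn_category(contacted, not_contacted))
```

In practice only one of the two outcomes is ever observed for a given customer, which is why the categories have to be estimated on segments rather than read off directly.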


Example: E-mail campaign of webshop

Let us study how we can optimize a campaign by using a publicly available dataset on email campaigns for a fashion e-commerce application. In this dataset, customer characteristics include the time of the last purchase or the recency of the last visit to the webshop. These characteristics are the variables by which we are going to segment the customers. Three different treatments were applied to a third of the customers each: an email containing offers for men's merchandise, an email containing offers for women's merchandise, and no email at all. To simplify, we are going to analyze the effect of the women's merchandise campaign versus the control group, which didn't receive an email at all. Our uplift model's task will be to identify the customers who are most likely to revisit the webshop because they received the campaign.

First, let's look at the overall importance of our variables (Fig. 2). Some observations are unsurprising: the recency of the last purchase, and whether the customer has bought women's or men's merchandise within the past year, have a large impact on the probability of visiting.


Fig. 2: Boxplot illustrating the influence of different variables on the propensity to visit the webshop after receiving an email advertising women's clothes, averaged over 10-fold stratified cross-validation.


While feature importance is interesting to look at, the thing we actually care about is uplift. Uplift can be seen as a measure of the incremental probability of visiting the webshop, i.e. the probability of a visit with the campaign minus the probability of a visit without it. To check our model, we can look at how the variable "has purchased women's clothing in past year" affects the campaign outcome: Fig. 3 clearly illustrates that a "yes" leads to a lot more added probability than a "no", which is what we expect.
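
To make the idea of incremental probability concrete, here is a minimal sketch that estimates uplift per segment as the visit rate among emailed customers minus the visit rate among control customers. This is the plain definition applied to a single variable, not the model used for the figures; the function name, segment labels and toy records are made up for illustration.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """records: iterable of (segment, treated, visited) tuples.

    Returns {segment: visit rate if treated - visit rate if control}.
    """
    # Per segment and per group (treated / control): [visits, customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, visited in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(visited)
        cell[1] += 1

    result = {}
    for segment, groups in counts.items():
        t_visits, t_total = groups[True]
        c_visits, c_total = groups[False]
        if t_total == 0 or c_total == 0:
            continue  # uplift is undefined without both groups
        result[segment] = t_visits / t_total - c_visits / c_total
    return result


# Made-up toy data: (segment, received email, visited webshop)
toy = (
    [("bought_womens", True, True)] * 3 + [("bought_womens", True, False)] * 1
    + [("bought_womens", False, True)] * 1 + [("bought_womens", False, False)] * 3
    + [("no_womens", True, True)] * 1 + [("no_womens", True, False)] * 3
    + [("no_womens", False, True)] * 1 + [("no_womens", False, False)] * 3
)
segment_uplift = uplift_by_segment(toy)
print(segment_uplift)
```

In the toy data, both segments have the same visit rate without an email, so only the difference between treated and control reveals where the campaign actually adds visits.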

Fig. 3: Uplift delivered by variable "has purchased women's clothing in last year"


Apart from identifying single drivers, the model is able to reveal more complex patterns. As a simple example (Fig. 4), customers who have bought women's merchandise in the past year and live in a rural area are less likely to respond well to the campaign than their urban counterparts, whereas the location doesn't seem to matter as much if the customer didn't buy women's merchandise.

Fig. 4: Violin plot showing the distribution of predicted uplift as a function of two variables, illustrating how variable combinations can matter.


The most important question is of course: how much does uplift modeling help us? The gain is a better return on effort, meaning we can bring a similar number of people to our site while contacting only a fraction of the entire database. To evaluate the performance of our uplift model, we compare the predicted uplift to the actual result of the campaign. However, for a single customer we can never know whether they would have responded without being targeted, or vice versa, so we have to evaluate indirectly. One approach is to plot uplift curves (see Fig. 5) and measure the area under the uplift curve (shaded grey area), as employed in "Ensemble Methods for Uplift Modelling". Compared to their top result of 0.73 ± 0.18 area under the uplift curve, we obtain a score of 0.81 ± 0.20 on the same data and target.

Fig. 5: Campaign gain depending on the fraction of customers targeted, ordered by decreasing predicted uplift. If customers were targeted at random, we would expect a linear graph (grey line). Curves have been averaged over 10-fold stratified cross-validation. We can see that by sending the campaign to the top 50% of customers we get approximately 90% of the total reward.
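
As a rough illustration of how such a curve can be computed, the sketch below ranks customers by predicted uplift and, for each targeted fraction, takes the difference between treated and control visit rates among the customers ranked so far, scaled by that fraction. This is one common variant under our own assumptions; the exact definition and normalization behind the reported scores may differ, and all names and toy values are hypothetical.

```python
def uplift_curve(scores, treated, outcome):
    """Return (fraction targeted, cumulative gain) points.

    scores:  predicted uplift per customer (higher = target first)
    treated: 1 if the customer received the campaign, 0 for control
    outcome: 1 if the customer visited, 0 otherwise
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    t_n = t_y = c_n = c_y = 0
    points = [(0.0, 0.0)]
    for rank, i in enumerate(order, start=1):
        if treated[i]:
            t_n += 1
            t_y += outcome[i]
        else:
            c_n += 1
            c_y += outcome[i]
        t_rate = t_y / t_n if t_n else 0.0
        c_rate = c_y / c_n if c_n else 0.0
        # Rate difference in the top-ranked group, scaled by its share
        points.append((rank / n, (t_rate - c_rate) * rank / n))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up toy example with eight customers
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.05, 0.0]
treated = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [1, 0, 1, 0, 0, 0, 0, 0]

curve = uplift_curve(scores, treated, outcome)
area = area_under_curve(curve)
# Random targeting corresponds to a straight line to the final point
random_area = 0.5 * curve[-1][1]
print(area, random_area)
```

A model that ranks customers well yields an area clearly above that of the straight random-targeting line, which is the comparison Fig. 5 shows.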


Applications to Churn in Telecommunications

As explained in our last blog post, we have a model predicting the churn probability for every customer based on historical data. This should already rule out most of the sure things and sleeping dogs. However, such a model is of course not perfect. We therefore apply uplift modeling to further micro-segment our customers and also filter out the lost causes. The goal is to identify these segments with as few experiments as possible to avoid waking sleeping dogs. In contrast to the simple case described above, we make use of hundreds of variables, including usage data, contract information and demographics, to perform the uplift modeling for telco companies. In practice, the first iteration of the campaign is sent to a few hundred customers, whose behavior is observed over the following weeks to produce the first segmentation. From then on, the campaign is iteratively improved, and after a few runs a very detailed segmentation is obtained.





June 13, 2017


Pascal de Buren
