Cohort analysis with R – "layer-cake graph"

Published July 9, 2015

Cohort analysis is one of the most powerful and popular techniques available to marketers for assessing long-term trends in customer retention and calculating lifetime value.
If you have studied at Custora's university, you will probably be impressed by the "layer-cake graph" they propose for cohort analysis.
[Figure: Custora's layer-cake graph]
Custora says: “The distinctive “layer-cake graph”, produced by looking at cohorts in calendar time, can provide powerful insights into the health of your business. At a given point in time, what percentage of your revenue or profit came from new vs. repeat customers? Tracking how that ratio has changed over time can give you insight into whether you’re fueling top-line growth solely through new customer acquisition – or whether you’re continuing to nurture those relationships with your existing customers over time.”
Usually, we focus on calculating lifetime value or comparing cohorts, but I was really impressed with this analytical approach and decided to build such a chart in R.
After processing a great deal of raw data, we should end up with the following structure. The rows Cohort01, Cohort02, etc. are cohorts named after the customers' signup date or first purchase date, and the columns M1, M2, etc. are consecutive calendar months (first month, second month, etc.):
[Table: cohort.sum, revenue by cohort (Cohort01, Cohort02, …) and month (M1, M2, …)]
For example, Cohort01 signed up in January (M1) and brought us $270,000 during that first month (M1). Cohort05 signed up in May (M5) and brought us $31,000 in September (M9). Because a cohort only exists from its signup month onward, its cells for earlier months are zero.
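Custora's question about how much revenue comes from new vs. repeat customers can be read straight off a table like this. A minimal sketch, assuming (as in the example above) that cohort k is acquired in month k and that all of a cohort's revenue in its signup month counts as "new": the diagonal cell of each month column is new-customer revenue, and the rows above it are repeat revenue from earlier cohorts. The function name new_vs_repeat and the toy numbers are made up for illustration.

```r
# Sketch: share of each month's revenue from new vs. repeat customers.
# Assumes cohort k is acquired in month k, so cell [k, k] is the new cohort's
# revenue and rows 1..(k - 1) of column k come from earlier (repeat) cohorts.
new_vs_repeat <- function(m) {
  stopifnot(is.matrix(m), nrow(m) <= ncol(m))
  total <- colSums(m)
  new <- numeric(ncol(m))            # months without a new cohort keep 0
  k <- seq_len(nrow(m))
  new[k] <- m[cbind(k, k)]           # diagonal cells = first-month revenue
  data.frame(month = colnames(m),
             total = unname(total),
             new_share = unname(new / total),
             repeat_share = unname((total - new) / total),
             stringsAsFactors = FALSE)
}

# Toy 3 x 3 table (hypothetical numbers, not from the post)
toy <- matrix(c(100,  60,  40,
                  0, 200, 120,
                  0,   0, 300),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Cohort01", "Cohort02", "Cohort03"),
                              c("M1", "M2", "M3")))
new_vs_repeat(toy)
```

For M2 this gives 200 / 260 ≈ 0.77 of revenue from the new cohort and 60 / 260 ≈ 0.23 from repeat customers; tracking these shares month by month is the ratio the Custora quote describes.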
Now suppose you have processed the data and obtained a data frame called cohort.sum that looks like the table above. You can replicate this data with the following code:
[Code: R snippet that recreates the cohort.sum data frame]
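As a hypothetical stand-in for such a snippet, the sketch below builds a data frame with the same shape: a cohort column followed by M1, M2, … revenue columns, with zeros before each cohort's signup month. The helper make_cohort_sum, its decay parameter and all figures except the $270,000 for Cohort01 in M1 are invented; they are not the author's data.

```r
# Hypothetical helper: builds a cohort.sum-shaped data frame with made-up numbers.
# first_month: revenue of each cohort in its signup month (cohort i signs up in month i)
# decay: assumed fraction of revenue kept from one month to the next
make_cohort_sum <- function(first_month, n_months, decay = 0.6) {
  n_cohorts <- length(first_month)
  stopifnot(n_cohorts <= n_months)
  m <- matrix(0, nrow = n_cohorts, ncol = n_months)
  for (i in seq_len(n_cohorts)) {
    for (j in i:n_months) {
      # months before i stay 0; later months shrink by the assumed decay factor
      m[i, j] <- round(first_month[i] * decay^(j - i))
    }
  }
  colnames(m) <- paste0("M", seq_len(n_months))
  data.frame(cohort = sprintf("Cohort%02d", seq_len(n_cohorts)), m,
             stringsAsFactors = FALSE)
}

cohort.sum <- make_cohort_sum(
  first_month = c(270000, 240000, 210000, 180000, 150000, 120000),
  n_months = 9)
cohort.sum[, 1:5]
```

A constant decay is only a convenient way to get a plausible-looking triangle; real cohorts rarely decay this smoothly.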
Let’s create the “layer-cake” chart with the following R code:
[Code: R code for the layer-cake chart]
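A minimal base-R sketch of this kind of chart, assuming cohort.sum has the shape described above (a cohort column, then one revenue column per month). It stacks cohorts with the oldest at the bottom by taking running totals down each month column and drawing each cohort as a filled band between consecutive totals. The function layer_cake and the toy data are illustrative, not the author's code; if ggplot2 is installed, geom_area() on the data reshaped to long format produces similar stacked layers.

```r
# Sketch: layer-cake chart with base graphics only (no extra packages).
layer_cake <- function(cohort.sum,
                       cols = colorRampPalette(c("lightblue", "darkblue"))(nrow(cohort.sum))) {
  m <- as.matrix(cohort.sum[, -1, drop = FALSE])   # cohorts x months
  x <- seq_len(ncol(m))
  # running totals down each month column: the oldest cohort is the bottom layer
  tops <- matrix(apply(m, 2, cumsum), nrow = nrow(m))
  bottoms <- rbind(rep(0, ncol(m)), tops[-nrow(m), , drop = FALSE])
  plot(range(x), c(0, max(tops)), type = "n", xaxt = "n",
       xlab = "Month", ylab = "Revenue", main = "Total revenue by cohort")
  axis(1, at = x, labels = colnames(m))
  for (i in seq_len(nrow(m))) {
    # cohort i is the band between the totals of cohorts 1..(i - 1) and 1..i
    polygon(c(x, rev(x)), c(tops[i, ], rev(bottoms[i, ])),
            col = cols[i], border = NA)
  }
  legend("topleft", legend = cohort.sum$cohort, fill = cols, bty = "n", cex = 0.8)
  invisible(tops)
}

# Toy data (hypothetical numbers)
toy <- data.frame(cohort = c("Cohort01", "Cohort02", "Cohort03"),
                  M1 = c(100, 0, 0), M2 = c(60, 200, 0), M3 = c(40, 120, 300),
                  stringsAsFactors = FALSE)
tops <- layer_cake(toy)
```

The top edge of the cake is total revenue per month, so the chart shows at a glance both overall growth and how much of it rests on older cohorts.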
And here is the resulting chart:
[Figure: the resulting layer-cake chart]
It looks like there was a promotion in the eighth month (M8), and a few cohorts responded to it. A really useful graph.
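One way to back up this kind of visual impression with numbers, sketched under the assumption that revenue from an established cohort normally falls month over month: flag the cohorts whose revenue rose relative to the previous month. The helper responders and the toy matrix are hypothetical.

```r
# Sketch: list cohorts whose revenue rose from the previous month.
# Assumes m is a cohorts x months matrix with row and column names, and that a
# rise in an already-active cohort is worth a closer look (e.g. a promotion).
responders <- function(m, month) {
  j <- match(month, colnames(m))
  stopifnot(!is.na(j), j > 1)
  prev <- m[, j - 1]
  cur <- m[, j]
  # skip cohorts with no revenue in the previous month (e.g. not signed up yet)
  rownames(m)[prev > 0 & cur > prev]
}

toy <- matrix(c(100,  60,  90,  30,
                  0, 200, 120,  80,
                  0,   0, 300, 150),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Cohort01", "Cohort02", "Cohort03"),
                              paste0("M", 1:4)))
responders(toy, "M3")
```

Here only Cohort01 is flagged for M3: its revenue went from 60 to 90, while Cohort02 fell and Cohort03 only signed up in M3.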
Although the R code looks simple, I spent most of my time aggregating the data. I can't offer universal R code for that step, because the structure of your raw data may be completely different.
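For transaction-level data, the aggregation might look like the sketch below. It assumes a hypothetical data frame tx with one row per order and columns customer_id, order_date (class Date) and revenue; your own column names and cohort definition will likely differ. Each customer's cohort is the calendar month of their first order, and xtabs() sums revenue by cohort and calendar month.

```r
# Sketch: aggregate raw orders into a cohort x calendar-month revenue table.
# Assumes columns customer_id, order_date (Date) and revenue (hypothetical names).
transactions_to_cohort_sum <- function(tx) {
  tx$month <- format(tx$order_date, "%Y-%m")        # calendar month of each order
  # cohort = month of the customer's first order ("YYYY-MM" sorts chronologically)
  tx$cohort <- ave(tx$month, tx$customer_id, FUN = min)
  tab <- xtabs(revenue ~ cohort + month, data = tx)  # summed revenue per cell
  m <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
  data.frame(cohort = rownames(m), m,
             check.names = FALSE, stringsAsFactors = FALSE, row.names = NULL)
}

# Toy orders (hypothetical)
tx <- data.frame(customer_id = c(1, 1, 2, 2, 3),
                 order_date = as.Date(c("2015-01-05", "2015-02-10", "2015-02-03",
                                        "2015-03-15", "2015-03-20")),
                 revenue = c(100, 50, 200, 80, 300))
cs <- transactions_to_cohort_sum(tx)
cs
```

Rows and columns keep their "YYYY-MM" labels rather than Cohort01/M1, which makes gaps (months with no new customers) visible instead of silently renumbering them.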
Have questions? Please feel welcome to ask!
This article originally appeared here. Republished with permission.