Splits a history table into a list of two data frames: (1) $year0, a copy of the input history table, and (2) $history, a copy with just two columns, cust_id and year. This data structure is intended to allow easy filtering before calculating retention curves (see the details below).
yrs_zero_split(history)
yrs_zero_filter(history_split, func)
yrs_zero_sample(history_split, samp_size, set_seed = TRUE)
history: license history data frame
history_split: license history list produced by yrs_zero_split()
func: function to be used for subsetting customers
samp_size: customer sample size passed to
set_seed: if TRUE, will run
The "year zero" framework identifies the characteristics of interest for retention curves. For example, if we look at customers in 2008 (i.e. year0 = 2008), the retention curve is represented by the percentage of these customers who hold licenses in subsequent years.
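As a rough illustration of the year-zero idea, the sketch below computes a retention curve by hand in base R. The `toy` data frame and its values are made up for this example; only the `cust_id` and `year` column names come from the description above.

```r
# Hypothetical license history: one row per customer per licensed year
toy <- data.frame(
  cust_id = c(1, 1, 1, 2, 2, 3, 4, 4),
  year    = c(2008, 2009, 2010, 2008, 2010, 2008, 2009, 2010)
)

# Year-zero customers: everyone who held a license in 2008
year0_ids <- unique(toy$cust_id[toy$year == 2008])

# Keep the history of those customers from year zero onward
kept <- toy[toy$cust_id %in% year0_ids & toy$year >= 2008, ]

# Count distinct customers per year (plain named vector, year zero first)
years <- sort(unique(kept$year))
n_by_year <- vapply(
  years,
  function(y) length(unique(kept$cust_id[kept$year == y])),
  integer(1)
)
names(n_by_year) <- years

# Retention rate: share of year-zero customers holding a license each year
retain_rate <- n_by_year / n_by_year[1]
retain_rate
```

Customer 4 never appears in the result because they did not hold a license in year zero.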
This year-zero focus complicates filtering, though. Say we want to look at retention for 30-year-olds. If we applied a filter directly to the history table, we could not calculate retention, because the future sales of these customers (made when they are no longer 30) would be dropped. The yrs_zero_filter() function makes it easy to look at specific sets of customers once yrs_zero_split() has been run.
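To make the filtering problem concrete, here is a minimal base R sketch of the split-then-filter idea. It is not the package's implementation: `toy_split()` and `toy_filter()` are hypothetical stand-ins, and the `toy` data (including its `age_year` values) is invented for illustration. The sketch assumes that the filter narrows `$year0` and that `$history` is then restricted to the matching customers.

```r
# Hypothetical license history with the customer's age in each year
toy <- data.frame(
  cust_id  = c(1, 1, 1, 2, 2, 3, 3),
  year     = c(2008, 2009, 2010, 2008, 2009, 2008, 2010),
  age_year = c(30, 31, 32, 45, 46, 30, 32)
)

# Naive approach: filtering the history directly drops the later years,
# because the same customers are no longer 30 in 2009 and 2010
naive <- toy[toy$age_year == 30, ]
unique(naive$year)

# Hypothetical stand-in for the split step: full copy plus a 2-column copy
toy_split <- function(history) {
  list(year0 = history, history = history[, c("cust_id", "year")])
}

# Hypothetical stand-in for the filter step: apply `func` to $year0,
# then keep only the matching customers in $history
toy_filter <- function(history_split, func) {
  year0 <- func(history_split$year0)
  keep <- history_split$history$cust_id %in% year0$cust_id
  list(year0 = year0, history = history_split$history[keep, ])
}

res <- toy_filter(
  toy_split(toy),
  function(x) x[x$age_year == 30 & x$year == 2008, ]
)
res$year0    # customers 1 and 3 in 2008
res$history  # still holds the later rows for customers 1 and 3
```

Because the filter only touches the year-zero copy, the two-column history keeps every year for the selected customers, which is what a retention calculation needs.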
yrs_zero_split: Split history data frame into a list
yrs_zero_filter: Filter customers prior to retention calculation
yrs_zero_sample: Sample customers prior to retention calculation
Other functions to estimate annual license buying: yrs_avidity, yrs_calc_avg, yrs_calc, yrs_fit, yrs_lifetime, yrs_plot, yrs_predict_avg, yrs_predict, yrs_result
library(dplyr)
data(all_sports)

# manually calculate retention for those aged 30-50 in 2010
# - this is a bit awkward and it's easy to make a mistake
# we also need to keep in mind that only 2011-onward is relevant
year0 <- filter(all_sports, age_year %in% 30:50, year == 2010)
all_sports %>%
    semi_join(year0, by = "cust_id") %>%
    count(year) %>%
    mutate(retain_rate = n / max(n))

# the "yrs_zero" functions make filtering more straightforward
df_split <- yrs_zero_split(all_sports) %>%
    yrs_zero_filter(function(x) filter(x, age_year %in% 30:50, year == 2010))
df_split$year0
df_split$history

# downstream calculations are consistent, irrespective of customer filter
yrs_calc_retain(df_split)
yrs_zero_sample(df_split, 1000) %>% yrs_calc_retain()
yrs_zero_split(all_sports) %>%
    yrs_zero_filter(function(x) filter(x, age_year %in% 25:35)) %>%
    yrs_calc_retain()