yrs_zero: Prepare history table for retention calculation

Description

Splits a history table into a list of two data frames: (1) $year0, a copy of the input history table, and (2) $history, a copy with just two columns: cust_id and year. This structure allows for easy filtering before retention curves are calculated (see the Details section below).

Usage

yrs_zero_split(history)

yrs_zero_filter(history_split, func)

yrs_zero_sample(history_split, samp_size, set_seed = TRUE)

Arguments

history

license history data frame

history_split

license history list produced by yrs_zero_split()

func

function to be used for subsetting customers

samp_size

customer sample size passed to sample

set_seed

if TRUE, will run set.seed for reproducibility

Details

The "year zero" framework identifies the characteristics of interest for retention curves. For example, if we look at customers in 2008 (i.e. year0 = 2008), the retention curve is represented by the percentage of these customers who hold licenses in subsequent years.
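As a rough illustration of the retention curve described above, the following base R sketch computes it by hand for a small, made-up history table. The data and the variable names (year0_ids, n_by_year, retain_rate) are assumptions for illustration only; this is not how the package itself performs the calculation.

```r
# Toy license history (hypothetical data): one row per customer per license year
history <- data.frame(
    cust_id = c(1, 1, 1, 2, 2, 3, 4, 4),
    year    = c(2008, 2009, 2010, 2008, 2010, 2008, 2009, 2010)
)

# Customers who held a license in year zero (2008)
year0_ids <- unique(history$cust_id[history$year == 2008])

# Keep only those customers, from year zero onward
later <- history[history$cust_id %in% year0_ids & history$year >= 2008, ]

# Number of distinct year-zero customers holding a license in each year
n_by_year <- tapply(later$cust_id, later$year, function(x) length(unique(x)))

# Retention rate: share of the year-zero customers present in each year
retain_rate <- n_by_year / length(year0_ids)
retain_rate
```

Customer 4 never enters the calculation because they did not hold a license in 2008, which is the sense in which year zero defines the group being followed.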

This year-zero focus complicates filtering, however. Say we want to look at retention for 30-year-olds. If we applied a filter directly to the history table, we couldn't calculate retention because the rows for these customers' later years (when they are no longer 30) would be dropped. The yrs_zero_filter() function makes it easy to look at specific sets of customers once yrs_zero_split() has been run.
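The filtering problem can be seen on a small, made-up table. The base R sketch below first applies the filter directly, then mimics the two-part $year0 / $history layout from the Description. The object name history_split and the subsetting steps are assumptions chosen for illustration; the package's own functions may work differently internally.

```r
# Toy license history (hypothetical data): age_year is the customer's age that year
history <- data.frame(
    cust_id  = c(1, 1, 1, 2, 2, 2),
    year     = c(2010, 2011, 2012, 2010, 2011, 2012),
    age_year = c(30, 31, 32, 45, 46, 47)
)

# Direct filter: only the single row where customer 1 is 30 survives,
# so the 2011 and 2012 rows needed for retention are lost
direct <- history[history$age_year == 30, ]

# Two-part structure mirroring the $year0 / $history layout
history_split <- list(
    year0   = history,
    history = history[, c("cust_id", "year")]
)

# Filter $year0 on customer characteristics...
keep <- history_split$year0$age_year == 30 & history_split$year0$year == 2010
history_split$year0 <- history_split$year0[keep, ]

# ...then keep every year of history for the customers that remain
in_year0 <- history_split$history$cust_id %in% history_split$year0$cust_id
history_split$history <- history_split$history[in_year0, ]
```

Here direct holds one row, while history_split$history still has all three years for customer 1, which is what a retention calculation needs.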

Functions

See Also

Other functions to estimate annual license buying: yrs_avidity, yrs_calc_avg, yrs_calc, yrs_fit, yrs_lifetime, yrs_plot, yrs_predict_avg, yrs_predict, yrs_result

Examples

library(dplyr)
data(all_sports)

# manually calculate retention for those aged 30-50 in 2010
# - this is a bit awkward and it's easy to make a mistake
#   we also need to keep in mind that only 2011-onward is relevant
year0 <- filter(all_sports, age_year %in% 30:50, year == 2010)
all_sports %>%
    semi_join(year0, by = "cust_id") %>%
    count(year) %>%
    mutate(retain_rate = n / max(n))

# the "yrs_zero" functions make filtering more straightforward
df_split <- yrs_zero_split(all_sports) %>%
    yrs_zero_filter(function(x) filter(x, age_year %in% 30:50, year == 2010))
df_split$year0
df_split$history

# downstream calculations are consistent, irrespective of customer filter
yrs_calc_retain(df_split)
yrs_zero_sample(df_split, 1000) %>% yrs_calc_retain()
yrs_zero_split(all_sports) %>%
    yrs_zero_filter(function(x) filter(x, age_year %in% 25:35)) %>%
    yrs_calc_retain()

southwick-associates/lifetime documentation built on Feb. 24, 2020, 9:33 a.m.