elog2cbs: Convert Event Log to customer-level summary statistic

View source: R/helpers.R

elog2cbs {BTYDplus}	R Documentation


Description

Efficiently converts an event log into a customer-by-sufficient-statistic (CBS) data.frame with one row per customer, which is the data format required for estimating model parameters.

Usage

elog2cbs(elog, units = "week", T.cal = NULL, T.tot = NULL)

Arguments

elog

Event log: a data.frame with a field cust for the customer ID and a field date for the date/time of the event; date should be of class Date or POSIXt. If a field sales is present, it is aggregated as well.

units

Time unit, either week, day, hour, min or sec. See difftime.

T.cal

End date of calibration period. Defaults to max(elog$date).

T.tot

End date of the observation period. Defaults to max(elog$date).

Details

The time unit for expressing t.x, T.cal and litt is determined by the argument units, which is passed on to difftime and defaults to weeks.
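To illustrate what the units argument controls, here is a minimal base-R sketch that expresses the same interval between two hypothetical event dates in different units via difftime. It only demonstrates difftime itself and is not taken from the BTYDplus source. Note that difftime spells its units in the plural ("weeks"); the shorter "week" used in the Usage line is accepted by partial matching.

```r
# Two hypothetical event dates for one customer (not from the package data)
first_event <- as.Date("2006-01-01")
last_event  <- as.Date("2006-01-15")

# The same 14-day interval expressed in different units with base R difftime()
t_weeks <- as.numeric(difftime(last_event, first_event, units = "weeks"))
t_days  <- as.numeric(difftime(last_event, first_event, units = "days"))
t_hours <- as.numeric(difftime(last_event, first_event, units = "hours"))

print(c(weeks = t_weeks, days = t_days, hours = t_hours))
```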

Argument T.tot allows one to specify the end of the observation period, i.e. the last possible date of an event to still be included in the event log. If T.tot is not provided, then the date of the last recorded event will be assumed to coincide with the end of the observation period. If T.tot is provided, then any event that occurs after that date is discarded.
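The T.tot rule can be sketched in base R as a simple filter on the event log. The toy data below is hypothetical, and the filter only mirrors the documented behaviour (events after T.tot are dropped); it is not the package's internal code.

```r
# Hypothetical toy event log with the cust/date/sales fields described above
elog <- data.frame(
  cust  = c("A", "A", "A", "B", "B"),
  date  = as.Date(c("2006-01-01", "2006-03-01", "2008-01-10",
                    "2006-02-01", "2007-06-01")),
  sales = c(10, 20, 30, 5, 15)
)

T.tot <- as.Date("2007-12-30")

# Keep only events on or before the end of the observation period;
# the 2008-01-10 event of customer A is discarded
elog_obs <- elog[elog$date <= T.tot, ]
print(elog_obs)
```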

Argument T.cal allows one to split the summary statistics into a calibration and a holdout period. This can be useful for evaluating forecasting accuracy on a given dataset. If T.cal is not provided, the whole observation period is considered and is subsequently used for estimating model parameters. If it is provided, the returned data.frame contains additional fields, including x.star, the number of repeat transactions during the holdout period, and T.star, the length of that holdout period. In that case, only customers who have had at least one event during the calibration period are included.
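The calibration/holdout split can be sketched in base R as follows. This is a minimal illustration on a hypothetical toy event log, assuming that T.star is the distance between T.cal and T.tot in the chosen unit; it is not the BTYDplus implementation. Customer C only has a holdout event and is therefore dropped.

```r
# Hypothetical toy event log
elog <- data.frame(
  cust = c("A", "A", "A", "B", "C"),
  date = as.Date(c("2006-01-01", "2006-06-01", "2007-03-01",
                   "2006-02-01", "2007-05-01"))
)
T.cal <- as.Date("2006-12-31")
T.tot <- as.Date("2007-12-30")

# Split events into calibration and holdout periods
cal  <- elog[elog$date <= T.cal, ]
hold <- elog[elog$date > T.cal & elog$date <= T.tot, ]

# Only customers with at least one calibration event are kept
custs <- sort(unique(cal$cust))

# x.star: number of holdout events per kept customer (0 if none)
x.star <- vapply(custs, function(id) sum(hold$cust == id), integer(1),
                 USE.NAMES = FALSE)

# T.star: length of the holdout period, here in weeks (assumed definition)
T.star <- as.numeric(difftime(T.tot, T.cal, units = "weeks"))

summary_star <- data.frame(cust = custs, x.star = x.star, T.star = T.star)
print(summary_star)
```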

Transactions with identical cust and date fields are treated as a single transaction, and their sales are summed.
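This merging step can be sketched with base R aggregate() on a hypothetical event log in which customer A has two rows on the same day. It reproduces the documented behaviour, not necessarily the way elog2cbs implements it internally.

```r
# Hypothetical event log: customer A has two transactions on 2006-01-01
elog <- data.frame(
  cust  = c("A", "A", "A", "B"),
  date  = as.Date(c("2006-01-01", "2006-01-01", "2006-01-08", "2006-01-03")),
  sales = c(10, 5, 20, 7)
)

# Collapse rows sharing cust and date into one transaction, summing sales
elog_dedup <- aggregate(sales ~ cust + date, data = elog, FUN = sum)
print(elog_dedup)
```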

Value

data.frame with fields:

cust

Customer id (unique key).

x

Number of repeat events in the calibration period.

t.x

Time between first and last event in calibration period.

litt

Sum of the logarithms of the intertransaction times during the calibration period.

sales

Sum of sales in calibration period, incl. initial transaction. Only if elog$sales is provided.

sales.x

Sum of sales in calibration period, excl. initial transaction. Only if elog$sales is provided.

first

Date of first transaction in calibration period.

T.cal

Time between first event and end of calibration period.

T.star

Length of holdout period. Only if T.cal is provided.

x.star

Number of events within holdout period. Only if T.cal is provided.

sales.star

Sum of sales within holdout period. Only if T.cal and elog$sales are provided.
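The core calibration fields x, t.x, litt, first and T.cal can be computed per customer with a short base-R sketch. The helper summarise_customer and the toy data are hypothetical; the sketch follows the field definitions above, skips the sales fields, and should not be read as the BTYDplus code. For customer A, the intertransaction times are 1 and 2 weeks, so litt is log(1) + log(2).

```r
# Hypothetical toy calibration-period event log
elog <- data.frame(
  cust = c("A", "A", "A", "B"),
  date = as.Date(c("2006-01-01", "2006-01-08", "2006-01-22", "2006-02-05"))
)
T.cal <- as.Date("2006-02-26")

# Hypothetical helper: summarise one customer's event dates into CBS-style fields
summarise_customer <- function(dates, T.cal, units = "weeks") {
  dates <- sort(unique(dates))            # same-day events count once
  n <- length(dates)
  # intertransaction times between consecutive events (empty if n == 1)
  itt <- as.numeric(difftime(dates[-1], dates[-n], units = units))
  data.frame(
    x     = n - 1,                        # number of repeat events
    t.x   = as.numeric(difftime(dates[n], dates[1], units = units)),
    litt  = sum(log(itt)),                # 0 when there are no repeat events
    first = dates[1],
    T.cal = as.numeric(difftime(T.cal, dates[1], units = units))
  )
}

# Apply the helper per customer and stack the results into one data.frame
rows <- lapply(split(elog$date, elog$cust), summarise_customer, T.cal = T.cal)
cbs <- data.frame(cust = names(rows), do.call(rbind, rows), row.names = NULL)
print(cbs)
```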

Examples

library("BTYDplus")
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31", T.tot = "2007-12-30")
head(cbs)

mplatzer/BTYDplus documentation built on April 9, 2024, 3:11 a.m.