clvdata: Create an object for transactional data required to estimate...

View source: R/f_interface_clvdata.R

clvdataR Documentation

Create an object for transactional data required to estimate CLV

Description

Creates a data object that contains the prepared transaction data and that is used as input for model fitting. The transaction data may be split in an estimation and holdout sample if desired. The model then will only be fit on the estimation sample.

If covariates should be used when fitting a model, covariate data can be added to an object returned from this function.

Usage

clvdata(
  data.transactions,
  date.format,
  time.unit,
  estimation.split = NULL,
  name.id = "Id",
  name.date = "Date",
  name.price = "Price"
)

Arguments

data.transactions

Transaction data as data.frame or data.table. See details.

date.format

Character string that indicates the format of the date variable in the data used. See details.

time.unit

What time unit defines a period. May be abbreviated, capitalization is ignored. See details.

estimation.split

Indicates the length of the estimation period. See details.

name.id

Column name of the customer id in data.transactions.

name.date

Column name of the transaction date in data.transactions.

name.price

Column name of price in data.transactions. NULL if no spending data is present.

Details

data.transactions A data.frame or data.table with customers' purchase history. Every transaction record consists of a purchase date and a customer id. Optionally, the price of the transaction may be included to also allow for prediction of future customer spending.

time.unit The definition of a single period. Currently available are "hours", "days", "weeks", and "years". May be abbreviated.

date.format A single format to use when parsing any date that is given as character input. This includes the dates given in data.transaction, estimation.split, or as an input to any other function at a later point, such as prediction.end in predict. The function parse_date_time of package lubridate is used to parse inputs and hence all formats it accepts in argument orders can be used. For example, a date of format "year-month-day" (i.e., "2010-06-17") is indicated with "ymd". Other combinations such as "dmy", "dym", "ymd HMS", or "HMS dmy" are possible as well.

estimation.split May be specified as either the number of periods since the first transaction or the timepoint (either as character, Date, or POSIXct) at which the estimation period ends. The indicated timepoint itself will be part of the estimation sample. If no value is provided or set to NULL, the whole dataset will used for fitting the model (no holdout sample).

Aggregation of Transactions

Multiple transactions by the same customer that occur on the minimally representable temporal resolution are aggregated to a single transaction with their spending summed. For time units days and any other coarser Date-based time units (i.e. weeks, years), this means that transactions on the same day are combined. When using finer time units such as hours which are based on POSIXct, transactions on the same second are aggregated.

For the definition of repeat-purchases, combined transactions are viewed as a single transaction. Hence, repeat-transactions are determined from the aggregated transactions.

Value

An object of class clv.data. See the class definition clv.data for more details about the returned object.

The function summary can be used to obtain and print a summary of the data. The generic accessor function nobs is available to read out the number of customers.

See Also

SetStaticCovariates to add static covariates

SetDynamicCovariates for how to add dynamic covariates

plot to plot the repeat transactions

summary to summarize the transaction data

pnbd to fit Pareto/NBD models on a clv.data object

Examples




data("cdnow")

# create clv data object with weekly periods
#    and no splitting
clv.data.cdnow <- clvdata(data.transactions = cdnow,
                          date.format="ymd",
                          time.unit = "weeks")

# same but split after 37 periods
clv.data.cdnow <- clvdata(data.transactions = cdnow,
                          date.format="ymd",
                          time.unit = "w",
                          estimation.split = 37)

# same but estimation end on the 15th Oct 1997
clv.data.cdnow <- clvdata(data.transactions = cdnow,
                          date.format="ymd",
                          time.unit = "w",
                          estimation.split = "1997-10-15")

# summary of the transaction data
summary(clv.data.cdnow)

# plot the total number of transactions per period
plot(clv.data.cdnow)

## Not run: 
# create data with the weekly periods defined to
#   start on Mondays

# set start of week to Monday
oldopts <- options("lubridate.week.start"=1)

# create clv.data while Monday is the beginning of the week
clv.data.cdnow <- clvdata(data.transactions = cdnow,
                          date.format="ymd",
                          time.unit = "weeks")

# Dynamic covariates now have to be supplied for every Monday

# set week start to what it was before
options(oldopts)

## End(Not run)





CLVTools documentation built on Oct. 13, 2024, 9:07 a.m.