predict.clv.fitted.transactions: Predict CLV from a fitted transaction model
In CLVTools: Tools for Customer Lifetime Value Estimation

View source: R/f_interface_predict_clvfittedtransactions.R

Description

In general, probabilistic customer attrition models predict three expected characteristics for every customer:

"conditional expected transactions" (CET), which is the number of transactions to expect from a customer during the prediction period,
"probability of a customer being alive" (PAlive) at the end of the estimation period and
"discounted expected residual transactions" (DERT) for every customer, which is the total number of transactions for the residual lifetime of a customer discounted to the end of the estimation period. In the case of time-varying covariates, "discounted expected conditional transactions" (DECT) is predicted instead of DERT. In contrast to DERT, DECT covers only a finite time horizon. For continuous.discount.factor=0, DECT corresponds to CET.

In order to derive a monetary value such as CLV, customer spending has to be considered. If the clv.data object contains spending information, customer spending can be predicted with a Gamma/Gamma spending model specified through parameter predict.spending, and the predicted CLV is then calculated (if the transaction model supports DERT/DECT). In this case, the prediction additionally contains the following columns:

"predicted.mean.spending", the mean spending per transaction as predicted by the spending model.
"predicted.total.spending", the predicted total spending until prediction.end, that is, CET multiplied by predicted.mean.spending.
"predicted.CLV", the customer lifetime value, that is, DERT/DECT multiplied by predicted.mean.spending.

Uncertainty estimates are available for all predicted quantities using bootstrapping.
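
The arithmetic linking the spending columns can be sketched in base R. The per-customer values below are made up (they are not output of any fitted model); only the column names follow the Value section.

```r
# Hypothetical per-customer predictions (illustrative values, not CLVTools output)
pred <- data.frame(
  Id = c("A", "B", "C"),
  CET = c(1.5, 0.2, 3.0),                   # expected transactions until prediction.end
  DERT = c(4.0, 0.5, 8.0),                  # discounted expected residual transactions
  predicted.mean.spending = c(25, 40, 10)   # mean spending per transaction
)

# Total spending in the prediction period: CET times mean spending per transaction
pred$predicted.total.spending <- pred$CET * pred$predicted.mean.spending

# CLV: discounted expected (residual) transactions times mean spending per transaction
pred$predicted.CLV <- pred$DERT * pred$predicted.mean.spending

print(pred)
```

Because DERT/DECT is already discounted, no further discounting is applied to the spending in this product.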

New customer prediction

The fitted model can also be used to predict the number of transactions that a single, fictional, average new customer is expected to make from the moment it comes alive with its first transaction, that is, for a customer without any existing order history. For covariate models, the prediction is for an average customer with the given covariates.

This prediction is obtained from the individual-level unconditional expectation, which is also used for the tracking plot. For models without covariates, the prediction is therefore the same for all customers and independent of when a customer comes alive. For models with covariates, the prediction is the same for all customers with the same covariates.

The data on which the model was fit, and which is stored in the fitted object, is NOT used for this prediction. See the examples and newcustomer for more details.

Usage

## S3 method for class 'clv.fitted.transactions'
predict(
  object,
  newdata = NULL,
  prediction.end = NULL,
  predict.spending = gg,
  continuous.discount.factor = log(1 + 0.1),
  uncertainty = c("none", "boots"),
  level = 0.9,
  num.boots = 100,
  verbose = TRUE,
  ...
)

## S4 method for signature 'clv.fitted.transactions'
predict(
  object,
  newdata = NULL,
  prediction.end = NULL,
  predict.spending = gg,
  continuous.discount.factor = log(1 + 0.1),
  uncertainty = c("none", "boots"),
  level = 0.9,
  num.boots = 100,
  verbose = TRUE,
  ...
)

Arguments

`object`	A fitted clv transaction model for which prediction is desired.
`newdata`	A clv data object or data for the new customer prediction (see newcustomer) for which predictions should be made with the fitted model. If omitted or NULL, predictions are made for the data on which the model was fit.
`prediction.end`	Until what point in time to predict. This can be the number of periods (numeric) or a form of date/time object. See details.
`predict.spending`	Whether and how to predict spending and based on it also CLV, if possible. See details.
`continuous.discount.factor`	Continuous discount factor used to calculate `DERT/DECT`. Defaults to `log(1 + 0.1)`, the continuous rate equivalent to a 10% annual rate. See details.
`uncertainty`	Method to produce confidence intervals of the predictions (parameter uncertainty). Either "none" (default) or "boots".
`level`	Required confidence level, if `uncertainty="boots"`.
`num.boots`	Number of bootstrap repetitions, if `uncertainty="boots"`. With a low number, intervals may be missing for customers who are never sampled.
`verbose`	Show details about the running of the function.
`...`	Ignored

Details

predict.spending indicates whether to predict customers' spending and, if so, the spending model to use. Accepted inputs are either a logical (TRUE/FALSE), a method to fit a spending model (i.e. gg), or an already fitted spending model. If TRUE, a Gamma-Gamma model is fit with default options. If argument newdata is provided, the spending model is fit on newdata. Predicting spending is only possible if the transaction data contains spending information. See examples for illustrations of valid inputs.

The newdata argument has to be a clv data object of the exact same class as the data object on which the model was fit. In case the model was fit with covariates, newdata needs to contain identically named covariate data.

The use case for newdata is mainly two-fold. First, to estimate model parameters on only a sample of the data and then use the fitted model object to predict or plot for the full data set provided through newdata. Second, for models with dynamic covariates, to provide a clv data object with longer covariates than contained in the data on which the model was estimated, which allows predicting or plotting further. When providing newdata, some models might require additional steps that can significantly increase runtime.

To predict for new customers, the output of newcustomer has to be given to newdata. See examples.

prediction.end indicates until when to predict or plot and can be given as either a point in time (of class Date, POSIXct, or character) or the number of periods. If prediction.end is of class character, the date/time format set when creating the data object is used for parsing. If prediction.end is the number of periods, the end of the fitting period serves as the reference point from which periods are counted. Only full periods may be specified. If prediction.end is omitted or NULL, it defaults to the end of the holdout period if present and to the end of the estimation period otherwise. For predict, the latter case fails, so prediction.end has to be provided when there is no holdout period (see examples).

The first prediction period is defined to start right after the end of the estimation period. If, for example, weekly time units are used and the estimation period ends on Sunday 2018-12-30, then the first day of the first prediction period is Monday 2018-12-31. Each prediction period includes a total of 7 days, and the first prediction period therefore ends on, and includes, Sunday 2019-01-06. Subsequent prediction periods again start on Mondays and end on Sundays. If prediction.end indicates a timepoint on which to end, this timepoint is included in the prediction period.
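
A base-R sketch of the weekly period convention, taking Sunday 2018-12-30 as the end of the estimation period and prediction.end = 10 periods. It only reproduces the date arithmetic stated in this section and does not call CLVTools. format(x, "%u") returns the ISO weekday (1 = Monday, 7 = Sunday) independently of the locale.

```r
# Assumed end of the estimation period (a Sunday) and weekly time units
estimation.end <- as.Date("2018-12-30")
period.days <- 7

# The first prediction period starts the day after the estimation period ends
# and includes its 7th day
first.start <- estimation.end + 1
first.end <- estimation.end + period.days

# prediction.end given as a number of periods, counted from the estimation end
n.periods <- 10
prediction.end <- estimation.end + n.periods * period.days

# ISO weekdays of the four dates: Sunday, Monday, Sunday, Sunday
weekdays.iso <- format(c(estimation.end, first.start, first.end, prediction.end), "%u")
print(c(first.start = format(first.start), first.end = format(first.end),
        prediction.end = format(prediction.end)))
```

With 10 full weekly periods, the prediction ends on Sunday 2019-03-10, 70 days after the estimation end.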

continuous.discount.factor is the continuous rate used to discount the expected residual transactions (DERT/DECT). An annual rate of (100 x d)% equals a continuous rate delta = ln(1+d). To account for time units which are not annual, the continuous rate has to be further adjusted to delta = ln(1+d)/k, where k is the number of time units in a year.
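
A minimal base-R sketch of the rate conversion, assuming weekly time units with k = 52 (a year is slightly longer than 52 weeks, so this k is an approximation to adapt). The second part only illustrates what continuous discounting does to a stream of expected transactions: the per-week expectations are made-up numbers, and this discrete sum is not how CLVTools computes DERT or DECT for any particular model.

```r
# d: annual discount rate of 10%, as in the default log(1 + 0.1)
annual.rate <- 0.10
delta.annual <- log(1 + annual.rate)        # delta = ln(1 + d)

# Weekly time units: k = 52 periods per year (assumption)
k <- 52
delta.weekly <- log(1 + annual.rate) / k    # delta = ln(1 + d) / k

# Discounting over k weekly steps equals one annual discount factor 1 / (1 + d)
one.year.factor <- exp(-delta.weekly * k)

# Illustration only: hypothetical expected transactions in each of the next 10 weeks
expected.per.week <- c(0.30, 0.28, 0.26, 0.25, 0.23, 0.22, 0.21, 0.20, 0.19, 0.18)
t <- seq_along(expected.per.week)           # evaluated at the end of week 1, ..., 10

discounted <- sum(expected.per.week * exp(-delta.weekly * t))
undiscounted <- sum(expected.per.week)      # about 2.32
discounted.zero.rate <- sum(expected.per.week * exp(-0 * t))
```

A rate of 0 leaves the sum unchanged, which mirrors the statement that DECT corresponds to CET for continuous.discount.factor=0. The weekly value could be passed as continuous.discount.factor for a model fit with time.unit="w".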

Value

An object of class data.table with columns:

`Id`	The respective customer identifier
`period.first`	First timepoint of prediction period
`period.last`	Last timepoint of prediction period
`period.length`	Number of time units covered by the period indicated by `period.first` and `period.last` (including both ends).
`PAlive`	Probability to be alive at the end of the estimation period
`CET`	The Conditional Expected Transactions: The number of transactions expected until prediction.end.
`DERT or DECT`	Discounted Expected Residual Transactions or Discounted Expected Conditional Transactions for dynamic covariates models
`actual.x`	Actual number of transactions until prediction.end. Only if there is a holdout period and the prediction ends in it, otherwise not reported.
`actual.total.spending`	Actual total spending until prediction.end. Only if there is a holdout period and the prediction ends in it, otherwise not reported.
`predicted.mean.spending`	The mean spending per transaction as predicted by the spending model.
`predicted.total.spending`	The predicted total spending until prediction.end (`CET*predicted.mean.spending`).
`predicted.CLV`	Customer Lifetime Value based on `DERT/DECT` and `predicted.mean.spending`.

If predicting for new customers (using newcustomer()), a numeric scalar indicating the expected number of transactions is returned instead.

Uncertainty Estimates

Bootstrapping is used to provide confidence intervals for all predicted metrics. These provide an estimate of parameter uncertainty. To create bootstrapped data, customer ids are sampled with replacement until the original number of customers is reached, and all transactions of the sampled customers are used to create a new clv.data object. A new model is fit on the bootstrapped data with the exact same specification as used when fitting object (incl. start parameters and 'optimx.args'), and this model is then used to predict on the bootstrapped data.
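
A base-R sketch of the resampling step on a made-up transaction log. It is not CLVTools code: how CLVTools keeps repeatedly drawn customers apart is not described here, and giving each draw its own suffixed id (for example "A_1") is an assumption of this sketch. The refit and the prediction on the bootstrapped data are not shown.

```r
set.seed(1)

# Hypothetical transaction log with 3 customers
trans <- data.frame(
  Id = c("A", "A", "B", "C", "C", "C"),
  Date = as.Date(c("2019-01-02", "2019-02-10", "2019-01-15",
                   "2019-01-03", "2019-01-20", "2019-03-01"))
)

ids <- unique(trans$Id)

# Sample customer ids with replacement until the original number of customers is reached
sampled.ids <- sample(ids, size = length(ids), replace = TRUE)

# Collect all transactions of every sampled customer; each draw gets a new id
# (assumption) so that a customer drawn twice enters as two distinct customers
boot.trans <- do.call(rbind, lapply(seq_along(sampled.ids), function(i) {
  rows <- trans[trans$Id == sampled.ids[i], ]
  rows$Id <- paste0(sampled.ids[i], "_", i)
  rows
}))

print(boot.trans)
```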

It is highly recommended to fit the original model (object) with a robust optimization method, such as Nelder-Mead (optimx.args=list(method='Nelder-Mead')). This ensures that the model can also be fit on the bootstrapped data.

All prediction parameters, including prediction.end and continuous.discount.factor, are forwarded to the prediction on the bootstrapped data. Per customer, the boundaries of the confidence interval of each predicted metric are the sample quantiles (quantile(x, probs=c((1-level)/2, 1-(1-level)/2))).
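
The per-customer interval boundaries can be reproduced with base R's quantile(). The sketch below uses simulated gamma draws as hypothetical bootstrapped CET predictions for two customers (num.boots = 100, level = 0.9); these values are not output of predict().

```r
set.seed(42)
level <- 0.9

# Hypothetical bootstrapped CET predictions: 100 repetitions for each of two customers
boot.preds <- data.frame(
  Id = rep(c("A", "B"), each = 100),
  CET = c(rgamma(100, shape = 2, rate = 1), rgamma(100, shape = 5, rate = 2))
)

# Lower and upper probabilities: (1 - level)/2 and 1 - (1 - level)/2, i.e. 0.05 and 0.95
probs <- c((1 - level) / 2, 1 - (1 - level) / 2)

# Sample quantiles per customer; one row per customer, columns "5%" and "95%"
ci <- t(sapply(split(boot.preds$CET, boot.preds$Id), quantile, probs = probs))
print(ci)
```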

See clv.bootstrapped.apply to create a custom bootstrapping procedure.

Examples




data("apparelTrans")
# Fit pnbd standard model on data, WITH holdout
apparel.holdout <- clvdata(apparelTrans, time.unit="w",
                           estimation.split=52, date.format="ymd")
apparel.pnbd <- pnbd(apparel.holdout)

# Predict until the end of the holdout period
predict(apparel.pnbd)

# Predict until 10 periods (weeks in this case) after
#   the end of the 52-week fitting period
predict(apparel.pnbd, prediction.end = 10) # ends on 2010-11-28

# Predict until 31st Dec 2016 with the timepoint as a character
predict(apparel.pnbd, prediction.end = "2016-12-31")

# Predict until 31st Dec 2016 with the timepoint as a Date
predict(apparel.pnbd, prediction.end = lubridate::ymd("2016-12-31"))


# Predict future transactions but not spending and CLV
predict(apparel.pnbd, predict.spending = FALSE)

# Predict spending by fitting a Gamma-Gamma model
predict(apparel.pnbd, predict.spending = gg)

# Fit a spending model separately and use it to predict spending
apparel.gg <- gg(apparel.holdout, remove.first.transaction = FALSE)
predict(apparel.pnbd, predict.spending = apparel.gg)


# Fit pnbd standard model WITHOUT holdout
pnc <- pnbd(clvdata(apparelTrans, time.unit="w", date.format="ymd"))

# This fails, because without holdout, a prediction.end is required
## Not run: 
predict(pnc)

## End(Not run)

# But it works if providing a prediction.end
predict(pnc, prediction.end = 10) # ends on 2016-12-17

# Predict the number of transactions a single, fictional, average new
# customer is expected to make in the first 3.45 weeks since coming alive
# See ?newcustomer for more examples
predict(apparel.pnbd, newdata = newcustomer(num.periods=3.45))

CLVTools documentation built on April 4, 2025, 2:02 a.m.