This guide will show you how to easily integrate your own models into the CLVTools package. As an example, we will integrate the BG/NBD model.
First, create a new R script with a class called clv.model.{your-model-name} where the part in the brackets is the name of your model. An instance of this class is stored in the clv.fitted object and will provide most of the model-specific functionalities (strategy pattern). In the class definition, make sure to inherit from clv.model.with.correlation or clv.model.no.correlation, depending on whether your model supports correlation between the purchase and attrition process. See also the class documentation ?`clv.model-class`.
setClass(Class = "clv.model.bgnbd.no.cov", contains = "clv.model.no.correlation")
#' @importFrom methods new
clv.model.bgnbd.no.cov <- function(){
return(new("clv.model.bgnbd.no.cov",
name.model = "BG/NBD Standard",
names.original.params.model = c(r="r", alpha="alpha", a="a", b="b"),
names.prefixed.params.model = c("log.r", "log.alpha", "log.a", "log.b"),
start.params.model = c(r=1, alpha = 3, a = 1, b = 3)))
}
During the model fitting process, the model class is used frequently to execute model-specific steps. You will need to implement the following methods on your class:
clv.model.check.input.args: Perform whatever checks your model requires. This mostly consists of validating the user-supplied input parameters.
#' @include all_generics.R
setMethod(f = "clv.model.check.input.args", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, clv.fitted, start.params.model, optimx.args, verbose, ...){
  err.msg <- c()

  # Have to be > 0 as will be logged
  if(any(start.params.model <= 0)){
    err.msg <- c(err.msg, "Please provide only model start parameters greater than 0 as they will be log()-ed for the optimization!")
  }

  check_err_msg(err.msg)
})
clv.model.put.estimation.input: Used to store additional inputs or data in the model object. This method is also used by the clv.model.with.correlation superclass to store whether correlation is estimated. If your model inherits from it, the parent method has to be called here before your additional code runs; a sketch of that case follows the code below.
setMethod(f = "clv.model.put.estimation.input", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, ...){
# nothing to put specifically for this model
return(clv.model)
})
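If your model instead inherits from clv.model.with.correlation, a minimal sketch could look as follows. This is illustrative only: the class name clv.model.mymodel.cor is a hypothetical placeholder, and callNextMethod() simply delegates to the parent method described above before any model-specific code runs.
#' @importFrom methods callNextMethod
setMethod(f = "clv.model.put.estimation.input", signature = signature(clv.model="clv.model.mymodel.cor"), definition = function(clv.model, ...){
  # Let the parent class (clv.model.with.correlation) store whether
  # correlation is estimated before adding anything model-specific
  clv.model <- callNextMethod()

  # ... store additional, model-specific estimation inputs here ...

  return(clv.model)
})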
clv.model.transform.start.params.model: Used to apply transformations to the model start parameters before the estimation is started; this can be used to constrain the parameters during the optimization. A common transformation pair is log()/exp(), which constrains the parameters to be > 0 during the actual LL calculations.
#' @importFrom stats setNames
setMethod("clv.model.transform.start.params.model", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, original.start.params.model){
  # Log all user given or default start params
  return(setNames(log(original.start.params.model[clv.model@names.original.params.model]),
                  clv.model@names.prefixed.params.model))
})
clv.model.backtransform.estimated.params.model: Apply transformations to the estimated model parameters after the estimation has finished. This is usually used to revert the transformation first applied in clv.model.transform.start.params.model; a short round-trip illustration follows the code below.
#' @importFrom stats setNames
setMethod("clv.model.backtransform.estimated.params.model", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, prefixed.params.model){
  # exp all prefixed params
  return(exp(prefixed.params.model[clv.model@names.prefixed.params.model]))
})
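As a quick check (illustrative only, not part of the package code; run it within your package's development environment, e.g. after devtools::load_all(), because these generics are internal), the two methods are inverses of each other: transforming the default start parameters and then back-transforming them recovers the original values, named according to names.prefixed.params.model.
model <- clv.model.bgnbd.no.cov()

# log()-ed start parameters, named log.r, log.alpha, log.a, log.b
logged <- clv.model.transform.start.params.model(model, original.start.params.model = c(r = 1, alpha = 3, a = 1, b = 3))

# exp()-ing them recovers the values 1, 3, 1, 3 (still under the prefixed names)
clv.model.backtransform.estimated.params.model(model, prefixed.params.model = logged)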
clv.model.prepare.optimx.args: This is where you prepare the arguments for the call to optimx::optimx. Most arguments, notably those for the interlayers, are already in the prepared.optimx.args list, and only the ones specific to your model have to be added. This includes your log-likelihood (LL) functions, both on the individual (LL.function.ind) and on the aggregated level (LL.function.sum), and the data they require as input (such as your Customer-By-Sufficiency (CBS) data).
CAUTION: Your parameters are likely at log-scale, which needs to be taken into account in your LL function (i.e. apply exp()), as otherwise incorrect results will be produced. See the sketch after the code below.
setMethod(f = "clv.model.prepare.optimx.args", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, clv.fitted, prepared.optimx.args,...){
# Only add LL function args, everything else is prepared already, incl. start parameters
optimx.args <- modifyList(prepared.optimx.args,
list(LL.function.sum = bgnbd_nocov_LL_sum,
LL.function.ind = bgnbd_nocov_LL_ind, # if doing correlation
obj = clv.fitted,
vX = clv.fitted@cbs$x,
vT_x = clv.fitted@cbs$t.x,
vT_cal = clv.fitted@cbs$T.cal,
# parameter ordering for the callLL interlayer
LL.params.names.ordered = c(log.r = "log.r",log.alpha = "log.alpha", log.a = "log.a", log.b = "log.b")),
keep.null = TRUE)
return(optimx.args)
})
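To make the CAUTION above concrete, here is a minimal sketch of what a summed LL function could look like when the parameters are optimized at log-scale. This is not the CLVTools implementation (bgnbd_nocov_LL_sum is implemented in C++); the function name, the placeholder likelihood and the assumption that the optimization minimizes the returned value are for illustration only.
my_model_LL_sum <- function(vLogparams, vX, vT_x, vT_cal){
  # The optimizer works on the prefixed, log()-ed parameters,
  # so they have to be exp()-ed before evaluating the likelihood
  r     <- exp(vLogparams[1])  # "log.r"     -> r
  alpha <- exp(vLogparams[2])  # "log.alpha" -> alpha
  a     <- exp(vLogparams[3])  # "log.a"     -> a
  b     <- exp(vLogparams[4])  # "log.b"     -> b

  # ... compute the per-customer log-likelihood from r, alpha, a, b
  #     and the CBS columns vX, vT_x, vT_cal here ...
  vLL <- numeric(length(vX))  # placeholder for the individual log-likelihoods

  # return the negative summed LL (assuming the optimization minimizes it)
  return(-sum(vLL))
}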
clv.model.process.post.estimation: After the estimation has finished, this method is called to execute additional steps or to store extra data in the clv.fitted object.
setMethod("clv.model.process.post.estimation", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, clv.fitted, res.optimx){
# No additional step needed (i.e. store model specific stuff, extra process)
return(clv.fitted)
})
clv.model.process.newdata: This method is called whenever the parameter newdata is passed to either predict or plot, to execute steps that are required before predict or plot can proceed with the new data. Most often this includes re-calculating the model statistics (i.e. the CBS) for the new transaction data and customers. Note that the newdata object is not passed as an argument but can be found in the clv.data slot of clv.fitted, where it has already replaced the previous data object.
setMethod(f = "clv.model.process.newdata", signature = signature(clv.model = "clv.model.bgnbd.no.cov"), definition = function(clv.model, clv.fitted, verbose){
# clv.data in clv.fitted is already replaced with newdata here
# Need to only redo cbs if given new data
clv.fitted@cbs <- bgnbd_cbs(clv.data = clv.fitted@clv.data)
return(clv.fitted)
})
clv.model.vcov.jacobi.diag: Creates the diagonal matrix needed to correct the vcov for the transformations applied to the parameters during the optimization process. The vcov is derived from the inverse of the hessian, but because the parameters can be transformed during the optimization, this vcov is "at the wrong scale": if the parameters are log()-ed, the vcov estimates (as H^(-1)) are also at log-scale, i.e. they are the variances of the log-parameters. The output of this method is multiplied with the inverse of the hessian matrix to obtain the vcov matrix at the correct scale. A short numeric illustration follows the code below.
setMethod(f = "clv.model.vcov.jacobi.diag", signature = signature(clv.model="clv.model.bgnbd.no.cov"), definition = function(clv.model, clv.fitted, prefixed.params){
# Create matrix with the full required size
m.diag <- diag(x = 0, ncol = length(prefixed.params), nrow=length(prefixed.params))
rownames(m.diag) <- colnames(m.diag) <- names(prefixed.params)
# Add the transformations for the model to the matrix
# All model params need to be exp()
m.diag[clv.model@names.prefixed.params.model,
clv.model@names.prefixed.params.model] <- diag(x = exp(prefixed.params[clv.model@names.prefixed.params.model]),
nrow = length(clv.model@names.prefixed.params.model),
ncol = length(clv.model@names.prefixed.params.model))
return(m.diag)
})
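As a brief numeric illustration of why this correction works (a standard delta-method argument with hypothetical numbers, not package code): if a single parameter theta = log(beta) was optimized and var(theta) was obtained from the inverse hessian, the variance at the original scale is approximately exp(theta) * var(theta) * exp(theta), which is what the diagonal entries exp(prefixed.params) achieve for the full parameter vector.
# Illustration only, with hypothetical numbers
theta     <- log(2)      # estimate at log-scale, i.e. beta = exp(theta) = 2
var.theta <- 0.01        # variance of theta, e.g. from the inverse hessian
jac       <- exp(theta)  # d beta / d theta, the diagonal entry built above

var.beta <- jac * var.theta * jac
var.beta  # approximate variance of beta at the original scale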
Create a new R script with a class called clv.{your-model-name} where the part in the brackets is the name of your model. In the class definition, make sure to inherit from clv.fitted.transactions (transaction models) or clv.fitted.spending (spending models) in order to get the basic functionality. An instance of this class is returned to the user to use in other methods (summary, predict, ...) and it contains all required data, including instances of the clv.model and clv.data classes, and the CBS. It is already created when model fitting is initiated and is then used to perform covariate- or no-covariate-specific steps during the estimation process. See also the class documentation ?`clv.fitted.transactions-class` and ?`clv.fitted.spending-class`.
#' @keywords internal
#' @importFrom methods setClass
#' @include class_clv_model_bgnbd.R class_clv_data.R class_clv_fitted_transactions.R
setClass(Class = "clv.bgnbd", contains = "clv.fitted.transactions",
         slots = c(
           cbs = "data.table"),

         # Prototype is labeled not useful anymore, but still recommended by Hadley / Bioc
         prototype = list(
           cbs = data.table()))
We now need a method to generate a CBS matrix, as well as a constructor for the class. We will call the CBS-generating method from the constructor and return an instance of clv.{your-model-name}. The instance has to be created with an instance of clv.data that contains the transaction data and an instance of clv.model that is used for most of the model-specific steps.
Constructor:
clv.bgnbd <- function(cl, clv.data){

  dt.cbs.bgnbd <- bgnbd_cbs(clv.data = clv.data)
  clv.model <- clv.model.bgnbd.no.cov()

  return(new("clv.bgnbd",
             clv.fitted.transactions(cl=cl, clv.model=clv.model, clv.data=clv.data),
             cbs = dt.cbs.bgnbd))
}
CBS method: Method to create the Customer-By-Sufficiency (CBS) data.table from the transaction data in the estimation period.
bgnbd_cbs <- function(clv.data){
  Date <- Price <- x <- date.first.actual.trans <- date.last.transaction <- NULL

  # Customer-By-Sufficiency (CBS) Matrix
  #   Only for transactions in calibration period
  #   Only repeat transactions are relevant
  #
  #   For every customer:
  #     x:     Number of repeat transactions := Number of actual transactions - 1
  #     t.x:   Time between first actual and last transaction
  #     T.cal: Time between first actual transaction and end of calibration period
  #
  #   All time is expressed in time units

  trans.dt <- clv.data.get.transactions.in.estimation.period(clv.data = clv.data)

  # Initial cbs, for every Id a row
  cbs <- trans.dt[, list(x = .N,
                         date.first.actual.trans = min(Date),
                         date.last.transaction = max(Date)),
                  by="Id"]

  # Only repeat transactions -> Number of transactions - 1
  cbs[, x := x - 1]

  # t.x, T.cal
  cbs[, ':='(t.x   = clv.time.interval.in.number.tu(clv.time=clv.data@clv.time, interv=interval(start = date.first.actual.trans, end = date.last.transaction)),
             T.cal = clv.time.interval.in.number.tu(clv.time=clv.data@clv.time, interv=interval(start = date.first.actual.trans, end = clv.data@clv.time@timepoint.estimation.end)))]
  cbs[, date.last.transaction := NULL]

  setkeyv(cbs, c("Id", "date.first.actual.trans"))
  setcolorder(cbs, c("Id", "x", "t.x", "T.cal", "date.first.actual.trans"))

  return(cbs)
}
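To inspect the CBS in isolation (illustrative only; this must be run within your package's development environment, e.g. after devtools::load_all(), because bgnbd_cbs and the helpers it calls are internal), one could do the following with the cdnow data that is also used further below:
data("cdnow")
clv.cdnow <- clvdata(cdnow, date.format = "ymd", time.unit = "week", estimation.split = "1997-09-30")

# One row per customer with columns Id, x, t.x, T.cal, date.first.actual.trans
head(bgnbd_cbs(clv.data = clv.cdnow))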
We now have both a model class and a fitted class; the missing piece of the puzzle is the entry point that makes use of the functionality we implemented. We use a generic method for this purpose because the same method might be applied to clv.data objects with and without covariate data. Create a new R script called f_interface_{your-model-name}.R with the following contents:
#' @exportMethod bgnbd
setGeneric("bgnbd", def = function(clv.data, start.params.model=c(), optimx.args=list(), verbose=TRUE, ...)
  standardGeneric("bgnbd"))

#' @rdname bgnbd
#' @include class_clv_data.R
setMethod("bgnbd", signature = signature(clv.data="clv.data"), definition = function(clv.data,
                                                                                     start.params.model=c(),
                                                                                     optimx.args=list(),
                                                                                     verbose=TRUE, ...){
  check_err_msg(check_user_data_emptyellipsis(...))

  cl  <- match.call(call = sys.call(-1), expand.dots = TRUE)
  obj <- clv.bgnbd(cl=cl, clv.data=clv.data)

  return(clv.template.controlflow.estimate(clv.fitted=obj, start.params.model = start.params.model,
                                            optimx.args = optimx.args, verbose=verbose))
})
Note that this is also the place where you would provide entry points for covariate versions of your model, as sketched below.
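For instance, a static-covariate version could dispatch on the clv.data.static.covariates class. The sketch below is only an outline: the fitted-class constructor clv.bgnbd.static.cov is hypothetical (it is not defined in this guide), and a real covariate model would additionally accept covariate-specific arguments (e.g. start parameters for the covariates).
#' @rdname bgnbd
setMethod("bgnbd", signature = signature(clv.data="clv.data.static.covariates"), definition = function(clv.data,
                                                                                                       start.params.model=c(),
                                                                                                       optimx.args=list(),
                                                                                                       verbose=TRUE, ...){
  check_err_msg(check_user_data_emptyellipsis(...))

  cl  <- match.call(call = sys.call(-1), expand.dots = TRUE)

  # Hypothetical covariate-specific fitted class, analogous to clv.bgnbd
  obj <- clv.bgnbd.static.cov(cl=cl, clv.data=clv.data)

  return(clv.template.controlflow.estimate(clv.fitted=obj, start.params.model = start.params.model,
                                            optimx.args = optimx.args, verbose=verbose))
})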
If you set up your model correctly, you should be able to estimate parameters:
library("CLVTools")
data("cdnow")
clv.apparel <- clvdata(cdnow,
date.format="ymd",
time.unit = "week",
estimation.split = "1997-09-30",
name.id = "Id",
name.date = "Date",
name.price = "Price")
est.bgnbd <- bgnbd(clv.data = clv.apparel, start.params.model = c(r = 1, alpha = 3, a = 1, b= 3),
optimx.args = list(control=list(trace=5) ))
summary(est.bgnbd)
The output should look similar to this:
BG/NBD Standard Model
Call:
bgnbd(clv.data = clv.apparel, start.params.model = c(r = 1, alpha = 3,
a = 1, b = 3), optimx.args = list(control = list(trace = 5)))
Fitting period:
Estimation start 1997-01-01
Estimation end 1997-09-30
Estimation length 38.8571 Weeks
Coefficients:
Estimate Std. Error z-val Pr(>|z|)
r 0.24259 0.05176 4.687 2.78e-06 ***
alpha 4.41360 0.08570 51.504 < 2e-16 ***
a 0.79292 0.23424 3.385 0.000712 ***
b 2.42589 0.29078 8.343 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Optimization info:
LL -9582.4292
AIC 19172.8584
BIC 19195.9190
KKT 1 TRUE
KKT 2 TRUE
fevals 32.0000
Method BFGS
Used Options:
Correlation FALSE
In your model class, create the method clv.model.expectation that is used to calculate the unconditional expectation required for the transaction tracking plot:
#' @include all_generics.R
setMethod("clv.model.expectation", signature(clv.model="clv.model.bgnbd.no.cov"), function(clv.model, clv.fitted, dt.expectation.seq, verbose){
  r <- alpha <- a <- b <- date.first.repeat.trans <- date.first.actual.trans <- T.cal <- t_i <- period.first.trans <- NULL

  params_i <- clv.fitted@cbs[, c("Id", "T.cal", "date.first.actual.trans")]

  params_i[, r     := clv.fitted@prediction.params.model[["r"]]]
  params_i[, alpha := clv.fitted@prediction.params.model[["alpha"]]]
  params_i[, a     := clv.fitted@prediction.params.model[["a"]]]
  params_i[, b     := clv.fitted@prediction.params.model[["b"]]]

  fct.bgnbd.expectation <- function(params_i.t){
    term1 <- params_i.t[, (a + b - 1)/(a - 1)]
    term2 <- params_i.t[, (alpha/(alpha + t_i))^r]
    term3 <- params_i.t[, vec_gsl_hyp2f1_e(r, b, a+b-1, t_i/(alpha+t_i))$value]

    return(term1 * (1 - term2 * term3))
  }

  return(DoExpectation(dt.expectation.seq = dt.expectation.seq, params_i = params_i,
                       fct.expectation = fct.bgnbd.expectation, clv.time = clv.fitted@clv.data@clv.time))
})
In your model class, create the method clv.model.predict:
#' @include all_generics.R
setMethod("clv.model.predict", signature(clv.model="clv.model.bgnbd.no.cov"), function(clv.model, clv.fitted, dt.predictions, verbose, continuous.discount.factor, ...){
  r <- alpha <- a <- b <- period.length <- CET <- PAlive <- DERT <- i.CET <- i.PAlive <- i.DERT <- x <- t.x <- T.cal <- NULL

  predict.number.of.periods <- dt.predictions[1, period.length]

  # To ensure sorting, do everything in a single table
  dt.result <- copy(clv.fitted@cbs[, c("Id", "x", "t.x", "T.cal")])

  # Add CET
  dt.result[, CET := bgnbd_nocov_CET(r = clv.fitted@prediction.params.model[["r"]],
                                     alpha = clv.fitted@prediction.params.model[["alpha"]],
                                     a = clv.fitted@prediction.params.model[["a"]],
                                     b = clv.fitted@prediction.params.model[["b"]],
                                     dPeriods = predict.number.of.periods,
                                     vX = x,
                                     vT_x = t.x,
                                     vT_cal = T.cal)]

  # Add PAlive
  dt.result[, PAlive := bgnbd_nocov_PAlive(r = clv.fitted@prediction.params.model[["r"]],
                                           alpha = clv.fitted@prediction.params.model[["alpha"]],
                                           a = clv.fitted@prediction.params.model[["a"]],
                                           b = clv.fitted@prediction.params.model[["b"]],
                                           vX = x,
                                           vT_x = t.x,
                                           vT_cal = T.cal)]

  # Add results to prediction table, by matching Id
  dt.predictions[dt.result, CET := i.CET, on = "Id"]
  dt.predictions[dt.result, PAlive := i.PAlive, on = "Id"]

  return(dt.predictions)
})
If you have implemented all of the above-mentioned classes and methods, you will be able to make use of the implemented functionality:
library("CLVTools")
data("cdnow")
clv.apparel <- clvdata(cdnow,
date.format="ymd",
time.unit = "week",
estimation.split = "1997-09-30",
name.id = "Id",
name.date = "Date",
name.price = "Price")
est.bgnbd <- bgnbd(clv.data = clv.apparel, start.params.model = c(r = 1, alpha = 3, a = 1, b= 3),
optimx.args = list(control=list(trace=5) ))
summary(est.bgnbd)
coef(est.bgnbd)
predict(est.bgnbd, prediction.end = "2011-12-31")
plot(est.bgnbd)