Description
Takes two CBT or CBS matrices and ensures that the second one has the same row names as the first.
Usage
dc.MergeCustomers(data.correct, data.to.correct)
Arguments
data.correct
CBT or CBS with the correct customer IDs as row names. Usually from the calibration period.
data.to.correct
CBT or CBS that needs to be fixed (missing customer IDs inserted). Usually from the holdout period.
Details
Care should be taken in using this function. It inserts zero values into every field of each row (customer) that was not in the original holdout period data. This does not cause a problem for CBT matrices, but it does for CBS matrices: for example, every inserted customer is reported as having a holdout period length (T.star) of zero. This particular issue is easily fixed, as shown in the examples.
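To make the zero-fill behaviour concrete, here is a minimal base R sketch of the row alignment described above. It is not the BTYD source: the helper `align.rows` and the toy matrices are hypothetical, and the sketch assumes customer IDs are stored as row names. Customers missing from the holdout data receive rows of zeros, including `T.star`, which is then reset to the common holdout length in the same way as in the examples.

```r
# Hypothetical helper (not part of BTYD): return a matrix with one row per
# customer in `data.correct`, in the same order, copying rows from
# `data.to.correct` where the customer ID exists and filling zeros otherwise.
align.rows <- function(data.correct, data.to.correct) {
  ids <- rownames(data.correct)
  out <- matrix(0, nrow = length(ids), ncol = ncol(data.to.correct),
                dimnames = list(ids, colnames(data.to.correct)))
  common <- intersect(ids, rownames(data.to.correct))
  out[common, ] <- as.matrix(data.to.correct)[common, , drop = FALSE]
  out
}

# Toy calibration CBS with 3 customers (columns filled one at a time).
cal.cbs <- matrix(c(2, 0, 1,     # x
                    10, 0, 4,    # t.x
                    39, 39, 39), # T.cal
                  nrow = 3,
                  dimnames = list(c("1", "2", "3"), c("x", "t.x", "T.cal")))
# Toy holdout CBS: only customer "2" purchased during the holdout period.
holdout.cbs <- matrix(c(1, 39), nrow = 1,
                      dimnames = list("2", c("x.star", "T.star")))

fixed <- align.rows(cal.cbs, holdout.cbs)
fixed[, "T.star"]  # customers "1" and "3" were zero-filled, so T.star = 0

# T.star is the same for every customer in the holdout period, so restore it:
fixed[, "T.star"] <- max(fixed[, "T.star"])
```

The zero fill is harmless for counts such as `x.star` (a customer absent from the holdout data made no holdout purchases), which is why only period-length fields like `T.star` need repairing.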
A work-around that avoids this function is presented in the example for dc.BuildCBSFromCBTAndDates: build the full CBT, use only the columns that apply to each time period to construct separate CBTs, and build the CBSs from those. That method is much cleaner and less error-prone; however, the data are occasionally not available in event log format, and you may not be able to construct a CBT covering both time periods together.
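The work-around can be sketched in base R without calling any BTYD functions. The toy matrix `full.cbt` is hypothetical; the sketch assumes a CBT whose column names are dates, as the CBTs in the examples below have. Splitting its columns at the cutoff date yields calibration and holdout CBTs that share the same customer rows by construction, so no merging step is needed.

```r
# Hypothetical full-period CBT: rows are customers, columns are dates
# (as "YYYY-MM-DD" strings), entries are transaction counts.
full.cbt <- matrix(c(1, 0, 2,   # 1997-01-05
                     0, 1, 0,   # 1997-06-10
                     1, 0, 0,   # 1997-11-02
                     0, 0, 1),  # 1998-03-15
                   nrow = 3,
                   dimnames = list(c("1", "2", "3"),
                                   c("1997-01-05", "1997-06-10",
                                     "1997-11-02", "1998-03-15")))

cutoff.date <- as.Date("1997-09-30")
col.dates <- as.Date(colnames(full.cbt))

# Split the columns at the cutoff; both pieces keep every customer row.
cal.cbt     <- full.cbt[, col.dates <= cutoff.date, drop = FALSE]
holdout.cbt <- full.cbt[, col.dates >  cutoff.date, drop = FALSE]

identical(rownames(cal.cbt), rownames(holdout.cbt))  # TRUE
rowSums(holdout.cbt)  # customer "2" is kept with zero holdout transactions
```

Because the rows never diverge, customers who made no holdout purchases are represented by genuine zero counts rather than by rows inserted after the fact.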
Value
Updated holdout period CBT or CBS, with the same customers in the same order as data.correct.
Examples
# Read the CDNOW event log: customer ID in column 2, date in column 3,
# sales in column 5.
elog <- dc.ReadLines(system.file("data/cdnowElog.csv", package="BTYD"), 2, 3, 5)
elog[,"date"] <- as.Date(elog[,"date"], "%Y%m%d")
cutoff.date <- as.Date("1997-09-30")
cal.elog <- elog[which(elog[,"date"] <= cutoff.date),]
holdout.elog <- elog[which(elog[,"date"] > cutoff.date),]
# Create calibration period CBT from cal.elog
cal.reach.cbt <- dc.CreateReachCBT(cal.elog)
# Create holdout period CBT from holdout.elog
holdout.reach.cbt <- dc.CreateReachCBT(holdout.elog)
# Note the difference:
nrow(cal.reach.cbt) # 2357 customers
nrow(holdout.reach.cbt) # 684 customers
# Create a "fixed" holdout period CBT, with the same number
# of customers in the same order as the calibration period CBT
fixed.holdout.reach.cbt <- dc.MergeCustomers(cal.reach.cbt, holdout.reach.cbt)
nrow(fixed.holdout.reach.cbt) # 2357 customers
# You can verify that the above is correct by turning these into a CBS
# (see dc.BuildCBSFromCBTAndDates) and using
# pnbd.PlotFreqVsConditionalExpectedFrequency, for example.
# Alternatively, we can fix the CBS instead of the CBT:
cal.start.dates.indices <- dc.GetFirstPurchasePeriodsFromCBT(cal.reach.cbt)
cal.start.dates <- as.Date(colnames(cal.reach.cbt)[cal.start.dates.indices])
cal.end.dates.indices <- dc.GetLastPurchasePeriodsFromCBT(cal.reach.cbt)
cal.end.dates <- as.Date(colnames(cal.reach.cbt)[cal.end.dates.indices])
T.cal.total <- rep(cutoff.date, nrow(cal.reach.cbt))
cal.dates <- data.frame(cal.start.dates, cal.end.dates, T.cal.total)
# Create calibration period customer-by-sufficient-statistic data frame,
# using weeks as the unit of time.
cal.cbs <- dc.BuildCBSFromCBTAndDates(cal.reach.cbt,
                                      cal.dates,
                                      per="week",
                                      cbt.is.during.cal.period=TRUE)
# Force the calibration period customer-by-sufficient-statistic to only
# contain repeat transactions (required by BG/BB and Pareto/NBD models)
cal.cbs[,"x"] <- cal.cbs[,"x"] - 1
holdout.start <- cutoff.date+1
holdout.end <- as.Date(colnames(fixed.holdout.reach.cbt)[ncol(fixed.holdout.reach.cbt)])
holdout.dates <- c(holdout.start, holdout.end)
# Create holdout period customer-by-sufficient-statistic data frame,
# using weeks as the unit of time.
holdout.cbs <- dc.BuildCBSFromCBTAndDates(holdout.reach.cbt,
                                          holdout.dates,
                                          per="week",
                                          cbt.is.during.cal.period=FALSE)
# Note the difference:
nrow(cal.cbs) # 2357 customers
nrow(holdout.cbs) # 684 customers
# Create a "fixed" holdout period CBS, with the same number
# of customers in the same order as the calibration period CBS
fixed.holdout.cbs <- dc.MergeCustomers(cal.cbs, holdout.cbs)
nrow(fixed.holdout.cbs) # 2357 customers
# Furthermore, this function will assign a zero value to all fields
# that were not in the original holdout period CBS. Since T.star is the
# same for all customers in the holdout period, we should fix that:
fixed.holdout.cbs[,"T.star"] <- rep(max(fixed.holdout.cbs[,"T.star"]),nrow(fixed.holdout.cbs))