BTYD-package: Implementing Buy 'Til You Die Models

Description Details Author(s) References


This package contains functions for data preparation, parameter estimation, scoring, and plotting for the BG/BB, BG/NBD and Pareto/NBD models.


Package: BTYD
Type: Package
Version: 2.4
Date: 2014-11-06
License: GPL-3

This project was funded and sponsored by the Wharton Customer Analytics Initiative (

This package implements the BG/BB, BG/NBD and Pareto/NBD models, which capture/project customer purchase patterns in a typical non-contractual setting.

While these models are developed on a customer-by-customer basis, they do not necessarily require data at such a granular level. The Pareto/NBD requires a “customer-by-sufficient-statistic” matrix (CBS), which consists of each customer's frequency, recency (the time of their last transactions) and total time observed - but the timing of each and every transaction (other than the last) is not needed by the model. If, however, you do have the granular data in the form of an event log (which contains at least columns for customer identification and the time of each transaction, and potentially more columns such as transaction amount), this package provides functions to convert it to a CBS. You can use dc.ReadLines to get your event log from a comma-delimited file to an event log usable by this package; it is possible to use read.table or read.csv, but formatting will be required afterwards. You can then convert the event log directly to a CBS (for both the calibration and holdout periods) using dc.ElogToCbsCbt. As the name suggests, this function also produces a customer-by-time matrix (CBT). This matrix consists of a row for every customer and a column for every date, and is populated by a statistic of your choice (reach, frequency, or spend). It is not necessary for any of the models presented in this package, but is used as a building block to produce the CBS.

The BG/NBD model requires all the same inputs as the Pareto/NBD model.

The BG/BB model requires the same information as the Pareto/NBD model, but as it models discrete transaction opportunities, this information can be condensed into a recency-frequency matrix. A recency-frequency matrix contains a row for every recency/frequency combination in the given time period, and each row contains the number of customers with that recency/frequency combination. Since frequency will always be less than or equal to recency, this matrix will contain (n)(n-1)/2 + 1 rows at most, with n as the number of transaction opportunities (of course, the maximum number of rows for pooled data - for customers with varying numbers of transaction opportunities - will be the sum of the above equation for each unique number of transaction opportunities). You can convert a CBS to recency-frequency matrices using dc.MakeRFmatrixCal and dc.MakeRFmatrixHoldout.

If you want to test the data contained in the package, or have data formatted as a customer-by-sufficient-statistic or recency-frequency matrix, a good starting place would be pnbd.EstimateParameters, bgnbd.EstimateParameters, or bgbb.EstimateParameters.

Following that, pnbd.PlotFrequencyInCalibration, bgnbd.PlotFrequencyInCalibration and bgbb.PlotFrequencyInCalibration will give a check that the model fits the data in-sample. Further plotting functions, comparing actual and expected results, are labelled “pnbd.Plot...”, “bgnbd.Plot...” and “bgbb.Plot...”. The building blocks of these functions are also provided: pnbd.LL, bgnbd.LL bgbb.LL, pnbd.pmf, bgnbd.pmf, bgbb.pmf, pnbd.Expectation, bgnbd.Expectation, bgbb.Expectation, pnbd.ConditionalExpectedTransactions, bgnbd.ConditionalExpectedTransactions, and bgbb.ConditionalExpectedTransactions may be of particular interest.

This package uses the following conventions:

The time period used to estimate the model parameters is called the calibration period. Users may be accustomed to this being called the estimation period, or simply being referred to as “in-sample”. Function parameter names generally follow this convention: for example, “” is used to refer to the number of transaction opportunities in the calibration period.

The time period used to validate model performance is called the holdout period. Users may be accustomed to this being called the validation period, or simply being referred to as “out-of-sample”. Function parameters relating to this time period are generally appended with “.star”. For example, is used to refer to the number of transaction opportunities in the holdout period.

As described in the papers referenced below, the BG/BB, BG/NBD and Pareto/NBD models are generally concerned with repeat transactions, not total transactions. This means that a customer's first transaction in the calibration period is usually not part of the data being modelled - this is due to the fact that a new customer generally does not show up “on the comapany's radar” until after their first purchase has taken place. This means that the modal number of repeat purchases tends to be zero. If your data does not have a relatively large number of customers with zero transactions, but does have a relatively large number of customers with one transaction, and the estimation functions are struggling, the problem is most likely that you are including customers' very first transactions. Some of the data-conversion functions have examples illustrating how to work with data that includes this very first transaction. Note that this does not apply to the holdout period; in the holdout period, we already know about the customer and take all of their previous transactions into account.


Lukasz Dziurzynski, Daniel McCarthy, Edward Wadsworth

Contributors: Peter Fader, Elea McDonnell Feit, Bruce Hardie, Arun Gopalakrishnan, Eric Schwartz, Yao Zhang, Elea McDonnell Feit

Maintainer: Daniel McCarthy <[email protected]>


See for papers, notes, and datasets relating to applied probability models in marketing.

Fader, Peter S., and Bruce G.S. Hardie. “A Note on Deriving the Pareto/NBD Model and Related Expressions.” November. 2005. Web.

Fader, Peter S., Bruce G.S. Hardie, and Ka L. Lee. “RFM and CLV: Using Iso-Value Curves for Customer Base Analysis.” Journal of Marketing Research Vol.42, pp.415-430. November. 2005.

Fader, Peter S., and Bruce G.S. Hardie. “Deriving an Expression for P (X(t) = x) Under the Pareto/NBD Model.” September. 2006. Web.

Fader, Peter S., and Bruce G.S. Hardie. “Creating an RFM summary using Excel.” December. 2008. Web.

Fader, Peter S., Bruce G.S. Hardie, and Jen Shang. “Customer-Base Analysis in a Discrete-Time Noncontractual Setting.” Marketing Science 29(6), pp. 1086-1108. 2010. INFORMS.

Jerath, Kinshuk, Peter S. Fader, and Bruce G.S. Hardie. “Customer-Base Analysis on a 'Data Diet': Model Inference Using Repeated Cross-Sectional Summary (RCSS) Data.” June. 2011. Available at SSRN: or

Fader, Peter S., Bruce G.S. Hardie, and Ka L. Lee. ““Counting Your Customers” the Easy Way: An Alternative to the Pareto/NBD Model.” Marketing Science Vol.24, pp.275-284. Spring. 2005.

Fader, Peter S., Hardie, Bruce G.S., and Lee, Ka Lok. “Computing P(alive) Using the BG/NBD Model.” December. 2008. Web.

BTYD documentation built on May 29, 2017, 10:42 p.m.