upliftRF: Uplift Random Forests
In uplift: Uplift Modeling


upliftRF implements Random Forests with split criteria designed for binary uplift modeling tasks.

## S3 method for class 'formula'
upliftRF(formula, data, ...)

## Default S3 method:
upliftRF(
x,  
y,  
ct, 
mtry = floor(sqrt(ncol(x))),
ntree = 100, 
split_method = c("ED", "Chisq", "KL", "L1", "Int"),
interaction.depth = NULL,
bag.fraction = 0.5,
minsplit = 20, 
minbucket_ct0 = round(minsplit/4), 
minbucket_ct1 = round(minsplit/4), 
keep.inbag = FALSE,
verbose = FALSE,
...)  

## S3 method for class 'upliftRF'
print(x, ...)

`data`	a data frame containing the variables in the model. It should include a variable reflecting the binary treatment assignment of each observation (coded as 0/1).
`x, formula`	a data frame of predictors or a formula describing the model to be fitted. A special term of the form `trt()` must be used in the model equation to identify the binary treatment variable. For example, if the treatment is represented by a variable named `treat`, then the right-hand side of the formula must include the term `+ trt(treat)`.
`y`	a binary response (numeric) vector.
`ct`	a binary (numeric) vector representing the treatment assignment (coded as 0/1).
`mtry`	the number of variables to be tested in each node; the default is floor(sqrt(ncol(x))).
`ntree`	the number of trees to generate in the forest; default is ntree = 100.
`split_method`	the split criterion used at each node of each tree. Possible values are: "ED" (Euclidean distance), "Chisq" (Chi-squared divergence), "KL" (Kullback-Leibler divergence), "L1" (L1-norm divergence), and "Int" (Interaction method).
`interaction.depth`	the maximum depth of variable interactions. 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc. The default is to grow trees to maximal depth, subject to the constraints set by `minsplit`, `minbucket_ct0`, and `minbucket_ct1`.
`bag.fraction`	the fraction of the training set observations randomly selected for the purpose of fitting each tree in the forest.
`minsplit`	the minimum number of observations that must exist in a node in order for a split to be attempted.
`minbucket_ct0`	the minimum number of control observations in any terminal (leaf) node.
`minbucket_ct1`	the minimum number of treatment observations in any terminal (leaf) node.
`keep.inbag`	if set to `TRUE`, an nrow(x) by ntree matrix is returned, whose entries are the "in-bag" samples in each tree.
`verbose`	print status messages?
`...`	optional parameters to be passed to the low level function upliftRF.default.
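
The default expressions in the usage section can be evaluated by hand to see what they give for a particular data set. The following is a minimal base R sketch, not code from the package: the toy data frame `x` and the variables `bag_size` and `inbag_rows` are hypothetical, and drawing the bag without replacement is an assumption, since this page does not state how the rows are sampled.

```r
# Hypothetical toy inputs: 1000 observations, 20 predictors
set.seed(1)
n <- 1000
x <- as.data.frame(matrix(rnorm(n * 20), nrow = n))
minsplit <- 20
bag.fraction <- 0.5

# Defaults exactly as written in the usage section
mtry <- floor(sqrt(ncol(x)))          # variables tested at each node
minbucket_ct0 <- round(minsplit / 4)  # minimum control observations per leaf
minbucket_ct1 <- round(minsplit / 4)  # minimum treatment observations per leaf

# Assumption: each tree is fitted on a random subset of rows,
# drawn here without replacement
bag_size <- floor(bag.fraction * n)
inbag_rows <- sample.int(n, size = bag_size)
```

With 20 predictors and `minsplit = 20`, this gives `mtry = 4` and a minimum of 5 control and 5 treatment observations per leaf.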

Uplift Random Forests estimate personalized treatment effects (a.k.a. uplift) by binary recursive partitioning. The algorithm and split methods are described in Guelman et al. (2013a, 2013b).
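
To give a feel for a divergence-based split criterion, the sketch below scores a candidate split by how much it increases the Kullback-Leibler divergence between the treated and control response distributions. It is a simplified base R illustration under stated assumptions, not the package's implementation: the helper names (`rate`, `kl_bernoulli`, `node_divergence`, `split_gain`) are hypothetical, the gain is left unnormalized, and the simulated data are invented for the example.

```r
# Response rate within a group, clipped away from 0 and 1 to avoid log(0)
rate <- function(y, eps = 1e-6) {
  min(max(mean(y), eps), 1 - eps)
}

# KL divergence between Bernoulli(p) and Bernoulli(q)
kl_bernoulli <- function(p, q) {
  p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))
}

# Divergence between treated and control response distributions in a node
node_divergence <- function(y, ct) {
  kl_bernoulli(rate(y[ct == 1]), rate(y[ct == 0]))
}

# Gain of a binary split: size-weighted child divergence minus parent divergence
split_gain <- function(y, ct, left) {
  w_left <- mean(left)
  w_left * node_divergence(y[left], ct[left]) +
    (1 - w_left) * node_divergence(y[!left], ct[!left]) -
    node_divergence(y, ct)
}

# Hypothetical toy data: the treatment raises the response rate only when x > 0
set.seed(42)
n <- 2000
x <- rnorm(n)
ct <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.3 + 0.3 * ct * (x > 0))

gain_signal <- split_gain(y, ct, left = x > 0)        # split on the true modifier
gain_noise <- split_gain(y, ct, left = rnorm(n) > 0)  # split on pure noise
```

A split on `x > 0` separates observations that respond to treatment from those that do not, so its gain should be clearly larger than that of the noise split.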

An object of class upliftRF, which is a list with the following components:

`call`	the original call to `upliftRF`
`trees`	the tree structure that was learned
`split_method`	the split criteria used at each node of each tree
`ntree`	the number of trees used
`mtry`	the number of variables tested at each node
`var.names`	a character vector with the name of the predictors
`var.class`	a character vector containing the class of each predictor variable
`inbag`	an nrow(x) by ntree matrix showing the in-bag samples used by each tree (returned only if `keep.inbag = TRUE`)
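
An in-bag matrix of this shape can be used to find which observations a given tree did not see. The sketch below builds a mock matrix in base R rather than taking one from a fitted model. The 0/1 encoding of the entries is an assumption (this page only states the dimensions), and `n_obs`, `n_tree`, `oob_tree1`, and `times_inbag` are hypothetical names.

```r
# Mock in-bag matrix: 6 observations, 3 trees, 1 = row used by that tree
set.seed(7)
n_obs <- 6
n_tree <- 3
inbag <- sapply(seq_len(n_tree), function(i) {
  as.integer(seq_len(n_obs) %in% sample.int(n_obs, size = 3))
})

dim(inbag)                           # rows are observations, columns are trees
oob_tree1 <- which(inbag[, 1] == 0)  # observations tree 1 did not use
times_inbag <- rowSums(inbag)        # how many trees used each observation
```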

Leo Guelman <leo.guelman@gmail.com>

Guelman, L., Guillen, M., and Perez-Marin, A.M. (2013a). Uplift random forests. Cybernetics & Systems, forthcoming.

Guelman, L., Guillen, M., and Perez-Marin, A.M. (2013b). Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study. Submitted.

Su, X., Tsai, C., Wang, H., Nickerson, D., and Li, B. (2009). Subgroup Analysis via Recursive Partitioning. Journal of Machine Learning Research, 10, 141-158.

library(uplift)

### simulate data for uplift modeling

set.seed(123)
dd <- sim_pte(n = 1000, p = 20, rho = 0, sigma = sqrt(2), beta.den = 4)
# recode the treatment indicator to 0/1, as upliftRF expects
dd$treat <- ifelse(dd$treat == 1, 1, 0)

### fit uplift random forest

fit1 <- upliftRF(y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat),
                 data = dd, 
                 mtry = 3,
                 ntree = 100, 
                 split_method = "KL",
                 minsplit = 200, 
                 verbose = TRUE)
print(fit1)
summary(fit1)

uplift: status messages enabled; set "verbose" to false to disable
upliftRF: starting. Tue Oct 10 21:07:16 2017 
10 out of 100 trees so far...
20 out of 100 trees so far...
30 out of 100 trees so far...
40 out of 100 trees so far...
50 out of 100 trees so far...
60 out of 100 trees so far...
70 out of 100 trees so far...
80 out of 100 trees so far...
90 out of 100 trees so far...
Call:
upliftRF(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat), 
    data = dd, mtry = 3, ntree = 100, split_method = "KL", minsplit = 200, 
    verbose = TRUE)

Uplift random forest
Number of trees: 100
No. of variables tried at each split: 3
Split method: KL
$call
upliftRF(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat), 
    data = dd, mtry = 3, ntree = 100, split_method = "KL", minsplit = 200, 
    verbose = TRUE)

$importance
  var  rel.imp
1  X1 39.97286
2  X2 25.07182
3  X4 18.37845
4  X3 16.57687

$ntree
[1] 100

$mtry
[1] 3

$split_method
[1] "KL"

attr(,"class")
[1] "summary.upliftRF"