ccif: Causal Conditional Inference Forest


View source: R/ccif.default.R

Description

ccif implements recursive partitioning in a causal conditional inference framework.

Usage

## S3 method for class 'formula'
ccif(formula, data, ...)

## Default S3 method:
ccif(
x, 
y, 
ct, 
mtry = floor(sqrt(ncol(x))), 
ntree = 100, 
split_method = c("ED", "Chisq", "KL", "L1", "Int"), 
interaction.depth = NULL, 
pvalue = 0.05, 
bonferroni = FALSE, 
minsplit = 20, 
minbucket_ct0 = round(minsplit/4),
minbucket_ct1 = round(minsplit/4), 
keep.inbag = FALSE, 
verbose = FALSE, 
...)

## S3 method for class 'ccif'
print(x, ...)

Arguments

data

A data frame containing the variables in the model. It should include a variable reflecting the binary treatment assignment of each observation (coded as 0/1).

x, formula

a data frame of predictors or a formula describing the model to be fitted. A special term of the form trt() must be used in the model equation to identify the binary treatment variable. For example, if the treatment is represented by a variable named treat, then the right-hand side of the formula must include the term +trt(treat).

y

a binary response (numeric) vector.

ct

a binary (numeric) vector representing the treatment assignment (coded as 0/1).

mtry

the number of variables to be tested in each node; the default is floor(sqrt(ncol(x))).

ntree

the number of trees to generate in the forest; default is ntree = 100.

split_method

the split criterion used at each node of each tree. Possible values are: "ED" (Euclidean distance), "Chisq" (Chi-squared divergence), "KL" (Kullback-Leibler divergence), "L1" (L1-norm divergence), "Int" (Interaction method).

interaction.depth

The maximum depth of variable interactions. 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc.

pvalue

the maximum acceptable p-value required in order to make a split.

bonferroni

apply a Bonferroni adjustment to pvalue.

minsplit

the minimum number of observations that must exist in a node in order for a split to be attempted.

minbucket_ct0

the minimum number of control observations in any terminal (leaf) node.

minbucket_ct1

the minimum number of treatment observations in any terminal (leaf) node.

keep.inbag

if set to TRUE, an nrow(x) by ntree matrix is returned, whose entries are the "in-bag" samples in each tree.

verbose

print status messages?

...

Additional arguments passed to independence_test{coin}. See details.
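
The trt() convention described under x, formula can be sketched with base R alone. In this illustrative snippet the covariate names X1 through X6 and the treatment name treat are assumptions matching the example below, not requirements of ccif:

```r
# Build a model formula that tags 'treat' as the treatment variable via trt().
# Covariate names X1..X6 are illustrative only.
rhs  <- paste(c("trt(treat)", paste0("X", 1:6)), collapse = " + ")
form <- as.formula(paste("y ~", rhs))
print(form)
# y ~ trt(treat) + X1 + X2 + X3 + X4 + X5 + X6
```

ccif parses the trt() term off the right-hand side to recover the treatment indicator, so it must appear exactly once in the formula.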

Details

Causal conditional inference trees estimate personalized treatment effects (a.k.a. uplift) by binary recursive partitioning in a conditional inference framework. Roughly, the algorithm works as follows: 1) For each terminal node in the tree, test the global null hypothesis of no interaction effect between the treatment T and any of the n covariates selected at random from the set of p covariates (n ≤ p). Stop if this hypothesis cannot be rejected; otherwise, select the input variable with the strongest interaction effect. The interaction effect is measured by a p-value corresponding to a permutation test (Strasser and Weber, 1999) for the partial null hypothesis of independence between each input variable and a transformed response. Specifically, the response is transformed so that the impact of the input variable on the response has a causal interpretation for the treatment effect (see details in Guelman et al. 2013). 2) Implement a binary split in the selected input variable. 3) Recursively repeat steps 1) and 2).
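
The exact response transformation is given in Guelman et al. (2013); as an illustrative sketch (not the package's internal code), one common form of such a transformation sets the new response to 1 for treated responders and control non-responders:

```r
# Illustrative sketch of a transformed response with a causal interpretation;
# the precise transformation used by ccif is described in Guelman et al. (2013).
set.seed(1)
n  <- 10
y  <- rbinom(n, 1, 0.5)   # binary response
ct <- rbinom(n, 1, 0.5)   # binary treatment assignment (0/1)

# z = 1 for treated responders and control non-responders, i.e. z = 1 iff y == ct
z <- y * ct + (1 - y) * (1 - ct)

# A node's split search then tests independence between each candidate
# input and z (via a permutation test), rather than y itself.
table(z, ct)
```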

The independence test between each input and the transformed response is performed by calling independence_test{coin}. Additional arguments may be passed to this function via '...'.

All split methods are described in Guelman et al. (2013a, 2013b).
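
As a rough illustration of the flavor of these criteria (a simplification, not the formulas actually used, which are given in Guelman et al. 2013a, 2013b), an "ED"-style split can be thought of as maximizing the weighted Euclidean divergence between treatment and control response distributions in the child nodes relative to the parent:

```r
# Illustrative sketch of a Euclidean-distance style split criterion.
# Assumes both treatment groups are present in every node considered.
ed_divergence <- function(y, ct) {
  p_t <- mean(y[ct == 1])  # response rate under treatment
  p_c <- mean(y[ct == 0])  # response rate under control
  (p_t - p_c)^2 + ((1 - p_t) - (1 - p_c))^2
}

# Gain of a candidate binary split (left is a logical indicator):
# weighted child divergences minus the parent divergence.
ed_gain <- function(y, ct, left) {
  wl <- mean(left)
  wl * ed_divergence(y[left], ct[left]) +
    (1 - wl) * ed_divergence(y[!left], ct[!left]) -
    ed_divergence(y, ct)
}
```

A split that separates observations with a large treatment effect from those with none yields a positive gain, which is the behavior an uplift criterion is after.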

This function is very slow at the moment. It was built as a prototype in R. A future version of this package will provide an interface to C++ for this function, which is expected to significantly improve speed.

Value

An object of class ccif, which is a list with the following components:

call

the original call to ccif

trees

the tree structure that was learned

split_method

the split criteria used at each node of each tree

ntree

the number of trees used

mtry

the number of variables tested at each node

var.names

a character vector with the name of the predictors

var.class

a character vector containing the class of each predictor variable

inbag

an nrow(x) by ntree matrix showing the in-bag samples used by each tree

Author(s)

Leo Guelman <leo.guelman@gmail.com>

References

Guelman, L., Guillen, M., and Perez-Marin A.M. (2013a). Uplift random forests. Cybernetics & Systems, forthcoming.

Guelman, L., Guillen, M., and Perez-Marin A.M. (2013b). Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study. Submitted.

Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3): 651-674.

Strasser, H. and Weber, C. (1999). On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics, 8: 220-250.

Examples

library(uplift)

### Simulate train data

set.seed(12345)
dd <- sim_pte(n = 100, p = 6, rho = 0, sigma = sqrt(2), beta.den = 4)

dd$treat <- ifelse(dd$treat == 1, 1, 0) 

### Fit model

form <- as.formula(paste('y ~', 'trt(treat) +', paste('X', 1:6, sep = '', collapse = "+"))) 

fit1 <- ccif(formula = form,
             data = dd, 
             ntree = 50, 
             split_method = "Int",
             distribution = approximate(B = 999),
             pvalue = 0.05,
             verbose = TRUE)
print(fit1)
summary(fit1)

Example output

Loading required package: RItools
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

Loading required package: MASS
Loading required package: coin
Loading required package: survival
Loading required package: tables
Loading required package: Hmisc
Loading required package: lattice
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'

The following objects are masked from 'package:base':

    format.pval, round.POSIXt, trunc.POSIXt, units

Loading required package: penalized
Welcome to penalized. For extended examples, see vignette("penalized").
uplift: status messages enabled; set "verbose" to false to disable
ccif: starting. Fri May 11 17:40:42 2018 
10 out of 50 trees so far...
20 out of 50 trees so far...
30 out of 50 trees so far...
40 out of 50 trees so far...
Call:
ccif(formula = form, data = dd, ntree = 50, split_method = "Int", 
    distribution = approximate(B = 999), pvalue = 0.05, verbose = TRUE)

Causal conditional inference forest
Number of trees: 50
No. of variables tried at each split: 2
Split method: Int
$call
ccif(formula = form, data = dd, ntree = 50, split_method = "Int", 
    distribution = approximate(B = 999), pvalue = 0.05, verbose = TRUE)

$importance
  var    rel.imp
1  X4 47.1428691
2  X1 24.5578506
3  X3 19.1402029
4  X2  5.0248464
5  X5  3.6038126
6  X6  0.5304184

$ntree
[1] 50

$mtry
[1] 2

$split_method
[1] "Int"

attr(,"class")
[1] "summary.ccif"

uplift documentation built on May 2, 2019, 9:32 a.m.