performance: Performance Assessment for Uplift Models
In uplift: Uplift Modeling


Provides a method for assessing the performance of uplift models.

performance(pr.y1_ct1, pr.y1_ct0, y, ct, direction = 1, groups = 10)

`pr.y1_ct1`	the predicted probability Prob(y=1|treated, x).
`pr.y1_ct0`	the predicted probability Prob(y=1|control, x).
`y`	the actual observed value of the response.
`ct`	a binary (numeric) vector representing the treatment assignment (coded as 0/1).
`direction`	possible values are `1` (default) if the objective is to maximize the difference in the response for Treatment minus Control, and `2` for Control minus Treatment.
`groups`	number of groups, each with an equal number of observations, into which the data set is partitioned to show results. The default value is 10 (deciles). Other possible values are 5 and 20.

Model performance is estimated in three steps: (1) computing the difference between the predicted conditional class probabilities Prob(y=1|treated, x) and Prob(y=1|control, x); (2) ranking the observations by this difference and grouping them into 'buckets' with an equal number of observations each; and (3) computing, for each bucket, the actual difference in the mean of the response variable between the treatment and the control groups.
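The three steps can be sketched in base R. This is a minimal illustration, not the package's implementation: `uplift_by_bucket` is a hypothetical helper, it covers only `direction = 1`, and it assumes that group 1 holds the observations with the highest predicted uplift and that ties are broken by order of appearance.

```r
# Hypothetical helper for illustration; not part of the uplift package.
uplift_by_bucket <- function(pr.y1_ct1, pr.y1_ct0, y, ct, groups = 10) {
  # Step 1: predicted uplift is the difference in predicted probabilities.
  dif.pred <- pr.y1_ct1 - pr.y1_ct0

  # Step 2: rank from highest to lowest predicted uplift (assumed ordering)
  # and cut the ranks into equally sized buckets.
  rnk <- rank(-dif.pred, ties.method = "first")
  bucket <- factor(ceiling(rnk * groups / length(dif.pred)),
                   levels = seq_len(groups))

  # Step 3: observed mean response per bucket, separately for each arm.
  r.y1_ct1 <- tapply(y[ct == 1], bucket[ct == 1], mean)
  r.y1_ct0 <- tapply(y[ct == 0], bucket[ct == 0], mean)

  cbind(group = seq_len(groups), r.y1_ct1, r.y1_ct0,
        uplift = r.y1_ct1 - r.y1_ct0)
}

# Simulated inputs (made up for this sketch): the treatment effect grows with x.
set.seed(1)
n   <- 1000
ct  <- rbinom(n, 1, 0.5)
x   <- runif(n)
pr1 <- 0.3 + 0.4 * x   # stands in for Prob(y=1|treated, x)
pr0 <- rep(0.3, n)     # stands in for Prob(y=1|control, x)
y   <- rbinom(n, 1, ifelse(ct == 1, pr1, pr0))

res <- uplift_by_bucket(pr1, pr0, y, ct, groups = 10)
print(res)
```

If the predictions rank observations well, the observed uplift should tend to be larger in the first groups than in the last ones.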

An object of class performance, which is a matrix with the following columns: (group) the group number, (n.ct1) the number of observations in the treated group, (n.ct0) the number of observations in the control group, (n.y1_ct1) the number of observations in the treated group with response = 1, (n.y1_ct0) the number of observations in the control group with response = 1, (r.y1_ct1) the mean of the response for the treated group, (r.y1_ct0) the mean of the response for the control group, and (uplift) the difference between r.y1_ct1 and r.y1_ct0 (if direction = 1).
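To make the relationship between these columns concrete, the following sketch builds a matrix with the same column names from made-up counts for two groups. It assumes a binary response, so that the mean response in each arm equals the count of responders divided by the number of observations; `perf_sketch` is a hypothetical object, not output from `performance`.

```r
# Hypothetical counts for two groups; not produced by the uplift package.
perf_sketch <- cbind(
  group    = c(1, 2),
  n.ct1    = c(50, 48),   # treated observations per group
  n.ct0    = c(50, 52),   # control observations per group
  n.y1_ct1 = c(30, 24),   # treated observations with response = 1
  n.y1_ct0 = c(20, 26)    # control observations with response = 1
)

# For a binary response, the mean response is responders / observations.
r.y1_ct1 <- perf_sketch[, "n.y1_ct1"] / perf_sketch[, "n.ct1"]
r.y1_ct0 <- perf_sketch[, "n.y1_ct0"] / perf_sketch[, "n.ct0"]

# With direction = 1, uplift is treated minus control.
perf_sketch <- cbind(perf_sketch, r.y1_ct1, r.y1_ct0,
                     uplift = r.y1_ct1 - r.y1_ct0)
print(perf_sketch)
```

In this layout the uplift is the eighth column and the group number is the first.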

Leo Guelman <leo.guelman@gmail.com>

Guelman, L., Guillen, M., and Perez-Marin, A.M. (2013). Uplift random forests. Cybernetics & Systems, forthcoming.

library(uplift)

set.seed(123)
dd <- sim_pte(n = 1000, p = 20, rho = 0, sigma =  sqrt(2), beta.den = 4)
dd$treat <- ifelse(dd$treat == 1, 1, 0)  # recode treatment to 0/1

### fit uplift random forest

fit1 <- upliftRF(y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat),
                 data = dd, 
                 mtry = 3,
                 ntree = 100, 
                 split_method = "KL",
                 minsplit = 200, # small trees are needed because the data contain strong uplift effects
                 verbose = TRUE)
print(fit1)
summary(fit1)

### get variable importance 

varImportance(fit1, plotit = TRUE, normalize = TRUE)

### predict on new data 

dd_new <- sim_pte(n = 1000, p = 20, rho = 0, sigma =  sqrt(2), beta.den = 4)
dd_new$treat <- ifelse(dd_new$treat == 1, 1, 0)  

pred <- predict(fit1, dd_new)

### evaluate model performance

perf <- performance(pred[, 1], pred[, 2], dd_new$y, dd_new$treat, direction = 1)
plot(perf[, 8] ~ perf[, 1], type ="l", xlab = "Decile", ylab = "uplift")

uplift: status messages enabled; set "verbose" to false to disable
upliftRF: starting. Wed Dec 13 08:34:06 2017 
10 out of 100 trees so far...
20 out of 100 trees so far...
30 out of 100 trees so far...
40 out of 100 trees so far...
50 out of 100 trees so far...
60 out of 100 trees so far...
70 out of 100 trees so far...
80 out of 100 trees so far...
90 out of 100 trees so far...
Call:
upliftRF(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat), 
    data = dd, mtry = 3, ntree = 100, split_method = "KL", minsplit = 200, 
    verbose = TRUE)

Uplift random forest
Number of trees: 100
No. of variables tried at each split: 3
Split method: KL
$call
upliftRF(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat), 
    data = dd, mtry = 3, ntree = 100, split_method = "KL", minsplit = 200, 
    verbose = TRUE)

$importance
  var  rel.imp
1  X1 39.97286
2  X2 25.07182
3  X4 18.37845
4  X3 16.57687

$ntree
[1] 100

$mtry
[1] 3

$split_method
[1] "KL"

attr(,"class")
[1] "summary.upliftRF"
  var  rel.imp
1  X1 39.97286
2  X2 25.07182
3  X4 18.37845
4  X3 16.57687