upliftKNN: Uplift k-Nearest Neighbor


View source: R/upliftKNN.R

Description

upliftKNN implements k-nearest neighbor for uplift modeling.

Usage

upliftKNN(train, test, y, ct, k = 1, dist.method = "euclidean", 
          p = 2, ties.meth = "min", agg.method = "mean")

Arguments

train

a matrix or data frame of training set cases.

test

a matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.

y

a numeric response variable (must be coded as 0/1 for a binary response).

ct

a factor or numeric vector representing the treatment to which each training case is assigned. At least two groups are required (e.g., treatment and control). Multiple treatments are also supported.

k

number of neighbors considered.

dist.method

the distance to be used in calculating the neighbors. Any method supported in function dist is valid.

p

the power of the Minkowski distance.

ties.meth

method to handle ties for the kth neighbor. The default is "min", which uses all ties. Alternatives include "max", which uses no ties if there are ties for the kth nearest neighbor; "random", which selects among the ties randomly; and "first", which uses the ties in the order they appear in the data.

agg.method

method to combine responses of the nearest neighbors, defaults to "mean". The alternative is "majority".
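
As a toy illustration of the two aggregation choices (hypothetical neighbor responses, not the package's internal code):

## Toy illustration: 0/1 responses of k = 5 hypothetical nearest neighbors
neighbor_y <- c(1, 0, 1, 1, 0)

## agg.method = "mean": average the neighbors' responses
mean(neighbor_y)                    # 0.6

## agg.method = "majority": majority vote among the neighbors
## (in the package, ties are broken at random)
as.numeric(mean(neighbor_y) > 0.5)  # 1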

Details

upliftKNN performs k-nearest neighbor classification for uplift modeling of a test set given a training set. For each case in the test set, the k nearest training vectors within each treatment group are found. The responses of these k nearest training vectors are then aggregated according to the function specified in agg.method. For "majority", the predicted class is decided by majority vote (with ties broken at random).
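
A minimal sketch of this idea, assuming Euclidean distance, agg.method = "mean", and simple tie handling (a simplified illustration only, not the package's actual implementation):

## Simplified illustration of per-treatment k-NN aggregation
## (Euclidean distance, agg.method = "mean"; not the actual upliftKNN() code)
toy_knn_uplift <- function(train, test, y, ct, k = 1) {
  train  <- as.matrix(train)
  test   <- as.matrix(test)
  groups <- sort(unique(ct))
  out <- matrix(NA_real_, nrow = nrow(test), ncol = length(groups),
                dimnames = list(NULL, groups))
  for (g in groups) {
    tr_g <- train[ct == g, , drop = FALSE]          # training cases in group g
    y_g  <- y[ct == g]
    for (i in seq_len(nrow(test))) {
      d  <- sqrt(colSums((t(tr_g) - test[i, ])^2))  # Euclidean distances to test case i
      nn <- order(d)[seq_len(k)]                    # indices of the k nearest cases
      out[i, as.character(g)] <- mean(y_g[nn])      # aggregate their responses
    }
  }
  out
}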

Value

A matrix of predictions, with one row per test case and one column per value of ct.

Note

The code logic closely follows the knn and knnflex packages, the latter of which has been removed from CRAN.

Author(s)

Leo Guelman <leo.guelman@gmail.com>

References

Su, X., Kang, J., Fan, J., Levine, R. A., and Yan, X. (2012). Facilitating score and causal inference trees for large observational studies. Journal of Machine Learning Research, 13(10): 2955-2994.

Guelman, L., Guillen, M., and Perez-Marin A.M. (2013). Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study. Submitted.

Examples

library(uplift)

### Simulate training data for uplift modeling

set.seed(1)

train <- sim_pte(n = 500, p = 10, rho = 0, sigma = sqrt(2), beta.den = 4)
train$treat <- ifelse(train$treat == 1, 1, 0)

### Simulate test data

test <- sim_pte(n = 100, p = 10, rho = 0, sigma = sqrt(2), beta.den = 4)
test$treat <- ifelse(test$treat == 1, 1, 0)

### Fit uplift k-nearest neighbor and predict on the test data

fit1 <- upliftKNN(train[, 3:8], test[, 3:8], train$y, train$treat, k = 1,
                  dist.method = "euclidean", p = 2, ties.meth = "min",
                  agg.method = "majority")
head(fit1)
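
The columns of the returned matrix correspond to the values of ct (here "0" and "1"). One common way to obtain an uplift score for each test case is to take the difference between the predicted responses under treatment and under control, for example:

### Predicted uplift per test case: predicted response under treatment ("1")
### minus predicted response under control ("0")
uplift_hat <- fit1[, "1"] - fit1[, "0"]
head(uplift_hat)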

Example output

     0 1
[1,] 1 0
[2,] 0 0
[3,] 0 0
[4,] 1 0
[5,] 1 1
[6,] 0 0
