upliftKNN: Uplift k-Nearest Neighbor


View source: R/upliftKNN.R

Description

upliftKNN implements k-nearest neighbor for uplift modeling.

Usage

upliftKNN(train, test, y, ct, k = 1, dist.method = "euclidean", 
          p = 2, ties.meth = "min", agg.method = "mean")

Arguments

train

a matrix or data frame of training set cases.

test

a matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.

y

a numeric response variable (must be coded as 0/1 for a binary response).

ct

a factor or numeric vector representing the treatment to which each training case is assigned. At least two groups are required (e.g., treatment and control). Multiple treatments are also supported.

k

number of neighbors considered.

dist.method

the distance to be used in calculating the neighbors. Any method supported in function dist is valid.

p

the power of the Minkowski distance.

ties.meth

method to handle ties for the kth neighbor. The default is "min", which uses all ties. Alternatives include "max", which uses no ties if there are ties for the kth nearest neighbor; "random", which selects among the ties randomly; and "first", which uses the ties in the order they appear in the data.

agg.method

method to combine responses of the nearest neighbors, defaults to "mean". The alternative is "majority".
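
As a toy illustration of the two aggregation choices (hypothetical neighbor responses, not the package's internal code):

## Toy illustration: 0/1 responses of k = 5 hypothetical nearest neighbors
neighbor_y <- c(1, 0, 1, 1, 0)

## agg.method = "mean": average the neighbors' responses
mean(neighbor_y)                    # 0.6

## agg.method = "majority": majority vote among the neighbors
## (in the package, ties are broken at random)
as.numeric(mean(neighbor_y) > 0.5)  # 1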

Details

upliftKNN performs k-nearest neighbor classification for uplift modeling of a test set given a training set. For each case in the test set, the k nearest training vectors within each treatment group are found. The responses of these k nearest training vectors are then aggregated according to the function specified in agg.method. For "majority", the predicted class is decided by majority vote (with ties broken at random).
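
A minimal sketch of this idea, assuming Euclidean distance, agg.method = "mean", and simple tie handling (a simplified illustration only, not the package's actual implementation):

## Simplified illustration of per-treatment k-NN aggregation
## (Euclidean distance, agg.method = "mean"; not the actual upliftKNN() code)
toy_knn_uplift <- function(train, test, y, ct, k = 1) {
  train  <- as.matrix(train)
  test   <- as.matrix(test)
  groups <- sort(unique(ct))
  out <- matrix(NA_real_, nrow = nrow(test), ncol = length(groups),
                dimnames = list(NULL, groups))
  for (g in groups) {
    tr_g <- train[ct == g, , drop = FALSE]          # training cases in group g
    y_g  <- y[ct == g]
    for (i in seq_len(nrow(test))) {
      d  <- sqrt(colSums((t(tr_g) - test[i, ])^2))  # Euclidean distances to test case i
      nn <- order(d)[seq_len(k)]                    # indices of the k nearest cases
      out[i, as.character(g)] <- mean(y_g[nn])      # aggregate their responses
    }
  }
  out
}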

Value

A matrix of predictions, with one row per test case and one column per value of ct.

Note

The code logic closely follows the knn and knnflex packages, the latter of which has been removed from CRAN.

Author(s)

Leo Guelman <leo.guelman@gmail.com>

References

Su, X., Kang, J., Fan, J., Levine, R. A., and Yan, X. (2012). Facilitating score and causal inference trees for large observational studies. Journal of Machine Learning Research, 13(10): 2955-2994.

Guelman, L., Guillen, M., and Perez-Marin A.M. (2013). Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study. Submitted.

Examples

library(uplift)

### Simulate training data for uplift modeling

set.seed(1)

train <- sim_pte(n = 500, p = 10, rho = 0, sigma = sqrt(2), beta.den = 4)
train$treat <- ifelse(train$treat == 1, 1, 0)

### Simulate test data

test <- sim_pte(n = 100, p = 10, rho = 0, sigma = sqrt(2), beta.den = 4)
test$treat <- ifelse(test$treat == 1, 1, 0)

### Fit uplift k-nearest neighbor and predict on the test data

fit1 <- upliftKNN(train[, 3:8], test[, 3:8], train$y, train$treat, k = 1,
                  dist.method = "euclidean", p = 2, ties.meth = "min",
                  agg.method = "majority")
head(fit1)
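
The columns of the returned matrix correspond to the values of ct (here "0" and "1"). One common way to obtain an uplift score for each test case is to take the difference between the predicted responses under treatment and under control, for example:

### Predicted uplift per test case: predicted response under treatment ("1")
### minus predicted response under control ("0")
uplift_hat <- fit1[, "1"] - fit1[, "0"]
head(uplift_hat)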

Example output

     0 1
[1,] 1 0
[2,] 0 0
[3,] 0 0
[4,] 1 0
[5,] 1 1
[6,] 0 0
