explore: Explore Data for Uplift Modeling



Description

This function provides a basic exploratory tool for uplift modeling. For each predictor, it computes the average value of the response variable within each level (or bin) of that predictor, separately for each treatment assignment.

Usage

explore(formula, 
        data, 
        subset,
        na.action = na.pass,
        nbins = 4, 
        continuous = 4, 
        direction = 1)

Arguments

formula

a formula expression of the form response ~ predictors. A special term of the form trt() must be included in the formula to identify the binary treatment variable. For example, if the treatment is represented by a variable named treat, then the right-hand side of the formula must include the term + trt(treat).
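As a rough illustration of how a special term such as trt() can be located in a formula, the base R sketch below uses terms() with its specials argument. This shows the general mechanism only and is not a description of the internals of explore(); the variable names y, X1, X2 and treat are made up.

```r
# Hypothetical formula: response y, predictors X1 and X2, treatment indicator treat
f <- y ~ X1 + X2 + trt(treat)

# terms() can flag calls to a named special without evaluating them
tt <- terms(f, specials = "trt")

# Position of the trt() term among the formula's variables (response included)
trt_pos <- attr(tt, "specials")$trt

# All variable names referenced by the formula, including the treatment variable
vars <- all.vars(f)
print(vars)
```

Because terms() works symbolically, this runs even when no function called trt is defined in the session.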

data

a data.frame in which to interpret the variables named in the formula.

subset

expression indicating which subset of the rows of data should be included. All observations are included by default.

na.action

a missing-data filter function. This is applied to the model.frame after any subset argument has been used. Default is na.action = na.pass.
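The effect of a missing-data filter can be seen with base R's model.frame(), shown here on a small made-up data frame. This is only a sketch of what na.pass and na.omit do in general, on the assumption that explore() passes na.action through in the usual way.

```r
# Toy data with one missing predictor value (made up for illustration)
df <- data.frame(y = c(1, 0, 1, 0), x = c(0.5, NA, 1.2, 2.0))

# na.pass keeps every row, including the one with the missing value
mf_pass <- model.frame(y ~ x, data = df, na.action = na.pass)

# na.omit drops rows that contain any missing value
mf_omit <- model.frame(y ~ x, data = df, na.action = na.omit)

print(nrow(mf_pass))
print(nrow(mf_omit))
```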

nbins

the number of bins created from numeric predictors. The bins are created based on quantiles, with a default value of 4 (quartiles).
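Quantile-based binning of a numeric predictor can be sketched in base R with quantile() and cut(). The exact cut-point rules used inside explore() are not stated here, so treat this as one plausible reading rather than the package's implementation; the vector x is simulated.

```r
set.seed(1)
x <- rnorm(200)   # a made-up numeric predictor
nbins <- 4

# Cut points at the 0%, 25%, 50%, 75% and 100% quantiles
breaks <- quantile(x, probs = seq(0, 1, length.out = nbins + 1))

# include.lowest = TRUE keeps the minimum value in the first bin
x_binned <- cut(x, breaks = breaks, include.lowest = TRUE)

print(table(x_binned))
```

With heavily tied data the quantiles may coincide, in which case cut() would need unique break points; that case is not handled in this sketch.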

continuous

the threshold used to decide whether a numeric variable is treated as continuous: a variable is considered continuous when it has at least this many unique values. The default is 4. Factor variables are always considered categorical, no matter how many levels they have.
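The rule described above can be written as a small helper. The function is_continuous() is hypothetical and is not part of the uplift package; it only mirrors the documented behaviour.

```r
# Hypothetical helper mirroring the documented rule
is_continuous <- function(x, continuous = 4) {
  # Factors are always treated as categorical
  if (is.factor(x)) return(FALSE)
  # Numeric variables need at least `continuous` unique values
  is.numeric(x) && length(unique(x)) >= continuous
}

r1 <- is_continuous(c(0, 1, 0, 1))               # two unique values
r2 <- is_continuous(c(1.2, 3.4, 5.6, 7.8, 9.0))  # five unique values
r3 <- is_continuous(factor(letters[1:10]))       # factor with ten levels
print(c(r1, r2, r3))
```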

direction

possible values are 1 (default), in which case uplift is computed as the average response in the treatment group minus the average response in the control group, or 2, in which case it is computed as control minus treatment. This only affects the uplift column of the output.

Value

A list of matrices, one for each predictor. The columns represent: the number of responses in the control group, the number of responses in the treated group, the average response for the control group, the average response for the treated group, and the uplift (the difference between the treatment and control average responses, or the reverse when direction = 2).
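To make the layout of one such matrix concrete, the base R sketch below builds a comparable table for a single categorical predictor. The helper summarise_predictor(), the column names, and the simulated data are all assumptions made for illustration, and the count columns are taken here to be the number of observations per group; the matrices returned by explore() may be labelled or computed differently.

```r
# Made-up data: binary response y, 0/1 treatment treat, categorical predictor g
set.seed(42)
n <- 400
d <- data.frame(
  g     = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  treat = rbinom(n, 1, 0.5),
  y     = rbinom(n, 1, 0.3)
)

# Hypothetical helper: one row per level of x, five columns as described above
summarise_predictor <- function(x, y, treat, direction = 1) {
  ct <- treat == 0
  tr <- treat == 1
  n_ct   <- tapply(y[ct], x[ct], length)  # observations in control
  n_tr   <- tapply(y[tr], x[tr], length)  # observations in treatment
  avg_ct <- tapply(y[ct], x[ct], mean)    # average response, control
  avg_tr <- tapply(y[tr], x[tr], mean)    # average response, treatment
  # direction = 1: treatment minus control; direction = 2: control minus treatment
  uplift <- if (direction == 1) avg_tr - avg_ct else avg_ct - avg_tr
  cbind(n_ct, n_tr, avg_ct, avg_tr, uplift)
}

res1 <- summarise_predictor(d$g, d$y, d$treat, direction = 1)
res2 <- summarise_predictor(d$g, d$y, d$treat, direction = 2)
print(res1)
```

Switching direction changes only the sign of the uplift column; the counts and averages stay the same.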

Author(s)

Leo Guelman <leo.guelman@gmail.com>

Examples

library(uplift)

set.seed(12345)
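# Simulate data with a personalized treatment effect
dd <- sim_pte(n = 1000, p = 20, rho = 0, sigma = sqrt(2), beta.den = 4)
# Recode the treatment indicator to 0/1 (sim_pte codes it as -1/1)
dd$treat <- ifelse(dd$treat == 1, 1, 0)

eda <- explore(y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat), data = dd)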

Example output

Loading required package: RItools
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

Loading required package: MASS
Loading required package: coin
Loading required package: survival
Loading required package: tables
Loading required package: Hmisc
Loading required package: lattice
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'

The following objects are masked from 'package:base':

    format.pval, round.POSIXt, trunc.POSIXt, units

Loading required package: penalized
Welcome to penalized. For extended examples, see vignette("penalized").

uplift documentation built on May 2, 2019, 9:32 a.m.