mom: Modified outcome method for uplift modeling.

View source: R/mom.R

mom    R Documentation

Description

mom transforms the response variable so that a model fitted to the transformed response has a causal interpretation for the treatment effect conditional on the covariates. It handles continuous (uncensored) and categorical responses.

Usage

## S3 method for class 'formula'
mom(formula, data, subset, na.action, sampling = "none",
  newRespName = "z", classLevel = NULL, treatLevel = NULL)

## S3 method for class 'mom'
print(x, ...)

Arguments

formula

A model formula of the form y ~ x1 + ... + xn + trt(), where the left-hand side is the observed response, the right-hand side lists the predictors, and 'trt' is the special expression that marks the treatment term. If the treatment term is not a factor, it is converted to one.

data

A data frame in which to interpret the variables named in the formula.

subset

Expression indicating which subset of the rows of data should be included. All observations are included by default.

na.action

A missing-data filter function. Defaults to na.omit.

sampling

The sampling method used to balance the treatment variable: one of "none", "oversample", "undersample", or "weights". See Details.

newRespName

The name for the transformed response variable.

classLevel

A character string for the class of interest. Only applicable when the response is a factor. Defaults to the last level of the factor.

treatLevel

A character string for the treatment level of interest. Defaults to the last level of the treatment factor.

x

A mom object.

...

Additional arguments for the S3 methods.

Details

Let T \in \{-1, 1\} be a binary treatment indicator, with T = 1 denoting the treatment level of interest (i.e., the treatment group), and let y be the response variable. If the response is a factor, the transformed response z is set to 1 if T = 1 and y = 1, or if T = -1 and y = 0 (assuming the classLevel of interest for y is 1); otherwise, z is set to 0. In the specific case in which Prob(T=1) = Prob(T=-1) = 1/2, it can be shown that

2 * Prob(z=1|X) - 1 = Prob(y=1|T = 1, X) - Prob(y=1|T = -1, X)

(Jaskowski and Jaroszewicz, 2012), where y, z, and X denote the original response variable, the transformed response, and the covariates, respectively.
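The identity follows in two steps from the definition of z. The sketch below additionally assumes that treatment assignment is independent of the covariates (as in a randomized trial), so that Prob(T=1|X) = Prob(T=-1|X) = 1/2; this assumption is not stated explicitly above.

```latex
\begin{align*}
\Pr(z=1 \mid X)
  &= \Pr(y=1 \mid T=1, X)\,\Pr(T=1 \mid X)
   + \Pr(y=0 \mid T=-1, X)\,\Pr(T=-1 \mid X) \\
  &= \tfrac{1}{2}\,\Pr(y=1 \mid T=1, X)
   + \tfrac{1}{2}\,\bigl[1 - \Pr(y=1 \mid T=-1, X)\bigr], \\
2\,\Pr(z=1 \mid X) - 1
  &= \Pr(y=1 \mid T=1, X) - \Pr(y=1 \mid T=-1, X).
\end{align*}
```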

If the response is numeric, it is transformed as z = 2 * (y - \bar{y}) * T. A model fitted to z effectively estimates E[y|T = 1, X] - E[y|T = -1, X] (Tian et al., 2014).
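The two transformations can be sketched in base R without calling mom. This is a minimal illustration on simulated data, not the package's implementation: the variable names (trt, x, y_bin, y_num) and the data-generating process are hypothetical, the treatment is balanced exactly by construction, and \bar{y} is taken to be the sample mean of y.

```r
set.seed(1)
n <- 10000

# Exactly balanced treatment coded as -1 / 1 (hypothetical toy data)
trt <- rep(c(-1, 1), times = n / 2)
x <- rnorm(n)

# Binary response: treatment raises Prob(y = 1) only when x > 0
y_bin <- rbinom(n, 1, plogis(-0.5 + 0.8 * (trt == 1) * (x > 0)))
# z = 1 if (T = 1 and y = 1) or (T = -1 and y = 0), else 0
z_bin <- as.integer((trt == 1 & y_bin == 1) | (trt == -1 & y_bin == 0))

# Numeric response: E[y | T = 1, X] - E[y | T = -1, X] is 1 by construction
y_num <- 1 + x + 0.5 * trt + rnorm(n)
z_num <- 2 * (y_num - mean(y_num)) * trt

# With exact balance, averages of the transformed responses
# reproduce the raw difference between the two treatment groups
uplift_bin <- 2 * mean(z_bin) - 1
diff_bin   <- mean(y_bin[trt == 1]) - mean(y_bin[trt == -1])
uplift_num <- mean(z_num)
diff_num   <- mean(y_num[trt == 1]) - mean(y_num[trt == -1])

# A model for z given the covariates yields covariate-specific uplift
fit <- glm(z_bin ~ x, family = binomial)
uplift_hat <- 2 * fitted(fit) - 1
```

Comparing uplift_bin with diff_bin, and uplift_num with diff_num, shows the marginal version of the two identities; uplift_hat is the conditional analogue, here obtained from a logistic regression chosen purely for illustration.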

The argument sampling can be used to obtain a balanced treatment distribution. If sampling = "oversample", observations from the treatment minority class are duplicated (by sampling with replacement) so that the resulting data frame has exactly the same number of observations under each treatment level. If sampling = "undersample", observations from the treatment majority class are dropped (by sampling without replacement), again so that the resulting data frame has exactly the same number of observations under each treatment level. If sampling = "none", no sampling is done. Lastly, if sampling = "weights", the returned data frame includes a weight variable that equals (1 - π) for T = treatLevel and π otherwise, where π = Prob(T = treatLevel). The weight variable can subsequently be used to perform case-weighted regression or classification on the transformed response.
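The sampling options can be illustrated with a base R sketch on a toy data frame. This shows one reading of the description above and is not taken from the package source: the data frame d, its columns, and the helper variables are hypothetical, and π is estimated by the observed proportion of the treatment level of interest.

```r
set.seed(2)

# Toy data with an unbalanced treatment: 30 treated, 70 control
d <- data.frame(id = 1:100,
                trt = factor(rep(c("treated", "control"), c(30, 70))))
treat_level <- "treated"

n_by_trt <- table(d$trt)
idx_min <- which(d$trt == names(which.min(n_by_trt)))  # minority rows
idx_maj <- which(d$trt == names(which.max(n_by_trt)))  # majority rows

# "undersample": drop majority rows, sampling without replacement
under <- d[c(idx_min, sample(idx_maj, length(idx_min))), ]

# "oversample": add minority duplicates, sampling with replacement
extra <- sample(idx_min, length(idx_maj) - length(idx_min), replace = TRUE)
over <- d[c(idx_min, idx_maj, extra), ]

# "weights": (1 - pi) for the treatment level of interest, pi otherwise
pi_hat <- mean(d$trt == treat_level)
d$w <- ifelse(d$trt == treat_level, 1 - pi_hat, pi_hat)
```

In this sketch both resampled data frames end up with equal group sizes, and the weights give each treatment group the same total weight, which is what makes a case-weighted fit behave like a balanced one.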

Value

An object of class "mom", which is a list with the following components (among others passed to the S3 methods):

  • data The data set including the original response variable, the treatment indicator, the transformed response, the predictors, and (optionally) a weight variable.

  • call The original call to mom.

Author(s)

Leo Guelman leo.guelman@gmail.com

References

Guelman, L., Guillen, M., and Perez-Marin, A.M. (2015). "A decision support framework to implement optimal personalized marketing interventions." Decision Support Systems, Vol. 72, pp. 24–32.

Jaskowski, M. and Jaroszewicz, S. (2012). "Uplift modeling for clinical trial data." In ICML 2012 Workshop on Machine Learning for Clinical Data Analysis, Edinburgh, Scotland.

Tian, L., Alizadeh, A., Gentles, A., and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.

Examples


set.seed(324)
df_train <- sim_uplift(p = 15, response = "binary")
df_train_mcm <- mom(y ~  X1 + X2 + X3 + trt(T),
                   data = df_train, sampling = "undersample")

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.