mom | R Documentation |
mom
transforms the response variable in a way that is relevant for
subsequent uplift modeling. It handles continuous (uncensored) and categorical
responses. A model fitted to this transformed response has a causal
interpretation for the treatment effect conditional on the covariates.
## S3 method for class 'formula'
mom(formula, data, subset, na.action, sampling = "none",
    newRespName = "z", classLevel = NULL, treatLevel = NULL)

## S3 method for class 'mom'
print(x, ...)
formula |
A model formula of the form y ~ x1 + ... + xn + trt(), where the left-hand side corresponds to the observed response, the right-hand side corresponds to the predictors, and 'trt' is the special expression that marks the treatment term. If the treatment term is not a factor, it is converted to one. |
data |
A data frame in which to interpret the variables named in the formula. |
subset |
Expression indicating which subset of the rows of data should be included. All observations are included by default. |
na.action |
A missing-data filter function. Defaults to |
sampling |
The sampling method used to balance the treatment variable. See details. |
newRespName |
The name for the transformed response variable. |
classLevel |
A character string for the class of interest. Only applicable when the response is a factor. Defaults to the last level of the factor. |
treatLevel |
A character string for the treatment level of interest. Defaults to the last level of the treatment factor. |
x |
A |
... |
Additional arguments for the S3 methods. |
Let T \in \{-1, 1\} be a binary treatment indicator with T = 1 being
the treatment level of interest (i.e., the treatment group). Also, let y
be a response variable. If the response is a factor, the transformed response
is set to 1 if T = 1 and y = 1, or if T = -1 and y = 0 (assuming the
classLevel of interest for y is 1). Otherwise, the transformed response is
set to 0. In the specific case in which
Prob(T=1) = Prob(T=-1) = 1/2, it can be shown that
2 * Prob(z=1|X) - 1 = Prob(y=1|T = 1, X) - Prob(y=1|T = -1, X)
(Jaskowski and Jaroszewicz, 2012), where y, z, and X denote the original response variable, the transformed response, and the covariates, respectively.
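The binary rule above can be illustrated with a minimal base R sketch. It does not call mom; the toy vectors trt and y and the names uplift_from_z and uplift_observed are hypothetical, chosen only for illustration. The example assumes balanced treatment groups, in which case the identity also holds exactly in the sample.

```r
# Hypothetical toy data: 4 treated and 4 control observations (balanced).
trt <- c(1, 1, 1, 1, -1, -1, -1, -1)   # treatment indicator T in {-1, 1}
y   <- c(1, 1, 1, 0,  1,  0,  0,  0)   # binary response, class of interest = 1

# z = 1 when (T = 1 and y = 1) or (T = -1 and y = 0); otherwise z = 0.
z <- as.integer((trt == 1 & y == 1) | (trt == -1 & y == 0))

# With balanced groups, 2 * mean(z) - 1 equals the difference in
# response rates between treated and control observations.
uplift_from_z   <- 2 * mean(z) - 1
uplift_observed <- mean(y[trt == 1]) - mean(y[trt == -1])
```

Both quantities are marginal here (no covariates); a model fitted to z on X would target the same difference conditional on X.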
If the response is numeric, it is transformed as z = 2 * (y - \bar{y}) * T. A model fitted to z effectively estimates E[y|T = 1, X] - E[y|T = -1, X] (Tian et al., 2014).
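As a rough check of the numeric case, the sketch below applies z = 2 * (y - \bar{y}) * T by hand in base R. It assumes \bar{y} is the overall sample mean of y and that the treatment groups are balanced; the data and variable names are hypothetical and are not produced by mom.

```r
# Hypothetical toy data: 3 treated and 3 control observations (balanced).
trt <- c(1, 1, 1, -1, -1, -1)   # treatment indicator T in {-1, 1}
y   <- c(5, 7, 9,  2,  4,  6)   # numeric response

# Transformed response, assuming y-bar is the overall sample mean of y.
z <- 2 * (y - mean(y)) * trt

# With balanced groups, the mean of z equals the difference in group means.
effect_from_z   <- mean(z)
effect_observed <- mean(y[trt == 1]) - mean(y[trt == -1])
```

Centering y does not change the mean of z when the groups are balanced, since the \bar{y} terms cancel across the two treatment levels.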
The argument sampling can be used to obtain a balanced treatment
distribution. Specifically, if sampling = "oversample", observations
from the treatment minority class are duplicated (by sampling with
replacement), so that the resulting data frame has exactly the same number of
observations under each treatment level. Alternatively, if sampling =
"undersample", observations from the treatment majority class are dropped (by
sampling without replacement), again leaving exactly the same number of
observations under each treatment level. If sampling = "none", no sampling
is done. Lastly, if sampling = "weights", the
returned data frame includes a weight variable that equals (1 - π) for
T = treatLevel and π otherwise, where π = Prob(T = treatLevel). The weight
variable can subsequently be used to perform
case-weighted regression/classification on the transformed response.
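The balancing options described above can be sketched in base R as follows. This is an illustration of the idea, not the package's internal code: the helper pick, the toy factor trt, and the choice to keep every original minority row when oversampling are all assumptions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

# Hypothetical unbalanced treatment factor: 6 control rows, 2 treated rows.
trt <- factor(c(rep("ctrl", 6), rep("treat", 2)))
treat_level <- "treat"

# "weights": (1 - pi) for the treatment level of interest, pi otherwise.
pi_hat <- mean(trt == treat_level)
w <- ifelse(trt == treat_level, 1 - pi_hat, pi_hat)

# Helper (hypothetical): draw n row indices from idx.
pick <- function(idx, n, replace) {
  idx[sample.int(length(idx), size = n, replace = replace)]
}

# Row indices per treatment level, ordered from smallest to largest group.
groups   <- split(seq_along(trt), trt)
ord      <- order(lengths(groups))
minority <- groups[[ord[1]]]
majority <- groups[[ord[2]]]

# "undersample": keep a random subset of the majority, without replacement.
under_idx <- c(minority, pick(majority, length(minority), replace = FALSE))

# "oversample": add duplicated minority rows, drawn with replacement.
extra    <- pick(minority, length(majority) - length(minority), replace = TRUE)
over_idx <- c(majority, minority, extra)
```

In this sketch the weights give both treatment levels the same total weight, which is the sense in which case-weighting stands in for resampling.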
An object of class "mom", which is a list with the following
components (among others passed to the S3 methods):
data
The data set including the original response variable, the
treatment indicator, the transformed response, the predictors, and
(optionally) a weight variable.
call
The original call to mom.
Leo Guelman leo.guelman@gmail.com
Guelman, L., Guillen, M., and Perez-Marin, A.M. (2015). "A decision support framework to implement optimal personalized marketing interventions." Decision Support Systems, Vol. 72, pp. 24–32.
Jaskowski, M. and Jaroszewicz, S. (2012). "Uplift modeling for clinical trial data." In ICML 2012 Workshop on Machine Learning for Clinical Data Analysis, Edinburgh, Scotland.
Tian, L., Alizadeh, A., Gentles, A., and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.
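set.seed(324)

# Simulated training data with a binary response
df_train <- sim_uplift(p = 15, response = "binary")

# Transform the response, undersampling to balance the treatment groups
df_train_mcm <- mom(y ~ X1 + X2 + X3 + trt(T), data = df_train,
                    sampling = "undersample")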