uplift_glm | R Documentation |
uplift_glm
Fits uplift generalized linear models, optionally with lasso or elastic-net regularization.
## S3 method for class 'formula'
uplift_glm(formula, data, subset, na.action, family = "gaussian",
  method = "glm", sampling = "weights", treatLevel = NULL,
  Anova = FALSE, ...)

## S3 method for class 'uplift_glm'
print(x, ...)
formula |
A model formula of the form y ~ x1 + ... + xn + trt(), where the left-hand side is the observed response, the right-hand side lists the predictors, and 'trt' is the special expression that marks the treatment term. If the treatment term is not a factor, it is converted to one. |
data |
A data frame in which to interpret the variables named in the formula. |
subset |
Expression indicating which subset of the rows of data should be included. All observations are included by default. |
na.action |
A missing-data filter function. |
family |
Response type. For |
method |
The method used for model fitting. If |
sampling |
The sampling method used to balance the treatment variable. See details. |
treatLevel |
A character string for the treatment level of interest. Defaults to the last level of the treatment factor. |
Anova |
If |
... |
Additional arguments passed to the regression method selected in method. |
x |
A |
The function follows the method for uplift modeling proposed by Tian et al. (2014). This method consists in modifying the covariates in a simple way, and then fitting an appropriate regression model using the modified covariates and no main effects. See Tian et al. (2014) for details.
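The modified-covariate idea can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: it assumes a continuous response, a treatment coded as -1/+1 and randomized with probability 1/2, and simulated data whose variable names (x1, x2, trt, W) are made up for the illustration. Each covariate, including the intercept column, is multiplied by trt/2, and the response is regressed on these modified covariates with no main effects.

```r
set.seed(42)
n   <- 2000
x1  <- rnorm(n)
x2  <- rnorm(n)
trt <- sample(c(-1, 1), n, replace = TRUE)   # balanced randomized treatment

# Simulated response: main effect of x2, treatment effect of 1 + 2 * x1
y <- 0.5 * x2 + (trt / 2) * (1 + 2 * x1) + rnorm(n)

# Modified covariates: every column (intercept included) times trt / 2
W <- cbind(intercept = 1, x1 = x1, x2 = x2) * (trt / 2)

# Regression on the modified covariates only (no main effects, no extra intercept)
fit <- lm(y ~ W - 1)
gamma_hat <- coef(fit)

# Estimated treatment effect for three hypothetical new observations
z_new <- cbind(1, c(-1, 0, 1), 0)
uplift_new <- drop(z_new %*% gamma_hat)
```

In this setup the fitted coefficients estimate the treatment contrast directly, so gamma_hat should be close to (1, 2, 0) and uplift_new should increase with x1.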
The argument sampling can be used to obtain a balanced treatment distribution. Specifically, if sampling = "oversample", observations from the treatment minority class are duplicated (by sampling with replacement), so that the data frame used in model fitting has exactly the same number of observations under each treatment level. Alternatively, if sampling = "undersample", observations from the treatment majority class are dropped (by sampling without replacement), so that the data frame used in model fitting has exactly the same number of observations under each treatment level. If sampling = "none", no sampling is done. Lastly, if sampling = "weights", the returned data frame includes a weight variable that equals (1 - π) for T = treatLevel and π otherwise, where π = Prob(T = treatLevel). These weights are subsequently used as case weights in the fitting process.
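The four sampling options can be sketched in base R as follows. This is an illustration of the description above under stated assumptions, not the package's implementation: the helper balance_treatment and its arguments are hypothetical names, it handles a two-level treatment only, and it returns row indices and case weights rather than a data frame.

```r
# Hypothetical helper (not part of the package): returns the row indices
# and case weights implied by each sampling scheme.
balance_treatment <- function(trt, treat_level,
                              sampling = c("weights", "oversample",
                                           "undersample", "none")) {
  sampling <- match.arg(sampling)
  is_treat <- trt == treat_level
  idx <- seq_along(trt)
  w   <- rep(1, length(trt))

  if (sampling == "weights") {
    p <- mean(is_treat)                    # pi = Prob(T = treatLevel)
    w <- ifelse(is_treat, 1 - p, p)        # (1 - pi) if treated, pi otherwise
  } else if (sampling != "none") {
    idx_t <- which(is_treat)
    idx_c <- which(!is_treat)
    if (length(idx_t) <= length(idx_c)) {
      minority <- idx_t; majority <- idx_c
    } else {
      minority <- idx_c; majority <- idx_t
    }
    if (sampling == "oversample") {
      # Duplicate minority rows (with replacement) until the groups match
      gap   <- length(majority) - length(minority)
      extra <- minority[sample.int(length(minority), gap, replace = TRUE)]
      idx   <- c(idx, extra)
    } else {
      # Drop majority rows (without replacement) until the groups match
      keep <- majority[sample.int(length(majority), length(minority))]
      idx  <- sort(c(minority, keep))
    }
    w <- rep(1, length(idx))
  }
  list(index = idx, weights = w)
}

set.seed(1)
trt <- sample(c("control", "treated"), 1000, replace = TRUE, prob = c(0.7, 0.3))

bw <- balance_treatment(trt, "treated", "weights")
weight_totals <- tapply(bw$weights, trt, sum)   # total weight per treatment level

bo <- balance_treatment(trt, "treated", "oversample")
bu <- balance_treatment(trt, "treated", "undersample")
over_counts  <- table(trt[bo$index])
under_counts <- table(trt[bu$index])
```

With sampling = "weights" the two treatment groups carry the same total weight, which is the sense in which the weighted data are balanced; the over- and undersampled index sets contain equal counts of each level.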
An object of class "uplift_glm", which is a list with the following components, in addition to the ones returned by the specific fitting method:
call
The calling expression.
na.action
Information returned by model.frame on the special handling of NAs.
xlevels
The levels of predictors of class factor.
Family
The family used.
method
The method used.
sampling
The sampling method used.
dataClasses
The data classes of predictors.
treatLevel
The reference treatment level.
ttReLabel
The label of the transformed treatment indicator.
modForm
The model formula.
modData
The data frame used in model fitting.
inbag
The indices of the observations used for fitting.
weightVector
The vector of weights used for fitting.
Leo Guelman leo.guelman@gmail.com
Tian, L., Alizadeh, A., Gentles, A. and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.
set.seed(1)

# Simulated training and test data with a binary response
df_train <- sim_uplift(p = 50, response = "binary")
df_test <- sim_uplift(p = 50, n = 10000, response = "binary")

# Formula with the treatment term and all predictors X1, ..., Xp
form <- as.formula(paste('y ~', 'trt(T) +',
                         paste('X', 1:(ncol(df_train) - 3), sep = '', collapse = "+")))

# Plain GLM fit
fit1 <- uplift_glm(form, family = "binomial", method = "glm", data = df_train)
fit1

# GLM with stepwise AIC selection
fit2 <- uplift_glm(form, family = "binomial", method = "glmStepAIC", data = df_train)
fit2

# Cross-validate to choose lambda, then refit with glmnet
fit3 <- uplift_glm(form, family = "binomial", method = "cv.glmnet", data = df_train)
lambda.opt <- fit3$lambda.min
fit3 <- uplift_glm(form, family = "binomial", method = "glmnet", data = df_train)

# Uplift predictions on the test data
upliftPred1 <- predict(fit1, df_test)
upliftPred2 <- predict(fit2, df_test)
upliftPred3 <- predict(fit3, df_test, s = lambda.opt)

# Compare the three models
df_eval <- data.frame(upliftPred1 = upliftPred1,
                      upliftPred2 = upliftPred2,
                      upliftPred3 = upliftPred3,
                      y = df_test$y,
                      T = df_test$T)
res <- inspect_performance(y ~ upliftPred1 + upliftPred2 + upliftPred3 + trt(T),
                           data = df_eval, qini = TRUE)
res
summary(res)
ggplot(res)
res$qiniC