uplift_glm: Fitting Uplift Generalized Linear Models.
In leoguelman/uplift2: Uplift Modeling

uplift_glm

R Documentation

Description

uplift_glm fits Uplift Generalized Linear Models, optionally with lasso or elastic-net regularization.

Usage

## S3 method for class 'formula'
uplift_glm(formula, data, subset, na.action,
  family = "gaussian", method = "glm", sampling = "weights",
  treatLevel = NULL, Anova = FALSE, ...)

## S3 method for class 'uplift_glm'
print(x, ...)

Arguments

`formula`	A model formula of the form `y ~ x1 + ... + xn + trt(T)`, where the left-hand side is the observed response, the right-hand side lists the predictors, and `trt()` is the special expression that marks the treatment term (here a variable named `T`, as in the Examples). If the treatment variable is not a factor, it is converted to one.
`data`	A data frame in which to interpret the variables named in the formula.
`subset`	Expression indicating which subset of the rows of data should be included. All observations are included by default.
`na.action`	A missing-data filter function.
`family`	Response type. For `family = "gaussian"` (default), the response must be numeric. For `family = "binomial"`, the response should be a factor with two levels; a numeric response is coerced to a factor. For `family = "cox"`, the response must be a survival object, as returned by `survival::Surv`.
`method`	The method used for model fitting. If `method = "glm"` (default), the model is fitted on the modified covariates (see details) using `stats::glm`. If `method = "glmStepAIC"`, the model is first fitted using `stats::glm` and then this is passed to `MASS::stepAIC` for AIC stepwise selection. Alternatively, for `method = "glmnet"` and `method = "cv.glmnet"`, models are fitted using `glmnet::glmnet` and `glmnet::cv.glmnet`, respectively.
`sampling`	The sampling method used to balance the treatment variable. See details.
`treatLevel`	A character string for the treatment level of interest. Defaults to the last level of the treatment factor.
`Anova`	If `TRUE`, the analysis-of-variance table is returned using the function `car::Anova`. It does not apply to `method = "cv.glmnet"` or `method = "glmnet"`.
`...`	Additional arguments passed to the regression method selected in `method`.
`x`	An `uplift_glm` object.

Details

The function follows the method for uplift modeling proposed by Tian et al. (2014). This method consists of modifying the covariates in a simple way and then fitting an appropriate regression model using the modified covariates and no main effects. See Tian et al. (2014) for details.
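
The modified-covariate idea can be illustrated in base R for a continuous response. This is a minimal sketch of the approach described by Tian et al. (2014), not the package's internal code: the simulated data, the variable names (`x1`, `x2`, `trt`, `Z`, `W`) and the -1/+1 treatment coding are assumptions made for illustration only.

```r
set.seed(42)
n   <- 2000
x1  <- rnorm(n)
x2  <- rnorm(n)
trt <- sample(c(-1, 1), n, replace = TRUE)   # randomized treatment, coded -1/+1

# Simulated outcome: main effect of x2 plus a treatment effect of 1 + 2 * x1
y <- 0.5 * x2 + (trt / 2) * (1 + 2 * x1) + rnorm(n)

# Covariate matrix including an intercept column
Z <- cbind(intercept = 1, x1 = x1, x2 = x2)

# Modified covariates: each row of Z is multiplied by trt / 2
W <- Z * trt / 2

# Regress the response on the modified covariates only
# (no intercept and no main effects)
fit <- lm(y ~ W - 1)
gamma_hat <- coef(fit)

# Estimated individual treatment effect for each observation
tau_hat <- drop(Z %*% gamma_hat)
```

In this simulation the coefficients in `gamma_hat` should land close to (1, 2, 0), which recovers the treatment effect 1 + 2 * x1 even though the main effect of `x2` was never modeled.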

The argument `sampling` can be used to obtain a balanced treatment distribution. If `sampling = "oversample"`, observations from the minority treatment group are duplicated (by sampling with replacement); if `sampling = "undersample"`, observations from the majority treatment group are dropped (by sampling without replacement). In both cases, the data frame used in model fitting has exactly the same number of observations under each treatment level. If `sampling = "none"`, no sampling is done. Lastly, if `sampling = "weights"` (the default), the returned data frame includes a weight variable that equals (1 - π) for T = treatLevel and π otherwise, where π = Prob(T = treatLevel). These weights are then used as case weights in the fitting process.
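
The weighting and oversampling schemes can be sketched in base R as follows. This is an illustrative reconstruction from the description above, not the package's implementation: the toy data and the names `treat`, `pi_hat`, `w` and `idx_balanced` are hypothetical.

```r
# Toy treatment factor: 6 controls and 2 treated observations
treat      <- factor(c(rep("ctrl", 6), rep("treat", 2)))
treatLevel <- "treat"

# sampling = "weights": (1 - pi) for the treatment level of interest, pi otherwise
pi_hat <- mean(treat == treatLevel)               # estimate of Prob(T = treatLevel)
w      <- ifelse(treat == treatLevel, 1 - pi_hat, pi_hat)

# The total weight is the same in both treatment groups
tapply(w, treat, sum)

# sampling = "oversample": duplicate minority-group rows (with replacement)
# until both groups have the same size
set.seed(1)
idx_min <- which(treat == treatLevel)
idx_maj <- which(treat != treatLevel)
n_extra <- length(idx_maj) - length(idx_min)
extra   <- idx_min[sample.int(length(idx_min), n_extra, replace = TRUE)]
idx_balanced <- c(idx_maj, idx_min, extra)

table(treat[idx_balanced])
```

With weights, every observation is kept and the imbalance is corrected through case weights; with oversampling, the fitted data frame grows because minority-group rows are repeated.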

Value

An object of class "uplift_glm", which is a list with the following components, in addition to the ones returned by the specific fitting method:

`call`	The calling expression.
`na.action`	Information returned by `model.frame` on the special handling of NAs.
`xlevels`	The levels of predictors of class factor.
`Family`	The family used.
`method`	The method used.
`sampling`	The sampling method used.
`dataClasses`	The data classes of predictors.
`treatLevel`	The reference treatment level.
`ttReLabel`	The label of the transformed treatment indicator.
`modForm`	The model formula.
`modData`	The data frame used in model fitting.
`inbag`	The indices of the observations used for fitting.
`weightVector`	The vector of weights used for fitting.

Author(s)

Leo Guelman leo.guelman@gmail.com

References

Tian, L., Alizadeh, A., Gentles, A. and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.

Examples


library(uplift2)
library(ggplot2)  # makes the ggplot() generic used below available

set.seed(1)

# Simulate training and test data with a binary response
df_train <- sim_uplift(p = 50, response = "binary")
df_test <- sim_uplift(p = 50, n = 10000, response = "binary")

# Build the formula y ~ trt(T) + X1 + ... + Xk from the predictor columns
form <- as.formula(paste('y ~', 'trt(T) +',
       paste('X', 1:(ncol(df_train) - 3), sep = '', collapse = "+")))

# Plain GLM on the modified covariates
fit1 <- uplift_glm(form,
                  family = "binomial",
                  method = "glm",
                  data = df_train)
fit1

# GLM followed by stepwise AIC selection
fit2 <- uplift_glm(form,
                  family = "binomial",
                  method = "glmStepAIC",
                  data = df_train)
fit2

# Cross-validate to choose lambda, then fit the full glmnet path
fit3_cv <- uplift_glm(form,
                  family = "binomial",
                  method = "cv.glmnet",
                  data = df_train)
lambda.opt <- fit3_cv$lambda.min
fit3 <- uplift_glm(form,
                  family = "binomial",
                  method = "glmnet",
                  data = df_train)

# Predict uplift on the test data; fit3 is evaluated at the selected lambda
upliftPred1 <- predict(fit1, df_test)
upliftPred2 <- predict(fit2, df_test)
upliftPred3 <- predict(fit3, df_test, s = lambda.opt)

# Compare the three models
df_eval <- data.frame(upliftPred1 = upliftPred1,
                     upliftPred2 = upliftPred2,
                     upliftPred3 = upliftPred3,
                     y = df_test$y,
                     T = df_test$T)
res <- inspect_performance(y ~ upliftPred1 + upliftPred2 + upliftPred3 + trt(T),
                          data = df_eval, qini = TRUE)
res
summary(res)
ggplot(res)
res$qiniC

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.