uplift_glm: Fitting Uplift Generalized Linear Models.

Description

uplift_glm fits uplift generalized linear models, optionally with lasso or elastic-net regularization.

Usage

## S3 method for class 'formula'
uplift_glm(formula, data, subset, na.action,
  family = "gaussian", method = "glm", sampling = "weights",
  treatLevel = NULL, Anova = FALSE, ...)

## S3 method for class 'uplift_glm'
print(x, ...)

Arguments

formula

A model formula of the form y ~ x1 + ... + xn + trt(), where the left-hand side is the observed response, the right-hand side lists the predictors, and trt() is the special function that marks the treatment term. If the treatment term is not a factor, it is converted to one.

data

A data frame in which to interpret the variables named in the formula.

subset

Expression indicating which subset of the rows of data should be included. All observations are included by default.

na.action

A missing-data filter function.

family

Response type. For family = "gaussian" (default), the response must be numeric. For family = "binomial", the response must be a factor with two levels; a numeric response is coerced to a factor. For family = "cox", the response must be a survival object, as returned by survival::Surv.

method

The method used for model fitting. If method = "glm" (default), the model is fitted on the modified covariates (see Details) using stats::glm. If method = "glmStepAIC", the model is first fitted with stats::glm and the resulting fit is then passed to MASS::stepAIC for stepwise selection by AIC. For method = "glmnet" and method = "cv.glmnet", models are fitted using glmnet::glmnet and glmnet::cv.glmnet, respectively.

sampling

The sampling method used to balance the treatment variable. See Details.

treatLevel

A character string for the treatment level of interest. Defaults to the last level of the treatment factor.

Anova

If TRUE, an analysis-of-variance table computed with car::Anova is returned. This option does not apply to method = "cv.glmnet" or method = "glmnet".

...

Additional arguments passed to the regression method selected in method.

x

An uplift_glm object.
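The documented rules for family and treatLevel can be checked on raw inputs before fitting. The sketch below uses base R only; the helpers default_treat_level() and prepare_response() are hypothetical, are not exported by uplift2, and the checks uplift_glm performs internally may differ.

```r
# Hypothetical helpers (not part of uplift2) mirroring the documented rules.

# treatLevel default: the last level of the treatment factor
default_treat_level <- function(trt) {
  trt <- as.factor(trt)   # a non-factor treatment is converted to a factor
  tail(levels(trt), 1)
}

# Response requirements for each documented family
prepare_response <- function(y, family = c("gaussian", "binomial", "cox")) {
  family <- match.arg(family)
  switch(family,
    gaussian = {
      if (!is.numeric(y)) stop("family = 'gaussian' needs a numeric response")
      y
    },
    binomial = {
      y <- as.factor(y)   # a numeric response is coerced to a factor
      if (nlevels(y) != 2) stop("family = 'binomial' needs exactly two levels")
      y
    },
    cox = {
      if (!inherits(y, "Surv")) stop("family = 'cox' needs a survival::Surv object")
      y
    }
  )
}

default_treat_level(c("ctl", "trt", "ctl"))           # "trt"
levels(prepare_response(c(0, 1, 1, 0), "binomial"))   # "0" "1"
```

Note that as.factor() sorts character values alphabetically, so for a character treatment variable the default treatLevel is the alphabetically last value; set treatLevel explicitly if that is not the level of interest.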

Details

The function follows the uplift modeling method proposed by Tian et al. (2014). In that method, with the treatment coded as T = ±1, the covariates are multiplied by T/2, and an appropriate regression model is then fitted on these modified covariates with no main effects. The fitted coefficients describe how the treatment effect varies with the covariates. See Tian et al. (2014) for details.
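To make the modified-covariate idea concrete, here is a minimal base-R sketch on simulated data. It assumes a roughly balanced 0/1 treatment recoded to T = ±1, a gaussian response, a constant column among the modified covariates and the usual glm intercept. It illustrates the technique from Tian et al. (2014) and is not the code uplift_glm runs internally (which also handles sampling, factor predictors and other families).

```r
# Sketch of the modified-covariate method in base R; data are simulated.
set.seed(42)
n   <- 2000
x1  <- rnorm(n)
x2  <- rnorm(n)
trt <- rbinom(n, 1, 0.5)                  # roughly balanced 0/1 treatment
y   <- 0.5 * x2 + trt * x1 + rnorm(n)     # true uplift for a subject is x1

# Recode treatment as +1 / -1 and multiply each covariate (and a constant
# column) by T / 2; the original main effects x1, x2 are not entered.
t_pm <- ifelse(trt == 1, 1, -1)
mod <- data.frame(y  = y,
                  z0 = t_pm / 2,
                  z1 = x1 * t_pm / 2,
                  z2 = x2 * t_pm / 2)
fit <- glm(y ~ z0 + z1 + z2, family = gaussian, data = mod)

# The coefficients on z0, z1, z2 give the uplift as a linear function of
# (1, x1, x2): E[y | T = 1] - E[y | T = -1] = gam0 + gam1 * x1 + gam2 * x2.
gam_hat    <- coef(fit)[c("z0", "z1", "z2")]
uplift_hat <- drop(cbind(1, x1, x2) %*% gam_hat)
round(gam_hat, 2)   # close to 0, 1, 0
```

Because T is independent of the covariates under randomization, the omitted main effects are orthogonal to the modified covariates, so leaving them out does not bias the interaction coefficients; it only adds residual variance.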

The argument sampling can be used to balance the treatment distribution:

  • sampling = "oversample": observations from the minority treatment level are duplicated (by sampling with replacement), so that the data frame used in model fitting has exactly the same number of observations under each treatment level.

  • sampling = "undersample": observations from the majority treatment level are dropped (by sampling without replacement), so that the data frame used in model fitting has exactly the same number of observations under each treatment level.

  • sampling = "none": no sampling is done.

  • sampling = "weights" (default): the data frame used in model fitting includes a weight variable equal to (1 - π) for T = treatLevel and π otherwise, where π = Prob(T = treatLevel). These weights are then used as case weights in the fitting process. If π is estimated by the observed proportion of T = treatLevel among n observations, both treatment groups carry the same total weight, nπ(1 - π).
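The four sampling options can be sketched with a small base-R helper. balance_treatment() and its .weights column are hypothetical names, not uplift2 internals; π is assumed to be estimated by the observed proportion of T = treatLevel, and oversampling is assumed to keep every minority row and add extra draws with replacement (the documentation only states that sampling is with replacement).

```r
# Hypothetical helper, NOT part of uplift2: sketches the documented `sampling`
# options for a data frame `df` whose treatment column is `trt_col`.
balance_treatment <- function(df, trt_col, treat_level,
                              sampling = c("weights", "oversample",
                                           "undersample", "none")) {
  sampling <- match.arg(sampling)
  is_trt <- df[[trt_col]] == treat_level
  if (sampling == "none") return(df)

  if (sampling == "weights") {
    pi_hat <- mean(is_trt)                       # estimate of Prob(T = treatLevel)
    df$.weights <- ifelse(is_trt, 1 - pi_hat, pi_hat)
    return(df)
  }

  idx_trt <- which(is_trt)
  idx_ctl <- which(!is_trt)
  trt_smaller <- length(idx_trt) <= length(idx_ctl)
  small <- if (trt_smaller) idx_trt else idx_ctl
  large <- if (trt_smaller) idx_ctl else idx_trt

  keep <- if (sampling == "oversample") {
    # keep all minority rows, then add rows drawn with replacement
    extra <- small[sample.int(length(small), length(large) - length(small),
                              replace = TRUE)]
    c(large, small, extra)
  } else {
    # undersample: draw majority rows without replacement down to minority size
    c(small, large[sample.int(length(large), length(small))])
  }
  df[keep, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(T = factor(rep(c("ctl", "trt"), times = c(70, 30))),
                  x = rnorm(100))
w_df <- balance_treatment(toy, "T", "trt", "weights")
tapply(w_df$.weights, w_df$T, sum)                          # both about 21
table(balance_treatment(toy, "T", "trt", "oversample")$T)   # 70 and 70
table(balance_treatment(toy, "T", "trt", "undersample")$T)  # 30 and 30
```

With π = 0.3, the 30 treated rows each get weight 0.7 and the 70 control rows each get 0.3, so both groups total 100 × 0.3 × 0.7 = 21 (up to floating-point rounding). Weighting keeps every observation, while over- and undersampling change the rows that enter the fit.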

Value

An object of class "uplift_glm", which is a list with the following components, in addition to the ones returned by the specific fitting method:

  • call The calling expression

  • na.action Information returned by model.frame on the special handling of NAs.

  • xlevels The levels of predictors of class factor.

  • Family The family used.

  • method The method used.

  • sampling The sampling method used.

  • dataClasses The data classes of predictors.

  • treatLevel The treatment level of interest.

  • ttReLabel The label of the transformed treatment indicator.

  • modForm The model formula.

  • modData The data frame used in model fitting.

  • inbag The indices of the observations used for fitting.

  • weightVector The vector of weights used for fitting.

Author(s)

Leo Guelman leo.guelman@gmail.com

References

Tian, L., Alizadeh, A., Gentles, A. and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.

Examples


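library(uplift2)
library(ggplot2)  # provides the ggplot() generic used to plot res
set.seed(1)
# Simulated training and test data with a binary response and 50 predictors
df_train <- sim_uplift(p = 50, response = "binary")
df_test <- sim_uplift(p = 50, n = 10000, response = "binary")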
# Build the formula y ~ trt(T) + X1 + ... from the simulated predictor columns
form <- as.formula(paste('y ~', 'trt(T) +',
       paste('X', 1:(ncol(df_train)-3), sep = '', collapse = "+")))
fit1 <- uplift_glm(form,
                  family = "binomial",
                  method = "glm",
                  data = df_train)
fit1
fit2 <- uplift_glm(form,
                  family = "binomial",
                  method = "glmStepAIC",
                  data = df_train)
fit2
# Cross-validate glmnet to choose lambda, then fit the full glmnet path
fit3_cv <- uplift_glm(form,
                      family = "binomial",
                      method = "cv.glmnet",
                      data = df_train)
lambda.opt <- fit3_cv$lambda.min
fit3 <- uplift_glm(form,
                  family = "binomial",
                  method = "glmnet",
                  data = df_train)
# Predicted uplift on the test set; the glmnet fit uses the CV-selected lambda
upliftPred1 <- predict(fit1, df_test)
upliftPred2 <- predict(fit2, df_test)
upliftPred3 <- predict(fit3, df_test, s = lambda.opt)
df_eval <- data.frame(upliftPred1 = upliftPred1,
                      upliftPred2 = upliftPred2,
                      upliftPred3 = upliftPred3,
                      y = df_test$y,
                      T = df_test$T)
res <- inspect_performance(y ~ upliftPred1 + upliftPred2 + upliftPred3 + trt(T),
                          data = df_eval, qini = TRUE)
res
summary(res)
ggplot(res)
res$qiniC

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.