tmleCom_Options: Setting all possible options for 'tmleCommunity'

Description Usage Arguments Value See Also Examples

View source: R/zzz.R

Description

Additional options that control the estimation algorithm in the tmleCommunity package.

Usage

tmleCom_Options(Qestimator = c("speedglm__glm", "glm__glm",
  "h2o__ensemble", "SuperLearner"), gestimator = c("speedglm__glm",
  "glm__glm", "h2o__ensemble", "SuperLearner", "sl3_pipelines"),
  bin.method = c("equal.mass", "equal.len", "dhist"), nbins = 5,
  maxncats = 10, maxNperBin = 500, parfit = FALSE,
  poolContinVar = FALSE, savetime.fit.hbars = TRUE,
  h2ometalearner = "h2o.glm.wrapper", h2olearner = "h2o.glm.wrapper",
  sl3_metalearner = sl3::make_learner(sl3::Lrnr_optim, loss_function =
  sl3::loss_loglik_binomial, learner_function =
  sl3::metalearner_logistic_binomial), sl3_learners = list(glm_fast =
  sl3::make_learner(sl3::Lrnr_glm_fast)), CVfolds = 5,
  SL.library = c("SL.glm", "SL.step", "SL.glm.interaction"))

Arguments

Qestimator

A string specifying the default estimator for fitting the outcome mechanism model. The default is "speedglm__glm", which estimates regressions with speedglm.wfit. Estimator "glm__glm" uses glm.fit; "h2o__ensemble" implements the super learner ensemble (stacking) algorithm through the H2O R interface; "SuperLearner" implements the super learner prediction methods via the SuperLearner package. Note that if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "SuperLearner" fails, it falls back on "speedglm__glm"; if "speedglm__glm" fails, it falls back on "glm__glm".

gestimator

A string specifying the default estimator for fitting the exposure mechanism. It accepts the same options as Qestimator, and additionally "sl3_pipelines", a modern implementation of the Super Learner algorithm for ensemble learning and model stacking, alongside a framework for general-purpose machine learning with pipelines. The same fallback behavior applies: if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "sl3_pipelines" fails, it falls back on "SuperLearner", and so on.

bin.method

Specifies the method for choosing bins when discretizing the conditional continuous exposure variable A. The default, "equal.mass", provides a data-adaptive selection of the bins based on equal mass (area), i.e., each bin contains approximately the same number of observations as the others. Method "equal.len" partitions the range of A into nbins equal-length intervals. Method "dhist" uses a combination of the two approaches; see Denby and Mallows, "Variations on the Histogram" (2009), for details. Note that the argument maxNperBin controls the maximum number of observations in each bin.
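A minimal base-R sketch of how the two main bin.method choices differ for a skewed continuous exposure; the cutoff logic below is illustrative of the description above, not the tmleCommunity internals:

```r
set.seed(1)
A <- rexp(1000)    # a right-skewed continuous exposure
nbins <- 5
# "equal.len": partition range(A) into nbins equal-width intervals
cut.len  <- seq(min(A), max(A), length.out = nbins + 1)
# "equal.mass": data-adaptive cutoffs so each bin holds roughly the
# same number of observations (empirical quantiles)
cut.mass <- quantile(A, probs = seq(0, 1, length.out = nbins + 1))
table(cut(A, cut.mass, include.lowest = TRUE))  # ~200 obs per bin
table(cut(A, cut.len,  include.lowest = TRUE))  # counts vary with skewness
```

With skewed data, equal-length bins leave the upper bins nearly empty, which is why "equal.mass" is the default.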

nbins

When bin.method = "equal.len", the user-supplied number of bins to use when discretizing a continuous variable. If not specified, defaults to 5. If set to NA, it is set to the nearest integer of nobs/maxNperBin, where nobs is the total number of observations in the input data. When bin.method = "equal.mass", nbins is set to the maximum of the default nbins and the nearest integer of nobs/maxNperBin.

maxncats

Integer specifying the maximum number of unique categories a categorical variable A[j] can have. If A[j] has more unique categories than maxncats, it is automatically treated as a continuous variable. Defaults to 10.

maxNperBin

Integer specifying the maximum number of observations in each bin when discretizing a continuous variable A[j] (applies directly when bin.method = "equal.mass", and indirectly when bin.method = "equal.len" with nbins = NA). Defaults to 500.

parfit

Logical. If TRUE, perform parallel regression fits and predictions for discretized continuous variables using foreach and %dopar% from the foreach package. Defaults to FALSE. Note that this requires registering a parallel backend before running the tmleCommunity function, e.g., loading the doParallel package and calling registerDoParallel(cores = ncores) for ncores parallel jobs.
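The backend registration the paragraph above describes can be sketched as follows (the worker count of 2 is an arbitrary example):

```r
library(doParallel)

ncores <- 2                     # e.g. parallel::detectCores() - 1 in practice
registerDoParallel(cores = ncores)  # register backend BEFORE calling tmleCommunity
tmleCom_Options(parfit = TRUE)      # enable parallel bin-by-bin fits
# ... run tmleCommunity(...) as usual; foreach/%dopar% is used internally ...
stopImplicitCluster()               # release the workers when done
```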

poolContinVar

Logical. If TRUE, when fitting a model for a binarized continuous variable, pool bin indicators across all bins and fit one pooled regression. Defaults to FALSE.

savetime.fit.hbars

Logical. If TRUE, skip estimation and prediction of the exposure mechanism P(A|W,E) under g0 and gstar when f.gstar1 = NULL and TMLE.targetStep = "tmle.intercept", and instead directly set h_gstar_h_gN = 1 for each observation. Defaults to TRUE.

h2ometalearner

A string to pass to h2o.ensemble, specifying the prediction algorithm used to learn the optimal combination of the base learners. Supports both h2o and SuperLearner wrapper functions. Defaults to "h2o.glm.wrapper".

h2olearner

A string or character vector to pass to h2o.ensemble, naming the prediction algorithm(s) used to train the base models for the ensemble. The functions must have the same format as the h2o wrapper functions. Defaults to "h2o.glm.wrapper".

sl3_metalearner

An sl3 learner object specifying the metalearner used by "sl3_pipelines" to combine the base learners. Defaults to a logistic-regression metalearner, sl3::make_learner(sl3::Lrnr_optim, loss_function = sl3::loss_loglik_binomial, learner_function = sl3::metalearner_logistic_binomial).

sl3_learners

A named list of sl3 learner objects to use as base learners with "sl3_pipelines". Defaults to list(glm_fast = sl3::make_learner(sl3::Lrnr_glm_fast)).
CVfolds

The number of splits for the V-fold cross-validation step to pass to SuperLearner and h2o.ensemble. Defaults to 5.

SL.library

A string or character vector of prediction algorithms to pass to SuperLearner. Defaults to c("SL.glm", "SL.step", "SL.glm.interaction"). For more available algorithms see SuperLearner::listWrappers(). Additional wrapper functions are available at https://github.com/ecpolley/SuperLearnerExtra.

Value

Invisibly returns a list with old option settings.
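Since the old settings are returned invisibly, the usual R options idiom applies; a sketch, assuming the returned list's names match the argument names:

```r
# Change options while keeping the previous settings
old <- tmleCom_Options(bin.method = "equal.len", nbins = 10, maxNperBin = 1000)
# ... run the estimation with the new settings ...
do.call(tmleCom_Options, old)   # restore the previous settings
```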

See Also

print_tmleCom_opts

Examples

## Not run: 
#***************************************************************************************
# Example 1: using different estimators in estimation of Q and g mechanisms
#***************************************************************************************
# 1.1 using speedglm (and glm)
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "speedglm__glm")
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "glm__glm")

# 1.2 using SuperLearner
library(SuperLearner)
# library including "SL.glm", "SL.glmnet", "SL.ridge", and "SL.stepAIC"
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", CVfolds = 5,
                SL.library = c("SL.glm", "SL.glmnet", "SL.ridge", "SL.stepAIC"))

# library including "SL.bayesglm", "SL.gam", and "SL.randomForest", and split to 10 CV folds
# require("gam"); require("randomForest")
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", CVfolds = 10,
                SL.library = c("SL.bayesglm", "SL.gam", "SL.randomForest"))

# Create glmnet wrappers with different alphas (the default value of alpha in SL.glmnet is 1)
create.SL.glmnet <- function(alpha = c(0.25, 0.50, 0.75)) {
  for(mm in seq(length(alpha))){
    eval(parse(text = paste('SL.glmnet.', alpha[mm], '<- function(..., alpha = ', 
                            alpha[mm], ') SL.glmnet(..., alpha = alpha)', sep = '')), 
         envir = .GlobalEnv)
  }
  invisible(TRUE)
}
create.SL.glmnet(seq(0, 1, length.out=3))  # 3 glmnet wrappers with alpha = 0, 0.5, 1
# Create custom randomForest learners (set ntree to 100 rather than the default of 500) 
create.SL.rf <- create.Learner("SL.randomForest", list(ntree = 100))
# Create a sequence of 2 customized KNN learners 
# set the number of nearest neighbors to 8 and 12 rather than the default of 10
create.SL.Knn <- create.Learner("SL.kernelKnn", detailed_names=TRUE, tune=list(k=c(8, 12)))
SL.library <- c(grep("^SL\\.glmnet\\.", as.vector(lsf.str()), value=TRUE), 
                create.SL.rf$names, create.SL.Knn$names)
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", 
                SL.library = SL.library, CVfolds = 5)            

# 1.3 using h2o.ensemble
library("h2o"); library("h2oEnsemble")
# h2olearner including "h2o.glm.wrapper" and "h2o.randomForest.wrapper"
tmleCom_Options(Qestimator = "h2o__ensemble", gestimator = "h2o__ensemble", 
                CVfolds = 10, h2ometalearner = "h2o.glm.wrapper", 
                h2olearner = c("h2o.glm.wrapper", "h2o.randomForest.wrapper"))

# Create a sequence of customized h2o glm, randomForest and deeplearning wrappers 
h2o.glm.1 <- function(..., alpha = 1, prior = NULL) { 
  h2o.glm.wrapper(..., alpha = alpha, prior = prior) 
}
h2o.glm.0.5 <- function(..., alpha = 0.5, prior = NULL) { 
  h2o.glm.wrapper(..., alpha = alpha, prior = prior) 
}
h2o.randomForest.1 <- function(..., ntrees = 200, nbins = 50, seed = 1) {
  h2o.randomForest.wrapper(..., ntrees = ntrees, nbins = nbins, seed = seed)
}
h2o.deeplearning.1 <- function(..., hidden = c(500, 500), activation = "Rectifier", seed = 1) {
  h2o.deeplearning.wrapper(..., hidden = hidden, activation = activation, seed = seed)
}
h2olearner <- c("h2o.glm.1", "h2o.glm.0.5", "h2o.randomForest.1", 
                "h2o.deeplearning.1", "h2o.gbm.wrapper")
# using "h2o.deeplearning.wrapper" for h2ometalearner
tmleCom_Options(Qestimator = "h2o__ensemble", gestimator = "h2o__ensemble",
                CVfolds = 5, h2ometalearner = "h2o.deeplearning.wrapper",
                h2olearner = h2olearner)

# 1.4 using sl3
library(sl3)
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)

sl3_learners <- list(
  rf = make_learner(Lrnr_randomForest),
  xgb = make_learner(Lrnr_xgboost),
  svm = make_learner(Lrnr_svm),
  glmnet = make_learner(Lrnr_glmnet),
  glm_fast = make_learner(Lrnr_glm_fast),
  screened_glm = screen_and_glm,
  mean = make_learner(Lrnr_mean)
)

logit_metalearner <- make_learner(
  Lrnr_optim,
  loss_function = loss_loglik_binomial,
  learner_function = metalearner_logistic_binomial
)

tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "sl3_pipelines", 
                maxNperBin = 500, nbins = 5, bin.method = "equal.mass",
                sl3_learners = sl3_learners, sl3_metalearner = logit_metalearner)
  
#***************************************************************************************
# Example 2: Define the values of bin cutoffs for continuous outcome in different ways
# through three arguments - bin.method, nbins, maxNperBin 
#***************************************************************************************
# 2.1 using equal-length method
# discretize a continuous outcome variable into 10 bins, no more than 1000 obs in each bin 
tmleCom_Options(bin.method = "equal.len", nbins = 10, maxNperBin = 1000)

# 2.2 find a compromise between equal-mass and equal-length method
# discretize into 10 bins, and no more than 5000 obs in each bin
tmleCom_Options(bin.method = "dhist", nbins = 10, maxNperBin = 5000)

# 2.3 Default to use equal-mass method with 5 bins, no more than 500 obs in each bin
tmleCom_Options()

## End(Not run)

chizhangucb/tmleCommunity documentation built on May 20, 2019, 3:34 p.m.