tmleCom_Options: Setting all possible options for 'tmleCommunity'

Description Usage Arguments Value See Also Examples

View source: R/zzz.R

Description

Additional options that control the estimation algorithm in the tmleCommunity package.

Usage

tmleCom_Options(Qestimator = c("speedglm__glm", "glm__glm",
  "h2o__ensemble", "SuperLearner"), gestimator = c("speedglm__glm",
  "glm__glm", "h2o__ensemble", "SuperLearner", "sl3_pipelines"),
  bin.method = c("equal.mass", "equal.len", "dhist"), nbins = 5,
  maxncats = 10, maxNperBin = 500, parfit = FALSE,
  poolContinVar = FALSE, savetime.fit.hbars = TRUE,
  h2ometalearner = "h2o.glm.wrapper", h2olearner = "h2o.glm.wrapper",
  sl3_metalearner = sl3::make_learner(sl3::Lrnr_optim, loss_function =
  sl3::loss_loglik_binomial, learner_function =
  sl3::metalearner_logistic_binomial), sl3_learners = list(glm_fast =
  sl3::make_learner(sl3::Lrnr_glm_fast)), CVfolds = 5,
  SL.library = c("SL.glm", "SL.step", "SL.glm.interaction"))

Arguments

Qestimator

A string specifying the default estimator for fitting the outcome mechanism model. The default is "speedglm__glm", which estimates regressions with speedglm.wfit. Estimator "glm__glm" uses glm.fit; "h2o__ensemble" implements the super learner ensemble (stacking) algorithm through the H2O R interface; "SuperLearner" implements the super learner prediction methods via the SuperLearner package. Note that if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "SuperLearner" fails, it falls back on "speedglm__glm"; if "speedglm__glm" fails, it falls back on "glm__glm".

gestimator

A string specifying the default estimator for fitting the exposure mechanism. It accepts the same options as Qestimator, and additionally "sl3_pipelines", a modern implementation of the Super Learner algorithm for ensemble learning and model stacking, alongside a framework for general-purpose machine learning with pipelines. The same fallback behavior applies: if "h2o__ensemble" fails, it falls back on "SuperLearner"; if "sl3_pipelines" fails, it falls back on "SuperLearner", and so on.

bin.method

Specifies the method for choosing bins when discretizing the conditional continuous exposure variable A. The default, "equal.mass", provides a data-adaptive selection of the bins based on equal mass (area), i.e., each bin contains approximately the same number of observations as the others. Method "equal.len" partitions the range of A into nbins equal-length intervals. Method "dhist" uses a combination of the two approaches; see Denby and Mallows, "Variations on the Histogram" (2009), for details. Note that the argument maxNperBin controls the maximum number of observations in each bin.
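A minimal base-R sketch of how the two main bin.method choices differ for a skewed continuous exposure; the cutoff logic below is illustrative of the description above, not the tmleCommunity internals:

```r
set.seed(1)
A <- rexp(1000)    # a right-skewed continuous exposure
nbins <- 5
# "equal.len": partition range(A) into nbins equal-width intervals
cut.len  <- seq(min(A), max(A), length.out = nbins + 1)
# "equal.mass": data-adaptive cutoffs so each bin holds roughly the
# same number of observations (empirical quantiles)
cut.mass <- quantile(A, probs = seq(0, 1, length.out = nbins + 1))
table(cut(A, cut.mass, include.lowest = TRUE))  # ~200 obs per bin
table(cut(A, cut.len,  include.lowest = TRUE))  # counts vary with skewness
```

With skewed data, equal-length bins leave the upper bins nearly empty, which is why "equal.mass" is the default.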

nbins

When bin.method = "equal.len", the user-supplied number of bins to use when discretizing a continuous variable. If not specified, defaults to 5. If set to NA, it is set to the nearest integer of nobs/maxNperBin, where nobs is the total number of observations in the input data. When bin.method = "equal.mass", nbins is set to the maximum of the default nbins and the nearest integer of nobs/maxNperBin.

maxncats

Integer specifying the maximum number of unique categories a categorical variable A[j] can have. If A[j] has more unique categories than maxncats, it is automatically treated as a continuous variable. Defaults to 10.

maxNperBin

Integer specifying the maximum number of observations in each bin when discretizing a continuous variable A[j] (applies directly when bin.method = "equal.mass", and indirectly when bin.method = "equal.len" with nbins = NA). Defaults to 500.

parfit

Logical. If TRUE, perform parallel regression fits and predictions for discretized continuous variables using foreach and %dopar% from the foreach package. Defaults to FALSE. Note that this requires registering a parallel backend before running the tmleCommunity function, e.g., loading the doParallel package and calling registerDoParallel(cores = ncores) for ncores parallel jobs.
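The backend registration the paragraph above describes can be sketched as follows (the worker count of 2 is an arbitrary example):

```r
library(doParallel)

ncores <- 2                     # e.g. parallel::detectCores() - 1 in practice
registerDoParallel(cores = ncores)  # register backend BEFORE calling tmleCommunity
tmleCom_Options(parfit = TRUE)      # enable parallel bin-by-bin fits
# ... run tmleCommunity(...) as usual; foreach/%dopar% is used internally ...
stopImplicitCluster()               # release the workers when done
```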

poolContinVar

Logical. If TRUE, when fitting a model for a binarized continuous variable, pool bin indicators across all bins and fit one pooled regression. Defaults to FALSE.

savetime.fit.hbars

Logical. If TRUE, skip estimation and prediction of the exposure mechanism P(A|W,E) under g0 and gstar when f.gstar1 = NULL and TMLE.targetStep = "tmle.intercept", and instead directly set h_gstar_h_gN = 1 for each observation. Defaults to TRUE.

h2ometalearner

A string to pass to h2o.ensemble, specifying the prediction algorithm used to learn the optimal combination of the base learners. Supports both h2o and SuperLearner wrapper functions. Defaults to "h2o.glm.wrapper".

h2olearner

A string or character vector to pass to h2o.ensemble, naming the prediction algorithm(s) used to train the base models for the ensemble. The functions must have the same format as the h2o wrapper functions. Defaults to "h2o.glm.wrapper".

sl3_metalearner

An sl3 learner object specifying the metalearner used by "sl3_pipelines" to combine the base learners. Defaults to a logistic-regression metalearner, sl3::make_learner(sl3::Lrnr_optim, loss_function = sl3::loss_loglik_binomial, learner_function = sl3::metalearner_logistic_binomial).

sl3_learners

A named list of sl3 learner objects to use as base learners with "sl3_pipelines". Defaults to list(glm_fast = sl3::make_learner(sl3::Lrnr_glm_fast)).
CVfolds

The number of splits for the V-fold cross-validation step to pass to SuperLearner and h2o.ensemble. Defaults to 5.

SL.library

A string or character vector of prediction algorithms to pass to SuperLearner. Defaults to c("SL.glm", "SL.step", "SL.glm.interaction"). For more available algorithms see SuperLearner::listWrappers(). Additional wrapper functions are available at https://github.com/ecpolley/SuperLearnerExtra.

Value

Invisibly returns a list with old option settings.
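Since the old settings are returned invisibly, the usual R options idiom applies; a sketch, assuming the returned list's names match the argument names:

```r
# Change options while keeping the previous settings
old <- tmleCom_Options(bin.method = "equal.len", nbins = 10, maxNperBin = 1000)
# ... run the estimation with the new settings ...
do.call(tmleCom_Options, old)   # restore the previous settings
```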

See Also

print_tmleCom_opts

Examples

## Not run: 
#***************************************************************************************
# Example 1: using different estimators in estimation of Q and g mechanisms
#***************************************************************************************
# 1.1 using speedglm (and glm)
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "speedglm__glm")
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "glm__glm")

# 1.2 using SuperLearner
library(SuperLearner)
# library including "SL.glm", "SL.glmnet", "SL.ridge", and "SL.stepAIC"
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", CVfolds = 5,
                SL.library = c("SL.glm", "SL.glmnet", "SL.ridge", "SL.stepAIC"))

# library including "SL.bayesglm", "SL.gam", and "SL.randomForest", and split to 10 CV folds
# require("gam"); require("randomForest")
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", CVfolds = 10,
                SL.library = c("SL.bayesglm", "SL.gam", "SL.randomForest"))

# Create glmnet wrappers with different alphas (the default value of alpha in SL.glmnet is 1)
create.SL.glmnet <- function(alpha = c(0.25, 0.50, 0.75)) {
  for(mm in seq(length(alpha))){
    eval(parse(text = paste('SL.glmnet.', alpha[mm], '<- function(..., alpha = ', 
                            alpha[mm], ') SL.glmnet(..., alpha = alpha)', sep = '')), 
         envir = .GlobalEnv)
  }
  invisible(TRUE)
}
create.SL.glmnet(seq(0, 1, length.out=3))  # 3 glmnet wrappers with alpha = 0, 0.5, 1
# Create custom randomForest learners (set ntree to 100 rather than the default of 500) 
create.SL.rf <- create.Learner("SL.randomForest", list(ntree = 100))
# Create a sequence of 2 customized KNN learners 
# set the number of nearest neighbors to 8 and 12 rather than the default of 10
create.SL.Knn <- create.Learner("SL.kernelKnn", detailed_names=TRUE, tune=list(k=c(8, 12)))
SL.library <- c(grep("^SL\\.glmnet\\.", as.vector(lsf.str()), value=TRUE), 
                create.SL.rf$names, create.SL.Knn$names)
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", 
                SL.library = SL.library, CVfolds = 5)            

# 1.3 using h2o.ensemble
library("h2o"); library("h2oEnsemble")
# h2olearner including "h2o.glm.wrapper" and "h2o.randomForest.wrapper"
tmleCom_Options(Qestimator = "h2o__ensemble", gestimator = "h2o__ensemble", 
                CVfolds = 10, h2ometalearner = "h2o.glm.wrapper", 
                h2olearner = c("h2o.glm.wrapper", "h2o.randomForest.wrapper"))

# Create a sequence of customized h2o glm, randomForest and deeplearning wrappers 
h2o.glm.1 <- function(..., alpha = 1, prior = NULL) { 
  h2o.glm.wrapper(..., alpha = alpha, prior = prior) 
}
h2o.glm.0.5 <- function(..., alpha = 0.5, prior = NULL) { 
  h2o.glm.wrapper(..., alpha = alpha, prior = prior) 
}
h2o.randomForest.1 <- function(..., ntrees = 200, nbins = 50, seed = 1) {
  h2o.randomForest.wrapper(..., ntrees = ntrees, nbins = nbins, seed = seed)
}
h2o.deeplearning.1 <- function(..., hidden = c(500, 500), activation = "Rectifier", seed = 1) {
  h2o.deeplearning.wrapper(..., hidden = hidden, activation = activation, seed = seed)
}
h2olearner <- c("h2o.glm.1", "h2o.glm.0.5", "h2o.randomForest.1", 
                "h2o.deeplearning.1", "h2o.gbm.wrapper")
# using "h2o.deeplearning.wrapper" for h2ometalearner
tmleCom_Options(Qestimator = "h2o__ensemble", gestimator = "h2o__ensemble",
                CVfolds = 5, h2ometalearner = "h2o.deeplearning.wrapper",
                h2olearner = h2olearner)

# 1.4 using sl3
library(sl3)
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)

sl3_learners <- list(
  rf = make_learner(Lrnr_randomForest),
  xgb = make_learner(Lrnr_xgboost),
  svm = make_learner(Lrnr_svm),
  glmnet = make_learner(Lrnr_glmnet),
  glm_fast = make_learner(Lrnr_glm_fast),
  screened_glm = screen_and_glm,
  mean = make_learner(Lrnr_mean)
)

logit_metalearner <- make_learner(
  Lrnr_optim,
  loss_function = loss_loglik_binomial,
  learner_function = metalearner_logistic_binomial
)

tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "sl3_pipelines", 
                maxNperBin = 500, nbins = 5, bin.method = "equal.mass",
                sl3_learners = sl3_learners, sl3_metalearner = logit_metalearner)
  
#***************************************************************************************
# Example 2: Define the values of bin cutoffs for continuous outcome in different ways
# through three arguments - bin.method, nbins, maxNperBin 
#***************************************************************************************
# 2.1 using equal-length method
# discretize a continuous outcome variable into 10 bins, no more than 1000 obs in each bin 
tmleCom_Options(bin.method = "equal.len", nbins = 10, maxNperBin = 1000)

# 2.2 find a compromise between equal-mass and equal-length method
# discretize into 10 bins, and no more than 5000 obs in each bin
tmleCom_Options(bin.method = "dhist", nbins = 10, maxNperBin = 5000)

# 2.3 Default to use equal-mass method with 5 bins, no more than 500 obs in each bin
tmleCom_Options()

## End(Not run)

chizhangucb/tmleCommunity documentation built on May 20, 2019, 3:34 p.m.