CVtreeMLE: Fit ensemble decision trees to a vector of exposures and use targeted maximum likelihood estimation to determine the average treatment effect in each leaf of the best-fitting tree

View source: R/CVtreeMLE.R

CVtreeMLE    R Documentation

Fit ensemble decision trees to a vector of exposures and use targeted maximum likelihood estimation to determine the average treatment effect in each leaf of the best-fitting tree

Description

Fit ensemble decision trees to a mixed exposure while controlling for covariates using iterative backfitting of two Super Learners. If partitioning nodes are identified, these partitions are used as a rule-based exposure. The CV-TMLE framework is used to create training and estimation samples: trees are fit on the training folds, and the average treatment effect (ATE) of the rule-based exposure is estimated in the validation folds. Any type of mixed exposure (continuous, binary, multinomial) is accepted. The ATE for multiple mixture components (interactions) is given, as well as marginal effects if data-adaptively identified.

Usage

CVtreeMLE(
  w,
  a,
  y,
  data,
  w_stack = NULL,
  aw_stack = NULL,
  a_stack = NULL,
  n_folds,
  seed = 6442,
  family,
  parallel = TRUE,
  parallel_cv = TRUE,
  parallel_type = "multi_session",
  num_cores = 2,
  h_aw_trunc_lvl = 50,
  pooled_rule_type = "average",
  min_max = "min",
  region = NULL,
  min_obs = 25
)

Arguments

w

A character vector indicating which variables in the data to use as baseline covariates.

a

A character vector indicating which variables in the data to use as exposures.

y

A character indicating which variable in the data to use as the outcome.

data

Data frame of (W,A,Y) variables of interest.

w_stack

Stack of estimators used in the Super Learner during the iterative backfitting for Y|W; this should be an sl3 stack. If not provided, utils_create_sls is used to create default estimators used in the ensemble.

aw_stack

Stack of estimators used in the Super Learner for the Q and g mechanisms. If not provided, utils_create_sls is used to create default estimators used in the ensemble.

a_stack

Stack of estimators used in the Super Learner during the iterative backfitting for Y|A; this should be an sl3 stack. If not provided, utils_create_sls is used to create default decision tree estimators used in the ensemble.

n_folds

Number of cross-validation folds.

seed

Seed number for reproducibility of results; defaults to 6442.

family

Family ('binomial' or 'continuous').

parallel

Use parallel processing if a backend is registered; enabled by default.

parallel_cv

If TRUE, parallelize the CV procedure rather than the Super Learner model fitting.

parallel_type

If parallel is TRUE, the type of parallelization to use: "multi_session" (the default) or "multicore".

num_cores

If using parallel, the number of cores to parallelize over.

h_aw_trunc_lvl

Level at which to truncate the clever covariate in order to control variance; the default is 50.

pooled_rule_type

Either "average" or "union": how to construct the pooled rule across folds. The average rule takes the average of the fold-specific cutpoints and returns an average rule with lower and upper bounds for each cutpoint. The union rule creates a new rule covering the space that contains all the rules found across the folds and is therefore more conservative.
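As a hypothetical illustration of the two pooling strategies described above (the cutpoint values and the (lower, upper) pair representation are invented for this sketch, not the package's internal format):

```python
# Fold-specific rules for one exposure, each as an invented (lower, upper) cutpoint pair.
fold_rules = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.85)]

# "average": average the fold-specific cutpoints
avg_lower = sum(lo for lo, _ in fold_rules) / len(fold_rules)
avg_upper = sum(hi for _, hi in fold_rules) / len(fold_rules)
average_rule = (round(avg_lower, 6), round(avg_upper, 6))

# "union": the region containing every fold-specific rule (more conservative)
union_rule = (min(lo for lo, _ in fold_rules), max(hi for _, hi in fold_rules))

print(average_rule)  # (0.15, 0.85)
print(union_rule)    # (0.1, 0.9)
```

The union rule is wider than any single fold's rule, which is why the documentation describes it as the more conservative choice.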

min_max

Which oracle region to target: the one that minimizes ("min") or maximizes ("max") the outcome.

region

If a predetermined region is of interest, specify it here, e.g. "A < 0.02".

min_obs

Minimum number of observations to have in a region.
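A hypothetical sketch of how the region and min_obs arguments interact: a pre-specified rule such as "A1 < 0.02" defines a region, which is only usable if it contains at least min_obs observations (the values below are invented for illustration):

```python
# Invented exposure values for illustration
a1 = [0.01, 0.05, 0.015, 0.3, 0.002, 0.018]
min_obs = 3

# Apply the rule "A1 < 0.02" as a membership indicator
in_region = [v < 0.02 for v in a1]
n_in_region = sum(in_region)

# The region is only acceptable if it contains enough observations
region_ok = n_in_region >= min_obs
print(n_in_region, region_ok)  # 4 True
```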

Details

The function performs the following steps:

  1. Imputes missing values with the mean and creates dummy indicator variables for imputed variables.

  2. Separates covariates into factors and continuous (ordered) variables.

  3. Creates a variable indicating the fold number assigned to each observation.

  4. Fits the iterative backfitting algorithm to the mixed exposure: ensemble decision trees are applied to the mixed exposure and an unrestricted Super Learner to the covariates. The two models are refit, each offset by its complement, until there is virtually no difference between the model fits. Partition nodes found for the mixture are extracted. This is done on each training fold.

  5. Fits the same iterative backfitting algorithm to each individual mixture component, applying ensemble decision trees to the component and an unrestricted Super Learner to the covariates, and extracts the partition nodes found for each component. This is done on each training fold.

  6. Estimates nuisance parameters (Q and g estimates) for the mixture interaction rules.

  7. Estimates nuisance parameters (Q and g estimates) for the marginal rules.

  8. Estimates the Q outcome mechanism over all the marginal rules, so that targeted ATEs can later be computed from user input for different marginal combinations based on data-adaptively identified thresholds.

  9. Uses the mixture rules and data in a TMLE fluctuation step to target the ATE for each rule across all the folds, and calculates the proportion of folds in which the rule is found.

  10. Uses the marginal rules and data in a TMLE fluctuation step to target the ATE for each rule across all the folds, and calculates the proportion of folds in which the rule is found.

  11. Calculates V-fold-specific TMLE estimates of the rules.

  12. For the mixture rules, calculates a union rule: the rule that covers all the observations captured across the folds by the fold-specific rules for the respective variable set.

  13. For the marginal rules, calculates a union rule: the rule that covers all the observations captured across the folds by the fold-specific rules for the respective variable set.
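The backfitting idea in steps 4-5 can be sketched as follows. This is an illustrative toy, not the package's implementation: two simple linear fits stand in for the decision-tree and Super Learner ensembles, and each is repeatedly refit to the outcome offset by its complement until the fits stabilize.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)                            # stand-in exposure
w = rng.normal(size=n)                            # stand-in covariate
y = 2.0 * a - 1.5 * w + rng.normal(scale=0.1, size=n)

f_a = np.zeros(n)   # component fit on the exposure (trees in the real algorithm)
g_w = np.zeros(n)   # component fit on the covariates (Super Learner in the real algorithm)
for _ in range(20):
    # Refit each component to the outcome offset by its complement
    coef_a = np.polyfit(a, y - g_w, 1)
    f_a = np.polyval(coef_a, a)
    coef_w = np.polyfit(w, y - f_a, 1)
    g_w = np.polyval(coef_w, w)

# After convergence the residual is close to the simulated noise level
residual_sd = float(np.std(y - (f_a + g_w)))
```

Once the exposure-side fit has stabilized, its structure (in CVtreeMLE, the tree's partition nodes) is extracted as the candidate rule.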

Value

Object of class CVtreeMLE, containing a list of table results for: marginal ATEs, mixture ATEs, RMSE of marginal model fits, RMSE of mixture model fits, marginal rules, and mixture rules.

  • Model RMSEs: Root mean square error for marginal and interaction models in the iterative backfitting procedure

  • Pooled TMLE Marginal Results: Data frame of pooled ATE results, estimated with TMLE, for the thresholds identified for each mixture component found.

  • V-Specific Marg Results: A list of the v-fold marginal results. These are grouped by variable and direction of the ATE.

  • Pooled TMLE Mixture Results: Data frame of pooled TMLE Mixture Results

  • V-Specific Mix Results: A list of the v-fold mixture results. These are grouped by variable and direction of the ATE.

  • Pooled Marginal Refs: A data frame of the reference categories determined in each of the marginal results.

  • Marginal Rules: A data frame of the marginal rules, with details on the folds in which each rule was found and its RMSE.

  • Mixture Rules: A data frame of the mixture rules, with details on the folds in which each rule was found and its RMSE.

Authors

David McCoy, University of California, Berkeley

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), 289-300.

Gruber, S., & van der Laan, M. J. (2012). tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(i13).

Hubbard, A. E., Kherad-Pajouh, S., & van der Laan, M. J. (2016). Statistical Inference for Data Adaptive Target Parameters. The international journal of biostatistics, 12(1), 3-19.

Hubbard, A., Munoz, I. D., Decker, A., Holcomb, J. B., Schreiber, M. A., Bulger, E. M., ... & Rahbar, M. H. (2013). Time-Dependent Prediction and Evaluation of Variable Importance Using SuperLearning in High Dimensional Clinical Data. The journal of trauma and acute care surgery, 75(1 0 1), S53.

Hubbard, A. E., & van der Laan, M. J. (2016). Mining with inference: data-adaptive target parameters (pp. 439-452). In P. Buhlmann et al. (Ed.), Handbook of Big Data. CRC Press, Taylor & Francis Group, LLC: Boca Raton, FL.

van der Laan, M. J. (2006). Statistical inference for variable importance. The International Journal of Biostatistics, 2(1).

van der Laan, M. J., & Pollard, K. S. (2003). A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 117(2), 275-303.

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.

Examples

# Simulate two exposures (A1, A2) and two covariates (W1, W2)
n <- 800
p <- 4
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- c("A1", "A2", "W1", "W2")

# Binary outcome depending non-linearly on A1, A2, and W2
y_prob <- plogis(3 * sin(x[, 1]) + sin(x[, 2]) + sin(x[, 4]))
Y <- rbinom(n = n, size = 1, prob = y_prob)
data <- as.data.frame(cbind(x, Y))

CVtreeMLE_fit <- CVtreeMLE(
  data = data,
  w = c("W1", "W2"),
  a = c("A1", "A2"),
  y = "Y",
  family = "binomial",
  parallel = FALSE,
  n_folds = 2
)


blind-contours/CVtreeMLE documentation built on June 22, 2024, 8:53 p.m.