knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "../man/figures/README-" ) library(dplyr) load("../data/star.rda") # specifying the outcome outcomes <- "g3tlangss" # specifying the treatment treatment <- "treatment" # specifying the data (remove other outcomes) star_data <- star %>% dplyr::select(-c(g3treadss,g3tmathss)) # specifying the formula user_formula <- as.formula( "g3tlangss ~ treatment + gender + race + birthmonth + birthyear + SCHLURBN + GRDRANGE + GKENRMNT + GKFRLNCH + GKBUSED + GKWHITE ")
We can train the model with the caret
package (for further information about caret
, see the original website). We use parallel computing to speed up the computation.
# parallel computing library(doParallel) cl <- makePSOCKcluster(2) registerDoParallel(cl) # stop after finishing the computation stopCluster(cl)
The following example shows how to estimate the ITR with gradient boosting machine (GBM) using the caret
package. Note that we have already loaded the data and specify the treatment, outcome, and covariates as shown in the Sample Splitting vignette. Since we are using the caret
package, we need to specify the trainControl
and/or tuneGrid
arguments. The trainControl
argument specifies the cross-validation method and the tuneGrid
argument specifies the tuning grid. For more information about these arguments, please refer to the caret website.
We estimate the ITR with only one machine learning algorithm (GBM) and evaluate the ITR with the evaluate_itr()
function. To compute PAPDp
, we need to specify the algorithms
argument with more than 2 machine learning algorithms.
library(evalITR) library(caret) # specify the trainControl method fitControl <- caret::trainControl( method = "repeatedcv", # 3-fold CV number = 3, # repeated 3 times repeats = 3, search='grid', allowParallel = TRUE) # grid search # specify the tuning grid gbmGrid <- expand.grid( interaction.depth = c(5,9), n.trees = (5:10)*100, shrinkage = 0.1, n.minobsinnode = 20) # estimate ITR fit_caret <- estimate_itr( treatment = "treatment", form = user_formula, trControl = fitControl, data = star_data, algorithms = c("gbm"), budget = 0.2, split_ratio = 0.7, tuneGrid = gbmGrid, verbose = FALSE) # evaluate ITR est_caret <- evaluate_itr(fit_caret)
We can extract the training model from caret
and check the model performance. Other functions from caret
can be applied to the training model.
# extract the final model caret_model <- fit_caret$estimates$models$gbm print(caret_model$finalModel) # check model performance trellis.par.set(caretTheme()) # theme plot(caret_model) # heatmap plot( caret_model, plotType = "level", scales = list(x = list(rot = 90)))
Alternatively, we can train the model with the SuperLearner
package (for further information about SuperLearner
, see the original website). SuperLearner utilizes ensemble method by taking optimal weighted average of multiple machine learning algorithms to improve model performance.
We will compare the performance of the ITR estimated with causal_forest
and SuperLearner
.
library(SuperLearner) fit_sl <- estimate_itr( treatment = "treatment", form = user_formula, data = star_data, algorithms = c("causal_forest","SuperLearner"), budget = 0.2, split_ratio = 0.7, SL_library = c("SL.ranger", "SL.glmnet")) est_sl <- evaluate_itr(fit_sl) # summarize estimates summary(est_sl)
We plot the estimated Area Under the Prescriptive Effect Curve for the writing score across a range of budget constraints, seperately for the two ITRs, estimated with causal_forest
and SuperLearner
.
# plot the AUPEC plot(est_sl)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.