Cross-validation with single algorithm"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "../man/figures/README-"
  )

library(dplyr)

load("../data/star.rda")

# specifying the outcome
outcomes <- "g3tlangss"

# specifying the treatment
treatment <- "treatment"

# specifying the data (remove other outcomes)
star_data <- star %>% dplyr::select(-c(g3treadss,g3tmathss))

# specifying the formula
user_formula <- as.formula(
  "g3tlangss ~ treatment + gender + race + birthmonth + 
  birthyear + SCHLURBN + GRDRANGE + GKENRMNT + GKFRLNCH + 
  GKBUSED + GKWHITE ")

When users choose to estimate and evaluate ITR under cross-validation, the package implements Algorithm 1 from Imai and Li (2023) to estimate and evaluate ITR. For more information about Algorithm 1, please refer to the this page.

Instead of specifying the split_ratio argument, we choose the number of folds (n_folds). We present an example of estimating ITR with 3 folds cross-validation. In practice, we recommend using 10 folds to get a more stable model performance.

| Input | R package input | Descriptions | |:-----------------------------------------------------------------|:---------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Data $\mathbf{Z}=\left{\mathbf{X}i, T_i, Y_i\right}{i=1}^n$ | treatment = treatment, form = user_formula, data = star_data | treatment is a character string specifying the treatment variable in the data; form is a formula specifying the outcome and covariates; and a dataframe data | | Machine learning algorithm $F$ | algorithms = c("causal_forest") | a character vector specifying the ML algorithms to be used | | Evaluation metric $\tau_f$ | PAPE, PAPD, AUPEC, GATE | By default | | Number of folds $K$ | n_folds = 3 | n_folds is a numeric value indicating the number of folds used for cross-validation | | … | budget = 0.2 | budget is a numeric value specifying the maximum percentage of population that can be treated under the budget constraint |

library(evalITR)
# estimate ITR 
set.seed(2021)
fit_cv <- estimate_itr(
               treatment = treatment,
               form = user_formula,
               data = star_data,
               algorithms = c("causal_forest"),
               budget = 0.2,
               n_folds = 3)

The output will be an object that includes estimated evaluation metric $\hat{\tau}_F$ and the estimated variance of $\hat{\tau}_F$ for different metrics (PAPE, PAPD, AUPEC).

# evaluate ITR 
est_cv <- evaluate_itr(fit_cv)

# summarize estimates
summary(est_cv)


Try the evalITR package in your browser

Any scripts or data that you put into this service are public.

evalITR documentation built on Aug. 26, 2023, 1:08 a.m.