EnsemForest | R Documentation |
Build an ensemble regression forest and/or predict with one. Two implementations are provided for fitting the forest: one treats each site as a distinct factor level (implemented with ranger); the other uses mean encoding as a surrogate for the site index (implemented with grf).
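Mean encoding replaces a categorical site label with a single numeric summary of that site's outcome, so a forest can split on it like any other covariate. The base-R sketch below illustrates the idea, assuming the encoding is simply the per-site mean of an outcome column. The helper `mean_encode_sites()`, the column names `site` and `tau`, and the toy data are hypothetical. This is not the package's internal code, and EnsemForest may compute its `site_enc_tab` differently.

```r
# Hypothetical helper (illustration only): mean-encode a site column.
# Returns the data with a new numeric column `site_enc` plus a per-site table
# of mean outcomes (analogous in spirit to `site_enc_tab`).
mean_encode_sites <- function(df, site, y) {
  # mean outcome for each site; aggregate() sorts the groups
  enc_tab <- aggregate(df[[y]], by = list(df[[site]]), FUN = mean)
  names(enc_tab) <- c(site, "site_enc")
  # attach each row's site mean by matching site labels
  df$site_enc <- enc_tab$site_enc[match(df[[site]], enc_tab[[site]])]
  list(data = df, site_enc_tab = enc_tab)
}

# Toy data: three sites with a made-up outcome column `tau`
toy <- data.frame(site = c(1, 1, 2, 2, 3),
                  tau  = c(0.2, 0.4, 1.0, 1.2, -0.5))
res <- mean_encode_sites(toy, "site", "tau")
print(res$site_enc_tab)
```

Here sites 1, 2 and 3 are encoded as 0.3, 1.1 and -0.5. At prediction time, the same table must be reused to encode new rows, which is why `site_enc_tab` is required when both `is_pred` and `is_encode` are TRUE.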
EnsemForest(
coord_id,
aug_df,
site,
covars,
honest = FALSE,
is_pred = FALSE,
is_encode = FALSE,
myfit = NULL,
est_leaves = NULL,
honest_y = NULL,
site_enc_tab = NULL,
...
)
coord_id |
Index of the coordinating site. |
aug_df |
The augmented data (a data.table) used to fit the ensemble forest; when is_pred = TRUE, the data to predict on. |
site |
Name of the site indicator variable. |
covars |
A character vector of covariate names. |
honest |
Whether to use honest splitting (i.e., subsample splitting). Default is FALSE. |
is_pred |
Whether to build an ensemble forest (FALSE) or make predictions with a fitted forest supplied via "myfit" (TRUE). Default is FALSE. |
is_encode |
Whether to treat each site as a distinct factor level (FALSE, ranger implementation) or to use mean encoding as a surrogate for the site index (TRUE, grf implementation). Useful when the number of underlying groups is known. Default is FALSE. |
myfit |
A fitted ensemble forest (used for prediction). Default is NULL. |
est_leaves |
A matrix with one row per observation and one column per tree, giving the terminal node to which each observation in the honest sample (estimation set) is assigned in each tree; used for honest prediction. Default is NULL. If "honest" is FALSE, "est_leaves" is NULL; if both "is_pred" and "honest" are TRUE, "est_leaves" should not be NULL. |
honest_y |
A vector of honest estimates for the honest sample (estimation set); used for honest prediction. Default is NULL. If "honest" is FALSE, "honest_y" is NULL; if both "is_pred" and "honest" are TRUE, "honest_y" should not be NULL. |
site_enc_tab |
A data.table of the mean outcome for each site. Default is NULL. If both "is_pred" and "is_encode" are TRUE, "site_enc_tab" should not be NULL. |
... |
Additional arguments for building the forest. |
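The roles of "est_leaves" and "honest_y" can be illustrated with the standard honest-forest averaging rule, assumed here for illustration: each new point is routed to a leaf in every tree, its per-tree prediction is the mean of "honest_y" over estimation-sample observations in that same leaf, and the forest prediction averages these per-tree values. The helper `honest_predict()` and its argument `test_leaves` are hypothetical; with ranger, leaf IDs for new data can be obtained with `predict(fit, data, type = "terminalNodes")`. EnsemForest may aggregate differently (for example, by weighting trees or leaves), so treat this as a sketch only.

```r
# Hypothetical helper (illustration only): honest prediction from leaf IDs.
# test_leaves: n_test x n_trees matrix of terminal-node IDs for new points
# est_leaves:  n_est  x n_trees matrix of terminal-node IDs for the estimation set
# honest_y:    length-n_est vector of honest estimates for the estimation set
honest_predict <- function(test_leaves, est_leaves, honest_y) {
  n_test <- nrow(test_leaves)
  n_trees <- ncol(test_leaves)
  per_tree <- matrix(NA_real_, nrow = n_test, ncol = n_trees)
  for (b in seq_len(n_trees)) {
    # mean of honest_y within each leaf of tree b (named by leaf ID)
    leaf_means <- tapply(honest_y, est_leaves[, b], mean)
    # look up each new point's leaf; leaves with no estimation points give NA
    per_tree[, b] <- leaf_means[as.character(test_leaves[, b])]
  }
  # average over trees, skipping trees whose leaf had no estimation points
  rowMeans(per_tree, na.rm = TRUE)
}

# Toy example: 4 estimation points, 2 trees, 2 new points
est_leaves  <- cbind(c(1, 1, 2, 2), c(1, 2, 2, 2))
honest_y    <- c(1, 3, 5, 7)
test_leaves <- cbind(c(1, 2), c(1, 2))
pred <- honest_predict(test_leaves, est_leaves, honest_y)
print(pred)
```

In the toy example, the first new point gets (2 + 1) / 2 = 1.5 and the second gets (6 + 5) / 2 = 5.5. Separating the sample that chooses the splits from the sample that fills the leaves is what makes the estimates "honest".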
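Training: returns a fitted ensemble forest and out-of-bag (OOB) predictions for the input data; the examples access the components myfit, est_leaves, honest_y and site_enc_tab of this result. Prediction: returns predictions for the input data.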
data(SimDataLst)
K <- length(SimDataLst)
covars <- grep("^X", names(SimDataLst[[1]]), value=TRUE)
fit_lst <- list()
for (k in seq_len(K)) {
  tmpdf <- SimDataLst[[k]]
  # fit a site-specific model; use your estimator of interest
  fit_lst[[k]] <- grf::causal_forest(X = as.matrix(tmpdf[, covars, with = FALSE]),
                                     Y = tmpdf$Y, W = tmpdf$Z)
}
coord_id <- 1                          # index of the coordinating site
coord_test <- GenSimData(coord_id)     # new data to predict on
coord_df <- SimDataLst[[coord_id]]     # training data of the coordinating site
aug_df <- GenAugData(coord_id, coord_df, fit_lst, covars)
## Treat each site as a distinct factor
res_ef <- EnsemForest(coord_id, aug_df, "site", covars)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, is_pred = TRUE,
                      myfit = res_ef$myfit, est_leaves = res_ef$est_leaves,
                      honest_y = res_ef$honest_y)
# with honest splitting
res_ef <- EnsemForest(coord_id, aug_df, "site", covars, honest = TRUE)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, honest = TRUE, is_pred = TRUE,
                      myfit = res_ef$myfit, est_leaves = res_ef$est_leaves,
                      honest_y = res_ef$honest_y)
## Mean encoding as surrogate for site index
res_ef <- EnsemForest(coord_id, aug_df, "site", covars, is_encode=TRUE)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, is_pred = TRUE, is_encode = TRUE,
                      myfit = res_ef$myfit, site_enc_tab = res_ef$site_enc_tab)
# with honest splitting
res_ef <- EnsemForest(coord_id, aug_df, "site", covars, honest = TRUE, is_encode = TRUE)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, honest = TRUE,
                      is_pred = TRUE, is_encode = TRUE,
                      myfit = res_ef$myfit, site_enc_tab = res_ef$site_enc_tab)