EnsemForest: Ensemble Forest (EF)

View source: R/EnsemForest.R

Description

Build an ensemble regression forest and/or predict with one. Two implementations are provided for fitting the forest: one treats each site as a distinct factor (implemented with ranger); the other uses mean encoding as a surrogate for the site index (implemented with grf).
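To illustrate the difference between the two strategies, here is a minimal base-R sketch of mean encoding: each site label is replaced by the average outcome observed at that site, and the resulting lookup table is reused at prediction time, which is the role `site_enc_tab` plays. The helper `mean_encode_site` and the column names `Y` and `site_enc` are hypothetical; this is an illustration under those assumptions, not the package's own code (which stores the table as a data.table).

```r
# Hypothetical illustration of mean (target) encoding of a site indicator.
# Assumes a plain data.frame with a site column and an outcome column `Y`.
mean_encode_site <- function(df, site = "site", y = "Y", enc_tab = NULL) {
  if (is.null(enc_tab)) {
    # Training: compute the mean outcome for each site
    means <- tapply(df[[y]], df[[site]], mean)
    enc_tab <- data.frame(site = names(means), site_enc = as.numeric(means),
                          stringsAsFactors = FALSE)
  }
  # Look up each row's site mean; sites absent from enc_tab get NA
  idx <- match(as.character(df[[site]]), enc_tab$site)
  df$site_enc <- enc_tab$site_enc[idx]
  list(data = df, enc_tab = enc_tab)
}

toy <- data.frame(site = c(1, 1, 2, 2, 3), Y = c(1, 3, 10, 12, 5))

# Factor treatment: one level per site
site_factor <- factor(toy$site)

# Mean encoding: one numeric column summarising each site
enc <- mean_encode_site(toy)
enc$data$site_enc  # 2 2 11 11 5

# Prediction: reuse the training table on new rows (no outcome needed)
new_rows <- data.frame(site = c(3, 2))
mean_encode_site(new_rows, enc_tab = enc$enc_tab)$data$site_enc  # 5 11
```

The factor version lets the forest split on arbitrary groupings of sites, while the encoded version collapses sites onto a single ordered numeric axis, so sites with similar mean outcomes can be split together.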

Usage

EnsemForest(
  coord_id,
  aug_df,
  site,
  covars,
  honest = FALSE,
  is_pred = FALSE,
  is_encode = FALSE,
  myfit = NULL,
  est_leaves = NULL,
  honest_y = NULL,
  site_enc_tab = NULL,
  ...
)

Arguments

coord_id

Site index of the coordinating site.

aug_df

The augmented data (a data.table) used to fit the ensemble forest; when is_pred is TRUE, the data to predict on.

site

Name of the site indicator variable.

covars

A character vector of covariate names.

honest

Whether to use honest splitting (i.e., subsample splitting). Default is FALSE.

is_pred

Whether to make predictions with a fitted ensemble forest (TRUE) rather than build one (FALSE). Default is FALSE.

is_encode

Whether to use mean encoding as a surrogate for the site index (TRUE) instead of treating each site as a distinct factor (FALSE). Useful when the number of underlying groups is known. Default is FALSE.

myfit

A fitted ensemble forest (used for prediction). Default is NULL.

est_leaves

A matrix with one row per observation in the augmented data and one column per tree, giving the terminal node assigned in each tree for the honest sample (estimation set); used for honest prediction. Default is NULL. If "honest" is FALSE, "est_leaves" is NULL; if both "is_pred" and "honest" are TRUE, "est_leaves" must not be NULL.

honest_y

A vector of honest estimates for the honest sample (estimation set); used for honest prediction. Default is NULL. If "honest" is FALSE, "honest_y" is NULL; if both "is_pred" and "honest" are TRUE, "honest_y" must not be NULL.

site_enc_tab

A data.table giving the mean outcome for each site. Default is NULL. If both "is_pred" and "is_encode" are TRUE, "site_enc_tab" must not be NULL.

...

Additional arguments for building the forest.
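The "est_leaves" and "honest_y" arguments point to the usual honest-forest recipe: a tree's prediction for a new point is the mean honest estimate over estimation-sample observations sharing its terminal node, and the forest averages these over trees. The base-R sketch below assumes that recipe; `honest_predict` and `new_leaves` are hypothetical names, rows of the leaf matrix are taken to be estimation-sample observations, and EnsemForest may aggregate differently.

```r
# Hypothetical sketch of honest forest prediction from leaf assignments.
# est_leaves: n_est x B matrix, terminal node of each estimation-sample
#             observation in each of B trees (assumed layout)
# honest_y:   length-n_est vector of values for the estimation sample
# new_leaves: n_new x B matrix, terminal node of each new observation
honest_predict <- function(est_leaves, honest_y, new_leaves) {
  stopifnot(nrow(est_leaves) == length(honest_y),
            ncol(est_leaves) == ncol(new_leaves))
  B <- ncol(est_leaves)
  # Per tree: average honest_y within each leaf, then look up the new points
  per_tree <- sapply(seq_len(B), function(b) {
    leaf_means <- tapply(honest_y, est_leaves[, b], mean)
    unname(leaf_means[as.character(new_leaves[, b])])
  })
  per_tree <- matrix(per_tree, nrow = nrow(new_leaves))
  # Average over trees, skipping trees whose leaf has no estimation points
  rowMeans(per_tree, na.rm = TRUE)
}

# Toy example: 4 estimation points, 2 trees, 2 new points
est_leaves <- cbind(c(1, 1, 2, 2), c(1, 2, 2, 2))
honest_y   <- c(1, 3, 5, 7)
new_leaves <- cbind(c(1, 2), c(1, 2))
honest_predict(est_leaves, honest_y, new_leaves)  # 1.5 5.5
```

Because the leaf means come from a sample not used to choose the splits, the predictions avoid the optimism of reusing the training outcomes, which is the point of setting honest = TRUE.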

Value

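Training (is_pred = FALSE): a fitted ensemble forest and out-of-bag (OOB) predictions for the input data; the examples access its components myfit, est_leaves, honest_y and site_enc_tab. Prediction (is_pred = TRUE): predictions for the input data.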

Examples

library(ifedtree)
data(SimDataLst)
K <- length(SimDataLst)
covars <- grep("^X", names(SimDataLst[[1]]), value = TRUE)

# Fit a site-specific model at each site (use your estimator of interest)
fit_lst <- list()
for (k in seq_len(K)) {
    tmpdf <- SimDataLst[[k]]
    fit_lst[[k]] <- grf::causal_forest(X = as.matrix(tmpdf[, covars, with = FALSE]),
                                       Y = tmpdf$Y, W = tmpdf$Z)
}

# Coordinating site and a test set for it
coord_id <- 1
coord_test <- GenSimData(coord_id)

# Build the augmented data used to fit the ensemble forest
coord_df <- SimDataLst[[coord_id]]
aug_df <- GenAugData(coord_id, coord_df, fit_lst, covars)

## Treat each site as a distinct factor
res_ef <- EnsemForest(coord_id, aug_df, "site", covars)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, is_pred=TRUE,
            myfit=res_ef$myfit, est_leaves=res_ef$est_leaves, honest_y=res_ef$honest_y)

res_ef <- EnsemForest(coord_id, aug_df, "site", covars, honest=TRUE)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, honest=TRUE, is_pred=TRUE,
            myfit=res_ef$myfit, est_leaves=res_ef$est_leaves, honest_y=res_ef$honest_y)

## Mean encoding as surrogate for site index
res_ef <- EnsemForest(coord_id, aug_df, "site", covars, is_encode=TRUE)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, is_pred=TRUE, is_encode=TRUE,
            myfit=res_ef$myfit, site_enc_tab=res_ef$site_enc_tab)

res_ef <- EnsemForest(coord_id, aug_df, "site", covars, honest=TRUE, is_encode=TRUE)
ef_hat <- EnsemForest(coord_id, coord_test, "site", covars, honest=TRUE,
            is_pred=TRUE, is_encode=TRUE,
            myfit=res_ef$myfit, site_enc_tab=res_ef$site_enc_tab)



ellenxtan/ifedtree documentation built on March 28, 2023, 9:09 a.m.