f_holdoutRF: Efficient Variable Importance Using Holdout Forests

View source: R/MSE_Test_File.R

Description

Computes variable importance for all variables using a single forest. A traditional random forest restricts the features available for splitting at each node of a tree. A holdout forest instead restricts the features available to an entire tree. For each variable, this naturally sorts the trees into those built with the variable available and those built without it. Each of these two collections forms a random forest, and the pair is then run through the MSE Test procedure.
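The holdout idea described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: it uses `rpart` trees, makes each feature available to a tree with probability 0.5, subsamples `n^0.8` rows per tree, and reports a plain difference in test MSE rather than the MSE Test procedure. All names (`available`, `preds`, `mse_gap`) and the simulated data are hypothetical.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X$x1 + rnorm(n, sd = 0.5)   # simulated response: only x1 carries signal

# Hold out 50 rows as a test set
test_idx <- sample(n, 50)
train <- cbind(X[-test_idx, ], y = y[-test_idx])
X_test <- X[test_idx, ]
y_test <- y[test_idx]

n_trees <- 200
vars <- names(X)
# available[b, j] is TRUE when variable j may be used anywhere in tree b
available <- matrix(FALSE, n_trees, length(vars), dimnames = list(NULL, vars))
preds <- matrix(NA_real_, length(test_idx), n_trees)

for (b in seq_len(n_trees)) {
  # Restrict features for the whole tree (assumed inclusion probability 0.5)
  keep <- runif(length(vars)) < 0.5
  if (!any(keep)) keep[sample(length(vars), 1)] <- TRUE  # need at least one feature
  available[b, ] <- keep
  # Subsample rows without replacement (assumed size n^0.8)
  rows <- sample(nrow(train), size = floor(nrow(train)^0.8))
  fit <- rpart(reformulate(vars[keep], response = "y"), data = train[rows, ])
  preds[, b] <- predict(fit, newdata = X_test)
}

# For each variable, compare the forest of trees without it to the forest with it
mse_gap <- sapply(vars, function(v) {
  with_v <- rowMeans(preds[, available[, v], drop = FALSE])
  without_v <- rowMeans(preds[, !available[, v], drop = FALSE])
  mean((y_test - without_v)^2) - mean((y_test - with_v)^2)
})
```

A clearly positive entry of `mse_gap` suggests that removing the variable hurts test error. The actual function feeds the two forests into the MSE Test procedure instead of comparing raw MSE values.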

Usage

f_holdoutRF(X, y, X.test = FALSE, y.test = FALSE, B = 1000,
  NTest = nrow(X.test), mintree = 30, max.trees = 5 * ncol(X) * mintree,
  verbose = F, mtry = ncol(X)/3, p = 0.5, keep_forest = F, ...)
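As a small worked example of the defaults in the signature above, the snippet below evaluates the default expressions for `max.trees` and `mtry` on a hypothetical 10-column data frame. The data are made up for illustration, and how the function treats a non-integer `mtry` is not stated on this page.

```r
# Hypothetical training covariates with 10 columns
X <- as.data.frame(matrix(rnorm(100 * 10), ncol = 10))

mintree <- 30                        # default from the usage signature
max_trees <- 5 * ncol(X) * mintree   # default expression for max.trees
mtry <- ncol(X) / 3                  # default expression for mtry (not an integer here)

print(c(max_trees = max_trees, mtry = mtry))
```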

Arguments

X

Data frame of covariates - the training data.

y

Response vector. Currently only numeric responses (regression) are supported.

X.test

Covariates of the test set with which the MSE is calculated.

y.test

Responses in the test set with which the MSE is calculated.

base.learner

One of "rpart", "ctree", "rtree", or "lm". Base model to be used in the bagging.

single_forest

Logical. If TRUE, then all variables are compared against a single original forest.

NTest

If X.test, y.test are not specified, this number of test points are drawn at random from X, y.

Nbtree

How many trees should be used for each variable. Can either be a single number, or a vector of length ncol(X), with each entry corresponding to the number of trees used for testing each column.

verbose

Logical. Should a progress tracker be output in the console?

keep_forest

Logical. Should the original random forest be returned?

mtry

The mtry parameter used in random forest models.

p

Subsample size exponent, see MSE_Test.

...

Additional arguments to be passed to MSE_Test.

mintree
max.trees

tim-coleman/RFtest documentation built on March 10, 2020, 12:28 p.m.