grow.forest: Growing random decision forest classifier


View source: R/grow.forest.r

Description

Grows a random decision forest classifier for a two-class response.

Usage

grow.forest(formula, data, subset, na.action,
    impurity.function = "gini", 
    model = FALSE, x = FALSE, y = FALSE,
    min_node_obs, max_depth, 
    numsamps, numvars, numboots)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which grow.forest is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action.

impurity.function

the impurity function used to fit the decision trees. Currently only "gini" is supported.

model, x, y

logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response) are returned.

min_node_obs

the minimum number of observations required for a node to be split. If not provided as input, the package will attempt to choose a reasonable value.

max_depth

the maximum depth to which a tree is grown (the root node is at depth 0). If not provided as input, the package will attempt to choose a reasonable value.

numsamps

number of observations to draw with replacement for each tree in the forest (the bootstrapped sample). This number may be increased to keep the sample balanced between classes; see Details. If not provided as input, the package will attempt to choose a reasonable value.

numvars

number of variables to be randomly selected without replacement for each tree in the forest. If not provided as input, the package will attempt to choose a reasonable value.

numboots

number of trees in the forest. If not provided as input, the package will attempt to choose a reasonable value.
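To make the "gini" option of impurity.function concrete, the following is a minimal sketch of the standard Gini impurity and of the weighted impurity of a binary split. It is an illustration only, not the package's internal implementation; the helper names gini_impurity and split_impurity are hypothetical.

```r
# Gini impurity of a vector of class labels: 1 - sum over classes of p_k^2.
# (Hypothetical helper, not part of ParallelForest.)
gini_impurity <- function(labels) {
  if (length(labels) == 0) return(0)
  p <- table(labels) / length(labels)  # class proportions
  1 - sum(p^2)
}

# Weighted Gini impurity of splitting predictor x at a threshold.
# Lower values indicate a purer split. (Hypothetical helper.)
split_impurity <- function(x, y, threshold) {
  left  <- y[x <= threshold]
  right <- y[x >  threshold]
  n <- length(y)
  (length(left) / n) * gini_impurity(left) +
    (length(right) / n) * gini_impurity(right)
}
```

A node with an even mix of two classes has impurity 0.5, and a pure node has impurity 0; a tree-growing procedure would typically pick the threshold that minimizes split_impurity.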

Details

Bootstrapped samples are automatically balanced between the classes of the dependent variable. To make this possible, the number of sampled observations per tree is increased as necessary until it is divisible by the number of classes. The dependent variable must have exactly two distinct values. Predictor variables must be continuous, ordinal, or categorical with exactly two categories; do not include nominal variables or categorical variables with three or more categories.
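The balancing described above can be sketched in plain R. This is only an illustration of the idea, under the assumption that the requested sample size is rounded up to the next multiple of the number of classes and that each class then contributes an equal share drawn with replacement; the function name balanced_bootstrap is hypothetical and the package's own code may differ.

```r
# Draw a class-balanced bootstrap sample of row indices.
# (Hypothetical helper, not part of ParallelForest.)
balanced_bootstrap <- function(y, numsamps) {
  classes <- unique(y)
  k <- length(classes)
  # Round the requested size up to the nearest multiple of the class count
  n_total <- ceiling(numsamps / k) * k
  per_class <- n_total / k
  # Sample the same number of rows, with replacement, from each class
  unlist(lapply(classes, function(cl) {
    pool <- which(y == cl)
    pool[sample.int(length(pool), per_class, replace = TRUE)]
  }))
}

set.seed(1)
y <- c(rep(0, 70), rep(1, 30))
idx <- balanced_bootstrap(y, numsamps = 91)
length(idx)    # 92: 91 rounded up to a multiple of 2
table(y[idx])  # 46 observations from each class
```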

Examples

  data(easy_2var_data)
  
  fforest = grow.forest(Y~X1+X2, data=easy_2var_data, 
    min_node_obs=5, max_depth=10,
    numsamps=90, numvars=1, numboots=5)
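Before calling grow.forest on other data, it can help to confirm that the data meet the constraints listed under Details. The following is a minimal sketch of such a check using only base R; the function name check_forest_inputs is hypothetical and is not provided by the package.

```r
# Return a character vector describing violations of the documented input
# constraints; an empty vector means no problems were found.
# (Hypothetical helper, not part of ParallelForest.)
check_forest_inputs <- function(formula, data) {
  mf <- model.frame(formula, data = data)
  y <- model.response(mf)
  problems <- character(0)
  if (length(unique(y)) != 2) {
    problems <- c(problems, "response must have exactly two distinct values")
  }
  # The first column of the model frame is the response; the rest are predictors
  for (nm in names(mf)[-1]) {
    x <- mf[[nm]]
    ok <- is.numeric(x) || is.ordered(x) || length(unique(x)) <= 2
    if (!ok) {
      problems <- c(problems,
        paste0("predictor '", nm, "' is nominal with three or more categories"))
    }
  }
  problems
}

df <- data.frame(Y  = c(0, 1, 0, 1),
                 X1 = c(1.2, 3.4, 2.2, 0.1),
                 X2 = factor(c("a", "b", "c", "a")))
check_forest_inputs(Y ~ X1 + X2, df)  # flags X2
check_forest_inputs(Y ~ X1, df)       # character(0)
```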

ParallelForest documentation built on May 29, 2017, 5:45 p.m.