random_forest: Random forests

View source: R/random_forest.R

random_forestR Documentation

Random forests

Description

An implementation of the standard random forest algorithm by Leo Breiman for classification. Given labeled data, a random forest can be trained and saved for future use; or, a pre-trained random forest can be used for classification.

Usage

random_forest(
  input_model = NA,
  labels = NA,
  maximum_depth = NA,
  minimum_gain_split = NA,
  minimum_leaf_size = NA,
  num_trees = NA,
  print_training_accuracy = FALSE,
  seed = NA,
  subspace_dim = NA,
  test = NA,
  test_labels = NA,
  training = NA,
  verbose = FALSE,
  warm_start = FALSE
)

Arguments

input_model

Pre-trained random forest to use for classification (RandomForestModel).

labels

Labels for training dataset (integer row).

maximum_depth

Maximum depth of the tree (0 means no limit). Default value "0" (integer).

minimum_gain_split

Minimum gain needed to make a split when building a tree. Default value "0" (numeric).

minimum_leaf_size

Minimum number of points in each leaf node. Default value "1" (integer).

num_trees

Number of trees in the random forest. Default value "10" (integer).

print_training_accuracy

If set, then the accuracy of the model on the training set will be predicted (verbose must also be specified). Default value "FALSE" (logical).

seed

Random seed. If 0, 'std::time(NULL)' is used. Default value "0" (integer).

subspace_dim

Dimensionality of random subspace to use for each split. '0' will autoselect the square root of data dimensionality. Default value "0" (integer).

test

Test dataset to produce predictions for (numeric matrix).

test_labels

Test dataset labels, if accuracy calculation is desired (integer row).

training

Training dataset (numeric matrix).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "FALSE" (logical).

warm_start

If true and passed along with 'training' and 'input_model' then trains more trees on top of existing model. Default value "FALSE" (logical).

Details

This program is an implementation of the standard random forest classification algorithm by Leo Breiman. A random forest can be trained and saved for later use, or a random forest may be loaded and predictions or class probabilities for points may be generated.

The training set and associated labels are specified with the "training" and "labels" parameters, respectively. The labels should be in the range [0, num_classes - 1]. Optionally, if "labels" is not specified, the labels are assumed to be the last dimension of the training dataset.

When a model is trained, the "output_model" output parameter may be used to save the trained model. A model may be loaded for predictions with the "input_model"parameter. The "input_model" parameter may not be specified when the "training" parameter is specified. The "minimum_leaf_size" parameter specifies the minimum number of training points that must fall into each leaf for it to be split. The "num_trees" controls the number of trees in the random forest. The "minimum_gain_split" parameter controls the minimum required gain for a decision tree node to split. Larger values will force higher-confidence splits. The "maximum_depth" parameter specifies the maximum depth of the tree. The "subspace_dim" parameter is used to control the number of random dimensions chosen for an individual node's split. If "print_training_accuracy" is specified, the calculated accuracy on the training set will be printed.

Test data may be specified with the "test" parameter, and if performance measures are desired for that test set, labels for the test points may be specified with the "test_labels" parameter. Predictions for each test point may be saved via the "predictions"output parameter. Class probabilities for each prediction may be saved with the "probabilities" output parameter.

Value

A list with several components:

output_model

Model to save trained random forest to (RandomForestModel).

predictions

Predicted classes for each point in the test set (integer row).

probabilities

Predicted class probabilities for each point in the test set (numeric matrix).

Author(s)

mlpack developers

Examples

# For example, to train a random forest with a minimum leaf size of 20 using
# 10 trees on the dataset contained in "data"with labels "labels", saving the
# output random forest to "rf_model" and printing the training error, one
# could call

## Not run: 
output <- random_forest(training=data, labels=labels, minimum_leaf_size=20,
  num_trees=10, print_training_accuracy=TRUE)
rf_model <- output$output_model

## End(Not run)

# Then, to use that model to classify points in "test_set" and print the test
# error given the labels "test_labels" using that model, while saving the
# predictions for each point to "predictions", one could call 

## Not run: 
output <- random_forest(input_model=rf_model, test=test_set,
  test_labels=test_labels)
predictions <- output$predictions

## End(Not run)

mlpack documentation built on Oct. 29, 2022, 1:06 a.m.

Related to random_forest in mlpack...