hoeffding_tree: Hoeffding trees

hoeffding_tree    R Documentation

Hoeffding trees

Description

An implementation of Hoeffding trees, a form of streaming decision tree for classification. Given labeled data, a Hoeffding tree can be trained and saved for later use, or a pre-trained Hoeffding tree can be used for predicting the classifications of new points.

Usage

hoeffding_tree(
  batch_mode = FALSE,
  bins = NA,
  confidence = NA,
  info_gain = FALSE,
  input_model = NA,
  labels = NA,
  max_samples = NA,
  min_samples = NA,
  numeric_split_strategy = NA,
  observations_before_binning = NA,
  passes = NA,
  test = NA,
  test_labels = NA,
  training = NA,
  verbose = getOption("mlpack.verbose", FALSE)
)

Arguments

batch_mode

If TRUE, samples are considered in batch instead of as a stream. This generally produces better trees, at the cost of higher memory usage and longer runtime. Default value "FALSE" (logical).

bins

If the 'domingos' split strategy is used, this specifies the number of bins for each numeric split. Default value "10" (integer).

confidence

Required confidence (between 0 and 1) that the chosen split is the best one before a node is split. Default value "0.95" (numeric).

info_gain

If TRUE, information gain is used instead of Gini impurity when calculating Hoeffding bounds. Default value "FALSE" (logical).

input_model

Input trained Hoeffding tree model (HoeffdingTreeModel).

labels

Labels for the training dataset (integer row).

max_samples

Maximum number of samples before splitting. Default value "5000" (integer).

min_samples

Minimum number of samples before splitting. Default value "100" (integer).

numeric_split_strategy

The splitting strategy to use for numeric features: 'domingos' or 'binary'. Default value "binary" (character).

observations_before_binning

If the 'domingos' split strategy is used, this specifies the number of samples observed before binning is performed. Default value "100" (integer).

passes

Number of passes to take over the dataset. Default value "1" (integer).

test

Testing dataset (may be categorical) (numeric matrix/data.frame with info).

test_labels

Labels for the test dataset (integer row).

training

Training dataset (may be categorical) (numeric matrix/data.frame with info).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).
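The confidence, info_gain, min_samples and max_samples arguments all feed the test a leaf uses to decide whether to split. The sketch below is a minimal base-R illustration of the textbook Hoeffding-bound test, not mlpack's internal code. It assumes that confidence = 1 - delta, that the bound uses the range of the impurity measure (1 - 1/k for Gini impurity and log2(k) for information gain over k classes), and that reaching max_samples forces a split. The helper names (gini, entropy, impurity_range, hoeffding_bound, should_split) are hypothetical.

```r
# Hypothetical helpers illustrating the Hoeffding-bound split test.
# Assumptions: confidence = 1 - delta, the bound uses the range of the
# impurity measure, and max_samples forces a split. mlpack's internals
# may differ in detail.

# Gini impurity of a vector of class counts.
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Entropy (in bits) of a vector of class counts.
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Range of the impurity measure for k classes.
impurity_range <- function(k, info_gain = FALSE) {
  if (info_gain) log2(k) else 1 - 1 / k
}

# Hoeffding bound: epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n)).
hoeffding_bound <- function(range, confidence, n) {
  delta <- 1 - confidence
  sqrt(range^2 * log(1 / delta) / (2 * n))
}

# Decide whether a leaf that has seen n samples should split, given the
# gains of the best and second-best candidate splits.
should_split <- function(best_gain, second_gain, n, k,
                         confidence = 0.95, info_gain = FALSE,
                         min_samples = 100, max_samples = 5000) {
  if (n < min_samples) return(FALSE)  # too few samples to test
  if (n >= max_samples) return(TRUE)  # assumed: force a split eventually
  eps <- hoeffding_bound(impurity_range(k, info_gain), confidence, n)
  (best_gain - second_gain) > eps
}
```

Because log(1 / delta) grows as confidence approaches 1, a higher confidence widens the bound, so a leaf must see more samples (or a clearer winner) before it splits.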

Details

This program implements Hoeffding trees, a form of streaming decision tree best suited for large (or streaming) datasets. It supports both categorical and numeric data. Given an input dataset, it can train the tree with a variety of training options and return the trained model for later use. It can also use a previously trained model to predict classes for a given test set.
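Numeric features are handled according to numeric_split_strategy. With 'domingos', the bins and observations_before_binning arguments control how each numeric feature is discretized. The base-R sketch below illustrates the idea under an assumption: once observations_before_binning values have been seen, their observed range is cut into `bins` equal-width intervals, and later values are counted per bin. The helper names make_cut_points and assign_bin are hypothetical, and mlpack's exact binning rule may differ.

```r
# Hypothetical sketch of equal-width binning for one numeric feature.
# Assumption: after observations_before_binning values are observed, their
# range is split into `bins` equal-width intervals.
make_cut_points <- function(observed, bins = 10) {
  lo <- min(observed)
  hi <- max(observed)
  width <- (hi - lo) / bins
  lo + width * seq_len(bins - 1)  # bins - 1 interior cut points
}

# Map new values to bin indices 1..bins. Values outside the observed
# range fall into the first or last bin.
assign_bin <- function(x, cut_points) {
  findInterval(x, cut_points) + 1L
}

cuts <- make_cut_points(c(0, 2.5, 5, 7.5, 10), bins = 4)
bins_for <- assign_bin(c(-1, 1, 3, 6, 9, 12), cuts)
```

Here cuts is c(2.5, 5, 7.5) and bins_for is c(1, 1, 2, 3, 4, 4). Per-bin class counts like these are what a split criterion such as Gini impurity would then be computed from.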

The training dataset and its labels are specified with the "training" and "labels" parameters, respectively. If "labels" is not specified, the labels are assumed to be the last dimension (the last column, with one point per row) of the training dataset.
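If you prefer to pass labels explicitly rather than rely on the last-column convention, the data can be split first. The hypothetical helper below, written in base R, assumes a data.frame with one point per row whose last column holds integer class labels, and separates it into features and labels.

```r
# Hypothetical helper: split a data.frame whose last column holds the
# class labels into a feature data.frame and an integer label vector.
split_last_column <- function(df) {
  p <- ncol(df)
  list(features = df[, -p, drop = FALSE],
       labels = as.integer(df[[p]]))
}

df <- data.frame(x1 = c(0.1, 0.9, 0.4),
                 x2 = c(1.2, 0.3, 0.8),
                 y  = c(1L, 2L, 1L))
parts <- split_last_column(df)
```

The two pieces could then be passed as training = parts$features and labels = parts$labels.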

Training may be performed in batch mode (like a typical decision tree algorithm) by setting "batch_mode" to TRUE, but because batch mode uses more memory and runtime, it may not be the best option for large datasets.

When a model is trained, it is returned via the "output_model" output parameter. A previously trained model may be passed back in with the "input_model" parameter for further training or testing.

Test data may be specified with the "test" parameter, and if performance statistics are desired for that test set, labels may be specified with the "test_labels" parameter. Predictions for each test point may be saved with the "predictions" output parameter, and class probabilities for each prediction may be saved with the "probabilities" output parameter.
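The exact performance statistics reported when test_labels is given are not listed here. As an illustration only, the base-R sketch below computes plain accuracy and a confusion table from a vector of predictions and the matching test labels. The function name evaluate_predictions is hypothetical, and the statistics mlpack itself prints may differ.

```r
# Hypothetical helper: compare predicted labels with true test labels.
evaluate_predictions <- function(predictions, test_labels) {
  predictions <- as.vector(predictions)  # also accepts a 1-row matrix
  test_labels <- as.vector(test_labels)
  stopifnot(length(predictions) == length(test_labels))
  list(accuracy = mean(predictions == test_labels),
       confusion = table(predicted = predictions, actual = test_labels))
}

res <- evaluate_predictions(c(1, 2, 2, 1), c(1, 2, 1, 1))
```

Here res$accuracy is 0.75, and the confusion table shows the one point of class 1 that was predicted as class 2.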

Value

A list with several components:

output_model

The trained Hoeffding tree model (HoeffdingTreeModel).

predictions

Predicted labels for the test data (integer row).

probabilities

Prediction probabilities for the test data, returned in addition to the predicted labels (numeric matrix).

Author(s)

mlpack developers

Examples

# For example, to train a Hoeffding tree with confidence 0.99 with data
# "dataset", saving the trained tree to "tree", the following command may be
# used:

## Not run: 
library(mlpack)
output <- hoeffding_tree(training=dataset, confidence=0.99)
tree <- output$output_model

## End(Not run)

# Then, this tree may be used to make predictions on the test set "test_set",
# saving the predictions into "predictions" and the class probabilities into
# "class_probs" with the following command: 

## Not run: 
output <- hoeffding_tree(input_model=tree, test=test_set)
predictions <- output$predictions
class_probs <- output$probabilities

## End(Not run)
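The examples above do not pass labels or test labels. The sketch below is an end-to-end illustration on synthetic data, assuming the mlpack R package is installed. The data, the variable names, and the choice of 1-based integer labels are assumptions for illustration. It uses only the arguments documented on this page (training, labels, batch_mode, input_model, test, test_labels) and skips the mlpack calls when the package is unavailable.

```r
# Synthetic two-class data (an assumption for illustration): class 2
# points are shifted away from class 1 points.
set.seed(1)
make_data <- function(n) {
  y <- sample(1:2, n, replace = TRUE)
  x <- cbind(rnorm(n, mean = 2 * y), rnorm(n, mean = -y))
  list(x = x, y = as.integer(y))
}
train <- make_data(200)
more  <- make_data(200)
test  <- make_data(50)

if (requireNamespace("mlpack", quietly = TRUE)) {
  library(mlpack)
  # Train in batch mode with explicit labels.
  output <- hoeffding_tree(training = train$x, labels = train$y,
                           batch_mode = TRUE)
  tree <- output$output_model
  # Continue training the same model on more data (streaming mode).
  output <- hoeffding_tree(input_model = tree, training = more$x,
                           labels = more$y)
  tree <- output$output_model
  # Predict on the test set; test_labels enables performance statistics.
  output <- hoeffding_tree(input_model = tree, test = test$x,
                           test_labels = test$y)
  predictions <- output$predictions
  class_probs <- output$probabilities
}
```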

mlpack documentation built on Oct. 5, 2024, 9:08 a.m.