decision_tree: Decision tree


View source: R/decision_tree.R

Description

An implementation of an ID3-style decision tree for classification, which supports categorical data. Given labeled data with numeric or categorical features, a decision tree can be trained and saved; or, an existing decision tree can be used for classification on new points.

Usage

decision_tree(
  input_model = NA,
  labels = NA,
  maximum_depth = NA,
  minimum_gain_split = NA,
  minimum_leaf_size = NA,
  print_training_accuracy = FALSE,
  print_training_error = FALSE,
  test = NA,
  test_labels = NA,
  training = NA,
  verbose = FALSE,
  weights = NA
)

Arguments

input_model

Pre-trained decision tree, to be used with test points (DecisionTreeModel).

labels

Training labels (integer row).

maximum_depth

Maximum depth of the tree (0 means no limit). Default value "0" (integer).

minimum_gain_split

Minimum gain for node splitting. Default value "1e-07" (numeric).

minimum_leaf_size

Minimum number of points in a leaf. Default value "20" (integer).

print_training_accuracy

Print the training accuracy. Default value "FALSE" (logical).

print_training_error

Print the training error (deprecated; will be removed in mlpack 4.0.0). Default value "FALSE" (logical).

test

Testing dataset (may be categorical) (numeric matrix/data.frame with info).

test_labels

Test point labels, if accuracy calculation is desired (integer row).

training

Training dataset (may be categorical) (numeric matrix/data.frame with info).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "FALSE" (logical).

weights

Weights of the labels (numeric matrix).

Details

Train and evaluate using a decision tree. Given a dataset containing numeric or categorical features, and associated labels for each point in the dataset, this program can train a decision tree on that data.

The training set and associated labels are specified with the "training" and "labels" parameters, respectively. The labels should be in the range [0, num_classes - 1]. If "labels" is not specified, the labels are assumed to be the last dimension of the training dataset.
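As a minimal sketch of preparing labels in that range, assuming the raw classes are held in a hypothetical character vector "species" (not part of mlpack), base R's factor() can map class names to zero-based integers. Reshaping the result into whatever form the "labels" parameter expects is left out here.

```r
# Hypothetical raw class names; these are illustration data, not mlpack output.
species <- c("setosa", "virginica", "setosa", "versicolor")

# factor() assigns integer codes 1..num_classes in sorted level order;
# subtracting 1 shifts them into the range [0, num_classes - 1].
classes <- factor(species)
labels <- as.integer(classes) - 1L

# Keep the level names so numeric predictions can be mapped back later.
class_names <- levels(classes)
```

A predicted label k can then be translated back to its class name with class_names[k + 1].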

When a model is trained, the "output_model" output parameter may be used to save the trained model. A model may be loaded for predictions with the "input_model" parameter. The "input_model" parameter may not be specified when the "training" parameter is specified.

The "minimum_leaf_size" parameter specifies the minimum number of training points that must fall into each leaf. The "minimum_gain_split" parameter specifies the minimum gain that is needed for a node to split. The "maximum_depth" parameter specifies the maximum depth of the tree. If "print_training_accuracy" is specified, the training accuracy will be printed; "print_training_error" prints the training error but is deprecated.
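To illustrate what a gain threshold such as "minimum_gain_split" is compared against, the following sketch computes the information gain of one candidate split in base R. It assumes entropy-based gain, as in textbook ID3; the gain function mlpack actually uses is not stated on this page and may differ. The helper names "entropy" and "information_gain" and the toy data are hypothetical.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  -sum(p * log2(p))
}

# Information gain from splitting `labels` into two groups by a logical mask:
# parent entropy minus the size-weighted entropy of the two children.
information_gain <- function(labels, go_left) {
  n <- length(labels)
  left <- labels[go_left]
  right <- labels[!go_left]
  entropy(labels) -
    (length(left) / n) * entropy(left) -
    (length(right) / n) * entropy(right)
}

# Hypothetical toy data: one numeric feature and binary labels.
feature <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

# Gain of the candidate split "feature < 4.5", which separates the classes.
gain <- information_gain(y, feature < 4.5)

# A split is only worthwhile if its gain exceeds the threshold
# (1e-07 is the documented default of minimum_gain_split).
should_split <- gain > 1e-07
```

A larger threshold rejects more candidate splits and therefore tends to produce a smaller tree.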

Test data may be specified with the "test" parameter, and if performance numbers are desired for that test set, labels may be specified with the "test_labels" parameter. Predictions for each test point may be saved via the "predictions" output parameter. Class probabilities for each prediction may be saved with the "probabilities" output parameter.
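If the predictions are saved, the performance figure can also be recomputed by hand. The sketch below uses hypothetical prediction and label vectors rather than real mlpack output, and assumes both have been converted to plain vectors of the same length.

```r
# Hypothetical predicted and true labels for five test points.
predictions <- c(0, 1, 1, 2, 0)
test_labels <- c(0, 1, 2, 2, 0)

# Fraction of test points whose predicted class matches the true class.
accuracy <- mean(predictions == test_labels)

# Cross-tabulation of predicted against true classes (a confusion table).
confusion <- table(predicted = predictions, actual = test_labels)
```

The diagonal of the confusion table counts correctly classified points, so its sum divided by the number of test points equals the accuracy.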

Value

A list with several components:

output_model

Output for trained decision tree (DecisionTreeModel).

predictions

Class predictions for each test point (integer row).

probabilities

Class probabilities for each test point (numeric matrix).

Author(s)

mlpack developers

Examples

# For example, to train a decision tree with a minimum leaf size of 20 and a
# minimum gain of 0.001 on the dataset contained in "data" with labels
# "labels", saving the output model to "tree" and printing the training
# accuracy, one could call

## Not run: 
output <- decision_tree(training=data, labels=labels, minimum_leaf_size=20,
  minimum_gain_split=0.001, print_training_accuracy=TRUE)
tree <- output$output_model

## End(Not run)

# Then, to use that model to classify points in "test_set" and print the test
# error given the labels "test_labels", while saving the predictions for each
# point to "predictions", one could call

## Not run: 
output <- decision_tree(input_model=tree, test=test_set,
  test_labels=test_labels)
predictions <- output$predictions

## End(Not run)

mlpack documentation built on Dec. 19, 2020, 1:06 a.m.
