cuml_rand_forest: Train a random forest model.

Description

Train a random forest model for classification or regression tasks.

Usage

cuml_rand_forest(x, ...)

## Default S3 method:
cuml_rand_forest(x, ...)

## S3 method for class 'data.frame'
cuml_rand_forest(
  x,
  y,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

## S3 method for class 'matrix'
cuml_rand_forest(
  x,
  y,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

## S3 method for class 'formula'
cuml_rand_forest(
  formula,
  data,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

## S3 method for class 'recipe'
cuml_rand_forest(
  x,
  data,
  mtry = NULL,
  trees = NULL,
  min_n = 2L,
  bootstrap = TRUE,
  max_depth = 16L,
  max_leaves = Inf,
  max_predictors_per_note_split = NULL,
  n_bins = 128L,
  min_samples_leaf = 1L,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128L,
  n_streams = 8L,
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
  ...
)

Arguments

x

Depending on the context:

* A __data frame__ of predictors.
* A __matrix__ of predictors.
* A __recipe__ specifying a set of preprocessing steps created from [recipes::recipe()].
* A __formula__ specifying the predictors and the outcome.

...

Optional arguments; currently unused.

y

A numeric vector (for regression) or factor (for classification) of desired responses.
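A minimal sketch of the data frame and matrix interfaces, using the built-in iris data. The variable names and `trees = 50L` are illustrative choices, not values from this page. The training calls are guarded so the data preparation still runs where cuml is not installed.

```r
# Split iris into predictors (x) and outcome (y).
x_df <- iris[, setdiff(names(iris), "Species")]
y <- iris$Species              # a factor, so this is a classification task

# The same predictors as a numeric matrix.
x_mat <- as.matrix(x_df)

if (requireNamespace("cuml", quietly = TRUE)) {
  # Data frame method: x is a data frame of predictors.
  model_df <- cuml::cuml_rand_forest(x_df, y, trees = 50L)
  # Matrix method: x is a matrix of predictors.
  model_mat <- cuml::cuml_rand_forest(x_mat, y, trees = 50L)
}
```

Passing a numeric vector as `y` instead of a factor would train a regressor rather than a classifier.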

mtry

The number of predictors that will be randomly sampled at each split when creating the tree models. Default: the square root of the total number of predictors.
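As a rough illustration of the default, the sketch below computes the square root of the predictor count for the mtcars regression example and passes it explicitly. How the library rounds the square root is not stated here, so the use of `floor()` is an assumption of this sketch, and `mtry_explicit` is a hypothetical name.

```r
# Predictors for the regression example: every mtcars column except mpg.
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]
p <- ncol(predictors)                          # 10 predictors

# Square root of the predictor count; flooring is this sketch's own choice.
mtry_explicit <- as.integer(floor(sqrt(p)))    # floor(3.16...) = 3

if (requireNamespace("cuml", quietly = TRUE)) {
  model <- cuml::cuml_rand_forest(
    predictors,
    mtcars$mpg,
    mtry = mtry_explicit,
    trees = 100L
  )
}
```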

trees

An integer for the number of trees contained in the ensemble. Default: 100L.

min_n

An integer for the minimum number of data points in a node that are required for the node to be split further. Default: 2L.

bootstrap

Whether to bootstrap the training data. If TRUE, each tree in the forest is built on a bootstrapped sample (drawn with replacement). If FALSE, the whole dataset is used to build each tree. Default: TRUE.
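The idea of a bootstrapped sample can be sketched in base R. This only illustrates the concept and is not the library's own sampling code; the seed and variable names are arbitrary.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n <- nrow(iris)

# Bootstrap sample: n row indices drawn with replacement,
# so some rows repeat and others are left out.
idx <- sample.int(n, size = n, replace = TRUE)
boot_sample <- iris[idx, ]

# Rows that were never drawn are not seen by the tree built on this sample.
n_unique <- length(unique(idx))
n_left_out <- n - n_unique
```

With `bootstrap = FALSE`, the equivalent of `boot_sample` would simply be the full data set for every tree.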

max_depth

Maximum tree depth. Default: 16L.

max_leaves

Maximum leaf nodes per tree. Soft constraint. Default: Inf (unlimited).

max_predictors_per_note_split

Number of predictors to consider per node split. Default: the square root of the total number of predictors.

n_bins

Number of bins used by the split algorithm. Default: 128L.

min_samples_leaf

The minimum number of data points in each leaf node. Default: 1L.

split_criterion

The criterion used to split nodes: "gini" or "entropy" for classification, and "mse" or "mae" for regression. Default: "gini" for classification; "mse" for regression.
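To make the two classification criteria concrete, here is a small base R sketch of the textbook Gini and entropy impurity measures, followed by a guarded training call that selects "entropy". The helper functions are hypothetical, and details such as the logarithm base used inside cuML are not stated on this page.

```r
# Class proportions of the outcome in a node (here: the whole iris data set).
class_props <- function(y) as.numeric(table(y)) / length(y)

# Gini impurity: 1 - sum(p^2).
gini <- function(y) {
  p <- class_props(y)
  1 - sum(p^2)
}

# Entropy in bits: -sum(p * log2(p)), skipping empty classes.
entropy <- function(y) {
  p <- class_props(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

gini(iris$Species)     # three balanced classes: 1 - 3 * (1/3)^2 = 2/3
entropy(iris$Species)  # three balanced classes: log2(3)

if (requireNamespace("cuml", quietly = TRUE)) {
  model <- cuml::cuml_rand_forest(
    formula = Species ~ .,
    data = iris,
    split_criterion = "entropy"
  )
}
```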

min_impurity_decrease

Minimum decrease in impurity required for a node to be split. Default: 0.
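A worked example of an impurity decrease, using the common textbook definition (parent impurity minus the size-weighted impurity of the two children). Whether cuML weights or normalizes the decrease in exactly this way is an assumption; the candidate split and all names below are illustrative.

```r
# Gini impurity of a factor outcome.
gini <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  1 - sum(p^2)
}

# Candidate split: Petal.Length < 2.5 isolates all 50 setosa rows.
left  <- iris$Species[iris$Petal.Length <  2.5]
right <- iris$Species[iris$Petal.Length >= 2.5]
n <- nrow(iris)

parent_impurity <- gini(iris$Species)                 # 2/3
child_impurity  <- (length(left)  / n) * gini(left) +
                   (length(right) / n) * gini(right)  # (1/3)*0 + (2/3)*0.5 = 1/3
decrease <- parent_impurity - child_impurity          # 1/3

# Under this definition the split is kept only if the decrease
# reaches the threshold.
min_impurity_decrease <- 0
would_split <- decrease >= min_impurity_decrease
```

Raising the threshold above 1/3 would reject this particular split in the sketch.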

max_batch_size

Maximum number of nodes that can be processed in a given batch. Default: 128L.

n_streams

Number of CUDA streams to use for building trees. Default: 8L.

cuml_log_level

Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off".
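The usage above lists all seven levels as the default value, while the default is described as "off". A common R idiom that produces this behaviour is `match.arg()`, which picks the first choice when the argument is left at its default. Whether cuml resolves the argument this way internally is an assumption; `resolve_log_level()` is a hypothetical helper for illustration only.

```r
# Hypothetical helper mirroring the signature of the cuml_log_level argument.
resolve_log_level <- function(
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace")
) {
  # match.arg() returns the first choice when the argument is not supplied,
  # and raises an error for values outside the allowed set.
  match.arg(cuml_log_level)
}

default_level <- resolve_log_level()          # first choice: "off"
debug_level   <- resolve_log_level("debug")   # an explicit choice is kept
```

An unsupported value such as `resolve_log_level("verbose")` would raise an error in this sketch.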

formula

A formula specifying the outcome terms on the left-hand side, and the predictor terms on the right-hand side.

data

When a __recipe__ or __formula__ is used, data is specified as a __data frame__ containing the predictors and (if applicable) the outcome.

Value

A random forest classifier / regressor object that can be used with the 'predict' S3 generic to make predictions on new data points.

Examples

library(cuml)

# Classification

model <- cuml_rand_forest(
  formula = Species ~ .,
  data = iris,
  trees = 100
)

predictions <- predict(model, iris[-which(names(iris) == "Species")])

# Regression

model <- cuml_rand_forest(
  formula = mpg ~ .,
  data = mtcars,
  trees = 100
)

predictions <- predict(model, mtcars[-which(names(mtcars) == "mpg")])

cuml documentation built on Sept. 21, 2021, 1:06 a.m.
