Train a random forest model for classification or regression tasks.
cuml_rand_forest(
  x,
  y = NULL,
  formula = NULL,
  mode = c("classification", "regression"),
  mtry = NULL,
  trees = NULL,
  min_n = NULL,
  bootstrap = TRUE,
  max_depth = 16,
  max_leaves = -1,
  max_predictors_per_note_split = NULL,
  n_bins = 128,
  min_samples_leaf = 1,
  split_criterion = NULL,
  min_impurity_decrease = 0,
  max_batch_size = 128,
  n_streams = 8,
  cuml_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace")
)
x
The input matrix or dataframe. Each data point should be a row and should consist of numeric values only.
y
A numeric vector of desired responses.
formula
If 'x' is a dataframe, then an R formula of the form '<response col> ~ .' or '<response col> ~ <predictor 1> + <predictor 2> + ...' may be used to specify the response column and the predictor column(s).
mode
Type of task to perform. Should be either "classification" or "regression".
mtry
The number of predictors that will be randomly sampled at each split when creating the tree models. Default: the square root of the total number of predictors.
trees
An integer for the number of trees contained in the ensemble. Default: 100.
min_n
An integer for the minimum number of data points in a node that are required for the node to be split further. Default: 2.
bootstrap
Whether to perform bootstrap sampling. If TRUE, each tree in the forest is built on a sample drawn with replacement from the dataset. If FALSE, the whole dataset is used to build each tree.
max_depth
Maximum tree depth. Default: 16.
max_leaves
Maximum number of leaf nodes per tree. Soft constraint. Default: -1 (unlimited).
max_predictors_per_note_split
Number of predictors to consider per node split. Default: the square root of the total number of predictors.
n_bins
Number of bins used by the split algorithm. Default: 128.
min_samples_leaf
The minimum number of data points in each leaf node. Default: 1.
split_criterion
The criterion used to split nodes. Can be "gini" or "entropy" for classification, and "mse" or "mae" for regression. Default: "gini" for classification; "mse" for regression.
min_impurity_decrease
Minimum decrease in impurity required for a node to be split. Default: 0.
max_batch_size
Maximum number of nodes that can be processed in a given batch. Default: 128.
n_streams
Number of CUDA streams to use for building trees. Default: 8.
cuml_log_level
Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off".
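To make 'split_criterion' and 'min_impurity_decrease' more concrete, the base R sketch below computes the textbook definitions of Gini impurity, entropy, and MSE for the data in a node, plus the impurity decrease of one candidate split. The helper names (gini, entropy, mse, impurity_decrease) are hypothetical and are not part of cuml4r. The size-weighted form of the decrease and the base-2 logarithm are assumptions, and cuML's internal computation may differ in such details.

```r
# Gini impurity of the class labels in one node: 1 - sum(p_k^2).
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Entropy of the class labels in one node: -sum(p_k * log2(p_k)).
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]  # drop empty classes so 0 * log(0) never occurs
  -sum(p * log2(p))
}

# Mean squared error of a numeric response around the node mean.
mse <- function(y) mean((y - mean(y))^2)

# Impurity decrease of a candidate split: parent impurity minus the
# size-weighted impurity of the two child nodes (assumed weighting).
impurity_decrease <- function(y, go_left, impurity) {
  n <- length(y)
  left <- y[go_left]
  right <- y[!go_left]
  impurity(y) - (length(left) / n) * impurity(left) -
    (length(right) / n) * impurity(right)
}

# Splitting iris on Petal.Length < 2.5 isolates one species entirely.
print(gini(iris$Species))
print(impurity_decrease(iris$Species, iris$Petal.Length < 2.5, gini))
```

Under this reading, a candidate split whose decrease falls below 'min_impurity_decrease' would not be made, so raising that value tends to produce smaller trees.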
A random forest classifier / regressor object that can be used with the 'predict' S3 generic to make predictions on new data points.
library(cuml4r)

# Classification
model <- cuml_rand_forest(
  iris,
  formula = Species ~ .,
  mode = "classification",
  trees = 100
)

# Predict on the training data and count the matches.
predictions <- predict(model, iris)
print(predictions)
cat(
  "Number of correct predictions: ",
  sum(predictions == iris[, "Species"]),
  "\n"
)

# Regression
model <- cuml_rand_forest(
  iris,
  formula = Species ~ .,
  mode = "regression",
  trees = 100
)

# Predictions are numeric here, so round them before comparing
# against the integer codes of the Species factor.
predictions <- predict(model, iris)
print(predictions)
print(round(predictions))
cat(
  "Number of correct predictions: ",
  sum(as.integer(round(predictions)) == as.integer(iris[, "Species"])),
  "\n"
)
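The examples above use the formula interface. As a complement, here is a minimal sketch of the matrix interface, where 'x' is a numeric matrix and 'y' is a numeric response vector as described in the arguments. The hyperparameter values are arbitrary illustrations rather than recommendations, and passing a matrix to 'predict' is an assumption based on the description of 'x'. The model call is guarded so that the snippet is a no-op when cuml4r is not installed.

```r
# Predictors as a numeric matrix, one row per data point.
x <- as.matrix(iris[, names(iris) != "Species"])
# Response as a numeric vector (integer codes of the factor levels).
y <- as.numeric(iris$Species)

# Fit only when cuml4r is installed (it also needs a CUDA-capable GPU).
if (requireNamespace("cuml4r", quietly = TRUE)) {
  library(cuml4r)

  model <- cuml_rand_forest(
    x,
    y = y,
    mode = "regression",
    trees = 50,     # fewer trees than the default of 100
    max_depth = 8,  # shallower than the default of 16
    min_n = 5       # require at least 5 points before splitting a node
  )

  # Assumption: predict() accepts the same matrix layout used for training.
  predictions <- predict(model, x)
  cat("Training mean squared error: ", mean((predictions - y)^2), "\n")
}
```

Smaller values of 'trees' and 'max_depth' generally trade accuracy for faster training, which can be handy for a first smoke test.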