rand_forest() is a way to generate a specification of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are:
mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
trees: The number of trees contained in the ensemble.
min_n: The minimum number of data points in a node that are required for the node to be split further.
These arguments are converted to their engine-specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
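As a rough illustration of the paragraph above, the sketch below passes an engine-specific option through set_engine() and then changes a main argument with update(). The `importance` option is an argument of ranger::ranger() chosen only for illustration, and the object names `spec` and `spec_updated` are hypothetical.

```r
library(parsnip)

# Main arguments use the standardized parsnip names; engine-specific
# options (here ranger's `importance`) are passed through set_engine().
spec <- rand_forest(mode = "classification", trees = 1000) %>%
  set_engine("ranger", importance = "impurity")

# Change a main argument without rebuilding the specification.
spec_updated <- update(spec, trees = 500)
spec_updated
```

Nothing is fit at this point; the specification only records what should be passed to the engine when fit() is called.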
mode 
A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification". 
mtry 
An integer for the number of predictors that will be randomly sampled at each split when creating the tree models. 
trees 
An integer for the number of trees contained in the ensemble. 
min_n 
An integer for the minimum number of data points in a node that are required for the node to be split further. 
object 
A random forest model specification. 
parameters 
A 1-row tibble or named list with main parameters to update. If the individual arguments are used, these will supersede the values in parameters.
fresh 
A logical for whether the arguments should be modified in-place or replaced wholesale.
... 
Not used for update().
The model can be created using the fit() function using the following engines:
R: "ranger" (the default) or "randomForest"
Spark: "spark"
Engines may have preset default arguments when executing the model fit call. For this type of model, the templates of the fit calls are below:
rand_forest() %>% set_engine("ranger") %>% set_mode("regression") %>% translate()
## Random Forest Model Specification (regression)
##
## Computational engine: ranger
##
## Model fit template:
## ranger::ranger(formula = missing_arg(), data = missing_arg(),
## case.weights = missing_arg(), num.threads = 1, verbose = FALSE,
## seed = sample.int(10^5, 1))

rand_forest() %>% set_engine("ranger") %>% set_mode("classification") %>% translate()
## Random Forest Model Specification (classification)
##
## Computational engine: ranger
##
## Model fit template:
## ranger::ranger(formula = missing_arg(), data = missing_arg(),
## case.weights = missing_arg(), num.threads = 1, verbose = FALSE,
## seed = sample.int(10^5, 1), probability = TRUE)

Note that ranger::ranger() does not require factor predictors to be converted to indicator variables. fit() does not affect the encoding of the predictor values (i.e. factors stay factors) for this model.
For ranger confidence intervals, the intervals are constructed using the form estimate +/- z * std_error. For classification probabilities, these values can fall outside of [0, 1] and will be coerced to be in this range.
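The interval form above can be sketched in base R as follows. This is not parsnip's internal code: the function name `clamp_interval` and its `level` argument are hypothetical, and the clamping step only applies to class probabilities.

```r
# Hypothetical helper: build estimate +/- z * std_error and coerce the
# bounds into [0, 1], as described for classification probabilities.
clamp_interval <- function(estimate, std_error, level = 0.95) {
  # Two-sided normal quantile for the requested confidence level.
  z <- qnorm(1 - (1 - level) / 2)
  lower <- estimate - z * std_error
  upper <- estimate + z * std_error
  data.frame(
    lower = pmin(pmax(lower, 0), 1),
    upper = pmin(pmax(upper, 0), 1)
  )
}

# An estimate near 1 gives a raw upper bound above 1, which is clamped.
clamp_interval(estimate = 0.95, std_error = 0.10)
```

For regression predictions the same interval form would be used without the clamping step.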
rand_forest() %>% set_engine("randomForest") %>% set_mode("regression") %>% translate()
## Random Forest Model Specification (regression)
##
## Computational engine: randomForest
##
## Model fit template:
## randomForest::randomForest(x = missing_arg(), y = missing_arg())

rand_forest() %>% set_engine("randomForest") %>% set_mode("classification") %>% translate()
## Random Forest Model Specification (classification)
##
## Computational engine: randomForest
##
## Model fit template:
## randomForest::randomForest(x = missing_arg(), y = missing_arg())

Note that randomForest::randomForest() does not require factor predictors to be converted to indicator variables. fit() does not affect the encoding of the predictor values (i.e. factors stay factors) for this model.
rand_forest() %>% set_engine("spark") %>% set_mode("regression") %>% translate()
## Random Forest Model Specification (regression)
##
## Computational engine: spark
##
## Model fit template:
## sparklyr::ml_random_forest(x = missing_arg(), formula = missing_arg(),
## type = "regression", seed = sample.int(10^5, 1))

rand_forest() %>% set_engine("spark") %>% set_mode("classification") %>% translate()
## Random Forest Model Specification (classification)
##
## Computational engine: spark
##
## Model fit template:
## sparklyr::ml_random_forest(x = missing_arg(), formula = missing_arg(),
## type = "classification", seed = sample.int(10^5, 1))

fit() does not affect the encoding of the predictor values (i.e. factors stay factors) for this model.
The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.
parsnip  ranger                     randomForest          spark
mtry     mtry (see below)           mtry (see below)      feature_subset_strategy (see below)
trees    num.trees (500)            ntree (500)           num_trees (20)
min_n    min.node.size (see below)  nodesize (see below)  min_instances_per_node (1)
For randomForest and spark, the default mtry is the square root of the number of predictors for classification, and one-third of the predictors for regression.
For ranger, the default mtry is the square root of the number of predictors.
The default min_n for both ranger and randomForest is 1 for classification and 5 for regression.
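The default rules above can be written out as a small base R sketch. The helper name `default_mtry` is hypothetical and not part of parsnip or any engine. Rounding down and using a floor of 1 are assumptions, since the text does not say how fractional values are handled.

```r
# Hypothetical helper reproducing the default mtry rules described above.
# Assumption: fractional values are rounded down, with a minimum of 1.
default_mtry <- function(n_predictors,
                         engine = c("ranger", "randomForest", "spark"),
                         mode = c("classification", "regression")) {
  engine <- match.arg(engine)
  mode <- match.arg(mode)
  if (engine == "ranger" || mode == "classification") {
    # Square root of the number of predictors.
    value <- floor(sqrt(n_predictors))
  } else {
    # randomForest and spark regression: one-third of the predictors.
    value <- floor(n_predictors / 3)
  }
  max(value, 1)
}

default_mtry(12, engine = "ranger", mode = "regression")
default_mtry(12, engine = "randomForest", mode = "regression")
```

With 12 predictors, the two calls differ because ranger uses the square-root rule for both modes while randomForest switches to one-third for regression.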
For models created using the spark engine, there are several differences to consider. First, only the formula interface via fit() is available; using fit_xy() will generate an error. Second, the predictions will always be in a spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in spark tables, so class predictions are returned as character columns. Fourth, to retain the model object for a new R session (via save), the model$fit element of the parsnip object should be serialized via ml_save(object$fit) and separately saved to disk. In a new session, the object can be reloaded and reattached to the parsnip object.
library(parsnip)
rand_forest(mode = "classification", trees = 2000)
# Parameters can be represented by a placeholder:
rand_forest(mode = "regression", mtry = varying())
model <- rand_forest(mtry = 10, min_n = 3)
model
update(model, mtry = 1)
update(model, mtry = 1, fresh = TRUE)
