beset_rf | R Documentation |
beset_rf is a wrapper to randomForest that estimates predictive performance of the random forest using repeated k-fold cross-validation. beset_rf ensures that the correct arguments are provided to randomForest and that enough information is retained for compatibility with beset methods such as variable importance and partial dependence.
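As a rough illustration of the repeated k-fold cross-validation described above, the following base-R sketch assigns rows to folds several times over and averages the holdout error. It is not beset's internal code: `make_repeated_folds` is a hypothetical helper, and `lm()` on the built-in `mtcars` data stands in for `randomForest` so that the example runs without extra packages.

```r
# Hypothetical helper (not part of beset): one balanced, shuffled vector of
# fold labels per repetition.
make_repeated_folds <- function(n, n_folds = 10, n_reps = 10, seed = 42) {
  set.seed(seed)
  lapply(seq_len(n_reps), function(r) {
    sample(rep(seq_len(n_folds), length.out = n))
  })
}

folds <- make_repeated_folds(nrow(mtcars), n_folds = 5, n_reps = 3)

# Holdout MSE for every fold of every repetition.
# lm() is a stand-in learner here; beset_rf fits a random forest instead.
cv_mse <- sapply(folds, function(fold_id) {
  sapply(sort(unique(fold_id)), function(k) {
    fit  <- lm(mpg ~ wt + hp, data = mtcars[fold_id != k, ])
    pred <- predict(fit, newdata = mtcars[fold_id == k, ])
    mean((mtcars$mpg[fold_id == k] - pred)^2)
  })
})

dim(cv_mse)   # rows are folds, columns are repetitions
mean(cv_mse)  # overall cross-validated MSE estimate
```

Repeating the fold assignment reduces the dependence of the estimate on any single random partition, at the cost of fitting n_folds x n_reps models.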
beset_rf(
form,
data,
n_trees = 500,
sample_rate = 1 - exp(-1),
mtry = NULL,
min_obs_in_node = NULL,
n_folds = 10,
n_reps = 10,
seed = 42,
class_wt = NULL,
cutoff = NULL,
strata = NULL,
parallel_type = NULL,
n_cores = NULL,
cl = NULL
)
## S3 method for class 'beset_rf'
plot(x, metric = c("auto", "mse", "rsq", "err.rate"), ...)
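The default `sample_rate = 1 - exp(-1)` in the usage above is roughly 0.632. That value is the limiting fraction of distinct rows in a bootstrap sample of size n drawn with replacement, which is presumably why it was chosen; the source does not state the motivation, so treat that reading as an assumption. A short check in base R:

```r
sample_rate <- 1 - exp(-1)
round(sample_rate, 4)  # 0.6321

# Expected fraction of distinct rows when drawing n rows with replacement
n <- 1000
exact <- 1 - (1 - 1 / n)^n

# Simulation of the same quantity (seed chosen arbitrarily for this sketch)
set.seed(42)
simulated <- mean(replicate(
  200,
  length(unique(sample.int(n, n, replace = TRUE))) / n
))

c(default = sample_rate, exact = exact, simulated = simulated)
```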
form |
A model |
data |
Either a |
n_trees |
Number of trees. Defaults to 500. |
sample_rate |
Row sample rate per tree (from |
mtry |
(Optional) |
min_obs_in_node |
(Optional) |
n_folds |
|
n_reps |
|
seed |
|
class_wt |
Priors of the classes. Ignored for regression. |
cutoff |
(Classification only) A vector of length equal to number of classes. The ‘winning’ class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k where k is the number of classes (i.e., majority vote wins). |
strata |
A (factor) variable that is used for stratified sampling. |
parallel_type |
(Optional) character string indicating the type of
parallel operation to be used, either |
n_cores |
Integer value indicating the number of workers to run in
parallel during subset search and cross-validation. By default, this will
be set to one fewer than the maximum number of physical cores you have
available, as indicated by |
cl |
(Optional) |
x |
A |
metric |
Prediction metric to plot. Options are mean squared error
( |
... |
optional parameters to be passed to the low level function
|
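The cutoff rule described for the `cutoff` argument (the winning class has the largest ratio of vote proportion to cutoff) can be sketched in base R as follows. The vote matrix and the `pick_class` helper are hypothetical and only illustrate the rule; they are not part of beset or randomForest.

```r
# Hypothetical vote proportions for three observations over classes "a" and "b"
votes <- matrix(c(0.60, 0.40,
                  0.45, 0.55,
                  0.80, 0.20),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("a", "b")))

# Hypothetical helper: divide each class's votes by its cutoff and take the
# class with the largest ratio. The default cutoff is 1/k for k classes.
pick_class <- function(votes, cutoff = rep(1 / ncol(votes), ncol(votes))) {
  ratio <- sweep(votes, 2, cutoff, "/")
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

pick_class(votes)                         # "a" "b" "a"
pick_class(votes, cutoff = c(0.7, 0.3))   # "b" "b" "a"
```

Raising the cutoff for a class makes it harder for that class to win, which is one way to trade sensitivity against specificity when classes are imbalanced.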
A "beset_rf" object with the following components:
list of "randomForest" objects for each fold and repetition
a "cross_valid" object giving cross-validation metrics
the data frame used to train random forest
plot(beset_rf): Plot OOB and holdout MSE, R-squared, or error rate as a function of the number of trees in the forest
# Using default 10 X 10 repeated k-fold cross-validation
data("prostate", package = "beset")
rf <- beset_rf(tumor ~ ., data = prostate)
summary(rf)
plot(rf)
# Using a single independent test set instead of cross-validation
set.seed(42)  # make the random train/test split reproducible
inTrain <- sample.int(nrow(prostate), floor(nrow(prostate) / 2))
data <- data_partition(
train = prostate[inTrain,], test = prostate[-inTrain,], y = "tumor"
)
rf <- beset_rf(tumor ~ ., data = data)
summary(rf)
plot(rf)
# Example with continuous outcome
rf <- beset_rf(gleason ~ ., data = data)
summary(rf)
plot(rf)