rf_tidiers: Tidying methods for a randomForest model

augment.randomForest    R Documentation

Tidying methods for a randomForest model

Description

These methods tidy the variable importance of a random forest model, augment the original data with the fitted values or classifications and their error, and construct a glance of the model's summary statistics.

Usage

## S3 method for class 'randomForest'
augment(x, data = NULL, ...)

## S3 method for class 'randomForest'
glance(x, ...)

## S3 method for class 'randomForest'
tidy(x, ...)
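
A minimal sketch of how the three methods might be called, assuming both the randomForest and broomstick packages are installed and that broomstick makes the tidy(), glance() and augment() generics available once attached. The iris data, the seed and the object name rf are illustrative choices, not part of this documentation.

```r
library(randomForest)

set.seed(2023)
# Fit a small classification forest; importance = TRUE stores the
# permutation-based importance measures described under Value.
rf <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)

# broomstick is assumed to be installed; the calls are skipped when it is not.
if (requireNamespace("broomstick", quietly = TRUE)) {
  library(broomstick)
  print(tidy(rf))                        # one row per model term
  print(glance(rf))                      # model-level statistics
  print(head(augment(rf, data = iris)))  # original data plus extra columns
}
```

The data argument is passed explicitly to augment() because its default is NULL.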

Arguments

x

randomForest object

data

Model data for use by augment.randomForest().

...

Additional arguments (ignored)

Value

augment.randomForest returns the original data with additional columns:

.oob_times

The number of trees for which the given case was "out of bag". See randomForest::randomForest() for more details.

.fitted

The fitted value or class.

augment returns additional columns for classification and unsupervised trees:

.votes

For each case, the voting results, with one column per class.

.local_var_imp

The casewise variable importance, stored as data frames in a nested list-column, with one row per variable in the model. Only present if the model was created with importance = TRUE.
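
As a rough illustration of where these per-case values live, the sketch below inspects the components of a fitted randomForest object directly. It does not call augment(), and the mapping from components to the columns above is an assumption. Note that in randomForest itself, casewise importance is requested with localImp = TRUE, which also turns on importance.

```r
library(randomForest)

set.seed(2023)
# localImp = TRUE asks randomForest for casewise importance.
rf <- randomForest(Species ~ ., data = iris, ntree = 100, localImp = TRUE)

# Per-case components that presumably back the augment() columns (assumption):
oob_times <- rf$oob.times        # number of trees for which each case was out of bag
fitted    <- rf$predicted        # out-of-bag predicted class
votes     <- rf$votes            # vote fractions, one column per class
local_imp <- rf$localImportance  # matrix: one row per variable, one column per case

head(data.frame(oob_times, fitted, votes))
dim(local_imp)
```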

glance.randomForest returns a data.frame with the following columns for regression trees:

mse

The average mean squared error across all trees.

rsq

The average pseudo-R-squared across all trees. See randomForest::randomForest() for more information.
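
The two regression statistics can be approximated by hand from the fitted object, as in the sketch below. It assumes that "average across all trees" means the plain mean of the mse and rsq vectors that randomForest stores (one entry per tree count); whether glance() computes exactly this is an assumption. The mtcars data and the names rf_reg, mse_avg and rsq_avg are illustrative.

```r
library(randomForest)

set.seed(2023)
rf_reg <- randomForest(mpg ~ ., data = mtcars, ntree = 200)

# rf_reg$mse and rf_reg$rsq each hold one value per number of trees grown.
mse_avg <- mean(rf_reg$mse)
rsq_avg <- mean(rf_reg$rsq)

data.frame(mse = mse_avg, rsq = rsq_avg)
```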

For classification trees: one row per class, with the following columns:

precision
recall
accuracy
f_measure
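
For orientation, the sketch below derives one-vs-rest versions of these four metrics from the out-of-bag confusion matrix stored in a randomForest object. The formulas are the standard textbook definitions; that glance() uses exactly these definitions is an assumption, and the names rf_cls, cm and metrics are illustrative.

```r
library(randomForest)

set.seed(2023)
rf_cls <- randomForest(Species ~ ., data = iris, ntree = 100)

# Rows are observed classes, columns are predicted classes.
# Selecting by class name drops the trailing class.error column.
cm <- rf_cls$confusion[, levels(iris$Species)]

tp <- diag(cm)           # true positives per class
fp <- colSums(cm) - tp   # predicted as the class, but observed as another
fn <- rowSums(cm) - tp   # observed as the class, but predicted as another
tn <- sum(cm) - tp - fp - fn

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

metrics <- data.frame(
  class     = rownames(cm),
  precision = precision,
  recall    = recall,
  accuracy  = (tp + tn) / sum(cm),
  f_measure = 2 * precision * recall / (precision + recall),
  row.names = NULL
)
metrics
```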

All tidying methods return a data.frame without rownames. The structure depends on the method chosen.

tidy.randomForest returns one row for each model term, with the following columns:

term

The term in the randomForest model.

MeanDecreaseAccuracy

A measure of variable importance. See randomForest::randomForest() for more information. Only present if the model was created with importance = TRUE.

MeanDecreaseGini

A measure of variable importance. See randomForest::randomForest() for more information.

MeanDecreaseAccuracy_sd

Standard deviation of MeanDecreaseAccuracy. See randomForest::randomForest() for more information. Only present if the model was created with importance = TRUE.

classwise_importance

Classwise variable importance for each term, stored as data frames in a nested list-column, with one row per class. Only present if the model was created with importance = TRUE.
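
The columns above correspond closely to the importance and importanceSD matrices that randomForest stores on a classification fit. The sketch below rebuilds a comparable table by hand. It is not the tidy() implementation, and the name tidy_sketch is hypothetical.

```r
library(randomForest)

set.seed(2023)
rf_imp <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)

# One row per predictor. Columns: one per class, then
# MeanDecreaseAccuracy and MeanDecreaseGini.
imp <- rf_imp$importance
# Spread of the permutation-based measures: one column per class,
# then MeanDecreaseAccuracy.
imp_sd <- rf_imp$importanceSD

tidy_sketch <- data.frame(
  term                    = rownames(imp),
  MeanDecreaseAccuracy    = imp[, "MeanDecreaseAccuracy"],
  MeanDecreaseGini        = imp[, "MeanDecreaseGini"],
  MeanDecreaseAccuracy_sd = imp_sd[, "MeanDecreaseAccuracy"],
  row.names = NULL
)
tidy_sketch

# The per-class columns are what a classwise_importance list-column would nest.
imp[, levels(iris$Species)]
```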


njtierney/broomstick documentation built on Dec. 12, 2023, 5:08 a.m.