myRF: A Random Forest function


Description

This function builds a random forest classifier with the caret package, predicts classes for the unknown data set, and returns accuracy and error rates along with confusion matrices for the known and unknown data sets.

Usage

myRF(known, unknown, ctrl, grid, keeps, sampsize = NULL)

Arguments

known

A data frame with known classes (stored in a 'class' column), used to train the classifier that is applied to the unknown data set. Defaults to NULL.

unknown

A data set whose classes are considered unknown. Classes will be predicted for this data set. Defaults to NULL.

ctrl

A trainControl object from the caret package, for example trainControl(method = 'cv', number = 5). Defaults to NULL.

grid

A data frame of tuning-parameter values, passed as the tuneGrid argument of caret's train function. Defaults to NULL.
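For caret's "rf" method the only tuning parameter is mtry (the number of features sampled as split candidates at each node), so a tuneGrid is typically a one-column data frame. The sketch below assumes that myRF calls train with method = "rf"; the page does not state which caret method it uses, and the mtry values are illustrative only.

```r
# Hypothetical tuning grid: assumes caret method "rf", whose single
# tuning parameter is mtry (features tried at each split).
# The values are illustrative; mtry should lie between 1 and the number
# of predictors in 'keeps' (excluding 'class').
grid <- expand.grid(mtry = c(1, 2, 3))
print(grid)
```

Such a grid would then be passed as myRF(..., grid = grid).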

keeps

A character vector of feature (column) names to use in the model; it must include 'class'. Defaults to NULL.
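The keeps requirement amounts to selecting a subset of columns that always contains the response. A minimal sketch of that check and subsetting, using a hypothetical helper select_features (not part of varstar) and toy column names:

```r
# Hypothetical helper (not part of varstar): restrict a data set to the
# columns named in 'keeps', which must include the response column "class".
select_features <- function(data, keeps) {
  if (!"class" %in% keeps) {
    stop("'keeps' must include 'class'")
  }
  missing_cols <- setdiff(keeps, names(data))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps a data frame even if only one column is selected
  data[, keeps, drop = FALSE]
}

# Toy data with a 'class' column and three illustrative features
toy <- data.frame(class = c("a", "b"), f1 = 1:2, f2 = 3:4, f3 = 5:6)
subset_toy <- select_features(toy, c("class", "f1", "f3"))
print(names(subset_toy))
```

Failing early with a clear message is a design choice for the sketch; how myRF itself reacts to a missing 'class' entry is not documented here.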

sampsize

A vector of per-class sample sizes, passed to the sampsize argument of the random forest. Defaults to NULL.
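The randomForest package documents that, for classification, a sampsize vector with one entry per stratum (here, per class) gives the number of cases drawn from each stratum for every tree. One common choice, shown below as an assumption rather than the author's method, is to draw the size of the smallest class from every class (down-sampling). The helper balanced_sampsize is hypothetical, and the entries are ordered by the class factor levels.

```r
# Hypothetical helper: build a balanced per-class sampsize vector by
# drawing the size of the smallest class from every class (down-sampling).
balanced_sampsize <- function(classes) {
  counts <- table(classes)                      # cases per class, in level order
  sizes <- rep(min(counts), length(counts))     # same draw size for each class
  names(sizes) <- names(counts)
  sizes
}

# Toy labels: 4 of "a", 2 of "b", 3 of "c"
cls <- factor(c("a", "a", "a", "a", "b", "b", "c", "c", "c"))
print(balanced_sampsize(cls))
```

With these toy labels every class contributes 2 cases per tree, the size of class "b".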

Value

A list containing the following components:

x$model = Random forest model object (created with the caret package).

x$classPred = Predicted class values for the unknown data set.

x$conf_matrix_known = Confusion matrix for the cross-validated model (on the training set).

x$result = Accuracy of the training model.

x$unknown.error = Error rate of the model on the unknown data set.

x$conf_matrix_unknown = Confusion matrix for the model's predictions on the unknown data set.
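Accuracy, error rate, and the confusion matrix are related in a simple way: accuracy is the share of counts on the matrix diagonal, and the error rate is its complement. Computing an error rate on the unknown set presupposes that its true classes are available, which matches the example's requirement that both data sets carry a 'class' column. The sketch below uses made-up labels and base R's table(); it is not the package's internal code.

```r
# Illustrative only: hypothetical true and predicted labels for an
# "unknown" set whose classes are available for scoring.
truth <- factor(c("a", "a", "b", "b", "b", "c"), levels = c("a", "b", "c"))
pred  <- factor(c("a", "b", "b", "b", "c", "c"), levels = c("a", "b", "c"))

# Confusion matrix: rows are predictions, columns are true classes
# (the same Prediction-by-Reference orientation caret's confusionMatrix uses)
conf <- table(Predicted = pred, Actual = truth)
print(conf)

# Accuracy is the share of counts on the diagonal; error rate is its complement
accuracy <- sum(diag(conf)) / sum(conf)
error_rate <- 1 - accuracy
print(c(accuracy = accuracy, error = error_rate))
```

Here 4 of the 6 predictions fall on the diagonal, so the accuracy is 4/6 and the error rate is 2/6.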

Author(s)

Jennifer Starling

Examples

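## Load caret, which provides trainControl().
library(caret)
## Define ctrl object: 5-fold cross-validation without class probabilities.
ctrl <- trainControl(method = 'cv', number = 5, classProbs = FALSE)
## Define list of features to keep, including 'class' as the first feature.
features <- c('class', 'feature1', 'feature2', 'feature3')
## Known and unknown data sets must contain a 'class' column
## (labeled_data_set and unlabeled_data_set are placeholders for your own data).
model <- myRF(known = labeled_data_set, unknown = unlabeled_data_set, ctrl = ctrl, keeps = features)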
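The example uses placeholder data sets. One way to build stand-ins for experimentation, assuming nothing about the variable-star data the package targets, is to split the built-in iris data into a labeled part and a held-out part, renaming Species to 'class'. The 30% hold-out fraction and the chosen feature names are arbitrary illustrative choices.

```r
# Hypothetical stand-ins for labeled_data_set / unlabeled_data_set,
# built from the built-in iris data. Both keep a 'class' column.
set.seed(1)
toy <- iris
names(toy)[names(toy) == "Species"] <- "class"

# Hold out 30% of rows (45 of 150) as the "unknown" set
unknown_rows <- sample(nrow(toy), size = round(0.3 * nrow(toy)))
labeled_data_set   <- toy[-unknown_rows, ]
unlabeled_data_set <- toy[unknown_rows, ]

# Candidate feature list for 'keeps', with 'class' first
features <- c("class", "Sepal.Length", "Sepal.Width", "Petal.Length")
```

These objects could then be passed to myRF in place of the placeholders above.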

jstarling1/varstar documentation built on May 20, 2019, 2:12 a.m.