This function builds a random forest classifier, predicts class values for the unknown data set, and returns error rates and confusion matrices for the known and unknown data sets.
known: A data set with known classes, used to classify the unknown data set. Defaults to NULL.
unknown: A data set whose classes are considered unknown. Classes will be predicted for this data set. Defaults to NULL.
ctrl: A trainControl statement from the caret package. Defaults to NULL.
grid: A grid for the tuneGrid parameter in the train (caret) function. Defaults to NULL.
keeps: A vector of feature names to consider in the model (must include 'class'). Defaults to NULL.
samps: A vector of sample sizes by class for the sampsize random forest argument.
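The grid and samps arguments are the least obvious to construct, so here is a minimal sketch of what they might look like. It assumes the model is fit with caret's "rf" method, whose only tuning parameter is mtry. The labels vector, the mtry values, and the choice to balance every class down to the rarest one are all hypothetical and not taken from this package.

```r
# Hypothetical tuning grid: caret's "rf" method tunes one parameter, mtry
# (the number of features tried at each split).
grid <- expand.grid(mtry = c(2, 3, 4))

# Hypothetical class labels standing in for known$class.
labels <- factor(c("a", "a", "a", "a", "b", "b", "c", "c", "c"))

# One sample size per class, here balanced down to the rarest class
# (an illustrative choice, not a requirement of the function).
class_counts <- table(labels)
samps <- rep(min(class_counts), length(class_counts))
names(samps) <- names(class_counts)
```

Objects built this way would then be passed as the grid and samps arguments.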
A list containing the following components:
x$model = Random forest model object (created using the caret package).
x$classPred = Predicted class values for unknown data set.
x$conf_matrix_known = Confusion matrix for cross-validated model (on training set).
x$result = Accuracy for training model.
x$unknown.error = Error rate for applying model to unknown data.
x$conf_matrix_unknown = Confusion matrix for applying model to unknown data.
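To show how an error rate and a confusion matrix relate, the following base R sketch computes both from toy data. The actual and predicted vectors are hypothetical, and this is not necessarily how the function computes unknown.error or conf_matrix_unknown internally.

```r
# Toy data (hypothetical): true and predicted classes for six samples.
actual    <- factor(c("a", "a", "b", "b", "b", "a"), levels = c("a", "b"))
predicted <- factor(c("a", "b", "b", "b", "a", "a"), levels = c("a", "b"))

# Confusion matrix: rows are predicted classes, columns are true classes.
conf_matrix <- table(predicted = predicted, actual = actual)

# Error rate: the share of samples that fall off the diagonal.
error_rate <- 1 - sum(diag(conf_matrix)) / sum(conf_matrix)
```

Here four of the six predictions match the true class, so the error rate is one third.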
Jennifer Starling
## Load caret, which provides trainControl.
library(caret)
## Define ctrl object.
ctrl <- trainControl(method = 'cv', number = 5, classProbs = FALSE)
## Define list of features to keep, including 'class' as the first feature.
features <- c('class', 'feature1', 'feature2', 'feature3')
## Known and Unknown data sets must contain a 'class' column.
model <- myRF(known = labeled_data_set, unknown = unlabeled_data_set, ctrl = ctrl, keeps = features)