View source: R/RandomForestClassifier.R
RandomForestClassifier | R Documentation |
Trains a random forest classifier; if test data is provided, it also predicts the classification of the test data
RandomForestClassifier(TrainData, TrainCls, TestData, Names,
NumberOfTrees=500, VariableImportance=TRUE, Seed,
PlotIt=TRUE, Verbose=FALSE, ABCanalysis=FALSE, Fast=FALSE)
TrainData |
(1:n,1:d) matrix, data array of n cases with d variables (training data or full data) |
TrainCls |
[1:n] vector, class labels for the cases of TrainData |
TestData |
Optional, (1:m,1:d) matrix, data array of m cases with d variables of TestData |
Names |
Optional, (1:d) vector of variable names; if not given, column names are used |
NumberOfTrees |
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times |
VariableImportance |
Should importance of predictors be assessed? |
Seed |
Optional, set a seed for the random number generator to make the result reproducible |
PlotIt |
Whether to plot the error versus the number of trees |
Verbose |
Whether to show the results of the forest, default FALSE |
ABCanalysis |
Default FALSE. If TRUE, selects the indices of group A of the features with the highest accuracy values, computed by an ABC analysis; only used if VariableImportance==TRUE |
Fast |
Default FALSE: randomForest is used. If TRUE, randomForestSRC is used, but VariableImportance and ABCanalysis are set to FALSE because they are not implemented for this case |
Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification. The “local” (or casewise) variable importance is computed as follows: for classification, it is the increase in the percentage of times a case is OOB and misclassified when the variable is permuted. In more detail:
the prediction error on the out-of-bag portion of the TrainData is recorded (error rate for each tree). Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees and normalized by the standard deviation of the differences. If the standard deviation of the differences is zero for a variable, the division is not done (but the average is almost always zero in that case).
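The normalization step above can be illustrated with a small, self-contained sketch in base R. The per-tree OOB error-rate differences below are made-up illustration values, not output of the function:

```r
# Hypothetical per-tree differences: error(permuted) - error(original)
diff_per_tree <- c(0.02, 0.05, 0.03, 0.04)

m <- mean(diff_per_tree)  # average difference over all trees
s <- sd(diff_per_tree)    # standard deviation of the differences

# Normalize by the standard deviation, unless it is zero
importance <- if (s == 0) m else m / s
importance
```

A positive value means permuting the variable increased the OOB error, i.e. the variable carries predictive information.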
list V with
Classification |
[1:n], classification by randomForest of k clusters |
ImportancePerVariable |
[1:d,1:2], Importance of Variables by Gini and Accuracy |
ContingencyTable |
Contingency table comparing TrainCls with Classification |
ImportancePerClass |
Importance of Variables per Cluster j of k clusters of TrainCls |
Forest |
Object of randomForest |
MostImportantFeatures |
Indices of the first m most important features, defined by group A of the ABC analysis |
TestCls |
NULL if no TestData is given; otherwise [1:m] vector of k classes predicted for TestData |
Michael Thrun
Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.
library(FCPS)
data("Chainlink")
Data=Chainlink$Data
Cls=Chainlink$Cls
split=Classifiers::splitquoted(Data,Cls,Percentage = 80)
out=RandomForestClassifier(split$TrainData,TrainCls = split$TrainCls,TestData = split$TestData)
table(out$TestCls,split$TestCls)
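The returned ImportancePerVariable matrix ([1:d,1:2], Gini and accuracy importance) can be used to rank features. The following self-contained sketch uses a synthetic importance matrix for illustration; it is not real output of RandomForestClassifier:

```r
# Synthetic stand-in for out$ImportancePerVariable ([1:d, 1:2] matrix)
imp <- matrix(c(0.9, 0.4, 0.7,    # column 1: Gini importance
                0.8, 0.1, 0.5),   # column 2: accuracy importance
              ncol = 2,
              dimnames = list(c("X", "Y", "Z"), c("Gini", "Accuracy")))

# Rank variables by accuracy importance, highest first
ranked <- rownames(imp)[order(imp[, "Accuracy"], decreasing = TRUE)]
ranked  # "X" "Z" "Y"
```

With ABCanalysis=TRUE, MostImportantFeatures returns the indices of the top (group A) features directly, so manual ranking like this is only needed for custom cutoffs.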