RandomForestClassifier: Random Forest Classifier

View source: R/RandomForestClassifier.R

RandomForestClassifier R Documentation

Random Forest Classifier

Description

Trains a random forest classifier and, if test data are given, also predicts the classification of the test data.

Usage

RandomForestClassifier(TrainData, TrainCls, TestData, Names,
NumberOfTrees=500, VariableImportance=TRUE, Seed,
PlotIt=TRUE, Verbose=FALSE, ABCanalysis=FALSE, Fast=FALSE)

Arguments

TrainData

(1:n,1:d) matrix, data array of n cases with d variables (TrainData, or the full data)

TrainCls

(1:n) vector, array of class labels of the n cases in TrainData

TestData

Optional, (1:m,1:d) matrix, data array of m cases with d variables of TestData

Names

Optional, (1:d) vector, array of variable names; if not given, colnames are used

NumberOfTrees

Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times

VariableImportance

Should importance of predictors be assessed?

Seed

Set a seed for the randomization

PlotIt

Whether to plot the error versus the number of trees

Verbose

Whether to show the results of the forest, default FALSE

ABCanalysis

Default FALSE. If TRUE, select the indices of group A of the features with the highest accuracy values as computed by ABCanalysis; only applies if VariableImportance==TRUE

Fast

Default FALSE: randomForest is used. If TRUE, randomForestSRC is used, but VariableImportance and ABCanalysis are set to FALSE because they are not implemented for this case.

Details

Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification. The “local” (or casewise) variable importance is computed as follows: for classification, it is the increase in the percentage of times a case is OOB and misclassified when the variable is permuted. In more detail:

The prediction error on the out-of-bag portion of the TrainData is recorded (error rate for each tree). Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done (but the average is almost always equal to 0 in that case).
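The computation above can be sketched in base R. This is a minimal illustration of permutation importance, not the package's implementation: a toy threshold "stump" stands in for a real tree, and all data and variable names here are invented for the example.

```r
# Sketch of permutation importance over bootstrap "trees" (assumption:
# one stump per bootstrap sample; real random forests grow full trees).
set.seed(1)
n <- 200
X <- cbind(signal = rnorm(n), noise = rnorm(n))
y <- as.integer(X[, "signal"] > 0)            # class depends only on "signal"
ntree <- 25
diffs <- matrix(NA_real_, ntree, ncol(X), dimnames = list(NULL, colnames(X)))
for (t in seq_len(ntree)) {
  bag <- sample(n, n, replace = TRUE)         # bootstrap sample for this "tree"
  oob <- setdiff(seq_len(n), bag)             # out-of-bag cases
  cut <- mean(X[bag, "signal"])               # "fit" the stump on the bag
  pred <- function(M) as.integer(M[, "signal"] > cut)
  err <- mean(pred(X[oob, , drop = FALSE]) != y[oob])   # OOB error rate
  for (j in seq_len(ncol(X))) {
    Xp <- X[oob, , drop = FALSE]
    Xp[, j] <- sample(Xp[, j])                # permute predictor j
    diffs[t, j] <- mean(pred(Xp) != y[oob]) - err       # increase in error
  }
}
s <- apply(diffs, 2, sd)
importance <- colMeans(diffs) / ifelse(s > 0, s, 1)     # skip division if sd == 0
```

As expected, permuting the informative variable raises the OOB error, while permuting the irrelevant one leaves it unchanged, so its importance stays at 0.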

Value

list V with

Classification

[1:n], classification by randomForest of k clusters

ImportancePerVariable

[1:d,1:2], Importance of Variables by Gini and Accuracy

ContingencyTable

Contingency table comparing TrainCls to Classification

ImportancePerClass

Importance of Variables per Cluster j of k clusters of TrainCls

Forest

Object of randomForest

MostImportantFeatures

Index of the first m most-important features, defined by group A of the ABCanalysis

TestCls

NULL if no TestData is given; otherwise, [1:m] vector of k classes

Author(s)

Michael Thrun

References

Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.

Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.

Examples

library(FCPS)
data("Chainlink")
Data=Chainlink$Data
Cls=Chainlink$Cls
split=Classifiers::splitquoted(Data,Cls,Percentage = 80)
out=RandomForestClassifier(split$TrainData,TrainCls = split$TrainCls,TestData = split$TestData)
table(out$TestCls,split$TestCls)
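If the FCPS data or the split helper is unavailable, an equivalent 80/20 train/test split can be sketched in base R; the Data and Cls below are synthetic placeholders, not the Chainlink set:

```r
set.seed(42)
Data <- matrix(rnorm(100 * 3), ncol = 3)             # placeholder data, 100 cases
Cls  <- sample(1:2, 100, replace = TRUE)             # placeholder class labels
idx  <- sample(nrow(Data), size = 0.8 * nrow(Data))  # 80% training indices
TrainData <- Data[idx, , drop = FALSE]
TrainCls  <- Cls[idx]
TestData  <- Data[-idx, , drop = FALSE]
TestCls   <- Cls[-idx]
```

The resulting objects can then be passed to RandomForestClassifier in place of the split list from the example above.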

Mthrun/Classifiers documentation built on June 28, 2023, 9:28 a.m.