RandomForestClassifier: Random Forest Classifier

View source: R/RandomForestClassifier.R

RandomForestClassifier R Documentation

Random Forest Classifier

Description

Trains a random forest classifier and, if test data are given, also predicts the classification of the test data.

Usage

RandomForestClassifier(TrainData, TrainCls, TestData, Names,
NumberOfTrees=500, VariableImportance=TRUE, Seed,
PlotIt=TRUE, Verbose=FALSE, ABCanalysis=FALSE, Fast=FALSE)

Arguments

TrainData

(1:n,1:d) matrix, data array of n cases with d variables (TrainData, or the full data)

TrainCls

(1:n) vector, array of class labels of the n cases in TrainData

TestData

Optional, (1:m,1:d) matrix, data array of m cases with d variables of TestData

Names

Optional, (1:d) vector, array of variable names; if not given, colnames are used

NumberOfTrees

Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times

VariableImportance

Should importance of predictors be assessed?

Seed

Set a seed for the randomization

PlotIt

Whether to plot the error versus the number of trees

Verbose

Whether to show the results of the forest, default FALSE

ABCanalysis

Default FALSE. If TRUE, select the indices of group A of the features with the highest accuracy values as computed by ABCanalysis; only applies if VariableImportance==TRUE

Fast

Default FALSE: randomForest is used. If TRUE, randomForestSRC is used, but VariableImportance and ABCanalysis are set to FALSE because they are not implemented for this case.

Details

Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification. The “local” (or casewise) variable importance is computed as follows: for classification, it is the increase in the percentage of times a case is OOB and misclassified when the variable is permuted. In more detail:

The prediction error on the out-of-bag portion of the TrainData is recorded (error rate for each tree). Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done (but the average is almost always equal to 0 in that case).
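The computation above can be sketched in base R. This is a minimal illustration of permutation importance, not the package's implementation: a toy threshold "stump" stands in for a real tree, and all data and variable names here are invented for the example.

```r
# Sketch of permutation importance over bootstrap "trees" (assumption:
# one stump per bootstrap sample; real random forests grow full trees).
set.seed(1)
n <- 200
X <- cbind(signal = rnorm(n), noise = rnorm(n))
y <- as.integer(X[, "signal"] > 0)            # class depends only on "signal"
ntree <- 25
diffs <- matrix(NA_real_, ntree, ncol(X), dimnames = list(NULL, colnames(X)))
for (t in seq_len(ntree)) {
  bag <- sample(n, n, replace = TRUE)         # bootstrap sample for this "tree"
  oob <- setdiff(seq_len(n), bag)             # out-of-bag cases
  cut <- mean(X[bag, "signal"])               # "fit" the stump on the bag
  pred <- function(M) as.integer(M[, "signal"] > cut)
  err <- mean(pred(X[oob, , drop = FALSE]) != y[oob])   # OOB error rate
  for (j in seq_len(ncol(X))) {
    Xp <- X[oob, , drop = FALSE]
    Xp[, j] <- sample(Xp[, j])                # permute predictor j
    diffs[t, j] <- mean(pred(Xp) != y[oob]) - err       # increase in error
  }
}
s <- apply(diffs, 2, sd)
importance <- colMeans(diffs) / ifelse(s > 0, s, 1)     # skip division if sd == 0
```

As expected, permuting the informative variable raises the OOB error, while permuting the irrelevant one leaves it unchanged, so its importance stays at 0.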

Value

list V with

Classification

[1:n], classification by randomForest of k clusters

ImportancePerVariable

[1:d,1:2], Importance of Variables by Gini and Accuracy

ContingencyTable

Contingency table comparing TrainCls to Classification

ImportancePerClass

Importance of Variables per Cluster j of k clusters of TrainCls

Forest

Object of randomForest

MostImportantFeatures

Index of the first m most-important features, defined by group A of the ABCanalysis

TestCls

NULL if no TestData is given; otherwise, [1:m] vector of k classes

Author(s)

Michael Thrun

References

Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.

Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.

Examples

library(FCPS)
data("Chainlink")
Data=Chainlink$Data
Cls=Chainlink$Cls
split=Classifiers::splitquoted(Data,Cls,Percentage = 80)
out=RandomForestClassifier(split$TrainData,TrainCls = split$TrainCls,TestData = split$TestData)
table(out$TestCls,split$TestCls)
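If the FCPS data or the split helper is unavailable, an equivalent 80/20 train/test split can be sketched in base R; the Data and Cls below are synthetic placeholders, not the Chainlink set:

```r
set.seed(42)
Data <- matrix(rnorm(100 * 3), ncol = 3)             # placeholder data, 100 cases
Cls  <- sample(1:2, 100, replace = TRUE)             # placeholder class labels
idx  <- sample(nrow(Data), size = 0.8 * nrow(Data))  # 80% training indices
TrainData <- Data[idx, , drop = FALSE]
TrainCls  <- Cls[idx]
TestData  <- Data[-idx, , drop = FALSE]
TestCls   <- Cls[-idx]
```

The resulting objects can then be passed to RandomForestClassifier in place of the split list from the example above.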

Mthrun/Classifiers documentation built on June 28, 2023, 9:28 a.m.