rrfc2: Roughenen Random Forests - C2 (RRFC2)

Description Usage Arguments Value Author(s) References See Also Examples

Description

RRFC2 algorithm

1.Impose missing values under the mechanism of missing completely at random on all covariates of the training dataset.

2.Impute the missing values in a continuous variable by its minimum value and impute the missing values in a categorical variable by its mode value (Minimum-value /mode imputation).

3.Build one tree in random forests using the above imputed training dataset, and then use it to predict the binary outcomes in the original testing dataset.

4.Repeat 1 to 3 for number.trees times.

Usage

1
rrfc2(dat, yvar = ncol(dat), tr, te, mispct, number.trees)

Arguments

dat

A data frame containing both training and testing datasets

yvar

The column number of the binary outcome variable, a factor variable. The default value is set as ncol(dat)

tr

Row numbers of all training data

te

Row numbers of all testing data

mispct

Rate of missing data, ranging from 0 to 1

number.trees

Number of trees used in roughened random forests

Value

A prediction matrix. Each column shows the predicted values by a single tree. Each row is sequentially associated with the observations in the testing dataset. Each cell value is either 0 or 1.

Author(s)

Kuangnan Xiong

References

Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.

Liaw, A. & Wiener, M., 2002. Classification and regression by randomForest. R News, 2(3), pp. 18-22.

Xiong, Kuangnan. "Roughened Random Forests for Binary Classification." PhD diss., State University of New York at Albany, 2014.

See Also

rrfa, rrfb, rrfc1,rrfc3, rrfc4, rrfc5, rrfc6, rrfc7, rrfd, rrfe

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
if(require(MASS)){
if(require(caTools)){

dat=rbind(Pima.tr,Pima.te)
number.trees=50
#number.trees=500
tr=1:200
te=201:532
mispct=0.2
yvar=ncol(dat)
  
#AUC value for the testing dataset based on the original random forests
rf=randomForest(dat[tr,-yvar],dat[tr,yvar],dat[te,-yvar],ntree=number.trees)
print(colAUC(rf$test$votes[,2],dat[te,yvar]))

#AUC value for the testing dataset based on RRFC2
pred.rrfc2=rrfc2(dat,yvar,tr,te,mispct,number.trees)
print(colAUC(apply(pred.rrfc2$pred,1,mean),dat[te,yvar]))

}}


Search within the roughrf package
Search all R packages, documentation and source code

Questions? Problems? Suggestions? or email at ian@mutexlabs.com.

Please suggest features or report bugs with the GitHub issue tracker.

All documentation is copyright its authors; we didn't write any of that.