rrfc2: Roughened Random Forests - C2 (RRFC2)
In roughrf: Roughened Random Forests for Binary Classification


RRFC2 algorithm

1. Impose missing values, under a missing-completely-at-random (MCAR) mechanism, on all covariates of the training dataset.

2. Impute the missing values of each continuous variable with its minimum value, and those of each categorical variable with its mode (minimum-value/mode imputation).

3. Build one tree of the random forest on the imputed training dataset, then use it to predict the binary outcomes in the original testing dataset.

4. Repeat steps 1 to 3 number.trees times.
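Steps 1 and 2 can be illustrated with a minimal base R sketch. It is not the package's source code: `roughen_once` is a hypothetical helper, and it assumes that each cell goes missing independently with probability `mispct`, that covariates are numeric or factor columns without pre-existing missing values, and that the minimum and mode are taken over the values that remain observed. Step 3 (growing a tree on the result) is left out so the sketch needs no extra packages.

```r
# Hypothetical helper (not part of roughrf): one pass of steps 1 and 2.
roughen_once <- function(x, mispct) {
  stopifnot(is.data.frame(x), mispct >= 0, mispct <= 1)
  for (j in seq_along(x)) {
    col <- x[[j]]
    # Step 1 (assumed form of MCAR): each cell is flagged missing
    # independently with probability mispct.
    miss <- runif(length(col)) < mispct
    # Skip if nothing was flagged, or if no observed value is left to impute from.
    if (!any(miss) || all(miss)) next
    if (is.numeric(col)) {
      # Step 2, continuous: fill with the minimum of the observed values.
      col[miss] <- min(col[!miss])
    } else {
      # Step 2, categorical: fill with the most frequent observed level.
      tab <- table(col[!miss])
      col[miss] <- names(tab)[which.max(tab)]
    }
    x[[j]] <- col
  }
  x
}

# Toy covariates, invented for illustration only.
set.seed(1)
toy <- data.frame(age = c(21, 35, 48, 52, 60, 33),
                  grp = factor(c("a", "b", "b", "a", "b", "b")))
rough <- roughen_once(toy, mispct = 0.3)
```

Because a fresh set of cells is flagged on every call, each tree in step 3 would see a differently roughened copy of the training covariates.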

rrfc2(dat, yvar = ncol(dat), tr, te, mispct, number.trees)

`dat`	A data frame containing both training and testing datasets
`yvar`	Column number of the binary outcome variable, which must be a factor. Defaults to ncol(dat)
`tr`	Row numbers of all training data
`te`	Row numbers of all testing data
`mispct`	Rate of missing data imposed on the covariates, ranging from 0 to 1
`number.trees`	Number of trees used in roughened random forests
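Since `tr` and `te` are plain row-number vectors, a random split can be built with base R alone. The sketch below is only one possible way to do it; the seed, the row count of 532 and the training size of 200 are illustrative assumptions chosen to match the Pima data used in the example on this page.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
n  <- 532                             # stands in for nrow(dat)
tr <- sort(sample(n, size = 200))     # row numbers of the training data
te <- setdiff(seq_len(n), tr)         # all remaining rows form the testing data
```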

A prediction matrix. Each column holds the values predicted by a single tree. Each row corresponds, in order, to an observation in the testing dataset. Each cell value is either 0 or 1.
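A matrix of this shape is usually collapsed into one score per test observation by averaging across trees. The sketch below uses a small made-up matrix (5 test rows, 4 trees) in place of real output; `pred`, `vote_share` and `pred_class` are illustrative names, and the 0.5 cut-off is an assumption rather than something the package prescribes.

```r
# Made-up stand-in for a prediction matrix: rows = test observations, columns = trees.
pred <- matrix(c(1, 1, 0, 1,
                 0, 0, 0, 1,
                 1, 1, 1, 1,
                 0, 1, 0, 0,
                 0, 0, 0, 0),
               nrow = 5, byrow = TRUE)

# Fraction of trees voting 1 for each test observation.
vote_share <- rowMeans(pred)

# Majority vote; with this rule an exact tie is assigned to class 0.
pred_class <- as.integer(vote_share > 0.5)
```

The vote share can be passed to an AUC routine as a continuous score, while the majority vote gives a hard class label.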

Kuangnan Xiong

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.

Xiong, K. (2014). Roughened random forests for binary classification (PhD dissertation). State University of New York at Albany.

rrfa, rrfb, rrfc1, rrfc3, rrfc4, rrfc5, rrfc6, rrfc7, rrfd, rrfe

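# MASS supplies the Pima data, caTools supplies colAUC(),
# and randomForest supplies the baseline forest.
if (require(MASS) && require(caTools) && require(randomForest)) {

  # Stack the Pima training (200 rows) and testing (332 rows) sets
  dat <- rbind(Pima.tr, Pima.te)
  number.trees <- 50
  # number.trees <- 500
  tr <- 1:200
  te <- 201:532
  mispct <- 0.2
  yvar <- ncol(dat)

  # AUC value for the testing dataset based on the original random forests
  rf <- randomForest(dat[tr, -yvar], dat[tr, yvar], dat[te, -yvar], ntree = number.trees)
  print(colAUC(rf$test$votes[, 2], dat[te, yvar]))

  # AUC value for the testing dataset based on RRFC2
  # (row means give the share of trees voting 1 for each test observation)
  pred.rrfc2 <- rrfc2(dat, yvar, tr, te, mispct, number.trees)
  print(colAUC(apply(pred.rrfc2$pred, 1, mean), dat[te, yvar]))

}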