Roughenen Random Forests - C2 (RRFC2)

Description

RRFC2 algorithm

1.Impose missing values under the mechanism of missing completely at random on all covariates of the training dataset.

2.Impute the missing values in a continuous variable by its minimum value and impute the missing values in a categorical variable by its mode value (Minimum-value /mode imputation).

3.Build one tree in random forests using the above imputed training dataset, and then use it to predict the binary outcomes in the original testing dataset.

4.Repeat 1 to 3 for number.trees times.

Usage

1
rrfc2(dat, yvar = ncol(dat), tr, te, mispct, number.trees)

Arguments

dat

A data frame containing both training and testing datasets

yvar

The column number of the binary outcome variable, a factor variable. The default value is set as ncol(dat)

tr

Row numbers of all training data

te

Row numbers of all testing data

mispct

Rate of missing data, ranging from 0 to 1

number.trees

Number of trees used in roughened random forests

Value

A prediction matrix. Each column shows the predicted values by a single tree. Each row is sequentially associated with the observations in the testing dataset. Each cell value is either 0 or 1.

Author(s)

Kuangnan Xiong

References

Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.

Liaw, A. & Wiener, M., 2002. Classification and regression by randomForest. R News, 2(3), pp. 18-22.

Xiong, Kuangnan. "Roughened Random Forests for Binary Classification." PhD diss., State University of New York at Albany, 2014.

See Also

rrfa, rrfb, rrfc1,rrfc3, rrfc4, rrfc5, rrfc6, rrfc7, rrfd, rrfe

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
if(require(MASS)){
if(require(caTools)){

dat=rbind(Pima.tr,Pima.te)
number.trees=50
#number.trees=500
tr=1:200
te=201:532
mispct=0.2
yvar=ncol(dat)
  
#AUC value for the testing dataset based on the original random forests
rf=randomForest(dat[tr,-yvar],dat[tr,yvar],dat[te,-yvar],ntree=number.trees)
print(colAUC(rf$test$votes[,2],dat[te,yvar]))

#AUC value for the testing dataset based on RRFC2
pred.rrfc2=rrfc2(dat,yvar,tr,te,mispct,number.trees)
print(colAUC(apply(pred.rrfc2$pred,1,mean),dat[te,yvar]))

}}