plot.varSelRFBoot: plot a varSelRFBoot object
In varSelRF: Variable Selection using Random Forests


View source: R/varSelRF.R

Plots of out-of-bag predictions and OOB error vs. number of variables.

## S3 method for class 'varSelRFBoot'
plot(x, oobProb = TRUE,
     oobProbBoxPlot = FALSE,
     ErrorNum = TRUE,
     subject.names = NULL,
     class.to.plot = NULL, ...)

`x`	An object of class varSelRFBoot, such as that returned by the function `varSelRFBoot`.
`oobProb`	If TRUE, plot the (average) out-of-bag predictions. See `prob.predictions` in `varSelRFBoot` for more details about the out-of-bag predictions.
`oobProbBoxPlot`	If TRUE, plot a box-plot of the out-of-bag predictions.
`ErrorNum`	If TRUE, plot the OOB error (as returned by random forest) vs. the number of variables.
`subject.names`	If not NULL, a vector of the same length as the number of cases (samples or subjects), giving an ID for each case. The IDs are shown to the left of the average out-of-bag prediction.
`class.to.plot`	If not NULL, an integer or a vector of integers giving the class levels for which out-of-bag prediction plots will be produced.
`...`	Not used.

This function is only used for its side effects of producing plots.
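Because the plots are drawn on the active graphics device rather than returned, one way to keep them is to open a file device before calling `plot`. The sketch below uses a hypothetical helper, `save_plots` (not part of varSelRF), that wraps any `plot()` call in a PDF device. With a `varSelRFBoot` object, S3 dispatch would be expected to route the call to `plot.varSelRFBoot`. A plain numeric vector stands in for the fitted object here so that the snippet runs without the package.

```r
# Hypothetical helper, not part of varSelRF: send every plot produced by
# plot(obj, ...) to a single PDF file, one page per plot.
save_plots <- function(obj, file, ...) {
  grDevices::pdf(file)
  on.exit(grDevices::dev.off(), add = TRUE)  # close the device even on error
  plot(obj, ...)
  invisible(file)
}

# Stand-in object; replace 1:10 with a varSelRFBoot object in real use.
out <- save_plots(1:10, tempfile(fileext = ".pdf"))
file.exists(out)
```

Closing the device in `on.exit` means a failed plot call does not leave a dangling device open.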

The OOB error rate is biased downward (possibly severely) because we carry out (potentially many) rounds of reducing the set of predictor variables until this OOB error rate is minimized. Note, however, that this is NOT the error rate reported as the estimate of the error rate for the procedure; for that estimate we use the .632+ bootstrap rule.
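To make the .632+ rule concrete, here is a minimal sketch of the arithmetic in Efron and Tibshirani (1997), fed with the resubstitution error (0) and the leave-one-out bootstrap error (0.1713333) from the example output on this page. The function name `err632plus` is made up for illustration, and the sketch covers only the case where the leave-one-out error lies between the resubstitution error and the no-information rate. The no-information rate `gamma` is an assumption: it is computed here as if the predicted class proportions equalled the observed ones (10/25 and 15/25), which may not be how varSelRF obtains it, and it is not the "Error rate at random" value printed in the example output.

```r
# .632+ rule (Efron and Tibshirani, 1997), written out for illustration.
# Sketch only: restricted to resub <= loo <= gamma, where none of the
# edge-case adjustments described in the paper are needed.
err632plus <- function(resub, loo, gamma) {
  stopifnot(resub <= loo, loo <= gamma, gamma > resub)
  R <- (loo - resub) / (gamma - resub)   # relative overfitting rate
  w <- 0.632 / (1 - 0.368 * R)           # weight on the leave-one-out error
  (1 - w) * resub + w * loo
}

p <- c(A = 10, B = 15) / 25   # observed class proportions in the example data
gamma <- sum(p * (1 - p))     # ASSUMPTION: predicted proportions equal p
est <- err632plus(resub = 0, loo = 0.1713333, gamma = gamma)
round(est, 4)
```

Under these assumptions the result comes out near 0.1247, in line with the .632+ estimate shown in the example output.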

When plotting the out-of-bag predictions, we show one plot for each class. This is overkill for two-class problems, but not necessarily for problems with more than two classes. Use `class.to.plot` to plot only the classes that interest you.
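A sketch of how the plotting arguments might be combined, reusing the simulated data from this page's example. The subject IDs in `ids` are made up for illustration, and the fitting and plotting code is guarded so that the snippet does nothing if varSelRF is not installed.

```r
ids <- sprintf("subj%02d", 1:25)   # hypothetical IDs, one per case

if (requireNamespace("varSelRF", quietly = TRUE)) {
  library(varSelRF)

  # Same simulated two-class data as in the example on this page.
  x <- matrix(rnorm(25 * 30), ncol = 30)
  x[1:10, 1:2] <- x[1:10, 1:2] + 2
  cl <- factor(c(rep("A", 10), rep("B", 15)))

  rf.vs1 <- varSelRF(x, cl, ntree = 200, ntreeIterat = 100,
                     vars.drop.frac = 0.2)
  rf.vsb <- varSelRFBoot(x, cl, bootnumber = 10,
                         usingCluster = FALSE, srf = rf.vs1)

  # Draw only the out-of-bag prediction plot for the first class level
  # (presumably "A" here), labelled with the subject IDs, and skip the
  # OOB-error-vs-number-of-variables plot.
  plot(rf.vsb, oobProb = TRUE, ErrorNum = FALSE,
       subject.names = ids, class.to.plot = 1)
}
```

For a two-class problem the plot for the second class carries the same information, so restricting `class.to.plot` to one level halves the number of panels.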

Ramon Diaz-Uriarte rdiaz02@gmail.com

Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.

Diaz-Uriarte, R. and Alvarez de Andres, S. (2005) Variable selection from random forests: application to gene expression data. Tech. report. http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html

Efron, B. and Tibshirani, R. J. (1997) Improvements on cross-validation: the .632+ bootstrap method. Journal of the American Statistical Association, 92, 548–560.

randomForest, varSelRF, summary.varSelRFBoot, varSelRFBoot

## Not run: 
## This is a small example, but can take some time.

x <- matrix(rnorm(25 * 30), ncol = 30)
x[1:10, 1:2] <- x[1:10, 1:2] + 2
cl <- factor(c(rep("A", 10), rep("B", 15)))  

rf.vs1 <- varSelRF(x, cl, ntree = 200, ntreeIterat = 100,
                   vars.drop.frac = 0.2)
rf.vsb <- varSelRFBoot(x, cl,
                       bootnumber = 10,
                       usingCluster = FALSE,
                       srf = rf.vs1)
rf.vsb
summary(rf.vsb)
plot(rf.vsb)

## End(Not run)

Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
Loading required package: parallel
Warning in varSelRFBoot(x, cl, bootnumber = 10, usingCluster = FALSE, srf = rf.vs1) :
  Using as ntree and mtryFactor the parameters obtained from srf

      Running bootstrap iterations..........

     .632+ prediction error  0.1247 


 Variable selection with random forest 
 ------------------------------

 Variables used 
[1] "v1" "v2" "v8"

 
 Number of variables used:  3 


 Bootstrap results
 ------------------

 Bootstrap (.632+) estimate of prediction error: 
  (using 10 bootstrap iterations): 
 0.1246571

 Number of vars in bootstrapped forests:        
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    2.0     2.0     2.0     2.4     3.0     3.0 


 Variable selection using all data 
 ------------------------------

 
 variables used 
[1] "v1" "v2" "v8"

 
 Number of variables used:  3 


 Bootstrap results
 ------------------


 Bootstrap (.632+) estimate of prediction error: 0.1246571  (using 10 bootstrap iterations).


 Resubstitution error:                           0 


 Leave-one-out bootstrap error:                  0.1713333 


 Error rate at random:                           0.4 


 Number of vars in bootstrapped forests:        
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    2.0     2.0     2.0     2.4     3.0     3.0 


 Overlapp of bootstrapped forests with forest from all data
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4082  0.6667  0.8165  0.7416  0.8165  1.0000 


 Variable freqs. in bootstrapped models 

 v8  v2  v1 v14 v16 v29  v4 
0.9 0.6 0.5 0.1 0.1 0.1 0.1 


 Variable freqs. of variables in forest from all data, and summary 

 v1  v2  v8 
0.5 0.6 0.9 

Number of cases in table: 2 
Number of factors: 1 


 Mean class membership probabilities from out of bag samples
            A          B
1  0.87000000 0.13000000
2  0.70500000 0.29500000
3  0.94833333 0.05166667
4  0.95333333 0.04666667
5  0.76750000 0.23250000
6  0.66600000 0.33400000
7  0.86285714 0.13714286
8  0.23600000 0.76400000
9  0.29666667 0.70333333
10 0.32000000 0.68000000
11 0.03500000 0.96500000
12 0.40750000 0.59250000
13 0.21000000 0.79000000
14 0.17750000 0.82250000
15 0.01500000 0.98500000
16 0.05285714 0.94714286
17 0.04500000 0.95500000
18 0.01625000 0.98375000
19 0.06400000 0.93600000
20 0.42375000 0.57625000
21 0.66500000 0.33500000
22 0.33000000 0.67000000
23 0.10000000 0.90000000
24 0.24250000 0.75750000
25 0.32500000 0.67500000