rf.crossValidation: Random Forest Classification or Regression Model...

Description

Implements a permutation-test cross-validation for Random Forests models.

Usage

rf.crossValidation(x, xdata, ydata = NULL, p = 0.1, n = 99,
  seed = NULL, normalize = FALSE, bootstrap = FALSE, trace = FALSE,
  ...)

Arguments

x

random forest object

xdata

x data used in model

ydata

optional y data used in model, default is to use x$y from model object

p

Proportion of data to withhold (default p=0.10)

n

Number of cross validations (default n=99)

seed

Sets random seed in R global environment

normalize

(FALSE/TRUE) For regression, should RMSE, MBE and MAE be normalized using (max(y) - min(y))?

bootstrap

(FALSE/TRUE) Should bootstrap sampling be applied? If FALSE, the proportion p of the data is withheld instead

trace

Print iterations

...

Additional arguments passed to Random Forests
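The difference between the p withhold and the bootstrap option can be illustrated with plain index sampling. This is a minimal sketch in base R, not the function's internal code: the names n.obs, test.idx, train.idx, boot.idx and oob.idx are hypothetical, and treating the unsampled rows as the bootstrap test set is an assumption.

```r
set.seed(42)          # reproducible draws
n.obs <- 150          # hypothetical number of rows in xdata
p <- 0.10             # proportion of data to withhold

# Proportion withhold: draw p * n.obs rows without replacement as the test set
test.idx  <- sample(seq_len(n.obs), size = round(p * n.obs))
train.idx <- setdiff(seq_len(n.obs), test.idx)

# Bootstrap (assumed scheme): resample rows with replacement for training;
# rows that were never drawn serve as the test set
boot.idx <- sample(seq_len(n.obs), size = n.obs, replace = TRUE)
oob.idx  <- setdiff(seq_len(n.obs), boot.idx)

length(test.idx)      # size of the withheld set
length(oob.idx)       # size of the bootstrap test set (varies by draw)
```

Repeating either draw n times, refitting the model on the training rows each time, gives the distribution of errors that the cross-validation summarizes.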

Details

For classification problems, the cross-validation statistics are based on the prediction error on the withheld data. Total observed accuracy represents the percent correctly classified (PCC) and is considered a naive measure of agreement. The diagonal of the confusion matrix represents correctly classified observations, whereas the off-diagonals represent cross-classification error. The primary issue with this evaluation is that it does not reveal whether error was evenly distributed between classes.

To represent the balance of error, one can use omission and commission statistics such as estimates of user's and producer's accuracy. User's accuracy corresponds to error of commission (inclusion): observations being erroneously included in a given class. The commission errors are represented by row sums of the matrix. Producer's accuracy corresponds to error of omission (exclusion): observations being erroneously excluded from a given class. The omission errors are represented by column sums of the matrix.

None of the previous statistics account for random agreement influencing the accuracy measure. The kappa statistic is a chance-corrected metric that reflects the difference between observed agreement and agreement expected by random chance. A kappa of k=0.85 would indicate that there is 85% better agreement than expected by chance alone.
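The classification statistics described above can be worked through on a small confusion matrix. The following is a minimal sketch in base R with made-up labels, not the package's implementation. It assumes rows hold predicted classes and columns hold observed classes. The object names (cm, pcc, users.accuracy, producers.accuracy, kappa.hat) are hypothetical.

```r
# Hypothetical observed and predicted labels for the withheld data
lev <- c("a", "b", "c")
observed  <- factor(c("a","a","a","a","b","b","b","c","c","c"), levels = lev)
predicted <- factor(c("a","a","a","b","b","b","c","c","c","c"), levels = lev)

# Confusion matrix: rows = predicted, columns = observed (assumed layout)
cm <- table(predicted = predicted, observed = observed)
n  <- sum(cm)

# Percent correctly classified (PCC): diagonal over total
pcc <- 100 * sum(diag(cm)) / n

# User's accuracy (commission): correct over row totals
users.accuracy <- 100 * diag(cm) / rowSums(cm)

# Producer's accuracy (omission): correct over column totals
producers.accuracy <- 100 * diag(cm) / colSums(cm)

# Cohen's kappa: observed agreement corrected for chance agreement
p.obs <- sum(diag(cm)) / n
p.exp <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa.hat <- (p.obs - p.exp) / (1 - p.exp)

round(c(pcc = pcc, kappa = kappa.hat), 3)
```

Here the PCC is 80 percent while kappa is noticeably lower, which shows how chance agreement inflates the naive accuracy measure.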

For regression problems, a bootstrap is constructed and the MSE and percent variance explained of the subset models are reported. Additionally, the RMSE between the withheld response variable (y) and the predictions of the subset model is reported.
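The regression error statistics, and the effect of the normalize argument, can be illustrated on a handful of values. This is a minimal sketch in base R with made-up numbers. The sign convention for MBE (predicted minus observed) is an assumption, and the variable names are hypothetical rather than taken from the package.

```r
# Hypothetical withheld response values and subset-model predictions
y     <- c(10, 20, 30, 40, 50)
y.hat <- c(12, 18, 33, 37, 55)

err  <- y.hat - y               # prediction errors (assumed sign convention)
rmse <- sqrt(mean(err^2))       # root mean squared error
mbe  <- mean(err)               # mean bias error
mae  <- mean(abs(err))          # mean absolute error

# Normalization as described for normalize = TRUE: divide by the range of y
y.range <- max(y) - min(y)
nrmse <- rmse / y.range
nmbe  <- mbe / y.range
nmae  <- mae / y.range

round(c(rmse = rmse, mbe = mbe, mae = mae, nrmse = nrmse), 4)
```

Dividing by the range of y puts the errors on a unitless scale, which makes them comparable across responses measured in different units.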

Value

For classification a "rf.cv", "classification" class object with the following components:

For regression a "rf.cv", "regression" class object with the following components:

Author(s)

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>

References

Evans, J.S. and S.A. Cushman (2009) Gradient Modeling of Conifer Species Using Random Forest. Landscape Ecology 5:673-683.

Murphy M.A., J.S. Evans, and A.S. Storfer (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252-261.

Evans J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling species distribution and change using Random Forests. Ch. 8 in Predictive Modeling in Landscape Ecology, eds. Drew C.A., Huettmann F., Wiersma Y. Springer.

See Also

randomForest for randomForest ... options

Examples

## Not run:
library(randomForest)
library(rfUtilities)  # provides rf.crossValidation

# For classification
data(iris)
iris$Species <- as.factor(iris$Species)
set.seed(1234)
( rf.mdl <- randomForest(iris[,1:4], iris[,"Species"], ntree=501) )
( rf.cv <- rf.crossValidation(rf.mdl, iris[,1:4], p=0.10, n=99, ntree=501) )

# Plot cross validation versus model producers accuracy
par(mfrow=c(1,2))
plot(rf.cv, type = "cv", main = "CV producers accuracy")
plot(rf.cv, type = "model", main = "Model producers accuracy")

# Plot cross validation versus model oob
par(mfrow=c(1,2))
plot(rf.cv, type = "cv", stat = "oob", main = "CV oob error")
plot(rf.cv, type = "model", stat = "oob", main = "Model oob error")

# For regression
data(airquality)
airquality <- na.omit(airquality)  # drop rows with missing values
rf.mdl <- randomForest(y=airquality[,"Ozone"], x=airquality[,2:4])
( rf.cv <- rf.crossValidation(rf.mdl, airquality[,2:4],
                              p=0.10, n=99, ntree=501) )

# Plot the default statistic, then MSE, variance explained and MAE
par(mfrow=c(2,2))
plot(rf.cv)
plot(rf.cv, stat = "mse")
plot(rf.cv, stat = "var.exp")
plot(rf.cv, stat = "mae")

## End(Not run)
  

Example output

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.

Call:
 randomForest(x = iris[, 1:4], y = iris[, "Species"], ntree = 501) 
               Type of random forest: classification
                     Number of trees: 501
No. of variables tried at each split: 2

        OOB estimate of  error rate: 5.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          5        45        0.10
running: classification cross-validation with 99 iterations 
Classification accuracy for cross-validation 
 
                   setosa versicolor virginica
users.accuracy        100        100       100
producers.accuracy    100        100        NA
 
Cross-validation Kappa = 0.9271 
Cross-validation OOB Error = 0.04861111 
Cross-validation error variance = 5.448834e-05 
 
 
Classification accuracy for model 
 
                   setosa versicolor virginica
users.accuracy        100       93.8      91.7
producers.accuracy    100       91.8      93.6
 
Model Kappa = 0.9271 
Model OOB Error = 0.04861111 
Model error variance = 3.19812e-05 
running: regression cross-validation with 99 iterations 
Fit MSE = 293.6456 
Fit percent variance explained = 73.08 
Median permuted MSE = 308.9923 
Median permuted percent variance explained = 72.38 
Median cross-validation RMSE = 14.93475 
Median cross-validation MBE = 0.5988576 
Median cross-validation MAE = 11.64054 
Range of ks p-values = 0.001349443 0.7989985 
Range of ks D statistic = 0.1818182 0.5454545 
RMSE cross-validation error variance = 42.1302 
MBE cross-validation error variance = 36.18507 
MAE cross-validation error variance = 13.35967 

rfUtilities documentation built on Oct. 3, 2019, 9:04 a.m.