View source: R/rf.crossValidation.R
Description

Implements a permutation test cross-validation for Random Forests models.
Usage

rf.crossValidation(x, xdata, ydata = NULL, p = 0.10, n = 99, seed = NULL,
  normalize = FALSE, bootstrap = FALSE, trace = FALSE, ...)
Arguments

x          Random forest object
xdata      x data used in the model
ydata      Optional y data used in the model; the default is to use x$y from the model object
p          Proportion of data to withhold (default p = 0.10)
n          Number of cross-validations (default n = 99)
seed       Sets the random seed in the R global environment
normalize  (FALSE/TRUE) For regression, should RMSE, MBE and MAE be normalized using (max(y) - min(y))?
bootstrap  (FALSE/TRUE) Should bootstrap sampling be applied? If FALSE, a p-proportion withhold is conducted
trace      Print iterations
...        Additional arguments passed to randomForest
Details

For classification problems, the cross-validation statistics are based on the prediction error on the withheld data: total observed accuracy represents the percent correctly classified (also known as PCC) and is considered a naive measure of agreement. The diagonal of the confusion matrix represents correctly classified observations, while the off-diagonals represent cross-classification error. The primary issue with this evaluation is that it does not reveal whether error was evenly distributed between classes.

To represent the balance of error one can use omission and commission statistics, such as estimates of user's and producer's accuracy. User's accuracy corresponds to error of commission (inclusion): observations being erroneously included in a given class. The commission errors are represented by row sums of the matrix. Producer's accuracy corresponds to error of omission (exclusion): observations being erroneously excluded from a given class. The omission errors are represented by column sums of the matrix.

None of the previous statistics account for random agreement influencing the accuracy measure. The kappa statistic is a chance-corrected metric that reflects the difference between observed agreement and the agreement expected by random chance. A kappa of k = 0.85 would indicate 85% better agreement than expected by chance alone.
pcc = [Number of correct observations / total number of observations]
users accuracy = [Number of correct / total number of correct and commission errors]
producers accuracy = [Number of correct / total number of correct and omission errors]
k = (observed accuracy - chance agreement) / (1 - chance agreement), where chance agreement = sum[product of row and column totals for each class]
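As an illustration only (this is not the package's internal code), the statistics above can be computed directly from a confusion matrix in R; the matrix values here are hypothetical:

```r
# Hypothetical 3-class confusion matrix (rows = predicted, columns = reference)
cm <- matrix(c(50,  0,  0,
                0, 47,  5,
                0,  3, 45), nrow = 3, byrow = TRUE)
n <- sum(cm)

pcc       <- sum(diag(cm)) / n        # percent correctly classified
users     <- diag(cm) / rowSums(cm)   # user's accuracy (commission; row sums)
producers <- diag(cm) / colSums(cm)   # producer's accuracy (omission; column sums)

# Chance agreement from the products of row and column totals
chance <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa  <- (pcc - chance) / (1 - chance)  # chance-corrected agreement
```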
For regression problems, a bootstrap is constructed and the MSE and percent variance explained of each subset model are reported. Additionally, the RMSE between the withheld response variable (y) and the predictions of each subset model is calculated.
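The regression error statistics can be sketched as follows (illustrative only, with hypothetical observed and predicted values for the withheld data; the package's exact sign convention for MBE may differ):

```r
y    <- c(23, 45, 12, 67, 34, 50)  # hypothetical observed values
yhat <- c(25, 40, 15, 60, 36, 49)  # hypothetical predictions

rmse <- sqrt(mean((y - yhat)^2))   # Root Mean Squared Error
mbe  <- mean(yhat - y)             # Mean Bias Error
mae  <- mean(abs(y - yhat))        # Mean Absolute Error

# With normalize = TRUE, errors are scaled by the observed range
rmse.norm <- rmse / (max(y) - min(y))
```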
Value

For classification, a "rf.cv", "classification" class object with the following components:
cross.validation$cv.users.accuracy Class-level users accuracy for the subset cross validation data
cross.validation$cv.producers.accuracy Class-level producers accuracy for the subset cross validation data
cross.validation$cv.oob Global and class-level OOB error for the subset cross validation data
model$model.users.accuracy Class-level users accuracy for the model
model$model.producers.accuracy Class-level producers accuracy for the model
model$model.oob Global and class-level OOB error for the model
For regression, a "rf.cv", "regression" class object with the following components:
fit.var.exp Percent variance explained from specified fit model
fit.mse Mean Squared Error from specified fit model
y.rmse Root Mean Squared Error (observed vs. predicted) from each Bootstrap iteration (cross-validation)
y.mbe Mean Bias Error from each Bootstrapped model
y.mae Mean Absolute Error from each Bootstrapped model
D Test statistic from the Kolmogorov-Smirnov test (y vs. estimate)
p.val p-value from the Kolmogorov-Smirnov test (y vs. estimate)
model.mse Mean Squared Error from each Bootstrapped model
model.varExp Percent variance explained from each Bootstrapped model
Author(s)

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>
References

Evans, J.S. and S.A. Cushman (2009) Gradient Modeling of Conifer Species Using Random Forest. Landscape Ecology 5:673-683.
Murphy, M.A., J.S. Evans, and A.S. Storfer (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252-261.
Evans, J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling species distribution and change using Random Forests. Ch. 8 in Predictive Modeling in Landscape Ecology, eds. Drew, C.A., F. Huettmann, and Y. Wiersma. Springer.
See Also

randomForest for randomForest ... options
Examples

## Not run:
library(randomForest)
# For classification
data(iris)
iris$Species <- as.factor(iris$Species)
set.seed(1234)
( rf.mdl <- randomForest(iris[,1:4], iris[,"Species"], ntree=501) )
( rf.cv <- rf.crossValidation(rf.mdl, iris[,1:4], p=0.10, n=99, ntree=501) )
# Plot cross validation versus model producers accuracy
par(mfrow=c(1,2))
plot(rf.cv, type = "cv", main = "CV producers accuracy")
plot(rf.cv, type = "model", main = "Model producers accuracy")
# Plot cross validation versus model oob
par(mfrow=c(1,2))
plot(rf.cv, type = "cv", stat = "oob", main = "CV oob error")
plot(rf.cv, type = "model", stat = "oob", main = "Model oob error")
# For regression
data(airquality)
airquality <- na.omit(airquality)
rf.mdl <- randomForest(y=airquality[,"Ozone"], x=airquality[,2:4])
( rf.cv <- rf.crossValidation(rf.mdl, airquality[,2:4],
p=0.10, n=99, ntree=501) )
par(mfrow=c(2,2))
plot(rf.cv)
plot(rf.cv, stat = "mse")
plot(rf.cv, stat = "var.exp")
plot(rf.cv, stat = "mae")
## End(Not run)
Example output:

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Call:
randomForest(x = iris[, 1:4], y = iris[, "Species"], ntree = 501)
Type of random forest: classification
Number of trees: 501
No. of variables tried at each split: 2
OOB estimate of error rate: 5.33%
Confusion matrix:
setosa versicolor virginica class.error
setosa 50 0 0 0.00
versicolor 0 47 3 0.06
virginica 0 5 45 0.10
running: classification cross-validation with 99 iterations
Classification accuracy for cross-validation
setosa versicolor virginica
users.accuracy 100 100 100
producers.accuracy 100 100 NA
Cross-validation Kappa = 0.9271
Cross-validation OOB Error = 0.04861111
Cross-validation error variance = 5.448834e-05
Classification accuracy for model
setosa versicolor virginica
users.accuracy 100 93.8 91.7
producers.accuracy 100 91.8 93.6
Model Kappa = 0.9271
Model OOB Error = 0.04861111
Model error variance = 3.19812e-05
running: regression cross-validation with 99 iterations
Fit MSE = 293.6456
Fit percent variance explained = 73.08
Median permuted MSE = 308.9923
Median permuted percent variance explained = 72.38
Median cross-validation RMSE = 14.93475
Median cross-validation MBE = 0.5988576
Median cross-validation MAE = 11.64054
Range of ks p-values = 0.001349443 0.7989985
Range of ks D statistic = 0.1818182 0.5454545
RMSE cross-validation error variance = 42.1302
MBE cross-validation error variance = 36.18507
MAE cross-validation error variance = 13.35967