rfcv | R Documentation |
This function shows the cross-validated prediction performance of models with a sequentially reduced number of predictors (ranked by variable importance) via a nested cross-validation procedure.
rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5,
mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
trainx |
matrix or data frame containing columns of predictor variables |
trainy |
vector of response, must have length equal to the number
of rows in trainx |
cv.fold |
number of folds in the cross-validation |
scale |
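if "log", reduce a fixed proportion (step) of variables at each step, otherwise reduce step variables at a time |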
step |
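if scale="log", the fraction of variables to remove at each step, else remove this many variables at a time |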
mtry |
a function of the number of remaining predictor variables, to
use as the mtry parameter in the randomForest call |
recursive |
whether variable importance is (re-)assessed at each step of variable reduction |
... |
other arguments passed on to randomForest |
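To illustrate how scale and step interact with the default mtry function, the following base R sketch computes one plausible sequence of predictor counts and the mtry value at each size. The helper names n_var_schedule and default_mtry are hypothetical and not part of randomForest. The sketch multiplies the count by step at each stage (so for values other than 0.5 it treats step as the fraction retained, which is an assumption), and the exact sequence used internally by rfcv may differ.

```r
# Hypothetical helper (not part of randomForest): number of predictors
# kept at each reduction step, under the assumptions stated above.
n_var_schedule <- function(p, scale = "log", step = 0.5) {
  if (scale == "log") {
    # Shrink the variable count by a fixed proportion at each step.
    k <- floor(log(p, base = 1 / step))
    n.var <- round(p * step^(0:(k - 1)))
    n.var <- n.var[!duplicated(n.var)]          # drop repeats caused by rounding
    if (!(1 %in% n.var)) n.var <- c(n.var, 1)   # finish with a single variable
  } else {
    # Remove `step` variables at a time.
    n.var <- seq(from = p, to = 1, by = -step)
  }
  n.var
}

# Same form as the default `mtry` argument shown in the usage.
default_mtry <- function(p) max(1, floor(sqrt(p)))

n.var <- n_var_schedule(100)              # 100 predictors, scale = "log", step = 0.5
mtry.used <- sapply(n.var, default_mtry)  # mtry evaluated at each model size
print(rbind(n.var, mtry.used))
```

Because mtry is a function rather than a fixed number, it is re-evaluated as the predictor set shrinks, so it never exceeds the number of variables that remain.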
A list with the following components:
list(n.var=n.var, error.cv=error.cv, predicted=cv.pred)
n.var |
vector of number of variables used at each step |
error.cv |
corresponding vector of error rates or MSEs at each step |
predicted |
list of n.var components, each containing the predicted values from the cross-validation |
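The sketch below shows one way the three components might be read together. The list res is built by hand with made-up values that only mimic the documented structure; it is not real rfcv output. It also assumes that, for classification, each error rate is the fraction of cross-validated predictions that differ from the response.

```r
# Made-up response and predictions, for illustration only.
truth <- factor(c("a", "a", "b", "b"))
as_pred <- function(x) factor(x, levels = levels(truth))

# Hand-built list with the documented structure; not real rfcv() output.
res <- list(
  n.var     = c(8, 4, 2, 1),
  error.cv  = c(0.25, 0.00, 0.50, 0.75),
  predicted = list(
    as_pred(c("a", "a", "b", "a")),   # 1 of 4 wrong
    as_pred(c("a", "a", "b", "b")),   # all correct
    as_pred(c("a", "b", "b", "a")),   # 2 of 4 wrong
    as_pred(c("b", "b", "b", "a"))    # 3 of 4 wrong
  )
)

# Number of predictors with the lowest cross-validated error.
best <- res$n.var[which.min(res$error.cv)]

# Recompute the error rate at each step from the stored predictions.
recomputed <- sapply(res$predicted, function(pr) mean(pr != truth))

print(best)
print(cbind(n.var = res$n.var, error.cv = res$error.cv, recomputed))
```

Keeping the predictions alongside the error vector makes it possible to compute other summaries, such as a confusion table per step, without rerunning the cross-validation.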
Andy Liaw
Svetnik, V., Liaw, A., Tong, C. and Wang, T., “Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules”, MCS 2004, Roli, F. and Windeatt, T. (Eds.) pp. 334-343.
randomForest, importance
set.seed(647)
myiris <- cbind(iris[1:4], matrix(runif(96 * nrow(iris)), nrow(iris), 96))
result <- rfcv(myiris, iris$Species, cv.fold=3)
with(result, plot(n.var, error.cv, log="x", type="o", lwd=2))
## The following can take a while to run, so if you really want to try
## it, copy and paste the code into R.
## Not run:
result <- replicate(5, rfcv(myiris, iris$Species), simplify=FALSE)
error.cv <- sapply(result, "[[", "error.cv")
matplot(result[[1]]$n.var, cbind(rowMeans(error.cv), error.cv), type="l",
lwd=c(2, rep(1, ncol(error.cv))), col=1, lty=1, log="x",
xlab="Number of variables", ylab="CV Error")
## End(Not run)
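As a further variation on the examples above, this sketch re-assesses variable importance at each reduction step (recursive=TRUE) and passes ntree through ... to randomForest. The choice of ntree=100 is arbitrary and only keeps the run short. The code is guarded so that it does nothing when the randomForest package is not installed.

```r
# Runs only if the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(647)
  # Four real iris predictors plus 96 pure-noise columns, as in the example above.
  myiris <- cbind(iris[1:4], matrix(runif(96 * nrow(iris)), nrow(iris), 96))
  # recursive = TRUE re-ranks the remaining variables at every step;
  # ntree is handed on to randomForest() through `...`.
  res <- rfcv(myiris, iris$Species, cv.fold = 3,
              recursive = TRUE, ntree = 100)
  print(data.frame(n.var = res$n.var, error.cv = res$error.cv))
}
```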