The prediction step refines the selection of the interpretation step VSURF_interp by eliminating redundancy in the set of selected variables, for prediction purposes. This is the third step of the VSURF function.
VSURF_pred(x, ...)
## Default S3 method:
VSURF_pred(x, y, ntree = 2000, err.interp, varselect.interp,
nfor.pred = 25, nmj = 1, ...)
## S3 method for class 'formula'
VSURF_pred(formula, data, ..., na.action = na.fail)

x, formula 
A data frame or a matrix of predictors, where the columns represent the variables; or a formula describing the model to be fitted. 
... 
other parameters to be passed on to the 
y 
A response vector (must be a factor for classification problems and numeric for regression ones). 
ntree 
Number of trees in each forest grown. Standard parameter of

err.interp 
A vector of the mean OOB error rates of the embedded
random forest models built during the interpretation step (value

varselect.interp 
A vector of indices of the variables selected after the interpretation step. 
nfor.pred 
Number of forests grown. 
nmj 
Number of times the mean jump is multiplied. See details below. 
data 
a data frame containing the variables in the model. 
na.action 
A function to specify the action to be taken if NAs are
found. (NOTE: If given, this argument must be named, and as

nfor.pred embedded random forest models are grown, starting with the random forest built with only the most important variable. Variables are added to the model in a stepwise manner. The mean jump value mean.jump is calculated using the variables that have been left out by the interpretation step, and is set as the mean absolute difference between the mean OOB error of one model and that of the model immediately following it. Hence a variable is included in the model if the mean OOB error decrease is larger than nmj * mean.jump.

Note that the mtry parameter of randomForest is set to its default value (see randomForest) if nvm, the number of variables in the model, is not greater than the number of observations, while it is set to nvm/3 otherwise. This is to ensure the quality of OOB error estimations along embedded RF models.
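The inclusion rule described above can be illustrated with a small base R sketch. It is not the package's implementation: the error values, the gain vector and the oob_error() helper are all hypothetical stand-ins for the mean OOB errors that real random forests would produce, and the exact set of models used to compute the mean jump is an assumption based on the description above.

```r
# Hypothetical mean OOB errors of the nested models from the interpretation
# step (one value per model, variables ordered by decreasing importance).
err_interp <- c(0.30, 0.12, 0.08, 0.075, 0.078, 0.074, 0.077)
n_selected <- 3  # hypothetical number of variables kept by the interpretation step
nmj <- 1         # multiplier applied to the mean jump

# Mean jump: mean absolute difference between consecutive errors,
# taken here over the models left out by the interpretation step (assumption).
left_out <- err_interp[(n_selected + 1):length(err_interp)]
mean_jump <- mean(abs(diff(left_out)))

# Hypothetical stand-in for "grow forests and return the mean OOB error":
# each variable simply lowers the error by a fixed, made-up amount.
gain <- c(0, 0.18, 0.002)
oob_error <- function(vars) 0.30 - sum(gain[vars])

# Stepwise inclusion: start from the most important variable and keep a
# candidate only if it lowers the error by more than nmj * mean_jump.
selected <- 1
current_err <- oob_error(selected)
for (v in seq_len(n_selected)[-1]) {
  candidate_err <- oob_error(c(selected, v))
  if (current_err - candidate_err > nmj * mean_jump) {
    selected <- c(selected, v)
    current_err <- candidate_err
  }
}
selected
```

With these made-up numbers, the second variable lowers the error by far more than the threshold and is kept, while the third lowers it by less than nmj * mean_jump and is discarded. Increasing nmj makes the rule stricter, so fewer variables are retained.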
An object of class VSURF_pred, which is a list with the following components:
varselect.pred 
A vector of indices of variables selected after "prediction step". 
err.pred 
A vector of the mean OOB error rates of the random forest models built during the "prediction step". 
mean.jump 
The mean jump value computed during the "prediction step". 
num.varselect.pred 
The number of selected variables. 
nmj 
Value of the parameter in the call. 
comput.time 
Computation time. 
call 
The original call to 
terms 
Terms associated with the formula (only if a formula-type call was used). 
Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2015), VSURF: An R Package for Variable Selection Using Random Forests, The R Journal 7(2):19-33
library(VSURF)
data(iris)
iris.thres <- VSURF_thres(iris[,1:4], iris[,5], ntree = 100, nfor.thres = 20)
iris.interp <- VSURF_interp(iris[,1:4], iris[,5], vars = iris.thres$varselect.thres,
                            nfor.interp = 10)
iris.pred <- VSURF_pred(iris[,1:4], iris[,5], err.interp = iris.interp$err.interp,
                        varselect.interp = iris.interp$varselect.interp, nfor.pred = 10)
iris.pred
## Not run: 
# A more interesting example with toys data (see toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF_thres(toys$x, toys$y)
toys.interp <- VSURF_interp(toys$x, toys$y, vars = toys.thres$varselect.thres)
toys.pred <- VSURF_pred(toys$x, toys$y, err.interp = toys.interp$err.interp,
                        varselect.interp = toys.interp$varselect.interp)
toys.pred
## End(Not run)
