The interpretation step aims to select all variables related to the response, for
interpretation purposes. This is the second step of the VSURF
function. It is designed to be executed after the thresholding step, VSURF_thres.
VSURF_interp(x, ...)
## Default S3 method:
VSURF_interp(x, y, ntree = 2000, vars, nfor.interp = 25,
  nsd = 1, RFimplementation = "randomForest", parallel = FALSE,
  ncores = detectCores() - 1, clusterType = "PSOCK", ...)
## S3 method for class 'formula'
VSURF_interp(formula, data, ..., na.action = na.fail)

x, formula 
A data frame or a matrix of predictors, the columns represent the variables. Or a formula describing the model to be fitted. 
... 
Other parameters to be passed on to the randomForest function.
y 
A response vector (must be a factor for classification problems and numeric for regression ones). 
ntree 
Number of trees in each forest grown. Standard parameter of randomForest.
vars 
A vector of variable indices. Typically, the indices of the variables
selected by the thresholding step (see value varselect.thres of VSURF_thres).
nfor.interp 
Number of forests grown. 
nsd 
Number of times the standard deviation of the minimum value of
err.interp is multiplied. See details below.
RFimplementation 
Choice of the random forests implementation to use: "randomForest" (default) or "ranger". 
parallel 
A logical indicating whether VSURF should run in parallel on multiple cores (defaults to FALSE). 
ncores 
Number of cores to use. Default is set to the number of cores detected by R minus 1. 
clusterType 
Type of the multiple cores cluster used to run VSURF in
parallel. Must be chosen among "PSOCK" (default: SOCKET cluster available
locally on all OS), "FORK" (local too, only available for Linux and Mac OS)
and "MPI" (can be used on a remote cluster, which needs 
data 
a data frame containing the variables in the model. 
na.action 
A function to specify the action to be taken if NAs are
found. (NOTE: If given, this argument must be named, and as

nfor.interp embedded random forests models are grown, starting with the
random forest built with only the most important variable and ending with all
variables. Then, err.min, the minimum mean out-of-bag (OOB) error rate
of these models, and its associated standard deviation sd.min are
computed. Finally, the smallest model (and hence its corresponding variables)
having a mean OOB error less than err.min + nsd * sd.min is selected.
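
The selection rule above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the helper select_interp_size and all numeric values are made up, and the comparison uses <= so that the model attaining err.min always qualifies.

```r
# Hypothetical helper (not part of VSURF): apply the
# "err.min + nsd * sd.min" rule to the mean OOB errors of nested models.
select_interp_size <- function(err.interp, sd.min, nsd = 1) {
  err.min <- min(err.interp)
  threshold <- err.min + nsd * sd.min
  # Model k uses the k most important variables;
  # keep the smallest k whose mean OOB error is under the threshold.
  min(which(err.interp <= threshold))
}

# Made-up mean OOB errors for nested models with 1 to 6 variables
err.interp <- c(0.30, 0.12, 0.06, 0.05, 0.04, 0.045)
# Made-up standard deviation of the OOB error of the best model
sd.min <- 0.012

k <- select_interp_size(err.interp, sd.min, nsd = 1)

# Made-up variable indices, ranked by decreasing importance
varselect.thres <- c(3, 7, 1, 12, 5, 9)
varselect.interp <- varselect.thres[seq_len(k)]
print(varselect.interp)
```

Increasing nsd raises the threshold, so it can only keep the selected model the same size or make it smaller.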
Note that the mtry parameter of randomForest is set to its
default value (see randomForest) if nvm, the number of
variables in the model, is not greater than the number of observations, while
it is set to nvm/3 otherwise. This is to ensure the quality of OOB error
estimations along embedded RF models.
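
As a rough illustration of this rule for a classification forest, the sketch below uses a hypothetical helper choose_mtry (not a VSURF function). It assumes randomForest's classification default of floor(sqrt(p)); rounding nvm/3 down is also an assumption.

```r
# Hypothetical helper: mtry for a classification forest built on
# nvm variables and n observations, following the rule described above.
choose_mtry <- function(nvm, n) {
  if (nvm <= n) {
    floor(sqrt(nvm))        # randomForest default for classification
  } else {
    max(floor(nvm / 3), 1)  # nvm/3; rounding down is an assumption
  }
}

mtry_small <- choose_mtry(nvm = 4, n = 150)    # fewer variables than observations
mtry_large <- choose_mtry(nvm = 300, n = 100)  # more variables than observations
print(c(mtry_small, mtry_large))
```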
An object of class VSURF_interp, which is a list with the
following components:
varselect.interp 
A vector of indices of selected variables. 
err.interp 
A vector of the mean OOB error rates of the embedded random forests models. 
sd.min 
The standard deviation of the OOB error rates associated with the random forests model attaining the minimum mean OOB error rate. 
num.varselect.interp 
The number of selected variables. 
varselect.thres 
A vector of indices of the variables selected after the thresholding step, sorted according to their mean VI, in decreasing order. 
nsd 
Value of the parameter in the call. 
comput.time 
Computation time. 
RFimplementation 
The RF implementation used to run VSURF_interp.
ncores 
The number of cores used to run 
clusterType 
The type of the cluster used to run 
call 
The original call to 
terms 
Terms associated with the formula (only if the formula-type call was used). 
Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2015), VSURF: An R Package for Variable Selection Using Random Forests, The R Journal 7(2):19-33
data(iris)
iris.thres <- VSURF_thres(iris[,1:4], iris[,5], ntree = 100, nfor.thres = 20)
iris.interp <- VSURF_interp(iris[,1:4], iris[,5],
  vars = iris.thres$varselect.thres, nfor.interp = 10)
iris.interp
## Not run:
# A more interesting example with toys data (see toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF_thres(toys$x, toys$y)
toys.interp <- VSURF_interp(toys$x, toys$y,
  vars = toys.thres$varselect.thres)
toys.interp
## End(Not run)

