The thresholding step is dedicated to roughly eliminating irrelevant variables from the dataset. This is the first step of the VSURF function. For refined variable selection, see the other VSURF steps: VSURF_interp and VSURF_pred.
VSURF_thres(x, ...)

## Default S3 method:
VSURF_thres(x, y, ntree = 2000,
  mtry = max(floor(ncol(x)/3), 1), nfor.thres = 50, nmin = 1,
  parallel = FALSE, clusterType = "PSOCK", ncores = detectCores() - 1,
  ...)

## S3 method for class 'formula'
VSURF_thres(formula, data, ..., na.action = na.fail)
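
The default expressions for mtry and ncores in the usage above can be evaluated by hand to see what a call would use. The following is a minimal sketch on the iris predictors; `ncores_safe` is a hypothetical name, not part of VSURF, and the clamping to at least one core is an assumption about what a caller may want on a single-core machine, not something the package is documented to do.

```r
library(parallel)  # ships with R; provides detectCores()

x <- iris[, 1:4]

# Default mtry from the usage: a third of the predictors, at least 1.
mtry_default <- max(floor(ncol(x) / 3), 1)
mtry_default  # 1 for the four iris predictors

# Default ncores is detectCores() - 1. detectCores() may return NA, and the
# subtraction gives 0 on a single-core machine, so this hypothetical value
# is clamped to at least 1 before being passed on.
detected <- detectCores()
ncores_safe <- if (is.na(detected)) 1L else max(detected - 1L, 1L)
ncores_safe
```

A value computed this way could then be passed explicitly as `ncores = ncores_safe` together with `parallel = TRUE`.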

x, formula 
A data frame or a matrix of predictors, whose columns represent the variables; or a formula describing the model to be fitted. 
... 
Other parameters to be passed on to the randomForest function. 
y 
A response vector (must be a factor for classification problems and numeric for regression ones). 
ntree 
Number of trees in each forest grown. Standard randomForest parameter. 
mtry 
Number of variables randomly sampled as candidates at each split. Standard randomForest parameter. 
nfor.thres 
Number of forests grown. 
nmin 
Number of times the "minimum value" is multiplied to set the threshold value. See Details below. 
parallel 
A logical indicating whether VSURF should run in parallel on multiple cores (defaults to FALSE). 
clusterType 
Type of the multiple cores cluster used to run VSURF in
parallel. Must be chosen among "PSOCK" (default: SOCKET cluster available
locally on all OS), "FORK" (local too, only available for Linux and Mac OS)
and "MPI" (can be used on a remote cluster, which needs 
ncores 
Number of cores to use. Default is set to the number of cores detected by R minus 1. 
data 
A data frame containing the variables in the model. 
na.action 
A function to specify the action to be taken if NAs are
found. (NOTE: If given, this argument must be named, and as

First, nfor.thres random forests are computed using the function randomForest with arguments importance=TRUE, and our choice of default values for ntree and mtry (which are higher than default in randomForest to get a more stable variable importance measure). Then variables are sorted according to their mean variable importance (VI), in decreasing order. This order is kept all along the procedure. Next, a threshold is computed: min.thres, the minimum predicted value of a pruned CART tree fitted to the curve of the standard deviations of VI. Finally, the actual thresholding is performed: only variables with a mean VI larger than nmin * min.thres are kept.
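
The steps above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the package's implementation: the importance scores are simulated with `rnorm` instead of being produced by randomForest, the names `vi`, `true_vi`, `sd_curve` and `selected` are hypothetical, and the pruning rule (the cp with the lowest cross-validated error) is an assumption, since the text only says that the tree is pruned.

```r
library(rpart)  # CART implementation; a recommended package bundled with R

set.seed(1)

# Hypothetical stand-in for the importance scores: one row per forest,
# one column per variable. Simulated here, not produced by randomForest.
n_forests <- 50
true_vi <- c(8, 6, 4, 2, 1, rep(0, 15))
vi <- sapply(true_vi, function(m) rnorm(n_forests, mean = m, sd = 0.05 + 0.1 * m))

# Mean and standard deviation of VI across forests, sorted by decreasing mean.
ord <- order(colMeans(vi), decreasing = TRUE)
vi_mean_dec <- colMeans(vi)[ord]
vi_sd_dec <- apply(vi, 2, sd)[ord]

# Fit a CART tree to the curve of standard deviations, then prune it.
# Assumption: prune at the cp with the lowest cross-validated error.
sd_curve <- data.frame(pos = seq_along(vi_sd_dec), vi_sd = vi_sd_dec)
tree <- rpart(vi_sd ~ pos, data = sd_curve,
              control = rpart.control(cp = 0, minsplit = 2))
cp_table <- tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned <- prune(tree, cp = best_cp)

# Threshold: minimum fitted value of the pruned tree, scaled by nmin.
min_thres <- min(predict(pruned))
nmin <- 1
selected <- ord[vi_mean_dec > nmin * min_thres]
selected
```

Raising `nmin` in this sketch makes the cut stricter, which mirrors the role the nmin argument plays in the description above.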
An object of class VSURF_thres, which is a list with the following components:
varselect.thres 
A vector of indices of selected variables, sorted according to their mean VI, in decreasing order. 
imp.varselect.thres 
A vector of importances of the selected variables. 
min.thres 
The minimum predicted value of a pruned CART tree fitted to the curve of the standard deviations of VI. 
num.varselect.thres 
The number of selected variables. 
imp.mean.dec 
A vector of the variable importance means (over the nfor.thres forests), in decreasing order. 
imp.mean.dec.ind 
The ordering index vector associated to the sorting of variables importance means. 
imp.sd.dec 
A vector of standard deviations of all variable importances. The order is given by imp.mean.dec.ind. 
mean.perf 
The mean OOB error rate, obtained by a random forest built with all variables. 
pred.pruned.tree 
The predictions of the CART tree fitted to the curve of the standard deviations of VI. 
nmin 
Value of the parameter in the call. 
comput.time 
Computation time. 
ncores 
The number of cores used to run 
clusterType 
The type of the cluster used to run

call 
The original call to 
terms 
Terms associated to the formula (only if a formula-type call was used). 
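
As a hedged illustration of how the components listed above might be inspected, the sketch below runs the formula method on iris with reduced ntree and nfor.thres to keep it short. It assumes the VSURF package is installed and is guarded so that it does nothing otherwise; the seed value is arbitrary and no particular output is implied.

```r
# Guarded so the sketch is a no-op when VSURF is not installed.
if (requireNamespace("VSURF", quietly = TRUE)) {
  library(VSURF)
  set.seed(2010)  # arbitrary seed, only for repeatability

  # Formula interface; small ntree and nfor.thres keep the run short.
  iris.thres <- VSURF_thres(Species ~ ., data = iris,
                            ntree = 100, nfor.thres = 20)

  print(iris.thres$varselect.thres)      # indices of the selected variables
  print(iris.thres$num.varselect.thres)  # number of selected variables
  print(iris.thres$min.thres)            # threshold before scaling by nmin
  print(iris.thres$imp.mean.dec)         # mean VI, in decreasing order
}
```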
Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2015), VSURF: An R Package for Variable Selection Using Random Forests, The R Journal 7(2):19-33
data(iris)
iris.thres <- VSURF_thres(iris[,1:4], iris[,5], ntree = 100, nfor.thres = 20)
iris.thres

## Not run:
# A more interesting example with toys data (see toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF_thres(toys$x, toys$y)
toys.thres
## End(Not run)
