The thresholding step roughly eliminates irrelevant variables from the dataset. This is the first step of the VSURF function. For refined variable selection, see the other VSURF steps: VSURF_interp and VSURF_pred.
VSURF_thres(x, ...)
## Default S3 method:
VSURF_thres(x, y, ntree = 2000,
mtry = max(floor(ncol(x)/3), 1), nfor.thres = 50, nmin = 1,
RFimplementation = "randomForest", parallel = FALSE,
clusterType = "PSOCK", ncores = parallel::detectCores() - 1, ...)
## S3 method for class 'formula'
VSURF_thres(formula, data, ..., na.action = na.fail)
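The two call forms in the usage above can be compared with a short sketch. It assumes the VSURF package is installed (the block is skipped otherwise); the object names thres.xy and thres.fm and the seed are illustrative only, and the small ntree and nfor.thres values are chosen to keep the run short.

```r
# Assumption: VSURF is installed; nothing is run if it is not available.
if (requireNamespace("VSURF", quietly = TRUE)) {
  library(VSURF)
  data(iris)
  set.seed(42)  # arbitrary seed, only for repeatability of the sketch

  # Default method: predictors and response are passed separately.
  thres.xy <- VSURF_thres(iris[, 1:4], iris[, 5],
                          ntree = 100, nfor.thres = 20)

  # Formula method: the same model described by a formula and a data frame.
  thres.fm <- VSURF_thres(Species ~ ., data = iris,
                          ntree = 100, nfor.thres = 20)
}
```

Both calls return an object of class VSURF_thres. Because random forests are random, the two results need not be identical.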

x, formula 
A data frame or a matrix of predictors, the columns represent the variables. Or a formula describing the model to be fitted. 
... 
Other parameters to be passed on to the 
y 
A response vector (must be a factor for classification problems and numeric for regression ones). 
ntree 
Number of trees in each forest grown. Standard parameter of

mtry 
Number of variables randomly sampled as candidates at each split.
Standard parameter of 
nfor.thres 
Number of forests grown. 
nmin 
Number of times the "minimum value" is multiplied to set threshold value. See details below. 
RFimplementation 
Choice of the random forests implementation to use: "randomForest" (default) or "ranger". 
parallel 
A logical indicating whether VSURF should run in parallel on multiple cores (defaults to FALSE). 
clusterType 
Type of the multiple cores cluster used to run VSURF in
parallel. Must be chosen among "PSOCK" (default: SOCKET cluster available
locally on all OS), "FORK" (local too, only available for Linux and Mac OS)
and "MPI" (can be used on a remote cluster, which needs 
ncores 
Number of cores to use. Default is set to the number of cores detected by R minus 1. 
data 
a data frame containing the variables in the model. 
na.action 
A function to specify the action to be taken if NAs are
found. (NOTE: If given, this argument must be named, and as

First, nfor.thres random forests are computed using the function randomForest with arguments importance=TRUE, and our choice of default values for ntree and mtry (which are higher than default in randomForest to get a more stable variable importance measure). Then variables are sorted according to their mean variable importance (VI), in decreasing order. This order is kept all along the procedure. Next, a threshold is computed: min.thres, the minimum predicted value of a pruned CART tree fitted to the curve of the standard deviations of VI. Finally, the actual thresholding is performed: only variables with a mean VI larger than nmin * min.thres are kept.
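The procedure described above can be sketched in a few lines of R. This is not the package's code: the importance values are simulated instead of being computed from random forests, the variable names are hypothetical, and the pruning rule (smallest cross-validated error in the rpart complexity table) is an assumption made for illustration.

```r
library(rpart)  # recommended package, shipped with most R installations

set.seed(1)
n_var <- 20   # hypothetical number of candidate variables
nfor  <- 50   # plays the role of nfor.thres
nmin  <- 1    # multiplier applied to the threshold

# Simulated VI values (assumption): one row per forest, one column per
# variable. The first 5 variables carry signal, the others are noise.
true_vi <- c(5, 4, 3, 2, 1, rep(0, n_var - 5))
true_sd <- c(rep(0.5, 5), rep(0.05, n_var - 5))
vi <- sapply(seq_len(n_var), function(j) rnorm(nfor, true_vi[j], true_sd[j]))

# Mean and standard deviation of VI over the forests.
imp_mean <- colMeans(vi)
imp_sd   <- apply(vi, 2, sd)

# Sort variables by decreasing mean VI; the sds follow the same order.
ord          <- order(imp_mean, decreasing = TRUE)
imp_mean_dec <- imp_mean[ord]
imp_sd_dec   <- imp_sd[ord]

# Fit a CART tree to the curve of standard deviations, then prune it.
curve <- data.frame(rank = seq_len(n_var), sd = imp_sd_dec)
tree  <- rpart(sd ~ rank, data = curve,
               control = rpart.control(minsplit = 2, cp = 0))
best_cp <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(tree, cp = best_cp)

# Threshold: minimum predicted value of the pruned tree.
min_thres <- min(predict(pruned))

# Keep variables whose mean VI exceeds nmin * min_thres.
keep      <- which(imp_mean_dec > nmin * min_thres)
varselect <- ord[keep]  # indices of retained variables, by decreasing mean VI
```

With this simulated data the noise variables have a mean VI close to zero, so they fall below the threshold, while the five signal variables are retained.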
An object of class VSURF_thres, which is a list with the following components:
varselect.thres 
A vector of indices of selected variables, sorted according to their mean VI, in decreasing order. 
imp.varselect.thres 
A vector of importance of the

min.thres 
The minimum predicted value of a pruned CART tree fitted to the curve of the standard deviations of VI. 
num.varselect.thres 
The number of selected variables. 
imp.mean.dec 
A vector of the variables importance means (over

imp.mean.dec.ind 
The ordering index vector associated to the sorting of variables importance means. 
imp.sd.dec 
A vector of standard deviations of all variables
importance. The order is given by 
mean.perf 
The mean OOB error rate, obtained by a random forest built with all variables. 
pred.pruned.tree 
The predictions of the CART tree fitted to the curve of the standard deviations of VI. 
nmin 
Value of the parameter in the call. 
comput.time 
Computation time. 
RFimplementation 
The RF implementation used to run

ncores 
The number of cores used to run 
clusterType 
The type of the cluster used to run 
call 
The original call to 
terms 
Terms associated with the formula (only if a formula-type call was used). 
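The components listed above can be read directly from the returned list. The following is a minimal sketch, assuming the VSURF package is installed (the block is skipped otherwise); the seed and the helper name kept are illustrative only.

```r
# Assumption: VSURF is installed; nothing is run if it is not available.
if (requireNamespace("VSURF", quietly = TRUE)) {
  library(VSURF)
  data(iris)
  set.seed(42)  # arbitrary seed, only for repeatability of the sketch
  iris.thres <- VSURF_thres(iris[, 1:4], iris[, 5],
                            ntree = 100, nfor.thres = 20)

  # Number of variables kept by the thresholding step.
  print(iris.thres$num.varselect.thres)

  # Names of the kept variables, by decreasing mean VI.
  print(colnames(iris)[iris.thres$varselect.thres])

  # Threshold and the comparison described in the details.
  print(iris.thres$min.thres)
  kept <- iris.thres$imp.mean.dec > iris.thres$nmin * iris.thres$min.thres
  print(kept)
}
```

The indices in varselect.thres refer to columns of the predictor matrix passed as x, here the first four columns of iris.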
Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236
Genuer, R. and Poggi, J.-M. and Tuleau-Malot, C. (2015), VSURF: An R Package for Variable Selection Using Random Forests, The R Journal 7(2):19-33
data(iris)
iris.thres <- VSURF_thres(iris[,1:4], iris[,5], ntree = 100, nfor.thres = 20)
iris.thres
## Not run:
# A more interesting example with toys data (see \code{\link{toys}})
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF_thres(toys$x, toys$y)
toys.thres
## End(Not run)
