VSURF_pred: Prediction step of VSURF

Description

The prediction step refines the selection made by the interpretation step (VSURF_interp) by eliminating redundancy in the set of selected variables, for prediction purposes. This is the third step of the VSURF function.

Usage

VSURF_pred(x, ...)

## Default S3 method:
VSURF_pred(x, y, ntree = 2000, err.interp, varselect.interp,
  nfor.pred = 25, nmj = 1, RFimplementation = "randomForest", ...)

## S3 method for class 'formula'
VSURF_pred(formula, data, ..., na.action = na.fail)

Arguments

x, formula

A data frame or a matrix of predictors, whose columns represent the variables; or a formula describing the model to be fitted.

...

Other parameters to be passed on to the randomForest function (see ?randomForest for further information).

y

A response vector (must be a factor for classification problems and numeric for regression ones).

ntree

Number of trees in each forest grown. Standard parameter of randomForest.

err.interp

A vector of the mean OOB error rates of the embedded random forest models built during the interpretation step (value err.interp of function VSURF_interp).

varselect.interp

A vector of indices of variables selected after the interpretation step (value varselect.interp of function VSURF_interp).

nfor.pred

Number of forests grown.

nmj

Multiplicative factor applied to the mean jump: a variable is added only if the error decrease exceeds nmj * mean.jump. See Details below.

RFimplementation

Choice of the random forests implementation to use: "randomForest" (default) or "ranger".

data

A data frame containing the variables in the model.

na.action

A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named and, as in randomForest, it is only used with the formula-type call.)

Details

nfor.pred embedded random forest models are grown, starting with the random forest built with only the most important variable; variables are then added to the model in a stepwise manner. The mean jump value, mean.jump, is computed from the variables left out by the interpretation step: it is the mean absolute difference between the mean OOB errors of each model and of the model that immediately follows it. A variable is therefore included in the model only if the resulting decrease in mean OOB error is larger than nmj * mean.jump.
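The stepwise procedure described above can be sketched in plain R. This is a simplified illustration under stated assumptions, not the package's implementation: the names mean_jump_sketch, pred_step_sketch and toy_oob_error are hypothetical, the user-supplied oob_error function stands in for growing forests on a variable subset and returning their mean OOB error, and the mean jump is assumed to be taken over consecutive err.interp values from the last model retained by the interpretation step onward.

```r
# Sketch of the prediction-step logic (hypothetical names, not VSURF's code).
# Assumption: err.interp[i] is the mean OOB error of the model built on the
# first i variables, and mean.jump uses the jumps between consecutive models
# from the last interpretation-selected model onward.
mean_jump_sketch <- function(err.interp, n.interp) {
  tail.err <- err.interp[n.interp:length(err.interp)]
  if (length(tail.err) < 2) return(0)  # simplifying assumption: nothing left out
  mean(abs(diff(tail.err)))
}

# Assumption: varselect.interp is ordered by decreasing importance and
# oob_error(vars) returns the mean OOB error of forests grown on `vars`.
pred_step_sketch <- function(varselect.interp, err.interp, oob_error, nmj = 1) {
  mean.jump <- mean_jump_sketch(err.interp, length(varselect.interp))
  selected <- varselect.interp[1]        # start with the most important variable
  err.pred <- oob_error(selected)
  for (v in varselect.interp[-1]) {      # try the remaining variables in order
    z <- oob_error(c(selected, v))
    # keep v only if the error decrease exceeds nmj * mean.jump
    if (err.pred[length(err.pred)] - z > nmj * mean.jump) {
      selected <- c(selected, v)
      err.pred <- c(err.pred, z)
    }
  }
  list(varselect.pred = selected, err.pred = err.pred, mean.jump = mean.jump)
}

# Toy usage with made-up errors keyed by the (sorted) variable subset
toy.errors <- c("3" = 0.40, "1-3" = 0.20, "1-3-4" = 0.195)
toy_oob_error <- function(vars) unname(toy.errors[paste(sort(vars), collapse = "-")])
res <- pred_step_sketch(c(3, 1, 4), c(0.40, 0.20, 0.10, 0.09, 0.11), toy_oob_error)
res$varselect.pred
# [1] 3 1
```

In this toy run, mean.jump is 0.015; adding variable 1 lowers the error by 0.20 and is kept, while adding variable 4 only lowers it by 0.005 and is rejected. Lowering nmj makes the step more permissive.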

Note that the mtry parameter of randomForest is set to its default value (see randomForest) if nvm, the number of variables in the model, is not greater than the number of observations; otherwise it is set to nvm/3. This ensures the quality of the OOB error estimates along the embedded RF models.
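The mtry rule can be illustrated with a small helper. The function name mtry_sketch and its arguments are hypothetical; the values it falls back to are the randomForest defaults (floor(sqrt(p)) for classification, max(floor(p/3), 1) for regression), applied here with p = nvm.

```r
# Hypothetical helper illustrating how mtry depends on nvm (number of
# variables in the model) and on the number of observations.
mtry_sketch <- function(nvm, n.obs, classification) {
  if (nvm <= n.obs) {
    # randomForest's default mtry for p = nvm variables
    if (classification) floor(sqrt(nvm)) else max(floor(nvm / 3), 1)
  } else {
    nvm / 3  # more variables than observations
  }
}

mtry_sketch(4, 150, classification = TRUE)     # 2
mtry_sketch(300, 100, classification = FALSE)  # 100
```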

Value

An object of class VSURF_pred, which is a list with the following components:

varselect.pred

A vector of indices of variables selected after the "prediction step".

err.pred

A vector of the mean OOB error rates of the random forest models built during the "prediction step".

mean.jump

The mean jump value computed during the "prediction step".

num.varselect.pred

The number of selected variables.

nmj

Value of the parameter in the call.

comput.time

Computation time.

RFimplementation

The RF implementation used to run VSURF_pred.

call

The original call to VSURF.

terms

Terms associated with the formula (only if the formula-type call was used).
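Since varselect.pred stores indices of predictor columns, a small helper can map them back to variable names. The helper name pred_var_names is hypothetical, and the list below is a hand-built stand-in for a VSURF_pred object; with the iris example from the Examples section, it would be called as pred_var_names(iris.pred, iris[, 1:4]).

```r
# Hypothetical helper: map the indices in varselect.pred (columns of the
# predictor data x given to VSURF_pred) back to column names.
pred_var_names <- function(pred, x) {
  colnames(x)[pred$varselect.pred]
}

# Illustration with a hand-built list standing in for a VSURF_pred object
mock.pred <- list(varselect.pred = c(4, 3), num.varselect.pred = 2)
pred_var_names(mock.pred, iris[, 1:4])
```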

Author(s)

Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot

References

Genuer, R., Poggi, J.M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236.

Genuer, R., Poggi, J.M. and Tuleau-Malot, C. (2015), VSURF: An R Package for Variable Selection Using Random Forests, The R Journal 7(2), 19-33.

See Also

VSURF

Examples

library(VSURF)
data(iris)
iris.thres <- VSURF_thres(iris[,1:4], iris[,5], ntree = 100, nfor.thres = 20)
iris.interp <- VSURF_interp(iris[,1:4], iris[,5],
  vars = iris.thres$varselect.thres, nfor.interp = 10)
iris.pred <- VSURF_pred(iris[,1:4], iris[,5],
  err.interp = iris.interp$err.interp,
  varselect.interp = iris.interp$varselect.interp, nfor.pred = 10)
iris.pred

## Not run: 
# A more interesting example with the toys data (see ?toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF_thres(toys$x, toys$y)
toys.interp <- VSURF_interp(toys$x, toys$y,
  vars = toys.thres$varselect.thres)
toys.pred <- VSURF_pred(toys$x, toys$y, err.interp = toys.interp$err.interp,
  varselect.interp = toys.interp$varselect.interp)
toys.pred
## End(Not run)

robingenuer/VSURF documentation built on Oct. 16, 2018, 11:09 a.m.