fs  R Documentation 
Run forward selection starting from a baseline model. Because it uses
all observations in the input data frame, it cannot produce unbiased
estimates of the predictive performance of the selected panel
(use nested.fs() for that purpose).
fs(
  formula,
  data,
  family,
  choose.from = NULL,
  test = c("t", "wilcoxon"),
  num.inner.folds = 30,
  max.iters = 10,
  min.llk.diff = 2,
  max.pval = 0.5,
  sel.crit = c("paired.test", "total.loglik", "both"),
  num.filter = 0,
  filter.ignore = NULL,
  seed = 50,
  verbose = TRUE
)

forward.selection(x, y, init.model, family, ...)
formula 
An object of class 
data 
Data frame or matrix containing outcome variable and predictors. 
family 
Type of model fitted: either 
choose.from 
Indices or variable names over which the selection should
be performed. If 
test 
Type of statistical paired test to use (ignored if

num.inner.folds 
Number of folds in the inner cross-validation. It must be at least 5 (default: 30). 
max.iters 
Maximum number of iterations (default: 10). 
min.llk.diff 
Minimum improvement in log-likelihood required before selection is terminated (default: 2). 
max.pval 
Interrupt the selection when the best achievable p-value exceeds this threshold (default: 0.5). 
sel.crit 
Selection criterion: 
num.filter 
Number of variables to be retained by the univariate
association filter (see Details), which can only be enabled
if 
filter.ignore 
Vector of variable names that should not be pruned by
the univariate association filter so that they are always allowed to
be selected (ignored if 
seed 
Seed of the random number generator for the inner folds. 
verbose 
Whether the variable chosen at each iteration should be
printed out (default: TRUE). 
x 
Data frame of predictors: this should include all variables in the initial set and the variables that are allowed to enter the selected panel. 
y 
Outcome variable. If 
init.model 
Either a formula or a vector of names of the initial set of variables that define the model from which the forward selection should start. 
... 
Further arguments to 
At each iteration, this function runs cross-validation to choose which variable enters the final panel by fitting the current model augmented by each remaining variable considered one at a time.
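The iteration described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the data, the helper val.loglik, and the rule of taking the candidate with the largest total validation log-likelihood are all assumptions made for the example.

```r
# Illustrative only: one forward-selection iteration for a gaussian outcome.
# All names and data are hypothetical; this is not nestfs code.
set.seed(1)
n <- 200
dat <- data.frame(age = rnorm(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 0.5 * dat$age + 1.0 * dat$x2 + rnorm(n)

init.vars  <- "age"                 # variables already in the model
candidates <- c("x1", "x2", "x3")   # variables allowed to enter
num.folds  <- 5
fold.id    <- sample(rep(seq_len(num.folds), length.out = n))

# Gaussian log-likelihood on one held-out fold of a model fitted on the rest
val.loglik <- function(vars, fold) {
  train <- dat[fold.id != fold, ]
  test  <- dat[fold.id == fold, ]
  fit   <- lm(reformulate(vars, response = "y"), data = train)
  sigma <- sqrt(mean(residuals(fit)^2))
  sum(dnorm(test$y, mean = predict(fit, newdata = test), sd = sigma, log = TRUE))
}

# Current model augmented by each candidate, one at a time
total.llk <- sapply(candidates, function(v) {
  sum(sapply(seq_len(num.folds), function(k) val.loglik(c(init.vars, v), k)))
})
best <- names(which.max(total.llk))
print(total.llk)
print(best)
```

A full selection would add the chosen variable to init.vars, remove it from candidates, and repeat until a stopping rule such as max.iters is reached.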
By default, variables are selected according to the paired.test
criterion. At each iteration, the sampling distribution of differences in
validation log-likelihood obtained across all inner cross-validation folds
of the models with and without each additional variable is tested against
the null hypothesis of zero mean (with the alternative hypothesis being
that the model with the additional variable is better). The test is paired
according to the inner folds. Although the training folds are not
independent, the p-value from this test approximates the probability that
including the marker will not decrease the validation log-likelihood
(approximate false discovery rate).
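A minimal base R sketch of such a paired one-sided test follows. The per-fold log-likelihoods are simulated for illustration, and the variable names are hypothetical; how the package computes and combines these quantities internally is not shown here.

```r
# Illustrative only: one-sided paired tests on per-fold validation
# log-likelihoods. The numbers are simulated, not produced by nestfs.
set.seed(2)
num.folds   <- 30
llk.without <- rnorm(num.folds, mean = -50, sd = 3)                # current model
llk.with    <- llk.without + rnorm(num.folds, mean = 1.5, sd = 1)  # plus one candidate

# H0: mean difference is zero; H1: the augmented model has higher log-likelihood
res.t <- t.test(llk.with, llk.without, paired = TRUE, alternative = "greater")
print(res.t$p.value)

# Nonparametric counterpart, in the spirit of test = "wilcoxon"
res.w <- wilcox.test(llk.with, llk.without, paired = TRUE, alternative = "greater")
print(res.w$p.value)
```

Pairing by fold matters because both models are evaluated on the same held-out observations, so most of the fold-to-fold variation cancels out in the differences.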
In the case of a binary outcome when a very large number of predictors is
available, it may be convenient to apply a univariate association filter.
If num.filter is set to a positive value, then all available
predictors (excluding those whose name is matched by filter.ignore)
are tested for univariate association with the outcome, and only the first
num.filter enter the selection phase, while the others are filtered
out. This is done on the training part of all inner folds. Filtering can
enhance the performance of forward selection when the number of available
variables exceeds about 30-40.
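The idea of the filter can be sketched in base R as follows. This is an illustration under stated assumptions: the association measure used here (the Wald p-value from a single-predictor logistic regression) and the helper univ.pval are choices made for the example, and for brevity the ranking is computed once on the whole simulated data set rather than on the training part of each inner fold.

```r
# Illustrative only: rank predictors by univariate association with a binary
# outcome and keep the top num.filter. Hypothetical names; not nestfs code.
set.seed(3)
n <- 300
p <- 50
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("v", seq_len(p))
y <- rbinom(n, 1, plogis(1.5 * X$v1 - 1.5 * X$v2))

num.filter    <- 5
filter.ignore <- "v50"   # exempt from filtering, always kept as a candidate

# Assumed association measure: Wald p-value from a one-predictor logistic model
univ.pval <- function(x, y) {
  fit <- glm(y ~ x, family = binomial())
  coef(summary(fit))["x", "Pr(>|z|)"]
}

testable <- setdiff(names(X), filter.ignore)
pvals    <- sapply(X[testable], univ.pval, y = y)
keep     <- union(filter.ignore, names(sort(pvals))[seq_len(num.filter)])
print(keep)
```

To avoid leaking information from validation data, the ranking would have to be recomputed within each training fold, as the text above specifies.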
forward.selection provides the legacy interface used up to version 0.9.2.
It is considered discontinued, and in the future it will be deprecated and
eventually removed.
An object of class fs containing the following fields:
fs 
A data frame containing the forward selection summary. 
init 
The set of variables used in the initial model. 
panel 
Names of variables selected (in order). 
init.model 
Right-hand side of the formula corresponding to the initial model. 
final.model 
Right-hand side of the formula corresponding to the final model after forward selection. 
family 
Type of model fitted. 
params 
List of parameters used. 
iter1 
Summary statistics for all variables at the first iteration. 
all.iter 
Validation log-likelihoods for all inner folds at all iterations. 
nested.fs() and summary.fs().
data(diabetes)
fs.res <- fs(Y ~ age + sex, data=diabetes, family=gaussian(),
             choose.from=1:10, num.inner.folds=5, max.iters=3)
summary(fs.res)