qselection: Selecting variables for several subset sizes


Description

Selects the best variables for several subset sizes. Returns a table with the covariates chosen for each model and the corresponding information criterion values. An asterisk marks the subset size that minimizes the information criterion.

Usage

qselection(x, y, qvector, criterion = "deviance", method = "lm",
  family = "gaussian", nfolds = 5, cluster = TRUE, ncores = NULL)

Arguments

x

A data frame containing all the covariates.

y

A vector with the response values.

qvector

A vector containing more than one subset size (number of variables) to be selected.

criterion

The information criterion to be used. Default is the deviance ("deviance"). Other criteria available are the coefficient of determination ("R2"), the residual variance ("variance"), the Akaike information criterion ("aic"), AIC with a correction for finite sample sizes ("aicc") and the Bayesian information criterion ("bic"). The deviance, coefficient of determination and variance are calculated by cross-validation.

method

A character string specifying which regression method is used, i.e., linear models ("lm"), generalized linear models ("glm") or generalized additive models ("gam").

family

A description of the error distribution and link function to be used in the model: "gaussian", "binomial" or "poisson".

nfolds

Number of folds for the cross-validation procedure. Used only when criterion is "deviance", "R2" or "variance".

cluster

A logical value. If TRUE (default), the procedure is parallelized. Note that in cases with too few repetitions (e.g., a low number of initial variables), serial computation will be faster. R takes time to distribute tasks across the processors and to bind the results together afterwards. Therefore, if the time for distributing and gathering the pieces is greater than the time needed for single-thread computing, parallelizing is not worthwhile.

ncores

An integer value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores used is one less than the number of cores on the machine.
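
As a rough illustration of the default described for ncores, the following sketch computes "number of cores minus one" with the base parallel package. That qselection obtains the core count through detectCores() is an assumption made here for illustration; the variable names are hypothetical.

```r
library(parallel)

# Number of cores reported by the machine; detectCores() may return NA
# on platforms where the count cannot be determined.
n_available <- detectCores()

# Assumed default rule: use all cores but one, and never fewer than one.
ncores <- if (is.na(n_available)) 1L else max(1L, n_available - 1L)
ncores
```

On a machine with one or two cores this leaves a single worker, which is a case where cluster = FALSE is likely the better choice.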

Value

q

A vector of subset sizes.

criterion

A vector of information criterion values.

selection

Selected variables for each size.

Author(s)

Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardinas.

See Also

selection, plot.qselection.

Examples

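library(FWDselect)
data(diabetes)
# Columns 2 to 11 are used as covariates; column 1 is the response.
x <- diabetes[, 2:11]
y <- diabetes[, 1]
# Select the best subset for each size from 1 to 9, computed serially.
obj2 <- qselection(x, y, qvector = 1:9, method = "lm",
                   criterion = "variance", cluster = FALSE)
obj2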
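
To make the criterion and nfolds arguments more concrete, the sketch below computes a 5-fold cross-validated prediction variance and the AIC, AICc and BIC for a single linear model, using only base R and the built-in mtcars data. It is not taken from the FWDselect source: the fold assignment, the use of mean squared prediction error as the "variance", and the parameter count used in the AICc correction are all assumptions made for illustration.

```r
# Illustrative only: one fixed model on a built-in dataset (not FWDselect code).
set.seed(1)
nfolds <- 5
n <- nrow(mtcars)

# Assumed fold assignment: random labels of (nearly) equal size.
fold <- sample(rep(seq_len(nfolds), length.out = n))

# Cross-validated mean squared prediction error, one value per fold.
cv_err <- numeric(nfolds)
for (i in seq_len(nfolds)) {
  train <- mtcars[fold != i, ]
  test  <- mtcars[fold == i, ]
  fit_i <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit_i, newdata = test)
  cv_err[i] <- mean((test$mpg - pred)^2)
}
cv_variance <- mean(cv_err)

# Likelihood-based criteria, computed on the full data without cross-validation.
fit <- lm(mpg ~ wt + hp, data = mtcars)
k <- attr(logLik(fit), "df")   # estimated parameters, including the error variance
aic  <- AIC(fit)
bic  <- BIC(fit)
aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)   # finite-sample correction

c(cv_variance = cv_variance, aic = aic, aicc = aicc, bic = bic)
```

In qselection a value of this kind would be obtained for the best subset of each size in qvector, and the size with the smallest value is the one marked with an asterisk.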

sestelo/fwdselect documentation built on May 29, 2019, 6:58 p.m.