# VARselect

## Description

Estimation in the regression model Y = X β + σ N(0,1).
Variable selection by choosing the best predictor among those produced by
different methods, such as lasso, elastic net, adaptive lasso, PLS and random forest.
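
To make the model concrete, the following base R sketch simulates data from Y = X β + σ N(0,1) with a sparse β. The sample size, dimension, noise level and coefficient values are illustrative assumptions, not package defaults, and the sketch does not call any LINselect function.

```r
# Minimal sketch: simulate Y = X beta + sigma * N(0,1) with a sparse beta.
# n, p, sigma and beta are illustrative choices, not package defaults.
set.seed(1)
n <- 50
p <- 20
sigma <- 1

X <- matrix(rnorm(n * p), nrow = n, ncol = p)   # n rows, p covariates
beta <- c(rep(2.5, 3), rep(0, p - 3))           # only the first 3 covariates matter
Y <- as.vector(X %*% beta + sigma * rnorm(n))   # response vector with n components

# The support that a variable selection procedure would ideally recover
true_support <- which(beta != 0)
```

A `Y` and `X` pair of this shape is what the function expects as its first two arguments.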

## Usage

```r
VARselect(Y, X, dmax = NULL, normalize = TRUE,
          method = c("lasso", "ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive"),
          pen.crit = NULL,
          lasso.dmax = NULL, ridge.dmax = NULL, pls.dmax = NULL, en.dmax = NULL,
          ALridge.dmax = NULL, ALpls.dmax = NULL, rF.dmax = NULL,
          exhaustive.maxdim = 5e+05, exhaustive.dmax = NULL,
          en.lambda = c(0.01, 0.1, 0.5, 1, 2, 5),
          ridge.lambda = c(0.01, 0.1, 0.5, 1, 2, 5),
          rF.lmtry = 2, pls.ncomp = 5,
          ALridge.lambda = c(0.01, 0.1, 0.5, 1, 2, 5), ALpls.ncomp = 5,
          max.steps = NULL, K = 1.1, verbose = TRUE, long.output = FALSE)
```

## Arguments

| Argument | Description |
| --- | --- |
| `Y` | vector with n components: response variable. |
| `X` | matrix with n rows and p columns: covariates. |
| `dmax` | integer: maximum number of variables in the lasso estimator. `dmax` ≤ D, where D = min(3*p/4, n-5) if p ≥ n and D = min(p, n-5) if p < n. Default: `dmax` = D. |
| `normalize` | logical: if TRUE, the columns of X are scaled. |
| `method` | vector of characters whose components are a subset of "lasso", "ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive". |
| `pen.crit` | vector with `dmax`+1 components: for d = 0, ..., `dmax`, `pen.crit[d+1]` gives the value of the penalty for dimension d. Default: `pen.crit` = NULL. In that case, the penalty is calculated by the function `penalty`. |
| `lasso.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `ridge.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `pls.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `en.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `ALridge.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `ALpls.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `rF.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `exhaustive.maxdim` | integer: maximum number of subsets of covariates considered in the exhaustive method. See Details. |
| `exhaustive.dmax` | integer no greater than `dmax`, default = `dmax`. |
| `en.lambda` | vector: tuning parameter of the ridge. It is the input parameter `lambda` of the function `enet`. |
| `ridge.lambda` | vector: tuning parameter of the ridge. It is the input parameter `lambda` of the function `lm.ridge`. |
| `rF.lmtry` | vector: tuning parameter `mtry` of the function `randomForest`, with `mtry` = p/`rF.lmtry`. |
| `pls.ncomp` | integer: tuning parameter of the pls. It is the input parameter `ncomp` of the function `plsr`. See Details. |
| `ALridge.lambda` | similar to `ridge.lambda`, in the adaptive lasso procedure. |
| `ALpls.ncomp` | similar to `pls.ncomp`, in the adaptive lasso procedure. See Details. |
| `max.steps` | integer: maximum number of steps in the lasso procedure. Corresponds to the input `max.steps` of the function `enet`. Default: `max.steps` = 2*min(p,n). |
| `K` | scalar: value of the parameter K in the LINselect criterion. |
| `verbose` | logical: if TRUE, a trace of the current process is displayed in real time. |
| `long.output` | logical: if FALSE, only the component `summary` is returned. See Value. |
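
The upper bound D on `dmax` can be written as a small helper. This is a sketch of the rule stated for `dmax`, not the package's own code: the function name `default_dmax` is hypothetical, and rounding down with `floor` is an assumption made here because `dmax` is described as an integer while 3*p/4 need not be one.

```r
# Hypothetical helper: upper bound D for dmax as described in the arguments.
# floor() is an assumption (3 * p / 4 may not be an integer).
default_dmax <- function(n, p) {
  if (p >= n) {
    floor(min(3 * p / 4, n - 5))   # high-dimensional case: p >= n
  } else {
    floor(min(p, n - 5))           # low-dimensional case: p < n
  }
}

D_square <- default_dmax(n = 100, p = 100)   # min(75, 95)
D_small  <- default_dmax(n = 100, p = 20)    # min(20, 95)
D_wide   <- default_dmax(n = 30,  p = 200)   # min(150, 25)
```

In the wide case the bound is driven by n - 5 rather than by p, so adding covariates does not raise the maximum model size.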

## Details

When method is `pls` or `ALpls`, the `LINselect` procedure is carried out considering the number of components in the `pls` method as the tuning parameter.
This tuning parameter varies from 1 to `pls.ncomp`.

When method is `exhaustive`, the maximum number of variables d is calculated as follows.
Let q be the largest integer such that `choose(p,q)` < `exhaustive.maxdim`. Then d = `min(q, exhaustive.dmax,dmax)`.
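
The rule for d can be sketched in a few lines of R. The function name `exhaustive_dim` is hypothetical. Because `choose(p, q)` is not monotone in q, the sketch assumes the intended reading is the largest q such that every subset size from 1 to q stays below `exhaustive.maxdim`; the package may implement this differently.

```r
# Hypothetical helper: maximum subset size d for the exhaustive method.
# Assumption: q grows from 0 while the next subset size still satisfies
# choose(p, q + 1) < exhaustive.maxdim.
exhaustive_dim <- function(p, dmax, exhaustive.dmax = dmax,
                           exhaustive.maxdim = 5e+05) {
  q <- 0
  while (q < p && choose(p, q + 1) < exhaustive.maxdim) {
    q <- q + 1
  }
  min(q, exhaustive.dmax, dmax)
}

# With p = 100: choose(100, 3) = 161700 < 5e+05 but choose(100, 4) = 3921225
d_capped_by_q    <- exhaustive_dim(p = 100, dmax = 75)
d_capped_by_user <- exhaustive_dim(p = 100, dmax = 75, exhaustive.dmax = 2)
```

With 100 covariates the combinatorial bound, not `dmax`, is what limits the exhaustive search unless `exhaustive.dmax` is set lower still.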

## Value

A list with at least `length(method)` components.
For each procedure in `method` a list with components

• `support`: vector of integers. Estimated support of the parameters β for the considered procedure.

• `crit`: scalar equal to the LINselect criterion calculated on the estimated support.

• `fitted`: vector with length n. Fitted value of the response calculated when the support of β equals `support`.

• `coef`: vector whose first component is the estimated intercept.
The other components are the estimated non zero coefficients when the support of β equals `support`.

If `length(method)` > 1, the additional component `summary` is a list with three components:

• `support`: vector of integers. Estimated support of the parameters β corresponding to the minimum of the criterion among all procedures.

• `crit`: scalar. Minimum value of the criterion among all procedures.

• `method`: vector of characters. Names of the procedures for which the minimum is reached.

If `pen.crit = NULL`, the component `pen.crit` gives the values of the penalty calculated by the function `penalty`.

If `long.output` is TRUE, the component named `chatty` is a list with `length(method)` components.
For each procedure in `method`, a list with components

• `support` where `support[[l]]` is a vector of integers containing an estimator of the support of the parameters β.

• `crit`: vector where `crit[l]` contains the value of the LINselect criterion calculated on `support[[l]]`.
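
The relationship between the per-procedure components and `summary` can be illustrated with a hand-built list. The values below are invented for illustration and do not come from any run of `VARselect`; the sketch only mimics the documented structure and the rule that `summary` reports the minimum of the criterion across procedures.

```r
# Hypothetical result, hand-built to mimic the documented structure.
# The supports and criterion values are invented.
res <- list(
  lasso = list(support = c(1L, 2L, 3L),     crit = 12.4),
  ridge = list(support = c(1L, 2L, 3L, 7L), crit = 13.1),
  en    = list(support = c(1L, 2L, 3L),     crit = 12.4)
)

procs <- names(res)
crits <- vapply(res[procs], function(m) m$crit, numeric(1))

# Procedures reaching the minimum criterion (ties keep every name)
best <- procs[crits == min(crits)]

# A list shaped like the documented `summary` component
summary_like <- list(
  support = res[[best[1]]]$support,
  crit    = min(crits),
  method  = best
)
```

On a real result the same fields would be read directly, for example `res$summary$support`, once more than one method has been requested.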

## Note

When method is `lasso`, library `elasticnet` is loaded.

When method is `en`, library `elasticnet` is loaded.

When method is `ridge`, library `MASS` is loaded.

When method is `rF`, library `randomForest` is loaded.

When method is `pls`, library `pls` is loaded.

When method is `ALridge`, libraries `MASS` and `elasticnet` are loaded.

When method is `ALpls`, libraries `pls` and `elasticnet` are loaded.

When method is `exhaustive`, library `gtools` is loaded.
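
Since each method loads additional libraries, it can help to check that they are installed before a long run. The sketch below encodes the list above as a lookup table; the names `method_pkgs`, `needed` and `not_installed` are hypothetical and not part of the package.

```r
# Lookup table transcribed from the notes above (method -> libraries loaded).
method_pkgs <- list(
  lasso      = "elasticnet",
  en         = "elasticnet",
  ridge      = "MASS",
  rF         = "randomForest",
  pls        = "pls",
  ALridge    = c("MASS", "elasticnet"),
  ALpls      = c("pls", "elasticnet"),
  exhaustive = "gtools"
)

# Libraries needed for a chosen subset of methods
chosen <- c("lasso", "ALpls")
needed <- unique(unlist(method_pkgs[chosen]))

# Names of needed libraries that are not installed (empty if all are present)
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]
```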

## Author(s)

Yannick Baraud, Christophe Giraud, Sylvie Huet

## Examples

```r
#source("charge.R")
library("LINselect")

# simulate data with
# beta=c(rep(2.5,5),rep(1.5,5),rep(0.5,5),rep(0,p-15))
ex <- simulData(p=100,n=100,r=0.8,rSN=5)

## Not run: ex1.VARselect <- VARselect(ex$Y,ex$X,exhaustive.dmax=2)

## Not run: data(diabetes)
## Not run: attach(diabetes)
## Not run: ex.diab <- VARselect(y,x2,exhaustive.dmax=5)
## Not run: detach(diabetes)
```

