
This package is designed to accurately fit several popular penalized linear regression models, using the algorithm proposed in Grandvalet, Chiquet and Ambroise (submitted), which solves quadratic problems of increasing size.

At the moment, two `R` fitting functions are available:

- the `elastic.net` function, which solves a family of linear regression problems penalized by a mixture of *l1* and *l2* norms. It notably includes the LASSO (Tibshirani, 1996), the adaptive-LASSO (Zou, 2006), the Elastic-net (Zou and Hastie, 2006) and the Structured Elastic-net (Slawski et al., 2010). See the examples as well as the available `demo(quad_enet)`.
- the `bounded.reg` function, which fits a linear model penalized by a mixture of *infinity* and *l2* norms. It offers the same flexibility as the `elastic.net` function regarding the *l2* norm, but the *l1* norm is replaced by the infinity norm. Check `demo(quad_breg)` and the examples.

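The problem commonly solved by these two functions is

$$\hat{\beta}_{\lambda_1,\lambda_2} = \arg\min_{\beta} \frac{1}{2} (y - X\beta)^\top (y - X\beta) + \lambda_1 \|D\beta\|_{q} + \frac{\lambda_2}{2} \beta^\top S \beta,$$

where $q = 1$ for `elastic.net` and $q = \infty$ for `bounded.reg`. The diagonal matrix $D$ assigns a weight to each predictor in the first part of the penalty (as in the adaptive-LASSO), while the symmetric positive semidefinite matrix $S$ structures the *l2* part (as in the Structured Elastic-net; taking $S$ as the identity gives the Elastic-net). The S4 objects produced by the fitting procedures come with the classical methods for linear models in `R`, as well as methods for plotting, (double) cross-validation and the stability selection procedure of Meinshausen and Buhlmann (2010).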
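To make the fitted criterion concrete, here is a minimal base-R sketch that evaluates it for a given coefficient vector, assuming it takes the form $\frac{1}{2}\|y - X\beta\|^2 + \lambda_1 \|D\beta\|_q + \frac{\lambda_2}{2}\beta^\top S\beta$ with $q = 1$ (as in `elastic.net`) or $q = \infty$ (as in `bounded.reg`). The function `penalized_criterion` and its arguments are hypothetical illustrations, not part of the quadrupen API; the diagonal matrix $D$ is passed as the vector `d` of its diagonal entries.

```r
# Hypothetical helper (not part of quadrupen): evaluates
#   1/2 * ||y - X beta||^2 + lambda1 * ||D beta||_q + lambda2/2 * beta' S beta
# with D = diag(d); q = 1 mimics elastic.net, q = Inf mimics bounded.reg.
penalized_criterion <- function(beta, X, y, lambda1, lambda2,
                                d = rep(1, length(beta)),
                                S = diag(length(beta)), q = 1) {
  stopifnot(q == 1 || is.infinite(q))
  resid <- y - X %*% beta
  half_rss <- 0.5 * sum(resid^2)
  weighted <- d * beta                      # D beta for a diagonal D
  norm_q <- if (is.infinite(q)) max(abs(weighted)) else sum(abs(weighted))
  ridge <- 0.5 * lambda2 * drop(t(beta) %*% S %*% beta)
  half_rss + lambda1 * norm_q + ridge
}
```

With `X = diag(2)`, `y = c(3, 1)`, `beta = c(2, -1)` and `lambda1 = lambda2 = 1`, the only term that differs between the two penalties is the norm: 3 for the *l1* norm against 2 for the infinity norm.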

All the examples of this documentation are included in the package source, in the 'examples' directory. Some (too few!) routine testing scripts using the testthat package are also present in the 'tests' directory. They check basic functionalities of the code, especially that the Lasso/Elastic-net solution paths agree with those computed by the lars, elasticnet and glmnet packages. We also check the handling of runtime errors and instabilities.

The general strategy of the algorithm relies on maintaining an
active set of variables, starting from a vector of zeros. The
underlying optimization problem is solved only on the activated
variables, thus handling small, smooth problems of
increasing size. Hence, by considering a decreasing grid of values
for the penalty *lambda1* while fixing
*lambda2*, we can explore the whole path of
solutions at a reasonable numerical cost, provided that
*lambda1* does not end up too small.
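The path needs a starting value for *lambda1*. At $\beta = 0$ the *l2* part of the penalty has a zero gradient, so $\beta = 0$ is optimal as soon as $\lambda_1$ is at least the dual norm of $D^{-1}X^\top y$, that is $\max_j |x_j^\top y|/d_j$ for the *l1* norm and $\sum_j |x_j^\top y|/d_j$ for the infinity norm. The sketch below builds a decreasing, log-spaced grid from that value, assuming centered `y` and centered columns of `X` (no intercept) and positive weights; `lambda1_grid`, `n_lambda` and `min_ratio` are hypothetical names chosen for illustration, not quadrupen's own arguments.

```r
# Hypothetical helper: decreasing, log-spaced grid of lambda1 values.
# Assumes centred y and columns of X (no intercept) and positive weights d.
lambda1_grid <- function(X, y, d = rep(1, ncol(X)), q = 1,
                         n_lambda = 10, min_ratio = 1e-2) {
  stopifnot(q == 1 || is.infinite(q), all(d > 0))
  score <- abs(drop(crossprod(X, y))) / d   # |x_j' y| / d_j
  # Dual norm of the penalty: max for the l1 norm, sum for the infinity norm
  lambda_max <- if (is.infinite(q)) sum(score) else max(score)
  # Log-spaced values from lambda_max down to min_ratio * lambda_max
  exp(seq(log(lambda_max), log(min_ratio * lambda_max), length.out = n_lambda))
}
```

Each value of such a grid would then be fitted in turn, starting from the previous solution (a warm start), so that the active set grows gradually as *lambda1* decreases.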

For the *l1*-based methods (available in the
`elastic.net` function), the size of the underlying problems
is related to the number of nonzero coefficients in the
vector of parameters. With the *infinity* norm
(available in the `bounded.reg` function), the estimator
is not sparse. Nevertheless, the size of the systems solved
along the path depends on the number of unbounded variables at
the current penalty level, which is much smaller than the number
of predictors for a reasonable *lambda1*. A similar
proposal was made in Zhao, Rocha and Yu (2009).
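The difference between the two penalties can be illustrated by listing which coefficients of a given solution act as free variables in the quadratic subproblem. The helper below is hypothetical, assumes unit weights ($D$ equal to the identity) and uses a numerical tolerance `tol` of our choosing: for the *l1* norm it returns the nonzero coefficients, and for the infinity norm the unbounded ones, i.e. those strictly below the common bound $\|\beta\|_\infty$.

```r
# Hypothetical illustration (unit weights): indices of the coefficients that
# drive the size of the quadratic subproblem at a given solution beta.
subproblem_variables <- function(beta, q = 1, tol = 1e-10) {
  stopifnot(q == 1 || is.infinite(q))
  if (q == 1) {
    # l1 penalty: only the nonzero (active) coefficients are optimized
    which(abs(beta) > tol)
  } else {
    # infinity norm: coefficients at the bound |beta_j| = max_k |beta_k|
    # share one magnitude; the remaining (unbounded) ones are free
    which(abs(beta) < max(abs(beta)) - tol)
  }
}
```

In the infinity-norm case the bounded coefficients are tied to a single shared magnitude, so, roughly, the subproblem involves the unbounded coefficients plus that one extra scalar.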

The underlying optimization is performed by directly solving
quadratic subproblems, which is the main purpose of this
package. This strategy is described in detail in Grandvalet,
Chiquet and Ambroise (submitted). We also implemented the
popular and versatile proximal (FISTA) approach for routine
checks and numerical comparisons. A coordinate descent approach is
also included, but only for the `elastic.net` fitting
procedure.
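For comparison with the quadratic solver, the proximal approach can be sketched in a few lines of base R. The function below is a minimal FISTA iteration for the Elastic-net criterion, assuming $S$ equal to the identity and unit weights to keep it short; `fista_enet` and `soft_threshold` are hypothetical names, not the package's C++ implementation.

```r
# Soft-thresholding: proximal operator of tau * ||.||_1
soft_threshold <- function(z, tau) sign(z) * pmax(abs(z) - tau, 0)

# Minimal FISTA sketch for
#   1/2 ||y - X b||^2 + lambda1 ||b||_1 + lambda2/2 ||b||^2
fista_enet <- function(X, y, lambda1, lambda2, max_iter = 5000, tol = 1e-10) {
  XtX <- crossprod(X)
  Xty <- drop(crossprod(X, y))
  # Lipschitz constant of the gradient of the smooth part
  L <- max(eigen(XtX, symmetric = TRUE, only.values = TRUE)$values) + lambda2
  beta <- z <- rep(0, ncol(X))
  t_k <- 1
  for (k in seq_len(max_iter)) {
    # Gradient step on the smooth part at the extrapolated point z
    grad <- drop(XtX %*% z) - Xty + lambda2 * z
    beta_new <- soft_threshold(z - grad / L, lambda1 / L)
    # Nesterov momentum update
    t_new <- (1 + sqrt(1 + 4 * t_k^2)) / 2
    z <- beta_new + ((t_k - 1) / t_new) * (beta_new - beta)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    t_k <- t_new
    if (converged) break
  }
  beta
}
```

On an orthonormal design the solution is known in closed form, $\hat\beta_j = \mathrm{soft}(x_j^\top y, \lambda_1)/(1 + \lambda_2)$, which makes it a convenient sanity check of the kind the package's tests perform against other solvers.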

The default setting uses the quadratic approach that gives the
package its name. It has been optimized to be the method of
choice for small- and medium-scale problems, and it produces very
accurate solutions. However, the first-order methods (coordinate
descent and FISTA) can be useful when the
problem is close to singular, in which case the Cholesky
decomposition used by the quadratic solver can be numerically
unstable. Although this is extremely unlikely with
`elastic.net` (if it does happen, we encourage users to
send us a report), it occurs at times
with `bounded.reg`. For this reason, users can
run the optimization of the
`bounded.reg` criterion in a (hopefully) 'bulletproof'
mode: it mainly relies on the fast and accurate quadratic approach, but
switches to the slower yet more robust proximal resolution when
instability is detected.
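The 'bulletproof' idea can be sketched on a single quadratic subproblem, minimize $\frac{1}{2}b^\top A b - b^\top c$: try a Cholesky factorization first, and fall back to a first-order iteration when the factorization fails or $A$ is badly conditioned. This is an illustration under our own assumptions: the function name, the `rcond_min` threshold and the use of `rcond()` as the instability detector are hypothetical. In this smooth setting the proximal step reduces to a plain gradient step, whereas the package switches to its proximal resolution of the full `bounded.reg` criterion.

```r
# Hypothetical 'bulletproof' solver for  minimize 1/2 b'Ab - b'rhs
solve_quadratic_safely <- function(A, rhs, rcond_min = 1e-10,
                                   max_iter = 1e5, tol = 1e-10) {
  R <- tryCatch(chol(A), error = function(e) NULL)
  if (!is.null(R) && rcond(A) > rcond_min) {
    # Fast path: A = R'R, solve R'u = rhs, then R b = u
    beta <- backsolve(R, forwardsolve(t(R), rhs))
    return(list(beta = drop(beta), method = "cholesky"))
  }
  # Slow but robust path: gradient steps of size 1/L
  L <- max(eigen(A, symmetric = TRUE, only.values = TRUE)$values)
  beta <- rep(0, length(rhs))
  for (k in seq_len(max_iter)) {
    step <- (drop(A %*% beta) - rhs) / L
    beta <- beta - step
    if (max(abs(step)) < tol) break
  }
  list(beta = beta, method = "first-order")
}
```

On a singular but consistent system, such as a matrix of ones, the Cholesky factorization fails and the gradient iteration started from zero returns the minimum-norm solution.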

Most of the numerical work is done in C++, relying on the RcppArmadillo package. We also provide a (double) cross-validation procedure and functions for stability selection, both using the multi-core capability of the computer through the parallel package. This feature is not available to Windows users, though. Finally, note that the plot methods use some (still very few) of the capabilities of the ggplot2 package.
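The Windows restriction is presumably due to forking, which `parallel::mclapply()` relies on and which Windows does not support. A minimal sketch of K-fold cross-validation dispatched with `mclapply()` is shown below; to stay self-contained it uses a closed-form ridge fit as a stand-in for the package's penalized fits, and `cv_error_parallel` and `ridge_fit` are hypothetical names.

```r
library(parallel)

# Stand-in fit (assumption): closed-form ridge regression, no intercept
ridge_fit <- function(X, y, lambda2) {
  solve(crossprod(X) + lambda2 * diag(ncol(X)), crossprod(X, y))
}

# Hypothetical K-fold cross-validation, one fold per forked worker
cv_error_parallel <- function(X, y, lambda2, K = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))
  # Forking is unavailable on Windows, so use a single core there
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  fold_err <- mclapply(seq_len(K), function(k) {
    test <- folds == k
    beta <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda2)
    # Mean squared prediction error on the held-out fold
    mean((y[test] - X[test, , drop = FALSE] %*% beta)^2)
  }, mc.cores = cores)
  mean(unlist(fold_err))
}
```

Averaging the per-fold errors gives one cross-validated error per penalty value; a double cross-validation would repeat this over a grid of both penalties.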

We hope to enrich quadrupen with other popular fitting procedures and to develop other statistical tools, particularly for bootstrapping and model selection. Sparse matrix encoding is partially supported at the moment, and will hopefully be fully supported in the future, thanks to upcoming updates of the great RcppArmadillo package.

Julien Chiquet

Yves Grandvalet, Julien Chiquet and Christophe Ambroise. Sparsity by Worst-case Quadratic Penalties, arXiv preprint, 2012.

Nicolas Meinshausen and Peter Buhlmann. Stability Selection, JRSS(B), 2010.

Martin Slawski, Wolfgang zu Castell, and Gerhard Tutz. Feature selection guided by structural information, AOAS, 2010.

Peng Zhao, Guilherme Rocha and Bin Yu. The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, 2009.

Hui Zou. The Adaptive Lasso and Its Oracle Properties, JASA, 2006.

Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net, JRSS(B), 2006.

Robert Tibshirani. Regression Shrinkage and Selection via the Lasso, JRSS(B), 1996.
