This package is designed to fit accurately several popular penalized linear regression models using the algorithm proposed in Grandvalet, Chiquet and Ambroise (submitted) by solving quadratic problems with increasing size.
At the moment, two R fitting functions are available:

1. the elastic.net function, which solves a family of linear regression problems penalized by a mixture of l1 and l2 norms. It notably includes the LASSO (Tibshirani, 1996), the adaptive-LASSO (Zou, 2006), the Elastic-net (Zou and Hastie, 2005) and the Structured Elastic-net (Slawski et al., 2010). See the examples as well as the available demos.

2. the bounded.reg function, which fits a linear model penalized by a mixture of infinity and l2 norms. It has the same versatility as the elastic.net function regarding the l2 norm, yet the l1 norm is replaced by the infinity norm. See demo(quad_breg) and the examples.
The problem commonly solved by these two functions writes

    beta.hat(lambda1, lambda2) = argmin over beta of
        1/2 ||y - X beta||^2 + lambda1 ||D beta||_q + lambda2/2 t(beta) S beta,

with q = 1 for elastic.net and q = infinity for bounded.reg. The diagonal matrix D allows different weights in the first part of the penalty. The structuring matrix S can be used to introduce prior information regarding the predictors; it must be provided as a positive semidefinite matrix.
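For instance, a natural choice of S in the spirit of the Structured Elastic-net is the Laplacian of a graph connecting neighboring predictors. The short sketch below (our illustration, not code from the package) builds the first-difference Laplacian for predictors ordered along a line and checks that it is a valid positive semidefinite structuring matrix:

```r
# Structuring matrix S for p predictors ordered along a line:
# S = t(Dm) %*% Dm, where Dm is the first-difference operator.
# Illustration of a valid positive semidefinite S, not quadrupen code.
p <- 10
Dm <- diff(diag(p))          # (p-1) x p first-difference matrix
S  <- crossprod(Dm)          # Laplacian of the path graph, p x p
# S is symmetric positive semidefinite, as required for the penalty term
eigenvalues <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
```

Any graph Laplacian (or more generally any Gram matrix) works the same way; the constant vector lies in the null space of this particular S, so only differences between neighboring coefficients are penalized.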
The S4 objects produced by the fitting procedures provide the classical methods for linear models in R, as well as methods for plotting, (double) cross-validation and the stability selection procedure of Meinshausen and Bühlmann (2010).
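A minimal usage sketch, guarded so that it only runs when quadrupen is installed; argument names beyond x and y follow our reading of the package documentation and should be checked against it:

```r
# Hedged usage sketch of the two fitters and companion procedures;
# the quadrupen calls only run when the package is available.
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(x %*% c(rep(2, 5), rep(0, 15))) + rnorm(100)
if (requireNamespace("quadrupen", quietly = TRUE)) {
  fit  <- quadrupen::elastic.net(x, y, lambda2 = 0.1) # solution path in lambda1
  plot(fit)                                           # ggplot2-based path plot
  cv   <- quadrupen::crossval(x, y)                   # (double) cross-validation
  stab <- quadrupen::stability(x, y)                  # stability selection
}
```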
All the examples of this documentation have been included in the package source, in the 'examples' directory. Some (still too few!) routine testing scripts using the testthat package are also present in the 'tests' directory, where we check basic functionalities of the code, in particular the reproducibility of the Lasso/Elastic-net solution path against the lars, elasticnet and glmnet packages. We also check the handling of runtime errors and instabilities.
The general strategy of the algorithm relies on maintaining an active set of variables, starting from a vector of zeros. The underlying optimization problem is solved only on the activated variables, thus handling small, smooth problems of increasing size. Hence, by considering a decreasing grid of values for the penalty lambda1 while keeping lambda2 fixed, the whole path of solutions can be explored at a reasonable numerical cost, provided that lambda1 does not end up too small.
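To make the active-set idea concrete, here is a toy version for the plain lasso part of the criterion (an illustration of the strategy, not quadrupen's actual C++ solver): the problem is solved only on the currently active variables, then the KKT conditions are scanned to activate violating coordinates.

```r
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Toy active-set solver for  min_beta 0.5 ||y - X beta||^2 + lambda ||beta||_1.
# The restricted problem is solved by coordinate descent on the active set;
# inactive variables are activated when they violate the KKT conditions.
active_set_lasso <- function(X, y, lambda, tol = 1e-8, sweeps = 500) {
  p <- ncol(X); beta <- numeric(p); active <- integer(0)
  repeat {
    for (s in seq_len(sweeps)) {                 # solve on the active set only
      for (j in active) {
        r <- y - X %*% beta + X[, j] * beta[j]   # partial residual
        beta[j] <- soft(sum(X[, j] * r), lambda) / sum(X[, j]^2)
      }
    }
    grad <- drop(crossprod(X, y - X %*% beta))   # KKT: |x_j'(y - X beta)| <= lambda
    viol <- setdiff(which(abs(grad) > lambda + tol), active)
    if (length(viol) == 0) return(beta)
    active <- c(active, viol[which.max(abs(grad[viol]))])
  }
}
```

Warm-starting this routine along a decreasing grid of lambda values reuses the previous active set, which is what keeps the cost of the whole path low.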
For the l1-based methods (available in the elastic.net function), the size of the underlying problems solved is related to the number of nonzero coefficients in the vector of parameters. With the infinity norm (available in the bounded.reg function), we do not produce sparse estimators. Nevertheless, the size of the systems solved along the path corresponds to the number of unbounded variables at the current penalty level, which is typically much smaller than the number of predictors for a reasonable lambda1. The same kind of proposal was made in Zhao, Rocha and Yu (2009).
Underlying optimization is performed by direct resolution of quadratic subproblems, which is the main purpose of this package. This strategy is thoroughly exposed in Grandvalet, Chiquet and Ambroise (submitted). Still, we also implemented the popular and versatile proximal (FISTA) approaches for routine checks and numerical comparisons. A coordinate descent approach is also included, yet only for the elastic.net function.
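For reference, the proximal approach on the same lasso criterion can be sketched in a few lines (a generic FISTA, not the package's implementation):

```r
# Generic FISTA for  min_beta 0.5 ||y - X beta||^2 + lambda ||beta||_1:
# gradient step on the smooth part, soft-thresholding as the proximal
# operator, plus Nesterov's momentum term.
fista_lasso <- function(X, y, lambda, iters = 500) {
  L <- max(eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values)
  soft <- function(u, a) sign(u) * pmax(abs(u) - a, 0)
  beta <- z <- numeric(ncol(X)); t <- 1
  for (k in seq_len(iters)) {
    grad <- drop(crossprod(X, X %*% z - y))      # gradient of the smooth part
    beta_new <- soft(z - grad / L, lambda / L)   # proximal (soft-threshold) step
    t_new <- (1 + sqrt(1 + 4 * t^2)) / 2
    z <- beta_new + ((t - 1) / t_new) * (beta_new - beta)
    beta <- beta_new; t <- t_new
  }
  beta
}
```

Such first-order iterations only require matrix-vector products, which is why they remain stable when the quadratic subproblems become ill-conditioned.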
The default setting uses the quadratic approach that gives its name to the package. It has been optimized to be the method of choice for small and medium scale problems, and produces very accurate solutions. However, the first-order methods (coordinate descent and FISTA) can be interesting in situations where the problem is close to singular, in which case the Cholesky decomposition used in the quadratic solver can be numerically unstable. Though this is extremely unlikely for elastic.net (and if it occurs, we encourage the user to send us a report of the event), it happens at times with bounded.reg. Regarding this issue, we give the user the possibility to run the optimization of the bounded.reg criterion in a (hopefully) 'bulletproof' mode: it mainly uses the fast and accurate quadratic approach, but switches to the slower yet more robust proximal resolution when instability is detected.
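The 'bulletproof' idea can be pictured with a generic pattern (our sketch, not quadrupen's internal code): attempt the fast Cholesky-based resolution of the quadratic subproblem, and fall back to a more robust scheme when the factorization fails.

```r
# Solve A x = b for a symmetric positive (semi)definite A: fast Cholesky
# route first, robust SVD pseudo-inverse fallback when A is near singular.
# Generic illustration of the try-fast / fall-back-robust pattern.
solve_spd <- function(A, b) {
  tryCatch({
    R <- chol(A)                         # fails when A is not positive definite
    backsolve(R, forwardsolve(t(R), b))
  }, error = function(e) {
    s <- svd(A)                          # robust fallback: pseudo-inverse
    d <- ifelse(s$d > 1e-10 * max(s$d), 1 / s$d, 0)
    drop(s$v %*% (d * crossprod(s$u, b)))
  })
}
```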
Most of the numerical work is done in C++, relying on the RcppArmadillo package. We also provide a (double) cross-validation procedure and functions for stability selection, both exploiting the multi-core capability of the computer through the parallel package. This feature is not available for Windows users, though. Finally, note that the plot methods enjoy some (still very few) of the capabilities of the ggplot2 package.
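The parallel machinery can be illustrated generically (a toy K-fold cross-validation of ordinary least squares using parallel::mclapply, the same mechanism the cross-validation and stability procedures rely on; on Windows, mclapply only runs serially with mc.cores = 1):

```r
library(parallel)

# Toy K-fold cross-validation with folds evaluated across cores via
# mclapply; illustrates the mechanism only, not the package's own code.
cv_mse <- function(X, y, K = 5, cores = 1) {
  folds <- split(sample(nrow(X)), rep(seq_len(K), length.out = nrow(X)))
  errs <- mclapply(folds, function(idx) {
    fit <- lm.fit(X[-idx, , drop = FALSE], y[-idx])
    mean((y[idx] - X[idx, , drop = FALSE] %*% fit$coefficients)^2)
  }, mc.cores = cores)
  mean(unlist(errs))
}
```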
We hope to enrich quadrupen with other popular fitting procedures and to develop further statistical tools, particularly towards bootstrapping and model selection purposes. Sparse matrix encoding is only partially supported at the moment, and will hopefully be thoroughly available in the future, thanks to upcoming updates of the great RcppArmadillo package.
Julien Chiquet [email protected]
Yves Grandvalet, Julien Chiquet and Christophe Ambroise, Sparsity by Worst-case Quadratic Penalties, arXiv preprint, 2012.
Nicolas Meinshausen and Peter Bühlmann. Stability Selection, JRSS(B), 2010.
Martin Slawski, Wolfgang zu Castell, and Gerhard Tutz. Feature selection guided by structural information, AOAS, 2010.
Peng Zhao, Guilherme Rocha and Bin Yu. The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, 2009.
Hui Zou. The Adaptive Lasso and Its Oracle Properties, JASA, 2006.
Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net, JRSS(B), 2005.
Robert Tibshirani. Regression Shrinkage and Selection via the Lasso, JRSS(B), 1996.