quadrupen-package | R Documentation |
Fits classical sparse regression models with efficient active set algorithms by solving quadratic problems as described by Grandvalet, Chiquet and Ambroise (2017) \Sexpr[results=rd]{tools:::Rd_expr_doi("10.48550/arXiv.1210.2077")}. Also provides a few methods for model selection purpose (cross-validation, stability selection).
This package is designed to fit accurately several popular penalized linear regression models using the algorithm proposed in Grandvalet, Chiquet and Ambroise (submitted) by solving quadratic problems with increasing size.
At the moment, two R
fitting functions are available:
the elastic.net
function, which solves a family of
linear regression problems penalized by a mixture of
\ell_1
and \ell_2
norms. It notably includes
the LASSO (Tibshirani, 1996), the adaptive-LASSO (Zou, 2006), the
Elastic-net (Zou and Hastie, 2006) or the Structured Elastic-net
(Slawski et al., 2010). See examples as well as the available
demo(quad_enet)
.
the bounded.reg
function, which fits a linear model
penalized by a mixture of \ell_\infty
and
\ell_2
norms. It owns the same versatility as the
elastic.net
function regarding the \ell_2
norm,
yet the \ell_1
-norm is replaced by the infinity
norm. Check demo(quad_breg)
and examples.
The problem commonly solved for these two functions writes
βhatλ1,λ2 = argminβ 1/2 RSS(β) + λ1 | D β |q + λ/2 2βT S β,
where
q=1
for elastic.net
and
q=\infty
for bounded.reg
. The diagonal
matrix D
allows different weights for the first part of
the penalty. The structuring matrix S
can be used to
introduce some prior information regarding the predictors. It is
provided via a positive semidefinite matrix.
The S4 objects produced by the fitting procedures own the
classical methods for linear model in R
, as well as methods
for plotting, (double) cross-validation and for the stability
selection procedure of Meinshausen and Buhlmann (2010).
All the examples of this documentation have been included to the package source, in the 'examples' directory. Some (too few!) routine testing scripts using the testhat package are also present in the 'tests' directory, where we check basic functionalities of the code, especially the reproducibility of the Lasso/Elastic-net solution path with the lars, elasticnet and glmnet packages. We also check the handling of runtime errors or unstabilities.
The general strategy of the algorithm relies on maintaining an
active set of variables, starting from a vector of zeros. The
underlying optimization problem is solved only on the activated
variables, thus handling with small smooth problems with
increasing size. Hence, by considering a decreasing grid of values
for the penalty \lambda_1
and fixing
\lambda_2
, we may explore the whole path of
solutions at a reasonable numerical cost, providing that
\lambda_1
does not end up too small.
For the \ell_1
-based methods (available in the
elastic.net
function), the size of the underlying problems
solved is related to the number of nonzero coefficients in the
vector of parameters. With the \ell_\infty
-norm,
(available in the boundary.reg
function), we do not produce
sparse estimator. Nevertheless, the size of the systems solved
along the path deals with the number of unbounded variables for
the current penalty level, which is quite smaller than the number
of predictors for a reasonable \lambda_1
. The same
kind of proposal was made in Zhao, Rocha and Yu (2009).
Underlying optimization is performed by direct resolution of
quadratic sub problems, which is the main purpose of this
package. This strategy is thoroughly exposed in Grandvalet,
Chiquet and Ambroise (submitted). Still, we also implemented the
popular and versatile proximal (FISTA) approaches for routine
checks and numerical comparisons. A coordinate descent approach is
also included, yet only for the elastic.net
fitting
procedure.
The default setting uses the quadratic approach that gives its
name to the package. It has been optimized to be the method of
choice for small and medium scale problems, and produce very
accurate solutions. However, the first order methods (coordinate
descent and FISTA) can be interesting in situations where the
problem is close to singular, in which case the Cholesky
decomposition used in the quadratic solver can be computationally
unstable. Though it is extremely unlikely for
elastic.net
– and if so, we encourage the user to
send us back any report of such an event –, this happens at times
with bounded.reg
. Regarding this issue, we let the
possibility for the user to run the optimization of the
bounded.reg
criterion in a (hopefully) 'bulletproof'
mode: using mainly the fast and accurate quadratic approach, it
switches to the slower but more robust proximal resolution when
unstability is detected.
Most of the numerical work is done in C++, relying on the RcppArmadillo package. We also provide a (double) cross-validation procedure and functions for stability selection, both using the multi-core capability of the computer, through the parallel package. This feature is not available for Windows user, though. Finally, note that the plot methods enjoy some (still very few) of the capabilities of the ggplot2 package.
We hope to enrich quadrupen with other popular fitting procedures and develop other statistical tools, particularly towards bootstrapping and model selection purpose. Sparse matrix encoding is partially supported at the moment, and will hopefully be thoroughly available in the future, thanks to upcoming updates of the great RcppArmadillo package.
Maintainer: Julien Chiquet julien.chiquet@inrae.fr (ORCID)
Julien Chiquet julien.chiquet@inrae.fr
Yves Grandvalet, Julien Chiquet and Christophe Ambroise, Sparsity by Worst-case Quadratic Penalties, arXiv preprint, 2012.
Nicolas Meinshausen and Peter Buhlmann. Stability Selection, JRSS(B), 2010.
Martin Slawski, Wolfgang zu Castell, and Gerhard Tutz. Feature selection guided by structural information, AOAS, 2010.
Peng Zhao, Guillerme Rocha and Bin Yu. The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, 2009.
Hui Zou. The Adaptive Lasso and Its Oracle Properties, JASA, 2006.
Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net, JRSS(B), 2006.
Robert Tibshirani. Regression Shrinkage and Selection via the Lasso, JRSS(B), 1996.
Useful links:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.