```{r, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, fig.align = "center")
```
glmnet is an R package for fitting the lasso and elastic-net
regularization paths of generalized linear models and related
regression problems. It has been on CRAN since 2008. Over the
intervening years it has grown from a single coordinate-descent
Fortran kernel for Gaussian, binomial, multinomial, Poisson, and Cox
responses into a layered system in which any GLM family can be
fitted, in which large-scale genomic problems are tractable, and in
which every numerical kernel is now written in modern C++. The
package reached its current shape in identifiable steps, each
due to contributions from several people.
This vignette is a history of those steps and an acknowledgement of
the contributions. While some historical information is recorded in
NEWS.md, it omits much of the detail. We hope that this document goes beyond a
listing of enhancements and bug fixes to explain why the package has
its current structure, and attributes contributions accurately with
references to published work where appropriate.
The statistical backbone of glmnet is older than the package. The
lasso was introduced in @lasso and the elastic-net penalty, which
combines $\ell_1$ and $\ell_2$ regularization and smooths the variable
selection behavior of the lasso in the presence of correlated
features, was introduced in @elasticnet. The algorithmic insight,
worked out in @pathwise, that cyclical coordinate descent is
extraordinarily effective when combined with warm starts along a
decreasing sequence of regularization parameters $\lambda_1 >
\lambda_2 > \cdots > \lambda_K$ was implemented in the package's CRAN debut
(version 1.1-1). The path is computed once; each $\lambda_k$ solution
starts from the previous, and both the set of active variables and the
number of inner passes stay small.
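A minimal sketch of this pathwise usage (simulated data, default settings):

```r
# one call computes solutions for the whole decreasing lambda sequence,
# each warm-started from the previous one
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)

fit <- glmnet(x, y)
head(fit$lambda)               # lambda_1 > lambda_2 > ... > lambda_K
coef(fit, s = fit$lambda[10])  # coefficients at the 10th lambda
```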
Coordinate descent has a long history, and our team was not the first
to use it in conjunction with the lasso. Tibshirani's Ph.D. student
Wenjiang Fu at U. Toronto invented a version of coordinate descent
called the shooting algorithm in 1997. Ingrid Daubechies used
coordinate descent for $\ell_1$ image deblurring in a talk at Stanford
in 2002. In her Ph.D. thesis with Jacquie Meulman at U. Leiden in
2006, Anita van der Kooij used coordinate descent in fitting elastic
net problems. Others have used it as well. The pathwise use of
coordinate descent in glmnet leads to great efficiency, and allows
for a fairly seamless transition to different loss functions.
The canonical reference for the package itself is
@glmnet, published in the Journal of Statistical Software the year
after the initial release. That paper covers Gaussian, two-class
logistic, multinomial, and Poisson regression. The Cox
proportional-hazards model for right-censored survival data was
added in version 1.2 (2010) by Noah Simon, then a PhD student at
Stanford, and documented in @coxnet. At that point the available
families were exactly those exposed through the family argument
today: "gaussian", "binomial", "multinomial", "poisson",
"mgaussian", and "cox". Each one had its own Fortran subroutine,
and the dispatch in glmnet.R was a chain of if/else branches
keyed on the family string, along the lines of the condensed sketch below.
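A paraphrase (the fitter names are glmnet's actual internal routines; the control flow is condensed, not the verbatim source):

```r
# pre-v4 dispatch, paraphrased: each family string selected a dedicated
# Fortran-backed fitter
family <- "binomial"
fitter <- switch(family,
  gaussian    = "elnet",    # squared-error loss
  binomial    = ,           # falls through to lognet
  multinomial = "lognet",   # logistic / multinomial
  poisson     = "fishnet",  # Poisson log-likelihood
  mgaussian   = "mrelnet",  # multivariate Gaussian, grouped penalty
  cox         = "coxnet")   # Cox partial likelihood
fitter
```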
A word on the implementation language is warranted, because it
explains most of what came later. The Fortran sources were not written
by hand in Fortran but generated from Mortran, a macro language
layered over Fortran 77. In the spirit of open source, the full
Mortran sources live in inst/mortran/ and were maintained by Jerome
Friedman for over a decade. The combination was extremely fast but it
was difficult to extend and hard for newcomers to read. Every new
feature that touched the inner loop had to be expressed in Mortran (or
written directly in Fortran), regenerated into Fortran, and then debugged
under the limitations of that toolchain. Furthermore, Fortran was declining
in popularity, and C/C++ was increasingly the language of choice for fast
code interfacing with R. So too with glmnet.
The years between the first JSS paper and the 2019 v3.0 release were
spent in a steady broadening of what the package could accept as input
and what it could do with it. We summarize the milestones
thematically; the version numbers come from NEWS.md and the CRAN
archive.
```{r}
par(mar = c(0.4, 0.4, 0.4, 0.4))
milestones <- data.frame(
  ver = c("Initial release (2008)", "1.2 (2010)", "1.6 (2011)",
          "2.0-1 (2015)", "3.0 (2019)", "4.0 (2020)", "4.1 (2021)",
          "4.1-3 (2021)", "5.0 (2026)"),
  lbl = c("coordinate descent;\nFortran kernel\n(Friedman, Hastie,\nTibshirani)",
          "coxnet\n(Simon)",
          "strong rules screen\n(Tibshirani et al.)",
          "CV rewrite\n(Hastie)",
          "relax; assess;\nbigGlm\n(Hastie,\nNarasimhan)",
          "GLM extension\n(Tay, Hastie,\nNarasimhan)",
          "(start, stop] Cox;\nstrata; sparse Cox\n(Tay, Narasimhan,\nHastie)",
          "Fortran to C++\n(Yang)",
          "coxdev submodules;\nCox unified\n(Taylor,\nNarasimhan, Hastie)")
)
n <- nrow(milestones); ntop <- 5; nbot <- n - ntop
top_y <- 1.4; bot_y <- -1.4
col_x <- seq_len(ntop)
px <- numeric(n); py <- numeric(n)
px[1:ntop] <- col_x; py[1:ntop] <- top_y
px[(ntop+1):n] <- rev(col_x[(ntop-nbot+1):ntop]); py[(ntop+1):n] <- bot_y
plot.new(); plot.window(xlim = c(0.4, ntop + 0.45), ylim = c(-2.3, 2.3))
seg <- function(x0, y0, x1, y1)
  segments(x0, y0, x1, y1, lwd = 1.4, col = "grey35")
arr <- function(x0, y0, x1, y1)
  arrows(x0, y0, x1, y1, length = 0.09, angle = 25, lwd = 1.4, col = "grey35")
pad <- 0.18
# top row: left -> right
for (i in 1:(ntop-1)) arr(px[i] + pad, py[i], px[i+1] - pad, py[i+1])
# fold down at right: tiny side step to clear labels on the shared column
fold_x <- ntop + 0.3
seg(px[ntop] + pad, py[ntop], fold_x, py[ntop])        # right from last top node
seg(fold_x, py[ntop], fold_x, py[ntop+1])              # down along margin
arr(fold_x, py[ntop+1], px[ntop+1] + pad, py[ntop+1])  # left into first bottom node
# bottom row: right -> left
for (i in (ntop+1):(n-1)) arr(px[i] - pad, py[i], px[i+1] + pad, py[i+1])
points(px, py, pch = 21, bg = "grey90", col = "grey20", cex = 2.0, lwd = 1.3)
text(px, py, labels = seq_len(n), cex = 0.72, col = "grey15", font = 2)
# version labels: above top row, below bottom row
text(px[1:ntop], py[1:ntop] + 0.28, milestones$ver[1:ntop],
     cex = 0.72, col = "grey25", pos = 3, offset = 0)
text(px[(ntop+1):n], py[(ntop+1):n] - 0.28, milestones$ver[(ntop+1):n],
     cex = 0.72, col = "grey25", pos = 1, offset = 0)
# description labels: below top row, above bottom row (interior)
text(px[1:ntop], py[1:ntop] - 0.28, milestones$lbl[1:ntop],
     cex = 0.66, pos = 1, offset = 0)
text(px[(ntop+1):n], py[(ntop+1):n] + 0.28, milestones$lbl[(ntop+1):n],
     cex = 0.66, pos = 3, offset = 0)
```
More responses. The mgaussian family for multivariate Gaussian
regression with a grouped-lasso penalty, and a grouped option for
the multinomial family, arrived in v1.8 (2012). Sparse x via
Matrix::dgCMatrix landed even earlier, in v1.6 (2011). This was a
major step making it possible to use glmnet with large, sparse
design matrices.
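A minimal illustration (simulated data; the dgCMatrix is used directly, without densification):

```r
library(glmnet)
library(Matrix)
set.seed(1)
xs <- rsparsematrix(1000, 2000, density = 0.01)  # a dgCMatrix
y  <- rnorm(1000)
fit <- glmnet(xs, y)   # sparse path fit, no dense copy of xs
```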
Strong rules. Version 1.6 (April 2011) also introduced strong
rules screening [@strongrules]: at each $\lambda$ along the path a
cheap inner-product check predicts which variables are likely to
stay inactive, and the coordinate-descent sweep skips them, with a
KKT-condition re-check afterwards to catch any violations. For
sparse problems this screens out the vast majority of predictors and
was a step-change in speed for large $p$. Like several other glmnet
milestones, the code landed on CRAN a year before the paper appeared.
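Schematically, the Gaussian-case sequential rule looks like the following sketch of the idea in @strongrules (not the package's internal implementation):

```r
# keep predictor j at the new lambda_k only if its absolute gradient at
# the lambda_{k-1} solution exceeds 2*lambda_k - lambda_{k-1}; the
# KKT re-check afterwards catches the rare violations
strong_keep <- function(x, r_prev, lam_k, lam_prev) {
  g <- abs(drop(crossprod(x, r_prev))) / nrow(x)  # |x_j' r| / n
  which(g >= 2 * lam_k - lam_prev)
}
```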
Coefficient constraints. Upper and lower limits on individual
coefficients, together with glmnet.control() for tuning numerical
parameters, appeared in v1.9-1 (2013). The ability to constrain a
coefficient to an interval --- or pin it to zero via
exclude --- is now routinely relied upon by downstream packages.
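For illustration (argument names as in the released package; the values are arbitrary):

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- rnorm(200)

glmnet.control(fdev = 0)       # example numerical knob: fit the full
                               # path, disabling early stopping
fit <- glmnet(x, y,
              lower.limits = 0,      # nonnegative coefficients
              upper.limits = 0.5,    # capped above
              exclude = c(3, 7))     # pinned to zero
```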
Cross-validation maturity. The v2.0-1 (2015) release rewrote
cv.glmnet: each fold fits on its own $\lambda$ sequence and is then
evaluated at the original path. The alignment subtlety that motivated
this --- paths from different folds do not in general hit the same
$\lambda$ values --- was subsequently addressed again in v2.0-18
(2019) via the alignment argument.
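For example:

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(200 * 20), 200, 20)
y <- rnorm(200)

# "lambda" (the default) evaluates each fold's fit at the master lambda
# sequence; "fraction" aligns fold predictions by fraction of the path
cv1 <- cv.glmnet(x, y, alignment = "lambda")
cv2 <- cv.glmnet(x, y, alignment = "fraction")
```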
Cox refinements. A bug affecting ties between the death set and
the risk set was fixed in v2.0-19, and coxgrad and an improved
coxnet.deviance were added in v2.0-20. These changes prepared the
ground for the larger Cox work that followed in v4.1.
Fortran maintenance. Substantial effort went into making the
Mortran-generated Fortran build cleanly on every CRAN platform:
portability across Fortran compilers (v2.0-15),
double-precision consistency and -Wall-cleanliness (v2.0-16, 2018), and
native Fortran registration (v2.0-8, 2017). These changes were behind the
scenes, less visible to users, but essential for passing CRAN checks
for another decade.
Version 3.0, released in November 2019, was the first release in
which the feature set grew faster than the implementation underneath
it. The headline additions, contributed by Trevor Hastie and
Balasubramanian Narasimhan, were: the relax argument [@best_subset] to both
glmnet and cv.glmnet, which refits each active set in the path
without regularization and blends the two fits through a gamma
parameter (a dedicated relax vignette documents the usage);
assess.glmnet, roc.glmnet, and confusion.glmnet for evaluating
model performance; makeX for building input matrices from raw data
frames with one-hot encoding of factors and NA handling; and
bigGlm for unpenalized GLM fits using the same algorithmic
machinery. A trace.it progress bar made long paths less opaque,
and print methods for cv.glmnet and the new relaxed class were
tidied up.
Importantly, the family dispatch at this point was still entirely
character-based. A call glmnet(x, y, family = "binomial") passed
through an if branch into lognet.R, which called a Fortran
subroutine dedicated to logistic regression. Every family had its
own subroutine. That was about to change.
Before version 4.0, each family string such as "binomial" was
routed to a dedicated routine. Version 4.0 (May 2020) generalized the objective to
$$ \sum_{i=1}^n w_i\,\text{NLL}_i(\beta_0,\beta) + \lambda\,P_\alpha(\beta), $$
where $\text{NLL}_i$ is the negative log-likelihood contribution of
observation $i$ under any GLM family, and the elastic-net penalty is
$P_\alpha(\beta) = \alpha\|\beta\|_1 +
\tfrac{1-\alpha}{2}\|\beta\|_2^2$. As with classical GLMs,
the model is fitted by an iteratively reweighted lasso or elastic-net
algorithm, run pathwise down the sequence of lambda values with
cyclical coordinate descent. That is, at each value of lambda we take
proximal Newton steps, each of which minimizes the penalized
iteratively reweighted least squares objective
$$ \frac{1}{2n}\sum_{i=1}^n w_i \bigl(z_i - \beta_0 - x_i^\top\beta\bigr)^2 + \lambda\,P_\alpha(\beta), $$
with working responses $z_i$ and weights $w_i$ recomputed from the
current fit at each step.
The coordinate-descent kernel that serves this inner loop is
the weighted least squares routine wls, which becomes the engine
for every family the package does not handle with a dedicated
subroutine.
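Schematically, one such step can be imitated in plain R (a sketch of the idea, using a Gaussian glmnet call as the penalized WLS solver; this is an illustration, not the internal wls routine):

```r
library(glmnet)
set.seed(1)
n <- 200; p <- 30
x <- matrix(rnorm(n * p), n, p)
y <- rpois(n, exp(0.3 * x[, 1]))

fam   <- poisson()
beta0 <- log(mean(y)); beta <- rep(0, p)      # the current fit
eta <- beta0 + drop(x %*% beta)
mu  <- fam$linkinv(eta)
w   <- fam$mu.eta(eta)^2 / fam$variance(mu)   # IRLS weights
z   <- eta + (y - mu) / fam$mu.eta(eta)       # working response
step <- glmnet(x, z, weights = w, lambda = 0.1)  # one penalized WLS solve
```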
The programming was done by Kenneth Tay, then a PhD student at
Stanford, with supervision from
Hastie and Narasimhan.
The user-visible
payoff is that the family argument can now be an R family()
object --- any object of class "family" in the sense of stats::glm,
including MASS::negative.binomial, statmod::tweedie, or a
user-defined family. The published reference is @glmnetFlex. Note
that the paper appeared three years after the release; the software
led the publication.
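For example (simulated counts; MASS::negative.binomial is one of the families named above):

```r
library(glmnet)
library(MASS)
set.seed(1)
x <- matrix(rnorm(500 * 10), 500, 10)
y <- rnbinom(500, size = 5, mu = exp(x[, 1]))

# a character family name uses the fast dedicated kernel; a family
# object goes through the generic IRLS path
fit <- glmnet(x, y, family = negative.binomial(theta = 5))
```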
Two design decisions deserve explicit mention. First, the pre-v4
character-named families were retained and continue to go through
their dedicated subroutines, because a hand-written kernel will
always beat a generic IRLS loop. So the flexibility was added alongside the
old code, not on top of it. Second, the v4.0 warm-start convention
was tightened: the earliest form required passing a full glmnetfit
object, which was awkward; in v4.0.1 this was relaxed so a warm start
can be a simple list containing the intercept and coefficient vector.
Consistency with the pre-v4 numerical results --- verified through
KKT-condition checks against the Fortran output --- was the
correctness criterion throughout.
Version 4.1 (January 2021) brought the Cox model to something close
to feature parity with survival::coxph, which has served as the
reference implementation throughout glmnet's Cox history. Four
things were added in a single release:
- Support for (start, stop] survival responses, i.e. left-truncated
  data, which survival has long accepted.
- Strata via stratifySurv(), with the interpretation
  of survival::coxph: a separate baseline hazard per stratum and
  a single shared coefficient vector.
- Sparse x matrices (dgCMatrix) for Cox fits.
- A survfit method for coxnet objects, so that fitted survival
  curves can be produced without leaving glmnet.

A sketch of these features follows below. Much of this work was also
implemented by Tay, building on the v4.0 IRLS machinery
but targeting the Cox deviance specifically. In v4.1-2, following a
suggestion by Rob Tibshirani, the exclude argument was extended from
a fixed set of indices to a function that is handed x, y, and
weights and returns the indices to drop. This is the user-level hook
by which strong-rules screening can be plugged into the path; see
@strongrules and, for its application at GWAS scale, @snpnet.
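As a brief illustration of the survival features in the list above (simulated data; Surv and stratifySurv used as described):

```r
library(glmnet)
library(survival)
set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
t_start <- runif(n)
t_stop  <- t_start + rexp(n)
status  <- rbinom(n, 1, 0.6)

y  <- Surv(t_start, t_stop, status)              # (start, stop] response
ys <- stratifySurv(y, rep(1:2, length.out = n))  # two strata, shared betas
fit <- glmnet(x, ys, family = "cox")
sf  <- survfit(fit, s = 0.05, x = x, y = ys)     # fitted survival curves
```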
During the summer of 2021, first-year statistics Ph.D. student James
Yang was looking for a fun activity after completing the grueling
qualifying exams. He decided to rewrite the Fortran backbone of
glmnet in C++! This caught the attention of Hastie (who became his
advisor) and Narasimhan, because Yang is a very experienced
programmer. Between v4.1-3 (November 2021) and v4.1-6 (November
2022), Yang replaced essentially all of the Mortran/Fortran in the
package with a header-only C++ implementation built on Eigen. This
implementation, glmnetpp, is a single template kernel that absorbs
dense and sparse matrices and all the built-in families through
parametrization rather than code duplication.
The staging was deliberate. In v4.1-3 the weighted least squares
kernels (wls and its sparse counterpart) and the elnet family of
routines were ported first --- these were the engines that the new
glmnet.fit path exercised the most, and porting them delivered
roughly 4--8$\times$ speedups on the generic GLM path. In v4.1-4 the
rest of the Fortran was rewritten, leaving only the Cox subroutine.
In v4.1-6 the legacy Fortran that could be removed was removed.
Several thousand lines of template C++ replaced several thousand lines
of Fortran, with comparable or better performance, in part because
arguments were no longer copied when crossing into compiled code
(since R version 3.1 the dup = FALSE option of .Fortran has had no
effect, so every Fortran call paid for copies). This provided a much
lower barrier to future
extension, and removed the dependence on a generated-Fortran
toolchain. The Cox subroutine was left alone because its numerical
structure --- stratified risk sets, ties, left truncation --- is
harder to express cleanly as a template, and because James Yang's
priority was to get the rest of the package onto the new
footing. Resolving Cox would take another three years.
glmnet is also a piece of infrastructure on which other statistical
software is built, and some of those applications have exerted
pressure on the package's interface. We mention three briefly.
SNPnet [@snpnet] applies the lasso at the scale of modern
genome-wide association studies --- hundreds of thousands of samples
by hundreds of thousands of variants --- by aggressively screening
predictors using the strong rules of @strongrules. The work
motivated the v4.1-2 generalization of exclude to accept a
function, so that strong-rules screening can be plugged into the
standard glmnet call rather than requiring a fork.
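A toy version of that hook (a variance filter standing in for a real strong-rules screen; the function signature is as described above):

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(200 * 50), 200, 50)
x[, 1:5] <- x[, 1:5] * 0.01        # nearly constant columns
y <- rnorm(200)

# exclude as a function of x, y, and weights, returning indices to drop
# before the path is fit
vfilter <- function(x, y, weights) which(apply(x, 2, var) < 1e-3)
fit <- glmnet(x, y, exclude = vfilter)
```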
The data-shared lasso [@datasharedlasso] uses glmnet as its
optimization backend to share information across related datasets
while retaining per-dataset coefficients, an idiom close to multi-task
learning.
Cooperative learning [@coopLearning] is a framework for
multi-view regression in which multiple data modalities on the same
subjects (genomics, proteomics, imaging, and so on) are jointly
regressed against an outcome under a regularization term that
penalizes disagreement between the views. The companion CRAN
package multiview dispatches its fits to glmnet.
On the eve of v5.0 the package was in an unusual state: every kernel except the Cox subroutine had been rewritten in C++. The Cox kernel had survived because of the combinatorial complexity its users expect of it --- tie handling, stratified risk sets, left truncation --- and because producing a numerically stable deviance, gradient, and Hessian in all of these cases in a single clean implementation is harder than it looks.
The v5.0 release resolves this. Jonathan Taylor joined the team, and contributed
coxdev, a standalone C++ library that computes the Cox partial
log-likelihood deviance together with its gradient and Hessian,
handling Breslow and Efron tie methods, strata, and $(start, stop]$
data through a small number of well-tested classes. coxdev is
separately developed (a nested git submodule of glmnetpp, which is
itself a submodule of glmnet), and the computational details are
the subject of a companion vignette. At the same time,
Narasimhan replaced the Fortran code for the Cox regularization path with
C++, making use of the coxdev library for the gradient and Hessian
computations, which automatically takes care of all the special cases.
From the point of
view of glmnet, the essential change is that Cox is now a
first-class GLM type inside glmnetpp --- glm_type::cox in the
type enumeration --- and the same IRLS loop that drives the
Gaussian, binomial, and Poisson paths drives the Cox path, with a
thin CoxDevAdapter handing the deviance and gradient queries off to
coxdev. The last of the Fortran leaves the package in this release.
Two user-visible consequences of this unification are worth stating
plainly. First, the Cox model now takes a cox.ties argument
with choices "breslow" or "efron". The v5.0 default is
cox.ties = "breslow", preserving the pre-v5.0 numerical behavior for
upgraders; v5.1 will switch the default to "efron" to match
survival::coxph. Calls to glmnet() with family = "cox" that do
not set cox.ties explicitly emit a transition warning until the
switch occurs. The performance picture is favourable. On the benchmark set
in tests/benchmark_cox.R, dense Cox paths are roughly 3 times
faster than the legacy Fortran, sparse Cox paths roughly 90 times
faster, and stratified Cox paths --- which had been the most
penalized case in the old implementation --- roughly 300 times
faster. Full benchmark numbers appear in the NEWS entry for v5.0.
Development continues along several lines. A Python glmnet is
being prepared by Jonathan Taylor that shares a common C++ codebase with the R package,
so that the same kernels serve both languages. The shared codebase
opens the possibility of compiling glmnet to WebAssembly for
in-browser use. Upstream work in coxdev is aimed at reducing
per-call allocation in the IRLS loop, which is the last remaining
place where the Cox path pays a measurable overhead for its
extensibility.
glmnet is the work of many hands. Jerome Friedman wrote the
original coordinate-descent kernel and the Mortran infrastructure,
and is a co-author of @glmnet, @coxnet, and @pathwise. Trevor
Hastie has been co-author throughout, as well as the package
maintainer. He also wrote the bulk of the R interface code, as well as
methods for plotting, printing, predicting and cross-validation, and
the corresponding code for the relaxed models (with help from Narasimhan).
Robert Tibshirani introduced the lasso [@lasso] and co-authored the
strong-rules work on which the package depends. Noah
Simon led the original Cox path work and is the first author of
@coxnet. Kenneth Tay is responsible for the v4.0 GLM extension and
much of the v4.1 Cox expansion, and is first author of @glmnetFlex.
Balasubramanian Narasimhan has maintained the package through the
v4.x and v5.0 architectural transitions and is a co-author of
@glmnetFlex. James Yang carried out the v4.1-3 through v4.1-6
port from Fortran to C++ and is the author of glmnetpp. Junyang
Qian is first author of @snpnet, whose screening machinery
influenced the v4.1-2 exclude-as-function interface. Jonathan
Taylor contributed coxdev and with Narasimhan drove the v5.0 unification of the
Cox path onto the C++ engine. Many bug reports and fixes
contributed by users over the years are recorded in NEWS.md; the
work of Tomas Kalibera, David Keplinger, and others is gratefully
acknowledged.