glmnetpp
This document describes the design choices behind glmnetpp, the C++ implementation of the original Fortran routines of glmnet. For context, glmnet5dpclean.m (or glmnet5dpclean.f, the generated Fortran code) is the MORTRAN code that was the workhorse of glmnet up until version 4.1-2.
This code contains the full implementation of the IRLS algorithm for the various GLMs (Gaussian, Binomial, Poisson, etc.). As the MORTRAN code shows, its biggest drawback is that it is difficult to maintain. We believe this backend code of glmnet should arguably be the easiest part to maintain, as it is the most crucial part of the library. Hence, the primary goal of this C++ implementation is to use better design principles to ease maintainability and extensibility (see Current Design for more detail). We note in passing that, to our surprise, converting the MORTRAN code also yielded very tangible speed-ups (at least a 2x speed-up for the dense versions and around a 20% speed-up for the sparse versions).
The biggest drawback of the MORTRAN code is the large amount of code duplication. All versions (for the different GLMs) of the IRLS algorithm are essentially the same algorithm, yet the algorithm is copied and pasted as a separate subroutine for each GLM and each dense/sparse version, leading to about 16 copies in total. While this is hard on maintenance and future extensibility, we should note that it may be beneficial performance-wise. A refactored code would typically abstract a common block into a function and reuse that function across the different versions, which requires a function call. If the Fortran compiler cannot inline this function, we would pay a large cost in performance if that call were made frequently. A copy-paste, however, is one way to manually force inlining. In C++, we have ways to strong-inline functions, i.e. force the compiler to inline a function where possible (see macros.hpp), and we are unaware of similar capabilities in Fortran compilers.
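To make the strong-inline idea concrete, here is a minimal sketch of such a macro; the macro and function names are illustrative, not the library's actual ones from macros.hpp:

```cpp
#include <cassert>

// A sketch of a strong-inline macro: force inlining where the compiler
// supports it, and fall back to a plain inline hint otherwise.
#if defined(__GNUC__) || defined(__clang__)
#define STRONG_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define STRONG_INLINE __forceinline
#else
#define STRONG_INLINE inline
#endif

// A small "common block" abstracted into a function that we want inlined
// at every call site, so the refactor costs nothing over a copy-paste.
STRONG_INLINE double soft_threshold(double z, double lambda)
{
    // Soft-thresholding operator, the core update in elastic net
    // coordinate descent.
    if (z > lambda)  return z - lambda;
    if (z < -lambda) return z + lambda;
    return 0.0;
}
```

With this, the common formula is written once but still compiled as if it had been pasted at every call site.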
The second drawback of the MORTRAN code is that its linear algebra routines are not as optimized as those provided by modern C++ matrix libraries. Our implementation relies on the widely-used, industry-standard matrix library Eigen. We noticed that Eigen allows for more vectorized code, leading to performance boosts.
From the above, the problem is clear: how do we write a refactored library that reuses abstracted code as much as possible to remove duplication, writes cache-friendly and vectorized code, and yet avoids the cost of function calls and of abstraction in general?
Let's dive into the problem a bit more. At first glance, refactoring the MORTRAN code may seem like a simple task; however, the abstraction required is quite subtle. Usually, when we refactor, we face the following problem:
void foo1()
{
// some stuff..
// BEGIN COMMON BLOCK
// ...
// END COMMON BLOCK
// some stuff..
}
void foo2()
{
// some different stuff..
// BEGIN COMMON BLOCK
// ...
// END COMMON BLOCK
// some different stuff..
}
and we would like to extract the common block into a separate function and call it from the two macroscopic functions:
void common_block() { /*...*/ }
void foo1()
{
// some stuff..
common_block();
// some stuff..
}
void foo2()
{
// some different stuff..
common_block();
// some different stuff..
}
Indeed, we do see these kinds of problems where the common block could be the formula for gradient computation, residual updates, and beta updates.
This is indeed a simple task; however, it is not the only problem we see in the MORTRAN code. We also have the following kind of problem:
void foo(...)
{
// 1. set-up quantities needed for the elastic net path-solver
// 2. loop through each lambda in the regularization path
for (int m = 0; m < max_n_lambdas; ++m) {
// 3. compute the current lambda at iteration m
// 4. solve the elastic net problem at the current lambda
// 5. save the result into output
}
// 6. do any final computations after solving for the whole path
}
Essentially, there is a common macroscopic function foo, and some of the internal details (the numbered steps) differ depending on the GLM and the dense/sparse storage of the data matrix.
Moreover, step 4 also has further common code such as the weighted-least-squares (WLS) algorithm of the form:
void wls(...)
{
while (1) {
double convergence_measure = 0.0;
// 1. iterate through each feature and do coordinate descent.
for (int k = 0; k < p; ++k) {
// update beta...
if (new_beta == old_beta) continue;
// update convergence_measure...
// update other quantities...
}
// 2. check for convergence
if (convergence_measure < tolerance) {
// check KKT...
if (kkt_passed) return;
continue;
}
// 3. do coordinate descent on only the active variables until convergence
}
}
All versions of the subroutines reuse this general algorithm. Only the exact details of how we update beta, convergence_measure, and the other quantities differ across the versions.
In the next section, we explain our current design of the library and how it tries to solve this very problem.
In this section, we go over the current design of glmnetpp. First, we summarize the library structure at a high level. Then, we delve into each of the key components of the library. Of course, we cannot explain every minute detail, so we only cover the big key design choices. We do not discuss the correctness of the implementation, as we did not change the algorithm itself (except for bug fixes).
The following file structure outlines the library structure:
glmnetpp
\_ include
\_ glmnetpp_bits
\_ elnet_driver
\_ elnet_path
\_ elnet_point
\_ internal
\_ util
\_ elnet_driver.hpp
\_ elnet_path.hpp
\_ elnet_point.hpp
\_ glmnetpp
\_ src
\_ legacy
Recall that glmnetpp is a header-only template library, so the whole library is defined in include. src exists purely for legacy reasons, storing the old Fortran code. glmnetpp_bits contains the bits and pieces of the library, and the full library is exported into the final header file glmnetpp.
Inside glmnetpp_bits, there are four major components:
- elnet_driver.hpp
- elnet_path.hpp
- elnet_point.hpp
- util

The last of these, util, contains some utility functions/macros that are not intended to be exported for direct use, but rather serve as tools in the implementation of the library.
All drivers have very similar implementation details, so we only discuss one of them, gaussian.hpp. These drivers mimic Fortran functions such as elnet, lognet, and fishnet, the highest-level functions that get exported to R.
The class declaration is of the form
template <>
struct ElnetDriver<util::glm_type::gaussian>
: ElnetDriverBase
{
private:
static constexpr util::glm_type glm = util::glm_type::gaussian;
using mode_t = util::mode_type<glm>;
public:
template <...>
void fit(...);
};
That is, we have a single class template ElnetDriver which takes an enum class value of type util::glm_type, defined in types.hpp. The idea is that we specialize this class template only for particular GLMs since, logically, there is no "default" definition of a driver for a general GLM. The same idea holds for elnet_path and elnet_point.
Note that for each glm_type, there is an associated Mode class which defines flags indicating further differences in the algorithm for a given GLM. As an example, we will see that the Gaussian GLM allows for two different algorithms: naive and covariance. These flags, along with the dense/sparse storage type of the data matrix, uniquely define each of the different versions of the IRLS algorithm.
The fit function simply prepares the inputs for the path solvers, e.g. standardizing features and normalizing weights if necessary. Note that fit is heavily templatized! We could easily have written it non-templated, taking in specific Eigen matrix objects, since that is the only use-case currently. However, looking ahead, we plan to export glmnetpp not just to R but also to other higher-level languages such as Python and Matlab. It is not clear what objects we would need to pass to the fit function in that case, as those languages may not support bindings for Eigen objects. The only cost of templatizing is compile time, which is abundant for our use-case since we build this library only once when exporting to R.
fit also checks at compile time whether the data matrix X is dense or sparse; hence, at compile time, we choose the correct version of the path solver. The technical details of this "choosing" are slightly nuanced, since we must support the C++14 standard and do not have the convenient if constexpr of C++17. In gaussian.hpp, you will see the FitPathGaussian class template in namespace details, which is a metaprogramming tool to choose the dense or sparse version.
The problem is that if we write something like:
template <...>
void fit(...)
{
constexpr bool is_dense = ...;
if (is_dense) {
call_dense_version(...);
} else {
call_sparse_version(...)
}
}
the compiler has to be able to generate code for both the sparse and dense versions for the same given input. For example, if X were sparse, the compiler would still need to be able to generate code for call_dense_version with X, which will not work for us since there are, naturally, operations used in call_dense_version that are defined for dense matrices but not for sparse matrices (such as operator+=).
Instead, we add a layer of dispatch with FitPathGaussian like
FitPathGaussian<is_dense>::eval(...);
such that the compiler first chooses the specialization of FitPathGaussian depending on the boolean is_dense, and then generates the code for eval only for that specialization.
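This dispatch pattern can be sketched as follows; the toy DenseMat/SparseMat types and the trait are illustrative stand-ins (the real FitPathGaussian works with Eigen matrices):

```cpp
#include <cstring>

struct DenseMat  {};
struct SparseMat {};

// Trait deciding denseness; the real code inspects the Eigen type.
template <class T> struct is_dense_matrix           { static constexpr bool value = false; };
template <>        struct is_dense_matrix<DenseMat> { static constexpr bool value = true;  };

template <bool is_dense>
struct FitPathGaussianSketch;

// Dense specialization: eval is only instantiated for dense inputs.
template <>
struct FitPathGaussianSketch<true> {
    template <class XType>
    static const char* eval(const XType&) { return "dense path solver"; }
};

// Sparse specialization: eval is only instantiated for sparse inputs.
template <>
struct FitPathGaussianSketch<false> {
    template <class XType>
    static const char* eval(const XType&) { return "sparse path solver"; }
};

template <class XType>
const char* fit(const XType& x)
{
    constexpr bool is_dense = is_dense_matrix<XType>::value;
    // The compiler picks the specialization first, then instantiates eval
    // only for that specialization, so dense-only operations never need to
    // compile against a sparse XType (and vice versa).
    return FitPathGaussianSketch<is_dense>::eval(x);
}
```

Because eval is itself a member template, its body is only type-checked for the specialization actually selected, which is exactly what the if/else version cannot guarantee in C++14.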
ElnetPath is a class template responsible for fitting an elastic net model over the full regularization path.
The declaration is as follows:
template <util::glm_type glm
, util::mode_type<glm> mode
, class ElnetPointPolicy=ElnetPoint<glm, mode> >
struct ElnetPath;
As before, glm denotes the GLM enum value and mode is the corresponding mode value for that glm. The third template parameter, ElnetPointPolicy, is the type of the point-solver. A priori, ElnetPath does not depend on any specific implementation of a point-solver. By default, we use our own implementation given in ElnetPoint<glm, mode>, but we may substitute a different type, so long as it satisfies a certain interface (for now, read the code to see how ElnetPoint is used in ElnetPath; TODO: maybe write up an interface).
The structure is hopefully clear: the most derived classes are in the header files that don't end with _base.hpp, and the sparse versions are prefixed with sp_. Each GLM has its own base class, and all such classes derive from the base class(es) in base.hpp.
Curiously Recurring Template Pattern (CRTP) is a design pattern where a base class template takes the derived type as a template parameter and casts itself to that derived type to access its (public) interface. A derived class D is expected to inherit from the CRTP base class B<D>, where B is the CRTP base class template. As a result, all derived classes D_i that inherit from B<D_i> inherit from different instantiations of B; that is, the B<D_i> are all different class types. So, if there are any common functions defined in the class template B that do not actually depend on the derived type, the compiler generates copies of them in every B<D_i>, which leads to both an increase in compile time and code bloat. Only the code that depends on the derived class, i.e. the code that must be generated differently per derived class, should be defined inside B. The rest should live in a non-CRTP base class (like ElnetPathBase) so that the compiler only generates one version of these member functions for all derived classes.
CRTP is not necessary for our purposes, but it encapsulates and abstracts very well, so it is convenient. All the CRTP base member functions could have been free function templates, which would also have reduced compile time, but that would technically expose these functions to the user, which is not something we need. We viewed these members as implementation details and wanted to encapsulate them in a class.
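The CRTP layering described above can be sketched as follows; the class names here are illustrative, not the library's actual ones:

```cpp
#include <string>

// Non-CRTP base: code shared by every derived class, compiled once.
struct PathBaseSketch {
    int common_setup() const { return 0; }  // e.g. shared configuration work
};

// CRTP base: only code that genuinely depends on the derived type.
template <class Derived>
struct PathCRTPBaseSketch : PathBaseSketch {
    std::string fit() const {
        // Cast down to the derived type to reach its customization point.
        const auto& self = static_cast<const Derived&>(*this);
        common_setup();                  // shared logic from the plain base
        return self.solve_one_lambda();  // generated separately per Derived
    }
};

struct GaussianPathSketch : PathCRTPBaseSketch<GaussianPathSketch> {
    std::string solve_one_lambda() const { return "gaussian"; }
};

struct PoissonPathSketch : PathCRTPBaseSketch<PoissonPathSketch> {
    std::string solve_one_lambda() const { return "poisson"; }
};
```

Because the cast is resolved at compile time, solve_one_lambda can be inlined into fit, unlike a virtual call.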
Why the ...Pack nested class templates?

All ElnetPath classes define three "pack" class templates: FitPack, PathConfigPack, and PointConfigPack. They all perform one job: pack a bunch of arguments into a single object.
- FitPack: packs all the arguments from the user into a single object.
- PathConfigPack: packs any configuration variables needed for the path-solver.
- PointConfigPack: packs any configuration variables needed for the point-solver.

Not surprisingly, these packs have an inheritance hierarchy, as many quantities are shared across the different versions. We decided to nest them rather than have them as separate classes because:
- they are structures that only have meaning in the context of ElnetPath;
- they are only used by ElnetPath and its base class(es), so it is convenient to nest them.

But why create these packs in the first place?
Let's consider FitPack as an example. All versions of the path solvers have a different set of inputs. For example, some take in an offset parameter whereas some don't. Some take in a residual vector r, whereas others instead take a gradient vector g. And then there are some that take both r and g. It would be a massive headache to find a super-set of these arguments and write our path-solvers assuming this one big set of arguments.
Hence, we resolve this at compile time by having each version define its own FitPack, containing only the arguments it needs. We then have a generic interface where the path-solvers simply work with one FitPack-like object. See base.hpp for how this generic fit function works in ElnetPathCRTPBase.
Similarly, PathConfigPack and PointConfigPack define the version-specific set of values needed by the path-solver and point-solver, respectively.
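The pack idea can be sketched as follows; struct and member names here are illustrative, not the library's actual pack definitions:

```cpp
#include <vector>

// Each solver version bundles exactly the arguments it needs.
struct GaussianFitPackSketch {
    std::vector<double> r;   // this version carries a residual vector
    double lambda;
};

struct PoissonFitPackSketch {
    std::vector<double> g;        // this version carries a gradient vector
    std::vector<double> offset;   // ... and an offset
    double lambda;
};

// Generic code only assumes the pack exposes the members it touches
// (here just lambda); everything else stays version-specific.
template <class FitPackType>
double current_lambda(const FitPackType& pack)
{
    return pack.lambda;
}
```

The generic path-solver code compiles against whichever pack it is given, so no version carries arguments it does not use.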
ElnetPoint is a class template responsible for fitting an elastic net model at a fixed regularization parameter.
The declaration is as follows:
template <util::glm_type glm
, util::mode_type<glm> mode
, class ElnetPointInternalPolicy=ElnetPointInternal<glm, mode> >
struct ElnetPoint;
A priori, ElnetPoint does not depend on the internal implementation of a point solver. By default, we use our policy ElnetPointInternal defined in elnet_point/internal, but we may swap in a different class so long as it satisfies the interface needed by ElnetPoint (for now, read the code to see how ElnetPoint interacts with ElnetPointInternal; note that different specializations require different interfaces).
It may seem like an over-generalization to have yet another policy for the internal implementation detail, but once we explain what we mean by "internal", the design will make more sense. As explained in Motivation, there are two types of abstraction problems: abstracting the smaller common blocks (such as shared formulas) into reusable functions, and abstracting a common macroscopic function whose internal details differ across versions. ElnetPoint aims to solve the second problem.
As an example, in base.hpp, ElnetPointCRTPBase contains some common routines used by all point solvers. For example, the fit function is really the coordinate-descent algorithm. Note that fit simply "manages" the smaller functions such as increment_passes, coord_desc_reset, for_each_with_skip, all_begin, etc. In particular, it defines the order and conditions under which these smaller functions are executed.
However, fit does not rely on the implementation-specific details of the smaller functions. It is precisely these smaller functions that we refer to as "internal". For each GLM, we obviously want a different internal policy defining each of these smaller functions, but for future development we may also want to try different internal policies for the same GLM. This design grants us that flexibility.
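The manager/internal split can be sketched as follows; the names here are illustrative toys (the real internal policies expose functions like increment_passes and coord_desc_reset with richer signatures):

```cpp
#include <cstddef>

// The manager fixes the *order* of the steps; the internal policy
// fixes the GLM-specific math behind each step.
template <class InternalPolicy>
struct ElnetPointSketch : InternalPolicy {
    void fit(std::size_t max_iters) {
        for (std::size_t it = 0; it < max_iters; ++it) {
            this->increment_passes();
            this->coord_desc_reset();
            for (std::size_t k = 0; k < this->n_features(); ++k) {
                this->update_beta(k);  // GLM-specific coordinate update
            }
            if (this->has_converged()) return;
        }
    }
};

// A toy internal policy that "converges" after two full passes.
struct ToyInternal {
    int passes = 0;
    void increment_passes() { ++passes; }
    void coord_desc_reset() {}
    std::size_t n_features() const { return 3; }
    void update_beta(std::size_t) {}
    bool has_converged() const { return passes >= 2; }
};
```

Swapping the policy changes the math without touching the control flow, which is exactly the flexibility described above.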
Why does ElnetPointCRTPBase inherit the internal policy?

This was purely out of coding convenience. The internal policies have constructors that are very cumbersome to type out, so it was convenient to simply inherit the policy and expose its constructors. Because this doesn't break any logic, we stuck with this design.
ElnetPointInternal is a class template responsible for defining the internal details of each ElnetPoint specialization. The declaration is as follows:
template <util::glm_type g
, util::mode_type<g> mode
, class ValueType = double
, class IndexType = int
, class BoolType = bool>
struct ElnetPointInternal;
Currently, the only special behavior occurs when BoolType is bool. Internally, we keep an inclusion vector recording which features may be included in the model and which are always forbidden from being included. If BoolType is bool, we use std::vector<bool> to represent this vector, as it is a more specialized and performant data structure. Otherwise, we simply use an Eigen::Matrix<BoolType, -1, 1> (column vector).
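This compile-time selection could look like the sketch below; the real non-bool branch uses Eigen::Matrix<BoolType, -1, 1>, for which a std::vector stands in here so the example stays self-contained:

```cpp
#include <type_traits>
#include <vector>

template <class BoolType>
struct InclusionVectorSketch {
    using type = typename std::conditional<
        std::is_same<BoolType, bool>::value,
        std::vector<bool>,       // bit-packed specialization for bool
        std::vector<BoolType>    // stand-in for an Eigen column vector
    >::type;
};
```

The choice is made entirely at compile time, so the non-selected container type is never instantiated.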
This class simply tries to solve the first problem from the previous section: generalizing the smaller common blocks. These classes contain the meat of the algorithm. They were written with the intention that their members would be called in a certain order by the ElnetPoint classes, so when trying to read the code, read it together with the corresponding ElnetPoint.
Currently, the most derived classes really only contain logic related to the X data matrix, i.e. defining the dense or sparse routines. Otherwise, the two versions reuse the rest of the logic defined in their GLM-specific base class. TODO: some work remains in abstracting the sparse logic across the different versions.
A lot of these base classes reuse static members or further base class members. Hopefully, this sheds some light on how much was being duplicated in the original MORTRAN code!
Why ElnetPointInternalStaticBase?

For encapsulation purposes, and to associate routines under a certain name, we place commonly used routines as static member functions in ElnetPointInternalStaticBase. Other base classes may define more static members on top of these for the derived ElnetPointInternal classes.
The point of these static members is mainly to define formulas once. A very common source of bugs stems from copying and pasting a common block multiple times: once an edit must be made in one place, we have to manually update all the copied-and-pasted places as well. Our approach prevents this problem from ever happening.
See the following issues on GitHub regarding such bugs:
- Issue 39
- Issue 40
- Issue 41
- Issue 42
- Issue 43
The current design of glmnetpp is by no means perfect. In this section, we go over some other ideas we considered in the process.
Another idea we considered was policy-based design. We noticed that in elnet_point/internal, we sometimes borrow members from multiple base classes (such as Gaussian multi-response and Binomial multi-response grouped). It is possible to remove this intricate dependency by defining a set of policies for each ElnetPointInternal. We didn't go with this idea because, a priori, the number of policies seemed likely to get out of hand.
In many places, we used CRTP and compile-time tricks to abstract away certain logic, such as the Pack objects in ElnetPath. However, one could also have used dynamic polymorphism without relying on template magic. The big issue we saw was that dynamic polymorphism completely removes the benefit of inlining, which is detrimental for us! Although it would reduce binary size and compile time, once again, compile time is abundant for us and code bloat is not an issue, because this is not meant to be a generic library used by others in larger programs. We only compile our export functions once using this library and create a dynamic library.