```r
knitr::opts_chunk$set(fig.width = 10, fig.height = 5, echo = FALSE, collapse = TRUE, comment = "#>")
```
`seagull` is a package that fits a mixed model via (fitted) sparse-group lasso
(SGL) regularization. Limiting cases of this penalty, the lasso and the (fitted)
group lasso, are also available in this package. Solutions are obtained via
\textit{proximal gradient descent} on a grid of values for the regularization
parameter $\lambda$, and the grid search is implemented using \textit{warm starts}.
Since proximal gradient descent is an iterative procedure, the step size between
consecutive iterations is crucial for convergence; it is determined via
\textit{backtracking line search}. The only exception is the fitted SGL, which is
solved via proximal-averaged gradient descent, so its step size needs to be
determined prior to the call.
We assume the following underlying mixed model:
$$ y = X b + Z u + e, $$
where $b$ is a vector of \textit{fixed effects} and $u$ a vector of \textit{random effects}. The inclusion of fixed effects is not mandatory for the algorithms to work.
Let $n$, $p$, and $L$ be the number of observations, random effects, and groups,
respectively. `seagull` solves the following SGL problem:
$$ \min_{(b, u)} \frac{1}{2n} \left\Vert y - X b - Z u \right\Vert^2_2 + \alpha \lambda \left\Vert \mathrm{diag} \left\{ \omega^F_{1} , \ldots , \omega^F_{p} \right\} u \right\Vert_1 + (1 - \alpha) \lambda \sum^L_{l=1} \omega^G_l \left\Vert u^{(l)} \right\Vert_2. $$
Each $\omega$ represents a weight, either for a particular feature ($F$) or group ($G$). The \textit{mixing parameter} $\alpha$ may range between 0 and 1: as can be seen in the above formula, $\alpha = 1$ leads to the lasso, whereas $\alpha = 0$ leads to the group lasso. The fitted variants of the group lasso and SGL differ by substituting the expression $\left\Vert u^{(l)} \right\Vert_2$ with $\left\Vert Z^{(l)} u^{(l)} \right\Vert_2$, where $Z^{(l)}$ are the columns of $Z$ which correspond to $u^{(l)}$. The expression $Z^{(l)} u^{(l)}$ is also known as the \textit{fitted values}.
Once installed, the package is loaded in the usual way. A simulated example data set is available and loaded just as easily:

```r
library("seagull")
data("seagull_data")
```
The data consists of a response variable $y$ named `phenotypes`, a design
matrix $Z$ named `genotypes`, and a variable called `groups`. $Z$ consists of
`r dim(genotypes)[1]` observations and `r dim(genotypes)[2]` explanatory
variables. These variables consist of genotype data from single nucleotide
polymorphism markers. The variable $y$ is another matrix, which harbors
information on `r dim(phenotypes)[2]` traits (one trait per column). The
variable `groups` stores information about the group assignment of each
explanatory variable.
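To get a quick overview of these objects, one may check their dimensions; this is a minimal sketch using base R only:

```r
## dimensions of the marker matrix: observations x explanatory variables
dim(genotypes)

## one column per trait
dim(phenotypes)

## one group label per explanatory variable
length(groups)
head(groups)
```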
To fit the SGL, we can just call:
```r
fit_sgl1 <- seagull(y = phenotypes[, 1], Z = genotypes, groups = groups)
```
The output is a list with the following attributes:

```r
attributes(fit_sgl1)
```
The last attribute of this list (`loops_lambda`) determines the number of
different values for the penalty parameter $\lambda$ that were used throughout
the grid search. Since we didn't specify this value when calling the function
`seagull`, it was set to its default, i.e., `loops_lambda` = `r fit_sgl1$loops_lambda`.
This value determines the number of rows of the variables `random_effects` and
`iterations` (and also `fixed_effects`, if present). So, if we want to visualize
the results for a certain $\lambda$ along the grid, say the very last solution,
where $\lambda$ is at its smallest, we could do this as follows:
```r
last_solution <- fit_sgl1$loops_lambda
plot(x = seq(1, dim(genotypes)[2], 1), y = fit_sgl1$random_effects[last_solution, ],
     xlab = "position", ylab = "effect estimate", col = "gray80", pch = 16)
```
In total, there were `r dim(genotypes)[2]` features to be estimated. We can
check how many features of this last solution remained exactly equal to zero:

```r
print(c("The number of ZEROS in the last solution is: ",
        length(which(fit_sgl1$random_effects[last_solution, ] == 0))))
```
As this is the result based on the smallest $\lambda$, the solution is noisy. (By that we mean: the number of zeros among the estimates is low compared to the total number of estimates.) We will compare this to a solution that was obtained with a larger value of $\lambda$:
```r
last_solution <- fit_sgl1$loops_lambda
plot(x = seq(1, dim(genotypes)[2], 1), y = fit_sgl1$random_effects[last_solution, ],
     xlab = "position", ylab = "effect estimate", col = "gray80", pch = 16)
points(x = seq(1, dim(genotypes)[2], 1), y = fit_sgl1$random_effects[20, ], pch = 16)
```

```r
print(c("The number of ZEROS in the solution in line 20 is: ",
        length(which(fit_sgl1$random_effects[20, ] == 0))))
```
This is a considerably larger number of zeros among the estimates than what we saw previously. In fact, for this value of $\lambda$ the solution is sparse, which is what we expect from a selection operator such as these lasso variants.
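To get an impression of how sparsity evolves along the entire grid, one could count the zeros in each row of `random_effects`; this is a small sketch based on the output object created above:

```r
## number of zero estimates for each value of lambda along the grid
## (row 1 corresponds to the largest lambda, the last row to the smallest)
n_zeros <- rowSums(fit_sgl1$random_effects == 0)
plot(n_zeros, xlab = "index along the lambda grid",
     ylab = "number of zero estimates", pch = 16)
```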
In this section we present a comprehensive list of input parameters. We describe each variable and give additional comments about its use.
`y` is a numeric vector of $n$ observations.

`X` is an optional design matrix of dimension $n \times q$ which relates $y$
to the fixed effects $b$.

`Z` is a design matrix of dimension $n \times p$ which relates $y$ to the
random effects $u$.
Note: The above mentioned variables will only be centered or standardized prior
to the calculations if specified by the user via `standardize = TRUE`
(see below).
`weights_u` is an optional vector of weights for the feature vector $u$. Its
length is $p$. These weights correspond to $\omega^F$ in the above SGL problem.
The default value is `1` for each feature. The group weights $\omega^G$ are
calculated as follows:
$$
\omega^G_l = \sqrt{\sum_{j \in \mathrm{group}\, l} \omega^F_j}.
$$
So, in the case of all $\omega^F$ being equal to one in a certain group, the
above expression collapses to the square root of the group size.
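As an illustration, assuming a vector of feature weights `omega_F` (hypothetical here; the default would be all ones) and the `groups` vector from the example data, the group weights from the formula above could be computed like this:

```r
## hypothetical feature weights (the package default is a vector of ones)
omega_F <- rep(1, length(groups))

## group weights: square root of the sum of feature weights within each group
omega_G <- sqrt(tapply(omega_F, groups, sum))

## with unit feature weights this equals the square root of each group's size
head(omega_G)
```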
`groups` is an integer vector of length $p$ or $q+p$. Each entry is a code for
a group, e.g., if the first entry is `4`, then the first feature belongs to
group `4`. This vector remains unused for the lasso, so it doesn't need to be
provided in that case, but it is mandatory for the group lasso and SGL. If
fixed effects are incorporated, they may be assigned to a group. In this case,
the vector shall be of length $q+p$. If no group is assigned to the fixed
effects, they will all automatically be assigned to the same group. In this
case, or if no fixed effects are present, the length of this vector shall be
$p$. The entries of this vector don't need to be in any particular order.
`alpha` is the optional mixing parameter for the SGL according to the above
formula. Its default is `r fit_sgl1$alpha`.
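For instance, the limiting cases of the penalty can be requested explicitly via this parameter; a minimal sketch using the example data loaded above:

```r
## lasso: alpha = 1 (no group information needed)
fit_lasso <- seagull(y = phenotypes[, 1], Z = genotypes, alpha = 1)

## group lasso: alpha = 0
fit_gl <- seagull(y = phenotypes[, 1], Z = genotypes, groups = groups, alpha = 0)

## sparse-group lasso with a different mixing parameter
fit_sgl2 <- seagull(y = phenotypes[, 1], Z = genotypes, groups = groups, alpha = 0.9)
```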
`standardize` is an optional parameter to specify whether the input data `y`,
`X`, and `Z` should be standardized before the start of the algorithm. Here,
standardization means column-wise centering and scaling of the data, so that
afterwards each column has an empirical mean and standard deviation equal to 0
and 1, respectively. (Scaling is not performed on `y`, as it is mathematically
redundant.) Additionally, a filter will be applied to `X` and `Z`, which
removes columns with a standard deviation of less than $10^{-7}$. It is highly
recommended to set this parameter to `TRUE`. However, the default is `FALSE`
to ensure backwards compatibility.

Note: If `standardize = TRUE`, the results will be transformed back and
presented on the original scale of the problem.
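A call with standardization enabled could look like this (as noted above, the results are returned on the original scale):

```r
## same model as before, but with column-wise centering and scaling
fit_sgl1_std <- seagull(y = phenotypes[, 1], Z = genotypes, groups = groups,
                        standardize = TRUE)
```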
`l2_fitted_values` is an optional parameter which alters the group lasso and
sparse-group lasso models to their corresponding counterparts, where fitted
values are used within the $l_2$-norm. The default value is `FALSE`.
`step_size` is a parameter which is only used if `l2_fitted_values = TRUE`
and $0 < \alpha < 1$. As the fitted SGL is solved via proximal-averaged gradient
descent, the exact impact of backtracking line search has not yet been
investigated. Therefore, a fixed value for the step size between consecutive
iterations needs to be provided. The default value is `0.1`.
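A fitted sparse-group lasso could, for instance, be requested as follows (a sketch; the chosen step size merely repeats the default and is for illustration only):

```r
## fitted SGL: l2-norm applied to the fitted values Z^(l) u^(l),
## solved via proximal-averaged gradient descent with a fixed step size
fit_fsgl <- seagull(y = phenotypes[, 1], Z = genotypes, groups = groups,
                    l2_fitted_values = TRUE, step_size = 0.1)
```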
`delta` is a ridge-type parameter which becomes important only if
`l2_fitted_values` is `TRUE`. If for group $l$ the matrix $Z^{(l) T} Z^{(l)}$ is
not invertible, $\delta^2$ will be added to its main diagonal. The default value
is `1.0`.
`rel_acc` (or $\varepsilon_{rel}$) is an optional parameter which reflects
the stopping criterion. Convergence is assumed after iteration $m$ if the
following inequality is fulfilled:
$$
\left\Vert \hat{ \genfrac(){0pt}{0}{b}{u} }^{[m]} - \hat{ \genfrac(){0pt}{0}{b}{u} }^{[m-1]} \right\Vert_{\infty} \le \varepsilon_{\mathrm{rel}} \left\Vert \hat{ \genfrac(){0pt}{0}{b}{u} }^{[m]} \right\Vert_2.
$$
The default is `r fit_sgl1$rel_acc`. The smaller the value, the more iterations
are needed to find the solution, which also means that more time is needed for
the calculations.
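As a plain R illustration of this criterion, consider the following sketch; `est_old` and `est_new` are hypothetical estimate vectors from two consecutive iterations:

```r
## hypothetical estimates from iterations m-1 and m
est_old <- c(0.00, 0.12, -0.30, 0.05)
est_new <- c(0.00, 0.13, -0.31, 0.05)
rel_acc <- 0.0001

## stopping criterion: sup-norm of the change vs. l2-norm of the current estimate
converged <- max(abs(est_new - est_old)) <= rel_acc * sqrt(sum(est_new^2))
converged
```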
`max_lambda` is an optional value that sets the starting value for the grid
search for the penalty parameter $\lambda$. For each lasso variant, an algorithm
is implemented to determine an appropriate value; these algorithms are the
default option. We don't recommend setting this value manually, as it will most
likely undermine the advantages of the warm starts procedure. For further
details, please type `help("lambda_max")`.
`xi` is an optional parameter for the determination of the minimal penalty
parameter $\lambda_{min}$, i.e., $\lambda_{min} = \xi \lambda_{max}$. We
assume $\xi \in (0,1]$. The default value is `r fit_sgl1$xi`.
`loops_lambda` is an optional, non-negative integer. It sets the number of
$\lambda$'s which shall be investigated along the grid search from
$\lambda_{max}$ to $\xi \lambda_{max}$. If `xi = 1`, this value will
automatically be set to `1` and only a single solution will be computed, for
$\lambda = \lambda_{max}$. The default is `r fit_sgl1$loops_lambda`.
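To illustrate how such a grid relates to these parameters, the following sketch builds a logarithmically spaced sequence from $\lambda_{max}$ down to $\xi \lambda_{max}$; the log spacing and the value of `max_lambda` are assumptions for illustration only, not necessarily the exact grid used internally:

```r
## illustrative values; max_lambda is normally determined by the package itself
max_lambda   <- 10
xi           <- 0.01
loops_lambda <- 50

## a logarithmically spaced grid from max_lambda down to xi * max_lambda
lambda_grid <- exp(seq(log(max_lambda), log(xi * max_lambda),
                       length.out = loops_lambda))
head(lambda_grid)
```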
`max_iter` is another optional, non-negative integer. It determines the
maximum number of iterations allowed in order to try to reach convergence
(according to `rel_acc`). The default is `r fit_sgl1$max_iter`.
`gamma_bls` is an optional variable, which should satisfy $0 < \gamma_{bls} < 1$.
This parameter is related to the backtracking line search. Since proximal
gradient descent is an iterative algorithm, a proper step size ($t$) between
iterations needs to be determined. In order to find an admissible update for
$\hat{ \genfrac(){0pt}{1}{b}{u} }$, each iteration begins with $t = 1$. If the
condition for backtracking line search is satisfied, the update is admissible
and will thus be performed. If the condition is not satisfied, $t$ will be
decreased to $\gamma_{bls} t$ and the current iteration will restart. This is
repeated until the condition is met. The default is `r fit_sgl1$gamma_bls`.
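The following sketch illustrates this loop structure in plain R; `prox()` and `update_is_admissible()` are placeholders for the proximal operator and the actual line search condition, which are not spelled out here, so the snippet is schematic rather than the package's implementation:

```r
## schematic backtracking line search for one iteration of proximal gradient descent;
## prox() and update_is_admissible() are placeholders, not functions from the package
backtrack_step <- function(current, gradient, gamma_bls, prox, update_is_admissible) {
  t <- 1
  repeat {
    candidate <- prox(current - t * gradient, t)  # proximal step with step size t
    if (update_is_admissible(candidate, current, t)) {
      return(list(estimate = candidate, step_size = t))
    }
    t <- gamma_bls * t                            # shrink the step size and retry
  }
}
```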
`trace_progress` is an optional, logical parameter. If `TRUE`, a message will
show up to indicate the end of the calculations for each $\lambda$ along the
regularization path. This might come in handy for larger data sets. The default
is `FALSE`.
We want to point out a connection to another lasso variant, called the \textit{Integrative lasso with Penalty Factors}, or short \textit{IPF-LASSO}. If we go back to the implemented SGL problem and consider its limiting case for the lasso, i.e., $\alpha = 1$, we obtain:
$$ \min_{(b, u)} \frac{1}{2n} \left\Vert y - X b - Z u \right\Vert^2_2 + \lambda \left\Vert \mathrm{diag} \left\{ \omega^F_{1} , \ldots , \omega^F_{p} \right\} u \right\Vert_1. $$
If we suppose that no fixed effects are present in the model (i.e., $b=0$) and we multiply the entire expression by the factor $2n$, we get:
$$ \min_{u} \left\Vert y - Z u \right\Vert^2_2 + 2 n \lambda \left\Vert \mathrm{diag} \left\{ \omega^F_{1} , \ldots , \omega^F_{p} \right\} u \right\Vert_1, $$
or equivalently:
$$ \min_{u} \left\Vert y - Z u \right\Vert^2_2 + 2 n \lambda \sum^p_{j=1} \left\vert \omega^F_{j} u_j \right\vert. $$
The weights for features $\omega^F$ are assumed to be positive. So, we can simplify the last expression by introducing $\lambda_j = 2 n \lambda \omega^F_j$:
$$ \min_{u} \left\Vert y - Z u \right\Vert^2_2 + \sum^p_{j=1} \lambda_j \left\vert u_j \right\vert. $$
As a last step, we assume that $u$ is obtained from $M$ different sources ("modalities") and we let all $\lambda$'s which belong to the same modality have the same value. Then the last term can be written as a sum over modalities $m$:
$$ \sum^p_{j=1} \lambda_j \left\vert u_j \right\vert = \sum^M_{m=1} \lambda_m \left\Vert u^{(m)} \right\Vert_1. $$
This immediately leads to the IPF-LASSO. So in the `seagull` package, this
particular lasso variant is implicitly included for mixed models. The weights
for features $\omega^F$ (`weights_u`) just need to be set accordingly, i.e.,
the same weight for all features that belong to the same modality.
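As a sketch, suppose the explanatory variables in `genotypes` came from two modalities; the split point and the penalty factors below are purely hypothetical:

```r
p <- dim(genotypes)[2]

## hypothetical split: first half of the columns from modality 1, second half
## from modality 2, with modality 2 penalized twice as strongly
m1 <- floor(p / 2)
w_features <- c(rep(1, m1), rep(2, p - m1))

## IPF-LASSO-type fit: lasso (alpha = 1) with modality-specific feature weights
fit_ipf <- seagull(y = phenotypes[, 1], Z = genotypes,
                   weights_u = w_features, alpha = 1)
```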