seagull: Mixed model fitting with lasso, group lasso, or sparse-group...
In seagull: Lasso, Group Lasso, and Sparse-Group Lasso for Mixed Models

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/seagull.R

Fit a mixed model with lasso, (fitted) group lasso, or (fitted) sparse-group lasso via proximal gradient descent. As this is an iterative algorithm, the step size for each iteration is determined via backtracking line search. The only exception to this is the fitted sparse-group lasso. Its step size needs determination prior to the call. A grid search for the regularization parameter λ is performed using warm starts. Depending on the input parameters alpha and l2_fitted_values this function subsequently calls one of the five implemented lasso_variants. Input variables will be standardized if demanded by the user.

1
2
3

seagull(y, X, Z, weights_u, groups, alpha, standardize,
        l2_fitted_values, step_size, delta, rel_acc, max_lambda, xi,
        loops_lambda, max_iter, gamma_bls, trace_progress)

`y`	numeric vector of observations.
`X`	(optional) numeric design matrix relating y to fixed effects b.
`Z`	numeric design matrix relating y to random effects u.
`weights_u`	(optional) numeric vector of weights for the vector of random effects u. If no weights are passed by the user, all weights for u are set to `1`.
`groups`	(not required for the lasso) integer vector specifying which effect belongs to which group. If the lasso is called, this vector is without use and may remain empty. For group and sparse-group lasso, at least each random effect variable needs to be assigned to one group. If in this case also fixed effect variables are present in the model, but without assigned group values, then the same value will be assigned to all fixed effect variables automatically. Float or double input values will be truncated.
`alpha`	mixing parameter for the sparse-group lasso. Has to satisfy: 0 ≤ α ≤ 1. The penalty term looks as follows: α '"lasso penalty" + (1-α) * "group lasso penalty".* If `alpha=1`, the lasso is called. If `alpha=0`, the group lasso is called. Default value is `0.9`.
`standardize`	if `TRUE`, the input vector y will be centered, and each column of the input matrices X and Z will be centered and scaled. Additionally, a filter will be applied to X and Z, which filters columns with standard deviation less than `1.e-7`. Results will be transformed back and presented on the original scale of the problem. Default value is `FALSE`.
`l2_fitted_values`	if `TRUE`, the l2-penalty is not applied on the clustered vector u, but on the clustered vector of fitted values Z*u. Default value is `FALSE`.
`step_size`	is a parameter, which is only used if `l2_fitted_values = TRUE` and 0 < α < 1. As the fitted sparse-group lasso is solved via proximal-averaged gradient descent the exact impact of backtracking line search has not yet been investigated. Therefore, a fixed value for the step size between consecutive iterations needs to be provided. Default value is `0.1`.
`delta`	is a ridge-type parameter, which is only used if `l2_fitted_values = TRUE`. If for group l, the matrix t(Z^(l)) %*% Z^(l) is not invertible, delta is first squared and then added to its main diagonal . Default value is `1.0`.
`rel_acc`	(optional) relative accuracy of the solution to stop the algorithm for the current value of λ. The algorithm stops after iteration m, if: \|\|sol^{(m)} - sol^{(m-1)}\|\|_∞ < rel_acc \|\|sol1{(m-1)}\|\|_2.* Default value is `0.0001`.
`max_lambda`	(optional) maximum value for the penalty parameter. Default option is an integrated algorithm to calculate this value. This is the start value for the grid search of the penalty parameter λ. (More details about the integrated algorithms are available here: `lambda_max`.)
`xi`	(optional) multiplicative parameter to determine the minimum value of λ for the grid search, i.e. λ_{min} = ξ λ_{max}. Has to satisfy: 0 < ξ ≤ 1. If `xi=1`, only a single solution for λ = λ_{max}* is calculated. Default value is `0.01`.
`loops_lambda`	(optional) number of lambdas for the grid search between λ_{max} and ξ λ_{max}*. Loops are performed on a logarithmic grid. Float or double input values will be truncated. Default value is `50`.
`max_iter`	(optional) maximum number of iterations for each value of the penalty parameter λ. Determines the end of the calculation if the algorithm didn't converge according to `rel_acc` before. Float or double input values will be truncated. Default value is `1000`.
`gamma_bls`	(optional) multiplicative parameter to decrease the step size during backtracking line search. Has to satisfy: 0 < γ_{bls} < 1. Default value is `0.8`.
`trace_progress`	(optional) if `TRUE`, a message will occur on the screen after each finished loop of the λ grid. This is particularly useful for larger data sets. Default value is `FALSE`.

The underlying mixed model has the form:

y = X b + Z u + residual,

where b is the vector of fixed effects and u is the vector of random effects. The penalty of the sparse-group lasso (without additional weights for features) is then:

α λ ||u||_1 + (1 - α) λ ∑_l ω^G_l ||u^{(l)}||_2.

The variable ω^G_l is a weight for group l. If α = 1, this leads to the lasso. If α = 0, this leads to the group lasso.

The above penalty can be enhanced by introducing additional weights ω^F for features in the lasso penalty:

α λ ∑_j | ω^F_j u_j | + (1 - α) λ ∑_l ω^G_l ||u^{(l)}||_2.

The group weights ω^G_l are implemented as follows: ω^G_l = ∑_{j, j \in G} √{ω^F_{j}}. So, in the case that the weights for features in a particular group are all equal to one, this term collapses to the square root of the group size.

Furthermore, if instead of applying the l_2-norm on u^{(l)} but on the fitted values Z^{(l)} u^{(l)} two more algorithms may be called: either the fitted group lasso or the fitted sparse-group lasso.

In addition, the above penalty can be formally rewritten by including the fixed effects and assigning weights equal to zero to all these features. This is how the algorithms are implemented. Consequently, if a weight for any random effect is set to zero by the user, the function drops an error and the calculation will not start.

A list of estimates and parameters relevant for the computation:

intercept: estimate for the general intercept. Only present, if the parameter standardize was manually set to TRUE.
fixed_effects: estimates for the fixed effects b, if present in the model. Each row corresponds to a particular value of λ.
random_effects: predictions for the random effects u. Each row corresponds to a particular value of λ.
lambda: all values for λ which were used during the grid search.
iterations: a sequence of actual iterations for each value of λ. If an occurring number is equal to max_iter, then the algorithm most likely did not converge to rel_acc during the corresponding run of the grid search.

The following parameters are also returned. But primarily for the purpose of comparison and repetition: alpha (only for the sparse-group lasso), max_iter, gamma_bls, xi, and loops_lambda.

Jan Klosa

Simon, N., Friedman, J., Hastie T., and Tibshirani, R. (2011): A Sparse-Group Lasso, http://faculty.washington.edu/nrsimon/SGLpaper.pdf, Journal of Computational and Graphical Statistics, Vol. 22(2), 231-245, April 2013

plot

set.seed(62)
n <- 50  ## observations
p <- 8   ## variables

## Create a design matrix X for fixed effects to model the intercept:
X <- matrix(1, nrow = n, ncol = 1)

## Create a design matrix Z for random effects:
Z <- matrix(rnorm(p * n, mean = 0, sd = 1), nrow = n)

## Intercept b, random effect vector u, and response y:
b <- 0.2
u <- c(0, 1.5, 0, 0.5, 0, 0, -2, 1)
y <- X%*%b + Z%*%u + rnorm(n, mean = 0, sd = 1)

## Create a vector of three groups corresponding to vector u:
group_indices <- c(1L, 2L, 2L, 2L, 1L, 1L, 3L, 1L)

## Calculate the solution:
fit_l   <- seagull(y = y, X = X, Z = Z, alpha = 1.0)
fit_gl  <- seagull(y = y, X = X, Z = Z, alpha = 0.0, groups = group_indices)
fit_sgl <- seagull(y = y, X = X, Z = Z, groups = group_indices)

## Combine the estimates for fixed and random effects:
estimates_l   <- cbind(fit_l$fixed_effects, fit_l$random_effects)
estimates_gl  <- cbind(fit_gl$fixed_effects, fit_gl$random_effects)
estimates_sgl <- cbind(fit_sgl$fixed_effects, fit_sgl$random_effects)
true_effects  <- c(b, u)

## Calculate mean squared errors along the solution paths:
MSE_l   <- rep(as.numeric(NA), fit_l$loops_lambda)
MSE_gl  <- rep(as.numeric(NA), fit_l$loops_lambda)
MSE_sgl <- rep(as.numeric(NA), fit_l$loops_lambda)

for (i in 1:fit_l$loops_lambda) {
  MSE_l[i] <- t(estimates_l[i,] - true_effects)%*%(estimates_l[i,] - true_effects)/(p+1)
  MSE_gl[i] <- t(estimates_gl[i,] - true_effects)%*%(estimates_gl[i,] - true_effects)/(p+1)
  MSE_sgl[i] <- t(estimates_sgl[i,] - true_effects)%*%(estimates_sgl[i,] - true_effects)/(p+1)
}

## Plot a fraction of the results of the MSEs:
plot(x = seq(1, fit_l$loops_lambda, 1)[25:50], MSE_l[25:50], type = "l", lwd = 2)
points(x = seq(1, fit_l$loops_lambda, 1)[25:50], MSE_gl[25:50], type = "l", lwd = 2, col = "blue")
points(x = seq(1, fit_l$loops_lambda, 1)[25:50], MSE_sgl[25:50], type = "l", lwd = 2, col = "red")

## A larger example with simulated genetic data:
data(seagull_data)

fit_l1 <- seagull(y = phenotypes[,1], Z = genotypes, alpha = 1.0)
fit_l2 <- seagull(y = phenotypes[,2], Z = genotypes, alpha = 1.0)
fit_l3 <- seagull(y = phenotypes[,3], Z = genotypes, alpha = 1.0)

fit_gl1 <- seagull(y = phenotypes[,1], Z = genotypes, alpha = 0.0,
           groups = groups)
fit_gl2 <- seagull(y = phenotypes[,2], Z = genotypes, alpha = 0.0,
           groups = groups)
fit_gl3 <- seagull(y = phenotypes[,3], Z = genotypes, alpha = 0.0,
           groups = groups)

fit_sgl1 <- seagull(y = phenotypes[,1], Z = genotypes, groups = groups)
fit_sgl2 <- seagull(y = phenotypes[,2], Z = genotypes, groups = groups)
fit_sgl3 <- seagull(y = phenotypes[,3], Z = genotypes, groups = groups)