lasso_variants: Lasso, (fitted) group lasso, and (fitted) sparse-group lasso

Description

Fit a mixed model with lasso, group lasso, or sparse-group lasso via proximal gradient descent. As this is an iterative algorithm, the step size for each iteration is determined via backtracking line search. A grid search for the regularization parameter λ is performed using warm starts. The mixed model has the form:

y = X b + Z u + residual.

The penalty of the sparse-group lasso (without additional weights for features) is then:

α λ ||u||_1 + (1 - α) λ ∑_l ω^G_l ||u^{(l)}||_2.

If α = 1, this leads to the lasso. If α = 0, this leads to the group lasso. Furthermore, if the l_2-norm is applied to the fitted values Z^{(l)} u^{(l)} instead of to u^{(l)}, two more algorithms may be called: either the fitted group lasso or the fitted sparse-group lasso.
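The penalty above can be evaluated directly for a given vector u. The following is a minimal sketch in base R. The helper name sgl_penalty and the choice of group weights (square root of the group size) are assumptions made for illustration and are not part of the seagull API.

```r
# Sparse-group lasso penalty without feature weights (illustrative helper,
# not a seagull function).
sgl_penalty <- function(u, groups, lambda, alpha, weights_groups) {
  group_ids <- sort(unique(groups))
  # l_2-norm of each group's sub-vector u^{(l)}, ordered by group id
  group_norms <- vapply(
    group_ids,
    function(l) sqrt(sum(u[groups == l]^2)),
    numeric(1)
  )
  alpha * lambda * sum(abs(u)) +
    (1 - alpha) * lambda * sum(weights_groups * group_norms)
}

u <- c(3, -4, 0, 0, 1)
groups <- c(1, 1, 2, 2, 3)
w <- sqrt(c(2, 2, 1))  # assumed weights: square root of each group size

penalty_lasso <- sgl_penalty(u, groups, lambda = 1, alpha = 1, weights_groups = w)
penalty_group <- sgl_penalty(u, groups, lambda = 1, alpha = 0, weights_groups = w)
```

With α = 1 only the l_1 term remains (here 3 + 4 + 1 = 8). With α = 0 only the weighted group norms remain (here 5 * sqrt(2) + 1).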

Usage

seagull_fitted_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_WEIGHTS_GROUPSc,
  VECTOR_FULL_COLUMN_RANK,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  DELTA,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)

seagull_fitted_sparse_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_WEIGHTS_GROUPSc,
  VECTOR_FULL_COLUMN_RANK,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  ALPHA,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  LAMBDA_MAX,
  PROPORTION_XI,
  DELTA,
  STEP_SIZE,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)

seagull_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)

seagull_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_BETAc,
  VECTOR_INDEX_EXCLUDE,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)

seagull_sparse_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  ALPHA,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)

Arguments

VECTOR_Yc

numeric vector of observations.

Y_MEAN

arithmetic mean of VECTOR_Yc.

MATRIX_Xc

numeric design matrix relating y to fixed and random effects [X Z]. The columns may be permuted corresponding to their group assignments.

VECTOR_Xc_MEANS

numeric vector of arithmetic means of each column of MATRIX_Xc.

VECTOR_Xc_STANDARD_DEVIATIONS

numeric vector of estimates of standard deviations of each column of MATRIX_Xc. Values are calculated via the function colSds from the R-package matrixStats.

VECTOR_WEIGHTS_FEATURESc

numeric vector of weights for the vectors of fixed and random effects [b^T, u^T]^T. The entries may be permuted corresponding to their group assignments.

VECTOR_WEIGHTS_GROUPSc

numeric vector of pre-calculated weights for each group.

VECTOR_FULL_COLUMN_RANK

Boolean vector indicating whether the group-wise parts of the filtered matrix Z, i.e., Z^{(l)} for each group l, have full column rank.

VECTOR_GROUPS

integer vector specifying which effect (fixed and random) belongs to which group.

VECTOR_BETAc

numeric vector whose partitions will be returned (partition 1: estimates of fixed effects, partition 2: predictions of random effects). During the computation the entries may be in permuted order, but they are returned in the order of the user's input.

VECTOR_INDEX_PERMUTATION

integer vector that contains information about the original order of the user's input.

VECTOR_INDEX_EXCLUDE

integer vector containing the indices of every column that was filtered due to low standard deviation. This vector only has an effect if standardize = TRUE is used.

EPSILON_CONVERGENCE

value for relative accuracy of the solution to stop the algorithm for the current value of λ. The algorithm stops after iteration m, if:

||sol^{(m)} - sol^{(m-1)}||_∞ < ε_c * ||sol^{(m-1)}||_2.
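As a minimal sketch, the stopping rule compares the largest absolute change between two consecutive solutions with the scaled l_2-norm of the previous solution. The helper name has_converged is hypothetical and only mirrors the inequality above. It is not a function of the package.

```r
# Relative convergence check for two consecutive iterates (illustrative).
has_converged <- function(sol_new, sol_old, epsilon_convergence) {
  max_change <- max(abs(sol_new - sol_old))  # infinity norm of the update
  norm_old <- sqrt(sum(sol_old^2))           # l_2-norm of the previous iterate
  max_change < epsilon_convergence * norm_old
}

sol_old <- c(1, 2, 2)       # l_2-norm is 3
sol_new <- c(1, 2, 2.0002)  # largest change is about 2e-4

loose <- has_converged(sol_new, sol_old, epsilon_convergence = 1e-4)
strict <- has_converged(sol_new, sol_old, epsilon_convergence = 1e-5)
```

Here the change of about 2e-4 is below 1e-4 * 3 but not below 1e-5 * 3, so only the looser tolerance reports convergence.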

ITERATION_MAX

maximum number of iterations for each value of the penalty parameter λ. Ends the calculation for the current λ if the algorithm has not converged according to EPSILON_CONVERGENCE by then.

GAMMA

multiplicative parameter to decrease the step size during backtracking line search. Has to satisfy: 0 < γ < 1.
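To illustrate the role of γ, the following is a minimal sketch of one proximal gradient step with backtracking line search for the plain lasso. It assumes a squared-error loss scaled by 1/(2n) and the standard sufficient-decrease condition for proximal gradient methods. The loss scaling, the initial step size of 1, and all function names are assumptions and may differ from what the package does internally.

```r
# Soft-thresholding: the proximal operator of the l_1-norm.
soft_threshold <- function(x, threshold) sign(x) * pmax(abs(x) - threshold, 0)

# Smooth part of the objective (assumed scaling: 1 / (2n)).
smooth_loss <- function(X, y, beta) 0.5 * sum((y - X %*% beta)^2) / nrow(X)

# One proximal gradient step; the step size is multiplied by gamma until
# the quadratic upper bound on the loss holds.
prox_step_backtracking <- function(X, y, beta, lambda, gamma, step_size = 1) {
  stopifnot(gamma > 0, gamma < 1)
  gradient <- as.vector(crossprod(X, X %*% beta - y)) / nrow(X)
  loss_old <- smooth_loss(X, y, beta)
  repeat {
    beta_new <- soft_threshold(beta - step_size * gradient, step_size * lambda)
    difference <- beta_new - beta
    upper_bound <- loss_old + sum(gradient * difference) +
      sum(difference^2) / (2 * step_size)
    if (smooth_loss(X, y, beta_new) <= upper_bound) break
    step_size <- gamma * step_size  # shrink the step and try again
  }
  list(beta = beta_new, step_size = step_size)
}

# Simulated data: only the first column carries signal.
set.seed(1)
X <- matrix(rnorm(200), nrow = 40, ncol = 5)
y <- 2 * X[, 1] + rnorm(40, sd = 0.1)

beta <- rep(0, 5)
for (m in 1:200) {
  beta <- prox_step_backtracking(X, y, beta, lambda = 0.1, gamma = 0.8)$beta
}
```

A value of γ close to 1 shrinks the step gently and may need more trial steps per iteration, while a small γ finds an admissible step quickly but can end up with needlessly short steps.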

LAMBDA_MAX

maximum value for the penalty parameter. This is the start value for the grid search of the penalty parameter λ.

PROPORTION_XI

multiplicative parameter to determine the minimum value of λ for the grid search, i.e. λ_{min} = ξ * λ_{max}. Has to satisfy: 0 < ξ ≤ 1. If ξ = 1, only a single solution for λ = λ_{max} is calculated.

DELTA

numeric value, which is squared and added to the main diagonal of Z^{(l)T} Z^{(l)} for group l, if this matrix is not invertible.
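A minimal sketch of this adjustment, assuming that δ^2 is added to every diagonal entry of the group-wise Gram matrix. The helper name regularize_gram and the toy matrix are illustrative only.

```r
# Add delta^2 to the main diagonal of Z^{(l)T} Z^{(l)} (illustrative helper).
regularize_gram <- function(Z_l, delta) {
  gram <- crossprod(Z_l)  # Z^{(l)T} Z^{(l)}
  gram + delta^2 * diag(ncol(Z_l))
}

# The second column is twice the first, so the Gram matrix is singular.
Z_l <- cbind(c(1, 2, 3), c(2, 4, 6))
rank_before <- qr(crossprod(Z_l))$rank

gram_reg <- regularize_gram(Z_l, delta = 1)
rank_after <- qr(gram_reg)$rank
gram_inverse <- solve(gram_reg)  # succeeds after the adjustment
```

In this toy case the rank rises from 1 to 2, so the adjusted matrix can be inverted.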

NUMBER_INTERVALS

number of lambdas for the grid search between λ_{max} and ξ * λ_{max}. Loops are performed on a logarithmic grid.
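One way to build such a grid is to space the values evenly on the log scale between λ_{max} and ξ * λ_{max}. The sketch below assumes both end points are included. The helper name lambda_grid is hypothetical, and the exact spacing used internally may differ.

```r
# Logarithmic grid from lambda_max down to xi * lambda_max (illustrative).
lambda_grid <- function(lambda_max, xi, number_intervals) {
  stopifnot(lambda_max > 0, xi > 0, xi <= 1, number_intervals >= 1)
  # A single value is used if xi = 1 or only one grid point is requested.
  if (number_intervals == 1 || xi == 1) return(lambda_max)
  exp(seq(log(lambda_max), log(xi * lambda_max), length.out = number_intervals))
}

grid <- lambda_grid(lambda_max = 10, xi = 0.001, number_intervals = 4)
single <- lambda_grid(lambda_max = 10, xi = 1, number_intervals = 4)
```

For these inputs the grid is 10, 1, 0.1, 0.01 up to floating-point rounding, i.e. consecutive values differ by a constant factor.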

NUMBER_FIXED_EFFECTS

non-negative integer to determine the number of fixed effects present in the mixed model.

NUMBER_VARIABLES

non-negative integer which corresponds to the sum of all columns of the initial model matrices X and Z.

INTERNAL_STANDARDIZATION

if TRUE, the input vector y is centered, and each column of the input matrices X and Z is centered and scaled with an internal process. Additionally, a filter is applied to X and Z, which filters columns with standard deviation less than 1.e-7.
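The column-wise part of this preprocessing can be sketched in base R as follows. The helper name standardize_columns is hypothetical, and apply(M, 2, sd) is used as a base R stand-in for matrixStats::colSds. The order of the steps inside the package may differ.

```r
# Center and scale columns; filter columns with standard deviation below
# the threshold (illustrative helper, not a seagull function).
standardize_columns <- function(M, threshold = 1e-7) {
  column_means <- colMeans(M)
  column_sds <- apply(M, 2, sd)
  exclude <- which(column_sds < threshold)  # indices of filtered columns
  keep <- setdiff(seq_len(ncol(M)), exclude)
  M_c <- sweep(M[, keep, drop = FALSE], 2, column_means[keep], "-")
  M_c <- sweep(M_c, 2, column_sds[keep], "/")
  list(matrix = M_c, means = column_means, sds = column_sds, exclude = exclude)
}

# The second column is constant and is therefore filtered.
M <- cbind(c(1, 2, 3, 4), c(5, 5, 5, 5), c(2, 4, 6, 8))
result <- standardize_columns(M)
```

In this example result$exclude contains the index 2, and the two remaining columns have mean 0 and standard deviation 1.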

TRACE_PROGRESS

if TRUE, a message is printed to the screen after each finished loop of the λ grid. This is particularly useful for larger data sets.

ALPHA

mixing parameter of the penalty terms. Satisfies: 0 < α < 1. The penalty term looks as follows:

α * "lasso penalty" + (1-α) * "group lasso penalty".

STEP_SIZE

numeric value which represents the size of the step between consecutive iterations.

Value

A list of estimates and parameters relevant for the computation:

intercept

estimate for the intercept, if present in the model.

fixed_effects

estimates for the fixed effects b, if present in the model. Each row corresponds to a particular value of λ.

random_effects

predictions for the random effects u. Each row corresponds to a particular value of λ.

lambda

all values for λ which were used during the grid search.

iterations

a sequence of actual iterations for each value of λ. If an occurring number is equal to max_iter, then the algorithm most likely did not converge to rel_acc during the corresponding run of the grid search.

The following parameters are also returned, primarily for the purpose of comparison and repetition: alpha = ALPHA (only for the sparse-group lasso), max_iter = ITERATION_MAX, gamma_bls = GAMMA, xi = PROPORTION_XI, and loops_lambda = NUMBER_INTERVALS.


seagull documentation built on April 20, 2021, 5:06 p.m.