lmSP_cv | R Documentation |
Sparse Adaptive overlap group-LASSO, or sparse adaptive group L_2
-regularized regression, solves the following optimization problem
\textrm{min}_{\beta,\gamma} ~ \frac{1}{2}\|y-X\beta-Z\gamma\|_2^2 + \lambda\Big[(1-\alpha) \sum_{g=1}^G \|S_g T\beta\|_2+\alpha\Vert T_1\beta\Vert_1\Big]
to obtain a sparse coefficient vector \beta\in\mathbb{R}^p
for the matrix of penalized predictors X
and a coefficient vector \gamma\in\mathbb{R}^q
for the matrix of unpenalized predictors Z
. For each group g
, each row of
the matrix S_g\in\mathbb{R}^{n_g\times p}
has non-zero entries only for those variables belonging
to that group. These values are provided by the arguments groups
and group_weights
(see below).
Each variable can belong to more than one group. The diagonal matrix T\in\mathbb{R}^{p\times p}
contains the variable-specific weights. These values are
provided by the argument var_weights
(see below). The diagonal matrix T_1\in\mathbb{R}^{p\times p}
contains
the variable-specific L_1
weights. These values are provided by the argument var_weights_L1
(see below).
The regularization path is computed for the sparse adaptive overlap group-LASSO penalty at a grid of values for the regularization
parameter \lambda
using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method. The regularization is a combination of L_2
and
L_1
simultaneous constraints. Different specifications of the penalty
argument lead to different models choice:
The classical Lasso regularization (Tibshirani, 1996) can be obtained by specifying \alpha = 1
and the matrix T_1
as the p \times p
identity matrix. An adaptive version of this model (Zou, 2006) can be obtained if T_1
is
a p \times p
diagonal matrix of adaptive weights. See also Hastie et al. (2015) for further details.
The group-Lasso regularization (Yuan and Lin, 2006) can be obtained by specifying \alpha = 0
,
non-overlapping groups in S_g
and by setting the matrix T
equal to the p \times p
identity matrix.
An adaptive version of this model can be obtained if the matrix T
is a p \times p
diagonal matrix of adaptive weights. See also Hastie et al. (2015) for further details.
The sparse group-Lasso regularization (Simon et al., 2011) can be obtained by specifying \alpha\in(0,1)
,
non-overlapping groups in S_g
and by setting the matrices T
and T_1
equal to the p \times p
identity matrix.
An adaptive version of this model can be obtained if the matrices T
and T_1
are p \times p
diagonal matrices of adaptive weights.
The overlap group-Lasso regularization (Jenatton et al., 2011) can be obtained by specifying
\alpha = 0
, overlapping groups in S_g
and by setting the matrix T
equal to the p \times p
identity matrix. An adaptive version of this model can be obtained if the matrix T
is a p \times p
diagonal matrix of adaptive weights.
The sparse overlap group-Lasso regularization (Jenatton et al., 2011) can be obtained by specifying
\alpha\in(0,1)
, overlapping groups in S_g
and by setting the matrices T
and T_1
equal to the p \times p
identity matrix.
An adaptive version of this model can be obtained if the matrices T
and T_1
are p \times p
diagonal matrices of adaptive weights.
lmSP_cv(
X,
Z = NULL,
y,
penalty = c("LASSO", "GLASSO", "spGLASSO", "OVGLASSO", "spOVGLASSO"),
groups,
group_weights = NULL,
var_weights = NULL,
var_weights_L1 = NULL,
cv.fold = 5,
standardize.data = TRUE,
intercept = FALSE,
overall.group = FALSE,
lambda = NULL,
alpha = NULL,
lambda.min.ratio = NULL,
nlambda = 30,
control = list()
)
X |
an |
Z |
an |
y |
a length- |
penalty |
choose one from the following options: 'LASSO', for the or adaptive-Lasso penalties, 'GLASSO', for the group-Lasso penalty, 'spGLASSO', for the sparse group-Lasso penalty, 'OVGLASSO', for the overlap group-Lasso penalty and 'spOVGLASSO', for the sparse overlap group-Lasso penalty. |
groups |
either a vector of length |
group_weights |
a vector of length |
var_weights |
a vector of length |
var_weights_L1 |
a vector of length |
cv.fold |
the number of folds - default is 5. |
standardize.data |
logical. Should data be standardized? |
intercept |
logical. If it is TRUE, a column of ones is added to the design matrix. |
overall.group |
logical. This setting is only available for the overlap group-LASSO and the sparse overlap group-LASSO penalties, otherwise it is set to NULL. If it is TRUE, an overall group including all penalized covariates is added. |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
alpha |
the sparse overlap group-LASSO mixing parameter, with |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
A named list containing
a length-p
solution vector for the parameters \beta
. If n_\lambda>1
then the provided vector corresponds to the minimum cross-validated MSE.
a length-q
solution vector for the parameters \gamma
. If n_\lambda>1
then the provided vector corresponds to the minimum cross-validated MSE.
It is provided only when either the matrix Z
in input is not NULL or the intercept is set to TRUE.
an (n_\lambda\times p)
matrix of estimated \beta
coefficients for each lambda of the provided sequence.
an (n_\lambda\times q)
matrix of estimated \gamma
coefficients for each lambda of the provided sequence.
It is provided only when either the matrix Z
in input is not NULL or the intercept is set to TRUE.
sequence of lambda.
value of lambda that attains the minimum cross-validated MSE.
cross-validated mean squared error.
minimum value of the cross-validated MSE for the sequence of lambda.
logical. 1 denotes achieved convergence.
elapsed time in seconds.
number of iterations.
When you run the algorithm, output returns not only the solution, but also the iteration history recording following fields over iterates:
objective function value
norm of primal residual
norm of dual residual
feasibility tolerance for primal feasibility condition
feasibility tolerance for dual feasibility condition.
Iteration stops when both r_norm
and s_norm
values
become smaller than eps_pri
and eps_dual
, respectively.
The control argument is a list that can supply any of the following components:
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
an augmented Lagrangian parameter. The default value is 1.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) for details.
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) for details.
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
maximum number of iterations. The default value is 100.
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
bernardi_etal.2022fdaSP
\insertRefboyd_etal.2011fdaSP
\insertRefhastie_etal.2015fdaSP
\insertRefjenatton_etal.2011fdaSP
\insertReflin_etal.2022fdaSP
\insertRefsimon_etal.2013fdaSP
\insertRefyuan_lin.2006fdaSP
\insertRefzou.2006fdaSP
### generate sample data
set.seed(2023)
n <- 50
p <- 30
X <- matrix(rnorm(n * p), n, p)
### Example 1, LASSO penalty
beta <- apply(matrix(rnorm(p, sd = 1), p, 1), 1, fdaSP::softhresh, 1.5)
y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20)
### set the hyper-parameters of the ADMM algorithm
maxit <- 1000
adaptation <- TRUE
rho <- 1
reltol <- 1e-5
abstol <- 1e-5
### run cross-validation
mod_cv <- lmSP_cv(X = X, y = y, penalty = "LASSO",
standardize.data = FALSE, intercept = FALSE,
cv.fold = 5, nlambda = 30,
control = list("adaptation" = adaptation,
"rho" = rho,
"maxit" = maxit, "reltol" = reltol,
"abstol" = abstol,
"print.out" = FALSE))
### graphical presentation
plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n",
xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error",
ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd),
main = "Cross-validated Prediction Error")
fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd,
yVmax = mod_cv$mse + mod_cv$mse.sd)
abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]),
col = "red", lwd = 1.0)
### comparison with oracle error
mod <- lmSP(X = X, y = y, penalty = "LASSO",
standardize.data = FALSE,
intercept = FALSE,
nlambda = 30,
control = list("adaptation" = adaptation,
"rho" = rho,
"maxit" = maxit, "reltol" = reltol,
"abstol" = abstol,
"print.out" = FALSE))
err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - beta)^2))
plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2,
xlab = latex2exp::TeX("$\\log(\\lambda)$"),
ylab = "Estimation Error", main = "True Estimation Error", bty = "n")
abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0)
abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]),
col = "red", lwd = 1.0, lty = 2)
### Example 2, sparse group-LASSO penalty
beta <- c(rep(4, 12), rep(0, p - 13), -2)
y <- X %*% beta + rnorm(n, sd = sqrt(crossprod(X %*% beta)) / 20)
### define groups of dimension 3 each
group1 <- rep(1:10, each = 3)
### set regularization parameter grid
lam <- 10^seq(1, -2, length.out = 30)
### set the alpha parameter
alpha <- 0.5
### set the hyper-parameters of the ADMM algorithm
maxit <- 1000
adaptation <- TRUE
rho <- 1
reltol <- 1e-5
abstol <- 1e-5
### run cross-validation
mod_cv <- lmSP_cv(X = X, y = y, penalty = "spGLASSO",
groups = group1, cv.fold = 5,
standardize.data = FALSE, intercept = FALSE,
lambda = lam, alpha = 0.5,
control = list("adaptation" = adaptation,
"rho" = rho,
"maxit" = maxit, "reltol" = reltol,
"abstol" = abstol,
"print.out" = FALSE))
### graphical presentation
plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n",
xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error",
ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd),
main = "Cross-validated Prediction Error")
fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd,
yVmax = mod_cv$mse + mod_cv$mse.sd)
abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]),
col = "red", lwd = 1.0)
### comparison with oracle error
mod <- lmSP(X = X, y = y,
penalty = "spGLASSO",
groups = group1,
standardize.data = FALSE,
intercept = FALSE,
lambda = lam,
alpha = 0.5,
control = list("adaptation" = adaptation, "rho" = rho,
"maxit" = maxit, "reltol" = reltol, "abstol" = abstol,
"print.out" = FALSE))
err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - beta)^2))
plot(log(mod$lambda), err_mod, type = "l", col = "blue", lwd = 2,
xlab = latex2exp::TeX("$\\log(\\lambda)$"),
ylab = "Estimation Error", main = "True Estimation Error", bty = "n")
abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0)
abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]),
col = "red", lwd = 1.0, lty = 2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.