Efficient procedure for solving the Lasso or SCAD penalized soft maximin problem for large scale data. This software implements two proximal gradient based algorithms (NPG and FISTA) to solve different forms of the soft maximin problem from Lund et al., 2021: i) for a general group specific design the soft maximin problem is solved using the NPG algorithm; ii) for a fixed identical d-array (tensor) design across groups, where d = 1, 2, 3, the estimation procedure uses either the FISTA or the NPG algorithm and, for d = 2, 3, avoids forming the full tensor design matrix. Multithreading is possible when openMP is available for R.
Note that this package, SMME, replaces the SMMA package.
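A call-signature sketch can be assembled from the arguments documented below; the argument order is an assumption, and only the defaults stated in this documentation (`log = TRUE`, `nthreads = 4`) and the possible values of `penalty` and `alg` are taken from the text:

```r
## Sketch of the call signature, assembled from the documented arguments.
## Argument order and unspecified defaults are assumptions.
softmaximin(x, y, zeta, penalty = c("lasso", "scad"), alg = c("npg", "fista"),
            nlambda, lambda.min.ratio, lambda, penalty.factor, reltol,
            maxiter, steps, btmax, c, tau, M, nu, Lmin, log = TRUE,
            nthreads = 4)
```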
x 
list containing the G group specific design matrices of sizes
n_i \times p_i. Alternatively, for a model with identical tensor design
across G groups, a list containing the d (d = 1, 2, 3) marginal design
matrices (tensor components).
y 
list containing the G group specific response vectors of sizes
n_i \times 1. Alternatively, for a model with identical tensor design
across G groups, an array of size n_1 \times \cdots \times n_d \times G
containing the group specific response values.
zeta 
vector of strictly positive floats controlling the softmaximin
approximation accuracy. Large values give a closer approximation to the
(hard) maximin loss, while values near zero pool the observations across
groups.
penalty 
string specifying the penalty type. Possible values are "lasso" and "scad".
alg 
string specifying the optimization algorithm. Possible values are "npg" and "fista".
nlambda 
positive integer giving the number of lambda values used in the penalty sequence.
lambda.min.ratio 
strictly positive float giving the smallest value for lambda, as a
fraction of the largest value in the computed sequence.
lambda 
sequence of strictly positive floats used as penalty parameters. 
penalty.factor 
array of size p_1 \times \cdots \times p_d of
positive floats. Is multiplied with each element in lambda to allow
differential penalization of the coefficients.
reltol 
strictly positive float giving the convergence tolerance for the inner loop. 
maxiter 
positive integer giving the maximum number of iterations
allowed for each lambda value.
steps 
strictly positive integer giving the number of steps used in the
multistep adaptive lasso algorithm for nonconvex penalties. Automatically
set to 1 when penalty = "lasso".
btmax 
strictly positive integer giving the maximum number of backtracking
steps allowed in each iteration.
c 
strictly positive float used in the NPG algorithm.
tau 
strictly positive float used to control the stepsize for the NPG algorithm.
M 
positive integer giving the look-back length for the NPG algorithm's nonmonotone line search.
nu 
strictly positive float used to control the stepsize. A value less
than 1 will decrease the stepsize and a value larger than 1 will increase it.
Lmin 
non-negative float used by the NPG algorithm to control the stepsize.
log 
logical variable indicating whether to use the log-loss. TRUE is default and yields the loss given under Details.
nthreads 
integer giving the number of threads to use when openMP is available. Default is 4. 
Consider modeling heterogeneous data y_1, …, y_n by dividing it into G groups \mathbf{y}_g = (y_1, …, y_{n_g}), g \in \{1, …, G\}, and then using a linear model
\mathbf{y}_g = \mathbf{X}_g b_g + ε_g, \quad g \in \{1, …, G\},
to model the group response. Here b_g is a group specific p \times 1 coefficient vector, \mathbf{X}_g an n_g \times p group design matrix and ε_g an n_g \times 1 error term. The objective is to estimate a common coefficient vector β such that \mathbf{X}_g β is a robust and good approximation to \mathbf{X}_g b_g across groups.
Following Lund et al., 2021, this objective may be accomplished by solving the soft maximin estimation problem
\min_{β}\frac{1}{ζ}\log\bigg(∑_{g = 1}^G \exp(-ζ \hat V_g(β))\bigg) + λ \Vertβ\Vert_1, \quad ζ > 0, λ ≥ 0.
Here ζ essentially controls the amount of pooling across groups (ζ \sim 0 ignores grouping and pools observations) and
\hat V_g(β) := \frac{1}{n_g}(2β^\top \mathbf{X}_g^\top \mathbf{y}_g - β^\top \mathbf{X}_g^\top \mathbf{X}_gβ),
is the empirical explained variance from Meinshausen and Bühlmann, 2015. See Lund et al., 2021 for more details and references.
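For concreteness, the objective above can be evaluated directly in base R. The function below is an illustrative sketch, not part of the SMME API: it computes the empirical explained variance for each group and combines them through the log-sum-exp term plus a lasso penalty.

```r
## Sketch of the soft maximin objective (illustrative, not the SMME API):
## log-sum-exp of the negated explained variances plus a lasso penalty.
soft_maximin_obj <- function(beta, x, y, zeta, lambda) {
  ## empirical explained variance Vhat_g for each group g
  Vhat <- sapply(seq_along(x), function(g) {
    n_g <- length(y[[g]])
    as.numeric(2 * crossprod(beta, crossprod(x[[g]], y[[g]])) -
                 crossprod(beta, crossprod(x[[g]]) %*% beta)) / n_g
  })
  log(sum(exp(-zeta * Vhat))) / zeta + lambda * sum(abs(beta))
}

## two small groups with a shared design dimension p = 2
set.seed(1)
x <- list(matrix(rnorm(20), 10, 2), matrix(rnorm(20), 10, 2))
y <- list(rnorm(10), rnorm(10))
soft_maximin_obj(c(0.5, -0.2), x, y, zeta = 1, lambda = 0.1)
```

At β = 0 every \hat V_g is zero, so the objective reduces to log(G)/ζ, which is a quick sanity check on the implementation.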
The function softmaximin solves the soft maximin estimation problem in large scale settings for a sequence of penalty parameters λ_{max} > … > λ_{min} > 0 and a sequence of strictly positive softmaximin parameters ζ_1, ζ_2, …. The implementation also solves the problem above with the SCAD penalty in place of the lasso penalty, using the multistep adaptive lasso procedure to loop over the inner proximal algorithm.
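The multistep adaptive lasso idea is to repeatedly solve a weighted lasso problem, with per-coefficient weights given by the SCAD penalty derivative (Fan and Li, 2001) evaluated at the previous estimate. A sketch of the weight update (the inner solver is omitted; `scad_deriv` is illustrative and a = 3.7 is the conventional SCAD choice):

```r
## SCAD penalty derivative, used to reweight the lasso penalty at each
## step of a multistep adaptive lasso loop (illustrative sketch).
scad_deriv <- function(beta, lambda, a = 3.7) {
  t <- abs(beta)
  lambda * (t <= lambda) +
    pmax(a * lambda - t, 0) / (a - 1) * (t > lambda)
}

## one step: penalty weight per coefficient, relative to lambda
beta_prev <- c(0, 0.05, 0.5, 2)
w <- scad_deriv(beta_prev, lambda = 0.1) / 0.1
w  # coefficients well above a * lambda get weight 0 (no further shrinkage)
```

Small coefficients keep the full lasso weight 1, while coefficients above aλ are no longer shrunk, which is what removes the lasso's bias on large signals.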
Two optimization algorithms are implemented in the SMME package: a nonmonotone proximal gradient (NPG) algorithm and a fast iterative soft thresholding algorithm (FISTA). The implementation is particularly efficient for applications where the design is identical across groups, i.e. \mathbf{X}_g = \mathbf{X} for all g \in \{1, …, G\}, and where \mathbf{X} has tensor structure, i.e.
\mathbf{X} = \bigotimes_{i=1}^d \mathbf{M}_i,
for marginal n_i \times p_i design matrices \mathbf{M}_1, …, \mathbf{M}_d.
For d \in \{1, 2, 3\}, provided only with the marginal matrices and the group response vectors, softmaximin solves the soft maximin problem with a minimal memory footprint using tensor optimized arithmetic; see the function RH.
Note that when multiple values for ζ are provided, the computations can be distributed across CPUs if openMP is available.
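To see why the tensor structure avoids forming \mathbf{X}, a rotated H-transform can be sketched in a few lines of base R. The `rh` helper below is illustrative, not the package's RH implementation; chaining it d times reproduces multiplication by the full Kronecker product without ever building that matrix:

```r
## Illustrative rotated H-transform: multiply a d-array A by M along its
## first dimension, then rotate the dimensions cyclically. Chaining the
## transform d times equals multiplying by the full Kronecker product.
rh <- function(M, A) {
  dims <- dim(A)
  X <- M %*% matrix(A, dims[1], prod(dims[-1]))          # act on first mode
  aperm(array(X, c(nrow(M), dims[-1])), c(2:length(dims), 1))  # rotate
}

set.seed(1)
M1 <- matrix(rnorm(6), 3, 2)
M2 <- matrix(rnorm(8), 4, 2)
M3 <- matrix(rnorm(10), 5, 2)
B  <- array(rnorm(2 * 2 * 2), c(2, 2, 2))

full <- (M3 %x% M2 %x% M1) %*% as.vector(B)  # explicit tensor design matrix
tens <- rh(M3, rh(M2, rh(M1, B)))            # tensor optimized version
all.equal(as.vector(full), as.vector(tens))
```

The explicit design matrix here has prod(n) x prod(p) entries, while the chained transforms only ever touch the marginal matrices, which is the memory saving exploited for the array case.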
An object with S3 Class "SMME".
spec 
A string indicating the array dimension (1, 2 or 3) and the penalty. 
coef 
A list with one element for each value of ζ; each element is a p_1\cdots p_d \times nlambda matrix containing the estimated model coefficients, one column per lambda value.
lambda 
A vector containing the sequence of penalty values used
in the estimation procedure for which the procedure converged.
Obj 
A matrix containing the objective values for each
iteration and each model for which the procedure converged.
df 
A vector giving the number of nonzero model coefficients for each
value of lambda.
dimcoef 
An integer giving the number p of model parameters. For array data, a vector giving the dimension of the model coefficient array β.
dimobs 
An integer giving the number of observations. For array data, a
vector giving the dimension of the observation (response) array.
iter 
A vector containing the number of iterations used for each lambda value.
Adam Lund
Maintainer: Adam Lund, adam.lund@math.ku.dk
Lund, A., S. W. Mogensen and N. R. Hansen (2021). Soft Maximin Estimation for Heterogeneous Data. Preprint. url = https://arxiv.org/abs/1805.02407
Meinshausen, N. and P. Bühlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics, 43(4), 1801-1830. url = https://doi.org/10.1214/15-AOS1325.
#Non-array data
##size of example
set.seed(42)
G <- 10; n <- sample(100:500, G); p <- 60
x <- y <- list()
##group design matrices
for(g in 1:G){x[[g]] <- matrix(rnorm(n[g] * p), n[g], p)}
##common features and effects
common_features <- rbinom(p, 1, 0.1)
common_effects <- rnorm(p) * common_features
##group response
for(g in 1:G){
bg <- rnorm(p, 0, 0.5) * (1 - common_features) + common_effects
mu <- x[[g]] %*% bg
y[[g]] <- rnorm(n[g]) + mu
}
##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1), penalty = "lasso", alg = "npg"))
betahat <- fit$coef
##estimated common effects for specific lambda and zeta
modelno <- 6; zetano <- 2
m <- min(betahat[[zetano]][ , modelno], common_effects)
M <- max(betahat[[zetano]][ , modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")
#Array data
##size of example
set.seed(42)
G <- 5; n <- c(65, 26, 13); p <- c(13, 5, 4)
##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i]), n[i], p[i])}
##common features and effects
common_features <- rbinom(prod(p), 1, 0.1)
common_effects <- rnorm(prod(p), 0, 0.1) * common_features
##group response
y <- array(NA, c(n, G))
for(g in 1:G){
bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
Bg <- array(bg, p)
mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
y[,,, g] <- array(rnorm(prod(n)), dim = n) + mu
}
##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1, 10, 100), penalty = "lasso", alg = "npg"))
Betahat <- fit$coef
##estimated common effects for specific lambda and zeta
modelno <- 10; zetano <- 3
m <- min(Betahat[[zetano]][, modelno], common_effects)
M <- max(Betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "h", ylim = c(m, M), col = "red")
lines(Betahat[[zetano]][, modelno], type = "h")
