# SMMA: Soft Maximin Estimation for Large Scale Array Data with Known...

In SMMA: Soft Maximin Estimation for Large Scale Array-Tensor Models

## Description

An efficient, design-matrix-free procedure for solving a soft maximin problem for large scale array-tensor structured models; see Lund et al., 2020. Lasso and SCAD penalized estimation are currently implemented.

## Usage

    softmaximin(X,
                Y,
                zeta,
                penalty = c("lasso", "scad"),
                alg = c("npg", "fista"),
                nlambda = 30,
                lambda.min.ratio = 1e-04,
                lambda = NULL,
                penalty.factor = NULL,
                reltol = 1e-05,
                maxiter = 15000,
                steps = 1,
                btmax = 100,
                c = 0.0001,
                tau = 2,
                M = 4,
                nu = 1,
                Lmin = 0,
                log = TRUE)

## Arguments

| Argument | Description |
| --- | --- |
| X | list containing the Kronecker components (1, 2 or 3) of the Kronecker design matrix. These are matrices of sizes n_i \times p_i. |
| Y | array of size n_1 \times\cdots\times n_d \times G containing the response values. |
| zeta | strictly positive float controlling the softmaximin approximation accuracy. |
| penalty | string specifying the penalty type. Possible values are "lasso", "scad". |
| alg | string specifying the optimization algorithm. Possible values are "npg", "fista". |
| nlambda | positive integer giving the number of lambda values. Used when lambda is not specified. |
| lambda.min.ratio | strictly positive float giving the smallest value for lambda, as a fraction of λ_{max}, the (data dependent) smallest value for which all coefficients are zero. Used when lambda is not specified. |
| lambda | sequence of strictly positive floats used as penalty parameters. |
| penalty.factor | array of size p_1 \times \cdots \times p_d of positive floats. It is multiplied with each element in lambda to allow differential penalization of the coefficients. |
| reltol | strictly positive float giving the convergence tolerance for the inner loop. |
| maxiter | positive integer giving the maximum number of iterations allowed for each lambda value, summed over all outer iterations for that lambda. |
| steps | strictly positive integer giving the number of steps used in the multi-step adaptive lasso algorithm for non-convex penalties. Automatically set to 1 when penalty = "lasso". |
| btmax | strictly positive integer giving the maximum number of backtracking steps allowed in each iteration. Default is btmax = 100. |
| c | strictly positive float used in the NPG algorithm. Default is c = 0.0001. |
| tau | strictly positive float used to control the stepsize for NPG. Default is tau = 2. |
| M | positive integer giving the look back for the NPG. Default is M = 4. |
| nu | strictly positive float used to control the stepsize. A value less than 1 will decrease the stepsize and a value larger than 1 will increase it. Default is nu = 1. |
| Lmin | non-negative float used by the NPG algorithm to control the stepsize. For the default Lmin = 0 the maximum step size is the same as for the FISTA algorithm. |
| log | logical variable indicating whether to use log-loss. TRUE is default and yields the loss below. |
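
How nlambda, lambda.min.ratio and λ_{max} fit together can be sketched in base R. This is only an illustration: `lambda_max` is a made-up value (the package computes it from the data), and the logarithmic spacing is an assumption borrowed from common practice, not something the documentation states.

```r
# Hypothetical value; softmaximin derives lambda_max from the data.
lambda_max <- 2.5
nlambda <- 30
lambda.min.ratio <- 1e-04

# Decreasing grid from lambda_max down to lambda.min.ratio * lambda_max.
# The log-spacing is an assumption, not confirmed by the documentation.
lambda_grid <- exp(seq(log(lambda_max),
                       log(lambda.min.ratio * lambda_max),
                       length.out = nlambda))

range(lambda_grid)
```

A hand-built grid of this kind could instead be supplied through the lambda argument, in which case nlambda and lambda.min.ratio are not used.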

## Details

Following Lund et al., 2020, this package solves the optimization problem for a linear model for heterogeneous d-dimensional array data (d=1,2,3) organized in G known groups, with an identical tensor structured design matrix X across all groups. Specifically, n = ∏_{i=1}^d n_i is the number of observations in each group, Y_g is the n_1\times \cdots \times n_d response array for group g \in \{1,…,G\}, and X is an n\times p design matrix, where p = ∏_{i=1}^d p_i, with tensor structure

X = \bigotimes_{i=1}^d X_i.

For d =1,2,3, X_1,…, X_d are the marginal n_i\times p_i design matrices (Kronecker components). Using the GLAM framework the model equation for group g\in \{1,…,G\} is expressed as

Y_g = ρ(X_d,ρ(X_{d-1},…,ρ(X_1,B_g))) + E_g,

where ρ is the so-called rotated H-transform (see Currie et al., 2006), B_g for each g is a (random) p_1\times\cdots\times p_d parameter array and E_g is an n_1\times \cdots \times n_d error array.
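
To make the nested ρ expression concrete, here is a minimal base R sketch for d = 3. The function `rotated_h` is a hypothetical stand-in written for illustration, not the package's own `RH`, and the Kronecker ordering X_3 ⊗ X_2 ⊗ X_1 together with column-major vectorization is an assumption. The check compares the array computation, which only touches the marginal matrices, with the explicit n \times p design matrix.

```r
# Illustrative rotated H-transform in base R (hypothetical helper,
# not the RH() function shipped with the package).
rotated_h <- function(M, A) {
  d <- dim(A)
  # Multiply M onto the first dimension of A (A flattened to p_1 x rest).
  MA <- array(M %*% matrix(A, nrow = d[1]), c(nrow(M), d[-1]))
  # Rotate the result so that the first dimension becomes the last.
  aperm(MA, c(seq_along(d)[-1], 1))
}

set.seed(1)
n <- c(4, 3, 2)
p <- c(3, 2, 2)
X1 <- matrix(rnorm(n[1] * p[1]), n[1], p[1])
X2 <- matrix(rnorm(n[2] * p[2]), n[2], p[2])
X3 <- matrix(rnorm(n[3] * p[3]), n[3], p[3])
B <- array(rnorm(prod(p)), p)

# Mean array computed from the marginal matrices only
mu_glam <- rotated_h(X3, rotated_h(X2, rotated_h(X1, B)))

# Same quantity via the explicit Kronecker design matrix
# (assumed ordering: X3 %x% X2 %x% X1, column-major vec).
X_full <- kronecker(X3, kronecker(X2, X1))
mu_full <- array(X_full %*% c(B), n)

all.equal(mu_glam, mu_full)
```

The two results should agree up to floating-point error. The array version never forms the n \times p matrix, which is what makes the procedure design matrix free.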

This package solves the penalized soft maximin problem from Lund et al., 2020, given by

\min_{β}\frac{1}{ζ}\log\bigg(∑_{g=1}^G \exp(-ζ \hat V_g(β))\bigg) + λ \Vertβ\Vert_1, \quad ζ > 0, λ ≥ 0

for the setup described above. Note that

\hat V_g(β):=\frac{1}{n}(2β^\top X^\top vec(Y_g)-β^\top X^\top Xβ),

is the empirical explained variance from Meinshausen and Buhlmann, 2015. See Lund et al., 2020 for more details and references.
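
The two formulas above can be evaluated directly on a small example. The sketch below uses an explicit design matrix and stores vec(Y_g) as the columns of a matrix, which is done for readability only and is not how the package works internally. The function names and the toy data are assumptions made for illustration.

```r
# Empirical explained variance for one group:
# V_g(beta) = (2 beta' X' y_g - beta' X' X beta) / n
explained_variance <- function(beta, X, y) {
  fitted <- X %*% beta
  as.numeric(2 * crossprod(fitted, y) - crossprod(fitted)) / nrow(X)
}

# Soft maximin loss: (1 / zeta) * log(sum_g exp(-zeta * V_g(beta))),
# computed with a max-shift for numerical stability.
soft_maximin_loss <- function(beta, X, Y, zeta) {
  a <- -zeta * apply(Y, 2, function(y) explained_variance(beta, X, y))
  (max(a) + log(sum(exp(a - max(a))))) / zeta
}

# Toy data: G groups with group-specific coefficients
set.seed(1)
n <- 50; p <- 3; G <- 4
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% matrix(rnorm(p * G), p, G) + matrix(rnorm(n * G), n, G)
beta <- rep(0.1, p)

V <- apply(Y, 2, function(y) explained_variance(beta, X, y))
sapply(c(1, 10, 100), function(z) soft_maximin_loss(beta, X, Y, z))
-min(V)  # hard maximin objective, approached as zeta grows
```

By the standard log-sum-exp bound, the loss lies between -\min_g \hat V_g(β) and -\min_g \hat V_g(β) + \log(G)/ζ, so larger values of zeta track the worst-performing group more closely.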

For d=1,2,3, using only the marginal matrices X_1,X_2,… (for d=1 there is only one marginal), the function softmaximin solves the soft maximin problem for a sequence of penalty parameters λ_{max}>… >λ_{min}>0.

Two optimization algorithms are implemented: a non-monotone proximal gradient (NPG) algorithm and a fast iterative soft thresholding algorithm (FISTA). The package also solves the problem above with the lasso penalty replaced by the SCAD penalty, using the multi-step adaptive lasso procedure to loop over the proximal algorithm.
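
Both proximal algorithms rely on the proximal map of the ℓ1 penalty, which is elementwise soft thresholding. The sketch below shows that operator together with the standard derivative of the SCAD penalty, from which a multi-step adaptive lasso scheme can derive coefficient-specific weights. The function names, the value a = 3.7 and the way the weights are used are assumptions for illustration, not the package's internals.

```r
# Soft-thresholding operator: the proximal map of t * ||.||_1
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

soft_threshold(c(-3, -0.5, 0, 0.5, 3), 1)
# -2  0  0  0  2

# Derivative of the SCAD penalty evaluated at |beta|.
# a = 3.7 is a conventional choice and an assumption here.
scad_derivative <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  lambda * (b <= lambda) + pmax(a * lambda - b, 0) / (a - 1) * (b > lambda)
}

# Penalty levels for the next weighted-lasso step, given a current estimate:
# small coefficients keep the full lasso penalty, large ones are unpenalized.
beta_current <- c(0, 0.2, 1, 5)
scad_derivative(beta_current, lambda = 0.5)
```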

## Value

An object with S3 Class "SMMA".

| Component | Description |
| --- | --- |
| spec | A string indicating the array dimension (1, 2 or 3) and the penalty. |
| coef | A p_1\cdots p_d \times nlambda matrix containing the estimates of the model coefficients (beta) for each lambda-value. |
| lambda | A vector containing the sequence of penalty values used in the estimation procedure. |
| Obj | A matrix containing the objective values for each iteration and each model. |
| df | The number of nonzero coefficients for each value of lambda. |
| dimcoef | A vector giving the dimension of the model coefficient array β. |
| dimobs | A vector giving the dimension of the observation (response) array Y. |
| Iter | A list with 4 items: bt_iter is the total number of backtracking steps performed, bt_enter is the number of times the backtracking is initiated, iter_mat is a vector containing the number of iterations for each lambda value, and iter is the total number of iterations, i.e. sum(Iter). |
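
Since coef stores each model as one column, a single model can be turned back into a coefficient array using dimcoef. The sketch below uses a mock list with made-up numbers in place of a real fit, and it assumes the coefficients are stored in column-major (vec) order.

```r
# Mock object mimicking the documented components (hypothetical values,
# not the output of an actual softmaximin call).
p <- c(3, 2, 2)
nlambda <- 4
set.seed(1)
fit <- list(coef = matrix(rnorm(prod(p) * nlambda), prod(p), nlambda),
            dimcoef = p)
fit$coef[abs(fit$coef) < 0.5] <- 0  # introduce some exact zeros

modelno <- 2
# Reshape one column of coef into a p_1 x p_2 x p_3 array
beta_array <- array(fit$coef[, modelno], fit$dimcoef)
dim(beta_array)

# Number of nonzero coefficients per lambda, matching the description of df
colSums(fit$coef != 0)
```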

## References

Lund, A., S. W. Mogensen and N. R. Hansen (2020). Soft Maximin Estimation for Heterogeneous Array Data. Preprint.

Meinshausen, N. and P. Buhlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics. 43, 4, 1801-1830. https://doi.org/10.1214/15-AOS1325.

Currie, I. D., M. Durban, and P. H. C. Eilers (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society. Series B. 68, 259-280. http://dx.doi.org/10.1111/j.1467-9868.2006.00543.x.

## Examples

    library(SMMA)

    ## size of example
    n1 <- 65; n2 <- 26; n3 <- 13; p1 <- 13; p2 <- 5; p3 <- 4

    ## marginal design matrices (Kronecker components)
    X1 <- matrix(rnorm(n1 * p1), n1, p1)
    X2 <- matrix(rnorm(n2 * p2), n2, p2)
    X3 <- matrix(rnorm(n3 * p3), n3, p3)
    X <- list(X1, X2, X3)

    ## sparse signal shared by all groups
    component <- rbinom(p1 * p2 * p3, 1, 0.1)

    ## group-specific coefficients, mean arrays and noisy responses
    Beta1 <- array(rnorm(p1 * p2 * p3, 0, 0.1) + component, c(p1, p2, p3))
    mu1 <- RH(X3, RH(X2, RH(X1, Beta1)))
    Y1 <- array(rnorm(n1 * n2 * n3), dim = c(n1, n2, n3)) + mu1
    Beta2 <- array(rnorm(p1 * p2 * p3, 0, 0.1) + component, c(p1, p2, p3))
    mu2 <- RH(X3, RH(X2, RH(X1, Beta2)))
    Y2 <- array(rnorm(n1 * n2 * n3), dim = c(n1, n2, n3)) + mu2
    Beta3 <- array(rnorm(p1 * p2 * p3, 0, 0.1) + component, c(p1, p2, p3))
    mu3 <- RH(X3, RH(X2, RH(X1, Beta3)))
    Y3 <- array(rnorm(n1 * n2 * n3), dim = c(n1, n2, n3)) + mu3
    Beta4 <- array(rnorm(p1 * p2 * p3, 0, 0.1) + component, c(p1, p2, p3))
    mu4 <- RH(X3, RH(X2, RH(X1, Beta4)))
    Y4 <- array(rnorm(n1 * n2 * n3), dim = c(n1, n2, n3)) + mu4
    Beta5 <- array(rnorm(p1 * p2 * p3, 0, 0.1) + component, c(p1, p2, p3))
    mu5 <- RH(X3, RH(X2, RH(X1, Beta5)))
    Y5 <- array(rnorm(n1 * n2 * n3), dim = c(n1, n2, n3)) + mu5

    ## collect the five groups in an n1 x n2 x n3 x G response array
    Y <- array(NA, c(dim(Y1), 5))
    Y[,,, 1] <- Y1; Y[,,, 2] <- Y2; Y[,,, 3] <- Y3; Y[,,, 4] <- Y4; Y[,,, 5] <- Y5

    ## fit the lasso-penalized soft maximin model with the NPG algorithm
    fit <- softmaximin(X, Y, zeta = 10, penalty = "lasso", alg = "npg")
    Betafit <- fit$coef

    ## compare the estimate for one lambda value with the common signal
    modelno <- 15
    m <- min(Betafit[ , modelno], c(component))
    M <- max(Betafit[ , modelno], c(component))
    plot(c(component), type = "l", ylim = c(m, M))
    lines(Betafit[ , modelno], col = "red")

SMMA documentation built on Sept. 17, 2020, 5:08 p.m.