# SMME: Soft Maximin Estimation for Large Scale Heterogeneous Data

## Description

Efficient procedure for solving the Lasso or SCAD penalized soft maximin problem for large scale data. The software implements two proximal gradient based algorithms (NPG and FISTA) to solve different forms of the soft maximin problem from Lund et al., 2021: i) for a general group specific design, the soft maximin problem is solved using the NPG algorithm; ii) for a fixed, identical d-dimensional tensor design across groups, where d = 1, 2, 3, the estimation procedure uses either the FISTA or the NPG algorithm and, for d = 2, 3, avoids forming the tensor design matrix. Multi-threading is possible when OpenMP is available for R.

Note that the SMME package replaces the SMMA package.

## Usage

```r
softmaximin(
  x,
  y,
  zeta,
  penalty = c("lasso", "scad"),
  alg = c("npg", "fista"),
  nlambda = 30,
  lambda.min.ratio = 1e-04,
  lambda = NULL,
  penalty.factor = NULL,
  reltol = 1e-05,
  maxiter = 1000,
  steps = 1,
  btmax = 100,
  c = 0.0001,
  tau = 2,
  M = 4,
  nu = 1,
  Lmin = 0,
  log = TRUE,
  nthreads = 4
)
```

## Arguments

- `x`: list containing the G group specific design matrices of sizes n_i \times p_i. Alternatively, for a model with identical tensor design across the G groups, x is a list containing the d (d \in \{1, 2, 3\}) tensor components.
- `y`: list containing the G group specific response vectors of sizes n_i \times 1. Alternatively, for a model with identical tensor design across the G groups, y is an array of size n_1 \times \cdots \times n_d \times G (d \in \{1, 2, 3\}) containing the response values.
- `zeta`: vector of strictly positive floats controlling the soft maximin approximation accuracy. When length(zeta) > 1 the procedure will distribute the computations using the nthreads parameter below when OpenMP is available.
- `penalty`: string specifying the penalty type. Possible values are "lasso" and "scad".
- `alg`: string specifying the optimization algorithm. Possible values are "npg" and "fista".
- `nlambda`: positive integer giving the number of lambda values. Used when lambda is not specified.
- `lambda.min.ratio`: strictly positive float giving the smallest value for lambda as a fraction of λ_{max}, the (data dependent) smallest value for which all coefficients are zero. Used when lambda is not specified.
- `lambda`: sequence of strictly positive floats used as penalty parameters.
- `penalty.factor`: array of size p_1 \times \cdots \times p_d of positive floats. Multiplied with each element in lambda to allow differential penalization of the coefficients.
- `reltol`: strictly positive float giving the convergence tolerance for the inner loop.
- `maxiter`: positive integer giving the maximum number of iterations allowed for each lambda value, summed over all outer iterations for that lambda.
- `steps`: strictly positive integer giving the number of steps used in the multi-step adaptive lasso algorithm for non-convex penalties. Automatically set to 1 when penalty = "lasso".
- `btmax`: strictly positive integer giving the maximum number of backtracking steps allowed in each iteration. Default is btmax = 100.
- `c`: strictly positive float used in the NPG algorithm. Default is c = 0.0001.
- `tau`: strictly positive float used to control the step size for NPG. Default is tau = 2.
- `M`: positive integer giving the look-back for the NPG algorithm. Default is M = 4.
- `nu`: strictly positive float used to control the step size. A value less than 1 will decrease the step size and a value larger than 1 will increase it. Default is nu = 1.
- `Lmin`: non-negative float used by the NPG algorithm to control the step size. For the default Lmin = 0 the maximum step size is the same as for the FISTA algorithm.
- `log`: logical variable indicating whether to use the log-loss. TRUE is default and yields the loss given in Details.
- `nthreads`: integer giving the number of threads to use when OpenMP is available. Default is 4.

## Details

Consider modeling heterogeneous data y_1,…, y_n by dividing it into G groups \mathbf{y}_g = (y_1, …, y_{n_g}), g \in \{ 1,…, G\} and then using a linear model

\mathbf{y}_g = \mathbf{X}_gb_g + ε_g, \quad g \in \{1,…, G\},

to model the group response. Here b_g is a group specific p\times 1 coefficient vector, \mathbf{X}_g an n_g\times p group design matrix and ε_g an n_g\times 1 error term. The objective is to estimate a common coefficient vector β such that \mathbf{X}_gβ is a robust and good approximation to \mathbf{X}_gb_g across groups.

Following Lund et al., 2021, this objective may be accomplished by solving the soft maximin estimation problem

\min_{β}\frac{1}{ζ}\log\bigg(∑_{g = 1}^G \exp(-ζ \hat V_g(β))\bigg) + λ \Vertβ\Vert_1, \quad ζ > 0, λ ≥ 0.

Here ζ essentially controls the amount of pooling across groups (ζ \sim 0 ignores grouping and pools observations) and

\hat V_g(β):=\frac{1}{n_g}(2β^\top \mathbf{X}_g^\top \mathbf{y}_g-β^\top \mathbf{X}_g^\top \mathbf{X}_gβ),

is the empirical explained variance from Meinshausen and Buhlmann, 2015. See Lund et al., 2021 for more details and references.
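The two displays above can be written out directly in R. The following is a minimal sketch, not the package's implementation (which runs in compiled code); `Vg` and `softmaximin_loss` are illustrative helper names, not functions exported by SMME.

```r
# Empirical explained variance V_g for one group (formula above)
Vg <- function(beta, X, y) {
  drop(2 * crossprod(beta, crossprod(X, y)) -
         crossprod(beta, crossprod(X) %*% beta)) / length(y)
}

# Soft maximin loss: log-sum-exp of the negative explained variances
softmaximin_loss <- function(beta, xlist, ylist, zeta) {
  V <- mapply(function(X, y) Vg(beta, X, y), xlist, ylist)
  log(sum(exp(-zeta * V))) / zeta
}

set.seed(1)
X1 <- matrix(rnorm(20), 10, 2); X2 <- matrix(rnorm(20), 10, 2)
b  <- c(1, -1)
y1 <- X1 %*% b + rnorm(10); y2 <- X2 %*% b + rnorm(10)

# for large zeta the loss approaches -min_g V_g (the hard maximin criterion)
softmaximin_loss(b, list(X1, X2), list(y1, y2), zeta = 100)
```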

The function softmaximin solves the soft maximin estimation problem in large scale settings for a sequence of penalty parameters λ_{max}>… >λ_{min}>0 and a sequence of strictly positive softmaximin parameters ζ_1, ζ_2,….

The implementation also solves the problem above with the SCAD penalty in place of the lasso penalty, using a multi-step adaptive lasso procedure that loops over the inner proximal algorithm.
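For the lasso penalty, the proximal step inside such proximal gradient loops is plain soft thresholding. A sketch for illustration only (`prox_l1` and `delta`, the step size, are assumed names, not part of the SMME API):

```r
# Soft thresholding: the proximal operator of the l1 penalty.
# Coordinates with |z_i| <= delta * lambda are set exactly to zero.
prox_l1 <- function(z, lambda, delta) {
  sign(z) * pmax(abs(z) - delta * lambda, 0)
}

prox_l1(c(-2, -0.1, 0.5, 3), lambda = 1, delta = 0.5)
```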

Two optimization algorithms are implemented in the SMME package: a non-monotone proximal gradient (NPG) algorithm and a fast iterative soft thresholding algorithm (FISTA). The implementation is particularly efficient for applications where the design is identical across groups, i.e. \mathbf{X}_g = \mathbf{X} for all g \in \{1, …, G\}, and where \mathbf{X} has tensor structure, i.e.

\mathbf{X} = \bigotimes_{i=1}^d \mathbf{M}_i,

for marginal n_i\times p_i design matrices \mathbf{M}_1,…, \mathbf{M}_d.

For d \in \{1, 2, 3\}, provided only with the marginal matrices and the group response vectors, softmaximin solves the soft maximin problem with a minimal memory footprint using tensor optimized arithmetic; see the function RH.
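The idea behind this tensor optimized arithmetic can be sketched in a few lines of base R: applying the marginal matrices one at a time reproduces multiplication by the full Kronecker design without ever forming it. The `rh` below is only an illustration of the rotated H-transform idea; in practice use the package's `RH`, which runs in compiled code.

```r
# Illustrative rotated H-transform: multiply along the first dimension of
# the array A, then rotate the dimensions so each marginal matrix can be
# applied in turn.
rh <- function(M, A) {
  tmp <- array(M %*% matrix(A, nrow = dim(A)[1]), c(nrow(M), dim(A)[-1]))
  aperm(tmp, c(2:length(dim(A)), 1))
}

set.seed(7)
M1 <- matrix(rnorm(6), 3, 2); M2 <- matrix(rnorm(8), 4, 2)
B  <- array(rnorm(4), c(2, 2))

fast <- rh(M2, rh(M1, B))                           # sequential marginal products
slow <- array(kronecker(M2, M1) %*% c(B), c(3, 4))  # full Kronecker design

max(abs(fast - slow))  # agrees up to numerical error
```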

Note that when multiple values for ζ are provided, the computations can be distributed across CPUs if OpenMP is available.

## Value

An object with S3 Class "SMME".

- `spec`: a string indicating the array dimension (1, 2 or 3) and the penalty.
- `coef`: a p_1 \cdots p_d \times nlambda matrix containing the estimates of the model coefficients (beta) for each lambda value for which the procedure converged. When length(zeta) > 1, a length(zeta)-list of such matrices.
- `lambda`: a vector containing the sequence of penalty values used in the estimation procedure for which the procedure converged. When length(zeta) > 1, a length(zeta)-list of such vectors.
- `Obj`: a matrix containing the objective values for each iteration and each model for which the procedure converged. When length(zeta) > 1, a length(zeta)-list of such matrices.
- `df`: a vector indicating the number of nonzero model coefficients for each value of lambda for which the procedure converged. When length(zeta) > 1, a length(zeta)-list of such vectors.
- `dimcoef`: an integer giving the number p of model parameters. For array data, a vector giving the dimension of the model coefficient array β.
- `dimobs`: an integer giving the number of observations. For array data, a vector giving the dimension of the observation (response) array Y.
- `iter`: a vector containing the number of iterations for each lambda value for which the procedure converged. When length(zeta) > 1, a length(zeta)-list of such vectors.

## References

Lund, A., S. W. Mogensen and N. R. Hansen (2021). Soft Maximin Estimation for Heterogeneous Data. Preprint. https://arxiv.org/abs/1805.02407

Meinshausen, N. and P. Buhlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics 43(4), 1801-1830. https://doi.org/10.1214/15-AOS1325

## Examples

```r
#Non-array data
##size of example
set.seed(42)
G <- 10; n <- sample(100:500, G); p <- 60
x <- y <- list()

##group design matrices
for(g in 1:G){x[[g]] <- matrix(rnorm(n[g] * p), n[g], p)}

##common features and effects
common_features <- rbinom(p, 1, 0.1)
common_effects <- rnorm(p) * common_features

##group response
for(g in 1:G){
  bg <- rnorm(p, 0, 0.5) * (1 - common_features) + common_effects
  mu <- x[[g]] %*% bg
  y[[g]] <- rnorm(n[g]) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1), penalty = "lasso",
                               alg = "npg"))
betahat <- fit$coef

##estimated common effects for specific lambda and zeta
modelno <- 6; zetano <- 2
m <- min(betahat[[zetano]][, modelno], common_effects)
M <- max(betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][, modelno], type = "h")

#Array data
##size of example
set.seed(42)
G <- 5; n <- c(65, 26, 13); p <- c(13, 5, 4)

##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i]), n[i], p[i])}

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1)
common_effects <- rnorm(prod(p), 0, 0.1) * common_features

##group response
y <- array(NA, c(n, G))
for(g in 1:G){
  bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
  Bg <- array(bg, p)
  mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
  y[,,, g] <- array(rnorm(prod(n)), dim = n) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1, 10, 100),
                               penalty = "lasso", alg = "npg"))
Betahat <- fit$coef

##estimated common effects for specific lambda and zeta
modelno <- 10; zetano <- 3
m <- min(Betahat[[zetano]][, modelno], common_effects)
M <- max(Betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "h", ylim = c(m, M), col = "red")
lines(Betahat[[zetano]][, modelno], type = "h")
```

SMME documentation built on July 22, 2021, 9:08 a.m.