SMME: Soft Maximin Estimation for Large Scale Heterogenous Data

Description Usage Arguments Details Value Author(s) References Examples

Description

Efficient procedure for solving the Lasso or SCAD penalized soft maximin problem for large scale data. This software implements two proximal gradient based algorithms (NPG and FISTA) to solve different forms of the soft maximin problem from Lund et al., 2021. i) For general group specific design the soft maximin problem is solved using the NPG algorithm. ii) For fixed identical d-array-tensor design across groups, where d = 1, 2, 3, the estimation procedure uses either the FISTA algorithm or the NPG algorithm, and for d = 2,3 avoids using the tensor design matrix. Multi-threading is possible when openMP is available for R.

Note this package SMME replaces the SMMA package.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
softmaximin(x,
            y,
            zeta,
            penalty = c("lasso", "scad"),
            alg = c("npg", "fista"),
            nlambda = 30,
            lambda.min.ratio = 1e-04,
            lambda = NULL,
            penalty.factor = NULL,
            reltol = 1e-05,
            maxiter = 1000,
            steps = 1,
            btmax = 100,
            c = 0.0001,
            tau = 2,
            M = 4,
            nu = 1,
            Lmin = 0,
            log = TRUE,
            nthreads = 4)

Arguments

x

list containing the G group specific design matrices of sizes n_i \times p_i. Alternatively for a model with identical tensor design across G groups, x is a list containing the d (d \in \{ 1, 2, 3\}) tensor components.

y

list containing the G group specific response vectors of sizes n_i \times 1. Alternatively for a model with identical tensor design across G groups, yis an array of size n_1 \times\cdots\times n_d \times G (d \in \{ 1, 2, 3\}) containing the response values.

zeta

vector of strictly positive floats controlling the softmaximin approximation accuracy. When length(zeta) > 1 the procedure will distribute the computations using the nthreads parameter below when openMP is available.

penalty

string specifying the penalty type. Possible values are "lasso", "scad".

alg

string specifying the optimization algorithm. Possible values are "npg", "fista".

nlambda

positive integer giving the number of lambda values. Used when lambda is not specified.

lambda.min.ratio

strictly positive float giving the smallest value for lambda, as a fraction of λ_{max}; the (data dependent) smallest value for which all coefficients are zero. Used when lambda is not specified.

lambda

sequence of strictly positive floats used as penalty parameters.

penalty.factor

array of size p_1 \times \cdots \times p_d of positive floats. Is multiplied with each element in lambda to allow differential penalization on the coefficients.

reltol

strictly positive float giving the convergence tolerance for the inner loop.

maxiter

positive integer giving the maximum number of iterations allowed for each lambda value, when summing over all outer iterations for said lambda.

steps

strictly positive integer giving the number of steps used in the multi-step adaptive lasso algorithm for non-convex penalties. Automatically set to 1 when penalty = "lasso".

btmax

strictly positive integer giving the maximum number of backtracking steps allowed in each iteration. Default is btmax = 100.

c

strictly positive float used in the NPG algorithm. Default is c = 0.0001.

tau

strictly positive float used to control the stepsize for NPG. Default is tau = 2.

M

positive integer giving the look back for the NPG. Default is M = 4.

nu

strictly positive float used to control the stepsize. A value less that 1 will decrease the stepsize and a value larger than one will increase it. Default is nu = 1.

Lmin

non-negative float used by the NPG algorithm to control the stepsize. For the default Lmin = 0 the maximum step size is the same as for the FISTA algorithm.

log

logical variable indicating whether to use log-loss. TRUE is default and yields the loss below.

nthreads

integer giving the number of threads to use when openMP is available. Default is 4.

Details

Consider modeling heterogeneous data y_1,…, y_n by dividing it into G groups \mathbf{y}_g = (y_1, …, y_{n_g}), g \in \{ 1,…, G\} and then using a linear model

\mathbf{y}_g = \mathbf{X}_gb_g + ε_g, \quad g \in \{1,…, G\},

to model the group response. Then b_g is a group specific p\times 1 coefficient, \mathbf{X}_g an n_g\times p group design matrix and ε_g an n_g\times 1 error term. The objective is to estimate a common coefficient β such that \mathbf{X}_gβ is a robust and good approximation to \mathbf{X}_gb_g across groups.

Following Lund et al., 2021, this objective may be accomplished by solving the soft maximin estimation problem

\min_{β}\frac{1}{ζ}\log\bigg(∑_{g = 1}^G \exp(-ζ \hat V_g(β))\bigg) + λ \Vertβ\Vert_1, \quad ζ > 0,λ ≥q 0.

Here ζ essentially controls the amount of pooling across groups (ζ \sim 0 ignores grouping and pools observations) and

\hat V_g(β):=\frac{1}{n_g}(2β^\top \mathbf{X}_g^\top \mathbf{y}_g-β^\top \mathbf{X}_g^\top \mathbf{X}_gβ),

is the empirical explained variance from Meinshausen and Buhlmann, 2015. See Lund et al., 2021 for more details and references.

The function softmaximin solves the soft maximin estimation problem in large scale settings for a sequence of penalty parameters λ_{max}>… >λ_{min}>0 and a sequence of strictly positive softmaximin parameters ζ_1, ζ_2,….

The implementation also solves the problem above with the penalty given by the SCAD penalty, using the multiple step adaptive lasso procedure to loop over the inner proximal algorithm.

Two optimization algorithms are implemented in the SMME packages; a non-monotone proximal gradient (NPG) algorithm and a fast iterative soft thresholding algorithm (FISTA). The implementation is particularly efficient for applications where the design is identical across groups i.e. \mathbf{X}_g = \mathbf{X} \forall g \in \{1, …, G\} and where \mathbf{X} has tensor structure i.e.

\mathbf{X} = \bigotimes_{i=1}^d \mathbf{M}_i.

for marginal n_i\times p_i design matrices \mathbf{M}_1,…, \mathbf{M}_d.

For d \in \{ 1, 2, 3\}, provided only with the marginal matrices and the group response vectors, softmaximin solves the soft maximin problem with minimal memory footprint using tensor optimized arithmetic, see RH.

Note that when multiple values for ζ is provided it is possible to distribute the computations across CPUs if openMP is available.

Value

An object with S3 Class "SMME".

spec

A string indicating the array dimension (1, 2 or 3) and the penalty.

coef

A p_1\cdots p_d \times nlambda matrix containing the estimates of the model coefficients (beta) for each lambda-value for which the procedure converged. When length(zeta) > 1 a length(zeta)-list of such matrices.

lambda

A vector containing the sequence of penalty values used in the estimation procedure for which the procedure converged. When length(zeta) > 1 a length(zeta)-list of such vectors.

Obj

A matrix containing the objective values for each iteration and each model for which the procedure converged. When length(zeta) > 1 a length(zeta)-list of such matrices.

df

A vector indicating the nonzero model coefficients for each value of lambda for which the procedure converged. When length(zeta) > 1 a length(zeta)-list of such vectors.

dimcoef

An integer giving the number p of model parameters. For array data a vector giving the dimension of the model coefficient array β.

dimobs

An integer giving the number of observations. For array data a vector giving the dimension of the observation (response) array Y.

iter

A vector containing the number of iterations for each lambda value for which the procedure converged. When length(zeta) > 1 a length(zeta)-list of such vectors.

Author(s)

Adam Lund

Maintainer: Adam Lund, adam.lund@math.ku.dk

References

Lund, A., S. W. Mogensen and N. R. Hansen (2021). Soft Maximin Estimation for Heterogeneous Data. Preprint. url = https://arxiv.org/abs/1805.02407

Meinshausen, N and P. Buhlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics. 43, 4, 1801-1830. url = https://doi.org/10.1214/15-AOS1325.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#Non-array data

##size of example
set.seed(42)
G <- 10; n <- sample(100:500, G); p <- 60
x <- y <- list()

##group design matrices
for(g in 1:G){x[[g]] <- matrix(rnorm(n[g] * p), n[g], p)}

##common features and effects
common_features <- rbinom(p, 1, 0.1)
common_effects <- rnorm(p) * common_features

##group response
for(g in 1:G){
bg <- rnorm(p, 0, 0.5) * (1 - common_features) + common_effects
mu <- x[[g]] %*% bg
y[[g]] <- rnorm(n[g]) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1), penalty = "lasso", alg = "npg"))
betahat <- fit$coef

##estimated common effects for specific lambda and zeta
modelno <- 6; zetano <- 2
m <- min(betahat[[zetano]][ , modelno], common_effects)
M <- max(betahat[[zetano]][ , modelno], common_effects)
plot(common_effects, type = "p", ylim = c(m, M), col = "red")
lines(betahat[[zetano]][ , modelno], type = "h")

#Array data

##size of example
set.seed(42)
G <- 5; n <- c(65, 26, 13); p <- c(13, 5, 4)

##marginal design matrices (Kronecker components)
x <- list()
for(i in 1:length(n)){x[[i]] <- matrix(rnorm(n[i] * p[i]), n[i], p[i])}

##common features and effects
common_features <- rbinom(prod(p), 1, 0.1)
common_effects <- rnorm(prod(p), 0, 0.1) * common_features

##group response
 y <- array(NA, c(n, G))
for(g in 1:G){
bg <- rnorm(prod(p), 0, 0.1) * (1 - common_features) + common_effects
Bg <- array(bg, p)
mu <- RH(x[[3]], RH(x[[2]], RH(x[[1]], Bg)))
y[,,, g] <- array(rnorm(prod(n)), dim = n) + mu
}

##fit model for range of lambda and zeta
system.time(fit <- softmaximin(x, y, zeta = c(0.1, 1, 10, 100), penalty = "lasso", alg = "npg"))
Betahat <- fit$coef

##estimated common effects for specific lambda and zeta
modelno <- 10; zetano <- 3
m <- min(Betahat[[zetano]][, modelno], common_effects)
M <- max(Betahat[[zetano]][, modelno], common_effects)
plot(common_effects, type = "h", ylim = c(m, M), col = "red")
lines(Betahat[[zetano]][, modelno], type = "h")

SMME documentation built on July 22, 2021, 9:08 a.m.

Related to SMME in SMME...