SMMA: Soft Maximin Estimation for Large Scale Array Data with Known...


Description

Efficient design matrix free procedure for solving a soft maximin problem for large scale array-tensor structured models, see Lund et al., 2017. Currently, Lasso and SCAD penalized estimation are implemented.

Usage

softmaximin(X,
            Y,
            penalty = c("lasso", "scad"),
            nlambda = 30,
            lambda.min.ratio = 1e-04,
            lambda = NULL,
            penalty.factor = NULL,
            reltol = 1e-05,
            maxiter = 15000,
            steps = 1,
            btmax = 100,
            zeta = 2,
            c = 0.001,
            Delta0 = 1,
            nu = 1,
            alg = c("npg", "mfista"),
            log = TRUE)

Arguments

X

A list containing the Kronecker components (1,2 or 3) of the Kronecker design matrix. These are matrices of sizes n_i \times p_i.

Y

The response values, an array of size n_1 \times\cdots\times n_d \times G.

penalty

A string specifying the penalty. Possible values are "lasso", "scad".

nlambda

The number of lambda values.

lambda.min.ratio

The smallest value for lambda, given as a fraction of λ_{max}, the (data dependent) smallest penalty value for which all coefficients are zero.

lambda

The sequence of penalty parameters for the regularization path.

penalty.factor

An array of size p_1 \times \cdots \times p_d that is multiplied with each element in lambda to allow differential shrinkage on the coefficients; see the sketch at the end of this argument list.

reltol

The convergence tolerance for the inner loop.

maxiter

The maximum number of iterations allowed for each lambda value, summed over all outer iterations for that lambda value.

steps

The number of steps used in the multi-step adaptive lasso algorithm for non-convex penalties. Automatically set to 1 when penalty = "lasso".

btmax

Maximum number of backtracking steps allowed in each iteration. Default is btmax = 100.

zeta

Constant controlling the softmax approximation accuracy. Must be strictly positive. Default is zeta = 2.

c

Constant used in the NPG algorithm. Must be strictly positive. Default is c = 0.001.

Delta0

Constant used to bound the stepsize. Must be strictly positive. Default is Delta0 = 1.

nu

Constant used to control the stepsize. Must be positive. A small value gives a large stepsize. Default is nu = 1.

alg

String indicating which algorithm to use. Possible values are "npg" and "mfista".

log

Logical variable indicating whether to use the log-loss or not. TRUE is the default and yields the problem described below.
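
As an illustration of how the regularization arguments fit together, the following hedged sketch builds a geometric lambda grid (mirroring the defaults nlambda = 30 and lambda.min.ratio = 1e-04) and a penalty.factor array that exempts one coefficient from shrinkage. The objects X, Y, p1, p2 and p3 are assumed to be defined as in the Examples section below, and the value of lambda_max is purely illustrative; by default softmaximin computes the maximal penalty value from the data.

##hypothetical sketch: custom lambda grid and differential shrinkage
lambda_max <- 1   ##illustrative value only, not the data dependent maximum
lambda <- exp(seq(log(lambda_max), log(lambda_max * 1e-04), length.out = 30))

##penalty.factor has the same dimensions as the coefficient array (p1 x p2 x p3);
##a zero entry would leave the corresponding coefficient effectively unpenalized
pf <- array(1, c(p1, p2, p3))
pf[1, 1, 1] <- 0

fit <- softmaximin(X, Y, penalty = "lasso", alg = "npg",
                   lambda = lambda, penalty.factor = pf)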

Details

In Lund et al., 2017 the following mixed model setup for d-dimensional array data, d = 1,2,3, with known fixed group structure and tensor structured design matrix is considered: with G groups, g \in \{1,…,G\}, and n observations in each group, let Y_g denote the group-specific n_1\times \cdots \times n_d response array and X the n\times p design matrix with tensor structure

X = \bigotimes_{i=1}^d X_i,

where for d =1,2,3, X_1,…, X_d are the marginal n_i\times p_i design matrices (Kronecker components). Using the GLAM framework the model equation is

Y_g = ρ(X_d,ρ(X_{d-1},…,ρ(X_1,B_g))) + E,

where ρ is the so-called rotated H-transform (see Currie et al., 2006), B_g for each g is a random p_1\times\cdots\times p_d parameter array and E is an n_1\times \cdots \times n_d error array uncorrelated with X. Note that for d=1 the model is a GLM.
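
To make the array-tensor structure concrete, the following minimal sketch (assuming d = 3 and small illustrative dimensions) checks the GLAM identity underlying the model equation: applying the rotated H-transform marginally is equivalent to multiplying the vectorized coefficient array by the full Kronecker design matrix.

##minimal check of the array-tensor (GLAM) structure for d = 3
library(SMMA)
X1 <- matrix(rnorm(4 * 3), 4, 3)
X2 <- matrix(rnorm(5 * 2), 5, 2)
X3 <- matrix(rnorm(6 * 2), 6, 2)
B <- array(rnorm(3 * 2 * 2), c(3, 2, 2))

##marginal (rotated H-transform) evaluation versus the full Kronecker design matrix
mu <- RH(X3, RH(X2, RH(X1, B)))
max(abs(c(mu) - kronecker(X3, kronecker(X2, X1)) %*% c(B)))   ##numerically zero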

In Lund et al., 2017 a penalized soft maximin problem, given as

\min_{β}\log\bigg(∑_{g=1}^G \exp(-ζ \hat V_g(β))\bigg) + λ J (β),

is proposed where J is a proper and convex penalty, ζ > 0 and

\hat V_g(β):=\frac{1}{n}(2β^\top X^\top y_g-β^\top X^\top Xβ),

where y_g:=vec(Y_g), is the empirical explained variance for group g from Meinshausen and Buhlmann, 2015.
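
The following sketch, which is purely illustrative and does not use the package itself, computes the group-wise empirical explained variances \hat V_g(β) directly for a small d = 1 problem and evaluates the soft minimum; since -\log(∑_g \exp(-ζ \hat V_g(β)))/ζ approaches \min_g \hat V_g(β) as ζ grows, minimizing the soft maximin loss above approximately maximizes the minimal explained variance.

##illustrative computation of the explained variances and the soft minimum (d = 1)
n <- 50; p <- 10; G <- 4
Xmat <- matrix(rnorm(n * p), n, p)
ymat <- sapply(1:G, function(g) Xmat %*% rnorm(p, 0, 0.5) + rnorm(n))   ##n x G responses
beta <- rnorm(p, 0, 0.1)   ##some candidate coefficient vector

##empirical explained variance for each group g
Vhat <- sapply(1:G, function(g)
  (2 * crossprod(beta, crossprod(Xmat, ymat[, g]))
     - crossprod(beta, crossprod(Xmat) %*% beta)) / n)

##soft minimum of the explained variances for a given zeta
zeta <- 10
c(softmin = -log(sum(exp(-zeta * Vhat))) / zeta, min = min(Vhat))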

For d=1,2,3, using only the marginal matrices X_1,X_2,… (for d=1 there is only one marginal matrix), the function softmaximin solves the soft maximin problem for a sequence of penalty parameters λ_{max}>… >λ_{min}>0. The underlying algorithm is based on a non-monotone proximal gradient method. We note that if J is not convex, as with the SCAD penalty, the multi-step adaptive lasso procedure is used to loop over the proximal algorithm; see Lund et al., 2017 for more details.

Value

An object with S3 Class "SMMA".

spec

A string indicating the array dimension (1, 2 or 3) and the penalty.

coef

A p_1\cdots p_d \times nlambda matrix containing the estimates of the model coefficients (beta) for each lambda value; see the sketch at the end of this section.

lambda

A vector containing the sequence of penalty values used in the estimation procedure.

Obj

A matrix containing the objective values for each iteration and each model.

df

The number of nonzero coefficients for each value of lambda.

dimcoef

A vector giving the dimension of the model coefficient array β.

dimobs

A vector giving the dimension of the observation (response) array Y.

Iter

A list with four items: bt_iter is the total number of backtracking steps performed, bt_enter is the number of times backtracking is initiated, iter_mat is a vector containing the number of iterations for each lambda value, and iter is the total number of iterations, i.e. sum(iter_mat).
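
As a sketch of how the returned components fit together, assuming a fitted object fit and the marginal matrices X1, X2, X3 as in the Examples section below, a single column of coef can be reshaped using dimcoef and pushed through the marginal design matrices to obtain the estimated common signal:

##sketch: reshape one coefficient estimate and compute fitted values
modelno <- 10
Betahat <- array(fit$coef[, modelno], fit$dimcoef)   ##p_1 x p_2 x p_3 coefficient array
mufit <- RH(X3, RH(X2, RH(X1, Betahat)))             ##estimated common signal
fit$lambda[modelno]   ##penalty value for this model
fit$df[modelno]       ##number of nonzero coefficients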

Author(s)

Adam Lund

Maintainer: Adam Lund, adam.lund@math.ku.dk

References

Lund, A., S. W. Mogensen and N. R. Hansen (2017). Estimating Soft Maximin Effects in Heterogeneous Large-scale Array Data. Preprint.

Meinshausen, N. and P. Bühlmann (2015). Maximin effects in inhomogeneous large-scale data. The Annals of Statistics, 43(4), 1801-1830. https://doi.org/10.1214/15-AOS1325.

Currie, I. D., M. Durban, and P. H. C. Eilers (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B, 68, 259-280. https://doi.org/10.1111/j.1467-9868.2006.00543.x.

Examples

##load the package (provides softmaximin and RH)
library(SMMA)

##size of example
n1 <- 65; n2 <- 26; n3 <- 13; p1 <- 13; p2 <- 5; p3 <- 4

##marginal design matrices (Kronecker components)
X1 <- matrix(rnorm(n1 * p1), n1, p1)
X2 <- matrix(rnorm(n2 * p2), n2, p2)
X3 <- matrix(rnorm(n3 * p3), n3, p3)
X <- list(X1, X2, X3)

##generate 5 group-specific parameter arrays sharing a common support (component)
##and the corresponding group-specific response arrays, collected in Y
component <- rbinom(p1 * p2 * p3, 1, 0.1)
Y <- array(NA, c(n1, n2, n3, 5))
for(g in 1:5){
  Beta <- array(rnorm(p1 * p2 * p3, 0, 0.1) + component, c(p1, p2, p3))
  mu <- RH(X3, RH(X2, RH(X1, Beta)))
  Y[,,, g] <- array(rnorm(n1 * n2 * n3), dim = c(n1, n2, n3)) + mu
}

##fit a lasso penalized soft maximin model using the NPG algorithm
fit <- softmaximin(X, Y, penalty = "lasso", alg = "npg")
Betafit <- fit$coef

##compare the estimate for one lambda value (model no. 15) with the common support
modelno <- 15
m <- min(Betafit[ , modelno], c(component))
M <- max(Betafit[ , modelno], c(component))
plot(c(component), type = "l", ylim = c(m, M))
lines(Betafit[ , modelno], col = "red")
