MADMMplasso: Multi Variate Multi Response ADMM with Interaction Effects

View source: R/MADMMplasso.R

MADMMplasso {MADMMplasso} R Documentation

MADMMplasso: Multi Variate Multi Response ADMM with Interaction Effects

Description

This function allows one to model a multi-variate, multi-response problem with interaction effects. It combines the usual squared-error loss for the multi-response problem with penalty terms that encourage correlated responses to form groups, and it also allows for modeling main and interaction effects that exist within the covariates. The optimization method employed is the Alternating Direction Method of Multipliers (ADMM). The implementation is based on the methodology presented in Quachie Asenso, T., & Zucknick, M. (2023), doi:10.48550/arXiv.2303.11155.

This function fits a multi-response pliable lasso model over a path of regularization values.

Usage

MADMMplasso(
  X,
  Z,
  y,
  alpha,
  my_lambda = NULL,
  lambda_min = 0.001,
  max_it = 50000,
  e.abs = 0.001,
  e.rel = 0.001,
  maxgrid,
  nlambda,
  rho = 5,
  my_print = FALSE,
  alph = 1.8,
  tree,
  pal = cl == 1L,
  gg = NULL,
  tol = 1e-04,
  cl = 1L,
  legacy = FALSE
)

Arguments

X

N by p matrix of predictors

Z

N by K matrix of modifying variables. The elements of Z may represent quantitative or categorical variables, or a mixture of the two. Categorical variables should be coded by 0-1 dummy variables: for a k-level variable, one can use either k or k-1 dummy variables.
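The dummy coding described above can be sketched in base R; the 3-level factor `group` and the quantitative modifier `z_num` below are illustrative names, not part of the package.

```r
# Sketch: building Z when one modifier is categorical. model.matrix()
# creates 0-1 dummy variables; dropping the intercept column leaves
# k-1 dummies for a k-level factor.
group <- factor(c("a", "b", "c", "a", "b"))
z_cat <- model.matrix(~ group)[, -1]  # k-1 = 2 dummy columns
z_num <- rnorm(5)                     # a quantitative modifier
Z <- cbind(z_cat, z_num)              # mixed Z, as the argument allows
```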

y

N by D matrix of responses. The X and Z variables are centered in the function. We recommend that X and Z also be standardized before the call.
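The recommended standardization can be done with base R's scale(); a minimal sketch with a toy matrix:

```r
# The function centers X and Z internally; standardizing to unit
# variance beforehand is recommended. Toy data for illustration.
set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)
X_std <- scale(X, center = TRUE, scale = TRUE)  # zero mean, unit sd per column
```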

alpha

mixing parameter. When the goal is to include more interactions, alpha should be very small, and vice versa.

my_lambda

user specified lambda_3 values

lambda_min

the smallest value for lambda_3, as a fraction of max(lambda_3), the data-derived entry value (lammax), i.e. the smallest value for which all coefficients are zero
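A regularization path of this kind is typically laid out on a log scale from the entry value down to lambda_min times that value; a sketch, where `lammax` is a placeholder number rather than the package's internal computation:

```r
# Illustrative log-spaced lambda_3 path from lammax down to
# lambda_min * lammax (lammax is a placeholder value here).
lammax <- 1.5
lambda_min <- 0.001
nlambda <- 20
lambdas <- exp(seq(log(lammax), log(lambda_min * lammax), length.out = nlambda))
```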

max_it

maximum number of iterations in loop for one lambda during the ADMM optimization

e.abs

absolute error for the ADMM

e.rel

relative error for the ADMM

maxgrid

number of lambda_3 values desired

nlambda

number of lambda_3 values desired. Similar to maxgrid but can have a value less than or equal to maxgrid.

rho

the augmented Lagrangian penalty parameter for the ADMM. This value is updated during the ADMM call based on a certain condition.
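The exact update condition is internal to the package, but a common choice is the residual-balancing rule of Boyd et al.: increase rho when the primal residual dominates, decrease it when the dual residual dominates. A hedged sketch of that rule (not necessarily the package's implementation):

```r
# Standard residual-balancing rho update (Boyd et al., Sec. 3.4.1).
# r_norm, s_norm: norms of the primal and dual residuals; mu and tau
# are the usual default constants.
update_rho <- function(rho, r_norm, s_norm, mu = 10, tau = 2) {
  if (r_norm > mu * s_norm) {
    rho * tau       # primal residual too large: tighten
  } else if (s_norm > mu * r_norm) {
    rho / tau       # dual residual too large: relax
  } else {
    rho             # residuals balanced: keep rho
  }
}
```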

my_print

Should information from each ADMM iteration be printed along the way? This prints the primal and dual residuals.

alph

an over-relaxation parameter in [1, 1.8]. The implementation is borrowed from Stephen Boyd's MATLAB code.
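In the standard ADMM formulation this parameter blends the new x-iterate with the previous z-iterate before the z- and dual updates; a minimal sketch of that step, with illustrative variable names:

```r
# Over-relaxation step as used in standard ADMM: the x-update is
# blended with the previous z iterate. alph = 1 recovers plain ADMM.
alph <- 1.8
x_new <- c(1.0, 2.0)   # illustrative x-update result
z_old <- c(0.5, 1.5)   # illustrative previous z iterate
x_relaxed <- alph * x_new + (1 - alph) * z_old
```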

tree

The results from the hierarchical clustering of the response matrix. The easy way to obtain this is by using the function tree_parms, which gives a default clustering. However, the user can decide on a specific structure and then input a tree that follows it.
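A response clustering can also be built by hand; the distance and linkage below (correlation distance, complete linkage) are illustrative choices and not necessarily what tree_parms uses internally:

```r
# Sketch of clustering the response columns by hand; tree_parms()
# provides a default version of this step.
set.seed(1)
y <- matrix(rnorm(100 * 6), 100, 6)
d <- as.dist(1 - abs(cor(y)))         # correlation-based distance
h_clust <- hclust(d, method = "complete")
```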

pal

Should lapply be applied as an alternative to parallelization? By default this is TRUE when cl == 1L.

gg

penalty term for the tree structure. This is a 2×2 matrix: the first row contains the maximum and minimum values for lambda_1, and the second row the maximum and minimum values for lambda_2. In the current setting, we set the maximum and the minimum to be the same because cross-validation is not carried out across lambda_1 and lambda_2. However, setting different values will work during the model fit.
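The layout described above can be sketched as follows (the value 0.02 is illustrative, mirroring the Examples section below):

```r
# 2x2 gg matrix: row 1 = (max, min) for lambda_1,
#                row 2 = (max, min) for lambda_2.
# Max and min are set equal here, matching the current setting.
gg <- rbind(
  c(0.02, 0.02),  # lambda_1
  c(0.02, 0.02)   # lambda_2
)
```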

tol

threshold for the non-zero coefficients

cl

The number of CPUs to be used for parallel processing

legacy

If TRUE, use the R version of the algorithm

Value

A fitted MADMMplasso object with the following components:

path: a table containing the summary of the model for each lambda_3.

beta0: a list (length = nlambda) of estimated beta_0 coefficients, each of size 1 by ncol(y)

beta: a list (length = nlambda) of estimated beta coefficients, each a matrix of size ncol(X) by ncol(y)

BETA_hat: a list (length = nlambda) of estimated beta and theta coefficients, each a matrix of size (ncol(X) + ncol(X) * ncol(Z)) by ncol(y)

theta0: a list (length = nlambda) of estimated theta_0 coefficients, each a matrix of size ncol(Z) by ncol(y)

theta: a list (length = nlambda) of estimated theta coefficients, each an array of size ncol(X) by ncol(Z) by ncol(y)

Lambdas: values of lambda_3 used

non_zero: number of non-zero betas for each model in path

LOSS: sum of squared errors for each model in path

Y_HAT: a list (length = nlambda) of predicted responses, each of size nrow(X) by ncol(y)

gg: the penalty terms for the tree structure (lambda_1 and lambda_2) for each lambda_3; a matrix of size nlambda by 2

Author(s)

Maintainer: Waldir Leoncio w.l.netto@medisin.uio.no (ORCID)

Authors:

Examples

# Train the model
# generate some data
set.seed(1235)
N <- 100
p <- 50
nz <- 4
K <- nz
X <- matrix(rnorm(n = N * p), nrow = N, ncol = p)
mx <- colMeans(X)
sx <- sqrt(apply(X, 2, var))
X <- scale(X, mx, sx)
X <- matrix(as.numeric(X), N, p)
Z <- matrix(rnorm(N * nz), N, nz)
mz <- colMeans(Z)
sz <- sqrt(apply(Z, 2, var))
Z <- scale(Z, mz, sz)
beta_1 <- rep(x = 0, times = p)
beta_2 <- rep(x = 0, times = p)
beta_3 <- rep(x = 0, times = p)
beta_4 <- rep(x = 0, times = p)
beta_5 <- rep(x = 0, times = p)
beta_6 <- rep(x = 0, times = p)

beta_1[1:5] <- c(2, 2, 2, 2, 2)
beta_2[1:5] <- c(2, 2, 2, 2, 2)
beta_3[6:10] <- c(2, 2, 2, -2, -2)
beta_4[6:10] <- c(2, 2, 2, -2, -2)
beta_5[11:15] <- c(-2, -2, -2, -2, -2)
beta_6[11:15] <- c(-2, -2, -2, -2, -2)

Beta <- cbind(beta_1, beta_2, beta_3, beta_4, beta_5, beta_6)
colnames(Beta) <- 1:6

theta <- array(0, c(p, K, 6))
theta[1, 1, 1] <- 2
theta[3, 2, 1] <- 2
theta[4, 3, 1] <- -2
theta[5, 4, 1] <- -2
theta[1, 1, 2] <- 2
theta[3, 2, 2] <- 2
theta[4, 3, 2] <- -2
theta[5, 4, 2] <- -2
theta[6, 1, 3] <- 2
theta[8, 2, 3] <- 2
theta[9, 3, 3] <- -2
theta[10, 4, 3] <- -2
theta[6, 1, 4] <- 2
theta[8, 2, 4] <- 2
theta[9, 3, 4] <- -2
theta[10, 4, 4] <- -2
theta[11, 1, 5] <- 2
theta[13, 2, 5] <- 2
theta[14, 3, 5] <- -2
theta[15, 4, 5] <- -2
theta[11, 1, 6] <- 2
theta[13, 2, 6] <- 2
theta[14, 3, 6] <- -2
theta[15, 4, 6] <- -2

pliable <- matrix(0, N, 6)
for (e in 1:6) {
  pliable[, e] <- compute_pliable(X, Z, theta[, , e])
}

esd <- diag(6)
e <- MASS::mvrnorm(N, mu = rep(0, 6), Sigma = esd)
y_train <- X %*% Beta + pliable + e
y <- y_train

colnames(y) <- c(paste0("y", seq_len(ncol(y))))
TT <- tree_parms(y)
plot(TT$h_clust)
gg1 <- matrix(0, 2, 2)
gg1[1, ] <- c(0.02, 0.02)
gg1[2, ] <- c(0.02, 0.02)

nlambda <- 1
e.abs <- 1E-4
e.rel <- 1E-2
alpha <- 0.2
tol <- 1E-3
fit <- MADMMplasso(
  X, Z, y,
  alpha = alpha, my_lambda = matrix(rep(0.2, ncol(y)), 1),
  lambda_min = 0.001, max_it = 1000, e.abs = e.abs, e.rel = e.rel,
  maxgrid = nlambda, nlambda = nlambda, rho = 5, tree = TT, my_print = FALSE,
  alph = 1.8, gg = gg1, tol = tol, cl = 2L
)

MADMMplasso documentation built on April 3, 2025, 10:53 p.m.