MADMMplasso (R Documentation)
This function allows one to model a multivariate, multi-response problem with interaction effects. It combines the usual squared error loss for the multi-response problem with penalty terms that encourage correlated responses to form groups and that allow for modeling the main and interaction effects that exist within the covariates. The optimization method employed is the Alternating Direction Method of Multipliers (ADMM). The implementation is based on the methodology presented in Quachie Asenso, T., & Zucknick, M. (2023), doi:10.48550/arXiv.2303.11155.
This function fits a multi-response pliable lasso model over a path of regularization values.
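As a sketch of the fitted model (following the single-response pliable lasso form that the referenced paper extends to multiple responses; the notation below is illustrative rather than the paper's exact penalized objective), response d for observation i is modeled as

$$
y_{id} = \beta_{0d} + z_i^{\top}\theta_{0d} + \sum_{j=1}^{p} x_{ij}\left(\beta_{jd} + z_i^{\top}\theta_{jd}\right) + \varepsilon_{id},
$$

where beta_{jd} are the main effects and theta_{jd} the modifier (interaction) effects for response d, and the penalties encourage sparsity in both sets of coefficients as well as grouping of correlated responses.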
MADMMplasso(
X,
Z,
y,
alpha,
my_lambda = NULL,
lambda_min = 0.001,
max_it = 50000,
e.abs = 0.001,
e.rel = 0.001,
maxgrid,
nlambda,
rho = 5,
my_print = FALSE,
alph = 1.8,
tree,
pal = cl == 1L,
gg = NULL,
tol = 1e-04,
cl = 1L,
legacy = FALSE
)
X: N by p matrix of predictors.

Z: N by K matrix of modifying variables. The elements of Z may represent quantitative or categorical variables, or a mixture of the two. Categorical variables should be coded by 0-1 dummy variables: for a k-level variable, one can use either k or k-1 dummy variables.

y: N by D matrix of responses. The X and Z variables are centered in the function. We recommend that X and Z also be standardized before the call.

alpha: mixing parameter. When the goal is to include more interactions, alpha should be very small, and vice versa.

my_lambda: user-specified lambda_3 values.

lambda_min: the smallest value for lambda_3, as a fraction of max(lambda_3), the data-derived entry value (lammax), i.e. the smallest value for which all coefficients are zero.

max_it: maximum number of iterations in the loop for one lambda during the ADMM optimization.

e.abs: absolute error for the ADMM.

e.rel: relative error for the ADMM.

maxgrid: number of lambda_3 values desired.

nlambda: number of lambda_3 values desired. Similar to maxgrid, but can have a value less than or equal to maxgrid.

rho: the Lagrange variable for the ADMM. This value is updated during the ADMM call based on a certain condition.

my_print: Should information from each ADMM iteration be printed along the way? This prints the dual and primal residuals.

alph: an overrelaxation parameter in [1, 1.8]. The implementation is borrowed from Stephen Boyd's MATLAB code.

tree: the results from the hierarchical clustering of the response matrix. The easiest way to obtain this is by using the function tree_parms, which gives a default clustering. However, the user may decide on a specific structure and then input a tree that follows that structure.

pal: Should the lapply function be used as an alternative to parallelization?

gg: penalty term for the tree structure. This is a 2 by 2 matrix, with the first row giving the maximum and minimum values for lambda_1 and the second row giving the maximum and minimum values for lambda_2 (see the short sketch after this argument list). In the current setting, we set the maximum and the minimum to be the same because cross-validation is not carried out across lambda_1 and lambda_2. However, setting different values will work during the model fit.

tol: threshold for the non-zero coefficients.

cl: the number of CPUs to be used for parallel processing.
legacy: If TRUE, use the legacy (pure R) implementation of the algorithm instead of the default implementation.
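As an illustration of the gg argument, a minimal sketch that mirrors the example further below (the value 0.02 is arbitrary):

# 2 by 2 penalty matrix for the tree structure:
# row 1 holds the (maximum, minimum) values for lambda_1,
# row 2 holds the (maximum, minimum) values for lambda_2.
gg <- matrix(0, 2, 2)
gg[1, ] <- c(0.02, 0.02)
gg[2, ] <- c(0.02, 0.02)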
The function returns a fitted MADMMplasso object with the following components:

path: a table containing the summary of the model for each lambda_3.

beta0: a list (length nlambda) of estimated beta_0 coefficients, each of size 1 by ncol(y).

beta: a list (length nlambda) of estimated beta coefficients, each a matrix of dimension ncol(X) by ncol(y).

BETA_hat: a list (length nlambda) of estimated beta and theta coefficients, each a matrix of dimension (ncol(X) + ncol(X) * ncol(Z)) by ncol(y).

theta0: a list (length nlambda) of estimated theta_0 coefficients, each a matrix of dimension ncol(Z) by ncol(y).

theta: a list (length nlambda) of estimated theta coefficients, each an array of dimension ncol(X) by ncol(Z) by ncol(y).

Lambdas: values of lambda_3 used.

non_zero: number of non-zero betas for each model in the path.

LOSS: sum of squared errors for each model in the path.

Y_HAT: a list (length nlambda) of predicted responses, each of dimension nrow(X) by ncol(y).

gg: penalty terms for the tree structure (lambda_1 and lambda_2) for each lambda_3, an nlambda by 2 matrix.
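A minimal sketch of inspecting these components (assuming the returned object behaves like a named list whose components are accessed with $, and an object named fit from a call such as the one in the example below):

fit$path        # summary of the model for each lambda_3
fit$Lambdas     # lambda_3 values used along the path
fit$non_zero    # number of non-zero betas per model
fit$beta[[1]]   # ncol(X) by ncol(y) main-effect matrix for the first lambda_3
fit$theta[[1]]  # ncol(X) by ncol(Z) by ncol(y) interaction array for the first lambda_3
fit$Y_HAT[[1]]  # fitted responses for the first lambda_3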
Maintainer: Waldir Leoncio w.l.netto@medisin.uio.no (ORCID)
Authors:
Theophilus Quachie Asenso t.q.asenso@medisin.uio.no
Manuela Zucknick Manuela.zucknick@medisin.uio.no
Chi Zhang andreachizhang@yahoo.com
# Train the model
# generate some data
set.seed(1235)
N <- 100
p <- 50
nz <- 4
K <- nz
X <- matrix(rnorm(n = N * p), nrow = N, ncol = p)
mx <- colMeans(X)
sx <- sqrt(apply(X, 2, var))
X <- scale(X, mx, sx)
X <- matrix(as.numeric(X), N, p)
Z <- matrix(rnorm(N * nz), N, nz)
mz <- colMeans(Z)
sz <- sqrt(apply(Z, 2, var))
Z <- scale(Z, mz, sz)
# Main-effect coefficients for the six responses
beta_1 <- rep(x = 0, times = p)
beta_2 <- rep(x = 0, times = p)
beta_3 <- rep(x = 0, times = p)
beta_4 <- rep(x = 0, times = p)
beta_5 <- rep(x = 0, times = p)
beta_6 <- rep(x = 0, times = p)
beta_1[1:5] <- c(2, 2, 2, 2, 2)
beta_2[1:5] <- c(2, 2, 2, 2, 2)
beta_3[6:10] <- c(2, 2, 2, -2, -2)
beta_4[6:10] <- c(2, 2, 2, -2, -2)
beta_5[11:15] <- c(-2, -2, -2, -2, -2)
beta_6[11:15] <- c(-2, -2, -2, -2, -2)
Beta <- cbind(beta_1, beta_2, beta_3, beta_4, beta_5, beta_6)
colnames(Beta) <- 1:6
# Interaction (theta) coefficients: one p by K matrix per response
theta <- array(0, c(p, K, 6))
theta[1, 1, 1] <- 2
theta[3, 2, 1] <- 2
theta[4, 3, 1] <- -2
theta[5, 4, 1] <- -2
theta[1, 1, 2] <- 2
theta[3, 2, 2] <- 2
theta[4, 3, 2] <- -2
theta[5, 4, 2] <- -2
theta[6, 1, 3] <- 2
theta[8, 2, 3] <- 2
theta[9, 3, 3] <- -2
theta[10, 4, 3] <- -2
theta[6, 1, 4] <- 2
theta[8, 2, 4] <- 2
theta[9, 3, 4] <- -2
theta[10, 4, 4] <- -2
theta[11, 1, 5] <- 2
theta[13, 2, 5] <- 2
theta[14, 3, 5] <- -2
theta[15, 4, 5] <- -2
theta[11, 1, 6] <- 2
theta[13, 2, 6] <- 2
theta[14, 3, 6] <- -2
theta[15, 4, 6] <- -2
# Interaction (pliable) contribution of X and Z to each response
pliable <- matrix(0, N, 6)
for (e in 1:6) {
pliable[, e] <- compute_pliable(X, Z, theta[, , e])
}
# Add Gaussian noise and assemble the response matrix
esd <- diag(6)
e <- MASS::mvrnorm(N, mu = rep(0, 6), Sigma = esd)
y_train <- X %*% Beta + pliable + e
y <- y_train
colnames(y) <- c(paste0("y", seq_len(ncol(y))))
# Hierarchical clustering of the responses to obtain the tree structure
TT <- tree_parms(y)
plot(TT$h_clust)
# Penalty terms for the tree structure (lambda_1 and lambda_2)
gg1 <- matrix(0, 2, 2)
gg1[1, ] <- c(0.02, 0.02)
gg1[2, ] <- c(0.02, 0.02)
nlambda <- 1
e.abs <- 1E-4
e.rel <- 1E-2
alpha <- 0.2
tol <- 1E-3
# Fit the model for a single user-supplied lambda_3 value
fit <- MADMMplasso(
X, Z, y,
alpha = alpha, my_lambda = matrix(rep(0.2, ncol(y)), 1),
lambda_min = 0.001, max_it = 1000, e.abs = e.abs, e.rel = e.rel,
maxgrid = nlambda, nlambda = nlambda, rho = 5, tree = TT, my_print = FALSE,
  alph = 1, gg = gg1, tol = tol, cl = 2L
)