cv_MADMMplasso (R Documentation)
View source: R/cv_MADMMplasso.R

Description

Carries out cross-validation for a MADMMplasso model over a path of regularization values.

Usage
cv_MADMMplasso(
  fit,
  nfolds,
  X,
  Z,
  y,
  alpha = 0.5,
  lambda = fit$Lambdas,
  max_it = 50000,
  e.abs = 0.001,
  e.rel = 0.001,
  nlambda,
  rho = 5,
  my_print = FALSE,
  alph = 1,
  foldid = NULL,
  pal = cl == 1L,
  gg = c(7, 0.5),
  TT,
  tol = 1e-04,
  cl = 1L,
  legacy = FALSE
)
Arguments

fit: object returned by the MADMMplasso function.

nfolds: number of cross-validation folds.

X: N by p matrix of predictors.

Z: N by K matrix of modifying variables. The elements of Z may represent quantitative or categorical variables, or a mixture of the two. Categorical variables should be coded by 0-1 dummy variables: for a k-level variable, one can use either k or k-1 dummy variables (a small sketch follows this list).

y: N by D matrix of responses. The X and Z variables are centered in the function. We recommend that X and Z also be standardized before the call.

alpha: mixing parameter. When the goal is to include more interactions, alpha should be very small, and vice versa.

lambda: user-specified lambda_3 values.

max_it: maximum number of iterations in the loop for one lambda during the ADMM optimization.

e.abs: absolute error for the ADMM.

e.rel: relative error for the ADMM.

nlambda: number of lambda_3 values desired. Similar to maxgrid, but can have a value less than or equal to maxgrid.

rho: the Lagrange variable for the ADMM. This value is updated during the ADMM call based on a certain condition.

my_print: should information from each ADMM iteration be printed along the way? This prints the dual and primal residuals.

alph: an overrelaxation parameter in [1, 1.8]. The implementation is borrowed from Stephen Boyd's MATLAB code.

foldid: vector with values in 1:K, indicating folds for K-fold CV. Default NULL (a small sketch follows this list).

pal: should the lapply function be used as an alternative to parallelization?

gg: penalty term for the tree structure. This is a 2×2 matrix: the first row holds the maximum and minimum values for lambda_1, and the second row holds the maximum and minimum values for lambda_2. In the current setting, we set the maximum and the minimum to the same value because cross-validation is not carried out across lambda_1 and lambda_2. However, setting different values will work during the model fit.

TT: the results from the hierarchical clustering of the response matrix. This should be the same as the parameter tree used during the MADMMplasso call.

tol: threshold for the non-zero coefficients.

cl: the number of CPUs to be used for parallel processing.

legacy: If ...
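Two of the arguments above are easiest to see with a small sketch. This is only an illustration under assumed data; the names grp, Z_dummy, and foldid_manual are hypothetical and not objects created by the package. It shows how a k-level categorical modifier can be coded as 0-1 dummy columns for Z, and how a foldid vector can be built by hand when reproducible fold assignments are wanted.

# Hypothetical 3-level categorical modifier, coded as k = 3 dummy columns for Z
set.seed(42)
grp <- factor(sample(c("a", "b", "c"), size = 100, replace = TRUE))
Z_dummy <- model.matrix(~ grp - 1)  # 100 x 3 matrix of 0-1 indicators

# Hand-built fold assignments for 5-fold CV: one label in 1:5 per observation
foldid_manual <- sample(rep(seq_len(5), length.out = 100))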
Value

results containing the CV values.
Examples

library(MADMMplasso)

# Train the model
# Generate some data
set.seed(1235)
N <- 100
p <- 50
nz <- 4
K <- nz
X <- matrix(rnorm(n = N * p), nrow = N, ncol = p)
mx <- colMeans(X)
sx <- sqrt(apply(X, 2, var))
X <- scale(X, mx, sx)
X <- matrix(as.numeric(X), N, p)
Z <- matrix(rnorm(N * nz), N, nz)
mz <- colMeans(Z)
sz <- sqrt(apply(Z, 2, var))
Z <- scale(Z, mz, sz)
beta_1 <- rep(x = 0, times = p)
beta_2 <- rep(x = 0, times = p)
beta_3 <- rep(x = 0, times = p)
beta_4 <- rep(x = 0, times = p)
beta_5 <- rep(x = 0, times = p)
beta_6 <- rep(x = 0, times = p)
beta_1[1:5] <- c(2, 2, 2, 2, 2)
beta_2[1:5] <- c(2, 2, 2, 2, 2)
beta_3[6:10] <- c(2, 2, 2, -2, -2)
beta_4[6:10] <- c(2, 2, 2, -2, -2)
beta_5[11:15] <- c(-2, -2, -2, -2, -2)
beta_6[11:15] <- c(-2, -2, -2, -2, -2)
Beta <- cbind(beta_1, beta_2, beta_3, beta_4, beta_5, beta_6)
colnames(Beta) <- c(1:6)
theta <- array(0, c(p, K, 6))
theta[1, 1, 1] <- 2
theta[3, 2, 1] <- 2
theta[4, 3, 1] <- -2
theta[5, 4, 1] <- -2
theta[1, 1, 2] <- 2
theta[3, 2, 2] <- 2
theta[4, 3, 2] <- -2
theta[5, 4, 2] <- -2
theta[6, 1, 3] <- 2
theta[8, 2, 3] <- 2
theta[9, 3, 3] <- -2
theta[10, 4, 3] <- -2
theta[6, 1, 4] <- 2
theta[8, 2, 4] <- 2
theta[9, 3, 4] <- -2
theta[10, 4, 4] <- -2
theta[11, 1, 5] <- 2
theta[13, 2, 5] <- 2
theta[14, 3, 5] <- -2
theta[15, 4, 5] <- -2
theta[11, 1, 6] <- 2
theta[13, 2, 6] <- 2
theta[14, 3, 6] <- -2
theta[15, 4, 6] <- -2
pliable <- matrix(0, N, 6)
for (e in 1:6) {
  pliable[, e] <- compute_pliable(X, Z, theta[, , e])
}
esd <- diag(6)
e <- MASS::mvrnorm(N, mu = rep(0, 6), Sigma = esd)
y_train <- X %*% Beta + pliable + e
y <- y_train
colnames(y) <- c(paste("y", 1:(ncol(y)), sep = ""))
TT <- tree_parms(y)
plot(TT$h_clust)
gg1 <- matrix(0, 2, 2)
gg1[1, ] <- c(0.02, 0.02)
gg1[2, ] <- c(0.02, 0.02)
nlambda <- 3
e.abs <- 1E-3
e.rel <- 1E-1
alpha <- .2
tol <- 1E-2
fit <- MADMMplasso(
  X, Z, y, alpha = alpha, my_lambda = NULL, lambda_min = 0.001, max_it = 100,
  e.abs = e.abs, e.rel = e.rel, maxgrid = nlambda, nlambda = nlambda, rho = 5,
  tree = TT, my_print = FALSE, alph = 1, gg = gg1, tol = tol, cl = 2L
)
cv_admp <- cv_MADMMplasso(
  fit, nfolds = 5, X, Z, y, alpha = alpha, lambda = fit$Lambdas, max_it = 100,
  e.abs = e.abs, e.rel = e.rel, nlambda = nlambda, rho = 5, my_print = FALSE, alph = 1,
  foldid = NULL, gg = fit$gg, TT = TT, tol = tol
)
plot(cv_admp)
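
# A hedged sketch, not part of the original example: the same cross-validation
# run again with explicit fold assignments and the documented parallel options
# (pal and cl). The object names below (foldid_manual, cv_admp_par) are
# assumptions for illustration only.
foldid_manual <- sample(rep(seq_len(5), length.out = N))
cv_admp_par <- cv_MADMMplasso(
  fit, nfolds = 5, X, Z, y, alpha = alpha, lambda = fit$Lambdas, max_it = 100,
  e.abs = e.abs, e.rel = e.rel, nlambda = nlambda, rho = 5, my_print = FALSE,
  alph = 1, foldid = foldid_manual, gg = fit$gg, TT = TT, tol = tol,
  pal = FALSE, cl = 2L
)
plot(cv_admp_par)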