#' Fit missoNet models with missing responses
#'
#' @description
#' Fit a penalized multi-task regression with a response-network (\eqn{\Theta})
#' under missing responses. The method jointly estimates the coefficient matrix
#' \eqn{\mathbf{B}} and the precision matrix \eqn{\Theta} via penalized
#' likelihood with \eqn{\ell_1} penalties on \eqn{\mathbf{B}} and the off-diagonal
#' entries of \eqn{\Theta}.
#'
#' @details
#' The conditional Gaussian model is
#' \deqn{Y_i = \mu + X_i \mathbf{B} + E_i, \qquad E_i \sim \mathcal{N}_q(0, \Theta^{-1}),}
#' 
#' where:
#' \itemize{
#'   \item \eqn{Y_i} is the \eqn{i}-th observation of \eqn{q} responses
#'   \item \eqn{X_i} is the \eqn{i}-th observation of \eqn{p} predictors
#'   \item \eqn{\mathbf{B}} is the \eqn{p \times q} coefficient matrix
#'   \item \eqn{\Theta} is the \eqn{q \times q} precision matrix
#'   \item \eqn{\mu} is the intercept vector
#' }
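#'
#' For intuition, a minimal simulation from this model (an illustrative sketch,
#' not package code; \eqn{\mu = 0} and \eqn{\Theta = I_q} for simplicity):
#' \preformatted{
#' set.seed(1)
#' n <- 100; p <- 5; q <- 3
#' X <- matrix(rnorm(n * p), n, p)
#' B <- matrix(rnorm(p * q, sd = 0.5), p, q)
#' E <- matrix(rnorm(n * q), n, q)  # Theta = I_q: i.i.d. standard normal errors
#' Y <- crossprod(t(X), B) + E      # i.e., X B + E
#' }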
#' 
#' The parameters are estimated by solving:
#' \deqn{\min_{\mathbf{B}, \Theta \succ 0} \quad g(\mathbf{B}, \Theta) + 
#'   \lambda_B \|\mathbf{B}\|_1 + \lambda_\Theta \|\Theta\|_{1,\mathrm{off}}}
#' 
#' where \eqn{g} is the negative log-likelihood.
#'
#' Missing values in \code{Y} are accommodated through unbiased estimating equations
#' using column-wise observation probabilities. Internally, \code{X} and \code{Y}
#' may be standardized for numerical stability; returned estimates are re-scaled
#' back to the original units.
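#'
#' As a rough sketch of that correction (hypothetical names; the package performs
#' this internally), with \code{obs_prob[j]} the estimated probability that
#' response \code{j} is observed:
#' \preformatted{
#' Z <- Y; Z[is.na(Z)] <- 0         # zero-filled responses
#' obs_prob <- colMeans(!is.na(Y))
#' ## unbiased for crossprod(X, Y) under complete data (column-wise IPW):
#' xty <- crossprod(X, Z) / matrix(obs_prob, ncol(X), ncol(Y), byrow = TRUE)
#' }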
#'
#' The grid search spans \code{lambda.beta} and \code{lambda.theta}. The optimal
#' pair is selected by the user-chosen goodness-of-fit criterion \code{GoF}:
#' \code{"AIC"}, \code{"BIC"}, or \code{"eBIC"} (default). If
#' \code{adaptive.search = TRUE}, a two-stage pre-optimization narrows the grid
#' before the main search (faster on large problems, with a small risk of missing
#' the global optimum).
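#'
#' For reference, the extended BIC follows Chen and Chen (2008); schematically,
#' \deqn{\mathrm{eBIC} = \mathrm{BIC} + 2\gamma\,\mathrm{df}\,\log(P),}
#' where \eqn{\mathrm{df}} counts active parameters, \eqn{P} is the number of
#' candidate parameters, and \eqn{\gamma = 0.5} is used internally (the exact
#' internal form may differ slightly).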
#'
#' @param X Numeric matrix (\eqn{n \times p}). Predictors (no missing values).
#' @param Y Numeric matrix (\eqn{n \times q}). Responses, may contain \code{NA}/\code{NaN}.
#' @param rho Optional numeric vector of length \eqn{q}. Working missingness
#'   probabilities; if \code{NULL} (default), estimated from \code{Y}.
#' @param GoF Character. Goodness-of-fit criterion: \code{"AIC"}, \code{"BIC"},
#'   or \code{"eBIC"} (default).
#' @param lambda.beta,lambda.theta Optional numeric vectors (or scalars). 
#'   Candidate regularization paths for \eqn{\mathbf{B}} and \eqn{\Theta}.
#'   If \code{NULL}, paths are generated automatically.
#' @param lambda.beta.min.ratio,lambda.theta.min.ratio Optional numerics in \code{(0,1]}.
#'   Ratio of the smallest to largest lambda when generating paths (ignored if the
#'   corresponding \code{lambda.*} is supplied).
#' @param n.lambda.beta,n.lambda.theta Optional integers. Lengths of automatically
#'   generated lambda paths (ignored if the corresponding \code{lambda.*} is supplied).
#' @param beta.pen.factor Optional \eqn{p \times q} non-negative matrix of element-wise
#'   penalty multipliers for \eqn{\mathbf{B}}. \code{Inf} = maximum penalty;
#'   \code{0} = no penalty for that coefficient. Default: all 1s (equal penalty).
#' @param theta.pen.factor Optional \eqn{q \times q} non-negative matrix of element-wise
#'   penalty multipliers for \eqn{\Theta}. Off-diagonal entries control edge
#'   penalties; diagonal treatment is governed by \code{penalize.diagonal}.
#'   \code{Inf} = maximum penalty; \code{0} = no penalty for that entry.
#'   Default: all 1s (equal penalty).
#' @param penalize.diagonal Logical or \code{NULL}. Whether to penalize the diagonal
#'   of \eqn{\Theta}. If \code{NULL} (default) the choice is made automatically.
#' @param beta.max.iter,theta.max.iter Integers. Max iterations for the
#'   \eqn{\mathbf{B}} update (FISTA) and \eqn{\Theta} update (graphical lasso).
#'   Defaults: \code{10000}.
#' @param beta.tol,theta.tol Numerics \eqn{> 0}. Convergence tolerances for the
#'   \eqn{\mathbf{B}} and \eqn{\Theta} updates. Defaults: \code{1e-5}.
#' @param eta Numeric in \code{(0,1)}. Backtracking line-search parameter for the
#'   \eqn{\mathbf{B}} update (default \code{0.8}).
#' @param eps Numeric in \code{(0,1)}. Eigenvalue floor used to stabilize positive
#'   definiteness operations (default \code{1e-8}).
#' @param standardize Logical. Standardize columns of \code{X} internally? Default \code{TRUE}.
#' @param standardize.response Logical. Standardize columns of \code{Y} internally?
#'   Default \code{TRUE}.
#' @param relax.net (Experimental) Logical. If \code{TRUE}, refit active edges of \eqn{\Theta}
#'   without \eqn{\ell_1} penalty (de-biased network). Default \code{FALSE}.
#' @param adaptive.search (Experimental) Logical. Use adaptive two-stage lambda search? Default \code{FALSE}.
#' @param parallel Logical. Evaluate parts of the grid in parallel using a provided
#'   cluster? Default \code{FALSE}.
#' @param cl Optional cluster from \code{parallel::makeCluster()} (required if
#'   \code{parallel = TRUE}).
#' @param verbose Integer; one of \code{0}, \code{1}, or \code{2}. \code{0} = silent,
#'   \code{1} = progress (default), \code{2} = detailed tracing (not supported in
#'   parallel mode).
#'
#' @return
#' A list of class \code{"missoNet"} with components:
#' \describe{
#'   \item{est.min}{List at the selected lambda pair:
#'     \code{Beta} (\eqn{p \times q}), \code{Theta} (\eqn{q \times q}),
#'     intercept \code{mu} (length \eqn{q}), \code{lambda.beta}, \code{lambda.theta},
#'     \code{lambda.beta.idx}, \code{lambda.theta.idx}, scalar \code{gof}
#'     (AIC/BIC/eBIC at optimum), and (if requested) \code{relax.net}.}
#'   \item{rho}{Length-\eqn{q} vector of working missingness probabilities.}
#'   \item{lambda.beta.seq, lambda.theta.seq}{Unique lambda values explored along
#'     the grid for \eqn{\mathbf{B}} and \eqn{\Theta}.}
#'   \item{penalize.diagonal}{Logical indicating whether the diagonal of
#'     \eqn{\Theta} was penalized.}
#'   \item{beta.pen.factor, theta.pen.factor}{Penalty factor matrices actually used.}
#'   \item{param_set}{List with fitting diagnostics:
#'     \code{n}, \code{p}, \code{q}, \code{standardize}, \code{standardize.response},
#'     the vector of criterion values \code{gof}, and the evaluated grids
#'     \code{gof.grid.beta}, \code{gof.grid.theta} (length equals number of fitted models).}
#' }
#'
#' @examples
#' sim <- generateData(n = 120, p = 10, q = 6, rho = 0.1)
#' X <- sim$X; Y <- sim$Z
#'
#' \donttest{
#' # Fit with defaults (criterion = eBIC)
#' fit1 <- missoNet(X, Y)
#' # Extract the optimal estimates
#' Beta.hat <- fit1$est.min$Beta
#' Theta.hat <- fit1$est.min$Theta
#' 
#' # Plot missoNet results
#' plot(fit1, type = "heatmap")
#' plot(fit1, type = "scatter")
#'
#' # Provide short lambda paths
#' fit2 <- missoNet(
#'   X, Y,
#'   lambda.beta  = 10^seq(0, -2, length.out = 5),
#'   lambda.theta = 10^seq(0, -2, length.out = 5),
#'   GoF = "BIC"
#' )
#' 
#' # Fit at a single lambda pair
#' fit3 <- missoNet(
#'   X, Y,
#'   lambda.beta  = 0.1,
#'   lambda.theta = 0.1
#' )
#' 
#' # De-biased network on the active set
#' fit4 <- missoNet(X, Y, relax.net = TRUE, verbose = 0)
#' 
#' # Adaptive search for large problems
#' fit5 <- missoNet(X = X, Y = Y, adaptive.search = TRUE, verbose = 0)
#'
#' # Parallel (requires a cluster)
#' library(parallel)
#' cl <- makeCluster(2)
#' fit_par <- missoNet(X, Y, parallel = TRUE, cl = cl, verbose = 0)
#' stopCluster(cl)
#' }
#'
#' @seealso
#' \code{\link{cv.missoNet}} for cross-validated selection;
#' generic methods such as \code{plot()} and \code{predict()} for objects of class
#' \code{"missoNet"}.
#'
#' @references
#' Zeng, Y., et al. (2025). \emph{Multivariate regression with missing response data
#' for modelling regional DNA methylation QTLs}. arXiv:2507.05990.
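#'
#' Chen, J. and Chen, Z. (2008). \emph{Extended Bayesian information criteria
#' for model selection with large model spaces}. Biometrika, 95(3), 759-771.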
#'
#' @author
#' Yixiao Zeng \email{yixiao.zeng@@mail.mcgill.ca}, Celia M. T. Greenwood
#' @export

missoNet <- function(X, Y, 
                     rho = NULL, 
                     GoF = "eBIC",
                     lambda.beta = NULL, 
                     lambda.theta = NULL,
                     lambda.beta.min.ratio = NULL, 
                     lambda.theta.min.ratio = NULL,
                     n.lambda.beta = NULL,
                     n.lambda.theta = NULL,
                     beta.pen.factor = NULL,
                     theta.pen.factor = NULL,
                     penalize.diagonal = NULL,
                     beta.max.iter = 10000,
                     beta.tol = 1e-5,
                     theta.max.iter = 10000, 
                     theta.tol = 1e-5,
                     eta = 0.8,
                     eps = 1e-8,
                     standardize = TRUE, 
                     standardize.response = TRUE,
                     relax.net = FALSE, 
                     adaptive.search = FALSE,
                     parallel = FALSE,
                     cl = NULL, 
                     verbose = 1) {
  
  ## ---------- Input validation ----------
  if (!is.matrix(X) && !is.data.frame(X)) stop("`X` must be a matrix or data.frame.")
  if (!is.matrix(Y) && !is.data.frame(Y)) stop("`Y` must be a matrix or data.frame.")
  
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  
  stopifnot(
    "X and Y must have same number of rows" = nrow(X) == nrow(Y),
    "GoF must be 'AIC', 'BIC', or 'eBIC'" = GoF %in% c("AIC", "BIC", "eBIC"),
    "lambda.theta must be non-negative" = all(is.null(lambda.theta) | lambda.theta >= 0),
    "lambda.beta must be non-negative" = all(is.null(lambda.beta) | lambda.beta >= 0),
    "beta.max.iter must be positive" = beta.max.iter > 0 && is.finite(beta.max.iter),
    "theta.max.iter must be positive" = theta.max.iter > 0 && is.finite(theta.max.iter),
    "beta.tol must be positive" = beta.tol > 0 && is.finite(beta.tol),
    "theta.tol must be positive" = theta.tol > 0 && is.finite(theta.tol),
    "eps must be in (0, 1)" = eps > 0 && eps < 1,
    "eta must be in (0, 1)" = eta > 0 && eta < 1,
    "verbose must be 0, 1, or 2" = verbose %in% c(0, 1, 2)
  )
  
  n <- nrow(X); p <- ncol(X); q <- ncol(Y)
  
  check_data_condition <- function(X) {
    # Check for multicollinearity
    if (ncol(X) <= nrow(X) && ncol(X) > 1) {
      cor_X <- suppressWarnings(cor(X, use = "complete.obs"))
      if (any(is.finite(cor_X))) {
        diag(cor_X) <- 0
        max_cor <- max(abs(cor_X), na.rm = TRUE)
        if (max_cor > 0.95) {
          warning("High collinearity detected (max correlation: ", 
                  round(max_cor, 3), "). Consider removing redundant predictors.")
        }
      }
    }
    
    # Check for near-zero variance
    var_X <- apply(X, 2, var, na.rm = TRUE)
    if (any(var_X <= .Machine$double.eps * 100, na.rm = TRUE)) {
      warning("Near-zero variance predictors detected. These may cause numerical issues.")
    }
    
    # Condition number warning (only when p <= n)
    if (ncol(X) <= nrow(X)) {
      svd.X <- svd(X, nu = 0, nv = 0)
      cond.num <- max(svd.X$d) / max(min(svd.X$d), .Machine$double.eps * 100)
      if (cond.num > 1e12) {
        warning("Predictor matrix `X` appears ill-conditioned (cond. no. = ",
                format(cond.num, scientific = TRUE), ").")
      }
    }
    
    return(invisible(NULL))
  }
  
  check_data_condition(X)
  
  ## ---------- Header ----------
  if (verbose > 0) {
    cat("\n=============================================================\n")
    cat("                          missoNet\n")
    cat("=============================================================\n")
    cat("\n> Initializing model...\n")
  }
  
  ## ---------- Parameter initialization ----------
  init.obj <- InitParams(
    X = X, Y = Y, rho = rho,
    lamB.pf = beta.pen.factor,
    lamTh.pf = theta.pen.factor,
    pdiag = penalize.diagonal,
    standardize = standardize,
    standardize.response = standardize.response
  )
  
  # Adaptive tolerance adjustment for ill-conditioned problems
  if (init.obj$n_eff < max(p, q)) {
    beta.tol <- max(beta.tol, 1e-4)
    theta.tol <- max(theta.tol, 1e-4)
    if (verbose > 0) {
      cat("Note: relaxed tolerances due to limited effective n\n")
    }
  }
  
  ## ---------- Model configuration ----------
  if (verbose > 0) {
    cat("\n--- Model Configuration -------------------------------------\n")
    cat(sprintf("  Data dimensions:      n = %5d, p = %4d, q = %4d\n", n, p, q))
    cat(sprintf("  Missing rate (avg):   %5.1f%%\n", mean(init.obj$rho.vec) * 100))
    cat(sprintf("  Selection criterion:  %s\n", GoF))
  }
  
  # High missingness warning
  if (mean(init.obj$rho.vec) > 0.5) {
    message("\n============================================================")
    message("HIGH MISSINGNESS WARNING")
    message("------------------------------------------------------------")
    message("Over 50% missing data detected. Recommendations:")
    message("- Consider imputation before analysis")
    message("- Use conservative regularization (larger lambda values)")
    message("- Interpret results with caution")
    message("============================================================\n")
  }
  
  ## ---------- Lambda preparation ----------
  if (isTRUE(adaptive.search)) {
    if (verbose > 0) cat("  Lambda grid:          adaptive (fast pre-test)\n")
    
    step.obj <- stepOptim(
      X = X, Y = Y, init.obj = init.obj, GoF = GoF, cv = FALSE,
      eta = eta, eps = eps,
      lamB.vec = lambda.beta, lamTh.vec = lambda.theta,
      n.lamB = n.lambda.beta, n.lamTh = n.lambda.theta,
      lamB.min.ratio = lambda.beta.min.ratio, lamTh.min.ratio = lambda.theta.min.ratio,
      lamB.scale.factor = NULL, lamTh.scale.factor = NULL,
      Beta.maxit = beta.max.iter, Beta.thr = beta.tol,
      Theta.maxit = theta.max.iter, Theta.thr = theta.tol,
      verbose = verbose
    )
    
    golden_ratio <- 1.618
    obs_prob_avg <- mean(init.obj$obs_prob)
    
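    # Bracket the stage-1 optimum with an asymmetric window (more headroom on
    # the large-lambda side), inflate the lower bound by the average
    # observation probability, and clip to the range of the pre-test path.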
    lamB.range <- c(
      max(step.obj$lamB.min / golden_ratio^2, max(step.obj$lamB.unique) * 0.001) / max(obs_prob_avg, 0.1),
      min(step.obj$lamB.min * golden_ratio^8, max(step.obj$lamB.unique) * 0.75)
    )
    lamTh.range <- c(
      max(step.obj$lamTh.min / golden_ratio^8, max(step.obj$lamTh.unique) * 0.001) / max(obs_prob_avg, 0.1),
      min(step.obj$lamTh.min * golden_ratio^8, max(step.obj$lamTh.unique) * 0.80)
    )
    
    lamB.vec.neighbor <- step.obj$lamB.unique[
      step.obj$lamB.unique >= lamB.range[1] & step.obj$lamB.unique <= lamB.range[2]
    ]
    lamTh.vec.neighbor <- step.obj$lamTh.unique[
      step.obj$lamTh.unique >= lamTh.range[1] & step.obj$lamTh.unique <= lamTh.range[2]
    ]
    
    # Fallback if neighborhood is empty
    if (!length(lamB.vec.neighbor)) {
      k <- max(5, ceiling(length(step.obj$lamB.unique) * 0.25))
      i0 <- step.obj$lamB.id
      lo <- max(1, i0 - floor(k / 2))
      hi <- min(length(step.obj$lamB.unique), lo + k - 1)
      lamB.vec.neighbor <- step.obj$lamB.unique[lo:hi]
    }
    if (!length(lamTh.vec.neighbor)) {
      k <- max(5, ceiling(length(step.obj$lamTh.unique) * 0.25))
      i0 <- step.obj$lamTh.id
      lo <- max(1, i0 - floor(k / 2))
      hi <- min(length(step.obj$lamTh.unique), lo + k - 1)
      lamTh.vec.neighbor <- step.obj$lamTh.unique[lo:hi]
    }
    
    # Create grid for main optimization
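    # Full factorial expansion: each lamB in the neighborhood is paired with
    # every lamTh (lamB varies slowest, matching the warm-start order below).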
    lamB.vec <- rep(lamB.vec.neighbor, each = length(lamTh.vec.neighbor))
    lamTh.vec <- rep(lamTh.vec.neighbor, times = length(lamB.vec.neighbor))
    
    lambda.Beta  <- sort(unique(step.obj$lamB.unique), decreasing = TRUE)
    lambda.Theta <- sort(unique(step.obj$lamTh.unique), decreasing = TRUE)
    
  } else {
    if (verbose > 0) cat("  Lambda grid:          standard (dense)\n")
    
    lambda.obj <- InitLambda(
      lamB = lambda.beta, lamTh = lambda.theta, init.obj = init.obj,
      n = n, p = p, q = q,
      n.lamB = n.lambda.beta, n.lamTh = n.lambda.theta,
      lamB.min.ratio = lambda.beta.min.ratio, lamTh.min.ratio = lambda.theta.min.ratio,
      lamB.scale.factor = NULL, lamTh.scale.factor = NULL
    )
    
    lamB.vec <- lambda.obj$lamB.vec
    lamTh.vec <- lambda.obj$lamTh.vec
    
    lambda.Beta  <- sort(unique(lamB.vec),  decreasing = TRUE)
    lambda.Theta <- sort(unique(lamTh.vec), decreasing = TRUE)
  }
  
  if (verbose > 0) {
    cat(sprintf("  Lambda grid size:     %d x %d = %d models\n", 
                length(unique(lamB.vec)), length(unique(lamTh.vec)), length(lamTh.vec)))
    cat("-------------------------------------------------------------\n")
  }
  
  ## ---------- Data preparation ----------
  X.tr <- robust_scale(X, center = init.obj$mx, scale = init.obj$sdx)
  Y.tr <- robust_scale(Y, center = init.obj$my, scale = init.obj$sdy)
  Z.tr <- Y.tr; Z.tr[is.na(Z.tr)] <- 0
  
  # Observation probability matrices
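  # rho.mat.1[j, k] = P(Y_k observed), recycled over predictors (p x q);
  # rho.mat.2[j, k] = P(Y_j and Y_k both observed) under column-wise
  # independence, with diagonal P(Y_j observed) (q x q).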
  obs_prob <- init.obj$obs_prob
  rho.mat.1 <- matrix(obs_prob, nrow = p, ncol = q, byrow = TRUE)  # p x q
  rho.mat.2 <- outer(obs_prob, obs_prob, `*`); diag(rho.mat.2) <- obs_prob  # q x q
  
  # Precompute information
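  # til.xty is the inverse-probability-weighted cross-product: an unbiased
  # estimate of crossprod(X.tr, Y.tr) had Y been fully observed.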
  info <- list(
    n = n, p = p, q = q,
    xtx = make_positive_definite(crossprod(X.tr)),
    til.xty = crossprod(X.tr, Z.tr) / rho.mat.1
  )
  
  ## ---------- Beta initialization for warm starts ----------
  if (verbose > 0) {
    cat("\n--- Optimization Progress -----------------------------------\n")
    cat("  Stage 1: Initializing warm starts\n")
  }
  
  lamB.uniq <- unique(lamB.vec)
  B.init <- lapply(seq_along(lamB.uniq), function(x) matrix(0, p, q))
  Beta <- matrix(0, p, q)
  
  # Initial residual covariance
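  # Element-wise division by the pairwise observation probabilities makes the
  # zero-filled cross-product unbiased for the complete-data covariance.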
  residual.cov <- crossprod(Z.tr) / rho.mat.2 / (n - 1)
  residual.cov <- make_positive_definite(residual.cov)
  
  # Initial Theta
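  # Zero penalties are floored at a tiny positive value for numerical
  # stability; the diagonal is left effectively unpenalized for this warm start.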
  lamTh.mat <- lamTh.vec[1] * init.obj$lamTh.pf
  lamTh.mat[lamTh.mat == 0] <- 1e-12
  diag(lamTh.mat) <- 1e-12
  
  Theta.out <- tryCatch({
    run_glasso(S = residual.cov, rho = lamTh.mat,
               thr = min(0.001, theta.tol * 100),
               maxIt = max(1000, theta.max.iter / 10))
  }, error = function(e) {
    if (verbose > 0) cat("  Warning: glasso failed in warm start; using diagonal approximation\n")
    list(wi = diag(pmin(pmax(1/diag(residual.cov), eps), 1/eps)))
  })
  Theta <- make_symmetric(Theta.out$wi)
  
  # Warm start path over unique lamB
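  # One alternating pass per unique lamB: update Beta given Theta (FISTA),
  # then Theta given Beta (glasso), caching Beta as a warm start for the grid.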
  for (i in seq_along(lamB.uniq)) {
    lamB.mat <- lamB.uniq[i] * init.obj$lamB.pf
    lamB.mat[lamB.mat == 0] <- 1e-12
    
    Beta <- updateBeta(Theta = Theta, B0 = Beta,
                       n = info$n, xtx = info$xtx, xty = info$til.xty, lamB = lamB.mat,
                       eta = eta, tolin = min(0.001, beta.tol * 10), 
                       maxitrin = max(1000, ceiling(beta.max.iter / 10)))$Bhat
    
    E.tr <- Y.tr - X.tr %*% Beta
    residual.cov <- getResCov(E.tr, n, rho.mat.2)
    
    Theta.out <- tryCatch({
      run_glasso(S = residual.cov, rho = lamTh.mat,
                 thr = min(0.001, theta.tol * 100),
                 maxIt = max(1000, theta.max.iter / 10))
    }, error = function(e) {
      list(wi = Theta)  # Keep previous Theta
    })
    Theta <- make_symmetric(Theta.out$wi)
    
    B.init[[i]] <- Beta
  }
  
  ## ---------- Main optimization ----------
  if (!parallel || is.null(cl)) {
    ## ---------- Sequential execution ----------
    if (verbose > 0) {
      cat("  Stage 2: Grid search (sequential)\n")
      cat("-------------------------------------------------------------\n\n")
    }
    
    GoF.vec <- rep(0, length(lamTh.vec))
    Beta.warm <- lapply(seq_along(lamTh.vec), function(x) matrix(0, p, q))
    info.update <- list(
      B.init = B.init[[1]],
      residual.cov = getResCov(Y.tr - X.tr %*% B.init[[1]], n, rho.mat.2)
    )
    Beta.thr.rescale <- beta.tol
    lamB.crt <- lamB.vec[1]
    
    if (verbose == 1) {
      pb <- txtProgressBar(min = 0, max = length(lamTh.vec), style = 3, width = 50, char = "=")
    }
    
    for (i in seq_along(lamTh.vec)) {
      # Update warm start if lamB changes
      if (lamB.vec[i] < lamB.crt) {
        info.update$B.init <- B.init[[which(lamB.uniq == lamB.vec[i])]]
        E.tr <- Y.tr - X.tr %*% info.update$B.init
        info.update$residual.cov <- getResCov(E.tr, n, rho.mat.2)
        Beta.thr.rescale <- beta.tol
      }
      
      # Main update
      update.obj <- update.missoNet(
        lamTh = lamTh.vec[i], lamB = lamB.vec[i],
        Beta.maxit = beta.max.iter, Beta.thr = Beta.thr.rescale,
        Theta.maxit = theta.max.iter, Theta.thr = theta.tol,
        verbose = verbose, eps = eps, eta = eta, init.obj = init.obj,
        info = info, info.update = info.update, under.cv = FALSE
      )
      
      # Compute GoF
      E.tr <- Y.tr - X.tr %*% update.obj$Beta
      info.update$residual.cov <- getResCov(E.tr, n, rho.mat.2)
      GoF.vec[i] <- GoF_func(
        GoF = GoF, gamma = 0.5, n = n, p = p, q = q,
        Theta = update.obj$Theta, S = info.update$residual.cov, Beta = update.obj$Beta
      )
      
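      # Carry the fit forward as the next warm start and relax the Beta
      # tolerance relative to the magnitude of the current solution.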
      info.update$B.init <- Beta.warm[[i]] <- update.obj$Beta
      Beta.thr.rescale <- max(beta.tol, beta.tol * norm(update.obj$Beta, "F"))
      lamB.crt <- lamB.vec[i]
      
      if (verbose == 1 && exists("pb")) setTxtProgressBar(pb, i)
    }
    
    if (verbose == 1 && exists("pb")) close(pb)
    
  } else {
    ## ---------- Parallel execution ----------
    if (verbose > 0) {
      cat("  Stage 2: Grid search (parallel)\n")
      cat(sprintf("  Using %d worker%s\n", length(cl), if(length(cl) == 1) "" else "s"))
      cat("-------------------------------------------------------------\n\n")
      
      pbapply::pboptions(type = "txt", style = 3, char = "=", txt.width = 50, use_lb = TRUE, nout = length(lamB.uniq))
    } else {
      pbapply::pboptions(type = "none", use_lb = TRUE)
    }
    
    GoF.vec <- rep(0, length(lamTh.vec))
    Beta.warm <- NULL
    lamTh.uniq <- unique(lamTh.vec)
    
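    # Each worker sweeps the full lamTh path for one unique lamB, reusing that
    # lamB's warm start; results return in lamB-major order.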
    par.out <- pbapply::pblapply(seq_along(lamB.uniq), function(k) {
      parWrapper2(
        X.tr = X.tr, Y.tr = Y.tr, GoF = GoF, info = info,
        rho.mat.2 = rho.mat.2, B.init = B.init[[k]],
        lambda.Beta = lamB.uniq[k], lambda.Theta = lamTh.uniq,
        Beta.maxit = beta.max.iter, Beta.thr = beta.tol,
        Theta.maxit = theta.max.iter, Theta.thr = theta.tol,
        init.obj = init.obj, eps = eps, eta = eta
      )
    }, cl = cl)
    
    # Collect results
    for (k in seq_along(lamB.uniq)) {
      idx.start <- (k - 1) * length(lamTh.uniq) + 1
      idx.end <- k * length(lamTh.uniq)
      GoF.vec[idx.start:idx.end] <- par.out[[k]]$GoF.vec.fold
      Beta.warm <- c(Beta.warm, par.out[[k]]$Beta.warm.fold)
    }
    rm(par.out)
  }
  
  if (verbose > 0) cat("\n-------------------------------------------------------------\n\n")
  
  ## ---------- Select optimal model ----------
  GoF.min <- which.min(GoF.vec)
  lamTh.min <- lamTh.vec[GoF.min]
  lamB.min <- lamB.vec[GoF.min]
  
  if (verbose > 0) {
    cat("> Refitting optimal model ...\n\n")
  }
  
  # Final refit at optimal lambda
  est.min <- update.missoNet(
    X = X, Y = Y, lamTh = lamTh.min, lamB = lamB.min,
    Beta.maxit = max(1000, beta.max.iter), Beta.thr = min(1e-3, beta.tol),
    Theta.maxit = max(1000, theta.max.iter), Theta.thr = min(1e-3, theta.tol),
    verbose = verbose, eps = eps, eta = eta,
    info = NULL, info.update = NULL, under.cv = FALSE,
    init.obj = init.obj, B.init = Beta.warm[[GoF.min]]
  )
  
  # Final GoF computation
  E.tr <- Y.tr - X.tr %*% est.min$Beta
  est.min$gof <- GoF_func(
    GoF = GoF, gamma = 0.5, n = n, p = p, q = q,
    Theta = est.min$Theta, Beta = est.min$Beta,
    S = getResCov(E.tr, n, rho.mat.2)
  )
  
  # Store lambda information
  est.min$lambda.beta <- lamB.min
  est.min$lambda.theta <- lamTh.min
  est.min$lambda.beta.idx <- which(lambda.Beta == lamB.min)[1]
  est.min$lambda.theta.idx <- which(lambda.Theta == lamTh.min)[1]
  
  # Transform back to original scale
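  # B_orig[j, k] = B_std[j, k] * sdy[k] / sdx[j]; the intercept is then
  # mu = my - t(B_orig) %*% mx.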
  est.min$Beta <- sweep(est.min$Beta / init.obj$sdx, 2, init.obj$sdy, `*`)
  est.min$mu <- as.numeric(init.obj$my - crossprod(est.min$Beta, init.obj$mx))
  
  # Optional relaxed fit
  if (relax.net) {
    est.min$relax.net <- tryCatch({
      relax.glasso(X = X, Y = Y, init.obj = init.obj, est = est.min, eps = eps,
                   Theta.thr = min(1e-3, theta.tol * 100), Theta.maxit = max(1000, theta.max.iter / 10))
    }, error = function(e) {
      if (verbose > 0) warning("Relaxed fit failed: ", e$message)
      NULL
    })
  } else {
    est.min$relax.net <- NULL
  }
  
  ## ---------- Results summary ----------
  if (verbose > 0) {
    cat("\n--- Optimization Results ------------------------------------\n")
    cat(sprintf("  Optimal lambda.beta:   %.4e\n", lamB.min))
    cat(sprintf("  Optimal lambda.theta:  %.4e\n", lamTh.min))
    cat(sprintf("  %s value:              %.4f\n", GoF, est.min$gof))
    
    # Sparsity information
    active_preds <- sum(rowSums(abs(est.min$Beta)) > 1e-8)
    active_edges <- sum(abs(est.min$Theta[upper.tri(est.min$Theta, diag = FALSE)]) > 1e-8)
    cat(sprintf("  Active predictors:     %d / %d (%.1f%%)\n", 
                active_preds, p, 100 * active_preds / p))
    cat(sprintf("  Network edges:         %d / %d (%.1f%%)\n",
                active_edges, q * (q - 1) / 2, 100 * active_edges / (q * (q - 1) / 2)))
    cat("-------------------------------------------------------------\n\n")
    
    cat("=============================================================\n")
  }
  
  ## ---------- Output object ----------
  misso.obj <- list(
    est.min = est.min,
    rho = init.obj$rho.vec,
    lambda.beta.seq = lambda.Beta,
    lambda.theta.seq = lambda.Theta,
    penalize.diagonal = init.obj$penalize_diagonal,
    beta.pen.factor = init.obj$lamB.pf,
    theta.pen.factor = init.obj$lamTh.pf,
    
    param_set = list(
      n = n, p = p, q = q,
      standardize = standardize,
      standardize.response = standardize.response,
      gof = GoF.vec,
      gof.grid.beta = lamB.vec,
      gof.grid.theta = lamTh.vec
    )
  )
  
  class(misso.obj) <- c("missoNet", class(misso.obj))
  return(misso.obj)
}
