cv.missoNet | R Documentation |
Perform k-fold cross-validation to select the regularization pair (lambda.beta, lambda.theta) for missoNet. For each fold the model is trained on k-1 partitions and evaluated on the held-out partition over a grid of lambda pairs; the pair with minimum mean CV error is returned, with optional 1-SE models for more regularized solutions.
cv.missoNet(
X,
Y,
kfold = 5,
rho = NULL,
lambda.beta = NULL,
lambda.theta = NULL,
lambda.beta.min.ratio = NULL,
lambda.theta.min.ratio = NULL,
n.lambda.beta = NULL,
n.lambda.theta = NULL,
beta.pen.factor = NULL,
theta.pen.factor = NULL,
penalize.diagonal = NULL,
beta.max.iter = 10000,
beta.tol = 1e-05,
theta.max.iter = 10000,
theta.tol = 1e-05,
eta = 0.8,
eps = 1e-08,
standardize = TRUE,
standardize.response = TRUE,
compute.1se = TRUE,
relax.net = FALSE,
adaptive.search = FALSE,
shuffle = TRUE,
seed = NULL,
parallel = FALSE,
cl = NULL,
verbose = 1
)
X |
Numeric matrix ( |
Y |
Numeric matrix ( |
kfold |
Integer |
rho |
Optional numeric vector of length |
lambda.beta , lambda.theta |
Optional numeric vectors. Candidate
regularization paths for |
lambda.beta.min.ratio , lambda.theta.min.ratio |
Optional numerics in |
n.lambda.beta , n.lambda.theta |
Optional integers. Lengths of the
automatically generated lambda paths (ignored if the corresponding
|
beta.pen.factor |
Optional |
theta.pen.factor |
Optional |
penalize.diagonal |
Logical or |
beta.max.iter , theta.max.iter |
Integers. Max iterations for the
|
beta.tol , theta.tol |
Numerics |
eta |
Numeric in |
eps |
Numeric in |
standardize |
Logical. Standardize columns of |
standardize.response |
Logical. Standardize columns of |
compute.1se |
Logical. Also compute 1-SE solutions? Default |
relax.net |
(Experimental) Logical. If |
adaptive.search |
(Experimental) Logical. Use adaptive two-stage lambda search? Default |
shuffle |
Logical. Randomly shuffle fold assignments? Default |
seed |
Optional integer seed (used when |
parallel |
Logical. Evaluate folds in parallel using a provided cluster?
Default |
cl |
Optional cluster from |
verbose |
Integer in |
Internally, predictors X and responses Y can be standardized for optimization; all reported estimates are re-scaled back to the original data scale. Missingness in Y is handled via unbiased estimating equations using column-wise observation probabilities estimated from Y (or supplied via rho). This is appropriate when the missingness of each response is independent of its unobserved value (e.g., MCAR).
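As a rough illustration of the column-wise probabilities mentioned above, the base-R sketch below computes empirical missing proportions from a toy response matrix. It assumes that rho denotes the per-column probability of a missing entry; the toy data and the names rho.hat and obs.prob are illustrative only and are not part of the package.

```r
# Toy response matrix with entries removed completely at random (MCAR).
set.seed(1)
n <- 200; q <- 4
Y <- matrix(rnorm(n * q), n, q)
Y[matrix(runif(n * q) < 0.1, n, q)] <- NA

# Empirical per-column missing proportions (assumed meaning of `rho`).
rho.hat <- colMeans(is.na(Y))

# Corresponding per-column observation probabilities.
obs.prob <- 1 - rho.hat
```

Under that assumption, a vector such as rho.hat is the kind of value one would expect to be able to supply through the rho argument when the missing rates are known in advance.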
If adaptive.search = TRUE, a fast two-stage pre-optimization narrows the lambda grid before computing fold errors on a focused neighborhood; this can be substantially faster on large grids but may occasionally miss the global optimum.
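The trade-off described above can be illustrated with a generic coarse-to-fine grid search. This is only a sketch of the general idea on a made-up error surface, not the package's internal algorithm; toy.cv.error, best.on.grid, and refine are hypothetical helpers.

```r
# Hypothetical smooth CV-error surface with its minimum near (0.1, 0.05).
toy.cv.error <- function(lb, lt) {
  (log10(lb) + 1)^2 + (log10(lt) - log10(0.05))^2
}

# Evaluate the error on every pair of a grid and return the best pair.
best.on.grid <- function(lb.seq, lt.seq) {
  err <- outer(lb.seq, lt.seq, toy.cv.error)
  idx <- which(err == min(err), arr.ind = TRUE)[1, ]
  list(lb = lb.seq[idx[1]], lt = lt.seq[idx[2]], n.eval = length(err))
}

# Stage 1: coarse log-spaced grid over the full range.
coarse.seq <- 10^seq(0, -3, length.out = 6)
coarse <- best.on.grid(coarse.seq, coarse.seq)

# Stage 2: fine grid restricted to a neighborhood of the coarse optimum.
refine <- function(centre) {
  10^seq(log10(centre) + 0.5, log10(centre) - 0.5, length.out = 11)
}
fine <- best.on.grid(refine(coarse$lb), refine(coarse$lt))

# Total number of grid evaluations used by the two stages.
n.two.stage <- coarse$n.eval + fine$n.eval
```

The two stages evaluate far fewer pairs than a full grid at the fine resolution would, but if the error surface has several local minima the coarse stage can settle in the wrong region, which is the risk noted above.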
When compute.1se = TRUE, two additional solutions are reported: the largest lambda.beta and the largest lambda.theta whose CV error is within one standard error of the minimum (holding the other lambda fixed at its optimal value). At the end, three special lambda pairs are identified:
lambda.min: Parameters giving minimum CV error
lambda.1se.beta: Largest \lambda_B within 1 SE of minimum (with \lambda_\Theta fixed at optimum)
lambda.1se.theta: Largest \lambda_\Theta within 1 SE of minimum (with \lambda_B fixed at optimum)
The 1SE rules provide more regularized models that may generalize better.
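The selection rules above can be sketched on a small, made-up grid of CV errors. The matrices cv.mean and cv.se and all numeric values are hypothetical; the sketch only mirrors the description given here and is not the package's implementation.

```r
# Toy grid: rows index lambda.beta, columns index lambda.theta (made-up values).
lambda.beta  <- c(1, 0.5, 0.25, 0.125)
lambda.theta <- c(1, 0.5, 0.25)
cv.mean <- matrix(c(2.0, 1.6, 1.7,
                    1.5, 1.2, 1.3,
                    1.4, 1.0, 1.1,
                    1.6, 1.1, 1.2), nrow = 4, byrow = TRUE)
cv.se <- matrix(0.25, nrow = 4, ncol = 3)

# lambda.min: the pair with the smallest mean CV error.
idx <- which(cv.mean == min(cv.mean), arr.ind = TRUE)[1, ]
i.min <- unname(idx[1]); j.min <- unname(idx[2])
threshold <- cv.mean[i.min, j.min] + cv.se[i.min, j.min]

# lambda.1se.beta: largest lambda.beta within 1 SE, lambda.theta fixed at its optimum.
lambda.1se.beta <- max(lambda.beta[cv.mean[, j.min] <= threshold])

# lambda.1se.theta: largest lambda.theta within 1 SE, lambda.beta fixed at its optimum.
lambda.1se.theta <- max(lambda.theta[cv.mean[i.min, ] <= threshold])
```

Because a larger lambda means a stronger penalty, taking the maximum over the admissible values yields the most regularized model whose error is still within one standard error of the best one.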
A list of class "missoNet" with components:
est.min: List of estimates at the CV minimum: Beta (p \times q), Theta (q \times q), intercept mu (length q), lambda.beta, lambda.theta, lambda.beta.idx, lambda.theta.idx, and (if requested) relax.net.
est.1se.beta: List of estimates at the 1-SE lambda.beta (if compute.1se = TRUE); NULL otherwise.
est.1se.theta: List of estimates at the 1-SE lambda.theta (if compute.1se = TRUE); NULL otherwise.
Length-q vector of working missingness probabilities.
Number of folds used.
Integer vector of length n giving fold assignments (names are "fold-k").
Unique lambda values explored along the grid for \mathbf{B} and \Theta.
Logical indicating whether the diagonal of \Theta was penalized.
Penalty factor matrices actually used.
List with CV diagnostics: n, p, q, standardize, standardize.response, mean errors cv.errors.mean, bounds cv.errors.upper/lower, and the evaluated grids cv.grid.beta, cv.grid.theta (length equals number of fitted models).
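The fold-assignment vector described above can be pictured with a short base-R sketch. It assumes balanced folds, a shuffle controlled by a seed, and names of the form "fold-k" as stated in the component description; the variable name fold.id and the sizes used are illustrative only.

```r
# Hypothetical sample size and number of folds.
n <- 23; kfold <- 5

# Balanced fold labels 1..kfold, shuffled reproducibly.
set.seed(486)
fold.id <- sample(rep(seq_len(kfold), length.out = n))

# Name each entry after its fold, e.g. "fold-3".
names(fold.id) <- paste0("fold-", fold.id)

# Fold sizes differ by at most one observation.
fold.sizes <- tabulate(fold.id, nbins = kfold)
```

Fixing the seed before shuffling is what makes two runs on the same data use identical partitions, so their CV errors can be compared directly.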
Yixiao Zeng yixiao.zeng@mail.mcgill.ca, Celia M. T. Greenwood
Zeng, Y., et al. (2025). Multivariate regression with missing response data for modelling regional DNA methylation QTLs. arXiv:2507.05990.
missoNet for model fitting; generic methods such as plot() and predict() for objects of class "missoNet".
sim <- generateData(n = 120, p = 12, q = 6, rho = 0.1)
X <- sim$X; Y <- sim$Z
# Basic 5-fold cross-validation
cvfit <- cv.missoNet(X = X, Y = Y, kfold = 5, verbose = 0)
# Extract optimal estimates
Beta.min <- cvfit$est.min$Beta
Theta.min <- cvfit$est.min$Theta
# Extract 1SE estimates (if computed)
if (!is.null(cvfit$est.1se.beta)) {
  Beta.1se <- cvfit$est.1se.beta$Beta
}
if (!is.null(cvfit$est.1se.theta)) {
  Theta.1se <- cvfit$est.1se.theta$Theta
}
# Make predictions
newX <- matrix(rnorm(10 * 12), 10, 12)
pred.min <- predict(cvfit, newx = newX, s = "lambda.min")
pred.1se <- predict(cvfit, newx = newX, s = "lambda.1se.beta")
# Parallel cross-validation
library(parallel)
cl <- makeCluster(min(detectCores() - 1, 2))
cvfit2 <- cv.missoNet(X = X, Y = Y, kfold = 5,
                      parallel = TRUE, cl = cl)
stopCluster(cl)
# Adaptive search for efficiency
cvfit3 <- cv.missoNet(X = X, Y = Y, kfold = 5,
                      adaptive.search = TRUE)
# Reproducible CV with specific lambdas
cvfit4 <- cv.missoNet(X = X, Y = Y, kfold = 5,
                      lambda.beta = 10^seq(0, -2, length.out = 20),
                      lambda.theta = 10^seq(0, -2, length.out = 20),
                      seed = 486)
# Plot CV results
plot(cvfit, type = "heatmap")
plot(cvfit, type = "scatter")