cv.galasso | R Documentation |
Does k-fold cross-validation for galasso, and returns an optimal value for lambda.
cv.galasso(
  x,
  y,
  pf,
  adWeight,
  family = c("gaussian", "binomial"),
  nlambda = 100,
  lambda.min.ratio = ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06),
  lambda = NULL,
  nfolds = 5,
  foldid = NULL,
  maxit = 1000,
  eps = 1e-05
)
x |
A length |
y |
A length |
pf |
Penalty factor. Can be used to differentially penalize certain variables |
adWeight |
Numeric vector of length p representing the adaptive weights for the L1 penalty |
family |
The type of response. "gaussian" implies a continuous response and "binomial" implies a binary response. Default is "gaussian". |
nlambda |
Length of automatically generated "lambda" sequence. If "lambda" is not NULL, "nlambda" is ignored. Default is 100 |
lambda.min.ratio |
Ratio that determines the minimum value of "lambda" when automatically generating a "lambda" sequence. If "lambda" is not NULL, "lambda.min.ratio" is ignored. Default is 1e-3 if all adaptive weights equal 1, and 1e-6 otherwise |
lambda |
Optional numeric vector of lambdas to fit. If NULL,
|
nfolds |
Number of folds to use for cross validation. Default is 5, minimum is 3 |
foldid |
an optional length |
maxit |
Maximum number of iterations to run. Default is 1000 |
eps |
Tolerance for convergence. Default is 1e-5 |
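As a minimal sketch of how a "foldid" vector could be built by hand, the snippet below assigns each of n observations to one of "nfolds" roughly equal folds. It assumes that "foldid" holds one integer fold label per observation; the description above is truncated, so check that assumption against the package before relying on it. The names n and make_foldid are hypothetical and are not part of the package.

```r
# Hypothetical helper: balanced random fold labels for n observations.
# Assumes foldid holds one integer in 1..nfolds per observation.
make_foldid <- function(n, nfolds = 5) {
  stopifnot(nfolds >= 3, n >= nfolds)
  # Repeat 1..nfolds until length n, then shuffle the labels
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(48109)
foldid <- make_foldid(n = 23, nfolds = 5)
table(foldid)  # fold sizes differ by at most one
```

With n set to the number of rows in each imputed design matrix, the result could be passed through the "foldid" argument so that repeated runs share the same folds.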
cv.galasso works by adding a group penalty to the aggregated objective function to ensure selection consistency across imputations. Simulations suggest that the "stacked" objective function approaches (i.e., saenet) tend to be more computationally efficient and have better estimation and selection properties.
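To make the group penalty concrete, the sketch below evaluates a group-lasso style penalty on a p x m coefficient matrix, where row j holds the coefficients of variable j across the m imputed datasets. This is the generic form of such a penalty, written under the assumption that "pf" and "adWeight" enter as multiplicative per-variable weights; the exact scaling used inside the package may differ. The function name group_penalty and the toy coefficients are hypothetical.

```r
# Hypothetical illustration of a group penalty across imputations.
# beta: p x m matrix; row j = coefficients of variable j in each imputation.
group_penalty <- function(beta, lambda, pf, adWeight) {
  stopifnot(nrow(beta) == length(pf), nrow(beta) == length(adWeight))
  # Euclidean norm of each variable's coefficients across imputations
  row_norms <- sqrt(rowSums(beta^2))
  lambda * sum(pf * adWeight * row_norms)
}

# Toy coefficients: 3 variables, 2 imputations (made-up numbers)
beta <- rbind(c(3, 4),   # nonzero in both imputations
              c(0, 0),   # zero in both imputations
              c(1, 0))   # nonzero in one imputation only
group_penalty(beta, lambda = 0.5, pf = rep(1, 3), adWeight = rep(1, 3))
```

Because a row norm is zero only when every entry in the row is zero, a penalty of this shape tends to set a variable's coefficients to zero in all imputations at once, which is the sense in which selection is consistent across imputations.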
An object of type "cv.galasso" with 8 elements:
The call that generated the output.
The sequence of lambdas fit.
Average cross validation error for each "lambda". For family = "gaussian", "cvm" corresponds to mean squared error, and for binomial "cvm" corresponds to deviance.
Standard error of "cvm".
A "galasso" object fit to the full data.
The lambda value for the model with the minimum cross validation error.
The lambda value for the sparsest model within one standard error of the minimum cross validation error.
The number of nonzero coefficients for each value of lambda.
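The two lambda choices described above can be sketched from a vector of cross validation errors and their standard errors. The numbers below are made up for illustration, and the sketch assumes that larger lambda values give sparser models; the object names are hypothetical and are not read from a fitted "cv.galasso" object.

```r
# Made-up cross validation summary, for illustration only
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.4, 1.1, 1.0, 1.05)    # mean CV error per lambda
cvse   <- c(0.2, 0.15, 0.12, 0.15, 0.1)  # standard error of cvm

# Lambda with the minimum cross validation error
i_min <- which.min(cvm)
lambda_min <- lambda[i_min]

# Largest lambda whose error is within one standard error of the minimum
# (assumes a larger lambda yields a sparser model)
threshold <- cvm[i_min] + cvse[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```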
Du, J., Boss, J., Han, P., Beesley, L. J., Kleinsasser, M., Goutman, S. A., ... & Mukherjee, B. (2022). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. Journal of Computational and Graphical Statistics, 31(4), 1063-1075. <doi:10.1080/10618600.2022.2035739>
library(miselect)
library(mice)
set.seed(48109)
# Using the mice defaults for sake of example only.
mids <- mice(miselect.df, m = 5, printFlag = FALSE)
dfs <- lapply(1:5, function(i) complete(mids, action = i))
# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
  x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
  y[[i]] <- dfs[[i]]$Y
}
pf <- rep(1, 20)
adWeight <- rep(1, 20)
fit <- cv.galasso(x, y, pf, adWeight)
# By default 'coef' returns the betas for lambda.min.
coef(fit)