Description Usage Arguments Details Value References Examples
galasso fits an adaptive LASSO for multiply imputed data and
supports both continuous and binary responses.
galasso(x, y, pf, adWeight, family = "gaussian", nlambda = 100,
        lambda.min.ratio = 1e-4, lambda = NULL, maxit = 10000, eps = 1e-5)
x
A length m list of imputed design matrices, one n x p numeric matrix per imputation.
y
A length m list of imputed response vectors, one length n numeric vector per imputation.
pf
Penalty factor. Can be used to differentially penalize certain variables.
adWeight
Numeric vector of length p representing the adaptive weights for the L1 penalty.
family
The type of response. "gaussian" implies a continuous response and "binomial" implies a binary response. Default is "gaussian".
nlambda
Length of the automatically generated 'lambda' sequence. If 'lambda' is non-NULL, 'nlambda' is ignored. Default is 100.
lambda.min.ratio
Ratio that determines the minimum value of 'lambda' when automatically generating a 'lambda' sequence. If 'lambda' is not NULL, 'lambda.min.ratio' is ignored. Default is 1e-4.
lambda
Optional numeric vector of lambdas to fit. If NULL, a sequence is generated automatically based on 'nlambda' and 'lambda.min.ratio'. Default is NULL.
maxit
Maximum number of iterations to run. Default is 10000.
eps
Tolerance for convergence. Default is 1e-5.
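The documentation does not prescribe how 'adWeight' should be obtained. A common recipe in the adaptive LASSO literature (a sketch, not part of this package) builds the weights from an initial coefficient estimate as 1 / |beta|^gamma:

```r
# Illustrative helper (not a miselect function): adaptive weights from an
# initial estimate, adWeight_j = 1 / (|beta_init_j| + eps)^gamma.
# The small eps keeps weights finite when an initial estimate is exactly 0.
make_adWeight <- function(beta_init, gamma = 1, eps = 1e-6) {
  1 / (abs(beta_init) + eps)^gamma
}

beta_init <- c(2, 0.5, 0)   # made-up initial estimates, p = 3
make_adWeight(beta_init)
```

Covariates with large initial estimates receive small weights (little shrinkage), while covariates estimated near zero receive very large weights and are penalized heavily.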
galasso works by adding a group penalty to the aggregated objective
function to ensure selection consistency across imputations. The objective
function is:

argmin_{β_jk}  -L(β_jk | X_ijk, Y_ik)
    + λ * Σ_{j=1}^{p} â_j * pf_j * sqrt(Σ_{k=1}^{m} β_jk^2)

where L is the log-likelihood, â_j is the adaptive weight, and pf_j is the
penalty factor. Simulations suggest that the "stacked" objective function
approach (i.e., saenet) tends to be more computationally efficient and to
have better estimation and selection properties. However, the advantage of
galasso is that it allows one to look at the differences between
coefficient estimates across imputations.
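The penalty term above can be evaluated directly; the sketch below (an illustrative helper, not a miselect function) shows why the square-root-of-sums term ties selection together across imputations:

```r
# Illustrative helper: evaluates the group penalty
#   lambda * sum_j adWeight_j * pf_j * sqrt(sum_k beta_jk^2)
# for a p x m matrix whose row j holds beta_jk across the m imputations.
group_penalty <- function(beta, lambda, adWeight, pf) {
  group_norms <- sqrt(rowSums(beta^2))   # one L2 norm per covariate group
  lambda * sum(adWeight * pf * group_norms)
}

# Toy numbers: p = 3 covariates, m = 2 imputations
beta <- matrix(c(1, 1,
                 0, 0,
                 3, 4), nrow = 3, byrow = TRUE)
group_penalty(beta, lambda = 0.5, adWeight = rep(1, 3), pf = rep(1, 3))
# 0.5 * (sqrt(2) + 0 + 5)
```

Because each covariate enters the penalty through a single group norm over all m imputations, its coefficients are shrunk to zero jointly across imputations, which is the selection-consistency property described above.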
An object with type "galasso" and subtype "galasso.gaussian" or
"galasso.binomial", depending on which family was used. Both subtypes have
4 elements:

lambda: Sequence of lambdas fit.
beta: (p + 1) x nlambda matrix representing the estimated betas at each
value of lambda. The betas are constructed as the average of the betas
from each imputation.
df: Number of nonzero betas at each value of lambda.
mse (subtype "galasso.gaussian") or dev (subtype "galasso.binomial"): the
training MSE or training deviance, respectively, for each value of lambda.
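A toy sketch (made-up numbers, not package output) of how the returned betas are formed as the per-imputation average described above:

```r
# Toy illustration of averaging per-imputation coefficient estimates,
# as the returned beta matrix is constructed. Numbers are made up.
m <- 3
betas <- list(c(1.0, 0, 2.0),
              c(1.2, 0, 1.8),
              c(0.8, 0, 2.2))   # one estimate vector per imputation
beta_avg <- Reduce(`+`, betas) / m
beta_avg
# c(1, 0, 2)
```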
Jiacong Du, Jonathan Boss, Peisong Han, Lauren J. Beesley, Stephen A. Goutman, Stuart Batterman, Eva L. Feldman, and Bhramar Mukherjee (2020). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. arXiv:2003.07398
library(miselect)
library(mice)
# Impute the data 5 times with mice, then extract the completed datasets
mids <- mice(miselect.df, m = 5, printFlag = FALSE)
dfs <- lapply(1:5, function(i) complete(mids, action = i))
# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
y[[i]] <- dfs[[i]]$Y
}
pf <- rep(1, 20)
adWeight <- rep(1, 20)
fit <- galasso(x, y, pf, adWeight)