galasso: Multiple Imputation Grouped Adaptive LASSO
In umich-cphds/minet: Variable Selection for Multiply Imputed Data

Description

`galasso` fits a grouped adaptive LASSO to multiply imputed data. It supports both continuous and binary responses.

Usage

galasso(
  x,
  y,
  pf,
  adWeight,
  family = c("gaussian", "binomial"),
  nlambda = 100,
  lambda.min.ratio = ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06),
  lambda = NULL,
  maxit = 10000,
  eps = 1e-05
)

Arguments

`x`	A list of length `m` containing `n * p` numeric matrices, one per imputation. No matrix should contain an intercept column or any missing values.
`y`	A list of length `m` containing numeric response vectors of length `n`, one per imputation. No vector should contain missing values.
`pf`	Penalty factor. Can be used to penalize certain variables differently from others.
`adWeight`	Numeric vector of length `p` containing the adaptive weights for the L1 penalty.
`family`	The type of response. `"gaussian"` implies a continuous response and `"binomial"` implies a binary response. Default is `"gaussian"`.
`nlambda`	Length of the automatically generated `lambda` sequence. If `lambda` is not NULL, `nlambda` is ignored. Default is 100.
`lambda.min.ratio`	Ratio that determines the minimum value of `lambda` when automatically generating a `lambda` sequence. If `lambda` is not NULL, `lambda.min.ratio` is ignored. Default is 1e-3 if every element of `adWeight` equals 1, and 1e-6 otherwise.
`lambda`	Optional numeric vector of lambdas to fit. If NULL, `galasso` automatically generates a lambda sequence based on `nlambda` and `lambda.min.ratio`. Default is NULL.
`maxit`	Maximum number of iterations to run. Default is 10000.
`eps`	Tolerance for convergence. Default is 1e-5.
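
How `adWeight`, `lambda.min.ratio`, and `nlambda` interact can be illustrated with a small base R sketch. The default rule for `lambda.min.ratio` is copied from the Usage section. The log-spaced grid and the value `lambda_max` are assumptions that follow the common glmnet-style convention; this page does not confirm how galasso builds its sequence internally.

```r
# Sketch only: the ratio rule mirrors Usage; the grid construction is an assumption.
default_ratio <- function(adWeight) {
    p <- length(adWeight)
    # 1e-3 when every adaptive weight equals 1, otherwise 1e-6
    ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06)
}

p <- 20
ratio_plain    <- default_ratio(rep(1, p))                    # unit weights
ratio_adaptive <- default_ratio(seq(0.5, 2, length.out = p))  # non-unit weights

# Hypothetical largest lambda; galasso would derive its own value from the data.
lambda_max <- 2.5
nlambda    <- 100

# Assumed grid: log-spaced from lambda_max down to lambda_max * ratio.
lambda_seq <- exp(seq(log(lambda_max),
                      log(lambda_max * ratio_plain),
                      length.out = nlambda))
```

With non-unit adaptive weights the smaller ratio extends the grid toward much weaker penalties, which is why supplying a custom `lambda` vector can be useful when only part of the path is of interest.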

Details

galasso works by adding a group penalty to the aggregated objective function to ensure selection consistency across imputations. The objective function is:

argmin_{\beta_{jk}} -L(\beta_{jk} | X_{ijk}, Y_{ik}) + \lambda \sum_{j=1}^{p} \hat{a}_j pf_j \sqrt{\sum_{k=1}^{m} \beta_{jk}^2}

where L is the log-likelihood, \hat{a}_j is the adaptive weight for variable j, and pf_j is its penalty factor. Simulations suggest that the "stacked" objective function approach (i.e., saenet) tends to be more computationally efficient and to have better estimation and selection properties. The advantage of galasso, however, is that it lets one examine how coefficient estimates differ across imputations.
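
The penalty term of the objective can be evaluated directly in base R. The sketch below is illustrative only: the function name `group_penalty` and the coefficient values are hypothetical, and `beta` is assumed to be stored as a `p` by `m` matrix with one row per variable and one column per imputation.

```r
# Penalty term: lambda * sum_j adWeight_j * pf_j * sqrt(sum_k beta_jk^2)
# beta is a p x m matrix (row j = variable j, column k = imputation k).
group_penalty <- function(beta, lambda, adWeight, pf) {
    stopifnot(nrow(beta) == length(adWeight), nrow(beta) == length(pf))
    group_norms <- sqrt(rowSums(beta^2))  # L2 norm of each variable across imputations
    lambda * sum(adWeight * pf * group_norms)
}

# Hypothetical coefficients: p = 3 variables, m = 2 imputations
beta <- rbind(c(3, 4),
              c(0, 0),
              c(1, 0))

pen <- group_penalty(beta, lambda = 0.5, adWeight = c(1, 1, 2), pf = c(1, 1, 1))
```

A variable contributes nothing to the penalty only when its coefficients are zero in every imputation, which is what ties selection together across the imputed data sets.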

Value

An object with type galasso and subtype galasso.gaussian or galasso.binomial, depending on which family was used. Both subtypes contain the following elements:

lambda: Sequence of lambda values that were fit.
coef: A list of length m (one element per imputation) containing the coefficient estimates from running galasso at each value of lambda. Each element in the list is a nlambda x (p+1) matrix.
df: Number of nonzero betas at each value of lambda.
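
To show how the returned structure can be used to compare estimates across imputations, here is a base R sketch that works on a hand-built stand-in with the documented shape. The object `fit_like`, its values, and the helper `make_coef` are hypothetical; placing the intercept in the first column is also an assumption, since this page only states that each matrix is nlambda x (p+1).

```r
# Stand-in for a fitted object: m = 3 imputations, p = 4 predictors, 2 lambdas.
make_coef <- function(shift) {
    # rows = lambda values; columns = intercept (assumed first) + 4 predictors
    rbind(c(0.1, 0, 0, 0, 0),
          c(0.1, 1, 0, -0.5, 0) + shift * c(1, 1, 0, 1, 0))
}

fit_like <- list(
    lambda = c(1.0, 0.1),
    coef   = lapply(c(0, 0.02, -0.01), make_coef),
    df     = c(0, 2)
)

# Coefficients at the second lambda, one row per imputation
idx <- 2
across_imputations <- do.call(rbind, lapply(fit_like$coef, function(b) b[idx, ]))

# Spread of each coefficient across imputations
coef_range <- apply(across_imputations, 2, function(v) diff(range(v)))

# Predictors that are nonzero in at least one imputation (intercept dropped)
selected <- colSums(across_imputations[, -1, drop = FALSE] != 0) > 0
```

With a real fit, the same indexing would be applied to `fit$coef` in place of `fit_like$coef`.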

References

Du, J., Boss, J., Han, P., Beesley, L. J., Kleinsasser, M., Goutman, S. A., ... & Mukherjee, B. (2022). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. Journal of Computational and Graphical Statistics, 31(4), 1063-1075. <doi:10.1080/10618600.2022.2035739>

Examples


library(miselect)
library(mice)

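# Create 5 imputed versions of the example data set shipped with miselect
mids <- mice(miselect.df, m = 5, printFlag = FALSE)
# Extract each completed data set into a list
dfs <- lapply(1:5, function(i) complete(mids, action = i))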

# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
    x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
    y[[i]] <- dfs[[i]]$Y
}

# Unit penalty factors and unit adaptive weights: all variables are penalized equally
pf       <- rep(1, 20)
adWeight <- rep(1, 20)

fit <- galasso(x, y, pf, adWeight)
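
The example above uses unit adaptive weights. As a rough sketch of how non-unit weights might be built, the base R code below derives `adWeight` from a vector of initial coefficient estimates. The vector `beta_init`, the sample size `n`, and the formula `1 / (|beta| + 1 / n)` are assumptions chosen for illustration (inverse-magnitude weighting is a common choice for adaptive LASSO); this page does not prescribe a particular weighting scheme.

```r
# Hypothetical inputs: in practice beta_init could come from a first-stage fit
# with unit weights, with the intercept excluded.
n         <- 500
beta_init <- c(1.2, 0, -0.4, 0.05)

# Large initial estimates get small weights (light penalty); estimates of zero
# get the largest weight, capped at n by the 1 / n offset.
adWeight <- 1 / (abs(beta_init) + 1 / n)
```

The `1 / n` offset keeps the weights finite when an initial estimate is exactly zero, so the corresponding variable is heavily penalized rather than excluded outright.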

umich-cphds/minet documentation built on March 9, 2024, 8:08 p.m.