galasso: Multiple Imputation Grouped Adaptive LASSO


View source: R/galasso.R

Description

galasso fits an adaptive LASSO for multiply imputed data. It supports both continuous and binary responses.

Usage

galasso(
  x,
  y,
  pf,
  adWeight,
  family = c("gaussian", "binomial"),
  nlambda = 100,
  lambda.min.ratio = ifelse(all.equal(adWeight, rep(1, p)), 0.001, 1e-06),
  lambda = NULL,
  maxit = 10000,
  eps = 1e-05
)

Arguments

x

A length m list of n x p numeric matrices. No matrix should contain an intercept or any missing values.

y

A length m list of length n numeric response vectors. No vector should contain missing values.

pf

Penalty factor. Can be used to differentially penalize certain variables; see the sketch after this argument list.

adWeight

Numeric vector of length p representing the adaptive weights for the L1 penalty.

family

The type of response. "gaussian" implies a continuous response and "binomial" implies a binary response. Default is "gaussian".

nlambda

Length of the automatically generated 'lambda' sequence. If 'lambda' is non-NULL, 'nlambda' is ignored. Default is 100.

lambda.min.ratio

Ratio that determines the minimum value of 'lambda' when automatically generating a 'lambda' sequence. If 'lambda' is not NULL, 'lambda.min.ratio' is ignored. Default is 1e-3 when all adaptive weights equal 1, and 1e-6 otherwise (see Usage).

lambda

Optional numeric vector of lambdas to fit. If NULL, galsso will automatically generate a lambda sequence based on 'nlambda' and 'lambda.min.ratio'. Default is NULL.

maxit

Maximum number of iterations to run. Default is 10000.

eps

Tolerance for convergence. Default is 1e-5.
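
As a hedged sketch of how 'pf' and 'adWeight' might be set (the one assumption, taken from the objective function under Details, is that pf_j and â_j multiply the penalty on variable j, so a penalty factor of 0 leaves that variable unpenalized):

pf <- rep(1, 20)       # unit penalty factors for p = 20 predictors
pf[1:2] <- 0           # exempt X1 and X2 from the penalty (assumes 0 = unpenalized)
adWeight <- rep(1, 20) # unit weights reduce the adaptive LASSO to an ordinary group LASSO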

Details

galasso works by adding a group penalty to the aggregated objective function to ensure selection consistency across imputations. The objective function is:

argmin_{β_{jk}} -L(β_{jk} | X_{ijk}, Y_{ik}) + λ Σ_{j=1}^{p} â_j · pf_j · √(Σ_{k=1}^{m} β_{jk}²)

where L is the log-likelihood, â_j are the adaptive weights, and pf_j are the penalty factors. Simulations suggest that the "stacked" objective function approach (i.e., saenet) tends to be more computationally efficient and to have better estimation and selection properties. However, the advantage of galasso is that it allows one to examine the differences between coefficient estimates across imputations.
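
As a minimal illustration of the grouping, the sketch below evaluates the penalty term of the objective above for a hypothetical p x m coefficient matrix; because the m imputation-specific coefficients for variable j enter through a single Euclidean norm, they are shrunk to zero, or kept, together:

# Sketch of the group penalty term. 'beta' is a hypothetical p x m matrix:
# row j holds the m imputation-specific coefficients beta_{j1}, ..., beta_{jm}.
group_penalty <- function(beta, lambda, adWeight, pf) {
    lambda * sum(adWeight * pf * sqrt(rowSums(beta^2)))
}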

Value

An object with type "galasso" and subtype "galasso.gaussian" or "galasso.binomial", depending on which family was used. Both subtypes have 4 elements:

lambda

Sequence of lambda values fit.

beta

A (p + 1) x nlambda matrix representing the estimated betas at each value of lambda. The betas are constructed as the average of the betas from each imputation.

df

Number of nonzero betas at each value of lambda.

mse

For objects with subtype "galasso.gaussian", the training MSE for each value of lambda.

dev

For objects with subtype "galasso.binomial", the training deviance for each value of lambda.

References

Du, J., Boss, J., Han, P., Beesley, L. J., Goutman, S. A., Batterman, S., Feldman, E. L., and Mukherjee, B. (2020). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. arXiv:2003.07398.

Examples

library(miselect)
library(mice)

mids <- mice(miselect.df, m = 5, printFlag = FALSE)
dfs <- lapply(1:5, function(i) complete(mids, action = i))

# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
    x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
    y[[i]] <- dfs[[i]]$Y
}

pf       <- rep(1, 20)
adWeight <- rep(1, 20)

fit <- galasso(x, y, pf, adWeight)
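
# A hedged continuation, using only the elements documented under Value.
# Training MSE is optimistic, so in practice lambda would normally be
# chosen by cross-validation rather than this way.
best <- which.min(fit$mse)
fit$lambda[best]                # selected lambda
fit$df[best]                    # number of nonzero betas at that lambda
coef_best <- fit$beta[, best]   # (p + 1)-vector: intercept plus 20 slopes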
