In DLMRMV: Distributed Linear Regression Models with Response Missing Variables

DfiMI_lasso

R Documentation

Distributed Full-information Multiple Imputation (DfiMI) using LASSO

Description

Imputes missing values of the response variable Y by LASSO regression on the predictors, using R independent runs with M stochastic imputations per run.

Usage

DfiMI_lasso(data, R, M)

Arguments

data

A data.frame where:

First column: Response Y (may contain NA)
Remaining columns: Numeric predictors
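The layout described above (response first, numeric predictors after) can be arranged with base R before calling the function. The snippet below is a minimal sketch; the data frame raw and the column names x1, x2 and y are made up for illustration and are not part of the package.

```r
# Hypothetical raw data with the response stored in the last column
raw <- data.frame(
  x1 = c(0.5, 1.2, -0.3, 0.8),
  x2 = c(1.0, 0.1, 0.7, -1.4),
  y  = c(2.1, NA, 3.4, 4.0)
)

response <- "y"

# Move the response to the first column; the predictors keep their order
data <- raw[, c(response, setdiff(names(raw), response))]

# Confirm that every remaining column is a numeric predictor
stopifnot(all(vapply(data[-1], is.numeric, logical(1))))

names(data)            # column order after rearranging
sum(is.na(data[[1]]))  # number of missing responses to be imputed
```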

R

Positive integer – number of simulation runs for stable coefficient estimation.

M

Positive integer – number of multiple imputations per run.

Details

This function extends the Distributed Full-information Multiple Imputation (DfiMI) approach by using LASSO regression for imputing missing values in the response variable Y. LASSO regression is particularly useful for high-dimensional predictor spaces and can handle multicollinearity among predictors. The function performs the following steps:

1. Initialize missing values in Y.
2. Fit LASSO regression models on complete cases.
3. Average coefficients across multiple imputations and runs.
4. Predict missing values using the final averaged coefficients.

The function requires the glmnet package for LASSO regression.
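The steps listed above can be sketched with glmnet roughly as follows. This is an illustration under stated assumptions, not the package's source: the helper name sketch_lasso_impute, the fixed penalty lambda = 0.05, the mean initialisation, and the use of bootstrap resampling of the complete cases as the source of randomness are all assumptions made for this example.

```r
library(glmnet)

# Hypothetical helper illustrating the four steps; NOT the DLMRMV source code.
sketch_lasso_impute <- function(data, R, M, lambda = 0.05) {
  y <- data[[1]]
  x <- as.matrix(data[, -1, drop = FALSE])  # glmnet needs at least 2 predictor columns
  miss <- is.na(y)
  obs_idx <- which(!miss)

  # Step 1: initialise missing Y with the observed mean (placeholder values)
  yhat <- y
  yhat[miss] <- mean(y[obs_idx])

  # Step 2: fit a LASSO model on resampled complete cases, R * M times
  betas <- matrix(NA_real_, nrow = R * M, ncol = ncol(x) + 1)
  k <- 0
  for (r in seq_len(R)) {
    for (m in seq_len(M)) {
      k <- k + 1
      boot <- sample(obs_idx, replace = TRUE)  # assumed source of randomness
      fit <- glmnet(x[boot, , drop = FALSE], y[boot], alpha = 1, lambda = lambda)
      betas[k, ] <- as.vector(as.matrix(coef(fit)))  # intercept first, then slopes
    }
  }

  # Step 3: average the coefficients over all imputations and runs
  betahat <- colMeans(betas)

  # Step 4: replace the placeholders with predictions from the averaged fit
  if (any(miss)) {
    yhat[miss] <- drop(cbind(1, x[miss, , drop = FALSE]) %*% betahat)
  }
  list(Yhat = yhat, betahat = betahat)
}

# Usage on simulated data: 50 observed responses and 10 missing ones
set.seed(1)
demo <- data.frame(Y = c(rnorm(50), rep(NA, 10)), X1 = rnorm(60), X2 = rnorm(60))
out <- sketch_lasso_impute(demo, R = 3, M = 5)
anyNA(out$Yhat)
```

In this sketch betahat holds the intercept followed by one slope per predictor; whether the packaged function returns the intercept in the same position is not stated in this documentation.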

Value

A named list containing:

Yhat: Numeric vector – original Y values with missing values replaced by imputations.
betahat: Numeric vector – final regression coefficients.

Examples

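library(DLMRMV)

set.seed(123)
data <- data.frame(
  Y = c(rnorm(50), rep(NA, 10)),  # 50 observed + 10 missing
  X1 = rnorm(60),
  X2 = rnorm(60)
)
res <- DfiMI_lasso(data, R = 3, M = 5)
head(res$Yhat)   # observed Y values, with missing entries imputed
res$betahat      # final regression coefficients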

DLMRMV documentation built on Aug. 8, 2025, 6:27 p.m.