Nothing
#' Imputation by direct use of lasso linear regression
#'
#' Imputes univariate missing normal data using lasso linear regression with bootstrap.
#'
#' @aliases mice.impute.lasso.norm lasso.norm
#' @inheritParams mice.impute.norm.boot
#' @param nfolds The number of folds for the cross-validation of the lasso penalty.
#' The default is 10.
#' @return Vector with imputed data, same type as \code{y}, and of length
#' \code{sum(wy)}
#' @details
#' The method consists of the following steps:
#' \enumerate{
#' \item For a given y variable under imputation, draw a bootstrap version y*
#' with replacement from the observed cases \code{y[ry]}, and stores in x* the
#' corresponding values from \code{x[ry, ]}.
#' \item Fit a regularised (lasso) linear regression with y* as the outcome,
#' and x* as predictors.
#' A vector of regression coefficients bhat is obtained.
#' All of these coefficients are considered random draws from the imputation model
#' parameters posterior distribution.
#' Same of these coefficients will be shrunken to 0.
#' \item Draw the imputed values from the predictive distribution defined by
#' the original (non-bootstrap) data, bhat, and estimated error variance.
#' }
#' The method is based on the Direct Use of Regularized Regression (DURR) proposed by
#' Zhao & Long (2016) and Deng et al (2016).
#' @author Edoardo Costantini, 2021
#' @references
#'
#' Deng, Y., Chang, C., Ido, M. S., & Long, Q. (2016). Multiple imputation for
#' general missing data patterns in the presence of high-dimensional data.
#' Scientific reports, 6(1), 1-10.
#'
#' Zhao, Y., & Long, Q. (2016). Multiple imputation in the presence of
#' high-dimensional data. Statistical Methods in Medical Research, 25(5),
#' 2021-2035.
#'
#' @family univariate imputation functions
#' @keywords datagen
#' @export
mice.impute.lasso.norm <- function(y, ry, x, wy = NULL, nfolds = 10, ...) {
install.on.demand("glmnet", ...)
# Bootstrap sample
if (is.null(wy)) wy <- !ry
n1 <- sum(ry)
s <- sample(n1, n1, replace = TRUE)
x_glmnet <- cbind(1, x)
dotxobs <- x_glmnet[ry, , drop = FALSE][s, , drop = FALSE]
dotyobs <- y[ry][s]
# Train imputation model
cv_lasso <- glmnet::cv.glmnet(
x = dotxobs, y = dotyobs,
family = "gaussian",
nfolds = nfolds,
alpha = 1
)
# Obtain imputations
s2hat <- mean((predict(cv_lasso, dotxobs, s = "lambda.min") - dotyobs)^2)
as.vector(predict(cv_lasso, x_glmnet[wy, ], s = "lambda.min")) +
rnorm(sum(wy), 0, sqrt(s2hat))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.