impute: Impute: Filling Missing Values

View source: R/impute.R

impute    R Documentation


Description

Imputation is the process of replacing missing data with substituted values. It addresses three main problems that missing data cause: missing data can introduce a substantial amount of bias, make handling and analysing the data more arduous, and reduce efficiency.

Usage

impute(
  .data,
  vars = everything(),
  algorithm = "mice",
  m = 10,
  method = NULL,
  FUN = median,
  info = TRUE,
  ...
)

is_imputed(.data)

get_mice(.data)

Arguments

.data

data set with missing values to impute

vars

columns of .data to impute. Defaults to everything() and supports the tidyselect language.

algorithm

algorithm to use for imputation, must be "mice" or "single-point", see Details. For "single-point", every missing value is replaced using FUN (median by default).

m

number of multiple imputations when using MICE, see mice::mice(). The m imputed values of each missing cell are summarised into a single value using FUN.

method

imputation method(s) for MICE, passed on to mice::mice()

FUN

function used directly for single-point imputation, or, with MICE, to summarise the m imputed values of each missing cell into one value

info

logical to indicate whether information about the imputation should be printed

...

arguments to pass on to mice::mice()

Details

Imputation can be done with a single-point method, which replaces every missing value of a variable with one summary statistic such as the mean or the median, or with Multivariate Imputation by Chained Equations (MICE). MICE is much more reliable than single-point imputation, but also much slower.
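
The single-point approach can be sketched in base R as below. single_point_impute() is a hypothetical helper written for illustration only, not the implementation used by impute(); it also keeps a logical mask of the filled cells, similar in spirit to what is_imputed() reports.

```r
# Hypothetical helper (not part of this package): replaces every NA in the
# selected columns with FUN applied to that column's observed values.
single_point_impute <- function(data, cols = names(data), FUN = median) {
  # Logical mask of the cells that are about to be imputed
  imputed <- as.data.frame(lapply(data[cols], is.na))
  for (col in cols) {
    missing <- is.na(data[[col]])
    if (any(missing)) {
      # One summary value of the observed data fills all gaps in this column
      data[[col]][missing] <- FUN(data[[col]][!missing])
    }
  }
  attr(data, "imputed") <- imputed
  data
}

df <- data.frame(x = c(1, NA, 3, 10), y = c(NA, 2, 4, 6))
out <- single_point_impute(df)
out$x  # 1 3 3 10
out$y  # 4 2 4 6
```

Because every gap in a column receives the same value, single-point imputation shrinks the variance of that column, which is one reason MICE is preferred.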

The suggested and default method is MICE. The generated MICE object is stored as an attribute of the data and can be retrieved with get_mice(); it contains all details of the imputation. MICE is also known as fully conditional specification and sequential regression multiple imputation. It was designed for data that are missing at random, though there is simulation evidence to suggest that with a sufficient number of auxiliary variables it can also work on data that are missing not at random.
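
The MICE route, including summarising the m imputations with a function, can be sketched directly with the mice package. This assumes mice is installed; the data, the choice of m = 5, the "pmm" method and the collapsing loop are illustrative assumptions, not the code behind impute().

```r
# Assumes the mice package is installed; object names are illustrative only
df <- iris[, 1:4]
df[2, 2] <- NA
df[3, 3] <- NA
df[5, 1] <- NA

m <- 5
FUN <- median

# Run MICE; predictive mean matching ("pmm") is mice's default for numeric data
imp <- mice::mice(df, m = m, method = "pmm", seed = 1, printFlag = FALSE)

# imp$where is a logical matrix marking the cells that mice imputed
imputed_mask <- as.data.frame(imp$where)

# Extract the m completed data sets
completed <- lapply(seq_len(m), function(i) mice::complete(imp, action = i))

result <- df
for (col in names(df)) {
  missing <- is.na(df[[col]])
  if (!any(missing)) next
  # One row per missing cell, one column per imputation
  candidates <- matrix(
    vapply(completed, function(d) d[[col]][missing], numeric(sum(missing))),
    nrow = sum(missing)
  )
  # Collapse the m candidate values of each cell into one value with FUN
  result[[col]][missing] <- apply(candidates, 1, FUN)
}
```

Setting FUN <- mean averages the imputations instead. Note that the usual mice workflow pools analysis results over the m data sets rather than the imputed values themselves; collapsing values into one data set trades that for a single, easy-to-use result.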

Use is_imputed() to get a data.frame containing TRUE for every value that was imputed.

Examples

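# Copy iris and introduce some missing values, including one in Species
iris2 <- dplyr::as_tibble(iris)
iris2[2, 2] <- NA
iris2[3, 3] <- NA
iris2[4, 5] <- NA
# Compare the original and the modified data
iris
iris2

# Default: MICE with m = 10 imputations
result <- iris2 |> impute()
result

# Single-point imputation (FUN = median by default)
iris2 |> impute(algorithm = "single-point")
# Only impute the Sepal.* columns
iris2 |>
  impute(vars = starts_with("Sepal"),
         algorithm = "single-point")
# Only impute columns of type double, explicitly with the median
iris2 |>
  impute(vars = where(is.double),
         algorithm = "single-point",
         FUN = median)

# Which values were imputed, and the stored MICE object
result |> is_imputed()
result |> get_mice()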

certe-medical-epidemiology/certestats documentation built on Nov. 9, 2024, 8:15 p.m.