imputeData: Multiple Imputations

Description

Adds multiple imputations for missing indicator observations in a Nature Index data set, using multivariate imputation by chained equations.

Usage

imputeData(x = NULL, nSim = 1000, transConst = 0.01, ...)

Arguments

x

list of class niInput.

nSim

integer, number of imputations, default is 1000.

transConst

numeric scalar, 0<transConst<=0.1. Transformation constant in log-transformation. Default is 0.01.

...

further arguments passed on to mice::mice.

Details

Two general approaches for imputing multivariate data have emerged: joint modeling (JM) and fully conditional specification (FCS), also known as multivariate imputation by chained equations (MICE). JM involves specifying a multivariate distribution for the missing data, and drawing imputations from their conditional distributions by Markov Chain Monte Carlo (MCMC) techniques. This methodology is attractive if the multivariate distribution is a reasonable description of the data. FCS specifies the multivariate imputation model on a variable-by-variable basis by a set of conditional densities, one for each incomplete variable. Starting from an initial imputation, FCS draws imputations by iterating over the conditional densities.

A JM approach using the R package Amelia has been tested for missing indicator observations in Nature Index data sets. This approach was, however, not robust when implemented as a general method for all indicators. The routine often crashed when the joint distribution model (multivariate normal) was not suitable, and sometimes led to fatal errors in the CPU. imputeData therefore adopts the FCS approach using the routine mice::mice.

imputeData performs multiple imputations for all missing indicator observations. Each imputation consists of an expected value, a lower quartile and the interquartile distance (ID). The upper quartile of the imputed indicator observation is calculated from the lower quartile and the interquartile distance.
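The relationship between the three imputed quantities can be sketched in R as follows. The numbers and variable names are hypothetical and only illustrate the arithmetic; they are not taken from the package source.

```r
# Hypothetical imputed quantities for one missing indicator observation
lowerQuartile <- 0.42
interquartileDistance <- 0.30   # the "ID" in the text above

# The upper quartile is derived rather than imputed directly
upperQuartile <- lowerQuartile + interquartileDistance
```

Imputing the interquartile distance instead of the upper quartile itself guarantees that the derived upper quartile is never below the lower quartile, as long as the imputed distance is non-negative.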

Indicator observations are normalized against their corresponding reference value and thereafter log-transformed before imputation modeling. The imputation model includes five variables in the order logmean, loglower, logID, year, indicator. A common pattern of missing values in the data is that all data for some indicators are missing for some years. This leads to empty cell problems if the imputation model includes interactions between year and indicator. The imputation model therefore does not contain interaction terms.
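A minimal sketch of this preparation step is given below. It assumes that the transformation has the form log(x + transConst) and that the interquartile distance is scaled in the same way as the other quantities; the exact form used inside imputeData is not stated here. The data frame obs and its column names are hypothetical.

```r
transConst <- 0.01

# Hypothetical indicator observations; the third one is missing
obs <- data.frame(
  indicator = c("A", "A", "B"),
  year      = c(2010, 2015, 2010),
  expected  = c(40, 55, NA),
  lower     = c(30, 45, NA),
  upper     = c(52, 70, NA),
  refValue  = c(100, 100, 20)
)

# Normalize against the reference value
scaledMean  <- obs$expected / obs$refValue
scaledLower <- obs$lower / obs$refValue
scaledID    <- (obs$upper - obs$lower) / obs$refValue

# Log-transform (assumed form) and arrange the five model variables
# in the order logmean, loglower, logID, year, indicator
modelData <- data.frame(
  logmean   = log(scaledMean + transConst),
  loglower  = log(scaledLower + transConst),
  logID     = log(scaledID + transConst),
  year      = obs$year,
  indicator = factor(obs$indicator)
)
```

The constant keeps the logarithm finite when a scaled value is exactly zero, which is presumably why transConst must be strictly positive.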

By default, imputeData uses predictive mean matching as the imputation method and calls mice::mice with the arguments m = nSim and method = c("pmm", "pmm", "pmm", "", ""). The empty strings mean that year and indicator are not imputed themselves.
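The call pattern can be tried on simulated data along the following lines. This is only a sketch: the data frame toy is made up, m is kept small for speed, and the internals of imputeData may differ. The code skips the imputation step if the mice package is not installed.

```r
set.seed(1)
n <- 60

# Simulated stand-in for the five model variables (not real Nature Index data)
toy <- data.frame(
  logmean   = rnorm(n),
  loglower  = rnorm(n),
  logID     = rnorm(n),
  year      = rep(c(2000, 2010, 2020), each = 20),
  indicator = factor(rep(paste0("ind", 1:4), times = 15))
)

# Make three observations missing in all three imputed variables
toy[c(3, 17, 42), c("logmean", "loglower", "logID")] <- NA

if (requireNamespace("mice", quietly = TRUE)) {
  # Predictive mean matching for the first three variables;
  # "" leaves year and indicator unimputed
  imp <- mice::mice(toy,
                    m = 5,
                    method = c("pmm", "pmm", "pmm", "", ""),
                    printFlag = FALSE)

  # Extract the first of the m completed data sets
  completed <- mice::complete(imp, 1)
}
```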

The argument nSim determines the number of imputations. A continuous probability distribution is fitted to each imputation by elicitate. imputeData draws and returns one observation from each distribution.
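The idea of fitting a distribution to one imputation and drawing a single value from it can be illustrated as below. This is not the algorithm used by elicitate: as a simplifying assumption, the sketch fits a log-normal distribution in closed form to the two quartiles only and ignores the expected value. All numbers are hypothetical.

```r
# Hypothetical back-transformed quartiles from one imputation
q1 <- 0.42
q3 <- 0.72

# Closed-form log-normal parameters that reproduce both quartiles
z       <- qnorm(0.75)
meanlog <- (log(q1) + log(q3)) / 2
sdlog   <- (log(q3) - log(q1)) / (2 * z)

# Draw one observation from the fitted distribution
set.seed(123)
oneDraw <- rlnorm(1, meanlog = meanlog, sdlog = sdlog)
```

Repeating this for each of the nSim imputations of a missing observation would give one row of draws for that observation.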

Value

A list of class niImputations containing two elements:
identifiers: a data.frame with variables relating each imputed indicator observation to a missing observation in the data set x.
imputations: a numeric matrix where each row represents a missing indicator observation in the corresponding data set and contains single draws from each of nSim imputed distributions.
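
Working with a result of this shape might look as follows. The object below is a mock with the two elements described above; the column names in identifiers are hypothetical, and a real niImputations object returned by imputeData may carry different identifier variables.

```r
set.seed(42)
nSim <- 1000

# Mock object with the same two elements (not produced by imputeData)
mockImputations <- list(
  identifiers = data.frame(
    indicator = c("indA", "indB"),   # hypothetical column names
    year      = c(2010, 2010)
  ),
  imputations = matrix(rlnorm(2 * nSim, meanlog = -1, sdlog = 0.3),
                       nrow = 2, ncol = nSim)
)

# One row per missing observation, one column per imputation
dim(mockImputations$imputations)

# Quartiles of the draws for each missing observation
drawSummary <- t(apply(mockImputations$imputations, 1,
                       quantile, probs = c(0.25, 0.5, 0.75)))
cbind(mockImputations$identifiers, drawSummary)
```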

Author(s)

Bård Pedersen

See Also

imputeDiagnostics, impStand, mice::mice, elicitate, and calculateIndex.
The vignette objectsInNIcalc gives a more detailed description of niImputations and niInput lists.

Examples

## Not run: 
# maxit and printFlag are passed on to mice::mice through ...
imputedValues <- imputeData(x = themeData,
                            nSim = 1000,
                            transConst = 0.01,
                            maxit = 20,
                            printFlag = TRUE)

## End(Not run)


NINAnor/NIcalc documentation built on Oct. 26, 2023, 9:37 a.m.