imputeData | R Documentation |
Multivariate imputation by chained equations. Add multiple imputations for missing indicator observations in a Nature Index data set.
imputeData(x = NULL, nSim = 1000, transConst = 0.01, ...)
x | list of class |
nSim | integer, number of imputations, default is 1000. |
transConst | numeric scalar, |
... | further arguments passed on to |
Two general approaches for imputing multivariate data have emerged: joint modeling (JM) and fully conditional specification (FCS), also known as multivariate imputation by chained equations (MICE). JM involves specifying a multivariate distribution for the missing data, and drawing imputations from their conditional distributions by Markov Chain Monte Carlo (MCMC) techniques. This methodology is attractive if the multivariate distribution is a reasonable description of the data. FCS specifies the multivariate imputation model on a variable-by-variable basis by a set of conditional densities, one for each incomplete variable. Starting from an initial imputation, FCS draws imputations by iterating over the conditional densities.
A JM approach using the R-package Amelia has been tested for missing indicator observations in Nature Index data sets. This approach was, however, not robust when implemented as a general method for all indicators. The routine often crashed when the joint distribution model (multivariate normal) was not suitable and sometimes led to fatal errors in the CPU. imputeData therefore adopts the FCS approach using the routine mice::mice.
imputeData performs multiple imputations for all missing indicator
observations. Each imputation consists of an expected value, a lower quartile
and the interquartile distance (ID). The upper quartile of the imputed
indicator observation is calculated from the lower quartile and the
interquartile distance.
Indicator observations are normalized against their corresponding reference
value and thereafter log-transformed before imputation modeling.
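The transformation and the reconstruction of the upper quartile can be sketched in base R as follows. This is a minimal illustration and not the package's internal code: the data frame obs, its column names, and the way transConst enters the transformation (added before taking logs so that zeros can be transformed) are assumptions made for this example.

```r
# Toy indicator observations (hypothetical values and column names).
obs <- data.frame(
  expected = c(40, 0, 75),
  lower    = c(30, 0, 60),
  upper    = c(55, 5, 90),
  refValue = c(100, 50, 100)
)
transConst <- 0.01  # ASSUMPTION: added before the log to avoid log(0)

# Normalize against the reference value.
scaled <- obs[, c("expected", "lower", "upper")] / obs$refValue

# Log-transform the three quantities that enter the imputation model.
logmean  <- log(scaled$expected + transConst)
loglower <- log(scaled$lower + transConst)
logID    <- log(scaled$upper - scaled$lower + transConst)

# Back-transform and recover the upper quartile as lower quartile + ID.
lowerBack <- exp(loglower) - transConst
idBack    <- exp(logID) - transConst
upperBack <- lowerBack + idBack
```

Modeling the interquartile distance rather than the upper quartile itself means that a back-transformed upper quartile cannot fall below the lower quartile, as long as the imputed distance is non-negative.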
The imputation model includes five variables in the order logmean, loglower, logID, year, indicator. A common
pattern of missing values in the data is that all data for some indicators
are missing for some years. This leads to empty cell problems if the
imputation model includes interactions between year and indicator.
The imputation model therefore does not contain interaction terms.
By default, imputeData uses predictive mean matching as the imputation
method and calls the function mice in package mice with arguments
m = nSim and method = c("pmm", "pmm", "pmm", "", "").
The argument nSim determines the number of imputations.
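A call of this form can be tried on toy data, as in the sketch below. The data frame toy and its simulated values are hypothetical, and nSim is kept small for speed; the sketch only illustrates how the method vector lines up with the five model variables, not how imputeData prepares its input. The code runs only if the mice package is installed.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  set.seed(1)
  # Toy data (hypothetical): 4 indicators observed over 10 years.
  toy <- expand.grid(year = 2001:2010, indicator = factor(paste0("ind", 1:4)))
  toy$logmean  <- -1 + 0.02 * (toy$year - 2000) + rnorm(nrow(toy), sd = 0.3)
  toy$loglower <- toy$logmean - abs(rnorm(nrow(toy), mean = 0.3, sd = 0.1))
  toy$logID    <- rnorm(nrow(toy), mean = -2, sd = 0.3)
  toy <- toy[, c("logmean", "loglower", "logID", "year", "indicator")]

  # All data for one indicator are missing in two of the years.
  gap <- toy$indicator == "ind2" & toy$year %in% c(2004, 2008)
  toy[gap, c("logmean", "loglower", "logID")] <- NA

  nSim <- 5  # kept small here; the documented default is 1000
  imp <- mice::mice(toy,
                    m = nSim,
                    method = c("pmm", "pmm", "pmm", "", ""),
                    printFlag = FALSE)

  # One row per missing cell, one column per imputation.
  dim(imp$imp$logmean)
}
```

The empty strings in method tell mice not to impute year and indicator, which are complete and act only as predictors.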
A continuous probability distribution is fitted to each imputation by
elicitate. imputeData draws and returns one observation
from each distribution.
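The fit-then-draw step can be illustrated with a simple stand-in. The sketch below does not call elicitate and does not reproduce its fitting procedure; it assumes a log-normal distribution and matches it to the two quartiles in closed form, ignoring the expected value. All values are hypothetical.

```r
set.seed(42)
# One imputed observation on the original scale (hypothetical values).
lowerQ <- 0.30
upperQ <- 0.55   # lower quartile + interquartile distance

# Closed-form log-normal with these quartiles:
# log(q25) = meanlog - z * sdlog, log(q75) = meanlog + z * sdlog, z = qnorm(0.75)
z       <- qnorm(0.75)
meanlog <- (log(lowerQ) + log(upperQ)) / 2
sdlog   <- (log(upperQ) - log(lowerQ)) / (2 * z)

# Check that the fitted distribution reproduces the quartiles.
qlnorm(c(0.25, 0.75), meanlog, sdlog)

# Draw a single observation from the fitted distribution.
draw <- rlnorm(1, meanlog, sdlog)
```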
A list of class niImputations containing two elements:
identifiers: a data.frame with variables relating each imputed
indicator observation to a missing observation in the data set x.
imputations: a numeric matrix where each row represents a missing
indicator observation in the corresponding data set and contains single
draws from each of nSim imputed distributions.
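A matrix with this layout can be summarized row by row. The sketch below builds a mock list that only mimics the documented shape; the column names in identifiers and the simulated draws are hypothetical, and the object is not given the niImputations class.

```r
set.seed(7)
nSim <- 1000
# Mock object mimicking the documented layout (column names are hypothetical).
mock <- list(
  identifiers = data.frame(indName = c("indA", "indB"), year = c(2005, 2010)),
  imputations = matrix(rlnorm(2 * nSim, meanlog = -1, sdlog = 0.3),
                       nrow = 2, ncol = nSim)
)

# Row-wise summary: quartiles and median of the nSim draws per missing observation.
summ <- t(apply(mock$imputations, 1, quantile, probs = c(0.25, 0.5, 0.75)))
cbind(mock$identifiers, summ)
```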
Bård Pedersen
imputeDiagnostics, impStand, mice::mice, elicitate, and calculateIndex.
The vignette objectsInNIcalc gives a more detailed description of niImputations and niInput lists.
## Not run:
imputedValues <- imputeData(x = themeData,
                            nSim = 1000,
                            transConst = 0.01,
                            maxit = 20,
                            printFlag = TRUE)
## End(Not run)