merge_imputations: Merges multiple imputed data frames into a single data frame

Description Usage Arguments Details Value Note References Examples

View source: R/merge_imputations.R


This function merges multiple imputed data frames from mids-objects into a single data frame by computing the mean or selecting the most likely imputed value.


merge_imputations(dat, imp, ori = NULL, summary = c("none", "dens", "hist",
  "sd"), filter = NULL)



The data frame that was imputed and used as argument in the mice-function call.


The mids-object with the imputed data frames from dat.


Optional, if ori is specified, the imputed variables are appended to this data frame; else, a new data frame with the imputed variables is returned.


After merging multiple imputed data, summary displays a graphical summary of the "quality" of the merged values, compared to the original imputed values.


Creates a density plot, which shows the distribution of the mean of the imputed values for each variable at each observation. The larger the areas overlap, the better is the fit of the merged value compared to the imputed value.


Similar to summary = "dens", however, mean and merged values are shown as histogram. Bins should have almost equal height for both groups (mean and merged).


Creates a dot plot, where data points indicate the standard deviation for all imputed values (y-axis) at each merged value (x-axis) for all imputed variables. The higher the standard deviation, the less precise is the imputation, and hence the merged value.


A character vector with variable names that should be plotted. All non-defined variables will not be shown in the plot.


This method merges multiple imputations of variables into a single variable by computing the (rounded) mean of all imputed values of missing values. By this, each missing value is replaced by those values that have been imputed the most times.

imp must be a mids-object, which is returned by the mice()-function of the mice-package. merge_imputations() than creates a data frame for each imputed variable, by combining all imputations (as returned by the complete-function) of each variable, and computing the row means of this data frame. The mean value is then rounded for integer values (and not for numerical values with fractional part), which corresponds to the most frequent imputed value (mode) for a missing value. Missings in the original variable are replaced by the most frequent imputed value.


A data frame with (merged) imputed variables; or ori with appended imputed variables, if ori was specified. If summary is included, returns a list with the data frame data with (merged) imputed variables and some other summary information, which are required for the plot-output.


Typically, further analyses are conducted on pooled results of multiple imputed data sets (see pool), however, sometimes (in social sciences) it is also feasible to compute the mean or mode of multiple imputed variables (see Burns et al. 2011).


Burns RA, Butterworth P, Kiely KM, Bielak AAM, Luszcz MA, Mitchell P, et al. 2011. Multiple imputation was an efficient method for harmonizing the Mini-Mental State Examination with missing item-level data. Journal of Clinical Epidemiology;64:787–93 doi: 10.1016/j.jclinepi.2010.10.011


imp <- mice(nhanes)

# return data frame with imputed variables
merge_imputations(nhanes, imp)

# append imputed variables to original data frame
merge_imputations(nhanes, imp, nhanes)

# show summary of quality of merging imputations
merge_imputations(nhanes, imp, summary = "dens", filter = c("chl", "hyp"))

strengejacke/sjmisc documentation built on June 21, 2018, 4:55 a.m.