Analysis of Multiply Imputed Data Sets
In mitml: Tools for Multiple Imputation in Multilevel Modeling

library(knitr)
set.seed(123)
options(width=87)
opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE,
               fig.width=9, fig.height=9, warning=FALSE,
               message=FALSE)

This vignette is intended to provide an overview of the analysis of multiply imputed data sets with mitml. Specifically, this vignette addresses the following topics:

Working with multiply imputed data sets
Rubin's rules for pooling individual parameters
Model comparisons
Parameter constraints

Further information can be found in the other vignettes and the package documentation.

Example data (`studentratings`)

For the purposes of this vignette, we make use of the studentratings data set, which contains simulated data from 750 students in 50 schools including scores on reading and math achievement, socioeconomic status (SES), and ratings on school and classroom environment.

The package and the data set can be loaded as follows.

library(mitml)
library(lme4)
data(studentratings)

As evident from its summary, most variables in the data set contain missing values.

summary(studentratings)
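As a complement to the summary, the amount of missing data per variable can be tabulated directly. The following is a minimal sketch using only base R functions on the data set loaded above; the object name n.miss is chosen here for illustration.

```r
library(mitml)
data(studentratings)

# Number of missing values in each variable
n.miss <- colSums(is.na(studentratings))
n.miss

# The same information as a percentage of all students
round(100 * n.miss / nrow(studentratings), 1)
```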

In the present example, we investigate the differences in mathematics achievement that can be attributed to differences in SES when controlling for students' sex. Specifically, we are interested in the following model.

$$ \mathit{MA}_{ij} = \gamma_{00} + \gamma_{10} \mathit{Sex}_{ij} + \gamma_{20} (\mathit{SES}_{ij}-\overline{\mathit{SES}}_{\bullet j}) + \gamma_{01} \overline{\mathit{SES}}_{\bullet j} + u_{0j} + e_{ij} $$

Note that this model also employs group-mean centering to separate the individual and group-level effects of SES.
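To illustrate what group-mean centering does, the following sketch applies it to a small, made-up data set with two groups. The data frame toy and its values are hypothetical, and the base R function ave is used here only as a stand-in for complete data.

```r
# Hypothetical toy data: six students in two schools
toy <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  SES = c(40, 50, 60, 55, 65, 75)
)

# Group means of SES (ave applies mean within each level of ID by default)
toy$G.SES <- ave(toy$SES, toy$ID)

# Individual deviations from the group mean
toy$I.SES <- toy$SES - toy$G.SES

toy
```

The group means carry the between-group differences in SES, whereas the centered scores vary only within groups, so the two predictors are uncorrelated by construction.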

Generating imputations

In the present example, we generate 20 imputations from the following imputation model.

fml <- ReadDis + SES ~ 1 + Sex + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 200, m = 20)

The completed data are then extracted with mitmlComplete.

implist <- mitmlComplete(imp, "all")

Transforming the imputed data sets

In empirical research, the raw data rarely enter the analyses directly but often need to be transformed beforehand. For this purpose, the mitml package provides the within function, which applies a given transformation directly to each data set.

In the following, we use this to (a) calculate the group means of SES and (b) center the individual scores around their group means.

implist <- within(implist, {
  G.SES <- clusterMeans(SES, ID) # calculate group means
  I.SES <- SES - G.SES           # center around group means
})

This method can be used to apply arbitrary transformations to all of the completed data sets simultaneously.

Note regarding dplyr: Due to how it is implemented, within cannot be used directly with dplyr. Instead, users may use with in place of within with the following workaround.

implist <- with(implist, {
  df <- data.frame(as.list(environment()))
  df <- ... # dplyr commands
  df
})
implist <- as.mitml.list(implist)

Advanced users may also consider using lapply for a similar workaround.
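The lapply route mentioned in the note can be sketched as follows. This is only an illustration under stated assumptions: the list toy is a hypothetical stand-in for a list of completed data sets, and base R code is used inside the function where data-frame commands such as those from dplyr would normally go.

```r
library(mitml)

# Hypothetical stand-in for a list of completed data sets
toy <- list(
  data.frame(ID = c(1, 1, 2, 2), SES = c(40, 50, 60, 70)),
  data.frame(ID = c(1, 1, 2, 2), SES = c(42, 48, 61, 69))
)
toy <- as.mitml.list(toy)

# Apply the same transformation to every data set
toy <- lapply(toy, function(df) {
  df$G.SES <- ave(df$SES, df$ID)   # group means (data-frame commands go here)
  df$I.SES <- df$SES - df$G.SES    # centered scores
  df
})

# lapply returns a plain list, so the mitml.list class is restored
toy <- as.mitml.list(toy)
```

Converting the result back with as.mitml.list keeps the object usable with functions such as with and within.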

Fitting the analysis model

To analyze the imputed data, the analysis model is fitted to each data set using regular complete-data techniques. For this purpose, mitml offers the with function. In the present example, we use it to fit the model of interest with the R package lme4.

fit <- with(implist, {
  lmer(MathAchiev ~ 1 + Sex + I.SES + G.SES + (1|ID))
})

This results in a list of fitted models, one for each of the imputed data sets.

Pooling

The results obtained from the imputed data sets must be pooled in order to obtain a set of final parameter estimates and inferences. In the following, we employ a number of pooling methods that address common statistical tasks, such as (a) estimating and testing individual parameters, (b) model comparisons, and (c) tests of constraints about one or several parameters.

Parameter estimates

Individual parameters are commonly pooled with the rules developed by Rubin (1987). In mitml, Rubin's rules are implemented in the testEstimates function.

testEstimates(fit)
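For intuition, Rubin's rules for a single parameter can be computed by hand. The following sketch uses made-up estimates and standard errors (qhat and se are hypothetical values, not results from the model above) and is not meant to reproduce the internals of testEstimates.

```r
# Hypothetical estimates and standard errors of one coefficient
# obtained from m = 5 imputed data sets
qhat <- c(1.0, 1.2, 0.8, 1.1, 0.9)
se   <- c(0.20, 0.22, 0.21, 0.19, 0.20)
m    <- length(qhat)

qbar <- mean(qhat)            # pooled point estimate
W    <- mean(se^2)            # within-imputation variance
B    <- var(qhat)             # between-imputation variance
Tvar <- W + (1 + 1/m) * B     # total variance

riv <- (1 + 1/m) * B / W      # relative increase in variance due to nonresponse
df  <- (m - 1) * (1 + 1/riv)^2  # degrees of freedom (Rubin, 1987)

t.value <- qbar / sqrt(Tvar)
p.value <- 2 * pt(abs(t.value), df = df, lower.tail = FALSE)

c(estimate = qbar, se = sqrt(Tvar), t = t.value, df = df, p = p.value, RIV = riv)
```

The total variance exceeds the average complete-data variance whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters the pooled standard error.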

In addition, the argument extra.pars = TRUE can be used to obtain pooled estimates of variance components, and df.com can be used to specify the complete-data degrees of freedom, which provides more appropriate (i.e., conservative) inferences in smaller samples.

For example, using a conservative value for the complete-data degrees of freedom for the fixed effects in the model of interest (Snijders & Bosker, 2012), the output changes as follows.

testEstimates(fit, extra.pars = TRUE, df.com = 46)

Multiple parameters and model comparisons

Oftentimes, statistical inference concerns more than one parameter at a time. For example, the combined influence of SES (within and between groups) on mathematics achievement is represented by two parameters in the model of interest.

Multiple pooling methods for Wald and likelihood ratio tests (LRTs) are implemented in the testModels function. This function requires the specification of a full model and a restricted model, which are then compared using (pooled) Wald tests or LRTs. Specifically, testModels allows users to pool Wald tests ($D_1$), $\chi^2$ test statistics ($D_2$), and LRTs ($D_3$ and $D_4$; for a comparison of these methods, see also Grund, Lüdtke, & Robitzsch, 2016b).

To examine the combined influence of SES on mathematics achievement, the following restricted model can be specified and compared with the model of interest (using $D_1$).

fit.null <- with(implist, {
  lmer(MathAchiev ~ 1 + Sex + (1|ID))
})

testModels(fit, fit.null)

Note regarding the order of arguments: Please note that testModels expects that the first argument contains the full model, and the second argument contains the restricted model. If the order of the arguments is reversed, the results will not be interpretable.

Similar to the test for individual parameters, smaller samples can be accommodated with testModels (with method $D_1$) by specifying the complete-data degrees of freedom for the denominator of the $F$ statistic.

testModels(fit, fit.null, df.com = 46)

The pooling method used by testModels is determined by the method argument. For example, to calculate the pooled LRT corresponding to the Wald test above (i.e., $D_3$), the following command can be issued.

testModels(fit, fit.null, method="D3")
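To make one of these pooling methods concrete, the following sketch pools a set of $\chi^2$ statistics in the manner of $D_2$, using the formula as it is commonly given in the multiple imputation literature. The helper pool_d2 and the values in d are hypothetical, and the implementation in testModels may differ in its details.

```r
# Pool m chi-square statistics (each with k degrees of freedom) in the manner of D2
pool_d2 <- function(d, k) {
  m   <- length(d)
  # relative increase in variance, based on the spread of sqrt(d)
  r2  <- (1 + 1/m) * var(sqrt(d))
  # pooled test statistic
  D2  <- (mean(d) / k - (m + 1) / (m - 1) * r2) / (1 + r2)
  # denominator degrees of freedom of the F reference distribution
  df2 <- k^(-3 / m) * (m - 1) * (1 + 1 / r2)^2
  p   <- pf(D2, df1 = k, df2 = df2, lower.tail = FALSE)
  list(D2 = D2, r2 = r2, df1 = k, df2 = df2, p = p)
}

# Hypothetical chi-square statistics from m = 5 data sets, each with k = 2
d   <- c(8.1, 9.4, 7.6, 10.2, 8.8)
res <- pool_d2(d, k = 2)
res
```

Because this approach requires only the test statistics from each data set, it can be used when parameter estimates and their covariance matrices are not available.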

Constraints on parameters

Finally, it is often useful to investigate functions (or constraints) of the parameters in the model of interest. In complete data sets, this can be achieved with a test of linear hypotheses or the delta method. The mitml package implements a pooled version of the delta method in the testConstraints function.

For example, the combined influence of SES on mathematics achievement can also be tested without model comparisons by testing the constraint that the parameters pertaining to I.SES and G.SES are both zero. This constraint is defined and tested as follows.

c1 <- c("I.SES", "G.SES")
testConstraints(fit, constraints = c1)

This test is identical to the Wald test given in the previous section. Arbitrary constraints on the parameters can be specified and tested in this manner, where each character string denotes an expression to be tested against zero.

In the present example, we are also interested in the contextual effect of SES on mathematics achievement (e.g., Snijders & Bosker, 2012). The contextual effect is simply the difference between the coefficients pertaining to G.SES and I.SES and can be tested as follows.

c2 <- c("G.SES - I.SES")
testConstraints(fit, constraints = c2)
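For intuition, the complete-data counterpart of this constraint is a Wald test of a linear contrast. The following sketch uses simulated data and an ordinary regression with lm, which ignores the multilevel structure. All object names and generating values are hypothetical, and the code illustrates the general idea rather than what testConstraints computes.

```r
set.seed(1)

# Simulated complete data: 20 groups with 10 observations each
n.groups <- 20
n.per    <- 10
ID    <- rep(seq_len(n.groups), each = n.per)
SES   <- rnorm(n.groups * n.per, mean = 50, sd = 10) +
         rep(rnorm(n.groups, sd = 5), each = n.per)
G.SES <- ave(SES, ID)        # group means
I.SES <- SES - G.SES         # centered scores
y     <- 10 + 0.3 * I.SES + 0.8 * G.SES + rnorm(length(SES), sd = 5)

fit.lm <- lm(y ~ I.SES + G.SES)

# Contrast vector for "G.SES - I.SES", in the order of coef(fit.lm)
a <- c("(Intercept)" = 0, "I.SES" = -1, "G.SES" = 1)

est  <- sum(a * coef(fit.lm))                        # estimated contrast
se   <- sqrt(drop(t(a) %*% vcov(fit.lm) %*% a))      # its standard error
wald <- (est / se)^2                                 # Wald statistic, 1 df
p    <- pchisq(wald, df = 1, lower.tail = FALSE)

c(estimate = est, se = se, wald = wald, p = p)
```

The same contrast is obtained as the coefficient of G.SES when the uncentered SES is entered in place of I.SES, which is why it is interpreted as a contextual effect.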

Similar to model comparisons, constraints can be tested with different methods ($D_1$ and $D_2$), and smaller samples can be accommodated by specifying a value for df.com. Further examples for the analysis of multiply imputed data sets with mitml are given by Enders (2016) and Grund, Lüdtke, and Robitzsch (2016a).

References

Enders, C. K. (2016). Multiple imputation as a flexible tool for missing data handling in clinical research. Behaviour Research and Therapy. doi: 10.1016/j.brat.2016.11.008

Grund, S., Lüdtke, O., & Robitzsch, A. (2016a). Multiple imputation of multilevel missing data: An introduction to the R package pan. SAGE Open, 6(4), 1–17. doi: 10.1177/2158244016668220

Grund, S., Lüdtke, O., & Robitzsch, A. (2016b). Pooling ANOVA results from multiply imputed datasets: A simulation study. Methodology, 12, 75–88. doi: 10.1027/1614-2241/a000111

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.

cat("Author: Simon Grund (simon.grund@uni-hamburg.de)\nDate:  ", as.character(Sys.Date()))

mitml documentation built on March 31, 2023, 7:01 p.m.
