# pool: Combine estimates by Rubin's rules (mice: Multivariate Imputation by Chained Equations)

## Description

The `pool()` function combines the estimates from `m` repeated complete-data analyses. A typical multiple imputation analysis proceeds as follows:

1. Impute the missing data with the `mice()` function, resulting in a multiply imputed data set (class `mids`);

2. Fit the model of interest (scientific model) on each imputed data set with the `with()` function, resulting in an object of class `mira`;

3. Pool the estimates from each model into a single set of estimates and standard errors, resulting in an object of class `mipo`;

4. Optionally, compare pooled estimates from different scientific models by the `D1()` or `D3()` functions.
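The four steps can be sketched as follows, assuming the `mice` package is installed and using its bundled `nhanes` data. The two models, the number of imputations and the seed are illustrative choices only; `D1()` performs a multivariate Wald test of whether the larger model `fit1` improves on the nested model `fit0`.

```r
library(mice)

# Step 1: impute the missing values (m = 5 imputed data sets, class "mids")
imp <- mice(nhanes, m = 5, seed = 123, print = FALSE)

# Step 2: fit the scientific model on each imputed data set (class "mira")
fit1 <- with(imp, lm(bmi ~ age + chl))

# Step 3: pool the m sets of estimates by Rubin's rules (class "mipo")
pooled <- pool(fit1)
summary(pooled)

# Step 4 (optional): compare with a nested model
fit0 <- with(imp, lm(bmi ~ age))
D1(fit1, fit0)
```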

A common error is to reverse steps 2 and 3, i.e., to pool the multiply-imputed data instead of the estimates. Doing so may severely bias the estimates of scientific interest and yield incorrect statistical intervals and p-values. The `pool()` function will detect this case.
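To make the distinction concrete, here is a sketch (assuming `mice` is installed) that contrasts the incorrect route, fitting one model to the stacked imputed data, with the correct route of fitting per imputed data set and then pooling. The stacked fit treats all `m * n` rows as independent observations and ignores the between-imputation variance, so it understates the uncertainty.

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, print = FALSE)

# Incorrect: stack the imputed data sets and fit a single model.
# The 5 * 25 rows are treated as independent observations.
long <- complete(imp, action = "long")
wrong <- lm(bmi ~ age + chl, data = long)

# Correct: fit the model on each imputed data set, then pool the estimates
right <- pool(with(imp, lm(bmi ~ age + chl)))

# Compare the standard errors of the two approaches
summary(wrong)$coefficients[, "Std. Error"]
summary(right)
```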

## Usage

```r
pool(object, dfcom = NULL)
```

## Arguments

`object`: An object of class `mira` (produced by `with.mids()` or `as.mira()`), or a `list` of model fits.

`dfcom`: A positive number giving the degrees of freedom of the complete-data analysis. Normally, this is the number of independent observations minus the number of fitted parameters. The default (`dfcom = NULL`) extracts this information in the following order: 1) the component `df.residual` returned by `glance()`, if a `glance()` method is found; 2) the result of `df.residual()` applied to the first fitted model; and 3) the value `999999`. In the last case, the warning `"Large sample assumed"` is printed. If the extracted degrees of freedom are incorrect, specify the appropriate value manually.
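Setting `dfcom` by hand might look like this, assuming `mice` is installed. For `lm(bmi ~ age + chl)` on the 25-row `nhanes` data, the complete-data degrees of freedom are 25 observations minus 3 fitted parameters (intercept and two slopes), i.e. 22; the model itself is only an illustration.

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, print = FALSE)
fit <- with(imp, lm(bmi ~ age + chl))

# Complete-data df: independent observations minus fitted parameters
dfcom <- nrow(nhanes) - length(coef(fit$analyses[[1]]))  # 25 - 3 = 22
pooled <- pool(fit, dfcom = dfcom)
summary(pooled)
```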

## Details

The `pool()` function averages the estimates of the complete data model, computes the total variance over the repeated analyses by Rubin's rules (Rubin, 1987, p. 76), and computes the following diagnostic statistics per estimate:

1. Relative increase in variance due to nonresponse `r`;

2. Residual degrees of freedom for hypothesis testing `df`;

3. Proportion of total variance due to missingness `lambda`;

4. Fraction of missing information `fmi`.
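For reference, the standard form of Rubin's rules for one scalar parameter is written out below, for $m$ imputations with estimates $\hat{Q}_\ell$ and squared standard errors $U_\ell$. Here $\nu$ denotes the degrees of freedom `df` (item 2 above). The `fmi` expression is the small-sample version as we understand `mice` to compute it; treat that detail as an assumption.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{\ell=1}^{m} \hat{Q}_\ell,
\qquad \bar{U} = \frac{1}{m}\sum_{\ell=1}^{m} U_\ell,
\qquad B = \frac{1}{m-1}\sum_{\ell=1}^{m} \bigl(\hat{Q}_\ell - \bar{Q}\bigr)^2, \\
T &= \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B, \\
r &= \frac{(1 + 1/m)\,B}{\bar{U}},
\qquad \lambda = \frac{(1 + 1/m)\,B}{T},
\qquad \mathrm{fmi} = \frac{r + 2/(\nu + 3)}{r + 1}.
\end{aligned}
```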

The function requires the following input from each fitted model:

1. the estimates of the model, usually obtainable by `coef()`;

2. the standard error of each estimate;

3. the residual degrees of freedom of the model.
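These three inputs can be illustrated in base R for a single `lm()` fit, using the built-in `mtcars` data purely as an example. The helper name `fit_inputs()` is hypothetical; `pool()` itself obtains these quantities through `broom` rather than this way, and base accessors such as `vcov()` are reliable for `lm` but not necessarily for other model classes.

```r
# Hypothetical helper: extract what pooling needs from one fitted model
fit_inputs <- function(fit) {
  list(
    estimate  = coef(fit),               # 1. point estimates
    std.error = sqrt(diag(vcov(fit))),   # 2. standard errors
    df        = df.residual(fit)         # 3. residual degrees of freedom
  )
}

inputs <- fit_inputs(lm(mpg ~ wt, data = mtcars))
inputs$df  # 32 rows minus 2 coefficients
```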

The degrees of freedom calculation for the pooled estimates uses the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999).
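The pooling arithmetic and the Barnard-Rubin adjustment can be sketched in base R for one scalar parameter. The function name `rubin_pool()` is hypothetical, and the lower bound of `1e-4` on `lambda` is an assumption added to avoid a division by zero when the between-imputation variance is zero; it is not stated in the text above.

```r
# Hypothetical helper: pool m estimates of one scalar parameter.
# Q: the m point estimates; U: their squared standard errors;
# dfcom: complete-data degrees of freedom (Inf = large sample).
rubin_pool <- function(Q, U, dfcom = Inf) {
  m      <- length(Q)
  qbar   <- mean(Q)                   # pooled estimate
  ubar   <- mean(U)                   # within-imputation variance
  b      <- var(Q)                    # between-imputation variance
  t      <- ubar + (1 + 1 / m) * b    # total variance
  r      <- (1 + 1 / m) * b / ubar    # relative increase in variance
  lambda <- (1 + 1 / m) * b / t       # proportion of variance due to missingness
  # Barnard-Rubin (1999) small-sample degrees of freedom
  lam    <- max(lambda, 1e-4)         # assumption: guard against b = 0
  dfold  <- (m - 1) / lam^2
  dfobs  <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lam)
  df     <- if (is.infinite(dfcom)) dfold else dfold * dfobs / (dfold + dfobs)
  fmi    <- (r + 2 / (df + 3)) / (r + 1)
  list(estimate = qbar, se = sqrt(t), df = df, r = r, lambda = lambda, fmi = fmi)
}

res <- rubin_pool(Q = c(1, 2, 3), U = c(0.5, 0.5, 0.5), dfcom = 20)
res$r       # 8/3
res$lambda  # 8/11
```

With a finite `dfcom`, the adjusted `df` is always smaller than the large-sample value `(m - 1) / lambda^2`, which is the point of the Barnard-Rubin correction.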

The `pool()` function relies on `broom::tidy()` to extract the parameters. Versions before `mice 3.8.5` failed when no `broom::glance()` method was found for extracting the residual degrees of freedom. The `pool()` function is now more forgiving: it falls back to `df.residual()` and, if that also fails, assumes a large sample (see the `dfcom` argument).
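To see the kind of output that `pool()` works with, one can call the `broom` generics on a single fit. A minimal sketch, assuming `broom` is installed; `mtcars` is used only for illustration.

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

# One row per parameter, with the estimate and its standard error
tidy(fit)[, c("term", "estimate", "std.error")]

# One-row model summary; df.residual is the complete-data df (32 - 2)
glance(fit)$df.residual
```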

In versions prior to `mice 3.0` pooling required only that `coef()` and `vcov()` methods were available for fitted objects. This feature is no longer supported. The reason is that `vcov()` methods are inconsistent across packages, leading to buggy behaviour of the `pool()` function.

Since `mice 3.0`, the `broom` package takes care of extracting the relevant parts of the complete-data analysis. You may see messages such as `Error: No tidy method for objects of class ...` or `Error: No glance method for objects of class ...`. These messages mean that the complete-data method used in `with(imp, ...)` has no `tidy` or `glance` method defined in the `broom` package.

The `broom.mixed` package contains `tidy` and `glance` methods for mixed models. If you are using a mixed model, first run `library(broom.mixed)` before calling `pool()`.

If no `tidy` or `glance` methods are defined for your analysis, tabulate the `m` parameter estimates and their variance estimates (the squares of the standard errors) from the `m` fitted models stored in `fit$analyses`. For each parameter, run `pool.scalar()` to obtain the pooled parameter estimate, its variance, the degrees of freedom, the relative increase in variance and the fraction of missing information.
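The manual route might look like this, assuming `mice` is installed. It pools the `chl` coefficient by hand; as we read `pool.scalar()`, the sample size `n` and the number of fitted parameters `k` are used to form the complete-data degrees of freedom `n - k`. The result element names used below (`qbar`, `t`, `df`, `r`, `fmi`) are our reading of the returned list.

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, print = FALSE)
fit <- with(imp, lm(bmi ~ age + chl))

# Tabulate the m estimates and variances (squared SEs) of the chl coefficient
Q <- sapply(fit$analyses, function(f) coef(f)[["chl"]])
U <- sapply(fit$analyses, function(f) vcov(f)["chl", "chl"])

# Pool: n observations, k fitted parameters (intercept, age, chl)
ps <- pool.scalar(Q, U, n = nrow(nhanes), k = 3)
ps[c("qbar", "t", "df", "r", "fmi")]
```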

An alternative is to write your own `glance()` and `tidy()` methods and add these to `broom` according to the specifications given in https://broom.tidymodels.org.
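A minimal sketch of this route, built around a hypothetical analysis function `mean_fit()` that returns an object of class `"mean_fit"`. The two S3 methods return the columns that `broom` conventionally uses (`term`, `estimate`, `std.error`, and `df.residual`); whether `pool()` expects further columns in a particular `mice` version is an open assumption.

```r
# Hypothetical complete-data analysis: the mean of x with its standard error
mean_fit <- function(x) {
  x <- x[!is.na(x)]
  structure(
    list(estimate = mean(x), se = sd(x) / sqrt(length(x)), n = length(x)),
    class = "mean_fit"
  )
}

# S3 method for the tidy() generic: one row per parameter
tidy.mean_fit <- function(x, ...) {
  data.frame(term = "mean", estimate = x$estimate, std.error = x$se)
}

# S3 method for the glance() generic: one row per model
glance.mean_fit <- function(x, ...) {
  data.frame(nobs = x$n, df.residual = x$n - 1)  # one parameter estimated
}

f <- mean_fit(c(2, 4, 6, 8))
tidy.mean_fit(f)
glance.mean_fit(f)
```

With `broom` and `mice` loaded, `pool(with(imp, mean_fit(bmi)))` should then dispatch to these methods through the generics.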

## Value

An object of class `mipo`, which stands for 'multiple imputation pooled outcome'.

## References

Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.

van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/

## See Also

`with.mids()`, `as.mira()`, `pool.scalar()`, `glance()`, `tidy()`; related discussion: https://github.com/amices/mice/issues/142, https://github.com/amices/mice/issues/274

## Examples

```r
# pool using the classic MICE workflow
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2)
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
summary(pool(fit))
```

### Example output

```
 iter imp variable
  1   1  bmi  hyp  chl
  1   2  bmi  hyp  chl
  2   1  bmi  hyp  chl
  2   2  bmi  hyp  chl
                    est         se           t       df   Pr(>|t|)       lo 95
(Intercept) 22.36432970 5.42078497  4.12566258 3.232511 0.02232769  5.79415166
hyp         -0.14484606 2.55004936 -0.05680128 6.096104 0.95651960 -6.36082475
chl          0.02288881 0.03078571  0.74348809 2.559354 0.51948266 -0.08535482
                 hi 95 nmis       fmi    lambda
(Intercept) 38.9345077   NA 0.6371618 0.4657088
hyp          6.0711326    8 0.4577303 0.3048944
chl          0.1311324   10 0.7014198 0.5336477
```

mice documentation built on Nov. 14, 2020, 5:07 p.m.