pool: Pool results from models fitted on multiply imputed datasets
In miceFast: Fast Imputations Using 'Rcpp' and 'Armadillo'

pool	R Documentation

Description

Combines parameter estimates and standard errors from models fitted on m multiply imputed datasets using Rubin's rules (Rubin, 1987). Degrees of freedom are adjusted using the Barnard-Rubin (1999) small-sample correction.

This function works with any fitted model that has coef and vcov methods, such as lm, glm, or survival::coxph.

Results are validated against mice::pool for lm, glm (logistic and Poisson), weighted regression, models with interactions, and varying numbers of imputations.

Usage

pool(fits, dfcom = NULL)

Arguments

`fits`	a list of fitted model objects of length m >= 2. Each model must support `coef()` and `vcov()` methods. All models must have the same number of coefficients.
`dfcom`	a positive integer or `Inf`. The complete-data degrees of freedom. If `NULL` (default), it is extracted from the fitted models via `df.residual`. Set to `Inf` to skip the Barnard-Rubin small-sample correction.
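
To illustrate what `dfcom` controls, the following is a minimal sketch of the Barnard-Rubin (1999) degrees-of-freedom adjustment in base R. The helper `barnard_rubin_df` is hypothetical and is not part of miceFast; it assumes the standard published formula, where `lambda` is the proportion of total variance attributable to missingness. Whether `pool` computes the adjustment in exactly this way is an assumption.

```r
# Sketch of the Barnard-Rubin (1999) degrees-of-freedom adjustment.
# `barnard_rubin_df` is a hypothetical helper, not a miceFast function.
barnard_rubin_df <- function(m, lambda, dfcom) {
  # Rubin (1987) large-sample degrees of freedom
  df_old <- (m - 1) / lambda^2
  if (is.infinite(dfcom)) {
    return(df_old)  # dfcom = Inf: no small-sample correction
  }
  # Observed-data degrees of freedom, shrunk by the missing information
  df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  # Combine the two sources of degrees of freedom
  df_old * df_obs / (df_old + df_obs)
}

df_large <- barnard_rubin_df(m = 5, lambda = 0.2, dfcom = Inf)
df_small <- barnard_rubin_df(m = 5, lambda = 0.2, dfcom = 50)
c(df_large = df_large, df_small = df_small)
```

Under this formula the adjusted value never exceeds `dfcom`, which is why the correction matters most when the complete-data sample is small.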

Value

A data.frame with one row per parameter and columns:

term: Coefficient name.
m: Number of imputations.
estimate: Pooled estimate (average across m models).
std.error: Pooled standard error (sqrt(t)).
statistic: t-statistic (estimate / std.error).
p.value: Two-sided p-value from a t-distribution with df degrees of freedom.
df: Degrees of freedom (Barnard-Rubin adjusted).
riv: Relative increase in variance due to nonresponse: (1 + 1/m) * b / ubar.
lambda: Proportion of total variance attributable to missingness: (1 + 1/m) * b / t.
fmi: Fraction of missing information.
ubar: Within-imputation variance (average of the m variance estimates).
b: Between-imputation variance (variance of the m point estimates).
t: Total variance: ubar + (1 + 1/m) * b.
dfcom: Complete-data degrees of freedom used.
conf.low: Lower bound of the 95% confidence interval.
conf.high: Upper bound of the 95% confidence interval.
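
The variance components listed above can be reproduced by hand. The following is a minimal sketch in base R, assuming only the formulas stated in this section; `rubin_by_hand` is a hypothetical helper, and the noisy copies of `mtcars` are a toy stand-in for properly imputed datasets, not a recommended imputation method.

```r
# Hand-rolled Rubin's rules for a list of fitted models (base R only).
# `rubin_by_hand` is a hypothetical helper, not part of miceFast.
rubin_by_hand <- function(fits) {
  m <- length(fits)
  # m x p matrix of point estimates, one row per fitted model
  est <- do.call(rbind, lapply(fits, coef))
  # m x p matrix of squared standard errors (diagonal of vcov)
  u <- do.call(rbind, lapply(fits, function(f) diag(vcov(f))))

  ubar <- colMeans(u)                 # within-imputation variance
  b <- apply(est, 2, var)             # between-imputation variance
  total <- ubar + (1 + 1 / m) * b     # total variance

  data.frame(
    term = colnames(est),
    m = m,
    estimate = colMeans(est),         # pooled estimate
    std.error = sqrt(total),
    riv = (1 + 1 / m) * b / ubar,
    lambda = (1 + 1 / m) * b / total,
    ubar = ubar,
    b = b,
    t = total,
    row.names = NULL
  )
}

# Toy stand-in for completed datasets: the same data with small noise
# added to the response, so that the m fits differ slightly.
set.seed(1)
toy <- lapply(1:5, function(i) {
  d <- mtcars
  d$mpg <- d$mpg + rnorm(nrow(d), sd = 0.5)
  d
})
toy_fits <- lapply(toy, function(d) lm(mpg ~ wt + hp, data = d))
res <- rubin_by_hand(toy_fits)
res
```

Note that `lambda` equals `riv / (1 + riv)`, which follows directly from the definitions of `riv` and `t`.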

References

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.

Barnard, J. and Rubin, D.B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86(4), 948-955.

Examples

library(miceFast)
set.seed(123)
data(air_miss)

# Step 1: Generate m = 5 completed datasets using fill_NA with a stochastic model
completed <- lapply(1:5, function(i) {
  dat <- air_miss
  dat$Ozone <- fill_NA(
    x = dat,
    model = "lm_bayes",
    posit_y = "Ozone",
    posit_x = c("Solar.R", "Wind", "Temp")
  )
  dat
})

# Step 2: Fit a model on each completed dataset
fits <- lapply(completed, function(d) {
  lm(Ozone ~ Solar.R + Wind + Temp, data = d)
})

# Step 3: Pool using Rubin's rules
pool(fits)

miceFast documentation built on Feb. 26, 2026, 5:06 p.m.