pool.mi: Pooling Multiple Imputation Results

Description Usage Arguments Details Value Note References Examples

View source: R/pool_mi_stage3.R

Description

Combines multiple parameter estimates (as used in MI) across the K imputed datasets using Rubin 1996 / 1987 formulas, including: calculating a pooled mean, standard error, missing data statistics, confidence intervals, and p-values.

Usage

1
2
3
4
5
6
7
8
pool.mi(
  to.pool,
  n = 999999,
  method = c("smallsample", "rubin"),
  alpha = 0.05,
  prt = TRUE,
  verbose = FALSE
)

Arguments

to.pool

An array of p x 2 x K, where p is the number of parameters to be pooled, 2 refers to the parameters of mean and standard deviation, and K imputation draws. The rownames of to.pool are kept in the results.

n

A number providing the sample size, which is used in calculating the degrees of freedom. If nothing is specified, a large sample is assumed. Has no effect if K = 1.

method

A string to indicate the method to calculate the degrees of freedom, df.t. If method = "smallsample" (the default) then the Barnard-Rubin adjustment for small degrees of freedom is used. Otherwise, the method from Rubin (1987) is used.

alpha

Type I error used to form the confidence interval. Default: 0.05.

prt

Boolean variable for printing out the standard output. If TRUE, selective parts of a pool.mi object are printed to the screen in an understandable fashion.

verbose

Logical; if TRUE, prints more information. Useful to check for any errors in the code. Default: FALSE.

Details

Stage 3 of Multiple Imputation: We assume that each complete-data estimate is normally distributed. But, given incomplete data, we assume a t-distribution, which forms the basis for confidence intervals and hypothesis tests.

The input is an array with p rows referring to the number of parameters to be combined. An estimate and within standard error forms the two columns of the array, which can be easily be taken as the first two columns of the coefficients element of the summary of a glm/lm object. The last dimension is the number of imputations, K. See dataset wqs.pool.test as an example.

Uses Rubin's rules to calculate the statistics of an imputed dataset including: the pooled mean, total standard error, a relative increase in variance, the fraction of missing information, a (1-alpha)

Value

A data-frame is returned with the following columns:

pooled.mean

The pooled univariate estimate, Qbar, formula (3.1.2) Rubin (1987).

pooled.total.se

The total standard error of the pooled estimate, formula (3.1.5) Rubin (1987).

pooled.total.var

The total variance of the pooled estimate, formula (3.1.5) Rubin (1987).

se.within

The standard error of mean of the variances (i.e. the pooled within-imputation variance), formula (3.1.3) Rubin (1987).

se.between

The between-imputation standard error, square root of formula (3.1.4) Rubin (1987).

relative.inc.var(r)

The relative increase in variance due to nonresponse, formula (3.1.7) Rubin (1987).

proportion.var.missing(lambda)

The proportion of variation due to nonresponse, formula (2.24) Van Buuren (2012).

frac.miss.info

The fraction missing information due to nonresponse, formula (3.1.10) Rubin (1987).

df.t

The degrees of freedom for the reference t-distribution, formula (3.1.6) Rubin (1987) or method of Barnard-Rubin (1999) (if method = "smallsample" (default)).

CI

The (1-alpha)% confidence interval (CI) for each pooled estimate.

p.value

The p-value used to test significance.

Note

Modified the pool.scalar (version R 3.4) in the mice package to handle multiple parameters at once in an array and combine them. Similar to mi.inference in the norm package, but the small-sample adjustment is missing.

References

Rubin, D. B. (1987). Multiple Imputation for nonresponse in surveys. New York: Wiley.

Rubin, D. B. (1996). Multiple Imputation After 18+ Years. Journal of the American Statistical Association, 91(434), 473–489. https://doi.org/10.2307/2291635.

Barnard, J., & Rubin, D. B. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika, 86(4), 948–955.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#### Example 1: Sample Dataset 87, using 10% BDL Scenario
data(wqs.pool.test)
# Example of the `to.pool` argument
head(wqs.pool.test)

# Pool WQS results and decrease in order of weights.
wqs.results.pooled <-   pool.mi(wqs.pool.test, n = 1000)
weight.dec <- c(order(wqs.results.pooled$pooled.mean[1:14], decreasing = TRUE), 15:16)
wqs.results.pooled <-  wqs.results.pooled[weight.dec, ]
wqs.results.pooled


# When there is 1 estimate (p = 1)
a <- pool.mi(wqs.pool.test[1, , , drop = FALSE], n = 1000)
a
# wqs.results.pooled["dieldrin", ]

# For single imputation (K = 1):
b <- pool.mi(wqs.pool.test[, , 1, drop = FALSE], n = 1000)
b

# Odds ratio and 95% CI using the CLT.
odds.ratio <- exp(wqs.results.pooled[15:16, c("pooled.mean", "CI.1", "CI.2")])
## makeJournalTables :: format.CI(odds.ratio, trim = TRUE, digits = 2, nsmall = 2)
odds.ratio

#  The mice package is suggested for the examples, but not needed for the function.

miWQS documentation built on April 3, 2021, 1:06 a.m.