pool_over_imputations: Pool Parameter Estimates Over Imputations

View source: R/S05_Statistics.R

pool_over_imputationsR Documentation

Pool Parameter Estimates Over Imputations

Description

Function to pool parameter estimates as well as their squared standard errors and associated degrees of freedom over imputation sets when using multiple imputation via chained equations (MICE). Estimates are pooled according to Rubin's rules. Input notation assumes I imputations and P parameters.

Usage

pool_over_imputations(estimates, standard_errors, degrees_of_freedom)

Arguments

estimates

An I x P matrix of parameter estimates.

standard_errors

An I x P matrix of standard errors.

degrees_of_freedom

An I x P matrix of degrees of freedom.

Details

Pools over multiple imputations using Rubin's rules. Final parameter estimates are the average over imputations, while standard errors are the combination of within variance (average of the estimates' variance, the square of the standard error, across imputations) and between variance (the variance of the point estimates across imputations). Finally, a correction is applied to the parameters' degrees of freedom.

Value

A matrix with the pooled estimates, standard errors, and associated degrees of freedom for the set of parameters.

References

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons.

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45 (3), 1-67. https://doi.org/10.18637/jss.v045.i03.

Examples

## Not run: 
# Simulation example
set.seed( 222 ) # For reproducibility

# Simulate 200 observations from
# 3 correlated variables
n <- 200
Sigma <- rbind(
  c(  1.0,  0.3,  0.2 ),
  c(  0.3,  1.0,  0.7 ),
  c(  0.2,  0.7,  1.0 )
)
Y <- MASS::mvrnorm(
  n, c( 0, 0, 0 ), Sigma
)
colnames(Y) <- c( 'Y1', 'Y2', 'Y3' )
Y <- data.frame( Y )

# Missing values for Y2 depend on both Y1 and Y2
Y$R2 <- rbinom(
  n, 1, logistic(
    logit(.3) + log(2)*Y$Y1 + log(4)*Y$Y2
  )
)
Y$Y2.M <- Y$Y2
Y$Y2.M[ Y$R2 == 1 ] <- NA

# Predict Y1 from Y2 (All cases)
lm_Y1_from_Y2 <- lm(
  Y1 ~ Y2, data = Y
)
print( round(
  summary( lm_Y1_from_Y2 )$coefficients[2, c(1:2, 4)], 3
) )

# Predict Y1 from Y2 (Complete cases)
lm_Y1_from_Y2.M <- lm(
  Y1 ~ Y2.M, data = Y
)
print( round(
  summary( lm_Y1_from_Y2.M )$coefficients[2, c(1:2, 4)], 3
) )

# Impute missing values of Y2 from Y3 via a simplistic
# version of predictive mean matching (Note better
# methods exist, approach is for example purposes only)

lm_Y2.M_from_Y3 <- lm(
  Y2.M ~ Y3, data = Y[ Y$R2 == 0, ]
)
Y2.M_pred <- predict(
  lm_Y2.M_from_Y3, newdata = Y[ Y$R2 == 1, ]
)

# Impute missing values 10 times, sampling randomly
# from 5 closest observed values
i <- sapply(
  1:10, function(m) {

    sapply(
      Y2.M_pred, function( yhat ) {
        smallest_diff <-
          order( abs( Y$Y2.M[ Y$R2 == 0 ] - yhat ) )
        return(
          Y$Y2.M[ Y$R2 == 0 ][smallest_diff][
            sample( 1:5, size = 1 )
          ]
        )
      }
    )

  }
)

# Predict Y1 from Y2 using imputed data sets
# and save estimates, standard errors, and
# degrees of freedom
cf <- sapply(
  1:10, function(m) {

    Y$Y2.I <- Y$Y2.M
    Y$Y2.I[ Y$R2 == 1 ] <- i[, m]

    lm_Y1_from_Y2.I <- lm(
      Y1 ~ Y2.I, data = Y
    )

    sm <- summary( lm_Y1_from_Y2.I )

    return(
      cbind( sm$coefficients[, 1:2], sm$df[2] )
    )

  }
)
cf <- array(
  cf, dim = c( 2, 3, 10 )
)

pooled <- pool_over_imputations(
  t( cf[, 1, ] ), # Coefficients
  t( cf[, 2, ] ), # Squared standard errors
  t( cf[, 3, ] ) # Degrees of freedom
)
print( round(
    c( pooled[2, 1:2],
       pt(
         abs( pooled[2, 1]/pooled[2, 2] ), pooled[2, 3], lower.tail = F
         )*2
  ), 3
) )

## End(Not run)


rettopnivek/arfpam documentation built on Oct. 20, 2024, 7:24 p.m.