# mice.impute.pmm: Imputation by predictive mean matching

In mice: Multivariate Imputation by Chained Equations

## Description

Imputation by predictive mean matching

## Usage

    mice.impute.pmm(
      y,
      ry,
      x,
      wy = NULL,
      donors = 5L,
      matchtype = 1L,
      ridge = 1e-05,
      use.matcher = FALSE,
      ...
    )

## Arguments

| Argument | Description |
| --- | --- |
| `y` | Vector to be imputed. |
| `ry` | Logical vector of length `length(y)` indicating the subset `y[ry]` of elements in `y` to which the imputation model is fitted. `ry` generally distinguishes the observed (`TRUE`) and missing (`FALSE`) values in `y`. |
| `x` | Numeric design matrix with `length(y)` rows containing the predictors for `y`. Matrix `x` must not contain missing values. |
| `wy` | Logical vector of length `length(y)`. A `TRUE` value indicates locations in `y` for which imputations are created. |
| `donors` | The size of the donor pool among which a draw is made. The default is `donors = 5L`. Setting `donors = 1L` always selects the closest match, but is not recommended. Values between `3L` and `10L` provide the best results in most cases (Morris et al, 2015). |
| `matchtype` | Type of matching distance. The default choice (`matchtype = 1L`) calculates the distance between the predicted value of `yobs` and the drawn values of `ymis` (called type-1 matching). Other choices are `matchtype = 0L` (distance between predicted values) and `matchtype = 2L` (distance between drawn values). |
| `ridge` | The ridge penalty used in `.norm.draw()` to prevent problems with multicollinearity. The default is `ridge = 1e-05`, which means that 0.001 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set `ridge = 1e-06` or even lower to reduce bias. For highly collinear data, set `ridge = 1e-04` or higher. |
| `use.matcher` | Logical. Set `use.matcher = TRUE` to specify the C function `matcher()`, the now deprecated matching function that was the default in versions 2.22 (June 2014) to 3.11.7 (Oct 2020). Since version 3.12.0, `mice()` uses the much faster `matchindex` C function. Use the deprecated `matcher` function only for exact reproduction. |
| `...` | Other named arguments. |
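The effect of the `ridge` argument can be illustrated in base R. The following is a minimal sketch of the documented formula, in which a fraction `ridge` of the diagonal is added to the cross-product before inversion. It is not the package's internal code, and the toy data and object names (`x1`, `plain_fails`, `V`) are made up for illustration.

```r
# Sketch of the ridge adjustment V = (S + diag(S) * ridge)^{-1}.
# Illustration only: the data and object names are hypothetical.
set.seed(1)
x1 <- rnorm(50)
x <- cbind(intercept = 1, a = x1, b = x1)  # column b duplicates column a

S <- crossprod(x)                          # cross-product matrix X'X

# Without a ridge, the duplicated column makes S singular and solve() fails
plain_fails <- inherits(try(solve(S), silent = TRUE), "try-error")

# Adding a small fraction of the diagonal makes the matrix invertible
ridge <- 1e-05
V <- solve(S + diag(diag(S)) * ridge)

plain_fails        # did the unpenalised inversion fail?
all(is.finite(V))  # is the ridge-adjusted inverse usable?
```

A larger `ridge` moves the matrix further from singularity, at the price of shrinking the estimates more, which is the trade-off described for this argument.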

## Details

Imputation of y by predictive mean matching, based on van Buuren (2012, p. 73). The procedure is as follows:

1. Calculate the cross-product matrix $S = X_{\mathrm{obs}}' X_{\mathrm{obs}}$.

2. Calculate $V = (S + \mathrm{diag}(S)\kappa)^{-1}$, with some small ridge parameter $\kappa$.

3. Calculate regression weights $\hat\beta = V X_{\mathrm{obs}}' y_{\mathrm{obs}}$.

4. Draw $q$ independent $N(0,1)$ variates in vector $\dot z_1$.

5. Calculate $V^{1/2}$ by Cholesky decomposition.

6. Calculate $\dot\beta = \hat\beta + \dot\sigma \dot z_1 V^{1/2}$.

7. Calculate $\dot\eta(i,j) = |X_{\mathrm{obs},[i]}\hat\beta - X_{\mathrm{mis},[j]}\dot\beta|$ with $i = 1, \dots, n_1$ and $j = 1, \dots, n_0$.

8. Construct $n_0$ sets $Z_j$, each containing $d$ candidate donors, from $Y_{\mathrm{obs}}$ such that $\sum_d \dot\eta(i,j)$ is minimum for all $j = 1, \dots, n_0$. Break ties randomly.

9. Draw one donor $i_j$ from $Z_j$ randomly for $j = 1, \dots, n_0$.

10. Calculate imputations $\dot y_j = y_{i_j}$ for $j = 1, \dots, n_0$.

The name predictive mean matching was proposed by Little (1988).
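The ten steps can be sketched in plain R. The function below, `pmm_sketch()`, is a hypothetical helper written for illustration and is not the package's implementation. It assumes type-1 matching, adds an intercept column to `x`, and draws $\dot\sigma$ from a scaled inverse chi-square distribution (the steps above use $\dot\sigma$ without saying how it is drawn, so that part is an assumption).

```r
# Hypothetical helper: a plain-R sketch of the documented steps.
pmm_sketch <- function(y, ry, x, donors = 5L, ridge = 1e-05) {
  x <- cbind(1, as.matrix(x))               # assumption: add an intercept
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]
  q <- ncol(x)

  S <- crossprod(xobs)                              # step 1
  V <- solve(S + diag(diag(S), nrow = q) * ridge)   # step 2
  beta_hat <- V %*% crossprod(xobs, yobs)           # step 3

  # Assumption: sigma is drawn as in a standard Bayesian linear model
  resid <- yobs - xobs %*% beta_hat
  df <- max(length(yobs) - q, 1)
  sigma_dot <- sqrt(sum(resid^2) / rchisq(1, df))

  z <- rnorm(q)                                     # step 4
  V_half <- chol(V)                                 # step 5: t(V_half) %*% V_half equals V
  beta_dot <- beta_hat + sigma_dot * t(V_half) %*% z  # step 6

  yhat_obs <- drop(xobs %*% beta_hat)   # predicted values for observed cases
  yhat_mis <- drop(xmis %*% beta_dot)   # drawn values for missing cases
  d <- min(donors, length(yobs))

  imp <- vapply(yhat_mis, function(target) {
    eta <- abs(yhat_obs - target)                        # step 7
    pool <- order(eta, runif(length(eta)))[seq_len(d)]   # step 8, random tie-breaking
    yobs[pool[sample.int(d, 1)]]                         # steps 9 and 10
  }, numeric(1))
  unname(imp)
}

# Usage on simulated data (all names here are made up)
set.seed(123)
n <- 100
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 2 * x[, "x1"] - x[, "x2"] + rnorm(n)
y[sample(n, 30)] <- NA
ry <- !is.na(y)
imp <- pmm_sketch(y, ry, x)
length(imp)
all(imp %in% y[ry])
```

Because every imputation is copied from an observed donor, the imputed values always lie within the set of observed values, which is the defining property of predictive mean matching.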

## Value

Vector with imputed data, same type as y, and of length sum(wy)

## Author(s)

Stef van Buuren, Karin Groothuis-Oudshoorn

## References

Little, R.J.A. (1988). Missing data adjustments in large surveys (with discussion). Journal of Business & Economic Statistics, 6, 287–301.

Morris, T.P., White, I.R., Royston, P. (2015). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med Res Methodol, 14:75.

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/

## See Also

Other univariate imputation functions: mice.impute.cart(), mice.impute.lda(), mice.impute.logreg.boot(), mice.impute.logreg(), mice.impute.mean(), mice.impute.midastouch(), mice.impute.mnar.logreg(), mice.impute.norm.boot(), mice.impute.norm.nob(), mice.impute.norm.predict(), mice.impute.norm(), mice.impute.polr(), mice.impute.polyreg(), mice.impute.quadratic(), mice.impute.rf(), mice.impute.ri()

## Examples

    library(mice)

    # We normally call mice.impute.pmm() from within mice()
    # But we may call it directly as follows (not recommended)

    set.seed(53177)
    xname <- c("age", "hgt", "wgt")
    r <- stats::complete.cases(boys[, xname])
    x <- boys[r, xname]
    y <- boys[r, "tv"]
    ry <- !is.na(y)
    table(ry)

    # proportion of missing data in tv
    sum(!ry) / length(ry)

    # Impute missing tv data
    yimp <- mice.impute.pmm(y, ry, x)
    length(yimp)
    hist(yimp, xlab = "Imputed missing tv")

    # Impute all tv data
    yimp <- mice.impute.pmm(y, ry, x, wy = rep(TRUE, length(y)))
    length(yimp)
    hist(yimp, xlab = "Imputed missing and observed tv")
    plot(jitter(y), jitter(yimp),
      main = "Predictive mean matching on age, height and weight",
      xlab = "Observed tv (n = 224)",
      ylab = "Imputed tv (n = 224)"
    )
    abline(0, 1)
    cor(y, yimp, use = "pair")

### Example output

    Attaching package: ‘mice’

    The following object is masked from ‘package:stats’:

        filter

    The following objects are masked from ‘package:base’:

        cbind, rbind

    ry
    FALSE  TRUE
      503   224
    [1] 0.6918845
    [1] 503
    [1] 727
    [1] 0.7415001


mice documentation built on Jan. 27, 2021, 5:10 p.m.