View source: R/mice.impute.pmm.R

mice.impute.pmm | R Documentation |

Imputation by predictive mean matching

mice.impute.pmm( y, ry, x, wy = NULL, donors = 5L, matchtype = 1L, exclude = -99999999, ridge = 1e-05, use.matcher = FALSE, ... )

`y` |
Vector to be imputed |

`ry` |
Logical vector of length |

`x` |
Numeric design matrix with |

`wy` |
Logical vector of length |

`donors` |
The size of the donor pool among which a draw is made.
The default is `donors = 5L`.

`matchtype` |
Type of matching distance. The default choice
( |

`exclude` |
Value or vector of values to exclude from the imputation donor pool in |

`ridge` |
The ridge penalty used in |

`use.matcher` |
Logical. Set |

`...` |
Other named arguments. |
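To make the roles of `donors` and `exclude` concrete, the following base R sketch builds a donor pool for a single missing case by hand. It is an illustration only, not the code that `mice` runs: the toy data, the predicted means `pred_obs` and `pred_mis`, and the helper `draw_donor()` are all hypothetical.

```r
# Hypothetical toy data: observed values and their predicted means.
y_obs    <- c(10, 12, 15, 18, 20, 25, 30)
pred_obs <- c(11, 12, 14, 19, 21, 24, 29)
pred_mis <- 18.5  # predicted mean for one case with missing y

# Hypothetical helper (not part of mice): drop observed values listed in
# `exclude`, keep the `donors` cases whose predicted means are closest to
# `pred_mis`, and draw one of their observed values at random.
draw_donor <- function(pred_mis, pred_obs, y_obs, donors = 5L, exclude = NULL) {
  keep <- !(y_obs %in% exclude)
  y_ok <- y_obs[keep]
  d <- abs(pred_obs[keep] - pred_mis)
  pool <- order(d)[seq_len(min(donors, length(d)))]
  y_ok[pool[sample.int(length(pool), 1L)]]
}

set.seed(1)
draws <- replicate(
  200,
  draw_donor(pred_mis, pred_obs, y_obs, donors = 3L, exclude = 18)
)
table(draws)
```

With `exclude = 18` the closest observed case is removed from the pool, so every draw comes from the three next-closest cases. A smaller `donors` value keeps imputations closer to the best match at the cost of reusing the same donors more often.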

Imputation of `y` by predictive mean matching, based on
van Buuren (2012, p. 73). The procedure is as follows:

1. Calculate the cross-product matrix $S = X_{obs}'X_{obs}$.
2. Calculate $V = (S + \mathrm{diag}(S)\kappa)^{-1}$, with some small ridge parameter $\kappa$.
3. Calculate regression weights $\hat\beta = V X_{obs}' y_{obs}$.
4. Draw $q$ independent $N(0,1)$ variates in vector $\dot z_1$.
5. Calculate $V^{1/2}$ by Cholesky decomposition.
6. Calculate $\dot\beta = \hat\beta + \dot\sigma \dot z_1 V^{1/2}$.
7. Calculate $\dot\eta(i,j) = |X_{obs,[i]}\hat\beta - X_{mis,[j]}\dot\beta|$ with $i = 1,\dots,n_1$ and $j = 1,\dots,n_0$.
8. Construct $n_0$ sets $Z_j$, each containing $d$ candidate donors, from $Y_{obs}$ such that $\sum_d \dot\eta(i,j)$ is minimum for all $j = 1,\dots,n_0$. Break ties randomly.
9. Draw one donor $i_j$ from $Z_j$ randomly for $j = 1,\dots,n_0$.
10. Calculate imputations $\dot y_j = y_{i_j}$ for $j = 1,\dots,n_0$.
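The procedure above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation in `mice`: the function name `pmm_sketch()` is hypothetical, an intercept column is added to `x`, and the draw of $\dot\sigma$ (which the procedure uses but does not define) is assumed to be the usual square root of the residual sum of squares divided by a chi-square variate.

```r
# Hypothetical helper, not part of mice: a plain transcription of the steps.
pmm_sketch <- function(y, ry, x, donors = 5L, ridge = 1e-05) {
  x <- cbind(1, as.matrix(x))            # assumption: add an intercept column
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]

  s <- crossprod(xobs)                                   # S = X_obs' X_obs
  v <- solve(s + diag(diag(s), nrow = nrow(s)) * ridge)  # V = (S + diag(S) kappa)^-1
  v <- (v + t(v)) / 2                                    # enforce exact symmetry
  beta_hat <- v %*% crossprod(xobs, yobs)                # beta_hat = V X_obs' y_obs

  # Assumption: sigma_dot is drawn as sqrt(RSS / chi-square draw).
  resid <- yobs - xobs %*% beta_hat
  df <- max(length(yobs) - ncol(xobs), 1)
  sigma_dot <- sqrt(sum(resid^2) / rchisq(1, df))

  z <- rnorm(ncol(xobs))                 # q independent N(0, 1) variates
  v_half <- t(chol(v))                   # lower Cholesky factor of V
  beta_dot <- beta_hat + sigma_dot * (v_half %*% z)

  # Predicted means: beta_hat for observed cases, beta_dot for missing cases.
  yhat_obs <- drop(xobs %*% beta_hat)
  yhat_mis <- drop(xmis %*% beta_dot)

  vapply(yhat_mis, function(target) {
    eta <- abs(yhat_obs - target)
    # The d closest candidates; the random second key breaks ties randomly.
    pool <- order(eta, runif(length(eta)))[seq_len(min(donors, length(eta)))]
    yobs[pool[sample.int(length(pool), 1L)]]  # draw one donor, copy its y
  }, numeric(1), USE.NAMES = FALSE)
}

# Usage on simulated data (all values here are made up for illustration).
set.seed(1)
n <- 200
x <- matrix(rnorm(2 * n), ncol = 2)
y <- drop(1 + x %*% c(2, -1) + rnorm(n))
ry <- runif(n) > 0.3
imp <- pmm_sketch(y, ry, x)
length(imp) == sum(!ry)
all(imp %in% y[ry])
```

Because every imputation is copied from an observed case, the imputed values always lie within the set of observed values of `y`, which is the main practical appeal of predictive mean matching.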

The name *predictive mean matching* was proposed by Little (1988).

Vector with imputed data, same type as `y`, and of length `sum(wy)`

Gerko Vink, Stef van Buuren, Karin Groothuis-Oudshoorn

Little, R.J.A. (1988), Missing data adjustments in large surveys (with discussion), Journal of Business & Economic Statistics, 6, 287–301.

Morris TP, White IR, Royston P (2014). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med Res Methodol. 14:75.

Van Buuren, S. (2018). *Flexible Imputation of Missing Data. Second Edition.* Chapman & Hall/CRC. Boca Raton, FL.

Van Buuren, S., Groothuis-Oudshoorn, K. (2011). `mice`: Multivariate Imputation by Chained Equations in `R`. *Journal of Statistical Software*, **45**(3), 1-67. doi: 10.18637/jss.v045.i03

Other univariate imputation functions:
`mice.impute.cart()`, `mice.impute.lasso.logreg()`, `mice.impute.lasso.norm()`, `mice.impute.lasso.select.logreg()`, `mice.impute.lasso.select.norm()`, `mice.impute.lda()`, `mice.impute.logreg.boot()`, `mice.impute.logreg()`, `mice.impute.mean()`, `mice.impute.midastouch()`, `mice.impute.mnar.logreg()`, `mice.impute.mpmm()`, `mice.impute.norm.boot()`, `mice.impute.norm.nob()`, `mice.impute.norm.predict()`, `mice.impute.norm()`, `mice.impute.polr()`, `mice.impute.polyreg()`, `mice.impute.quadratic()`, `mice.impute.rf()`, `mice.impute.ri()`

library(mice)

# We normally call mice.impute.pmm() from within mice()
# But we may call it directly as follows (not recommended)
set.seed(53177)
xname <- c("age", "hgt", "wgt")
r <- stats::complete.cases(boys[, xname])
x <- boys[r, xname]
y <- boys[r, "tv"]
ry <- !is.na(y)
table(ry)

# proportion of missing data in tv
sum(!ry) / length(ry)

# Impute missing tv data
yimp <- mice.impute.pmm(y, ry, x)
length(yimp)
hist(yimp, xlab = "Imputed missing tv")

# Impute all tv data
yimp <- mice.impute.pmm(y, ry, x, wy = rep(TRUE, length(y)))
length(yimp)
hist(yimp, xlab = "Imputed missing and observed tv")
plot(jitter(y), jitter(yimp),
  main = "Predictive mean matching on age, height and weight",
  xlab = "Observed tv (n = 224)",
  ylab = "Imputed tv (n = 224)"
)
abline(0, 1)
cor(y, yimp, use = "pair")

# Use blots to exclude different values per column
# Create blots object
blots <- make.blots(boys)
# Exclude ml 1 through 5 from tv donor pool
blots$tv$exclude <- c(1:5)
# Exclude 100 random observed heights from tv donor pool
blots$hgt$exclude <- sample(unique(boys$hgt), 100)
imp <- mice(boys, method = "pmm", print = FALSE, blots = blots, seed = 123)
blots$hgt$exclude %in% unlist(c(imp$imp$hgt)) # MUST be all FALSE
blots$tv$exclude %in% unlist(c(imp$imp$tv)) # MUST be all FALSE
