Imputation by predictive mean matching

mice.impute.pmm(
  y, ry, x, wy = NULL, donors = 5L, matchtype = 1L,
  exclude = -99999999, ridge = 1e-05, use.matcher = FALSE, ...
)

`y` |
Vector to be imputed |

`ry` |
Logical vector of length |

`x` |
Numeric design matrix with |

`wy` |
Logical vector of length |

`donors` |
The size of the donor pool among which a draw is made.
The default is |

`matchtype` |
Type of matching distance. The default choice
( |

`exclude` |
Value or vector of values to exclude from the imputation donor pool in |

`ridge` |
The ridge penalty used in |

`use.matcher` |
Logical. Set |

`...` |
Other named arguments. |

Imputation of `y` by predictive mean matching, based on van Buuren (2012, p. 73). The procedure is as follows:

1. Calculate the cross-product matrix $S = X_{obs}'X_{obs}$.
2. Calculate $V = (S + \mathrm{diag}(S)\kappa)^{-1}$, with some small ridge parameter $\kappa$.
3. Calculate regression weights $\hat\beta = V X_{obs}' y_{obs}$.
4. Draw $q$ independent $N(0,1)$ variates in vector $\dot z_1$.
5. Calculate $V^{1/2}$ by Cholesky decomposition.
6. Calculate $\dot\beta = \hat\beta + \dot\sigma \dot z_1 V^{1/2}$.
7. Calculate $\dot\eta(i,j) = |X_{obs,[i]}\hat\beta - X_{mis,[j]}\dot\beta|$ with $i = 1, \dots, n_1$ and $j = 1, \dots, n_0$.
8. Construct $n_0$ sets $Z_j$, each containing $d$ candidate donors, from $y_{obs}$ such that $\sum_d \dot\eta(i,j)$ is minimum for all $j = 1, \dots, n_0$. Break ties randomly.
9. Draw one donor $i_j$ from $Z_j$ randomly for $j = 1, \dots, n_0$.
10. Calculate imputations $\dot y_j = y_{i_j}$ for $j = 1, \dots, n_0$.
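
The procedure above can be sketched in base R as follows. This is a minimal illustration and not the code used by `mice.impute.pmm()`. The function name `pmm_sketch` is hypothetical. Two things are assumptions because the steps do not spell them out: an intercept column is added to `x`, and $\dot\sigma$ is drawn from the residual sum of squares divided by a chi-square variate.

```r
# Hypothetical helper illustrating the steps; not the mice implementation.
pmm_sketch <- function(y, ry, x, donors = 5L, ridge = 1e-05) {
  x <- cbind(1, as.matrix(x))          # assumption: prepend an intercept
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]

  # Steps 1-3: ridge-stabilised least squares
  S <- crossprod(xobs)
  V <- solve(S + diag(diag(S), nrow = ncol(S)) * ridge)
  beta_hat <- V %*% crossprod(xobs, yobs)

  # Assumption: sigma-dot from residual SS over a chi-square draw
  df <- max(length(yobs) - ncol(xobs), 1)
  res <- yobs - xobs %*% beta_hat
  sigma_dot <- sqrt(sum(res^2) / rchisq(1, df))

  # Steps 4-6: perturb the coefficients using a Cholesky factor of V
  z <- rnorm(ncol(xobs))
  R <- chol((V + t(V)) / 2)            # upper triangular, t(R) %*% R = V
  beta_dot <- beta_hat + sigma_dot * t(R) %*% z

  # Step 7: predicted values for observed and missing cases
  yhat_obs <- drop(xobs %*% beta_hat)
  yhat_mis <- drop(xmis %*% beta_dot)

  # Steps 8-10: for each missing case take the d closest observed cases
  # (ties broken randomly) and draw one donor value from them
  vapply(yhat_mis, function(target) {
    eta <- abs(yhat_obs - target)
    d <- min(donors, length(eta))
    pool <- order(eta, runif(length(eta)))[seq_len(d)]
    yobs[pool[sample.int(d, 1L)]]
  }, FUN.VALUE = yobs[1], USE.NAMES = FALSE)
}

# Toy usage on simulated data
set.seed(1)
n <- 50
x <- data.frame(a = rnorm(n), b = rnorm(n))
y <- 1 + 2 * x$a - x$b + rnorm(n)
y[sample.int(n, 10)] <- NA
ry <- !is.na(y)
imp <- pmm_sketch(y, ry, x)
```

Every element of `imp` is an observed value of `y`, which is the defining property of predictive mean matching: imputations are borrowed from donors rather than generated from the fitted model.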

The name *predictive mean matching* was proposed by Little (1988).

Vector with imputed data, same type as `y`, and of length `sum(wy)`.

Gerko Vink, Stef van Buuren, Karin Groothuis-Oudshoorn

library(mice)

# We normally call mice.impute.pmm() from within mice()
# But we may call it directly as follows (not recommended)
set.seed(53177)
xname <- c("age", "hgt", "wgt")
r <- stats::complete.cases(boys[, xname])
x <- boys[r, xname]
y <- boys[r, "tv"]
ry <- !is.na(y)
table(ry)

# percentage of missing data in tv
sum(!ry) / length(ry)

# Impute missing tv data
yimp <- mice.impute.pmm(y, ry, x)
length(yimp)
hist(yimp, xlab = "Imputed missing tv")

# Impute all tv data
yimp <- mice.impute.pmm(y, ry, x, wy = rep(TRUE, length(y)))
length(yimp)
hist(yimp, xlab = "Imputed missing and observed tv")
plot(jitter(y), jitter(yimp),
  main = "Predictive mean matching on age, height and weight",
  xlab = "Observed tv (n = 224)",
  ylab = "Imputed tv (n = 224)"
)
abline(0, 1)
cor(y, yimp, use = "pair")

# Use blots to exclude different values per column
# Create blots object
blots <- make.blots(boys)
# Exclude ml 1 through 5 from tv donor pool
blots$tv$exclude <- c(1:5)
# Exclude 100 random observed heights from tv donor pool
blots$hgt$exclude <- sample(unique(boys$hgt), 100)
imp <- mice(boys, method = "pmm", print = FALSE, blots = blots, seed = 123)
blots$hgt$exclude %in% unlist(c(imp$imp$hgt)) # MUST be all FALSE
blots$tv$exclude %in% unlist(c(imp$imp$tv)) # MUST be all FALSE

