hot.deck: Multiple Hot Deck Imputation
In davidaarmstrong/hot.deck: Multiple Hot-Deck Imputation



`hot.deck` performs multiple hot deck imputation on a data frame with missing observations, using either the “best cell” method (the default) or the “probabilistic draw” method described in Cranmer and Gill (2013). The technique is best suited to missingness in discrete variables, though it also performs well on continuous missing data.

hot.deck(data, m = 5, method = c("best.cell", "p.draw"), cutoff = 10, sdCutoff = 1, 
    optimizeSD = FALSE, optimStep = 0.1, optimStop = 5, weightedAffinity = FALSE, 
    impContinuous = c("HD", "mice"), IDvars = NULL, ...)
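A minimal call might look like the sketch below. It assumes the package is installed and that its bundled example data set `D` is available; the seed value is arbitrary and is set only because donor draws are random. The call is guarded so the snippet does nothing when the package is missing.

```r
# Run only if the hot.deck package is installed (assumption: it provides
# the example data set `D`).
if (requireNamespace("hot.deck", quietly = TRUE)) {
  library(hot.deck)

  data("D", package = "hot.deck")  # example data with missing values
  set.seed(1234)                   # arbitrary seed for reproducible draws

  # Five imputed datasets using the default "best.cell" method.
  out <- hot.deck(D, m = 5, method = "best.cell")

  # Names of the elements in the returned list.
  print(names(out))
}
```

Switching to `method = "p.draw"` requests the probabilistic draw instead, and `impContinuous = "mice"` hands continuous variables to `mice`.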

`data`	A data frame or matrix with missing values to be imputed using multiple hot deck imputation.
`m`	Number of imputed datasets required.
`method`	Method used to draw donors based on affinity: either “best.cell” (the default) or “p.draw” for probabilistic draw.
`cutoff`	A numeric scalar such that any variable with fewer than `cutoff` unique non-missing values will be considered discrete and necessarily imputed with hot deck imputation.
`sdCutoff`	Number of standard deviations between observations: observations fewer than `sdCutoff` standard deviations away from each other are considered sufficiently close to be a match; otherwise they are considered too far apart to be a match.
`optimizeSD`	Logical indicating whether the `sdCutoff` parameter should be optimized such that the smallest possible value is chosen that produces no thin cells from which to draw donors. Thin cells are those where the number of donors is less than `m`.
`optimStep`	The size of the steps in the optimization if `optimizeSD` is `TRUE`.
`optimStop`	The value at which optimization should stop if it has not already found a value that produces no thin cells. If this value is reached and thin cells still exist, a warning is issued, though the routine continues using `optimStop` as `sdCutoff`.
`weightedAffinity`	Logical indicating whether a correlation-weighted affinity score should be used.
`impContinuous`	Character string indicating how continuous missing data should be imputed. Valid options are “HD” (the default) in which case hot-deck imputation will be used, or “mice” in which case multiple imputation by chained equations will be used.
`IDvars`	A character vector of variable names not to be used in the imputation, but to be included in the final imputed datasets.
`...`	Optional additional arguments to be passed down to the `mice` routine.
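
The `cutoff` and `sdCutoff` rules described above can be illustrated in base R. This is only a sketch of the stated rules, not the package's internal code: the toy data, the helper `close_enough`, and the use of the column's overall standard deviation as the scale are all assumptions made for illustration.

```r
# Toy data (hypothetical): one low-cardinality and one high-cardinality variable.
toy <- data.frame(
  party  = c(1, 2, 2, NA, 3, 1, 2, 3, 1, 2, 3, 1),
  income = c(21.5, 30.2, NA, 45.1, 52.8, 38.4, 27.9, 61.3, 33.0, 48.6, 55.7, 41.2)
)

cutoff   <- 10
sdCutoff <- 1

# Count unique non-missing values in each column.
n_unique <- sapply(toy, function(x) length(unique(x[!is.na(x)])))

# Variables with fewer than `cutoff` unique values are treated as discrete.
is_discrete <- n_unique < cutoff
print(is_discrete)

# Hypothetical helper: two observations of a continuous variable match when
# they are fewer than `sd_cutoff` standard deviations apart.
close_enough <- function(x, i, j, sd_cutoff) {
  abs(x[i] - x[j]) / sd(x, na.rm = TRUE) < sd_cutoff
}

print(close_enough(toy$income, 1, 2, sdCutoff))  # rows 1 and 2
print(close_enough(toy$income, 1, 8, sdCutoff))  # rows 1 and 8
```

Here `party` has three distinct values and falls under the cutoff, while `income` has eleven and is treated as continuous.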

A list with the following elements:

`data`	An object of class `mi` which contains `m` imputed datasets.
`affinity`	A matrix of affinity scores; see `affinity`.
`donors`	A list of donors for each missing observation based on the affinity score.
`draws`	The `m` observations drawn from donors that were used for the multiple imputations.
`max.emp.aff`	Normalization constant for each row of affinity scores; the maximum possible value of the affinity scores if correlation-weighting is used.
`max.the.aff`	Normalization constant for each row of affinity scores; the number of columns in the original data.
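
Because `donors` holds one entry per missing observation, it can be used to check for thin cells, that is, cells with fewer than `m` donors. The sketch below uses a hand-made `donors` list (hypothetical row indices) that mimics the documented structure, so it runs without the package; with real output one would presumably pass the `donors` element of the returned list instead.

```r
m <- 5

# Hypothetical donor lists for three missing observations; each element
# holds the row indices of candidate donors.
donors <- list(
  c(3, 8, 12, 15, 20, 31),
  c(4, 9),
  c(2, 5, 7, 11, 19)
)

# A cell is thin when it offers fewer than `m` donors.
n_donors <- lengths(donors)
thin     <- n_donors < m

print(n_donors)
print(which(thin))
```

If thin cells appear, setting `optimizeSD = TRUE` or raising `sdCutoff` widens the set of acceptable matches.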

Skyler Cranmer, Jeff Gill, Natalie Jackson, Andreas Murr and Dave Armstrong

Cranmer, S.J. and Gill, J.M. (2013) “We Have to Be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data.” British Journal of Political Science 43:2 (425-449).

van Buuren, S. and Groothuis-Oudshoorn, K. (2011) “mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45:3 (1-67).

See also: `mice`, `affinity`