impute.SEQ_HD: Sequential Hot-Deck Imputation
In HotDeckImputation: Hot Deck Imputation Methods for Missing Data


View source: R/SeqentialHotDeck.R

Resolves missing data by sequential Hot-Deck Imputation.

impute.SEQ_HD(DATA = NULL, initialvalues = 0, navalues = NA, modifyinplace = TRUE)

`DATA`	Data containing missing values. Should be a matrix of integers.
`initialvalues`	The initial values for the start-up process of the imputation, used for any missing value that occurs before the first observed value in a variable. Should be `"integer"` and `length(initialvalues)==1 | length(initialvalues)==dim(DATA)[2]`. The default of `0` is not normally a good value.
`navalues`	NA code for each variable that should be imputed. Should be `"integer"` and `length(navalues)==1 | length(navalues)==dim(DATA)[2]`. Default is R's NA value.
`modifyinplace`	Should `DATA` be modified in place? (See the Section: Warning.) If not, a copy is made.

This function imputes the missing values in any variable by replicating the most recently observed value in that variable.
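The rule described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the package's implementation: `seq_hot_deck_sketch` is a hypothetical helper, and how it recycles `initialvalues` and `navalues` is an assumption based on the argument descriptions.

```r
# Illustrative sketch only; seq_hot_deck_sketch is a hypothetical helper
# and is not part of the HotDeckImputation package.
seq_hot_deck_sketch <- function(data, initialvalues = 0L, navalues = NA) {
  stopifnot(is.matrix(data))
  p <- ncol(data)
  stopifnot(length(initialvalues) %in% c(1L, p),
            length(navalues) %in% c(1L, p))
  # Recycle scalar arguments to one entry per column (assumed behavior)
  initialvalues <- rep_len(initialvalues, p)
  navalues <- rep_len(navalues, p)

  for (j in seq_len(p)) {
    # Donor value used until the first observed value in column j
    last_seen <- initialvalues[j]
    for (i in seq_len(nrow(data))) {
      value <- data[i, j]
      # A cell is missing if it matches the NA code of its column
      if (is.na(navalues[j])) {
        is_missing <- is.na(value)
      } else {
        is_missing <- !is.na(value) && value == navalues[j]
      }
      if (is_missing) {
        data[i, j] <- last_seen   # replicate the most recent observed value
      } else {
        last_seen <- value        # remember this value as the next donor
      }
    }
  }
  data
}

# Small toy matrix (filled column by column)
toy <- matrix(c(3L, NA, NA, 7L,
                NA, 2L, NA, 5L), ncol = 2)
imputed <- seq_hot_deck_sketch(toy, initialvalues = 0L)
print(imputed)
```

In the toy matrix, the second column starts with a missing value, so its first cell receives the initial value. This is why the default of `0` is rarely a sensible choice.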

An imputed data matrix the same size as the input DATA.

If modifyinplace == TRUE, DATA (or rather the variable supplied) is edited directly! This is significantly faster if the data set is large.

This is by far the fastest imputation method, since only one pass over the data is needed. However, no covariate information is used, so it only leads to good results if the data are missing completely at random (MCAR).
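Because the donor is simply the previous observed value in the same variable, the result depends only on row order and on the initial value. The following base-R sketch makes this visible on a single column; `locf_column` is a hypothetical helper written for this illustration and is not a function of the package.

```r
# Illustrative sketch only; locf_column is a hypothetical helper.
locf_column <- function(x, initial = 0L) {
  last_seen <- initial
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- last_seen     # fill from the most recent observed value
    } else {
      last_seen <- x[i]     # update the donor
    }
  }
  x
}

x <- c(1L, NA, 5L, NA)
forward  <- locf_column(x)       # same data, original row order
backward <- locf_column(rev(x))  # same data, reversed row order
print(forward)
print(backward)
```

The same four values yield different imputations once the rows are reordered, and the reversed column falls back on the initial value for its leading gap. If row order is related to the missingness or to the values themselves, this dependence can bias the results.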

Dieter William Joenssen Dieter.Joenssen@googlemail.com

Hanson, R.H. (1978) The Current Population Survey: Design and Methodology. Technical Paper No. 40. U.S. Bureau of the Census.

Joenssen, D.W. (2015) Hot-Deck-Verfahren zur Imputation fehlender Daten – Auswirkungen des Donor-Limits. Ilmenau: Ilmedia. [in German, Dissertation]

Joenssen, D.W. and Bankhofer, U. (2012) Donor Limited Hot Deck Imputation: Effects on Parameter Estimation. Journal of Theoretical and Applied Computer Science. 6, 58–70.

Joenssen, D.W. and Muellerleile, T. (2014) Fehlende Daten bei Data-Mining. HMD Praxis der Wirtschaftsinformatik. 51, 458–468. doi: 10.1365/s40702-014-0038-8 [in German]

impute.CPS_SEQ_HD, impute.mean, impute.NN_HD

#Set the random seed to an arbitrary number
set.seed(421)

n<-1000
m<-5
pmiss<-.1


#Generate matrix of random integers
Y<-matrix(sample(0:9,replace=TRUE,size=n*m),nrow=n)

#generate missing values (a share pmiss of the cells), MCAR, in all but the first row
Y[-1,][sample(1:length(Y[-1,]),size=floor(pmiss*length(Y[-1,])))]<-NA

#perform the sequential imputation on Y
impute.SEQ_HD(DATA=Y,initialvalues=0, navalues=NA, modifyinplace = FALSE)

####an example highlighting the modifyinplace option
#using cbind to show the results of the function and the initial data next to one another
cbind(impute.SEQ_HD(DATA=Y,initialvalues=0, navalues=NA, modifyinplace = FALSE),Y)
#notice that columns 6-10 (representing Y) still have missing data

#same procedure, except modifyinplace is set to TRUE
cbind(impute.SEQ_HD(DATA=Y,initialvalues=0, navalues=NA, modifyinplace = TRUE),Y)
#notice that columns 6-10 (representing Y) are identical to columns 1-5:
#Y (and any variable pointing to the same object) has been directly modified.