Data augmentation for incomplete multivariate normal data


Data augmentation under a normal-inverted Wishart prior. If no prior is specified by the user, the usual "noninformative" prior for the multivariate normal distribution is used. This function simulates one or more iterations of a single Markov chain. Each iteration consists of a random imputation of the missing data given the observed data and the current parameter value (I-step), followed by a draw from the posterior distribution of the parameter given the observed data and the imputed data (P-step).


da.norm(s, start, prior, steps=1, showits=FALSE, return.ymis=FALSE)



summary list of an incomplete normal data matrix produced by the function prelim.norm.


starting value of the parameter. This is a parameter vector in packed storage, such as one created by the function makeparam.norm. One obvious choice for a starting value is an ML estimate or posterior mode produced by em.norm.


optional prior distribution. This is a list containing the hyperparameters of a normal-inverted Wishart distribution. In order, the elements of the list are: tau (a scalar), m (a scalar), mu0 (a vector of length ncol(x), where x is the original matrix of incomplete data), and lambdainv (a matrix of dimension c(ncol(x),ncol(x))). The elements of mu0 and lambdainv apply to the data after transformation, i.e. after the columns have been centered and scaled to have mean zero and variance one. If no prior is supplied, the default is the usual noninformative prior for a multivariate normal model: tau=0, m=-1, mu0=arbitrary, and lambdainv = matrix of zeros.


number of data augmentation iterations to be simulated.


if TRUE, reports the iterations so the user can monitor the progress of the algorithm.


if TRUE, returns the output of the last I-step (imputed values of missing data) in addition to the output of the last P-step. These imputed values are useful for forming Rao-Blackwellized estimates of posterior summaries.


if return.ymis=FALSE, returns a parameter vector, the result of the last P-step. If the value of steps was large enough to guarantee approximate stationarity, then this parameter can be regarded as a proper draw from the observed-data posterior, independent of start. If return.ymis=TRUE, then this function returns a list of the following two components:


a parameter vector, the result of the last P-step


a vector of missing values, the result of the last I-step. The length of this vector is sum(, where x is the original data matrix. The storage order is the same as that of x[].


Before this function may be used, the random number generator seed must be initialized with rngseed at least once in the current S session.


See Chapter 5 of Schafer (1996).

See Also

rngseed, em.norm, prelim.norm, and getparam.norm.


s  <-  prelim.norm(mdata)
thetahat <- em.norm(s)   #find the MLE for a starting value
rngseed(1234567)   #set random number generator seed
theta <- da.norm(s,thetahat,steps=20,showits=TRUE)  # take 20 steps
getparam.norm(s,theta) # look at result

Want to suggest features or report bugs for Use the GitHub issue tracker. Vote for new features on Trello.

comments powered by Disqus