mcmcNorm: MCMC algorithm for incomplete multivariate normal data
In norm2: Analysis of Incomplete Multivariate Data under a Normal Model

Description Usage Arguments Details Value Author(s) References See Also Examples

Simulates parameters and missing values from a joint posterior distribution under a normal model using Markov chain Monte Carlo.

mcmcNorm(obj, ...)


## Default S3 method:
mcmcNorm(obj, x = NULL, intercept = TRUE,
   starting.values, iter = 1000, multicycle = NULL,
   seeds = NULL, prior = "uniform",
   prior.df = NULL, prior.sscp = NULL, save.all.series = TRUE,
   save.worst.series = FALSE, worst.linear.coef = NULL,
   impute.every = NULL, ...)


## S3 method for class 'formula'
mcmcNorm(formula, data, starting.values, 
   iter = 1000, multicycle = NULL, seeds = NULL, prior = "uniform", 
   prior.df = NULL, prior.sscp = NULL, save.all.series = TRUE, 
   save.worst.series = FALSE, worst.linear.coef = NULL,
   impute.every=NULL, ...)


## S3 method for class 'norm'
mcmcNorm(obj, starting.values = obj$param,
   iter = 1000, multicycle = obj$multicycle, 
   seeds = NULL, prior = obj$prior, prior.df = obj$prior.df, 
   prior.sscp = obj$prior.sscp,
   save.all.series = !(obj$method=="MCMC" & is.null( obj$series.beta )), 
   save.worst.series = !is.null( obj$worst.linear.coef ),
   worst.linear.coef = obj$worst.linear.coef,
   impute.every = obj$impute.every, ...)

`obj`	an object used to select a method. It may be `y`, a numeric matrix, vector or data frame containing response variables to be modeled as normal. Missing values (`NA`s) are allowed. If `y` is a data frame, any factors or ordered factors will be replaced by their internal codes, and a warning will be given. Alternatively, this first argument may be a `formula` as described below, or an object of class `"norm"` resulting from a call to `emNorm` or `mcmcNorm`; see DETAILS.
`x`	a numeric matrix, vector or data frame of covariates to be used as predictors for `y`. Missing values (`NA`'s) are not allowed. If `x` is a matrix, it must have the same number of rows as `y`. If `x` is a data frame, any factors or ordered factors are replaced by their internal codes, and a warning is given. If `NULL`, it defaults to `x = rep(1,nrow(y))`, an intercept-only model.
`intercept`	if `TRUE`, then a column of `1`'s is appended to `x`. Ignored if `x = NULL`.
`formula`	an object of class `"formula"` (or one that can be coerced to that class): a symbolic description of the model which is provided in lieu of `y` and `x`. The details of model specification are given under DETAILS.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `mcmcNorm` is called.
`starting.values`	starting values for the model parameters. This must be a list with two named components, `beta` and `sigma`, which are numeric matrices with correct dimensions. In most circumstances, the starting values will be obtained from a prior run of `emNorm` or `mcmcNorm`; see DETAILS.
`iter`	number of iterations to be performed. By default, each iteration consists of one Imputation or I-step followed by one Posterior or P-step, but this can be changed by `multicycle`.
`multicycle`	number of cycles per iteration, with `NULL` equivalent to `multicycle=1`. Specifying `multicycle=`k for some k>1 instructs `mcmcNorm` to perform the I-step and P-step cycle `k` times within each iteration; see DETAILS.
`seeds`	two integers to initialize the random number generator; see DETAILS.
`prior`	should be `"uniform"`, `"jeffreys"`, `"ridge"` or `"invwish"`. If `"ridge"` then `prior.df` must be supplied. If `"invwish"` then `prior.df` and `prior.sscp` must be supplied. For more information, see DETAILS.
`prior.df`	prior degrees of freedom for a ridge (`prior="ridge"`) or inverted Wishart (`prior="invwish"`) prior.
`prior.sscp`	prior sums of squares and cross-products (SSCP) matrix for an inverted Wishart prior (`prior="invwish"`).
`save.all.series`	if `TRUE`, then the simulated values of all parameters at all iterations will be saved.
`save.worst.series`	if `TRUE`, then the simulated values of the worst linear function of the parameters will be saved. Under ordinary circumstances, this function will have been estimated by `emNorm` after the EM algorithm converged.
`worst.linear.coef`	vector or coefficients that define the worst linear function of the parameters. Under ordinary circumstances, these are provided automatically in the result from `emNorm`.
`impute.every`	how many iterations to perform between imputations? If `impute.every=`k, then the simulated values for the missing data after every k iterations will be saved, resulting in `floor(iter/impute.every)` multiple imputations. If `NULL`, then no imputations will be saved.
`...`	values to be passed to the methods.

There are three different ways to specify the data and model when calling mcmcNorm:

by directly supplying as the initial argument a matrix of numeric response variables y, along with an optional matrix of predictor variables x;
by supplying a model specification formula, along with an optional data frame data; or
by supplying an object of class "norm", which was produced by an earlier call to emNorm or mcmcNorm.

In the first case, the matrix y is assumed to have a multivariate normal linear regression on x with coefficients beta and covariance matrix sigma, where dim(beta)=c(ncol(x),ncol(y)) and dim(sigma)=c(ncol(y),ncol(y)). Missing values NA are allowed in y but not in x.

In the second case, formula is a formula for a (typically multivariate) linear regression model in the manner expected by lm. A formula is given as y ~ model, where y is either a single numeric variable or a matrix of numeric variables bound together with the function cbind. The right-hand side of the formula (everything to the right of ~) is a linear predictor, a series of terms separated by operators +, : or * to specify main effects and interactions. Factors are allowed on the right-hand side and will enter the model as contrasts among the levels. The intercept term 1 is included by default; to remove the intercept, use -1.

In the third case, the initial argument to mcmcNorm is an object of class "norm" returned by a previous call to emNorm or mcmcNorm. The value of the parameters carried in this object (the estimates from the last iteration of EM or the simulated values from the last iteration of MCMC) will be used as the starting values.

The matrix y is assumed to have a multivariate normal linear regression on x with coefficients beta and covariance matrix sigma, where dim(beta)=c(ncol(x),ncol(y)) and dim(sigma)=c(ncol(y),ncol(y)).

Starting values for the parameters must be provided. In most cases these will be the result of a previous call to emNorm or mcmcNorm. If the starting values are close to the mode (i.e., if they are the result of an EM run that converged) then the worst linear function of the parameters will be saved at each iteration. If the starting values are the result of a previous run of MCMC, then the new run will be a continuation of the same Markov chain.

If multicycle=k for some k>1, then the length of the saved parameter series will be reduced by a factor of k, and the serial correlation in the series will also be reduced. This option is useful in large problems with many parameters and in slowly converging problems for which many iterations are needed.

norm2 functions use their own internal random number generator which is seeded by two integers, for example, seeds=c(123,456), which allows results to be reproduced in the future. If seeds=NULL then the function will seed itself with two random integers from R. Therefore, results can also be made reproducible by calling set.seed beforehand and taking seeds=NULL.

If prior="invwish" then an inverted Wishart prior distribution is applied to sigma with hyperparameters prior.df (a scalar) and prior.sscp (a symmetric, positive definite matrix of the same dimension as sigma). Using the device of imaginary results, we can interpret prior.sscp/prior.df as a prior guess for sigma, and prior.df as the prior degrees of freedom on which this guess is based.

The usual noninformative prior for the normal regression model (prior="jeffreys") is equivalent to the inverted Wishart density with prior.df equal to 0 and prior.sscp equal to a matrix of 0's.

The ridge prior (prior="ridge") is a special case of the inverted Wishart (Schafer, 1997). The prior guess for sigma is a diagonal matrix with diagonal elements estimated by regressing the observed values in each column of y on the corresponding rows of x. When prior="ridge", the user must supply a value for prior.df, which determines how strongly the estimated correlations are smoothed toward zero.

If the first argument to mcmcNorm is an object of class "norm", then the parameter values stored in that object will automatically be used as starting values.

For details of the MCMC algorithm, see the manual distributed with the NORM package in the subdirectory doc.

a list whose class attribute has been set to "norm". This object may be passed as the first argument in subsequent calls to emNorm, mcmcNorm, impNorm, loglikNorm or logpostNorm. The object also carries the original data and specifies the prior distribution, so that these do not need to be provided again.

To see a summary of this object, use the generic function summary, which passes the object to summary.norm.

Components of the list may also be directly accessed and examined by the user. Components which may be of interest include:

`iter`	number of MCMC iterations performed.
`param`	a list with elements `beta` and `sigma` containing the estimated parameters after the final iteration of MCMC. This may be supplied as starting values to `emNorm` or `mcmcNorm`, or as an argument to `impNorm`, `loglikNorm` or `logpostNorm`.
`loglik`	a numeric vector of length `iter` reporting the logarithm of the observed-data likelihood function at the start of each iteration.
`logpost`	a numeric vector of length `iter` reporting the logarithm of the observed-data posterior density function at the start of each iteration.
`series.worst`	a time-series object (class `"ts"`) which contains the simulated values of the worst linear function of the parameters from all iterations. This will be present if the starting values provided to `mcmcNorm` were close enough to the mode to provide a reliable estimate of the worst linear function. The dependence in this series tends to be higher than for other parameters, so examining the dependence by plotting the series with `plot` or its autocorrelation function with `acf` may help the user to judge how quickly the Markov chain achieves stationarity. For the definition of the worst linear function, see the manual accompanying the NORM package in the subdirectory `doc`.
`series.beta`	a multivariate time-series object (class `"ts"`) which contains the simulated values of the coefficients `beta` from all iterations. This will present if `save.all.series=TRUE`.
`series.sigma`	a multivariate time-series object (class `"ts"`) which contains the simulated values of the variances and covariances (elements of the lower triangle of `sigma`) from all iterations. This will be present if `save.all.series=TRUE`.
`imp.list`	a list containing the multiple imputations. Each component of this list is a data matrix resembling `y`, but with `NA`'s replaced by imputed values. The length of the list depends on the values of `iter` and `impute.every`.
`miss.patt`	logical matrix with `ncol(y)` columns reporting the missingness patterns seen in `y`. Each row of `miss.patt` corresponds to a distinct missingness pattern, with `TRUE` indicating that the variable is missing and `FALSE` indicating that the variable is observed.
`miss.patt.freq`	integer vector of length `nrow(miss.patt)` indicating, for each missingness pattern, the number of cases or rows of `y` having that pattern.
`which.patt`	integer vector of length `nrow(y)` indicating the missingness pattern for each row of `y`. Thus `is.na( y[i,] )` is the same thing as `miss.patt[ which.patt[i], ]`.

Joe Schafer Joseph.L.Schafer@census.gov

Schafer, J.L. (1997) Analysis of Incomplete Multivariate Data. London: Chapman & Hall/CRC Press.

For more information about this function and other functions in the NORM package, see User's Guide for norm2 in the library subdirectory doc.

emNorm, summary.norm, impNorm, loglikNorm, logpostNorm

## run EM for marijuana data with ridge prior
data(marijuana)
emResult <- emNorm(marijuana, prior="ridge", prior.df=0.5)

## run MCMC for 5,000 iterations starting from the 
## posterior mode using the same prior
mcmcResult <- mcmcNorm(emResult, iter=5000)

## summarize and plot worst linear function
summary(mcmcResult)
plot(mcmcResult$series.worst)
acf(mcmcResult$series.worst, lag.max=50)

## generate 25 multiple imputations, taking 
## 100 steps between imputations, and look st
## the first imputed dataset
mcmcResult <- mcmcNorm(emResult, iter=2500, impute.every=100)
mcmcResult$imp.list[[1]]