The PMC function updates a model with Population Monte Carlo. Given a model specification, data, and initial values, PMC maximizes the logarithm of the unnormalized joint posterior density and provides samples of the marginal posterior distributions, deviance, and other monitored variables.
The function is called as PMC(Model, Data, Initial.Values, Covar, Iterations, Thinning, alpha, M, N, nu, CPUs, Type), with the arguments described below.
Model: This is a model specification function. For more information on specifying a model, see the LaplacesDemon package documentation.
Initial.Values: This is either a vector of initial values, one for each of K parameters, or in the case of a mixture of M components, an M x K matrix of initial values. If all initial values are zero in this vector, or in the first row of a matrix, then Laplace Approximation is used to optimize the initial values.
Data: This is a list of data. For more information on how the data list must be specified, see the LaplacesDemon package documentation.
Covar: This is a K x K covariance matrix for K parameters, or for multiple mixture components, a K x K x M array of M covariance matrices, where M is the number of mixture components (a sketch of constructing these objects follows the argument descriptions).
Iterations: This is the number of iterations during which PMC will update the model. Updating the model for only one iteration is the same as applying non-adaptive importance sampling.

Thinning: This is the number by which the posterior is thinned; for example, with Thinning=10, every tenth posterior sample is retained.
alpha: This is a vector of length M, the number of mixture components. alpha is the probability of each mixture component. The default value is NULL, in which case each component is initially given equal probability.
M: This is the number M of multivariate t distribution mixture components.

N: This is the number N of samples per mixture component. The required number of samples increases with the number K of parameters. These samples are also called walkers or particles.

nu: This is the degrees of freedom parameter nu for the multivariate t distribution for each mixture component. If a multivariate normal distribution is preferred, then set nu > 1e4.
CPUs: This argument is required for parallel processing, and indicates the number of central processing units (CPUs) of the computer or cluster. For example, when a user has a quad-core computer, CPUs=4.
Type: This argument defaults to "PSOCK" and uses the Simple Network of Workstations (SNOW) for parallelization. Alternatively, Type="MPI" may be specified to use Message Passing Interface (MPI) for parallelization.
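For a model with K parameters and M mixture components, Initial.Values and Covar take the shapes described above. The following minimal sketch, with placeholder values and a hypothetical K=4, M=2 setup, illustrates how such objects might be constructed:

# Illustrative shapes only; the values are placeholders, not recommendations
K <- 4; M <- 2
Initial.Values <- matrix(rnorm(M*K), nrow=M, ncol=K)  # M x K matrix of starting values
Covar <- array(0, dim=c(K, K, M))                     # K x K x M array of covariance matrices
for (m in 1:M) Covar[,,m] <- diag(K)                  # identity covariance per component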
The PMC function uses the adaptive importance sampling algorithm of Wraith et al. (2009), also called Mixture PMC or M-PMC (Cappe et al., 2008). Iterative adaptive importance sampling was introduced in the 1980s. Modern PMC was introduced by Cappe et al. (2004), and extended to multivariate Gaussian or t-distributed mixtures by Cappe et al. (2008). This version uses a multivariate t distribution for each mixture component, and also allows a multivariate normal distribution when the degrees of freedom parameter nu > 1e4. At each iteration, a mixture distribution is sampled with importance sampling, and the samples (or populations) are adapted to improve the importance sampling. Adaptation is a variant of EM (Expectation-Maximization). The sample is self-normalized, and is an example of self-normalized importance sampling (SNIS), or self-importance sampling. The vector alpha contains the probability of each mixture component. These, as well as the multivariate t distribution mixture parameters (except nu), are adapted at each iteration.
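As a rough, self-contained illustration of the self-normalized importance sampling (SNIS) step described above (a toy one-parameter example, not the package's internal mixture code): draws from an importance density are weighted by the ratio of the unnormalized target density to the importance density, and the weights are normalized to sum to one.

# Toy SNIS example: standard normal importance density q,
# unnormalized target exp(-|x|) (a double exponential kernel)
theta <- rnorm(1e4)                          # draws from q
LW <- -abs(theta) - dnorm(theta, log=TRUE)   # log target - log q
LW <- LW - max(LW)                           # stabilize before exponentiating
W <- exp(LW); W <- W / sum(W)                # self-normalized importance weights
post.mean <- sum(W * theta)                  # SNIS estimate of the target mean (about 0)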
Advantages of PMC over MCMC include:
It is difficult to assess convergence of MCMC chains, and this is not necessary in PMC (Wraith et al., 2009).
MCMC chains have autocorrelation, which effectively reduces the number of posterior samples. PMC produces independent samples that are not subject to this reduction.
PMC has been reported to produce samples with less variance than MCMC.
It is difficult to parallelize MCMC. Posterior samples from parallel chains can be pooled when all chains have converged, but until this occurs, parallelization is unhelpful. PMC, on the other hand, can parallelize the independent, Monte Carlo samples during each iteration and reduce run-time as the number of processors increases. Currently, PMC is not parallelized here.
The multivariate mixture in PMC can represent a multimodal posterior, where MCMC with parallel chains may be used to identify a multimodal posterior, but probably will not yield combined samples that proportionally represent it.
Disadvantages of PMC, compared to MCMC, include:
In PMC, the required number of samples at each iteration increases quickly with respect to an increase in parameters. MCMC is more suitable for models with large numbers of parameters, and therefore, MCMC is more generalizable.
PMC is more sensitive to initial values than MCMC, especially as the number of parameters increases.
PMC is more sensitive to the initial covariance matrix (or matrices for mixture components) than adaptive MCMC. PMC requires more information about the target distributions before updating. The covariance matrix from a converged iterative quadrature algorithm, Laplace Approximation, or Variational Bayes may be required (see IterativeQuadrature, LaplaceApproximation, or VariationalBayes for more information).
Since PMC requires better initial information than iterative quadrature, Laplace Approximation, MCMC, and Variational Bayes, it is not recommended to begin updating a model that has little prior information with PMC, especially when the model has more than a few parameters. Instead, iterative quadrature, Laplace Approximation, MCMC, or Variational Bayes should be used. However, once convergence is found or assumed, it is recommended to attempt to update the model with PMC, given the latest parameters and covariance matrix from iterative quadrature, Laplace Approximation, MCMC, or Variational Bayes. Used in this way, PMC may improve the model fit obtained with MCMC and should reduce the variance of the marginal posterior distributions, which is desirable for predictive modeling.
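For example, a model might first be updated with Laplace Approximation and the result passed to PMC. The sketch below assumes the model, data, and initial values are prepared as in the example at the end of this page, and that the usual LaplacesDemon workflow applies, in which as.initial.values extracts the latest parameter values from a fitted object and the fitted object stores its covariance matrix in a Covar component; treat it as a sketch rather than a prescription.

# Sketch: seed PMC with results from a converged Laplace Approximation
FitLA <- LaplaceApproximation(Model, Initial.Values, Data=MyData)
IV <- as.initial.values(FitLA)                # latest parameter values from the fit
FitPMC <- PMC(Model, MyData, IV, Covar=FitLA$Covar, Iterations=10,
     Thinning=1, alpha=NULL, M=1, N=1000, CPUs=1)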
Convergence is assessed by observing two outputs: normalized effective sample size (ESSN) and normalized perplexity (Perplexity). These are described below. PMC is considered to have converged when these diagnostics stabilize (Wraith et al., 2009), or when the normalized perplexity becomes sufficiently close to 1 (Cappe et al., 2008). If they do not stabilize, then it is suggested to begin PMC again with a larger number N of samples, and possibly with different initial values and covariance matrix or matrices. IterativeQuadrature, LaplaceApproximation, or VariationalBayes may be helpful to provide better starting values for PMC.
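For reference, both diagnostics can be computed from a vector of normalized importance weights, following the usual definitions in Cappe et al. (2008) and Wraith et al. (2009). The sketch below is illustrative only and is not the package's internal code.

# Example weights for illustration only
w <- rexp(1000); w <- w / sum(w)            # N normalized importance weights (sum to 1)
N <- length(w)
ESSN <- 1 / (N * sum(w^2))                  # normalized effective sample size
H <- -sum(ifelse(w > 0, w * log(w), 0))     # Shannon entropy of the weights
Perplexity <- exp(H) / N                    # normalized perplexity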
If a message appears that warns about ‘bad weights’, then PMC is attempting to work with an iteration in which importance weights are problematic. If this occurs in the first iteration, then all importance weights are set to 1/N. If this occurs in other iterations, then the information from the previous iteration is used instead and different draws are made from that importance distribution. This may allow PMC to eventually adapt successfully to the target. If not, the user is advised to begin again with a larger number N of samples, and possibly different initial values and covariance matrix or matrices, as above. PMC can experience difficulty when it begins with poor initial conditions.
The user may combine samples from previous iterations with samples from the latest iteration for inference, if the algorithm converged before the last iteration. Currently, a function is not provided for combining previous samples.
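Although no such function is provided, samples can be pooled by hand from the Posterior1 array described among the returned components below. A minimal sketch for a single-component fit, assuming a fitted object named Fit (as created in the example at the end of this page) and that convergence was reached before the final iterations:

# Pool un-thinned samples from the last two iterations of mixture component 1
T.iter <- dim(Fit$Posterior1)[3]             # number of iterations stored
pooled <- rbind(Fit$Posterior1[,,T.iter-1,1], Fit$Posterior1[,,T.iter,1])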
The returned object is an object of class pmc with the following components:
alpha: This is a M x T matrix of the probabilities of mixture components, where M is the number of mixture components and T is the number of iterations.
Call: This is the matched call of PMC.

Covar: This stores the K x K x T x M proposal covariance matrix in an array, where K is the dimension or number of parameters or initial values, T is the number of iterations, and M is the number of mixture components. If the model is updated in the future, then the latest covariance matrix for each mixture component can be extracted and used to start the next update where the last update left off (a sketch of extracting components follows this list).
Deviance: This is a vector of the deviance of the model, with a length equal to the number of thinned samples that were retained. Deviance is useful for considering model fit, and is equal to the sum of the log-likelihood for all rows in the data set, which is then multiplied by negative two.

DIC: This is a vector of three values: Dbar, pD, and DIC. Dbar is the mean deviance, pD is a measure of model complexity indicating the effective number of parameters, and DIC is the Deviance Information Criterion, a model fit statistic that is the sum of Dbar and pD.
ESSN: This is a vector of length T that contains the normalized effective sample size (ESSN) per iteration across T iterations. ESSN is used as a convergence diagnostic. ESSN is normalized between zero and one, and can be interpreted as the proportion of samples with non-zero weight. Higher is better.

Initial.Values: This is the vector or matrix of initial values.
Iterations: This reports the number of iterations used to update the model.

LML: This is an approximation of the logarithm of the marginal likelihood of the data (see the LML function for more information).
M: This reports the number of mixture components.

Minutes: This indicates the number of minutes that PMC was running.
Model: This contains the model specification function, Model.

N: This is the number of un-thinned samples per mixture component.

nu: This is the degrees of freedom parameter nu for each multivariate t distribution in each mixture component.
Mu: This is a T x K x M array of means for the importance sampling distribution across T iterations, K parameters, and M mixture components.

Monitor: This is a S x J matrix of thinned samples of monitored variables, where S is the number of thinned samples and J is the number of monitored variables.

Parameters: This reports the number K of parameters.
Perplexity: This is a vector of length T that contains the normalized perplexity per iteration across T iterations, and is used as a convergence diagnostic. Perplexity is an approximation of the negative of the Kullback-Leibler divergence between the target distribution and the importance sampling distribution. A normalized perplexity closer to one is better.
Posterior1: This is an N x K x T x M array of un-thinned posterior samples across N samples, K parameters, T iterations, and M mixture components.

Posterior2: This is a S x K matrix of thinned posterior samples, where S is the number of thinned samples and K is the number of parameters.
Summary: This is a matrix that summarizes the marginal posterior distributions of the parameters, deviance, and monitored variables from thinned samples. The following summary statistics are included: mean, standard deviation, MCSE (Monte Carlo Standard Error), ESS (the effective sample size due to autocorrelation), and the 2.5%, 50%, and 97.5% quantiles. MCSE is essentially a standard deviation around the marginal posterior mean that is due to uncertainty associated with using Monte Carlo sampling. The acceptable size of the MCSE depends on the acceptable uncertainty associated around the marginal posterior mean.

Thinned.Samples: This is the number of thinned samples in Posterior2.
Thinning: This is the amount of thinning requested by the user.

W: This is a N x T matrix of normalized importance weights, where N is the number of un-thinned samples per mixture component and T is the number of iterations. Computationally, the algorithm uses the logarithm of the weights.
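The components above can be inspected directly. For instance, the convergence diagnostics and the latest proposal covariance matrix (useful for restarting a future update) might be extracted as in this brief sketch, which assumes a fitted object named Fit, as created in the example at the end of this page:

# Inspect convergence diagnostics across iterations
cbind(ESSN=Fit$ESSN, Perplexity=Fit$Perplexity)
# Extract the final covariance matrix of mixture component 1 for a future update
T.iter <- dim(Fit$Covar)[3]
Covar.new <- Fit$Covar[,,T.iter,1]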
Statisticat, LLC. software@bayesian-inference.com
Cappe, O., Douc, R., Guillin, A., Marin, J.M., and Robert, C. (2008). "Adaptive Importance Sampling in General Mixture Classes". Statistics and Computing, 18, p. 587–600.
Cappe, O., Guillin, A., Marin, J.M., and Robert, C. (2004). "Population Monte Carlo". Journal of Computational and Graphical Statistics, 13, p. 907–929.
Gelman, A., Carlin, J., Stern, H., and Rubin, D. (2004). "Bayesian Data Analysis, Texts in Statistical Science, 2nd ed.". Chapman and Hall, London.
Wraith, D., Kilbinger, M., Benabed, K., Cappe, O., Cardoso, J.F., Fort, G., Prunet, S., and Robert, C.P. (2009). "Estimation of Cosmological Parameters Using Adaptive Importance Sampling". Physical Review D, 80(2), p. 023507.
BayesFactor, IterativeQuadrature, LaplaceApproximation, LML, PMC.RAM, Thin, and VariationalBayes.
# The accompanying Examples vignette is a compendium of examples.
#################### Load the LaplacesDemon Library #####################
library(LaplacesDemon)
############################## Demon Data ###############################
data(demonsnacks)
y <- log(demonsnacks$Calories)
X <- cbind(1, as.matrix(log(demonsnacks[,c(1,4,10)]+1)))
J <- ncol(X)
for (j in 2:J) X[,j] <- CenterScale(X[,j])
######################### Data List Preparation #########################
mon.names <- "LP"
parm.names <- as.parm.names(list(beta=rep(0,J), sigma=0))
pos.beta <- grep("beta", parm.names)
pos.sigma <- grep("sigma", parm.names)
PGF <- function(Data) {
beta <- rnorm(Data$J)
sigma <- runif(1)
return(c(beta, sigma))
}
MyData <- list(J=J, PGF=PGF, X=X, mon.names=mon.names,
parm.names=parm.names, pos.beta=pos.beta, pos.sigma=pos.sigma, y=y)
########################## Model Specification ##########################
Model <- function(parm, Data)
{
### Parameters
beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
### Log-Prior
beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
### Log-Likelihood
mu <- tcrossprod(Data$X, t(beta))
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)
}
set.seed(666)
############################ Initial Values #############################
Initial.Values <- GIV(Model, MyData, PGF=TRUE)
######################## Population Monte Carlo #########################
Fit <- PMC(Model, MyData, Initial.Values, Covar=NULL, Iterations=5,
Thinning=1, alpha=NULL, M=1, N=100, CPUs=1)
Fit
print(Fit)
PosteriorChecks(Fit)
caterpillar.plot(Fit, Parms="beta")
plot(Fit, BurnIn=0, MyData, PDF=FALSE)
Pred <- predict(Fit, Model, MyData, CPUs=1)
summary(Pred, Discrep="Chi-Square")
plot(Pred, Style="Covariates", Data=MyData)
plot(Pred, Style="Density", Rows=1:9)
plot(Pred, Style="ECDF")
plot(Pred, Style="Fitted")
plot(Pred, Style="Jarque-Bera")
plot(Pred, Style="Predictive Quantiles")
plot(Pred, Style="Residual Density")
plot(Pred, Style="Residuals")
Levene.Test(Pred)
Importance(Fit, Model, MyData, Discrep="Chi-Square")
#End