Variational Bayes
Description
The VariationalBayes function is a numerical approximation
method for deterministically estimating the marginal posterior
distributions (the target distributions) in a Bayesian model with
approximated distributions by minimizing the Kullback-Leibler
Divergence (KLD) between the target and its approximation.
Usage
VariationalBayes(Model, parm, Data, Covar=NULL, Interval=1.0E-6,
     Iterations=1000, Method="Salimans2", Samples=1000, sir=TRUE,
     Stop.Tolerance=1.0E-5, CPUs=1, Type="PSOCK")

Arguments
Model 
This required argument receives the model from a
user-defined function. The user-defined function is where the model
is specified. 
parm 
This argument requires a vector of initial values equal in
length to the number of parameters. 
Data 
This required argument accepts a list of data. The list of
data must include 
Covar 
This argument defaults to 
Interval 
This argument receives an interval for estimating approximate gradients. The logarithm of the unnormalized joint posterior density of the Bayesian model is evaluated at the current parameter value, and again at the current parameter value plus this interval. 
Iterations 
This argument accepts an integer that determines the
number of iterations that 
Method 
This optional argument currently accepts only

Samples 
This argument indicates the number of posterior samples
to be taken with sampling importance resampling via the

sir 
This logical argument indicates whether or not Sampling
Importance Resampling (SIR) is conducted via the 
Stop.Tolerance 
This argument accepts any positive number and
defaults to 1.0E-5. Tolerance is calculated each iteration, and the
criterion varies by algorithm. The algorithm is considered to have
converged to the user-specified 
CPUs 
This argument accepts an integer that specifies the number
of central processing units (CPUs) of the multicore computer or
computer cluster. This argument defaults to 
Type 
This argument specifies the type of parallel processing to
perform, accepting either 
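
The Interval argument above describes a forward finite difference: the log-posterior is evaluated at the current parameter value and again after adding the interval to it. The following is a minimal sketch of that idea in base R, not the package's internal code. The helper name forward_gradient and the toy log-density log_post are hypothetical and serve only as illustration.

```r
# Hypothetical helper (not part of LaplacesDemon): forward-difference
# gradient of a scalar function f at the point x.
forward_gradient <- function(f, x, interval = 1.0E-6) {
  f0 <- f(x)                                # value at the current point
  vapply(seq_along(x), function(i) {
    x_step <- x
    x_step[i] <- x_step[i] + interval       # perturb one parameter at a time
    (f(x_step) - f0) / interval             # slope along that coordinate
  }, numeric(1))
}

# Toy log-density (up to a constant): independent standard normals,
# whose exact gradient is -theta.
log_post <- function(theta) -0.5 * sum(theta^2)

g <- forward_gradient(log_post, c(1, -2))
```

A smaller interval reduces truncation error but amplifies floating-point round-off, so the interval is a compromise between the two.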
Details
Variational Bayes (VB) is a family of numerical approximation algorithms that is a subset of variational inference algorithms, or variational methods. Some examples of variational methods include the mean-field approximation, loopy belief propagation, tree-reweighted belief propagation, and expectation propagation (EP).
Variational inference for probabilistic models was introduced in the field of machine learning, influenced by statistical physics literature (Saul et al., 1996; Saul and Jordan, 1996; Jaakkola, 1997). The mean-field methods in Neal and Hinton (1999) led to variational algorithms.
Variational inference algorithms were later generalized for conjugate exponential-family models (Attias, 1999, 2000; Wiegerinck, 2000; Ghahramani and Beal, 2001; Xing et al., 2003). These algorithms still require different designs for different model forms. Salimans and Knowles (2013) introduced general-purpose VB algorithms for Gaussian posteriors.
A VB algorithm deterministically estimates the marginal posterior
distributions (target distributions) in a Bayesian model with
approximated distributions by minimizing the Kullback-Leibler
Divergence (KLD) between the target and its approximation. The
complicated posterior distribution is approximated with a simpler
distribution. The simpler, approximated distribution is called the
variational approximation, or approximation distribution, of the
posterior. The term variational is derived from the calculus of
variations, and refers to optimization algorithms that select the best
function (which is a distribution in VB), rather than merely selecting
the best parameters.
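
As an illustration of the quantity being minimized, the sketch below computes the KLD between two univariate Gaussian distributions, once with the closed-form expression and once by numerical integration. It assumes the direction KL(q || p), with q the approximation and p the target, which is the direction commonly used in VB. The function names kld_normal and kld_numeric are hypothetical, and the code uses base R only.

```r
# Closed-form KL(q || p) for q = N(m_q, s_q^2) and p = N(m_p, s_p^2).
kld_normal <- function(m_q, s_q, m_p, s_p) {
  log(s_p / s_q) + (s_q^2 + (m_q - m_p)^2) / (2 * s_p^2) - 0.5
}

# Numerical check: integrate q(x) * (log q(x) - log p(x)) over the real line.
kld_numeric <- function(m_q, s_q, m_p, s_p) {
  integrand <- function(x) {
    dnorm(x, m_q, s_q) *
      (dnorm(x, m_q, s_q, log = TRUE) - dnorm(x, m_p, s_p, log = TRUE))
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

kld_exact <- kld_normal(0, 1, 1, 2)   # approximation N(0, 1), target N(1, 4)
kld_check <- kld_numeric(0, 1, 1, 2)  # should agree with kld_exact
```

The divergence is zero only when the two distributions coincide, so lowering it moves the approximation toward the target.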
VB algorithms often use Gaussian distributions as approximating distributions. In this case, both the mean and variance of the parameters are estimated.
Usually, a VB algorithm is slower to converge than a Laplace Approximation algorithm, and faster to converge than a Monte Carlo algorithm such as Markov chain Monte Carlo (MCMC). VB often provides solutions with comparable accuracy to MCMC in less time. Whereas Monte Carlo algorithms provide a numerical approximation to the exact posterior using a set of samples, VB provides a locally-optimal, exact analytical solution to an approximation of the posterior. VB is often more applicable than MCMC to big data or large-dimensional models.
Since VB is deterministic, it is asymptotic and subject to the same limitations with respect to sample size as Laplace Approximation. However, VB estimates more parameters than Laplace Approximation: where Laplace Approximation optimizes the posterior mode of a Gaussian distribution, VB optimizes both the Gaussian mean and variance.
Traditionally, VB algorithms required customized equations. The
VariationalBayes function uses general-purpose algorithms. A
general-purpose VB algorithm is less efficient than an algorithm
custom designed for the model form. However, a general-purpose
algorithm is applied consistently and easily to numerous model forms.
When Method="Salimans2", the second algorithm of Salimans and
Knowles (2013) is used. This requires the gradient and Hessian, which
is more efficient with a small number of parameters as long as the
posterior is twice differentiable. The step size is constant across
the iterations of a run. This algorithm is suitable for marginal
posterior distributions that are Gaussian and unimodal. A stochastic
approximation algorithm is used in the context of fixed-form VB,
inspired by considering fixed-form VB to be equivalent to performing a
linear regression with the sufficient statistics of the approximation
as independent variables and the unnormalized logarithm of the joint
posterior density as the dependent variable. The number of requested
iterations should be large, since the step size decreases as more
iterations are requested, and a small step size will eventually
converge. A large number of requested iterations therefore gives
better convergence properties, so request many iterations and hope for
early convergence. However, convergence is checked only in the last
half of the iterations, after the algorithm begins to average the mean
and variance from the samples of the stochastic approximation. The
history of stochastic samples is returned.
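
The description above can be made concrete with a simplified, univariate sketch of a gradient-and-Hessian stochastic approximation, written in the spirit of Salimans and Knowles (2013). It is an illustration under stated assumptions, not the code used by VariationalBayes. The helper name vb_gaussian_sketch is hypothetical, the constant step size 1/sqrt(N) is an assumption, averaging over the last half of the iterations is taken from the text, and an even number of iterations is assumed.

```r
# Hypothetical helper: fit a univariate Gaussian approximation N(mu, sigma2)
# using the gradient and second derivative of a log-posterior.
vb_gaussian_sketch <- function(grad, hess, mu = 0, sigma2 = 1, N = 1000) {
  w <- 1 / sqrt(N)                       # assumed constant step size
  P <- 1 / sigma2                        # running precision
  a <- 0                                 # running average of gradients
  z <- mu                                # running average of draws
  a_bar <- 0; P_bar <- 0; z_bar <- 0     # averages over the last half
  for (t in seq_len(N)) {
    x <- rnorm(1, mu, sqrt(sigma2))      # draw from current approximation
    g <- grad(x)
    h <- hess(x)
    a <- (1 - w) * a + w * g
    P <- (1 - w) * P - w * h
    z <- (1 - w) * z + w * x
    sigma2 <- 1 / P                      # update the approximation
    mu <- sigma2 * a + z
    if (t > N / 2) {                     # average only in the last half
      a_bar <- a_bar + (2 / N) * g
      P_bar <- P_bar - (2 / N) * h
      z_bar <- z_bar + (2 / N) * x
    }
  }
  sigma2 <- 1 / P_bar
  list(mean = sigma2 * a_bar + z_bar, var = sigma2)
}

# Toy target: log-density of N(3, 4) up to a constant.
set.seed(1)
fit <- vb_gaussian_sketch(grad = function(x) -(x - 3) / 4,
                          hess = function(x) -1 / 4)
```

Because the toy target is itself Gaussian, its Hessian is constant and the averaged estimates recover the target mean and variance up to floating-point error. For a non-Gaussian target the result would be a noisy approximation that depends on N.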
Value
VariationalBayes returns an object of class vb
that is a list with the following components:
Call 
This is the matched call of VariationalBayes. 
Converged 
This is a logical indicator of whether or not

Covar 
This is the estimated covariance matrix. The

Deviance 
This is a vector of the iterative history of the
deviance in the 
History 
This is an array of the iterative history of the
parameters in the 
Initial.Values 
This is the vector of initial values that was
originally given to 
LML 
This is an approximation of the logarithm of the marginal
likelihood of the data (see the 
LP.Final 
This reports the final scalar value for the logarithm of the unnormalized joint posterior density. 
LP.Initial 
This reports the initial scalar value for the logarithm of the unnormalized joint posterior density. 
Minutes 
This is the number of minutes that

Monitor 
When 
Posterior 
When 
Step.Size.Final 
This is the final, scalar 
Step.Size.Initial 
This is the initial, scalar 
Summary1 
This is a summary matrix that summarizes the point-estimated posterior means and variances. Uncertainty around the posterior means is estimated from the estimated covariance matrix. Rows are parameters. The following columns are included: Mean, SD (Standard Deviation), LB (Lower Bound), and UB (Upper Bound). The bounds constitute a 95% probability interval. 
Summary2 
This is a summary matrix that summarizes the
posterior samples drawn with sampling importance resampling
( 
Tolerance.Final 
This is the last 
Tolerance.Stop 
This is the 
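
The layout of Summary1 described above (Mean, SD, LB, UB, with a 95% probability interval) can be sketched from a vector of means and a covariance matrix as follows. This is an illustration only. The helper gaussian_summary and the toy values are hypothetical, and taking the bounds as Gaussian quantiles is an assumption rather than a statement of how the package computes them.

```r
# Hypothetical helper: build a Mean/SD/LB/UB table from point estimates
# and a covariance matrix, assuming Gaussian marginal distributions.
gaussian_summary <- function(mean, covar, prob = 0.95) {
  sd <- sqrt(diag(covar))                 # marginal standard deviations
  alpha <- (1 - prob) / 2                 # tail probability on each side
  out <- cbind(Mean = mean, SD = sd,
               LB = qnorm(alpha, mean, sd),
               UB = qnorm(1 - alpha, mean, sd))
  rownames(out) <- names(mean)            # one row per parameter
  out
}

# Toy values, not output from a real fit.
est <- c("beta[1]" = 5.0, "beta[2]" = -0.3)
V <- diag(c(0.04, 0.01))
tab <- gaussian_summary(est, V)
```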
Author(s)
Statisticat, LLC software@bayesian-inference.com
References
Attias, H. (1999). "Inferring Parameters and Structure of Latent Variable Models by Variational Bayes". In Uncertainty in Artificial Intelligence.
Attias, H. (2000). "A Variational Bayesian Framework for Graphical Models". In Neural Information Processing Systems.
Ghahramani, Z. and Beal, M. (2001). "Propagation Algorithms for Variational Bayesian Learning". In Neural Information Processing Systems, p. 507–513.
Jaakkola, T. (1997). "Variational Methods for Inference and Estimation in Graphical Models". PhD thesis, Massachusetts Institute of Technology.
Salimans, T. and Knowles, D.A. (2013). "Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression". Bayesian Analysis, 8(4), p. 837–882.
Neal, R. and Hinton, G. (1999). "A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants". In Learning in Graphical Models, p. 355–368. MIT Press, 1999.
Saul, L. and Jordan, M. (1996). "Exploiting Tractable Substructures in Intractable Networks". Neural Information Processing Systems.
Saul, L., Jaakkola, T., and Jordan, M. (1996). "Mean Field Theory for Sigmoid Belief Networks". Journal of Artificial Intelligence Research, 4, p. 61–76.
Wiegerinck, W. (2000). "Variational Approximations Between Mean Field Theory and the Junction Tree Algorithm". In Uncertainty in Artificial Intelligence.
Xing, E., Jordan, M., and Russell, S. (2003). "A Generalized Mean Field Algorithm for Variational Inference in Exponential Families". In Uncertainty in Artificial Intelligence.
See Also
BayesFactor, IterativeQuadrature, LaplaceApproximation, LaplacesDemon, GIV, LML, PMC, and SIR.
Examples
# The accompanying Examples vignette is a compendium of examples.
#################### Load the LaplacesDemon Library #####################
library(LaplacesDemon)
############################## Demon Data ###############################
data(demonsnacks)
y <- log(demonsnacks$Calories)
X <- cbind(1, as.matrix(log(demonsnacks[,10]+1)))
J <- ncol(X)
for (j in 2:J) X[,j] <- CenterScale(X[,j])
######################### Data List Preparation #########################
mon.names <- "mu[1]"
parm.names <- as.parm.names(list(beta=rep(0,J), sigma=0))
pos.beta <- grep("beta", parm.names)
pos.sigma <- grep("sigma", parm.names)
PGF <- function(Data) {
  beta <- rnorm(Data$J)
  sigma <- runif(1)
  return(c(beta, sigma))
}
MyData <- list(J=J, PGF=PGF, X=X, mon.names=mon.names,
  parm.names=parm.names, pos.beta=pos.beta, pos.sigma=pos.sigma, y=y)
########################## Model Specification ##########################
Model <- function(parm, Data)
{
  ### Parameters
  beta <- parm[Data$pos.beta]
  sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
  parm[Data$pos.sigma] <- sigma
  ### Log-Prior
  beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
  sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
  ### Log-Likelihood
  mu <- tcrossprod(Data$X, t(beta))
  LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
  ### Log-Posterior
  LP <- LL + beta.prior + sigma.prior
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=mu[1],
    yhat=rnorm(length(mu), mu, sigma), parm=parm)
  return(Modelout)
}
############################ Initial Values #############################
#Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Initial.Values <- rep(0,J+1)
#Fit <- VariationalBayes(Model, Initial.Values, Data=MyData, Covar=NULL,
#     Iterations=1000, Method="Salimans2", Stop.Tolerance=1e-3, CPUs=1)
#Fit
#print(Fit)
#PosteriorChecks(Fit)
#caterpillar.plot(Fit, Parms="beta")
#plot(Fit, MyData, PDF=FALSE)
#Pred <- predict(Fit, Model, MyData, CPUs=1)
#summary(Pred, Discrep="ChiSquare")
#plot(Pred, Style="Covariates", Data=MyData)
#plot(Pred, Style="Density", Rows=1:9)
#plot(Pred, Style="Fitted")
#plot(Pred, Style="Jarque-Bera")
#plot(Pred, Style="Predictive Quantiles")
#plot(Pred, Style="Residual Density")
#plot(Pred, Style="Residuals")
#Levene.Test(Pred)
#Importance(Fit, Model, MyData, Discrep="ChiSquare")
#Fit$Covar is scaled (2.38^2/d) and submitted to LaplacesDemon as Covar.
#Fit$Summary[,1] is submitted to LaplacesDemon as Initial.Values.
#End
