# marginalLikelihood: Calcluated the marginal likelihood from a set of MCMC samples In BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics

## Description

Calcluated the marginal likelihood from a set of MCMC samples

## Usage

 1 marginalLikelihood(sampler, numSamples = 1000, method = "Chib", ...) 

## Arguments

 sampler an MCMC or SMC sampler or list, or for method "Prior" also a BayesianSetup numSamples number of samples to use. How this works, and if it requires recalculating the likelihood, depends on the method method method to choose. Currently available are "Chib" (default), the harmonic mean "HM", sampling from the prior "Prior", and bridge sampling "Bridge". See details ... further arguments passed to getSample

## Details

The marginal likelihood is the average likelihood across the prior space. It is used, for example, for Bayesian model selection and model averaging.

It is defined as

ML = \int L(Θ) p(Θ) dΘ

Given that MLs are calculated for each model, you can get posterior weights (for model selection and/or model averaging) on the model by

P(M_i|D) = ML_i * p(M_i) / (∑_i ML_i * p(M_i) )

In BT, we return the log ML, so you will have to exp all values for this formula.

It is well-known that the ML is VERY dependent on the prior, and in particular the choice of the width of uninformative priors may have major impacts on the relative weights of the models. It has therefore been suggested to not use the ML for model averaging / selection on uninformative priors. If you have no informative priors, and option is to split the data into two parts, use one part to generate informative priors for the model, and the second part for the model selection. See help for an example.

The marginalLikelihood function currently implements four ways to calculate the marginal likelihood. Be aware that marginal likelihood calculations are notoriously prone to numerical stability issues. Especially in high-dimensional parameter spaces, there is no guarantee that any of the implemented algorithms will converge reasonably fast. The recommended (and default) method is the method "Chib" (Chib and Jeliazkov, 2001), which is based on MCMC samples, with a limited number of additional calculations. Despite being the current recommendation, note there are some numeric issues with this algorithm that may limit reliability for larger dimensions.

The harmonic mean approximation, is implemented only for comparison. Note that the method is numerically unrealiable and usually should not be used.

The third method is simply sampling from the prior. While in principle unbiased, it will only converge for a large number of samples, and is therefore numerically inefficient.

The Bridge method uses bridge sampling as implemented in the R package "bridgesampling". It is potentially more exact than the Chib method, but might require more computation time. However, this may be very dependent on the sampler.

## Value

A list with log of the marginal likelihood, as well as other diagnostics depending on the chose method

Florian Hartig

## References

Chib, Siddhartha, and Ivan Jeliazkov. "Marginal likelihood from the Metropolis-Hastings output." Journal of the American Statistical Association 96.453 (2001): 270-281.

Dormann et al. 2018. Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs

WAIC, DIC, MAP
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 ############################################################## # Comparison of ML for two regression models # Creating test data with quadratic relationship sampleSize = 30 x <- (-(sampleSize-1)/2):((sampleSize-1)/2) y <- 1 * x + 1*x^2 + rnorm(n=sampleSize,mean=0,sd=10) # plot(x,y, main="Test Data") # likelihoods for linear and quadratic model likelihood1 <- function(param){ pred = param[1] + param[2]*x + param[3] * x^2 singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[4]^2), log = TRUE) return(sum(singlelikelihoods)) } likelihood2 <- function(param){ pred = param[1] + param[2]*x singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[3]^2), log = TRUE) return(sum(singlelikelihoods)) } setUp1 <- createBayesianSetup(likelihood1, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30)) setUp2 <- createBayesianSetup(likelihood2, lower = c(-5,-5,0.01), upper = c(5,5,30)) out1 <- runMCMC(bayesianSetup = setUp1) M1 = marginalLikelihood(out1, start = 1000) out2 <- runMCMC(bayesianSetup = setUp2) M2 = marginalLikelihood(out2, start = 1000) ### Calculating Bayes factor exp(M1$ln.ML - M2$ln.ML) # BF > 1 means the evidence is in favor of M1. See Kass, R. E. & Raftery, A. E. # (1995) Bayes Factors. J. Am. Stat. Assoc., Amer Statist Assn, 90, 773-795. ### Calculating Posterior weights exp(M1$ln.ML) / ( exp(M1$ln.ML) + exp(M2$ln.ML)) # If models have different model priors, multiply with the prior probabilities of each model. ## Not run: ############################################################# # Fractional Bayes factor # Motivation: ML is very dependent on the prior, which is a problem if you # have uninformative priors. you can see this via rerunning the upper # example with changed priors - suddenly, support for M1 is gone setUp1 <- createBayesianSetup(likelihood1, lower = c(-500,-500,-500,0.01), upper = c(500,500,500,3000)) setUp2 <- createBayesianSetup(likelihood2, lower = c(-500,-500,0.01), upper = c(500,500,3000)) out1 <- runMCMC(bayesianSetup = setUp1) M1 = marginalLikelihood(out1, start = 1000) out2 <- runMCMC(bayesianSetup = setUp2) M2 = marginalLikelihood(out2, start = 1000) ### Calculating Bayes factor exp(M1$ln.ML - M2$ln.ML) # it has therefore been suggested that ML should not be calculated on uninformative priors. But # what to do if there are no informative priors? # one option is to calculate the fractional BF, which means that one splites the data in half, # uses the first half to fit the model, and then use the posterior as a new (now informative) # prior for the ML - let's do this for the previous case # likelihoods with half the data likelihood1 <- function(param){ pred = param[1] + param[2]*x + param[3] * x^2 singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[4]^2), log = TRUE) return(sum(singlelikelihoods[seq(1, 30, 2)])) } likelihood2 <- function(param){ pred = param[1] + param[2]*x singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[3]^2), log = TRUE) return(sum(singlelikelihoods[seq(1, 30, 2)])) } setUp1 <- createBayesianSetup(likelihood1, lower = c(-500,-500,-500,0.01), upper = c(500,500,500,3000)) setUp2 <- createBayesianSetup(likelihood2, lower = c(-500,-500,0.01), upper = c(500,500,3000)) out1 <- runMCMC(bayesianSetup = setUp1) out2 <- runMCMC(bayesianSetup = setUp2) newPrior1 = createPriorDensity(out1, start = 200, lower = c(-500,-500,-500,0.01), upper = c(500,500,500,3000)) newPrior2 = createPriorDensity(out2, start = 200, lower = c(-500,-500,0.01), upper = c(500,500,3000)) # now rerun this with likelihoods for the other half of the data and new prior likelihood1 <- function(param){ pred = param[1] + param[2]*x + param[3] * x^2 singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[4]^2), log = TRUE) return(sum(singlelikelihoods[seq(2, 30, 2)])) } likelihood2 <- function(param){ pred = param[1] + param[2]*x singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[3]^2), log = TRUE) return(sum(singlelikelihoods[seq(2, 30, 2)])) } setUp1 <- createBayesianSetup(likelihood1, prior = newPrior1) setUp2 <- createBayesianSetup(likelihood2, prior = newPrior2) out1 <- runMCMC(bayesianSetup = setUp1) M1 = marginalLikelihood(out1, start = 1000) out2 <- runMCMC(bayesianSetup = setUp2) M2 = marginalLikelihood(out2, start = 1000) ### Calculating the fractional Bayes factor exp(M1$ln.ML - M2$ln.ML) ## End(Not run) ############################################################ ### Performance comparison ### # Low dimensional case with narrow priors - all methods have low error # we use a truncated normal for the likelihood to make sure that the density # integrates to 1 - makes it easier to calcuate the theoretical ML likelihood <- function(x) sum(msm::dtnorm(x, log = TRUE, lower = -1, upper = 1)) prior = createUniformPrior(lower = rep(-1,2), upper = rep(1,2)) bayesianSetup <- createBayesianSetup(likelihood = likelihood, prior = prior) out = runMCMC(bayesianSetup = bayesianSetup, settings = list(iterations = 5000)) # plot(out) # theoretical value theory = log(1/(2^2)) marginalLikelihood(out)$ln.ML - theory marginalLikelihood(out, method = "Prior", numSamples = 500)$ln.ML - theory marginalLikelihood(out, method = "HM", numSamples = 500)$ln.ML - theory marginalLikelihood(out, method = "Bridge", numSamples = 500)$ln.ML - theory # higher dimensions - wide prior - HM and Prior don't work likelihood <- function(x) sum(msm::dtnorm(x, log = TRUE, lower = -10, upper = 10)) prior = createUniformPrior(lower = rep(-10,3), upper = rep(10,3)) bayesianSetup <- createBayesianSetup(likelihood = likelihood, prior = prior) out = runMCMC(bayesianSetup = bayesianSetup, settings = list(iterations = 5000)) # plot(out) # theoretical value theory = log(1/(20^3)) marginalLikelihood(out)$ln.ML - theory marginalLikelihood(out, method = "Prior", numSamples = 500)$ln.ML - theory marginalLikelihood(out, method = "HM", numSamples = 500)$ln.ML - theory marginalLikelihood(out, method = "Bridge", numSamples = 500)\$ln.ML - theory