Introduction

This package performs ANOVA-like Bayesian hypothesis tests for factorial designs for which cell means or differences between cells are estimated in a Bayesian model. The package name is an (excessive) acronym that expands to Cell Means Based Bayesian Hypothesis Tests.

You must have a factorial design, including single-factor designs, in which some dependent variable or model parameter is allowed to vary as a function of factor levels. Continuous independent variables are not supported.

You must have samples from the prior and posterior distributions of the parameters of interest.

This manual will first go through examples of some of the kinds of models that this package is designed to analyze. Then, basic usage of the package will be demonstrated. The steps of the procedure used to estimate Bayes factors are then shown. Finally, there are additional notes on how to use the package.

Version and Installation

This documentation is for version r packageVersion("CMBBHT") of this package. You can install this version of the package with the following two lines of code:

cat(knitr::knit_expand(text ='
install.packages("devtools")
devtools::install_github("hardmanko/CMBBHT@v{{ver}}", build_vignettes = TRUE)',
    ver = packageVersion("CMBBHT")))

Additional Documentation {#additionalDocumentation}

In addition to this document, which is the primary package documentation, there is an in-depth example of how to use this package called the Binomial Tutorial. If CMBBHT is installed, you can open the Binomial Tutorial vignette with the following line of code:

vignette("BinomialTutorial", package="CMBBHT")

There is a huge amount of relevant content in the Binomial Tutorial vignette that cannot be found in this document, so I would highly recommend reading both this document and the Binomial Tutorial.

The Appendix of @RickerHardman2017 contains an explanation of the logic of the hypothesis testing procedure used by this package that differs from the explanation in this document. The explanation in Ricker and Hardman is more conceptual and uses mathematical notation. The explanation here is still conceptual, but uses code more than math. The Appendix of Ricker and Hardman is included at the end of this document with some added notes.

Disclaimer

There are requirements for using this package that you as a user of this package need to be aware of. You are responsible for reading this manual, at least some of the referenced articles, and/or any other information that helps to clarify your understanding of what this package does. A failure to do so means that your analyses may be very wrong. That said, the approaches used by this package are fairly general, so you are probably ok at the $p < 0.05$ threshold.

Example Models

What follows are some examples of the kinds of models that you might have and can use with this package. These examples are not meant to exclude other models. Keep an eye out for terminology while reading these examples.

Model 1: Basic Cell Means {#model1}

In this model, the design has some number of factors that are crossed in some way to get cells of the design. In each cell of the design, you have some number of replicates/measurements of some dependent variable. The model estimates the mean of each cell of the design. What follows is an example of a model for such a design, not a specification of what your model must be.

In the $i$th cell of the design, there are $K$ replicates of observed data, $y_{ik}$. They follow a normal distribution with a mean that is estimated in each cell, $\mu_i$, and a single, shared data variance, $\sigma^2_y$. The factorial nature of the design is ignored in this model specification, instead just treating each combination of factor levels as one cell of the design.

$$ y_{ik} \sim \text{Normal}(\mu_i, \sigma^2_y) $$

The cell means, $\mu_i$, and the data variance, $\sigma^2_y$, each have some fixed prior.

$$ \begin{aligned} \mu_i &\sim \text{Normal}(0, 10^4) \\ \sigma^2_y &\sim \text{Inverse Gamma}(0.1, 0.1) \end{aligned} $$

This model is possibly workable, but it has the issue that the prior on the cell means is uninformative. The problem with uninformative priors is discussed further in the Effect of Uninformative Priors section. I prefer the structure of the next model, which is able to place moderately informative priors on the quantities of interest, which are typically the differences between cells rather than the means of the cells themselves.

Model 2: Grand Mean plus Cell Effects {#model2}

In this model, the data are the same as in Model 1. The model estimates a grand mean/intercept, $\mu$, (the exact interpretation of $\mu$ depends on other choices) and a cell effect, $\delta_i$, for each cell. To get the mean for each cell, you add the cell effect to the grand mean/intercept.

At the data level, this model is the same as in Model 1.

$$ y_{ik} \sim \text{Normal}(\mu_i, \sigma^2_y) $$

Now each cell mean is the sum of the intercept, $\mu$, and the cell effect, $\delta_i$. $$ \begin{aligned} \mu_i &= \mu + \delta_i \\ \mu &\sim \text{Normal}(0, 10^4) \\ \delta_i &\sim \text{Normal}(0, 2) \\ \sigma^2_y &\sim \text{Inverse Gamma}(0.1, 0.1) \end{aligned} $$

In this case, you can use the cell effects ($\delta$s) for analysis, ignoring the intercept. This model has an advantage over the basic cell means model in that you can place an uninformative prior on the grand mean, which means that you don't need to guess the approximate magnitude of the mean of the data. You can, however, place reasonably informative priors on the $\delta$s. The informativeness of the priors on the $\delta$s effectively expresses your prior beliefs about how large the differences between the cells are likely to be.

Note that some constraint on the $\delta$s is required for this model to be identifiable, otherwise the mean of the $\delta$s and the value of $\mu$ will trade off. I like to use a cornerstone parameterization in which one of the $\delta$s is set to 0. In that case, $\mu$ is an intercept. You could also use a sums-to-zero constraint, in which case $\mu$ is a grand mean.

With a cornerstone parameterization, you are essentially changing the model as follows: $$ \begin{aligned} \delta_1 &\sim \text{Normal}(0, 0) ~\text{(i.e. a point mass at 0.)} \\ \delta_i &\sim \text{Normal}(0, 2), ~i \neq 1 \end{aligned} $$

Thus, different $\delta$s have different priors. I see no problem with this; you just have to be careful that when you sample from the priors on the $\delta$s, you account for the different priors on different $\delta$s. Note that you do not need to use index 1 as the cornerstone condition. I have found that using the cell with the most data, if there is one, works best.
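The following is a minimal sketch (not package code) of sampling from these priors for a design with 4 cells, using cell 1 as the cornerstone and assuming that Normal(0, 2) is parameterized in terms of mean and variance.

n = 5000

# Column 1 is the cornerstone delta: a point mass at 0.
priorDelta = matrix(0, nrow = n, ncol = 4)

# The remaining deltas get draws from their Normal(0, 2) priors (sd = sqrt(2)).
for (i in 2:4) {
    priorDelta[, i] = rnorm(n, 0, sqrt(2))
}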

I typically use a model like this with a cornerstone parameterization and it usually works well. Part of the value of this package is that it allows you to use a simple model like this one, that is not parameterized in terms of main effects and interactions, but after estimating the cell effects, $\delta_i$, you can test whether main effects or interactions are present. How to do those tests with this package will be discussed later.

Model 3: Full ANOVA Effects Model {#model3}

This model uses the same data as Model 1. While Model 1 ignored the factorial nature of the design, this model explicitly includes the factors. Assume that you have two factors called A and B. The model estimates a grand mean ($\mu$), a main effect of A ($\alpha_i$), a main effect of B ($\beta_j$), and the interaction between A and B ($\pi_{ij}$).

The data level is the same as in the previous models.

$$ y_{ik} \sim \text{Normal}(\mu_i, \sigma^2_y) $$

The cell mean is calculated with $$ \mu_i = \mu + \alpha_i + \beta_j + \pi_{ij} $$

which is like the standard setup for two-factor ANOVA.

The components of the cell mean have the following priors. $$ \begin{aligned} \mu &\sim \text{Normal}(0, 10^4) \\ \alpha_i &\sim \text{Normal}(0, 2) \\ \beta_j &\sim \text{Normal}(0, 4) \\ \pi_{ij} &\sim \text{Normal}(0, 1) \\ \sigma^2_y &\sim \text{Inverse Gamma}(0.1, 0.1) \end{aligned} $$

Notice how this model specification allows for informative priors on the effect parameters ($\alpha$, $\beta$, and $\pi$) while allowing for an uninformative prior on the grand mean, $\mu$.

A major advantage of a model of this kind over Model 2 is that it is possible to place different priors on the different effects, which allows you to state the prior expectation that you are looking for different sized effects of different factors. In this model, the prior on $\alpha_i$ is narrower than the prior on $\beta_j$, indicating that a smaller effect of the $\alpha$ factor is expected than of the $\beta$ factor. In addition, the prior on the interaction terms, $\pi_{ij}$, is the narrowest, indicating that the prior belief is that an interaction, if there is one, will be small.

If you have a model like this, you should be able to use this package fairly easily. Rather than using cell means for your analysis, you can directly use your estimates of the effect parameters, $\alpha$, $\beta$, and $\pi$. To test hypotheses about the existence of effects, you can pass the prior and posterior effect parameters to testFunction_SDDR or testFunction_EPA. This should make more sense once you read farther.
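As a preview, here is a hypothetical sketch of such a test. The number of levels, the cornerstone constraint on $\alpha_1$, and the posterior values are all made up for illustration; testFunction_SDDR is demonstrated in detail later in this document.

library(CMBBHT)

# Hypothetical prior and posterior samples of the alpha parameters for a
# three-level factor. With a cornerstone constraint, alpha_1 = 0 and the other
# alphas have Normal(0, 2) priors (mean/variance parameterization).
n = 10000
prior_alpha = cbind(0, matrix(rnorm(n * 2, 0, sqrt(2)), ncol = 2))

# Invented posterior samples in which the alphas clearly differ.
post_alpha = cbind(0, rnorm(n, 1.5, 0.3), rnorm(n, -0.5, 0.3))

# Test for a main effect of the alpha factor.
testFunction_SDDR(prior_alpha, post_alpha)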

Note that like Model 2, you must provide some constraints on the effect parameters in order for this model to be identifiable. This gets complicated when you have multiple factors so I won't go into it. I don't typically parameterize my models like this because of the complexity of constraining the effect parameters. In my experience, hierarchical constraint is dead on arrival, so don't bother trying. You need a hard constraint, like a cornerstone or sums-to-zero, not a soft constraint.

Model 4: A Complex Model

For all of the models so far, you might be wondering why I didn't just use the BayesFactor package and do an ANOVA. For all of those models, you should just use the BayesFactor package. Where this package becomes useful is when your model is complex and there is a fair degree of separation between the parameters and the data. An example of such a model was used by @HardmanVergauweRicker2017. In that model, the data are distributed as a mixture of various distributions. If we are interested in a parameter $\phi$, a version of the model that leaves a lot out is something like

$$ \begin{aligned} p(y_{ij} | \phi_{ij}, \theta) &= \sum_k p_k f_k( \phi_{ij}, \theta ) \\ \phi_{ij} &= T(\phi_i + \phi_j) \\ \phi_i &\sim \text{Normal}(\mu_\phi, \sigma^2_\phi) \\ \phi_j &\sim \text{Cauchy}(0, 2) \end{aligned} $$

Where $y_{ij}$ are the data for the $i$th participant in the $j$th task condition, $\phi_{ij}$ is a parameter of interest, $\theta$ are a bunch of other parameters, and $p_k$ and $f_k$ are mixture weights and density functions for the $k$th component of the model. $\phi_{ij}$, unfortunately, is some nonlinear transformation, $T()$, of the sum of a participant effect, $\phi_i$, and an effect of task condition, $\phi_j$.

Inference is focused on the effect of task condition, $\phi_j$. That is where this package shines: It can test whether $\phi_j$ varies with task condition without worrying about the whole structure of the model. All it needs are the prior and posterior samples of $\phi_j$. Models like Model 4 are why I wrote this package.

Basic Usage

To demonstrate how to use this package while limiting complexity, I will work with a model like Model 2. In Model 2, differences from a cornerstone condition are estimated with cell effect parameters, $\delta_i$. I will use a design with two factors, called let and num. The let factor has three levels, a, b, and c. The num factor has two levels, 1 and 2. The design is fully crossed.

One of the things that you will need in order to use this package is a data.frame that contains the factor levels. Each column is a factor and each row contains a combination of factor levels that define a cell of the design. Note that factor names and factor levels may not contain period (".") or colon (":"). The following code creates the factors data.frame for future use.

factors = data.frame(
    let = c('a', 'a', 'b', 'b', 'c', 'c'),
    num = c('1', '2', '1', '2', '1', '2')
)

M2_sampleDeltas = function(mus, posteriorSD = 1, priorMu = 0, priorSD = 3, cornerstone = 1, samples = 10000) {

    mus = mus - mus[cornerstone]

    priorDelta = postDelta = matrix(0, nrow = samples, ncol=length(mus))

    for (i in 1:ncol(postDelta)) {
        if (i != cornerstone) {
            priorDelta[,i] = rnorm(nrow(priorDelta), priorMu, priorSD)
            postDelta[,i] = rnorm(nrow(postDelta), mus[i], posteriorSD)
        }
    }

    list(prior = priorDelta, post = postDelta)
}

Imagine that the cell means are 2 through 7 and that M2_sampleDeltas takes samples from the prior and posterior distributions of the $\delta_i$ parameters.

# Corresponds to the rows of factors
mus = c( 2,3,   4,5,   6,7 )

set.seed(143)
deltas = M2_sampleDeltas(mus)

deltas contains two matrices, prior and post. Each has 6 columns, one for each $\delta$ parameter, and r nrow(deltas$post) rows, each row being a sample from the prior or posterior. Note that the order of the columns of the cell effects must correspond to the rows of factors.

names(deltas)
dim(deltas$post)

A plot of the cell effect parameters with 95\% credible intervals will help with understanding what effects are present in the data.

library(LineChart)

#Put the data into a data.frame for plotting
cmdf = NULL
for (i in 1:ncol(deltas$post)) {
    temp = data.frame(let = factors$let[i], num = factors$num[i], delta = deltas$post[,i])
    cmdf = rbind(cmdf, temp)
}

credInt = function(x) {
    list(eb = quantile(x, c(0.025, 0.975)),
             includesCenter = TRUE)
}

lineChart(delta ~ let * num, cmdf, errBarType = credInt)

From looking at this plot, it seems like:

  1. There is a main effect of letters.
  2. There may or may not be a main effect of numbers.
  3. The lines are essentially parallel, so there does not appear to be an interaction.

Let's test these hypotheses. First, the main effect of letters. To test this hypothesis, pass the prior and posterior cell means, the factors data.frame, and the effect to test, "let", to testHypothesis.

library(CMBBHT)

testHypothesis(deltas$prior, deltas$post, factors, "let")

With default settings, this returns a list with results from the test. As long as success is TRUE, we can examine bf10, which is the Bayes factor in favor of the hypothesis that there is a main effect of the let factor. This Bayes factor is far above 1, so there is evidence for a main effect of let.

The test of the main effect of num proceeds in the same way.

ht = testHypothesis(deltas$prior, deltas$post, factors, "num")
ht$bf10

This Bayes factor is ambiguous as it is near 1. This fits with how the figure looks.

For the interaction, you provide the names of the factors in the interaction with a colon (":") between factor names.

ht = testHypothesis(deltas$prior, deltas$post, factors, "let:num")
ht$bf10

This Bayes factor favors the null as it is well below 1, so there is no interaction. Again, this fits with the figure.

This concludes basic use of the package. With just a little setup and one function call, you are able to test hypotheses about main effects or interactions. This package, however, has more to offer, as I describe in the following sections.

For an in-depth tutorial that gives a concrete example of model specification and fitting, and generally provides more information about how to use this package in practice, see the BinomialTutorial vignette that comes with this package and is discussed further in the Additional Documentation section of this document.

Steps of the procedure

This section goes step-by-step through what needs to happen to perform a hypothesis test using the logic of this package. The point of this section is to help you understand how this package works. You will probably not need to understand this logic to use this package, but you might benefit from reading it anyway.

I am going to perform a test of the hypothesis that there is a main effect of the let factor for the same data that was just being used. The null hypothesis is that there is no main effect and the alternative hypothesis is that there is a main effect. If there is no main effect, there should be no difference between the levels of the let factor.

Effect Parameters

You can calculate matrices of effect parameters given cell means/differences and the factors data.frame with the function getEffectParameters. Here I calculate the posterior effects for the main effect of the let factor. A number of steps happen in getEffectParameters, but those steps are not important to understanding the concept of how the hypothesis testing procedure works, so I will skip them. The process of calculating effect parameters is mechanical and not controversial.

post_eff = getEffectParameters(deltas$post, factors, "let")
post_eff[1:5,]

Each column of post_eff corresponds to one ANOVA effect parameter. These effect parameters were calculated with a sums-to-zero constraint, which can be verified by observing that each row (sample) sums to 0. These effect parameters can be interpreted as the difference from the grand mean to each cell mean. From the plot of cell means above, it is clear that the a level of let has a lower value than average and that the c level of let has a higher value than average. This is captured by the effect parameters in post_eff.
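As a quick numerical check of the sums-to-zero constraint, the row sums of post_eff should all be (numerically) zero.

range(rowSums(post_eff))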

Again, if there is no main effect, there should be no difference between the levels of the let factor and the effect parameters should be near 0. One informal way of assessing whether this is the case is to examine the values of the effect parameters.

You can get means and credible intervals for the effect parameters with the summarizeEffectParameters function.

summary = summarizeEffectParameters(post_eff)
summary

You can also plot the summarized effects.

plotEffectParameterSummary(summary)

As you can see, there appears to be a main effect because the effect parameters are reasonably far from one another relative to their credible intervals.

If you don't like Bayes factors as a method of inference, you can still get something out of this package by using summaries of the posterior effect parameters. In addition, you can group posterior effects with pairwise comparisons with groupEffectParameters, which gives results that are conceptually similar to, e.g., the Tukey HSD procedure. By default, groupEffectParameters uses credible intervals of the difference between effect parameters to group the effects. You can alternately set it to use Bayes factors.

groupEffectParameters(post_eff)

Factor levels that share a grouping letter are indistinguishable. That none of the factor levels share a letter indicates that they are all distinguishable. Thus, this grouping of effect parameters agrees with the hypothesis test that was done earlier, namely that there is a main effect.

Deviance Measure

Now that we have explored the effect parameters, we can move on to the next step. Again, under the null hypothesis, the effect parameters should be near to one another. Under the alternative hypothesis, the effect parameters may be more spread out. Thus, we might be able to make use of some measure of the deviance/spread of the effect parameters. The default deviance measure used by this package is the sample variance, which I have found to work well (the sample and population variance differ only by a constant scale factor, which does not matter in this case).

If you have a matrix of effect parameters, calculating the deviance, $D$, is very easy. Just apply the deviance function, in this case var, to each sample (row) from the effect parameters in post_eff.

# Calculate vector of deviances
post_D = apply(post_eff, 1, var)

We can do the same for the prior effects and plot the prior and posterior $D$s.

prior_eff = getEffectParameters(deltas$prior, factors, "let")

prior_D = apply(prior_eff, 1, var)

#Little helper function
estimateDDens = function(d, xs) {
    ls = polspline::logspline(d, lbound=0)
    polspline::dlogspline(xs, ls)
}

xs = seq(0, max(post_D), length.out=100)

plot(xs, estimateDDens(post_D, xs), type='l', ylim=c(0, 0.3),
         xlab="D", ylab="Density")
lines(xs, estimateDDens(prior_D, xs), col="green")
legend("topright", legend = c("prior", "post"), lty=1, col=c("green", "black"))

As you can see from the plot, there is much more prior density at 0 than posterior density at 0.

Under the null hypothesis, the effect parameters are close to one another and $D$ is consequently near 0. If there is more posterior density at 0 than prior density, it suggests that in the posterior the effect parameters are more clustered than the prior effect parameters, which is evidence in favor of the null hypothesis. If there is less posterior density at 0 than prior density, it suggests that the posterior effect parameters are more spread out than in the prior, which is evidence in favor of the alternative hypothesis. This is the basic logic of the Savage-Dickey density ratio, which will be discussed further.

Priors on Functions of Parameters

The deviance parameter, $D$, is a function of the effect parameters. An important piece of logic that allows the prior on $D$ to be known is that the prior distribution of a function of model parameters is obtained by applying that function to the prior distribution of the parameters.

For example, imagine that a model has two parameters, $\alpha$ and $\beta$ with the following priors. The normal distributions are parameterized in terms of mean and variance.

$$ \begin{aligned} \alpha &\sim \text{Normal}(3, 4) \\ \beta &\sim \text{Normal}(7, 2) \end{aligned} $$

Imagine that there is a parameter $\gamma$ that is equal to the sum of $\alpha$ and $\beta$. I define the function $F$ for notational purposes.

$$ \begin{aligned} F(x, y) &= x + y \\ \gamma &= F(\alpha, \beta) \end{aligned} $$ In the prior, $\alpha$ and $\beta$ are independent. It is known analytically that the sum of two independent normal random variables is a normal random variable with mean equal to the sum of the means and variance equal to the sum of the variances. Thus, the prior on $\gamma$ is

$$ \gamma \sim \text{Normal}(10, 6) $$ This basic logic holds for any function $F$ (including $F$ of any number of arguments), which is the critical logic that allows for the prior on the deviance parameter $D$ to be known. In practical terms, as long as the same function is applied to samples from the prior and posterior to calculate some dummy parameter, the prior for that dummy parameter will be correct. It is not necessary for the prior to be known analytically.
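The following small simulation (an illustration, not package code) checks this logic for the example above: applying $F$ to samples from the priors on $\alpha$ and $\beta$ gives samples whose mean and variance match the analytical Normal(10, 6) prior on $\gamma$.

n = 100000
alpha = rnorm(n, 3, sqrt(4)) # Normal(3, 4), parameterized as mean and variance
beta = rnorm(n, 7, sqrt(2))  # Normal(7, 2)
gamma = alpha + beta         # F applied to the prior samples

c(mean(gamma), var(gamma))   # should be near 10 and 6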

Bayes Factor Estimation with Savage-Dickey

By default, CMBBHT estimates Bayes factors using the Savage-Dickey Density Ratio (SDDR) as suggested by @WagenmakersEtAl2010. The SDDR is used to test point hypotheses about parameter values. The Bayes factor in favor of the hypothesis that some parameter $\phi = 0$ is equal, under certain conditions, to the ratio of the posterior density of $\phi$ at 0 to the prior density of $\phi$ at 0.

There are some limitations for the use of the SDDR which are discussed by @WagenmakersEtAl2010 so I will not reiterate them here. The SDDR is fairly general, so it is likely fine to use for your use case.

The following code snippet calculates the SDDR for the deviance parameter, $D$.

library(polspline)

# Estimate prior density at 0
prior_ls = logspline(prior_D, lbound=0)
prior_dens = dlogspline(0, prior_ls)

# Estimate posterior density at 0
post_ls = logspline(post_D, lbound=0)
post_dens = dlogspline(0, post_ls)

# Calculate Bayes factors with the SDDR
bf_01 = post_dens / prior_dens
bf_10 = 1 / bf_01

bf_01 # BF that D = 0 and there is no effect
bf_10 # BF that D =/= 0 and there is an effect

The last few steps are done by the following function call (along with some extra steps to make it more robust when using priors with thick tails, like the Cauchy):

testFunction_SDDR(prior_eff, post_eff)

See the Problems with Density Estimation for SDDR section for issues that can arise with the default density estimation approach and some possible solutions.

Conclusion

If you read this section, you now have an idea about the basic logic of how this package works. The steps shown above are all performed by the following function call.

ht = testHypothesis(deltas$prior, deltas$post, factors, "let")
ht$bf10

Notes

Citing this Package

The main logic and mathematics of the hypothesis testing procedure used by this package was published in the Appendix of @RickerHardman2017, a full reference for which can be found in the references section at the end of this document.

If you use this package in your research, please cite it. An APA citation for this package is:

Hardman, K.O. (2017). CMBBHT: Cell Means Based Bayesian Hypothesis Tests (Version r packageVersion("CMBBHT")) [Computer software]. Retrieved from https://github.com/hardmanko/CMBBHT/releases/tag/vr packageVersion("CMBBHT")

Make sure that the version number is correct for the version that you used in both the name and the URL.

Effect of Uninformative Priors {#uninformativePrior}

This section uses Model 1, the basic cell means model. You can use the cell means for some analyses, but priors on the cell means like those in Model 1 are problematic. The issue is that when you directly estimate cell means, a fully general model should have fairly uninformative priors on the cell means in order to deal with the fact that the model does not know the approximate magnitude of the cell means. In the specification of Model 1, the prior variance on $\mu_i$ was set to 10,000 to account for the fact that $\mu_i$ can have very different magnitudes depending on the data. This is a very uninformative prior.

To get useful Bayes factors, you should not have uninformative priors on the quantities of interest (cell means or cell effects). The logic goes as follows: An uninformative prior on cell-mean-like parameters is typically some kind of very wide almost uniform distribution. This puts excessive mass on values that are profoundly unlikely. The posterior is very unlikely to be as extreme (in aggregate) as the prior. An important step in this method is to measure the dispersion, $D$, of the prior and posterior effect parameters. If the prior is extremely dispersed, the posterior is likely to be less dispersed than the prior. A hypothesis test might conclude that the posterior is less dispersed than the prior, which is evidence that there is no effect. Thus, uninformative priors result in a situation in which there is a bias toward finding evidence in favor of the null hypothesis.

If you have prior knowledge about the approximate magnitude of the cell means, you do not need particularly uninformative priors (if you do know the approximate magnitude, this is a legitimate use of prior knowledge and you should absolutely use that information). By setting the prior means to approximately the magnitude of the cell means, you can set the prior variances to moderately informative values without sacrificing the accuracy of parameter estimation.

The code below demonstrates what happens with more and less informative priors on cell means. It uses a case where the means of the 4 cells of the design are 0, 1, 2, and 3.

ex1_getCMs = function(mus, posteriorSD = 1, priorMu = 0, priorSD = 3, samples = 1e5) {

    if (length(priorMu) == 1) {
        priorMu = rep(priorMu, length(mus))
    }

    priorCMs = postCMs = matrix(nrow = samples, ncol=length(mus))

    for (i in 1:ncol(postCMs)) {
        priorCMs[,i] = rnorm(nrow(priorCMs), priorMu[i], priorSD)
        postCMs[,i] = rnorm(nrow(postCMs), mus[i], posteriorSD)
    }

    list(prior = priorCMs, post = postCMs)
}


fact = data.frame(f = 1:4)
mus = c( 0, 1, 2, 3 )

cm1 = ex1_getCMs(mus, priorSD = 5)

# Get effect parameters and calculate D
pf_5 = getEffectParameters(cm1$prior, fact, "f")
prior_D_5 = apply(pf_5, 1, var)


cm1 = ex1_getCMs(mus, priorSD = 20)

pf_20 = getEffectParameters(cm1$prior, fact, "f")
prior_D_20 = apply(pf_20, 1, var)
xs = seq(0, 100, 0.1)

plot(xs, estimateDDens(prior_D_5, xs), type='l')
lines(xs, estimateDDens(prior_D_20, xs), col="green")

dens5 = estimateDDens(prior_D_5, 0)
dens20 = estimateDDens(prior_D_20, 0)

points(0, dens5, pch=16)
points(0, dens20, pch=16, col="green")

legend("topright", legend = c("SD = 5", "SD = 20"), lty=1, col=c("black", "green"))

A prior SD of 5 is black and a prior SD of 20 is green. As you can see, changing the informativeness of the prior has substantially changed the distribution of $D$. It has especially affected the density at 0, with the ratio dens5 / dens20 = r round(dens5 / dens20, 3). When the prior SD is very high, the prior density at 0 is tiny. Even if the posterior distribution of $D$ is not at 0, it is easily possible for the posterior density of $D$ at 0 to be higher than the prior density of $D$ at 0 when the prior SD is high. If the prior density at 0 is less than the posterior density, that is evidence in favor of the null hypothesis under the logic of the SDDR.

One additional unfortunate effect of lowered density at 0 is that accurate estimation of that density is more difficult as there is less local data on which to base the density estimate.

Adding Effects to Priors

Note that if you do set the priors on cell means to be moderately informative, you should use the same location/mean for the prior on each cell mean. The reason is that if you "guess" that there is a main effect or interaction by using different locations for different cells, then you are effectively building that effect into the prior. Thus, the null hypothesis would be not that there is no effect, but that there is a specific effect.

The following code shows what happens when you build an effect into the prior. There is a main effect, but it is the same main effect as in the prior, so the evidence is in favor of the null.

set.seed(543)
cm1_h1_builtIn = ex1_getCMs(mus, priorMu = mus, priorSD = 3, posteriorSD = 1)
ht = testHypothesis(cm1_h1_builtIn$prior, cm1_h1_builtIn$post, fact, "f")
ht$bf01

Due to the issues with the basic cell means model, I prefer a model like Model 2. Even then, you still need to use reasonably informative priors on the cell effects, $\delta_i$.

Hierarchical Models

One example of a case that is a little complicated is when a parameter has an estimated prior, as in a hierarchical model. For a model like Model 2, it seems reasonable to place a hierarchical prior on the cell effects ($\delta$s in that model) to constrain them. For example, imagine that the prior on $\delta_i$ is changed as follows: $$ \begin{aligned} \delta_i &\sim \text{Normal}(0, \sigma^2_\delta) \\ \sigma^2_\delta &\sim \text{Inverse Gamma}(6, 10) \end{aligned} $$ There are two things to be aware of with this setup:

First, the prior on $\delta_i$ is based on prior values of $\sigma^2_\delta$, not posterior values of $\sigma^2_\delta$. To sample from the prior on $\delta_i$, you first sample from the prior on $\sigma^2_\delta$ and then sample from the normal prior on $\delta_i$ given the prior samples of $\sigma^2_\delta$. This is illustrated in the following code snippet.

# Sample from the prior on s2_delta
prior_s2_delta = MCMCpack::rinvgamma(1000, 6, 10)

# Sample from the prior on delta_i
prior_delta_i = rnorm(1000, 0, sqrt(prior_s2_delta))

Secondly, the prior on $\sigma^2_\delta$ should be such that the normal prior on $\delta_i$ is reasonably informative. The choices of 6 and 10 for the parameters of the inverse gamma result in a mean $\sigma^2_\delta$ of 10 / (6 - 1) = 2 and fairly low variance, as can be seen in the left histogram below.

par(mfrow=c(1,2), mar=c(4, 4, 0.5, 0.5))
hist(prior_s2_delta, main="", prob=TRUE)
hist(prior_delta_i, main="", prob=TRUE)

As a result, the prior on $\delta_i$, on the right, is moderately informative. As was discussed, uninformative priors are problematic for obtaining useful Bayes factors using this method. You should try to choose reasonably informative priors even in hierarchical settings.
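As a quick numerical complement to the histograms, the same prior samples can be summarized directly to check how informative the implied prior on $\delta_i$ is.

mean(prior_s2_delta) # should be near 10 / (6 - 1) = 2
sd(prior_delta_i)    # implied marginal prior standard deviation of delta_i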

Encompassing Prior Approach {#encompassingPrior}

@WetzelsGrasmanWagenmakers2010 suggest a different method of estimating the Bayes factor than the Savage-Dickey Density Ratio. They call it the encompassing prior approach (EPA). Essentially, it works nicely for interval tests and other non-point null hypotheses (the SDDR tests point hypotheses). Intervals can be used to test directional hypotheses about parameters. In addition, the encompassing prior approach does not require point density estimation, which can be computationally advantageous.

Imagine that you have two hypotheses about effect parameters, $\alpha$s, that you want to compare. The $\alpha$s are related to some effect that you want to test and can be calculated by the getEffectParameters function. One model, the encompassing model, states that the $\alpha$s are free to vary. The other model places a constraint on the $\alpha$s, that they are near 0, with some interval size set by $\epsilon$.

$$ \begin{aligned} M_1&: \alpha_i \\ M_0&: | \alpha_i | < \epsilon \end{aligned} $$

I will use different notation than @WetzelsGrasmanWagenmakers2010 and notate the encompassing/alternative model as $M_1$ and the constrained/null model as $M_0$.

The gist of the EPA is that, under certain circumstances, the Bayes factor in favor of the null model ($M_0$) over the alternative model ($M_1$) can be expressed as the proportion of the posterior of $M_1$ that satisfies the constraint of $M_0$ divided by the proportion of the prior of $M_1$ that satisfies the constraint of $M_0$. You only need to fit $M_1$ and obtain prior and posterior samples from it. Then you retroactively apply the $M_0$ constraint to the prior and posterior of $M_1$ and see what proportion of each satisfies that constraint.

This idea is expressed in Equation 2 of @WetzelsGrasmanWagenmakers2010, which is reproduced below with some adjustments and simplifications.

$$ BF_{01} = \frac{ \frac{1}{m} \sum_{j=1}^m I_{M0}( \vec{\alpha}^{(j)} | D, M_1) }{ \frac{1}{n} \sum_{k=1}^n I_{M0}( \vec{\alpha}^{(k)} | M_1) } $$

The numerator contains posterior $\alpha$s (indicated with $| D$ meaning "given data") and the denominator contains prior $\alpha$s. $\vec{\alpha}^{(j)}$ is the $j$th vector of posterior effect parameters (i.e. one posterior sample of all of the $\alpha$s) while $\vec{\alpha}^{(k)}$ is the $k$th vector of prior effect parameters. $I_{M0}$ is an indicator function that enforces the constraint(s) of the null model, $M_0$. It returns 1 (true) if the constraints of $M_0$ hold or 0 (false) if the constraints of $M_0$ do not hold.

To make this concrete, let's set $\epsilon$ to 1.5, which makes $M_0$ be that all of the $\alpha$s must be within 1.5 units of 0.

# Get prior and posterior effect parameters for some effect
prior_eff = getEffectParameters(deltas$prior, factors, "let")
post_eff = getEffectParameters(deltas$post, factors, "let")

# The indicator function takes a vector of effect parameters and returns
# TRUE (numeric 1) if all absolute values are less than 1.5 and FALSE (0) otherwise
I_M0 = function(alpha_vect) {
    all(abs(alpha_vect) < 1.5)
}

# Calculate the numerator: The proportion of posterior samples satisfying I_M0
post_I = apply(post_eff, 1, I_M0)
numerator = mean(post_I)

# Calculate the denominator: The proportion of prior samples satisfying I_M0
prior_I = apply(prior_eff, 1, I_M0)
denominator = mean(prior_I)

bf_01 = numerator / denominator #In favor of the constrained hypothesis/model
bf_10 = 1 / bf_01 # In favor of the encompassing hypothesis/model

The logic of the EPA is very much like the SDDR. The posterior proportion satisfying $M_0$ is numerator = r numerator and the prior proportion satisfying $M_0$ is denominator = r denominator. Thus, the prior places far more mass on the $\alpha$s being near 0 than the posterior does. As a result, the test finds evidence in favor of the encompassing model, $M_1$, which is evidence against the constraint imposed by $M_0$. The Bayes factor in favor of $M_1$ is bf_10 = r bf_10.

EPA and SDDR are so similar, in fact, that @WetzelsGrasmanWagenmakers2010 show that as $\epsilon \rightarrow 0$, the encompassing prior approach reduces to the SDDR (see Figure 3 of that article). Note that for small $\epsilon$, the number of prior and posterior samples from $M_1$ that meet the constraints of $M_0$ becomes small. As a result, the estimated Bayes factor becomes noisy. As such, the SDDR is typically preferable to the encompassing prior approach for very narrow intervals.
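The following short illustration (using the prior_eff and post_eff matrices from above) shows how the proportions of samples satisfying the constraint shrink as $\epsilon$ shrinks; with very few satisfying samples, the ratio of the two proportions is estimated with considerable noise.

for (eps in c(1.5, 0.5, 0.1)) {
    I_eps = function(alpha_vect) {
        all(abs(alpha_vect) < eps)
    }
    cat("epsilon =", eps,
        " posterior proportion =", mean(apply(post_eff, 1, I_eps)),
        " prior proportion =", mean(apply(prior_eff, 1, I_eps)), "\n")
}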

On the other hand, for the encompassing prior approach, we did not need to calculate the deviance, $D$, of the effect parameters. We also did not need to estimate the density of $D$ at 0, which is often fairly low, particularly for uninformative priors, given that $D$ can only be 0 if all $\alpha$s are 0.

The EPA also differs from the SDDR in that the SDDR tests a point null, while the EPA tests interval nulls. This has both advantages and disadvantages. The advantage is that the EPA allows for tests of interval nulls. The disadvantage is that selecting the width and/or location of the interval is subjective and can strongly influence the results. Point nulls do not require selection of an interval width.

The EPA can be used with testFunction_EPA as follows:

ht = testFunction_EPA(prior_eff, post_eff, I_M0)
ht$bf10

If you want to pass an encompassing prior test function to testHypothesis, you will need to make a curried function in which the indicator function is passed to testFunction_EPA.

curriedTestFun = function(priorEffects, postEffects) {
    testFunction_EPA(priorEffects, postEffects, I_M0)
}

ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction=curriedTestFun)
ht$bf10

We can make a different constrained model just by changing the indicator function. In this case, $\epsilon$ is reduced to 1, which makes the constrained model more constrained.

narrower = function(priorEffects, postEffects) {
    I_fun = function(alpha_vect) {
        all(abs(alpha_vect) < 1)
    }
    testFunction_EPA(priorEffects, postEffects, I_fun)
}

ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction=narrower)
ht$bf10

If you want to test if the effect parameters are in an interval, which is common, there is a helper function called create_EPA_intervalTF. This tests the null hypothesis that all of the effect parameters are between -1 and 1, which is equivalent to testing that their absolute values are less than 1, so it should produce the same result as the previous test.

intervalTf = create_EPA_intervalTF(-1, 1)
ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction=intervalTf)
ht$bf10

We can also use a different kind of constraint on the null model. Rather than making an interval around 0, we could instead use a measure of the deviance of the $\alpha$s, which is conceptually more like the SDDR approach.

smallSd = function(priorEffects, postEffects) {
    I_fun = function(alpha_vect) {
        sd(alpha_vect) < 1
    }
    testFunction_EPA(priorEffects, postEffects, I_fun)
}

ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction=smallSd)
ht$bf10

Note that using the SDDR test function produces different, if still substantively equivalent, results.

ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction=testFunction_SDDR)
ht$bf10

The major downside of the EPA is that specifics about the intervals used in null hypotheses substantially affect the resulting Bayes factors, as was shown in these examples. It is possible to create null hypotheses that cover none of the posterior, which results in infinite Bayes factors against the null even if most of the posterior mass is just outside of the interval. The SDDR, as used in this package, does not have these drawbacks, so it is the sensible default. In cases where specialized null hypotheses are required, however, the EPA can be useful.

Testing the Intercept/Grand Mean

If you want to test the value of the intercept of the model, use the testIntercept function. Note that this tests the intercept of whatever cell means you provide to it. Thus, if you provide cell effects like from Model 2, you are testing the intercept of those cell effects, not of the actual cell means.

The value of the intercept depends on the type of contrasts that you use. For orthogonal contrasts, the intercept is the grand mean, but it has a different interpretation for, e.g., treatment contrasts (contr.treatment). In particular, when treatment contrasts are used, the intercept is equal to the value of the cell defined by the combination of the first level of each of the factors in the design.
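As a small illustration of this point (with hypothetical cell means, not package output), the intercept under treatment contrasts can be computed directly from a design matrix and seen to equal the mean of the cell formed by the first level of each factor.

# Hypothetical 2 x 2 design with made-up cell means.
f2 = data.frame(g = c('a', 'a', 'b', 'b'), h = c('1', '2', '1', '2'))
m2 = c(10, 12, 14, 16)

X = model.matrix( ~ g * h, f2,
  contrasts.arg = list(g = "contr.treatment", h = "contr.treatment"))
beta = solve(t(X) %*% X) %*% t(X) %*% m2

beta["(Intercept)", ] # equals 10, the mean of cell (g = a, h = 1)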

You can test, for example, that the intercept is equal to 2.5.

testIntercept(deltas$prior, deltas$post, factors, testVal = 2.5)

Rather than testing a point hypothesis about the intercept, you can also use the encompassing prior approach to test interval hypotheses. Below, I define a test function that tests an interval 0.5 units around the testVal.

# You don't have to use testVal, but I do here to define the center of the interval.
intTestFun = function(prior_int, post_int, testVal) {
    I_int = function(int) {
        ((testVal - 0.5) < int) && (int < (testVal + 0.5))
    }
    testFunction_EPA(prior_int, post_int, I_int)
}

ht = testIntercept(deltas$prior, deltas$post, factors, testVal = 2.5, 
                            testFunction = intTestFun)
ht$bf01

You can get the intercept with getEffectParameters by providing the special value "(Intercept)" for the testedFactors argument.

prior_int = getEffectParameters(deltas$prior, factors, "(Intercept)")
head(prior_int)

Problems with Density Estimation for SDDR {#densityEstimationProblems}

When using the SDDR, you must estimate the density of the prior and posterior at a point. This can be problematic for a number of reasons. One problem arises when the prior has thick tails, like the Cauchy distribution. We will work with a model like Model 2, but with the following prior on the cell effects, $\delta_i$.

$$ \delta_i \sim \text{Cauchy}(0, 3) $$

M2_sampleDeltas_cauchy = function(mus, posteriorSD = 1, priorLoc = 0, priorScale = 3, cornerstone = 1, samples = 10000) {

    mus = mus - mus[cornerstone]

    priorDelta = postDelta = matrix(0, nrow = samples, ncol=length(mus))

    for (i in 1:ncol(postDelta)) {
        if (!(i %in% cornerstone)) {
            priorDelta[,i] = rcauchy(nrow(priorDelta), priorLoc, priorScale)
            postDelta[,i] = rnorm(nrow(postDelta), mus[i], posteriorSD)
        }
    }

    list(prior = priorDelta, post = postDelta)
}

mus = 2:7
deltas = M2_sampleDeltas_cauchy(mus) # Sample from the prior and posterior of the deltas

prior_eff = getEffectParameters(deltas$prior, factors, "let")
prior_D = apply(prior_eff, 1, var)

post_eff = getEffectParameters(deltas$post, factors, "let")
post_D = apply(post_eff, 1, var)

library(polspline)

# Estimating the posterior density is usually easy.
post_ls = logspline(post_D, lbound=0)
post_dens = dlogspline(0, post_ls)

# The prior density is harder when using Cauchy priors, 
# because you get really large values.
# Strip off the values larger than 10 times the largest 
# posterior sample.

prior_max = max(post_D) * 10
prior_kept = prior_D < prior_max

prior_ls = logspline(prior_D[ prior_kept ], lbound=0, ubound=prior_max)
prior_dens = dlogspline(0, prior_ls)

# Adjust the density to account for the removed prior samples.
prior_dens = prior_dens * mean(prior_kept)

# Calculate Bayes factors with the SDDR
bf_01 = post_dens / prior_dens
bf_10 = 1 / bf_01

bf_01 # BF that D = 0 and there is no effect
bf_10 # BF that D =/= 0 and there is an effect

One thing that is important to consider is the proportion of the right tail of the prior that gets cut off. The estimate of the prior density depends on how much of the prior is actually used.

printPriorDensity = function(prior_max) {
    prior_ls = logspline(prior_D[ prior_D < prior_max ], lbound=0, ubound=prior_max)
    prior_dens = dlogspline(0, prior_ls)

    cat("density = ")
    cat(prior_dens)
    cat("\npKept = ")
    cat( mean(prior_D < prior_max) )
}

printPriorDensity(max(post_D) * 10^1)
printPriorDensity(max(post_D) * 10^3)
printPriorDensity(max(post_D) * 10^5)

As you can see, as the point at which the prior is cut off (prior_max) changes, the prior density at 0 changes. In addition, pKept changes as expected. Ideally, there would be no cutoff point. Note, however, that once the cutoff gets too large, the density estimate is no longer realistic (Inf). What if the upper cutoff is left off entirely?

tryCatch({
    logspline(prior_D, lbound=0)
}, error = function(e) {
    print(e)
})

Density estimation errors out. It is possible that a density estimation procedure specialized for estimating density at the edge of a distribution would work better, but the polspline package works pretty well for the most part.

This package also provides functionality to estimate the Bayes factor with the encompassing prior approach, which can avoid some of the problems with density estimation.

Modifying Test Functions

Depending on how problematic your prior and posterior distributions are, it may be difficult for logspline to estimate the prior and posterior densities. You may end up needing to estimate the densities in some other way. However, you may just need to change some of the default settings of the test function.

There are a few arguments of testFunction_SDDR that are related to some of the issues discussed in the previous section. These include postMaxMult, which controls where the prior is cut off relative to the maximum of the posterior; min_pKept, which controls how much of the prior must be kept; and truncatePosterior, which cuts off the same proportion of the posterior as was cut from the prior.

You can modify the default values for the arguments of testFunction_SDDR by creating a curried function and passing that function to the testFunction argument of testHypothesis.

curriedTestFunction = function(priorEffects, postEffects) {
    testFunction_SDDR(priorEffects, postEffects, postMaxMult = 3)
}

ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction = curriedTestFunction)
ht$bf10

If you need to write your own test function, just know that it takes two matrices of effect parameters. The first is the prior and the second is the posterior.
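For example, the following sketch of a custom test function applies the SDDR logic from earlier in this document, but uses the range of the effect parameters, rather than their variance, as the deviance measure. The range-based measure and the elements returned beyond bf10 are my own choices for illustration, not part of the package.

rangeSDDR = function(priorEffects, postEffects) {
    # Range of the effect parameters in each sample as the deviance measure.
    prior_D = apply(priorEffects, 1, function(x) diff(range(x)))
    post_D = apply(postEffects, 1, function(x) diff(range(x)))

    # Posterior density at 0.
    post_dens = polspline::dlogspline(0, polspline::logspline(post_D, lbound=0))

    # Truncate the right tail of the prior, as in the earlier SDDR example,
    # so thick-tailed priors do not break density estimation, then rescale
    # the density by the proportion of prior samples kept.
    prior_max = max(post_D) * 10
    prior_kept = prior_D < prior_max
    prior_ls = polspline::logspline(prior_D[ prior_kept ], lbound=0, ubound=prior_max)
    prior_dens = polspline::dlogspline(0, prior_ls) * mean(prior_kept)

    bf01 = post_dens / prior_dens
    list(bf01 = bf01, bf10 = 1 / bf01)
}

ht = testHypothesis(deltas$prior, deltas$post, factors, "let", testFunction = rangeSDDR)
ht$bf10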

See also the Encompassing Prior section for more examples of curried test functions.

Non-Fully-Crossed/Unbalanced Designs {#unbalancedDesigns}

In short, designs that are not fully crossed (unbalanced) are supported by default, in the sense that the package does something that is correct. When dealing with unbalanced designs, however, there may be more than one correct thing to do depending on specific details about what hypothesis you want to test. You may want to do something other than what the package does by default.

This section is an exploration of why non-fully-crossed designs are weird to analyze. It concludes with some information about how to control what this package does with unbalanced designs.

Consider the following non-fully-crossed (unbalanced) design, in which there are two factors (let and num), each with three levels.

factors = data.frame(
    let = c('a', 'b', 'b', 'c', 'c', 'c'),
    num = c('1', '1', '2', '1', '2', '3')
)

mus   = c( 0,   0,   6,   0,   0,   0 )

Let's imagine that for some parameter we get the following pattern of cell means. A "." indicates a missing cell in the design.

| / | 1 | 2 | 3 |
|---|---|---|---|
| a | 0 | . | . |
| b | 0 | 6 | . |
| c | 0 | 0 | 0 |

In this design, you can conceptually think of the following effect parameters: a grand mean or intercept, main effects of the let factor, main effects of the num factor, and interaction terms.

Assume that you want to test the main effect of the let factor. There are two ways we can think about the main effect. The first is to consider the main effect of let without considering the interaction between let and num. The second is to consider the main effect in the context of the interaction. These approaches are equivalent in a fully-crossed design that uses orthogonal contrasts (the package default for fully-crossed designs), but they are not equivalent when the design is not fully crossed, in which case the contrasts cannot be orthogonal.

We will start with the first approach, examining the main effect of let without the interaction. I will do this by creating a design matrix and then solving for the effect parameters given the cell means.

X = model.matrix( ~ let, factors, 
  contrasts.arg = list(let = "contr.treatment"))
X

First, the design matrix, $X$, is created using treatment contrasts, which are valid when the design is not fully crossed. The solution for the vector of effect parameters is $\beta = (X^TX)^{-1}X^T\mu$, where $\mu$ are the cell means.

beta = solve(t(X) %*% X) %*% t(X) %*% mus
zapsmall(beta)

We can see that the parameter corresponding to the b level of let, letb, is 3, while the letc parameter is 0. Given that treatment contrasts were used, the value for leta is 0, i.e. nothing is added to the a level of let beyond the intercept. Thus, the effect parameters correspond to the marginal means of 0, 3, and 0 for a, b, and c, respectively. If we assume very low noise in the data, there appears to be a main effect of let.

Now let's use the second approach and include the interaction in the design and see what happens. Let's make the design matrix, this time including both factors and their interaction.

X = model.matrix( ~ let * num, factors, 
  contrasts.arg = list(
    let = "contr.treatment", 
    num = "contr.treatment"
  )
)
X

This design matrix, however, has too many columns (9) for the number of cells in the design (6), making it not full rank (one consequence of which is that $X^{T}X$ is not invertible, which it needs to be). I'll take advantage of a little trick to drop excess columns.

q = qr(X)
kept = q$pivot[ 1:q$rank ]
X = X[ , kept ]
X

Now $X$ is full rank. You can see from the columns of $X$ that there is one interaction term, letb:num2. This term corresponds to the cell of the design with mean 6. We can calculate the effect parameters as before.

beta = solve(t(X) %*% X) %*% t(X) %*% mus
zapsmall(beta)

Now when we look at the parameter values, we can see that all of the let main effect parameters are 0 because all of the marginal effect of cell b2 is accounted for by the interaction. Thus, there does not appear to be a main effect of let.

Here is a different example.

| / | 1 | 2 | 3 |
|---|---|---|---|
| a | 0 | 2 | 4 |
| b | 0 | . | . |
| c | 0 | . | . |

We can see that there is a marginal main effect of let when nothing else is considered. However, when the main effect of num is included in the design, the parameters associated with the 2 and 3 levels of the num factor will account for the apparent marginal let effect. Thus, main effects can be affected by the inclusion of other main effects.

factors = data.frame(
  let = c('a', 'a', 'a', 'b', 'c'),
  num = c('1', '2', '3', '1', '1')
)
mus   = c( 0,   2,   4,   0,   0)

X = model.matrix( ~ let + num, factors, 
  contrasts.arg = list(
    let = "contr.treatment", 
    num = "contr.treatment"
  )
)

beta = solve(t(X) %*% X) %*% t(X) %*% mus
zapsmall(beta)

When working with non-fully-crossed designs, you have to choose what main effects or interactions to account for. By default, testHypothesis takes the first approach: it only estimates the main effect, not interactions. More generally, it only includes terms up to and including the tested effect. This is conceptually equivalent to what frequentist ANOVA does. For example, if you are testing a two-factor interaction, it will estimate that interaction and the two associated main effects. If there is a third factor, it will not estimate the main effect of that third factor or any interactions including that third factor.

To get more control over what effects are estimated, you can provide the dmFactors argument to testHypothesis or getEffectParameters. Here I use a formula to specify that I want to use a design matrix with both main effects.

getEffectParameters(matrix(mus, nrow=1), factors, "num", dmFactors= "~ let + num")

See also the contrastType argument of testHypothesis and getEffectParameters.

Subsetting Designs

What if you want to use only a subset of the cells in your design? Easy, just exclude the cells you don't want to use. You do this by dropping rows of the factors data.frame and by dropping the same columns of the cell means/effects. The example below will demonstrate this.

Tests of Simple Effects

A case in which you might want to subset a design is if you want to do follow-up tests of simple effects when there is an interaction present in the data.

factors = data.frame(
    let = c('a', 'a', 'b', 'b'),
    num = c('1', '2', '1', '2')
)

# Build in an interaction
mus = c( 1, 5, 3, 3 )

deltas = M2_sampleDeltas(mus)

testHypothesis(deltas$prior, deltas$post, factors, c("let", "num"))$bf10

Now that we know that there is an interaction in the design, we should not interpret main effects. Instead, we should test for main effects of one factor within each level of the other factor to determine where effects are present.

In the following examples, we condition on first the a level of let and then the b level of let. In both cases, the main effect of num is tested.

js = (factors$let == 'a')
testHypothesis(deltas$prior[, js], deltas$post[,js], factors[js,], "num")$bf10

js = (factors$let == 'b')
testHypothesis(deltas$prior[, js], deltas$post[,js], factors[js,], "num")$bf10

We find the expected result: There is a simple effect of num within the a level of let, but no simple effect of num within the b level of let.

Nested Design

You might have a design in which the levels of one factor are nested within the levels of another factor. For example, assume a design in which the a level of the let factor has levels 1 to 4 of the num factor, but level b of the let factor has levels 1 and 2 of the num factor. The design is such that num 1 has a different meaning depending on whether it was paired with let a or b. As such, you must analyze each level of let individually.

factors = data.frame(
    let = c('a', 'a', 'a', 'a', 'b', 'b'),
    num = c('1', '2', '3', '4', '1', '2')
)

mus =   c( 2,   3,   5,   5,   3,   2 )

deltas = M2_sampleDeltas(mus)

# Select cells within the 'a' level of let
cells = (factors$let == 'a')
# Test the main effect of num for those cells
testHypothesis(deltas$prior[, cells], deltas$post[,cells], factors[cells,], "num")$bf10

cells = (factors$let == 'b')
testHypothesis(deltas$prior[, cells], deltas$post[,cells], factors[cells,], "num")$bf10

These results fit with the pattern in the cell means: There is a main effect of num within the a level of let, but no main effect of num within the b level of let.

Appendix of Ricker and Hardman (2017)

What follows is the Appendix of @RickerHardman2017 which is the published description of the basic method used by the CMBBHT package. This appendix was written before I understood the method as well as I do now, so I have added parenthetical notes (Note: blah) where needed. I also made some additional minor changes to improve readability that I did not bother noting.

The appendix:

For the analyses of the parameters estimated from the @HardmanVergauweRicker2017 model, we performed some Bayesian hypothesis tests on main effects and interactions between experimental factors, such as serial position and consolidation time. The technique that we used has not, to our knowledge, been published, is not straightforward, and requires some explanation, which is given here. The technique is fairly general and can be applied to a range of Bayesian models. It is conceptually equivalent to analysis of variance (ANOVA), although the mechanics of the technique are fairly different from ANOVA. The technique only works for fully crossed designs, where you have data in every cell of the design (Note: This turns out to have not been true; designs do not have to be fully crossed). Note that although we used a within-participants design, this hypothesis testing technique does not require a within-participants design, only that the model has parameters that account for differences between conditions. The reader may find it helpful to read the Appendix of Hardman et al. (2017) for information on the specifics of the specification of that model (Note: Or see Model 4 for the gist).

In the Hardman et al. (2017) model, parameter values are allowed to vary between task conditions. For example, the $P^M$ parameter varies between conditions. This is done by estimating, for each condition, a single parameter that is shared by all participants and added to the participant-level parameters, shifting each participant's parameter by the same amount. For example, each participant has their own $P^M$ parameter that reflects their baseline performance, but their effective performance in any condition is based on their own $P^M$ parameter plus the effect of task condition. We call the single, shared parameter for each condition a condition effect. (Note: See the second equation in Model 4, in which $\phi_i$ is the participant parameter and $\phi_j$ is the condition effect.)

The condition effects are differences from a cornerstone condition. Other models may instead estimate cell means or differences from a grand mean. For the purposes of this technique, there is no substantial difference between true cell means, differences from a grand mean, or differences from a cornerstone condition. It is conceptually most simple to think of the condition effects as cell means, so we will do so. (Note: As discussed with Model 1, the downside of using cell means is that the priors are not as easy to work with for cell means as they are for differences from a cornerstone condition.)

This hypothesis testing technique is for fully crossed factorial experimental designs (Note: It also works for designs that are not fully crossed, given appropriate contrasts) and it can be thought of as a kind of ANOVA. It works for experimental designs with any number of factors, but we will use a two-factor design as an example. The example will be a design with two levels of the A factor and three levels of the B factor. You could think of the A factor as consolidation time and the B factor as serial position. A two-factor ANOVA model can be written as

$$ \mu_{ij} = \mu_G + \alpha_i + \beta_j + \pi_{ij} $$ where $\mu_{ij}$ is the cell mean (or, in our case, condition effect) for the $i$th row and $j$th column of the design, $\mu_G$ is the grand mean, $\alpha_i$ is the main effect parameter for the $i$th row, $\beta_j$ is the main effect parameter for the $j$th column, and $\pi_{ij}$ is the interaction parameter for the $i$th row and $j$th column. The main effect and interaction parameters are constructed with sums-to-zero constraints on the rows and columns (Note: Sums-to-zero constraints are not required, but are one of the best choices for balanced designs.). This means that the $\alpha$s and $\beta$s sum to zero, as do the $\pi$s within each row or column. For the sake of clarity, we will use the term ANOVA parameters for $\mu_G$, $\alpha_i$, $\beta_j$, and $\pi_{ij}$.

It is possible to construct a design matrix, $X$, that satisfies the equality $$ \mu = X\theta $$ where $\mu$ is a vector of cell means (or condition effects) and $\theta$ is a vector of ANOVA parameters. Some of the ANOVA parameters are not included in $\theta$ because of the sums to zero constraints. We will choose to drop the first level of each of the factors. As a result, for the 2 by 3 example, $$ \theta = [ \mu_G ~ \alpha_2 ~ \beta_2 ~ \beta_3 ~ \pi_{22} ~ \pi_{23} ]^T $$ where superscript $T$ indicates transpose (i.e., $\theta$ is a column vector). The sums-to-zero constraints are enforced by the way in which $X$ is constructed. This can be done in R (R Core Team, 2015) with the model.matrix function using the contr.sum function for creating sums-to-zero contrasts.
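As a minimal sketch in R, $X$ for the 2 by 3 example could be built as follows. Note that contr.sum implies the last level of each factor rather than the first; which level is implied does not affect the tests.

# Build the design matrix X for a 2 (A) by 3 (B) design with
# sums-to-zero contrasts on both factors.
cells = expand.grid(A = factor(1:2), B = factor(1:3))
X = model.matrix(~ A * B, data = cells,
	contrasts.arg = list(A = "contr.sum", B = "contr.sum"))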

In this hypothesis testing technique, as in standard ANOVA, we have the cell means, $\mu$, and must solve for $\theta$. This is straightforward and results in the solution $$ \theta = (X^TX)^{-1}X^T\mu $$ For the derivation of this equation, see a GLM textbook (e.g., Kutner, Nachtsheim, Neter, & Li, 2004; Ch. 5, p. 199). We will define $$ S = (X^TX)^{-1}X^T $$ where $S$ can be thought of as the matrix that converts cell means into ANOVA parameters.
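Continuing the sketch from above, $S$ is just a line of base R:

# S converts cell means (or condition effects) into ANOVA parameters.
S = solve(t(X) %*% X) %*% t(X)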

In the Hardman et al. (2017) model, each of the condition effect parameters has a prior on it. The priors are independent Cauchy distributions with location 0 and a scale that depends on which model parameter is being used. See Hardman et al. for more information on the default priors (Note: See instead the CatContModel R package documentation for information on the current default priors). The $\theta$ vector contains parameters calculated from the cell means, $\mu$. Thus, the priors on the $\theta$s depend on the priors on the $\mu$s. The priors on the $\mu$s are 0 centered and the Cauchy distribution is symmetrical, which results in the marginal priors on the $\theta$s being 0 centered. It is not necessary to know more about the priors on the $\theta$s for the purpose of this technique.

For the sake of example, we will examine a test of the main effect of the B factor. We would use the 2 parameters in $\theta$ that are related to the main effect of B: $\beta_2$ and $\beta_3$. There are three levels of B, but the sums-to-zero constraints mean that there are only two $\beta$s in $\theta$. We will need $\beta_1$ as well, which we can calculate from $\beta_2$ and $\beta_3$ using the knowledge that the $\beta$s sum to 0. The implied $\beta$, $\beta_1$, is equal to the negative sum of the other betas: $\beta_1 = -(\beta_2 + \beta_3)$. When an interaction is being tested, calculating implied interaction terms is more complex, but simply requires that the same procedure be applied iteratively to rows and columns. (Note: I found a better way to calculate the implied elements in $\theta$, so this paragraph should be thought of as a conceptual example rather than giving specifics about part of the procedure.)

We can calculate the $\beta$s based on both the prior and posterior cell means separately. As discussed, the prior on each of the $\beta$s is 0 centered. As such, the prior belief is that the $\beta$s are dispersed around 0. In the posterior, the $\beta$s may be more or less dispersed around 0 than in the prior. If there is no main effect, we would expect the posterior $\beta$s to be more closely clustered around 0 than the prior $\beta$s. If there is a main effect, we would expect that the posterior $\beta$s would be more dispersed from 0 than in the prior. This logic establishes the basic setup for the hypothesis test, but there are a few more steps.

We will use the Savage-Dickey density ratio as part of this hypothesis testing technique. The Savage-Dickey density ratio can be used to provide the Bayes factor for a test of the hypothesis that a parameter has a given value [@WagenmakersEtAl2010]. It requires the estimation of the density at a point in the prior distribution and at the same point in the posterior distribution. Under the null hypothesis that there is no main effect of the B factor, we want to test that the $\beta$ parameters are all at 0. Thus, we need to estimate the prior and posterior densities at $\mathbf{0}$ (the zero vector, of length equal to the number of $\beta$ parameters). This is complicated by the fact that it is nontrivial to estimate the density of a multidimensional distribution. Thus, to use Savage-Dickey, we must reduce the dimensionality of the problem: Rather than working with the prior and posterior distributions of the $\beta$s, we would like to collapse the $\beta$s down to, ideally, 1 dimension.

The hypothesis we want to test relates to how near to 0 the $\beta$s are. Thus, some measure of the dispersion of the $\beta$s around 0 would be an appropriate way to reduce the dimensionality of the problem. We will call this measure of dispersion $\Delta$. We chose to use the sample variance to measure dispersion, so we define $\Delta = \text{Var}(\beta)$. We chose to use variance because it resulted in relatively stable density estimates when compared with other measures of dispersion that we tried, such as standard deviation or the length of the $\beta$ vector (Note: The length of the vector in space, not the number of elements).
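As a sketch, assume that beta is a matrix with one row per sample (from either the prior or the posterior) and columns holding $\beta_2$ and $\beta_3$. The implied $\beta_1$ and $\Delta$ could then be computed as follows:

# beta: samples x 2 matrix of beta_2 and beta_3 (an assumed input)
beta1 = -(beta[, 1] + beta[, 2]) # implied by the sums-to-zero constraint
allBeta = cbind(beta1, beta)
Delta = apply(allBeta, 1, var) # one dispersion value per sample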

We are interested in testing the hypothesis that all $\beta$s are 0, but using $\Delta$ instead of the $\beta$s. If all of the $\beta$s are 0, then $\Delta = 0$. Thus, to calculate the Savage-Dickey density ratio, we should compare the prior and posterior densities of $\Delta$ at 0. We used the polspline package (Kooperberg, 2015) for R for kernel density estimation as suggested by Wagenmakers et al. (2010).
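As a sketch of this step, assume that priorDelta and postDelta are vectors of $\Delta$ values computed from the prior and posterior samples. Because $\Delta$ is a variance, it is bounded below by 0, so the density fits use a lower bound:

library(polspline)

priorDensAt0 = dlogspline(0, logspline(priorDelta, lbound = 0))
postDensAt0 = dlogspline(0, logspline(postDelta, lbound = 0))

bf10 = priorDensAt0 / postDensAt0 # Bayes factor in favor of an effect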

We are able to sample directly from the prior distribution of the cell means, $\mu$, because the model places priors directly on the cell means. We are also able to indirectly sample from the posterior distribution of the cell means, which is what happens during parameter estimation. We do not, however, know how to sample directly from the prior or posterior distribution of the ANOVA parameters, $\theta$. However, given samples from the prior and posterior distributions of the $\mu$s and the matrix $S$, we can calculate samples from the prior and posterior distributions of the ANOVA parameters with the equation $\theta = S\mu$. This calculation is done for each sample from the prior or posterior of $\mu$, resulting in prior and posterior distributions of $\theta$.
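In R, this can be done for all samples at once with a single matrix multiplication. In this sketch, mu is assumed to be a matrix with one row per sample and one column per cell:

# Each row of theta holds the ANOVA parameters for one sample of mu.
theta = mu %*% t(S)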

To summarize, the steps of the procedure follow.

  1. Sample from the posterior distribution of the cell means, $\mu$, by doing model parameter estimation.
  2. Sample from the prior distribution of the cell means. Take a number of samples equal to the number of samples taken from the posterior distribution, less burn-in iterations.
  3. Create $X$ (Note: e.g. in R with model.matrix) and calculate $S$.
  4. Do the following substeps separately for the prior and posterior distributions:
      a. Using $S$ and the cell means, $\mu$, calculate $\theta$ (the ANOVA parameters). Do this for each sample of $\mu$ (i.e., each MCMC iteration or each sample from the prior).
      b. For each test of interest (i.e., main effects or interactions), select from $\theta$ the appropriate subset of ANOVA parameters (e.g., the $\beta$s for the test of the main effect of factor B). Do this for each sample of $\theta$.
      c. For each sample of $\theta$, calculate the ANOVA parameter(s) implied by the sums-to-zero constraint (e.g., $\beta_1$ from $\beta_2$ and $\beta_3$).
      d. Calculate $\Delta$ for each sample of ANOVA parameters (e.g., the $\beta$s).
      e. Estimate the density of $\Delta$ at 0 with, for example, the polspline package.
  5. Calculate the Savage-Dickey density ratio by dividing the prior density by the posterior density. This is an estimate of the Bayes factor in favor of the hypothesis that there is an effect (a main effect or interaction, depending on what is being tested). All of the preceding steps must be done separately for each model parameter (e.g., $P^M$).

To obtain an estimate of the variability in the Bayes factors, you can repeat this procedure with (1) subsets of the iterations that form the posterior distribution and/or (2) different samples from the prior. Taking new samples from the prior is very computationally cheap, so we used that approach. This whole procedure, including estimating the variability in the Bayes factors, is part of the CatContModel package (Hardman, 2016) and can be accessed with the testMainEffectsAndInteractions function. (Note: CMBBHT had not been released yet, but the method as described in this appendix was implemented in the CatContModel package.)
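As a rough illustration of the second (cheaper) option in terms of this package's testHypothesis function, the sketch below re-estimates the Bayes factor several times with fresh prior samples. Here samplePriorMus is a hypothetical stand-in for whatever code you use to draw new prior samples of the cell means, and postMus and cellFactors are assumed to already exist from parameter estimation:

# Hypothetical sketch: samplePriorMus() stands in for your own prior sampling code.
bfs = replicate(10, {
	priorMus = samplePriorMus(nrow(postMus))
	testHypothesis(priorMus, postMus, cellFactors, "B")$bf10
})
range(bfs) # spread of the Bayes factor estimates across prior samples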

References


