req_suggested_packages <- c("emmeans", "multcomp", "ggplot2") pcheck <- lapply(req_suggested_packages, requireNamespace, quietly = TRUE) if (any(!unlist(pcheck))) { message("Required package(s) for this vignette are not available/installed and code will not be executed.") knitr::opts_chunk$set(eval = FALSE) }

op <- options(width = 90) knitr::opts_chunk$set(dpi=72)

This documents reanalysis a dataset from an Experiment performed by Singmann and Klauer (2011) using the ANOVA functionality of **afex** followed by post-hoc tests using package **emmeans** (Lenth, 2017). After a brief description of the dataset and research question, the code and results are presented.

Singmann and Klauer (2011) were interested in whether or not conditional reasoning can be explained by a single process or whether multiple processes are necessary to explain it. To provide evidence for multiple processes we aimed to establish a double dissociation of two variables: instruction type and problem type. Instruction type was manipulated between-subjects, one group of participants received deductive instructions (i.e., to treat the premises as given and only draw necessary conclusions) and a second group of participants received probabilistic instructions (i.e., to reason as in an everyday situation; we called this "inductive instruction" in the manuscript). Problem type consisted of two different orthogonally crossed variables that were manipulated within-subjects, validity of the problem (formally valid or formally invalid) and plausibility of the problem (inferences which were consisted with the background knowledge versus problems that were inconsistent with the background knowledge). The critical comparison across the two conditions was among problems which were valid and implausible with problems that were invalid and plausible. For example, the next problem was invalid and plausible:

If a person is wet, then the person fell into a swimming pool.

A person fell into a swimming pool.

How valid is the conclusion/How likely is it that the person is wet?

For those problems we predicted that under deductive instructions responses should be lower (as the conclusion does not necessarily follow from the premises) as under probabilistic instructions. For the valid but implausible problem, an example is presented next, we predicted the opposite pattern:

If a person is wet, then the person fell into a swimming pool.

A person is wet.

How valid is the conclusion/How likely is it that the person fell into a swimming pool?

Our study also included valid and plausible and invalid and implausible problems.

In contrast to the analysis reported in the manuscript, we initially do not separate the analysis into affirmation and denial problems, but first report an analysis on the full set of inferences, MP, MT, AC, and DA, where MP and MT are valid and AC and DA invalid. We report a reanalysis of our Experiment 1 only. Note that the factor `plausibility`

is not present in the original manuscript, there it is a results of a combination of other factors.

We begin by loading the packages we will be using throughout.

library("afex") # needed for ANOVA functions. library("emmeans") # emmeans must now be loaded explicitly for follow-up tests. library("multcomp") # for advanced control for multiple testing/Type 1 errors. library("ggplot2") # for customizing plots.

data(sk2011.1) str(sk2011.1)

An important feature in the data is that each participant provided two responses for each cell of the design (the content is different for each of those, each participant saw all four contents). These two data points will be aggregated automatically by `afex`

.

with(sk2011.1, table(inference, id, plausibility))

To get the full ANOVA table for the model, we simply pass it to `aov_ez`

using the design as described above. We save the returned object for further analysis.

a1 <- aov_ez("id", "response", sk2011.1, between = "instruction", within = c("inference", "plausibility")) a1 # the default print method prints a data.frame produced by nice

The equivalent calls (i.e., producing exactly the same output) of the other two ANOVA functions `aov_car`

or `aov4`

is shown below.

aov_car(response ~ instruction + Error(id/inference*plausibility), sk2011.1) aov_4(response ~ instruction + (inference*plausibility|id), sk2011.1)

As mentioned before, the two responses per cell of the design and participants are aggregated for the analysis as indicated by the warning message. Furthermore, the degrees of freedom are Greenhouse-Geisser corrected per default for all effects involving `inference`

, as `inference`

is a within-subject factor with more than two levels (i.e., MP, MT, AC, & DA). In line with our expectations, the three-way interaction is significant.

The object printed per default for `afex_aov`

objects (produced by `nice`

) can also be printed nicely using `knitr`

:

knitr::kable(nice(a1))

Alternatively, the `anova`

method for `afex_aov`

objects returns a `data.frame`

of class `anova`

that can be passed to, for example, `xtable`

for nice formatting:

print(xtable::xtable(anova(a1), digits = c(rep(2, 5), 3, 4)), type = "html")

To further analyze the data we need to pass it to package `emmeans`

, a package that offers great functionality for both plotting and contrasts of all kind. A lot of information on `emmeans`

can be obtained in its vignettes and faq. `emmeans`

can work with `afex_aov`

objects directly as **afex** comes with the necessary methods for the generic functions defined in `emmeans`

. When using the default `multivariate`

option for follow-up tests, `emmeans`

uses the ANOVA model estimated via base R's `lm`

method (which in the case of a multivariate response is an object of class `c("mlm", "lm")`

). `afex`

also supports a univariate model (i.e., `emmeans_model = "univariate"`

, which requires that `include_aov = TRUE`

in the ANOVA call) in which case `emmeans`

uses the object created by base R's `aov`

function (this was the previous default but is not recommended as it does not handle unbalanced data well).

This object can now be passed to `emmeans`

, for example to obtain the marginal means of the four inferences:

m1 <- emmeans(a1, ~ inference) m1

This object can now also be used to compare whether or not there are differences between the levels of the factor:

```
pairs(m1)
```

To obtain more powerful p-value adjustments, we can furthermore pass it to `multcomp`

(Bretz, Hothorn, & Westfall, 2011):

summary(as.glht(pairs(m1)), test=adjusted("free"))

We could now also be interested in the marginal means of the inferences across the two instruction types. `emmeans`

offers two ways to do so. The first splits the contrasts across levels of the factor using the `by`

argument.

m2 <- emmeans(a1, "inference", by = "instruction") ## equal: emmeans(a1, ~ inference|instruction) m2

Consequently, tests are also only performed within each level of the `by`

factor:

```
pairs(m2)
```

The second version considers all factor levels together. Consequently, the number of pairwise comparisons is a lot larger:

m3 <- emmeans(a1, c("inference", "instruction")) ## equal: emmeans(a1, ~inference*instruction) m3 pairs(m3)

Objects returned from `emmeans`

can also be used to test specific contrasts. For this, we can simply create a list, where each element corresponds to one contrasts. A contrast is defined as a vector of constants on the reference grid (i.e., the object returned from `emmeans`

, here `m3`

). For example, we might be interested in whether there is a difference between the valid and invalid inferences in each of the two conditions.

c1 <- list( v_i.ded = c(0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0), v_i.prob = c(0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5) ) contrast(m3, c1, adjust = "holm") summary(as.glht(contrast(m3, c1)), test = adjusted("free"))

The results can be interpreted as in line with expectations. Responses are larger for valid than invalid problems in the deductive, but not the probabilistic condition.

Since version `0.22`

, `afex`

comes with its own plotting function based on `ggplot2`

, `afex_plot`

, which works directly with `afex_aov`

objects.

As said initially, we are interested in the three-way interaction of instruction with inference, plausibility, and instruction. As we saw above, this interaction was significant. Consequently, we are interested in plotting this interaction.

For `afex_plot`

, we need to specify the `x`

-factor(s), which determine which factor-levels or combinations of factor-levels are plotted on the x-axis. We can also define `trace`

factor(s), which determine which factor levels are connected by lines. Finally, we can also define `panel`

factor(s), which determine if the plot is split into subplots. `afex_plot`

then plots the estimated marginal means obtained from `emmeans`

, confidence intervals, and the raw data in the background. Note that the raw data in the background is per default drawn using an alpha blending of .5 (i.e., 50% semi-transparency). Thus, in case of several points lying directly on top of each other, this point appears noticeably darker.

afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility")

In the default settings, the error bars show 95%-confidence intervals based on the standard error of the underlying model (i.e., the `lm`

model in the present case). In the present case, in which each subplot (defined by `x`

- and `trace`

-factor) shows a combination of a within-subjects factor (i.e., `inference`

) and a between-subjects (i.e., `instruction`

) factor, this is not optimal. The error bars only allow to assess differences regarding the between-subjects factor (i.e., across the lines), but not inferences regarding the within-subjects factor (i.e., within one line). This is also indicated by a warning.

An alternative would be within-subject confidence intervals:

afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility", error = "within")

However, those only allow inferences regarding the within-subject factors and not regarding the between-subjecta factor. So the same warning is emitted again.

A further alternative is to suppress the error bars altogether. This is the approach used in our original paper and probably a good idea in general when figures show both between- and within-subjects factors within the same panel. The presence of the raw data in the background still provides a visual depiction of the variability of the data.

afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility", error = "none")

`afex_plot`

allows to customize the plot in a number of different ways. For example, we can easily change the aesthetic mapping associated with the `trace`

factor. So instead of using lineytpe and shape of the symbols, we can use color. Furthermore, we can change the graphical element used for plotting the data points in the background. For example, instead of plotting the raw data, we can replace this with a boxplot. Finally, we can also make both the points showing the means and the lines connecting the means larger.

p1 <- afex_plot(a1, x = "inference", trace = "instruction", panel = "plausibility", error = "none", mapping = c("color", "fill"), data_geom = geom_boxplot, data_arg = list(width = 0.4), point_arg = list(size = 1.5), line_arg = list(size = 1)) p1

Note that `afex_plot`

returns a `ggplot2`

plot object which can be used for further customization. For example, one can easily change the `theme`

to something that does not have a grey background:

p1 + theme_light()

We can also set the theme globally for the remainder of the `R`

session.

theme_set(theme_light())

The full set of customizations provided by `afex_plot`

is beyond the scope of this vignette. The examples on the help page at `?afex_plot`

provide a good overview.

However, the plots shown so far are not particularly helpful with respect to the research question. Next, we fit a new ANOVA model in which we separate the data in affirmation and denial inferences. This was also done in the original manuscript. We then lot the data a second time.

a2 <- aov_ez("id", "response", sk2011.1, between = "instruction", within = c("validity", "plausibility", "what")) a2

Then we plot the data from this ANOVA. Because each panel would again show a mixed-design, we suppress the error bars.

afex_plot(a2, x = c("plausibility", "validity"), trace = "instruction", panel = "what", error = "none")

We see the critical and predicted cross-over interaction in the left of those two graphs. For implausible but valid problems deductive responses are larger than probabilistic responses. The opposite is true for plausible but invalid problems. We now tests these differences at each of the four x-axis ticks in each plot using custom contrasts (`diff_1`

to `diff_4`

). Furthermore, we test for a validity effect and plausibility effect in both conditions.

(m4 <- emmeans(a2, ~instruction+plausibility+validity|what)) c2 <- list( diff_1 = c(1, -1, 0, 0, 0, 0, 0, 0), diff_2 = c(0, 0, 1, -1, 0, 0, 0, 0), diff_3 = c(0, 0, 0, 0, 1, -1, 0, 0), diff_4 = c(0, 0, 0, 0, 0, 0, 1, -1), val_ded = c(0.5, 0, 0.5, 0, -0.5, 0, -0.5, 0), val_prob = c(0, 0.5, 0, 0.5, 0, -0.5, 0, -0.5), plau_ded = c(0.5, 0, -0.5, 0, -0.5, 0, 0.5, 0), plau_prob = c(0, 0.5, 0, -0.5, 0, 0.5, 0, -0.5) ) contrast(m4, c2, adjust = "holm")

We can also pass these tests to `multcomp`

which gives us more powerful Type 1 error corrections.

summary(as.glht(contrast(m4, c2)), test = adjusted("free"))

Unfortunately, in the present case this function throws several warnings. Nevertheless, the p-values from both methods are very similar and agree on whether or not they are below or above .05. Because of the warnings it seems advisable to use the one provided by `emmeans`

directly and not use the ones from `multcomp`

.

The pattern for the affirmation problems is in line with the expectations: We find the predicted differences between the instruction types for valid and implausible (`diff_2`

) and invalid and plausible (`diff_3`

) and the predicted non-differences for the other two problems (`diff_1`

and `diff_4`

). Furthermore, we find a validity effect in the deductive but not in the probabilistic condition. Likewise, we find a plausibility effect in the probabilistic but not in the deductive condition.

Choosing the right correction for multiple testing can be difficult. In fact `multcomp`

comes with an accompanying book (Bretz et al., 2011). If the degrees-of-freedom of all contrasts are identical using `multcomp`

's method `free`

is more powerful than simply using the Bonferroni-Holm method. `free`

is a generalization of the Bonferroni-Holm method that takes the correlations among the model parameters into account and uniformly more powerful.

- Bretz, F., Hothorn, T., & Westfall, P. H. (2011).
*Multiple comparisons using R*. Boca Raton, FL: CRC Press. https://CRAN.R-project.org/package=multcomp - Singmann, H., & Klauer, K. C. (2011). Deductive and inductive conditional inferences: Two modes of reasoning.
*Thinking & Reasoning*, 17(3), 247-281. doi: 10.1080/13546783.2011.572718 - Lenth, R. (2017).
*emmeans: Estimated Marginal Means, aka Least-Squares Means*. R package version 0.9.1. https://CRAN.R-project.org/package=emmeans

```
options(op)
```

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.