options(width = 90)
This documents reanalysis a dataset from an Experiment performed by Singmann and Klauer (2011) using the ANOVA functionality of afex followed by post-hoc tests using package lsmeans (Lenth, 2015). After a brief description of the dataset and research question, the code and results are presented.
Singmann and Klauer (2011) were interested in whether or not conditional reasoning can be explained by a single process or whether multiple processes are necessary to explain it. To provide evidence for multiple processes we aimed to establish a double dissociation of two variables: instruction type and problem type. Instruction type was manipulated between-subjects, one group of participants received deductive instructions (i.e., to treat the premises as given and only draw necessary conclusions) and a second group of participants received probabilistic instructions (i.e., to reason as in an everyday situation; we called this "inductive instruction" in the manuscript). Problem type consisted of two different orthogonally crossed variables that were manipulated within-subjects, validity of the problem (formally valid or formally invalid) and plausibility of the problem (inferences which were consisted with the background knowledge versus problems that were inconsistent with the background knowledge). The critical comparison across the two conditions was among problems which were valid and implausible with problems that were invalid and plausible. For example, the next problem was invalid and plausible:
If a person is wet, then the person fell into a swimming pool.
A person fell into a swimming pool.
How valid is the conclusion/How likely is it that the person is wet?
For those problems we predicted that under deductive instructions responses should be lower (as the conclusion does not necessarily follow from the premises) as under probabilistic instructions. For the valid but implausible problem, an example is presented next, we predicted the opposite pattern:
If a person is wet, then the person fell into a swimming pool.
A person is wet.
How valid is the conclusion/How likely is it that the person fell into a swimming pool?
Our study also included valid and plausible and invalid and implausible problems.
In contrast to the analysis reported in the manuscript, we initially do not separate the analysis into affirmation and denial problems, but first report an analysis on the full set of inferences, MP, MT, AC, and DA, where MP and MT are valid and AC and DA invalid. We report a reanalysis of our Experiment 1 only. Note that the factor plausibility
is not present in the original manuscript, there it is a results of a combination of other factors.
require(afex) # needed for ANOVA, lsmeans is loaded automatically. require(multcomp) # for advanced control for multiple testing/Type 1 errors. require(lattice) # for plots
data(sk2011.1) str(sk2011.1)
An important feature in the data is that each participant provided two responses for each cell of the design (the content is different for each of those, each participant saw all four contents). These two data points will be aggregated automatically by afex
.
with(sk2011.1, table(inference, id, plausibility))
To get the full ANOVA table for the model, we simply pass it to aov_ez
(aov_car
or aov4
would be alternatives producing the same results) using the design as described above. We save the returned object for further analysis.
a1 <- aov_ez("id", "response", sk2011.1, between = "instruction", within = c("inference", "plausibility")) a1 # the default print method prints a data.frame produced by nice
As mentioned before, the two responses per cell of the design and participants are aggregated for the analysis as indicated by the warning message. Furthermore, the degrees of freedom are Greenhouse-Geisser corrected per default for all effects involving inference
, as inference
is a within-subject factor with more than two levels (i.e., MP, MT, AC, & DA). In line with our expectations, the three-way interaction is significant.
The object printed per default for afex_aov
objects (produced by nice
) can also be printed nicely using knitr
:
knitr::kable(nice(a1))
Alternatively, the anova
method for afex_aov
objects returns a data.frame
of class anova
that can be passed to, for example, xtable
for nice formatting:
print(xtable::xtable(a1$anova_table, digits = c(rep(2, 5), 3, 4)), type = "html")
To further analyze the data we need to pass it to package lsmeans
, a package that offers great functionality for both plotting and contrasts of all kind. A lot of information on lsmeans
can be obtained in its vignette. lsmeans
can work with afex_aov
objects directly as afex comes with the necessary methods for the generic functions defined in lsmeans
. lsmeans
uses the ANOVA model estimated via base R's aov
function that is part of an afex_aov
object.
This object can now be passed to lsmeans
, for example to obtain the marginal means of the four inferences:
m1 <- lsmeans(a1, ~ inference) m1
This object can now also be used to compare whether or not there are differences between the levels of the factor:
pairs(m1)
To obtain more powerful p-value adjustments, we can furthermore pass it to multcomp
(Bretz, Hothorn, & Westfall, 2011):
summary(as.glht(pairs(m1)), test=adjusted("free"))
We could now also be interested in the marginal means of the inferences across the two instruction types. lsmeans
offers two ways to do so. The first splits the contrasts across levels of the factor.
m2 <- lsmeans(a1, ~ inference|instruction) m2
Consequently test are also only performed within each level:
pairs(m2)
The second version treats all factor combinations together, producing a considerably larger number of pairwise comparisons:
m3 <- lsmeans(a1, ~ inference:instruction) m3 pairs(m3)
Objects returned from lsmeans
can also be used to test specific contrasts. For this, we can simply create a list, where each element corresponds to one contrasts. A contrast is defined as a vector of constants on the reference grid (i.e., the object returned from lsmeans
, here m3
). For example, we might be interested in whether there is a difference between the valid and invalid inferences in each of the two conditions.
c1 <- list( v_i.ded = c(0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0), v_i.prob = c(0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5) ) contrast(m3, c1, adjust = "holm") summary(as.glht(contrast(m3, c1)), test =adjusted("free"))
The results can be interpreted as in line with expectations. Responses are larger for valid than invalid problems in the deductive, but not the probabilistic condition.
Function lsmip
from package lsmeans
can be used for plotting the data directly from an afex_aov
object. As said initially, we are interested in the three-way interaction of instruction with inference, plausibility, and instruction. A plot of this interaction could be the following:
lsmip(a1, instruction ~ inference|plausibility)
As this plot is not very helpful, we now fit a new ANOVA model in which we separate the data in affirmation and denial inferences, as done in the original manuscript and plot the data then a second time.
a2 <- aov_ez("id", "response", sk2011.1, between = "instruction", within = c("validity", "plausibility", "what")) a2
Then we plot the data from this ANOVA.
lsmip(a2, ~instruction ~ plausibility+validity|what, scales = list(x=list( at = 1:4, labels = c("pl:v", "im:v", "pl:i", "im:i") )))
We see the critical predicted cross-over interaction in the left of those two graphs. For valid but implausible problems (im:v
) deductive responses are larger than probabilistic responses. The opposite is true for invalid but plausible problems (pl:i
). We now tests these differences at each of the four x-axis ticks in each plot using custom contrasts (diff_1
to diff_4
). Furthermore, we test for a validity effect and plausibility effect in both conditions.
(m4 <- lsmeans(a2, ~instruction+plausibility+validity|what)) c2 <- list( diff_1 = c(1, -1, 0, 0, 0, 0, 0, 0), diff_2 = c(0, 0, 1, -1, 0, 0, 0, 0), diff_3 = c(0, 0, 0, 0, 1, -1, 0, 0), diff_4 = c(0, 0, 0, 0, 0, 0, 1, -1), val_ded = c(0.5, 0, 0.5, 0, -0.5, 0, -0.5, 0), val_prob = c(0, 0.5, 0, 0.5, 0, -0.5, 0, -0.5), plau_ded = c(0.5, 0, -0.5, 0, -0.5, 0, 0.5, 0), plau_prob = c(0, 0.5, 0, -0.5, 0, 0.5, 0, -0.5) ) contrast(m4, c2, adjust = "holm")
As the resulting eight contrasts have different numbers of degrees-of-freedom, we can only pass them to multcomp
in small batches. This gives us more powerful Type 1 error corrections but overall a reduced correction as we now control for three families of tests (i.e., overall Type 1 error probability of .15).
summary(as.glht(contrast(m4, c2[1:4])), test =adjusted("free")) summary(as.glht(contrast(m4, c2[5:6])), test =adjusted("free")) summary(as.glht(contrast(m4, c2[7:8])), test =adjusted("free"))
The pattern for the affirmation problems is in line with the expectations: We find the predicted differences between the instruction types for valid and implausible (diff_2
) and invalid and plausible (diff_3
) and the predicted non-differences for the other two problems (diff_1
and diff_4
). Furthermore, we find a validity effect in the deductive but not in the probabilistic condition. Likewise, we find a plausibility effect in the probabilistic but not in the deductive condition.
lsmeans
. The contrasts use uncorrected degrees of freedom that are Satterthwaite approximated. This most likely produces anti-conservative tests if compound symmetry/sphericity is violated.aov
is usually not the correct choise. This is why the test of effects is based on car::Anova
. However, for lsmeans
we need to use aov
models. However, lsmeans
offers the option to weight the marginal means differently in case of different group sizes (i.e., unbalanced data). For example, it offers the option that each group is assumed to be of equal size (i.e., weights = "equal"
) or proportionally (i.e., weights = "proportional"
). See help of lsmeans
for more information.multcomp
comes with an accompanying book (Bretz et al., 2011). If the degrees-of-freedom of all contrasts are identical using multcomp
's method free
is more powerful than simply using the Bonferroni-Holm method. free
is a generalization of the Bonferroni-Holm method that takes the correlations among the model parameters into account and uniformly more powerful.aov
object can take some time (i.e., compared to producing the ANOVA table which is usually very fast). Those objects are also comparatively large, often multiple MB. If speed is important and one is not interested in employing lsmeans
one can set return = "nice"
in the call to the ANOVA function to only receive the ANOVA table (this can also be done globally: afex_options(return_aov = "nice")
).Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.