knitr::opts_chunk$set(echo = TRUE) require(emmeans) require(multcomp, quietly = TRUE)
Compact letter displays (CLDs) are a popular way to display multiple comparisons, especially when there are more than a few means to compare. They are problematic, however, because they are prone to misinterpretation (more details later). Here we present some background on CLDs, and show some adaptations and alternatives that may be less prone to misinterpretation.
CLDs generalize an "underlining" technique shown in some old experimental design and analysis textbooks, where results may be displayed something like this:
trt1 ctrl trt3 trt2 trt4 ---------- ------------------
The observed means are sorted in increasing order, so in this illustration,
trt1
has the lowest mean, ctrl
has the next lowest, and trt4
has the
highest. The underlines group the means such that the extremes of each group
are not significantly different according to a statistical test conducted at a
specified alpha level. So in this illustration, trt1
is significantly less
than trt3
, trt2
, and trt4
, but not ctrl
; and in fact trt4
is
significantly greater than all the others.
This grouping also illustrates the dangers created by careless interpretations.
Some observers of this chart might say that "trt1
and ctrl
are equal" and
that "ctrl
, trt3
, and trt2
are equal" -- when in fact we have merely
failed to show they are different. And further confusion results because
mathematical equality is transitive -- that is, these two statements of equality
would imply that trt
and trt2
must be equal, seemingly contradicting the
finding that they are significantly different. Statistical nonsignificance does
not have the transitivity property!
The underlining method becomes problematic in any case where the standard errors (SEs) of the comparisons are unequal -- for example if we have unequal sample sizes, or a model with non-homogeneous variances. When the SEs are unequal, it is possible, for example, for two adjacent means to be significantly different, while two more distant ones do not differ significantly. If that happens, we can't use underlines to group the means. The problem here is that lines are continuous, and that continuousness forces a continuum of groupings.
However, Piepho (2004) solved this problem by using symbols instead of lines, and creating a display where any two means associated with the same symbol are deemed to not be statistically different. Using symbols, it is possible to have non-contiguous groupings, e.g., it is possible for two means to share a symbol while an intervening one does not share the same symbol. Such a display is called a compact letter display. We do not absolutely require actual letters, just symbols that can be distinguished from one another. In the case where all the differences have equal SEs, the CLD will be the "same" as the result of grouping lines, in that each distinct symbol will span a contiguous range of means that can be interpreted as a grouping line.
The R package
multcompView (Graves et al., 2019) provides an implementation of the
Piepho algorithm. The multcomp package (Hothorn et al. 2008) provides a
generic cld()
function, and the emmeans package provides a
cld()
method for emmGrid
objects.
As a moving example, we simulate some data from an unbalanced design with 7 treatments labeled A, B, ..., G; and fit a model to those
set.seed(22.10) mu = c(16, 15, 19, 15, 15, 17, 16) # true means n = c(19, 15, 16, 18, 29, 2, 14) # sample sizes foo = data.frame(trt = factor(rep(LETTERS[1:7], n))) foo$y = rnorm(sum(n), mean = mu[as.numeric(foo$trt)], sd = 1.0) foo.lm = lm(y ~ trt, data = foo)
There are only four distinct true means underlying these seven treatments: Treatments B
, D
, and E
have mean 15, treatments A
and G
have mean 16, and treatments F
and C
are solo players with means 17 and 19 respectively.
Let's see a compact letter display for the marginal means. (Call this CLD #1)
foo.emm = emmeans(foo.lm, "trt") library(multcomp) cld(foo.emm)
The default "letters" for the emmeans implementation are actually numbers,
and we have three groupings indicated by the symbols 1
, 2
, and 3
. This
illustrates a case where grouping lines would not have worked, as we see in the
fact that group 1
is not contiguous. We have (among other results) that
treatment A
differs significantly from treatments B
, E
, D
, G
, and C
(at the default 0.05 significance level, with Tukey adjustment for multiple
testing). and that C
is significantly greater than all the other means since
it is the only mean in group 3
.
An annotation warns that two means in the same group are not necessarily
the same; yet CLDs present a strong visual message that they are. The careless reader who makes this mistake will have trouble with the gap in group 1
, asking how A
can differ
from G
and yet G
and F
, are "the same." The explanation is that the SE of
F
is huge, owing to its very small sample size, so it is hard for it to be
statistically different from other means. It is almost a gift to obtain a
non-contiguous grouping like this, as it forces the user to think more carefully
about what these grouping do and do not imply.
Given the discussion above, one might wonder if it is possible to construct a CLD in such a way that means sharing the same symbol are actually shown to be the same? The answer is yes (otherwise we wouldn't have asked the question!) -- and it is quite easy to do, thanks to two things:
TRUE
for
any pair that is statistically different (those means must receive different
grouping letters), and FALSE
otherwise; and the algorithm works for any
such Boolean matrixFALSE
is
they are shown to be equivalent and TRUE
if not shown to be equivalent.
For our example, suppose, based on subject-matter considerations, that two means
that differ by less than 1.0 can be considered equivalent. In the emmeans
setup, we specify that we want equivalence testing simply by providing this
nonzero threshold value as a delta
argument. In addition, we typically will
not make multiplicity adjustments to equivalence tests. Here is the result we
obtain (call this CLD #2)
cld(foo.emm, delta = 1, adjust = "none")
So we obtain five groupings -- but only two if we ignore those that apply to
only one mean. We have that treatments B
and E
can be considered equivalent,
and treatments E
, D
, and G
are considered equivalent. It is also important
to know that we cannot say that means in different groups are significantly
different.
Unlike CLD #1, we are showing only groupings of means that
we can show to be the same. The first four means, which were grouped together
earlier, are now assigned to two equivalence groupings. And treatment F
is not
grouped with any other mean -- which makes sense because we have so little
data on that treatment that we can hardly say anything.
Another variation is to simply reverse all the Boolean flags we used in constructing CLD #1. Then two means will receive the same letter only if they are significantly different. Thus, we really obtain ungrouping letters. We label these groupings "significance sets." The resulting display has a distinctively different appearance, because common symbols tend to be far apart rather than contiguous. (Call this CLD #3)
cld(foo.emm, signif = TRUE)
Here we have five significance sets. By comparing with CLD #1, you can confirm that each significant difference shown explicitly here corresponds to one shown implicitly (by not sharing a group) in CLD #1.
Compact letter displays show symbols based on statistical testing results. In such tests, we have strong conclusions or findings -- those that have small P values, and weak conclusions or non-findings -- those where the P value is not less than some $\alpha$. When we create visual flags such as grouping lines or symbols, those come across visually as findings, and the problem with standard CLDs is that those are the non-findings. We show two simple ways to use software that creates CLDs so that actual findings are flagged with symbols. It is hoped that people will find these modifications useful in visually displaying comparisons among means.
Graves, Spencer, Piepho Hans-Pieter, Selzer, Luciano, and Dorai-Raj, Sundar
(2019). multcompView: Visualizations of Paired Comparisons. R package
version 0.1-8, https://CRAN.R-project.org/package=multcompView
Hothorn, Torsten, Bretz, Frank, and Westfall, Peter (2008). Simultaneous Inference in General Parametric Models. Biometrical Journal 50(3), 346--363.
Piepho, Hans-Peter (2004). An algorithm for a letter-based representation of all pairwise comparisons, Journal of Computational and Graphical Statistics 13(2) 456--466.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.