knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )

```
library(postHoc)
```

We illustrate here some very basic *post hoc* analysis techniques implemented in the package *postHoc*.

Consider the following data-frame supplied with *postHoc*.

```
str(DeIdentifiedExample)
```

Here, *Y* is a response variable and *Treatment* is a classification variable
with $7$ levels. There are $10$ replicates for each level of *Treatment*.

table(DeIdentifiedExample$Treatment)

We use a Gaussian one-way classification model (*i.e.*, an ANOVA model) for
describing these data and testing whether there is an effect of *Treatment*.
Formally, we define the model refered above in the following way.
Let $Y_{t,r}$ be a random variable representing the response of the
$r^\mbox{th}$ replicate ($r = 1, ..., 10$) of the experimental units
subjected to the $t^\mbox{th}$ treatment ($t = A, ..., G$). The model assumes that, $Y_{A, 1}, ... , Y_{G, 10}$ are independent and that
$$
Y_{t,r} \sim N(\mu_t, \sigma^2),
\mbox{ for }
t = A, ..., G \mbox{ and } r = 1, ..., 10 \, .
$$
This model, termed the *full model*, is adjusted in the following way.

FULLmodel <- lm(Y ~ Treatment + 0, data = DeIdentifiedExample) coef(FULLmodel)

Comparing the model above with a *null model* given by
$$
Y_{t,r} \sim N(\mu, \sigma^2),
\mbox{ for }
t = A, ..., G \mbox{ and } r = 1, ..., 10 \, ,
$$
with the full model allow us to test the null hypothesis
$$
H_0 : \mu_A = \mu_B = \mu_C = \mu_D = \mu_E = \mu_F = \mu_G
$$
See the execution below.

NULLmodel <- lm(Y ~ 1, data = DeIdentifiedExample) anova(NULLmodel, FULLmodel)

Since the p-value of the F-test below is very small we reject the null hypothesis $H_0$ and conclude that there is at least one pair $(i,j)$ (with $i,j= A, ..., G$) such that $\mu_i \ne \mu_j$.

Once established that the means of the responses for the different treatments are not all equal, we might want to identify all the pairs that are different using techniques of post-hoc analysis.

The funtion *posthoc* of the package *postHoc* calculates the p-values for the differences of each pair of levels of the factor *Treatment* and group the levels of *Treatment* in groups of significance (given a pre-fixed significance level, here taken as $0.05$). See the execution below.

TT <- posthoc(Model = FULLmodel, EffectLabels = LETTERS[1:7], digits = 1) summary(TT)

The function *posthoc* uses a model as argument (here the "FULLmodel") and
returns an object (here "TT") for which we can extract a range of information
suitable for post hoc analysis. Here is a suite of those methods.

```
print(TT)
```

barplot(TT, ylim = c(5,21)) abline(h = 5)

lines(TT, ylim = c(5,21))

The function *posthoc* performs all the tests between pairs of the parameters
of the models that are listed in the parameter *EffectIndices*. For example,
the following calculations perform the tests for the tretments
"B", "C","D", "E" and "G".

ZZ <- posthoc(Model = FULLmodel, EffectIndices = c(2,3,4,5,7), digits = 2)

The object *ZZ* stores, among other things, the p-values of pairwise comperisons
between those treatments. See below.

```
ZZ$PvaluesMatrix
```

Having computed all the p-values of all the possible pairs of the selected treatments, an undirected graph is constructed (here *graph* reffers to the mathematical structure componsed of vertices, or points, and edges, or lines connecting the points), using the following conventions. The vertices represent the fixed effects in the model (here the treatments), and two vertices are connected by an edge when the corresponding parameters in the model are **not** statistically significanltnly different at the pre-fixed signicance (here we use the default of $0.05$). See below.

set.seed(143) plot(ZZ)

The idea explored for classifying the treatments is to calculate all the maximal cliques in the representation graph. Here a clique is a set of vertices (treatments) for wich all their elements are connected by an edge. A clique is maximal when it is not a subset of a larger clique. Examining the graph despected above it is easy to see that ${ B, D, E }$ and ${C, D, G }$ are both maximal cliques in the graph above. These cliques are identified in the figure above by drawing regions with different colours. The maximal cliques form the groups of significantly different treatments. See below.

```
summary(ZZ)
```

Note that the number of pairs to be compared in a post-hoc analysis growths very quickly with the number of effects studied (there are $n(n-1)/2$) possible pairs of $n$ objects); therefore, very complex patters can appear in post-hoc analyses, which might be difficult to see by nacked eyes. Fortunately, there are effective algoritmes to identify maximal cliques. See the example below constructed with all the $7$ treatments (yielding $7 . 6/2 = 21$ pairwise comparisons).

set.seed(1243) plot(TT)

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.