Making Compact Letter Displays
In VizTest: Optimal Confidence Intervals for Visual Testing

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  message = FALSE, 
  warning = FALSE
)

Compact letter displays use letters to identify groups of observations that are not different from each other. This permits pairwise testing for any arbitrary pair of results, potentially adjusted for multiple testing. There are procedures in the \CRANpkg{multcomp} and \CRANpkg{emmeans} packages that produce compact letter displays, but their visual display is a bit limited. The psre package provides a function called letter_plot that will take a set of estimates and their associated compact letter matrix and produce a plot. This is an alternative display to the one produced by the viztest() function and its plot method, but it is an alternative that may be useful to some users. These are particularly useful when the number of estimates to plot is on the low side (< 10 or so) and the patterns of significance are not too complicated.

Chick Weights

We first demonstrate the procedure using the chickwts built-in dataset. The first step is to generate the estimates, we predict chicken weight (weight) with feed type (feed). To make the display easiest to read, we re-rorder the feed type factor by the average weight, which will make the intervals decreasing in their average.

data(chickwts)
chickwts$feed <- reorder(chickwts$feed, 
                         chickwts$weight, 
                         FUN=mean)
chick_mod <- lm(weight~ feed, data=chickwts)

Next, we use the glht() function from the \CRANpkg{multcomp} package to make the paired comparisons.

library(VizTest)
library(multcomp)
chick_gl <- glht(chick_mod, linfct=mcp(feed = "Tukey"))

The function get_letters() is in the \CRANpkg{VizTest} package and will generate the compact letter display matrix from the pairwise comparisons.

lmat_gl <- get_letters(chick_gl, test=adjusted(type="none"))
lmat_gl

We could do the same with the emmeans() function from the \CRANpkg{emmeans} package.

library(emmeans)
chick_em <- emmeans(chick_mod, "feed")
lmat_em <- get_letters(chick_em, adjust="none")
lmat_em

Finally, we could do the same by passing get_letters() a list with elements est (a vector of estimates) and var (a corresponding variance-covariance matrix).

library(marginaleffects)
chick_preds <- predictions(chick_mod, variables="feed", by="feed")
chick_b <- coef(chick_preds)
names(chick_b) <- chick_preds$feed
lmat_df <- get_letters(list(est=chick_b, var=vcov(chick_preds)), adjust="none")
lmat_df

Now that we have the letter matrices, we can make the plots. The letter_plot() function takes two arguments: a data frame with columns x (the grouping variable) and predicted (the predicted values), and the letter matrix. We first prepare the data frame of predictions. We need to rename the relevant variables first.

library(dplyr)
chick_preds_dat <- chick_preds %>% 
  as.data.frame() %>% 
  rename(x = feed, predicted=estimate)

Finally, we can make the figure.

library(psre)
library(ggplot2)
letter_plot(chick_preds_dat, lmat_gl)

The figure shows that horsebean is different from all other feed types because it is the only feed type in letter a and is not in any other letter group. The soybean and linseed estimates are not different from each other because they are both in group b. The soymean and meatmeal estimates are also not different from each other. However, the linseed and meatmeal estimates are different from each other because they are not both members in the same letter group. The estimates for sunflower and casein are not different from each other, but are significantly larger than all other types.

Ornstein Data

The Ornstein data in the \CRANpkg{carData} package contains measures of the assets of ten different sectors in four different nations along with the number of interlocking director and executive positions shared with other firms. We estimate a generalized linear model of interlocks as a function of assets, sector and nation. We then generate predictions for nation and make the letter display.

## Load Data
data(Ornstein, package="carData")

## Estimate Model
imod <- glm(interlocks ~ log2(assets) + sector + nation, data=Ornstein, 
    family=poisson)

## Generate Predictions
ipreds <- predictions(imod, variables = "sector", by = "sector")

The next steps are not strictly necessary - they will allow us to sort the estimates so they are ordered from smallest to largest. We do so by joining the predictions with the observed data. Then, order the sector factor by predicted value. Then, we update the model and generate the predictions again.

## merge predictions and data
Ornstein <- Ornstein %>% 
  left_join(ipreds %>% 
              as.data.frame() %>% 
              dplyr::select(sector, estimate)) %>% 
  ## re-order factor 
  mutate(sector = reorder(sector, estimate, mean)) %>% 
  dplyr::select(-estimate)

## update model
imod <- update(imod, data=Ornstein)

## make predictions again
ipreds <- predictions(imod, variables = "sector", by = "sector")

Next, we save the estimates and name them and use the get_letters() function on the result.

est <- ipreds$estimate
names(est) <- ipreds$sector
i_lets <- get_letters(list(est=est, var = vcov(ipreds)), adjust="none")

Finally, we rename the columns of the predictions so that sector is x and estimate is predicted as expected by the letter_plot() function. Then we make the plot.

ipreds <- ipreds %>% 
  as.data.frame() %>% 
  rename(x = sector, predicted=estimate)
letter_plot(ipreds, i_lets)

Multi-panel Display: UCBAdmissions

Using the UCBAdmissions data, we can make a multi-panel display, in this case two panels - one for men and one for women. To do this, we need to make the letters independently. We could do this from a single model or from multiple models. First, let's turn the dataset into something amenable for modeling and then run the model predicting admission rates by gender and department.

data(UCBAdmissions)

ucb_dat <- as.data.frame(UCBAdmissions) %>% 
  tidyr::pivot_wider(names_from = "Admit", values_from = "Freq")

ucb_mod <- glm(cbind(Admitted, Rejected) ~ Gender * Dept, 
               data=ucb_dat, family=binomial)

We can then make some predictions with the predictions() function from the \CRANPkg{marginaleffects} package. Below, we make one set of predictions and then subset them to make the plots. The benefit of doing it this way is that the letter groups are actually comparable across panels. That is to say, because Department F for both men and women are in letter group A, they are not significantly different from each other.

## make predictions for both variables
ucb_pred <- predictions(ucb_mod, variables=c("Gender", "Dept"), by=c("Gender", "Dept"))

## save coefficients and variance-covariance matrix
ucb_v <- vcov(ucb_pred)
ucb_b <- coef(ucb_pred)

## make joint-name of the group and y-axis variables (in this case, Gender and Dept)
## We also rename the identifying variable "x" and the estimate variable to "predicted" 
## as is expected by letter_plot().
ucb_pred <- ucb_pred %>% 
  as.data.frame() %>% 
  mutate(gd = sprintf("%s-%s", Gender, Dept)) %>% 
  rename(x=gd, predicted=estimate)

## set the names of the estimates as the "x" variable. 
names(ucb_b) <- ucb_pred$x

## Get the letters
ucb_lets <- get_letters(list(est=ucb_b, var=ucb_v), adjust="none")

## subset the letters by looking for "Male" in the rownames
m_lets <- ucb_lets[grepl("Male", rownames(ucb_lets)), ]

## subset the letters by looking for "Female" in the rownames
f_lets <- ucb_lets[grepl("Female", rownames(ucb_lets)), ]

## Subset the predictions by gender
m_preds <- ucb_pred %>% filter(Gender == "Male")
f_preds <- ucb_pred %>% filter(Gender == "Female")

## Make the letter plots - when done, remove "Male" and "Female" from the y-axis labels. 
lp_m <- letter_plot(m_preds, m_lets) +
  scale_y_discrete(labels = ~gsub("Male-", "", .x)) + 
  facet_wrap(~"Male")

lp_f <- letter_plot(f_preds, f_lets) +
  scale_y_discrete(labels = ~gsub("Female-", "", .x)) + 
  facet_wrap(~"Female")

## Put the plots together. 
library(patchwork)
(lp_m + lp_f) + plot_layout(ncol=2)

The alternative way would be to do each plot independently by estimating separate models:

## Estimate subset models. 
ucb_mod_m <- glm(cbind(Admitted, Rejected) ~ Dept, 
               data=subset(ucb_dat, Gender == "Male"), family=binomial)
ucb_mod_f <- glm(cbind(Admitted, Rejected) ~ Dept, 
               data=subset(ucb_dat, Gender == "Female"), family=binomial)

## generate subset model predictions
ucb_pred_m <- predictions(ucb_mod_m, variables="Dept", by="Dept")
ucb_pred_f <- predictions(ucb_mod_f, variables="Dept", by="Dept")

## save and name coefficients; save variance-covariance matrices
ucbb_m <- coef(ucb_pred_m)
ucbb_f <- coef(ucb_pred_f)
names(ucbb_m) <- names(ucbb_f) <- ucb_pred_m$Dept
ucbv_m <- vcov(ucb_pred_m)
ucbv_f <- vcov(ucb_pred_f)

## get letters for each set of predictions and variances
ucb_lets_m <- get_letters(list(est=ucbb_m, var=ucbv_m), adjust="none")
ucb_lets_f <- get_letters(list(est=ucbb_f, var=ucbv_f), adjust="none")

## rename relevant variabels. 
ucb_pred_m <- ucb_pred_m %>% 
  as.data.frame() %>% 
  rename("x" = "Dept", "predicted" = "estimate")

ucb_pred_f <- ucb_pred_f %>% 
  as.data.frame() %>% 
  rename("x" = "Dept", "predicted" = "estimate")

## make plots
lp_m <- letter_plot(ucb_pred_m, ucb_lets_m) + facet_wrap(~"Male")
lp_f <- letter_plot(ucb_pred_f, ucb_lets_f) + facet_wrap(~"Female")

## print plots together 
(lp_m + lp_f) + plot_layout(ncol=2)