options(width = 100)

10.1. Binomial regression

The chimpanzees data:

library(rethinking)
data(chimpanzees)
d <- chimpanzees

Switching from rethinking to brms.

detach(package:rethinking)
library(brms)
rm(chimpanzees)

We start with the simple intercept-only logistic regression model.

b10.1 <-
  brm(data = d, family = binomial,
      pulled_left ~ 1,
      prior = c(set_prior("normal(0, 10)", class = "Intercept")))

library(tidyverse)

fixef(b10.1) %>%
  round(digits = 2)

On page 81, Gelman and Hill (2007) give the formula for invlogit(), which is our alternative to the logistic() function in rethinking.

invlogit <- function(x){1/(1 + exp(-x))}

It's easy to use.

c(.18, .46) %>%
  invlogit()

fixef(b10.1) %>%
  invlogit()

The next two chimp models add predictors.

b10.2 <-
  brm(data = d, family = binomial,
      pulled_left ~ 1 + prosoc_left,
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b")))

b10.3 <-
  brm(data = d, family = binomial,
      pulled_left ~ 1 + prosoc_left + condition:prosoc_left ,
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b")))

Before comparing our models, we'll first save their WAIC estimates as objects. These will come in handy in just a bit.

w_b10.1 <- waic(b10.1)
w_b10.2 <- waic(b10.2)
w_b10.3 <- waic(b10.3)

compare_ic(w_b10.1, w_b10.2, w_b10.3)

For this manuscript, we'll take our color scheme from the wesanderson package's Moonrise2 palette.

# install.packages("wesanderson", dependencies = T)
library(wesanderson)

wes_palette("Moonrise2")

wes_palette("Moonrise2")[1:4]

We'll also take a few formatting cues from Edward Tufte, courtesy of the ggthemes package. The theme_tufte() function will change the default font and remove some chart junk. The theme_set() function, below, will make these adjustments the default for all subsequent ggplot2 plots. To undo this, just code theme_set(theme_default()).

library(ggthemes)
library(bayesplot)

theme_set(theme_default() + 
            theme_tufte() +
            theme(plot.background = element_rect(fill = wes_palette("Moonrise2")[3],
                                                 color = wes_palette("Moonrise2")[3])))

Finally, here's our WAIC plot.

tibble(model = c("b10.1", "b10.2", "b10.3"),
       waic  = c(w_b10.1$estimates[3, 1], w_b10.2$estimates[3, 1], w_b10.3$estimates[3, 1]),
       se    = c(w_b10.1$estimates[3, 2], w_b10.2$estimates[3, 2], w_b10.3$estimates[3, 2])) %>%

  ggplot() +
  geom_pointrange(aes(x = reorder(model, -waic), y = waic,
                      ymin = waic - se,
                      ymax = waic + se,
                      color = model),
                  shape = 16) +
  scale_color_manual(values = wes_palette("Moonrise2")[c(1:2, 4)]) +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "WAIC") +
  theme(axis.ticks.y = element_blank(),
        legend.position = "none")

Here are the estimates from model b10.3. It may not have the lowest WAIC value, but it's the model that most clearly corresponds to the structure of the experiment:

print(b10.3)

Here's what the odds are multiplied by:

fixef(b10.3)[2] %>%
  exp()

Given an estimated value of 4, the probability of a pull, all else equal, would be:

invlogit(4)

Adding the coefficient, fixef(b10.3)[2], would yield:

(4 + fixef(b10.3)[2]) %>%
  invlogit()
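To see the arithmetic in isolation, here's a toy round trip between the log-odds, odds, and probability scales. The coefficient value of 0.61 is just a made-up number in the neighborhood of the estimate above.

```r
# toy arithmetic on the log-odds scale, with a made-up coefficient of 0.61
invlogit <- function(x) 1 / (1 + exp(-x))

beta <- 0.61

# the odds are multiplied by exp(beta)
exp(beta)          # ~1.84

# but on the probability scale, the effect depends on where you start:
# near a log-odds of 4, the same beta barely moves the probability
invlogit(4)        # ~0.982
invlogit(4 + beta) # ~0.990
```

This is the ceiling effect at work: a log-odds bump that nearly doubles the odds does almost nothing to a probability that's already close to 1.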

For our variant of Figure 10.2., we use brms::pp_average() in place of rethinking::ensemble().

# The combined fitted() results of the three models weighted by their WAICs
pp_a <- 
  pp_average(b10.1, b10.2, b10.3,
             weights = "waic",
             method = "fitted") %>% 
  as_tibble() %>% 
  bind_cols(b10.3$data) %>% 
  distinct(Estimate, `2.5%ile`, `97.5%ile`, condition, prosoc_left) %>% 
  mutate(x_axis = str_c(prosoc_left, condition, sep = "/")) %>%
  mutate(x_axis = factor(x_axis, levels = c("0/0", "1/0", "0/1", "1/1"))) %>% 
  rename(pulled_left = Estimate)

# The empirically-based summaries
d_plot <-
  d %>%
  group_by(actor, condition, prosoc_left) %>%
  summarise(pulled_left = mean(pulled_left)) %>%
  mutate(x_axis = str_c(prosoc_left, condition, sep = "/")) %>%
  mutate(x_axis = factor(x_axis, levels = c("0/0", "1/0", "0/1", "1/1")))

# The plot
ggplot() +
  geom_ribbon(data = pp_a,
              aes(x = x_axis,
                  ymin = `2.5%ile`, 
                  ymax = `97.5%ile`,
                  group = 0),
              fill = wes_palette("Moonrise2")[2]) +
  geom_line(data = pp_a,
            aes(x = x_axis, 
                y = pulled_left,
                group = 0)) +
  geom_line(data = d_plot,
            aes(x = x_axis, y = pulled_left, group = actor),
            color = wes_palette("Moonrise2")[1], size = 1/3) +
  scale_x_discrete(expand = c(.03, .03)) +
  coord_cartesian(ylim = 0:1) +
  labs(x = "prosoc_left/condition",
       y = "proportion pulled left") +
  theme(axis.ticks.x = element_blank())

McElreath didn't show the actual pairs plot in the text. Here's ours using mcmc_pairs().

# this helps us set our custom color scheme
color_scheme_set(c(wes_palette("Moonrise2")[3], 
                   wes_palette("Moonrise2")[1], 
                   wes_palette("Moonrise2")[2], 
                   wes_palette("Moonrise2")[2], 
                   wes_palette("Moonrise2")[1], 
                   wes_palette("Moonrise2")[1]))

# the actual plot
mcmc_pairs(x = posterior_samples(b10.3),
           pars = c("b_Intercept", "b_prosoc_left", "b_prosoc_left:condition"),
           off_diag_args = list(size = 1/10, alpha = 1/6),
           diag_fun = "dens")

As McElreath asserted, the posterior looks pretty multivariate Gaussian.

Enclosing the actor variable within factor() will produce the indexing we need to get actor-specific intercepts.

b10.4 <-
  brm(data = d, family = binomial,
      pulled_left ~ 0 + factor(actor) + prosoc_left + condition:prosoc_left ,
      prior = c(set_prior("normal(0, 10)", class = "b")),
      chains = 2, iter = 2500, warmup = 500, cores = 2,
      control = list(adapt_delta = 0.9))

Within the tidyverse, distinct() yields the information you'd otherwise get from unique().

d %>%
  distinct(actor)

The posterior summary:

print(b10.4)

Here's what the posterior_samples for b10.4 looks like:

post <- posterior_samples(b10.4)

post %>%
  glimpse()

Our variant of Figure 10.3.

post %>%
  ggplot(aes(x = b_factoractor2)) +
  geom_density(color = "transparent",
               fill = wes_palette("Moonrise2")[1]) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = NULL,
       title = "Actor 2's large and uncertain intercept",
       subtitle = "Once your log-odds are above, like, 4, it's all\npretty much a probability of 1.")

Figure 10.4., the idiographic trajectories for four of our chimps.

d_plot_4 <-
  d_plot %>%
  filter(actor %in% c(3, 5:7)) %>%
  ungroup() %>% 
  mutate(actor = str_c("actor ", actor))

ftd <-
  fitted(b10.4) %>% 
  as_tibble() %>% 
  bind_cols(b10.4$data) %>% 
  filter(actor %in% c(3, 5:7)) %>% 
  distinct(Estimate, `2.5%ile`, `97.5%ile`, condition, prosoc_left, actor) %>% 
  select(actor, everything()) %>% 
  mutate(actor = str_c("actor ", actor)) %>% 
  mutate(x_axis = str_c(prosoc_left, condition, sep = "/")) %>%
  mutate(x_axis = factor(x_axis, levels = c("0/0", "1/0", "0/1", "1/1"))) %>% 
  rename(pulled_left = Estimate)

ggplot(data = ftd,
       aes(x = x_axis, y = pulled_left, group = actor)) +
  geom_ribbon(aes(x = x_axis,
                  ymin = `2.5%ile`, 
                  ymax = `97.5%ile`),
              fill = wes_palette("Moonrise2")[2]) +
  geom_line(aes(x = x_axis, 
                y = pulled_left)) +
  geom_line(data = d_plot_4,
            color = wes_palette("Moonrise2")[1], size = 1.25) +
  scale_x_discrete(expand = c(.03, .03)) +
  coord_cartesian(ylim = 0:1) +
  labs(x = "prosoc_left/condition",
       y = "proportion pulled left") +
  theme(axis.ticks.x = element_blank(),
        # color came from: http://www.color-hex.com/color/ccc591
        panel.background = element_rect(fill = "#d1ca9c",
                                        color = "transparent")) +
  facet_wrap(~actor)

10.1.2. Aggregated binomial: Chimpanzees again, condensed.

With the tidyverse, we use group_by() and summarise() to achieve what McElreath did with aggregate().

d_aggregated <-
  d %>%
  select(-recipient, -block, -trial, -chose_prosoc) %>%
  group_by(actor, condition, prosoc_left) %>%
  summarise(x = sum(pulled_left))

d_aggregated %>%
  slice(1:8)

To fit an aggregated binomial model in brms, we use the [criterion] | trials() syntax where the value that goes in trials() is either a fixed number, as in this case, or an index variable.

b10.5 <-
  brm(data = d_aggregated, family = binomial,
      x | trials(18) ~ 1 + prosoc_left + condition:prosoc_left ,
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)
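If you'd like to convince yourself that aggregating is harmless, here's a quick base-R check: the Bernoulli and binomial log likelihoods for the same data differ only by a constant that doesn't involve the probability, so the posterior is unaffected. The probability of .6 below is arbitrary, just for the check.

```r
# 18 Bernoulli trials with 11 successes carry the same information as one
# Binomial(18) count of 11, up to a constant
p <- .6  # an arbitrary probability

bernoulli_ll <- sum(dbinom(c(rep(1, 11), rep(0, 7)), size = 1, prob = p, log = TRUE))
binomial_ll  <- dbinom(11, size = 18, prob = p, log = TRUE)

# the difference is -lchoose(18, 11), which does not depend on p
bernoulli_ll - binomial_ll
```

Since that constant drops out of any posterior comparison, the disaggregated and aggregated models estimate the same thing.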

We might compare b10.3 with b10.5 like this.

fixef(b10.3) %>% round(digits = 2)
fixef(b10.5) %>% round(digits = 2)

Close within rounding error.

10.1.3. Aggregated binomial: Graduate school admissions.

The infamous UCBadmit data:

# detach(package:brms)
library(rethinking)
data(UCBadmit)
d <- UCBadmit

Switching from rethinking to brms.

detach(package:rethinking)
library(brms)
rm(UCBadmit)

d

Here's our newly-constructed predictor, male, and the models that do/do not put it to work.

d <- 
  d %>%
  mutate(male = ifelse(applicant.gender == "male", 1, 0))

b10.6 <-
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 1 + male ,
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

b10.7 <-
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 1,
      prior = c(set_prior("normal(0, 10)", class = "Intercept")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

The WAIC comparison:

waic(b10.6, b10.7)

Information criteria digression.

Let's see what happens if we switch to the LOO.

l_b10.6 <- loo(b10.6)
l_b10.7 <- loo(b10.7)

If you just ape the text and use the WAIC, everything appears fine. But holy smokes look at those nasty warning messages from the loo()! One of the frightening but ultimately handy things about working with the PSIS-LOO is that it requires we estimate a Pareto $k$ parameter, which you can learn all about in the loo-package section of the loo reference manual. As it turns out, the Pareto $k$ can be used as a diagnostic tool. Each case in the data gets its own $k$ value and we like it when those $k$s are low. The makers of the loo package get worried when those $k$s exceed 0.7 and, as a result, loo() spits out a warning message when they do.

First things first, if you explicitly open the loo package, you’ll have access to some handy diagnostic functions.

library(loo)

Using the loo-object for model b10.6, which we've named l_b10.6, let's take a look at the pareto_k_table() function.

pareto_k_table(l_b10.6) 

You may have noticed that this same table pops out when you just do something like loo(b10.6). Recall that this data set has 12 observations (i.e., count(d)). With pareto_k_table(), we see how the Pareto $k$ values have been categorized into bins ranging from "good" to "very bad". Clearly, we like nice and low $k$s. In this example, our observations are all over the place, with several in the "bad" range and beyond (pareto_k_ids(l_b10.6, threshold = 1) %>% length() returns the number exceeding even 1). We can take a closer look like this:

plot(l_b10.6)

So when you plot() a loo object, you get a nice diagnostic plot for those $k$ values, ordered by observation number. Our plot indicates cases 1, 3, 11, and 12 had "bad" $k$ values for this model. If we wanted to further verify to ourselves which observations those were, we'd use the pareto_k_ids() function.

pareto_k_ids(l_b10.6, threshold = 1)

Note our use of the threshold argument. Play around with it to see how it works.

If you want an explicit look at those $k$ values, you do:

l_b10.6$diagnostics

The pareto_k values can be used to examine cases that are overly influential on the model parameters, something like a Cook's $D_{i}$. See, for example, this discussion on stackoverflow.com in which several members of the Stan team weighed in. The issue is also discussed in this paper and in this presentation by Aki Vehtari.

Anyway, the implication of all this is that these values suggest that model b10.6 isn't a great fit for these data.

Part of the warning message for model b10.6 read:

It is recommended to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 6 times to compute the ELPDs for the problematic observations directly.

Well, let's do that:

l_b10.6_reloo <- loo(b10.6, reloo = T)
l_b10.6_reloo

Now that looks better. We'll do the same thing for model b10.7.

l_b10.7_reloo <- loo(b10.7, reloo = T)

Okay, let's compare our PSIS-LOO values before and after adjusting loo() with reloo = T.

compare_ic(l_b10.6, l_b10.7)
compare_ic(l_b10.6_reloo, l_b10.7_reloo)

In this case, the results are kinda similar. But holy smokes, look at the size of those SEs! Anyway, watch out for this in your real-world data. You also might check out this vignette on how the loo package’s Pareto $k$ can help detect outliers.

But this has all been a tangent from the central thrust of this section. Let's get back on track. Here's a look at b10.6, the univariable model:

print(b10.6)
fixef(b10.6)[2] %>%
  exp() %>%
  round(digits = 2)

The difference in admission probabilities.

post <- posterior_samples(b10.6)

post %>%
  mutate(p_admit_male   = invlogit(b_Intercept + b_male),
         p_admit_female = invlogit(b_Intercept),
         diff_admit     = p_admit_male - p_admit_female) %>%
  summarise(`2.5%`  = quantile(diff_admit, probs = .025),
            `50%`   = median(diff_admit),
            `97.5%` = quantile(diff_admit, probs = .975))

Here's our version of Figure 10.5.

d <-
  d %>%
  mutate(case = factor(1:12))

p_10.6 <- 
  predict(b10.6) %>% 
  as_tibble() %>% 
  bind_cols(d)

d_text <-
  d %>%
  group_by(dept) %>%
  summarise(case = mean(as.numeric(case)),
            admit = mean(admit/applications) + .05)

ggplot(data = d, aes(x = case, y = admit/applications)) +
  geom_pointrange(data = p_10.6, 
                  aes(y = Estimate/applications,
                      ymin = `2.5%ile`/applications ,
                      ymax = `97.5%ile`/applications),
                  color = wes_palette("Moonrise2")[1],
                  shape = 1, alpha = 1/3) +
  geom_point(color = wes_palette("Moonrise2")[2]) +
  geom_line(aes(group = dept),
            color = wes_palette("Moonrise2")[2]) +
  geom_text(data = d_text,
            aes(y = admit, label = dept),
            color = wes_palette("Moonrise2")[2],
            family = "serif") +
  coord_cartesian(ylim = 0:1) +
  labs(y = "Proportion admitted",
       title = "Posterior validation check") +
  theme(axis.ticks.x = element_blank())

As alluded to in all that LOO/pareto_k talk, above, this is not a great fit.

We don't need to coerce an index. But here are the models.

b10.8 <-
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 0 + dept,
      prior = c(set_prior("normal(0, 10)", class = "b")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

b10.9 <-
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 0 + dept + male ,
      prior = c(set_prior("normal(0, 10)", class = "b")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

Here we compare all four models by the LOO.

loos <- loo(b10.6, b10.7, b10.8, b10.9, 
            reloo = T,
            cores = 2)
loos

The parameter summaries for our multivariable model, b10.9, look like this:

fixef(b10.9) %>% round(digits = 2)

Since we've been using brms, there's no need to fit our version of McElreath's m10.9stan. We already have that in our b10.9. But just for kicks and giggles, here's another way to get the model summary.

b10.9$fit

Here's our version of Figure 10.6, the posterior validation check.

predict(b10.9) %>% 
  as_tibble() %>% 
  bind_cols(d) %>% 

ggplot(aes(x = case, y = admit/applications)) +
  geom_pointrange(aes(y = Estimate/applications,
                      ymin = `2.5%ile`/applications ,
                      ymax = `97.5%ile`/applications),
                  color = wes_palette("Moonrise2")[1],
                  shape = 1, alpha = 1/3) +
  geom_point(color = wes_palette("Moonrise2")[2]) +
  geom_line(aes(group = dept),
            color = wes_palette("Moonrise2")[2]) +
  geom_text(data = d_text,
            aes(y = admit, label = dept),
            color = wes_palette("Moonrise2")[2],
            family = "serif") +
  coord_cartesian(ylim = 0:1) +
  labs(y = "Proportion admitted",
       title = "Posterior validation check") +
  theme(axis.ticks.x = element_blank())

Imperfect, but way more valid than before.

10.1.4. Fitting binomial regressions with glm().

We're not here to learn frequentist code, so we're going to skip most of this section. But model b.good is worth fitting. Here's the data.

# outcome and predictor almost perfectly associated
y <- c(rep(0, 10), rep(1, 10))

x <- c(rep(-1, 9), rep(1, 11))

The b.good model:

b.good <-
  brm(data = list(y = y, x = x), family = binomial,
      y ~ 1 + x,
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b")))

Our model summary will differ a bit from the one in the text. It seems this is because of the MAP/HMC contrast.

print(b.good)

Here's the pairs() plot McElreath excluded from the text:

pairs(b.good,
      off_diag_args = list(size = 1/10, alpha = 1/6))

This plot deserves an extensive quote from McElreath. "Inspecting the pairs plot ~~(not shown)~~ demonstrates just how subtle even simple models can be, once we start working with GLMs. I don't say this to scare the reader. But it's true that even simple models can behave in complicated ways. How you fit the model is part of the model, and in principle no GLM is safe for MAP estimation."

10.2. Poisson regression

We'll simulate our sweet count data.

set.seed(9968400) # making the results reproducible

y <- rbinom(1e5, 1000, 1/1000)

y %>%
  mean()

y %>%
  var()

library(rethinking)
data(Kline)
d <- Kline

Switching from rethinking to brms.

detach(package:rethinking)
library(brms)
rm(Kline)

d

Here are our new columns.

d <-
  d %>%
  mutate(log_pop = log(population),
         contact_high = ifelse(contact == "high", 1, 0))

Our first Poisson model!

b10.10 <-
  brm(data = d, family = poisson,
      total_tools ~ 1 + log_pop + contact_high + contact_high:log_pop,
      prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b")),
      iter = 3000, warmup = 1000, chains = 4, cores = 4)

print(b10.10)

Here's the simple matrix of correlation point estimates for the parameters.

post <-
  posterior_samples(b10.10)

post %>%
  select(-lp__) %>% 
  rename(b_interaction = `b_log_pop:contact_high`) %>%
  cor() %>% 
  round(digits = 2)

And here's the coefficient plot:

# We'll set a renewed color theme
color_scheme_set(c(wes_palette("Moonrise2")[2],
                   wes_palette("Moonrise2")[1], 
                   wes_palette("Moonrise2")[4], 
                   wes_palette("Moonrise2")[2], 
                   wes_palette("Moonrise2")[1], 
                   wes_palette("Moonrise2")[1]))

post %>%
  select(-lp__) %>% 
  rename(b_interaction = `b_log_pop:contact_high`) %>%

  mcmc_intervals(prob = .5, prob_outer = .95) +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_text(hjust = 0))

How plausible is it that a high-contact island will have more tools than a low-contact island?

post <-
  post %>%
  mutate(lambda_high = exp(b_Intercept + b_contact_high + (b_log_pop + `b_log_pop:contact_high`)*8),
         lambda_low = exp(b_Intercept + b_log_pop*8),
         diff = lambda_high - lambda_low) 

post %>%
  summarise(sum = sum(diff > 0)/length(diff))

Quite.

Here we are, Figure 10.8.a.

post %>%
  ggplot(aes(x = diff)) +
  geom_density(color = "transparent",
               fill = wes_palette("Moonrise2")[1]) +
  geom_vline(xintercept = 0, linetype = 2,
             color = wes_palette("Moonrise2")[2]) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = "lambda_high - lambda_low")

I’m not happy with how clunky this solution is, but one way to get those marginal dot and line plots for the axes is to make intermediary tibbles. Anyway, here’s a version of Figure 10.8.b.

# Intermediary tibbles for the dot and line portion of the plot
point_tibble <-
  tibble(x = c(median(post$b_contact_high), min(post$b_contact_high)),

         y = c(min(post$`b_log_pop:contact_high`), median(post$`b_log_pop:contact_high`)))

line_tibble <-
  tibble(parameter = rep(c("b_contact_high", "b_log_pop:contact_high"), each = 2),

         x = c(quantile(post$b_contact_high, probs = c(.025, .975)),
               rep(min(post$b_contact_high), times = 2)),

         y = c(rep(min(post$`b_log_pop:contact_high`), times = 2),
               quantile(post$`b_log_pop:contact_high`, probs = c(.025, .975))))

# the plot
post %>% 
  ggplot(aes(x = b_contact_high, y = `b_log_pop:contact_high`)) +
  geom_point(color = wes_palette("Moonrise2")[1],
             size = 1/10, alpha = 1/10) +
  geom_point(data = point_tibble,
             aes(x = x, y = y)) +
  geom_line(data = line_tibble,
            aes(x = x, y = y, group = parameter))

Here we deconstruct model b10.10, bit by bit. While we're at it, we'll practice our brms::update() skills.

# no interaction
b10.11 <- 
  update(b10.10, formula = total_tools ~ 1 + log_pop + contact_high,
         iter = 3000, warmup = 1000, chains = 4, cores = 4)

# no contact rate
b10.12 <-
  update(b10.10, formula = total_tools ~ 1 + log_pop,
         iter = 3000, warmup = 1000, chains = 4, cores = 4)

# no log-population
b10.13 <-
  update(b10.10, formula = total_tools ~ 1 + contact_high,
         iter = 3000, warmup = 1000, chains = 4, cores = 4)

# intercept only
b10.14 <-
  update(b10.10, formula = total_tools ~ 1,
         iter = 3000, warmup = 1000, chains = 4, cores = 4)

Alternatively, you can fit the same four models from scratch with the standard brm() syntax.

# no interaction
b10.11 <- 
  brm(data = d, family = poisson,
      total_tools ~ 1 + log_pop + contact_high,
      prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b")),
      iter = 3000, warmup = 1000, chains = 4, cores = 4)

# no contact rate
b10.12 <-
  brm(data = d, family = poisson,
      total_tools ~ 1 + log_pop,
      prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b")),
      iter = 3000, warmup = 1000, chains = 4, cores = 4)

# no log-population
b10.13 <-
  brm(data = d, family = poisson,
      total_tools ~ 1 + contact_high,
      prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b")),
      iter = 3000, warmup = 1000, chains = 4, cores = 4)

# intercept only
b10.14 <-
  brm(data = d, family = poisson,
      total_tools ~ 1,
      prior = c(set_prior("normal(0, 100)", class = "Intercept")),
      iter = 3000, warmup = 1000, chains = 4, cores = 4)

I know we got all excited with the LOO, above. Let's just be lazy and go WAIC, here. [Though beware, the LOO opens up a similar can of worms, here, to what we dealt with above.]

w_b10.10 <- waic(b10.10)
w_b10.11 <- waic(b10.11)
w_b10.12 <- waic(b10.12)
w_b10.13 <- waic(b10.13)
w_b10.14 <- waic(b10.14)

compare_ic(w_b10.10, w_b10.11, w_b10.12, w_b10.13, w_b10.14)

And the corresponding WAIC plot:

tibble(model = c("b10.10", "b10.11", "b10.12", "b10.13", "b10.14"),
       waic  = c(w_b10.10$estimates[3, 1], w_b10.11$estimates[3, 1], w_b10.12$estimates[3, 1], w_b10.13$estimates[3, 1], w_b10.14$estimates[3, 1]),
       se    = c(w_b10.10$estimates[3, 2], w_b10.11$estimates[3, 2], w_b10.12$estimates[3, 2], w_b10.13$estimates[3, 2], w_b10.14$estimates[3, 2])) %>%

  ggplot() +
  geom_pointrange(aes(x = reorder(model, -waic), y = waic,
                      ymin = waic - se,
                      ymax = waic + se,
                      color = model),
                  shape = 16) +
  scale_color_manual(values = wes_palette("Moonrise2")[c(1, 2, 1, 1, 1)]) +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "WAIC") +
  theme(axis.ticks.y = element_blank(),
        legend.position = "none")

Here's our version of Figure 10.9. Recall, to do an "ensemble" posterior prediction in brms, one uses the pp_average() function. I know we were just lazy and focused on the WAIC. But let's play around, a bit. Here we'll weight the models based on the LOO by adding weights = "loo" to the pp_average() function. If you check the corresponding section of the brms reference manual, you'll find several weighting schemes.

nd <-
  tibble(log_pop = rep(seq(from = 6.5, 
                           to = 13, 
                           length.out = 50),
                       times = 2),
         contact_high = rep(0:1, each = 50))

ppa_10.9 <- 
  pp_average(b10.10, b10.11, b10.12,
             weights = "loo",
             method = "fitted",
             newdata = nd) %>%
  as_tibble() %>%
  bind_cols(nd)

ppa_10.9 %>%
  ggplot(aes(x = log_pop,
             group = contact_high)) +
  geom_ribbon(aes(ymin = `2.5%ile`,
                  ymax = `97.5%ile`,
                  fill = contact_high),
              alpha = 1/4) +
  geom_line(aes(y = Estimate, color = contact_high)) +
  geom_text(data = d, 
             aes(y = total_tools,
                 label = total_tools,
                 color = contact_high),
             size = 3.5) +
  coord_cartesian(xlim = c(7.1, 12.4),
                  ylim = c(12, 70)) +
    labs(x = "log population",
         y = "total tools",
         subtitle = "Blue is the high contact rate and black is the low.") +
  theme(legend.position = "none",
        panel.border = element_blank())

In case you were curious, here are those LOO weights:

model_weights(b10.10, b10.11, b10.12, 
              weights = "loo")

10.2.2. MCMC islands.

We fit our analogue to m10.10stan, b10.10, some time ago.

print(b10.10)

Here's the x-centered version.

d <-
  d %>%
  mutate(log_pop_c = log_pop - mean(log_pop))

b10.10.c <-
  brm(data = d, family = poisson,
      total_tools ~ 1 + log_pop_c + contact_high + contact_high:log_pop_c,
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b")),
      iter = 3000, warmup = 1000, chains = 4, cores = 4)

The results of our centering:

print(b10.10.c)

Figure 10.10.a.

# this helps us set our custom color scheme
color_scheme_set(c(wes_palette("Moonrise2")[3], 
                   wes_palette("Moonrise2")[1], 
                   wes_palette("Moonrise2")[2], 
                   wes_palette("Moonrise2")[2], 
                   wes_palette("Moonrise2")[1], 
                   wes_palette("Moonrise2")[1]))

# the actual plot
mcmc_pairs(x = posterior_samples(b10.10),
           pars = c("b_Intercept", "b_log_pop", "b_contact_high", "b_log_pop:contact_high"),
           off_diag_args = list(size = 1/10, alpha = 1/10),
           diag_fun = "dens")

Figure 10.10.b.

mcmc_pairs(x = posterior_samples(b10.10.c),
           pars = c("b_Intercept", "b_log_pop_c", "b_contact_high", "b_log_pop_c:contact_high"),
           off_diag_args = list(size = 1/10, alpha = 1/10),
           diag_fun = "dens")

If you really want the correlation point estimates, lowerCor() from the psych package gives a nice way to get the lower triangle of the matrix.

library(psych)

lowerCor(posterior_samples(b10.10)[, 1:4])
lowerCor(posterior_samples(b10.10.c)[, 1:4])

10.2.3. Example: Exposure and the offset.

Here we simulate our data.

set.seed(3838) # making it reproducible 

num_days <- 30
y <- rpois(num_days, 1.5)
set.seed(3838) # making it reproducible 

num_weeks <- 4
y_new <- rpois(num_weeks, 0.5*7)

Let's make them tidy.

d <- 
  tibble(y = c(y, y_new), 
         days = c(rep(1, 30), rep(7, 4)),
         monastery = c(rep(0, 30), rep(1, 4)))

d

Here we compute the offset and fit the model. With the brms package, we use the offset() syntax, inside which you can put either a pre-processed variable, like log_days, or a function of a variable, like log(days).
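To see why logging the exposure works, note that the offset enters the linear model with a coefficient fixed at 1: a rate per day, observed over some number of days, has a mean that factors into the rate times the exposure. Using the simulation's own numbers, a daily rate of 1.5 over 7 days:

```r
# the offset logic in one line: exp(log_lambda + log(days)) = lambda * days
log_lambda <- log(1.5)  # the true daily rate from the simulation above

exp(log_lambda + log(7))  # expected weekly count: 1.5 * 7 = 10.5
```

So by adding log(days) with a fixed coefficient of 1, the model estimates the daily rate even though the monasteries reported on different time scales.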

d <-
  d %>%
  mutate(log_days = log(days))

b10.15 <-
  brm(data = d, family = poisson,
      y ~ 1 + offset(log_days) + monastery,
      prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

The model summary:

print(b10.15)

posterior_samples(b10.15) %>%
  mutate(lambda_old = exp(b_Intercept),
         lambda_new  = exp(b_Intercept + b_monastery)) %>%
  gather(key, value, -(b_Intercept:lp__)) %>%
  mutate(key = factor(key, levels = c("lambda_old", "lambda_new"))) %>%
  group_by(key) %>%
  summarise(Mean = mean(value) %>% round(digits = 2),
            StdDev = sd(value) %>% round(digits = 2),
            LL = quantile(value, probs = .025) %>% round(digits = 2),
            UL = quantile(value, probs = .975) %>% round(digits = 2)) 

10.3. Other count regressions

10.3.1. Multinomial.

More simulation.

detach(package:brms)
library(rethinking)

# simulate career choices among 500 individuals
N <- 500             # number of individuals
income <- 1:3        # expected income of each career
score <- 0.5*income  # scores for each career, based on income

# next line converts scores to probabilities
p <- softmax(score[1], score[2], score[3])

# now simulate choice
# outcome career holds event type values, not counts
career <- rep(NA, N)  # empty vector of choices for each individual

set.seed(2078)
# sample chosen career for each individual
for(i in 1:N) career[i] <- sample(1:3, size = 1, prob = p)
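If you'd rather not lean on rethinking for softmax(), it's just exponentiate-and-normalize. Here's a base-R stand-in (the name softmax_base is ours, to avoid masking the rethinking version):

```r
# a base-R stand-in for rethinking::softmax()
softmax_base <- function(...) {
  x <- c(...)
  exp(x) / sum(exp(x))
}

softmax_base(0.5, 1.0, 1.5)  # three probabilities that sum to 1
```

Note that adding a constant to all the scores leaves the probabilities unchanged, which is one reason multinomial parameters can be hard to pin down.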

Here's what the data look like.

career %>%
  as_tibble() %>%
  ggplot(aes(x = value %>% as.factor())) +
  geom_bar(size = 0, fill = wes_palette("Moonrise2")[2])

Here's my naive attempt to fit the model in brms.

detach(package:rethinking)
library(brms)

b10.16 <-
  brm(data = list(career = career), 
      family = categorical(link = "logit"),
      career ~ 1,
      prior = c(set_prior("normal(0, 5)", class = "Intercept")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

This differs from McElreath's m10.16. Most obviously, this has two parameters. McElreath's m10.16 only has one. If you have experience with these models and know how to reproduce McElreath's results in brms, hit me up.

print(b10.16)

detach(package:brms)
library(rethinking)

N <- 100

set.seed(2078)
# simulate family incomes for each individual
family_income <- runif(N)

# assign a unique coefficient for each type of event
b <- (1:-1)
career <- rep(NA, N)  # empty vector of choices for each individual

for (i in 1:N) {
    score <- 0.5*(1:3) + b*family_income[i]
    p <- softmax(score[1], score[2], score[3])
    career[i] <- sample(1:3, size = 1, prob = p)
}

Here's the brms version of McElreath's m10.17.

detach(package:rethinking)
library(brms)

b10.17 <-
  brm(data = list(career = career,
                family_income = family_income), 
      family = categorical(link = "logit"),
      career ~ 1 + family_income,
       prior = c(set_prior("normal(0, 5)", class = "Intercept"),
                 set_prior("normal(0, 5)", class = "b")),
      iter = 2500, warmup = 500, cores = 2, chains = 2)

Happily, these results cohere with the rethinking model.

print(b10.17)

McElreath described the parameters as "on a scale that is very hard to interpret (p. 325)." Indeed.

10.3.1.2. Back to Berkeley.

library(rethinking)

data(UCBadmit)
d <- UCBadmit
rm(UCBadmit)

detach(package:rethinking)
library(brms)
# binomial model of overall admission probability
b_binom <-
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 1,
      prior = c(set_prior("normal(0, 100)", class = "Intercept")),
      iter = 2000, warmup = 1000, cores = 3, chains = 3)

# Poisson model of overall admission rate and rejection rate
d <-
  d %>%
  mutate(rej = reject) # 'reject' is a reserved word

b_pois <-
  brm(data = d, family = poisson,
      cbind(admit, rej) ~ 1,
      prior = c(set_prior("normal(0, 100)", class = "Intercept")),
      iter = 2000, warmup = 1000, cores = 3, chains = 3)

Note, the cbind() syntax made b_pois a multivariate Poisson model. Starting with version 2.0.0, brms supports a variety of multivariate models. Anyway, here are the implications of b_pois.
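As a minimal sketch of the binomial–Poisson connection (with made-up rates): if admissions arrive as Poisson with rate l1 and rejections as Poisson with rate l2, then conditional on the total, the admission probability is l1 / (l1 + l2).

```r
# made-up rates, just to show the arithmetic
l1 <- 120  # hypothetical admission rate
l2 <- 180  # hypothetical rejection rate

l1 / (l1 + l2)  # implied admission probability: 0.4

# the same thing via the model's log-scale intercepts
k1 <- log(l1)
k2 <- log(l2)
exp(k1) / (exp(k1) + exp(k2))
```

This is exactly the computation we'll do with the b_pois intercepts below.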

post <- posterior_samples(b_pois)

post %>%
  ggplot(aes(x = exp(b_admit_Intercept))) +
  geom_density(fill = wes_palette("Moonrise2")[2], size = 0) +
  geom_vline(xintercept = mean(d$admit), color = wes_palette("Moonrise2")[1]) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = "# applications",
       title = "Mean acceptance # across departments:",
       subtitle = "The density is the posterior distribution. The line is the\nvalue in the data.")

post %>%
  ggplot(aes(x = exp(b_rej_Intercept))) +
  geom_density(fill = wes_palette("Moonrise2")[1], size = 0) +
  geom_vline(xintercept = mean(d$rej), color = wes_palette("Moonrise2")[2]) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = "# applications",
       title = "Mean rejection # across departments:",
       subtitle = "The density is the posterior distribution. The line is the\nvalue in the data.")

The model summaries:

print(b_binom)
print(b_pois)

Here's the posterior mean for the probability of admission, based on b_binom.

fixef(b_binom) %>%
  invlogit()

Happily, we get the same value within simulation error from model b_pois.

k <- 
  fixef(b_pois) %>%
  as.numeric()

exp(k[1])/(exp(k[1]) + exp(k[2]))

10.3.2. Geometric.

# simulate
N <- 100
set.seed(1028)
x <- runif(N)

set.seed(1028)
y <- rgeom(N, prob = invlogit(-1 + 2*x))
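A small base-R aside, in case the parameterization is unfamiliar: rgeom() counts the number of failures before the first success, so at success probability p the theoretical mean count is (1 - p) / p. At x = .5, our linear model gives p = .5:

```r
# rgeom() counts failures before the first success; mean = (1 - p) / p
invlogit <- function(x) 1 / (1 + exp(-x))

p <- invlogit(-1 + 2 * 0.5)  # at x = .5 the linear model gives p = .5
(1 - p) / p                  # expected count: 1
```
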

In case you're curious, here are the data.

list(y = y, x = x) %>%
  as_tibble() %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(size = 3/5, alpha = 2/3)

Our geometric model:

b10.18 <-
  brm(data = list(y = y, x = x), 
      family = geometric(link = "log"),
      y ~ 0 + intercept + x,
      prior = c(set_prior("normal(0, 10)", class = "b", coef = "intercept"),
                set_prior("normal(0, 1)", class = "b")),
      chains = 2, iter = 2500, warmup = 500, cores = 2)

The results:

print(b10.18, digits = 2)

It appears brms uses a different parameterization for the geometric distribution than rethinking does. Even though the parameters brms yielded look different from those in the text, their predictions describe the data well. Here's the marginal_effects() plot:

plot(marginal_effects(b10.18),
     points = T,
     point_args = c(size = 3/5, alpha = 2/3),
     line_args = c(color = wes_palette("Moonrise2")[1],
                   fill = wes_palette("Moonrise2")[1]))
rm(d, b10.1, invlogit, b10.2, b10.3, w_b10.1, w_b10.2, w_b10.3, post, d_plot, b10.4, d_plot_4, d_aggregated, b10.5, b10.6, b10.7, l_b10.6, l_b10.7, l_b10.6_reloo, l_b10.7_reloo, p_10.6, d_text, b10.8, b10.9, loos, y, x, b.good, b10.10, point_tibble, line_tibble, b10.11, b10.12, b10.13, b10.14, w_b10.10, w_b10.11, w_b10.12, w_b10.13, w_b10.14, nd, ppa_10.9, b10.10.c, num_days, num_weeks, y_new, b10.15, N, income, score, p, career, i, b10.16, family_income, b, b10.17, k, b_binom, b_pois, b10.18)

Note. The analyses in this document were done with:

Reference

McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman & Hall/CRC Press.



joepowers16/rethinking documentation built on June 2, 2019, 6:52 p.m.