`stats::glm()` assumes that a tabular data set with case weights corresponds
to "different observations have different dispersions" (see `?glm`).

In some cases, the case weights reflect that the same covariate pattern was
observed multiple times (i.e., *frequency weights*). In this case,
`stats::glm()` expects the data to be formatted as the number of events for
each factor level so that the outcome can be given to the formula as
`cbind(events_1, events_2)`.
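The reshaping described above can be sketched in base R. This is only an illustration under stated assumptions: `to_event_counts()` is a hypothetical helper written for this example, not a function from any package, and it assumes that every column of `data` other than the outcome is a predictor.

```r
# Hypothetical helper (illustration only): collapse integer frequency weights
# into one row per covariate pattern, with one count column per outcome level.
to_event_counts <- function(data, outcome, weights) {
  stopifnot(is.factor(data[[outcome]]), nlevels(data[[outcome]]) == 2)
  predictors <- setdiff(names(data), outcome)
  lvls <- levels(data[[outcome]])
  data$.w <- weights

  # Sum the weights within each covariate pattern, separately per outcome level.
  per_level <- lapply(lvls, function(lvl) {
    rows <- data[data[[outcome]] == lvl, , drop = FALSE]
    out <- aggregate(rows$.w, by = rows[predictors], FUN = sum)
    names(out)[ncol(out)] <- lvl
    out
  })

  # Put the two count columns side by side; a pattern never observed for
  # one of the levels gets a count of zero.
  merged <- merge(per_level[[1]], per_level[[2]], by = predictors, all = TRUE)
  merged[lvls][is.na(merged[lvls])] <- 0L
  merged
}

ucb <- as.data.frame(UCBAdmissions)
events <- to_event_counts(
  ucb[c("Admit", "Gender", "Dept")],
  outcome = "Admit",
  weights = as.integer(ucb$Freq)
)

# The count columns can now be bound together as the outcome.
fit <- glm(cbind(Rejected, Admitted) ~ Gender + Dept, data = events, family = binomial)
```

The count columns are named after the factor levels, so the order in which they are passed to `cbind()` decides which level is modeled as the event.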

`glm_grouped()` converts data with integer case weights to the expected
"number of events" format for binomial data.

```
glm_grouped(formula, data, weights, ...)
```

`formula` |
A formula object with one outcome that is a two-level factor. |

`data` |
A data frame with the outcomes and predictors (but not case weights). |

`weights` |
An integer vector of weights whose length is the same as the
number of rows in `data`. |

`...` |
Options to pass to `stats::glm()`. |

An object produced by `stats::glm()`.

```
#----------------------------------------------------------------------------
# The same data set formatted three ways
# First with basic case weights that, from ?glm, are used inappropriately.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)
head(ucb_weighted)
nrow(ucb_weighted)
# Format when yes/no data are in individual rows (probably still inappropriate)
library(tidyr)
ucb_long <- uncount(ucb_weighted, Freq)
head(ucb_long)
nrow(ucb_long)
# Format where the outcome is formatted as number of events
ucb_events <-
  ucb_weighted %>%
  tidyr::pivot_wider(
    id_cols = c(Gender, Dept),
    names_from = Admit,
    values_from = Freq,
    values_fill = 0L
  )
head(ucb_events)
nrow(ucb_events)
#----------------------------------------------------------------------------
# Different model fits
# Treat data as separate Bernoulli data:
glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial)
# Weights produce the same statistics
glm(
  Admit ~ Gender + Dept,
  data = ucb_weighted,
  family = binomial,
  weights = ucb_weighted$Freq
)
# Data as binomial "x events out of n trials" format. Note that, to get the same
# coefficients, the order of the levels must be reversed.
glm(
  cbind(Rejected, Admitted) ~ Gender + Dept,
  data = ucb_events,
  family = binomial
)
# The new function starts with frequency weights and arrives at the grouped fit:
glm_grouped(Admit ~ Gender + Dept, data = ucb_weighted, weights = ucb_weighted$Freq)
```
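To make the difference between the formats concrete, the following base R sketch refits the same model on all three layouts and places the results side by side. It rebuilds the data without tidyr so that it runs on its own; the object names mirror the examples and the `fits` list is introduced only for this comparison. The three fits see 4526, 24, and 12 rows respectively.

```r
# Base R only; UCBAdmissions ships with the datasets package.
ucb_weighted <- as.data.frame(UCBAdmissions)
ucb_weighted$Freq <- as.integer(ucb_weighted$Freq)

# One row per applicant: repeat each row Freq times (like tidyr::uncount()).
idx <- rep(seq_len(nrow(ucb_weighted)), times = ucb_weighted$Freq)
ucb_long <- ucb_weighted[idx, c("Admit", "Gender", "Dept")]

# One row per covariate pattern, with a count column per outcome level.
admitted <- subset(ucb_weighted, Admit == "Admitted", select = c(Gender, Dept, Freq))
rejected <- subset(ucb_weighted, Admit == "Rejected", select = c(Gender, Dept, Freq))
names(admitted)[3] <- "Admitted"
names(rejected)[3] <- "Rejected"
ucb_events <- merge(admitted, rejected, by = c("Gender", "Dept"))

fits <- list(
  long = glm(Admit ~ Gender + Dept, data = ucb_long, family = binomial),
  weighted = glm(Admit ~ Gender + Dept, data = ucb_weighted,
                 family = binomial, weights = Freq),
  events = glm(cbind(Rejected, Admitted) ~ Gender + Dept,
               data = ucb_events, family = binomial)
)

# The coefficient estimates agree across formats (up to numerical tolerance).
round(sapply(fits, coef), 4)

# The residual degrees of freedom and deviance do not: they depend on how
# many rows glm() treats as observations.
sapply(fits, df.residual)
sapply(fits, deviance)
```

Only the "number of events" layout gives a residual deviance and degrees of freedom that reflect the twelve covariate patterns, which is the layout `glm_grouped()` is described as producing from frequency weights.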

