In camille-s/camiller: Lots of convenience functions

knitr::opts_chunk$set(
    message = FALSE,
    warning = FALSE,
    collapse = TRUE,
    comment = "#>",
    fig.showtext = TRUE
)

This is a simple workflow that uses functions in camiller to analyze and visualize American Community Survey (ACS) data from the US Census Bureau. The data was downloaded with tidycensus, lightly cleaned, and is included in this package.

library(dplyr)
library(tidyr)
library(ggplot2)
library(forcats)
library(camiller)
library(showtext)

font_add_google("PT Sans", "ptsans")
showtext_auto()

The dataset pov_age contains estimates and margins of error (MOEs) for the number of residents in each age group, broken down by the ratio of their income to the federal poverty line, for each town in Greater New Haven.

head(pov_age)

The function add_grps calls the function make_grps, which makes it easy to aggregate sums by collapsing multiple subgroups into one larger group. These functions take a named list in which each name is a larger category and each value gives the positions of its subgroups among the unique values of the grouping column. The positions are easy to find by calling unique on the column of interest.

unique(pov_age$ratio)

unique(pov_age$age)

ratio_grps <- list(
  determined = 1,
  poverty = 2:4,
  low_income = 2:9
)

age_grps <- list(
  young_children = 1,
  children = 1:3,
  seniors = 9:10
)
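To make the position-based lists concrete, here is a minimal base R sketch of how a list like age_grps maps positions to subgroup labels and sums over them. Only the first three age labels come from this dataset; the remaining labels and all of the estimates are made-up placeholders, and this is an illustration of the idea rather than how add_grps is implemented.

```r
# Unique values of a column, in order of appearance.
# Only the first three labels are taken from pov_age; the rest are placeholders.
age_levels <- c("Under 6 years", "6 to 11 years", "12 to 17 years",
                "18 to 24 years", "25 years and over")

# Toy estimates, one per subgroup (made-up numbers)
estimate <- c(120, 200, 180, 300, 900)

# Larger groups defined by positions in age_levels
grps <- list(young_children = 1, children = 1:3)

# Which subgroups fall in each larger group
members <- lapply(grps, function(i) age_levels[i])

# Sum of estimates for each larger group
totals <- vapply(grps, function(i) sum(estimate[i]), numeric(1))
totals
```

Groups are allowed to overlap: here position 1 counts toward both young_children and children, just as 2:4 and 2:9 overlap in ratio_grps.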

Alternatively, show_uniq will print out the unique values of a column with their indexes, and return the original data frame unchanged. This is convenient for finding positions without having to break a workflow.

pov_age %>%
  show_uniq(age) %>%
  group_by(name, ratio)
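The pass-through behavior described above can be sketched in base R. The helper peek_uniq below is a hypothetical stand-in written for this example, not the package's show_uniq, and the data frame is made up; it only illustrates the pattern of printing as a side effect while handing the data on untouched.

```r
# Hypothetical stand-in for a "print and pass through" helper
peek_uniq <- function(data, col) {
  vals <- unique(data[[col]])
  # Side effect: show each unique value next to its position
  print(data.frame(position = seq_along(vals), value = as.character(vals)))
  # Return the input unchanged so the next step can keep working with it
  data
}

# Made-up data for illustration
toy <- data.frame(
  name = c("A", "A", "B", "B"),
  age = c("Under 6 years", "6 to 11 years", "Under 6 years", "6 to 11 years"),
  estimate = c(10, 20, 30, 40)
)

out <- peek_uniq(toy, "age")
```

Because the function returns its input, it can sit in the middle of a chain of calls without changing the result.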

Using dplyr::group_by(name, ratio) and then add_grps gives aggregates of estimates and MOEs for ratio levels and towns, with the original age groups collapsed into the desired, larger age groups. MOE calculations are done using tidycensus. For example:

c("Under 6 years", "6 to 11 years", "12 to 17 years")

becomes children. The same is then done to collapse ratios. calc_shares then calculates shares of residents in each group over the denominator "determined", and, optionally, calculates MOEs for this proportion.

pov_rates <- pov_age %>%
  mutate(across(c(age, ratio), as_factor)) %>%
  group_by(name, ratio) %>%
  add_grps(age_grps, group = age, value = estimate, moe = moe) %>%
  group_by(name, age) %>%
  add_grps(ratio_grps, group = ratio, value = estimate, moe = moe) %>%
  calc_shares(group = ratio, denom = "determined", value = estimate, moe = moe)

head(pov_rates)
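For a sense of the arithmetic behind the aggregated MOEs and share MOEs, here is a base R sketch using the standard Census Bureau approximation formulas, which is what tidycensus's moe_sum and moe_prop are understood to implement. All numbers are made up, and the exact steps camiller takes internally are an assumption here.

```r
# Toy estimates and 90% MOEs for three subgroups (made-up numbers)
est <- c(100, 150, 250)
moe <- c(30, 40, 50)

# Aggregate: estimates add; MOEs combine as the root sum of squares
grp_est <- sum(est)
grp_moe <- sqrt(sum(moe^2))

# Denominator (the "determined" total) and its MOE, also made up
denom <- 2000
denom_moe <- 120

# Share and its MOE, using the approximation for a proportion
share <- grp_est / denom
under_root <- grp_moe^2 - share^2 * denom_moe^2
# If the term under the root is negative, the Census guidance is to
# fall back to the ratio formula, which adds instead of subtracts
if (under_root < 0) under_root <- grp_moe^2 + share^2 * denom_moe^2
share_moe <- sqrt(under_root) / denom

round(c(share = share, share_moe = share_moe), 3)
```

The aggregated MOE is smaller than the plain sum of the subgroup MOEs, which is why collapsing subgroups tends to give more stable estimates.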

The function brk_labels allows for cleaning up break labels, such as those generated using cut. See ?brk_labels for formatting options. theme_din is a clean ggplot2 theme that works well for dot plots and bar charts.

The legend in this plot is redundant and not good practice; it is included only to display the output of brk_labels.

pal <- c("#FDCC8A", "#FC8D59", "#D7301F")
pov_rates %>%
  filter(ratio == "low_income", age == "children") %>%
  mutate(share = signif(share, digits = 2)) %>%
  ungroup() %>%
  mutate(name = as.factor(name) %>% fct_reorder(share, max)) %>%
  arrange(name) %>% 
  mutate(brk = cut(share, breaks = c(min(share), 0.14, 0.3, max(share)), include.lowest = TRUE)) %>%
  ggplot(aes(x = name, y = share, color = brk)) +
    geom_point(size = 4) +
    coord_flip() +
    scale_color_manual(values = pal, 
                       labels = function(x) brk_labels(x, format = "percent", mult_by = 100)) +
    theme_din(base_family = "ptsans", base_size = 10, xgrid = TRUE, ygrid = "dotted") +
    scale_y_continuous(labels = function(x) sprintf("%0g", x * 100)) +
    labs(
      x = NULL, y = NULL, color = "Rate",
      title = "Child low-income rate by town",
      subtitle = "Greater New Haven towns, 2016",
      caption = "Source: US Census Bureau 2016 ACS 5-year estimate"
    )
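To show the kind of cleanup that break labels from cut typically need, here is a small base R sketch. The helper clean_brks is hypothetical and written only for this example; it is not brk_labels and does not reproduce its options, and the values are made up.

```r
# Made-up shares, cut into three brackets
x <- c(0.05, 0.2, 0.4)
brk <- cut(x, breaks = c(0.05, 0.14, 0.3, 0.4), include.lowest = TRUE)

# Raw labels from cut use interval notation with brackets and commas
raw <- levels(brk)

# Hypothetical helper: strip the brackets, scale, and join as "low to high%"
clean_brks <- function(labels, mult_by = 100, suffix = "%") {
  bare <- gsub("\\[|\\]|\\(|\\)", "", labels)
  parts <- strsplit(bare, ",", fixed = TRUE)
  vapply(parts, function(p) {
    # Round to avoid floating point noise after scaling
    nums <- round(as.numeric(p) * mult_by, 1)
    paste0(nums[1], " to ", nums[2], suffix)
  }, character(1))
}

cleaned <- clean_brks(raw)
cleaned
```

A function like this can be passed to the labels argument of a ggplot2 scale, which is the same pattern used with brk_labels in the plot above.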

The moe_test function tests whether the difference between two estimates is statistically significant, given their MOEs. This works well for comparing values in one year to those in another, or between related locations or groups. The grunt work of these calculations is done with tidycensus functions. Included in the package is the pov_age_10_16 dataset, which has data for both 2010 and 2016 and is used here to build pov_trend for trying out significance testing.

pov_trend <- pov_age_10_16 %>%
  mutate(across(c(age, ratio), as_factor)) %>%
  group_by(name, year, ratio) %>%
  add_grps(age_grps, group = age, value = estimate, moe = moe) %>%
  group_by(name, year, age) %>%
  add_grps(ratio_grps, group = ratio, value = estimate, moe = moe) %>%
  calc_shares(group = ratio, denom = "determined", value = estimate, moe = moe)

The output of moe_test optionally includes intermediate calculations, such as the standard errors and Z-scores used for significance testing.

pov_sigs <- pov_trend %>%
  filter(ratio == "low_income") %>%
  select(-estimate, -moe, -ratio) %>%
  ungroup() %>%
  mutate(year = paste0("y", year)) %>%
  make_wide(share, sharemoe, group = year) %>%
  moe_test(est1 = y2010_share, moe1 = y2010_sharemoe, est2 = y2016_share, moe2 = y2016_sharemoe, alpha = 0.1)

pov_sigs %>%
  select(name, age, diff:isSig_90) %>%
  filter(name %in% c("New Haven", "Hamden"))
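The test itself can be sketched in a few lines of base R. The function z_test_moe below is a hypothetical illustration of the usual Census Bureau procedure, not the source of moe_test: it assumes the MOEs are at the 90% confidence level (so the standard error is the MOE divided by 1.645) and that the difference is taken as the second estimate minus the first. The input numbers are made up.

```r
# Hypothetical sketch of a Z-test on two estimates with MOEs
z_test_moe <- function(est1, moe1, est2, moe2, alpha = 0.1, moe_conf = 1.645) {
  # Convert MOEs to standard errors (assumes 90% MOEs, as in the ACS)
  se1 <- moe1 / moe_conf
  se2 <- moe2 / moe_conf
  # Direction of the difference is an assumption of this sketch
  diff <- est2 - est1
  z <- diff / sqrt(se1^2 + se2^2)
  # Two-sided test at the given alpha
  data.frame(diff = diff, z = z, is_sig = abs(z) > qnorm(1 - alpha / 2))
}

# Two made-up comparisons: a large change and a small one
res <- z_test_moe(
  est1 = c(0.30, 0.30), moe1 = c(0.04, 0.04),
  est2 = c(0.38, 0.32), moe2 = c(0.05, 0.05)
)
res
```

With alpha = 0.1 the critical value is about 1.645, so only the first comparison, where the change is large relative to the combined standard error, comes out as significant.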