knitr::opts_chunk$set( message = FALSE, warning = FALSE, collapse = TRUE, comment = "#>", fig.showtext = TRUE )
This is a simple workflow that makes use of functions in camiller
to analyze and visualize some Census American Community Survey data. The data was downloaded with tidycensus
, cleaned up a little bit, and is all included in this package.
library(dplyr) library(tidyr) library(ggplot2) library(forcats) library(camiller) library(showtext)
font_add_google("PT Sans", "ptsans") showtext_auto()
The dataset pov_age
contains estimates and margins of error (MOEs) of residents by age group in different ratio brackets compared to the federal poverty line by town in Greater New Haven.
head(pov_age)
The function add_grps
calls the function make_grps
, which makes it easy to aggregate sums by collapsing multiple subgroups into one larger group. These functions take a list of larger categories, with the indexes of different subgroups in that column. These are easy to figure out by calling unique
on the column of interest.
unique(pov_age$ratio) unique(pov_age$age) ratio_grps <- list( determined = 1, poverty = 2:4, low_income = 2:9 ) age_grps <- list( young_children = 1, children = 1:3, seniors = 9:10 )
Alternatively, show_uniq
will print out the unique values of a column with their indexes, and return the original data frame unchanged. This is convenient for finding positions without having to break a workflow.
pov_age %>% show_uniq(age) %>% group_by(name, ratio)
Using dplyr::group_by(name, ratio)
and then add_grps
gives aggregates of estimates and MOEs for ratio levels and towns, with the original age groups collapsed into the desired, larger age groups. MOE calculations are done using tidycensus
. For example:
c("Under 6 years", "6 to 11 years", "12 to 17 years")
becomes children
. The same is then done to collapse ratios. calc_shares
then calculates shares of residents in each group over the denominator "determined"
, and, optionally, calculates MOEs for this proportion.
pov_rates <- pov_age %>% mutate(across(c(age, ratio), as_factor)) %>% group_by(name, ratio) %>% add_grps(age_grps, group = age, value = estimate, moe = moe) %>% group_by(name, age) %>% add_grps(ratio_grps, group = ratio, value = estimate, moe = moe) %>% calc_shares(group = ratio, denom = "determined", value = estimate, moe = moe)
head(pov_rates)
The function brk_labels
allows for cleaning up break labels, such as those generated using cut
. See ?brk_labels
for formatting options. theme_din
is a clean ggplot2
theme that works well for dot plots and bar charts.
This legend is unnecessary and not a good idea, just a place to display the output of brk_labels
.
pal <- c("#FDCC8A", "#FC8D59", "#D7301F") pov_rates %>% filter(ratio == "low_income", age == "children") %>% mutate(share = signif(share, digits = 2)) %>% ungroup() %>% mutate(name = as.factor(name) %>% fct_reorder(share, max)) %>% arrange(name) %>% mutate(brk = cut(share, breaks = c(min(share), 0.14, 0.3, max(share)), include.lowest = T)) %>% ggplot(aes(x = name, y = share, color = brk)) + geom_point(size = 4) + coord_flip() + scale_color_manual(values = pal, labels = function(x) brk_labels(x, format = "percent", mult_by = 100)) + theme_din(base_family = "ptsans", base_size = 10, xgrid = T, ygrid = "dotted") + scale_y_continuous(labels = function(x) sprintf("%0g", x * 100)) + labs(x = NULL, y = NULL, color = "Rate", title = "Child low-income rate by town", subtitle = "Greater New Haven towns, 2016", caption = "Source: US Census Bureau 2016 ACS 5-year estimate")
The moe_test
function applies t-tests for differences between two estimates, given their MOEs. This works well for comparing values in one year to those in another, or between related locations or groups. The grunt-work of these calculations is done with tidycensus
functions. Included in the package is the pov_trend
tibble for testing out significance testing on data in both 2010 and 2016.
pov_trend <- pov_age_10_16 %>% mutate(across(c(age, ratio), as_factor)) %>% group_by(name, year, ratio) %>% add_grps(age_grps, group = age, value = estimate, moe = moe) %>% group_by(name, year, age) %>% add_grps(ratio_grps, group = ratio, value = estimate, moe = moe) %>% calc_shares(group = ratio, denom = "determined", value = estimate, moe = moe)
The output of moe_test
includes optional intermediary calculations, such as standard errors and Z-scores used for significance testing.
pov_sigs <- pov_trend %>% filter(ratio == "low_income") %>% select(-estimate, -moe, -ratio) %>% ungroup() %>% mutate(year = paste0("y", year)) %>% make_wide(share, sharemoe, group = year) %>% moe_test(est1 = y2010_share, moe1 = y2010_sharemoe, est2 = y2016_share, moe2 = y2016_sharemoe, alpha = 0.1) pov_sigs %>% select(name, age, diff:isSig_90) %>% filter(name %in% c("New Haven", "Hamden"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.