knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Let's say that you are a data intern for a wastewater agency in the southeastern part of Wisconsin. Your mentors have asked you to find insights from the American Community Survey.
They tell you that the wastewater commission's area of responsibility is essentially identical to the local school district's, Racine Unified.
You look for geographical levels related to schools in
hercacstables::METADATA_FOR_ACS_GEOGRAPHIES
.
#| label: look-for-school-geographies SCHOOL_GEOGRAPHIES <- dplyr::filter(hercacstables::METADATA_FOR_ACS_GEOGRAPHIES, grepl("school", .data$`Geographic Level`))
This table tells us two things.
First, the geographic level that you should examine is "school district
(unified)".
Second, you also need to specify the state when you're fetching data about it.
With hercacstables::fetch_data()
, and tools from the tidyverse,
you can find the FIPS codes for unified school districts in Wisconsin that have
the word "Racine" in them.
#| label: find-the-fips-for-racine-unified RACINE_DISTRICTS <- "NAME" |> hercacstables::fetch_data( variables = _, year = 2020, # the most recent decennial for_geo = "school district (unified)", # see above for_items = "*", # this wildcard gets every item survey_type = "dec", # not actually the ACS *flex* table_or_survey_code = "pl", # this data set is similar to ACS state = 55L # the Badger State ) |> dplyr::filter( stringr::str_detect(.data$NAME, "Racine") )
#| echo: false knitr::kable(RACINE_DISTRICTS)
The first thing that you could do is to check for data related to plumbing. If you were doing this without R, you might visit data.census.gov and search for "ACS plumbing".
With hercacstables
, you would search within its glossary about ACS groups:
#| label: search-for-group-about-plumbing PLUMBING_GROUPS <- hercacstables::METADATA_FOR_ACS_GROUPS |> dplyr::filter( grepl("plumbing", # search for the term "plumbing" .data$Description, # within the "Description" column ignore.case = TRUE) # ignore capitalization )
#| label: show-plumbing-groups #| echo: false knitr::kable(PLUMBING_GROUPS)
It appears that the keyword "plumbing" appears in r nrow(PLUMBING_GROUPS)
different ACS groups.
Let's say that "Tenure^[ Whether the occupants rent or own, not if they can't be fired. ] by plumbing facilities" jumps out to you as potentially interesting.
#| label: define-group GROUP <- "B25049"
There hercacstables
also provides glossary about variables.
You can search it to see details about each variable in group r GROUP
.
#| label: search-for-variables-in-group PLUMBING_VARIABLES <- hercacstables::METADATA_FOR_ACS_VARIABLES |> dplyr::filter( .data$Dataset == "ACS5", # use the 1-year dataset only .data$Group == GROUP # pull all of the variables for this group )
#| label: show-raw-plumbing-variables #| echo: false knitr::kable(PLUMBING_VARIABLES)
You can see that group r GROUP
reports the number of households that have, or
do not have, plumbing facilities.
It further breaks those households down into renters and owner-occupants.
The raw glossary from the Census, and in hercacstables::METADATA_FOR_ACS_VARIABLES
,
packs all of that information into the "Details" column.
It would be more useful if we actually had separate columns for the types of
tenure and plumbing facilities.
You can use hercacstables::unpack_group_details()
to do this, in combination
with tools from the tidyverse.
You might also decide that you don't the rows that report the total number of households (row 1) and subtotals by tenure (2 and 5). You'll have that data anyway from 3, 4, 6, and 7.
#| label: make-a-helpful-glossary-table PLUMBING_VARIABLES <- GROUP |> hercacstables::unpack_group_details() |> dplyr::filter( .data$Dataset == "ACS5" ) |> dplyr::rename( Tenure = "A", Plumbing = "B" ) |> dplyr::filter( dplyr::if_all(c("Tenure", "Plumbing"), \(.) (nchar(.) > 0)) )
#| label: show-plumbing-variables #| echo: false knitr::kable(PLUMBING_VARIABLES)
You are finally ready to pull a some census data about plumbing!
For starters, you could compare the local area to the whole county, the state,
its census region, and the whole country.
Once again, we'll use hercacstables::fetch_data()
and the
tidyverse.
#| label: fetch-racine-plumbing #| cache: true YEAR <- hercacstables::most_recent_vintage("acs", "acs5") RAW_PLUMBING <- tibble::tribble( ~ geo, ~ items, ~ other, "us", "1", list(), "region", "2", list(), "state", "55", list(), "county", "101", list(state = 55), "school district (unified)", "12360", list(state = 55) ) |> purrr::pmap( \(geo, items, other) rlang::inject( hercacstables::fetch_data( variables = c("NAME", PLUMBING_VARIABLES$Variable), year = YEAR, for_geo = geo, for_items = items, survey_type = "acs", table_or_survey_code = "acs5", !!!other ) ) )
#| label: show-raw-racine-plumbing #| echo: false #| results: asis RAW_PLUMBING |> purrr::walk( \(.d) .d |> dplyr::slice(1) |> knitr::kable() |> print() )
The data in RAW_PLUMBING
do not have much meaning without the information in
PLUMBING_VARIABLES
.
Fortunately, we can use the tidyverse to join the two tables
together.
#| label: wrangle-racine-plumbing PLUMBING <- RAW_PLUMBING |> purrr::map( \(.) dplyr::select(., "Group", "Index", Geography = "NAME", Households = "Value" ) ) |> purrr::list_rbind() |> dplyr::right_join( PLUMBING_VARIABLES, by = c("Group", "Index") ) |> dplyr::select( "Geography", "Tenure", "Plumbing", "Households" )
#| label: show_plumbing knitr::kable(PLUMBING)
Finally, you can do some tidyverse-only magic to look at the different rates of plumbing for renters versus owners.
#| label: wrangle-rates-of-plumbing RATES_OF_PLUMBING <- PLUMBING |> dplyr::mutate( dplyr::across(c("Tenure", "Plumbing"), \(.) stringr::str_extract(., "^\\S+")) ) |> tidyr::pivot_wider( names_from = "Tenure", values_from = "Households" ) |> dplyr::mutate( `All Households` = .data$Owner + .data$Renter ) |> tidyr::pivot_longer( cols = c("Renter", "Owner", "All Households"), names_to = "Tenure", values_to = "Households" ) |> tidyr::pivot_wider( names_from = "Plumbing", values_from = "Households" ) |> dplyr::mutate( Households = .data$Complete + .data$Lacking, Rate = .data$Complete / .data$Households ) |> dplyr::select( "Geography", "Tenure", "Rate" ) |> tidyr::pivot_wider( names_from = "Tenure", values_from = "Rate" )
#| label: show-rates-of-plumbing #| echo: false RATES_OF_PLUMBING |> dplyr::mutate( dplyr::across(!"Geography", \(.) scales::label_percent(accuracy = 0.01)(.)) ) |> knitr::kable( align = "lrrr" )
Three things might jump out at you. The first is that almost every household in the US has complete plumbing facilities. The second thing is that plumbing access seems a little different for renters and owners. The third thing is that who has more access to plumbing is different in Racine than at larger geographic levels. At the scale of the nation, region, and state, renters are a little less likely to have complete access to plumbing. In contrast, renters in Racine County and the Racine Unified School District are actually a little more likely to have access to plumbing.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.