knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Let's say that you are a data intern for a wastewater agency in the southeastern part of Wisconsin. Your mentors have asked you to find insights from the American Community Survey.

Specifying your community

They tell you that the wastewater commission's area of responsibility is essentially identical to the local school district's, Racine Unified.

You look for geographical levels related to schools in hercacstables::METADATA_FOR_ACS_GEOGRAPHIES.

#| label: look-for-school-geographies
SCHOOL_GEOGRAPHIES <- dplyr::filter(hercacstables::METADATA_FOR_ACS_GEOGRAPHIES,
                                    grepl("school", .data$`Geographic Level`))

This table tells you two things. First, the geographic level that you should examine is "school district (unified)". Second, you also need to specify the state when you fetch data about it. With hercacstables::fetch_data() and tools from the tidyverse, you can find the FIPS codes for unified school districts in Wisconsin that have the word "Racine" in their names.

#| label: find-the-fips-for-racine-unified
RACINE_DISTRICTS <- "NAME" |>
    hercacstables::fetch_data(
        variables = _,
        year = 2020,                           # the most recent decennial
        for_geo = "school district (unified)", # see above
        for_items = "*",                       # this wildcard gets every item
        survey_type = "dec",                   # not actually the ACS *flex*
        table_or_survey_code = "pl",           # this data set is similar to ACS
        state = 55L                            # the Badger State
    ) |>
    dplyr::filter(
        stringr::str_detect(.data$NAME, "Racine")
    )
#| echo: false

knitr::kable(RACINE_DISTRICTS)

Finding relevant data

The first thing that you could do is to check for data related to plumbing. If you were doing this without R, you might visit data.census.gov and search for "ACS plumbing".

Groups with relevant data

With hercacstables, you would search within its glossary about ACS groups:

#| label: search-for-group-about-plumbing

PLUMBING_GROUPS <- hercacstables::METADATA_FOR_ACS_GROUPS |>
    dplyr::filter(
        grepl("plumbing",         # search for the term "plumbing"
              .data$Description,  # within the "Description" column
              ignore.case = TRUE) # ignore capitalization
    )
#| label: show-plumbing-groups
#| echo: false

knitr::kable(PLUMBING_GROUPS)

The keyword "plumbing" appears in r nrow(PLUMBING_GROUPS) different ACS groups.

Specific variables about plumbing

Let's say that "Tenure^[ Whether the occupants rent or own, not if they can't be fired. ] by plumbing facilities" jumps out to you as potentially interesting.

#| label: define-group

GROUP <- "B25049"

hercacstables also provides a glossary of variables. You can search it to see details about each variable in group r GROUP.

#| label: search-for-variables-in-group

PLUMBING_VARIABLES <- hercacstables::METADATA_FOR_ACS_VARIABLES |>
    dplyr::filter(
        .data$Dataset == "ACS5", # use the 5-year dataset only
        .data$Group == GROUP     # pull all of the variables for this group
    )
#| label: show-raw-plumbing-variables
#| echo: false

knitr::kable(PLUMBING_VARIABLES)

Unpacking variable details

You can see that group r GROUP reports the number of households that have, or do not have, plumbing facilities. It further breaks those households down into renters and owner-occupants. Both the raw glossary from the Census and hercacstables::METADATA_FOR_ACS_VARIABLES pack all of that information into the "Details" column.

It would be more useful if we actually had separate columns for the types of tenure and plumbing facilities. You can use hercacstables::unpack_group_details() to do this, in combination with tools from the tidyverse.

You might also decide that you don't need the rows that report the total number of households (row 1) and the subtotals by tenure (rows 2 and 5). You'll have that data anyway from rows 3, 4, 6, and 7.

#| label: make-a-helpful-glossary-table
PLUMBING_VARIABLES <- GROUP |>
    hercacstables::unpack_group_details() |>
    dplyr::filter(
        .data$Dataset == "ACS5"
    ) |>
    dplyr::rename(
        Tenure = "A",
        Plumbing = "B"
    ) |>
    dplyr::filter(
        dplyr::if_all(c("Tenure", "Plumbing"),
                      \(.) (nchar(.) > 0))
    )
#| label: show-plumbing-variables
#| echo: false

knitr::kable(PLUMBING_VARIABLES)
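If you are curious what "unpacking" involves, here is a minimal base-R sketch of the idea. It does not show how hercacstables::unpack_group_details() is implemented. The toy glossary, its variable names, and the "!!"-separated format of its Details column are all assumptions made for illustration.

```r
# Toy glossary; the variable names and the "!!"-separated Details format
# are assumptions for illustration, not the package's actual data.
toy_glossary <- data.frame(
    Variable = c("TOY_001", "TOY_002", "TOY_003", "TOY_004"),
    Details = c(
        "Total:",
        "Total:!!Owner occupied:",
        "Total:!!Owner occupied:!!Complete plumbing facilities",
        "Total:!!Renter occupied:!!Lacking plumbing facilities"
    )
)

# Split each packed label into its pieces
parts <- strsplit(toy_glossary$Details, "!!", fixed = TRUE)

# Return the i-th piece without its trailing colon, or "" if it is missing
pick <- function(x, i) if (length(x) >= i) sub(":$", "", x[i]) else ""

toy_glossary$Tenure <- vapply(parts, pick, character(1), i = 2)
toy_glossary$Plumbing <- vapply(parts, pick, character(1), i = 3)

# Keep only rows where both pieces are present, dropping totals and subtotals
leaf_rows <- toy_glossary[
    nchar(toy_glossary$Tenure) > 0 & nchar(toy_glossary$Plumbing) > 0,
]
```

The last step mirrors the dplyr::if_all() filter above: rows with an empty Tenure or Plumbing value are totals or subtotals, so they are dropped.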

Fetch data

You are finally ready to pull some census data about plumbing! For starters, you could compare the local area to the whole county, the state, its census region, and the whole country. Once again, you'll use hercacstables::fetch_data() and the tidyverse.

#| label: fetch-racine-plumbing
#| cache: true

YEAR <- hercacstables::most_recent_vintage("acs", "acs5")

RAW_PLUMBING <- tibble::tribble(
    ~ geo,                       ~ items, ~ other,
    "us",                        "1",     list(),
    "region",                    "2",     list(),
    "state",                     "55",    list(),
    "county",                    "101",   list(state = 55),
    "school district (unified)", "12360", list(state = 55)
) |>
    purrr::pmap(
        \(geo, items, other) rlang::inject(
            hercacstables::fetch_data(
                variables = c("NAME", PLUMBING_VARIABLES$Variable),
                year = YEAR,
                for_geo = geo,
                for_items = items,
                survey_type = "acs",
                table_or_survey_code = "acs5",
                !!!other
            )
        )
    )
#| label: show-raw-racine-plumbing
#| echo: false
#| results: asis

RAW_PLUMBING |>
    purrr::walk(
        \(.d) .d |>
            dplyr::slice(1) |>
            knitr::kable() |>
            print()
    )
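The fetching chunk above leans on rlang::inject() and the !!! operator to pass a different set of extra arguments for each geography. The sketch below isolates that pattern without touching the Census API. The function describe_request() is a hypothetical stand-in that only reports the arguments it receives. It is not part of hercacstables.

```r
library(purrr)
library(rlang)

# Hypothetical stand-in for a fetching function: it only describes its inputs.
describe_request <- function(for_geo, for_items, ...) {
    extras <- list(...)
    suffix <- if (length(extras) > 0) {
        paste0(
            " within ",
            paste(names(extras), unlist(extras), sep = "=", collapse = ", ")
        )
    } else {
        ""
    }
    paste0(for_geo, ":", for_items, suffix)
}

# One entry per request; `other` holds the optional extra arguments
requests <- list(
    geo   = c("state", "county"),
    items = c("55", "101"),
    other = list(list(), list(state = 55))
)

descriptions <- purrr::pmap_chr(
    requests,
    # `!!!other` splices each element of `other` in as a named argument
    \(geo, items, other) rlang::inject(
        describe_request(for_geo = geo, for_items = items, !!!other)
    )
)
```

An empty list splices in nothing, so the state-level request gets no extra arguments, while the county-level request receives state = 55.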

The data in RAW_PLUMBING do not have much meaning without the information in PLUMBING_VARIABLES. Fortunately, we can use the tidyverse to join the two tables together.

#| label: wrangle-racine-plumbing

PLUMBING <- RAW_PLUMBING |>
    purrr::map(
        \(.) dplyr::select(.,
                           "Group",
                           "Index",
                           Geography = "NAME",
                           Households = "Value"
        )
    ) |>
    purrr::list_rbind() |>
    dplyr::right_join(
        PLUMBING_VARIABLES,
        by = c("Group", "Index")
    ) |>
    dplyr::select(
        "Geography",
        "Tenure",
        "Plumbing",
        "Households"
    )
#| label: show_plumbing

knitr::kable(PLUMBING)
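To see why the right join matters, consider this small base-R sketch with invented values. The group code "TOY01", the geography name, and the counts are all made up. merge() with all.y = TRUE plays the role of dplyr::right_join() here.

```r
# Invented fetched values: rows 1 and 2 stand in for a total and a subtotal
toy_values <- data.frame(
    Group = "TOY01",
    Index = 1:4,
    Geography = "Example County",
    Households = c(1500, 1000, 990, 10)
)

# Invented glossary that only describes the detailed rows (3 and 4)
toy_glossary <- data.frame(
    Group = "TOY01",
    Index = c(3L, 4L),
    Tenure = "Owner occupied",
    Plumbing = c("Complete", "Lacking")
)

# Keep every glossary row and attach the matching values (a right join)
joined <- merge(toy_values, toy_glossary,
                by = c("Group", "Index"),
                all.y = TRUE)
joined <- joined[, c("Geography", "Tenure", "Plumbing", "Households")]
```

Because the glossary has no entries for the total and subtotal rows, the join discards their values and keeps only the rows that carry a tenure and a plumbing status.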

Finally, you can do some tidyverse-only magic to look at the different rates of plumbing for renters versus owners.

#| label: wrangle-rates-of-plumbing

RATES_OF_PLUMBING <- PLUMBING |>
    dplyr::mutate(
        dplyr::across(c("Tenure", "Plumbing"),
                      \(.) stringr::str_extract(., "^\\S+"))
    ) |>
    tidyr::pivot_wider(
        names_from = "Tenure",
        values_from = "Households"
    ) |>
    dplyr::mutate(
        `All Households` = .data$Owner + .data$Renter
    ) |>
    tidyr::pivot_longer(
        cols = c("Renter", "Owner", "All Households"),
        names_to = "Tenure",
        values_to = "Households"
    ) |>
    tidyr::pivot_wider(
        names_from = "Plumbing",
        values_from = "Households"
    ) |>
    dplyr::mutate(
        Households = .data$Complete + .data$Lacking,
        Rate = .data$Complete / .data$Households
    ) |>
    dplyr::select(
        "Geography",
        "Tenure",
        "Rate"
    ) |>
    tidyr::pivot_wider(
        names_from = "Tenure",
        values_from = "Rate"
    )
#| label: show-rates-of-plumbing
#| echo: false

RATES_OF_PLUMBING |>
    dplyr::mutate(
        dplyr::across(!"Geography",
                      \(.) scales::label_percent(accuracy = 0.01)(.))
    ) |>
    knitr::kable(
        align = "lrrr"
    )
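The rate in each cell is the number of households with complete plumbing divided by all households in that tenure category. Here is a worked base-R sketch of that arithmetic. The counts are invented for illustration and are not ACS estimates.

```r
# Invented counts for one hypothetical geography; not real ACS estimates
toy <- data.frame(
    Tenure     = c("Owner", "Owner", "Renter", "Renter"),
    Plumbing   = c("Complete", "Lacking", "Complete", "Lacking"),
    Households = c(990, 10, 480, 20)
)

is_complete <- toy$Plumbing == "Complete"

# Households with complete plumbing, and all households, by tenure
complete <- tapply(toy$Households[is_complete], toy$Tenure[is_complete], sum)
total <- tapply(toy$Households, toy$Tenure, sum)

# Rate of complete plumbing for each tenure category
rates <- complete / total[names(complete)]

# Rate for all households combined
all_rate <- sum(complete) / sum(total)
```

With these made-up counts, owners come out at 990 / 1000 = 0.99, renters at 480 / 500 = 0.96, and all households at 1470 / 1500 = 0.98.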

Three things might jump out at you. First, almost every household in the US has complete plumbing facilities. Second, plumbing access differs slightly between renters and owners. Third, the direction of that difference is not the same in Racine as at larger geographic levels. At the scale of the nation, region, and state, renters are a little less likely to have complete plumbing facilities. In contrast, renters in Racine County and the Racine Unified School District are a little more likely to have them.



higherX4Racine/hercacstables documentation built on Jan. 15, 2025, 9:58 p.m.