In higherX4Racine/hercacstables: Work with American Community Survey Tables through the US Census API

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

commanism <- function(.) {
    scales::label_comma(accuracy = 1)(.)
}

set.seed(42)

hercacstables

This package provides an R interface to the United States Census Bureau's data API. The primary focus, as per its name, is pulling information from the detailed tables of the American Community Survey.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("higherX4Racine/hercacstables")

Fetching data from the API

The main way that one uses hercacstables to interact with the Census API is the fetch_data() function.

A vanilla call to `fetch_data()`

Here is a modestly complicated use of the function without any setup or post-processing.

#| label: fetch-pops-and-households
#| cache: true

POPS_AND_HOUSEHOLDS <- hercacstables::fetch_data(
    # the API works one year at a time
    year = hercacstables::most_recent_vintage("acs", "acs1"),

    # the API can only query one data source at a time
    survey_type = "acs",           # the American Community Survey
    table_or_survey_code = "acs1", # the 1-year dataset of the ACS

    # the API fetches values for one or more instances of a specific geography
    for_geo = "state",             # fetch values for entire states
    for_items = c(
        "11",                      # the District of Columbia
        "72"                       # Puerto Rico
    ),

    # the codes for the specific data that we are fetching
    variables = c(
        "NAME",                    # the geographic area's name
        "B01001_001E",             # the total number of people
        "B11002_002E",             # people in family households
        "B11002_012E"              # people in nonfamily households
    )
)

#| label: show-pops-and-households
#| echo: false
POPS_AND_HOUSEHOLDS |>
    dplyr::mutate(
        Value = commanism(.data$Value)
    ) |>
    knitr::kable(
        align = "lrlrrr"
    )

Best practice

The previous example is good enough for a README file. An approach centered on reusability is better for actual practice.

Identifying which items to fetch

People working with the Census often know the topic that they are interested in, but not the specific variables that provide information about that topic. The hercacstables package has utilities that help you search for Census variables if you don't already know which ones you want to use.

Search for groups with keywords

A good first step is to search for tables that could be relevant with the search_in_columns() function and the built-in METADATA_FOR_ACS_GROUPS.

#| label: search-in-glossary
EDUCATION_TABLES <- hercacstables::search_in_columns(
    hercacstables::METADATA_FOR_ACS_GROUPS,
    Group = "\\d$",          # "Group" values need to end in a digit.
    Universe = "population", # "Universe" values need to include "population".
    Description = "enroll",  # "Description values need to include "enroll"
    `-Description` = c("sex",        # but not "sex", that's too much detail
                       "computer",   # or "computer", also too detailed
                       "quarters",   # or "quarters", also too detailed
                       "insur",      # or "insur", not health care enrollments
                       "allocation") # or "allocation", these are metadata
)

#| label: show-glossary_search-results
#| echo: false
knitr::kable(EDUCATION_TABLES)

Unpack variables for a group

Once you know the group that you are interested in, you will want to see what information is captured in each of its rows. The unpack_group_details() function searches the built-in METADATA_FOR_ACS_VARIABLES for all of the variables of one group. It then expands the Census's description of each variable into columns. Users will need to identify the concepts that are captured by each column.

The following example unpacks the variables in the B14001 table.

#| label: unpack-B14001
UNPACKED_B14001 <- hercacstables::unpack_group_details("B14001")

#| label: show-B14001
#| echo: false
UNPACKED_B14001 |>
    dplyr::filter(.data$Dataset == "ACS1") |>
    dplyr::select(!"Dataset") |>
    knitr::kable()

Shortcuts

The package also includes functions for common special cases. They are wrappers for fetch_data() with some arguments hard-coded.

Decennial populations by race and ethnicity

One example is pulling trends of racial/ethnic populations from the decennial census for some specific geographical units.

#| label: fetch-decennial-pops-by-race
#| cache: true
POPS_BY_RACE <-
    hercacstables::fetch_decennial_pops_by_race(
        for_geo = "state", # one cannot fetch the whole nation from 2000 or 2010
        for_items = "*"    # so we pull the data for every state
    ) |>
    dplyr::count( # then compute the nationwide populations for each vintage
        .data$`Race/Ethnicity`,
        .data$Vintage,
        wt = .data$Population,
        name = "Population"
    ) |>
    tidyr::pivot_wider( # and reshape the table for display
        names_from = "Vintage",
        values_from = "Population"
    )

#| label: show-decennial-pops-by-race
#| echo: false
POPS_BY_RACE |>
    dplyr::mutate(
        dplyr::across(tidyselect::ends_with("0"),
                      commanism)
    ) |>
    knitr::kable(
        align = "lrrr"
    )