In higherX4Racine/hercacstables: Work with American Community Survey Tables through the US Census API

#| label: setup
#| include: FALSE

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = FALSE
)

Preface

The American Community Survey API returns data in a very terse format. Each response contains four parts: geographic information, a table code, a row number, and a value. The geographic information varies by the level of detail one asks for. The value is a number that may be a population size, number of households, an income in dollars, a percentage, or several other quantities. The table code and row number are what determine the exact meaning of the value.

Generally speaking, all of the values in a table will be reporting the same kind of information. Again, the information might be counts of people or households, incomes or costs in dollars, hours commuting or working, or a percentage of just about anything. The row numbers then let you know what group of people the information is for. The first row always has a value for the entire population of a geographic area. Subsequent rows may have information about a very specific subgroup, or may contain a summary value for a combination of subgroups. For example, in a sex-and-age table, you might find the value for all men and boys (males) in row 2, then the value for boys under 5 in row 3. In short, a lot of information is packed into the two fields of table name and row number.

While packing information is efficient for storing and transmitting data, it means that users of Census information must unpack each value before its meaning is clear. The hercacstables package has many tools and glossary tables that should make it easier and more convenient to unpack Census data.
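
As a rough illustration of how much is packed into those two fields, the sketch below builds variable names of the kind the Census API uses for detailed tables. It assumes the usual pattern of a table code, an underscore, a zero-padded three-digit row number, and a trailing "E" for an estimate. The helper `make_variable_name()` is hypothetical and written in base R; it is not part of hercacstables.

```r
# Hypothetical helper, not part of hercacstables: build a Census API
# variable name from a table code and a row number.
# Assumes the pattern <table code>_<3-digit row><E for estimate>.
make_variable_name <- function(table_code, row_number, suffix = "E") {
    sprintf("%s_%03d%s", table_code, as.integer(row_number), suffix)
}

# Row 1 is the total; rows 2 and 3 are the sex-and-age example from above.
sex_by_age <- make_variable_name("B01001", 1:3)
print(sex_by_age)
```

Everything a reader needs to interpret a value (what is being counted, and for whom) has to be looked up from those few characters.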

Racial and ethnic categories in the American Community Survey

This vignette describes some features of hercacstables that help with the common and repetitive chores of unpacking racial and ethnic data from Census API responses. The Census reports racial and ethnic identity in several ways, and those ways do not always agree or line up. That makes sense because the concepts of race and ethnicity are slippery, changing across place and time.

Broad strokes by the Census

The first racial/ethnic system that we will discuss involves ten categories. There are seven categories of race, one of ethnicity, one that combines both, and an "All" category: "Total", "White alone", "Black or African American alone", "American Indian and Alaska Native alone", "Asian alone", "Native Hawaiian and Other Pacific Islander alone", "Some Other Race alone", "Two or More Races", "White alone, not Hispanic or Latino", and "Hispanic or Latino". This is a very coarse and unsophisticated way of characterizing identities. Nevertheless, it occurs throughout the American Community Survey.

Subtable categories of race and ethnicity

#| label: find-subtables-with-ethnicity
#| echo: false

broad_subtables <- hercacstables::METADATA_FOR_ACS_GROUPS |>
    dplyr::filter(stringr::str_detect(.data$Group, "I$")) |>
    dplyr::select("Group", "Description") |>
    dplyr::mutate(
        Group = stringr::str_remove(.data$Group, "I$"),
        Description = .data$Description |>
            stringr::str_remove("\\(HISPANIC OR LATINO[^)]*\\)") |>
            stringr::str_squish()
    )

There are `r nrow(broad_subtables)` separate sets of tables that subdivide their information according to this ten-category scheme. Each identity is designated by a suffix at the end of a table's name. In other words, all of the values from a table with one of these suffixes in its name will pertain to people of just one racial/ethnic category.

Since the case is so common, hercacstables describes it with the glossary table hercacstables::RACE_ETHNICITY_SUBTABLES. It maps from the suffix in a table's name to the racial/ethnic category the table describes.

#| label: show-race-ethnicity-subtable-glossary
#| echo: false

hercacstables::RACE_ETHNICITY_SUBTABLES |>
    dplyr::select(
        "Census Race",
        "Suffix"
    ) |>
    knitr::kable()
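
To make the suffix convention concrete, here is a minimal base-R sketch that splits a group code into its base table and its racial/ethnic suffix. The lookup vector `suffix_to_category` is written out by hand from the category names listed earlier, assuming the suffixes run from A to I in that order; in practice RACE_ETHNICITY_SUBTABLES should be treated as the source of truth. The function `split_group_code()` is hypothetical.

```r
# Hand-written lookup, assumed to mirror RACE_ETHNICITY_SUBTABLES.
suffix_to_category <- c(
    A = "White alone",
    B = "Black or African American alone",
    C = "American Indian and Alaska Native alone",
    D = "Asian alone",
    E = "Native Hawaiian and Other Pacific Islander alone",
    F = "Some Other Race alone",
    G = "Two or More Races",
    H = "White alone, not Hispanic or Latino",
    I = "Hispanic or Latino"
)

# Hypothetical helper: separate a group code such as "B01001I" into the
# base table ("B01001") and the category named by its suffix.
# Codes without an A-I suffix are treated as covering the total population.
split_group_code <- function(group) {
    has_suffix <- grepl("[A-I]$", group)
    suffix <- ifelse(has_suffix, substring(group, nchar(group)), "")
    data.frame(
        Group = group,
        Table = ifelse(has_suffix,
                       substring(group, 1, nchar(group) - 1),
                       group),
        Suffix = suffix,
        Category = ifelse(has_suffix,
                          unname(suffix_to_category[suffix]),
                          "Total"),
        stringsAsFactors = FALSE
    )
}

split_codes <- split_group_code(c("B01001", "B01001B", "B11001I"))
print(split_codes)
```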

Tables with "race" in their description

#| label: find-race-in-table-descriptions
#| echo: false

tables_that_name_race <- hercacstables::METADATA_FOR_ACS_GROUPS |>
    dplyr::filter(
        stringr::str_detect(.data$Group, "\\d$"),
        stringr::str_detect(.data$Description,
                            stringr::regex("RACE", ignore_case = TRUE))
    ) |>
    dplyr::select(
        "Group", "Description"
    ) |>
    dplyr::left_join(
        dplyr::count(hercacstables::METADATA_FOR_ACS_VARIABLES,
                     .data$Group,
                     name = "rows"),
        by = "Group"
    )

There are also `r nrow(tables_that_name_race)` tables that have the word "RACE" in their description.

#| label: show-the-tables-that-name-race
#| echo: false

knitr::kable(tables_that_name_race)

They seem to fall into four categories. There are two tables with 10 rows each, B02001 and B25006. These are probably convenience tables that pull data from other race-specific subtables. Six tables, B02008 through B02013, have only one row each. These seem to report inclusive counts, combining people who claim a racial identity either as their sole identity or in combination with another. One table, B03002, has 21 rows. This table appears to report detailed information about Hispanic ethnicity and specific racial identities. Finally, tables B98013 and B99021 have descriptions that suggest that they report methodological details. Each of these categories of table deserves a little more discussion.

Convenience tables

Tables B02001 and B25006 deal with counts of individuals and households, respectively. They seem to be redundant, repeating the first rows of the race-specific subtables of B01001 and B11001, respectively.

We can use hercacstables to check that.

Map meaning to Census Variables

The first step is to make a table that connects the Census's opaque variable names to the real-world meanings that we are interested in. For example, tables B01001* and B02001 deal with counts of people, while tables B11001* and B25006 deal with counts of households. Similarly, the racial identity being counted is defined either by the subtable suffix or the row number. Finally, we are interested in whether the data come from a subtable or a convenience table. We can lay that all out in a way that "maps" from the Census variable to the real-world meaning.

#| label: check-variables-for-convenience-tables

convenience_check_variables <- tibble::tribble(
    ~ Table,  ~ Suffix,   ~ Index, ~ Population, ~ Race,
    "B01001", "A",              1, "People",     "White",
    "B01001", "B",              1, "People",     "Black",
    "B02001", "",               2, "People",     "White",
    "B02001", "",               3, "People",     "Black",
    "B11001", "A",              1, "Households", "White",
    "B11001", "B",              1, "Households", "Black",
    "B25006", "",               2, "Households", "White",
    "B25006", "",               3, "Households", "Black"
) |>
    dplyr::mutate(
        Group = paste0(.data$Table, .data$Suffix),
        Variable = hercacstables::build_api_variable(group_code = .data$Table,
                                                     race_code = .data$Suffix,
                                                     item_number = .data$Index),
        `Table Type` = dplyr::if_else(nchar(.data$Suffix) > 0,
                                      "Subtable",
                                      "Convenience")
    )

#| label: show-the-convenience-table
#| echo: false
knitr::kable(convenience_check_variables)

Fetch the raw data

With our variables defined, we can fetch the data from the Census API. Notice that, since this is an API call, the code block is set to cache its results. API calls are much slower than local functions, so it is usually a good idea to isolate them and run them as few times as possible.

#| label: check-values-for-convenience-tables
#| cache: true

LATEST_YEAR <- hercacstables::most_recent_vintage("acs", "acs1")

convenience_check_values_raw <- hercacstables::fetch_data(
    convenience_check_variables$Variable,
    year = LATEST_YEAR,           # the most recent one available at the time of writing
    for_geo = "us",               # the entire nation
    for_items = "*",              # all nation-level geographies
    survey_type = "acs",          # as opposed to, e.g. "dec," for Decennial survey data
    table_or_survey_code = "acs5" # the specific survey is the 5-year ACS.
)
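
One way to keep slow API calls to a minimum, independent of chunk caching, is to save each response to disk and reuse it. The sketch below is a generic base-R pattern rather than anything provided by hercacstables; `cached_call()` and the stand-in `slow_fetch()` are hypothetical names, and the placeholder data frame does not represent a real API response.

```r
# Hypothetical helper: run `fetcher()` once, save the result to `path`,
# and read it back from disk on every later call.
cached_call <- function(path, fetcher) {
    if (file.exists(path)) {
        return(readRDS(path))
    }
    result <- fetcher()
    saveRDS(result, path)
    result
}

# Stand-in for a slow API request; it counts how often it really runs.
n_real_calls <- 0
slow_fetch <- function() {
    n_real_calls <<- n_real_calls + 1
    data.frame(Group = "B02001", Index = 2L)  # placeholder columns only
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_call(cache_file, slow_fetch)  # runs the fetcher
second <- cached_call(cache_file, slow_fetch)  # served from disk
```

In a real analysis the `fetcher` argument would wrap the actual request, and the cache file would live somewhere more permanent than a temporary directory.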

Examine values in context

Now that we have the raw data, we can check to see if the values from the convenience tables do, in fact, match up with the values from the subtables.

#| label: wrangle-convenience-check-data

convenience_check_values <- convenience_check_values_raw |>
    dplyr::inner_join(
        convenience_check_variables,
        by = c("Group", "Index")
    ) |>
    dplyr::select(
        "Population",
        "Race",
        "Table Type",
        "Value"
    ) |>
    tidyr::pivot_wider(
        names_from = "Table Type",
        values_from = "Value"
    ) |>
    dplyr::mutate(
        Identical = dplyr::if_else(.data$Subtable == .data$Convenience,
                                   "Yes",
                                   "No")
    )

convenience_check_values |>
    dplyr::mutate(
        dplyr::across(tidyselect::all_of(c("Subtable", "Convenience")),
                      scales::label_comma(accuracy = 1))
    ) |>
    knitr::kable(
        align = "llrrl"
    )

We don't have to include any of the confusing "Group," "Index," or "Variable" columns in our final result.

Inclusive identity counts

Six tables, B02008 through B02013, show the numbers of people who identified with specific races, whether alone or in combination with other races. The totals from these tables will add up to more than the US population because someone who identified with more than one category will be counted in each corresponding table.
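
A toy example may make the double counting clearer. The five respondents below are invented purely for illustration; each one lists every race they identify with. Counting a respondent once per listed race gives the inclusive tally, while counting only single-race respondents gives the exclusive ("alone") tally.

```r
# Invented respondents, for illustration only: each element lists the
# races one person identifies with.
respondents <- list(
    c("White"),
    c("White", "Asian"),
    c("Black"),
    c("Black", "White"),
    c("Asian")
)

# Inclusive: a person counts toward every race they list.
inclusive <- table(unlist(respondents))

# Exclusive ("alone"): only people who list exactly one race count.
single_race <- respondents[lengths(respondents) == 1]
exclusive <- table(factor(unlist(single_race), levels = names(inclusive)))

counts <- data.frame(
    Race = names(inclusive),
    Inclusive = as.integer(inclusive),
    Exclusive = as.integer(exclusive)
)
counts$Difference <- counts$Inclusive - counts$Exclusive

print(counts)
sum(counts$Inclusive)  # 7 tallies from only 5 people
```

The inclusive count for each race is never smaller than the exclusive one, and the gap between them is the number of multiracial respondents who listed that race.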

These tables correspond to six of the ten broad categories of race/ethnicity, so they are already recorded in RACE_ETHNICITY_SUBTABLES, in its "Inclusive Group" column. That column was left out of the earlier display to keep it simple.

#| label: show-inclusive-columns-too
#| echo: false

hercacstables::RACE_ETHNICITY_SUBTABLES |>
    dplyr::select(
        "Census Race",
        "Suffix",
        "Inclusive Group"
    ) |>
    knitr::kable()

The population reported in the inclusive tables should be at least as large as the population reported in the exclusive tables. We can use hercacstables to check this, too.

Census variables for inclusive and exclusive counts

#| label: inclusive-exclusive-population-variables

incl_excl_pop_variables <- hercacstables::RACE_ETHNICITY_SUBTABLES |>
    dplyr::filter(
        nchar(.data$`Inclusive Group`) > 0
    ) |>
    dplyr::mutate(
        `Exclusive Group` = paste0("B01001", .data$Suffix)
    ) |>
    dplyr::select(
        "Census Race",
        Inclusive = "Inclusive Group",
        Exclusive = "Exclusive Group"
    ) |>
    tidyr::pivot_longer(
        cols = tidyselect::ends_with("clusive"),
        names_to = "Type of count",
        values_to = "Group"
    ) |>
    dplyr::mutate(
        Variable = hercacstables::build_api_variable(.data$Group, 1)
    )

knitr::kable(incl_excl_pop_variables)

Raw inclusive and exclusive counts

#| label: fetch-inclusive-exclusive-populations
#| cache: true

incl_excl_pop_values_raw <- hercacstables::fetch_data(
    variables = incl_excl_pop_variables$Variable,
    year = 2022,
    for_geo = "us",
    for_items = "*",
    survey_type = "acs",
    table_or_survey_code = "acs5"
)

Comparing inclusive and exclusive counts

#| label: wrangle-inclusive-exclusive-populations

incl_excl_pop_values <- incl_excl_pop_values_raw |>
    dplyr::inner_join(
        incl_excl_pop_variables,
        by = c("Group")
    ) |>
    dplyr::select(
        "Census Race",
        "Type of count",
        "Value"
    ) |>
    tidyr::pivot_wider(
        names_from = "Type of count",
        values_from = "Value"
    ) |>
    dplyr::mutate(
        Difference = .data$Inclusive - .data$Exclusive,
        `Percent Multiracial` = .data$Difference / .data$Inclusive
    ) |>
    dplyr::arrange(
        dplyr::desc(.data$Inclusive)
    )

incl_excl_pop_values |>
    dplyr::mutate(
        dplyr::across(tidyselect::all_of(c("Inclusive",
                                           "Exclusive",
                                           "Difference")),
                      scales::label_comma(accuracy = 1)),
        `Percent Multiracial` = scales::label_percent(accuracy = 1)(
            .data$`Percent Multiracial`
        )
    ) |>
    knitr::kable(
        align = "lrrrr"
    )

Hispanic ethnicity and broad racial identity

Table B03002 contains counts of people by Hispanic ethnicity for each of the broad racial categories. As always, the first row is the total population size. The remaining rows fall into two groups: rows 2-11 count people who are not Hispanic, and rows 12-21 count people who identify as Hispanic. Rows 2 and 12 are the total populations of non-Hispanic and Hispanic people, respectively. Rows 10, 11, 20, and 21 contain subgroupings of the "Two or More Races" counts in rows 9 and 19.
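
The row layout just described can be written down as a small lookup. This base-R sketch labels each of the 21 rows by ethnicity using only the row ranges given above; `b03002_rows` is a hypothetical name, and the racial category of each row is deliberately left out because RACE_ETHNICITY_SUBTABLES already records it.

```r
# Row ranges taken from the description above:
# row 1 = everyone, rows 2-11 = not Hispanic, rows 12-21 = Hispanic.
b03002_rows <- data.frame(Index = 1:21)
b03002_rows$Ethnicity <- cut(
    b03002_rows$Index,
    breaks = c(0, 1, 11, 21),
    labels = c("Total", "non-Hispanic", "Hispanic")
)

# Within each ethnicity block the rows repeat in the same order, so a
# Hispanic row sits exactly 10 rows below its non-Hispanic counterpart.
b03002_rows$Counterpart <- ifelse(
    b03002_rows$Ethnicity == "non-Hispanic", b03002_rows$Index + 10L,
    ifelse(b03002_rows$Ethnicity == "Hispanic", b03002_rows$Index - 10L, NA)
)

table(b03002_rows$Ethnicity)
```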

That means that we can take rows one through eight of hercacstables::RACE_ETHNICITY_SUBTABLES and map them onto rows in B03002. In fact, this is so useful that it is also included in hercacstables::RACE_ETHNICITY_SUBTABLES.

#| label: hispanic-and-broad-race-variables
#| echo: false

hercacstables::RACE_ETHNICITY_SUBTABLES |>
    dplyr::filter(
        nchar(.data$`non-Hispanic`) > 0
    ) |>
    dplyr::select(
        "Census Race",
        "non-Hispanic",
        "Hispanic"
    ) |>
    knitr::kable()

Let's use these to look at nationwide trends across racial identities in their percentages of Hispanic ethnicity.

First, we will define our glossary table.

#| label: define-hispanic-and-broad-race-variable

hispanic_and_broad_race_variables <- hercacstables::RACE_ETHNICITY_SUBTABLES |>
    dplyr::filter(
        nchar(.data$`non-Hispanic`) > 0
    ) |>
    dplyr::select(
        "Census Race",
        "non-Hispanic",
        "Hispanic"
    ) |>
    tidyr::pivot_longer(
        cols = tidyselect::ends_with("Hispanic"),
        names_to = "Ethnicity",
        values_to = "Variable"
    ) |>
    tidyr::separate_wider_position(
        cols = "Variable",
        widths = c(Group = 6, 1,
                   Index = 3, 1),
        cols_remove = FALSE
    ) |>
    dplyr::mutate(
        Index = as.integer(.data$Index)
    )

knitr::kable(hispanic_and_broad_race_variables)
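
The fixed-width split above works because every variable in B03002 has the same shape: a six-character group, an underscore, a three-digit row number, and a one-letter suffix. A base-R equivalent, shown here only as a sketch with the hypothetical helper `parse_variable_name()`, makes those positions explicit. It would need adjusting for groups with a racial/ethnic suffix, such as B01001A, whose codes are seven characters long.

```r
# Hypothetical helper: split names like "B03002_003E" by position.
# Assumes a 6-character group, "_", a 3-digit row, and a 1-letter suffix.
parse_variable_name <- function(variable) {
    stopifnot(all(nchar(variable) == 11))
    data.frame(
        Variable = variable,
        Group = substr(variable, 1, 6),
        Index = as.integer(substr(variable, 8, 10)),
        stringsAsFactors = FALSE
    )
}

parsed <- parse_variable_name(c("B03002_003E", "B03002_013E"))
print(parsed)
```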

Next, we pull `r LATEST_YEAR - 2005` years of 1-year ACS data.

#| label: fetch-hispanic-and-broad-race-data

hispanic_and_broad_race_raw <- c(2005:2019, 2021:LATEST_YEAR) |>
    purrr::map(
        ~ hercacstables::fetch_data(
            hispanic_and_broad_race_variables$Variable,
            year = .,
            for_geo = "us",
            for_items = "*",
            survey_type = "acs",
            table_or_survey_code = "acs1"
        )
    )
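
The gap in the year sequence is deliberate: the Census Bureau did not release standard 1-year ACS estimates for 2020. A slightly more defensive way to build the same sequence, sketched here with a hypothetical `acs1_years()` helper, is to drop 2020 from a single range, which also behaves sensibly if the latest vintage is earlier than 2021.

```r
# Hypothetical helper: every year from `first` to `latest`, except 2020,
# for which no standard 1-year ACS estimates were published.
acs1_years <- function(latest, first = 2005L) {
    setdiff(seq.int(first, latest), 2020L)
}

years <- acs1_years(2023L)  # 2023 is only an example vintage
length(years)               # 18, i.e. 2023 - 2005
```

Because exactly one year is skipped, the number of years fetched equals the latest vintage minus 2005 whenever the latest vintage is 2021 or later.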

Then, we put the data into a nice, tidy format.

#| label: tidy-hispanic-and-broad-race-data

hispanic_and_broad_race <- hispanic_and_broad_race_raw |>
    purrr::list_rbind() |>
    dplyr::inner_join(
        hispanic_and_broad_race_variables,
        by = "Index"
    ) |>
    dplyr::select(
        "Census Race",
        "Ethnicity",
        "Year",
        "Value"
    ) |>
    tidyr::pivot_wider(
        names_from = "Ethnicity",
        values_from = "Value",
        values_fill = 0
    ) |>
    dplyr::mutate(
        Total = .data$Hispanic + .data$`non-Hispanic`,
        `Percent Hispanic` = .data$Hispanic / .data$Total
    )

hispanic_and_broad_race |>
    dplyr::slice_sample(
        n = 1,
        by = "Year"
    ) |>
    dplyr::mutate(
        dplyr::across(c("Hispanic", "non-Hispanic", "Total"),
                      scales::label_comma(accuracy = 1)),
        dplyr::across("Percent Hispanic",
                      scales::label_percent(accuracy = 1))
    ) |>
    knitr::kable(
        align = "lrrrrr"
    )

Finally, we plot it.

#| label: plot-changes-in-hispanic-percentage
#| fig-dim: !expr "c(8, 8)"
#| dpi: 72

hispanic_and_broad_race |>
    dplyr::mutate(
        `Census Race` = stringr::str_to_title(.data$`Census Race`)
    ) |>
    ggplot2::ggplot(
        ggplot2::aes(x = .data$Year,
                     y = .data$`Percent Hispanic`,
                     color = .data$`Census Race`)
    ) +
    ggplot2::geom_line(
        linewidth = 1,
        lineend = "round",
        linejoin = "mitre"
    ) +
    ggplot2::geom_point(
        size = 3
    ) +
    ggplot2::scale_x_continuous(
        name = NULL,
        labels = scales::label_number(big.mark = ""),
        limits = c(2005, 2025),
        breaks = scales::breaks_width(5),
        minor_breaks = scales::breaks_width(1)
    ) +
    ggplot2::scale_y_continuous(
        name = "Percentage Hispanic",
        labels = scales::label_percent(accuracy = 1),
        limits = c(0, 1),
        breaks = scales::breaks_width(0.2),
        minor_breaks = scales::breaks_width(0.05)
    ) +
    ggplot2::scale_color_discrete(
        guide = ggplot2::guide_legend(
            title = NULL,
            position = "top",
            nrow = 4
        )
    ) +
    ggplot2::theme_minimal()

Methodological detail tables

Tables B98013 and B99021 give information about the Census's data collection methods. Statisticians can use them to describe how much uncertainty there is in the data concerning racial identities. We can skip those.

Appendix

Subtables of race and ethnicity

#| label: show-broad-subtables
#| echo: false

knitr::kable(broad_subtables)

OMB Minimum Reporting Categories

| Minimum Race/Ethnicity Reporting Category | Definition |
|-------------------------------------------|------------|
| American Indian or Alaska Native | Individuals with origins in any of the original peoples of North, Central, and South America, including, for example, Navajo Nation, Blackfeet Tribe of the Blackfeet Indian Reservation of Montana, Native Village of Barrow Inupiat Traditional Government, Nome Eskimo Community, Aztec, and Maya. |
| Asian | Individuals with origins in any of the original peoples of Central or East Asia, Southeast Asia, or South Asia, including, for example, Chinese, Asian Indian, Filipino, Vietnamese, Korean, and Japanese. |
| Black or African American | Individuals with origins in any of the Black racial groups of Africa, including, for example, African American, Jamaican, Haitian, Nigerian, Ethiopian, and Somali. |
| Hispanic or Latino | Includes individuals of Mexican, Puerto Rican, Salvadoran, Cuban, Dominican, Guatemalan, and other Central or South American or Spanish culture or origin. |
| Middle Eastern or North African | Individuals with origins in any of the original peoples of the Middle East or North Africa, including, for example, Lebanese, Iranian, Egyptian, Syrian, Iraqi, and Israeli. |
| Multiracial and/or Multiethnic | Those who identify with multiple race/ethnicity minimum reporting categories. |
| Native Hawaiian or Pacific Islander | Individuals with origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands, including, for example, Native Hawaiian, Samoan, Chamorro, Tongan, Fijian, and Marshallese. |
| White | Individuals with origins in any of the original peoples of Europe, including, for example, English, German, Irish, Italian, Polish, and Scottish. |

higherX4Racine/hercacstables documentation built on Jan. 15, 2025, 9:58 p.m.
