In ColoradoDemography/tidycensus: Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(tidycensus)
census_api_key("YOUR_CENSUS_API_KEY")  # substitute your own Census API key
library(tidyverse)
options(tibble.print.min = 6)

Unlike the decennial Census datasets available through tidycensus, datasets from the five-year American Community Survey (ACS) report each estimate with an associated margin of error, because the ACS is based on an annual sample of around 3 million households. As explained by the Census Bureau:

While the main function of the decennial census is to provide counts of people for the purpose of congressional apportionment and legislative redistricting, the primary purpose of the ACS is to measure the changing social and economic characteristics of the U.S. population. As a result, the ACS does not provide official counts of the population in between censuses. [^1]

As a result, ACS data do not represent precise counts of population subgroups; they are designed to give a general sense of how socioeconomic indicators vary across the country. In many cases, ACS margins of error can be quite large, at times exceeding the estimate itself. Suppose we want to study aging populations in Ramsey County, Minnesota, by Census tract, using the 2011-2015 ACS. We can request the data with get_acs:

library(tidycensus)
library(tidyverse)

# We need to construct a vector of variables for males and females age 65 and up
# purrr can help us with this (I may build this in to tidycensus in the future)

vars <- map_chr(c(20:25, 44:49), function(x) paste0("B01001_0", x))

ramsey <- get_acs(geography = "tract", variables = vars, state = "MN", county = "Ramsey")

head(ramsey %>% select(-NAME))

We can see the issue already: for Census Tract 301 in Ramsey County, the ACS estimates 16 males aged 75 to 79, with a margin of error exceeding the estimate. One way to address this is through data aggregation. While the estimate for males aged 75 to 79 in this Census tract may be unreliable, the estimate for the tract's total population over age 65 is likely better. That total is not available directly from the ACS API, so we aggregate the tract-level rows ourselves.
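Before aggregating, it can help to quantify how unreliable individual rows are. The sketch below is a rough screen, not part of tidycensus. It assumes the published ACS margins of error are at the 90 percent confidence level, so dividing by 1.645 approximates the standard error. The `toy` data frame and its values are invented stand-ins for get_acs output.

```r
# Toy rows standing in for get_acs() output; the numbers are invented for illustration.
toy <- data.frame(
  variable = c("B01001_020", "B01001_023", "B01001_025"),
  estimate = c(120, 16, 48),
  moe      = c(45, 20, 30)
)

# ACS margins of error are published at the 90 percent confidence level,
# so dividing by 1.645 gives an approximate standard error.
toy$se <- toy$moe / 1.645

# Coefficient of variation: standard error relative to the estimate.
toy$cv <- toy$se / toy$estimate

# Flag rows where the margin of error is larger than the estimate itself.
toy$moe_exceeds_estimate <- toy$moe > toy$estimate

toy
```

Rows with a large coefficient of variation, or with a margin of error above the estimate, are the ones most likely to benefit from aggregation.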

The US Census Bureau publishes guidelines on how to calculate margins of error for derived estimates (see the footnotes). These formulas are implemented in tidycensus in the moe_sum, moe_prop, moe_ratio, and moe_product functions. The example below illustrates the use of moe_sum to calculate the margin of error around a derived estimate for Census tract population over age 65.

ramsey65 <- ramsey %>%
  group_by(GEOID) %>%
  summarize(estimate = sum(estimate), 
            moe = moe_sum(moe))

head(ramsey65)
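For intuition about the moe_sum step, the Census Bureau's approximation for the margin of error of a sum is the square root of the sum of the squared component margins of error. The base-R sketch below applies that formula directly. `moe_sum_sketch` and the `cell_moe` values are hypothetical names and numbers chosen for illustration. The result should agree with moe_sum(moe) when only margins of error are supplied, though the package function may handle special cases differently.

```r
# Hypothetical margins of error for the twelve age/sex cells in one tract (invented values).
cell_moe <- c(12, 20, 15, 9, 11, 14, 18, 16, 13, 10, 17, 8)

# Approximate MOE of a sum: square root of the sum of squared component MOEs.
# `moe_sum_sketch` is an illustrative helper, not a tidycensus function.
moe_sum_sketch <- function(moe) {
  sqrt(sum(moe^2))
}

derived_moe <- moe_sum_sketch(cell_moe)

# Compare with naively adding the MOEs, which overstates the uncertainty.
c(derived = derived_moe, naive_sum = sum(cell_moe))
```

Because the component margins of error combine in quadrature rather than linearly, the derived margin of error grows more slowly than the summed estimate, which is why the aggregate is relatively more precise.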

The margins of error for this aggregate population are more reasonable. However, the Census Bureau does issue this warning:

These methods do not consider the correlation or covariance between the basic estimates. They may be overestimates or underestimates of the derived estimate’s standard error, depending on whether the two basic estimates are highly correlated in either the positive or negative direction. As a result, the approximated standard error may not match direct calculations of standard errors or calculations obtained through other methods. [^2]

Dealing with margins of error in the ACS is a complex yet important topic. This is especially true for spatial data, where aggregating areal units is one way to improve the reliability of estimates. I recommend the following papers for further reading:

Spielman, S.E., and Folch, D.C. (2015). Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization. PLOS ONE. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115626; Python implementation at https://github.com/geoss/ACS_Regionalization

Spielman, S.E., and Singleton, A. (2015). Studying Neighborhoods Using Uncertain Data from the American Community Survey: A Contextual Approach. Annals of the Association of American Geographers. http://www.tandfonline.com/doi/full/10.1080/00045608.2015.1052335; R implementation at https://github.com/geoss/acs_demographic_clusters.

Wong, D.W., and Sun, M. (2013). Handling Data Quality Information of Survey Data in GIS: A Case of Using the American Community Survey Data. Spatial Demography. http://spatialdemography.org/wp-content/uploads/2013/04/2.-Wong-and-Sun.pdf.

[^1]: United States Census Bureau (2008). A Compass for Understanding and Using American Community Survey Data. https://www.census.gov/content/dam/Census/library/publications/2008/acs/ACSGeneralHandbook.pdf.
[^2]: United States Census Bureau (2016). Instructions for Applying Statistical Testing to the 2011-2015 ACS 5-Year Data. https://www2.census.gov/programs-surveys/acs/tech_docs/statistical_testing/2015StatisticalTesting5year.pdf.

ColoradoDemography/tidycensus documentation built on May 18, 2019, 7:04 p.m.
