In hrecht/censusapi: Retrieve Data from the Census APIs

NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(message = FALSE, 
                                            warning = FALSE,
                                            purl = NOT_CRAN,
                                            comment = "#>")

library(censusapi)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

How do I learn more about a particular dataset?

Read the online documentation for your survey. Some information is not included in the developer metadata or documentation pages and is only available in PDFs on the Census Bureau website.

How can I see the underlying API call sent to the Census Bureau?

You can see the underlying API call sent to the Census Bureau servers by setting getCensus(show_call = TRUE) when running your code. If your getCensus() call results in an error, it will automatically print this underlying API call in your R console. You can copy and paste this URL in your web browser to view it directly.

What does “There was an error while running your query” mean?

Occasionally you might get the general error message “There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience.” This comes from the Census Bureau and could be caused by any number of problems, including server issues. Try rerunning your API call. If that doesn't work and you are requesting a large amount of data, try reducing the amount that you're requesting. If you're still having trouble, see below for ways to get help.

My `getCensus()` call worked last year but now it gives an error. Why?

The Census Bureau makes frequent changes to the APIs. For annual datasets, like the American Community Survey, variable names and available geographies can change year to year. Options for timeseries datasets sometimes change with new releases. Check the Census Bureau's online documentation for your dataset and use listCensusMetadata() to make sure you're using the current syntax.

Are the Census APIs case sensitive?

Yes. Run listCensusMetadata(type = "variables") on your dataset to see what variables are available. If the variable name you want is uppercase you'll need to write it uppercase in your getCensus() request. Most of the APIs use uppercase, but some use lowercase and some even use sentence case variable names.

How do I know what geographies are available for my dataset? What is a FIPS code?

Run listCensusMetadata(type = "geographies") on your dataset to check which geographies you can use. Each API has its own list of valid geographies and they occasionally change as the Census Bureau makes updates.

Most geographies in the Census APIs are specified using FIPS (Federal Information Processing Standards) codes. For example, Autauga County, Alabama is assigned state code 01 and county code 001. Its combined GEOID, which uniquely identifies counties nationally, is 01001.

See the Census Bureau FIPS reference for valid codes and the geographic glossary for more information. You can also download more geographic identifying information from the Census gazetteer files, including full GEOID, name, and centroid coordinates.

FIPS codes are characters, not numbers. For example, state-level FIPS codes are two characters long. region = state:01 will work but region =``state:1 will usually not.

How do I get data for every state, county, or metro area?

Most Census datasets allow you to use a wildcard — the * symbol — to get data for all of a geography class. For example, you can get data for every state in the American Community Survey by using region = state:*.

state_data <- getCensus(
    name = "acs/acs1", 
    vintage = 2022, 
    vars = c("NAME", "B19013_001E"), 
    region = "state:*")
head(state_data)

Data for some small geographies, like Census tract or block group, need to be nested using the regionin argument. Run listCensusMetadata(type = "geographies") to see those options.

Here's an example getting block group population data within a specific Census tract using the 2020 Decennial Census.

block_group <- getCensus(
    name = "dec/dhc",
    vintage = 2020,
    vars = c("NAME", "P1_001N"),
    region = "block group:*",
    regionin = "state:36+county:027+tract:220300")
block_group

I'm still stuck or got an unexpected result. How can I get help?

Use listCensusMetadata() to make sure you're using the right variable names and/or geography names.
Join the Census Bureau's public Slack channel and ask your question in the R or API rooms. Census Bureau staff and other censusapi users (and the censusapi package developer!) check this Slack regularly. This is the fastest way to get help.
Open a Github issue for bugs or issues that you suspect are caused by this R package.
For questions about a specific survey, you can email the contact listed in the dataset metadata found in listCensusApis().

Why is my data -666666666 or another weird value? What is an annotation?

Some Census datasets, including the American Community Survey, use annotated values. These values use numbers or symbols to indicate that the data is unavailable, has been top coded, has an insufficient sample size, or other noteworthy characteristics. Read more from the Census Bureau on ACS annotation meanings and ACS variable types.

The censusapi package is intended to return the data as-is so that you can receive those unaltered annotations. If you are using data for a small geography like Census tract or block group make sure to check for values like -666666666 or check the annotation columns for non-empty values to exclude as needed.

As an example, we'll get the total number of households (B11012_001E) and median household income with associated annotations and margins of error (B19013 group) for three census tracts in Washington, DC.

The value for one tract is available, one is top coded, and one is unavailable. Notice that income is top coded at \$250,000 — meaning any tract's income that is above that threshold is listed as \$250,001. You can see a value has a special meaning in the "EA" (estimate annotation) and "MA" (margin of error annotation) columns.

annotations_example <- getCensus(
    name = "acs/acs5",
    vintage = 2022, 
    vars = c("B11012_001E", "group(B19013)"), 
    region = "tract:006804,007703,000903",
    regionin = "county:001&state:11")
annotations_example