library(censusapi)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

This package provides basic support for the Census Bureau's new microdata APIs, using the same getCensus() function used for summary data. Getting the data with getCensus() is easy. Using it responsibly takes some homework.

About microdata

Microdata contains individual-level responses: one row per person. It is a vital tool for custom analysis, but with great power comes great responsibility. You must weight the individual-level responses appropriately. You'll often need to work with household relationships, and you'll need to handle responses that aren't in the universe of the question (for example, removing children from an analysis of college graduation rates).
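To make the weighting and universe points concrete, here is a minimal sketch in base R. It uses a tiny made-up data frame: the column names (age, college_grad, weight) and every value are hypothetical and do not correspond to real survey variables.

```r
# Hypothetical person-level records; all values are invented for illustration.
people <- data.frame(
  age          = c(8, 34, 52, 41, 15, 67),
  college_grad = c(0, 1, 0, 1, 0, 0),   # 1 = graduated, 0 = did not
  weight       = c(1500, 900, 2100, 1200, 1600, 1800)
)

# Restrict to the universe of the question: here, assumed to be adults 25 and older.
adults <- people[people$age >= 25, ]

# The unweighted share treats every respondent equally.
unweighted_rate <- mean(adults$college_grad)

# The weighted share lets each respondent stand in for the number of
# people their weight represents.
weighted_rate <- weighted.mean(adults$college_grad, adults$weight)

unweighted_rate
weighted_rate
```

In this toy data the two rates differ noticeably, which is why skipping the weights can give misleading estimates.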

If you're new to working with microdata you'll need to do some reading before diving in. Here are some resources from the Census Bureau:

As for all other endpoints, censusapi retrieves the data so that you can perform your own analysis using your methodology of choice. If you're looking for an interactive microdata analysis tool, try the data.census.gov microdata interactive tool or the IPUMS online data analysis tool.

Once you've learned how to use microdata and gained an understanding of weighting, getting the data using censusapi is simple.

Getting microdata with censusapi

As an example, we'll get data from the 2020 Current Population Survey Voting Supplement. This survey asks people if they voted, how, and when, and includes useful demographic data.

See the available variables:

voting_vars <- listCensusMetadata(
    name = "cps/voting/nov",
    vintage = 2020,
    type = "variables")
head(voting_vars)

From the CPS Voting supplement, get data on method of voting in New York state using PES5 (Vote in person or by mail?) and PESEX (gender), along with the appropriate weighting variable, PWSSWGT. We'll only get data for people with a response of 1 (yes) to PES1 (Did you vote?).

cps_voting <- getCensus(
    name = "cps/voting/nov",
    vintage = 2020,
    vars = c("PES5", "PESEX", "PWSSWGT"),
    region = "state:36",
    PES1 = 1)
head(cps_voting)
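Once the data is in hand, a weighted tabulation is usually the next step. The sketch below shows one way to do that in base R. It runs on a small invented stand-in for cps_voting (toy_voting is a hypothetical name, and its rows and weights are made up), so treat it as an illustration of the approach rather than a real result.

```r
# Toy stand-in for cps_voting; the rows and weights are invented.
toy_voting <- data.frame(
  PES5    = c("1", "2", "1", "2", "1"),
  PESEX   = c("1", "1", "2", "2", "2"),
  PWSSWGT = c("1200.5", "800.25", "950", "1100.75", "1300"),
  stringsAsFactors = FALSE
)

# Make sure the weight is numeric before summing
# (this is harmless if the column is already numeric).
toy_voting$PWSSWGT <- as.numeric(toy_voting$PWSSWGT)

# Weighted count for each combination of voting method and sex.
weighted_counts <- aggregate(PWSSWGT ~ PES5 + PESEX, data = toy_voting, FUN = sum)

# Share of each voting method within each sex.
weighted_counts$share <- weighted_counts$PWSSWGT /
  ave(weighted_counts$PWSSWGT, weighted_counts$PESEX, FUN = sum)

weighted_counts
```

Summing the weights, rather than counting rows, gives the estimated number of people in each group. The same pattern should apply to the real cps_voting data frame, assuming its columns carry the names requested in vars.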

Making a data dictionary

Most microdata variables are encoded, which means that your data will have a lot of numbers instead of text labels.

A data dictionary, which includes the definitions and labels for every variable in the dataset, is helpful. Calling listCensusMetadata() with include_values = TRUE returns a data dictionary with one row for each variable-label pair. That means if there are 30 codes for a given variable, it will have 30 rows in the data dictionary. Variables that don't have value labels in the metadata will have only one row.

voting_dict <- listCensusMetadata(
    name = "cps/voting/nov",
    vintage = 2020,
    type = "variables",
    include_values = TRUE)
head(voting_dict)

You can also look up the meaning of those codes for a single variable using the same function, listCensusMetadata(). Here are the values of PES5, the variable for "Vote in person or by mail?"

PES5_values <- listCensusMetadata(
    name = "cps/voting/nov",
    vintage = 2020,
    type = "values",
    variable = "PES5")
PES5_values
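With a lookup table like this, you can attach readable labels to the coded data. Below is a minimal sketch using base R's merge(). It assumes the values lookup has two columns named code and label, so check names(PES5_values) before relying on that. The data frames toy_data and toy_values are hypothetical stand-ins with placeholder contents.

```r
# Toy stand-ins; in practice these would be cps_voting and PES5_values.
toy_data <- data.frame(
  PES5    = c("1", "2", "1"),
  PWSSWGT = c(1200, 800, 950),
  stringsAsFactors = FALSE
)
toy_values <- data.frame(
  code  = c("1", "2"),
  label = c("In person", "By mail"),   # placeholder labels
  stringsAsFactors = FALSE
)

# Rename the lookup columns so the label column is tied to its variable.
names(toy_values) <- c("PES5", "PES5_label")

# Left join: keep every data row, even if a code has no matching label.
labeled <- merge(toy_data, toy_values, by = "PES5", all.x = TRUE)
labeled
```

Keeping the codes and labels in matching types (both character here) avoids silent mismatches in the join.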

Other ways to access microdata

The Census Bureau microdata APIs are helpful for working with a limited set of just-released datasets, but they're not your only option. Some other ways to get microdata are:



hrecht/censusapi documentation built on April 8, 2024, 9:21 a.m.