In psrc/psrccensus: Summarize Census Data for use at PSRC

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(knitr)

Under the hood: sources for PUMS microdata

By default, the psrccensus::get_psrc_pums() function downloads microdata for the entire state from the Census FTP site. Because each call to get_psrc_pums() can retrieve variables from the household and person datasets, it downloads both--which can take around a minute for 5-yr surveys, depending on the internet connection. (An earlier build pulled only requested variables via the Census API, but package authors found it slower than the FTP approach, and subject to unexplained, almost daily downtimes after 5pm Eastern time.)

Strategies to minimize downloads

One way to minimize download time is to reduce separate calls to get_psrc_pums(), requesting all variables you need for a given span-year-level combination in one set. This also reduces memory loads. Related efficiency hints include batching multiyear analysis by year rather than by variable and summarizing with the incl_na=FALSE option instead of creating multiple filtered objects.

Shift to local files using `dir=`

You can also skip the download altogether by loading prepared household and person tables stored locally. This shortcut is activated by specifying the directory in which to find the data in the dir= argument to get_psrc_pums(). The data must be stored as gzip-compressed .rds files with a specific naming convention--concatenated data year, level, and span; e.g. "2022h1.gz" or "2017p5.gz" (for PSRC staff, these files already exist on the network; see the PSRC Data wiki entry, "Working_with_PUMS_data".)

Create the local file

```{make .gz file, message=FALSE, eval=FALSE} library(data.table) library(magrittr) dir <- "your/desired/storage/directory/path" # Specify your storage directory

write_pums_rds(2023, dir) # Save local .rds files for a single year (2023)

lapply(2005:2023, write_pums_rds) # Save local .rds files for a range

Function exclude combos that don't exist (e.g. 2005 5-yr; 2020 1-yr)

### Utilize the local file option

Once the prepared files are in place--i.e. both the household file and the person file for a given year & span--it is straightforward to reference the directory location in the `get_psrc_pums(dir=)` call:

```{using local file, message=FALSE, eval=FALSE}
library(psrccensus)
pums_rds <- "your/desired/storage/directory/path"

my_data <- get_psrc_pums(5, 2022, "p", c("HINCP", "SOC2"), dir = pums_rds)

Use cases and considerations

Loading from these files can improve speed significantly, so you may want to consider it if you are working with PUMS a lot or are building a server-hosted app that involves calling get_psrc_pums(). The downside is the required file management, e.g. when the Census Bureau releases a new dataset, you'll need to be sure the corresponding .gz file is created before the option will work for that data year.

To increase speed for a server-hosted app, you may want to consider going a step farther and carry through all the potential statistical analyses, so the app will draw from stored summary results rather than call any psrccensus functions itself. Although this involves an upfront investment of thought, time, and data processing, it will pay off in dramatically lower response times for app users.

psrc/psrccensus documentation built on June 1, 2025, 1:06 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com