get_data_pums: Get PUMS microdata from storage

get_data_pums    R Documentation

Get PUMS microdata from storage

Description

Retrieves American Community Survey (ACS) Public Use Microdata Sample (PUMS) data from storage. Can return person-level, household-level, or combined records with appropriate survey weights applied.

Usage

get_data_pums(cols = NULL, year = NULL, kingco = TRUE, records = "person")

Arguments

cols

Character vector specifying which columns to include in the returned data. If NULL, all columns are included. The survey weight columns (wgtp/pwgtp) and chi_year are always included, regardless of selection. Defaults to cols = NULL.

year

Integer vector specifying which years to include in the data. Can be either a single year (for 1-year estimates) or five consecutive years (for 5-year estimates). If NULL, the most recent single year available is used. Note that 2020 is not available due to survey disruptions during the COVID-19 pandemic. Defaults to year = NULL.
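
The rules above for year can be sketched as a small base R check. This is only an illustration: check_pums_years is a hypothetical helper, not part of the package, and rejecting 2020 only when it is requested as a single year is an assumption based on the 5-year example (2018:2022) shown under Examples.

```r
# Hypothetical helper (not part of rads): classifies a `year` argument
# according to the rules described in the documentation above.
check_pums_years <- function(year) {
  if (is.null(year)) {
    return("most recent single year")
  }
  year <- sort(unique(as.integer(year)))
  if (length(year) == 1L) {
    # Assumption: 2020 is only rejected as a single-year request
    if (year == 2020L) stop("2020 single-year data are not available")
    return("1-year")
  }
  if (length(year) == 5L && all(diff(year) == 1L)) {
    return("5-year")
  }
  stop("`year` must be a single year or five consecutive years")
}

check_pums_years(2022)       # single year
check_pums_years(2018:2022)  # five consecutive years
```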

kingco

Logical indicating whether to restrict the data to King County records only. Defaults to kingco = TRUE.

records

Character string specifying whether to return person-level, household-level, or combined records. Must be one of "person", "household", or "combined". When "combined" is selected, person and household records are merged on the household identifier (serialno) and the survey design is set for person-level analyses. Defaults to records = "person".

Details

The function automatically applies the appropriate survey weights (person or household) based on the records parameter. For person-level and combined records, it uses the person weight (pwgtp) and its replicate weights. For household-level records, it uses the household weight (wgtp) and its replicate weights.

The function uses the JK1 (jackknife) method for variance estimation with 80 replicate weights, following Census Bureau recommendations for PUMS data.
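
As a rough illustration of what replicate-weight variance estimation involves, the sketch below applies the standard error formula given in the Census Bureau's PUMS documentation, sqrt(4/80 * sum((rep_est - est)^2)), to synthetic data in base R. All values are simulated, the object names are chosen for illustration only, and this is not a description of how get_data_pums is implemented internally.

```r
set.seed(1)
n <- 200

# Synthetic data: a 0/1 indicator and a main person weight
x     <- rbinom(n, 1, 0.3)
pwgtp <- runif(n, 5, 50)

# 80 synthetic replicate weights (n rows x 80 columns), made by randomly
# perturbing the main weight; real PUMS files ship these as pwgtp1..pwgtp80
repw <- sapply(1:80, function(r) pwgtp * runif(n, 0.5, 1.5))

# Full-sample weighted total and one weighted total per replicate
est     <- sum(pwgtp * x)
rep_est <- colSums(repw * x)

# Replicate-weight standard error with the 4/80 scaling factor
se <- sqrt(4 / 80 * sum((rep_est - est)^2))

est
se
```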

When you select records = "combined", household-level variables with the same names as person-level variables are given a '_hh' suffix to distinguish them. You are strongly encouraged to review the Census Bureau's ACS PUMS documentation if you plan to set records = "combined".
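
The suffix behavior can be pictured with a base R merge on toy data. This is a minimal sketch, assuming a person-to-household join on serialno; the data values are invented, adjinc is used only as an example of a column name that appears at both levels, and the package's actual merge code may differ.

```r
# Toy person-level records (two people in household H001, one in H002)
person <- data.frame(
  serialno = c("H001", "H001", "H002"),
  agep     = c(34L, 5L, 61L),
  adjinc   = c(1L, 1L, 1L)
)

# Toy household-level records sharing the column name `adjinc`
household <- data.frame(
  serialno = c("H001", "H002"),
  np       = c(2L, 1L),
  adjinc   = c(1L, 1L)
)

# Person columns keep their names; clashing household columns get "_hh"
combined <- merge(person, household, by = "serialno",
                  suffixes = c("", "_hh"))

names(combined)
nrow(combined)  # one row per person, not per household
```

The result keeps one row per person, which is consistent with combined records being set up for person-level analyses.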

Value

Returns a survey-weighted dtsurvey/data.table object, containing the specified columns and years, that is ready for use with calc().

References

For information regarding the ACS PUMS ETL process, file locations, data dictionaries, etc., see: https://github.com/PHSKC-APDE/svy_acs

Examples


# Get person-level data for specific columns from the most recent year
pums_person <- get_data_pums(
  cols = c("agep", "race4"),
  kingco = TRUE
)

# Get household-level data for a 5-year period
pums_households <- get_data_pums(
  year = 2018:2022,
  records = "household"
)

# Get combined person-household level data for WA State in 2022
pums_combo <- get_data_pums(
  year = 2022,
  records = "combined",
  kingco = FALSE
)

PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.