get_rnhgis_ds: Retrieve NHGIS datasets with caching and lookup table...

View source: R/get_rnhgis.R

get_rnhgis_ds R Documentation

Retrieve NHGIS datasets with caching and lookup table generation.

Description

This function retrieves datasets from NHGIS (National Historical Geographic Information System) using the 'ipumsr' package, with caching capabilities to avoid redundant downloads. It also generates a lookup table containing metadata about the dataset.

Usage

get_rnhgis_ds(
  ...,
  lkp = FALSE,
  refresh = FALSE,
  save_dir = here::here("data-raw/rnhgis_ds/")
)

Arguments

...

Arguments to be passed to 'ipumsr::ds_spec()', specifying the datasets to retrieve.

lkp

Logical. If 'TRUE', returns the lookup table; if 'FALSE' (default), returns the dataset.

refresh

Logical. If 'TRUE', forces a refresh of the cached data by downloading it again; if 'FALSE' (default), uses the cached data when available.

save_dir

Directory where downloaded data and lookup tables are saved. Defaults to 'here::here("data-raw/rnhgis_ds/")'.

Details

This function first checks whether a cached parquet file exists in 'save_dir' for the specified dataset. If it does and 'refresh' is 'FALSE', the function reads from the cache: the cached dataset is returned when 'lkp' is 'FALSE', and the cached lookup table is returned when 'lkp' is 'TRUE'.

If no cached file exists, or 'refresh' is 'TRUE', the function downloads the data from NHGIS using the 'ipumsr' package. Downloading requires an IPUMS API key stored in the environment variable "IPUMS_API_KEY". The downloaded data and the generated lookup table are then saved as parquet files in 'save_dir'.
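
The cache-then-download flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper 'cached_fetch()', its 'key' and 'fetch_fun' arguments, the file names, and the fake download function are all hypothetical, and base R's 'saveRDS()' / 'readRDS()' stand in for parquet files so the sketch runs without extra packages.

```r
# Hypothetical helper illustrating the caching logic; not part of CLmisc.
cached_fetch <- function(key, fetch_fun, lkp = FALSE, refresh = FALSE,
                         save_dir = tempdir()) {
  data_file <- file.path(save_dir, paste0(key, ".rds"))
  lkp_file  <- file.path(save_dir, paste0(key, "_lkp.rds"))

  if (refresh || !file.exists(data_file) || !file.exists(lkp_file)) {
    # The documented function would download from NHGIS at this point,
    # which is where "IPUMS_API_KEY" is needed.
    dir.create(save_dir, recursive = TRUE, showWarnings = FALSE)
    dt <- fetch_fun()
    # Simplified lookup table: one row per variable
    lookup <- data.frame(
      name = names(dt),
      type = unname(vapply(dt, function(x) class(x)[1], character(1))),
      stringsAsFactors = FALSE
    )
    saveRDS(dt, data_file)
    saveRDS(lookup, lkp_file)
  }

  # Always read the result back from the cache
  if (lkp) readRDS(lkp_file) else readRDS(data_file)
}

# Stand-in for the download step (made-up data); counts its own calls
n_downloads <- 0
fake_download <- function() {
  n_downloads <<- n_downloads + 1
  data.frame(GISJOIN = c("G01", "G02"), pop = c(100L, 250L),
             stringsAsFactors = FALSE)
}

cache_dir <- file.path(tempdir(), "rnhgis_cache_demo")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

pop       <- cached_fetch("demo", fake_download, save_dir = cache_dir)
pop_again <- cached_fetch("demo", fake_download, save_dir = cache_dir)
pop_lkp   <- cached_fetch("demo", fake_download, lkp = TRUE,
                          save_dir = cache_dir)
```

In this sketch only the first call triggers the stand-in download; the later calls read from the cache, and passing 'refresh = TRUE' would force the download step to run again.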

The lookup table contains information about each variable in the dataset, including its name, type, and attributes.
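
As a rough illustration of what such a lookup table might look like, the sketch below builds one from a small made-up data frame. It assumes that variable descriptions are stored in a 'label' attribute on each column; the helper name 'make_lookup()' and the example columns are hypothetical, and the columns actually produced by 'get_rnhgis_ds()' may differ.

```r
# Hypothetical helper: one row per variable with its name, type, and label.
make_lookup <- function(df) {
  data.frame(
    name  = names(df),
    type  = unname(vapply(df, function(x) class(x)[1], character(1))),
    label = unname(vapply(df, function(x) {
      lab <- attr(x, "label", exact = TRUE)
      # Columns without a label attribute get NA
      if (is.null(lab)) NA_character_ else as.character(lab)
    }, character(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up example data; only one column carries a label attribute
toy <- data.frame(GISJOIN = c("G01", "G02"), total_pop = c(100L, 250L),
                  stringsAsFactors = FALSE)
attr(toy$total_pop, "label") <- "Total population"

lookup <- make_lookup(toy)
print(lookup)
```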

Value

A data frame containing the requested NHGIS data (if 'lkp = FALSE'), or a lookup table describing the variables in that data (if 'lkp = TRUE').

Examples

## Not run: 
## Example: get place-level population from NHGIS
library(data.table)  # setDT(), as.data.table()
library(magrittr)    # %>% pipe
ds <- ipumsr::get_metadata_nhgis(type = "datasets") %>% setDT()
ds[grepl("2000", name) & grepl("SF1", name)]

sf1 <- ipumsr::get_metadata_nhgis(dataset = "2000_SF1a")

## Find the variable name
sf1$data_tables %>% as.data.table() %>% .[grepl("Total Population", description)]
## Find the geography level
sf1$geog_levels %>% as.data.table() %>% .[grepl("Place", description)] %>% .[4]

dt_place_pop00 <- get_rnhgis_ds(
  name = "2000_SF1a",
  data_tables = "NP001A",
  geog_levels = "place"
)

dt_place_pop00_lkp <- get_rnhgis_ds(
  name = "2000_SF1a",
  data_tables = "NP001A",
  geog_levels = "place",
  lkp = TRUE
)

## End(Not run)


ChandlerLutz/CLmisc documentation built on Feb. 28, 2025, 10:05 p.m.