read_nhgis: Read tabular data from an NHGIS extract
In ipumsr: An R Interface for Downloading, Reading, and Handling IPUMS Data

Description

Read a .csv or fixed-width (.dat) file downloaded from the NHGIS extract system.

This function has been deprecated in favor of read_ipums_agg(), which can read .csv files from both IPUMS aggregate data collections (IPUMS NHGIS and IPUMS IHGIS). Please use that function instead.

Note that fixed-width file reading is not supported in read_ipums_agg() and will likely be retired along with read_nhgis(). We therefore encourage you to create NHGIS extracts in .csv format going forward. For previously submitted fixed-width extracts, we suggest regenerating them in .csv format and loading them with read_ipums_agg(). Use the data_format argument of define_extract_agg() to create a .csv extract for submission via the IPUMS API.
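
As a rough sketch of that last step, an extract definition requesting .csv output might look like the following. The dataset, data table, and geographic level are illustrative placeholders, and the `"csv_no_header"` value is an assumption about the accepted format names; check the define_extract_agg() documentation for the values your ipumsr version supports.

```r
library(ipumsr)

# Define (but do not submit) an NHGIS extract that requests .csv output.
# "1990_STF3", "NP57", and "county" are placeholder choices for illustration.
csv_extract <- define_extract_agg(
  "nhgis",
  description = "Example extract in .csv format",
  datasets = ds_spec(
    "1990_STF3",
    data_tables = "NP57",
    geog_levels = "county"
  ),
  data_format = "csv_no_header" # assumed value; requests a .csv file
)

# The requested format is stored on the extract definition
csv_extract$data_format
```

Defining the extract does not contact the IPUMS API; submitting it does, and requires an API key.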

To read spatial data from an NHGIS extract, use read_ipums_sf().

Usage

read_nhgis(
  data_file,
  file_select = NULL,
  vars = NULL,
  col_types = NULL,
  n_max = Inf,
  guess_max = min(n_max, 1000),
  do_file = NULL,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  remove_extra_header = TRUE,
  verbose = TRUE
)

Arguments

`data_file`	Path to a .zip archive containing an NHGIS extract or a single file from an NHGIS extract.
`file_select`	If `data_file` is a .zip archive that contains multiple files, an expression identifying the file to load. Accepts a character vector specifying the file name, a tidyselect selection, or an index position. This must uniquely identify a file.
`vars`	Names of variables to include in the output. Accepts a vector of names or a tidyselect selection. If `NULL`, includes all variables in the file.
`col_types`	One of `NULL`, a `cols()` specification or a string. If `NULL`, all column types will be inferred from the values in the first `guess_max` rows of each column. Alternatively, you can use a compact string representation to specify column types: `c` = character, `i` = integer, `n` = number, `d` = double, `l` = logical, `f` = factor, `D` = date, `T` = date time, `t` = time, `?` = guess, `_` or `-` = skip. See `read_delim()` for more details.
`n_max`	Maximum number of lines to read.
`guess_max`	For .csv files, maximum number of lines to use for guessing column types. Will never use more than the number of lines read.
`do_file`	For fixed-width files, path to the .do file associated with the provided `data_file`. The .do file contains the parsing instructions for the data file. By default, looks in the same path as `data_file` for a .do file with the same name. See Details section below.
`var_attrs`	Variable attributes to add from the codebook (.txt) file included in the extract. Defaults to all available attributes. See `set_ipums_var_attributes()` for more details.
`remove_extra_header`	If `TRUE`, remove the additional descriptive header row included in some NHGIS .csv files. This header row is not usually needed as it contains similar information to that included in the `"label"` attribute of each data column (if `var_attrs` includes `"var_label"`).
`verbose`	Logical controlling whether to display output when loading data. If `TRUE`, displays IPUMS conditions, a progress bar, and column types. Otherwise, all are suppressed. Will be overridden by `readr.show_progress` and `readr.show_col_types` options, if they are set.
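
The arguments above can be combined to control what is read from a .csv extract. The sketch below uses the example file bundled with ipumsr; the column names (`GISJOIN`, `YEAR`, `D6Z002`, `MSA_CMSAA`) are assumed to be present in that file and will differ for your own extracts.

```r
library(ipumsr)

csv_file <- ipums_example("nhgis0972_csv.zip")

# Read only the first 15 rows and a subset of columns
subset_data <- read_nhgis(
  csv_file,
  vars = c(GISJOIN, YEAR, D6Z002),
  n_max = 15,
  verbose = FALSE
)

# Override the guessed type for one column. Geographic codes are usually
# better stored as character values than as numbers.
typed_data <- read_nhgis(
  csv_file,
  col_types = list(MSA_CMSAA = "c"),
  verbose = FALSE
)
```

Because read_nhgis() is deprecated, these calls are expected to emit a deprecation warning.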

Details

The .do file that is included when downloading an NHGIS fixed-width extract contains the necessary metadata (e.g. column positions and implicit decimals) to correctly parse the data file. read_nhgis() uses this information to parse and recode the fixed-width data appropriately.

If you no longer have access to the .do file, consider resubmitting the extract that produced the data. You can also change the desired data format to produce a .csv file, which does not require additional metadata files to be loaded.

For more about resubmitting an existing extract via the IPUMS API, see vignette("ipums-api", package = "ipumsr").
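
If you have unzipped a fixed-width extract, you can also point read_nhgis() at the .dat file and its .do file explicitly. The following is a minimal sketch using the bundled example archive; it assumes that each .dat file in the archive has a .do file with the same base name, which matches the default lookup described above but is not guaranteed for every extract.

```r
library(ipumsr)

fw_file <- ipums_example("nhgis0730_fixed.zip")

# Unzip the archive to a temporary directory using base R
tmp_dir <- tempfile()
dir.create(tmp_dir)
extracted <- unzip(fw_file, exdir = tmp_dir)

# Pick the first fixed-width data file and derive its .do file path
# (assumes the .do file shares the .dat file's base name)
dat_file <- grep("\\.dat$", extracted, value = TRUE)[1]
do_path <- sub("\\.dat$", ".do", dat_file)
stopifnot(file.exists(do_path))

# Supply the parsing instructions explicitly
fw_data <- read_nhgis(dat_file, do_file = do_path, verbose = FALSE)
```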

Value

A tibble containing the data found in data_file.

Examples

library(ipumsr)

# Example files
csv_file <- ipums_example("nhgis0972_csv.zip")
fw_file <- ipums_example("nhgis0730_fixed.zip")

# Previously:
read_nhgis(csv_file)

# For CSV files, please update to use the following:
read_ipums_agg(csv_file)

# Fixed-width files are parsed with the correct column positions
# and column types automatically:
read_nhgis(fw_file, file_select = contains("ts"), verbose = FALSE)

ipumsr documentation built on June 8, 2025, 1:30 p.m.