define_extract-micro: Define an extract request for an IPUMS microdata collection

define_extract-micro    R Documentation

Define an extract request for an IPUMS microdata collection

Description

Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.

Currently supported microdata collections include:

  • IPUMS USA: define_extract_usa()

  • IPUMS CPS: define_extract_cps()

  • IPUMS International: define_extract_ipumsi()

Learn more about the IPUMS API in vignette("ipums-api") and microdata extract definitions in vignette("ipums-api-micro").

Usage

define_extract_usa(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

define_extract_cps(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

define_extract_ipumsi(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

Arguments

description

Description of the extract.

samples

Vector of samples to include in the extract request. Use get_sample_info() to identify sample IDs for a given collection.

variables

Vector of variable names or a list of detailed variable specifications to include in the extract request. Use var_spec() to create a var_spec object containing a detailed variable specification. See examples.

data_format

Format for the output extract data file. Either "fixed_width" or "csv".

Note that while "stata", "spss", or "sas9" are also accepted, these file formats are not supported by ipumsr data-reading functions.

Defaults to "fixed_width".
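Some formats are accepted by the API but cannot be read back with ipumsr. The small check below separates the two groups. It is a minimal sketch that only encodes the format lists stated above. check_data_format() is a hypothetical helper and is not part of ipumsr, which performs its own validation.

```r
# Hypothetical helper (not part of ipumsr): encodes the format lists documented above.
readable_formats <- c("fixed_width", "csv")
accepted_formats <- c(readable_formats, "stata", "spss", "sas9")

check_data_format <- function(data_format = "fixed_width") {
  # Reject anything outside the documented set of accepted formats
  if (!data_format %in% accepted_formats) {
    stop("Unrecognized data_format: ", data_format)
  }
  # Accepted by the API, but not readable by ipumsr data-reading functions
  if (!data_format %in% readable_formats) {
    warning("'", data_format, "' is accepted but cannot be read ",
            "with ipumsr data-reading functions.")
  }
  invisible(data_format)
}

check_data_format("csv")                      # no warning
suppressWarnings(check_data_format("stata"))  # would warn; suppressed here
```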

data_structure

Data structure for the output extract data.

  • "rectangular" provides person records with all requested household information attached to respective household members.

  • "hierarchical" provides household records followed by person records.

Defaults to "rectangular".
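The difference between the two structures can be shown locally on hypothetical toy data. The sketch below is an illustration, not the actual extract file layout: the column names (SERIAL, PERNUM, AGE, HHINCOME) and the RECTYPE-style record marker are assumptions chosen for readability. The rectangular form attaches household values to every member. The hierarchical form lists each household record followed by its person records.

```r
# Hypothetical toy data: two households (SERIAL) and their members (PERNUM).
households <- data.frame(SERIAL = c(1, 2), HHINCOME = c(52000, 31000))
persons <- data.frame(
  SERIAL = c(1, 1, 2),
  PERNUM = c(1, 2, 1),
  AGE    = c(34, 8, 61)
)

# "rectangular": one row per person, household variables attached to each member.
rectangular <- merge(persons, households, by = "SERIAL")

# "hierarchical": each household record is followed by its person records.
hierarchical <- do.call(rbind, lapply(split(persons, persons$SERIAL), function(p) {
  hh <- households[households$SERIAL == p$SERIAL[1], ]
  rbind(
    data.frame(RECTYPE = "H", SERIAL = hh$SERIAL, PERNUM = NA, AGE = NA,
               HHINCOME = hh$HHINCOME),
    data.frame(RECTYPE = "P", SERIAL = p$SERIAL, PERNUM = p$PERNUM, AGE = p$AGE,
               HHINCOME = NA)
  )
}))
rownames(hierarchical) <- NULL

rectangular
hierarchical
```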

rectangular_on

If data_structure is "rectangular", the record type on which to rectangularize the data. Currently only "P" (person records) is supported.

Defaults to "P" if data_structure is "rectangular" and NULL otherwise.
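The documented default for rectangular_on depends on data_structure. The sketch below captures that rule. resolve_rectangular_on() is a hypothetical helper, not an ipumsr function. It also assumes that any value other than "P" is rejected for rectangular data, because only person records are currently supported.

```r
# Hypothetical helper (not part of ipumsr) mirroring the documented default.
resolve_rectangular_on <- function(data_structure = "rectangular",
                                   rectangular_on = NULL) {
  if (data_structure != "rectangular") {
    # Documented default is NULL for non-rectangular structures
    return(rectangular_on)
  }
  if (is.null(rectangular_on)) {
    rectangular_on <- "P"  # default for rectangular data
  }
  if (!identical(rectangular_on, "P")) {
    stop("Currently only \"P\" (person records) is supported for rectangular_on.")
  }
  rectangular_on
}

resolve_rectangular_on("rectangular")   # "P"
resolve_rectangular_on("hierarchical")  # NULL
```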

case_select_who

How to interpret any case selections included for variables in the extract definition.

  • "individuals" includes records for all individuals who match the specified case selections.

  • "households" includes records for all members of each household that contains an individual who matches the specified case selections.

Defaults to "individuals". Use var_spec() to add case selections for specific variables.
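The API applies case selections when it processes the extract. The sketch below only reproduces the documented difference between the two options on hypothetical toy person records. SERIAL is used here as an assumed household identifier. The selection SEX == 2 corresponds to var_spec("SEX", case_selections = "2").

```r
# Hypothetical toy person records: SERIAL identifies the household.
persons <- data.frame(
  SERIAL = c(1, 1, 2, 2, 3),
  PERNUM = c(1, 2, 1, 2, 1),
  SEX    = c(1, 2, 1, 1, 2)
)

# Individuals matching the case selection SEX == 2
matches <- persons$SEX == 2

# case_select_who = "individuals": keep only the matching individuals.
ind_selected <- persons[matches, ]

# case_select_who = "households": keep every member of a household
# that contains at least one matching individual.
hh_selected <- persons[persons$SERIAL %in% persons$SERIAL[matches], ]

ind_selected
hh_selected
```

Household 2 has no matching member, so it is dropped under both options. Household 1 keeps its non-matching member only under "households".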

data_quality_flags

Set to TRUE to include data quality flags for all applicable variables in the extract definition. This will override the data_quality_flags specification for individual variables in the definition.

Use var_spec() to add data quality flags for specific variables.
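The override described above can be sketched as a simple rule over per-variable settings. This is a local illustration under stated assumptions, not ipumsr's implementation. effective_flags() is hypothetical. The sketch assumes that only an extract-level TRUE overrides per-variable settings, and it ignores which variables actually have data quality flags available.

```r
# Hypothetical per-variable flag settings, as might be set with var_spec().
var_flags <- list(AGE = TRUE, SEX = NULL, RACE = FALSE)

# Assumed rule: an extract-level TRUE overrides every per-variable setting;
# otherwise (NULL) the per-variable settings are used as given.
effective_flags <- function(var_flags, data_quality_flags = NULL) {
  if (isTRUE(data_quality_flags)) {
    lapply(var_flags, function(x) TRUE)
  } else {
    var_flags
  }
}

str(effective_flags(var_flags, data_quality_flags = TRUE))
```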

Value

An object of class micro_extract containing the extract definition.

See Also

submit_extract() to submit an extract request for processing.

save_extract_as_json() and define_extract_from_json() to share an extract definition.

Examples

usa_extract <- define_extract_usa(
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)

usa_extract

# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_usa(
  description = "Example USA extract definition",
  samples = c("us2013a", "us2014a"),
  variables = var_spec(
    "SEX",
    case_selections = "2",
    attached_characteristics = c("mother", "father")
  )
)

# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_cps(
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "2"),
    "RACE"
  )
)

cps_extract

# To apply the same specification to many variables, it may be useful to
# create the variable specifications before defining the extract:
var_names <- c("AGE", "SEX")

my_vars <- purrr::map(
  var_names,
  ~ var_spec(.x, attached_characteristics = "mother")
)

ipumsi_extract <- define_extract_ipumsi(
  description = "Extract definition with predefined variables",
  samples = c("br2010a", "cl2017a"),
  variables = my_vars
)

# Extract specifications can be indexed by name
names(ipumsi_extract$samples)

names(ipumsi_extract$variables)

ipumsi_extract$variables$AGE

## Not run: 
# Use the extract definition to submit an extract request to the API
submit_extract(usa_extract)

## End(Not run)

ipumsr documentation built on Oct. 20, 2023, 5:10 p.m.