define_extract_micro: Define an extract request for an IPUMS microdata collection

View source: R/api_define_extract.R

Description

Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.

The IPUMS API currently supports the following microdata collections:

  • IPUMS USA

  • IPUMS CPS

  • IPUMS International

  • IPUMS Time Use (ATUS, AHTUS, MTUS)

  • IPUMS Health Surveys (NHIS, MEPS)

Note that not all extract request parameters and options apply to all collections. For a summary of supported features by collection, see the IPUMS API documentation.

Learn more about the IPUMS API in vignette("ipums-api") and microdata extract definitions in vignette("ipums-api-micro").

Usage

define_extract_micro(
  collection,
  description,
  samples,
  variables = NULL,
  time_use_variables = NULL,
  sample_members = NULL,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

Arguments

collection

Code for the IPUMS collection represented by this extract request. See ipums_data_collections() for supported microdata collection codes.

description

Description of the extract.

samples

Vector of samples to include in the extract request. Use get_sample_info() to identify sample IDs for a given collection.

variables

Vector of variable names or a list of detailed variable specifications to include in the extract request. Use var_spec() to create a var_spec object containing a detailed variable specification. See examples.

time_use_variables

Vector of names of IPUMS-defined time use variables or a list of specifications for user-defined time use variables to include in the extract request. Use tu_var_spec() to create a tu_var_spec object containing a time use variable specification. See examples.

Time use variables are only available for IPUMS Time Use collections ("atus", "ahtus", and "mtus").

sample_members

Indication of whether to include additional sample members in the extract request. If provided, must be "include_non_respondents", "include_household_members", or a vector containing both.

Sample member selection is only available for the IPUMS ATUS collection ("atus").
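As a minimal sketch, assuming ipumsr is installed (defining an extract locally does not require an API key), both sample member options can be requested at once by passing a character vector. The sample ID comes from the ATUS example below. The variable names are illustrative assumptions.

```r
library(ipumsr)

# Request both non-respondents and household members in an ATUS extract.
# AGE and SEX are illustrative variable choices.
atus_members <- define_extract_micro(
  collection = "atus",
  description = "ATUS extract with additional sample members",
  samples = "at2007",
  variables = c("AGE", "SEX"),
  sample_members = c("include_non_respondents", "include_household_members")
)

atus_members
```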

data_format

Format for the output extract data file. Either "fixed_width" or "csv".

Note that while "stata", "spss", and "sas9" are also accepted, these file formats are not supported by ipumsr data-reading functions.

Defaults to "fixed_width".
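A short sketch of requesting a CSV file instead of the default fixed-width file. The sample and variables mirror the USA examples below. Only `data_format` differs from the defaults.

```r
library(ipumsr)

# Same kind of USA extract as in the examples, but delivered as CSV.
usa_csv <- define_extract_micro(
  collection = "usa",
  description = "2014 ACS data as CSV",
  samples = "us2014a",
  variables = c("SEX", "AGE"),
  data_format = "csv"
)

usa_csv$data_format
```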

data_structure

Data structure for the output extract data.

  • "rectangular" provides data in which every row has the same record type (determined by "rectangular_on"), with variables from other record types written onto associated records of the chosen type (e.g. household variables written onto person records).

  • "hierarchical" provides data that include rows of differing record types, with records ordered according to their hierarchical structure (e.g. each person record is followed by the activity records for that person).

  • "household_only" provides household records only. This data structure is only available for the IPUMS USA collection ("usa").

Defaults to "rectangular".
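As a sketch, a hierarchical USA extract omits `rectangular_on`. Rows of different record types are kept separate, so there is no record type to rectangularize on. The sample and variables are borrowed from the examples below.

```r
library(ipumsr)

# Hierarchical extract: records of different types appear as separate rows,
# so rectangular_on is left unset (NULL).
usa_hier <- define_extract_micro(
  collection = "usa",
  description = "Hierarchical 2014 ACS extract",
  samples = "us2014a",
  variables = c("SEX", "AGE"),
  data_structure = "hierarchical"
)

usa_hier$rectangular_on
```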

rectangular_on

If data_structure is "rectangular", the record type on which to rectangularize. One of "P" (person), "A" (activity), "I" (injury), or "R" (round).

Defaults to "P" if data_structure is "rectangular" and NULL otherwise.
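Which record types are valid depends on the collection, and the IPUMS API documentation lists what each collection supports. For illustration, this sketch assumes that activity-level rectangularization ("A") applies to ATUS. The variable names are also illustrative.

```r
library(ipumsr)

# Rectangularize on activity records ("A") instead of the default "P".
# Assumption: activity records are available for the ATUS collection.
atus_activity <- define_extract_micro(
  collection = "atus",
  description = "ATUS extract rectangularized on activities",
  samples = "at2007",
  variables = c("AGE", "SEX"),
  rectangular_on = "A"
)

atus_activity$rectangular_on
```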

case_select_who

Indication of how to interpret any case selections included for variables in the extract definition.

  • "individuals" includes records for all individuals who match the specified case selections.

  • "households" includes records for all members of each household that contains an individual who matches the specified case selections.

Defaults to "individuals". Use var_spec() to add case selections for specific variables.
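A minimal sketch that combines a case selection with `case_select_who = "households"`. It reuses the `case_selections = "2"` value for SEX from the examples below (female in IPUMS USA). Every member of a household that contains a selected person is kept.

```r
library(ipumsr)

# Select on SEX code "2", but keep all members of any household that
# contains at least one person matching that selection.
usa_hh <- define_extract_micro(
  collection = "usa",
  description = "Households containing a selected person",
  samples = "us2014a",
  variables = list(
    var_spec("SEX", case_selections = "2"),
    "AGE"
  ),
  case_select_who = "households"
)

usa_hh$case_select_who
usa_hh$variables$SEX$case_selections
```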

data_quality_flags

Set to TRUE to include data quality flags for all applicable variables in the extract definition. This will override the data_quality_flags specification for individual variables in the definition.

Use var_spec() to add data quality flags for specific variables.
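As a sketch, setting `data_quality_flags = TRUE` at the extract level requests flags for all applicable variables without a `var_spec()` for each one. The samples and variables are taken from the CPS example below. Which variables actually have flags is determined by IPUMS, not by this call.

```r
library(ipumsr)

# Extract-level setting: request data quality flags for every applicable
# variable instead of setting data_quality_flags in each var_spec().
cps_flags <- define_extract_micro(
  collection = "cps",
  description = "CPS extract with all data quality flags",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = c("AGE", "SEX", "RACE"),
  data_quality_flags = TRUE
)

cps_flags$data_quality_flags
```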

Value

An object of class micro_extract containing the extract definition.

See Also

submit_extract() to submit an extract request for processing.

save_extract_as_json() and define_extract_from_json() to share an extract definition.
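A minimal round-trip sketch: save a definition to a temporary JSON file, then rebuild it with define_extract_from_json(). The file path comes from base R's `tempfile()` and is only for illustration.

```r
library(ipumsr)

usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)

# Write the definition to a temporary JSON file (illustrative path)...
json_path <- tempfile(fileext = ".json")
save_extract_as_json(usa_extract, json_path)

# ...then rebuild an equivalent definition from that file.
usa_copy <- define_extract_from_json(json_path)

names(usa_copy$samples)
names(usa_copy$variables)
```

The JSON file can be shared with a collaborator, who can load the same definition and submit it with their own API key.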

Examples

usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)

usa_extract

# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "Example USA extract definition",
  samples = c("us2013a", "us2014a"),
  variables = var_spec(
    "SEX",
    case_selections = "2",
    attached_characteristics = c("mother", "father")
  )
)

# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_micro(
  collection = "cps",
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "2"),
    "RACE"
  )
)

cps_extract

# To recycle specifications across many variables, it may be useful to
# create the variable specifications before defining the extract:
var_names <- c("AGE", "SEX")

my_vars <- purrr::map(
  var_names,
  ~ var_spec(.x, attached_characteristics = "mother")
)

ipumsi_extract <- define_extract_micro(
  collection = "ipumsi",
  description = "Extract definition with predefined variables",
  samples = c("br2010a", "cl2017a"),
  variables = my_vars
)

# Extract specifications can be indexed by name
names(ipumsi_extract$samples)

names(ipumsi_extract$variables)

ipumsi_extract$variables$AGE

# IPUMS Time Use collections allow selection of IPUMS-defined and
# user-defined time use variables:
define_extract_micro(
  collection = "atus",
  description = "ATUS extract with time use variables",
  samples = "at2007",
  time_use_variables = list(
    "ACT_PCARE",
    tu_var_spec(
      "MYTIMEUSEVAR",
      owner = "example@example.com"
    )
  )
)

## Not run: 
# Use the extract definition to submit an extract request to the API
submit_extract(usa_extract)

## End(Not run)

ipumsr documentation built on Sept. 12, 2024, 7:38 a.m.