selection_language: tidyselect selection language in ipumsr

Description

ipumsr uses a slightly modified implementation of the tidyselect selection language.

Syntax

In general, the selection language in ipumsr works the same way as in tidyselect.

Where applicable, variables can be selected with:

  • A character vector of variable names (c("var1", "var2"))

  • A vector of bare (unquoted) variable names (c(var1, var2))

  • A selection helper from tidyselect (starts_with("var")). See below for a list of helpers.
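
As a rough illustration, the sketch below evaluates all three forms with tidyselect's own eval_select() against a small stand-in data frame. It calls tidyselect directly rather than going through ipumsr, so it shows plain tidyselect behavior, which ipumsr may modify slightly. The data frame and its column names are hypothetical.

```r
library(tidyselect)

# Hypothetical stand-in for the columns of an IPUMS extract
df <- data.frame(YEAR = 2016L, MONTH = 3L, PERNUM = 1L, ASECWT = 1.5)

# eval_select() takes a quoted selection and returns the matching
# column positions as a named integer vector
by_string <- eval_select(quote(c("YEAR", "MONTH")), df)
by_bare   <- eval_select(quote(c(YEAR, MONTH)), df)
by_helper <- eval_select(quote(starts_with("ASEC")), df)

by_string  # YEAR = 1, MONTH = 2
by_bare    # same positions as by_string
by_helper  # ASECWT = 4
```

The quoted and bare forms resolve to the same positions, which is why they can be used interchangeably.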

Primary differences

  • tidyselect selection is generally intended for selecting columns in data.frame-like objects. In contrast, ipumsr allows the selection language in other contexts as well (for instance, when selecting files from within a .zip archive). Individual ipumsr functions indicate whether they support the selection language.

  • Selection with where() is not consistently supported.
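
tidyselect only needs a set of names to match against, so in principle the same syntax can be applied to names that are not data frame columns, such as the files inside an archive. The sketch below shows one way to do this with tidyselect::eval_select() and a named list. The file names are hypothetical, and whether ipumsr handles arguments like file_select exactly this way internally is an assumption.

```r
library(tidyselect)

# Hypothetical file names, as they might appear inside a .zip archive
files <- c(
  "nhgis_ds1_nominal_state.csv",
  "nhgis_ds1_nominal_county.csv",
  "nhgis_ds1_codebook.txt"
)

# eval_select() accepts a named list, so name each element after its file
candidates <- setNames(as.list(files), files)

# Resolve selections against the file names; the results are positions
state_pos <- eval_select(quote(contains("nominal_state")), candidates)
csv_pos <- eval_select(quote(ends_with(".csv")), candidates)

files[state_pos]  # "nhgis_ds1_nominal_state.csv"
files[csv_pos]    # the two .csv files
```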

Selection helpers (from tidyselect)

  • var1:var10: variables lying between var1 on the left and var10 on the right.

  • starts_with("a"): names that start with "a"

  • ends_with("z"): names that end with "z"

  • contains("b"): names that contain "b"

  • matches("x.y"): names that match the regular expression x.y

  • num_range("x", 1:4): names following the pattern x1, x2, ..., x4

  • all_of(vars)/any_of(vars): names stored in the character vector vars. all_of(vars) errors if any of the variables is not present; any_of(vars) matches only the variables that exist.

  • everything(): all variables

  • last_col(): furthest column to the right
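
The sketch below runs each helper through tidyselect::eval_select() against a toy data frame whose (hypothetical) column names were chosen so that every helper matches something. It reflects plain tidyselect behavior; ipumsr's modified implementation may differ in edge cases. Note that the string-matching helpers ignore case by default (ignore.case = TRUE).

```r
library(tidyselect)

# Hypothetical columns chosen so that each helper has something to match
df <- data.frame(apple = 1, x1 = 2, x2 = 3, x3 = 4, x4 = 5, fizz = 6, box = 7)

# Evaluate a quoted selection against df and return the selected names
selected <- function(expr) names(eval_select(expr, df))

selected(quote(x1:x4))                    # "x1" "x2" "x3" "x4"
selected(quote(starts_with("a")))         # "apple"
selected(quote(ends_with("z")))           # "fizz"
selected(quote(contains("b")))            # "box"
selected(quote(matches("^x[0-9]$")))      # "x1" "x2" "x3" "x4"
selected(quote(num_range("x", 1:4)))      # "x1" "x2" "x3" "x4"
selected(quote(any_of(c("x1", "nope"))))  # "x1" (missing "nope" is skipped)
selected(quote(everything()))             # all seven column names
selected(quote(last_col()))               # "box"

# all_of() raises an error instead of skipping the missing name
tryCatch(
  selected(quote(all_of(c("x1", "nope")))),
  error = function(e) "error"
)
```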

Operators for combining those selections:

  • !selection: only variables that don't match selection

  • selection1 & selection2: only variables included in both selection1 and selection2

  • selection1 | selection2: all variables that match either selection1 or selection2
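
To show how the operators combine selections, the sketch below applies !, & and | with tidyselect::eval_select() to a toy data frame. The column names are hypothetical, loosely modelled on CPS variable names, and the results reflect plain tidyselect rather than ipumsr itself.

```r
library(tidyselect)

# Hypothetical columns loosely modelled on CPS variable names
df <- data.frame(
  YEAR = 1, MONTH = 2, SERIAL = 3, STATEFIP = 4,
  SEX = 5, ASECWT = 6, INCTOT = 7
)

# Evaluate a quoted selection against df and return the selected names
selected <- function(expr) names(eval_select(expr, df))

# Complement: every column that does not start with "S"
selected(quote(!starts_with("S")))                      # "YEAR" "MONTH" "ASECWT" "INCTOT"

# Intersection: columns that satisfy both conditions
selected(quote(starts_with("S") & ends_with("P")))      # "STATEFIP"

# Union: columns that satisfy either condition
selected(quote(starts_with("ASEC") | contains("INC")))  # "ASECWT" "INCTOT"
```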

Examples

library(ipumsr)

cps_file <- ipums_example("cps_00157.xml")

# Load 3 variables by name
read_ipums_micro(
  cps_file,
  vars = c("YEAR", "MONTH", "PERNUM"),
  verbose = FALSE
)

# "Bare" variables are supported
read_ipums_micro(
  cps_file,
  vars = c(YEAR, MONTH, PERNUM),
  verbose = FALSE
)

# Standard tidyselect selectors are also supported
read_ipums_micro(cps_file, vars = starts_with("ASEC"), verbose = FALSE)

# Selection methods can be combined
read_ipums_micro(
  cps_file,
  vars = c(YEAR, MONTH, contains("INC")),
  verbose = FALSE
)

read_ipums_micro(
  cps_file,
  vars = starts_with("S") & ends_with("P"),
  verbose = FALSE
)

# Other selection arguments also support this syntax.
# For instance, load a particular file based on a tidyselect match:
read_nhgis(
  ipums_example("nhgis0731_csv.zip"),
  file_select = contains("nominal_state"),
  verbose = FALSE
)

ipumsr documentation built on Sept. 12, 2024, 7:38 a.m.