subset_surveys: Subset and optionally harmonize surveys

View source: R/subset_surveys.R

subset_surveys    R Documentation

Subset and optionally harmonize surveys

Description

Subset one or more surveys by retaining a specified set of variables. Subsetting can be performed either on surveys already loaded in memory or directly from survey files on disk.

If a crosswalk table is supplied, the variables listed for each survey in the crosswalk are selected, and their names can optionally be harmonized using 'var_name_target'.

This function replaces the deprecated helpers [subset_waves()] and [subset_save_surveys()].

Usage

subset_surveys(
  survey_list,
  survey_paths = NULL,
  rowid = "rowid",
  subset_name = "subset",
  subset_vars = NULL,
  crosswalk_table = NULL,
  import_path = NULL,
  export_path = NULL
)

subset_waves(waves, subset_vars = NULL)

subset_save_surveys(
  crosswalk_table,
  subset_name = "subset",
  survey_list = NULL,
  subset_vars = NULL,
  survey_paths = NULL,
  import_path = NULL,
  export_path = NULL
)

Arguments

survey_list

A list of survey objects created by [read_surveys()]. If 'NULL', surveys are read from disk using 'survey_paths' or 'import_path'.

survey_paths

A character vector of full file paths to survey files. Used when 'survey_list' is 'NULL'.

rowid

Name of the unique observation identifier column. Defaults to '"rowid"'.

subset_name

Character string appended to filenames of subsetted surveys. Defaults to '"subset"'.

subset_vars

Character vector of variable names to retain. If 'NULL', all variables are retained.

crosswalk_table

Optional crosswalk table created with [crosswalk_table_create()]. If supplied, variables are selected per survey based on 'var_name_orig', and variable names may be harmonized using 'var_name_target'.

import_path

Optional directory containing survey files. Used to resolve filenames when subsetting from disk.

export_path

Optional directory where subsetted surveys are saved as '.rds' files. If 'NULL', surveys are returned in memory.

waves

A list of surveys imported with [read_surveys()].

Details

The function supports multiple workflows:

* **In-memory subsetting** using 'survey_list'
* **File-based subsetting** using 'survey_paths' or 'import_path'
* **Crosswalk-driven subsetting**, where variables are selected per survey using a crosswalk table created by [crosswalk_table_create()]

If 'export_path' is provided, subsetted surveys are written to disk as '.rds' files. Otherwise, subsetted surveys are returned in memory.
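The crosswalk-driven workflow can be illustrated with a minimal base R sketch. It does not call retroharmonize and is not the package's implementation: the toy surveys, the helper 'subset_one()', and the 'id' column used to match crosswalk rows to surveys are assumptions made for illustration. Only the column names 'var_name_orig' and 'var_name_target' come from the documentation above.

```r
# Two toy surveys standing in for objects returned by read_surveys()
surveys <- list(
  wave_a = data.frame(rowid = c("a1", "a2"), isocntry = c("DE", "FR"),
                      qa10_1 = c(1, 2), extra = c(9, 9)),
  wave_b = data.frame(rowid = c("b1", "b2"), country = c("IT", "ES"),
                      q10 = c(3, 4), extra = c(8, 8))
)

# Toy crosswalk; the `id` column (survey identifier) is an assumption
crosswalk <- data.frame(
  id = rep(c("wave_a", "wave_b"), each = 3),
  var_name_orig = c("rowid", "isocntry", "qa10_1", "rowid", "country", "q10"),
  var_name_target = c("rowid", "geo", "trust", "rowid", "geo", "trust"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the variables listed for one survey,
# then rename them to their harmonized target names
subset_one <- function(survey, id, crosswalk) {
  rows <- crosswalk[crosswalk$id == id, ]
  keep <- intersect(rows$var_name_orig, names(survey))
  out <- survey[, keep, drop = FALSE]
  names(out) <- rows$var_name_target[match(keep, rows$var_name_orig)]
  out
}

# Apply the helper to each survey, pairing it with its name
harmonized <- Map(subset_one, surveys, names(surveys),
                  MoreArgs = list(crosswalk = crosswalk))
```

In this sketch both surveys end up with the same three column names, and the unlisted 'extra' column is dropped from each, which is the effect the crosswalk-driven workflow is described as having.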

Value

Either:

* a list of subsetted survey objects (if 'export_path = NULL'), or
* a character vector of filenames written to 'export_path'.
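The two return modes can be mimicked in a short base R sketch. It is a stand-in under stated assumptions, not the package's code: 'subset_and_return()' is a hypothetical helper, and the file naming scheme (survey name, underscore, 'subset_name', '.rds') is assumed for illustration only.

```r
# Hypothetical helper mirroring the documented return behaviour
subset_and_return <- function(survey_list, subset_vars,
                              subset_name = "subset", export_path = NULL) {
  # Keep only the requested variables that exist in each survey
  subsetted <- lapply(survey_list, function(s) {
    s[, intersect(subset_vars, names(s)), drop = FALSE]
  })
  # No export path: return the subsetted surveys in memory
  if (is.null(export_path)) {
    return(subsetted)
  }
  # Otherwise write one .rds file per survey and return the filenames
  # (naming scheme is an assumption)
  file_names <- paste0(names(subsetted), "_", subset_name, ".rds")
  for (i in seq_along(subsetted)) {
    saveRDS(subsetted[[i]], file.path(export_path, file_names[i]))
  }
  file_names
}

toy_surveys <- list(
  s1 = data.frame(rowid = 1:2, isocntry = c("DE", "FR"), other = c(0, 1)),
  s2 = data.frame(rowid = 3:4, isocntry = c("IT", "ES"), other = c(1, 0))
)

# In-memory mode returns a list of data frames
in_memory <- subset_and_return(toy_surveys, c("rowid", "isocntry"))

# File mode returns a character vector of filenames
out_dir <- file.path(tempdir(), "subset_demo")
dir.create(out_dir, showWarnings = FALSE)
written <- subset_and_return(toy_surveys, c("rowid", "isocntry"),
                             subset_name = "example_subset",
                             export_path = out_dir)
```

Callers that need to handle both modes can branch on whether the result is a list or a character vector.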

See Also

[crosswalk_table_create()], [harmonize_survey_variables()], [read_surveys()]

Examples

examples_dir <- system.file("examples", package = "retroharmonize")
survey_files <- dir(examples_dir, pattern = "\\.rds$")

surveys <- read_surveys(
  file.path(examples_dir, survey_files),
  export_path = NULL
)

subset_surveys(
  survey_list = surveys,
  subset_vars = c("rowid", "isocntry", "qa10_1", "qa14_1"),
  subset_name = "example_subset"
)


retroharmonize documentation built on Jan. 14, 2026, 9:08 a.m.