get_whdh_path: Generate file paths for the Triple Billions WHDH data lake

View source: R/utils_whdh.R

get_whdh_path    R Documentation

Generate file paths for the Triple Billions WHDH data lake

Description

get_whdh_path() generates accurate file paths for downloading files from, or uploading files to, the Triple Billions WHDH data lake.

Usage

get_whdh_path(
  operation = c("download", "upload"),
  data_type = c("wrangled_data", "projected_data", "final_data", "ingestion_data"),
  billion = c("all", "hep", "hpop", "uhc"),
  ind_codes = "all",
  file_names = NULL,
  experiment = "unofficial"
)
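The vector-valued defaults in the signature above follow a common R idiom in which match.arg() selects the first choice when the argument is omitted. Whether get_whdh_path() calls match.arg() internally is an assumption here. The sketch below uses a hypothetical function, pick_operation(), that is not part of billionaiRe, purely to illustrate the idiom.

```r
# Hypothetical stand-in, not part of billionaiRe: shows how a vector-valued
# default behaves when the function body calls match.arg().
pick_operation <- function(operation = c("download", "upload")) {
  # With the argument omitted, match.arg() returns the first choice.
  # A supplied value must match one of the listed choices, otherwise
  # match.arg() raises an error.
  match.arg(operation)
}

default_op <- pick_operation()           # argument omitted
explicit_op <- pick_operation("upload")  # explicit, valid choice

# An invalid choice raises an error; it is caught here and replaced with NA.
invalid_op <- tryCatch(
  pick_operation("delete"),
  error = function(e) NA_character_
)
```

If the real function follows this idiom, passing a value outside the documented choices would fail early instead of producing a malformed path.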

Arguments

operation

(string) Either "download" or "upload".

data_type

(string) The type of data to load.

  • wrangled_data (default): raw data that has been wrangled into a suitable form for analysis.

  • projected_data: data that has been fully projected to the target year but has not yet been transformed or calculated upon.

  • final_data: the complete set of billions data with transformed values, contributions, and all calculations available.

  • ingestion_data: raw data in its original form as received from the technical program, GHO, or other sources. These files have not been wrangled or modified in any way.

billion

(string) One of "all" (default), "hep", "hpop", or "uhc". If "all", the file paths for all indicators in all three billions are returned.

ind_codes

(character vector) The code(s) of the indicator(s) to load data for. If "all" (default), returns paths for all indicators in the given billion. If billion = "all", this argument is ignored and the file paths for all indicators in all three billions are returned.

file_names

(character vector) The name(s) of the file(s) to download. NULL by default. Ignored if either billion = "all" or ind_codes = "all".

experiment

(string) Either NULL or a string ("unofficial" by default).

  • If NULL, the root folder for the data layers is the 3B folder, i.e., where the "official" data is stored (e.g., 3B/...).

  • If a string, the root folder for the data layers is a sub-folder within the Sandbox layer of the 3B data lake (e.g., if experiment = "my_exp", then paths would be of the form 3B/Sandbox/my_exp/Silver/...).
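The root-folder rule described for experiment can be sketched as follows. sketch_root() is a hypothetical helper written for illustration only. It is not the package's implementation and covers only the root folder, not the data layers (such as Silver) or file names beneath it.

```r
# Hypothetical helper, not part of billionaiRe: mirrors the documented rule
# for choosing the root folder from the `experiment` argument.
sketch_root <- function(experiment = "unofficial") {
  if (is.null(experiment)) {
    # NULL: the "official" data lives directly under the 3B folder.
    "3B"
  } else {
    # A string: a sub-folder within the Sandbox layer of the 3B data lake.
    file.path("3B", "Sandbox", experiment)
  }
}

official_root <- sketch_root(NULL)
sandbox_root <- sketch_root("my_exp")
default_root <- sketch_root()
```

Because the default is "unofficial" and not NULL, a call that omits experiment resolves to a Sandbox sub-folder. Reaching the official 3B root requires passing NULL explicitly.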

Details

Using this function when working with the data lake is highly recommended as it ensures file paths abide by the established standards and directory structure for the data lake.

Value

A character vector of file paths.
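A minimal call might look like the sketch below. It assumes that billionaiRe is installed and that get_whdh_path() is exported from it. Both are checked at run time, and the call is skipped if either assumption fails. The argument values are taken from the choices listed under Usage. No output is shown because the resulting paths depend on the package version.

```r
# Runs the call only when billionaiRe is installed and exports
# get_whdh_path(); otherwise `paths` stays NULL.
paths <- NULL

if (requireNamespace("billionaiRe", quietly = TRUE) &&
    "get_whdh_path" %in% getNamespaceExports("billionaiRe")) {
  paths <- billionaiRe::get_whdh_path(
    operation = "download",
    data_type = "wrangled_data",
    billion = "hep",
    ind_codes = "all",          # all indicators in the HEP billion
    experiment = "unofficial"   # Sandbox sub-folder, not the official root
  )
  print(paths)
}
```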

See Also

Wrangle data functions: add_missing_xmart_rows(), get_data_lake_name(), has_xmart_cols(), save_gho_backup_to_whdh(), save_wrangled_output(), wrangle_gho_data(), wrangle_gho_rural_urban_data(), wrangle_unsd_data(), xmart_col_types(), xmart_cols()


gpw13/billionaiRe documentation built on Sept. 27, 2024, 10:05 p.m.