ms_generate_attribution: Easily cite and/or acknowledge primary sources of the...

View source: R/ms_generate_attribution.R

ms_generate_attribution    R Documentation

Easily cite and/or acknowledge primary sources of the MacroSheds dataset

Description

Also returns contact information, URLs, DOIs, and intellectual rights details, all based on an input data.frame in MacroSheds format.

Usage

ms_generate_attribution(
  d,
  chem_source = "both",
  abide_by = "requirements only",
  include_all_ws_attr = FALSE,
  write_to_dir = NULL
)

Arguments

d

data.frame. A data.frame in MacroSheds format (see details). If d is omitted, all attribution records will be returned and include_all_ws_attr is treated as TRUE.

chem_source

character. Whether d includes "stream" or "precip" chemistry data (as these cannot be distinguished by ms_generate_attribution). If both are included, use "both". If neither, this parameter will be ignored. If d is not provided, this parameter pertains to the full MacroSheds dataset.

abide_by

character. Primary source Intellectual Rights (IR) stipulations may use language like "should" or "encouraged to", or they might use "must", "required to", etc. If you set this parameter to "suggestions", all IR clauses will be returned. If you set it to "requirements only", clauses with mild language will be filtered out. See details.

include_all_ws_attr

logical. If TRUE, attribution information will be generated for all watershed attribute (ws_attr) products. If FALSE, only products included in d will be attributed. TRUE is useful if you're using a large number of columns from ws_attr_summaries (watershed attribute summaries), which can't be easily converted to MacroSheds format. If d is omitted, include_all_ws_attr is treated as TRUE.

write_to_dir

character. A path to an existing directory where attribution files will be written. A new directory called macrosheds_attribution_information will be created there. If NULL (the default), all attribution information will be returned as a list. If specified, some information will still be returned as a list, including primary source contact information and DOIs. See Value.
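As a rough sketch of how chem_source and write_to_dir might be prepared before a call: the helper pick_chem_source() below is hypothetical (it is not part of macrosheds), and the product name 'precip_chemistry' is assumed to be the precipitation counterpart of 'stream_chemistry'. The output directory is created first because write_to_dir must point to an existing directory.

```r
# Hypothetical helper (NOT part of macrosheds): choose a chem_source value
# from the product names that were loaded into d.
# Assumption: 'precip_chemistry' is the precipitation chemistry product name.
pick_chem_source <- function(prodnames) {
    has_stream <- "stream_chemistry" %in% prodnames
    has_precip <- "precip_chemistry" %in% prodnames
    if (has_stream && !has_precip) return("stream")
    if (has_precip && !has_stream) return("precip")
    "both"  # both present, or neither (in which case the argument is ignored)
}

chem_source <- pick_chem_source(c("stream_chemistry", "discharge"))

# write_to_dir must already exist, so create it before the call.
out_dir <- file.path(tempdir(), "ms_attribution_demo")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Per the argument description, files would be written to this subdirectory.
expected_subdir <- file.path(out_dir, "macrosheds_attribution_information")

# With macrosheds installed and d loaded, the call would look like:
# attrib <- macrosheds::ms_generate_attribution(
#     d, chem_source = chem_source, write_to_dir = out_dir)
```

Returning "both" when no chemistry product is present simply mirrors the default; according to the argument description, the parameter is ignored in that case.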

Details

MacroSheds format (only site_code and var are required in inputs to this function):

column       definition
date         Date in YYYY-mm-dd.
site_code    A unique identifier for each MacroSheds site, identical to primary source site code where possible. See ms_load_sites().
grab_sample  Boolean integer indicating whether the observation was obtained via grab sample or installed sensor. 1 = TRUE (grab sample), 0 = FALSE (installed sensor).
var          Variable code. See ms_load_variables().
val          Data value. See ms_load_variables() for units.
ms_status    Boolean integer. 0 = clean value. 1 = questionable value. See "Technical Validation" section of the MacroSheds data paper for details.
ms_interp    Boolean integer. 0 = measured or imputed by primary source. 1 = interpolated by MacroSheds. See "Temporal Imputation and Aggregation" section of the MacroSheds data paper for details.
val_err      The combined standard uncertainty associated with the corresponding data point, if estimable. See "Detection Limits and Propagation of Uncertainty" section of the MacroSheds data paper for details.

Core time-series datasets generated by ms_load_product() are already in MacroSheds format.
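Since only site_code and var are required, a minimal input can be assembled by hand. The sketch below is illustrative only: the site and variable codes are assumed values (real ones come from ms_load_sites() and ms_load_variables()), and has_required_cols() is a hypothetical check, not a macrosheds function.

```r
# Illustrative values only: these site and variable codes are assumptions.
# Look up real codes with ms_load_sites() and ms_load_variables().
d_min <- data.frame(
    site_code = c("w1", "w1", "w2"),
    var = c("NO3_N", "discharge", "NO3_N"),
    stringsAsFactors = FALSE
)

# Hypothetical check (NOT part of macrosheds): confirm that the two
# columns required by ms_generate_attribution are present.
has_required_cols <- function(d) {
    is.data.frame(d) && all(c("site_code", "var") %in% names(d))
}

has_required_cols(d_min)
```

A frame like d_min could then be passed as d in place of a full time-series product.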

Note that the rules around data IR are still being worked out, and there is a lot of legal gray area around whether end users of data syntheses like MacroSheds are held to the same expectations as we were when we assembled MacroSheds. We recommend acknowledging/citing our primary sources in any case. Whether you adhere to other expectations, e.g. contacting primary sources to ask permission to use their data, we leave up to you, bearing in mind that you are using products derived from their data. If you're using much or all of the MacroSheds dataset for an analysis, it's not reasonable to ask you to contact 20 different institutions for various permissions. However, if you're only using one or a few MacroSheds domains in your analysis, it seems only right that you fulfill all of their IR clauses, just as if you were interacting directly with primary source data. In either case, pay special attention to the noncommercial and sharealike licenses attached to some of the MacroSheds domains. These licenses are legally black-and-white, and you can get in trouble if you disregard them.

Value

Returns a list. If write_to_dir is not provided, this list contains the full output:

  • acknowledgements: a string of acknowledgement text

  • bibliography: a vector of BibTeX entries

  • intellectual_rights_explanations: a vector of definitions pertaining to intellectual_rights_notifications

  • intellectual_rights_notifications: a list of tibbles containing special notifications

  • full_details_timeseries: a tibble containing full IR, URL, and contact information for each primary source time-series dataset

  • full_details_ws_attr: a tibble containing full IR, URL, and contact information for each primary source watershed attribute dataset

If write_to_dir is provided, this list contains only full_details_timeseries and full_details_ws_attr, and all other information is written to files in write_to_dir/macrosheds_attribution_information.
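A minimal sketch of working with the returned list when write_to_dir is NULL. The attrib object here is a mock stand-in with placeholder values, not real MacroSheds output; only the element names acknowledgements and bibliography are taken from the list above.

```r
# Mock stand-in for the returned list (placeholder values, NOT real output).
# In practice: attrib <- macrosheds::ms_generate_attribution(d)
attrib <- list(
    acknowledgements = "Placeholder acknowledgement text.",
    bibliography = c(
        "@misc{placeholder_a, title = {Placeholder dataset A}}",
        "@misc{placeholder_b, title = {Placeholder dataset B}}"
    )
)

# Save the BibTeX entries to a .bib file for use with a reference manager.
bib_path <- file.path(tempdir(), "ms_sources.bib")
writeLines(attrib$bibliography, bib_path)

# Pull out the acknowledgement text for pasting into a manuscript.
ack_text <- attrib$acknowledgements
```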

Author(s)

Mike Vlah, vlahm13@gmail.com

Wes Slaughter

See Also

ms_download_core_data()

Examples

# load stream chemistry for three domains
d1 <- macrosheds::ms_load_product(
    macrosheds_root = 'my/macrosheds/root/',
    prodname = 'stream_chemistry',
    domains = c('hbef', 'niwot', 'santee'))

# load discharge for one domain
d2 <- macrosheds::ms_load_product(
    macrosheds_root = 'my/macrosheds/root/',
    prodname = 'discharge',
    domains = 'hbef')

# d1 holds stream (not precip) chemistry, so chem_source is 'stream'
macrosheds::ms_generate_attribution(dplyr::bind_rows(d1, d2), chem_source = 'stream')

MacroSHEDS/macrosheds documentation built on Oct. 30, 2024, 11:15 a.m.