read_data: Read published data

read_data R Documentation

Read published data

Description

Read published data

Usage

read_data(
  id = NULL,
  parse_datetime = TRUE,
  unique_keys = FALSE,
  site = "all",
  startdate = NA,
  enddate = NA,
  package = "basic",
  check.size = FALSE,
  nCores = 1,
  forceParallel = FALSE,
  token = NA,
  neon.data.save.dir = NULL,
  neon.data.read.path = NULL,
  ...,
  from = NULL,
  format = "new"
)

Arguments

id

(character) Identifier of dataset to read. Identifiers are listed in the "id" column of the search_data() output. Older versions of datasets can be read, but a warning is issued.

parse_datetime

(logical) Parse datetime values if TRUE, otherwise return as character strings.

unique_keys

(logical) Whether to create globally unique primary keys (and associated foreign keys). Useful in maintaining referential integrity when working with multiple datasets. If TRUE, id is appended to each table's primary key and associated foreign key. Default is FALSE.

site

(character) For NEON data, a character vector of site codes to filter data on. Sites are listed in the "sites" column of the search_data() output. Defaults to "all", meaning all sites.

startdate

(character) For NEON data, the start date to filter on in the form YYYY-MM. Defaults to NA, meaning all available dates.

enddate

(character) For NEON data, the end date to filter on in the form YYYY-MM. Defaults to NA, meaning all available dates.

package

(character) For NEON data, either "basic" or "expanded", indicating which data package to download. Defaults to "basic".

check.size

(logical) For NEON data, should the user approve the total file size before downloading? Defaults to FALSE.

nCores

(integer) For NEON data, the number of cores to use when parallelizing the stacking procedure. Defaults to 1.

forceParallel

(logical) For NEON data, whether to force parallel processing even when the data volume to be processed does not meet the minimum requirements to run in parallel. Defaults to FALSE.

token

(character) For NEON data, a user-specific API token (generated within neon.datascience user accounts).

neon.data.save.dir

(character) For NEON data, an optional and experimental argument (i.e. may not be supported in future releases), indicating the directory where NEON source data should be saved upon download from the NEON API. Data are downloaded using neonUtilities::loadByProduct() and saved in this directory as an .rds file. The filename will follow the format <NEON data product ID>_<timestamp>.rds

neon.data.read.path

(character) For NEON data, an optional and experimental argument (i.e. may not be supported in future releases), defining a path to read in an .rds file of 'stacked NEON data' from neonUtilities::loadByProduct(). See details below for more information.

...

For NEON data, other arguments to neonUtilities::loadByProduct()

from

(character) Full path of file to be read (if .rds), or path to directory containing saved datasets (if .csv).

format

(character) Format of returned object, which can be: "new" (the new implementation) or "old" (the original implementation; deprecated). In the new format, the topmost level of nesting containing the "id" field has been moved to the same level as the "tables", "metadata", and "validation_issues" fields.

Details

Validation checks are applied to each dataset ensuring it complies with the ecocomDP model. A warning is issued when any validation checks fail. All datasets are returned, even if they fail validation.

Column classes are coerced to those defined in the ecocomDP specification.

Validation happens each time files are read, from source APIs or local environments.

Details for read_data() regarding NEON data: Reading data with an id that begins with "neon.ecocomdp" results in a query to download NEON data from the NEON Data Portal API using neonUtilities::loadByProduct(). If a query includes provisional data (or if you are not sure whether it does), we recommend saving a copy of the data in the original format provided by NEON in addition to the derived ecocomDP data package. To do this, provide a directory path using the neon.data.save.dir argument.

For example, the query my_ecocomdp_data <- read_data(id = "neon.ecocomdp.10022.001.001", neon.data.save.dir = "my_neon_data") will download the data for NEON Data Product ID DP1.10022.001 (ground beetles in pitfall traps) and convert it to the ecocomDP data model. In doing so, a copy of the original NEON download will be saved in the directory "my_neon_data" with the filename "DP1.10022.001_<timestamp>.RDS", and the derived data package in the ecocomDP format will be stored in your R environment in an object named "my_ecocomdp_data".

To reload a previously downloaded NEON dataset into the ecocomDP format, use my_ecocomdp_data <- read_data(id = "neon.ecocomdp.10022.001.001", neon.data.read.path = "my_neon_data/DP1.10022.001_<timestamp>.RDS").
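Because the saved filename contains a timestamp, reloading a download requires knowing the exact file name. The following is a minimal sketch of one way to locate the most recent saved file, assuming the files follow the <NEON data product ID>_<timestamp> naming pattern described above. The helper latest_neon_file() is hypothetical and is not part of ecocomDP or neonUtilities.

```r
# Hypothetical helper (not part of ecocomDP): return the most recently
# modified saved NEON file for a given data product ID in a directory.
latest_neon_file <- function(dir, product_id) {
  files <- list.files(dir, full.names = TRUE)
  # Keep files named <product_id>_<anything>.rds (extension case-insensitive)
  keep <- startsWith(basename(files), paste0(product_id, "_")) &
    grepl("\\.rds$", files, ignore.case = TRUE)
  files <- files[keep]
  if (length(files) == 0) {
    stop("No saved NEON files found for ", product_id, " in ", dir)
  }
  # Pick the file with the latest modification time
  files[which.max(as.numeric(file.info(files)$mtime))]
}

# Usage (requires ecocomDP and an earlier download; not run here):
# path <- latest_neon_file("my_neon_data", "DP1.10022.001")
# my_ecocomdp_data <- ecocomDP::read_data(
#   id = "neon.ecocomdp.10022.001.001",
#   neon.data.read.path = path
# )
```

The helper selects by file modification time rather than by parsing the timestamp in the name, since the exact timestamp format is not specified here.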

Provisional NEON data. Despite NEON's controlled data entry, errors are sometimes found in published data; for example, an analytical lab may adjust its calibration curve and re-calculate past analyses, or field scientists may discover a past misidentification. In these cases, Level 0 data are edited and the data are re-processed to Level 1 and re-published. Published data files include a time stamp in the file name; a new time stamp indicates data have been re-published and may contain differences from previously published data. Data are subject to re-processing at any time during an initial provisional period; data releases are never re-processed. All records downloaded from the NEON API will have a "release" field. For any provisional record, the value of this field will be "PROVISIONAL"; otherwise, this field will have a value indicating the version of the release to which the record belongs. More details can be found at https://www.neonscience.org/data-samples/data-management/data-revisions-releases.
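As a rough illustration of how the "release" field might be used, the sketch below flags provisional records in a small hand-made table. The column "uid" and the release tag "RELEASE-2023" are assumptions chosen for illustration; only the "release" field and the "PROVISIONAL" value come from the description above.

```r
# Illustrative records only: real NEON tables have many more columns, and
# the release tags shown here are assumed for the example.
records <- data.frame(
  uid = c("a1", "b2", "c3"),
  release = c("RELEASE-2023", "PROVISIONAL", "RELEASE-2023"),
  stringsAsFactors = FALSE
)

# Flag provisional records, which may still be re-processed by NEON
is_provisional <- records$release == "PROVISIONAL"

# Tabulate how many records belong to each release value
release_counts <- table(records$release)
print(release_counts)

if (any(is_provisional)) {
  message(
    sum(is_provisional), " provisional record(s) found; ",
    "consider keeping a copy of the original NEON download."
  )
}
```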

Value

(list) A dataset with the structure:

  • id - Dataset identifier

  • metadata - List of info about the dataset. NOTE: This object is under development and content may change in future releases.

  • tables - List of dataset tables as data.frames.

  • validation_issues - List of validation issues. If the dataset fails any validation checks, then descriptions of each issue are listed here.
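The structure above can be explored with ordinary list operations. The following sketch uses a hand-built stand-in rather than a real download, so it runs without network access; the table contents and the columns "observation_id" and "value" are made up for illustration, and only the four top-level field names are taken from the description above.

```r
# Hand-built stand-in with the documented top-level fields ("new" format).
# The contents are invented for illustration, not real data.
dataset <- list(
  id = "edi.193.5",
  metadata = list(),
  tables = list(
    observation = data.frame(
      observation_id = c("o1", "o2"),
      value = c(3, 7),
      stringsAsFactors = FALSE
    )
  ),
  validation_issues = list()
)

# Top-level fields of the dataset
names(dataset)

# Tables are stored as data.frames in a named list
observation <- dataset$tables$observation
nrow(observation)

# Datasets are returned even if validation fails, so check for issues
has_issues <- length(dataset$validation_issues) > 0
if (has_issues) {
  print(dataset$validation_issues)
}
```

Checking validation_issues explicitly is worthwhile because a failed validation only raises a warning and does not stop the read.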

Note

This function may not work between 01:00 - 03:00 UTC on Wednesdays due to regular maintenance of the EDI Data Repository.

Examples

## Not run: 
# Read from EDI
dataset <- read_data("edi.193.5")
str(dataset, max.level = 2)

# Read from NEON (full dataset)
dataset <- read_data("neon.ecocomdp.20120.001.001")

# Read from NEON with filters (partial dataset)
dataset <- read_data(
 id = "neon.ecocomdp.20120.001.001", 
 site = c("COMO", "LECO", "SUGG"),
 startdate = "2017-06", 
 enddate = "2019-09",
 check.size = FALSE)

# Read with datetimes as character
dataset <- read_data("edi.193.5", parse_datetime = FALSE)
is.character(dataset$tables$observation$datetime)

# Read from saved .rds
save_data(dataset, tempdir())
dataset <- read_data(from = paste0(tempdir(), "/dataset.rds"))

# Read from saved .csv
save_data(dataset, tempdir(), type = ".csv")  # Save as .csv
dataset <- read_data(from = tempdir())

## End(Not run)


ecocomDP documentation built on Sept. 11, 2024, 6:58 p.m.