neon_read | R Documentation |
Read in NEON tabular data
neon_read(
table = NA,
product = NA,
site = NA,
start_date = NA,
end_date = NA,
ext = NA,
timestamp = NA,
release = NA,
dir = neon_dir(),
files = NULL,
sensor_metadata = TRUE,
keep_filename = FALSE,
altrep = FALSE,
...
)
table |
the name of a downloaded NEON table in the store, see neon_index |
product |
A NEON |
site |
4-letter site code(s) to filter on. Leave as |
start_date |
Download only files as recent as ( |
end_date |
Download only files up to end_date ( |
ext |
only match files with this file extension(s) |
timestamp |
only match timestamps prior to this. See details in |
release |
Select only data files associated with a particular release tag, see https://www.neonscience.org/data-samples/data-management/data-revisions-releases, e.g. "RELEASE-2021". Releases are associated with a specific DOI and the promise that files associated with a particular release will not change. |
dir |
Location where files should be downloaded. By default will
use the appropriate applications directory for your system
(see |
files |
optionally, specify a vector of file paths directly (e.g. as
provided from neon_index) and specify |
sensor_metadata |
logical, default TRUE. Should we add metadata fields from file names of sensor data into the table? Adds DomainID, SiteID, horizontalPosition, verticalPosition, and publicationDate. Results in slower parsing. |
keep_filename |
Should we include a column indicating the original
file name for each row? Can be a useful source of additional metadata that
NEON may omit from the raw files (i.e. |
altrep |
enable or disable altrep. Logical, default |
... |
additional arguments to vroom::vroom, can usually be omitted. |
NEON's tabular data files are split into separate .csv
files for each site and each month of sampling. In principle,
each file has identical columns. vroom::vroom can read a
data table that has been sharded into many files like this
much faster than other parsers can read each file iteratively
(and thus can greatly outperform the "stacking" methods in
neonUtilities).
When reading in very large numbers of files, it may be helpful to set
altrep = FALSE to opt out of vroom's fast altrep mechanism, which
can cause neon_read() to fail when stacking thousands of files.
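As a minimal sketch of the multi-file read described above (not a call into neonstore itself), the snippet below writes a few made-up shard files to a temporary directory and stacks them in a single vroom::vroom call with altrep = FALSE. The file names, column names, and values are illustrative assumptions, and a base R fallback is included in case vroom is not installed.

```r
# Write three small, made-up shard files with identical columns.
shard_dir <- tempfile("neon_shards_")
dir.create(shard_dir)
for (site in c("CRAM", "SUGG", "BARC")) {
  write.csv(
    data.frame(siteID = site, value = c(1.5, 2.5)),
    file.path(shard_dir, paste0(site, ".csv")),
    row.names = FALSE
  )
}
files <- list.files(shard_dir, pattern = "\\.csv$", full.names = TRUE)

if (requireNamespace("vroom", quietly = TRUE)) {
  # A single call reads and stacks every shard.
  # altrep = FALSE parses all columns eagerly instead of lazily.
  stacked <- vroom::vroom(
    files,
    delim = ",",
    col_types = "cd",   # siteID as character, value as double
    altrep = FALSE,
    progress = FALSE
  )
} else {
  # Fallback without vroom: read each file in turn, then row-bind.
  stacked <- do.call(rbind, lapply(files, read.csv))
}
```

Either branch should yield one table with the rows of all three shards; the vroom branch is the one that corresponds to the approach described here.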
Unfortunately, not all datasets are entirely consistent in their use
of columns. neon_read works around this by parsing such tables in
groups of matching schema, which is still reasonably fast.
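The idea of parsing in groups of matching schema can be sketched in base R as follows. This illustrates the general technique only and is not neon_read's actual implementation; the shard files, the column names, and the choice to fill missing columns with NA are all assumptions made for the example.

```r
# Made-up shards: two share a schema, the third has an extra column.
tmp <- tempfile("schema_shards_")
dir.create(tmp)
write.csv(data.frame(siteID = "CRAM", value = 1:2),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(siteID = "SUGG", value = 3:4),
          file.path(tmp, "b.csv"), row.names = FALSE)
write.csv(data.frame(siteID = "SUGG", value = 5L, qf = 0L),
          file.path(tmp, "c.csv"), row.names = FALSE)
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Use each file's header line as a key, so that files with
# identical columns fall into the same group.
schema_key <- vapply(files, function(f) readLines(f, n = 1), character(1))
groups <- split(files, schema_key)

# Within a group the columns match, so the files stack directly.
stacked_groups <- lapply(groups, function(fs) {
  do.call(rbind, lapply(fs, read.csv))
})

# Combine the groups, adding any column a group lacks as NA.
all_cols <- unique(unlist(lapply(stacked_groups, names)))
filled <- lapply(stacked_groups, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
})
combined <- do.call(rbind, unname(filled))
```

Only one parse pass per schema group is needed, which is why this stays reasonably fast when most files agree on their columns.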
NEON sensor data products currently do not include important metadata columns
containing DomainID, SiteID, horizontalPosition, verticalPosition, and
publicationDate in the data files themselves; this information is encoded
only in the raw file names. Although these values are shared across all rows
of a raw data file, the information is lost when stacking the tables unless
explicit columns are added to the data. Adding them requires parsing the files
one by one, which is much slower. By default this information is added to
the table, altering the stacked table schema from that of the raw table.
Disable this behavior by setting sensor_metadata = FALSE. Future
NEON sensor data products may start including this information in
the raw data files, as is already the case for observational data.
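To make the file-name metadata concrete, here is a rough base R sketch of recovering those five fields from a single sensor file name and attaching them to that file's rows. The file name is made up, the dot-separated field positions are an assumption about the naming layout, and parse_sensor_filename is a hypothetical helper rather than a neonstore function.

```r
# Made-up file name. Assumed layout (dot-separated):
# NEON.<domain>.<site>.<level>.<product>.<rev>.<hor>.<ver>.<interval>.
#   <table>.<month>.<package>.<timestamp>.csv
path <- paste0(
  "NEON.D03.SUGG.DP1.20288.001.103.100.100.",
  "waq_instantaneous.2019-01.expanded.20190618T162513Z.csv"
)

# Hypothetical helper: split the file name and pick out fields by position.
parse_sensor_filename <- function(path) {
  parts <- strsplit(basename(path), ".", fixed = TRUE)[[1]]
  data.frame(
    DomainID = parts[2],
    SiteID = parts[3],
    horizontalPosition = parts[7],
    verticalPosition = parts[8],
    publicationDate = parts[13],
    stringsAsFactors = FALSE
  )
}

meta <- parse_sensor_filename(path)

# Stand-in for the rows read from that one file.
with_meta <- data.frame(value = c(7.1, 7.3))

# Every row of a file shares the same metadata, so each
# single value is recycled down the new column.
for (nm in names(meta)) with_meta[[nm]] <- meta[[nm]]
```

Because the metadata differs from file to file, this step has to run once per file before stacking, which is the source of the slowdown noted above.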
neon_read("brd_countdata-expanded")
## Sensor inputs will add metadata columns by default
neon_read("waq_instantaneous", site = c("CRAM","SUGG"))