read_data: Import Data with Units


Description

Reads tabular data from a file and returns it as a data frame. The attributes varnames (variable names) and units (units of measurement or other space-efficient metadata) are assigned to each column.

Usage

read_data(file, header = TRUE, units = TRUE, sep = ",", dec = ".",
  quote = "\"", units_fill = "-", nrows = -1, skip = 0,
  na.strings = c("NA", "-9999.0", "-9999"), fill = TRUE,
  check_input = TRUE, correct = TRUE, comment.char = "",
  col_classes = NA, tz = metadata$tz_name, ...)

Arguments

file

The name of the input file to read. It can be a file name in the current working directory, a relative or absolute path, or a connection. See read.table for a more detailed description. Connections to an anonymous file or to the clipboard are not allowed. To read from the clipboard, use the string "clipboard" instead of a connection.

header

A logical value indicating whether the names of variables are included as the first line of the input file. If FALSE, column names and variable names of attribute varnames will be automatically generated.

units

A logical value indicating whether the units for respective variables are included one line above the data region in the input file. If FALSE, the units attribute of each column will be set to units_fill string representing missing values.

sep

A character that separates the fields of input. Default separator for CSV files is ",". See read.table for other options.

quote

A character string that contains the quoting characters.

units_fill

A character string that represents a missing value in the units attribute.

nrows

An integer specifying the maximum number of rows to read in. Negative and other invalid values are ignored.

skip

An integer. The number of lines to skip in the input file before reading data.

na.strings

A character vector of strings representing NA values in the input file. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.

check_input

A logical value that determines whether values in the input will be checked for the erroneous "-10000" value. If TRUE (default), any "-10000" value encountered in the data will trigger an error message.

correct

A logical value that determines if units and varnames should undergo standard formatting corrections. Defaults to TRUE and prints a message if names have been properly corrected.

comment.char

A character that is interpreted as the start of a comment, or an empty string to turn off this behaviour.

...

Further arguments passed to read.table

Details

read_data extends read.table so that it can also read units of measurement. However, it uses the default arguments of read.csv to accommodate the most common input type. read_data also sets defaults that are useful for eddy covariance (eddy) data. Missing values are often reported as "-9999.0" or "-9999" by post-processing software, so na.strings = c("NA", "-9999.0", "-9999") is used as the default.
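The effect of these na.strings defaults can be checked with base R alone. The sketch below uses read.csv rather than read_data so that it runs without the package; the column names and values are made up for illustration.

```r
# Base R illustration of the missing-value defaults described above.
# Column names and values are hypothetical.
csv_text <- c(
  "Tair,H,LE",
  "21.5,-9999,105.2",
  "NA,35.1,-9999.0"
)

# Same na.strings as the read_data default.
x <- read.csv(text = csv_text, na.strings = c("NA", "-9999.0", "-9999"))

# Each of the three placeholder fields is now NA.
is.na(x)
```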

The varnames attribute contains the original variable name without automated corrections or simplifications. This provides control over the conversion of original column names and keeps the variable names attached to vectors when they are separated from the original data frame.
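Why a separate varnames attribute is useful can be seen in a small base R sketch. It assumes that the automated corrections behave like make.names, which is what read.table applies when check.names = TRUE; the corrections made by read_data itself may differ. The header labels are hypothetical.

```r
# Hypothetical header labels as they might appear in an input file.
original <- c("co2 flux", "u*", "Tair")

# make.names() turns them into syntactically valid column names.
syntactic <- make.names(original, unique = TRUE)

df <- data.frame(c(1.2, 1.4), c(0.31, 0.28), c(21.5, 21.9))
names(df) <- syntactic

# Keep the untouched label on each column, mirroring the varnames attribute.
for (i in seq_along(df)) attr(df[[i]], "varnames") <- original[i]

names(df)                 # corrected column names
flux <- df[[1]]           # vector separated from the data frame
attr(flux, "varnames")    # original label travels with the vector
```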

Units are expected to be one line below the header in the input file. Instead of units of measurement, it is possible to include any space-efficient metadata that is relevant to the respective variables, e.g. the format of a timestamp or the structure of a coded variable. The data region starts one line below the units and continues to the end of the input file. Any missing values or blank fields (converted to empty strings) in the line interpreted as units will be replaced by the units_fill string.
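The expected layout (header, then units, then data) can be illustrated with base R. This is only a sketch of the idea under assumed file contents, not the implementation used by read_data; the file contents and the object names (hdr, unit_row, dat) are hypothetical.

```r
# Hypothetical input: header line, units line, then the data region.
# The units field of the last column is left blank on purpose.
lines <- c(
  "timestamp,Tair,H",
  "%Y-%m-%d %H:%M,degC,",
  "2019-06-04 07:30,21.5,35.1",
  "2019-06-04 08:00,21.9,-9999"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

units_fill <- "-"

# Read the first two lines as character so nothing is converted.
hdr <- read.table(path, sep = ",", nrows = 2, header = FALSE,
                  colClasses = "character")
var_names <- unlist(hdr[1, ], use.names = FALSE)
unit_row  <- unlist(hdr[2, ], use.names = FALSE)

# Blank or missing units are replaced by units_fill.
unit_row[is.na(unit_row) | unit_row == ""] <- units_fill

# Read the data region that starts below the units line.
dat <- read.table(path, sep = ",", skip = 2, header = FALSE,
                  col.names = var_names,
                  na.strings = c("NA", "-9999.0", "-9999"))

# Attach the units to each column.
for (i in seq_along(dat)) attr(dat[[i]], "units") <- unit_row[i]

sapply(dat, attr, "units")
```

Here the timestamp column carries its format string as "units", and the blank units field for H ends up as "-".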

The automated check for "-10000" values in the data region is enabled by check_input = TRUE (default) and produces an error message if the value is found. The "-10000" values can be introduced to the dataset when "-9999" values are rounded during incorrect file conversion or data manipulation. Using check_input = FALSE skips the check, which could improve performance for large input files.
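A check of this kind could look roughly like the base R sketch below. The helper check_minus_10000 is hypothetical and is not part of the package; it only inspects numeric columns, and the example data are made up. The demonstration also shows how rounding -9999 to three significant digits yields -10000.

```r
# Hypothetical helper, not the package's own check:
# stop with an error if any numeric column contains -10000.
check_minus_10000 <- function(x) {
  hit <- vapply(
    x,
    function(col) is.numeric(col) && any(col == -10000, na.rm = TRUE),
    logical(1)
  )
  if (any(hit)) {
    stop("-10000 found in column(s): ", paste(names(x)[hit], collapse = ", "))
  }
  invisible(TRUE)
}

# Rounding the -9999 placeholder to three significant digits gives -10000,
# which no longer matches the na.strings defaults.
dat <- data.frame(Tair = c(21.5, 21.9), H = c(35.1, signif(-9999, 3)))

# Capture the error message instead of aborting the script.
res <- tryCatch(check_minus_10000(dat), error = function(e) conditionMessage(e))
res
```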

Adapted from: openeddy - read_eddy

Value

A data frame is produced with additional attributes varnames and units assigned to each respective column.


grahamstewart12/tidyflux documentation built on June 4, 2019, 7:44 a.m.