read_eddy: Data Input with Units

View source: R/Data_handling.R

read_eddy    R Documentation

Data Input with Units

Description

Reads tabular data from a file and represents them as a data frame. The attributes varnames (variable names) and units (units of measurement or space-efficient metadata) are assigned to each column.

Usage

read_eddy(
  file,
  header = TRUE,
  units = TRUE,
  sep = ",",
  quote = "\"",
  dec = ".",
  units_fill = "-",
  na.strings = c("NA", "-9999.0", "-9999"),
  colClasses = NA,
  nrows = -1,
  skip = 0,
  fill = TRUE,
  comment.char = "",
  check_input = TRUE,
  ...
)

Arguments

file

The name of the file with input data to be read. It can be a file name within the current working directory, a relative or absolute path, or a connection. See read.table for a more detailed description. Connections to an anonymous file or to the clipboard are not allowed. To read from the clipboard, use the string "clipboard" instead of a connection.

header

A logical value indicating whether the names of variables are included as the first line of the input file. If FALSE, the column names and the variable names stored in the varnames attribute are generated automatically.

units

A logical value indicating whether the units for the respective variables are included one line above the data region in the input file. If FALSE, the units attribute of each column is set to the units_fill string, which represents missing values.

sep

A character that separates the fields of the input. The default separator for CSV files is ",". See read.table for other options.

quote

A character string that contains the quoting characters.

dec

A character that specifies the decimal mark used in the input.

units_fill

A character string that represents a missing value of the units attribute.

na.strings

A character vector of strings representing NA values in the input file. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.

colClasses

A character vector of classes to be assumed for the columns, recycled as necessary. See read.table for a more detailed description.

nrows

An integer specifying the maximum number of rows to read in. Negative and other invalid values are ignored.

skip

An integer. The number of lines to skip in the input file before reading data.

fill

A logical value. If set to TRUE (default), rows of unequal length are padded with blank fields.

comment.char

A character that is interpreted as the start of a comment, or an empty string to turn off this behaviour.

check_input

A logical value that determines whether values in the input are checked for the erroneous "-10000" value. If TRUE (default), any "-10000" value encountered in the data triggers an error message.

...

Further arguments to be passed to the internal read.table function.

Details

read_eddy extends the possibilities of read.table so that it can also read units of measurement. However, it uses the default arguments of read.csv to accommodate loading of data for the most common input type. read_eddy also sets useful defaults common for eddy covariance (eddy) data. Missing values are often reported as "-9999.0" or "-9999" by post-processing software, therefore na.strings = c("NA", "-9999.0", "-9999") is used as the default.
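The effect of the na.strings default can be sketched with base R alone. The snippet below is an illustration using read.csv and a made-up two-column input; it does not call read_eddy itself. Both "-9999.0" and "-9999" are listed because the match is made on the text of each field, before conversion to numbers.

```r
# Made-up input in which missing values are coded as -9999 or -9999.0.
txt <- "H,LE
35.2,-9999
-9999.0,80.1"

# Without the eddy-style na.strings, the codes are read as ordinary numbers.
raw <- read.csv(text = txt)

# With the defaults described above, both spellings become NA.
coded <- read.csv(text = txt, na.strings = c("NA", "-9999.0", "-9999"))

str(raw)
str(coded)
```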

The varnames attribute contains the original variable name of the respective column, without the automated conversion that is applied to the column name. The main purpose of the varnames attribute is to provide control over the conversion of original column names and to keep the variable name of a vector when it is separated from the original data frame.
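As a rough illustration of the conversion in question, the base R function make.names is what read.table applies to headers when check.names = TRUE. The sketch below uses the headers from the Examples section; the ustar vector and its values are hypothetical, and the attribute is set by hand with attr rather than by read_eddy.

```r
# Original headers as they might appear in an eddy covariance file.
original <- c("u*", "(z-d)/L", "x_70%")

# make.names() turns them into syntactically valid R names.
converted <- make.names(original, unique = TRUE)

# Side-by-side view of original and converted names.
print(data.frame(original, converted))

# Hypothetical stand-alone vector that keeps its original name as an attribute.
ustar <- c(1.41, 0.87)
attr(ustar, "varnames") <- original[1]
attr(ustar, "varnames")
```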

Units are expected to be one line below the header in the input file. Instead of units of measurement, it is possible to include any space-efficient metadata that is relevant to the respective variables, e.g. the format of a timestamp or the structure of a coded variable. The data region starts one line below the units and continues to the end of the input file. Any missing values or blank fields (converted to empty strings) in the line interpreted as units are substituted by the units_fill string.
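The header, units and data layout described above can be mimicked with two calls to read.table. This is a minimal base R sketch of the idea, not the implementation used by read_eddy; the variable names head_lines, var_names and unit_vals are chosen for illustration, and "-" stands in for units_fill.

```r
# Sample input: header line, units line, then the data region.
txt <- "timestamp,height
%d.%m.%Y,m
24.1.2015,1.70
24.1.2016,1.72"

# Pass 1: read only the first two lines as character strings.
head_lines <- read.table(text = txt, sep = ",", nrows = 2, header = FALSE,
                         colClasses = "character", na.strings = "",
                         comment.char = "")
var_names <- unlist(head_lines[1, ], use.names = FALSE)
unit_vals <- unlist(head_lines[2, ], use.names = FALSE)

# Missing or blank units are replaced by a fill string (here "-").
unit_vals[is.na(unit_vals) | unit_vals == ""] <- "-"

# Pass 2: skip the header and units lines and read the data region.
dat <- read.table(text = txt, sep = ",", skip = 2, header = FALSE,
                  col.names = make.names(var_names))

# Attach both attributes column by column.
for (i in seq_along(dat)) {
  attr(dat[[i]], "varnames") <- var_names[i]
  attr(dat[[i]], "units") <- unit_vals[i]
}
str(dat)
```

Reading the first two lines separately keeps the units from forcing every data column to character, which would happen if the whole file were read in a single pass.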

The automated check for "-10000" values in the data region is enabled by check_input = TRUE (default) and produces an error message if the value is found. The "-10000" values can be introduced to the dataset when "-9999" values are rounded during an incorrect file conversion or data manipulation. Using check_input = FALSE skips the check, which could improve performance for large input files.
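A check of this kind can also be run by hand on a data frame that is already loaded. The sketch below is an assumption about how such a scan might look, not the check performed by read_eddy; find_rounded_na is a hypothetical helper and the data are made up.

```r
# Toy data frame in which one "-9999" code was rounded to "-10000".
dat <- data.frame(Tair = c(12.3, -10000, 14.1),
                  H = c(35.2, 40.8, NA))

# Hypothetical helper (not part of openeddy): names of columns holding -10000.
find_rounded_na <- function(x) {
  hit <- vapply(x, function(col) any(col == -10000, na.rm = TRUE), logical(1))
  names(x)[hit]
}

bad <- find_rounded_na(dat)
if (length(bad) > 0) {
  message("Columns containing -10000: ", paste(bad, collapse = ", "))
}
```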

Value

A data frame is produced with additional attributes varnames and units assigned to each respective column.

See Also

read.table for information about further arguments passed to read.table.

write_eddy to save a data frame with units attributes specified for each column.

Examples

## Storing timestamp metadata (format) and unit of height.
xx <- read_eddy(text =
"timestamp,height
%d.%m.%Y,m
24.1.2015,1.70
24.1.2016,1.72")
str(xx)
(varnames <- varnames(xx))
(units <- units(xx))

## Note that 'varnames' and 'units' attributes are dropped when you subset
## rows but unchanged if you subset columns:
str(xx[, 1])
str(yy <- xx[1, ])
varnames(yy) <- varnames
units(yy) <- units
str(yy)

## Computations with columns also drop 'varnames' and 'units' attributes:
xx$date <- as.Date(xx$timestamp, units(xx$timestamp))
str(xx)

## Varnames store the original header without automated conversions:
aa <- read_eddy(text =
"u*,(z-d)/L,x_70%
m s-1,-,m
1.412908015,-4.05E-02,153.7963035")
str(aa)

## header = FALSE and units = FALSE:
bb <- read_eddy(header = FALSE, units = FALSE, text =
"24.1.2015,1.70
24.1.2016,1.72")
str(bb)


lsigut/openeddy documentation built on Aug. 5, 2023, 12:25 a.m.