merge_eddy: Merge Regular Date-Time Sequence and Data Frames
In lsigut/openeddy: Post-process eddy covariance data with ease

merge_eddy

R Documentation

Merge Regular Date-Time Sequence and Data Frames

Description

Merge generated regular date-time sequence with single or multiple data frames.

Usage

merge_eddy(
  x,
  start = NULL,
  end = NULL,
  check_dupl = TRUE,
  interval = NULL,
  format = "%Y-%m-%d %H:%M",
  tz = "GMT"
)

Arguments

`x`	List of data frames, each with `"timestamp"` column of class `"POSIXt"`. Optionally with attributes `varnames` and `units` for each column.
`start`, `end`	A value specifying the first (last) value of the generated date-time sequence. If `NULL`, `min` (`max`) is taken across the values in the `"timestamp"` columns of all `x` elements. If numeric, the value specifies the year for which the first (last) date-time value will be generated, considering the given time `interval` and the convention of assigning measured records to the end of the time interval. Otherwise, a character representation of the specific half hour is expected, using the given `format` and `tz`.
`check_dupl`	A logical value specifying whether rows with duplicated date-time values, checked across `x` elements, should be excluded before merging.
`interval`	A numeric value specifying the time interval (in seconds) of the generated date-time sequence.
`format`	A character string. Format of `start` (`end`) if provided as a character string. The default `format` is `"%Y-%m-%d %H:%M"`.
`tz`	A time zone (see `time zones`) specification to be used for the conversion of `start` (`end`) if provided as a character string.
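
The numeric (year) form of `start` and `end` can be illustrated with a small base-R sketch. It is not taken from the openeddy source. It assumes that, with records assigned to the end of each interval, the first half-hourly record of a year is stamped 00:30 on 1 January and the last one 00:00 on 1 January of the following year. The names `year`, `first` and `last` are hypothetical.

```r
# Sketch under the stated assumption; not the openeddy implementation.
year <- 2021
interval <- 1800  # seconds, i.e. half-hourly records

# Assumed end-of-interval convention: the first record of the year closes
# the first half hour, the last record closes the final half hour.
first <- ISOdatetime(year, 1, 1, 0, 0, 0, tz = "GMT") + interval
last  <- ISOdatetime(year + 1, 1, 1, 0, 0, 0, tz = "GMT")

tstamp <- seq(first, last, by = interval)
length(tstamp)  # 17520 half hours in a non-leap year
format(range(tstamp), "%Y-%m-%d %H:%M", tz = "GMT")
```

Under this assumption, `start = 2021, end = 2021` would correspond to one full year of records.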

Details

The primary purpose of merge_eddy is to combine chunks of data vertically along their "timestamp" column, which holds the date-time information. This "timestamp" is expected to be regular with the given time interval. The resulting data frame contains added rows for expected date-time values that were missing in the "timestamp" column; the remaining columns of these rows are filled with NAs. If check_dupl = TRUE and "timestamp" values overlap across x elements, the detected duplicated rows are removed (the order in which duplicates are evaluated depends on the order of x elements). In the special case when x has only one element, merge_eddy fills in the missing date-time values in the "timestamp" column of the given data frame. The storage mode of the "timestamp" column is set to integer instead of double. This simplifies the application of round_df but could lead to unexpected behavior if the date-time information is expected to resolve fractional seconds.
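
The gap-filling behavior described above can be approximated in base R. The following is a minimal sketch, not the openeddy implementation: the toy data frame `df`, the `interval` value and the object names are made up for illustration.

```r
# Base-R sketch of gap filling; not the openeddy implementation.
interval <- 1800  # seconds (half-hourly records), assumed for illustration

# Toy data with the 01:00 record missing
df <- data.frame(
  timestamp = as.POSIXct(c("2021-03-20 00:30", "2021-03-20 01:30"),
                         format = "%Y-%m-%d %H:%M", tz = "GMT"),
  H = c(10.5, 12.1)
)

# Regular date-time sequence spanning the observed range
full <- data.frame(
  timestamp = seq(min(df$timestamp), max(df$timestamp), by = interval)
)

# Left join on the regular sequence: missing records become rows of NAs
filled <- merge(full, df, by = "timestamp", all.x = TRUE)

# Store the timestamp as integer seconds instead of double
storage.mode(filled$timestamp) <- "integer"

print(filled)
typeof(filled$timestamp)
```

With integer storage, any fractional seconds in the timestamp are lost, which is the caveat mentioned above.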

The list of data frames, each with a "timestamp" column, is sequentially merged using Reduce. A (full) outer join, i.e. merge(..., all = TRUE), is performed to keep all columns of x elements. The order of x elements can affect the result. Duplicated column names within x elements are corrected using make.unique. The merged data frame is then merged on the validated "timestamp" column, which can be either automatically extracted from x or manually specified.
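
A sequential full outer join of this kind can be sketched in base R as follows. This illustrates the technique only and is not the openeddy source: the data frames `a` and `b` are toy inputs.

```r
# Base-R sketch of a sequential full outer join; not the openeddy source.
t1 <- as.POSIXct("2021-03-20 00:30", format = "%Y-%m-%d %H:%M", tz = "GMT")
a <- data.frame(timestamp = t1 + 1800 * 0:1, H = c(1, 2))
b <- data.frame(timestamp = t1 + 1800 * 2:3, H = c(3, 4), LE = c(30, 40))

# With all = TRUE, rows and columns from both inputs are kept.
# merge() joins on all shared column names by default (here "timestamp"
# and "H"), so the chunks stack vertically and columns absent in one
# chunk are filled with NA.
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), list(a, b))
print(merged)

# make.unique() disambiguates repeated names by appending a numeric suffix
make.unique(c("timestamp", "H", "H"))
```

Here `LE` is NA for the rows that come from `a`, because `a` has no such column.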

For horizontal merging (adding columns instead of rows), check_dupl = FALSE must be set, although a simple merge may be preferable. Combining vertical and horizontal merging in one call should be avoided, as the result depends on the order of x elements and can lead to row duplication. Instead, data chunks from different data sources should first be merged vertically, each source separately, and then merged horizontally in a following step.
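
The recommended two-step workflow might look like the sketch below. It uses plain merge in place of merge_eddy so that it runs without openeddy. The helper `stamp` and the data frames `flux_1`, `flux_2`, `meteo_1` and `meteo_2` are hypothetical.

```r
# Base-R sketch of the two-step workflow; merge() stands in for merge_eddy.
t0 <- as.POSIXct("2021-03-20 00:30", format = "%Y-%m-%d %H:%M", tz = "GMT")
stamp <- function(i) t0 + 1800 * i  # hypothetical helper: i-th half hour

# Two chunks from each of two hypothetical data sources
flux_1  <- data.frame(timestamp = stamp(0:1), H = c(1, 2))
flux_2  <- data.frame(timestamp = stamp(2:3), H = c(3, 4))
meteo_1 <- data.frame(timestamp = stamp(0:1), Tair = c(5.1, 5.3))
meteo_2 <- data.frame(timestamp = stamp(2:3), Tair = c(5.6, 5.9))

# Step 1: vertical merge within each data source
flux  <- Reduce(function(x, y) merge(x, y, all = TRUE), list(flux_1, flux_2))
meteo <- Reduce(function(x, y) merge(x, y, all = TRUE), list(meteo_1, meteo_2))

# Step 2: horizontal merge across sources on the shared timestamp
combined <- merge(flux, meteo, by = "timestamp", all = TRUE)
print(combined)
```

Keeping the two steps separate means each timestamp appears once per source before the columns are joined, which avoids the row duplication mentioned above.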

Value

A data frame with attributes varnames and units for each column, containing date-time information in column "timestamp".

Examples

set.seed(123)
n <- 20 # number of half-hourly records in this example
tstamp <- seq(c(ISOdate(2021,3,20)), by = "30 mins", length.out = n)
x <- data.frame(
  timestamp = tstamp,
  H = rf(n, 1, 2, 1),
  LE = rf(n, 1, 2, 1),
  qc_flag = sample(c(0:2, NA), n, replace = TRUE)
)
openeddy::varnames(x) <- c("timestamp", "sensible heat", "latent heat",
                           "quality flag")
openeddy::units(x) <- c("-", "W m-2", "W m-2", "-")
str(x)
y1 <- ex(x, 1:10)
y2 <- ex(x, 11:20)
y <- merge_eddy(list(y1, y2))
str(y)
attributes(y$timestamp)
typeof(y$timestamp)

# Duplicated rows and different number of columns
z1 <- ex(x, 8:20, 1:3)
z <- merge_eddy(list(y1, z1))

lsigut/openeddy documentation built on Jan. 15, 2025, 8:14 a.m.