merge_eddy: Merge Regular Date-Time Sequence and Data Frames

View source: R/Data_handling.R

merge_eddy    R Documentation

Description

Merge generated regular date-time sequence with single or multiple data frames.

Usage

merge_eddy(
  x,
  start = NULL,
  end = NULL,
  check_dupl = TRUE,
  interval = NULL,
  format = "%Y-%m-%d %H:%M",
  tz = "GMT"
)

Arguments

x

List of data frames, each with "timestamp" column of class "POSIXt". Optionally with attributes varnames and units for each column.

start, end

A value specifying the first (last) value of the generated date-time sequence. If NULL, the min (max) is taken across the values in the "timestamp" columns of all x elements. If numeric, the value specifies the year for which the first (last) date-time value will be generated, considering the given time interval and the convention of assigning measured records to the end of the time interval. Otherwise, a character representation of a specific half hour is expected, in the given format and tz.
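The numeric (year) form of start and end can be illustrated with a minimal base R sketch. It assumes half-hourly data (interval = 1800) and that timestamps mark the end of each interval, as described above; year_seq is a hypothetical helper written for this illustration, not an openeddy function, and the exact limits produced by merge_eddy itself may differ.

```r
# Hypothetical helper (not part of openeddy): regular date-time sequence for
# one year, with each timestamp marking the END of its averaging interval.
year_seq <- function(year, interval = 1800, tz = "GMT") {
  # First record closes the first interval of the year (e.g. 00:30)
  first <- ISOdatetime(year, 1, 1, 0, 0, 0, tz = tz) + interval
  # Last record closes the last interval, i.e. midnight of the next year
  last <- ISOdatetime(year + 1, 1, 1, 0, 0, 0, tz = tz)
  seq(first, last, by = interval)
}

s <- year_seq(2021)
format(range(s), "%Y-%m-%d %H:%M", tz = "GMT")
length(s)  # 365 days * 48 half hours = 17520 records

# Character form: a specific half hour parsed with the default format and tz
strptime("2021-03-20 00:30", format = "%Y-%m-%d %H:%M", tz = "GMT")
```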

check_dupl

A logical value specifying whether rows with duplicated date-time values, checked across x elements, should be excluded before merging.

interval

A numeric value specifying the time interval (in seconds) of the generated date-time sequence.

format

A character string. Format of start (end) if provided as a character string. The default format is "%Y-%m-%d %H:%M".

tz

A time zone (see time zones) specification to be used for the conversion of start (end) if provided as a character string.

Details

The primary purpose of merge_eddy is to combine chunks of data vertically along their "timestamp" column, which holds the date-time information. This "timestamp" is expected to be regular with the given time interval. The resulting data frame contains added rows for expected date-time values that were missing in the "timestamp" column, with NAs in the remaining columns. If check_dupl = TRUE and "timestamp" values across x elements overlap, detected duplicated rows are removed (the order in which duplicates are evaluated depends on the order of x elements). As a special case, when x has only one element, merge_eddy fills missing date-time values in the "timestamp" column of the given data frame. The storage mode of the "timestamp" column is set to integer instead of double. This simplifies the application of round_df but could lead to unexpected behavior if the date-time information is expected to resolve fractional seconds.
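The gap filling and the integer storage mode described above can be approximated in base R. This is only a sketch of the idea under the assumption of half-hourly data; it does not reproduce the internals of merge_eddy, and the data frame d and its values are made up for illustration.

```r
# Half-hourly data with the 01:00 and 01:30 records missing
t0 <- as.POSIXct("2021-03-20 00:30", tz = "GMT")
d <- data.frame(timestamp = t0 + 1800 * c(0, 3, 4), H = c(10.5, 12.1, 9.8))

# Regular sequence spanning the observed range
full <- data.frame(
  timestamp = seq(min(d$timestamp), max(d$timestamp), by = 1800)
)

# Left join on the regular sequence: missing timestamps get NA in H
filled <- merge(full, d, by = "timestamp", all.x = TRUE)

# Store timestamps as integer seconds; fractional seconds would be lost
storage.mode(filled$timestamp) <- "integer"

filled
typeof(filled$timestamp)
```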

The list of data frames, each with a "timestamp" column, is sequentially merged using Reduce. A (full) outer join, i.e. merge(..., all = TRUE), is performed to keep all columns of x elements. The order of x elements can affect the result. Duplicated column names within x elements are corrected using make.unique. The merged data frame is then merged on the validated "timestamp" column, which can be either automatically extracted from x or manually specified.
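The sequential outer join can be sketched with base R alone. The data frames a and b below are made-up examples; the snippet shows the general Reduce and merge(..., all = TRUE) pattern and the effect of make.unique, not the exact code inside merge_eddy.

```r
t0 <- as.POSIXct("2021-03-20 00:30", tz = "GMT")
a <- data.frame(timestamp = t0 + 1800 * 0:1, H = c(1, 2))
b <- data.frame(timestamp = t0 + 1800 * 2:3, H = c(3, 4), LE = c(5, 6))

# Full outer join applied pairwise across the list; shared columns
# ("timestamp", "H") act as keys, and LE is NA for rows coming from a
merged <- Reduce(function(l, r) merge(l, r, all = TRUE), list(a, b))
merged

# Duplicated column names are disambiguated by appending a suffix
make.unique(c("H", "H", "LE"))
```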

For horizontal merging (adding columns instead of rows), check_dupl = FALSE must be set, although a simple merge may be preferable. Combining vertical and horizontal merging in a single call should be avoided, as the result depends on the order of x elements and can lead to row duplication. Instead, data chunks from different data sources should first be merged vertically for each source separately and then merged horizontally in a following step.
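The recommended two-step workflow can be sketched as follows. The example uses base R merge for both steps so that it runs without openeddy; the flux and meteo chunks are made-up data.

```r
t0 <- as.POSIXct("2021-03-20 00:30", tz = "GMT")

# Two data sources, each delivered in two chunks
flux1 <- data.frame(timestamp = t0 + 1800 * 0:1, H = c(1, 2))
flux2 <- data.frame(timestamp = t0 + 1800 * 2:3, H = c(3, 4))
met1 <- data.frame(timestamp = t0 + 1800 * 0:1, Tair = c(5, 6))
met2 <- data.frame(timestamp = t0 + 1800 * 2:3, Tair = c(7, 8))

# Step 1: vertical merge (adding rows) within each source
flux <- merge(flux1, flux2, all = TRUE)
met <- merge(met1, met2, all = TRUE)

# Step 2: horizontal merge (adding columns) across sources
out <- merge(flux, met, by = "timestamp", all = TRUE)
out
```

With openeddy loaded, step 1 would presumably be written as merge_eddy(list(flux1, flux2)) and merge_eddy(list(met1, met2)), followed by either merge or merge_eddy with check_dupl = FALSE for step 2.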

Value

A data frame with attributes varnames and units for each column, containing date-time information in column "timestamp".

See Also

merge, Reduce, strptime, time zones, make.unique

Examples

set.seed(123)
n <- 20 # number of half-hourly records to generate
tstamp <- seq(c(ISOdate(2021,3,20)), by = "30 mins", length.out = n)
x <- data.frame(
  timestamp = tstamp,
  H = rf(n, 1, 2, 1),
  LE = rf(n, 1, 2, 1),
  qc_flag = sample(c(0:2, NA), n, replace = TRUE)
)
openeddy::varnames(x) <- c("timestamp", "sensible heat", "latent heat",
                           "quality flag")
openeddy::units(x) <- c("-", "W m-2", "W m-2", "-")
str(x)
y1 <- ex(x, 1:10)
y2 <- ex(x, 11:20)
y <- merge_eddy(list(y1, y2))
str(y)
attributes(y$timestamp)
typeof(y$timestamp)

# Duplicated rows and different number of columns
z1 <- ex(x, 8:20, 1:3)
z <- merge_eddy(list(y1, z1))


lsigut/openeddy documentation built on Aug. 5, 2023, 12:25 a.m.