agg_mean: Time series summarization

View source: R/Summarizing.R

agg_mean    R Documentation

Time series summarization

Description

Utilities that simplify aggregation of data and their uncertainties over defined time intervals.

Usage

agg_mean(x, format, breaks = NULL, interval = NULL, tz = "GMT", ...)

agg_fun(x, format, fun, breaks = NULL, interval = NULL, tz = "GMT", ...)

agg_sum(
  x,
  format,
  agg_per = NULL,
  breaks = NULL,
  interval = NULL,
  NEE_scor = TRUE,
  GPP_scor = FALSE,
  quant = grep("^PAR|^PPFD|^APAR", names(x), value = TRUE),
  power = grep("^GR|^Rg|^SW|^SR|^LW|^LR|^Rn|^NETRAD|^G$|^H|^LE", names(x), value = TRUE),
  carbon = grep("^NEE|^GPP|^Reco", names(x), value = TRUE),
  ET = grep("^ET", names(x), value = TRUE),
  tz = "GMT",
  ...
)

agg_fsd(
  x,
  format,
  agg_per = NULL,
  breaks = NULL,
  interval = NULL,
  quant = grep("^PAR|^PPFD|^APAR", names(x), value = TRUE),
  power = grep("^GR|^Rg|^SW|^SR|^LW|^LR|^Rn|^NETRAD|^G$|^H|^LE", names(x), value = TRUE),
  carbon = grep("^NEE", names(x), value = TRUE),
  ET = grep("^ET", names(x), value = TRUE),
  tz = "GMT"
)

agg_DT_SD(
  x,
  format,
  agg_per = NULL,
  breaks = NULL,
  interval = NULL,
  carbon = grep("^Reco|^GPP", names(x), value = TRUE),
  tz = "GMT"
)

Arguments

x

A data frame with a required timestamp column (x$timestamp) of class "POSIXt".

format

A character string specifying how x$timestamp is formatted for aggregation; it is passed to the internal strftime function.

breaks

A vector of cut points, or a number giving the number of intervals into which x$timestamp is to be cut, or an interval specification: one of "sec", "min", "hour", "day", "DSTday", "week", "month", "quarter" or "year", optionally preceded by an integer and a space, or followed by "s".

interval

A numeric value specifying the time interval (in seconds) of the generated date-time sequence. If NULL, interval autodetection is attempted.

tz

A character string specifying the time zone to be used for the conversion. System-specific (see as.POSIXlt or timezones), but "" is the current time zone, and "GMT" is UTC. Invalid values are most commonly treated as UTC, on some platforms with a warning.

...

Further arguments to be passed to the internal aggregate function.

fun

Either a function or a non-empty character string naming the function to be called.

agg_per

A character string providing the time interval of aggregation that will be appended to units (e.g. "hh-1", "week-1" or "month-1").

NEE_scor, GPP_scor

A logical value. Should sign correction of NEE (GPP) be performed? See Sign Correction in Details.

quant

A character vector listing variable names that require conversion from quantum to energy units before aggregation.

power

A character vector listing variable names that require conversion from power to energy units before aggregation.

carbon

A character vector listing variable names that require conversion from CO2 concentration to C mass flux units before aggregation.

ET

A character vector listing variable names that require conversion from hourly interval to actual measurement interval before aggregation. Designed for evapotranspiration (ET) typically reported in mm hour-1 for half-hourly measurements.

Details

agg_mean and agg_sum compute the mean and the sum, respectively, of all columns over intervals defined by format and/or breaks.

agg_fun applies an arbitrary function (e.g. min, max, median) over the defined time intervals. No unit conversions are attempted. Note that agg_mean(x, format) and agg_fun(x, format, mean) give identical results.
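The grouping idea behind these functions can be illustrated with base R alone. The following is a minimal sketch, not the openeddy implementation: the toy data frame and the helper name agg_sketch are hypothetical, and the helper simply groups records by a strftime-formatted timestamp and hands them to aggregate.

```r
# Toy half-hourly data spanning two days (illustrative values only)
x <- data.frame(
  timestamp = seq(as.POSIXct("2020-01-01 00:15:00", tz = "GMT"),
                  by = 1800, length.out = 96),
  Tair = rep(c(1, 3), each = 48)
)

# Hypothetical helper: apply `fun` within groups defined by `format`
agg_sketch <- function(x, format, fun, tz = "GMT", ...) {
  groups <- strftime(x$timestamp, format = format, tz = tz)
  aggregate(x[names(x) != "timestamp"],
            by = list(Intervals = groups), FUN = fun, ...)
}

# A function object and a function name are both accepted by aggregate
daily_mean <- agg_sketch(x, "%Y-%m-%d", mean)
daily_max  <- agg_sketch(x, "%Y-%m-%d", "max")
```

In this sketch, as in agg_fun, the function can be supplied either as an object (mean) or by name ("max").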

agg_fsd and agg_DT_SD estimate aggregated mean and summed uncertainties over defined time periods for the outputs of REddyProc package gap-filling and daytime-based flux partitioning, respectively. The uncertainty aggregation accounts for autocorrelation among records. It is performed only for autodetected columns with appropriate suffixes (see below). Note that the uncertainty products of agg_fsd and agg_DT_SD are reported as standard deviations (SD) and must be scaled to represent uncertainty bounds for a given confidence level (e.g. SD * 1.96 for the 95% confidence level).
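How autocorrelation enters the uncertainty of an aggregated mean can be sketched with the effective number of observations discussed by Bayley and Hammersley (1946) and Zieba and Ramza (2011). The code below is an illustration under stated assumptions, not the algorithm used by agg_fsd or agg_DT_SD: the helper n_eff_sketch, the fixed maximum lag, the simulated AR(1) residuals and the single-record SD of 0.5 are all hypothetical.

```r
# Hypothetical helper: effective number of observations for an
# autocorrelated series, n_eff = n / (1 + 2 * sum((1 - k / n) * rho_k))
n_eff_sketch <- function(res, max_lag = 10) {
  n <- length(res)
  rho <- acf(res, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  k <- seq_along(rho)
  n / (1 + 2 * sum((1 - k / n) * rho))
}

set.seed(1)
res <- as.numeric(arima.sim(list(ar = 0.6), n = 500))  # toy AR(1) residuals
n_eff <- n_eff_sketch(res)

sd_rec  <- 0.5                     # assumed SD of a single record
sd_mean <- sd_rec / sqrt(n_eff)    # SD of the aggregated mean
ci95    <- 1.96 * sd_mean          # half-width of an approximate 95% interval
```

With positively autocorrelated records, n_eff is smaller than the number of records, so the SD of the mean is larger than the value obtained when independence is assumed.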

The summarizations are done on a data frame x with a required timestamp column (x$timestamp) of class "POSIXt". With the exception of agg_mean, the timestamp must form a regular sequence without NAs because the time resolution is estimated from it.
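Whether a timestamp meets this requirement can be checked before aggregation. The following base R sketch is an assumption about how such a check might look, not the package's internal interval autodetection; the variable names are illustrative.

```r
# Half-hourly timestamps covering one day (illustrative)
ts <- seq(as.POSIXct("2020-01-01 00:15:00", tz = "GMT"),
          by = 1800, length.out = 48)

# Time steps between consecutive records, in seconds
steps <- as.numeric(diff(ts), units = "secs")

# A regular sequence has no NAs and exactly one distinct time step
interval   <- unique(steps)
is_regular <- !anyNA(ts) && length(interval) == 1
```

Here interval is 1800 seconds, the kind of value expected by the interval argument.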

The aggregation interval can be changed through the breaks and format arguments.

The data frame x can be cut into custom intervals using the breaks argument. Note that labels are constructed from the left-hand end of the intervals and converted to the "POSIXct" class. This can be useful when aggregating e.g. half-hourly data over hourly (breaks = "60 mins") or three-day (breaks = "3 days") intervals.
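The effect of an interval specification can be previewed with cut.POSIXt from base R, which is listed under See Also. This is a small sketch on made-up timestamps and does not call any openeddy function.

```r
# Six half-hourly timestamps starting at midnight (illustrative)
ts <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "GMT"),
          by = 1800, length.out = 6)

# Cut into hourly bins; each level is labelled by the left-hand end
hourly <- cut(ts, breaks = "60 mins")

# Number of half-hourly records falling into each hourly bin
counts <- table(hourly)
```

Each of the three hourly bins receives two half-hourly records.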

Formatting the timestamp (original or after cutting) through format is another (preferable) way to change the aggregation interval. For example, changing the original "POSIXt" time format ("%Y-%m-%d %H:%M:%S") to "%Y-%m-%d", "%W_%y", "%m-%y" or "%Y" results in daily, weekly, monthly or yearly aggregation intervals, respectively. Note that an improper format can suppress the expected effect of breaks.
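The grouping labels produced by these format strings can be inspected directly with strftime. The two timestamps below are arbitrary examples.

```r
ts <- as.POSIXct(c("2020-01-15 10:15:00", "2020-02-03 23:45:00"), tz = "GMT")

daily   <- strftime(ts, "%Y-%m-%d", tz = "GMT")  # "2020-01-15" "2020-02-03"
weekly  <- strftime(ts, "%W_%y", tz = "GMT")     # "02_20" "05_20"
monthly <- strftime(ts, "%m-%y", tz = "GMT")     # "01-20" "02-20"
yearly  <- strftime(ts, "%Y", tz = "GMT")        # "2020" "2020"
```

Records that share a label fall into the same aggregation interval, so a coarser format yields fewer and longer intervals.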

agg_fsd and agg_DT_SD require certain columns with defined suffixes in order to evaluate uncertainty correctly. These columns are produced by the REddyProc package gap-filling and flux partitioning methods and are documented here: https://www.bgc-jena.mpg.de/bgi/index.php/Services/REddyProcWebOutput. A detailed description of uncertainty aggregation is available here: https://github.com/bgctw/REddyProc/blob/master/vignettes/aggUncertainty.md.

agg_fsd requires columns with suffixes _fall, _orig, _fqc and _fsd for each variable.

agg_DT_SD requires corresponding columns with regexp patterns "^NEE_.*_orig$", "^NEE_.*_fqc$", "^Reco_DT_", "^GPP_DT_", "^Reco_DT_.*_SD$" and "^GPP_DT_.*_SD$".
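To see which columns such patterns pick up, they can be applied to a vector of column names with grep. The column names below are assumptions modelled on REddyProc-style output with a "uStar" suffix; they are not taken from an actual export.

```r
# Hypothetical column names of a results data frame
cols <- c("timestamp", "NEE_uStar_orig", "NEE_uStar_f", "NEE_uStar_fqc",
          "Reco_DT_uStar", "GPP_DT_uStar",
          "Reco_DT_uStar_SD", "GPP_DT_uStar_SD", "Tair_f")

# Patterns listed above for agg_DT_SD
patterns <- c("^NEE_.*_orig$", "^NEE_.*_fqc$", "^Reco_DT_", "^GPP_DT_",
              "^Reco_DT_.*_SD$", "^GPP_DT_.*_SD$")

# Column names matched by each pattern
matches <- lapply(patterns, grep, x = cols, value = TRUE)
names(matches) <- patterns

# TRUE if every pattern matched at least one column
all_present <- all(lengths(matches) > 0)
```

Note that "^Reco_DT_" and "^GPP_DT_" also match the corresponding _SD columns, because these patterns are anchored only at the start of the name.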

Value

agg_mean, agg_fun and agg_sum produce a data frame with attributes varnames and units assigned to each respective column.

agg_fsd and agg_DT_SD produce a list with two data frames, mean and sum, with attributes varnames and units assigned to each respective column, or a NULL value if the required columns are not recognized.

Each produced data frame has a first column called "Intervals", a factor of labels describing the aggregation periods, and a second column called "days", giving the fraction (or multiple) of days aggregated within each period.

Unit Conversion

When aggregating by summation, i.e. with agg_sum, agg_fsd and agg_DT_SD, appropriate unit conversion can be applied to the columns defined by the quant, power, carbon and ET arguments. The conversion factor used for approximate PAR conversion from umol m-2 s-1 to W m-2 is 4.57, as proposed by Thimijan and Heins (1983; Tab. 3, Lightsource - Sun and sky, daylight).
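The arithmetic behind such conversions can be sketched for a single half-hourly record. Only the factor 4.57 comes from the text above; the input and target units (MJ m-2 for power, g C m-2 for carbon), the molar mass constant and the numeric values are assumptions made for illustration and may differ from what the package applies.

```r
interval <- 1800  # seconds per record (half-hourly data, assumed)

# quant: PAR from umol m-2 s-1 to W m-2 using the factor 4.57
PAR_umol <- 914
PAR_W <- PAR_umol / 4.57                     # 200 W m-2

# power: W m-2 to MJ m-2 per record (assumed target unit)
Rg_W <- 200
Rg_MJ <- Rg_W * interval * 1e-6              # 0.36 MJ m-2

# carbon: umol CO2 m-2 s-1 to g C m-2 per record (assumed units,
# 12.0107 g mol-1 taken as the molar mass of carbon)
NEE_umol <- -10
NEE_gC <- NEE_umol * 12.0107e-6 * interval   # about -0.216 g C m-2

# ET: mm hour-1 to mm per record
ET_mm_hour <- 0.4
ET_mm <- ET_mm_hour * interval / 3600        # 0.2 mm
```

After conversion, each record holds an amount per measurement interval, so summing the records over an aggregation period gives a total for that period.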

Sign Correction

Although the sign convention used for measured NEE (Net Ecosystem Exchange) typically denotes negative fluxes as CO2 uptake, summed NEE is typically reported with the opposite sign convention and is assumed to converge to NEP (Net Ecosystem Production), especially over longer aggregation intervals. Similarly, estimated negative GPP (Gross Primary Production) typically denotes carbon sink but should be corrected to positive values if summed over a time period.

There is no reliable way to guess the sign convention used in a data set. Thus agg_sum lets the user specify whether NEE (NEE_scor) and/or GPP (GPP_scor) sign correction is required. By default NEE_scor = TRUE and GPP_scor = FALSE, following the sign conventions used in the REddyProc package. agg_sum automatically detects all NEE and GPP columns in x using regular expressions and applies the sign correction settings.
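The effect of the two switches can be mimicked on a toy data frame. This sketch only illustrates the idea: the column names, the values and the patterns "^NEE" and "^GPP" are assumptions, and the internal detection in agg_sum may differ.

```r
# Toy fluxes: negative NEE denotes uptake, GPP is already positive
x <- data.frame(NEE_uStar_f = c(-2, -1, 0.5),
                GPP_uStar_f = c(3, 2, 0))

NEE_scor <- TRUE   # flip the sign of NEE before summing
GPP_scor <- FALSE  # leave GPP unchanged

# Detect columns by name (assumed patterns)
NEE_cols <- grep("^NEE", names(x), value = TRUE)
GPP_cols <- grep("^GPP", names(x), value = TRUE)

if (NEE_scor) x[NEE_cols] <- -x[NEE_cols]
if (GPP_scor) x[GPP_cols] <- -x[GPP_cols]

totals <- colSums(x)
```

With these settings the summed NEE is 2.5 and the summed GPP is 5, i.e. net uptake is reported as a positive value.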

References

Bayley, G. and Hammersley, J., 1946. The "Effective" Number of Independent Observations in an Autocorrelated Time Series. Supplement to the Journal of the Royal Statistical Society, 8(2), 184-197. doi: https://doi.org/10.2307/2983560

Thimijan, R.W. and Heins, R.D., 1983. Photometric, Radiometric, and Quantum Light Units of Measure: A Review of Procedures for Interconversion. HortScience, 18(6), 818-822.

Zieba, A. and Ramza, P., 2011. Standard Deviation of the Mean of Autocorrelated Observations Estimated with the Use of the Autocorrelation Function Estimated From the Data. Metrology and Measurement Systems, 18(4), 529-542. doi: https://doi.org/10.2478/v10178-011-0052-x

See Also

aggregate, as.POSIXlt, cut.POSIXt, mean, regexp, strftime, sum, timezones, varnames

Examples

## Not run: 

library(REddyProc)
library(bigleaf)

# Load example dataset from REddyProc package and use selected variables
DETha98 <- fConvertTimeToPosix(Example_DETha98, 'YDH', Year = 'Year',
                               Day = 'DoY', Hour = 'Hour')[-(2:4)]
EProc <- sEddyProc$new('DE-Tha', DETha98,
                       c('NEE', 'LE', 'Rg', 'Tair', 'VPD', 'Ustar'))
names(DETha98)[1] <- "timestamp"

# Center timestamp to represent the middle of the averaging period
# - necessary for reliable data aggregation
DETha98$timestamp <- DETha98$timestamp - 60*15

# Aggregate by averaging
# - by default any NA value in an aggregation period produces NA
agg_mean(DETha98, "%b-%y")
agg_mean(DETha98, "%b-%y", na.rm = TRUE)

# Aggregate by summation
# - sign and unit conversions are demonstrated
(zz <- agg_sum(DETha98, "%b-%y", agg_per = "month-1"))
openeddy::units(zz, names = TRUE)

# Extract minimum and maximum within the intervals
# - two notations possible: a function (min) or function name ("max")
agg_fun(DETha98, "%b-%y", min, na.rm = TRUE)
agg_fun(DETha98, "%b-%y", "max", na.rm = TRUE)

# Gap-fill NEE using approximate fixed uStar threshold
EProc$sMDSGapFillAfterUstar('NEE', uStarTh = 0.3, FillAll = TRUE)

# Gap-fill all other selected variables
for (i in c('LE', 'Rg', 'Tair', 'VPD')) EProc$sMDSGapFill(i, FillAll = TRUE)

# Export results and convert latent heat (LE) to evapotranspiration (ET)
# - typical ET units are mm hour-1 independent of actual measurement interval
results <- cbind(DETha98["timestamp"], EProc$sExportResults())
LE_vars <- c("LE_orig", "LE_f", "LE_fqc", "LE_fall", "LE_fsd")
ET_vars <- gsub("LE", "ET", LE_vars)
results[, ET_vars] <-
  lapply(LE_vars,
         function(x) LE.to.ET(results[, x], results$Tair_f) * 3600)
openeddy::units(results[ET_vars]) <- rep("mm hour-1", length(ET_vars))

# Overwrite ET_fqc with proper values
results$ET_fqc <- results$LE_fqc
openeddy::units(results$ET_fqc) <- "-"

# Aggregate uncertainty derived from look-up table standard deviation (SD)
# - sign and unit conversions are demonstrated
(unc <- agg_fsd(results, "%b-%y", agg_per = "month-1"))
lapply(unc, openeddy::units, names = TRUE)

# Perform Lasslop et al. (2010) flux partitioning based on DayTime (DT) data
# - Reco and GPP uncertainty evaluation is available only for this method
# - Reichstein et al. (2005) Reco model uncertainty is not exported and
#   GPP is computed as residual (not modelled)
EProc$sSetLocationInfo(LatDeg = 51.0, LongDeg = 13.6, TimeZoneHour = 1)
EProc$sGLFluxPartition(suffix = "uStar")

# Aggregate uncertainty derived from SD of Reco and GPP models
# - unit conversions are demonstrated
results <- cbind(DETha98["timestamp"], EProc$sExportResults())
(unc_DT <- agg_DT_SD(results, "%b-%y", agg_per = "month-1"))
lapply(unc_DT, openeddy::units, names = TRUE)

## End(Not run)


lsigut/openeddy documentation built on Aug. 5, 2023, 12:25 a.m.