agg_mean | R Documentation |
Utilities that simplify aggregation of data and their uncertainties over defined time intervals.
agg_mean(x, format, breaks = NULL, interval = NULL, tz = "GMT", ...)
agg_fun(x, format, fun, breaks = NULL, interval = NULL, tz = "GMT", ...)
agg_sum(
  x,
  format,
  agg_per = NULL,
  breaks = NULL,
  interval = NULL,
  NEE_scor = TRUE,
  GPP_scor = FALSE,
  quant = grep("^PAR|^PPFD|^APAR", names(x), value = TRUE),
  power = grep("^GR|^Rg|^SW|^SR|^LW|^LR|^Rn|^NETRAD|^G$|^H|^LE", names(x), value = TRUE),
  carbon = grep("^NEE|^GPP|^Reco", names(x), value = TRUE),
  ET = grep("^ET", names(x), value = TRUE),
  tz = "GMT",
  ...
)
agg_fsd(
  x,
  format,
  agg_per = NULL,
  breaks = NULL,
  interval = NULL,
  quant = grep("^PAR|^PPFD|^APAR", names(x), value = TRUE),
  power = grep("^GR|^Rg|^SW|^SR|^LW|^LR|^Rn|^NETRAD|^G$|^H|^LE", names(x), value = TRUE),
  carbon = grep("^NEE", names(x), value = TRUE),
  ET = grep("^ET", names(x), value = TRUE),
  tz = "GMT"
)
agg_DT_SD(
  x,
  format,
  agg_per = NULL,
  breaks = NULL,
  interval = NULL,
  carbon = grep("^Reco|^GPP", names(x), value = TRUE),
  tz = "GMT"
)
x |
A data frame with required timestamp column ( |
format |
A character string specifying |
breaks |
A vector of cut points or number giving the number of intervals
which |
interval |
A numeric value specifying the time interval (in seconds) of
the generated date-time sequence. If |
tz |
A character string specifying the time zone to be used for the
conversion. System-specific (see |
... |
Further arguments to be passed to the internal
|
fun |
Either a function or a non-empty character string naming the function to be called. |
agg_per |
A character string providing the time interval of aggregation
that will be appended to units (e.g. |
NEE_scor , GPP_scor |
A logical value. Should sign correction of NEE (GPP) be performed? See Sign Correction in Details. |
quant |
A character vector listing variable names that require conversion from quantum to energy units before aggregation. |
power |
A character vector listing variable names that require conversion from power to energy units before aggregation. |
carbon |
A character vector listing variable names that require conversion from CO2 concentration to C mass flux units before aggregation. |
ET |
A character vector listing variable names that require conversion from hourly interval to actual measurement interval before aggregation. Designed for evapotranspiration (ET) typically reported in mm hour-1 for half-hourly measurements. |
agg_mean and agg_sum compute mean and sum over intervals defined by format and/or breaks for all columns.
agg_fun applies any function over defined time intervals (e.g. min, max, median). No unit conversions are attempted. Note that agg_mean(x, format) and agg_fun(x, format, mean) are identical.
agg_fsd and agg_DT_SD estimate aggregated mean and summed uncertainties over defined time periods for REddyProc package gap-filling and daytime-based flux partitioning outputs, respectively. The uncertainty aggregation accounts for autocorrelation among records. It is performed only for autodetected columns with appropriate suffixes (see below). Note that the uncertainty products of agg_fsd and agg_DT_SD are reported as standard deviations (SD) and require further correction to represent uncertainty bounds for a given confidence level (e.g. SD * 1.96 for the 95% confidence level).
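The effect of autocorrelation on the uncertainty of a mean can be illustrated with a minimal base R sketch. It uses the textbook effective number of observations described by Zieba and Ramza (2011) and is only an illustration: n_eff is a hypothetical helper, the truncation at the first non-positive autocorrelation estimate is an assumption, and the actual computation performed by agg_fsd and agg_DT_SD may differ.

```r
# Hypothetical helper (not part of openeddy or REddyProc):
# effective number of independent observations in an autocorrelated series
n_eff <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # empirical autocorrelation for lags 1 .. n-1 (lag 0 is dropped)
  rho <- stats::acf(x, lag.max = n - 1, plot = FALSE)$acf[-1]
  # assumption: use only lags before the first non-positive estimate
  first_nonpos <- which(rho <= 0)[1]
  k_max <- if (is.na(first_nonpos)) length(rho) else first_nonpos - 1
  k <- seq_len(k_max)
  n / (1 + 2 * sum((1 - k / n) * rho[k]))
}

set.seed(1)
x <- as.numeric(stats::arima.sim(list(ar = 0.8), n = 500))
ne <- n_eff(x)
sd_mean <- sd(x) / sqrt(ne)            # SD of the mean using n_eff, not n
bounds95 <- mean(x) + c(-1.96, 1.96) * sd_mean
```

Because n_eff never exceeds the number of records, the resulting SD of the mean is at least as large as the naive sd(x) / sqrt(n).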
The summarizations are done on a data frame x with a required timestamp column (x$timestamp) of class "POSIXt". With the exception of agg_mean, the timestamp must form a regular sequence without NAs because the time resolution is estimated from it.
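Whether a timestamp meets this requirement can be checked before aggregation. The following is a minimal base R sketch; is_regular is a hypothetical helper and not a function of this package.

```r
# Hypothetical helper: TRUE if the timestamps contain no NA and are
# equally spaced
is_regular <- function(timestamp) {
  if (anyNA(timestamp)) return(FALSE)
  steps <- diff(as.numeric(timestamp))  # step lengths in seconds
  length(unique(steps)) == 1
}

ts <- seq(as.POSIXct("2020-01-01 00:15:00", tz = "GMT"),
          by = "30 mins", length.out = 48)
is_regular(ts)       # complete half-hourly sequence
is_regular(ts[-10])  # one record removed, so the sequence has a gap
```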
The aggregation interval can be changed through the breaks and format arguments.
The data frame x can be cut into custom intervals using the breaks argument. Note that labels are constructed from the left-hand end of the intervals and converted to "POSIXct" class. This can be useful when aggregating e.g. half-hourly data over hourly (breaks = "60 mins") or three-day (breaks = "3 days") intervals.
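The grouping produced by breaks can be previewed with cut from base R. This is a sketch under the assumption that breaks is handled in the same way as in cut.POSIXt (listed under See Also); the timestamps and values are made up for illustration.

```r
# Six half-hourly records with timestamps centered within each half-hour
ts <- seq(as.POSIXct("2020-01-01 00:15:00", tz = "GMT"),
          by = "30 mins", length.out = 6)
value <- 1:6

# Cut into hourly intervals; levels are labelled by the left-hand end
hourly <- cut(ts, breaks = "60 mins")
levels(hourly)

# Mean of the records falling into each hourly interval
hourly_mean <- tapply(value, hourly, mean)
```

Each hourly interval here collects two half-hourly records.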
Formatting the timestamp (original or after cutting) using format is another (preferable) way to change aggregation intervals. For example, changing the original "POSIXt" time format ("%Y-%m-%d %H:%M:%S") to "%Y-%m-%d", "%W_%y", "%m-%y" or "%Y" will result in daily, weekly, monthly or yearly aggregation intervals, respectively. Note that an improper format can suppress the expected effect of breaks.
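How a format string defines the aggregation groups can be sketched with base R alone. The example below is an illustration with made-up data and does not call the agg_* functions: records sharing the same formatted label fall into the same interval.

```r
# Eight half-hourly records spanning the change from January to February
ts <- seq(as.POSIXct("2020-01-31 22:15:00", tz = "GMT"),
          by = "30 mins", length.out = 8)
value <- 1:8

# Daily and monthly labels derived from the same timestamps
daily   <- tapply(value, format(ts, "%Y-%m-%d"), mean)
monthly <- tapply(value, format(ts, "%m-%y"), mean)
daily
monthly
```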
agg_fsd and agg_DT_SD require certain columns with defined suffixes in order to evaluate uncertainty correctly. These columns are a product of REddyProc package gap-filling and flux partitioning methods and are documented here: https://www.bgc-jena.mpg.de/bgi/index.php/Services/REddyProcWebOutput. A detailed description of uncertainty aggregation is available here: https://github.com/bgctw/REddyProc/blob/master/vignettes/aggUncertainty.md.
agg_fsd requires columns with suffixes _fall, _orig, _fqc and _fsd for each variable.
agg_DT_SD requires corresponding columns with regexp patterns "^NEE_.*_orig$", "^NEE_.*_fqc$", "^Reco_DT_", "^GPP_DT_", "^Reco_DT_.*_SD$" and "^GPP_DT_.*_SD$".
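Whether a data frame provides the columns expected by agg_DT_SD can be checked against the patterns above. This is a minimal sketch: the column names are illustrative (following the usual REddyProc naming with the uStar suffix) and the check is not the internal detection code of the function.

```r
# Illustrative column names of a hypothetical results data frame
cols <- c("timestamp", "NEE_uStar_orig", "NEE_uStar_fqc",
          "Reco_DT_uStar", "GPP_DT_uStar",
          "Reco_DT_uStar_SD", "GPP_DT_uStar_SD")

# Patterns listed in the documentation of agg_DT_SD
patterns <- c("^NEE_.*_orig$", "^NEE_.*_fqc$", "^Reco_DT_", "^GPP_DT_",
              "^Reco_DT_.*_SD$", "^GPP_DT_.*_SD$")

# For each pattern, is there at least one matching column?
found <- vapply(patterns, function(p) any(grepl(p, cols)), logical(1))
found
missing_patterns <- patterns[!found]  # empty if all columns are present
```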
agg_mean, agg_fun and agg_sum produce a data frame with attributes varnames and units assigned to each respective column.
agg_fsd and agg_DT_SD produce a list with two data frames, mean and sum, with attributes varnames and units assigned to each respective column, or a NULL value if the required columns are not recognized.
Each produced data frame has a first column called "Intervals" with a vector of labels describing the aggregation period, provided as a factor, and a second column "days" providing the fraction (or multiple) of days aggregated within each period.
In case of aggregation using sum, i.e. agg_sum, agg_fsd and agg_DT_SD, appropriate unit conversion can be applied to columns defined by the quant, power, carbon and ET arguments. The conversion factor used for approximate PAR conversion from umol m-2 s-1 to W m-2 is 4.57 as proposed by Thimijan and Heins (1983; Tab. 3, Lightsource - Sun and sky, daylight).
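The arithmetic behind such conversions can be sketched for half-hourly data as follows. The values, the assumed record length of 1800 s and the choice of MJ m-2 as the resulting energy unit are assumptions made for illustration; the units actually reported by the functions are not restated here.

```r
interval <- 1800  # seconds per record (assumed half-hourly data)

# Quantum to energy units: umol m-2 s-1 -> W m-2 using the factor 4.57,
# then W m-2 -> J m-2 per record, summed and expressed in MJ m-2
PPFD    <- c(0, 457, 914)                   # umol m-2 s-1
PPFD_W  <- PPFD / 4.57                      # W m-2
PPFD_MJ <- sum(PPFD_W * interval) * 1e-6    # MJ m-2

# ET reported in mm hour-1 -> mm per half-hourly record, then summed
ET    <- c(0.1, 0.2, 0.3)                   # mm hour-1
ET_mm <- sum(ET * interval / 3600)          # mm
```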
Although the sign convention used for measured NEE (Net Ecosystem Exchange) typically denotes negative fluxes as CO2 uptake, summed NEE is typically reported with the opposite sign convention and is assumed to converge to NEP (Net Ecosystem Production), especially over longer aggregation intervals. Similarly, estimated negative GPP (Gross Primary Production) typically denotes carbon sink but should be corrected to positive values if summed over a time period.
There is no reliable way to guess the sign convention used in the data set. Thus agg_sum allows the user to specify whether NEE (NEE_scor) and/or GPP (GPP_scor) sign correction is required. By default NEE_scor = TRUE and GPP_scor = FALSE, considering the sign conventions used in the REddyProc package. agg_sum automatically detects all NEE and GPP columns in x using regular expressions and applies the sign correction settings.
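The combined effect of summation, carbon unit conversion and NEE sign correction can be sketched in base R. The values are made up, and the molar mass of carbon (12.011 g mol-1) and the half-hourly record length are assumptions of this illustration rather than a statement of the exact factors used by agg_sum.

```r
interval <- 1800             # seconds per record (assumed half-hourly data)
NEE <- c(-10, -5, 2, 1)      # umol CO2 m-2 s-1, negative values = uptake

# umol CO2 m-2 s-1 -> g C m-2 per record:
# seconds per record * 1e-6 mol umol-1 * 12.011 g C mol-1 (assumed)
NEE_gC <- NEE * interval * 12.011e-6

NEE_sum <- sum(NEE_gC)       # summed NEE, negative for net uptake
NEP_sum <- -NEE_sum          # after sign correction, positive for net uptake
```

With sign correction applied, a net carbon sink is reported as a positive sum, which corresponds to the NEP convention described above.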
Bayley, G. and Hammersley, J., 1946. The "Effective" Number of Independent Observations in an Autocorrelated Time Series. Supplement to the Journal of the Royal Statistical Society, 8(2), 184-197. doi: https://doi.org/10.2307/2983560
Thimijan, R.W. and Heins R.D., 1983. Photometric, Radiometric, and Quantum Light Units of Measure: A Review of Procedures for Interconversion. Horticultural Science, Vol. 18(6), 818-822.
Zieba, A. and Ramza, P., 2011. Standard Deviation of the Mean of Autocorrelated Observations Estimated with the Use of the Autocorrelation Function Estimated From the Data. Metrology and Measurement Systems, 18(4), 529-542. doi: https://doi.org/10.2478/v10178-011-0052-x
aggregate, as.POSIXlt, cut.POSIXt, mean, regexp, strftime, sum, timezones, varnames
## Not run:
library(REddyProc)
library(bigleaf)
# Load example dataset from REddyProc package and use selected variables
DETha98 <- fConvertTimeToPosix(Example_DETha98, 'YDH', Year = 'Year',
                               Day = 'DoY', Hour = 'Hour')[-(2:4)]
EProc <- sEddyProc$new('DE-Tha', DETha98,
                       c('NEE', 'LE', 'Rg', 'Tair', 'VPD', 'Ustar'))
names(DETha98)[1] <- "timestamp"
# Center timestamp to represent the middle of the averaging period
# - necessary for reliable data aggregation
DETha98$timestamp <- DETha98$timestamp - 60*15
# Aggregate by averaging
# - by default any NA value in an aggregation period produces NA
agg_mean(DETha98, "%b-%y")
agg_mean(DETha98, "%b-%y", na.rm = TRUE)
# Aggregate by summation
# - sign and unit conversions are demonstrated
(zz <- agg_sum(DETha98, "%b-%y", agg_per = "month-1"))
openeddy::units(zz, names = TRUE)
# Extract minimum and maximum within the intervals
# - two notations possible: a function (min) or function name ("max")
agg_fun(DETha98, "%b-%y", min, na.rm = TRUE)
agg_fun(DETha98, "%b-%y", "max", na.rm = TRUE)
# Gap-fill NEE using approximate fixed uStar threshold
EProc$sMDSGapFillAfterUstar('NEE', uStarTh = 0.3, FillAll = TRUE)
# Gap-fill all other selected variables
for (i in c('LE', 'Rg', 'Tair', 'VPD')) EProc$sMDSGapFill(i, FillAll = TRUE)
# Export results and convert latent heat (LE) to evapotranspiration (ET)
# - typical ET units are mm hour-1 independent of actual measurement interval
results <- cbind(DETha98["timestamp"], EProc$sExportResults())
LE_vars <- c("LE_orig", "LE_f", "LE_fqc", "LE_fall", "LE_fsd")
ET_vars <- gsub("LE", "ET", LE_vars)
results[, ET_vars] <-
  lapply(LE_vars,
         function(x) LE.to.ET(results[, x], results$Tair_f) * 3600)
openeddy::units(results[ET_vars]) <- rep("mm hour-1", length(ET_vars))
# Overwrite ET_fqc with proper values
results$ET_fqc <- results$LE_fqc
openeddy::units(results$ET_fqc) <- "-"
# Aggregate uncertainty derived from look-up table standard deviation (SD)
# - sign and unit conversions are demonstrated
(unc <- agg_fsd(results, "%b-%y", agg_per = "month-1"))
lapply(unc, openeddy::units, names = TRUE)
# Perform Lasslop et al. (2010) flux partitioning based on DayTime (DT) data
# - Reco and GPP uncertainty evaluation is available only for this method
# - Reichstein et al. (2005) Reco model uncertainty is not exported and
#   GPP is computed as residual (not modelled)
EProc$sSetLocationInfo(LatDeg = 51.0, LongDeg = 13.6, TimeZoneHour = 1)
EProc$sGLFluxPartition(suffix = "uStar")
# Aggregate uncertainty derived from SD of Reco and GPP models
# - unit conversions are demonstrated
results <- cbind(DETha98["timestamp"], EProc$sExportResults())
(unc_DT <- agg_DT_SD(results, "%b-%y", agg_per = "month-1"))
lapply(unc_DT, openeddy::units, names = TRUE)
## End(Not run)