mc_agg: Aggregate data by function


mc_agg    R Documentation

Aggregate data by function

Description

mc_agg has two basic uses:

  • aggregate (upscale) the time step of microclimatic records using a specified function (e.g. 15-min records to daily mean);

  • convert a myClim object from Raw-format to Agg-format (see myClim-package) without modifying the time series; this happens when fun=NULL and period=NULL.

Usage

mc_agg(
  data,
  fun = NULL,
  period = NULL,
  use_utc = TRUE,
  percentiles = NULL,
  min_coverage = 1,
  custom_start = NULL,
  custom_end = NULL,
  custom_functions = NULL
)

Arguments

data

cleaned myClim object in Raw-format (output of mc_prep_clean()) or in Agg-format, since data can be aggregated multiple times.

fun

aggregation function; one of ("min", "max", "mean", "percentile", "sum", "range", "count", "coverage") or a function defined in custom_functions (see the custom_functions argument). Can be a single function name, a character vector of function names, or a named list of function-name vectors. A named list applies different function(s) to different sensors, e.g. list(TMS_T1=c("max", "min"), TMS_T2="mean", TMS_T3_GDD="sum"). If NULL, records are not aggregated; the myClim object is only converted to Agg-format without modifying the time series. See Details.

period

Time period for aggregation, same as breaks in cut.POSIXt, e.g. ("hour", "day", "month"); if NULL, no aggregation is performed.

There are two special periods, "all" and "custom". Period "all" returns a single value for each sensor, computed by applying the function across all records of the sensor. Period "custom" aggregates data in a yearly cycle, so you can aggregate e.g. by water year or vegetation season by providing start and end datetimes; see the custom_start and custom_end parameters. Output of the special periods "all" and "custom" cannot be aggregated again with mc_agg(), although multiple aggregations are allowed in general.

Weeks start on Monday.
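Since period is documented as equivalent to the breaks argument of cut.POSIXt, base R can preview how timestamps fall into periods and how each period is labelled. The following is a minimal sketch with made-up 6-hourly timestamps; it assumes mc_agg() bins like cut.POSIXt with start.on.monday = TRUE and does not call myClim itself.

```r
# Made-up 6-hourly timestamps (UTC) spanning a week boundary:
# 2022-12-24 is a Saturday, 2022-12-26 a Monday
times <- seq(as.POSIXct("2022-12-24 00:00", tz = "UTC"),
             as.POSIXct("2022-12-27 18:00", tz = "UTC"), by = "6 hours")

# Weekly bins starting on Monday; each level is the first day of its week
week_bins <- cut(times, breaks = "week", start.on.monday = TRUE)
# Daily and monthly bins for comparison
day_bins <- cut(times, breaks = "day")
month_bins <- cut(times, breaks = "month")

levels(week_bins)   # two weeks, both labelled by their Monday
levels(month_bins)  # a single month, labelled by its first day
```

The Saturday and Sunday records fall into the week labelled 2022-12-19, and the Monday and Tuesday records into the week labelled 2022-12-26.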

use_utc

default TRUE, using UTC time; if set to FALSE, the time is shifted by the offset available in locality metadata. The shift can be e.g. to solar time (mc_prep_solar_tz()) or to political time with a custom offset (mc_prep_meta_locality()). Non-UTC time can be used only when aggregating data with a step shorter than a day (seconds, minutes, hours) into a period of a day or longer.
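To see why the time base matters for daily and longer periods, the sketch below shifts a single hypothetical record by a hypothetical +60 min offset. The offset is assumed to be stored in minutes, as a stand-in for the locality offset mentioned above; this is plain base R, not myClim's implementation.

```r
# Hypothetical record late in the evening, UTC
utc_time <- as.POSIXct("2022-06-01 23:30", tz = "UTC")
# Hypothetical offset in minutes (e.g. UTC+1); assumption, not read from metadata
offset_min <- 60
# POSIXct arithmetic is in seconds, so convert minutes to seconds
local_time <- utc_time + offset_min * 60

# Calendar day the record falls into under each time base
utc_day <- as.Date(utc_time)
local_day <- as.Date(local_time)
```

The same record belongs to 1 June in UTC but to 2 June after the shift, so daily aggregates differ depending on use_utc. With a source step of a day or longer, such a sub-daily shift would have no meaningful effect, which is consistent with the restriction above.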

percentiles

vector of percentile numbers in the range 0-100; each specified percentile generates a new virtual sensor, see Details
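As a rough illustration of "one value per percentile", the sketch below computes several percentiles for the records of one hypothetical period. It assumes the percentiles behave like stats::quantile() with its default type and probabilities equal to percentiles / 100; myClim's exact quantile method is not stated here.

```r
# Hypothetical records of one sensor within one aggregation period
values <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
# Percentile numbers as they would be passed to mc_agg(percentiles = ...)
percentiles <- c(10, 50, 90)
# Assumption: percentile 0-100 maps to quantile probability 0-1
q <- quantile(values, probs = percentiles / 100)
q
```

Each of the three percentiles yields its own value, which corresponds to the three new virtual sensors mentioned above.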

min_coverage

value in the range 0-1 (default 1); the threshold specifying how many missing values you accept within an aggregation period. E.g. when aggregating 15-min records to monthly mean with min_coverage=1, a single NA value within a month makes the monthly mean NA. With min_coverage=0.9 you get the monthly mean as long as no more than 10 % of values are missing; with more than 10 % missing you get NA. Ignored for functions count and coverage.
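The min_coverage rule can be sketched in base R as a threshold on the share of non-missing records in a period. agg_with_coverage() is a hypothetical helper written only for illustration, not a myClim function; myClim may compute coverage differently internally.

```r
# Hypothetical helper: aggregate one period, or return NA when too many
# records are missing (coverage below min_coverage)
agg_with_coverage <- function(values, fun, min_coverage = 1) {
  if (length(values) == 0) return(NA_real_)
  coverage <- mean(!is.na(values))  # share of non-missing records
  if (coverage < min_coverage) return(NA_real_)
  fun(values[!is.na(values)])       # aggregate the available records only
}

# Hypothetical period with 10 records, 2 of them missing (coverage 0.8)
x <- c(1:8, NA, NA)
agg_with_coverage(x, mean, min_coverage = 1)     # NA: any gap is rejected
agg_with_coverage(x, mean, min_coverage = 0.75)  # 4.5: 80 % coverage accepted
agg_with_coverage(x, mean, min_coverage = 0.9)   # NA: 20 % missing is too much
```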

custom_start

start date, used only for the custom period (default NULL); character in format "mm-dd" or "mm-dd H:MM", recycled in a yearly cycle for time series longer than one year.

custom_end

end date, used only for the custom period (default NULL). If NULL, the period runs in a yearly cycle ending at custom_start of the next year (useful e.g. for the hydrological year). When custom_end is provided, data outside the custom_start to custom_end range are ignored. Character in format "mm-dd" or "mm-dd H:MM". The custom_end record itself is not included, i.e. complete daily data for year 2020 end at 2021-01-01, so use custom_end="01-01".
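The yearly windows implied by custom_start and custom_end can be sketched in base R. custom_windows() is a hypothetical helper, not part of myClim; it assumes "mm-dd" strings, UTC, end-exclusive windows, and that an end on or before the start belongs to the following year (which reproduces the custom_end="01-01" example above).

```r
# Hypothetical helper (not part of myClim): yearly [start, end) windows
# built from "mm-dd" strings, following the custom_start/custom_end rules
custom_windows <- function(years, custom_start, custom_end = NULL) {
  start <- as.POSIXct(paste0(years, "-", custom_start), tz = "UTC")
  # No end given: the window closes at custom_start of the next year
  if (is.null(custom_end)) custom_end <- custom_start
  end <- as.POSIXlt(as.POSIXct(paste0(years, "-", custom_end), tz = "UTC"))
  # An end on or before the start belongs to the following year
  end$year <- end$year + (as.POSIXct(end) <= start)
  data.frame(start = start, end = as.POSIXct(end))
}

hydro <- custom_windows(2020:2021, "11-01")          # water years from Nov 1
full_year <- custom_windows(2020, "01-01", "01-01")  # calendar year 2020
season <- custom_windows(2020, "04-01", "10-01")     # April to September
```

Because windows are end-exclusive, a record stamped exactly at the end (e.g. 2021-01-01 00:00 for full_year) is not part of the window.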

custom_functions

user-defined function(s) in format list(function_name=function(values){...}); then pass the defined function_name(s) to the fun parameter, e.g. custom_functions = list(positive_count=function(x){length(x[x>0])}), fun="positive_count".
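A custom function receives the vector of records of one period and returns a single value. The sketch below applies the positive_count function from the example above to hypothetical 6-hourly temperatures grouped by day with base R tapply(); it only mimics a daily aggregation and does not call mc_agg().

```r
# The custom function from the example above: number of positive values
positive_count <- function(x) length(x[x > 0])

# Hypothetical 6-hourly temperatures over two days (UTC)
times <- seq(as.POSIXct("2023-01-01 00:00", tz = "UTC"), by = "6 hours",
             length.out = 8)
temp <- c(-2, -1, 3, 1, -4, -3, 2, -1)

# Apply it per day, as an aggregation into period "day" would
daily <- tapply(temp, cut(times, "day"), positive_count)
daily
```

Note that x[x > 0] keeps NA elements, so if NA values reach such a function they are counted; sum(x > 0, na.rm = TRUE) is an alternative that ignores them.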

Details

Any output of mc_agg is in Agg-format. That means the logger level of the hierarchy is removed (Locality<-Logger<-Sensor<-Record) and all microclimatic records within sensors are on the level of the locality (Locality<-Sensor<-Record). See myClim-package.

When mc_agg() is used only for conversion from Raw-format to Agg-format (fun=NULL, period=NULL), microclimatic records are not modified. Conversion from Raw-format to Agg-format requires the same time step in all sensors; otherwise period must be specified.

When fun and period are specified, microclimatic records are aggregated with the selected function into the specified period. The name of the aggregated variable also contains the name of the function used (e.g. TMS_T1_mean). Each aggregated time step is named after the first time step of its period, i.e. day = c(2022-12-29 00:00, 2022-12-30 00:00...); week = c(2022-12-19 00:00, 2022-12-26 00:00...); month = c(2022-11-01 00:00, 2022-12-01 00:00...); year = c(2021-01-01 00:00, 2022-01-01 00:00...). When the first or last period is incomplete in the original data, the incomplete part is extended with NA values to match the specified period. For example, when you aggregate a time series starting on January 15 and ending on December 20 to monthly means, myClim extends it to start on January 1 and end on December 31. To still use data from aggregation periods with incomplete coverage, adjust the min_coverage parameter.
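The NA extension and period labelling described above can be mimicked in base R. This is a minimal sketch on hypothetical daily values, not myClim's internal code: the series is padded to whole months, each month is labelled by its first day, and both coverage and the monthly mean are computed per month.

```r
# Hypothetical daily series starting mid-January and ending mid-March 2023
days <- seq(as.Date("2023-01-15"), as.Date("2023-03-10"), by = "day")
values <- rep(1, length(days))

# Extend to whole months, filling the added days with NA
full <- seq(as.Date("2023-01-01"), as.Date("2023-03-31"), by = "day")
padded <- values[match(full, days)]      # NA where no record exists
month <- format(full, "%Y-%m-01")        # label = first day of the month

coverage <- tapply(!is.na(padded), month, mean)
monthly_mean <- tapply(padded, month, mean)  # NA where any day is missing
```

January (17 of 31 days) and March (10 of 31 days) get NA means, which mirrors the default min_coverage = 1; only February is complete.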

Empty sensors with no records are excluded. mc_agg() returns NA for an empty vector, except for fun="count", which returns 0. When aggregation functions are provided as a vector or list, e.g. c("mean", "min", "max"), all of them are applied to all sensors and multiple results are returned for each sensor. When a named list of functions is provided (names are sensor ids), mc_agg() applies specific functions to specific sensors, e.g. list(TMS_T1=c("max", "min"), TMS_T2="mean").

mc_agg() returns new sensors on the localities with the aggregation function appended to their names (TMS_T1 -> TMS_T1_max). Although sensor names contain the aggregation function, sensor_id in sensor metadata stays the same as before aggregation (e.g. TMS_T1 -> TMS_T1). Sensors created with functions min, max, mean, percentile, sum and range keep the sensor_id and value_type of the original input sensors. When function sum is applied to a logical sensor (e.g. snow as TRUE/FALSE), the output is an integer, i.e. the number of TRUE values.

Sensors created with function count have sensor_id count and value_type integer; sensors created with function coverage have sensor_id coverage and value_type real.
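The effect of a named list of functions on sensor naming can be sketched with base R for a single aggregation period. The sensor values below are hypothetical and the naming pattern <sensor>_<function> follows the TMS_T1 -> TMS_T1_max example above; the code illustrates the behaviour and is not how myClim implements it.

```r
# Hypothetical records of two sensors within one aggregation period
sensors <- list(TMS_T1 = c(5.1, 7.4, 6.0), TMS_T2 = c(4.0, 4.4, 4.2))
# Named list: specific functions for specific sensors
funs <- list(TMS_T1 = c("max", "min"), TMS_T2 = "mean")

# Apply each sensor's functions and name results <sensor>_<function>
results <- unlist(lapply(names(funs), function(sensor) {
  out <- sapply(funs[[sensor]], function(f) match.fun(f)(sensors[[sensor]]))
  setNames(out, paste0(sensor, "_", funs[[sensor]]))
}))
results

# Summing a logical sensor counts TRUE values and returns an integer
snow <- c(TRUE, FALSE, TRUE, TRUE)
snow_sum <- sum(snow)
```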

If the myClim object contains a states (tags) table, such as error or quality tags, the datetimes defining the start and end of each tag are rounded according to the aggregation period.

Value

Returns a new myClim object in Agg-format (see myClim-package). When fun=NULL and period=NULL, records are not modified, only converted to Agg-format. When fun and period are provided, the time step is aggregated using the given function(s).

Examples

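library(myClim)
# hourly min, max and median; hours with at least 50 % of records are kept
hour_data <- mc_agg(mc_data_example_clean, c("min", "max", "percentile"),
                    "hour", percentiles = 50, min_coverage = 0.5)
# daily max and min of TMS_T1 and daily mean of TMS_T2; complete days only
day_data <- mc_agg(mc_data_example_clean, list(TMS_T1 = c("max", "min"), TMS_T2 = "mean"),
                   "day", min_coverage = 1)
# monthly number of TMS_T3 records below -5 using a custom function
month_data <- mc_agg(mc_data_example_clean, fun = list(TMS_T3 = "below5"), period = "month",
                     custom_functions = list(below5 = function(x) length(x[x < (-5)])))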

myClim documentation built on Oct. 21, 2024, 5:07 p.m.