The sapfluxnetr package offers a flexible and powerful API, based on the tidyverse packages, to aggregate and summarise site data in the form of the sfn_metrics function. All the metrics family of functions (?metrics) use sfn_metrics under the hood. If you want full control over the statistics returned and the aggregation periods, we recommend using this API. This vignette shows you how.
daily_metrics
monthly_metrics
predawn_metrics
midday_metrics
nightly_metrics
daylight_metrics
See each function help for a detailed description and examples of use.
daily_metrics and related functions return a complete set of metrics ready for use, but if you want different metrics you can supply your own summarising functions using the .funs argument.
The correct way of specifying the functions to use is described in the summarise_all help (?dplyr::summarise_all). The recommended way is a list of formulas with the function call:
    # libraries
    library(sapfluxnetr)
    library(dplyr)

    ### only mean and sd at a daily scale
    # data
    data('ARG_TRE', package = 'sapfluxnetr')

    # summarising funs (as a list of formulas)
    custom_funs <- list(
      mean = ~ mean(., na.rm = TRUE),
      std_dev = ~ sd(., na.rm = TRUE)
    )

    # metrics
    foo_simpler_metrics <- sfn_metrics(
      ARG_TRE,
      period = '1 day',
      .funs = custom_funs,
      solar = TRUE,
      interval = 'general'
    )

    foo_simpler_metrics[['sapf']]
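To see what a list of formulas does without loading any site data, the following minimal sketch applies the same kind of list to a toy data frame using dplyr::summarise_all directly. The data frame toy, its columns and the list toy_funs are hypothetical names invented for this illustration, and the sketch does not reproduce the extra processing that sfn_metrics performs.

```r
library(dplyr)

# Toy data standing in for a table of sap flow values (hypothetical values)
toy <- data.frame(
  tree_1 = c(1, 2, NA, 4),
  tree_2 = c(10, NA, 30, 40)
)

# Named list of formulas: '.' stands for the column being summarised
toy_funs <- list(
  mean = ~ mean(., na.rm = TRUE),
  std_dev = ~ sd(., na.rm = TRUE)
)

# Each function is applied to each column
toy_summary <- summarise_all(toy, .funs = toy_funs)

# Column names combine the variable name and the name given in the list
names(toy_summary)
```

The names given in the list (mean, std_dev) are the suffixes appended to each variable name, so short and descriptive names are worth choosing.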
When supplying only one function to .funs, the variable names are not changed to include the metric name at the end, as the summary function returns the same columns as the original data.
You can also choose whether the "special interest" intervals (predawn, midday, nighttime or daylight) are calculated. For example, if you are only interested in the midday interval you can use:
    foo_simpler_metrics_midday <- sfn_metrics(
      ARG_TRE,
      period = '1 day',
      .funs = custom_funs,
      solar = TRUE,
      interval = 'midday',
      int_start = 11,
      int_end = 13
    )

    foo_simpler_metrics_midday[['sapf']]
The period argument in sfn_metrics is passed to the .collapse_timestamp function, so it accepts the same input:
    # weekly
    foo_weekly <- sfn_metrics(
      ARG_TRE,
      period = '7 days',
      .funs = custom_funs,
      solar = TRUE,
      interval = 'general'
    )

    foo_weekly[['env']]
The period argument can also be a function. Extra arguments for that function can be supplied through the ... argument of sfn_metrics. Also, this function must always return a vector of timestamps of the same length as the original timestamp. For example, using the quarter function from the lubridate package:

    # data
    data('AUS_CAN_ST2_MIX', package = 'sapfluxnetr')

    foo_custom <- sfn_metrics(
      AUS_CAN_ST2_MIX,
      period = lubridate::quarter,
      .funs = custom_funs,
      solar = TRUE,
      interval = 'general',
      with_year = TRUE # argument for lubridate::quarter
    )

    foo_custom[['env']]
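The same-length requirement is easy to check before handing a custom function to sfn_metrics. The sketch below defines a hypothetical helper, quarter_start, that collapses each timestamp to the start of its calendar quarter with lubridate::floor_date, and then verifies the length of the result on a few made-up timestamps. It only checks the helper in isolation; we have not run it through sfn_metrics here, so treat it as a starting point rather than a tested recipe.

```r
library(lubridate)

# Hypothetical custom period function (not part of sapfluxnetr):
# collapse each timestamp to the start of its calendar quarter
quarter_start <- function(timestamp) {
  floor_date(timestamp, unit = "quarter")
}

# Made-up timestamps for illustration
stamps <- as.POSIXct(
  c("2010-01-15 10:00:00", "2010-03-31 23:30:00", "2010-04-01 00:30:00"),
  tz = "UTC"
)

collapsed <- quarter_start(stamps)

# The result must have one value per original timestamp
length(collapsed) == length(stamps)

# Timestamps in the same quarter collapse to the same value
collapsed
```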
sfn_metrics has a ... parameter intended to supply additional parameters to the internal functions used:
.collapse_timestamp accepts the following extra arguments: side
dplyr::summarise_all accepts extra arguments intended to be applied to the summarising functions provided (to all of them, so every function must accept the argument or an error will be raised). This is why we recommend the list of formulas, as the arguments are then specified for each individual function.
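The difference between passing arguments through ... and through formulas can be reproduced with dplyr alone. In this sketch the toy data frame and the object names (toy, shared_ok, shared_fail, per_function) are hypothetical; the calls go to dplyr::summarise_all directly, not to sfn_metrics.

```r
library(dplyr)

# Toy data with missing values (hypothetical)
toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))

# Arguments in ... are passed to every function: fine when all of them
# accept na.rm
shared_ok <- summarise_all(toy, list(mean = mean, sd = sd), na.rm = TRUE)

# length() has no na.rm argument, so the same approach raises an error
shared_fail <- tryCatch(
  summarise_all(toy, list(mean = mean, n = length), na.rm = TRUE),
  error = function(e) "error"
)

# With formulas each function receives only its own arguments
per_function <- summarise_all(
  toy,
  list(mean = ~ mean(., na.rm = TRUE), n = ~ length(.))
)
```

Here shared_fail holds the string returned by the error handler, while per_function is computed without problems.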
For example, if we want the TIMESTAMPs after aggregation to show the end of the period instead of the beginning (the default), we can do the following:
    foo_simpler_metrics_end <- sfn_metrics(
      ARG_TRE,
      period = '1 day',
      .funs = custom_funs,
      solar = TRUE,
      interval = 'general',
      side = "end"
    )

    foo_simpler_metrics_end[['sapf']]
Compared with the foo_simpler_metrics calculated before, the period is now identified in the TIMESTAMP by the end of the period (daily in this case).
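As a rough illustration of what labelling a period by its start or its end means, the sketch below rounds a few made-up timestamps down and up to the day with lubridate. We assume here that this is analogous to what the side argument controls; the internals of .collapse_timestamp are not shown in this vignette, so this is not a description of its implementation.

```r
library(lubridate)

# Made-up timestamps within a single day
stamps <- as.POSIXct(
  c("2009-11-18 06:00:00", "2009-11-18 12:30:00", "2009-11-18 23:30:00"),
  tz = "UTC"
)

# Label each timestamp by the start of its day
period_start <- floor_date(stamps, unit = "1 day")

# Label each timestamp by the end of its day
period_end <- ceiling_date(stamps, unit = "1 day")

unique(period_start)
unique(period_end)
```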
When supplying a custom function as the "period" argument, the default coverage statistic is not reliable, as there is no way of knowing beforehand the length of the period/s in minutes.
The internal aggregation process in sfn_metrics generates some transitory columns which can be used in the summarising functions:
TIMESTAMP_coll
When aggregating by the declared period (e.g. "daily"), the TIMESTAMP column collapses to the period start/end value (meaning that all the TIMESTAMP values for the same day become identical).
This makes it impossible to use any summarising function that obtains the time of day at which an event happens (e.g. the time of day at which the maximum sap flow occurs), because all TIMESTAMP values are identical.
For that kind of summarising function, a transitory column called TIMESTAMP_coll is created. So in this case we can create a function that takes the variable values for the day and the TIMESTAMP_coll values for the day, returns the TIMESTAMP at which the maximum sap flow occurs, and use it with sfn_metrics:
    max_time <- function(x, time) {
      # x: vector of values for a day
      # time: TIMESTAMP for the day

      # if all the values in x are NAs (a daily summarise of a day with no
      # measures, for example) this will return a length 0 POSIXct vector,
      # which will crash the dplyr summarise step. So, check if all NA and
      # if true return NA as POSIXct
      if (all(is.na(x))) {
        return(as.POSIXct(NA, tz = attr(time, 'tz'), origin = lubridate::origin))
      } else {
        time[which.max(x)]
      }
    }

    custom_funs <- list(
      max = ~ max(., na.rm = TRUE),
      ~ max_time(., TIMESTAMP_coll)
    )

    max_time_metrics <- sfn_metrics(
      ARG_TRE,
      period = '1 day',
      .funs = custom_funs,
      solar = TRUE,
      interval = 'general'
    )

    max_time_metrics[['sapf']]
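The idea behind the transitory column can be sketched on toy data with plain dplyr and lubridate: keep a copy of the original timestamps before collapsing them, then index that copy inside the summary. The column TIMESTAMP_orig and the toy values are hypothetical and only mimic the role that TIMESTAMP_coll plays above; this is not how sfn_metrics is implemented internally.

```r
library(dplyr)
library(lubridate)

# Two days of made-up sap flow values, three measurements per day
toy <- data.frame(
  TIMESTAMP = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
    3600 * c(6, 12, 18, 30, 36, 42),
  sapf = c(1, 5, 2, 3, 4, 9)
)

toy_daily <- toy %>%
  # keep the original timestamps, then collapse TIMESTAMP to the day start
  mutate(
    TIMESTAMP_orig = TIMESTAMP,
    TIMESTAMP = floor_date(TIMESTAMP, unit = "1 day")
  ) %>%
  group_by(TIMESTAMP) %>%
  # the copy still holds the time of day, so it can locate the maximum
  summarise(
    sapf_max = max(sapf),
    sapf_max_time = TIMESTAMP_orig[which.max(sapf)]
  )

toy_daily
```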
sfn_metrics allows sub-daily aggregations by means of the period parameter. Sapfluxnet datasets usually have sub-daily data with time steps in the range of 30 minutes to 2 hours. This means that data can be aggregated to periods longer than 2 hours. For example, we can easily aggregate to a 3-hour period:
    custom_funs <- list(max = ~ max(., na.rm = TRUE))

    three_hours_agg <- sfn_metrics(
      ARG_TRE,
      period = '3 hours',
      .funs = custom_funs,
      solar = TRUE,
      interval = 'general'
    )

    three_hours_agg[['sapf']]