knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.path = "man/figures/index-",
  fig.retina = 2
)
library(tbrf)
library(dplyr)
library(ggplot2)
The goal of tbrf is to provide time-window based rolling statistical functions. The package differs from other rolling statistic packages because it is intended for irregularly measured data. Although tbrf can be used to apply statistical functions to regularly sampled data, zoo, RcppRoll, and other packages provide fast, efficient, and rich implementations of rolling/windowed functions for that case.
An appropriate example case is water quality data that is measured at irregular time intervals. Regulatory compliance is often based on a statistical average measure or exceedance probability applied to all samples collected in the previous $n$ years. For each row of data, tbrf functions select the previous observations that fall within the time window specified by the user and apply the statistical function to them.
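The row-by-row window selection described above can be sketched in base R. This is only an illustration of the idea, not tbrf's implementation: `rolling_window_mean` is a hypothetical helper, the window is expressed in days, and the window is assumed to be closed on both ends.

```r
# Hypothetical helper, not part of tbrf: time-based rolling mean in base R.
rolling_window_mean <- function(dates, values, window_days) {
  vapply(seq_along(dates), function(i) {
    # Assumption: the window is closed, [dates[i] - window_days, dates[i]].
    in_window <- dates >= dates[i] - window_days & dates <= dates[i]
    mean(values[in_window])
  }, numeric(1))
}

# Irregularly spaced observations
dates  <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-10", "2020-01-11"))
values <- c(2, 4, 6, 8)

result <- rolling_window_mean(dates, values, window_days = 7)
print(result)
```

Because the window is defined by elapsed time rather than by a fixed number of rows, the number of observations contributing to each result varies with the sampling density.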
tbrf is available on CRAN:
install.packages("tbrf")
The development version is maintained on GitHub and can be installed as:
devtools::install_github("mps9506/tbrf")
tbr_binom: Rolling binomial probability with confidence intervals.
tbr_gmean: Rolling geometric mean with confidence intervals.
tbr_mean: Rolling mean with confidence intervals.
tbr_median: Rolling median with confidence intervals.
tbr_misc: Accepts a user-specified function.
tbr_sd: Rolling standard deviation.
tbr_sum: Rolling sum.
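For readers unfamiliar with the statistic behind tbr_gmean, the geometric mean of a set of positive values can be sketched as the exponential of the mean of the logs. `geometric_mean` below is a hypothetical helper written for illustration; it is not claimed to be the code tbrf uses internally.

```r
# Hypothetical helper: geometric mean of strictly positive values.
geometric_mean <- function(x) {
  stopifnot(all(x > 0))   # logs are undefined for zero or negative values
  exp(mean(log(x)))
}

gm <- geometric_mean(c(1, 10, 100))
print(gm)
```

Working on the log scale makes the geometric mean less sensitive to a few very large values than the arithmetic mean, which is one reason it is common for skewed measurements.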
Core functions include five arguments:
.tbl = dataframe used by the function
x = column containing the values to calculate the statistic on
tcolumn = formatted date-time or date column
unit = character indicating the time unit used, one of "years", "months", "weeks", "days", "hours", "minutes", "seconds"
n = numeric, indicating the window length
Additional arguments for calculating confidence intervals in tbr_gmean, tbr_mean, and tbr_median are passed to boot and boot.ci.
For example, get the 5-year rolling mean:
tbr_mean(Dissolved_Oxygen, x = Average_DO, tcolumn = Date, unit = "years", n = 5)
This works in a tidy workflow as:
library(ggalt) # for stat = "stepribbon"

Dissolved_Oxygen %>%
  mutate(Station_ID = as.factor(Station_ID)) %>%
  group_by(Station_ID) %>%
  tbr_mean(Average_DO, Date, "years", 5, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_step(aes(Date, mean, color = Station_ID)) +
  geom_ribbon(aes(Date, ymin = lwr_ci, ymax = upr_ci, fill = Station_ID),
              stat = "stepribbon", alpha = 0.5)
tbrf works with times or dates.
## Generate some sample data
sampledata <- function(N, start, end) {
  start <- as.POSIXct(start, "%Y-%m-%d %H:%M:%S", tz = "")
  end <- as.POSIXct(end, "%Y-%m-%d %H:%M:%S", tz = "")
  time <- sample(seq(start, end, by = "min"), N)
  df <- tibble(time = time, y = -1000*log(runif(N)))
  return(df)
}

df <- bind_rows(sampledata(100, "2017-01-01 00:01:00", "2017-01-04 23:00:00"),
                sampledata(100, "2017-01-07 00:01:00", "2017-01-08 23:00:00"))
Plot 120-minute geometric means:
df %>%
  tbr_gmean(y, time, unit = "minutes", n = 120, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_point(aes(time, y), alpha = 0.25) +
  geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = lwr_ci, ymax = upr_ci), alpha = 0.5) +
  scale_y_log10()
Plot 24-hour geometric means:
df %>%
  tbr_gmean(y, time, unit = "hours", n = 24, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_point(aes(time, y), alpha = 0.25) +
  geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = lwr_ci, ymax = upr_ci), alpha = 0.5) +
  scale_y_log10()
Plot 4-day geometric means:
df %>%
  tbr_gmean(y, time, unit = "days", n = 4, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_point(aes(time, y), alpha = 0.25) +
  geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = lwr_ci, ymax = upr_ci), alpha = 0.5) +
  scale_y_log10()
Confidence intervals in tbr_gmean, tbr_mean, and tbr_median are calculated using boot::boot.ci. If you do not need confidence intervals, calculation times are substantially shorter. The parallel, ncpus, and cl arguments are passed to boot and can improve computation times. See ?boot for more details on parallel operations. An example for parallel processing in Windows is shown below:
library(microbenchmark)
library(snow)

cl <- makeCluster(4, type = "SOCK")

x <- microbenchmark(
  "noCI" = tbr_gmean(Dissolved_Oxygen, Average_DO, Date, "years", 5),
  "single" = tbr_gmean(Dissolved_Oxygen, Average_DO, Date, "years", 5,
                       R = 500, conf = .95, type = "perc"),
  "parallel" = tbr_gmean(Dissolved_Oxygen, Average_DO, Date, "years", 5,
                         R = 500, conf = .95, type = "perc",
                         parallel = "snow", ncpus = 4, cl = cl),
  times = 10, unit = "s")

stopCluster(cl)

print(x)
autoplot(x)
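To show what the R, conf, and type arguments mean once they reach the boot package, here is a minimal sketch of a percentile bootstrap interval for a geometric mean on simulated data. It calls boot::boot and boot::boot.ci directly; the statistic function `gmean_stat` and the data are made up for illustration, and this is not claimed to reproduce tbrf's internal code.

```r
library(boot)

set.seed(42)
values <- rexp(50, rate = 0.01)   # simulated, strictly positive measurements

# boot() expects a statistic of the form function(data, indices)
gmean_stat <- function(data, idx) exp(mean(log(data[idx])))

b  <- boot(values, statistic = gmean_stat, R = 500)
ci <- boot.ci(b, conf = 0.95, type = "perc")

# For type = "perc", columns 4 and 5 of ci$percent hold the interval bounds
lwr <- ci$percent[4]
upr <- ci$percent[5]
print(c(estimate = b$t0, lwr_ci = lwr, upr_ci = upr))
```

Each window requires R resamples, which is why requesting confidence intervals on every row takes much longer than computing the statistic alone.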