knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.path = "man/figures/index-",
  fig.retina = 2
)

library(tbrf)
library(dplyr)
library(ggplot2)


The goal of tbrf is to provide time-window based rolling statistical functions. The package differs from other rolling statistic packages because it is intended for irregularly measured data. Although tbrf can be used to apply statistical functions to regularly sampled data, zoo, RcppRoll, and other packages already provide fast, efficient, and rich implementations of rolling/windowed functions for that case.

An appropriate example case is water quality data that is measured at irregular time intervals. Regulatory compliance is often based on a statistical average measure or exceedance probability applied to all samples collected in the previous $n$ years. For each row of data, tbrf functions select the previous observations that fall within the time window specified by the user and apply the statistical function to them.
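The row-by-row window selection described above can be sketched in base R. This is only an illustration of the idea, not tbrf's implementation: `rolling_window_mean` is a hypothetical helper, and the sketch assumes a window that is closed on both ends and measured in days.

```r
# Illustrative only: a base-R sketch of a time-window rolling mean.
# `rolling_window_mean` is a hypothetical helper, not part of tbrf.
rolling_window_mean <- function(values, times, window_days) {
  vapply(seq_along(times), function(i) {
    # Keep observations no later than row i and at most `window_days` older
    in_window <- times <= times[i] & times >= times[i] - window_days
    mean(values[in_window])
  }, numeric(1))
}

# Irregularly spaced sampling dates
obs <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-03", "2020-01-10", "2020-02-15")),
  value = c(4, 6, 8, 10)
)

obs$mean_7d <- rolling_window_mean(obs$value, obs$date, window_days = 7)
obs
```

Each row's result depends on how many samples happen to fall in its window, which is why the number of observations behind each statistic varies with irregular data.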

Installation

tbrf is available on CRAN:

install.packages("tbrf")

The development version is maintained on GitHub and can be installed with:

devtools::install_github("mps9506/tbrf")

Available Functions

Usage

Core functions include five arguments:

.tbl = dataframe used by the function

x = column containing the values to calculate the statistic on

tcolumn = formatted date-time or date column

unit = character indicating the time unit used, one of "years", "months", "weeks", "days", "hours", "minutes", "seconds"

n = numeric, indicating the window length

Additional arguments for calculating confidence intervals in tbr_gmean, tbr_mean, and tbr_median are passed to boot and boot.ci.
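For readers unfamiliar with the boot package, the sketch below shows how `boot` and `boot.ci` produce a percentile confidence interval for a mean on a plain numeric vector. It does not call tbrf; the data and the `mean_stat` helper are made up for illustration, and how tbrf wires these calls together internally is not shown here.

```r
library(boot)

set.seed(42)
x <- c(5.1, 6.3, 7.8, 4.9, 6.6, 8.2, 5.7, 7.1)  # made-up sample values

# boot() calls the statistic with the data and a vector of resampled indices
mean_stat <- function(data, indices) mean(data[indices])

b <- boot(x, statistic = mean_stat, R = 500)

# Percentile interval at the 95% level; the last two entries of `percent`
# hold the lower and upper limits
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent[4:5]
```

The `R`, `conf`, and `type` values here are the same kind of arguments that appear in the tbrf calls in the examples that follow.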

Basic Example

For example, get the 5-year rolling mean:

tbr_mean(Dissolved_Oxygen, x = Average_DO,
         tcolumn = Date, unit = "years", n = 5)

This also works in a tidy workflow:

library(ggalt) # for stat = "stepribbon"

Dissolved_Oxygen %>%
  mutate(Station_ID = as.factor(Station_ID)) %>%
  group_by(Station_ID) %>%
  tbr_mean(Average_DO, Date, "years", 5, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_step(aes(Date, mean, color = Station_ID)) +
  geom_ribbon(aes(Date, ymin = lwr_ci, ymax = upr_ci, fill = Station_ID), 
              stat = "stepribbon", alpha = 0.5)

Different Time Units

tbrf works with times or dates.

## Generate some sample data

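sampledata <- function(N, start, end) {
  # Parse the start and end strings as date-times in the local time zone
  start <- as.POSIXct(start, format = "%Y-%m-%d %H:%M:%S", tz = "")
  end <- as.POSIXct(end, format = "%Y-%m-%d %H:%M:%S", tz = "")
  # Draw N distinct timestamps at one-minute resolution
  time <- sample(seq(start, end, by = "min"), N)

  df <- tibble(time = time,
               y = -1000 * log(runif(N)))
  return(df)
}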

df <- bind_rows(sampledata(100, "2017-01-01 00:01:00", "2017-01-04 23:00:00"),
                sampledata(100, "2017-01-07 00:01:00", "2017-01-08 23:00:00"))

Plot 120-minute geometric means:

df %>% 
  tbr_gmean(y, time, unit = "minutes", n = 120, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_point(aes(time, y), alpha = 0.25) +
  geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = lwr_ci, ymax = upr_ci), alpha = 0.5) +
  scale_y_log10()

Plot 24-hour geometric means:

df %>% 
  tbr_gmean(y, time, unit = "hours", n = 24, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_point(aes(time, y), alpha = 0.25) +
  geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = lwr_ci, ymax = upr_ci), alpha = 0.5) +
  scale_y_log10()

Plot 4-day geometric means:

df %>% 
  tbr_gmean(y, time, unit = "days", n = 4, conf = 0.95, type = "perc") %>%
  ggplot() +
  geom_point(aes(time, y), alpha = 0.25) +
  geom_line(aes(time, mean)) +
  geom_ribbon(aes(time, ymin = lwr_ci, ymax = upr_ci), alpha = 0.5) +
  scale_y_log10()

Parallel Processing

Confidence intervals in tbr_gmean, tbr_mean, and tbr_median are calculated using boot::boot.ci. If you do not need confidence intervals, calculation times are substantially shorter. The parallel, ncpus, and cl arguments are passed to boot and can improve computation times. See ?boot for more details on parallel operations. An example of parallel processing on Windows is shown below:

library(microbenchmark)
library(snow)

cl <- makeCluster(4, type = "SOCK")

x <- microbenchmark(
  "noCI" = tbr_gmean(Dissolved_Oxygen, Average_DO, Date, 
         "years", 5),
  "single" = tbr_gmean(Dissolved_Oxygen, Average_DO, Date, 
         "years", 5, R = 500, conf = .95,
         type = "perc"),
  "parallel" = tbr_gmean(Dissolved_Oxygen, Average_DO, Date, 
         "years", 5, R = 500, conf = .95, 
         type = "perc", parallel = "snow", 
         ncpus = 4, cl = cl),
  times = 10, unit = "s")

stopCluster(cl)

print(x)

autoplot(x)


mps9506/tbrf documentation built on May 20, 2022, 10:49 a.m.