summarise_by_time: Summarise (for Time Series Data)
In timetk: A Tool Kit for Working with Time Series

Description

summarise_by_time() is a time-based variant of the popular dplyr::summarise() function. It uses .date_var to specify a date or date-time column and .by to group the calculation into time periods such as "5 seconds", "week", or "3 months".

summarise_by_time() and summarize_by_time() are synonyms.
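For intuition, a grouped monthly summary can be approximated with plain dplyr and lubridate: collapse each date to the start of its month, then summarise within each group and month. This is a minimal sketch that assumes floor rounding (the default .type) and uses a made-up two-series data frame; it is not timetk's actual implementation.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two series (A and B) with four daily observations each
daily <- tibble(
  id    = rep(c("A", "B"), each = 4),
  date  = rep(as.Date(c("2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02")), 2),
  value = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Approximates: daily %>% group_by(id) %>%
#   summarise_by_time(.date_var = date, .by = "month", total = sum(value))
monthly <- daily %>%
  mutate(date = floor_date(date, unit = "month")) %>%  # label each row with its month start
  group_by(id, date) %>%                               # one group per series and month
  summarise(total = sum(value), .groups = "drop")

monthly
```

Each series yields one row per month, labeled with the first day of that month.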

Usage

summarise_by_time(
  .data,
  .date_var,
  .by = "day",
  ...,
  .type = c("floor", "ceiling", "round"),
  .week_start = NULL
)

summarize_by_time(
  .data,
  .date_var,
  .by = "day",
  ...,
  .type = c("floor", "ceiling", "round"),
  .week_start = NULL
)

Arguments

`.data`	A `tbl` object or `data.frame`
`.date_var`	A column containing date or date-time values to summarize. If missing, the function attempts to auto-detect the date column.
`.by`	A time unit to summarise by. Time units are collapsed using `lubridate::floor_date()`, `lubridate::ceiling_date()`, or `lubridate::round_date()`, depending on `.type`. The value can be one of `second`, `minute`, `hour`, `day`, `week`, `month`, `bimonth`, `quarter`, `season`, `halfyear`, or `year`, optionally with a multiple such as "5 seconds" or "3 months". Arbitrary unique English abbreviations, as in the `lubridate::period()` constructor, are allowed.
`...`	Name-value pairs of summary functions. The name will be the name of the variable in the result. The value can be: a vector of length 1, e.g. `min(x)`, `n()`, or `sum(is.na(y))`; a vector of length `n`, e.g. `quantile()`; or a data frame, to add multiple columns from a single expression.
`.type`	One of "floor", "ceiling", or "round". Defaults to "floor". See `lubridate::round_date()`.
`.week_start`	When `.by` is a week unit, the day on which each week starts: 7 represents Sunday and 1 represents Monday.
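Since `.by`, `.type`, and `.week_start` are described in terms of lubridate's rounding functions, calling those functions directly on a single date gives a feel for how each row is labeled. This sketch assumes summarise_by_time() forwards the unit and week start to lubridate unchanged; the dates are arbitrary examples.

```r
library(lubridate)

d <- as.Date("2023-02-14")  # a Tuesday in mid-February

# Effect of .type when .by = "month"
floored <- floor_date(d, unit = "month")    # start of February
ceiled  <- ceiling_date(d, unit = "month")  # start of March
rounded <- round_date(d, unit = "month")    # nearer boundary: Feb 1 is 13 days away, Mar 1 is 15

# Effect of .week_start when .by = "week"; 2023-02-16 is a Thursday
week_mon <- floor_date(as.Date("2023-02-16"), unit = "week", week_start = 1)  # Monday 2023-02-13
week_sun <- floor_date(as.Date("2023-02-16"), unit = "week", week_start = 7)  # Sunday 2023-02-12
```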

Value

A tibble or data.frame

Useful summary functions

Sum: sum()
Center: mean(), median()
Spread: sd(), var()
Range: min(), max()
Count: dplyr::n(), dplyr::n_distinct()
Position: dplyr::first(), dplyr::last(), dplyr::nth()
Correlation: cor(), cov()
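Several of these functions can be combined in a single call. A minimal sketch on a made-up data frame of 14 consecutive days starting on Monday 2023-01-02, summarised into Monday-starting weeks; the output column names `total`, `avg`, `n_obs`, and `first_value` are arbitrary.

```r
library(dplyr)
library(timetk)

# Hypothetical toy data: 14 daily observations, the first on a Monday
daily <- tibble(
  date  = seq(as.Date("2023-01-02"), by = "day", length.out = 14),
  value = 1:14
)

weekly <- daily %>%
  summarise_by_time(
    .date_var   = date,
    .by         = "week",
    .week_start = 1,              # weeks start on Monday
    total       = sum(value),     # Sum
    avg         = mean(value),    # Center
    n_obs       = n(),            # Count
    first_value = first(value)    # Position
  )

weekly
```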

Examples

# Libraries
library(dplyr)
library(timetk)  # provides summarise_by_time(), m4_daily, and %-time%

# First value in each month
m4_daily %>%
    group_by(id) %>%
    summarise_by_time(
        .date_var = date,
        .by       = "month", # Setup for monthly aggregation
        # Summarization
        value     = first(value)
    )

# Last value in each month (with .type = "ceiling", each period is labeled with the first day of the following month)
m4_daily %>%
    group_by(id) %>%
    summarise_by_time(
        .by        = "month",
        value      = last(value),
        .type      = "ceiling"
    ) %>%
    # Shift to the last day of the month
    mutate(date = date %-time% "1 day")

# Total each year (.by is set to "year" now)
m4_daily %>%
    group_by(id) %>%
    summarise_by_time(
        .by        = "year",
        value      = sum(value)
    )

timetk documentation built on Nov. 2, 2023, 6:18 p.m.