bin_by_date: Aggregate data by time periods

Description

Aggregates data by specified time periods (e.g., weeks, months) and calculates (weighted) counts. Incidence rates are also calculated using the provided population numbers.

This function is the core date binning engine used by geom_epicurve() and stat_bin_date() for creating epidemiological time series visualizations.

Usage

bin_by_date(
  x,
  dates_from,
  n = 1,
  population = 1,
  fill_gaps = FALSE,
  date_resolution = "week",
  week_start = 1,
  .groups = "drop"
)

Arguments

x

Either a data frame with a date column, or a date vector.
Supported date formats are Date and datetime objects, as well as commonly used character strings:

  • ISO dates "2024-03-09"

  • Month "2024-03"

  • Week "2024-W09" or "2024-W09-1"
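The character formats above can be illustrated with a small base-R parser. This is a sketch, not ggsurveillance's own parsing code: `parse_epi_date()` is a hypothetical helper. It assumes that each value maps to the first day it covers: the 1st of the month, or the given ISO weekday of the week (Monday if no weekday is given).

```r
# Hypothetical helper (not part of ggsurveillance): convert the character
# formats listed above to Date. Unrecognised values become NA.
parse_epi_date <- function(x) {
  out <- as.Date(rep(NA_character_, length(x)))

  # ISO dates "YYYY-MM-DD"
  is_day <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  out[is_day] <- as.Date(x[is_day])

  # Months "YYYY-MM" -> first day of the month
  is_month <- grepl("^[0-9]{4}-[0-9]{2}$", x)
  out[is_month] <- as.Date(paste0(x[is_month], "-01"))

  # ISO weeks "YYYY-Www" or "YYYY-Www-d" (d = ISO weekday, 1 = Monday)
  is_week <- grepl("^[0-9]{4}-W[0-9]{2}(-[1-7])?$", x)
  if (any(is_week)) {
    w <- x[is_week]
    year <- as.integer(substr(w, 1, 4))
    week <- as.integer(substr(w, 7, 8))
    day <- rep(1L, length(w))                 # default: Monday
    has_day <- nchar(w) == 10
    day[has_day] <- as.integer(substr(w[has_day], 10, 10))

    # ISO week 1 is the week that contains 4 January
    jan4 <- as.Date(paste0(year, "-01-04"))
    jan4_wday <- (as.POSIXlt(jan4)$wday + 6) %% 7 + 1  # 1 = Monday ... 7 = Sunday
    week1_monday <- jan4 - (jan4_wday - 1)
    out[is_week] <- week1_monday + 7 * (week - 1) + (day - 1)
  }
  out
}

parse_epi_date(c("2024-03-09", "2024-03", "2024-W09", "2024-W09-3"))
```

This call should return 2024-03-09, 2024-03-01, 2024-02-26 and 2024-02-28. Week-based strings are resolved through 4 January, so "2020-W53" correctly maps to 2020-12-28.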

dates_from

Column name containing the dates to bin. Used when x is a data.frame.

n

Numeric column with case counts (or weights). Supports quoted and unquoted column names. Defaults to 1, so each row counts as one case.

population

A number or a numeric column with the population size. Used to calculate the incidence.

fill_gaps

Logical; if TRUE, gaps in the time series are filled with 0 cases, which ensures a complete time series without missing periods. Defaults to FALSE.
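To illustrate what fill_gaps = TRUE does, the sketch below completes a weekly series in base R. `fill_week_gaps()` and its `week`/`n` columns are hypothetical. The sketch assumes that the dates are already binned to week starts and handles a single, ungrouped series only.

```r
# Weekly counts with one missing week (no cases in the week of 2024-12-16)
weekly <- data.frame(
  week = as.Date(c("2024-12-09", "2024-12-23", "2024-12-30")),
  n = c(4, 2, 7)
)

# Hypothetical helper: build the complete weekly sequence, join the observed
# counts onto it and set weeks without observations to 0
fill_week_gaps <- function(df) {
  all_weeks <- data.frame(week = seq(min(df$week), max(df$week), by = "week"))
  out <- merge(all_weeks, df, by = "week", all.x = TRUE)
  out$n[is.na(out$n)] <- 0
  out
}

fill_week_gaps(weekly)
```

The result has four rows (2024-12-09 to 2024-12-30) with n = 4, 0, 2, 7. Without gap filling, an epicurve would simply skip the empty week instead of showing a zero.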

date_resolution

Character string specifying the time unit for date aggregation. Possible values include: "hour", "day", "week", "month", "bimonth", "season", "quarter", "halfyear", "year". Special values:

  • "isoweek": ISO week standard (week starts Monday, week_start = 1)

  • "epiweek": US CDC epiweek standard (week starts Sunday, week_start = 7)

  • "isoyear": ISO year (the year to which the ISO week belongs; can differ from the calendar year for up to 3 days around New Year)

  • "epiyear": Epidemiological year (the year to which the epiweek belongs; can differ from the calendar year for up to 3 days around New Year)

Defaults to "week".
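The isoyear and epiyear rules can be checked with a short sketch. An ISO week (Monday start) belongs to the year that contains its Thursday, and a CDC epiweek (Sunday start) belongs to the year that contains its Wednesday. Both rules follow from week 1 being the first week with at least four days in the new year. `week_year()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: the year that a date's week belongs to
week_year <- function(date, type = c("isoyear", "epiyear")) {
  type <- match.arg(type)
  wday <- as.POSIXlt(date)$wday  # 0 = Sunday ... 6 = Saturday
  if (type == "isoyear") {
    # Monday of the ISO week, plus 3 days = Thursday
    thursday <- date - (wday + 6) %% 7 + 3
    as.integer(format(thursday, "%Y"))
  } else {
    # Sunday of the epiweek, plus 3 days = Wednesday
    wednesday <- date - wday + 3
    as.integer(format(wednesday, "%Y"))
  }
}

week_year(as.Date("2021-01-01"))             # ISO week 2020-W53
week_year(as.Date("2024-12-29"), "epiyear")  # first epiweek of 2025
```

For example, 1 January 2021 falls in ISO week 2020-W53, so its isoyear is 2020. Sunday 29 December 2024 starts the first epiweek of 2025, so its epiyear is 2025.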

week_start

Integer specifying the start of the week (1 = Monday, 7 = Sunday). Only used when date_resolution involves weeks. Defaults to 1 (Monday). Overridden when date_resolution is "isoweek" (always 1) or "epiweek" (always 7).
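The effect of week_start on binning can be sketched in base R. `floor_week()` is hypothetical. The package itself uses lubridate::floor_date(), whose `week_start` argument follows the same 1 = Monday ... 7 = Sunday convention.

```r
# Hypothetical helper: floor dates to the start of their week.
# week_start: 1 = Monday ... 7 = Sunday
floor_week <- function(date, week_start = 1) {
  iso_wday <- (as.POSIXlt(date)$wday + 6) %% 7 + 1  # 1 = Monday ... 7 = Sunday
  # Days elapsed since the most recent week start (0 to 6)
  date - (iso_wday - week_start) %% 7
}

d <- as.Date("2024-12-10")     # a Tuesday
floor_week(d, week_start = 1)  # 2024-12-09, a Monday (isoweek convention)
floor_week(d, week_start = 7)  # 2024-12-08, a Sunday (epiweek convention)
```

The same case therefore lands in a different bin depending on week_start, which is why "isoweek" and "epiweek" fix it.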

.groups

See dplyr::summarise().

Details

The function performs several key operations:

  1. Date coercion: Converts the date column to proper Date format

  2. Gap filling (optional): Generates complete temporal sequences to fill missing time periods with zeros

  3. Date binning: Rounds dates to the specified resolution using lubridate::floor_date()

  4. Weight and population handling: Processes count weights and population denominators

  5. Aggregation: Groups by binned dates and sums weights to get counts and incidence
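The steps above can be sketched for weekly bins in base R. This is an illustrative approximation, not the package's implementation. `bin_weekly_sketch()` is hypothetical, weeks are assumed to start on Monday, `population` is assumed to be a single number, and gap filling (step 2) is omitted.

```r
# Hypothetical base-R approximation of the binning pipeline (weekly, Monday start)
bin_weekly_sketch <- function(dates, n = 1, population = 1) {
  # 1. Date coercion
  dates <- as.Date(dates)
  # 3. Date binning: floor each date to the Monday of its week
  week <- dates - (as.POSIXlt(dates)$wday + 6) %% 7
  # 4. Weights: recycle a scalar weight to one value per observation
  n <- rep_len(n, length(dates))
  # 5. Aggregation: sum the weights per week, then divide by the population
  out <- aggregate(list(n = n), by = list(week = week), FUN = sum)
  out$incidence <- out$n / population
  out
}

bin_weekly_sketch(
  c("2024-12-10", "2024-12-12", "2024-12-16", "2024-12-17"),
  n = c(2, 1, 3, 1),
  population = 1000
)
```

Here the weeks starting 2024-12-09 and 2024-12-16 get n = 3 and n = 4, so their incidence values are 0.003 and 0.004.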

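Grouping behaviour: The function respects existing grouping in the input data frame. Dates are binned and counted separately within each group, and the grouping variables are kept in the output.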
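Grouped binning amounts to counting per combination of group and time bin. Below is a base-R sketch. The `region` column and the example data are hypothetical.

```r
# Hypothetical line list with a grouping column
cases <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  onset_date = as.Date(c("2024-12-10", "2024-12-16", "2024-12-10",
                         "2024-12-11", "2024-12-17"))
)

# Floor each onset date to the Monday of its week
cases$week <- cases$onset_date - (as.POSIXlt(cases$onset_date)$wday + 6) %% 7

# Count cases per (region, week) combination; regions are never pooled
counts <- aggregate(list(n = rep(1, nrow(cases))),
                    by = list(region = cases$region, week = cases$week),
                    FUN = sum)
counts
```

South has two cases in the week of 2024-12-09, and every other region-week combination has one. With ggsurveillance, the corresponding call should group the data with dplyr::group_by(region) before calling bin_by_date().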

Value

A data frame with the following columns:

  • A date column with the same name as dates_from, where values are binned to the start of the specified time period.

  • n: Count of observations (sum of weights) for each time period

  • incidence: Incidence rate calculated as n / population for each time period

  • Any existing grouping variables are preserved

Examples

library(dplyr)
library(ggsurveillance)

# Create sample data (fixed seed for reproducible output)
set.seed(1)
outbreak_data <- data.frame(
  onset_date = as.Date("2024-12-10") + sample(0:100, 50, replace = TRUE),
  cases = sample(1:5, 50, replace = TRUE)
)

# Basic weekly binning
bin_by_date(outbreak_data, dates_from = onset_date)

# Weekly binning with case weights
bin_by_date(outbreak_data, onset_date, n = cases)

# Monthly binning
bin_by_date(outbreak_data, onset_date,
  date_resolution = "month"
)

# ISO week binning (Monday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "isoweek"
) |>
  mutate(date_formatted = strftime(onset_date, "%G-W%V")) # Add ISO week labels, e.g. "2024-W50"

# US CDC epiweek binning (Sunday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "epiweek"
)

# With population data for incidence calculation
outbreak_data$population <- 10000
bin_by_date(outbreak_data, onset_date,
  n = cases,
  population = population
)

ggsurveillance documentation built on July 2, 2025, 5:09 p.m.