bin_by_date: Aggregate data by time periods
In ggsurveillance: Tools for Outbreak Investigation/Infectious Disease Surveillance

Description

Aggregates data by specified time periods (e.g., weeks, months) and calculates (weighted) counts. Incidence rates are also calculated using the provided population numbers.

This function is the core date binning engine used by geom_epicurve() and stat_bin_date() for creating epidemiological time series visualizations.

Usage

bin_by_date(
  x,
  dates_from,
  n = 1,
  population = 1,
  fill_gaps = FALSE,
  date_resolution = "week",
  week_start = 1,
  .groups = "drop"
)

Arguments

`x`	Either a data frame with a date column, or a date vector. Supported date formats are `date` and `datetime`, as well as commonly used character strings: ISO dates (`"2024-03-09"`), months (`"2024-03"`), and weeks (`"2024-W09"` or `"2024-W09-1"`).
`dates_from`	Column name containing the dates to bin. Used when x is a data.frame.
`n`	Numeric column with case counts (or weights). Supports quoted and unquoted column names.
`population`	A number or a numeric column with the population size. Used to calculate the incidence.
`fill_gaps`	Logical; if `TRUE`, gaps in the time series are filled with 0 cases, which ensures a complete time series without missing periods. Defaults to `FALSE`.
`date_resolution`	Character string specifying the time unit for date aggregation. Defaults to `"week"`. Possible values include: `"hour"`, `"day"`, `"week"`, `"month"`, `"bimonth"`, `"season"`, `"quarter"`, `"halfyear"`, `"year"`. Special values:
`"isoweek"`: ISO week standard (week starts Monday, `week_start = 1`)
`"epiweek"`: US CDC epiweek standard (week starts Sunday, `week_start = 7`)
`"isoyear"`: ISO year (the year the ISO week belongs to; can differ from the calendar year by up to 3 days at the year boundaries)
`"epiyear"`: Epidemiological year (the year the epiweek belongs to; can differ from the calendar year by up to 3 days at the year boundaries)
`week_start`	Integer specifying the start of the week (1 = Monday, 7 = Sunday). Only used when `date_resolution` involves weeks. Defaults to 1 (Monday). Overridden by `"isoweek"` (1) and `"epiweek"` (7) settings.
`.groups`	See `dplyr::summarise()`.
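
To illustrate how `week_start` changes the bin a date falls into, here is a minimal base R sketch. The helper `floor_to_week()` is hypothetical and not part of ggsurveillance; it only mimics the Monday (`1`) versus Sunday (`7`) convention described above and makes no claim about how the package implements it.

```r
# Hypothetical helper (not part of ggsurveillance): floor a Date to the
# start of its week. week_start follows the convention 1 = Monday, 7 = Sunday.
floor_to_week <- function(dates, week_start = 1) {
  dates <- as.Date(dates)
  # "%u" gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday
  weekday <- as.integer(format(dates, "%u"))
  # Days elapsed since the most recent week start
  offset <- (weekday - week_start) %% 7
  dates - offset
}

d <- as.Date("2024-03-09")                     # a Saturday
monday_bin <- floor_to_week(d, week_start = 1) # "2024-03-04" (isoweek-style)
sunday_bin <- floor_to_week(d, week_start = 7) # "2024-03-03" (epiweek-style)
```

The same date therefore lands in different weekly bins depending on the convention, which is why `"isoweek"` and `"epiweek"` fix `week_start` to 1 and 7 respectively.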

Details

The function performs several key operations:

Date coercion: Converts the date column to proper Date format
Gap filling (optional): Generates complete temporal sequences to fill missing time periods with zeros
Date binning: Rounds dates to the specified resolution using lubridate::floor_date()
Weight and population handling: Processes count weights and population denominators
Aggregation: Groups by binned dates and sums weights to get counts and incidence
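
The operations above can be sketched roughly in base R. This is an illustration under stated assumptions, not the package's implementation: the toy data, the column names (`onset`, `n`, `week`) and the constant `population` are made up, weekly flooring is done by hand instead of with lubridate::floor_date(), and gap filling is applied after aggregation for simplicity.

```r
# Toy line list; all names and values are made up for illustration
cases <- data.frame(
  onset = c("2024-01-01", "2024-01-03", "2024-01-22"),
  n     = c(2, 1, 4)
)
population <- 1000  # assumed constant denominator

# 1. Date coercion: character strings to Date
cases$onset <- as.Date(cases$onset)

# 2. Date binning: floor each date to the Monday of its week
#    ("%u" is the ISO weekday number, 1 = Monday)
cases$week <- cases$onset - (as.integer(format(cases$onset, "%u")) - 1)

# 3. Aggregation: sum the weights within each weekly bin
binned <- aggregate(n ~ week, data = cases, FUN = sum)

# 4. Gap filling: add weeks without cases as zero counts
all_weeks <- data.frame(
  week = seq(min(binned$week), max(binned$week), by = "week")
)
binned <- merge(all_weeks, binned, all.x = TRUE)
binned$n[is.na(binned$n)] <- 0

# 5. Incidence: count divided by population
binned$incidence <- binned$n / population
binned
```

The result has one row per week between the first and last bin, including the two weeks with no reported cases.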

Grouping behaviour: The function respects existing grouping in the input data frame.

Value

A data frame with the following columns:

A date column with the same name as dates_from, where values are binned to the start of the specified time period.
n: Count of observations (sum of weights) for each time period
incidence: Incidence rate calculated as n / population for each time period
Any existing grouping variables are preserved

Examples

library(dplyr)

# Create sample data (seed set so the random sample is reproducible)
set.seed(42)
outbreak_data <- data.frame(
  onset_date = as.Date("2024-12-10") + sample(0:100, 50, replace = TRUE),
  cases = sample(1:5, 50, replace = TRUE)
)

# Basic weekly binning
bin_by_date(outbreak_data, dates_from = onset_date)

# Weekly binning with case weights
bin_by_date(outbreak_data, onset_date, n = cases)

# Monthly binning
bin_by_date(outbreak_data, onset_date,
  date_resolution = "month"
)

# ISO week binning (Monday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "isoweek"
) |>
  mutate(date_formatted = strftime(onset_date, "%G-W%V")) # Add ISO year-week labels, e.g. "2024-W50"

# US CDC epiweek binning (Sunday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "epiweek"
)

# With population data for incidence calculation
outbreak_data$population <- 10000
bin_by_date(outbreak_data, onset_date,
  n = cases,
  population = population
)

ggsurveillance documentation built on July 2, 2025, 5:09 p.m.