align_to_baseline: Align case tracking locations (for example) to a common...

View source: R/align_to_baseline.R

align_to_baseline    R Documentation

Align case tracking locations (for example) to a common baseline

Description

When comparing epidemic curves (cases vs. date, for example), particularly in graphical displays, it is helpful to set a "time baseline" so that all the curves start at the same point.

Usage

align_to_baseline(df, filter_criteria, date_column = "date", group_vars)

Arguments

df

data.frame that includes a date column and at least one other column for filtering, typically a case count.

filter_criteria

an expression as would normally be specified directly to dplyr::filter().

date_column

character(1) column name of the column for ordering the data to define a "beginning" of the curve. It is called a "date column", but anything with a natural ordering will likely work.

group_vars

optional character() column name(s) that specify the grouping applied before calculating minimum dates. Concretely, if the goal is to compare several countries, set group_vars='country', where country is a column in df.
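
To make the three argument styles concrete, here is a minimal sketch of how an unquoted filter expression, a column name given as a string, and a character vector of grouping columns could be forwarded to dplyr. The helper align_sketch() is hypothetical and is not the package's implementation; its NULL default for group_vars, the toy data, and the numeric week column (used to illustrate a non-date ordering) are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper, NOT the package's implementation. It only illustrates
# how the argument styles described above can be passed on to dplyr verbs.
align_sketch <- function(df, filter_criteria, date_column = "date",
                         group_vars = NULL) {
  # Unquoted expression, forwarded with the embrace operator
  out <- dplyr::filter(df, {{ filter_criteria }})
  # Character vector of column names, used for grouping (assumed optional)
  if (!is.null(group_vars)) {
    out <- dplyr::group_by(out, dplyr::across(dplyr::all_of(group_vars)))
  }
  # Column name as a string, looked up through the .data pronoun
  out %>%
    dplyr::mutate(index = .data[[date_column]] - min(.data[[date_column]])) %>%
    dplyr::ungroup()
}

# Toy data with an ordered, non-date column (week number)
toy <- data.frame(
  country = c("X", "X", "X", "Y", "Y", "Y"),
  week    = c(1, 2, 3, 1, 2, 3),
  cases   = c(150, 300, 600, 20, 90, 110)
)

res <- align_sketch(toy, cases > 100, date_column = "week",
                    group_vars = "country")
res
```

In this toy case, country X keeps all three weeks and country Y keeps only week 3, and each group's first remaining row gets index 0.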

Details

This function takes this basic approach:

  1. Filter all data using the filter_criteria, expressed as a dplyr::filter() expression.

  2. Optionally group the dataset.

  3. Find the minimum date left after applying the filter criteria.

  4. "Subtract" the minimum date (on a per group basis if grouping columns are used).

When the result is plotted, all the curves are shifted to start at the "same" starting time with respect to the "start" of the pandemic. For example, in the COVID-19 pandemic, the outbreak in China started much earlier than in the rest of the world. To compare the time course in China with that in other countries, setting time zero to the point where each country first reached 100 cases allows direct comparison of the shapes of the countries' curves.
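
The four steps can also be written out directly with dplyr on a small, made-up dataset. This is a sketch of the approach described above, not the function's source code: the two locations, their counts, and the choice to store index as a plain number of days are assumptions for illustration.

```r
library(dplyr)

# Made-up data: two locations whose counts pass 100 on different calendar dates
toy <- data.frame(
  location_name = rep(c("A", "B"), each = 5),
  date  = rep(as.Date("2020-03-01") + 0:4, times = 2),
  count = c(50, 120, 200, 350, 500,   # A first exceeds 100 on 2020-03-02
            10,  20,  80, 150, 300)   # B first exceeds 100 on 2020-03-04
)

aligned <- toy %>%
  dplyr::filter(count > 100) %>%                            # step 1: filter
  dplyr::group_by(location_name) %>%                        # step 2: group
  dplyr::mutate(index = as.numeric(date - min(date))) %>%   # steps 3 and 4
  dplyr::ungroup()

aligned
```

Although A and B cross the threshold two days apart, both start at index 0 after alignment, so their curves can be overlaid on a common time axis.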

Value

A data.frame with a new column, index, that gives the number of time intervals (typically days) from when the baseline counts are first encountered, done by group.

Author(s)

Sean Davis <seandavi@gmail.com>

See Also

Other case-tracking: beoutbreakprepared_data(), bulk_estimate_Rt(), combined_us_cases_data(), coronadatascraper_data(), covidtracker_data(), ecdc_data(), estimate_Rt(), jhu_data(), nytimes_county_data(), owid_data(), plot_epicurve(), test_and_trace_data(), usa_facts_data(), who_cases()

Other plotting: plot_epicurve()

Examples

library(dplyr)
library(ggplot2)

# use European CDC dataset
ecdc = ecdc_data()
head(ecdc)
dplyr::glimpse(ecdc)

# get top 10 countries by cumulative
# number of deaths
top_10 = ecdc %>%
    dplyr::filter(subset=='deaths_weekly') %>%
    dplyr::group_by(location_name) %>%
    dplyr::summarize(deaths = max(count)) %>%
    dplyr::arrange(dplyr::desc(deaths)) %>%
    head(10)

top_10

# limit ecdc data to "deaths" and
# top 10 countries

ecdc_top10 = ecdc %>%
    dplyr::filter(location_name %in% top_10[['location_name']] & subset=='deaths_weekly')
plot_epicurve(ecdc_top10, color='location_name', case_column='count')

ecdc_top10_baseline = align_to_baseline(ecdc_top10, count>100, group_vars='location_name')

plot_epicurve(ecdc_top10_baseline, date_column='index', color='location_name') +
    ggtitle('Deaths over time, aligned to date of 100 deaths per country')


seandavi/sars2pack documentation built on May 13, 2022, 3:41 p.m.