The {epiprocess} package works with epidemiological time series data and
provides tools to manage, analyze, and process the data in preparation for
modeling. It is designed to work in tandem with {epipredict}, which provides
pre-built epiforecasting models as well as tools to build custom models.
Both packages are designed to lower the barrier to entry and implementation cost
for epidemiological time series analysis and forecasting.
{epiprocess} contains:

- epi_df() and epi_archive(), two data frame classes (that work like a
  {tibble} with {dplyr} verbs) for working with epidemiological time series
  data
  - epi_df is for working with a snapshot of data at a single point in time
  - epi_archive is for working with histories of data that changes over time
  - epi_archive also supports accurate backtesting of forecasting models; see
    vignette("backtesting", package="epipredict")
- epi_slide() for sliding window operations (aids with feature creation)
- epix_slide() for sliding window operations on archives (aids with
  backtesting)
- growth_rate() for computing growth rates
- detect_outlr() for outlier detection
- epi_cor() for computing correlations

If you are new to this set of tools, you may be interested in learning through a book format: Introduction to Epidemiological Forecasting.
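
To make the snapshot-versus-history distinction in the list above concrete, here is a minimal sketch that can be run once the package is installed. The data are made up for illustration, and the object names (`snapshot`, `history`, `toy_edf`, `toy_archive`) are hypothetical. It assumes that `as_epi_archive()` accepts a tibble with `geo_value`, `time_value`, and `version` columns and that `epix_as_of()` takes the archive followed by a version date.

```r
library(epiprocess)
library(tibble)

# Toy data (made up): one location, two days, as known on a single date.
snapshot <- tibble(
  geo_value = "ca",
  time_value = as.Date(c("2020-03-01", "2020-03-02")),
  cases = c(10, 15)
)

# An epi_df holds the data as it was known at one `as_of` date.
toy_edf <- as_epi_df(snapshot, as_of = as.Date("2020-03-03"))

# An epi_archive holds every reported version. Here the count for
# 2020-03-01 is first reported as 8, then revised to 10 one day later.
history <- tibble(
  geo_value = "ca",
  time_value = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02")),
  version = as.Date(c("2020-03-02", "2020-03-03", "2020-03-03")),
  cases = c(8, 10, 15)
)
toy_archive <- as_epi_archive(history)

# Recover the snapshot that was available on 2020-03-02 (before the revision).
early <- epix_as_of(toy_archive, as.Date("2020-03-02"))
early
```

In this sketch, `early` should contain only the unrevised first report for 2020-03-01, which is what a forecaster would have seen on that date. That is the property backtesting relies on.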
You may also be interested in:
- {epidatr}, for accessing a wide range of epidemiological data sets, including
  COVID-19 data, flu data, and more.

This package is provided by the Delphi group at Carnegie Mellon University.
To install:
    # Stable version
    pak::pkg_install("cmu-delphi/epiprocess@main")

    # Dev version
    pak::pkg_install("cmu-delphi/epiprocess@dev")
The package is not yet on CRAN.
Once epiprocess and epidatr are installed, you can use the following code to
get started:

    library(epiprocess)
    library(epidatr)
    library(dplyr)
    library(magrittr)
Get COVID-19 confirmed cumulative case data from JHU CSSE for California, Florida, New York, and Texas, from March 1, 2020 to January 31, 2022:

    df <- pub_covidcast(
      source = "jhu-csse",
      signals = "confirmed_cumulative_num",
      geo_type = "state",
      time_type = "day",
      geo_values = "ca,fl,ny,tx",
      time_values = epirange(20200301, 20220131),
      as_of = as.Date("2024-01-01")
    ) %>%
      select(geo_value, time_value, cases_cumulative = value)
    df
Convert the data to an epi_df object and sort by geo_value and time_value. You
can work with an epi_df like you can with a {tibble} by using {dplyr} verbs:

    edf <- df %>%
      as_epi_df(as_of = as.Date("2024-01-01")) %>%
      arrange_canonical() %>%
      group_by(geo_value) %>%
      mutate(cases_daily = cases_cumulative - lag(cases_cumulative, default = 0))
    edf
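
The `mutate()` step above turns cumulative counts into daily counts by differencing. A minimal sketch on a made-up vector shows the arithmetic. The numbers are invented for illustration and are not JHU CSSE data.

```r
library(dplyr)

# Made-up cumulative counts for a single location, one value per day.
cases_cumulative <- c(2, 5, 5, 9)

# lag() shifts the series by one position; `default = 0` fills the first
# slot, so the first daily value equals the first cumulative value.
cases_daily <- cases_cumulative - dplyr::lag(cases_cumulative, default = 0)
cases_daily
#> [1] 2 3 0 4

# Base R equivalent: difference the series after prepending a zero.
cases_daily_base <- diff(c(0, cases_cumulative))
```

Grouping by geo_value before the `mutate()` matters here: without it, the lag for the first row of one state would pick up the last cumulative value of the previous state.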
Compute the 7-day moving average of the confirmed daily cases for each geo_value:

    edf <- edf %>%
      group_by(geo_value) %>%
      epi_slide_mean(cases_daily, .window_size = 7, na.rm = TRUE) %>%
      rename(smoothed_cases_daily = slide_value_cases_daily)
    edf
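
To show what a 7-day moving average computes, here is a base R sketch on made-up numbers. It is not the implementation used by `epi_slide_mean()`. It assumes a trailing window that ends on the current day, one observation per consecutive day, and partial windows at the start of the series that average over whatever days are available. Check the `epi_slide_mean()` documentation for the actual alignment and edge handling.

```r
# Made-up daily counts for one location, one value per consecutive day.
cases_daily <- c(7, 14, 21, 28, 35, 42, 49, 56)

window_size <- 7

# For each day, average that day and up to six preceding days.
trailing_mean <- vapply(
  seq_along(cases_daily),
  function(i) {
    start <- max(1, i - window_size + 1)
    mean(cases_daily[start:i], na.rm = TRUE)
  },
  numeric(1)
)
trailing_mean
```

With these numbers the seventh value is 28 (the mean of the first seven days) and the eighth is 35 (the mean of days two through eight), so the window slides forward by one day at each step.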
Autoplot the smoothed confirmed daily cases for each geo_value:

    edf %>%
      autoplot(smoothed_cases_daily)