```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
```
This document provides an under-the-hood walk-through of the steps used to
produce an output of the confirmed case metrics for ingestion into Tableau
using the functions in the {ohiCovidMetrics} package. These functions were
written to the specifications in the Confirmed Case Metric Reproducibility
Document on the OHI Surveillance Team's SharePoint site.
```r
library(tidyverse)
library(ohiCovidMetrics)
library(readxl)
```
Eventually, this package will contain multiple ways to pull the data depending
on the user's level of access. In this vignette, we grab data from the
public-facing COVID-19 Historical Data Table via an API call with the
pull_histTable() function, which takes an optional end_date argument. Note
that I stopped the time series at 17 June 2020 to be able to compare the
results to the official file prepared by Jeff Bond for the dashboard.
```r
hdt_clean <- pull_histTable(end_date = "2020-06-17")
```
This function does six things[^1], among them:

- pulling the data down via the st_read() function from the {sf} package
- cleaning up any count reversals with the clean_reversals() function (note:
  this is done at the county level and then aggregated up to the State level)

[^1]: It pulls down the new deaths and tests time series as well for more context.
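To illustrate what cleaning a count reversal can look like (a sketch under my own assumptions; clean_reversals() may use a different algorithm), here is one simple way to absorb a negative daily count into the preceding days so that the series total is preserved:

```r
# Hypothetical reversal smoother (NOT the package's clean_reversals()):
# zero out a negative daily count and subtract the deficit from the
# most recent preceding days, preserving the series total.
smooth_reversal <- function(daily) {
  for (i in seq_along(daily)[-1]) {
    if (daily[i] < 0) {
      deficit  <- -daily[i]
      daily[i] <- 0
      j <- i - 1
      while (deficit > 0 && j >= 1) {
        take     <- min(max(daily[j], 0), deficit)
        daily[j] <- daily[j] - take
        deficit  <- deficit - take
        j <- j - 1
      }
    }
  }
  daily
}

smooth_reversal(c(5, 3, -2, 4))  # c(5, 1, 0, 4); total stays 10
```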
The result is a data.frame where there is one row per county/HERC region/state per day.
The confirmed case metrics are all based on 7-day bins looking back from the
current date. For example, for data ending on 10 June 2020, the 7-day bins are
4-10 June, 28 May-3 June, 21-27 May, and so on. To make things easier, we need
to aggregate the daily counts within geographies, or regions, and 7-day bins.
This is done with the internal shape_case_data() function that is called by
pull_histTable().
This function assigns each day to its 7-day bin with the rolling_week()
helper. Thus the data are in a format that is ready for Tableau ingestion and
that facilitates the calculation of each metric in its own column.
Finally, we can calculate the metrics. For production purposes, I have
wrapped all of these calculations into the process_confirmed_cases()
function, which takes the cleaned case time-series data as its only
argument:
```r
hdt_out <- process_confirmed_cases(hdt_clean)
```
Since this is the meat of the analysis, I want to go through each calculation, step by step.
See the top panel of Table 1 in the Confirmed Case Metric Reproducibility
Document for the definitions. Following the CDC's State Indicator Report, we
calculate Burden based on the number of confirmed cases over the past 14 days
per 100,000 population in the region. The numerical value of burden is
calculated with the score_burden() function and mapped onto the 4 categories
using class_burden(), as shown below.
```r
burden   <- score_burden(curr = hdt_clean$case_weekly_1,
                         prev = hdt_clean$case_weekly_2,
                         pop  = hdt_clean$pop_2020)
burden_c <- class_burden(burden)
```
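The burden arithmetic itself is simple: the 14-day case count (the two most recent 7-day bins) per 100,000 population. A minimal sketch of that formula, with burden_sketch() as a hypothetical stand-in for score_burden():

```r
# Sketch of the burden formula described in the text; burden_sketch()
# is a hypothetical stand-in, not the package's score_burden().
burden_sketch <- function(curr, prev, pop) {
  (curr + prev) / pop * 100000
}

burden_sketch(curr = 70, prev = 50, pop = 120000)  # 120 cases per 120,000 people -> 100
```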
The definitions for the confirmed case metrics are shown in the middle panel
of Table 1 and the methods are described in the text that follows it. In short,
we want to pay attention both to the statistical significance of the changing
case counts and to the substantive significance, or magnitude, of the change.
Thus, we calculate the trajectory as the ratio of cases in the current 7-day
period to the previous 7-day period using the score_trajectory() function. We
calculate the statistical significance using a two-sided exact test for
equality of these two counts via the poisson.test() function from the {stats}
package with the pval_trajectory() function. Once we have these two
quantities, we can classify the trajectory using class_trajectory().
The final trajectory-based metric that we calculate is Benjamini and
Hochberg's False Discovery Rate (with the p.adjust(method = 'fdr') function
from the {stats} package). This is of most use to us at DHS since, by
calculating 80 p-values, we should expect 4 statistically significant
trajectories to occur each week just by chance.
```r
trajectory   <- score_trajectory(curr = hdt_clean$case_weekly_1,
                                 prev = hdt_clean$case_weekly_2)
trajectory_p <- pval_trajectory(curr = hdt_clean$case_weekly_1,
                                prev = hdt_clean$case_weekly_2)
trajectory_c <- class_trajectory(traj = trajectory, pval = trajectory_p)
trajectory_f <- fdr_trajectory(pval = trajectory_p)
```
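The underlying base-R pieces can be sketched directly: stats::poisson.test() with two counts performs the two-sided exact rate comparison, and stats::p.adjust() applies the Benjamini-Hochberg correction. (The toy counts below are illustrative only, not real data.)

```r
# Illustrative weekly counts for three regions (not real data).
curr <- c(120, 80, 45)
prev <- c(100, 95, 20)

# Trajectory ratio: current week over previous week.
traj <- curr / prev

# Two-sided exact Poisson test comparing the two weekly counts,
# with equal observation times for both weeks.
pvals <- mapply(function(x, y) {
  stats::poisson.test(c(x, y), T = c(1, 1))$p.value
}, curr, prev)

# Benjamini-Hochberg false discovery rate adjustment across regions.
pvals_fdr <- stats::p.adjust(pvals, method = "fdr")
```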
The bottom panel of Table 1 illustrates how we summarize a region's
confirmed case burden and trajectory into a single composite status.
This status is based on the cross-classification of the burden classification
and the trajectory classification and takes on one of the following three
values: Low (light blue), Moderate (blue), or High (dark blue). This
cross-classification is done with the confirmed_case_composite() function.
```r
composite <- confirmed_case_composite(traj_class = trajectory_c,
                                      burd_class = burden_c)
```
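Mechanically, the cross-classification amounts to a lookup by (burden class, trajectory class). The sketch below shows the idea with a hypothetical matrix; the actual cell values and category labels come from Table 1, not from this example.

```r
# Hypothetical lookup matrix: the cell values and row/column labels
# here are placeholders illustrating cross-classification, not the
# actual mapping defined in Table 1.
composite_lookup <- matrix(
  c("Low",      "Moderate", "High",
    "Moderate", "Moderate", "High",
    "Moderate", "High",     "High"),
  nrow = 3, byrow = TRUE,
  dimnames = list(burden     = c("Low", "Moderate", "High"),
                  trajectory = c("Shrinking", "No change", "Growing"))
)

composite_lookup["Moderate", "Growing"]  # "High" in this toy mapping
```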
The process_confirmed_cases() function also does some final cleaning:
expressing trajectory in percentage-point changes, rounding burden and
trajectory, renaming columns, and reordering columns to fit the Tableau
team's requirements.
The final product is:
```r
knitr::kable(hdt_out,
             caption = "Confirmed case metrics output for the period ending 17 June 2020")
```