knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)

This document provides an under-the-hood walk-through of the steps used to produce an output for the confirmed case metrics for ingestion into Tableau using the functions in the {ohiCovidMetrics} package. These functions were written up to the specifications in the Confirmed Case Metric Reproducibility Document on the OHI Surveillance Team's SharePoint site.

library(tidyverse)
library(ohiCovidMetrics)
library(readxl)

Pulling the data

Eventually, this package will contain multiple ways to pull the data depending on the user's level of access. In this vignette, we will grab data from the public facing COVID-19 Historical Data Table via an API call with the pull_histTable() function which takes no arguments. Note, I stopped the time series at 17 June 2020 to be able to compare the results to the official file prepared by Jeff Bond for the dashboard.

hdt_clean <- pull_histTable(end_date = "2020-09-17") 

This function does six things:

  1. Imports data from the REST API https://dhsgis.wi.gov/server/rest/services/DHS_COVID19/COVID19_WI/FeatureServer/10/query?where=GEO%20%3D%20%27COUNTY%27&outFields=GEOID,GEO,NAME,DATE,NEGATIVE,POSITIVE,DEATHS,TEST_NEW,POS_NEW,DTH_NEW&outSR=4326&f=geojson using the st_read() function from the {sf} package.
  2. Some basic data wrangling to convert the date into the correct format and create the daily new confirmed cases time series.[^1]
  3. Recursively corrects any reversals in the confirmed cases time series using the clean_reversals() function. Note: this is done at the county level and then aggregated up to the State level.
  4. Aggregates county data up to the HERC region and state levels
  5. Appends population data
  6. Finally, it produces a cumulative case time-series based on the cleaned version.

[^1]: It pulls down the new deaths and tests time series as well for more context.

The result is a data.frame where there is one row per county/HERC region/state per day.

Preparing the data to calculate metrics

The confirmed case metrics are all based on 7 day bins looking back from the current date, e.g., For data ending on 10 June 2020, the 7 day bins are:

and so on. To make things easier, we need to aggregate the daily counts within geographies, or regions, and 7 day bins. This is done with the internal shape_case_data() function that is called by pull_histTable().

This function:

  1. Calculates the 7 day bins based on the mosts recent date in the case time-series data.frame using rolling_week().
  2. Aggregates the case count within these bins for each region.
  3. Keeps the two most recent 7 day bins and reshapes the data.frame to 'wide' format so that there is one row per region and one column per case counts in the current and previous 7-day case count.

Thus the data are in a format that is ready for Tableau ingestion and that facilitates the calculations of each metric in its own column.

Calculating Metrics

Finally, we can calculate the metrics. For production purposes, I have wrapped all of these calculations into the process_confirmed_cases() function, which takes the cleaned case time-series data as it's only argument:

hdt_out <- process_confirmed_cases(hdt_clean)

Since this is the meat of the analysis, I want to go through them each, step-by-step.

Burden metrics

See the top panel of Table 1 in the Confirmed Case Metric Reproducibility Document for the definitions. Following the CDC's State Indicator Report, we calculate Burden based on the number of confirmed cases over the past 14 days per 100,000 population in the region. The numerical value of burden is calculated with the score_burden() function and mapped onto the 4 categories using class_burden() as shown below.

burden <- score_burden(curr = hdt_clean$case_weekly_1, 
                       prev = hdt_clean$case_weekly_2,
                       pop  = hdt_clean$pop_2020)

burden_c <- class_burden(burden)

Trajectory metrics

The definitions for the confirmed case metrics are shown in the middle panel of Table 1 and the methods are described in the text that follows it. In short, we want to pay attention both to the statistical significance of the changing case counts as well as the substantive significance, or magnitude, of the change. Thus, we calculate the trajectory as the ratio of cases in the current 7 day period to the previous 7 day period using the score_trajectory() function. We calculate the statistical significance using a two-sided exact test for equality of these two counts via the poisson.test() function from the {stats} package with the pval_trajectory() function. Once we have these two quantities, we can classify the trajectory using class_trajectory()

The final trajectory based metric that we calculate is the Benjemini and Hochman's False Discovery Rate (with the p.adjust(method = 'fdr') function from the {stats} package). This is of most use to us at DHS since by calculating 80 p-values we should expect 4 statistically significant trajectories to occur each week just by chance.

trajectory <- score_trajectory(curr = hdt_clean$case_weekly_1, 
                               prev = hdt_clean$case_weekly_2)

trajectory_p <- pval_trajectory(curr = hdt_clean$case_weekly_1, 
                                prev = hdt_clean$case_weekly_2)

trajectory_c <- class_trajectory(traj = trajectory,
                                 pval = trajectory_p)

trajectory_f <- fdr_trajectory(pval = trajectory_p)

Composite (Summary) confirmed case metric

The bottom panel of Table 1 illustrates how we summarize a regions confirmed case burden and trajectory into a single composite status. This status is based on the cross-classification of the burden classification and the trajectory classification and takes on one of the following three values: Low (light blue), Moderate (blue), High (dark blue). This cross-classification is done with the confirmed_case_composite() function.

composite <- confirmed_case_composite(traj_class = trajectory_c,
                                      burd_class = burden_c)

Final cleaning and renaming for output to .csv

The process_confirmed_cases() function also does some final cleaning: expressing trajectory in percentage point changes, rounding burden and trajectory, renaming columns, and reordering columns to fit the Tableau team's requirements.

The final product is:

knitr::kable(hdt_out, 
             caption = "Confirmed case metrics output for the period ending 17 June 2020")


carlbfrederick/ohiCovidMetrics documentation built on Jan. 10, 2022, 12:20 p.m.