In lcrayton18/State_package_burdenandtrajectory: Used to calculate sub-state metrics relating to the COVID-19 pandemic

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)

This document provides an under-the-hood walk-through of the steps used to produce an output for the confirmed case metrics for ingestion into Tableau using the functions in the {ohiCovidMetrics} package. These functions were written up to the specifications in the Confirmed Case Metric Reproducibility Document on the OHI Surveillance Team's SharePoint site.

library(tidyverse)
library(ohiCovidMetrics)
library(readxl)

Pulling the data

Eventually, this package will contain multiple ways to pull the data depending on the user's level of access. In this vignette, we will grab data from the public facing COVID-19 Historical Data Table via an API call with the pull_histTable() function which takes no arguments. Note, I stopped the time series at 17 June 2020 to be able to comparet the results to the official file prepared by Jeff Bond for the dashboard.

hdt_raw <- pull_histTable(end_date = "2020-06-17")

This function does six things:

Imports data from the REST API https://services1.arcgis.com/ISZ89Z51ft1G16OK/ArcGIS/rest/services/COVID19_WI/FeatureServer/10/query?where=GEO%20%3D%20'COUNTY'&outFields=GEOID,GEO,NAME,LoadDttm,NEGATIVE,POSITIVE,DEATHS,TEST_NEW,POS_NEW,DTH_NEW&outSR=4326&f=geojson using the st_read() function from the {sf} package.
Some basic data wrangling to convert the date into the correct format and create the daily new confirmed cases time series.[^1]
Recursively corrects any reversals in the confirmed cases time series using the clean_reversals() function. Note: this is done at the county level and then aggregated up to the State level.
Aggregates county data up to the HERC region and State levelsd
Appends population data
Finally, it produces a cumulative case time-series based on the cleaned version.

[^1]: It pulls down the new deaths and tests time series as well for more context.

The result is a data.frame where there is one row per county/HERC region/state per day.

Preparing the data to calculate metrics

The confirmed case metrics are all based on 7 day bins looking back from the current date, e.g., For data ending on 10 June 2020, the 7 day bins are:

4 June 2020 through 10 June 2020,
28 May 2020 through 3 June 2020,
21 May 2020 through 27 May 2020,

and so on. To make things easier, we need to aggregate the daily counts within geographies, or regions, and 7 day bins. This can be done with the shape_case_data() function:

hdt_clean <- shape_case_data(hdt_raw)

This function:

Calculates the 7 day bins based on the mosts recent date in the case time-series data.frame using rolling_week().
Aggregates the case count within these bins for each region.
Keeps the two most recent 7 day bins and reshapes the data.frame to 'wide' format so that there is one row per region and one column per case counts in the current and previous 7-day case count.

Thus the data are in a format that is ready for Tableau ingestion and that facilitates the calculations of each metric in its own column.

Calculating Metrics

Finally, we can calculate the metrics. For production purposes, I have wrapped all of these calculations into the process_confirmed_cases() function, which takes the cleaned case time-series data as it's only argument:

hdt_out <- process_confirmed_cases(hdt_clean)

Since this is the meat of the analysis, I want to go through them each, step-by-step.

Burden metrics

See the top panel of Table 1 in the Confirmed Case Metric Reproducibility Document for the definitions. Following the CDC's State Indicator Report, we calculate Burden based on the number of confirmed cases over the past 14 days per 100,000 population in the region. The numerical value of burden is calculated with the score_burden() function and mapped onto the 4 categories using class_burden() as shown below.

burden <- score_burden(curr = hdt_clean$case_weekly_1, 
                       prev = hdt_clean$case_weekly_2,
                       pop  = hdt_clean$pop_2018)

burden_c <- class_burden(burden)

Trajectory metrics

The definitions for the confirmed case metrics are shown in the middle panel of Table 1 and the methods are described in the text that follows it. In short, we want to pay attention both to the statistical significance of the changing case counts as well as the substantive significance, or magnitude, of the change. Thus, we calculate the trajectory as the ratio of cases in the current 7 day period to the previous 7 day period using the score_trajectory() function. We calculate the statistical significance using a two-sided exact test for equality of these two counts via the poisson.test() function from the {stats} package with the pval_trajectory() function. Once we have these two quantities, we can classify the trajectory using class_trajectory()

The final trajectory based metric that we calculate is the Benjemini and Hochman's False Discovery Rate (with the p.adjust(method = 'fdr') function from the {stats} package). This is of most use to us at DHS since by calculating 80 p-values we should expect 4 statistically significant trajectories to occur each week just by chance.

trajectory <- score_trajectory(curr = hdt_clean$case_weekly_1, 
                               prev = hdt_clean$case_weekly_2)

trajectory_p <- pval_trajectory(curr = hdt_clean$case_weekly_1, 
                                prev = hdt_clean$case_weekly_2)

trajectory_c <- class_trajectory(traj = trajectory,
                                 pval = trajectory_p)

trajectory_f <- fdr_trajectory(pval = trajectory_p)

Composite (Summary) confirmed case metric

The bottom panel of Table 1 illustrates how we summarize a regions confirmed case burden and trajectory into a single composite status. This status is based on the cross-classification of the burden classification and the trajectory classification and takes on one of the following three values: Low (light blue), Moderate (blue), High (dark blue). This cross-classification is done with the confirmed_case_composite() function.

composite <- confirmed_case_composite(traj_class = trajectory_c,
                                      burd_class = burden_c)

Final cleaning and renaming for output to .csv

The process_confirmed_cases() function also does some final cleaning: expressing trajectory in percentage point changes, rounding burden and trajectory, renaming columns, and reordering columns to fit the Tableau team's requirements.

The final product is:

knitr::kable(hdt_out, 
             caption = "Confirmed case metrics output for the period ending 17 June 2020")

Comparing against Jeff's calculations

The real test is to see how the estimates in these functions compare to the ones that Jeff produced, For reference, I am using the 17 June 2020 file that Jeff produced (link here). First we will pull it in and then compare the our results to his. Note: I am not comparing the Count or Rate columns because the Tableau team just removes them.

Note: I have updated the output file to meet the Tableau Team's Specifications

metrics_jeff <- read_xlsx("../inst/file_for_vignette/OHI_BBB_Surveillance_CaseMetric_20200617.xlsx") %>% 
  mutate(
    Region = ifelse(Region == "Saint Croix", "St. Croix", Region),
    Trajectory_Class = ifelse(Trajectory_Class == "No statistically significant change", "No significant change", Trajectory_Class),
    Composite_Class = ifelse(Composite_Class == "Moderate", "Medium", Composite_Class)
  )

#Identical Date
identical(as.Date(unique(metrics_jeff$Date)), unique(hdt_out$Date))

#Identical Region_ID
identical(sort(unique(metrics_jeff$Region_ID)), sort(unique(hdt_out$Region_ID)))

#Identical Region
identical(sort(unique(metrics_jeff$Region)), sort(unique(hdt_out$Region)))

#Burden
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Burden = Burden),
           select(hdt_out, Region_ID, Region, My_Burden = Burden),
           by = c("Region_ID", "Region")) %>% 
  mutate(diff = Jeff_Burden - My_Burden) %>% 
  filter(diff != 0) 

#Trajectory
#re-create original trajectory calculation before cleaning
hdt_out$Trajectory_Orig <- signif(trajectory, 2)
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Trajectory = Trajectory),
           select(hdt_out, Region_ID, Region, My_Trajectory = Trajectory_Orig),
           by = c("Region_ID", "Region")) %>% 
  mutate(diff = Jeff_Trajectory - My_Trajectory) %>% 
  filter(diff != 0) 

#Burden_Class
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Burden_Class = Burden_Class),
           select(hdt_out, Region_ID, Region, My_Burden_Class = Burden_Class),
           by = c("Region_ID", "Region")) %>% 
  mutate(same = Jeff_Burden_Class == My_Burden_Class) %>% 
  filter(same == FALSE) 


#Trajectory_Class
#Change Jeff's labels to new wording
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Trajectory_Class = Trajectory_Class),
           select(hdt_out, Region_ID, Region, My_Trajectory_Class = Trajectory_Class),
           by = c("Region_ID", "Region")) %>% 
  mutate(same = Jeff_Trajectory_Class == My_Trajectory_Class) %>% 
  filter(same == FALSE) 

#Trajectory P-value
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Trajectory_P = Trajectory_P),
           select(hdt_out, Region_ID, Region, My_Trajectory_P = Trajectory_P),
           by = c("Region_ID", "Region")) %>% 
  mutate(diff = Jeff_Trajectory_P - My_Trajectory_P) %>% 
  filter(diff  > 1e-15) 


#Trajectory FDR
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Trajectory_FDR = Trajectory_FDR),
           select(hdt_out, Region_ID, Region, My_Trajectory_FDR = Trajectory_FDR),
           by = c("Region_ID", "Region")) %>% 
  mutate(diff = Jeff_Trajectory_FDR - My_Trajectory_FDR) %>% 
  filter(diff > 1e-15) 

#Composite
inner_join(select(metrics_jeff, Region_ID, Region, Jeff_Composite_Class = Composite_Class),
           select(hdt_out, Region_ID, Region, My_Composite_Class = Composite_Class),
           by = c("Region_ID", "Region")) %>% 
  mutate(diff = Jeff_Composite_Class != My_Composite_Class) %>% 
  filter(diff == TRUE)

Reverse CUSUM Charts

This information is not included in the Tableau reports, but are used to accumulate information from the entire time series and is especially useful for smaller counties or counties with smaller counts. Since these cases use case data from more than the past two weeks, we need to start with the raw data and re-aggregate it into weekly counts within regions.

last_date <- max(hdt_raw$post_date)

#Prep Data for CUSUM functions
hdt_cusum <- hdt_raw %>% 
  group_by(fips, geo_name) %>% 
  mutate(
    week_num = rolling_week(date_vector = post_date,
                               end_date = last_date),
  ) %>% 
  group_by(fips, geo_name, week_num) %>% 
  summarize(
    case_weekly = as.integer(sum(case_daily)),
    week_end = max(post_date),
    .groups = "keep"
  ) %>% 
  group_by(fips, geo_name) %>% 
  arrange(fips, geo_name, week_num) %>% 
  filter(!is.na(week_num)) %>%
  mutate(
    weeks_ago = as.integer(week_num - 1),
    S_N = cumsum(case_weekly) - case_weekly[weeks_ago == 0],
    S_N_z = S_N / weeks_ago - case_weekly[weeks_ago == 0],
    WC_W = case_weekly[weeks_ago == 0],
    lower_control_limit = map2_dbl(WC_W, weeks_ago, rev_cusum_lcl),
    upper_control_limit = map2_dbl(WC_W, weeks_ago, rev_cusum_ucl)
  )

plot_rev_cusum <- function(rev_cusum_df, region_name) {
  rev_cusum_df %>% 
    filter(geo_name == region_name, weeks_ago > 0) %>% 
      ggplot(aes(x = as.character(week_end), group = geo_name, 
                 y = S_N_z, ymax = upper_control_limit, ymin = lower_control_limit)) + 
        geom_hline(yintercept = 0, color = "red") + 
        geom_point(aes(y = S_N_z)) + 
        geom_ribbon(fill = NA, color = "black") + 
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 90)) + 
        labs(x = "End date for 7 Day bins", y = "", 
             title = paste("Reverse CUSUM Chart for", region_name))
}

plot_rev_cusum(hdt_cusum, "Wisconsin")

plot_rev_cusum(hdt_cusum, "La Crosse")

plot_rev_cusum(hdt_cusum, "Dane")

plot_rev_cusum(hdt_cusum, "Iron")

plot_rev_cusum(hdt_cusum, "Fox Valley Area")

lcrayton18/State_package_burdenandtrajectory documentation built on July 19, 2020, 12:37 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lcrayton18/State_package_burdenandtrajectory
Used to calculate sub-state metrics relating to the COVID-19 pandemic

In lcrayton18/State_package_burdenandtrajectory: Used to calculate sub-state metrics relating to the COVID-19 pandemic

Pulling the data

Preparing the data to calculate metrics

Calculating Metrics

Burden metrics

Trajectory metrics

Composite (Summary) confirmed case metric

Final cleaning and renaming for output to .csv

Comparing against Jeff's calculations

Reverse CUSUM Charts

R Package Documentation

Browse R Packages

We want your feedback!

lcrayton18/State_package_burdenandtrajectory Used to calculate sub-state metrics relating to the COVID-19 pandemic

In lcrayton18/State_package_burdenandtrajectory: Used to calculate sub-state metrics relating to the COVID-19 pandemic

Pulling the data

Preparing the data to calculate metrics

Calculating Metrics

Burden metrics

Trajectory metrics

Composite (Summary) confirmed case metric

Final cleaning and renaming for output to .csv

Comparing against Jeff's calculations

Reverse CUSUM Charts

R Package Documentation

Browse R Packages

We want your feedback!

lcrayton18/State_package_burdenandtrajectory
Used to calculate sub-state metrics relating to the COVID-19 pandemic