In olshena/COVIDNearTerm: Near Term Prediction of COVID-Related Population Metrics

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(COVIDNearTerm)

The purpose of this vignette is to show how to make hospitalization predictions using the COVIDNearTerm method from the COVIDNearTerm library. We will focus on recreating a figure like Figure 1 in (ref paper).

install.packages('remotes')
remotes::install_github(repo = 'olshena/COVIDNearTerm', build_vignettes = TRUE)
library(COVIDNearTerm)

The county_hospitalizations data set contains 308 days worth of hospitalization data from six Bay Area counties starting on May 1, 2020. We will make predictions using the 150 days from Santa Clara County as training data.

training_data <- county_hospitalizations$Santa.Clara[1:150]

We will build the model and make predictions using the same function, simulateAR

The arguments are:

vec is the training data

wsize is the number of prior days to estimate the current hospitalization trend

method is how to weigh the wsize prior days of hospitalizations, choices are "equal" (default), "unequal", and "triangle"

lambda is the shrinkage parameter, default is 0 for no shrinkage, but it is good to make nonzero

pdays is the number of days to predict

seed is the seed for the random generator

output_type is the type of output, a good choice is "predictions" for predictions

nsim is the number of simulations, 10000 is good, we use 100 for speed

predicted_paths <- simulateAR(vec         = training_data,
                              wsize       = 14,
                              method      = "unweighted",
                              pdays       = 28, 
                              lambda      = seq(0,1,0.05),
                              seed        = 12345,
                              output_type = "predictions",
                              nsim        = 100)

When output = "predictions", simulateAR() returns a data frame with pdays columns and nsim + 1 rows. The first column, t, indicates prediction day. The remaining columns each correspond to a different run of predictions. So predicted_paths[1:5, "X1"] contains daily predictions for the first five days in prediction path 1.

We are interested in the median prediction at each day.

median_predictions <- apply(predicted_paths[,(-1)],1,median)

We plot the data.

par(mgp=c(2,0.75,0))
plot(c(1,173),
     c(0,1000),
     type = "n",
     axes = FALSE,
     xlab = "",
     ylab = "#Hospitalized")
box()
axis(1,
     c(-2,29,59,90,121,151),
     c(paste0("May\n", 20),
       paste0("Jun\n", 20),
       paste0("Jul\n", 20),
       paste0("Aug\n", 20),
       paste0("Sep\n", 20),
       paste0("Oct\n", 20)),
     las      = 1,
     tck      = -0.01,
     cex.axis = 0.8)
axis(2)
for(i in 1:100) {
  lines(151:178,
        predicted_paths[,(i+1)],
        lty = 1,
        col = 2)
}

lines(151:178,
      median_predictions,
      col = 3,
      lwd = 3)
lines(1:178,
      county_hospitalizations$Santa.Clara[1:178])

lines(c(150,150),
      c(-100,10000),
      col = "brown",
      lwd = 2)

text(65,500,"Training Data",
     cex = 2)

Each red line is a predicted path for daily hospitalizations. The green line is the median path for predicted hospitalizations, and the black line is the truth.

olshena/COVIDNearTerm documentation built on May 27, 2023, 1:23 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com