dlmwwbe: Dynamic Linear Model for Wastewater-based epidemiology with missing values

knitr::opts_chunk$set(
  comment = "#>",
  fig.width = 7.2,
  fig.height = 4.8,
  fig.align = "center"
)

The dlmwwbe package (Dynamic Linear Model for Wastewater-Based Epidemiology with Missing Data) provides two main functions: pdlm() (Predictive Dynamic Linear Model) and dllm() (Dynamic Local Level Model). pdlm() fits a dynamic linear model for forecasting clinical positive cases (or similar data) from lagged clinical and wastewater data. dllm() fits a local level model for smoothing noisy wastewater data. For more details, see the associated papers.

knitr::opts_chunk$set(echo = TRUE)
library(dlmwwbe)
data(wastewater)
data(wastewaterhealthworker)

Dynamic Local Level Model

First, we apply dllm() to wastewater data collected between 2022 and 2024 in the Twin Cities metro area in Minnesota, United States. For details of the data, see the papers. Two latent-state structures are available: (1) all wastewater series share a single latent state (S = 'univariate'); (2) each wastewater series has its own latent state (S = 'kvariate'). For a better model fit, we encourage log-transforming the original wastewater data by setting the argument log10 = TRUE, because in practice the transformed data better approximate the normality assumption. Other transformations might be necessary depending on the nature of the data. summary() reports information about the fitted model.
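As a rough guide to the two structures, they can be written in the standard local level (random walk plus noise) form. The notation here is illustrative and assumed, not taken from the package: $y_{k,t}$ is the (transformed) observation of series $k$ at time $t$, $\mu$ is the latent state, and $\sigma_k^2$ and $\tau_k^2$ are observation and state variances. The exact parameterization used by dllm() may differ.

$$
\text{univariate:}\quad y_{k,t} = \mu_t + \varepsilon_{k,t}, \qquad \mu_t = \mu_{t-1} + \eta_t
$$

$$
\text{kvariate:}\quad y_{k,t} = \mu_{k,t} + \varepsilon_{k,t}, \qquad \mu_{k,t} = \mu_{k,t-1} + \eta_{k,t}
$$

with $\varepsilon_{k,t} \sim N(0, \sigma_k^2)$ and $\eta_{k,t} \sim N(0, \tau_k^2)$. Under this reading, the arguments equal.obs.var and equal.state.var would presumably control whether $\sigma_k^2$ and $\tau_k^2$ are shared across series; that interpretation is inferred from the argument names.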

Here we let each wastewater series have its own latent state. The average of the smoothers is provided.

data_TC <- wastewater[wastewater$Code == "TC",]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)
fit <- dllm(
  equal.state.var=FALSE,
  equal.obs.var=FALSE,
  log10=TRUE,
  data = data_TC,
  date = "SampleDate",
  obs_cols = c("ORFlab", "Nlab"),
  S = c('kvariate')
)

summary(fit)
plot(fit, type='smoother', conf.int = TRUE)
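
For intuition about what a local level smoother does with noisy data that contain gaps, the sketch below uses base R's StructTS() and tsSmooth() from the stats package on simulated data. It is not dlmwwbe's implementation: the simulated series, the noise levels, and the positions of the missing values are all assumptions made for illustration.

```r
set.seed(42)
n <- 120

# Simulated latent level (a random walk) on a log10-like scale
level <- 5 + cumsum(rnorm(n, sd = 0.05))

# Noisy observations of that level, with a few missing samples
y <- level + rnorm(n, sd = 0.3)
y[c(15, 16, 60)] <- NA

# Fit a univariate local level model by maximum likelihood (stats package)
fit_ll <- StructTS(ts(y), type = "level")

# Kalman-smoothed estimate of the latent level, including the missing time points
smoothed <- as.numeric(tsSmooth(fit_ll))

# Observations as points, smoothed level as a line
plot(y, type = "p", xlab = "Time index", ylab = "Simulated log10 signal")
lines(smoothed, lwd = 2)
```

The smoothed line passes through the gaps because the state equation carries the level forward when an observation is missing.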

Predictive Dynamic Linear Model

Next, we apply pdlm() to the clinical and wastewater data, demonstrating different numbers of lags. For a better model fit, we encourage log-transforming the original wastewater data by setting the argument log10 = TRUE (with $1$ added to the positive case counts so that the transformation remains valid). summary() reports information about the fitted model.
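
To see why adding $1$ to the counts matters, the base R sketch below applies the shifted log transformation to a few made-up counts and then inverts it. The values in `cases` are hypothetical, and the sketch only illustrates the arithmetic; it does not show how pdlm() handles the shift internally.

```r
# Hypothetical case counts, including a zero and a missing value
cases <- c(0, 3, 12, NA, 150)

# Shift by 1 so that a zero count maps to log10(1) = 0 rather than -Inf
cases_log <- log10(cases + 1)

# Back-transform to the original count scale
cases_back <- 10^cases_log - 1

# Without the shift, the zero count is not usable on the log scale
log10(cases[1])
```

Missing values stay missing through both steps, so the transformation does not interfere with the handling of gaps in the data.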

Here, we consider $0$ and $2$ lags and plot the fitted models along with the observed data on the original scale.

data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC",]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)
fit <- pdlm(
  data=data_TC,
  formula=HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags=0,
  log10=TRUE,
  date = NULL,
  equal.state.var = TRUE,
  equal.obs.var = FALSE,
  auto_init = TRUE,
  control = list(maxit = 100))
summary(fit)
plot(fit, conf.int=TRUE)

data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC",]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)
fit <- pdlm(
  data=data_TC,
  formula=HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags=2,
  log10=TRUE,
  date = NULL,
  equal.state.var = FALSE,
  equal.obs.var = TRUE,
  auto_init = TRUE,
  control = list(maxit = 100))
summary(fit)
plot(fit, conf.int=TRUE)
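
To make the idea of lagged predictors more concrete, the base R sketch below builds lag-0, lag-1, and lag-2 copies of a short made-up series. The helper `lag_vec()` and the values in `ww` are hypothetical and not part of dlmwwbe; how pdlm() constructs its lagged terms for a given lags value is an assumption here and may differ.

```r
# Hypothetical helper (not part of dlmwwbe): shift a vector back by k time steps,
# padding the start with NA so the length is unchanged
lag_vec <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# Made-up log10 wastewater measurements
ww <- c(4.1, 4.3, 4.0, 4.6, 4.8)

# Each row t holds the value at t, t - 1, and t - 2
lagged <- data.frame(
  ww_lag0 = lag_vec(ww, 0),
  ww_lag1 = lag_vec(ww, 1),
  ww_lag2 = lag_vec(ww, 2)
)
lagged
```

Each additional lag leaves one more leading row without a complete set of predictors, which is one reason a model that tolerates missing values is convenient in this setting.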