In AgueroZZ/OSplines: Constructing O Spline for univariate smoothing

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", fig.height=3, fig.width=5, margins=TRUE
)
knitr::knit_hooks$set(margins = function(before, options, envir) {
  if (!before) return()
  graphics::par(mar = c(1.5 + 0.9, 1.5 + 0.9, 0.2, 0.2), mgp = c(1.45, 0.45, 0), cex = 1.25, bty='n')
})

options(warn=-1)
library(OSplines)

COVID-19 Example

Data and Model

We will illustrate the use of O-spline using the covid_canada dataset, which contains the daily death count of COVID-19 in Canada.

head(covid_canada)

For simplicity, let's consider the following model: [Y_i|\lambda_i \sim \text{Poisson}(\lambda_i)] [\log(\lambda_i) = \mathbf{x}_i^T\boldsymbol{\beta} + f(t_i)] where $\mathbf{x}_i$ denotes the fixed effect of weekdays, and $f$ is an unknown function to be inferred.

To make inference of the unknown function $f$, we use the $\text{IWP}_3(\sigma)$ model: [\frac{\partial^p{f}(t)}{\partial t^p} = \sigma \xi(t),] with the boundary (initial) conditions that $\frac{\partial^q{f}(0)}{\partial t^q} = 0$ for all $0\leq q <p$. Here $\xi(t)$ is the standard Gaussian white noise process, or can be viewed as the distributional derivative of the standard Brownian motion.

Inference

To fit the above model using the OSpline package, we simply do the following:

fit_result <- model_fit(new_deaths ~ weekdays1 + weekdays2 + weekdays3 + weekdays4 + weekdays5 + weekdays6 +
                          f(smoothing_var = t, model = "IWP", order = 3, k = 100, sd.prior = list(prior = "exp", para = list(u = 0.02, alpha = 0.5))), 
                        data = covid_canada, method = "aghq", family = "Poisson")

We can take a look at the posterior summary of this model:

summary(fit_result)

We can also see the inferred function $f$:

plot(fit_result)

We can use the predict function to obtain the posterior summary of $f$ or its derivative at new_data:

For the function $f$:

predict_f <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 615, by = 0.1)))
predict_f %>% ggplot(aes(x = x)) + geom_line(aes(y = mean), lty = "solid") +
  geom_line(aes(y = plower), lty = "dashed") +
  geom_line(aes(y = pupper), lty = "dashed") +
  theme_classic() + xlim(c(605,615)) + ylim(0,6)

For the first derivative:

predict_f1st <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 615, by = 0.1)), degree = 1)
predict_f1st %>% ggplot(aes(x = x)) + geom_line(aes(y = mean), lty = "solid") +
  geom_line(aes(y = plower), lty = "dashed") +
  geom_line(aes(y = pupper), lty = "dashed") +
  theme_classic() + xlim(c(605,615)) + ylim(c(-3,5))

For the second derivative:

predict_f2nd <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 617, by = 0.1)), degree = 2)
predict_f2nd %>% ggplot(aes(x = x)) + geom_line(aes(y = mean), lty = "solid") +
  geom_line(aes(y = plower), lty = "dashed") +
  geom_line(aes(y = pupper), lty = "dashed") +
  theme_classic()

Inference with R-INLA

A similar model with an IWP component could be fitted using the software R-INLA. Instead of using an O-Spline approximation, the R-INLA software uses the augmented space approach to fit the IWP component. The R-INLA software implements two possible choices of the IWP model: the first order IWP (crw1) or the second order IWP (crw2).

Since the third IWP is not compatible with the R-INLA software, here we use the second order IWP as an example:

To fit the model and make inference of the function $f$:

if(requireNamespace("INLA", quietly=TRUE)) {

  inla_result <- INLA::inla(formula = new_deaths~-1 + weekdays1 + weekdays2 + weekdays3 + weekdays4 + weekdays5 + weekdays6 + f(t, model = "crw2", constr = F, hyper = list(prec = list(prior = "pc.prec", param = c(0.02, 0.5)))), data = covid_canada, family = "Poisson", control.compute=list(config = TRUE))

  inla_result$summary.random$t %>% filter(ID >= 0) %>% ggplot(aes(x = ID)) + geom_line(aes(y = mean), lty = "solid") +
  geom_line(aes(y = `0.025quant`), lty = "dashed") +
  geom_line(aes(y = `0.975quant`), lty = "dashed") +
  theme_classic() + xlim(c(605,615)) + ylim(0,6)

}

To make inference of the first derivative $f'$:

if(requireNamespace("INLA", quietly=TRUE)) {

  inla_result$summary.random$t %>% filter(ID <= 0) %>% mutate(ID = -ID) %>% 
  ggplot(aes(x = ID)) + geom_line(aes(y = mean), lty = "solid") +
  geom_line(aes(y = `0.025quant`), lty = "dashed") +
  geom_line(aes(y = `0.975quant`), lty = "dashed") +
  theme_classic() + xlim(c(605,615)) + ylim(c(-3,5))

}

The inferential result of the function $f$ using R-INLA is comparable to the result using OSpline The inference of derivative, on the other hand, tends to be more wiggly using R-INLA. This difference is due to the lower order IWP used in the model. The highest order IWP that R-INLA accommodates is the second order IWP, which only assumes the function to be once differentiable, whereas the third order IWP used by OSpline assumes the function to be twice differentiable.

AgueroZZ/OSplines documentation built on Sept. 17, 2023, 9:24 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com