knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.height=3, fig.width=5, margins=TRUE ) knitr::knit_hooks$set(margins = function(before, options, envir) { if (!before) return() graphics::par(mar = c(1.5 + 0.9, 1.5 + 0.9, 0.2, 0.2), mgp = c(1.45, 0.45, 0), cex = 1.25, bty='n') })
options(warn=-1) library(OSplines)
We will illustrate the use of O-spline using the covid_canada
dataset, which contains the daily death count of COVID-19 in Canada.
head(covid_canada)
For simplicity, let's consider the following model: [Y_i|\lambda_i \sim \text{Poisson}(\lambda_i)] [\log(\lambda_i) = \mathbf{x}_i^T\boldsymbol{\beta} + f(t_i)] where $\mathbf{x}_i$ denotes the fixed effect of weekdays, and $f$ is an unknown function to be inferred.
To make inference of the unknown function $f$, we use the $\text{IWP}_3(\sigma)$ model: [\frac{\partial^p{f}(t)}{\partial t^p} = \sigma \xi(t),] with the boundary (initial) conditions that $\frac{\partial^q{f}(0)}{\partial t^q} = 0$ for all $0\leq q <p$. Here $\xi(t)$ is the standard Gaussian white noise process, or can be viewed as the distributional derivative of the standard Brownian motion.
To fit the above model using the OSpline package, we simply do the following:
fit_result <- model_fit(new_deaths ~ weekdays1 + weekdays2 + weekdays3 + weekdays4 + weekdays5 + weekdays6 + f(smoothing_var = t, model = "IWP", order = 3, k = 100, sd.prior = list(prior = "exp", para = list(u = 0.02, alpha = 0.5))), data = covid_canada, method = "aghq", family = "Poisson")
We can take a look at the posterior summary of this model:
summary(fit_result)
We can also see the inferred function $f$:
plot(fit_result)
We can use the predict
function to obtain the posterior summary of $f$ or its derivative at new_data
:
For the function $f$:
predict_f <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 615, by = 0.1))) predict_f %>% ggplot(aes(x = x)) + geom_line(aes(y = mean), lty = "solid") + geom_line(aes(y = plower), lty = "dashed") + geom_line(aes(y = pupper), lty = "dashed") + theme_classic() + xlim(c(605,615)) + ylim(0,6)
For the first derivative:
predict_f1st <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 615, by = 0.1)), degree = 1) predict_f1st %>% ggplot(aes(x = x)) + geom_line(aes(y = mean), lty = "solid") + geom_line(aes(y = plower), lty = "dashed") + geom_line(aes(y = pupper), lty = "dashed") + theme_classic() + xlim(c(605,615)) + ylim(c(-3,5))
For the second derivative:
predict_f2nd <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 617, by = 0.1)), degree = 2) predict_f2nd %>% ggplot(aes(x = x)) + geom_line(aes(y = mean), lty = "solid") + geom_line(aes(y = plower), lty = "dashed") + geom_line(aes(y = pupper), lty = "dashed") + theme_classic()
A similar model with an IWP component could be fitted using the software R-INLA. Instead of using an O-Spline approximation, the R-INLA software uses the augmented space approach to fit the IWP component. The R-INLA software implements two possible choices of the IWP model: the first order IWP (crw1
) or the second order IWP (crw2
).
Since the third IWP is not compatible with the R-INLA software, here we use the second order IWP as an example:
To fit the model and make inference of the function $f$:
if(requireNamespace("INLA", quietly=TRUE)) { inla_result <- INLA::inla(formula = new_deaths~-1 + weekdays1 + weekdays2 + weekdays3 + weekdays4 + weekdays5 + weekdays6 + f(t, model = "crw2", constr = F, hyper = list(prec = list(prior = "pc.prec", param = c(0.02, 0.5)))), data = covid_canada, family = "Poisson", control.compute=list(config = TRUE)) inla_result$summary.random$t %>% filter(ID >= 0) %>% ggplot(aes(x = ID)) + geom_line(aes(y = mean), lty = "solid") + geom_line(aes(y = `0.025quant`), lty = "dashed") + geom_line(aes(y = `0.975quant`), lty = "dashed") + theme_classic() + xlim(c(605,615)) + ylim(0,6) }
To make inference of the first derivative $f'$:
if(requireNamespace("INLA", quietly=TRUE)) { inla_result$summary.random$t %>% filter(ID <= 0) %>% mutate(ID = -ID) %>% ggplot(aes(x = ID)) + geom_line(aes(y = mean), lty = "solid") + geom_line(aes(y = `0.025quant`), lty = "dashed") + geom_line(aes(y = `0.975quant`), lty = "dashed") + theme_classic() + xlim(c(605,615)) + ylim(c(-3,5)) }
The inferential result of the function $f$ using R-INLA is comparable to the result using OSpline
The inference of derivative, on the other hand, tends to be more wiggly using R-INLA. This difference is due to the lower order IWP used in the model. The highest order IWP that R-INLA accommodates is the second order IWP, which only assumes the function to be once differentiable, whereas the third order IWP used by OSpline
assumes the function to be twice differentiable.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.