knitr::opts_chunk$set(collapse = TRUE, comment = "#")
knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))
require("lgpr")
require("ggplot2")
require("rstan")

Introduction

In this tutorial we simulate and analyse count data.

Binomial data

We first show how to generate binomial data

set.seed(131)
simData   <- simulate_data(N            = 12,
                           t_data       = seq(12, 72, by = 12),
                           covariates   = c(    2),
                           noise_type   = "binomial",
                           relevances   = c(0,1,1),
                           lengthscales = c(18,24,18),
                           N_trials     = 100,
                           t_jitter     = 1)

plot_sim(simData)

Beta-binomial data

The noise level in the previous example was rather low. To generate more noisy observations, we can draw from the beta-binomial distribution. The overdispersion parameter $\gamma \in (0,1)$ controls the noise level, so that $\gamma \rightarrow 0$ would be equivalent to binomial noise (no overdispersion) and larger values mean more noise. Here we set gamma = 0.3.

set.seed(131)
simData   <- simulate_data(N            = 12,
                           t_data       = seq(12, 72, by = 12),
                           covariates   = c(    2),
                           noise_type   = "bb",
                           gamma        = 0.3,
                           relevances   = c(0,1,1),
                           lengthscales = c(18,24,18),
                           N_trials     = 100,
                           t_jitter     = 1)

plot_sim(simData)
dat <- simData@data
str(dat)

Defining model

To use beta-binomial observation model in our analysis, we use the likelihood argument of lgp(). Here we define a $\text{Beta}(2,2)$ prior for the gamma parameter.

my_prior <- list(
  alpha = student_t(20),
  gamma = bet(2, 2)
)

Fitting the model

We fit a model using the generated beta-binomial data.

fit <- lgp(y ~ age + age|id + age|z,
           data = dat,
           likelihood = "bb",
           num_trials = 100,
           prior = my_prior,
           refresh = 300,
           chains = 3,
           cores = 3,
           control = list(adapt_delta = 0.95),
           iter = 3000)
print(fit)

We see that the posterior mean of the gamma parameter is close to the value used in simulation (0.3). We visualize its posterior draws for each MCMC chain:

plot_draws(fit, type = "trace", regex_pars = "gamma")
relevances(fit)

Visualising inferred signal and its components

We subsample 50 draws from the posterior of each component $f^{(j)}$, $j=1,2,3$

draws <- sample.int(1000, 50)

and visualize them

p <- pred(fit, draws = draws)
plot_components(fit, pred = p, alpha = 0.1, color_by = c(NA, NA, "z", "z"))

as well as the sum $f = f^{(1)} + f^{(2)} + f^{(3)}$

plot_pred(fit, pred = p, alpha = 0.3)

Out-of-sample prediction

t <- seq(2, 100, by = 2)
x_pred <- new_x(dat, t)
p <- pred(fit, x_pred, draws = draws)
plot_pred(fit, p, alpha = 0.3)
plot_components(fit, pred = p, alpha = 0.1, color_by = c(NA, NA, "z", "z"))

Computing environment

sessionInfo()


jtimonen/lgpr documentation built on Oct. 12, 2023, 11:13 p.m.