In jtimonen/lgpr: Longitudinal Gaussian Process Regression

knitr::opts_chunk$set(collapse = TRUE, comment = "#")
knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))

require("lgpr")
require("ggplot2")
require("rstan")

Introduction

In this tutorial we simulate and analyse count data.

Binomial data

We first show how to generate binomial data

set.seed(131)
simData <- simulate_data(
  N = 12,
  t_data = seq(12, 72, by = 12),
  covariates = c(2),
  noise_type = "binomial",
  relevances = c(0, 1, 1),
  lengthscales = c(18, 24, 18),
  N_trials = 100,
  t_jitter = 1
)

plot_sim(simData)

Beta-binomial data

The noise level in the previous example was rather low. To generate more noisy observations, we can draw from the beta-binomial distribution. The overdispersion parameter $\gamma \in (0,1)$ controls the noise level, so that $\gamma \rightarrow 0$ would be equivalent to binomial noise (no overdispersion) and larger values mean more noise. Here we set gamma = 0.3.

set.seed(131)
simData <- simulate_data(
  N = 12,
  t_data = seq(12, 72, by = 12),
  covariates = c(2),
  noise_type = "bb",
  gamma = 0.3,
  relevances = c(0, 1, 1),
  lengthscales = c(18, 24, 18),
  N_trials = 100,
  t_jitter = 1
)

plot_sim(simData)

dat <- simData@data
str(dat)

Defining model

To use beta-binomial observation model in our analysis, we use the likelihood argument of lgp(). Here we define a $\text{Beta}(2,2)$ prior for the gamma parameter.

my_prior <- list(
  alpha = student_t(20),
  gamma = bet(2, 2)
)

Fitting the model

We fit a model using the generated beta-binomial data.

fit <- lgp(y ~ age + age | id + age | z,
  data = dat,
  likelihood = "bb",
  num_trials = 100,
  prior = my_prior,
  refresh = 300,
  chains = 3,
  cores = 3,
  control = list(adapt_delta = 0.95),
  iter = 3000
)

print(fit)

We see that the posterior mean of the gamma parameter is close to the value used in simulation (0.3). We visualize its posterior draws for each MCMC chain:

plot_draws(fit, type = "trace", regex_pars = "gamma")

relevances(fit)

Visualising inferred signal and its components

We subsample 50 draws from the posterior of each component $f^{(j)}$, $j=1,2,3$

draws <- sample.int(1000, 50)

and visualize them

p <- pred(fit, draws = draws)
plot_components(fit, pred = p, alpha = 0.1, color_by = c(NA, NA, "z", "z"))

as well as the sum $f = f^{(1)} + f^{(2)} + f^{(3)}$

plot_pred(fit, pred = p, alpha = 0.3)

Out-of-sample prediction

t <- seq(2, 100, by = 2)
x_pred <- new_x(dat, t)
p <- pred(fit, x_pred, draws = draws)
plot_pred(fit, p, alpha = 0.3)
plot_components(fit, pred = p, alpha = 0.1, color_by = c(NA, NA, "z", "z"))