In jtimonen/lgpr: Longitudinal Gaussian Process Regression

knitr::opts_chunk$set(collapse = TRUE, comment = "#")
knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))

require("lgpr")
require("ggplot2")
require("rstan")

Introduction

In this tutorial we simulate and analyse a test data set which contains 8 case and 8 control individuals, and the disease effect on case individuals is modeled using the disease-related age (diseaseAge) as a covariate. The disease-related age is defined as age relative to the observed disease initiation. The disease effect is simulated so that only 4 of the 8 case individuals are affected.

set.seed(1213)
simData <- simulate_data(
  N = 16,
  t_data = seq(12, 72, by = 12),
  covariates = c(0, 2, 2, 2),
  relevances = c(1, 1, 1, 1, 0, 0),
  lengthscales = c(18, 24, 1, 18, 18, 18),
  t_effect_range = c(46, 48),
  snr = 5,
  N_affected = 4,
  t_jitter = 0
)

plot_sim(simData)
# plot_sim.component(simData,3)

Above, the blue line represents the data-generating signal and black dots are noisy observations of the response variable.

dat <- simData@data
str(dat)
simData@effect_times

Declaring a heterogeneous component

We will define a formula where the term het(id)*gp_vm(diseaseAge) declares that the gp_vm term is heterogeneous and one level-specific magnitude parameter is needed for each level of id.

formula <- y ~ zs(id) * gp(age) + gp(age) + het(id) * gp_vm(diseaseAge) + zs(z1) * gp(age) + zs(z2) * gp(age) + zs(z3) * gp(age)

Because diseaseAge is NaN for the control individuals, it is automatically taken into account that an individual-specific magnitude parameter is actually needed just for each case individual.

Fitting the model

fit <- lgp(
  formula = formula,
  data = dat,
  prior = list(wrp = igam(14, 5)),
  iter = 2000,
  chains = 4,
  cores = 4
)

Printing the fit object summarizes the posterior

print(fit)

Printing the model information clarifies the model and priors

model_summary(fit)

Visualizing the individual-specific effect magnitudes

We can visualize the posterior distribution of the individual-specific disease effect magnitude parameters for each case individual. We see that for individuals 1-4 the parameters are close to 1 (affected individuals) and for individuals 5-8 there is considerable posterior mass close to 0.

plot_beta(fit)

Visualizing the inferred heterogeneous effect

Finally we plot the inferred disease component, using posterior median hyperparameters, and see that the inferred effects have the same shape for all individuals, but are heterogeneous in magnitude.

t <- seq(0, 100, by = 1)
x_pred <- new_x(dat, t, x_ns = "diseaseAge")
p <- pred(fit, x_pred, verbose = FALSE, reduce = stats::median)
plot_f(fit, pred = p, comp_idx = 3, color_by = "diseaseAge")