In jtimonen/lgpr: Longitudinal Gaussian Process Regression

knitr::opts_chunk$set(collapse = TRUE, comment = "#")
knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))

require("lgpr")
require("ggplot2")
require("rstan")

Introduction

In this tutorial we simulate and analyse a test data set which contains 6 case and 6 control individuals, and the disease effect on case individuals is modeled using the disease-related age (diseaseAge) as a covariate. The disease-related age is defined as age relative to the observed disease initiation. The true disease effect times for each case individual $q=1, \ldots,6$ are drawn from $\mathcal{N}(36,4^2)$, but the disease initiation is observable only after time $t_q$ , which is drawn from $t_q∼\text{Exponential}(0.05)$ .

set.seed(121)
relev <- c(0, 1, 1, 1, 0, 0)
effect_time_fun <- function() {
  rnorm(n = 1, mean = 36, sd = 4)
}
obs_fun <- function(t) {
  min(t + stats::rexp(n = 1, rate = 0.05), 96 - 1e-5)
}

simData <- simulate_data(
  N = 12,
  t_data = seq(12, 96, by = 12),
  covariates = c(0, 2, 2, 2),
  relevances = relev,
  lengthscales = c(18, 24, 1.1, 18, 18, 18),
  t_effect_range = effect_time_fun,
  t_observed = obs_fun,
  snr = 3
)

plot_sim(simData) + xlab("Age (months)")
# plot_sim(simData, comp_idx = 3) # to visualize one generated component

Above, the blue line represents the data-generating signal and black dots are noisy observations of the response variable.

dat <- simData@data
str(dat)
simData@effect_times

Declaring effect time uncertainty

We will define a formula where the term unc(id)*gp_vm(diseaseAge) declares that the effect time for the nonstationary gp_vm term is uncertain and that one uncertainty parameter is needed for each level of id.

formula <- y ~ zs(id) * gp(age) + gp(age) + unc(id) * gp_vm(diseaseAge) + zs(z1) * gp(age) + zs(z2) * gp(age) + zs(z3) * gp(age)

Because diseaseAge is NaN for the control individuals, it is automatically taken into account that a separate uncertainty parameter is actually needed just for each case individual.

Defining the effect time prior

Declaring a temporally uncrertain component will add parameters teff to the model. The vector teff has length equal to the number of case individuals. We must define a prior for each teff parameter. This means that the prior argument must be a list containing elements named effect_time and effect_time_info. The first one is specified using any of the basic prior definition functions, like uniform(), normal(), etc. The second one, effect_time_info, must be a named list containing the fields

zero - this is a vector with same length as teff, and can be used to move the center of the prior
backwards - this is a boolean value, and the prior defined in effect_time will be for
(teff - zero) if backwards = FALSE
- (teff - zero) if backwards = TRUE
lower - this is a vector with same length as teff, and defines the lower bound for each teff parameter
upper - this is a vector with same length as teff, and defines the upper bound for each teff parameter

You can give zero, lower, and upper also as just one number, in which case they are turned into vectors that repeat the save value. The prior defined in effect_time will be truncated at lower and upper bounds.

Prior for the effect time directly

We had observed the disease onset at times $48,72,96,36,48,60$ months for each case individual, respectively. Now if think that the true effect of the disease has occurred for each indiviaul at some time point before the detection of the disease, but not before age $18$ months, we could set the prior like here.

obs_onset <- c(48, 72, 96, 36, 48, 60)
lb <- 18
ub <- obs_onset
effect_time_info <- list(zero = 0, backwards = FALSE, lower = lb, upper = ub)
my_prior <- list(
  effect_time = uniform(), # between lb and ub
  effect_time_info = effect_time_info,
  wrp = igam(14, 5) # see how to set this in the 'Basic usage' tutorial
)

Prior relative to a known time point

It is possible that we want a prior where values closer to the observed onset are more likely than those closer to birth. This can be done by defining for example an exponentially decaying prior for - (teff - obs_onset), as is done here.

lb <- 18
ub <- obs_onset
effect_time_info <- list(zero = ub, backwards = TRUE, lower = lb, upper = ub)
my_prior <- list(
  wrp = igam(14, 5),
  effect_time_info = effect_time_info,
  effect_time = gam(shape = 1, inv_scale = 0.05) # = Exponential(rate=0.05)
)

Now our uncertainty priors are actually for time differences relative to the observed disease initiation time, and backwards = TRUE argument is used to define the direction so that the prior is "backwards" in time. We used gam() because the Gamma distribution with shape=1 and inv_scale=lambda is equal to the Exponential distribution with rate= lambda.

Fitting the model

fit <- lgp(
  formula = formula,
  data = dat,
  prior = my_prior,
  iter = 3000,
  chains = 4,
  cores = 4,
  verbose = TRUE
)

Printing the fit object summarizes the posterior

print(fit)

Printing the model information clarifies the model and priors

model_summary(fit)
rstan::get_elapsed_time(fit@stan_fit)

Visualizing the inferred effect times

We can visualize the inferred effect times for each case individual. We see that for individuals 2 and 3 the inferred effect time is much earlier than the observed one.

plot_effect_times(fit) + xlab("Age (months)")

Finally we plot the inferred disease component

t <- seq(0, 100, by = 1)
x_pred <- new_x(dat, t, x_ns = "diseaseAge")
p <- pred(fit, x_pred, verbose = FALSE)
plot_f(fit, pred = p, comp_idx = 3, color_by = "diseaseAge") + xlab("Age (months)")