knitr::opts_chunk$set(collapse = TRUE, comment = "#") knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png"))
require("lgpr") require("ggplot2") require("rstan")
In this tutorial we simulate and analyse a test data set which contains 6 case
and 6 control individuals, and the disease effect on case individuals is
modeled using the disease-related age (diseaseAge
) as a covariate. The
disease-related age is defined as age relative to the observed disease
initiation. The true disease effect times for each case individual
$q=1, \ldots,6$ are drawn from $\mathcal{N}(36,4^2)$, but the disease
initiation is observable only after time $t_q$ , which is drawn from
$t_q∼\text{Exponential}(0.05)$ .
set.seed(121) relev <- c(0,1,1,1,0,0) effect_time_fun <- function(){rnorm(n = 1, mean = 36, sd = 4)} obs_fun <- function(t){min(t + stats::rexp(n = 1, rate = 0.05), 96 - 1e-5)} simData <- simulate_data(N = 12, t_data = seq(12, 96, by = 12), covariates = c( 0,2,2,2), relevances = relev, lengthscales = c(18,24, 1.1, 18,18,18), t_effect_range = effect_time_fun, t_observed = obs_fun, snr = 3) plot_sim(simData) + xlab('Age (months)') #plot_sim(simData, comp_idx = 3) # to visualize one generated component
Above, the blue line represents the data-generating signal and black dots are noisy observations of the response variable.
dat <- simData@data str(dat) simData@effect_times
We will define a formula where the term unc(id)*gp_vm(diseaseAge)
declares that the effect time for the nonstationary gp_vm
term is uncertain
and that one uncertainty parameter is needed for each level of id
.
formula <- y ~ zs(id)*gp(age) + gp(age) + unc(id)*gp_vm(diseaseAge) + zs(z1)*gp(age) + zs(z2)*gp(age) + zs(z3)*gp(age)
Because diseaseAge
is NaN
for the control individuals, it is automatically
taken into account that a separate uncertainty parameter is actually needed
just for each case individual.
Declaring a temporally uncrertain component will add parameters teff
to the
model. The vector teff
has length equal to the number of case individuals.
We must define a prior for each teff
parameter. This means
that the prior
argument must be a list containing elements named
effect_time
and effect_time_info
. The first one is specified using any
of the basic prior definition functions, like uniform()
, normal()
, etc.
The second one, effect_time_info
, must be a named list containing the fields
zero
- this is a vector with same length as teff
, and can be used to
move the center of the priorbackwards
- this is a boolean value, and the prior defined in effect_time
will be for(teff - zero)
if backwards = FALSE
- (teff - zero)
if backwards = TRUE
lower
- this is a vector with same length as teff
, and defines the lower
bound for each teff
parameterupper
- this is a vector with same length as teff
, and defines the upper
bound for each teff
parameterYou can give zero
, lower
, and upper
also as just one number, in which
case they are turned into vectors that repeat the save value. The prior
defined in effect_time
will be truncated at lower and upper bounds.
We had observed the disease onset at times $48,72,96,36,48,60$ months for each case individual, respectively. Now if think that the true effect of the disease has occurred for each indiviaul at some time point before the detection of the disease, but not before age $18$ months, we could set the prior like here.
obs_onset <- c(48,72,96,36,48,60) lb <- 18 ub <- obs_onset effect_time_info <- list(zero = 0, backwards = FALSE, lower = lb, upper = ub) my_prior <- list( effect_time = uniform(), # between lb and ub effect_time_info = effect_time_info, wrp = igam(14, 5) # see how to set this in the 'Basic usage' tutorial )
It is possible that we want a prior where values closer to the observed onset
are more likely than those closer to birth. This can be done by defining
for example an exponentially decaying prior for - (teff - obs_onset)
,
as is done here.
lb <- 18 ub <- obs_onset effect_time_info <- list(zero = ub, backwards = TRUE, lower = lb, upper = ub) my_prior <- list( wrp = igam(14,5), effect_time_info = effect_time_info, effect_time = gam(shape = 1, inv_scale = 0.05) # = Exponential(rate=0.05) )
Now our uncertainty priors are actually for time differences relative to the
observed disease initiation time, and backwards = TRUE
argument is used to
define the direction so that the prior is "backwards" in time. We used gam()
because the Gamma distribution with shape=1
and inv_scale=lambda
is
equal to the Exponential distribution with rate= lambda
.
fit <- lgp(formula = formula, data = dat, prior = my_prior, iter = 3000, chains = 4, cores = 4, verbose = TRUE)
Printing the fit object summarizes the posterior
print(fit)
Printing the model information clarifies the model and priors
model_summary(fit) rstan::get_elapsed_time(fit@stan_fit)
We can visualize the inferred effect times for each case individual. We see that for individuals 2 and 3 the inferred effect time is much earlier than the observed one.
plot_effect_times(fit) + xlab('Age (months)')
Finally we plot the inferred disease component
t <- seq(0, 100, by = 1) x_pred <- new_x(dat, t, x_ns = 'diseaseAge') p <- pred(fit, x_pred, verbose = FALSE) plot_f(fit, pred = p, comp_idx = 3, color_by = 'diseaseAge') + xlab('Age (months)')
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.