This vignette presents the different models used in the Rsero package, their implementation, and the different uses of these models to fit the data. Models of the force of infection are used to explain the change of the profile of seropositivity with age, using as the main model assumption that an increase in seroprevalence is the result of the cumulated exposure to a pathogen during a lifetime. We included in this package various models of the force of infection, but also other factors affecting the serology such as imperfect sensitivity and specificity, waning-immunity (seroreversion) and differential risks of exposure.
We first load the package
library(Rsero)
The data must be in the following format
data <- simulate_SeroData(number_samples = 5) print(data)
We considered five models that define the force of infection at a given year:
In the constant model a single parameter of the force of infection is evaluated.
The $piecewise$ models consider successive constant phases. They take as an input the number $K$ of different constant phases, where $K \geq 1$. The value $K=1$ corresponds to the constant model.
In the case of the outbreak model, the force of infection at year $i$ is given as a sum of $K$ gaussians: $$\lambda_i = \sum_{j=1}^K\bar{\alpha}_{j} e^{-{(i-T_j)^2}}.$$ The $T_j$ are the time points where the force of infection is maximal and is equal to $\bar{\alpha}_j$. The parameter $\bar{\alpha}_j$ is normalized so that $\alpha_j=\bar{\alpha}_j \sum_i e^{-(i-T_j)^2}$. In the case where the $T_j$ are far apart, the gaussians are well separated and around $T_j$ the force of infection is given by a single gaussian.
The different models are defined using the following commands
model <- FOImodel(type = "constant") # a constant model model <- FOImodel(type = "independent") # independent annual force of infection model <- FOImodel(type = "piecewise", K = 2) # Two constant phase model <- FOImodel(type = "outbreak", K = 1) # one outbreak model <- FOImodel(type = "constantoutbreak", K = 2) # two outbreaks and a constant annual force of infection print(model)
Printing recapitulates the main information concerning the model. For instance, in the "constantoutbreak"
model with K=2
, a total of 7 parameters are estimated (1 for the constant annual force of infection and 3 for each outbreak). The prior distribution for the alpha
and T
of each outbreak and for the annual FOI are also printed. Without additional input the defaults normal priors are used.
Seroreversion is modeled as a rate of switching from a seropositive state to a seronegative state. In the Rsero package we implemented a combination of seroreversion with any epidemiological model.
We note $X_a(t)$ the probability that an individual born a years before the survey was seropositive $t$ years before the survey. Starting at birth, where $X_a(a) = 0$ (i.e. we assume all individuals are born seronegative), and considering $\lambda_y$ the FOI on year $y$ and the seroreversion rate $\rho$, the difference equation is
$$
X_a(y-1) =X_a(y) e^{-(\lambda_y+\rho)} +\frac{\lambda_y}{\lambda_y+\rho}(1-e^{-(\lambda_y+\rho)}).
$$
The probability that the individual is seropositive in the survey is $P=X_a(0)$.
A parameter seroreversion = 0
or seroreversion = 1
can be defined in the FOIModel
. By default seroreversion = 0
.
The models can account for uncertainty in the seroprevalence estimates resulting from imperfect sensitivity and specificity of the assays. We note $se$ and $sp$ the sensitivity and specificity of the assay, respectively. In the scenario of perfect sensitivity and specificity ($se=sp=1$) the seroprevalence is the same as the proportion of the population infected by the pathogen. In the general case, if we denote this proportion $Inf$, the probability for an individual to be observed as seropositive is $$ P=1-sp+Inf(se+sp-1) $$
In the implementation of the package, the user can give specific values to the sensitivity and specificity for any transmission models and with or without seroreversion. These parameters are fixed and must be specified by the user along with the model of the force of infection. For example, if the sensitivity is 98% and the specificity 92%, the user provides these number along with the FOI model.
model <- FOImodel(type = "outbreak", K = 1, se = 0.98, sp = 0.92) # one outbreak print(model)
By default se = 1
and sp = 1
, and the parameters se and sp must verify that $se+sp-1$.
The probability that an individual of age $a_j$ is infected depends on the history of the force of infection during his/her lifetime. It is therefore a function of the cumulative force of infection $\lambda_S + \lambda_{S-1}+\dots+ \lambda_{S-a_j}$, where $S$ is the sampling year. More specifically, the contribution to the likelihood of a seropositive individual is $$P_j = 1 - \exp\bigg(-\sum_{i = 0}^{a_j} \lambda_{S-i}\bigg).$$
The contribution to the likelihood of a seronegative individual is $$P_j = \exp\bigg(-\sum_{i = 0}^{a_j} \lambda_{S-i} \bigg).$$
Models can be defined to include different risks of infection by category. Categories are defined in the data.
Let's look at an example dataset provided with the package of a serological survey of CHIKV in Vietnam. This survey was described in Quan et al., 2018 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5823466/) and includes 546 individuals sampled in 4 different areas of the country.
First, we load the dataset:
sero.data = read.csv('../data-raw/CHIKVVietnam.csv', header = TRUE, sep=';') chikv_vietnam = Rsero::SeroData(age_at_sampling = round(as.numeric(sero.data$age)), Y = as.numeric(sero.data$Y)>9, sampling_year = as.numeric(sero.data$sampling_year), category = as.character(sero.data$Loc))
We could have loaded the dataset directly too
data("chikv_vietnam")
We defined categories corresponding to the different regions. This can be seen in the column category of the dataset chikv_vietnam that associates to each individual one of the four region:
print(chikv_vietnam$unique.categories) # the different categories print(head(chikv_vietnam$category))
We define a model of the annual force of infection that accounts for different risks of infection in each region.
modelI2Full =FOImodel(type='piecewise', K=2, cat_lambda = TRUE)
We defined an piecewise model, to describe that we expect a drop of the force of infection at a certain point in time. The parameter cat_lambda = TRUE is used to set different relative values of the force of infection while keeping equal the timing of the drop in infection probability. Therefore, the fit is done simultaneously on the four regions.
FOIfit.vietnam = fit(model = modelI2full, data=chikv_vietnam)
An arbitrary number of categories can be defined. In the following example we define three different categories (sex, location, occupation).
foi =0.01 ns = 600 sex.list = c('male', 'female') location.list = c('Region 1', 'Region 2', "Region 3") occupation.list = c("HCW","OTHER") sex <- sample(sex.list,ns, replace= TRUE) location <- sample(location.list,ns, replace= TRUE) occupation <- sample(occupation.list, ns, replace = TRUE) factor = 1*(sex=='males') + 0.5*(sex=='females') factor =factor*( 1*(location=="Region 1") + 0.5*(location=="Region 2") + 3*(location=="Region 3")) factor =factor*( 1*(occupation=="HCW") + 1.1*(occupation!="HCW") ) ages = round(runif(n=length(sex))*70)+1 data_tmp = SeroData(age_at_sampling = ages , Y = runif(n=ns) < 1-exp(-foi*factor*ages), sampling_year = 2023, category = cbind(sex, location, occupation))
Different measures of model adequacy were implemented: The Pareto smoothed importance sampling leave-one-out (PSIS-lLOO) The deviance information criterion (DIC), the Akaike information criterion (AIC), the Watanabe-Akaike information criterion (WAIC). for a detailed explanation of all three measures, see Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Statistics and computing. 2014 Nov 1;24(6):997-1016 and Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and computing, 27, 1413-1432.
compute_information_criteria(FOIfit.vietnam)
modelOutbreak =FOImodel(type='outbreak', K=1) FOIfit.vietnam2 = fit(model = modelOutbreak, data=chikv_vietnam) compute_information_criteria(FOIfit.vietnam2)
The DIC of the piecewise constant model is lower than the DIC of the outbreak model. This indicates that the former is a more adequate description of the data. In general, DIC differences of more than 5 points with the lowest DIC indicate weak support for a model.
The user can choose between a uniform, a normal or an exponential distribution. Normal priors were set by default for all parameters. We give below the default values of the parameters that are mean and the standard deviation of the normal distribution, respectively.
priorC1 = 0 priorC2 = 1
priorC1 = 0 priorC2 = 1 priorT1 = 50 priorT2 = 20
prioralpha1 = 0.2 prioralpha2 = 1 priorT1 = 50 priorT2 = 20
prioralpha1 = 0.2 prioralpha2 = 1 priorT1 = 50 priorT2 = 20 priorC1 = 0.01 priorC2 = 1
priorY1 = 0.01 priorY2 = 1
priorRho1 = 0 priorRho2 = 1
In the case of the "outbreak"
and the "constantoutbreak"
models each outbreak can be defined by a specific set of prior distributions. Here we set K=2
. For the first outbreak, the prior for T
is normal(10,30), the prior for alpha is normal(0,5). For the second outbreak, the prior for T
is normal(40,70), the prior for alpha is uniform(0,1).
model <- FOImodel(type = "outbreak", K = 2, prioralpha1 = c(0,0), prioralpha2= c(5, 1), priorT1 = c(10,40), priorT2= c(30, 70)) print(model) ## choosing other prior distributions model_2 <- FOImodel(type = "outbreak", K = 1, seroreversion = 1, prior_distribution_alpha = "uniform", prior_distribution_rho = "exponential", priorRho1=1) print(model_2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.