Seroincidence package tutorial

pkgUrl <- gsub("\n", "", packageDescription("seroincidence")$URL)

\newpage

1. Introduction

Antibody levels measured in a (cross–sectional) population sample can be translated into an estimate of the frequency with which seroconversions (infections) occur in the sampled population. In simple terms: the presence of many high antibody concentrations indicates that many subjects likely experienced infection recently, while mostly low concentrations indicate a low frequency of infections in the sampled population. More information about ELISA methods can be found on ECDC web site.

In order to thus interpret the measured cross-sectional antibody concentrations, characteristics of the time course of the serum antibody response must be known. For any antibody detected in the cross–sectional sample, a quantitative description of its dynamics following infection (seroconversion to an elevated concentration, followed by a gradual decrease) must be available. In published studies, this information on the time course of the serum antibody response has been obtained from follow–up data in subjects who had a symptomatic episode following infection. The onset of symptoms thus provided a proxy for the time that infection occurred. Care must be taken that longitudinal and cross–sectional measurements of antibody concentrations are calibrated on the same scale (or, preferably, expressed in identical units).

The time course of the serum antibody response to infection is assumed generic: anyone infected by the same pathogen is expected to produce a response similar to those seen in the symptomatic cohort used for the follow–up study. Thus, the strategy is to set up a longitudinal model from a follow–up study once, and use the thus obtained information to analyze many different cross–sectional data sets.

Note that it must be assumed that such variation in the time course of antibody concentrations among individuals as observed in the longitudinal cohort is the same as in the population where the cross-sectional sample was taken, where this variation cannot be observed.

Analysis of longitudinal data and estimation of the required parameters is separated from the estimation of seroincidences, because (1) there are few longitudinal data sets available and they are difficult (and expensive) to obtain, (2) such longitudinal models may be considered generic, presumably describing a physiological phenomenon that is similiar in any infected individual, and (3) setting up and running the longitudinal model requires additional skills, not necessary for use of the sero–incidence calculator script

2. The seroincidence estimator

Package seroincidence was designed to calculate incidence of seroconversion, by using the longitudinal seroresponse characteristics. The distribution of serum antibody concentrations in a cross–sectional population sample is calculated, as a function of the (longitudinal) seroresponse and the frequency of seroconversion (or seroincidence). Given the seroresponse, this (marginal) distribution of antibody concentrations can be fitted to the cross-sectional data, by adjusting the seroconversion frequency, thus providing a means to estimate the seroconversion frequency.

2.1 Time course of serum antibodies

The seroresponse, the time course of serum antibodies following seroconversion (infection) is described by means of a set of parameters describing it. These parameters may include:

a) baseline antibody concentration, b) time to reach peak antibody concentration, c) magnitude of the peak concentration, and d) antibody decay parameters describing the time course (shape and rate) of antibody decay.

A model description that is used for the serocalculator is described in [@2; @3]. An alternative model, simplified but with improved interpretation of the underlying biological mechanisms, is given in [@1] and further improved in [@4].

The current model assumes that upon exposure to an initial inoculum, pathogen concentrations increase exponentially. Presence of pathogens (antigen presented to the immune system) signals an increase in antibody production, resulting in exponentially increasing serum antibody concentrations. Increased activity of the immune system, as signalled by circulating antibodies, causes inactivation or die–off of pathogens, with a rate proportional to the serum antibody concentration. Ultimately, when all pathogens are removed, infection is over and the promotion of antibody production stops. Then the decay phase starts, and serum antibody concentrations decrease. Antibody decay in the serum compartment may start fast and slow down later, resulting in non–exponential decay. In [@4] this is modelled using a shape parameter, allowing gradual adjustment between exponential (log–linear) and strongly non–exponential decay.

2.2 Heterogenity

The five parameters describing the seroresponse (in round brackets names of corresponding parameters used in the package, see section 3.2.2 for details): baseline antibody concentration (y0 and yb), time to peak (t1), peak antibody concentration (y1), decay rate (alpha) and shape factor of the decay curve (r) are not fixed numbers. In a population of subjects, individuals each have their own seroresponse, different from those of other subjects. Therefore, these parameters are provided as a (joint) distribution, describing their variation among individuals in the sampled population. Of course, such a distribution depends on the pathogen, the detected antibody, and even the detection assay.

Estimates for these seroresponse parameters may be obtained by fitting the model to longitudinal data. Typically, such data are collected in patients with confirmed infection, where the time of infection is detected by appearance of symptoms. Ideally, blood samples are taken at several times post infection, up to a period long enough to reliably estimate the shape of the serum antibody decay curve.

A Bayesian hierarchical framework allows for individual variation in seroresponses, so that one obtains the (posterior) joint distribution of the parameters as a (Markov chain) Monte Carlo sample, of all relevant parameters [@4].

Note that, due to the limited number of iterations in the Monte Carlo sample, the generated incidence estimates will show minor variations: if the same calculation is run twice, the outcomes may differ slightly. If this variation should be smaller, the size of the Monte Carlo sample may be increased. This however will slow down the calculations.

2.3 Incidence estimation

The longitudinal seroresponses are considered generic: they represent the time course of (specific) human serum antibodies against a (specific) pathogen. Therefore, with any cross–sectional sample of the same antibodies in a human population, an incidence estimate can be calculated.

Given the longitudinal response characteristics, seroincidence R package provide a rapid and computationally simple method for calculating seroconversion rates, as published in [@3].

The methods on which the seroincidence calculator is based were developed with publicly funded research. All procedures have been published and are open to the general public, including the core script on which the seroincidence calculator is based. To make the functionalities provided as broadly available as possible, R was chosen as the computing platform, because it is free and open source, and runs on any modern computer operating system.

2.4 Model variants

The package contains longitudinal models for seroincidence calculations. Four model variants are possible, depending on whether seroconversion is instantaneous ($t_{1} = 0$) or not ($t_{1} > 0$), and whether decay is exponential ($r = 1$) or proceeds as a power function ($r > 1$). Power function decay allows for rapid inital decay followed by a sustained period of very slow decay [@4]. Available model variants are described in Table 1.

\begin{table}[!h] \centering \caption{Model variants} \vspace{0.3cm} \begin{footnotesize} \begin{tabular}{ccc} & Time to & Decay\ Model & Seroconversion ($t_{1}$) & parameter ($r$)\ 1 & 0 & 1\ 2 & 0 & $> 1$\ 3 & $> 0$ & 1\ 4 & $> 0$ & $> 1$\ \end{tabular} \begin{tabular}{lllcccc} & & & \multicolumn{4}{c}{Model}\ Pathogen & Antibody & Units & 1 & 2 & 3 & 4\ \hline {\em Campylobacter jejuni} & IgG & ELISA (Delft) & * & & * & * \ & IgM & ELISA (Delft) & * & & * & * \ & IgA & ELISA (Delft) & * & & * & * \ & IgG & ELISA (SSI) & * & * & & * \ & IgM & ELISA (SSI) & * & * & & * \ & IgA & ELISA (SSI) & * & * & & * \ \hline {\em Salmonella} & IgG & mixed ELISA (SSI) & * & * & & * \ & IgM & mixed ELISA (SSI) & * & * & & * \ & IgA & mixed ELISA (SSI) & * & * & & * \ \hline {\em Bordetella pertussis} & IgG-PT & IU & * & * & * & * \ \hline \end{tabular} \end{footnotesize} \end{table}

3. How to use seroincidence package

This section provides step-by-step directions on the usage of the seroincidence package.

3.1. Loading the package

Functionality of the seroincidence package is made available to the end user only once the library is loaded into the current workspace. Assuming the package is installed already (covered in installation.pdf) loading is achieved by running the following command in the R console (bear in mind that the text after character # is only a comment):

# Load package "seroincidence"
library(seroincidence)

At this moment all functions required to run the seroincidence calculation are made available. It can be checked by running:

# List all objects (functions and data) exposed by package "seroincidence".
ls("package:seroincidence")

Here is the resulting output:

ls("package:seroincidence")

We briefly describe the most important objects and functions:

a) [DISEASE]Params[MODEL_TYPE]: Monte Carlo sample of longitudinal response parameters per antibody for various diseases: Campylobacter, Pertussis, Salmonella. Object class: list. b) [DISEASE]Sim[Low|Medium|High]Data: Example simulated antibody levels data measured in a cross-sectional population for three values of lambda (incidence): 0.036 ("Low"), 0.21 ("Medium") and 1.15 ("High") (1/yr). Object class: data.frame. c) getAdditionalData: Utility function for downloading additional longitudinal response parameters from an online repository. Files available for download are coxiellaIFAParams4.zip and yersiniaSSIParams4.zip. Object class: function. d) estimateSeroincidence: Main function of the package. Estimates seroincidence based on supplied cross-section antibody levels data and longitudinal response parameters. Object class: function.

3.2. Specifying input data

There are two sets of inputs to the serology calculator that must always be specified:

a) Simulated antibody levels measured in cross-sectional population sample (see campylobacterSimLowData, campylobacterSimMediumData, etc. above). b) Longitudinal response characteristic as set of parameters (y1, alpha, yb, r, y0, mu1, t1) (see campylobacterDelftParams1, campylobacterSSIParams2, etc. above).

We start with loading antibody levels measured in cross-sectional population sample.

3.2.1. Specifying cross-sectional antibody levels data

The user has a few options of providing this data to the calculator:

i) Using supplied sample data

Data set campylobacterSimMediumData provides an example of a cross-sectional antibody levels data for Campylobacter. It is loaded into the workspace as soon as the package is loaded, therefore it can be referenced right away. Here is a small exempt from this data set:

# Show three first rows of "campylobacterSimMediumData" data.frame.
head(campylobacterSimMediumData, 3)

Let us create the serology data serologyData based on this data set:

# Assign data.frame "campylobacterDelftData" to object named "serologyData".
serologyData <- with(campylobacterSimMediumData,
                     data.frame(
                       Age = Age,
                       IgG = IgG,
                       IgM = IgM,
                       IgA = IgA,
                       AgeCat = cut(Age, 15, labels = FALSE)))

# Print a few first observations.
head(serologyData)

You can view the data also in a tabular form:

View(serologyData)

We shall pass this object later to the calculator. Object serologyData must contain at least one column named IgG, IgM, or IgA. Additionally, column Age is used in the calculations if it is present. Otherwise, it is created on-the-fly and initialized to NA (not available). Column Age, if supplied, is assumed to represent age in days.

Please, note that the cross-sectional data set can include extra columns allowing performing the seroincidence calculations stratified per factors available in those data. In the example above column AgeCat can be used as a stratum variable. This particular variable is created by mapping the age in days (column Age) into 15 categories, category 1 grouping the youngest subjects and category 15 grouping the oldest subjects.

ii) Loading from external files

The user can load her/his own data into an R session. Details on loading specific file types can be found in online R manual on R Data Import/Export.

It is important that the input data is in a tabular form with each column containing levels measured for a single antibody type. Column name should indicate name of the measured antibody. For instance, a comma separated file with the following content:

IgG, IgM, IgA
5.337300, 0.4414653, 0.3002395
3.534118, 0.3888226, 0.3543486
2.144549, 0.3320178, 0.3884205
2.957854, 0.4967764, 0.3472340

is a valid cross-sectional data set. Suppose this file is named c:\cross-sectional-data.csv. Then it can be loaded into R like this:

# Read content of file "C:\cross-sectional-data.csv" into object named 
# "serologyData".
serologyData <- read.csv(file = "C:\\cross-sectional-data.csv")

3.2.2. Specifying longitudinal response parameters {#longParams}

The longitudinal response parameters set consists of the following items:

Please, refer to vignette methodology.pdf for more details on the underlying methodology.

End user has three options for loading the response parameters into R session:

i) Using supplied data

There are a few longitudinal response parameter data sets provided by the package including Monte-Carlo sample of longitudinal response parameters, like campylobacterDelftParams4. This particular object is a list containing three data.frame objects named IgG, IgM and IgA. Let's have a pick into each of these two data sets separately (notice character $ indicating a sub-object of the parent object):

# Show first rows of data.frame "IgG" in list "campylobacterDelftParams4".
head(campylobacterDelftParams4$IgG)

# Show first rows of data.frame "IgM" in list "campylobacterDelftParams4".
head(campylobacterDelftParams4$IgM)

# Show first rows of data.frame "IgA" in list "campylobacterDelftParams4".
head(campylobacterDelftParams4$IgA)

Let us assign object campylobacterDelftParams4 to variable responseParams:

responseParams <- campylobacterDelftParams4

ii) Loading from local external files

Alternatively, this data can be loaded from an external file. Bear in mind, that most likely your data is organized into three files, separately for antibody IgG, IgM and IgA. Assuming that the data is saved into csv files named IgG.csv, IgM.csv and IgA.csv the end user would run the following code in order to create a valid input to the calculator:

IgGData <- read.csv(file = "IgG.csv")
IgMData <- read.csv(file = "IgM.csv")
IgAData <- read.csv(file = "IgA.csv")

# Create a list named "responseData" containing objects named "IgG", "IgM" and 
# "IgA".
responseParams <- list(IgG = IgGData, IgM = IgMData, IgA = IgAData)

Note that object responseParams is a list with three dataframes named IgG, IgM and IgA as required.

iii) Loading from an external repository

Finally, response parameters can be obtained from an online repository set up for this package. The repository is hosted on ECDC servers. At the time of publishing this package two files are available:

a) coxiellaIFAParams4.zip: longitudinal response parameters per antibody for Coxiella. b) yersiniaSSIParams4.zip: longitudinal response parameters per antibody for Yersinia.

Simply use the supplied utility function getAdditionalData to download this data:

responseParams <- getAdditionalData("coxiellaIFAParams4.zip")

Internet connection is needed for this function to work.

3.3. Estimating seroincidence

Main calculation function provided by package seroincidence is called estimateSeroincidence. It takes several arguments:

Here is an example of the Campylobacter seroincidence calculation for three antibodies measured, stratified per AgeCat, and with measurements below 0.1 removed:

# Prepare input data.
serologyData <- with(campylobacterSimMediumData,
                     data.frame(
                       Age = rep(NA, nrow(campylobacterSimMediumData)),
                       IgG = IgG,
                       IgM = IgM,
                       IgA = IgA,
                       AgeCat = cut(Age, 15, labels = FALSE)))

responseParams <- campylobacterDelftParams4

cutOffs <- list(IgG = 0.1, IgM = 0.1, IgA = 0.1)

# Baseline distributions: the distributions of antibody concentrations in 
# subjects who have never seroconverted.
baseLine <- list(IgG = c(log(0.05), 1),
                 IgM = c(log(0.005), 1),
                 IgA = c(log(0.005), 1))

# Assign output of function "estimateSeroincidence" to object named 
# "seroincidenceData". Use all available processor cores.
seroincidenceData <- estimateSeroincidence(
  data = serologyData,
  antibodies = c("IgG", "IgM", "IgA"),
  strata = "AgeCat",
  params = responseParams,
  censorLimits = cutOffs,
  par0 = baseLine,
  start = -4,
  numCores = parallel::detectCores())

# Show content of the output variable...
print(seroincidenceData)
# ...or simply type in the console: 'seroincidenceData' (without "'") and 
# press ENTER.

The following text should be printed.

## Seroincidence object estimated given the following setup:
## a) Antibodies   : IgG, IgM, IgA
## b) Strata       : AgeCat 
## c) Censor limits:  IgG = 0.1, IgM = 0.1, IgA = 0.1 
## 
## This object is a list containing the following items:
## Fits         - List of outputs of "optim" function per stratum.
## Antibodies   - Input parameter antibodies of function "estimateSeroincidence".
## Strata       - Input parameter strata of function "estimateSeroincidence".
## CensorLimits - Input parameter censorLimits of function "estimateSeroincidence".
## 
## Call summary function to obtain output results.

The most important output is subobject Fits containing raw results on the output incidence. Translation of these raw results is provided by a custom function summary explained in the following section.

The cutoff argument is based on censoring of the observed serum antibody measurements [@5]. Cut-off levels must always be specified when calling function estimateSeroincidence (argument censorLimits). Value 0 should be set for antibody measurements where no censoring is needed, for instance

censorLimits <- list(IgG = 0, IgM = 0, IgA = 0)

Using different censoring levels will produce different results. Choosing higher cut-off causes the estimates of lambda to decrease. Results should be published together with the chosen cut-off values. The cut-off level of 0.1 set in the example above is a typical value, but users should set it to value reflecting the censoring level in the data.

3.4. Getting a summary of seroincidence

Output of the calculator should be passed in to function summary calculating output results:

summary(seroincidenceData)
## Seroincidence estimated given the following setup:
## a) Antibodies   : IgG, IgM, IgA 
## b) Strata       : AgeCat 
## c) Censor limits: IgG = 0.1, IgM = 0.1, IgA = 0.1 
## d) Quantiles    : 0.025, 0.975 
## 
##  Seroincidence estimates:
##    Lambda.est Lambda.lwr Lambda.upr  Deviance Convergence Stratum
## 1  0.09721890 0.06397004 0.14774909 105.29815           0       1
## 2  0.05497303 0.03573489 0.08456817 181.90627           0       2
## 3  0.04456096 0.02845903 0.06977327 199.99357           0       3
## 4  0.02944038 0.01715302 0.05052964 141.32558           0       4
## 5  0.02569781 0.01436524 0.04597053 123.33213           0       5
## 6  0.02565297 0.01587275 0.04145941 187.41060           0       6
## 7  0.03458616 0.02012722 0.05943207 128.84486           0       7
## 8  0.02776873 0.02640621 0.02920154 133.57166          52       8
## 9  0.04016261 0.02133209 0.07561541  94.99389           0       9
## 10 0.01996477 0.01898409 0.02099612 104.30992          52      10
## 11 0.03356017 0.01917771 0.05872886 141.20168           0      11
## 12 0.05287648 0.03263923 0.08566140 181.68655           0      12
## 13 0.05081097 0.02946963 0.08760729 123.66670          52      13
## 14 0.04813104 0.02945312 0.07865371 186.92218           0      14
## 15 0.03421042 0.02210895 0.05293569 215.78684           0      15

The summary function returns a list of objects as well. It contains sub-objects Antibodies, Strata, CensorLimits, Quantiles and Results.

The most important sub-object of the output of the summary functions is Results, containing estimates of Lambda (annual incidence) together with their lower and upper bounds (Lambda.lwr and Lambda.upr, respectively). The default lower bound is 2.5% and the upper bound is 97.5% of the distribution of Lambda. These values can be changed (see Other examples).

Optionally, the summary function returns also Deviance (negative log likelihood at the estimated Lambda) and Convergence (indicator returned by function stats::optim; value of 0 indicates convergence).

It may be beneficial to assign output of this function to a variable for further analysis:

# Compute seroincidence summary and assign to object "seroincidenceSummary".
seroincidenceSummary <- summary(seroincidenceData)

# Show the results.
seroincidenceSummary$Results
##    Lambda.est Lambda.lwr Lambda.upr  Deviance Convergence Stratum
## 1  0.09721890 0.06397004 0.14774909 105.29815           0       1
## 2  0.05497303 0.03573489 0.08456817 181.90627           0       2
## 3  0.04456096 0.02845903 0.06977327 199.99357           0       3
## 4  0.02944038 0.01715302 0.05052964 141.32558           0       4
## 5  0.02569781 0.01436524 0.04597053 123.33213           0       5
## 6  0.02565297 0.01587275 0.04145941 187.41060           0       6
## 7  0.03458616 0.02012722 0.05943207 128.84486           0       7
## 8  0.02776873 0.02640621 0.02920154 133.57166          52       8
## 9  0.04016261 0.02133209 0.07561541  94.99389           0       9
## 10 0.01996477 0.01898409 0.02099612 104.30992          52      10
## 11 0.03356017 0.01917771 0.05872886 141.20168           0      11
## 12 0.05287648 0.03263923 0.08566140 181.68655           0      12
## 13 0.05081097 0.02946963 0.08760729 123.66670          52      13
## 14 0.04813104 0.02945312 0.07865371 186.92218           0      14
## 15 0.03421042 0.02210895 0.05293569 215.78684           0      15

As one may note it is a dataframe with six columns that can be later post-processed with custom calculations.

4. Other examples

The following examples show the standard procedure for estimating seroincidence.

# 1. Define cross-sectional data.
serologyData <- with(campylobacterSimMediumData,
                     data.frame(
                       IgG = IgG.ratio.new,
                       IgM = IgM.ratio.new,
                       IgA = IgA.ratio.new))

# 2. Define longitudinal response data
responseParams <- campylobacterDelftParams3

# 3. Define cut-offs
cutOffs <- list(IgG = 0.25, IgM = 0.25, IgA = 0.25)

# 4. Baseline distributions
baseLine <- list(IgG = c(log(0.05), 1),
                 IgM = c(log(0.005), 1),
                 IgA = c(log(0.005), 1))

# 4a. Calculate a single seroincidence rate for all serum samples...
seroincidenceData <- estimateSeroincidence(
  data = serologyData,
  antibodies = c("IgG", "IgM", "IgA"),
  strata = "",
  params = responseParams,
  censorLimits = cutOffs,
  par0 = baseLine)

# 4b. ...or calculate a single seroincidence rate for a single serum sample 
# (triplet of titres)...
seroincidenceData <- estimateSeroincidence(
  data = serologyData[1, ],
  antibodies = c("IgG", "IgM", "IgA"),
  strata = "",
  params = responseParams,
  censorLimits = cutOffs,
  par0 = baseLine)

# 4c. ...or calculate a single seroincidence rate for all serum samples (only IgG)
seroincidenceData <- estimateSeroincidence(
  data = serologyData,
  antibodies = c("IgG"),
  strata = "",
  params = responseParams,
  censorLimits = cutOffs,
  par0 = baseLine)

# 5a. Produce summary of the results with 2.5% and 97.5% bounds...
summary(seroincidenceData)

# 5b. ...or produce summary of the results with 5% and 95% bounds, do not show 
# convergence...
summary(seroincidenceData, quantiles = c(0.05, 0.95), showConvergence = FALSE)

# 5c. ...or produce summary and assign to an object...
seroincidenceSummary <- summary(seroincidenceData)
# ...and work with the results object from now on (here: display the results).
seroincidenceSummary$Results

\newpage

Acknowledgements

Project funded by European Centre for Disease Prevention and Control (ECDC).

References



Try the seroincidence package in your browser

Any scripts or data that you put into this service are public.

seroincidence documentation built on May 2, 2019, 7 a.m.