knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

In this tutorial we will analyse the the dataset patient_data_simulated with the function run_mcmc_with_types(), to demonstrate how discrete typing information can be used to infer sources of infection.

Example Dataset

The dataset patient_data_simulated is distributed with this package and contains simulated admission and discharge dates, test results and antibiotic use of 270 patients over 90 days. For a more detailed description of the example dataset consider the vignette introduction. In this example we assume that each patient who was tested positive gets assigned one of four types. We use these types to infer likely sources of infection.

library(mrsamcmc)
example_dataset <- patient_data_simulated
head(example_dataset$patients)

Model

To extend the model implemented in the function run_mcmc_with_types() to account for discrete typing data we use a simplified version of the model by @pmid27042253. If the testing contains some additional information on the type of infection this information can be used to determine likely sources of infection and therefore potentially increase the overall precision of the estimates. The dataset we are using does not contain typing or sequencing data, however, resistance profiles can be used as a proxy for the type of strain. We denote by $Z$ the augmented source patients for all transmission events and factorise the overall likelihood as follows

$$ \Pi\left(\Omega,\ \Sigma,\ T,\ D,\ W,Z\middle|\theta\right)=\Pi\left(T\middle|W,\ Z,\theta\ \right)\cdot\Pi\left(Z\middle|W,\ \theta\right)\cdot\Pi\left(W\middle|\theta\right) $$

with $T$ denoting the MRSA type causing colonisation. $D$ is a matrix of distances between the different types. In the simplest case we can assume that there is no transmission between different types and set the likelihood contribution of all patients who acquire to 1 if the augmented type of the patient is the same as the type as the source and to 0 otherwise. For patients who are imported and do not have a positive test, a type has to be augmented. We assume that the probability of being colonised with any given type is equal to the proportion of patients who were tested positive and found to have that type. We therefore model the likelihood of the observed types given the augmented sources and colonisation times as

$$ \Pi\left(T\middle|W,\ Z,\theta\ \right)=\ \prod_{k:t_k^c=\infty,\ {\ t}k^c\ \geq\ t_k^a}\left[1{T_k=T_{Z_k}}\right]\ \prod_{k:\ {\ t}k^c\ <\ t_k^a}\frac{N{T_k}}{N_T} $$

where $N_{T_k}$ is the number of patients who were tested positive with type $T_k$ and $N_T$ is the total number of patients tested positive with any type. Since our model allows for an effect of antibiotics on onward transmission, patients who are on antibiotics on a given day are not equally likely to act as sources of infection as patients who are not on antibiotics. In the absence of effects of antibiotics, the likelihood of any given patient being the source of infection to a patient who acquires infection on a given day is equal to one over the number of colonised patients on that day. In that case the likelihood of the sources given the colonisation times would be given by

$$ \Pi\left(Z\middle|S,\ \theta\ \right)=\prod_{k\ =\ 1}^{N}{\left[\prod_{i\ =\ 1}^{N_{c,k}}\frac{1}{C\left(t_i^c\right)}\right]\ \ \ \ \ }$$

where $C\left(t_i^c\right)$ is the number of colonised patients on the day when patient i becomes colonised. To account for the effect of antibiotics on transmission we modify this as follows

$$\Pi\left(Z\middle|S,\ \theta\ \right)=\prod_{k\ =\ 1}^{N}{\left[\prod_{i\ =\ 1}^{N_{c,k}}\frac{b^{l_{ik}}}{C_{nabx}\left(t_i^c\right)+bC_{abx}\left(t_i^c\right)}\right]\ \ \ \ \ }$$

with $$ l_{ik}= \left{ \begin{array}{cc} 1 & \mbox{ if } σ_{t_i^c,Z_{k,i}} =1\ 0 & \mbox{else} \end{array} \right.$$

The approach we present here can also be used for discrete typing schemes such as multi-locus sequence typing (MLST). It can also be easily adapted to more continuous measures of distance, such as spa typing, by deriving a likelihood function which is based on the distance between the type of the offspring and the type of the proposed source.

Inference

mcmc_chains <- run_mcmc_with_types(data = example_dataset, 
                                   configuration = generate_configuration(n_iter = 3000))

Analyse the output

Sources and time of infection of individual patients

We can look at likely infection times and sources of one patient as follows:

patient <- 248
table(mcmc_chains$sources[ ,patient])
table(mcmc_chains$statuses[ ,patient])

Transmission network

We can also plot the full transmission network, where each node corresponds to a colonised patient and the width of the edges corresponds to the proportion of simulation runs with a transmission between the two patients connected by an edge.

library(igraph)
all_patients <- seq(1,270)
types <- cbind(all_patients, example_dataset$patients$type)
types <- data.frame(types)
colnames(types) <- c("all_patients","type")
types[types$type == -1,]$type <- 5
admatrix <- matrix(0, nrow = 271, ncol = 271)

for (k in 1:270){
  for (j in 0:270){
    admatrix[j+1,k+1] <- length(which(mcmc_chains$sources[ ,k] == j))
  }
}
colnames(admatrix) <- seq(0,270)
admatrix <- admatrix/3000
graph_weighted <- graph_from_adjacency_matrix(admatrix, 
                                              mode = "directed", 
                                              weighted = TRUE)

V(graph_weighted)$type = as.character(types$type[match(V(graph_weighted)$name, types$all_patients)])

graph_weighted <- (delete.vertices(simplify(graph_weighted), 
                   degree(graph_weighted) == 0))

graph_weighted <- delete.edges(graph_weighted, 
                   which(E(graph_weighted)$weight < 0.1))

graph_weighted <- (delete.vertices(simplify(graph_weighted), 
                                   degree(graph_weighted)==0))

E <- t(apply(get.edgelist(graph_weighted), 1, sort))

E(graph_weighted)$curved <- 0
E(graph_weighted)[duplicated(E) | duplicated(E,fromLast =TRUE)]$curved <- 1

V(graph_weighted)$color = V(graph_weighted)$type

plot(graph_weighted,
     vertex.size = 10, vertex.label.cex = 0.8,
     edge.width = 5 * E(graph_weighted)$weight,
     edge.arrow.size = 0.1,
     main = "infered transmission network")


mirjamlaager/mrsamcmc documentation built on May 20, 2020, 11:13 a.m.