knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this tutorial we will analyse the the dataset patient_data_simulated
with
the function run_mcmc_with_types()
, to demonstrate how discrete typing information
can be used to infer sources of infection.
The dataset patient_data_simulated
is distributed with this package and contains
simulated admission and discharge dates, test results and antibiotic use of
270 patients over 90 days. For a more detailed description of the example dataset
consider the vignette introduction. In this example we assume that each patient
who was tested positive gets assigned one of four types. We use these types to
infer likely sources of infection.
library(mrsamcmc) example_dataset <- patient_data_simulated head(example_dataset$patients)
To extend the model implemented in the function run_mcmc_with_types()
to account for discrete typing data we use a simplified version of the model by @pmid27042253. If the testing contains some additional information on the type of infection this information can be used to determine likely sources of infection and therefore potentially increase the overall precision of the estimates. The dataset we are using does not contain typing or sequencing data, however, resistance profiles can be used as a proxy for the type of strain. We denote by $Z$ the augmented source patients for all transmission events and factorise the overall likelihood as follows
$$ \Pi\left(\Omega,\ \Sigma,\ T,\ D,\ W,Z\middle|\theta\right)=\Pi\left(T\middle|W,\ Z,\theta\ \right)\cdot\Pi\left(Z\middle|W,\ \theta\right)\cdot\Pi\left(W\middle|\theta\right) $$
with $T$ denoting the MRSA type causing colonisation. $D$ is a matrix of distances between the different types. In the simplest case we can assume that there is no transmission between different types and set the likelihood contribution of all patients who acquire to 1 if the augmented type of the patient is the same as the type as the source and to 0 otherwise. For patients who are imported and do not have a positive test, a type has to be augmented. We assume that the probability of being colonised with any given type is equal to the proportion of patients who were tested positive and found to have that type. We therefore model the likelihood of the observed types given the augmented sources and colonisation times as
$$ \Pi\left(T\middle|W,\ Z,\theta\ \right)=\ \prod_{k:t_k^c=\infty,\ {\ t}k^c\ \geq\ t_k^a}\left[1{T_k=T_{Z_k}}\right]\ \prod_{k:\ {\ t}k^c\ <\ t_k^a}\frac{N{T_k}}{N_T} $$
where $N_{T_k}$ is the number of patients who were tested positive with type $T_k$ and $N_T$ is the total number of patients tested positive with any type. Since our model allows for an effect of antibiotics on onward transmission, patients who are on antibiotics on a given day are not equally likely to act as sources of infection as patients who are not on antibiotics. In the absence of effects of antibiotics, the likelihood of any given patient being the source of infection to a patient who acquires infection on a given day is equal to one over the number of colonised patients on that day. In that case the likelihood of the sources given the colonisation times would be given by
$$ \Pi\left(Z\middle|S,\ \theta\ \right)=\prod_{k\ =\ 1}^{N}{\left[\prod_{i\ =\ 1}^{N_{c,k}}\frac{1}{C\left(t_i^c\right)}\right]\ \ \ \ \ }$$
where $C\left(t_i^c\right)$ is the number of colonised patients on the day when patient i becomes colonised. To account for the effect of antibiotics on transmission we modify this as follows
$$\Pi\left(Z\middle|S,\ \theta\ \right)=\prod_{k\ =\ 1}^{N}{\left[\prod_{i\ =\ 1}^{N_{c,k}}\frac{b^{l_{ik}}}{C_{nabx}\left(t_i^c\right)+bC_{abx}\left(t_i^c\right)}\right]\ \ \ \ \ }$$
with $$ l_{ik}= \left{ \begin{array}{cc} 1 & \mbox{ if } σ_{t_i^c,Z_{k,i}} =1\ 0 & \mbox{else} \end{array} \right.$$
The approach we present here can also be used for discrete typing schemes such as multi-locus sequence typing (MLST). It can also be easily adapted to more continuous measures of distance, such as spa typing, by deriving a likelihood function which is based on the distance between the type of the offspring and the type of the proposed source.
mcmc_chains <- run_mcmc_with_types(data = example_dataset, configuration = generate_configuration(n_iter = 3000))
We can look at likely infection times and sources of one patient as follows:
patient <- 248 table(mcmc_chains$sources[ ,patient]) table(mcmc_chains$statuses[ ,patient])
We can also plot the full transmission network, where each node corresponds to a colonised patient and the width of the edges corresponds to the proportion of simulation runs with a transmission between the two patients connected by an edge.
library(igraph) all_patients <- seq(1,270) types <- cbind(all_patients, example_dataset$patients$type) types <- data.frame(types) colnames(types) <- c("all_patients","type") types[types$type == -1,]$type <- 5 admatrix <- matrix(0, nrow = 271, ncol = 271) for (k in 1:270){ for (j in 0:270){ admatrix[j+1,k+1] <- length(which(mcmc_chains$sources[ ,k] == j)) } } colnames(admatrix) <- seq(0,270) admatrix <- admatrix/3000 graph_weighted <- graph_from_adjacency_matrix(admatrix, mode = "directed", weighted = TRUE) V(graph_weighted)$type = as.character(types$type[match(V(graph_weighted)$name, types$all_patients)]) graph_weighted <- (delete.vertices(simplify(graph_weighted), degree(graph_weighted) == 0)) graph_weighted <- delete.edges(graph_weighted, which(E(graph_weighted)$weight < 0.1)) graph_weighted <- (delete.vertices(simplify(graph_weighted), degree(graph_weighted)==0)) E <- t(apply(get.edgelist(graph_weighted), 1, sort)) E(graph_weighted)$curved <- 0 E(graph_weighted)[duplicated(E) | duplicated(E,fromLast =TRUE)]$curved <- 1 V(graph_weighted)$color = V(graph_weighted)$type plot(graph_weighted, vertex.size = 10, vertex.label.cex = 0.8, edge.width = 5 * E(graph_weighted)$weight, edge.arrow.size = 0.1, main = "infered transmission network")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.