knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this tutorial we will be analysing the dataset patient_data_simulated
to
demonstrate the main functionalities of the package mrsamcmc.
The dataset patient_data_simulated
is distributed with this package and contains
simulated admission and discharge dates, test results and antibiotic use of
270 patients over 90 days.
library(mrsamcmc) example_dataset <- patient_data_simulated
Data to be analysed with mrsamcmc should be in
the same format as the example dataset we use here. So let's look at the example
dataset in more detail. It is a list with four elements: $patients
,
$test_results_positive
, $test_results_negative
and $antibiotics
.
head(example_dataset$patients)
In $patients
each row corresponds to a patient and the columns are the admission
and discharge dates of the patient (in days) and the day of the first positive test
for those patients who were tested positive. If a patient was never tested positive,
their day of first positive test is set to 20000. The dataset also
contains a column for the types of the patients. This column is ignored in the
simplest form of the model run_mcmc()
, which does not infer sources of infection.
In $test_results_positive
and $test_results_negative
each row corresponds to a
patient and the entries are the day of a positive or negative test for that patient.
We can get the total number of positive and negative tests in the example
dataset as follows:
length(which(example_dataset$test_results_positive != 0)) length(which(example_dataset$test_results_negative != 0))
In $antibiotics
each row corresponds to a patient and the entries are the days
on which antibiotics were administered for that patient. Patient 4 for example
was admitted on day r example_dataset$patients[4,]$admission
and
discharged on day r example_dataset$patients[4,]$discharge
and was
on antibiotics on the following days:
example_dataset$antibiotics[4,example_dataset$antibiotics[4,] != 0]
The model implemented in the package mrsamcmc very much builds on previous methodology as described in @doi:10.1093/biostatistics/kxl017 and @pmid27042253. The data augmentation approach used in these publications introduces additional parameters for the unknown colonisation time for each patient, which makes it possible to split the overall likelihood into the following product: $$ \Pi(\Omega, W| \rho, \varphi, \beta) = \Pi(\Omega|W,\rho) \cdot \Pi(W | \rho, \beta), $$ where $\Omega$ denotes the test results, W is the colonisation status of the patients and colonisation times, $\rho$ is the test sensitivity, $\varphi$ is the probability of a patient being positive on admission and $\beta$ is a transmission parameter. The first factor of the product describes the observation process, which can be modelled as a binomial distribution. Assuming perfect specificity, the likelihood of the test results given the test sensitivity and the true statuses is therefore given by
$$ \Pi\left(\Omega\middle|W,\rho\right)=\rho^{TP}\cdot\left(1-\rho\right)^{FN} $$
where TP and FN denote the number of true positive and false negative tests. The second factor of the overall likelihood captures the transmission process. Following @pmid27042253 we assume that patients who acquire a colonisation stay colonised until discharge. The patient statuses, W, are therefore fully described by the colonisation times, $t^c$, which are set to infinity for patients who do not get colonised during their ward stay. By discretising time into days, the likelihood of the colonisation times given the transmission parameters can be modelled as
$$ \Pi\left(t^c\middle|\varphi,\beta\right)=\varphi^{N_p}\left(1-\varphi\right)^{N-N_p}\prod_{k:t_k^c=\infty}\left[\prod_{j=t_k^a}^{t_k^d}\left(1-p_{kj}\right)\right]\prod_{k:t_k^c\neq\infty}\left[p_{kt_k^c}\prod_{j=t_k^a}^{t_k^c-1}\left(1-p_{kj}\right)\right] $$
where $N$ is the total number of patients, $N_p$ is the number of patients who are admitted already colonised and $t_k^a$, $t_k^c$ and $t_k^d$ denote the day of admission, colonisation and discharge of patient $k$. $p_{kj}$ is the probability of patient $k$ becoming colonised on day $j$ which is given by
$$ p_{kj}= 1 - \exp(- \beta C(j)) $$
where $C(j)$ is the number of colonised patients on day $j$.
We expand this model to account for the effect of antibiotics on detection and transmission. We assume that antibiotics can affect the test sensitivity, the probability of acquisition and the probability of onward transmission. To capture the effect of antibiotics on test sensitivity we assume that in the presence of antibiotics the baseline test sensitivity changes by a factor \delta and therefore model the likelihood of the test results as
$$ \Pi\left(\Omega\middle|W,\rho,\delta,\Sigma\right)=\rho^{TP_{nabx}}\cdot\left(1-\rho\right)^{FN_{nabx}}\cdot\left(\delta\rho\right)^{TP_{abx}}\cdot\left(1-\delta\rho\right)^{FN_{abx}} $$
where $TP_{nabx}$, ${FN}{nabx}$, $TP{abx}$ and ${FN}{abx}$ denote the total number of true positive and false negative tests conducted while a patient was on or off antibiotics and $\Sigma$ is a matrix with entries $\sigma{ij}$ equal to one if patient i is on antibiotics on day j and zero otherwise. We model the effect of antibiotics on the transmission dynamics by introducing a parameter $\alpha$ for the effect on acquisition and a parameter $\tau$ for the effect on onward transmission and modify the daily probability of acquisition as follows
$$ p_{kj}=1-\ exp(\ -\ \alpha^{1_{\sigma_{kj}=1}}(\beta C_{nabx}\left(j\right)+\tau\beta C_{abx}\left(j\right))) $$
where $C_{nabx}(j)$ and $C_{abx}(j)$ denote the number of colonised patients on day j who are on or off antibiotics and $1_{\sigma_{kj}=1}$ is an indicator function which equals to 1 if patient $k$ is on antibiotics on day $j$ and zero otherwise.
In this section we will run the MCMC algorithm with the example dataset and
show how to customise the settings of the MCMC and use custom prior distributions.
The function run_mcmc()
has three arguments, data
, priors
and configuration
.
If none of these arguments are specified, the function runs with default priors and
configuration. Here, we will use some custom settings.
We generate a custom configuration using generate_configuration()
. This function
returns the default configuration, which can be inspected as follows:
my_configuration <- generate_configuration() my_configuration$n_iter
For the purpouse of this tutorial, we run a short chain with 3000 iterations only. We also infer no effects of antibiotics for now and modify the output storage such that we only store aggreated patient data only .
my_configuration <- generate_configuration(n_iter = 3000, store_params = "STORE_ALL", store_statuses = "AGGREGATED", infer_covariate_effects = "NONE")
Alternatively, we can get the default configuration and modify entries individually
new_configuration <- generate_configuration() new_configuration$n_iter <- 300
We generate custom priors using generate_priors()
. This function returns
the default priors, which can be inspected as follows:
my_priors <- generate_priors() my_priors$functions$f_beta my_priors$parameters$p_beta
The default prior for the transmission rate is a Cauchy distribution with parameters 0 and 4. Let's change that to a Uniform prior:
my_priors$functions$f_beta <- dunif my_priors$parameters$p_beta <- c(0,1)
Alternatively, we can generate the custom priors with the generate_priors()
function directly:
my_priors <- generate_priors(f_beta = dunif, p_beta = c(0,1) )
Now, we can run the MCMC using our customised configuration and priors:
output <- run_mcmc(data = example_dataset, priors = my_priors, configuration = my_configuration)
We use the package coda to take a look at the results. Because the MCMC augments the true colonisation times, we can now estimate, what proportion of the colnised patients were admitted already colonised and what proportion acquired the colonisation during their stay:
library(coda) library(ggplot2) print(HPDinterval(as.mcmc(output$statuses)))
We can also plot the chains and densities of individual parameters. Since we only ran a small number of iterations for the purpose of this tutorial, the chain clearly has not yet converged. Consider running more iterations (for example 30'000) and running multiple chains for better results.
full_chain_beta <- output$parameters$beta chain_without_burnin_beta <- full_chain_beta[-(1:my_configuration$n_burnin)] plot(as.mcmc(chain_without_burnin_beta))
For further output analysis including posterior predicitive checks and forward simulations, consider the vignette simulations.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.