knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

In this tutorial we will be analysing the dataset patient_data_simulated to demonstrate the main functionalities of the package mrsamcmc.

Example Dataset

The dataset patient_data_simulated is distributed with this package and contains simulated admission and discharge dates, test results and antibiotic use of 270 patients over 90 days.

library(mrsamcmc)
example_dataset <- patient_data_simulated

Data to be analysed with mrsamcmc should be in the same format as the example dataset we use here. So let's look at the example dataset in more detail. It is a list with four elements: $patients, $test_results_positive, $test_results_negative and $antibiotics.

patients

head(example_dataset$patients)

In $patients each row corresponds to a patient and the columns are the admission and discharge dates of the patient (in days) and the day of the first positive test for those patients who were tested positive. If a patient was never tested positive, their day of first positive test is set to 20000. The dataset also contains a column for the types of the patients. This column is ignored in the simplest form of the model run_mcmc(), which does not infer sources of infection.

test results

In $test_results_positive and $test_results_negative each row corresponds to a patient and the entries are the day of a positive or negative test for that patient. We can get the total number of positive and negative tests in the example dataset as follows:

length(which(example_dataset$test_results_positive != 0))
length(which(example_dataset$test_results_negative != 0))

antibiotics

In $antibiotics each row corresponds to a patient and the entries are the days on which antibiotics were administered for that patient. Patient 4 for example was admitted on day r example_dataset$patients[4,]$admission and discharged on day r example_dataset$patients[4,]$discharge and was on antibiotics on the following days:

example_dataset$antibiotics[4,example_dataset$antibiotics[4,] != 0]

Model

The model implemented in the package mrsamcmc very much builds on previous methodology as described in @doi:10.1093/biostatistics/kxl017 and @pmid27042253. The data augmentation approach used in these publications introduces additional parameters for the unknown colonisation time for each patient, which makes it possible to split the overall likelihood into the following product: $$ \Pi(\Omega, W| \rho, \varphi, \beta) = \Pi(\Omega|W,\rho) \cdot \Pi(W | \rho, \beta), $$ where $\Omega$ denotes the test results, W is the colonisation status of the patients and colonisation times, $\rho$ is the test sensitivity, $\varphi$ is the probability of a patient being positive on admission and $\beta$ is a transmission parameter. The first factor of the product describes the observation process, which can be modelled as a binomial distribution. Assuming perfect specificity, the likelihood of the test results given the test sensitivity and the true statuses is therefore given by

$$ \Pi\left(\Omega\middle|W,\rho\right)=\rho^{TP}\cdot\left(1-\rho\right)^{FN} $$

where TP and FN denote the number of true positive and false negative tests. The second factor of the overall likelihood captures the transmission process. Following @pmid27042253 we assume that patients who acquire a colonisation stay colonised until discharge. The patient statuses, W, are therefore fully described by the colonisation times, $t^c$, which are set to infinity for patients who do not get colonised during their ward stay. By discretising time into days, the likelihood of the colonisation times given the transmission parameters can be modelled as

$$ \Pi\left(t^c\middle|\varphi,\beta\right)=\varphi^{N_p}\left(1-\varphi\right)^{N-N_p}\prod_{k:t_k^c=\infty}\left[\prod_{j=t_k^a}^{t_k^d}\left(1-p_{kj}\right)\right]\prod_{k:t_k^c\neq\infty}\left[p_{kt_k^c}\prod_{j=t_k^a}^{t_k^c-1}\left(1-p_{kj}\right)\right] $$

where $N$ is the total number of patients, $N_p$ is the number of patients who are admitted already colonised and $t_k^a$, $t_k^c$ and $t_k^d$ denote the day of admission, colonisation and discharge of patient $k$. $p_{kj}$ is the probability of patient $k$ becoming colonised on day $j$ which is given by

$$ p_{kj}= 1 - \exp(- \beta C(j)) $$

where $C(j)$ is the number of colonised patients on day $j$.

Accounting for the effect of antibiotics

We expand this model to account for the effect of antibiotics on detection and transmission. We assume that antibiotics can affect the test sensitivity, the probability of acquisition and the probability of onward transmission. To capture the effect of antibiotics on test sensitivity we assume that in the presence of antibiotics the baseline test sensitivity changes by a factor \delta and therefore model the likelihood of the test results as

$$ \Pi\left(\Omega\middle|W,\rho,\delta,\Sigma\right)=\rho^{TP_{nabx}}\cdot\left(1-\rho\right)^{FN_{nabx}}\cdot\left(\delta\rho\right)^{TP_{abx}}\cdot\left(1-\delta\rho\right)^{FN_{abx}} $$

where $TP_{nabx}$, ${FN}{nabx}$, $TP{abx}$ and ${FN}{abx}$ denote the total number of true positive and false negative tests conducted while a patient was on or off antibiotics and $\Sigma$ is a matrix with entries $\sigma{ij}$ equal to one if patient i is on antibiotics on day j and zero otherwise. We model the effect of antibiotics on the transmission dynamics by introducing a parameter $\alpha$ for the effect on acquisition and a parameter $\tau$ for the effect on onward transmission and modify the daily probability of acquisition as follows

$$ p_{kj}=1-\ exp(\ -\ \alpha^{1_{\sigma_{kj}=1}}(\beta C_{nabx}\left(j\right)+\tau\beta C_{abx}\left(j\right))) $$

where $C_{nabx}(j)$ and $C_{abx}(j)$ denote the number of colonised patients on day j who are on or off antibiotics and $1_{\sigma_{kj}=1}$ is an indicator function which equals to 1 if patient $k$ is on antibiotics on day $j$ and zero otherwise.

Inference with customised settings

In this section we will run the MCMC algorithm with the example dataset and show how to customise the settings of the MCMC and use custom prior distributions. The function run_mcmc() has three arguments, data, priors and configuration. If none of these arguments are specified, the function runs with default priors and configuration. Here, we will use some custom settings.

Customise configuration

We generate a custom configuration using generate_configuration(). This function returns the default configuration, which can be inspected as follows:

my_configuration <- generate_configuration()
my_configuration$n_iter

For the purpouse of this tutorial, we run a short chain with 3000 iterations only. We also infer no effects of antibiotics for now and modify the output storage such that we only store aggreated patient data only .

my_configuration <- generate_configuration(n_iter = 3000, store_params = "STORE_ALL", 
store_statuses = "AGGREGATED", infer_covariate_effects = "NONE")

Alternatively, we can get the default configuration and modify entries individually

new_configuration <- generate_configuration()
new_configuration$n_iter <- 300

Customise priors

We generate custom priors using generate_priors(). This function returns the default priors, which can be inspected as follows:

my_priors <- generate_priors()
my_priors$functions$f_beta
my_priors$parameters$p_beta

The default prior for the transmission rate is a Cauchy distribution with parameters 0 and 4. Let's change that to a Uniform prior:

my_priors$functions$f_beta <- dunif
my_priors$parameters$p_beta <- c(0,1)

Alternatively, we can generate the custom priors with the generate_priors() function directly:

my_priors <- generate_priors(f_beta = dunif, p_beta = c(0,1) )

Run MCMC

Now, we can run the MCMC using our customised configuration and priors:

output <- run_mcmc(data = example_dataset, priors = my_priors, 
configuration = my_configuration)

Analysing the output

We use the package coda to take a look at the results. Because the MCMC augments the true colonisation times, we can now estimate, what proportion of the colnised patients were admitted already colonised and what proportion acquired the colonisation during their stay:

library(coda)
library(ggplot2)
print(HPDinterval(as.mcmc(output$statuses)))

We can also plot the chains and densities of individual parameters. Since we only ran a small number of iterations for the purpose of this tutorial, the chain clearly has not yet converged. Consider running more iterations (for example 30'000) and running multiple chains for better results.

full_chain_beta <- output$parameters$beta
chain_without_burnin_beta <- full_chain_beta[-(1:my_configuration$n_burnin)]
plot(as.mcmc(chain_without_burnin_beta))

For further output analysis including posterior predicitive checks and forward simulations, consider the vignette simulations.

References



mirjamlaager/mrsamcmc documentation built on May 20, 2020, 11:13 a.m.