knitr::opts_chunk$set(
  collapse = TRUE,
  quiet = TRUE,
  progress = FALSE,
  comment = "#>",
  fig.width=6,
  fig.height=4
)
options(readr.show_progress = FALSE)

Package Introduction

transnp: Takes SNP alignments for clusters, together with sampling dates and some assumptions, and models transmission events and times.

Functionality

The available functions in transnp are:

addAlignment: add to a list of alignments from fasta files

setUpDates: create list of dates to match alignments

createMLTrees: creates a list of maximum likelihood estimated trees from a list of alignments. Uses phangorn and phytools

createTimedTrees: creates a list of timed trees from a list of trees and their sample dates. Uses treedater

simulTransTrees: creates a set of transmission trees which are jointly inferred from timed trees. Uses TransPhylo

getLikelyInfectors: extract infection likelihoods for a given case

getInfectionTimes: extract infection times for a given case

Plotting and visualisation

networkTPlot: makes dynamic plot of a TransPhylo transmission tree showing unsampled cases

plot_gen_times: plot histogram of generation times for each cluster

plot_times_to_samp: plot histogram of times to sampling for each cluster

plot_unsampled_cases: plot histogram of the number of unsampled cases for each cluster

plotGenerationTimeDensity: plot generation time density for chosen case

plotInfectionDateDensity: plot infection date density for chosen case

plotSampleTimeDensity: plot time to sampling (from infection) for chosen case

plotTransTreeSummary: plot transmission network with posterior probabilities at or above given cutoff

Filtering on posterior records

filterInfectedDate: filter record on specified date of infection

filterPossibleWiw: filter record on specified allowable transmission matrix

filterTransPair: filter record on specified directed transmission pair

Package dependencies

Load libraries:

library(transnp)
library(ape)
library(phangorn)
library(TransPhylo)
library(treedater)
library(phytools)
library(gridExtra)
library(purrr)
library(ggplot2)
library(visNetwork)


Step 1: read in the data

We start with a fasta format file containing aligned sequence data.

Read the fasta files

aligns <- list()
aligns <- addAlignment(aligns, "demo_cluster_CL001.fas")
aligns <- addAlignment(aligns, "demo_cluster_CL002.fas")
aligns <- addAlignment(aligns, "demo_cluster_CL005.fas")
aligns <- addAlignment(aligns, "demo_cluster_CL007.fas")

This gives a list containing the alignments we will model:

aligns[[1]]

Step 2: create timed phylogenetic trees

2a: Create the maximum likelihood trees (in this case using TB base frequencies)

basefreqs <- c(0.1719, 0.3286, 0.3276, 0.1719)
set.seed(12345)
mltrees <- createMLTrees(aligns, bf=basefreqs)
par(mfrow=c(2,2)); lapply(mltrees,function(tree) {plot(tree); add.scale.bar()}); par(mfrow=c(1,1))
mltrees[[1]]

2b: Using sample dates, create timed trees using treedater.

allDates <- read.csv(system.file("extdata", "demo_dates.csv", package = "transnp", mustWork = TRUE))
sampledates <- setUpDates(aligns, allDates)
timedtrees <- createTimedTrees(mltrees, sampledates, aligns, strictClock = T, meanRateLimits = c(0.3, 0.7))

Note: results depend on the clock assumptions and can affect onward analysis (long time vs short time). We are working towards incorporating this uncertainty.

Step 3: transmission modelling

Infer transmission trees using TransPhylo functionality. This jointly estimates parameters over the trees, sharing parameters among the clusters. We analyse different clusters together.

nIter <- 1000
tpTimedTrees <- list()

maxDate <- max(unlist(lapply(sampledates, function(dateList) {return(max(dateList))})))
tpTimedTrees <- lapply(timedtrees, function(tree) {return(ptreeFromPhylo(tree, maxDate))}) 

#listRecords <- infer_multiTTree_shareParam(tpTimedTrees, share=c("neg","off.r","off.p","pi"), mcmcIterations=nIter, startNeg=1, startOff.p = 0.8, startOff.r = 1, startPi=0.9, updateNeg = T, updatePi = F, updateOff.p=FALSE)
listRecords <- simulTransTrees(tpTimedTrees, mean_gen=2, stddev_gen=sqrt(2), mean_sample=2, stddev_sample=sqrt(2), share=c("neg","off.r","off.p","pi"), mcmcIterations=nIter, startNeg=1, startOff.p = 0.8, startOff.r = 1, startPi=0.9, updateNeg = T, updatePi = F, updateOff.p=FALSE)

The result is a list of records, one for each cluster. The listRecords object contains the inferred transmission trees.

Step 4: Explore the results

Transmission trees

Combined tree using TransPhylo. In this tree, each colour corresponds to a host. Colours that don't end up at a tip represent inferred unsampled cases. Note that this Bayesian analysis produces a collection of possible transmission trees. Here we show the one with highest posterior probability, for cluster 4.

maptree <- selectTTree(listRecords[[4]],burnin = 0.3)
plotCTree(listRecords[[4]][[maptree]]$ctree)

Plot of the maximum posterior probability transmission tree for Cluster 7; we use visNetwork for the visualisation.

networkTPlot(listRecords[[4]], mcmcIndex=nIter-10, missLabel="Missed", colours=c('lightblue', 'orange'))

The thicker edges have higher probability - these transmission events are more certain. Here, there is a fairly high probability that c599 infected c946. We do not have a likely infector for c599.

Here is a different plot, for Cluster 2, showing different colour options.

networkTPlot(listRecords[[2]], mcmcIndex=nIter, missLabel="Missed", colours=c('blue', 'orange'))

How do the clusters compare to each other?

We can plot distribution of generation times (one infection to the next) from the posterior.

names(listRecords) = c("Cluster1", "Cluster2", "Cluster5", "Cluster7")
plot_gen_times(listRecords)

Clusters 1 and 5 contain some long times, suggesting cases that could have been intermittently infectious or non-infectious for a longer period. Cluster 2 seems to be moving faster.

Times from getting infected to getting sampled, showing differences between clusters.

plot_times_to_samp(listRecords)

The same pattern is mirrored in the times to sampling

How many unsampled cases are estimated in the clusters?

plot_unsampled_cases(listRecords)

Higher numbers in clusters 1 and 5 reflect longer branches and more intermediate cases between sample cases.

Individual-level estimates and uncertainty

We can also extract individual-level infection times by case, for each cluster:

gt11=getInfectionTimes(listRecords[[1]][300:1000],7)
name1=listRecords[[1]][[1000]]$ctree$nam[7]
hist(gt11,breaks=15,xlab = paste("estimated time of infection for", name1),main = "")

The timing of infection depends on who infected whom and on the timed tree

maptree1=selectTTree(listRecords[[1]],burnin = 0.5)
plotCTree(listRecords[[1]][[maptree1]]$ctree)

Generation time (time from being infected to infecting next case) density for a given individual

plotGenerationTimeDensity(listRecords[[1]], index=2)

Infection-to-sample time density for a given individual

plotSampleTimeDensity(listRecords[[1]], index=2)

We can overlay specific events on individual-level infection times by case, for each cluster:

plotInfectionDateDensity(listRecords[[1]], index=1, overlayDate=2015)

We can extract likehoods of infector for a given case

getLikelyInfectors(listRecords[[1]], 'c405')

Cluster-level summary information

Plot of transmission network with posterior probabilities at or above given cutoff. Edges are weighted in proportion to the probability of transmission.

plotTransTreeSummary(listRecords[[2]], cutoff=0.2, burnin=0.5) 

Apply additional knowledge not captured in the model

Epidemiological data may be gathered which adds information which is not modelled. It might be known, for example, that patient A could not have infected patient B (even indirectly) because they were known to living in geographically isolated locations.

Filter record on a specified directed transmission pair. From previous example, we impose that c674 cannot infect c1608.

subRecord1 <- filterTransPair(listRecords[[2]], from='c674', to='c1608')
plotTransTreeSummary(subRecord1, cutoff=0.2, burnin=0.5) 

Filter record on specified date of infection (before or after, depending on the keepAfter flag) for a particular case

subRecord2 <- filterInfectedDate(listRecords[[2]], case='c1608', infDate=2013.5, keepAfter=TRUE)
plotTransTreeSummary(subRecord2, cutoff=0.2, burnin=0.5, nodeColour='orange' )

Shifting the date forward a little excludes more entries

subRecord2b <- filterInfectedDate(listRecords[[2]], case='c1608', infDate=2014.5, keepAfter=TRUE)
plotTransTreeSummary(subRecord2b, cutoff=0.2, burnin=0.5, nodeColour='green' )

Filter record on specified logical matrix, which specifies who can infect whom. Record entries are kept only where infection events correspond to a TRUE entry in the testMatrix.

name_list <- listRecords[[2]][[1]]$ctree$nam
name_dim <- length(name_list)
testMatrix <- matrix(TRUE, name_dim, name_dim)
testMatrix[10,8] <- FALSE
testMatrix[8,10] <- FALSE
name_list[[8]]
name_list[[10]]
subRecord3 <- filterPossibleWiw(listRecords[[2]], testMatrix)
plotTransTreeSummary(subRecord3, cutoff=0.2, burnin=0.5, nodeColour='pink' )

Acknowledgements

Centre for Mathematics of Precision Healthcare, Imperial College London

EPSRC Impact Acceleration award, Imperial College London

Inaki Comas and Tom Irving: TB cluster data from Valencia, Spain

Notes

Last updated June 2018.



JamesStimson/transnp documentation built on May 30, 2019, 3:51 p.m.