knitr::opts_chunk$set( collapse = TRUE, quiet = TRUE, progress = FALSE, comment = "#>", fig.width=6, fig.height=4 )
options(readr.show_progress = FALSE)
transnp: Takes SNP alignments for clusters, together with sampling dates and some assumptions, and models transmission events and times.
The available functions in transnp are:
addAlignment: add to a list of alignments from fasta files
setUpDates: create list of dates to match alignments
createMLTrees: creates a list of maximum likelihood estimated trees from a list of alignments. Uses phangorn and phytools
createTimedTrees: creates a list of timed trees from a list of trees and their sample dates. Uses treedater
simulTransTrees: creates a set of transmission trees which are jointly inferred from timed trees. Uses TransPhylo
getLikelyInfectors: extract infection likelihoods for a given case
getInfectionTimes: extract infection times for a given case
Plotting and visualisation
networkTPlot: makes dynamic plot of a TransPhylo transmission tree showing unsampled cases
plot_gen_times: plot histogram of generation times for each cluster
plot_times_to_samp: plot histogram of times to sampling for each cluster
plot_unsampled_cases: plot histogram of the number of unsampled cases for each cluster
plotGenerationTimeDensity: plot generation time density for chosen case
plotInfectionDateDensity: plot infection date density for chosen case
plotSampleTimeDensity: plot time to sampling (from infection) for chosen case
plotTransTreeSummary: plot transmission network with posterior probabilities at or above given cutoff
Filtering on posterior records
filterInfectedDate: filter record on specified date of infection
filterPossibleWiw: filter record on specified allowable transmission matrix
filterTransPair: filter record on specified directed transmission pair
Load libraries:
library(transnp) library(ape) library(phangorn) library(TransPhylo) library(treedater) library(phytools) library(gridExtra) library(purrr) library(ggplot2) library(visNetwork)
We start with a fasta format file containing aligned sequence data.
Read the fasta files
aligns <- list() aligns <- addAlignment(aligns, "demo_cluster_CL001.fas") aligns <- addAlignment(aligns, "demo_cluster_CL002.fas") aligns <- addAlignment(aligns, "demo_cluster_CL005.fas") aligns <- addAlignment(aligns, "demo_cluster_CL007.fas")
This gives a list containing the alignments we will model:
aligns[[1]]
2a: Create the maximum likelihood trees (in this case using TB base frequencies)
basefreqs <- c(0.1719, 0.3286, 0.3276, 0.1719) set.seed(12345) mltrees <- createMLTrees(aligns, bf=basefreqs) par(mfrow=c(2,2)); lapply(mltrees,function(tree) {plot(tree); add.scale.bar()}); par(mfrow=c(1,1))
mltrees[[1]]
2b: Using sample dates, create timed trees using treedater.
allDates <- read.csv(system.file("extdata", "demo_dates.csv", package = "transnp", mustWork = TRUE)) sampledates <- setUpDates(aligns, allDates) timedtrees <- createTimedTrees(mltrees, sampledates, aligns, strictClock = T, meanRateLimits = c(0.3, 0.7))
Note: results depend on the clock assumptions and can affect onward analysis (long time vs short time). We are working towards incorporating this uncertainty.
Infer transmission trees using TransPhylo functionality. This jointly estimates parameters over the trees, sharing parameters among the clusters. We analyse different clusters together.
nIter <- 1000 tpTimedTrees <- list() maxDate <- max(unlist(lapply(sampledates, function(dateList) {return(max(dateList))}))) tpTimedTrees <- lapply(timedtrees, function(tree) {return(ptreeFromPhylo(tree, maxDate))}) #listRecords <- infer_multiTTree_shareParam(tpTimedTrees, share=c("neg","off.r","off.p","pi"), mcmcIterations=nIter, startNeg=1, startOff.p = 0.8, startOff.r = 1, startPi=0.9, updateNeg = T, updatePi = F, updateOff.p=FALSE) listRecords <- simulTransTrees(tpTimedTrees, mean_gen=2, stddev_gen=sqrt(2), mean_sample=2, stddev_sample=sqrt(2), share=c("neg","off.r","off.p","pi"), mcmcIterations=nIter, startNeg=1, startOff.p = 0.8, startOff.r = 1, startPi=0.9, updateNeg = T, updatePi = F, updateOff.p=FALSE)
The result is a list of records, one for each cluster. The
listRecords
object contains the inferred transmission trees.
Combined tree using TransPhylo. In this tree, each colour corresponds to a host. Colours that don't end up at a tip represent inferred unsampled cases. Note that this Bayesian analysis produces a collection of possible transmission trees. Here we show the one with highest posterior probability, for cluster 4.
maptree <- selectTTree(listRecords[[4]],burnin = 0.3) plotCTree(listRecords[[4]][[maptree]]$ctree)
Plot of the maximum posterior probability transmission tree for Cluster 7; we use visNetwork for the visualisation.
networkTPlot(listRecords[[4]], mcmcIndex=nIter-10, missLabel="Missed", colours=c('lightblue', 'orange'))
The thicker edges have higher probability - these transmission events are more certain. Here, there is a fairly high probability that c599 infected c946. We do not have a likely infector for c599.
Here is a different plot, for Cluster 2, showing different colour options.
networkTPlot(listRecords[[2]], mcmcIndex=nIter, missLabel="Missed", colours=c('blue', 'orange'))
We can plot distribution of generation times (one infection to the next) from the posterior.
names(listRecords) = c("Cluster1", "Cluster2", "Cluster5", "Cluster7") plot_gen_times(listRecords)
Clusters 1 and 5 contain some long times, suggesting cases that could have been intermittently infectious or non-infectious for a longer period. Cluster 2 seems to be moving faster.
Times from getting infected to getting sampled, showing differences between clusters.
plot_times_to_samp(listRecords)
The same pattern is mirrored in the times to sampling
How many unsampled cases are estimated in the clusters?
plot_unsampled_cases(listRecords)
Higher numbers in clusters 1 and 5 reflect longer branches and more intermediate cases between sample cases.
We can also extract individual-level infection times by case, for each cluster:
gt11=getInfectionTimes(listRecords[[1]][300:1000],7) name1=listRecords[[1]][[1000]]$ctree$nam[7] hist(gt11,breaks=15,xlab = paste("estimated time of infection for", name1),main = "")
The timing of infection depends on who infected whom and on the timed tree
maptree1=selectTTree(listRecords[[1]],burnin = 0.5) plotCTree(listRecords[[1]][[maptree1]]$ctree)
Generation time (time from being infected to infecting next case) density for a given individual
plotGenerationTimeDensity(listRecords[[1]], index=2)
Infection-to-sample time density for a given individual
plotSampleTimeDensity(listRecords[[1]], index=2)
We can overlay specific events on individual-level infection times by case, for each cluster:
plotInfectionDateDensity(listRecords[[1]], index=1, overlayDate=2015)
We can extract likehoods of infector for a given case
getLikelyInfectors(listRecords[[1]], 'c405')
Plot of transmission network with posterior probabilities at or above given cutoff. Edges are weighted in proportion to the probability of transmission.
plotTransTreeSummary(listRecords[[2]], cutoff=0.2, burnin=0.5)
Epidemiological data may be gathered which adds information which is not modelled. It might be known, for example, that patient A could not have infected patient B (even indirectly) because they were known to living in geographically isolated locations.
Filter record on a specified directed transmission pair. From previous example, we impose that c674 cannot infect c1608.
subRecord1 <- filterTransPair(listRecords[[2]], from='c674', to='c1608') plotTransTreeSummary(subRecord1, cutoff=0.2, burnin=0.5)
Filter record on specified date of infection (before or after, depending on the keepAfter flag) for a particular case
subRecord2 <- filterInfectedDate(listRecords[[2]], case='c1608', infDate=2013.5, keepAfter=TRUE) plotTransTreeSummary(subRecord2, cutoff=0.2, burnin=0.5, nodeColour='orange' )
Shifting the date forward a little excludes more entries
subRecord2b <- filterInfectedDate(listRecords[[2]], case='c1608', infDate=2014.5, keepAfter=TRUE) plotTransTreeSummary(subRecord2b, cutoff=0.2, burnin=0.5, nodeColour='green' )
Filter record on specified logical matrix, which specifies who can infect whom. Record entries are kept only where infection events correspond to a TRUE entry in the testMatrix.
name_list <- listRecords[[2]][[1]]$ctree$nam name_dim <- length(name_list) testMatrix <- matrix(TRUE, name_dim, name_dim) testMatrix[10,8] <- FALSE testMatrix[8,10] <- FALSE name_list[[8]] name_list[[10]]
subRecord3 <- filterPossibleWiw(listRecords[[2]], testMatrix) plotTransTreeSummary(subRecord3, cutoff=0.2, burnin=0.5, nodeColour='pink' )
Centre for Mathematics of Precision Healthcare, Imperial College London
EPSRC Impact Acceleration award, Imperial College London
Inaki Comas and Tom Irving: TB cluster data from Valencia, Spain
Last updated June 2018.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.