knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", warning=FALSE, message=F, cache=T )
#Load needed packages library(LineageHomology) library(ggplot2) library(scales) library(lubridate) #Loading other packages for simulating data. library(ape) library(phytools) library(phangorn) library(BactDating) #Pipe operator library(dplyr)
Here we use the R packages BactDating and Phytools to simulate time dated geographical data on a phylogeny with two locations. We let the geographical states represent Norway (represented by red colour) and the rest of the World (RoW) (represented by blue colour).
set.seed(400) tree_test = BactDating::simdatedtree(nsam=300, dateroot=2000) #300 taxa and date of the root at year 2000- tree_test = ladderize(tree_test) #Reorder the tree to make it look nice Q=matrix(c(0.5,0.5,0.5,0.5), nrow=2,ncol=2, byrow=F) #Set up a transition matrix for trait simulation colnames(Q)=c("Norway","RoW") #From state 1 to state 2. rownames(Q)=c("Norway","RoW") #From state 1 to state 2. trait = phytools::sim.Mk(tree=tree_test,Q=Q,nsim=1) #Simulate the traits on the phylogeny #Reconstruct ancestral states using ace. fit1 = ace(x=trait, phy= tree_test, type="discrete", mod="ARD") #Estimate the ancestral history plot.phylo(tree_test,lwd=2,label.offset = 0.15, mar=c(0.2,0.2,0.2,0.2),cex=0.3) #Ploy phylogeny axisPhylo(root.time=2000, backward=F) #Add time axis nodelabels(pie=fit1$lik.anc,cex=0.2,piecol=c("Red","Blue")) tips = to.matrix(trait,seq=c("Norway", "RoW")) tiplabels(pie=tips, cex=0.2,piecol=c("Red","Blue"))
The pie charts on the nodes of the phylogeny represent the probability of the node being in the different geographical locations, where red represents Norway, and blue represents RoW. LineageHomology uses these probabilities to count connected groups of tips, where all the nodes on the paths between them have a probability that > 50% for the same geographical state. LineageHomology also returns the number of tips that are not connected to any other tips in this way. Following du Plessis et al. (2021) (DOI: 10.1126/science.abf2946) we refer to the groups as transmission lineages (TLs) and isolated tips as singletons.
Next we run LineageHomology. The inputs data should take this format.
tree_test head(fit1$lik.anc) head( to.matrix(trait, seq=c("Norway", "RoW")))
Note that the node probabilities of the tips must have row names that match the tips of the phylogeny.
Result = LineageHomology(tree_test, ace_nodes=fit1$lik.anc, ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time=2000)
LineageHomology returns multiple values
names(Result) Result$Import_LocalTrans Result$lineage_state Result$Lineage_sizes head(Result$Taxa_names) Result$`MRCA's` Result$lineage_state
The simplest method to visualize the results from LineageHomology is to use a treemap plot. Treemap_lineagehomology uses the treemap package to visualize transmission lineages as rectangles with areas that reflect their relative sizes. Additionally, the text on each square gives the name of group number of the transmission lineage, the estimated time of the mrca (TMRCA) of the transmission lineage and the number of tips in the group (S)
LineageHomology::treemap_lineagehomology(Result)
The accumulation of tips in a lineage over time can be visualized by using density plots. Before using the plotting methods below, we need to run the processing function lineage_info. lineage_info takes the results from LineageHomology and names and dates observed dates of the tips in the tree. The formats of these should match the ones shown below.
#Set up matrix with taxa info name_date = data.frame(name = names(trait), dates= BactDating::leafDates(tree_test)) head(name_date) Result_lineage_info = LineageHomology::lineage_info(Result,name_date)
Ridgeplot_lineagedensities takes the input from the function lineage_info and produces density plots using for each TL over time. The density plots are made using the ggridges R-package. The TLs are sorted from largest to smallest and can be coloured according to the state that defines the TL.
LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 1,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = F)
LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 1,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T) LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 4,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T) LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 10,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T)
To visualize growth patterns over time, LineageHomology provides a function for plotting the cumulative number of observations in each TL separately.
LineageHomology::lineage_growth_cumulative(Result_lineage_info = Result_lineage_info, datelims=c("2000-01-01","2025-01-01","3 year"))
LineageHomology::lineage_growth_cumulative(Result_lineage_info = Result_lineage_info, datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T)
In all the figures above, the transmission lineage division was done by dividing lineages at points where a transition is estimated with more than 50 percent probability. If one wishes to divide the transmission lineages by sampling from the observed probabilities, one can use the function LineageHomology_w_uncertainty (or v2, which is faster). This retains the uncertainty in the phylogeographical mapping. Here we visualize the difference in the assignment of TLs in two runs on the same mapping by using treemap plots.
Result1 = LineageHomology_w_uncertainty_v2(tree_test, ace_nodes=fit1$lik.anc, ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time=2000) LineageHomology::treemap_lineagehomology(Result1) Result2 = LineageHomology_w_uncertainty_v2(tree_test, ace_nodes=fit1$lik.anc, ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time=2000) LineageHomology::treemap_lineagehomology(Result2)
As these figures show, the division of lineages will vary according to the sampled states of some of the nodes. This can be useful for producing confidence limits on, e.g. the size of the largest lineage.
Here we use pbreplicate to counts transmission lineages probabilistically many times to obtain confidence intervals, or, in this case, a distribution) for the size of the largest transmission lineage.
library(pbapply) multi_counts = pbreplicate( 1000, LineageHomology::LineageHomology_w_uncertainty_v2( tree=tree_test, ace_nodes=fit1$lik.anc, ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time = 2000 ) ) largest_lineage = c() for(i in 1:1000) { largest_lineage = c(largest_lineage, max(multi_counts[,i]$Lineage_sizes)) } hist(largest_lineage, breaks=100, xlim=c(0,40), probability = F, main="Size of largest transmission lineage",xlab="Size", ylab="Probability")
As shown above, in the given mapping, we are confident that the largest transmission lineage contains no more than 33 tips.
LineageHomology can also be used to estimate importation and local transmission over time from the phylogeographic mapping. Here we focus on Norway and run LineageHomology on the observed tips in Norway before estimating the contribution of importation and local transmission events over time in Norway.
library(pbapply) Norwegian_tips = names(trait[trait=="Norway"]) head(Norwegian_tips) multi_counts = pbreplicate( 1000, LineageHomology::LineageHomology_w_uncertainty_v2( tree=tree_test, give_tips = Norwegian_tips, ace_nodes=fit1$lik.anc, ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time = 2000 ) ) #Count number of local transmission and importations in half-year (0.5) intervals. result_import_local_transmission = import_local_transmission(tree = tree_test,LineageHomology_replicates = multi_counts, start_time = 2000,time_interval_size = 0.5)
We can summarise the number of tips estimated to be a result of importation events and local transmission events.
Summarize_import_local_transmission(LineageHomology_replicates = multi_counts)
Thus in the phylogeographic mapping above, there seem to be slightly more importation events in Norway (58%) than local transmission events. Lastly, we can plot the relative contribution of importation and local transmission in time intervals spanning half a year.
plot_importation_local_transmission(tree = tree_test, result_import_local_transmission = result_import_local_transmission,start_time=2000, time_interval_size = 0.5, date_breaks = "2 years",importation_or_local = "both") #Use importation_or_local = "importation" or "local" if you do not have the grid package installed.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.