knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%", 
  warning=FALSE,
  message=F,
  cache=T
)
#Load needed packages
library(LineageHomology)
library(ggplot2)
library(scales)
library(lubridate)

#Loading other packages for simulating data. 
library(ape)
library(phytools)
library(phangorn)
library(BactDating)

#Pipe operator
library(dplyr)

Simulate tip data with two different states.

Here we use the R packages BactDating and Phytools to simulate time dated geographical data on a phylogeny with two locations. We let the geographical states represent Norway (represented by red colour) and the rest of the World (RoW) (represented by blue colour).

set.seed(400)
tree_test = BactDating::simdatedtree(nsam=300, dateroot=2000) #300 taxa and date of the root at year 2000-
tree_test = ladderize(tree_test) #Reorder the tree to make it look nice
Q=matrix(c(0.5,0.5,0.5,0.5), nrow=2,ncol=2, byrow=F) #Set up a transition matrix for trait simulation
colnames(Q)=c("Norway","RoW") #From state 1 to state 2. 
rownames(Q)=c("Norway","RoW") #From state 1 to state 2. 

trait = phytools::sim.Mk(tree=tree_test,Q=Q,nsim=1) #Simulate the traits on the phylogeny

#Reconstruct ancestral states using ace. 
fit1 = ace(x=trait, phy= tree_test, type="discrete", mod="ARD") #Estimate the ancestral history

plot.phylo(tree_test,lwd=2,label.offset = 0.15, mar=c(0.2,0.2,0.2,0.2),cex=0.3) #Ploy phylogeny

axisPhylo(root.time=2000, backward=F) #Add time axis

nodelabels(pie=fit1$lik.anc,cex=0.2,piecol=c("Red","Blue"))
tips = to.matrix(trait,seq=c("Norway", "RoW"))
tiplabels(pie=tips, cex=0.2,piecol=c("Red","Blue"))

The pie charts on the nodes of the phylogeny represent the probability of the node being in the different geographical locations, where red represents Norway, and blue represents RoW. LineageHomology uses these probabilities to count connected groups of tips, where all the nodes on the paths between them have a probability that > 50% for the same geographical state. LineageHomology also returns the number of tips that are not connected to any other tips in this way. Following du Plessis et al. (2021) (DOI: 10.1126/science.abf2946) we refer to the groups as transmission lineages (TLs) and isolated tips as singletons.

Running LineageHomology

Next we run LineageHomology. The inputs data should take this format.

tree_test
head(fit1$lik.anc)
head( to.matrix(trait, seq=c("Norway", "RoW")))

Note that the node probabilities of the tips must have row names that match the tips of the phylogeny.

Result = LineageHomology(tree_test, ace_nodes=fit1$lik.anc,
                        ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time=2000)

LineageHomology returns multiple values

names(Result)
Result$Import_LocalTrans
Result$lineage_state
Result$Lineage_sizes
head(Result$Taxa_names)
Result$`MRCA's`
Result$lineage_state

Make a treemap plot to get a quick overview of the lineages

The simplest method to visualize the results from LineageHomology is to use a treemap plot. Treemap_lineagehomology uses the treemap package to visualize transmission lineages as rectangles with areas that reflect their relative sizes. Additionally, the text on each square gives the name of group number of the transmission lineage, the estimated time of the mrca (TMRCA) of the transmission lineage and the number of tips in the group (S)

LineageHomology::treemap_lineagehomology(Result)

Plot lineage densities over time

The accumulation of tips in a lineage over time can be visualized by using density plots. Before using the plotting methods below, we need to run the processing function lineage_info. lineage_info takes the results from LineageHomology and names and dates observed dates of the tips in the tree. The formats of these should match the ones shown below.

#Set up matrix with taxa info
name_date = data.frame(name = names(trait), dates= BactDating::leafDates(tree_test))
head(name_date)
Result_lineage_info = LineageHomology::lineage_info(Result,name_date)

Ridgeplot_lineagedensities takes the input from the function lineage_info and produces density plots using for each TL over time. The density plots are made using the ggridges R-package. The TLs are sorted from largest to smallest and can be coloured according to the state that defines the TL.

LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 1,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = F)

We can colour the groups by the state they are in, and restrict the plotted groups to e.g. sizes larger e.g.than 1,4, and 10:

LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 1,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T)

LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 4,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T)

LineageHomology::ridgeplot_lineagedensities(Result_lineage_info=Result_lineage_info,groups_larger_than = 10,datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T)

Plot cumulative lineage size over time.

To visualize growth patterns over time, LineageHomology provides a function for plotting the cumulative number of observations in each TL separately.

LineageHomology::lineage_growth_cumulative(Result_lineage_info = Result_lineage_info, datelims=c("2000-01-01","2025-01-01","3 year"))
Again, we can add colour to the groups by specifying it.
LineageHomology::lineage_growth_cumulative(Result_lineage_info = Result_lineage_info, datelims=c("2000-01-01","2025-01-01","3 year"),color_by_state = T)

Probabilistic counting of transmission lineages

In all the figures above, the transmission lineage division was done by dividing lineages at points where a transition is estimated with more than 50 percent probability. If one wishes to divide the transmission lineages by sampling from the observed probabilities, one can use the function LineageHomology_w_uncertainty (or v2, which is faster). This retains the uncertainty in the phylogeographical mapping. Here we visualize the difference in the assignment of TLs in two runs on the same mapping by using treemap plots.

Result1 = LineageHomology_w_uncertainty_v2(tree_test, ace_nodes=fit1$lik.anc,
                        ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time=2000)
LineageHomology::treemap_lineagehomology(Result1)
Result2 =  LineageHomology_w_uncertainty_v2(tree_test, ace_nodes=fit1$lik.anc,
                        ace_tips = to.matrix(trait, seq=c("Norway", "RoW")), start_time=2000)
LineageHomology::treemap_lineagehomology(Result2)

As these figures show, the division of lineages will vary according to the sampled states of some of the nodes. This can be useful for producing confidence limits on, e.g. the size of the largest lineage.

Estimate size of largest transmission lineage

Here we use pbreplicate to counts transmission lineages probabilistically many times to obtain confidence intervals, or, in this case, a distribution) for the size of the largest transmission lineage.

library(pbapply)
multi_counts = pbreplicate(
  1000,
  LineageHomology::LineageHomology_w_uncertainty_v2(
    tree=tree_test,
    ace_nodes=fit1$lik.anc,
    ace_tips = to.matrix(trait, seq=c("Norway", "RoW")),
    start_time = 2000
  )
)

largest_lineage = c()
for(i in 1:1000) {
  largest_lineage = c(largest_lineage, max(multi_counts[,i]$Lineage_sizes))
}

hist(largest_lineage, breaks=100, xlim=c(0,40), probability = F, main="Size of largest transmission lineage",xlab="Size", ylab="Probability")

As shown above, in the given mapping, we are confident that the largest transmission lineage contains no more than 33 tips.

Estimating importation and local transmission

LineageHomology can also be used to estimate importation and local transmission over time from the phylogeographic mapping. Here we focus on Norway and run LineageHomology on the observed tips in Norway before estimating the contribution of importation and local transmission events over time in Norway.

library(pbapply)

Norwegian_tips = names(trait[trait=="Norway"])
head(Norwegian_tips)

multi_counts = pbreplicate(
  1000,
  LineageHomology::LineageHomology_w_uncertainty_v2(
    tree=tree_test,
    give_tips = Norwegian_tips,
    ace_nodes=fit1$lik.anc,
    ace_tips = to.matrix(trait, seq=c("Norway", "RoW")),
    start_time = 2000
  )
)


 #Count number of local transmission and importations in half-year (0.5) intervals.
result_import_local_transmission = import_local_transmission(tree = tree_test,LineageHomology_replicates = multi_counts, start_time = 2000,time_interval_size = 0.5)

We can summarise the number of tips estimated to be a result of importation events and local transmission events.

Summarize_import_local_transmission(LineageHomology_replicates = multi_counts)

Thus in the phylogeographic mapping above, there seem to be slightly more importation events in Norway (58%) than local transmission events. Lastly, we can plot the relative contribution of importation and local transmission in time intervals spanning half a year.

plot_importation_local_transmission(tree = tree_test, result_import_local_transmission = result_import_local_transmission,start_time=2000, time_interval_size = 0.5, date_breaks = "2 years",importation_or_local = "both") #Use importation_or_local = "importation" or "local" if you do not have the grid package installed. 

Questions regarding the package can be directed to magnusnosnes@gmail.com



magnusnosnes/GeoLineages documentation built on April 17, 2025, 6:29 p.m.