knitr::opts_knit$set(global.par = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "500px", dpi = 150, fig.align = "center", fig.width=4, fig.height=4, out.width=400 )
The ggtree package by Guangchuang Yu is an excellent resource for drawing information-rich visualizations of phylogenetic trees.
(See https://guangchuangyu.github.io/software/ggtree/gallery/ for a gallery of figures generated using ggtree
.)
ggtree
employs the same grammar as ggplot
, so writing code looks quite similar:
# load the package without all the preamble suppressMessages(require(ggtree)) # generate a random tree set.seed(1999); phy <- rtree(50) # draw the tree gg-style! ggtree(phy) + theme_tree2() + geom_tiplab(size=3)
Since I am accustomed to base R, however, I find this syntax unfamiliar and frustrating, and was motivated to start implementing similar plot functions in ggfree
.
Of course there are several other packages in R that provide plotting functions for phylogenetic tree objects.
For example, the ape package — which defines the phylo
S3 object class that many packages for phylogenetics will accept as inputs — adapts the generic S3 plot
function for trees (called plot.phylo
).
However, it is not straight-forward to annotate the resulting plot with additional information, such as highlighting a subset of internal nodes.
This is because there are two things happening when we call plot.phylo
:
the method maps the nodes and edges (branches) of the tree to the x
and y
coordinates according to one of several layout algorithms, and;
the method draws the tree on these coordinates in the active plot device.
Consequently, the x
and y
coordinates are not readily available for the user to add additional information to the plot.
(I'll demonstrate why it's nice to have access to these data in subsequent sections.)
par(family="Helvetica")
ggfree
separates these tasks into different functions.
Generating the coordinates for a given layout algorithm is handled by the function tree.layout
, which returns an S3 object that I call phyloLayout
.
require(ggfree) L <- tree.layout(phy) plot(L) axis(side=1) # add a horizontal axis like ggtree example
There are clearly some differences here.
For instance, the order of tips is different because ggtree
has automatically ladderized the phylogeny by default.
We can amend this using the ape:ladderize
function:
phy2 <- ladderize(phy, right=F) plot(tree.layout(phy2), col='black', lwd=1, cex=0.75, mar=c(2,0,0,0)) axis(side=1, cex.axis=0.75, mgp=c(3,0.5,0))
Note that I've also added some other basic R graphical parameters to make the plot more similar to the ggtree
example.
A lot of this code is tucked into the ggtree
function
phyloLayout
objectsLet's have a look at what's getting returned when we call tree.layout
:
L <- tree.layout(phy2) L # calls print.phyloLayout function
Here we can see that a phyloLayout
object contains two data frames for the nodes and edges of the tree, respectively.
(Only the first six rows of each are being displayed because this print
function is calling head
for each data frame.)
The nodes
data frame contains x and y coordinates, and the edges
data frame contains two sets of x and y coordinates.
In addition, any node- or edge-level attributes that you happened to attach to the original phylo
object are automatically carried over to these data frames.
These are the main outputs of the layout algorithm.
ggfree
provides three popular layout algorithms for phylogenetic trees.
We specify which layout algorithm we want to use by passing a type
argument to the function tree.layout
:
A slanted tree uses the same coordinate information as a rectangular layout, except that lineages are connected directly to their ancestral nodes without vertical line segments:
r
plot(tree.layout(phy, type='s', unscaled=TRUE), cex=0.5, mar=c(0,0,0,3))
Circular (radial) trees. This is essentially the rectangular layout algorithm in a polar coordinate system. Time flows outward from the origin (centre) of the plot.
r
plot(tree.layout(phy, type='o'))
Unrooted trees. Like many other R packages that provide plot functions for trees, ggtree
implements Joe Felsenstein's equal-angle layout algorithm for unrooted trees.
Note that I've turned off tip labels because they become cluttered in this layout:
r
plot(tree.layout(phy, type='u'), label='n', mar=c(0,0,0,0))
ggtree
To demonstrate how we can reproduce plots generated by ggtree
with base R graphics, I've taken the example data sets from the ggtree
distribution (https://github.com/YuLab-SMU/ggtree/blob/master/inst/examples/) and reprocessed them using base R, with the exception of read.nexus
from the ape
package:
# read maximum clade credibility tree flu <- read.nexus('MCC_FluA_H3.tree') # extract dN/dS ratios from text file mlc <- readLines('mlc.txt') nwk <- mlc[ which(grepl("w ratios", mlc)) + 1 ] nwk <- gsub("#", ":", nwk) nwk <- gsub(" ", "", nwk) dnds <- read.tree(text=nwk) # annotate flu tree with dN/dS outcomes flu <- ladderize(flu, right=F) dnds <- ladderize(dnds, right=F) flu$dnds <- dnds$edge.length # export the result for loading with `ggfree` save(flu, file="flu.RData")
Here is the example code from ggtree
to draw the phylogeny annotated with host species and genotype (with a heatmap):
cols <- scale_color(merged_tree, "dN_vs_dS", low="#0072B2", high="#D55E00", interval=seq(0, 1.5, length.out=100)) p <- ggtree(merged_tree, size=.8, mrsd="2013-01-01", ndigits = 2, color=cols) p <- add_colorbar(p, cols, font.size=3) ## add annotation of clade posterior and colored in darkgreen p <- p + geom_text(aes(label=posterior), color="darkgreen", vjust=-.1, hjust=-.03, size=1.8) ## add annotation of amino acid substitution inferred by joint probabilities p <- p + geom_text(aes(x=branch, label=joint_AA_subs), vjust=-.03, size=1.8) ## get the tip labels tip <- get.tree(merged_tree)$tip.label ## create a new data.frame containing information of host species. host <- rep("Human", length(tip)) host[grep("Swine", tip)] <- "Swine" host.df <- data.frame(taxa=tip, host=factor(host)) ## use %<+% operator to attach the information to the tree view p <- p %<+% host.df ## add circles in tips and tip labels align to the right hand side. ## after the attachment via %<+% operator, ## we can use 'host' information to color circles and labels of tips. p <- p + geom_tippoint(aes(color=host), size=2) + geom_tiplab(aes(color=host), align=TRUE, size=3, linesize=.3) ## rescale the color to blue (for human) and red (for swine) p <- p + scale_color_manual(values = c("#377EB8", "#E41A1C"), guide='none') ## setting the coordinate system p <- p + theme_tree2() + scale_y_continuous(expand=c(0, 0.6)) + xlab("Time") ## setting size of axis text and title. p <- p + theme(axis.text.x=element_text(size=10), axis.title.x = element_text(size=12)) p <- gheatmap(p, genotype, width=.4, offset=7, colnames=F) %>% scale_x_ggtree p <- p + scale_fill_brewer(palette="Set2") p <- p+theme(legend.text=element_text(size=8), legend.key.height=unit(.5, "cm"), legend.key.width=unit(.4, "cm"), legend.position=c(.13, y=.945))
Note that I've omitted some of the code for loading and reformatting the data for brevity and for a more equitable comparison to ggfree
syntax.
# now let's try the same thing with ggfree L <- tree.layout(flu, 'r') plot(L, cex=0.6, type='n', mar=c(3,1,0,20), label='n') # map dN/dS values to colours pal <- colorRampPalette(c('#0072B2', '#D55E00'))(20) breaks <- c(seq(0, 1.5, length.out=19), 1000) col <- pal[as.integer(cut(L$edges$dnds, breaks=breaks))] # draw labels host <- ifelse(grepl('Swine', L$nodes$label), '#E41A1C', '#377EB8') text(L, align=TRUE, cex=0.75, col=host) # draw tree lines(L, col=col, lwd=3) axis(side=1, at=seq(0, 20, 5), labels=seq(1990, 2010, 5), line=-2) # draw points on tips points(L, pch=20, col=host, cex=ifelse(L$nodes$n.tips==0, 1.5, 0)) # map colors from edges to nodes index <- L$edges$child[L$edges$isTip] draw.guidelines(L, col=host[index]) # load genotype data (example from ggtree) path <- system.file("extdata/Genotype.txt", package="ggfree") geno <- read.table(path, header=T, sep='\t', na.strings='') geno <- geno[match(flu$tip.label, row.names(geno)), ] # draw boxes require(RColorBrewer) col <- brewer.pal(3, 'Set2') image(L, geno, xlim=c(30, 37), col=col, cex.axis=0.75, line=-2)
This end result is pretty similar to the ggtree
example (Figure 5 from this paper), except that we're not embedding small images and not annotating the internal nodes with amino acid substitutions (which I find rather messy and uninformative!).
The ape
package has several examples of phylogenetic trees that relates birds at different taxonomic levels.
I'm going to use one that relates birds at the level of families.
# load the example tree data("bird.families") plot(tree.layout(bird.families, type='o'), cex.lab=0.8)
I'd like to annotate this tree with the number of named species per family. Since these data are not provided with this tree object, I had to search around for some supplementary data. I settled on the dataset published by the non-governmental conservation organization BirdLife International, where I retrieved the ZIP archive including an Excel spreadsheet that I converted to a CSV and processed in R. The resulting data frame covers substantially more bird families than our phylogeny:
path <- system.file("extdata/birdlife.csv", package='ggfree') birds <- read.csv(path, row.names=1) head(birds) nrow(birds)
Let's try to match up family names:
index <- match(bird.families$tip.label, birds$Family) # some family names failed to match bird.families$tip.label[is.na(index)] # There's only 13 of these missing, so I'm willing to manually restore them. # I *think* Dacelonidae is the tree kingfishers subfamily... missing <- data.frame( Family=c("Dendrocygnidae", "Bucorvidae", "Rhinopomastidae", "Dacelonidae", "Cerylidae", "Centropidae", "Coccyzidae", "Crotophagidae", "Neomorphidae", "Batrachostomidae", "Eurostopodidae", "Chionididae", "Eopsaltriidae"), Count=c(8, 2, 3, 70, 9, 10, 13, 4, 6, 5, 3, 2, 44) ) birds <- rbind(birds, missing) index <- match(bird.families$tip.label, birds$Family)
Now we're ready to use this information to annotate the tree:
# the count distribution is more even if we do a log-transform bins <- as.integer(cut(log(birds$Count[index]), breaks=8)) require(RColorBrewer) pal <- brewer.pal(9, 'Blues')[2:9] L <- tree.layout(bird.families, type='o') plot(L, cex.lab=0.7, offset=2, mar=rep(5,4), col='chocolate') image(L, z=as.matrix(bins), xlim=c(28.5,30), col=pal)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.