#build your vignette with the devtools::build_vignettes() knitr::opts_chunk$set(echo = TRUE, fig.width = 6, warning = FALSE, message=FALSE)
These notes should enable the user to gain an understanding of how their alignment data support different phylogenetic trees, and how to make easy comparisons of clades present in different trees, using the Rboretum package. This document illustrates features included in Rboretum, including visualizing trees, extracting the list of clades found in each tree, and comparing these clades among trees.
The first thing we will do (after loading the library) is read in a tree.
We assume trees are in Newick format.
We use the function readRooted
to read the tree and automatically root it.
Trees can be rooted with any number of taxa in a clade by providing them in a list.
library(Rboretum) Gene1_file <- system.file("extdata", "Gene_1.nwk", package = "Rboretum") geneTree1 <- readRooted(Gene1_file,root_taxa = c('Species_C','Species_H'))
Note that here we are using the example tree included with the package, but you can simply specify the path to your tree file.
We can plot the tree using the treePlotter
function.
The xmax
argument allows us to adjust the width of the tree so that species names can be viewed in their entirety.
You may need to adjust this parameter depending on your species names.
Use larger numbers for a narrower tree and longer names.
Node support is also shown.
treePlotter(geneTree1,xmax = 8)
You can include branch lengths on the tree using the branch_length
argument.
Here the support is also shown more clearly by placing the node labels in a box with the node_label_box
argument.
treePlotter(geneTree1,branch_length = TRUE,xmax=3,node_label_box = TRUE)
You can also label the nodes numerically by setting the node_label
argument.
treePlotter(geneTree1,xmax = 8,node_label = "node",node_label_box = TRUE)
Taxa can be highlighted on the tree using the to_color
argument
and specifying the taxa to highlight as a list.
Red is the default highlight color;
alternatives can be specified with the colors
argument.
treePlotter(geneTree1,xmax = 8,to_color = c('Species_A','Species_B'),colors = "blue")
We can also highlight groups of taxa by creating a list of named vectors.
This list is then passed to the to_color
argument.
cladesOfInterest <- list('Group 1' = c('Species_A','Species_F'), 'Group 2'=c('Species_B','Species_O')) treePlotter(geneTree1,xmax = 8,to_color = cladesOfInterest)
We provide a legend for this plot and add color to branches using the
highlight_legend
and color_branches
arguments, respectively.
treePlotter(geneTree1,xmax = 8,to_color = cladesOfInterest,highlight_legend = TRUE,color_branches = TRUE)
You can trim a tree to a set of taxa
either by specifying ta list of taxa to include
or a list of taxa to remove (by using the remove
argument).
trimmedTree <- treeTrimmer(geneTree1,taxa = c('Species_A','Species_B','Species_C')) #treePlotter(trimmedTree, xmax = 5) noSpeciesETree <- treeTrimmer(geneTree1,taxa = 'Species_E',remove = TRUE) # Remove taxa rather than retain treePlotter(noSpeciesETree, xmax = 8)
Check if tips are in a tree
checkTips(trimmedTree,'Species_A') checkTips(trimmedTree,'Species_F')
Frequently, labels in a dataset (that are passed on to the tree) are not the labels that you would like to have in a publications. To convert the dataset labels to more informative labels set up a file (e.g. csv or tsv) containing the original names and the preferred names.
name_file <- system.file("extdata", 'Name_Conversion_Table.tsv', package = "Rboretum") #specify the path to your file name_df <- read_tsv(name_file) head(name_df)
Now you can convert the labels so that your tree includes the correct species information.
renamed_tree <- convertLabels(geneTree1,name_df) treePlotter(renamed_tree,xmax = 10,node_label_nudge = 0.2)
You can get a written list of which clades are present in the tree using the getTreeClades
function.
These will match what you see in the phylogenies above.
geneTree1_clades <- getTreeClades(geneTree1) geneTree1_clades
When thinking about a tree as unrooted, the equivalent to a clade can been seen as a spilt.
The getTreeSplits
function provides a list of all groups of species given the division of
the tree at any branch.
getTreeSplits(geneTree1)
readRooted
can also read muliple trees together as a multiPhylo object.
You can specify a list of files.
file_names <- c('Gene_1.nwk','Gene_2.nwk','Gene_3.nwk','Gene_4.nwk','Gene_5.nwk') tree_paths <- paste( system.file("extdata", file_names, package = "Rboretum"), #replace the system file function with the path to your data sep = ",") allTrees <- readRooted(to_root = tree_paths,root_taxa = c('Species_C','Species_H'))
Alternatively you can read all the files of a particular suffix from a particular folder.
In this case includes the tree_names
argument to name the trees.
data_dir <- system.file("extdata", package = "Rboretum") #specify your folder path here allTrees <- readRooted(to_root = data_dir,suffix = 'nwk',root_taxa = c('Species_C','Species_H'), tree_names = c('Gene_1','Gene_2','Gene_3','Gene_4','Gene_5'))
Summarizing this multiPhlyo object provides information on shared tips the number of clades found in one tree only the number of shared clades and the species in those clades (i.e. the clades you are likely certain of given different results from different datasets) the number of unique topologies (i.e. a result found for only one dataset) * a list of which datasets produced which topolgies
summarizeMultiPhylo(allTrees)
We then obtain the list of all clades identified across all trees, and which datasets supported each clade. This knowledge can help us identify strongly supported, less controversial clades.
getTreeClades(allTrees,print_counts = TRUE)
You can also check whether species are present in all of your trees. Additionally you can check whether these species form a monophyletic clade.
checkTips(allTrees,c('Species_A','Species_F')) # Check all trees in a multiPhylo checkTips(allTrees,c('Species_A','Species_F'),check_mono = TRUE) # Also check if species are monophyletic checkTips(allTrees,c('Species_A','Species_F'),check_mono = TRUE,check_root = TRUE) # Check if species are monophyletic and root checkTips(allTrees,c('Species_C','Species_H'),check_mono = TRUE,check_root = TRUE) # Check if species are monophyletic and root
In order to use alignment features, which are written in python, source the scripts.
source_python(system.file("", "Split_Processor.py", package = "Rboretum")) #run this as shown
Here we read in mulitple sequence alignments to parse signal.
We specify the directory containing alignments and include all alignments with a particular suffix.
The species_info argument
specifies the trees (read above) for which we want to process the signal.
You can process site patterns with gaps as missing data (FALSE) or treated as informative indels (TRUE).
alignmentSignal <- getAlignmentSignal(data_dir,suffix = 'phylip',use_gaps = FALSE,species_info = allTrees)
Get support for clades in allTrees.
getTreeSupport
generates a data frame that shows how many sites support each clade
in each dataset.
getTreeClades
generates a data frame that shows for each clade the trees
that it is found in.
treeSupport <- getTreeSupport(alignmentSignal,allTrees, dataset_name = c('Gene_1','Gene_2','Gene_3','Gene_4','Gene_5')) head(treeSupport)
cladeSupport <- getTreeClades(allTrees,return_counts = TRUE) head(cladeSupport)
Tree support information can be plotted as pies at the nodes. Each pie shows the proportion of support at that node coming from each dataset. The size of the pies are proportional to the amount of overall support.
treePlotter(geneTree1,tree_support = treeSupport,geom_size = c(.2,.6),xmax=9, node_label_nudge = 0.5,geom_alpha=0.8,node_label_fontface = 'bold', use_pies = TRUE,pie_legend_position = c(8,8,8,8))
treePlotter(allTrees,clade_support = cladeSupport,tree_support = treeSupport, geom_size = c(3,10),node_label = 'support',xmax=6, node_label_nudge = 0.5,geom_alpha=0.5,node_label_fontface = 'bold') %>% tandemPlotter()
#treePlotter(allTrees,tree_support = treeSupport,geom_size = c(1,3),node_label = 'clade',xmax=8,node_label_nudge = 0.5,geom_alpha=0.5,node_label_fontface = 'bold',use_pies = TRUE,pie_legend_position = c(-1,-1,-1,-1)) %>% tandemPlotter()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.