SuperTree | R Documentation |
Given a set of unrooted gene trees, creates a species tree. This function works for rooted gene trees, but may not accurately root the resulting tree.
SuperTree(myDendList, NAMEFUN=NULL, Verbose=TRUE, Processors=1)
myDendList | List of |
NAMEFUN | Optional input specifying a function to apply to each leaf to convert gene tree leaf labels into species names. This function should take as input a character vector and return a character vector of the same size. By default equals |
Verbose | Should output be displayed? |
Processors | Number of processors to use for calculating the final species tree. |
This implementation follows the ASTRID algorithm for estimating a species tree from a set of unrooted gene trees. Input gene trees are not required to have identical species sets, because the algorithm can handle missing entries in gene trees. In essence, the algorithm averages the cophenetic distance matrices of all gene trees, then constructs a neighbor-joining tree from the resulting distance matrix. See the original paper listed in the references section for more information.
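The averaging step described above can be sketched in base R. This is only an illustration under stated assumptions: `average_distances` is a hypothetical helper, not a SynExtend function, and the two toy matrices are entered by hand rather than computed from real gene trees. The neighbor-joining step is omitted.

```r
# Hypothetical helper (not part of SynExtend): average a list of distance
# matrices whose row/column names are species identifiers. Each pair is
# averaged only over the matrices in which both species appear.
average_distances <- function(dist_list) {
  species <- sort(unique(unlist(lapply(dist_list, rownames))))
  n <- length(species)
  total <- matrix(0, n, n, dimnames = list(species, species))
  count <- matrix(0, n, n, dimnames = list(species, species))
  for (d in dist_list) {
    present <- rownames(d)
    total[present, present] <- total[present, present] + d[present, present]
    count[present, present] <- count[present, present] + 1
  }
  avg <- total / count
  avg[count == 0] <- NA  # pairs that never co-occur have no estimate
  avg
}

# Toy input: one matrix over species A, B, C and one over A, B, D
n1 <- c("A", "B", "C")
d1 <- matrix(c(0, 2, 3,
               2, 0, 3,
               3, 3, 0), 3, 3, byrow = TRUE, dimnames = list(n1, n1))
n2 <- c("A", "B", "D")
d2 <- matrix(c(0, 4, 3,
               4, 0, 3,
               3, 3, 0), 3, 3, byrow = TRUE, dimnames = list(n2, n2))

avg <- average_distances(list(d1, d2))
print(avg)
```

In this toy case species C and D never share a matrix, so their entry stays `NA`, which is the situation the imputation step addresses.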
If two species never appear together in a gene tree, their distance cannot be estimated in the algorithm and will thus be missing. SuperTree handles this by imputing the value using the distances available with data-interpolating empirical orthogonal functions (DINEOF). This approach has relatively high accuracy even up to high levels of missingness. Eigenvector calculation speed is improved using a Lanczos algorithm for matrix compression.
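To give a feel for this kind of imputation, here is a simplified sketch in the spirit of DINEOF. It is not the SynExtend implementation: `impute_low_rank` is a hypothetical helper, the rank is fixed rather than chosen by cross-validation, and it uses a full `svd()` call instead of a Lanczos method.

```r
# Hypothetical helper (not part of SynExtend): fill NA entries of a matrix
# by repeatedly replacing them with their values in a truncated-SVD
# reconstruction. Observed entries are never altered.
impute_low_rank <- function(d, rank = 1, max_iter = 200, tol = 1e-8) {
  missing <- is.na(d)
  if (!any(missing)) return(d)
  filled <- d
  filled[missing] <- mean(d[!missing])  # initial guess for the gaps
  for (i in seq_len(max_iter)) {
    s <- svd(filled, nu = rank, nv = rank)
    approx <- s$u %*% diag(s$d[seq_len(rank)], nrow = rank) %*% t(s$v)
    change <- max(abs(filled[missing] - approx[missing]))
    filled[missing] <- approx[missing]
    if (change < tol) break
  }
  filled
}

# Toy distance matrix in which species C and D were never seen together
sp <- c("A", "B", "C", "D")
d <- matrix(c(0, 2, 3, 3,
              2, 0, 3, 3,
              3, 3, 0, NA,
              3, 3, NA, 0), 4, 4, byrow = TRUE, dimnames = list(sp, sp))

out <- impute_low_rank(d, rank = 1)
print(out)
```

The filled-in values depend on the chosen rank and starting guess, so they should be read as rough estimates rather than as what SuperTree itself would compute.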
SuperTree allows an optional argument called NAMEFUN to apply a renaming step to leaf labels. Gene trees as constructed by other functions in SynExtend (e.g., DisjointSet) often include other information aside from species name when labeling genes, but SuperTree requires that leaf nodes of the gene tree are labeled with just an identifier corresponding to the species/genome each leaf is from. Duplicate values are allowed. See the examples section for more details on what this looks like and how to handle it.
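As a small base-R illustration of the contract a NAMEFUN must satisfy, the sketch below uses made-up labels and a hypothetical function `to_species`; neither comes from SynExtend. It assumes the species identifier is everything before the first underscore.

```r
# Hypothetical leaf labels: two genes from species "1", one from species "2"
labs <- c("1_1_10", "1_2_37", "2_1_5")

# Hypothetical renaming function: drop everything from the first underscore on
to_species <- function(x) sub("_.*$", "", x)

species <- to_species(labs)
print(species)

# Character vector in, character vector of the same length out
stopifnot(is.character(species), length(species) == length(labs))
```

Two leaves map to the same species identifier here, which is acceptable because duplicate values are allowed.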
A dendrogram object corresponding to the species tree constructed from input gene trees.
Aidan Lakshman ahl27@pitt.edu
Vachaspati, P., Warnow, T. ASTRID: Accurate Species TRees from Internode Distances. BMC Genomics, 2015. 16 (Suppl 10): S3.
Taylor, M.H., Losch, M., Wenzel, M. and Schröter, J. On the sensitivity of field reconstruction and prediction using empirical orthogonal functions derived from gappy data. Journal of Climate, 2013. 26(22): 9194-9205.
TreeLine, SuperTreeEx
# Loads a list of dendrograms
# each is a gene tree from Streptomyces genomes
data("SuperTreeEx", package="SynExtend")
# Notice that the labels of the tree are in #_#_# format
# See the man page for SuperTreeEx for more info
labs <- labels(exData[[1]])
if(interactive()) print(labs)
# The first number corresponds to the species,
# so we need to trim the rest in each leaf label
namefun <- function(x) gsub("([0-9A-Za-z]*)_.*", "\\1", x)
namefun(labs) # trims to just first number
# This function replaces gene identifiers with species identifiers
# we pass it to NAMEFUN
# Note NAMEFUN should take in a character vector and return a character vector
tree <- SuperTree(exData, NAMEFUN=namefun)