knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 11, warning = FALSE, message = FALSE )
This example shows the function trestruct applied to a simulated structured
coalescent tree that includes samples from a large constant size population and
samples from three small 'outbreaks' which are growing exponentially.
These simulations were generated with the phydynR package.
library(ape) library(treestructure) library(ggtree)
Load the tree:
( tree <- ape::read.tree( system.file('sim.nwk', package = 'treestructure') ) )
Note that the tip labels corresponds to the deme of each sample. '1' is the constant size reservoir, and '0' is the exponentially growing deme.
This will run the treestructure algorithm under default setting:
s <- trestruct( tree )
You can print the results:
print(s)
The default plotting behavior uses the ggtree package if available.
plot(s) + ggtree::geom_tiplab()
If not, or if desired, ape plots are available
plot( s, use_ggtree = FALSE )
For subsequent analysis, you may want to turn the treestructure result into a
dataframe:
structureData <- as.data.frame( s ) head( structureData )
Each cluster and partition assignment is stored as a factor. You could use split
to get a data frame for each partition.
Suppose we want a tree corresponding to partition 1:
with ( structureData, ape::keep.tip(s$tree, taxon[ partition==1 ] ) ) -> partition1 partition1 plot(partition1)
Two parameters will have large influence on results:
level is the significance level for subdividing a clade into a new cluster.
To detect more clusters, increase p, but note that this will also increase the
false positive rate. minCladeSize controls the smallest allowed cluster size in terms of the
number of tips. With a smaller value, smaller clusters may be detected, but
computation time will increase. Example:
trestruct( tree, level = .05, minCladeSize = 5 )
In practice, clustering thresholds are always subjective and the best value of
the level parameter will depend on your application.
One way to choose an appropriate level would be to use additional data associated
with each sample. You can select the level which gives a set of clusters that
explains the most variance in the data of interest (e.g. use the cluster as a
factor in an ANOVA).
Alternatively, in the absence of any additional data, the treestructure package
supports using the CH index
to compare different levels. This statistic is based on the ratio of the
between-cluster and within-cluster variance of the time of each node
(distance from the root) and returns the level such that this ratio is maximized.
If you wish to use the CH index, pass level = NULL to trestruct, and
read documentation for the levellb, levelub, and res parameters.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.