Calculates the optimal clustering of the leaves of a phylogenetic tree using the EM algorithm, given the number of clusters, either the path to a Newick tree file or a Newick tree character string, and an optional set of labeled leaves for semi-supervised clustering.
k | A positive integer greater than 1 indicating the number of clusters.
file | A character string giving the path to the Newick tree file. Default value is "".
text | A character string containing the tree in Newick format. Default value is NULL, in which case this argument is ignored and file is used. When text is assigned a value, the argument file is ignored.
maxDim | A positive integer of at least 2 indicating the maximum dimension of the coordinates that represent the tree leaves. The dimension is always less than or equal to the number of leaves. The default value is NULL, for which the full dimension is used. maxDim can be set smaller to reduce the runtime of PCA, at the cost of discarding the least significant dimensions beyond maxDim.
maxPC | A positive integer of at least 2 indicating the maximum number of dimensions of the reduced leaf coordinates, after PCA, that are passed to EM clustering. The number of reduced dimensions never exceeds the number of leaves in the tree when maxDim is NULL, and is at most maxDim when maxDim is given. Default value is 5, since the top 5 principal components usually explain most of the variance in the data. Including too many dimensions can make the data points sparse and prevent effective clustering.
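The sketch below illustrates how the text and maxPC arguments might come into play, using only ape and base R. It is not the package's own code: treating the rows of the cophenetic distance matrix as leaf coordinates, the use of prcomp, and the example tree are all assumptions made for illustration.

```r
library(ape)

# Hypothetical stand-ins for the documented arguments
text  <- "((A:1,B:1):2,((C:1,D:1):1,(E:1,F:1):1):1);"
maxPC <- 5

# When a Newick string is supplied, it is parsed directly and no file is read
tree <- ape::read.tree(text = text)

# Pairwise leaf-to-leaf distances along the tree
distM <- cophenetic(tree)

# Assumption: use each row of the distance matrix as the coordinates of a leaf
pca <- stats::prcomp(distM)

# Keep at most maxPC principal components
nPC    <- min(maxPC, ncol(pca$x))
coords <- pca$x[, seq_len(nPC), drop = FALSE]

dim(coords)
```

Capping the number of components with min() mirrors the documented behavior that the reduced dimension never exceeds what the data provide.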
Returns an S3 object of class EMclusts with the following components:
distM - A distance matrix of tree leaves.
phyloTree - An S3 object of class phylo, containing tree information.
clustering - A positive integer vector indicating the clusters assigned to each leaf.
mean - A numeric matrix with columns corresponding to mean coordinates of cluster centers.
bic - A numeric value giving the BIC of the optimal model.
model - A character string describing the optimal model used.
dimredResult - An S3 object of class treeDimred with the coordinate matrix and PCA results.
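Several of these components resemble fields of a fitted mclust model. The following sketch, which calls ape and mclust directly rather than this function, shows where comparable values can be found. The random tree, the choice of two principal components, and the mapping of fields to the components above are assumptions, not a description of the package internals.

```r
library(ape)
library(mclust)  # provides Mclust()

set.seed(1)
tree   <- ape::rtree(40)                  # random 40-leaf tree, for illustration only
distM  <- cophenetic(tree)                # leaf distance matrix
coords <- stats::prcomp(distM)$x[, 1:2]   # two components keep the example small

fit <- Mclust(coords, G = 2)              # G plays the role of k

clustering <- fit$classification    # integer cluster label per leaf
centers    <- fit$parameters$mean   # one column of mean coordinates per cluster
bic        <- fit$bic               # BIC of the selected model
model      <- fit$modelName         # short name of the selected covariance model
```

In mclust, the model name is a short code such as "EII" or "VVV" describing the covariance structure of the Gaussian mixture.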
Yuzi Li, rainal.li@mail.utoronto.ca
Kaufman, L., & Rousseeuw, P. J. (2005). Finding groups in data: An introduction to cluster analysis. Wiley.
Paradis E, Schliep K (2019). “ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R.” Bioinformatics, 35, 526-528.
Scrucca L, Fop M, Murphy TB, Raftery AE (2016). “mclust 5: clustering, classification and density estimation using Gaussian finite mixture models.” The R Journal, 8(1), 289–317. https://doi.org/10.32614/RJ-2016-021.