knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
ClustPhy is an R package for clustering phylogenetic trees (using PAM or EM
clustering), comparing different clusterings (using gap statistics), and
visualizing the clusters (in a phylogenetic tree or in a 2D biplot). This
document gives a tour of the ClustPhy package.
To download ClustPhy, use the following commands:
require("devtools")
install_github("rainali475/ClustPhy", build_vignettes = TRUE)
library("ClustPhy")
To list all functions available in the package:
ls("package:ClustPhy")
To list all sample datasets available in the package:
data(package = "ClustPhy")
There are 6 functions available in this package. Two are clustering functions: clustPAM and clustEM. They accept a phylogenetic tree in Newick format, either as a character string or as a file path, and cluster it with the PAM (k-medoids) or EM (expectation-maximization) algorithm, respectively. Users specify the number of clusters they want.
The functions plotClustersTree and plotClusters2D visualize tree clusters on a phylogram or a 2D biplot, respectively. Users can choose whether to show designated cluster centers, the symbols used to represent these centers, and the text size of these symbols. plotClusters2D first converts the distance matrix of the tree to a coordinate matrix, then uses principal component analysis to reduce the matrix to two dimensions for plotting.
The compareGap function takes a distance matrix representation of a phylogenetic tree and outputs gap statistics for every number of clusters from 1 to k.max. These can be used to select the best clustering scheme for the target tree. The plotGapStat function takes the gap statistics output by compareGap and plots them, with a vertical dashed line marking the best number of clusters.
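The two-step reduction described for plotClusters2D (distance matrix to coordinates, then principal component analysis) can be sketched in base R. This is only an illustration: the toy distance matrix is made up, and the use of classical multidimensional scaling (cmdscale) for the first step is an assumption, since the vignette does not say how ClustPhy builds its coordinate matrix.

```r
# Toy ultrametric distance matrix for five tips (made-up values, not ClustPhy data).
tips <- c("A", "B", "C", "D", "E")
distM <- matrix(
  c( 0,  2,  6,  6, 10,
     2,  0,  6,  6, 10,
     6,  6,  0,  4, 10,
     6,  6,  4,  0, 10,
    10, 10, 10, 10,  0),
  nrow = 5, byrow = TRUE, dimnames = list(tips, tips)
)

# Step 1 (assumed method): turn pairwise distances into coordinates
# with classical multidimensional scaling, keeping 3 dimensions.
coords <- cmdscale(as.dist(distM), k = 3)

# Step 2: principal component analysis, keeping the first two components.
pca <- prcomp(coords)
xy <- pca$x[, 1:2]

# Plot the tips on a 2D plane and label them.
plot(xy, pch = 19, xlab = "PC1", ylab = "PC2")
text(xy, labels = rownames(xy), pos = 3)
```

Tips that are close in the distance matrix (here A and B, or C and D) should land near each other in the plot, which is what makes clusters visible in a 2D view.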
Here is an example that shows how to use ClustPhy to cluster a tree via EM and PAM:
> pam <- clustPAM(6, text = NwkTree2)
> em <- clustEM(6, text = NwkTree2)
> str(pam)
List of 5
 $ distM     : num [1:72, 1:72] 0 138 169 164 208 195 173 195 190 196 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
  .. ..$ : chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
 $ phyloTree :List of 4
  ..$ edge       : int [1:141, 1:2] 73 73 74 75 76 76 75 77 77 78 ...
  ..$ edge.length: num [1:141] 55 55 15 1 12 43 10 29 55 18 ...
  ..$ Nnode      : int 70
  ..$ tip.label  : chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "order")= chr "cladewise"
 $ clustering: Named int [1:72] 1 1 1 1 1 1 2 2 2 2 ...
  ..- attr(*, "names")= chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
 $ medoids   : chr [1:6] "Orycteropus" "Manis" "Presbytis" "Mus" ...
 $ stats     : num [1:6, 1:5] 6 17 21 6 9 13 138 174 102 93 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "size" "max_diss" "av_diss" "diameter" ...
 - attr(*, "class")= chr "PAMclusts"
> em <- clustEM(6, text = NwkTree2)
> str(em)
List of 6
 $ distM     : num [1:72, 1:72] 0 138 169 164 208 195 173 195 190 196 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
  .. ..$ : chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
 $ phyloTree :List of 4
  ..$ edge       : int [1:141, 1:2] 73 73 74 75 76 76 75 77 77 78 ...
  ..$ edge.length: num [1:141] 55 55 15 1 12 43 10 29 55 18 ...
  ..$ Nnode      : int 70
  ..$ tip.label  : chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "order")= chr "cladewise"
 $ clustering: Named num [1:72] 1 2 1 1 1 1 2 2 2 2 ...
  ..- attr(*, "names")= chr [1:72] "Edentata" "Orycteropus" "Trichechus" "Procavia" ...
 $ mean      : num [1:72, 1:6] 147.2 84.8 98.6 87.6 92 ...
 $ bic       : num -47979
 $ model     : chr "spherical, unequal volume"
 - attr(*, "class")= chr "EMclusts"
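To make the medoids and stats fields easier to read, here is a small sketch of k-medoids clustering run directly on a distance matrix with pam from the cluster package. The column names in the stats field (size, max_diss, av_diss, diameter) resemble the clusinfo table that cluster::pam returns, but whether clustPAM calls cluster::pam internally is an assumption. The toy distance matrix is made up for illustration.

```r
library(cluster)  # recommended package bundled with most R installations

# Toy distance matrix with three obvious pairs of tips (made-up values).
tips <- c("A", "B", "C", "D", "E", "F")
distM <- matrix(
  c( 0,  2,  8,  8, 10, 10,
     2,  0,  8,  8, 10, 10,
     8,  8,  0,  4, 10, 10,
     8,  8,  4,  0, 10, 10,
    10, 10, 10, 10,  0,  2,
    10, 10, 10, 10,  2,  0),
  nrow = 6, byrow = TRUE, dimnames = list(tips, tips)
)

# k-medoids on the dissimilarities; a "dist" object tells pam that the
# input is a set of distances rather than raw observations.
fit <- pam(as.dist(distM), k = 3)

fit$clustering  # named integer vector: cluster index for each tip
fit$medoids     # label of the representative tip of each cluster
fit$clusinfo    # per-cluster size, max_diss, av_diss, diameter, separation
```

With real ClustPhy results, the two clusterings shown in the str output could be cross-tabulated with table(pam$clustering, em$clustering) to see where PAM and EM agree.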
Users can then use the plotClustersTree function to plot phylograms of both clustering schemes:
### plot the pam clusters
plotClustersTree(pam$phyloTree, pam$clustering, show.centers = pam$medoids, center.symbol = pam$medoids)
### plot the em clusters
plotClustersTree(em$phyloTree, em$clustering)
The PAM clusters phylogram: