EvoWeaver-PSPreds: Phylogenetic Structure Predictions for EvoWeaver

EvoWeaver-PSPreds    R Documentation

Phylogenetic Structure Predictions for EvoWeaver

Description

EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Phylogenetic Structure (PS) methods examine conservation of overall evolutionary rates within orthology groups using distance matrices constructed from each gene tree.

predict.EvoWeaver currently supports three PS methods:

  • 'MirrorTree'

  • 'ContextTree'

  • 'TreeDistance'

Details

All distance matrix methods require an EvoWeaver object initialized with dendrogram objects. See EvoWeaver for more information on input data types.

The MirrorTree method was introduced by Pazos et al. (2001). This method builds distance matrices using a nucleotide substitution model, and then calculates coevolution between gene families using the Pearson correlation coefficient of the upper triangle of the two corresponding matrices.
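The core MirrorTree calculation described above can be sketched in a few lines. The following is a minimal illustration in Python, not the package's implementation: the function names and the toy matrices are hypothetical, and it assumes both distance matrices are already restricted to the same species in the same order.

```python
from math import sqrt

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mirrortree_score(dist_a, dist_b):
    """Correlate the upper triangles of two pairwise distance matrices.

    Assumption: rows and columns of both matrices refer to the same
    species in the same order.
    """
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

# Toy data (made up for illustration): distances among four species.
dist_a = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.6],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.6, 0.3, 0.0],
]
# A second gene family evolving twice as fast along the same tree shape.
dist_b = [[2.0 * d for d in row] for row in dist_a]

score = mirrortree_score(dist_a, dist_b)
```

Because Pearson's r is invariant to scaling, two families whose distances differ only by a constant rate factor receive the maximum score in this sketch.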

Experimental analysis has shown that the data in the upper triangle are heavily redundant and, when used in full, rapidly overwhelm available system memory. Previous work has applied dimensionality reduction such as SVD, but this prevents parallelization and does not solve the memory issue, since SVD takes as input the entire matrix whose columns correspond to upper triangle values. EvoWeaver instead uses a seeded random projection following Achlioptas (2001) to reduce the dimensionality of the data in a reproducible and parallel-compatible way. EvoWeaver also uses Spearman's ρ in place of Pearson's r, since it outperforms Pearson's r following dimensionality reduction.
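The idea of a seeded random projection followed by a rank correlation can be sketched as follows. This is an illustrative Python example under stated assumptions, not EvoWeaver's code: the function names, the output dimension, and the seed are hypothetical. It uses the sparse scheme from Achlioptas (2001), in which each projection coefficient is +1 with probability 1/6, -1 with probability 1/6, and 0 otherwise, scaled by sqrt(3 / k) for an output dimension of k.

```python
import random
from math import sqrt

def achlioptas_project(vector, out_dim, seed):
    """Project a vector to out_dim dimensions with a sparse random matrix.

    The matrix is never stored: its entries are regenerated from the seed,
    so every vector of the same length sees the same projection and can be
    processed independently of all others.
    """
    rng = random.Random(seed)
    scale = sqrt(3.0 / out_dim)
    projected = []
    for _ in range(out_dim):
        total = 0.0
        for value in vector:
            u = rng.random()
            if u < 1.0 / 6.0:
                total += value      # coefficient +1
            elif u < 2.0 / 6.0:
                total -= value      # coefficient -1
            # otherwise the coefficient is 0 and the entry is skipped
        projected.append(scale * total)
    return projected

def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 in this sketch if a variance is zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = sqrt(sum((a - mean_x) ** 2 for a in x) *
                 sum((b - mean_y) ** 2 for b in y))
    return cov / denom if denom > 0 else 0.0

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy upper-triangle vectors (made up for illustration).
vec_a = [0.2, 0.6, 0.7, 0.5, 0.6, 0.3]
vec_b = [0.3, 0.9, 1.1, 0.8, 0.9, 0.4]
low_a = achlioptas_project(vec_a, out_dim=16, seed=42)
low_b = achlioptas_project(vec_b, out_dim=16, seed=42)
rho = spearman(low_a, low_b)
```

Fixing the seed means each gene family can be projected on its own, at any time and on any worker, while still landing in the same reduced space as every other family.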

Subsequent work by Pazos et al. (2005) and Sato et al. (2005, 2006) found multiple ways to improve predictions from the initial MirrorTree method. These methods incorporate additional phylogenetic context, and are thus called ContextTree methods. The improvements include correcting for overall evolutionary rate using a species tree and/or using projection vectors. The built-in ContextTree method implements a species tree correction, and weights the resulting score by the normalized Hamming distance of the presence/absence profiles. This weighting can correct for gene trees with low overlap that achieve spuriously high scores via random projection. Additional correction measures can be selected with the MTCorrection argument.
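The two ingredients named above, a species tree correction and a weighting based on presence/absence profiles, can be illustrated with a small Python sketch. The exact formulas EvoWeaver uses are not given here, so everything below is an assumption made for illustration: the function names are hypothetical, the species tree correction is shown as a simple subtraction of species-tree distances, and the weight is assumed to be one minus the normalized Hamming distance so that low overlap reduces the score.

```python
def normalized_hamming(profile_a, profile_b):
    """Fraction of species at which two presence/absence profiles differ."""
    mismatches = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return mismatches / len(profile_a)

def subtract_species_distances(gene_distances, species_distances):
    """Remove the background signal shared with the species tree.

    Assumption: both inputs are upper-triangle vectors over the same
    species pairs, and the correction is a plain subtraction.
    """
    return [g - s for g, s in zip(gene_distances, species_distances)]

def overlap_weighted(score, profile_a, profile_b):
    """Down-weight a score when the two gene families share few species.

    Assumption: the weight is 1 minus the normalized Hamming distance.
    """
    return score * (1.0 - normalized_hamming(profile_a, profile_b))

# Toy data (made up): 1 = gene present in a species, 0 = absent.
profile_a = [1, 1, 0, 1]
profile_b = [1, 0, 0, 1]

corrected = subtract_species_distances([0.5, 0.9, 0.7], [0.2, 0.4, 0.3])
weighted = overlap_weighted(0.8, profile_a, profile_b)
```

In this sketch, two families with identical profiles keep their full score, while families that rarely co-occur are penalized regardless of how well their reduced distance vectors happen to correlate.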

The TreeDistance method uses phylogenetic tree distance to quantify differences between gene trees. This method implements a number of metrics and groups them together to improve overall runtime. The default tree distance method is normalized Robinson-Foulds distance due to its lower computational complexity. Other methods can be specified using the TreeMethods argument, which expects a character vector containing one or more of the following:

  • "CI": Clustering Information Distance

  • "RF": Robinson-Foulds Distance

  • "JRF": Jaccard-Robinson-Foulds Distance

  • "Nye": Nye Similarity

  • "KF": Kuhner-Felsenstein Distance

  • "all": All of the above methods

See PhyloDistance for more information and references on each metric; all of these metrics are also accessible directly through the PhyloDistance function. Method "JRF" defaults to a k value of 4, which can be changed using the JRFk input parameter. As k increases, the JRF distance approaches the Robinson-Foulds distance, but changing k has a negligible impact on performance, so use of the default is encouraged for simplicity. Multiple metrics can be specified in a single call.
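As a concrete reference point for the default metric, the following Python sketch computes a normalized Robinson-Foulds distance between two small trees. It is an illustration only, not the PhyloDistance implementation: trees are represented as nested tuples of leaf names (a hypothetical encoding), both trees are assumed to have exactly the same leaves, and the normalization divides by the total number of non-trivial splits in the two trees.

```python
def leaf_set(tree):
    """All leaf names below a node; leaves are strings, clades are tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of the leaves induced by internal edges.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the result does not depend on where the tree
    is rooted.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found

def normalized_rf(tree_a, tree_b):
    """Splits found in exactly one tree, divided by all splits in both."""
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total

# Toy trees (made up): they disagree on the placement of B and C.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
distance = normalized_rf(tree_a, tree_b)
```

The two toy trees share the split separating D and E from the rest but disagree on the other one, so half of their splits are unshared.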

Value

None.

Author(s)

Aidan Lakshman <ahl27@pitt.edu>

References

Achlioptas, D., Database-friendly random projections. Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2001: p. 274-281.

Pazos, F. and A. Valencia, Similarity of phylogenetic trees as indicator of protein–protein interaction. Protein Engineering, Design and Selection, 2001. 14(9): p. 609-614.

Pazos, F., et al., Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J Mol Biol, 2005. 352(4): p. 1002-15.

Sato, T., et al., The inference of protein-protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics, 2005. 21(17): p. 3482-9.

Sato, T., et al., Partial correlation coefficient between distance matrices as a new indicator of protein-protein interactions. Bioinformatics, 2006. 22(20): p. 2488-92.

See Also

EvoWeaver

predict.EvoWeaver

EvoWeaver Phylogenetic Profiling Predictors

EvoWeaver Gene Organization Predictors

EvoWeaver Sequence-Level Predictors

PhyloDistance


npcooley/SynExtend documentation built on May 2, 2024, 7:28 p.m.