EvoWeaver-PPPreds: Phylogenetic Profiling Predictions for EvoWeaver

EvoWeaver-PPPreds    R Documentation

Phylogenetic Profiling Predictions for EvoWeaver

Description

EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Phylogenetic Profiling (PP) methods examine conservation of gain/loss events within orthology groups using phylogenetic profiles constructed from presence/absence patterns.

predict.EvoWeaver currently supports ten PP methods:

  • 'ExtantJaccard'

  • 'Hamming'

  • 'GLMI'

  • 'PAPV'

  • 'CorrGL'

  • 'ProfDCA'

  • 'Behdenna'

  • 'GLDistance'

  • 'PAJaccard'

  • 'PAOverlap'

Format

None.

Details

Most PP methods are compatible with an EvoWeaver object initialized with any input type. See EvoWeaver for more information on input data types.

When Method='Ensemble' or Method='PhylogeneticProfiling', EvoWeaver uses the GLMI, GLDistance, PAJaccard, and PAOverlap methods.

All of these methods use presence/absence (PA) profiles, which are binary vectors with one entry per genome: 1 indicates that the corresponding genome has the gene, and 0 indicates that it does not.

Methods Hamming and ExtantJaccard use the Hamming and Jaccard distances, respectively, between PA profiles to determine the overall score.
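As an illustration of the two distances named above, the following is a minimal Python sketch, not EvoWeaver's implementation. The function names and the example profiles are hypothetical, and how EvoWeaver converts a distance into its final score is not shown here.

```python
def hamming_distance(a, b):
    """Fraction of genomes at which two PA profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a, b):
    """1 - (genomes with both genes) / (genomes with at least one gene)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    # Assumption: two all-absent profiles are treated as distance 0.
    return 0.0 if either == 0 else 1 - both / either


# Hypothetical PA profiles over six genomes.
gene_a = [1, 1, 0, 1, 0, 0]
gene_b = [1, 0, 0, 1, 0, 1]

print(hamming_distance(gene_a, gene_b))  # 2 of 6 positions differ
print(jaccard_distance(gene_a, gene_b))  # 2 shared out of 4 occupied
```

Note that the Jaccard distance ignores genomes where both genes are absent, whereas the Hamming distance counts shared absences as agreement.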

GLMI uses mutual information of gain/loss (G/L) vectors to determine score, employing a weighting scheme such that concordant gains/losses give positive information, discordant gains/losses give negative information, and events that do not co-occur with a gain/loss in the other gene group give no information.

PAJaccard calculates the centered Jaccard index of PA profiles, where each clade with identical extant patterns is collapsed to a single leaf.
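One way to picture a centered Jaccard index is to subtract the value expected under independence from the observed index, as in Chung et al. (2019). The sketch below assumes that definition and skips the clade-collapsing step entirely; the function name and profiles are hypothetical, and EvoWeaver's own computation may differ.

```python
def centered_jaccard(a, b):
    """Observed Jaccard index minus its expectation under independence."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    n = len(a)
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    if either == 0:
        return 0.0  # assumption: no information when both genes are absent everywhere
    observed = both / either
    # Marginal presence frequencies of each gene.
    pa, pb = sum(a) / n, sum(b) / n
    # Expected index if the two profiles were independent (assumed form).
    expected = (pa * pb) / (pa + pb - pa * pb)
    return observed - expected


# Hypothetical PA profiles over six (already collapsed) leaves.
gene_a = [1, 1, 0, 1, 0, 0]
gene_b = [1, 0, 0, 1, 0, 1]
print(centered_jaccard(gene_a, gene_b))
```

Positive values indicate more overlap than the marginal frequencies alone would suggest, and negative values indicate less.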

PAOverlap calculates the proportion of time in the ancestry that both genes co-occur relative to the total time each individual gene occurs, based on ancestral states inferred with Fitch parsimony.

PAPV calculates a p-value for PA profiles using Fisher's Exact Test. The returned score is provided as 1-p_value so that larger scores indicate more significance, and smaller scores indicate less significance. This rescaling is consistent with the other similarity metrics in EvoWeaver. PAPV can be combined with ExtantJaccard, Hamming, or GLMI to weight raw scores by statistical significance.
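The 1-p_value rescaling can be sketched as follows. This is a standalone Python illustration, not EvoWeaver's code: it assumes a two-sided test on the 2x2 table of joint presence/absence counts, and the function names and profiles are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def papv_like_score(prof_a, prof_b):
    """1 - p, so that larger scores mean stronger association."""
    pairs = list(zip(prof_a, prof_b))
    a = pairs.count((1, 1))  # both genes present
    b = pairs.count((1, 0))  # only the first gene present
    c = pairs.count((0, 1))  # only the second gene present
    d = pairs.count((0, 0))  # both genes absent
    return 1 - fisher_two_sided(a, b, c, d)


# Hypothetical PA profiles over eight genomes.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [1, 1, 1, 0, 0, 0, 0, 1]
print(papv_like_score(gene_a, gene_b))
```

Multiplying a raw similarity by such a score is one simple way to down-weight pairs whose agreement could easily arise by chance.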

ProfDCA uses the direct coupling analysis algorithm introduced by Weigt et al. (2009) to determine direct information between PA profiles. This approach has been validated on PA profiles in Fukunaga and Iwasaki (2022), though the implementation in EvoWeaver forsakes the persistent contrastive divergence method in favor of the algorithm from Lokhov et al. (2018) for increased speed and exact solutions. Note that this algorithm is still extremely slow relative to the other methods despite the aforementioned runtime improvements.

Behdenna implements the method detailed in Behdenna et al. (2016) to find statistically significant interactions using co-occurrence of gain/loss events mapped to ancestral states on a species tree. This method requires a species tree as input. If the EvoWeaver object is initialized with dendrogram objects, SuperTree will be used to infer a species tree.

GLDistance uses a similar method to Behdenna. This method uses Fitch parsimony to infer where events were gained or lost on a species tree, and then measures the distance between these gain/loss events. Unlike Behdenna, this method takes into account the types of events (e.g., gain/gain and loss/loss are treated differently than gain/loss). This method requires a species tree as input. If the EvoWeaver object is initialized with dendrogram objects, SuperTree will be used to infer a species tree.
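To make the event-inference step concrete, here is a minimal Python sketch of Fitch parsimony on a binary presence/absence character, followed by reading gain and loss events off the branches. It is an illustration under stated assumptions rather than EvoWeaver's implementation: the tree is a hypothetical nested tuple, ties at the root default to "absent", and the distance calculation between events is not shown.

```python
def fitch_events(tree, presence, root_default=0):
    """Infer gain/loss events on a strictly binary tree with Fitch parsimony.

    tree: nested 2-tuples whose leaves are genome names (hypothetical format).
    presence: dict mapping genome name -> 0 or 1.
    Returns a list of (node_label, "gain" | "loss"), one per changed branch.
    """
    def up(node):
        # Bottom-up pass: intersect child state sets, or take the union if disjoint.
        if isinstance(node, str):
            return {"name": node, "set": {presence[node]}, "kids": []}
        kids = [up(child) for child in node]
        common = kids[0]["set"] & kids[1]["set"]
        states = common if common else kids[0]["set"] | kids[1]["set"]
        name = "(" + ",".join(k["name"] for k in kids) + ")"
        return {"name": name, "set": states, "kids": kids}

    events = []

    def down(node, parent_state):
        # Top-down pass: keep the parent's state whenever the node allows it.
        preferred = root_default if parent_state is None else parent_state
        state = preferred if preferred in node["set"] else next(iter(node["set"]))
        if parent_state is not None and state != parent_state:
            events.append((node["name"], "gain" if state == 1 else "loss"))
        for kid in node["kids"]:
            down(kid, state)

    down(up(tree), None)
    return events


# Hypothetical five-genome species tree and two gene profiles.
tree = ((("A", "B"), "C"), ("D", "E"))
gene_x = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
gene_y = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0}

print(fitch_events(tree, gene_x))  # gain on the branch leading to (A,B)
print(fitch_events(tree, gene_y))  # gain on the branch leading to ((A,B),C)
```

In this toy case the two genes are each gained once, on adjacent branches, which is the kind of proximity between events that a distance-based score would reward.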

CorrGL infers where events were gained or lost on a species tree as in method GLDistance, then uses Pearson's correlation coefficient weighted by p-value to infer similarity.
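The correlation step can be pictured with the sketch below, which computes an unweighted Pearson's correlation between two per-branch event vectors. The encoding (+1 for a gain, -1 for a loss, 0 for no event) and the example vectors are assumptions made for illustration, and the p-value weighting described above is deliberately left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same branches")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a gene with no events carries no signal
    return sxy / sqrt(sxx * syy)


# Hypothetical event vectors over six branches of a species tree:
# +1 = gain on that branch, -1 = loss, 0 = no event (assumed encoding).
events_a = [1, 0, -1, 0, 0, 1]
events_b = [1, 0, -1, 0, 1, 0]
print(pearson(events_a, events_b))
```

With this encoding, a gain in one gene on the same branch as a loss in the other pulls the correlation down, while matching events pull it up.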

Author(s)

Aidan Lakshman ahl27@pitt.edu

References

Behdenna, A., et al., Testing for Independence between Evolutionary Processes. Systematic Biology, 2016. 65(5): p. 812-823.

Chung, N.C., et al., Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data. BMC Bioinformatics, 2019. 20(S15).

Date, S.V. and E.M. Marcotte, Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nature Biotechnology, 2003. 21(9): p. 1055-1062.

Fukunaga, T. and W. Iwasaki, Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics, 2022.

Lokhov, A.Y., et al., Optimal structure and parameter learning of Ising models. Science Advances, 2018. 4(3): p. e1700791.

Pellegrini, M., et al., Assigning protein function by comparative genome analysis: Protein phylogenetic profiles. Proceedings of the National Academy of Sciences, 1999. 96(8): p. 4285-4288.

Weigt, M., et al., Identification of direct residue contacts in protein-protein interaction by message passing. Proceedings of the National Academy of Sciences, 2009. 106(1): p. 67-72.

See Also

EvoWeaver

predict.EvoWeaver

EvoWeaver Phylogenetic Structure Predictors

EvoWeaver Gene Organization Predictors

EvoWeaver Sequence-Level Predictors


npcooley/SynExtend documentation built on Dec. 20, 2024, 4:03 p.m.