EvoWeaver-PPPreds: Phylogenetic Profiling Predictions for EvoWeaver

EvoWeaver-PPPreds R Documentation

Phylogenetic Profiling Predictions for EvoWeaver

Description

EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Phylogenetic Profiling (PP) methods examine conservation of gain/loss events within orthology groups using phylogenetic profiles constructed from presence/absence patterns.

predict.EvoWeaver currently supports eight PP methods:

  • 'Jaccard'

  • 'Hamming'

  • 'MutualInformation'

  • 'PAPV'

  • 'CorrGL'

  • 'ProfDCA'

  • 'Behdenna'

  • 'GainLoss'

Details

Most PP methods are compatible with an EvoWeaver object initialized with any input type. See EvoWeaver for more information on input data types.

All of these methods use presence/absence (PA) profiles, which are binary vectors such that 1 implies the corresponding genome has that particular gene, and 0 implies the genome does not have that particular gene.

Methods Hamming and Jaccard use Hamming and Jaccard distance (respectively) of PA profiles to determine overall score.
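The two distances can be illustrated with a short Python sketch. This is not EvoWeaver's implementation: the function names and the example profiles are hypothetical, and the Hamming distance is assumed to be normalized by profile length.

```python
def hamming_distance(a, b):
    """Fraction of genomes where the two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a, b):
    """1 - |intersection| / |union|, counting genomes where a gene is present."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two all-absent profiles are treated as distance 0.
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical profiles across six genomes (1 = gene present, 0 = absent).
gene_a = [1, 1, 0, 1, 0, 0]
gene_b = [1, 1, 0, 0, 0, 1]

hamming = hamming_distance(gene_a, gene_b)  # 2 of 6 genomes disagree
jaccard = jaccard_distance(gene_a, gene_b)  # 2 shared out of 4 present in either
```

Note that Jaccard ignores genomes where both genes are absent, while Hamming counts shared absences as agreement, so the two measures can rank the same pair differently.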

MutualInformation uses mutual information of PA profiles to determine score, employing a weighting scheme such that 11 and 00 give positive information, and 10 and 01 give negative information.
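One way to read this weighting scheme is sketched below in Python. It is an illustration under stated assumptions rather than EvoWeaver's exact formula: each cell of the joint distribution contributes its usual mutual information term, and the sign is flipped for the discordant cells (10 and 01). The function name is hypothetical.

```python
import math
from collections import Counter


def signed_mutual_information(a, b):
    """Mutual information (in bits) with discordant cells counted negatively."""
    n = len(a)
    joint = Counter(zip(a, b))  # counts of the (x, y) pairs 11, 00, 10, 01
    count_a = Counter(a)
    count_b = Counter(b)
    score = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = count_a[x] / n
        p_y = count_b[y] / n
        term = p_xy * math.log2(p_xy / (p_x * p_y))
        # Assumption: concordant cells (11, 00) add, discordant cells (10, 01) subtract.
        score += term if x == y else -term
    return score


identical = signed_mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
opposite = signed_mutual_information([1, 1, 0, 0], [0, 0, 1, 1])
```

Under this reading, identical profiles score positively and perfectly complementary profiles score negatively, whereas unweighted mutual information would give both the same value.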

PAPV calculates a p-value for a pair of PA profiles using Fisher's Exact Test. The returned score is 1-p_value, so that larger scores indicate more significance and smaller scores indicate less significance. This rescaling is consistent with the other similarity metrics in EvoWeaver. PAPV can be combined with Jaccard, Hamming, or MutualInformation to weight raw scores by statistical significance.
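The 1-p_value rescaling can be sketched in Python using only the standard library. The sketch assumes a two-sided test on the 2x2 table of co-presence counts; whether EvoWeaver uses a one-sided or two-sided test is not stated here, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b):
    """Two-sided Fisher's Exact Test p-value for two binary profiles."""
    n = len(a)
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    row1 = sum(a)  # genomes containing the first gene
    col1 = sum(b)  # genomes containing the second gene

    def prob(k):
        # Hypergeometric probability of k co-presences given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    p_observed = prob(n11)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical profiles: the two genes always occur together.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0]

p_value = fisher_exact_two_sided(gene_a, gene_b)
score = 1 - p_value  # larger score = stronger evidence of association
```

For these eight genomes the only tables as extreme as the observed one are perfect co-presence and perfect exclusion, each with probability 1/70, so the score is 1 - 2/70.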

ProfDCA uses the direct coupling analysis algorithm introduced by Weigt et al. (2009) to determine direct information between PA profiles. This approach has been validated on PA profiles in Fukunaga and Iwasaki (2022), though the implementation in EvoWeaver forsakes the persistent contrastive divergence method in favor of the algorithm from Lokhov et al. (2018) for increased speed and exact solutions. Note that this algorithm is still extremely slow relative to the other methods despite the aforementioned runtime improvements.

Behdenna implements the method detailed in Behdenna et al. (2016) to find statistically significant interactions using co-occurrence of gain/loss events mapped to ancestral states on a species tree. This method requires a species tree as input. If the EvoWeaver object is initialized with dendrogram objects, SuperTree will be used to infer a species tree.

GainLoss uses a similar method to Behdenna. This method uses Fitch Parsimony to infer where genes were gained or lost on a species tree, and then examines the distance between these gain/loss events. Unlike Behdenna, this method takes into account the types of events (e.g., gain/gain and loss/loss are treated differently than gain/loss). This method requires a species tree as input. If the EvoWeaver object is initialized with dendrogram objects, SuperTree will be used to infer a species tree.
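The Fitch Parsimony step can be sketched in Python as follows. This illustrates the general technique only, not EvoWeaver's code: the species tree is a hypothetical nested tuple of leaf names, ties at the root are assumed to resolve to absence, and the later scoring of distances between events is omitted.

```python
def fitch_sets(node, profile, sets):
    """Bottom-up pass: compute the set of candidate states for every node."""
    if isinstance(node, str):
        sets[node] = {profile[node]}
    else:
        left, right = node
        fitch_sets(left, profile, sets)
        fitch_sets(right, profile, sets)
        common = sets[left] & sets[right]
        sets[node] = common if common else sets[left] | sets[right]
    return sets


def assign_states(node, sets, parent_state, events):
    """Top-down pass: fix one state per node and record changes on branches."""
    # Keep the parent's state when allowed; otherwise take the smallest candidate
    # (assumption: an ambiguous root is resolved to 0, i.e. gene absent).
    state = parent_state if parent_state in sets[node] else min(sets[node])
    if parent_state is not None and state != parent_state:
        events.append((node, "gain" if state == 1 else "loss"))
    if not isinstance(node, str):
        for child in node:
            assign_states(child, sets, state, events)


def gain_loss_events(tree, profile):
    """List (node, event) pairs for branches where presence/absence changes."""
    events = []
    assign_states(tree, fitch_sets(tree, profile, {}), None, events)
    return events


# Hypothetical six-genome species tree and presence/absence profile.
tree = ((("A", "B"), ("C", "D")), ("E", "F"))
profile = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0}

events = gain_loss_events(tree, profile)
```

Here the most parsimonious reconstruction places a single gain on the branch leading to the (A, B) clade. Each event is labeled by the node at the lower end of the branch on which it occurs.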

CorrGL infers where genes were gained or lost on a species tree as in the GainLoss method, then uses Pearson's correlation coefficient weighted by p-value to infer similarity.
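The correlation step can be illustrated with a small Python sketch. The encoding is an assumption made for illustration: each branch of the species tree is coded +1 for a gain, -1 for a loss, and 0 for no event. The p-value weighting is omitted because its exact form is not described here, and the vectors below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumption: a gene with no variation across branches gets correlation 0.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-branch events for two genes (+1 gain, -1 loss, 0 none).
events_a = [1, 0, 0, -1, 0, 1]
events_b = [1, 0, 0, -1, 0, 0]

r = pearson(events_a, events_b)
```

Genes that gain and lose on the same branches give a correlation near 1, while a gain in one gene paired with a loss in the other on the same branch pushes the correlation down.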

Value

None.

Author(s)

Aidan Lakshman ahl27@pitt.edu

References

Behdenna, A., et al., Testing for Independence between Evolutionary Processes. Systematic Biology, 2016. 65(5): p. 812-823.

Date, S.V. and E.M. Marcotte, Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nature Biotechnology, 2003. 21(9): p. 1055-1062.

Fukunaga, T. and W. Iwasaki, Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics, 2022.

Lokhov, A.Y., et al., Optimal structure and parameter learning of Ising models. Science advances, 2018. 4(3): p. e1700791.

Pellegrini, M., et al., Assigning protein function by comparative genome analysis: Protein phylogenetic profiles. Proceedings of the National Academy of Sciences, 1999. 96(8): p. 4285-4288.

Weigt, M., et al., Identification of direct residue contacts in protein-protein interaction by message passing. Proceedings of the National Academy of Sciences, 2009. 106(1): p. 67-72.

See Also

EvoWeaver

predict.EvoWeaver

EvoWeaver Phylogenetic Structure Predictors

EvoWeaver Gene Organization Predictors

EvoWeaver Sequence-Level Predictors


npcooley/SynExtend documentation built on May 2, 2024, 7:28 p.m.