EvoWeaver-PPPreds | R Documentation |
EvoWeaver
incorporates four classes of prediction, each with multiple
methods and algorithms. Phylogenetic Profiling (PP) methods examine conservation
of gain/loss events within orthology groups using phylogenetic profiles
constructed from presence/absence patterns.
predict.EvoWeaver
currently supports ten PP methods:
'ExtantJaccard'
'Hamming'
'GLMI'
'PAPV'
'CorrGL'
'ProfDCA'
'Behdenna'
'GLDistance'
'PAJaccard'
'PAOverlap'
None.
Most PP methods are compatible with an EvoWeaver object initialized with any input type. See EvoWeaver for more information on input data types.
When Method='Ensemble' or Method='PhylogeneticProfiling', EvoWeaver uses methods GLMI, GLDistance, PAJaccard, and PAOverlap.
All of these methods use presence/absence (PA) profiles, which are binary vectors in which 1 indicates that the corresponding genome has a particular gene and 0 indicates that it does not.
Methods Hamming and ExtantJaccard use the Hamming and Jaccard distance, respectively, of PA profiles to determine the overall score.
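As a minimal sketch of the two distances named above, the following base R code compares two made-up PA profiles. The vectors pa1 and pa2 and the helpers hamming_distance and jaccard_distance are hypothetical illustrations and not part of EvoWeaver. How EvoWeaver turns these distances into its final score is not reproduced here.

```r
# Two hypothetical PA profiles across eight genomes (1 = gene present)
pa1 <- c(1, 1, 0, 1, 0, 0, 1, 1)
pa2 <- c(1, 0, 0, 1, 0, 1, 1, 1)

# Hamming distance: number of genomes in which the two profiles disagree
hamming_distance <- function(x, y) sum(x != y)

# Jaccard distance: 1 - (genomes with both genes / genomes with either gene)
jaccard_distance <- function(x, y) {
  both <- sum(x == 1 & y == 1)
  either <- sum(x == 1 | y == 1)
  if (either == 0) return(NA_real_)  # undefined when neither gene is ever present
  1 - both / either
}

h <- hamming_distance(pa1, pa2)
j <- jaccard_distance(pa1, pa2)
```

Note that the Jaccard distance ignores genomes in which both genes are absent, whereas the Hamming distance counts shared absences as agreement.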
GLMI uses mutual information of gain/loss (G/L) vectors to determine the score, employing a weighting scheme such that concordant gains/losses give positive information, discordant gains/losses give negative information, and events that do not co-occur with a gain/loss in the other gene group give no information.
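For orientation, the sketch below computes plain, unweighted mutual information between two hypothetical G/L vectors. The coding (+1 for a gain, -1 for a loss, 0 for no event), the example vectors, and the helper mutual_information are all assumptions made for illustration. The signed weighting scheme described above is specific to EvoWeaver and is not reproduced here.

```r
# Hypothetical G/L vectors over eight branches: +1 gain, -1 loss, 0 no event
gl1 <- c(1, 0, 0, -1, 0, 1, 0, -1)
gl2 <- c(1, 0, 0, -1, 0, 0, 1, -1)

# Plain mutual information (in bits) between two discrete vectors
mutual_information <- function(x, y) {
  joint <- unclass(table(x, y)) / length(x)  # joint probabilities
  px <- rowSums(joint)                       # marginal distribution of x
  py <- colSums(joint)                       # marginal distribution of y
  expected <- outer(px, py)                  # joint distribution under independence
  nz <- joint > 0                            # skip empty cells to avoid log(0)
  sum(joint[nz] * log2(joint[nz] / expected[nz]))
}

mi <- mutual_information(gl1, gl2)
```

Unlike the weighted score described above, plain mutual information is never negative, so it cannot by itself distinguish concordant from discordant events.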
PAJaccard calculates the centered Jaccard index of PA profiles, where each clade with identical extant patterns is collapsed to a single leaf.
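The sketch below shows one way to center a Jaccard index, assuming that "centered" means subtracting the value expected when the two profiles are independent, in the spirit of Chung et al. (2019). That interpretation, the helper centered_jaccard, and the example vectors are assumptions. The clade-collapsing step requires a tree and is omitted.

```r
# Two hypothetical PA profiles across eight (already collapsed) leaves
pa1 <- c(1, 1, 0, 1, 0, 0, 1, 1)
pa2 <- c(1, 0, 0, 1, 0, 1, 1, 1)

# Observed Jaccard index minus its approximate expectation under independence
centered_jaccard <- function(x, y) {
  either <- sum(x == 1 | y == 1)
  if (either == 0) return(NA_real_)  # undefined when neither gene is ever present
  observed <- sum(x == 1 & y == 1) / either
  px <- mean(x == 1)                 # presence frequency of the first gene
  py <- mean(y == 1)                 # presence frequency of the second gene
  expected <- (px * py) / (px + py - px * py)
  observed - expected
}

cj <- centered_jaccard(pa1, pa2)
```

Centering moves the baseline to zero, so positive values suggest more co-occurrence than the presence frequencies alone would produce.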
PAOverlap calculates the proportion of time in the ancestry that both genes co-occur relative to the total time each individual gene occurs, based on ancestral states inferred with Fitch parsimony.
PAPV calculates a p-value for PA profiles using Fisher's Exact Test. The score is returned as 1-p_value so that larger scores indicate greater significance and smaller scores indicate less. This rescaling is consistent with the other similarity metrics in EvoWeaver. PAPV can be used with ExtantJaccard, Hamming, or GLMI to weight raw scores by statistical significance.
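A minimal sketch of the 1-p_value rescaling, using fisher.test from the stats package on two made-up PA profiles. The example vectors, the default two-sided alternative, and the simple product used for weighting are assumptions. EvoWeaver's own test settings and weighting may differ.

```r
# Two hypothetical PA profiles across eight genomes
pa1 <- c(1, 1, 0, 1, 0, 0, 1, 1)
pa2 <- c(1, 0, 0, 1, 0, 1, 1, 1)

# 2x2 contingency table of absence/presence in gene 1 versus gene 2;
# fixing the factor levels keeps the table 2x2 even if a state never occurs
tab <- table(factor(pa1, levels = c(0, 1)), factor(pa2, levels = c(0, 1)))

# Fisher's Exact Test (two-sided by default), rescaled so larger = more significant
p_value <- fisher.test(tab)$p.value
score <- 1 - p_value

# One possible weighting (an assumption): scale a raw Jaccard index by the score
jaccard <- sum(pa1 == 1 & pa2 == 1) / sum(pa1 == 1 | pa2 == 1)
weighted_jaccard <- jaccard * score
```

With only eight genomes the test has little power, so the rescaled score stays modest even for fairly similar profiles.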
ProfDCA uses the direct coupling analysis algorithm introduced by Weigt et al. (2009) to determine direct information between PA profiles. This approach has been validated on PA profiles in Fukunaga and Iwasaki (2022), though the implementation in EvoWeaver forgoes the persistent contrastive divergence method in favor of the algorithm from Lokhov et al. (2018) for increased speed and exact solutions. Note that this algorithm is still extremely slow relative to the other methods despite these runtime improvements.
Behdenna implements the method detailed in Behdenna et al. (2016) to find statistically significant interactions using co-occurrence of gain/loss events mapped to ancestral states on a species tree. This method requires a species tree as input. If the EvoWeaver object is initialized with dendrogram objects, SuperTree will be used to infer a species tree.
GLDistance uses a method similar to Behdenna. It uses Fitch parsimony to infer where genes were gained or lost on a species tree, and then measures the distance between these gain/loss events. Unlike Behdenna, this method takes into account the types of events (e.g., gain/gain and loss/loss are treated differently than gain/loss). This method requires a species tree as input. If the EvoWeaver object is initialized with dendrogram objects, SuperTree will be used to infer a species tree.
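To illustrate how gain/loss events can be placed on a tree, here is a small sketch of Fitch parsimony for a single binary character. The nested-list tree encoding, the helper fitch_gain_loss, the tie-breaking toward absence at the root, and the example data are all assumptions for illustration. EvoWeaver operates on dendrogram objects, and its internal implementation is not shown here.

```r
# Hypothetical rooted binary tree (((A,B),C),(D,E)) as nested lists
tree <- list(list(list("A", "B"), "C"), list("D", "E"))

fitch_gain_loss <- function(tree, states) {
  # Bottom-up pass: candidate ancestral states for every node
  up <- function(node) {
    if (is.character(node)) {
      return(list(leaves = node, set = states[[node]], kids = list()))
    }
    kids <- lapply(node, up)
    shared <- intersect(kids[[1]]$set, kids[[2]]$set)
    set <- if (length(shared) > 0) shared
           else sort(union(kids[[1]]$set, kids[[2]]$set))
    list(leaves = c(kids[[1]]$leaves, kids[[2]]$leaves), set = set, kids = kids)
  }
  events <- list()
  # Top-down pass: fix one state per node and record changes along edges
  down <- function(node, parent_state) {
    state <- if (parent_state %in% node$set) parent_state else node$set[1]
    if (state != parent_state) {
      events[[length(events) + 1]] <<- data.frame(
        clade = paste(node$leaves, collapse = ","),
        event = if (state == 1) "gain" else "loss",
        stringsAsFactors = FALSE
      )
    }
    for (kid in node$kids) down(kid, state)
  }
  root <- up(tree)
  root_state <- root$set[1]  # ties resolve to absent (0), an arbitrary choice
  for (kid in root$kids) down(kid, root_state)
  do.call(rbind, events)     # NULL when no event is inferred
}

# Two hypothetical genes with PA states at the leaves
ev1 <- fitch_gain_loss(tree, c(A = 1, B = 1, C = 0, D = 0, E = 0))
ev2 <- fitch_gain_loss(tree, c(A = 1, B = 1, C = 1, D = 0, E = 0))
```

In this example the first gene is inferred to be gained on the branch leading to clade A,B and the second on the adjacent branch leading to A,B,C. A distance-based score would then compare where such events fall on the tree.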
CorrGL infers where gain and loss events occurred on a species tree as in method GLDistance, then uses Pearson's correlation coefficient weighted by p-value to infer similarity.
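A rough sketch of a p-value-weighted Pearson correlation, using cor.test from the stats package. The coding of events per branch (+1 gain, -1 loss, 0 none), the example vectors, and the weighting r * (1 - p) are assumptions chosen for illustration. The exact weighting used by CorrGL is not specified above.

```r
# Hypothetical G/L vectors over ten branches: +1 gain, -1 loss, 0 no event
gl1 <- c(1, 0, 0, -1, 0, 1, 0, 0, -1, 0)
gl2 <- c(1, 0, 0, -1, 0, 0, 0, 1, -1, 0)

# Pearson correlation with its p-value
ct <- cor.test(gl1, gl2, method = "pearson")
r <- unname(ct$estimate)

# Assumed weighting: shrink the correlation toward zero when it is not significant
weighted_r <- r * (1 - ct$p.value)
```

Under this weighting, a strong but statistically weak correlation contributes less than an equally strong correlation supported by more events.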
Aidan Lakshman ahl27@pitt.edu
Behdenna, A., et al., Testing for Independence between Evolutionary Processes. Systematic Biology, 2016. 65(5): p. 812-823.
Chung, N.C., et al., Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data. BMC Bioinformatics, 2019. 20(S15).
Date, S.V. and E.M. Marcotte, Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nature Biotechnology, 2003. 21(9): p. 1055-1062.
Fukunaga, T. and W. Iwasaki, Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics, 2022.
Lokhov, A.Y., et al., Optimal structure and parameter learning of Ising models. Science Advances, 2018. 4(3): p. e1700791.
Pellegrini, M., et al., Assigning protein function by comparative genome analysis: Protein phylogenetic profiles. Proceedings of the National Academy of Sciences, 1999. 96(8): p. 4285-4288.
Weigt, M., et al., Identification of direct residue contacts in protein-protein interaction by message passing. Proceedings of the National Academy of Sciences, 2009. 106(1): p. 67-72.
EvoWeaver
predict.EvoWeaver
EvoWeaver Phylogenetic Structure Predictors
EvoWeaver Gene Organization Predictors
EvoWeaver Sequence-Level Predictors