EvoWeaver-SLPreds | R Documentation |
EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Sequence-Level (SL) methods examine conservation of patterns in sequence data, commonly exhibited due to physical interactions between proteins.
predict.EvoWeaver currently supports three SL methods:
'SequenceInfo'
'GeneVector'
'Ancestral'
None.
All residue methods require an EvoWeaver object initialized with dendrogram objects and ancestral states. See EvoWeaver for more information on input data types.
When Method='Ensemble' or Method='SequenceLevel', EvoWeaver uses the SequenceInfo and GeneVector methods.
The SequenceInfo method looks at mutual information between sites in a multiple sequence alignment (MSA). This approach extends prior work in Martin et al. (2005). Each site from the first gene group is paired with the site from the second gene group that maximizes their mutual information.
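The site-pairing step described above can be illustrated with a minimal Python sketch. It is not EvoWeaver's implementation: the function names `mutual_information` and `best_partner_sites` are hypothetical, the estimate is the plain plug-in mutual information in bits with no corrections, and it assumes that row i of each alignment comes from the same organism.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plug-in mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written in terms of counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def best_partner_sites(msa1, msa2):
    """For each column of msa1, return (index of best column in msa2, MI)."""
    cols1 = list(zip(*msa1))  # transpose rows (sequences) into columns (sites)
    cols2 = list(zip(*msa2))
    pairs = []
    for c1 in cols1:
        scores = [mutual_information(c1, c2) for c2 in cols2]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((j, scores[j]))
    return pairs


# Toy alignments (hypothetical data): four organisms, rows in the same order.
msa1 = ["AC", "AC", "GC", "GC"]
msa2 = ["TTA", "TTA", "TCA", "TCA"]
pairs = best_partner_sites(msa1, msa2)
```

In this toy input, the first site of `msa1` covaries perfectly with the second site of `msa2`, so that pair is selected. The fully conserved second site of `msa1` carries no information about any partner site.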
The GeneVector method uses the natural vector encoding method introduced in Zhao et al. (2022). This encodes each gene sequence as a 92-dimensional vector, with the following entries:
N(S) = (n_A, n_C, n_G, n_T,
\mu_A, \mu_C, \mu_G, \mu_T,
D_2^A, D_2^C, D_2^G, D_2^T,
n_{AA}, n_{AC}, \dots, n_{TT},
n_{AAA}, n_{AAC}, \dots, n_{TTT})
Here n_X is the raw total count of nucleotide X (or of the corresponding di- or trinucleotide). For single nucleotides, we also calculate \mu_X, the mean location of nucleotide X, and D_2^X, the second moment of the location of nucleotide X. The overall natural vector for a COG is calculated as the normalized mean of the natural vectors of all its component gene sequences. Interaction scores are computed as Pearson's correlation between the natural vectors of each pair of COGs. The di/trinucleotide counts are excluded by default, which leaves the first 12 entries, but they can be included using the extended=TRUE argument. In benchmarking, using the extended counts has shown a minimal increase in accuracy at the cost of slower runtime.
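As a rough illustration of the default 12-entry encoding and the scoring step, here is a minimal Python sketch. It is not EvoWeaver's code, and the helper names `natural_vector`, `cog_vector`, and `pearson` are hypothetical. Two details are assumptions: the second moment is taken as the central moment divided by n_X times the sequence length, and "normalized" is read as scaling to unit Euclidean length.

```python
from math import sqrt

ALPHABET = "ACGT"


def natural_vector(seq):
    """12-entry natural vector: counts, mean positions, second moments."""
    n = len(seq)
    counts, means, moments = [], [], []
    for base in ALPHABET:
        # 1-based positions at which this nucleotide occurs
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_x = len(positions)
        mu_x = sum(positions) / n_x if n_x else 0.0
        # Assumed normalization: central second moment divided by n_x * n
        d2_x = sum((p - mu_x) ** 2 for p in positions) / (n_x * n) if n_x else 0.0
        counts.append(float(n_x))
        means.append(mu_x)
        moments.append(d2_x)
    return counts + means + moments


def cog_vector(seqs):
    """Mean natural vector of a gene group, scaled to unit length (assumed)."""
    vectors = [natural_vector(s) for s in seqs]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    norm = sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


# Toy gene groups (hypothetical sequences)
cog_a = cog_vector(["ACGTACGT", "ACGTTCGT"])
cog_b = cog_vector(["GGCCAATT", "GGCCATTT"])
score = pearson(cog_a, cog_b)
```

Pearson's correlation is unchanged by positive rescaling of either vector, so the choice of normalization in `cog_vector` does not affect the score in this sketch.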
The Ancestral method calculates coevolution by looking at the correlation of residue mutations near the leaves of each respective gene tree.
Aidan Lakshman ahl27@pitt.edu
Martin, L. C., Gloor, G. B., Dunn, S. D. & Wahl, L. M., Using information theory to search for co-evolving residues in proteins. Bioinformatics, 2005. 21(22): 4116-4124.
Zhao, N., et al., Protein-protein interaction and non-interaction predictions using gene sequence natural vector. Communications Biology, 2022. 5: 652.
EvoWeaver
predict.EvoWeaver
EvoWeaver Phylogenetic Profiling Predictors
EvoWeaver Phylogenetic Structure Predictors
EvoWeaver Gene Organization Predictors