EvoWeaver-SLPreds: Sequence-Level Predictions for EvoWeaver

EvoWeaver-SLPreds    R Documentation

Sequence-Level Predictions for EvoWeaver

Description

EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Sequence-Level (SL) methods examine conservation of patterns in sequence data, commonly exhibited due to physical interactions between proteins.

predict.EvoWeaver currently supports three SL methods:

  • 'SequenceInfo'

  • 'GeneVector'

  • 'Ancestral'

Format

None.

Details

All Sequence-Level methods require an EvoWeaver object initialized with dendrogram objects and ancestral states. See EvoWeaver for more information on input data types.

When Method='Ensemble' or Method='SequenceLevel', EvoWeaver uses the SequenceInfo and GeneVector methods.

The SequenceInfo method looks at mutual information between sites in a multiple sequence alignment (MSA). This approach extends prior work in Martin et al. (2005). Each site from the first gene group is paired with the site from the second gene group that maximizes their mutual information.
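
The site-pairing step described above can be illustrated with a minimal base R sketch. This is not EvoWeaver's implementation: the helper names column_mi and best_site_pairs are hypothetical, the alignments are toy character matrices with organisms in the same row order, and no gap handling or score corrections are applied.

```r
# Hypothetical helper: mutual information (in bits) between two alignment columns
column_mi <- function(x, y) {
  joint <- unclass(table(x, y)) / length(x)  # joint frequencies p(a, b)
  px <- rowSums(joint)                       # marginal frequencies p(a)
  py <- colSums(joint)                       # marginal frequencies p(b)
  expected <- outer(px, py)                  # p(a) * p(b) under independence
  nz <- joint > 0                            # skip empty cells (0 * log 0 := 0)
  sum(joint[nz] * log2(joint[nz] / expected[nz]))
}

# Hypothetical helper: pair each site of msa1 with the site of msa2
# that maximizes their mutual information
best_site_pairs <- function(msa1, msa2) {
  mi <- matrix(0, ncol(msa1), ncol(msa2))
  for (i in seq_len(ncol(msa1))) {
    for (j in seq_len(ncol(msa2))) {
      mi[i, j] <- column_mi(msa1[, i], msa2[, j])
    }
  }
  data.frame(site1 = seq_len(ncol(msa1)),
             site2 = max.col(mi, ties.method = "first"),
             mi = apply(mi, 1, max))
}

# Toy alignments: rows are organisms, columns are sites
msa1 <- do.call(rbind, strsplit(c("AC", "AC", "GC", "GC"), ""))
msa2 <- do.call(rbind, strsplit(c("TTA", "TTA", "TCA", "TCA"), ""))
site_pairs <- best_site_pairs(msa1, msa2)
```

In this toy input, site 1 of the first alignment covaries perfectly with site 2 of the second, so that pair carries 1 bit of mutual information, while the invariant sites carry none.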

The GeneVector method uses the natural vector encoding method introduced in Zhao et al. (2022). This encodes each gene sequence as a 92-dimensional vector, with the following entries:

N(S) = (n_A, n_C, n_G, n_T,
        \mu_A, \mu_C, \mu_G, \mu_T,
        D_2^A, D_2^C, D_2^G, D_2^T,
        n_{AA}, n_{AC}, \dots, n_{TT},
        n_{AAA}, n_{AAC}, \dots, n_{TTT})

Here n_X is the raw total count of nucleotide X (or of the di- or trinucleotide X). For single nucleotides, we also calculate \mu_X, the mean location of nucleotide X, and D_2^X, the second moment of the location of nucleotide X. The di/trinucleotide counts are excluded by default, leaving the 12 single-nucleotide entries, but can be included using the extended=TRUE argument. In benchmarking, the extended counts gave minimal gains in accuracy at the cost of slower runtime. The overall natural vector for a COG is calculated as the normalized mean of the natural vectors of all its component gene sequences. Interaction scores are computed as Pearson's R between the natural vectors of each pair of COGs.
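
The default encoding (single-nucleotide entries only) can be sketched in base R as follows. This is an illustration, not EvoWeaver's code. The helper names natural_vector and cog_vector are hypothetical. The second moment is assumed to be the central moment scaled by n_X times the sequence length, as is usual in the natural vector literature. "Normalized" is assumed to mean scaling to unit Euclidean length.

```r
# Hypothetical helper: 12-entry natural vector of one nucleotide sequence
natural_vector <- function(s) {
  chars <- strsplit(s, "")[[1]]
  n <- length(chars)
  stats <- sapply(c("A", "C", "G", "T"), function(nt) {
    pos <- which(chars == nt)               # locations of this nucleotide
    n_x <- length(pos)
    if (n_x == 0) return(c(n = 0, mu = 0, D2 = 0))
    mu <- mean(pos)                         # mean location
    # Assumption: normalized second central moment of the locations
    c(n = n_x, mu = mu, D2 = sum((pos - mu)^2) / (n_x * n))
  })
  # stats is a 3 x 4 matrix; flatten to (n_A..n_T, mu_A..mu_T, D2_A..D2_T)
  v <- as.vector(t(stats))
  names(v) <- paste(rep(c("n", "mu", "D2"), each = 4), colnames(stats), sep = "_")
  v
}

# Hypothetical helper: mean natural vector over all sequences in a COG
cog_vector <- function(seqs) {
  m <- rowMeans(sapply(seqs, natural_vector))
  m / sqrt(sum(m^2))  # assumption: "normalized" means unit Euclidean length
}

# Toy COGs, each with two gene sequences
cog_a <- cog_vector(c("ACGTACGT", "ACGTTCGT"))
cog_b <- cog_vector(c("GGCATTAC", "GGCATAAC"))
score <- cor(cog_a, cog_b)  # Pearson's R between the two COG vectors
```

Because Pearson's R is unchanged by positive rescaling of either vector, the choice of normalization in cog_vector does not affect the score in this sketch.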

The Ancestral method estimates coevolution from the correlation of residue mutations near the leaves of the respective gene trees.

Author(s)

Aidan Lakshman ahl27@pitt.edu

References

Martin, L. C., Gloor, G. B., Dunn, S. D. & Wahl, L. M., Using information theory to search for co-evolving residues in proteins. Bioinformatics, 2005. 21(22): 4116-4124.

Zhao, N., et al., Protein-protein interaction and non-interaction predictions using gene sequence natural vector. Communications Biology, 2022. 5: 652.

See Also

EvoWeaver

predict.EvoWeaver

EvoWeaver Phylogenetic Profiling Predictors

EvoWeaver Phylogenetic Structure Predictors

EvoWeaver Gene Organization Predictors


npcooley/SynExtend documentation built on Nov. 15, 2024, 3:02 p.m.