EvoWeaver-GOPreds | R Documentation |
EvoWeaver
incorporates four classes of prediction, each with multiple
methods and algorithms. Co-localization (Coloc) methods examine
conservation of relative location and relative orientation of genetic
regions within the genome.
predict.EvoWeaver
currently supports three Coloc methods:
'GeneDistance'
'MoransI'
'OrientationMI'
None.
All distance matrix methods require an EvoWeaver
object initialized
with gene locations using a four-number code. See EvoWeaver
for more information on input data types.
The built-in GeneDistance
examines relative location of genes within genomes
as evidence of interaction. For a given pair of genes, the score is given by
\sum_{G}e^{1-|dI_G|}
, where G
is the set of genomes and dI_G
is the
difference in index between the two genes in genome G
. Using gene index
instead of number of base pairs avoids bias introduced by gene and genome length.
If a given gene is found multiple times in the same genome, the maximal score across
all possible pairings for that gene is used. The score for a pair of gene groups
is the mean score of all gene pairings across the groups.
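
The GeneDistance score for a single pair of genes can be sketched in base R as follows. This is a minimal illustration of the formula above, not the package's implementation: the helper name gene_distance_score and the toy index lists are hypothetical, and it assumes that genomes lacking either gene contribute nothing to the sum.

```r
# Hypothetical helper, not part of the package: score one pair of genes
# from their index positions in each genome.
# idx1, idx2: named lists mapping genome name -> integer vector of gene
# indices (several values when the gene occurs more than once).
gene_distance_score <- function(idx1, idx2) {
  # Assumption: only genomes containing both genes are scored
  shared <- intersect(names(idx1), names(idx2))
  per_genome <- vapply(shared, function(g) {
    # Index differences for every pairing of copies in genome g
    dI <- abs(outer(idx1[[g]], idx2[[g]], "-"))
    # Keep the maximal score across all pairings
    max(exp(1 - dI))
  }, numeric(1))
  sum(per_genome)
}

# Toy data (made up): gene A occurs twice in genome2, gene B is absent
# from genome3.
geneA <- list(genome1 = 10L, genome2 = c(4L, 40L), genome3 = 7L)
geneB <- list(genome1 = 11L, genome2 = 42L)

gene_distance_score(geneA, geneB)  # 1 + exp(-1), about 1.368
```

Adjacent genes (index difference of 1) contribute exactly 1 per genome, and the contribution decays exponentially as the genes move apart.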
MoransI
measures the extent to which gene distances are preserved across a phylogeny. This function uses the same initial scoring scheme as GeneDistance
. The raw scores are passed into MoranI
to calculate spatial autocorrelation. "Space" is taken as e^{-C}
, where C
is the Cophenetic distance matrix calculated from the species tree of the inputs. As such, this method requires a species tree as input, which can be calculated from a set of gene trees using SuperTree
.
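
To illustrate the idea behind MoransI, the sketch below computes the textbook Moran's I statistic (Moran, 1950) with weights e^{-C}. It is an assumption-laden example rather than the package's MoranI routine: the helper morans_i, the toy cophenetic matrix, and the per-genome scores are all hypothetical. With a real species tree, C could be obtained with a function such as cophenetic().

```r
# Hypothetical helper: textbook Moran's I for values x observed at n
# sites, given an n x n weight matrix w.
morans_i <- function(x, w) {
  diag(w) <- 0        # a site is not its own neighbor
  z <- x - mean(x)    # centered values
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy cophenetic distances between four species (made-up numbers):
# species 1 and 2 are close relatives, as are species 3 and 4.
C <- matrix(c(0, 1, 6, 6,
              1, 0, 6, 6,
              6, 6, 0, 1,
              6, 6, 1, 0), nrow = 4, byrow = TRUE)
w <- exp(-C)          # "space" taken as e^{-C}

# Made-up per-genome scores for one gene pair
clustered <- c(1.0, 0.9, 0.1, 0.05)  # similar within each clade
scattered <- c(1.0, 0.1, 0.9, 0.05)  # unrelated to the tree

morans_i(clustered, w)  # positive: scores track the phylogeny
morans_i(scattered, w)  # negative: close relatives disagree
```

A positive value indicates that closely related genomes have similar gene distances, which is the kind of conservation this method looks for.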
OrientationMI
uses mutual information of the relative orientation of each pair of genes. Conservation of relative orientation between gene pairs has been shown to imply functional association in prior work (Korbel et al., 2004). This algorithm requires that the EvoWeaver
object be initialized with a four-number code, with the third number either 0
or 1
, denoting whether the gene is on the forward or reverse strand. The mutual information is calculated as:
\sum_{x \in X}\sum_{y \in Y}(-1)^{(x \neq y)}P_{(X,Y)}(x,y)\; \log\left(\frac{P_{(X,Y)}(x,y)}{P_X(x)P_Y(y)}\right)
Here X=Y=\{0,1\}
, x
is the direction of the gene with lower index, y
is the direction of the gene with higher index, and P_{(T)}(t)
is the probability of T=t
. Note that this is a weighted MI as introduced by Beckley and Wright (2021). The mutual information is augmented by the addition of a single pseudocount to each value, and normalized by the joint entropy of X,Y
. P-values are calculated using Fisher's Exact Test on the contingency table.
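
The weighted mutual information above can be sketched in base R as follows. This is an illustrative reading of the description, not the package's code: the helper weighted_orientation_mi is hypothetical, the pseudocount is assumed to be added to each cell of the 2 x 2 contingency table, and Fisher's Exact Test is assumed to run on the table without pseudocounts.

```r
# Hypothetical helper: weighted MI for two 0/1 orientation vectors with
# one entry per genome (x: gene with lower index, y: higher index).
weighted_orientation_mi <- function(x, y) {
  # 2 x 2 contingency table of strand combinations
  tab <- table(factor(x, levels = 0:1), factor(y, levels = 0:1))
  counts <- unclass(tab) + 1           # assumption: one pseudocount per cell
  p_xy <- counts / sum(counts)         # joint distribution P(x, y)
  p_x <- rowSums(p_xy)                 # marginal of X
  p_y <- colSums(p_xy)                 # marginal of Y
  wt <- matrix(c(1, -1, -1, 1), 2)     # (-1)^(x != y)
  mi <- sum(wt * p_xy * log(p_xy / outer(p_x, p_y)))
  joint_entropy <- -sum(p_xy * log(p_xy))
  list(score = mi / joint_entropy,
       # assumption: the test uses the raw table, without pseudocounts
       p.value = fisher.test(tab)$p.value)
}

# Toy orientations across ten genomes (made up)
x <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
same     <- weighted_orientation_mi(x, x)      # always the same strand
opposite <- weighted_orientation_mi(x, 1 - x)  # always opposite strands

same$score      # positive
opposite$score  # negative, same magnitude
same$p.value
```

The sign term is what distinguishes this from ordinary mutual information: conserved same-strand pairs score positively and conserved opposite-strand pairs score negatively, instead of both scoring as high MI.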
Aidan Lakshman ahl27@pitt.edu
Beckley, Andrew and E. S. Wright. Identification of antibiotic pairs that evade concurrent resistance via a retrospective analysis of antimicrobial susceptibility test results. The Lancet Microbe, 2021. 2(10): 545-554.
Korbel, J. O., et al., Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nature Biotechnology, 2004. 22(7): 911-917.
Moran, P. A. P., Notes on Continuous Stochastic Phenomena. Biometrika, 1950. 37(1): 17-23.
EvoWeaver
predict.EvoWeaver
EvoWeaver Phylogenetic Profiling Predictors
EvoWeaver Phylogenetic Structure Predictors
EvoWeaver Sequence-Level Predictors