EvoWeaver-GOPreds: Gene Organization Predictions for EvoWeaver

EvoWeaver-GOPreds R Documentation

Gene Organization Predictions for EvoWeaver

Description

EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Co-localization (Coloc) methods examine conservation of relative location and relative orientation of genetic regions within the genome.

predict.EvoWeaver currently supports three Coloc methods:

  • 'GeneDistance'

  • 'MoransI'

  • 'OrientationMI'

Format

None.

Details

All Coloc methods require an EvoWeaver object initialized with gene locations using a four-number code. See EvoWeaver for more information on input data types.

The built-in GeneDistance examines relative location of genes within genomes as evidence of interaction. For a given pair of genes, the score is given by \sum_{g \in G} e^{1-|dI_g|}, where G is the set of genomes and dI_g is the difference in index between the two genes in genome g. Using gene index instead of number of base pairs avoids bias introduced by gene and genome length. If a given gene is found multiple times in the same genome, the maximal score across all possible pairings for that gene is used. The score for a pair of gene groups is the mean score of all gene pairings across the groups.
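The scoring rule above can be sketched in a few lines. This is an illustrative Python sketch, not the package's R implementation: the input layout (a dictionary mapping genome names to lists of gene indices) and the function names are hypothetical, genomes missing either gene are assumed to contribute nothing, and contig boundaries are ignored.

```python
import math

def pair_score(index_diff):
    """Contribution of a single genome: e^(1 - |dI|)."""
    return math.exp(1 - abs(index_diff))

def gene_distance_score(positions_a, positions_b):
    """Sum the per-genome scores for one pair of genes.

    positions_a, positions_b: dict mapping genome name -> list of gene
    indices (hypothetical layout). When a gene occurs several times in
    a genome, the best-scoring pairing is used.
    """
    total = 0.0
    # Assumption: only genomes that contain both genes contribute.
    for genome in positions_a.keys() & positions_b.keys():
        total += max(
            pair_score(i - j)
            for i in positions_a[genome]
            for j in positions_b[genome]
        )
    return total

# Made-up indices: adjacent genes in genome1, two copies of gene A in genome2.
gene_a = {"genome1": [3], "genome2": [10, 40], "genome3": [7]}
gene_b = {"genome1": [4], "genome2": [12], "genome4": [1]}
score = gene_distance_score(gene_a, gene_b)
```

With this form, genes at adjacent indices contribute exactly 1 per genome, and the contribution falls off exponentially as the index difference grows.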

MoransI measures the extent to which gene distances are preserved across a phylogeny. This method uses the same initial scoring scheme as GeneDistance. The raw scores are passed into MoranI to calculate spatial autocorrelation. "Space" is taken as e^{-C}, where C is the cophenetic distance matrix calculated from the species tree of the inputs. As such, this method requires a species tree as input, which can be calculated from a set of gene trees using SuperTree.
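As a rough illustration of the statistic, the following Python sketch computes the textbook Moran's I with weights e^{-C}. It is not the package's MoranI function: the nested-list input, the zeroed self-weights, and the made-up distances and scores are all assumptions of this sketch, and no significance test is included.

```python
import math

def morans_i(values, coph):
    """Textbook Moran's I with spatial weights w_ij = exp(-C_ij).

    values: one raw score per species.
    coph:   square cophenetic distance matrix as nested lists,
            in the same species order as `values`.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined for constant values")
    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-weights are zero
            w = math.exp(-coph[i][j])
            weight_sum += w
            num += w * dev[i] * dev[j]
    return (n / weight_sum) * (num / denom)

# Four species in two clades: (A, B) and (C, D). Distances are made up.
coph = [
    [0.0, 0.2, 2.0, 2.0],
    [0.2, 0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0, 0.2],
    [2.0, 2.0, 0.2, 0.0],
]
clustered = morans_i([1.0, 0.9, 0.1, 0.2], coph)  # scores follow the clades
scattered = morans_i([1.0, 0.1, 0.9, 0.2], coph)  # scores ignore the clades
```

In this toy example the statistic is positive when closely related species have similar scores and negative when they do not, which is the signal the method looks for.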

OrientationMI uses mutual information of the relative orientation of each pair of genes. Conservation of relative orientation between gene pairs has been shown to imply functional association in prior work (Korbel et al., 2004). This algorithm requires that the EvoWeaver object is initialized with a four-number code, with the third number either 0 or 1, denoting whether the gene is on the forward or reverse strand. The mutual information is calculated as:

\sum_{x \in X}\sum_{y \in Y}(-1)^{(x \neq y)}P_{(X,Y)}(x,y)\; \log\left(\frac{P_{(X,Y)}(x,y)}{P_X(x)P_Y(y)}\right)

Here X=Y=\{0,1\}, x is the direction of the gene with lower index, y is the direction of the gene with higher index, and P_{(T)}(t) is the probability of T=t. Note that this is a weighted MI as introduced by Beckley and Wright (2021). The mutual information is augmented by the addition of a single pseudocount to each value, and normalized by the joint entropy of X,Y. P-values are calculated using Fisher's Exact Test on the contingency table.
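The weighted mutual information can be sketched from a 2x2 table of orientation counts. This is an illustrative Python sketch, not the package's R implementation: the table layout and function name are hypothetical, the pseudocount is assumed to be added to each cell of the table, and natural logarithms are used (the base cancels after normalization by the joint entropy). The Fisher's Exact Test step is not reproduced.

```python
import math

def orientation_mi(table):
    """Weighted MI of gene orientations, normalized by joint entropy.

    table[x][y]: number of genomes in which the lower-index gene has
    direction x and the higher-index gene has direction y (0 or 1).
    """
    # Assumption: one pseudocount is added to every cell.
    counts = [[table[x][y] + 1 for y in (0, 1)] for x in (0, 1)]
    total = sum(sum(row) for row in counts)
    p = [[c / total for c in row] for row in counts]
    px = [p[x][0] + p[x][1] for x in (0, 1)]  # marginal of X
    py = [p[0][y] + p[1][y] for y in (0, 1)]  # marginal of Y

    weighted_mi = 0.0
    joint_entropy = 0.0
    for x in (0, 1):
        for y in (0, 1):
            sign = 1.0 if x == y else -1.0  # the (-1)^(x != y) weight
            weighted_mi += sign * p[x][y] * math.log(p[x][y] / (px[x] * py[y]))
            joint_entropy -= p[x][y] * math.log(p[x][y])
    return weighted_mi / joint_entropy

same = orientation_mi([[10, 0], [0, 10]])      # always the same strand
opposite = orientation_mi([[0, 10], [10, 0]])  # always opposite strands
mixed = orientation_mi([[5, 5], [5, 5]])       # no association
```

Under these assumptions the sign separates the two kinds of conservation: consistently matching orientations give a positive score, consistently opposite orientations give a negative score of the same magnitude, and a table with no association scores zero.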

Author(s)

Aidan Lakshman ahl27@pitt.edu

References

Beckley, A. and E. S. Wright, Identification of antibiotic pairs that evade concurrent resistance via a retrospective analysis of antimicrobial susceptibility test results. The Lancet Microbe, 2021. 2(10): 545-554.

Korbel, J. O., et al., Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nature Biotechnology, 2004. 22(7): 911-917.

Moran, P. A. P., Notes on Continuous Stochastic Phenomena. Biometrika, 1950. 37(1): 17-23.

See Also

EvoWeaver

predict.EvoWeaver

EvoWeaver Phylogenetic Profiling Predictors

EvoWeaver Phylogenetic Structure Predictors

EvoWeaver Sequence-Level Predictors


npcooley/SynExtend documentation built on Nov. 15, 2024, 3:02 p.m.