EvoWeaver-GOPreds: Gene Organization Predictions for EvoWeaver

Description

EvoWeaver incorporates four classes of prediction, each with multiple methods and algorithms. Co-localization (Coloc) methods examine conservation of relative location and transcriptional direction of genetic regions within the genome.

predict.EvoWeaver currently supports three Coloc methods:

  • 'Coloc'

  • 'ColocMoran'

  • 'TranscripMI'

Details

All Coloc methods require an EvoWeaver object initialized with gene locations using a three or four number code. See EvoWeaver for more information on input data types.

The built-in Coloc examines relative location of genes within genomes as evidence of interaction. For a given pair of genes, the score is given by \sum_{G}e^{1-|dI_G|}, where G is the set of genomes and dI_G is the difference in index between the two genes in genome G. Using gene index instead of number of base pairs avoids bias introduced by gene and genome length.
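The score above can be illustrated with a minimal base R sketch. The helper coloc_score and its inputs are hypothetical and are not part of SynExtend; the sketch assumes both genes are present exactly once in every genome considered, and it may differ from the package's internal implementation.

```r
# Hypothetical helper (not a SynExtend function): evaluates
# sum over genomes of exp(1 - |dI_G|) for one pair of genes.
# idx_a, idx_b: gene indices of the two genes, one entry per genome.
coloc_score <- function(idx_a, idx_b) {
  stopifnot(length(idx_a) == length(idx_b))
  d_index <- abs(idx_a - idx_b)  # |dI_G| for each genome G
  sum(exp(1 - d_index))          # adjacent genes (|dI_G| = 1) contribute 1
}

# Made-up gene indices in three genomes
gene_a <- c(genome1 = 10, genome2 = 4, genome3 = 25)
gene_b <- c(genome1 = 11, genome2 = 6, genome3 = 40)
score <- coloc_score(gene_a, gene_b)
```

Under this formula each genome's contribution decays exponentially with the index difference, so the distant pair in genome3 adds almost nothing to the total.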

ColocMoran measures the extent to which gene distances are preserved across a phylogeny. This function uses the same initial scoring scheme as Coloc, but can also handle paralogs. The raw scores are passed into MoransI to calculate spatial autocorrelation. "Space" is taken as e^{-C}, where C is the cophenetic distance matrix calculated from the species tree of the inputs. As such, this method requires a species tree as input, which can be calculated from a set of gene trees using SuperTree.
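As a rough illustration of the autocorrelation step, the sketch below computes the standard Moran's I statistic (Moran, 1950) in base R with weights e^{-C}. The helper morans_i, the cophenetic matrix, and the per-genome scores are all hypothetical; this is not the MoransI function from SynExtend, whose arguments and return value may differ.

```r
# Hypothetical helper (not SynExtend's MoransI): standard Moran's I,
#   I = (N / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
# with self-weights w_ii set to zero and W the sum of all weights.
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  diag(w) <- 0
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Made-up cophenetic distances for four genomes forming two clades
C <- matrix(c(0, 1, 4, 4,
              1, 0, 4, 4,
              4, 4, 0, 1,
              4, 4, 1, 0), nrow = 4, byrow = TRUE)

# Made-up per-genome colocalization scores for one gene pair
scores <- c(1.0, 0.9, 0.1, 0.2)

w <- exp(-C)  # "space" taken as e^{-C}
i_stat <- morans_i(scores, w)
```

In this toy example closely related genomes have similar scores, so the statistic is positive; scores unrelated to the phylogeny would give a value near -1/(N-1).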

TranscripMI uses mutual information of the transcriptional direction of each pair of genes. Conservation of relative transcriptional direction between gene pairs has been shown to imply functional association (Korbel et al., 2004). This algorithm requires that the EvoWeaver object be initialized with a four number code, with the third number either 0 or 1, denoting whether the gene is on the forward or reverse strand. The mutual information is calculated as:

\sum_{x \in X}\sum_{y \in Y}(-1)^{(x \neq y)}P_{(X,Y)}(x,y)\; \log\left(\frac{P_{(X,Y)}(x,y)}{P_X(x)P_Y(y)}\right)

Here X=Y=\{0,1\}, x is the direction of the gene with lower index, y is the direction of the gene with higher index, and P_{(T)}(t) is the probability of T=t. Note that this is a weighted MI as introduced by Beckley and Wright (2021). The mutual information is augmented by the addition of a single pseudocount to each value, and normalized by the joint entropy of X and Y. P-values are calculated using Fisher's exact test on the contingency table.
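The calculation described above can be sketched in base R as follows. The helper transcrip_mi is hypothetical and is not the package's implementation. Two details are assumptions rather than statements from this page: the pseudocount is added to each cell of the 2x2 contingency table, and Fisher's exact test is run on the raw counts without pseudocounts.

```r
# Hypothetical helper (not a SynExtend function).
# x, y: 0/1 direction codes of the lower- and higher-index gene,
#       one entry per genome.
transcrip_mi <- function(x, y, pseudocount = 1) {
  stopifnot(length(x) == length(y), all(x %in% 0:1), all(y %in% 0:1))
  # 2x2 contingency table: rows are x = 0, 1 and columns are y = 0, 1
  counts <- matrix(c(sum(x == 0 & y == 0), sum(x == 1 & y == 0),
                     sum(x == 0 & y == 1), sum(x == 1 & y == 1)),
                   nrow = 2)
  smoothed <- counts + pseudocount            # assumption: one pseudocount per cell
  p_xy <- smoothed / sum(smoothed)            # joint distribution P(x, y)
  p_x <- rowSums(p_xy)                        # marginal P_X
  p_y <- colSums(p_xy)                        # marginal P_Y
  sign_w <- matrix(c(1, -1, -1, 1), nrow = 2) # (-1)^(x != y)
  mi <- sum(sign_w * p_xy * log(p_xy / outer(p_x, p_y)))
  joint_entropy <- -sum(p_xy * log(p_xy))
  list(score = mi / joint_entropy,
       p.value = fisher.test(counts)$p.value) # assumption: test on raw counts
}

# Made-up directions for one gene pair across six genomes
dir_low  <- c(0, 0, 1, 1, 0, 1)
dir_high <- c(0, 0, 1, 1, 0, 1)
same_dir <- transcrip_mi(dir_low, dir_high)
opp_dir  <- transcrip_mi(dir_low, 1 - dir_high)
```

Because of the (-1)^{(x \neq y)} weight, a pair that is consistently transcribed in the same direction receives a positive score in this sketch, while a pair that is consistently transcribed in opposite directions receives a negative score of the same magnitude.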

Value

None.

Author(s)

Aidan Lakshman ahl27@pitt.edu

References

Beckley, A. and E. S. Wright, Identification of antibiotic pairs that evade concurrent resistance via a retrospective analysis of antimicrobial susceptibility test results. The Lancet Microbe, 2021. 2(10): 545-554.

Korbel, J. O., et al., Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nature Biotechnology, 2004. 22(7): 911-917.

Moran, P. A. P., Notes on Continuous Stochastic Phenomena. Biometrika, 1950. 37(1): 17-23.

See Also

EvoWeaver

predict.EvoWeaver

EvoWeaver Phylogenetic Profiling Predictors

EvoWeaver Phylogenetic Structure Predictors

EvoWeaver Sequence-Level Predictors


npcooley/SynExtend documentation built on May 2, 2024, 7:28 p.m.