EvoWeaver: Predicting Protein Functional Association Networks

View source: R/EvoWeaver-class.R

EvoWeaver    R Documentation


Description

EvoWeaver is an S3 class with methods for predicting functional associations from protein or gene data. EvoWeaver implements multiple algorithms for analyzing coevolutionary signal between genes, and combines their results into overall predictions of functional association. For details on predictions, see predict.EvoWeaver.

Usage

EvoWeaver(ListOfData, MySpeciesTree=NULL, NoWarn=FALSE)

## S3 method for class 'EvoWeaver'
SpeciesTree(ew, Verbose=TRUE, Processors=1L)

Arguments

ListOfData

A list of gene data, where each entry corresponds to information on a particular gene. The list must contain either dendrograms or vectors, not a mixture of the two. If the list is composed of dendrograms, each dendrogram is the gene tree for the corresponding entry. If the list is composed of vectors, each vector should be a numeric or character vector denoting the genomes that contain that gene.

MySpeciesTree

An object of class 'dendrogram' representing the overall species tree for the genomes represented in ListOfData.

NoWarn

Several algorithms depend on having certain data. When an EvoWeaver object is initialized, it automatically selects which algorithms can be used given the input data. By default, EvoWeaver issues warnings to notify the user of algorithms that cannot be used. Setting NoWarn=TRUE suppresses these messages.

ew

An object of class EvoWeaver

Verbose

Should output be displayed when calculating species tree?

Processors

Number of processors to use. Set to NULL to automatically use the maximum number of processors.

Details

EvoWeaver expects input data to be a list. All entries must be one of the following:

  1. ListOfData[[i]] = c('ID#1', 'ID#2', ..., 'ID#k')

  2. (a) ListOfData[[i]] = c('i1_d1_p1', 'i2_d2_p2', ..., 'ik_dk_pk')

    (b) ListOfData[[i]] = c('i1_d1_s1_p1', 'i2_d2_s2_p2', ..., 'ik_dk_sk_pk')

  3. ListOfData[[i]] = dendrogram(...)

In (1), each ID#i corresponds to the unique identifier for genome #i. For entry #j in the list, the presence of 'ID#i' means genome #i has an ortholog for gene/protein #j.
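To make the encoding in (1) concrete, the base-R sketch below converts a list in that form into a genome-by-gene presence/absence matrix. It only illustrates what the list encodes; it is not how EvoWeaver stores data internally, and the helper name presenceMatrix is hypothetical.

```r
# Hypothetical helper (not part of SynExtend): turn a case-(1) list into a
# logical matrix with one row per genome and one column per gene.
presenceMatrix <- function(ListOfData) {
  genomes <- sort(unique(unlist(ListOfData)))
  # Each column records which genomes contain that gene
  m <- sapply(ListOfData, function(ids) genomes %in% ids)
  # sapply() returns a plain vector when there is only one genome,
  # so rebuild the matrix shape explicitly
  matrix(m, nrow = length(genomes),
         dimnames = list(genomes, names(ListOfData)))
}

# Toy data matching the Examples section: genes a-e across genomes 1-4
l <- list(a = c('1', '3', '4'), b = c('1', '3'), c = c('1', '2'),
          d = c('1', '2'), e = c('2', '3', '4'))
pa <- presenceMatrix(l)
print(pa)
```

Row sums give the number of genes per genome and column sums give the number of genomes carrying each gene.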

Case (2a) is the same as (1), but with a slightly different label format. Each entry has the form i_d_p, where i is the unique identifier for the genome, d is the chromosome on which the ortholog is located, and p is the position of the ortholog on that chromosome. p must be numeric, while the other fields can be any value.
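As a quick check of the i_d_p layout, the base-R sketch below splits case-(2a) labels into their three fields and enforces that the position is numeric. The helper parseLabels2a is hypothetical, not EvoWeaver's own parser, and it assumes that genome and chromosome identifiers contain no underscores.

```r
# Hypothetical helper (not part of SynExtend): split case-(2a) labels of the
# form genome_chromosome_position into their three fields.
# Assumes the genome and chromosome identifiers contain no underscores.
parseLabels2a <- function(labs) {
  parts <- strsplit(labs, '_', fixed = TRUE)
  if (!all(lengths(parts) == 3L)) stop('each label must have exactly 3 fields')
  pos <- suppressWarnings(as.numeric(vapply(parts, `[`, character(1), 3L)))
  if (anyNA(pos)) stop('position field must be numeric')
  data.frame(genome     = vapply(parts, `[`, character(1), 1L),
             chromosome = vapply(parts, `[`, character(1), 2L),
             position   = pos,
             stringsAsFactors = FALSE)
}

# Labels for gene 'a' from the Examples section
parsed <- parseLabels2a(c('1_1_1', '3_1_2', '4_1_1'))
print(parsed)
```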

Case (2b) is a variation on (2a) that adds a strand identifier s. This value must be 0 or 1, indicating whether the gene is on the forward or reverse strand. Whether 0 denotes forward or reverse is inconsequential as long as the convention is consistent.
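When annotations are already in tabular form, case-(2b) labels can be assembled with paste(). The sketch below is a hypothetical helper, makeLabels2b, that is not part of SynExtend; it checks the 0/1 strand rule and the numeric-position rule before joining the fields in the order i_d_s_p.

```r
# Hypothetical helper (not part of SynExtend): assemble case-(2b) labels
# genome_chromosome_strand_position from per-gene annotation columns.
makeLabels2b <- function(genome, chromosome, strand, position) {
  # The strand field must be 0 or 1; which value means forward is up to the
  # user, as long as the same convention is used throughout.
  stopifnot(all(strand %in% c(0, 1)), is.numeric(position))
  paste(genome, chromosome, strand, position, sep = '_')
}

# Gene 'a' from the Examples section: genomes a, c, d, all on chromosome 1
labs <- makeLabels2b(genome = c('a', 'c', 'd'), chromosome = 1,
                     strand = c(0, 1, 0), position = c(1, 2, 1))
print(labs)
```

paste() recycles the single chromosome value across all genomes, which suits single-chromosome toy data like the Examples.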

Case (3) expects a gene tree for each gene, with leaves labeled by their source genome. If ListOfData is in this format, labels(ListOfData[[i]]) should return a character vector that matches the format of one of the previous cases.
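The base-R sketch below shows what this labeling requirement means in practice: it builds a toy dendrogram with stats::hclust() whose leaves carry case-(1) genome identifiers, then reads them back with labels(). The distances are made up purely to obtain a tree with the right leaves; a real analysis would use an inferred gene tree instead.

```r
# Build a toy dendrogram whose leaves are labeled as in case (1).
# The 1-D coordinates below are arbitrary; they only exist to give
# hclust() something to cluster.
genomesWithA <- c('1', '3', '4')
x <- setNames(c(0, 1, 3), genomesWithA)

# dist() keeps the vector names as labels, and hclust() carries them
# through to the leaves of the resulting tree
treeA <- as.dendrogram(hclust(dist(x)))

# labels() returns the leaf labels in plotting order
print(labels(treeA))
```

Leaves could equally be labeled with case-(2a) or case-(2b) strings, in which case labels() returns those strings.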

See the Examples section for illustrative examples.

Whenever possible, provide a full set of dendrogram objects with leaf labels in form (2b). This allows the largest number of algorithms to run. The following paragraphs describe in more detail which inputs enable which algorithms.

EvoWeaver requires input of case (3) to use distance matrix methods, and requires input of case (2) (or case (3) with leaves labeled according to case (2)) for gene organization analyses. Transcriptional direction analysis requires input of case (2b). Sequence-level methods require dendrograms with sequence information included as the state attribute of each leaf node.
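The input rules stated above can be summarized as a small base-R check. The helper inputSupports below is hypothetical and is not part of SynExtend: it only mirrors the rules in the text (EvoWeaver makes its own determination when an object is initialized), assumes identifiers contain no underscores, and does not check for the state attribute needed by sequence-level methods.

```r
# Hypothetical helper (not part of SynExtend): report which method families
# described in the text a given ListOfData could support.
# Assumes identifiers contain no underscores; sequence-level methods
# (leaf 'state' attributes) are not checked.
inputSupports <- function(ListOfData) {
  isDend <- vapply(ListOfData, inherits, logical(1), what = 'dendrogram')
  if (any(isDend) && !all(isDend))
    stop('ListOfData cannot mix dendrograms and vectors')
  # Collect every label, from dendrogram leaves or from the vectors directly
  labs <- unlist(lapply(ListOfData, function(g)
    if (inherits(g, 'dendrogram')) labels(g) else as.character(g)))
  fields <- strsplit(labs, '_', fixed = TRUE)
  n <- lengths(fields)
  lastField <- vapply(fields, function(f) f[length(f)], character(1))
  lastNumeric <- !is.na(suppressWarnings(as.numeric(lastField)))
  strandOk <- vapply(fields, function(f) f[3] %in% c('0', '1'), logical(1))
  is2a <- all(n == 3L & lastNumeric)
  is2b <- all(n == 4L & lastNumeric & strandOk)
  c(distanceMatrix   = all(isDend),
    geneOrganization = is2a || is2b,
    transcriptional  = is2b)
}

# Case (2b) labels from the Examples section
inputSupports(list(a = c('a_1_0_1', 'c_1_1_2', 'd_1_0_1'),
                   b = c('a_1_1_2', 'c_1_1_1')))
```

Because dendrogram leaves are parsed the same way as vectors, a list of gene trees with case-(2b) leaf labels reports support for all three families, matching the recommendation to provide such input whenever possible.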

Note that ALL entries must belong to the same category; a combination of character vectors and dendrograms is not allowed.

Prediction of a functional association network is done using predict(EvoWeaverObject). See predict.EvoWeaver for more information.

The SpeciesTree function takes an object of class EvoWeaver and returns a species tree. If the object was not initialized with a species tree, it calculates one using SuperTree. The species tree for an EvoWeaver object can be set with attr(ew, 'speciesTree') <- ....

Value

Returns an EvoWeaver object.

Author(s)

Aidan Lakshman ahl27@pitt.edu

See Also

predict.EvoWeaver, ExampleStreptomycesData, BuiltInEnsembles, SuperTree

Examples

## Here, 'gene' means either a gene or a protein

## Imagine we have the following 4 genomes:
## (each letter denotes a distinct gene)
##    Genome 1: a b c d
##    Genome 2: d c e
##    Genome 3: b a e 
##    Genome 4: a e

## We have 5 total genes: (a,b,c,d,e)
##    a is present in genomes 1, 3, 4
##    b is present in genomes 1, 3
##    c is present in genomes 1, 2
##    d is present in genomes 1, 2
##    e is present in genomes 2, 3, 4

## Constructing an EvoWeaver object according to (1):
l <- list()
l[['a']] <- c('1', '3', '4') 
l[['b']] <- c('1', '3') 
l[['c']] <- c('1', '2') 
l[['d']] <- c('1', '2') 
l[['e']] <- c('2', '3', '4') 

## Each value of the list corresponds to a gene
## The associated vector shows which genomes have that gene
pwCase1 <- EvoWeaver(l)

## Constructing an EvoWeaver object according to (2):
##  Here we need to add the chromosome and the position.
##  As we only have one chromosome,
##  we can just set that to 1 for all.
##  Positions can be taken from existing annotations,
##  or found with FindGenes(...) from DECIPHER.

## In this toy case, genomes are small so it's simple.
l <- list()
l[['a']] <- c('1_1_1', '3_1_2', '4_1_1') 
l[['b']] <- c('1_1_2', '3_1_1') 
l[['c']] <- c('1_1_3', '2_1_2') 
l[['d']] <- c('1_1_4', '2_1_1') 
l[['e']] <- c('2_1_3', '3_1_3', '4_1_2') 

pwCase2a <- EvoWeaver(l)

## If we want transcriptional information, we need a
## value corresponding to the strand of each gene (case 2b).
## Notice that the genome identifier need not be numeric,
## but the strand identifier must be 0 or 1.
l <- list()
l[['a']] <- c('a_1_0_1', 'c_1_1_2', 'd_1_0_1')
l[['b']] <- c('a_1_1_2', 'c_1_1_1')
l[['c']] <- c('a_1_1_3', 'b_1_0_2')
l[['d']] <- c('a_1_0_4', 'b_1_0_1')
l[['e']] <- c('b_1_0_3', 'c_1_0_3', 'd_1_0_2')

pwCase2b <- EvoWeaver(l)

## For Case 3, we need a dendrogram object for each gene
# l[['a']] <- dendrogram(...)
# l[['b']] <- dendrogram(...)
# l[['c']] <- dendrogram(...)
# l[['d']] <- dendrogram(...)
# l[['e']] <- dendrogram(...)

## Leaf labels for these will be the same as the 
##  entries in Case 1.

npcooley/SynExtend documentation built on May 2, 2024, 7:28 p.m.