annotateCluster: Each gene family (cluster of amino acid sequences) has...


Description

Each gene family (a cluster of amino acid sequences) has members that may in turn have InterPro annotations, supplied in the argument 'ipr.annos'. These annotations are filtered to retain only those that are NOT contained in others and NOT parents of others, where 'others' refers to the InterPro annotations found for genes of the same cluster. The family 'fam' is then assigned the most frequent InterPro annotation(s) of type family, provided their frequency is at least 0.5. If no such frequent InterPro family annotation is found, the most frequent InterPro annotation(s), regardless of type, are used instead. In most cases these are of type domain.
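The frequency step described above can be sketched roughly as follows. This is an illustration only, not the function's actual implementation: the toy data are hypothetical, the frequency is assumed to be the share of family members carrying an entry, and both the containment/parent filter and the check on the InterPro entry type are left out.

```r
# Hypothetical toy data: V1 holds gene accessions, V2 holds InterPro entries.
ipr.annos <- data.frame(
  V1 = c("geneA", "geneA", "geneB", "geneC", "geneD"),
  V2 = c("IPR000001", "IPR000002", "IPR000001", "IPR000001", "IPR000002"),
  stringsAsFactors = FALSE
)
fam <- c("geneA", "geneB", "geneC", "geneD")

# Keep only the annotations of the family's members, without duplicated rows.
fam.annos <- unique(ipr.annos[ipr.annos$V1 %in% fam, c("V1", "V2")])

# Assumption: frequency = number of members with the entry / family size.
counts <- table(fam.annos$V2)
ipr.freqs <- setNames(as.vector(counts) / length(fam), names(counts))

# Entries reaching the 0.5 threshold, and the most frequent among them.
frequent <- ipr.freqs[ipr.freqs >= 0.5]
most.frequent <- names(frequent)[frequent == max(frequent)]
```

If 'frequent' were empty, the fallback described above would instead pick the most frequent entry from 'ipr.freqs' without applying the threshold.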

Usage

annotateCluster(fam, ipr.annos, interpro.database)

Arguments

fam

a character vector of gene accessions

ipr.annos

a data.frame with at least two columns (named "V1" and "V2"), the first holding the gene accessions and the second holding the annotated InterPro entry. Note that it is highly important that this data.frame has unique rows: duplicated entries will bias the frequency estimation. Use unique(ipr.annos) if necessary.

interpro.database

the database of InterPro entries as parsed from the InterPro XML document 'interpro.xml'. The format is a named list of named lists. See parseInterProXML(...) for more details.
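As a small illustration of the uniqueness requirement on 'ipr.annos', the following sketch checks a data.frame for duplicated rows and removes them before it would be passed on. The accessions are made up for the example.

```r
# Hypothetical annotation table in which the first row appears twice.
ipr.annos <- data.frame(
  V1 = c("geneA", "geneA", "geneB"),
  V2 = c("IPR000001", "IPR000001", "IPR000001"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns the index of the first duplicated row, or 0 if none.
has.duplicates <- anyDuplicated(ipr.annos) > 0

# Drop duplicated rows so that each gene/entry pair is counted only once.
ipr.annos <- unique(ipr.annos)
```

Without this step, geneA would contribute twice to the count for IPR000001 and inflate its frequency.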

Value

A list of InterPro accessions to be interpreted as the annotation of the gene family passed in argument 'fam'. The list includes each annotation's frequency and short description.


groupschoof/AHRD_on_gene_clusters documentation built on May 17, 2019, 8:38 a.m.