getGENEONTOLOGY: Find Gene Ontology (GO) annotation

Description

Takes a vector of probe set identifiers and an annotation table and retrieves the corresponding GO annotation.

Usage

getGENEONTOLOGY(ps, annot, diagnose = FALSE, specifics = 0, GOcol = 31, noGOsymbol = NA, noGOprovidedSymbol = "---", sep = " /// ")

Arguments

ps

character vector containing the probe set identifiers.

annot

annotation table (data frame) where each row is a record and each column is an annotation field.

diagnose

logical. If TRUE, three logical vectors used for diagnostic purposes are returned in addition to the annotation. If FALSE (default), only the annotation is returned.

specifics

non-negative integer (0, 1, 2, 3, ...). If specifics=i with i>0, each GO annotation is parsed (using " // " as separator) and the i-th part of the expression is returned. If specifics=0 (default), the GO annotation is not parsed.

GOcol

index of the column in the annotation table containing the GO annotation (default 31, the GO biological processes in Affymetrix annotation files).

noGOsymbol

character string to be used in output list 'go' if no GO biological process is found or provided in the annotation table.

noGOprovidedSymbol

character string used in annotation table and indicating missing GO biological process.

sep

character string used in annotation table to separate multiple GO biological processes.

Details

This function can be used with Affymetrix annotation files (e.g. 'HG-U133_Plus_2_annot.csv'). It retrieves the GO annotation corresponding to particular probe set identifiers. GO biological processes are returned by default ('GOcol'=31), but GO cellular components ('GOcol'=32) or GO molecular functions ('GOcol'=33) can be returned by setting 'GOcol' appropriately.

GO biological processes are returned as elements of list 'go'. If multiple GO biological processes are provided for 'ps[i]' (with 'sep' separating GO biological processes in the annotation table), a vector containing all GO biological processes is returned as the 'i-th' element of list 'go'.
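The splitting described above can be illustrated with base R alone. The following is a minimal sketch of the idea, not the package's internal code; the variable names 'field' and 'terms' are made up for illustration, and the two annotation strings are taken from the example output on this page.

```r
# One raw annotation field holding two GO biological processes,
# joined with the Affymetrix multi-value separator " /// ".
field <- paste(
  "6457 // protein folding // inferred from electronic annotation",
  "6986 // response to unfolded protein // traceable author statement",
  sep = " /// "
)

# fixed = TRUE treats the separator as a literal string, not a regex.
terms <- strsplit(field, " /// ", fixed = TRUE)[[1]]

# 'terms' is now a character vector with one element per GO process,
# which is the shape of a single element of the list 'go'.
print(terms)
```

Splitting on 'sep' has to happen before any splitting on " // ", since each resulting element still carries its three attributes.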

The default values for 'GOcol', 'noGOsymbol', 'noGOprovidedSymbol' and 'sep' are chosen to suit the format of Affymetrix annotation files. However, options can be set to look up any annotation table, provided the probe set identifiers are in the first column and occur only once.

Note that each GO annotation in Affymetrix annotation files contains 3 attributes: the GO biological process ID, term and quality, separated by " // ". Setting the option 'specifics' to 1, 2, or 3 retrieves the corresponding attribute on its own.
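To make the role of 'specifics' concrete, the sketch below extracts the i-th attribute from already separated GO annotations using base R. It assumes the " // " layout described above; 'entries', 'parts' and 'pick' are hypothetical names used only for this illustration and are not part of annotationTools.

```r
# Two GO annotations as they appear after splitting on " /// "
# (strings taken from the example output on this page).
entries <- c(
  "6457 // protein folding // inferred from electronic annotation",
  "6986 // response to unfolded protein // traceable author statement"
)

# Split every annotation into its three attributes: ID, term, quality.
parts <- strsplit(entries, " // ", fixed = TRUE)

# Hypothetical helper: return the i-th attribute of each annotation,
# mirroring what specifics = i is documented to do.
pick <- function(parts, i) vapply(parts, function(p) p[i], character(1))

go_ids   <- pick(parts, 1)  # analogous to specifics = 1
go_terms <- pick(parts, 2)  # analogous to specifics = 2
go_qual  <- pick(parts, 3)  # analogous to specifics = 3

print(go_terms)
```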

Value

go

list of length 'length(ps)' the 'i'-th element of which contains the GO annotation for 'ps[i]'.

empty

logical vector of length 'length(ps)'. 'empty[i]' is TRUE if 'ps[i]' is empty or NA.

noentry

logical vector of length 'length(ps)'. 'noentry[i]' is TRUE if 'ps[i]' cannot be found in the first column of the annotation table.

nogo

logical vector of length 'length(ps)'. 'nogo[i]' is TRUE if 'go[i]==noGOprovidedSymbol' is TRUE.
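The meaning of the three diagnostic vectors can be reproduced on a toy table with base R. This is a sketch under stated assumptions, not the function's actual implementation: the data frame 'annot', its identifiers and its GO strings are invented for illustration, and "---" stands in for the default 'noGOprovidedSymbol'.

```r
# Toy annotation table (hypothetical data): probe set IDs in column 1,
# GO annotation in column 2, "---" marking a missing GO annotation.
annot <- data.frame(
  id = c("a_at", "b_at", "c_at"),
  go = c("1 // x // iea /// 2 // y // tas", "3 // z // iea", "---"),
  stringsAsFactors = FALSE
)
ps <- c("a_at", "c_at", NA, "zzz_at")

# empty: the probe set identifier itself is NA or an empty string.
empty <- is.na(ps) | ps == ""

# noentry: a non-empty identifier that is absent from the first column.
row <- match(ps, annot[[1]])
noentry <- !empty & is.na(row)

# nogo: the identifier was found, but the table holds the missing marker.
raw <- annot[[2]][row]
nogo <- !is.na(raw) & raw == "---"

# go: NA by default, otherwise the annotation split on " /// ".
go <- as.list(rep(NA_character_, length(ps)))
ok <- !is.na(raw) & !nogo
go[ok] <- strsplit(raw[ok], " /// ", fixed = TRUE)

print(list(go = go, empty = empty, noentry = noentry, nogo = nogo))
```

Each failed lookup is flagged by exactly one of the three vectors, which is what makes them useful for tracing why an element of 'go' is NA.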

Note

getANNOTATION provides a more flexible solution to be used with arbitrary annotation tables.

Author(s)

Alexandre Kuhn

References

Kuhn et al. Cross-species and cross-platform gene expression studies with the Bioconductor-compliant R package 'annotationTools'. BMC Bioinformatics, 9:26 (2008)

See Also

getANNOTATION

Examples

##example Affymetrix annotation file and its location
annotationFile<-system.file('extdata','HG-U133_Plus_2_annot_part.csv',package='annotationTools')

##load annotation file
annotation<-read.csv(annotationFile,colClasses='character',comment.char='#')

##get gene GO biological process (full information)
myPS<-c('117_at','1007_s_at','1552288_at',NA,'xyz_at')
getGENEONTOLOGY(myPS,annotation)

##get gene GO biological process terms only
getGENEONTOLOGY(myPS,annotation,specifics=2)

##track origin of annotation failure for the last 3 probe set IDs
getGENEONTOLOGY(myPS,annotation,diagnose=TRUE)

##GO molecular functions are contained in column 33 of the annotation
colnames(annotation)

##get gene GO molecular functions
getGENEONTOLOGY(myPS,annotation,GOcol=33)

Example output

[[1]]
[1] "6457 // protein folding // inferred from electronic annotation"             
[2] "6986 // response to unfolded protein // traceable author statement"         
[3] "6986 // response to unfolded protein // inferred from electronic annotation"

[[2]]
[1] "6468 // protein amino acid phosphorylation // inferred from electronic annotation"                              
[2] "7155 // cell adhesion // traceable author statement"                                                            
[3] "7169 // transmembrane receptor protein tyrosine kinase signaling pathway // inferred from electronic annotation"
[4] "7155 // cell adhesion // inferred from electronic annotation"                                                   

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getGENEONTOLOGY(myPS, annotation) :
  One or more empty probe sets in input
2: In getGENEONTOLOGY(myPS, annotation) :
  One or more probe sets not found in annotation
3: In getGENEONTOLOGY(myPS, annotation) :
  One or more ps with no GO term provided in annotation
[[1]]
[1] "protein folding"              "response to unfolded protein"
[3] "response to unfolded protein"

[[2]]
[1] "protein amino acid phosphorylation"                              
[2] "cell adhesion"                                                   
[3] "transmembrane receptor protein tyrosine kinase signaling pathway"
[4] "cell adhesion"                                                   

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getGENEONTOLOGY(myPS, annotation, specifics = 2) :
  One or more empty probe sets in input
2: In getGENEONTOLOGY(myPS, annotation, specifics = 2) :
  One or more probe sets not found in annotation
3: In getGENEONTOLOGY(myPS, annotation, specifics = 2) :
  One or more ps with no GO term provided in annotation
[[1]]
[[1]][[1]]
[1] "6457 // protein folding // inferred from electronic annotation"             
[2] "6986 // response to unfolded protein // traceable author statement"         
[3] "6986 // response to unfolded protein // inferred from electronic annotation"

[[1]][[2]]
[1] "6468 // protein amino acid phosphorylation // inferred from electronic annotation"                              
[2] "7155 // cell adhesion // traceable author statement"                                                            
[3] "7169 // transmembrane receptor protein tyrosine kinase signaling pathway // inferred from electronic annotation"
[4] "7155 // cell adhesion // inferred from electronic annotation"                                                   

[[1]][[3]]
[1] NA

[[1]][[4]]
[1] NA

[[1]][[5]]
[1] NA


[[2]]
[1] FALSE FALSE FALSE  TRUE FALSE

[[3]]
[1] FALSE FALSE FALSE FALSE  TRUE

[[4]]
[1] FALSE FALSE  TRUE FALSE FALSE

Warning messages:
1: In getGENEONTOLOGY(myPS, annotation, diagnose = TRUE) :
  One or more empty probe sets in input
2: In getGENEONTOLOGY(myPS, annotation, diagnose = TRUE) :
  One or more probe sets not found in annotation
3: In getGENEONTOLOGY(myPS, annotation, diagnose = TRUE) :
  One or more ps with no GO term provided in annotation
 [1] "Probe.Set.ID"                     "GeneChip.Array"                  
 [3] "Species.Scientific.Name"          "Annotation.Date"                 
 [5] "Sequence.Type"                    "Sequence.Source"                 
 [7] "Transcript.ID.Array.Design."      "Target.Description"              
 [9] "Representative.Public.ID"         "Archival.UniGene.Cluster"        
[11] "UniGene.ID"                       "Genome.Version"                  
[13] "Alignments"                       "Gene.Title"                      
[15] "Gene.Symbol"                      "Chromosomal.Location"            
[17] "Unigene.Cluster.Type"             "Ensembl"                         
[19] "Entrez.Gene"                      "SwissProt"                       
[21] "EC"                               "OMIM"                            
[23] "RefSeq.Protein.ID"                "RefSeq.Transcript.ID"            
[25] "FlyBase"                          "AGI"                             
[27] "WormBase"                         "MGI.Name"                        
[29] "RGD.Name"                         "SGD.accession.number"            
[31] "Gene.Ontology.Biological.Process" "Gene.Ontology.Cellular.Component"
[33] "Gene.Ontology.Molecular.Function" "Pathway"                         
[35] "Protein.Families"                 "Protein.Domains"                 
[37] "InterPro"                         "Trans.Membrane"                  
[39] "QTL"                              "Annotation.Description"          
[41] "Annotation.Transcript.Cluster"    "Transcript.Assignments"          
[43] "Annotation.Notes"                
[[1]]
[1] "166 // nucleotide binding // inferred from electronic annotation"
[2] "5524 // ATP binding // inferred from electronic annotation"      

[[2]]
[1] "166 // nucleotide binding // inferred from electronic annotation"                                      
[2] "4713 // protein-tyrosine kinase activity // inferred from electronic annotation"                       
[3] "4714 // transmembrane receptor protein tyrosine kinase activity // traceable author statement"         
[4] "4872 // receptor activity // inferred from electronic annotation"                                      
[5] "5524 // ATP binding // inferred from electronic annotation"                                            
[6] "16740 // transferase activity // inferred from electronic annotation"                                  
[7] "4672 // protein kinase activity // inferred from electronic annotation"                                
[8] "4714 // transmembrane receptor protein tyrosine kinase activity // inferred from electronic annotation"
[9] "16301 // kinase activity // inferred from electronic annotation"                                       

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getGENEONTOLOGY(myPS, annotation, GOcol = 33) :
  One or more empty probe sets in input
2: In getGENEONTOLOGY(myPS, annotation, GOcol = 33) :
  One or more probe sets not found in annotation
3: In getGENEONTOLOGY(myPS, annotation, GOcol = 33) :
  One or more ps with no GO term provided in annotation

annotationTools documentation built on Nov. 8, 2020, 6:58 p.m.