getPROBESET: Find probe set IDs
In annotationTools: Annotate microarrays and perform cross-species gene expression analyses using flat file databases


Takes a vector of gene IDs (or identifiers of other types) and an annotation table, and looks up the gene IDs in the table to retrieve the corresponding probe set identifiers. Each gene ID can occur multiple times (i.e. on multiple lines) in the annotation table.

getPROBESET(geneid, annot, uniqueID = FALSE, diagnose = FALSE, idCol = 19, noPSsymbol = NA, noPSprovidedSymbol = "---")

`geneid`	character vector containing the gene IDs.
`annot`	annotation table (data frame) where each row is a record and each column is an annotation field.
`uniqueID`	logical. If TRUE, only probe set IDs annotated with a single gene ID are returned. If FALSE, probe set IDs annotated with multiple gene IDs are returned too.
`diagnose`	logical. If TRUE, three logical vectors used for diagnostic purposes are returned in addition to the annotation. If FALSE (default), only the annotation is returned.
`idCol`	column in annotation table containing the gene identifiers.
`noPSsymbol`	character string to be used in output list 'ps' if no probe set ID is found or provided in the annotation table.
`noPSprovidedSymbol`	character string used in annotation table and indicating missing probe set ID.

This function can be used with Affymetrix annotation files (e.g. 'HG-U133\_Plus\_2\_annot.csv'). It retrieves probe set IDs corresponding to particular gene identifiers. By default, the function takes gene IDs but any type of identifier (e.g. gene symbol) can be used (set 'idCol' accordingly).
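Because 'idCol' is a column position, it can help to derive it from the column name rather than hard-coding a number. The following is a minimal sketch on a hypothetical toy table ('toyCols' and its contents are invented for illustration; only the column names are taken from the Affymetrix listing in the example output):

```r
## hypothetical toy table reusing three column names of the Affymetrix annotation file
toyCols <- data.frame(
  Probe.Set.ID = c('ps_1', 'ps_2'),
  Gene.Symbol  = c('GENE_A', 'GENE_B'),
  Entrez.Gene  = c('101', '102'),
  stringsAsFactors = FALSE
)

## position of the identifier column, looked up by name
symbolCol <- match('Gene.Symbol', colnames(toyCols))
entrezCol <- match('Entrez.Gene', colnames(toyCols))
```

The resulting position can then be passed as 'idCol'. match() returns NA if the column name is absent, which is worth checking before the lookup.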

Probe set IDs are returned as elements of list 'ps'. If multiple probe set IDs are found for 'geneid[i]', a vector containing all of them is returned as the 'i'-th element of list 'ps'.

The default values for 'idCol', 'noPSsymbol', and 'noPSprovidedSymbol' are chosen to suit the format of Affymetrix annotation files. However, options can be set to look up any annotation table, provided the probe set identifiers are in the first column.
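The lookup described above can be illustrated with a minimal base R sketch on a toy annotation table. This is not the package's implementation: the table 'toyAnnot' and the helper 'lookupProbeSets' are hypothetical, the helper only does exact matching on a single identifier per row, and it ignores 'uniqueID' and 'diagnose'. It merely mimics the conventions stated here (probe set IDs in the first column, identifiers in column 'idCol', '---' marking a missing probe set ID).

```r
## hypothetical toy annotation table: probe set IDs in column 1, gene symbols in column 2
toyAnnot <- data.frame(
  Probe.Set.ID = c('ps_1', 'ps_2', 'ps_3', '---'),
  Gene.Symbol  = c('GENE_A', 'GENE_B', 'GENE_A', 'GENE_C'),
  stringsAsFactors = FALSE
)

## hypothetical helper sketching the lookup (not the annotationTools code)
lookupProbeSets <- function(geneid, annot, idCol,
                            noPSsymbol = NA, noPSprovidedSymbol = '---') {
  lapply(geneid, function(g) {
    ## empty or missing input identifier
    if (is.na(g) || g == '') return(noPSsymbol)
    ## first-column entries of all rows whose identifier equals g
    ids  <- annot[[idCol]]
    hits <- annot[[1]][!is.na(ids) & ids == g]
    ## discard rows for which the table provides no probe set ID
    hits <- hits[hits != noPSprovidedSymbol]
    if (length(hits) == 0) noPSsymbol else hits
  })
}

ps <- lookupProbeSets(c('GENE_A', 'GENE_C', NA, 'XYZ'), toyAnnot, idCol = 2)
```

In this sketch 'GENE_A' occurs on two rows, so the first element of 'ps' is a vector of two probe set IDs, while the other three inputs (no probe set ID provided, missing input, identifier not in the table) all yield 'noPSsymbol'.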

`ps`	list of length 'length(geneid)' the 'i'-th element of which contains the probe set IDs for 'geneid[i]'.
`empty`	logical vector of length 'length(geneid)'. 'empty[i]' is TRUE if 'geneid[i]' is empty or NA.
`noentry`	logical vector of length 'length(geneid)'. 'noentry[i]' is TRUE if 'geneid[i]' cannot be found in column 'idCol' (default is column 19) of the annotation table.
`noid`	logical vector of length 'length(geneid)'. 'noid[i]' is TRUE if 'ps[i]==noPSprovidedSymbol' is TRUE.
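When 'diagnose=TRUE', the four components can be combined into one table with a row per input identifier. The sketch below does not call getPROBESET: the list 'res' is rebuilt by hand from the example output on this page, and accessing its elements by position assumes the order ps, empty, noentry, noid, which is how that output appears to be arranged. The name 'summaryTable' is hypothetical.

```r
## diagnostic result rebuilt by hand from the example output (assumed order: ps, empty, noentry, noid)
res <- list(
  list(c('1007_s_at', '207169_x_at'), '1255_g_at', '117_at', NA, NA),
  c(FALSE, FALSE, FALSE, TRUE,  FALSE),
  c(FALSE, FALSE, FALSE, FALSE, TRUE),
  c(FALSE, FALSE, FALSE, FALSE, FALSE)
)
myGenes <- c('DDR1', 'GUCA1A', 'HSPA6', NA, 'XYZ')

## collapse each element of 'ps' into one string, keeping NA as a true missing value
collapsePS <- function(x) {
  if (all(is.na(x))) NA_character_ else paste(x, collapse = ';')
}

## one row per input gene ID
summaryTable <- data.frame(
  geneid   = myGenes,
  probeset = sapply(res[[1]], collapsePS),
  empty    = res[[2]],
  noentry  = res[[3]],
  noid     = res[[4]],
  stringsAsFactors = FALSE
)
```

Such a table makes it easy to see at a glance why a given identifier produced no probe set ID.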

getMULTIANNOTATION provides a more flexible solution that can be used with arbitrary annotation tables.

Alexandre Kuhn

Kuhn et al. Cross-species and cross-platform gene expression studies with the Bioconductor-compliant R package 'annotationTools'. BMC Bioinformatics, 9:26 (2008)

getMULTIANNOTATION

##example Affymetrix annotation file and its location
annotationFile<-system.file('extdata','HG-U133_Plus_2_annot_part.csv',package='annotationTools')

##load annotation file
annotation<-read.csv(annotationFile,colClasses='character',comment.char='#')

##genes of interest
myGenes<-c('DDR1','GUCA1A','HSPA6',NA,'XYZ')

##column 15 in annotation contains gene symbols
colnames(annotation)

##find probe sets probing for particular genes 
getPROBESET(myGenes,annotation,idCol=15)

##find probe sets probing only for the genes of interest (i.e. with unique annotation)
getPROBESET(myGenes,annotation,idCol=15,uniqueID=TRUE)

##track origin of annotation failure for the last 2 gene IDs
getPROBESET(myGenes,annotation,idCol=15,diagnose=TRUE)

 [1] "Probe.Set.ID"                     "GeneChip.Array"                  
 [3] "Species.Scientific.Name"          "Annotation.Date"                 
 [5] "Sequence.Type"                    "Sequence.Source"                 
 [7] "Transcript.ID.Array.Design."      "Target.Description"              
 [9] "Representative.Public.ID"         "Archival.UniGene.Cluster"        
[11] "UniGene.ID"                       "Genome.Version"                  
[13] "Alignments"                       "Gene.Title"                      
[15] "Gene.Symbol"                      "Chromosomal.Location"            
[17] "Unigene.Cluster.Type"             "Ensembl"                         
[19] "Entrez.Gene"                      "SwissProt"                       
[21] "EC"                               "OMIM"                            
[23] "RefSeq.Protein.ID"                "RefSeq.Transcript.ID"            
[25] "FlyBase"                          "AGI"                             
[27] "WormBase"                         "MGI.Name"                        
[29] "RGD.Name"                         "SGD.accession.number"            
[31] "Gene.Ontology.Biological.Process" "Gene.Ontology.Cellular.Component"
[33] "Gene.Ontology.Molecular.Function" "Pathway"                         
[35] "Protein.Families"                 "Protein.Domains"                 
[37] "InterPro"                         "Trans.Membrane"                  
[39] "QTL"                              "Annotation.Description"          
[41] "Annotation.Transcript.Cluster"    "Transcript.Assignments"          
[43] "Annotation.Notes"                
[[1]]
[1] "1007_s_at"   "207169_x_at"

[[2]]
[1] "1255_g_at"

[[3]]
[1] "117_at"

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getPROBESET(myGenes, annotation, idCol = 15) :
  one or more empty gene ID in input
2: In getPROBESET(myGenes, annotation, idCol = 15) :
  one or more gene ID not found in annotation
[[1]]
[1] "1007_s_at"   "207169_x_at"

[[2]]
[1] "1255_g_at"

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getPROBESET(myGenes, annotation, idCol = 15, uniqueID = TRUE) :
  one or more empty gene ID in input
2: In getPROBESET(myGenes, annotation, idCol = 15, uniqueID = TRUE) :
  one or more gene ID not found in annotation
[[1]]
[[1]][[1]]
[1] "1007_s_at"   "207169_x_at"

[[1]][[2]]
[1] "1255_g_at"

[[1]][[3]]
[1] "117_at"

[[1]][[4]]
[1] NA

[[1]][[5]]
[1] NA


[[2]]
[1] FALSE FALSE FALSE  TRUE FALSE

[[3]]
[1] FALSE FALSE FALSE FALSE  TRUE

[[4]]
[1] FALSE FALSE FALSE FALSE FALSE

Warning messages:
1: In getPROBESET(myGenes, annotation, idCol = 15, diagnose = TRUE) :
  one or more empty gene ID in input
2: In getPROBESET(myGenes, annotation, idCol = 15, diagnose = TRUE) :
  one or more gene ID not found in annotation