getPROBESET: Find probe set IDs

Description

Takes a vector of gene IDs (or identifiers of other types) and an annotation table, and looks up the gene IDs in the table to retrieve the corresponding probe set identifiers. Each gene ID can occur multiple times (i.e. on multiple lines) in the annotation table.

Usage

getPROBESET(geneid, annot, uniqueID = FALSE, diagnose = FALSE, idCol = 19, noPSsymbol = NA, noPSprovidedSymbol = "---")

Arguments

geneid

character vector containing the gene IDs.

annot

annotation table (data frame) where each row is a record and each column is an annotation field.

uniqueID

logical. If TRUE, only probe set IDs annotated with a single gene ID are returned. If FALSE, probe set IDs annotated with multiple gene IDs are returned too.

diagnose

logical. If TRUE, three logical vectors used for diagnostic purposes are returned in addition to the annotation. If FALSE (default), only the annotation is returned.

idCol

column in annotation table containing the gene identifiers.

noPSsymbol

character string to be used in output list 'ps' if no probe set ID is found or provided in the annotation table.

noPSprovidedSymbol

character string used in annotation table and indicating missing probe set ID.

Details

This function can be used with Affymetrix annotation files (e.g. 'HG-U133_Plus_2_annot.csv'). It retrieves probe set IDs corresponding to particular gene identifiers. By default, the function takes gene IDs but any type of identifier (e.g. gene symbol) can be used (set 'idCol' accordingly).

Probe set IDs are returned as elements of list 'ps'. If multiple probe set IDs are found for 'geneid[i]', a vector containing all of them is returned as the 'i'-th element of list 'ps'.
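Because 'ps' is a list whose elements may have different lengths, it is often convenient to count the hits per gene ID or to flatten the list into a long table. The following is a minimal sketch in base R; the values of 'ps' and 'myGenes' are copied from the example output on this page, and the names 'nFound' and 'long' are illustrative only.

```r
# Result shape copied from the example output on this page (not computed here)
myGenes <- c("DDR1", "GUCA1A", "HSPA6", NA, "XYZ")
ps <- list(c("1007_s_at", "207169_x_at"), "1255_g_at", "117_at", NA, NA)

# Number of probe set IDs found per gene ID (NA placeholders count as 0)
nFound <- vapply(ps, function(x) sum(!is.na(x)), integer(1))

# Long format: one row per (gene ID, probe set ID) pair
long <- data.frame(
  geneid   = rep(myGenes, lengths(ps)),
  probeset = unlist(ps),
  stringsAsFactors = FALSE
)

# Drop the rows that only carry the NA placeholder
long <- long[!is.na(long$probeset), ]
```

This assumes 'noPSsymbol' was left at its default (NA); with a different placeholder, the filtering step would need to test for that value instead.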

The default values for 'idCol', 'noPSsymbol', and 'noPSprovidedSymbol' are chosen to suit the format of Affymetrix annotation files. However, options can be set to look up any annotation table, provided the probe set identifiers are in the first column.
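To make the roles of 'idCol', 'noPSsymbol' and 'noPSprovidedSymbol' concrete, here is a rough base R sketch of an exact-match lookup on a non-Affymetrix table. It is not the package's implementation: the helper 'lookupProbeSets', the toy table and its contents are all hypothetical, and fields listing several identifiers (the case 'uniqueID' deals with) are ignored.

```r
# Toy annotation table (hypothetical data): probe set IDs in column 1,
# gene symbols in column 2, "---" marking a missing probe set ID
annot <- data.frame(
  ProbeSet = c("ps_1", "ps_2", "ps_3", "---"),
  Symbol   = c("GENE_A", "GENE_B", "GENE_A", "GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT getPROBESET itself: exact matching only
lookupProbeSets <- function(geneid, annot, idCol,
                            noPSsymbol = NA, noPSprovidedSymbol = "---") {
  lapply(geneid, function(g) {
    # empty or NA input yields the placeholder
    if (is.na(g) || g == "") return(noPSsymbol)
    # probe set IDs (column 1) of all rows whose identifier equals g
    hits <- annot[[1]][which(annot[[idCol]] == g)]
    # discard entries flagged as "no probe set ID provided"
    hits <- hits[hits != noPSprovidedSymbol]
    if (length(hits) == 0) noPSsymbol else hits
  })
}

ps <- lookupProbeSets(c("GENE_A", "GENE_C", "GENE_X", NA), annot, idCol = 2)
```

With annotationTools installed, the corresponding call on such a table would be along the lines of getPROBESET(ids, annot, idCol = 2), with 'noPSprovidedSymbol' set to whatever marker the table uses for missing probe set IDs.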

Value

ps

list of length 'length(geneid)', the 'i'-th element of which contains the probe set IDs for 'geneid[i]'.

empty

logical vector of length 'length(geneid)'. 'empty[i]' is TRUE if 'geneid[i]' is empty or NA.

noentry

logical vector of length 'length(geneid)'. 'noentry[i]' is TRUE if 'geneid[i]' cannot be found in column 'idCol' (default is column 19) of the annotation table.

noid

logical vector of length 'length(geneid)'. 'noid[i]' is TRUE if 'ps[i]==noPSprovidedSymbol' is TRUE.
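With 'diagnose = TRUE', the three logical vectors can be combined into a per-gene status report. The sketch below uses base R only and hard-codes the values shown in the example output on this page. Since that output prints unnamed list elements, the names assigned here are an assumption based on the order in which the values are documented above; 'status' and 'report' are illustrative names.

```r
# Values copied from the diagnose = TRUE example output on this page
res <- list(
  list(c("1007_s_at", "207169_x_at"), "1255_g_at", "117_at", NA, NA),
  c(FALSE, FALSE, FALSE, TRUE,  FALSE),
  c(FALSE, FALSE, FALSE, FALSE, TRUE),
  c(FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Assumed names: the example output is unnamed, order taken from the Value section
names(res) <- c("ps", "empty", "noentry", "noid")

myGenes <- c("DDR1", "GUCA1A", "HSPA6", NA, "XYZ")

# One status label per input gene ID
status <- ifelse(res$empty,   "empty input",
          ifelse(res$noentry, "not in annotation",
          ifelse(res$noid,    "no probe set ID provided", "ok")))

report <- data.frame(geneid = myGenes, status = status,
                     stringsAsFactors = FALSE)
```

In this example the fourth gene ID fails because the input is NA, and the fifth because 'XYZ' does not occur in the gene symbol column.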

Note

getMULTIANNOTATION provides a more flexible solution that can be used with arbitrary annotation tables.

Author(s)

Alexandre Kuhn

References

Kuhn et al. Cross-species and cross-platform gene expression studies with the Bioconductor-compliant R package 'annotationTools'. BMC Bioinformatics, 9:26 (2008)

See Also

getMULTIANNOTATION

Examples

##example Affymetrix annotation file and its location
annotationFile<-system.file('extdata','HG-U133_Plus_2_annot_part.csv',package='annotationTools')

##load annotation file
annotation<-read.csv(annotationFile,colClasses='character',comment.char='#')

##genes of interest
myGenes<-c('DDR1','GUCA1A','HSPA6',NA,'XYZ')

##column 15 in annotation contains gene symbols
colnames(annotation)

##find probe sets probing for particular genes 
getPROBESET(myGenes,annotation,idCol=15)

##find probe sets probing only for the genes of interest (i.e. with unique annotation)
getPROBESET(myGenes,annotation,idCol=15,uniqueID=TRUE)

##track origin of annotation failure for the last 2 gene IDs
getPROBESET(myGenes,annotation,idCol=15,diagnose=TRUE)

Example output

 [1] "Probe.Set.ID"                     "GeneChip.Array"                  
 [3] "Species.Scientific.Name"          "Annotation.Date"                 
 [5] "Sequence.Type"                    "Sequence.Source"                 
 [7] "Transcript.ID.Array.Design."      "Target.Description"              
 [9] "Representative.Public.ID"         "Archival.UniGene.Cluster"        
[11] "UniGene.ID"                       "Genome.Version"                  
[13] "Alignments"                       "Gene.Title"                      
[15] "Gene.Symbol"                      "Chromosomal.Location"            
[17] "Unigene.Cluster.Type"             "Ensembl"                         
[19] "Entrez.Gene"                      "SwissProt"                       
[21] "EC"                               "OMIM"                            
[23] "RefSeq.Protein.ID"                "RefSeq.Transcript.ID"            
[25] "FlyBase"                          "AGI"                             
[27] "WormBase"                         "MGI.Name"                        
[29] "RGD.Name"                         "SGD.accession.number"            
[31] "Gene.Ontology.Biological.Process" "Gene.Ontology.Cellular.Component"
[33] "Gene.Ontology.Molecular.Function" "Pathway"                         
[35] "Protein.Families"                 "Protein.Domains"                 
[37] "InterPro"                         "Trans.Membrane"                  
[39] "QTL"                              "Annotation.Description"          
[41] "Annotation.Transcript.Cluster"    "Transcript.Assignments"          
[43] "Annotation.Notes"                
[[1]]
[1] "1007_s_at"   "207169_x_at"

[[2]]
[1] "1255_g_at"

[[3]]
[1] "117_at"

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getPROBESET(myGenes, annotation, idCol = 15) :
  one or more empty gene ID in input
2: In getPROBESET(myGenes, annotation, idCol = 15) :
  one or more gene ID not found in annotation
[[1]]
[1] "1007_s_at"   "207169_x_at"

[[2]]
[1] "1255_g_at"

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

Warning messages:
1: In getPROBESET(myGenes, annotation, idCol = 15, uniqueID = TRUE) :
  one or more empty gene ID in input
2: In getPROBESET(myGenes, annotation, idCol = 15, uniqueID = TRUE) :
  one or more gene ID not found in annotation
[[1]]
[[1]][[1]]
[1] "1007_s_at"   "207169_x_at"

[[1]][[2]]
[1] "1255_g_at"

[[1]][[3]]
[1] "117_at"

[[1]][[4]]
[1] NA

[[1]][[5]]
[1] NA


[[2]]
[1] FALSE FALSE FALSE  TRUE FALSE

[[3]]
[1] FALSE FALSE FALSE FALSE  TRUE

[[4]]
[1] FALSE FALSE FALSE FALSE FALSE

Warning messages:
1: In getPROBESET(myGenes, annotation, idCol = 15, diagnose = TRUE) :
  one or more empty gene ID in input
2: In getPROBESET(myGenes, annotation, idCol = 15, diagnose = TRUE) :
  one or more gene ID not found in annotation

annotationTools documentation built on Nov. 8, 2020, 6:58 p.m.