idtype: Identifying Gene or Probe ID Type


Description

The S4 generic idtype automatically determines the type of gene/feature identifiers stored in objects, based on a combination of regular expression patterns and test functions.

Usage

idtype(object, ...)

## S4 method for signature 'missing'
idtype(object, def = FALSE)

## S4 method for signature 'matrix'
idtype(object, ...)

## S4 method for signature 'NULL'
idtype(object, ...)

## S4 method for signature 'vector'
idtype(object, each = FALSE, limit = NULL, no.match = "")

## S4 method for signature 'ExpressionSet'
idtype(object, ...)

## S4 method for signature 'ProbeAnnDbBimap'
idtype(object, limit = 500L, ...)

## S4 method for signature 'ChipDb'
idtype(object, limit = 500L, ...)

## S4 method for signature 'AnnDbBimap'
idtype(object, limit = 500L, ...)

## S4 method for signature 'list'
idtype(object, ...)

Arguments

object

an R object that contains the gene identifiers whose type is to be determined.

...

extra arguments to allow extension, generally passed down to the idtype,character-method. See each method's description for more details.

def

a logical or a subsetting vector, used when object is missing. If TRUE, the result contains the definition (matching pattern or test function) of each type; if a subsetting vector, it selects which types' definitions are included in the result list.

each

logical indicating whether the type of each element should be returned (TRUE) or only the type of the vector as a whole (default).

limit

specification for limiting which elements are used to detect the type of identifiers. If a single number, only the first limit elements are used; otherwise it must be a logical or numeric subsetting vector.

no.match

character string that specifies the string to use when the type cannot be determined.
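The rules for limit can be sketched in base R. select_by_limit is a hypothetical helper written for illustration, not a function exported by xbioc; it only shows which elements would be inspected under each form of limit.

```r
# Hypothetical sketch, NOT xbioc's code: select the elements that would be
# inspected for a given `limit`, following the rules described above.
select_by_limit <- function(ids, limit = NULL) {
  if (is.null(limit)) return(ids)                 # no limit: use every element
  if (is.numeric(limit) && length(limit) == 1L) {
    return(head(ids, limit))                      # single number: first `limit` elements
  }
  ids[limit]                                      # logical or numeric subsetting vector
}

ids <- c("7157", "672", "ILMN_1343291", "ILMN_1343295")
select_by_limit(ids, 2)                           # "7157" "672"
select_by_limit(ids, c(FALSE, FALSE, TRUE, TRUE)) # "ILMN_1343291" "ILMN_1343295"
select_by_limit(ids, c(1, 3))                     # "7157" "ILMN_1343291"
```

The methods for annotation objects default to limit = 500L, so presumably only a sample of their keys is examined rather than every key.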

The IDs can be one of the following:

  • probe IDs (e.g. 123456_at or ILMN_123456 for Affymetrix or Illumina chips, respectively), in which case the type starts with a dot '.', allowing such IDs to be handled subsequently as a group;

  • other biological ID types, in which case the result is a character string such as those used as attributes in Bioconductor annotation packages (e.g. "ENTREZID" or "ENSEMBL");

  • names of annotation packages, e.g. "hgu133plus2.db".
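Because probe ID types share a leading dot, they can be separated from the other types with a simple prefix test. The type strings below are illustrative values, not output captured from idtype.

```r
# Illustrative type strings; what idtype actually returns depends on its input.
types <- c(".Affymetrix", "ENTREZID", ".Illumina", "hgu133plus2.db", "ENSEMBL")
is_probe <- startsWith(types, ".")  # probe ID types are prefixed with a dot
types[is_probe]                     # ".Affymetrix" ".Illumina"
types[!is_probe]                    # "ENTREZID" "hgu133plus2.db" "ENSEMBL"
```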

This function can identify the following ID types, using either regular expression patterns or dedicated functions:

  • ENSEMBL = "^ENSG[0-9]+$"

  • ENSEMBLTRANS = "^ENST[0-9]+$"

  • ENSEMBLPROT = "^ENSP[0-9]+$"

  • ENTREZID = "^[0-9]+$"

  • IMAGE = "^IMAGE:[0-9]+$"

  • GOID = "^GO:[0-9]+$"

  • PFAM = "^PF[0-9]+$"

  • REFSEQ = "^N[MP]_[0-9]+$"

  • ENZYME = "^[0-9]+(\.(([0-9]+)|-)+){3}$"

  • MAP = "^[0-9XY]+((([pq])|(cen))(([0-9]+(\.[0-9]+)?)|(ter))?(-([0-9XY]+)?(([pq]?)|(cen))((ter)|([0-9]+(\.[0-9]+)?))?)?)?$"

  • GENEBANK (Nucleotide) = "^[A-Z][0-9]{5}$" | "^[A-Z]{2}[0-9]{6}$"

  • GENEBANK (Protein) = "^[A-Z]{3}[0-9]{5}$"

  • GENEBANK (WGS) = "^[A-Z]{4}[0-9]{8}[0-9]?[0-9]?$"

  • GENEBANK (MGA) = "^[A-Z]{5}[0-9]{7}$"

  • GENENAME = " "

  • .Affymetrix = "(^AFFX-)|(^[0-9]+_([abfgilrsx]_)?([as]t)|(i))$"

  • .Illumina = "^ILMN_[0-9]+$"

  • .Agilent = "^A_[0-9]+_P[0-9]+$"

  • .nuID = use the function nuIDdecode to try converting the ids into nucleotide sequences. Identification is positive if no error is thrown during the conversion.
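The pattern-based part of the detection can be approximated with grepl. This is a minimal sketch, not xbioc's implementation: id_patterns, guess_type and guess_each are hypothetical names, only a subset of the patterns above is included, and the first-match-wins ordering is an assumption.

```r
# Hypothetical helper, NOT part of xbioc: a subset of the patterns listed
# above, checked in order; the first pattern matched by every ID wins.
id_patterns <- c(
  ENSEMBL      = "^ENSG[0-9]+$",
  ENSEMBLTRANS = "^ENST[0-9]+$",
  ENTREZID     = "^[0-9]+$",
  GOID         = "^GO:[0-9]+$",
  REFSEQ       = "^N[MP]_[0-9]+$",
  .Illumina    = "^ILMN_[0-9]+$",
  .Agilent     = "^A_[0-9]+_P[0-9]+$"
)

# Whole-vector detection, analogous to each = FALSE
guess_type <- function(ids, no.match = "") {
  if (length(ids) == 0L) return(no.match)  # nothing to classify
  for (type in names(id_patterns)) {
    if (all(grepl(id_patterns[[type]], ids))) return(type)
  }
  no.match
}

# Per-element detection, analogous to each = TRUE
guess_each <- function(ids, no.match = "") {
  vapply(ids, guess_type, character(1), no.match = no.match, USE.NAMES = FALSE)
}

guess_type(c("ENSG00000139618", "ENSG00000141510"))  # "ENSEMBL"
guess_type(c("NM_000546", "NP_000537"))              # "REFSEQ"
guess_each(c("ILMN_1343291", "7157", "Hs.1213"))     # ".Illumina" "ENTREZID" ""
```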

Details

idtype uses a heuristic based on a set of regular expressions and functions that uniquely match the most common types of identifiers, such as Unigene, Entrez Gene, Affymetrix probe IDs and Illumina probe IDs.
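A type can be defined either by a regular expression or by a test function, and the .nuID entry counts a match as positive when conversion throws no error. The sketch below combines both kinds of definition. match_def, no_error, strict_hex, detect and the HEXCODE type are all hypothetical; the strict hexadecimal converter merely stands in for a real decoder such as nuIDdecode.

```r
# Hypothetical sketch, NOT xbioc's implementation: a type definition is either
# a regular expression (all IDs must match) or a test function (must return TRUE).
match_def <- function(def, ids) {
  if (is.function(def)) isTRUE(def(ids)) else all(grepl(def, ids))
}

# Wrap a converter so that "no error thrown" counts as a positive match.
no_error <- function(convert) {
  function(ids) tryCatch({ convert(ids); TRUE }, error = function(e) FALSE)
}

# Hypothetical strict converter: fails unless every ID is upper-case hexadecimal.
strict_hex <- function(ids) {
  if (!all(grepl("^[0-9A-F]+$", ids))) stop("not convertible")
  strtoi(ids, base = 16L)
}

defs <- list(
  ENTREZID = "^[0-9]+$",
  HEXCODE  = no_error(strict_hex)  # hypothetical type, not defined by xbioc
)

# Evaluate every definition; the first one that matches wins.
detect <- function(ids, defs, no.match = "") {
  hit <- vapply(defs, match_def, logical(1), ids = ids)
  if (any(hit)) names(defs)[which(hit)[1L]] else no.match
}

detect(c("7157", "672"), defs)   # "ENTREZID"
detect(c("1F2A", "BEEF"), defs)  # "HEXCODE"
detect("Hs.1213", defs)          # ""
```

Since "7157" and "672" also pass the hexadecimal test, the order of defs decides the answer; in this sketch the first matching definition wins.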

Value

a single character string (possibly empty) if each=FALSE (default) or a character vector of the same "length" as object otherwise.

Methods (by generic)

Examples

# all known types
idtype()
# with their definitions
idtype(def=TRUE)
idtype(def='ENTREZID')
idtype(def=c('ENTREZID', 'ENSEMBLTRANS'))


idtype("12345_at")
idtype(c("12345_at", "23232_at", "555_x_at"))
# mixed types
ids <- c("12345_at", "23232_at", "Hs.1213")
idtype(ids) # not detected
idtype(ids, each=TRUE)

renozao/xbioc documentation built on Sept. 3, 2020, 1:13 a.m.