AnnotationDb-class: AnnotationDb objects and their progeny, methods etc.

AnnotationDb-objectsR Documentation

AnnotationDb objects and their progeny, methods etc.

Description

AnnotationDb is the virtual base class for all annotation packages. It contain a database connection and is meant to be the parent for a set of classes in the Bioconductor annotation packages. These classes will provide a means of dispatch for a widely available set of select methods and thus allow the easy extraction of data from the annotation packages.

select, columns and keys are used together to extract data from an AnnotationDb object (or any object derived from the parent class). Examples of classes derived from the AnnotationDb object include (but are not limited to): ChipDb, OrgDb GODb, OrthologyDb and ReactomeDb.

columns shows which kinds of data can be returned for the AnnotationDb object.

keytypes allows the user to discover which keytypes can be passed in to select or keys and the keytype argument.

keys returns keys for the database contained in the AnnotationDb object . This method is already documented in the keys manual page but is mentioned again here because it's usage with select is so intimate. By default it will return the primary keys for the database, but if used with the keytype argument, it will return the keys from that keytype.

select will retrieve the data as a data.frame based on parameters for selected keys columns and keytype arguments. Users should be warned that if you call select and request columns that have multiple matches for your keys, select will return a data.frame with one row for each possible match. This has the effect that if you request multiple columns and some of them have a many to one relationship to the keys, things will continue to multiply accordingly. So it's not a good idea to request a large number of columns unless you know that what you are asking for should have a one to one relationship with the initial set of keys. In general, if you need to retrieve a column (like GO) that has a many to one relationship to the original keys, it is most useful to extract that separately.

mapIds gets the mapped ids (column) for a set of keys that are of a particular keytype. Usually returned as a named character vector.

saveDb will take an AnnotationDb object and save the database to the file specified by the path passed in to the file argument.

loadDb takes a .sqlite database file as an argument and uses data in the metadata table of that file to return an AnnotationDb style object of the appropriate type.

species shows the genus and species label currently attached to the AnnotationDb objects database.

dbfile gets the database file associated with an object.

dbconn gets the datebase connection associated with an object.

taxonomyId gets the taxonomy ID associated with an object (if available).

Usage

  columns(x)
  keytypes(x)
  keys(x, keytype, ...)
  select(x, keys, columns, keytype, ...)
  mapIds(x, keys, column, keytype, ..., multiVals)
  saveDb(x, file)
  loadDb(file, packageName=NA)

Arguments

x

the AnnotationDb object. But in practice this will mean an object derived from an AnnotationDb object such as a OrgDb or ChipDb object.

keys

the keys to select records for from the database. All possible keys are returned by using the keys method.

columns

the columns or kinds of things that can be retrieved from the database. As with keys, all possible columns are returned by using the columns method.

keytype

the keytype that matches the keys used. For the select methods, this is used to indicate the kind of ID being used with the keys argument. For the keys method this is used to indicate which kind of keys are desired from keys

column

the column to search on (for mapIds). Different from columns in that it can only have a single element for the value

...

other arguments. These include:

pattern:

the pattern to match (used by keys)

column:

the column to search on. This is used by keys and is for when the thing you want to pattern match is different from the keytype, or when you want to simply want to get keys that have a value for the thing specified by the column argument.

fuzzy:

TRUE or FALSE value. Use fuzzy matching? (this is used with pattern by the keys method)

multiVals

What should mapIds do when there are multiple values that could be returned? Options include:

first:

This value means that when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior

list:

This will just returns a list object to the end user

filter:

This will remove all elements that contain multiple matches and will therefore return a shorter vector than what came in whenever some of the keys match more than one value

asNA:

This will return an NA value whenever there are multiple matches

CharacterList:

This just returns a SimpleCharacterList object

FUN:

You can also supply a function to the multiVals argument for custom behaviors. The function must take a single argument and return a single value. This function will be applied to all the elements and will serve a 'rule' that for which thing to keep when there is more than one element. So for example this example function will always grab the last element in each result: last <- function(x){x[[length(x)]]}

file

an sqlite file path. A string the represents the full name you want for your sqlite database and also where to put it.

packageName

for internal use only

Value

keys,columns and keytypes each return a character vector or possible values. select returns a data.frame.

Author(s)

Marc Carlson

See Also

keys, dbConnect, dbListTables, dbListFields, dbGetQuery, Bimap

Examples

require(hgu95av2.db)
## display the columns
columns(hgu95av2.db)
## get the 1st 6 possible keys
keys <- head( keys(hgu95av2.db) )
keys
## lookup gene symbol and gene type for the 1st 6 keys
select(hgu95av2.db, keys=keys, columns = c("SYMBOL","GENETYPE"))

## get keys based on RefSeq
keyref <- head( keys(hgu95av2.db, keytype="REFSEQ") )
keyref
## list supported key types
keytypes(hgu95av2.db)
## lookup gene symbol and refseq ID based on refseq IDs by setting
## the keytype to "REFSEQ" and passing in refseq keys:
select(hgu95av2.db, keys=keyref, columns = c("SYMBOL","REFSEQ"),
       keytype="REFSEQ")

keys <- head(keys(hgu95av2.db, 'ENTREZID'))
## get a default result (captures only the 1st element)
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID')
## or use a different option
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID', 
    multiVals="CharacterList")
## Or define your own function
last <- function(x){x[[length(x)]]}
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID', 
    multiVals=last)
    
## For other ways to access the DB, you can use dbfile() or dbconn() to extract
dbconn(hgu95av2.db)
dbfile(hgu95av2.db)

## Try to retrieve an associated taxonomyId 
taxonomyId(hgu95av2.db)

Bioconductor/AnnotationDbi documentation built on Nov. 2, 2024, 6:27 a.m.