CompDb: Simple compound (metabolite) databases

View source: R/CompDb.R

CompDbR Documentation

Simple compound (metabolite) databases

Description

CompDb objects provide access to general (metabolite) compound annotations along with metadata information such as the annotation's source, date and release version. The data is stored internally in a database (usually an SQLite database).

hasMsMsSpectra returns TRUE if MS/MS spectrum data is available in the database and FALSE otherwise.

Usage

CompDb(x, flags = SQLITE_RO)

hasMsMsSpectra(x)

src_compdb(x)

tables(x)

copyCompDb(x, y)

## S4 method for signature 'CompDb'
dbconn(x)

## S4 method for signature 'CompDb'
Spectra(object, filter, ...)

## S4 method for signature 'CompDb'
supportedFilters(object)

## S4 method for signature 'CompDb'
metadata(x, ...)

## S4 method for signature 'CompDb'
spectraVariables(object, ...)

## S4 method for signature 'CompDb'
compoundVariables(object, includeId = FALSE, ...)

## S4 method for signature 'CompDb'
compounds(
  object,
  columns = compoundVariables(object),
  filter,
  return.type = c("data.frame", "tibble"),
  ...
)

## S4 method for signature 'CompDb,Spectra'
insertSpectra(object, spectra, columns = spectraVariables(spectra), ...)

## S4 method for signature 'CompDb'
deleteSpectra(object, ids = integer(0), ...)

## S4 method for signature 'CompDb'
mass2mz(x, adduct = c("[M+H]+"), name = "formula")

## S4 method for signature 'CompDb'
insertCompound(object, compounds = data.frame(), addColumns = FALSE)

## S4 method for signature 'CompDb'
deleteCompound(object, ids = character(), recursive = FALSE, ...)

Arguments

x

For CompDb(): character(1) with the file name of the SQLite compound database. Alternatively it is possible to provide the connection to the database with parameter x. For copyCompDb(): either a CompDb or a database connection.

For all other methods: a `CompDb` object.
flags

flags passed to the SQLite database connection. See SQLite(). Defaults to read-only, i.e. RSQLite::SQLITE_RO.

y

For copyCompDb(): connection to a database to which the content should be copied.

object

For all methods: a CompDb object.

filter

For compounds() and Spectra(): filter expression or AnnotationFilter() defining a filter to be used to retrieve specific elements from the database.

...

additional arguments. Currently not used.

includeId

for compoundVariables(): logical(1) whether the comound ID (column "compound_id") should be included in the result. The default is includeIds = FALSE.

columns

For compounds(), Spectra: character with the names of the database columns that should be retrieved. Use compoundVariables() and/or spectraVariables() for a list of available column names. For insertSpectra(): columns (spectra variables) that should be inserted into the database (to avoid inserting all variables).

return.type

For compounds(): either "data.frame" or "tibble" to return the result as a data.frame() or tibble(), respectively.

spectra

For insertSpectra(): Spectra object containing the spectra to be added to the IonDb database.

ids

For deleteSpectra(): integer() specifying the IDs of the spectra to delete. IDs in ids that are not associated to any spectra in the CompDb object are ignored. For deleteCompound: character() with the compound IDs to be deleted.

adduct

either a character specifying the name(s) of the adduct(s) for which the m/z should be calculated or a data.frame with the adduct definition. See adductNames() for supported adduct names and the description for more information on the expected format if a data.frame is provided.

name

For mass2mz(): character(1). Defines the CompDb column that will be used to name/identify the returned m/z values. By default (name = "formula") m/z values for all unique molecular formulas are calculated and these are used as rownames for the returned matrix. With name = "compound_id" the adduct m/z for all compounds (even those with equal formulas) are calculated and returned.

compounds

For insertCompound(): data.frame with compound data to be inserted into a CompDb database. See function description for details.

addColumns

For insertCompound(): logical(1) whether all (extra) columns in parameter compounds should be stored also in the database table. The default is addColumns = FALSE.

recursive

For deleteCompound(): logical(1) whether also MS2 spectra associated with the compounds should be deleted.

Details

CompDb objects should be created using the constructor function CompDb() providing the name of the (SQLite) database file providing the compound annotation data.

Value

See description of the respective function.

Retrieve annotations from the database

Annotations/compound informations can be retrieved from a CompDb database with the compounds() and Spectra() functions:

  • compounds() extracts compound data from the CompDb object. In contrast to src_compdb it returns the actual data as a data.frame (if return.type = "data.frame") or a tibble::tibble() (if return.type = "tibble"). A compounds() call will always return all elements from the ms_compound table (unless a filter is used).

  • Spectra() extract spectra from the database and returns them as a Spectra() object from the Spectra package. Additional annotations requested with the columns parameter are added as additional spectra variables.

General functions

  • CompDb(): connect to a compound database.

  • compoundVariables(): returns all available columns/database fields for compounds.

  • copyCompDb(): allows to copy the content from a CompDb to another database. Parameter x is supposed to be either a CompDb or a database connection from which the data should be copied and y a connection to a database to which it should be copied.

  • dbconn(): returns the connection (of type DBIConnection) to the database.

  • metadata(): returns general meta data of the compound database.

  • spectraVariables(): returns all spectra variables (i.e. columns) available in the CompDb.

  • src_compdb() provides access to the CompDb's database via the functionality from the dplyr/dbplyr package.

  • supportedFilters(): provides an overview of the filters that can be applied on a CompDb object to extract only specific data from the database.

  • tables(): returns a named list (names being table names) with the fields/columns from each table in the database.

  • mass2mz(): calculates a table of the m/z values for each compound based on the provided set of adduct(s). Adduct definitions can be provided with parameter adduct. See MetaboCoreUtils::mass2mz() for more details. Parameter name defines the database table column that should be used as rownames of the returned matrix. By default name = "formula", m/z values are calculated for each unique formula in the CompDb x.

Adding and removing data from a database

Note that inserting and deleting data requires read-write access to the database. Databases returned by CompDb are by default read-only. To get write access CompDb should be called with parameter flags = RSQLite::SQLITE_RW.

  • insertCompound(): adds additional compound(s) to a CompDb. The compound(s) to be added can be specified with parameter compounds that is expected to be a data.frame with columns "compound_id", "name", "inchi", "inchikey", "formula", "exactmass". Column "exactmass" is expected to contain numeric values, all other columns character. Missing values are allowed for all columns except "compound_id". An optional column "synonyms" can be used to provide alternative names for the compound. This column can contain a single character by row, or a list with multiple character (names) per row/compound (see examples below for details). By setting parameter addColumns = TRUE any additional columns in compound will be added to the database table. The default is addColumns = FALSE. The function returns the CompDb with the compounds added. See also createCompDb() for more information and details on expected compound data and the examples below for general usage.

  • deleteCompound(): removes specified compounds from the CompDb database. The IDs of the compounds that should be deleted need to be provided with parameter ids. To include compound IDs in the output of a compounds() call "compound_id" should be added to the columns parameter. By default an error is thrown if for some of the specified compounds also MS2 spectra are present in the database. To force deletion of the compounds along with all associated MS2 spectra use recursive = TRUE. See examples below for details. The function returns the updated CompDb database.

  • insertSpectra(): adds further spectra to the database. The method always adds all the spectra specified through the spectra parameter and does not check if they are already in the database. Note that the input spectra must have the variable compound_id and only Spectra whose compound_id values are also in compounds(object, "compound_id") can be added. Parameter columns defines which spectra variables from the spectra should be inserted into the database. By default, all spectra variables are added but it is strongly suggested to specifically select (meaningful) spectra variables that should be stored in the database. Note that a spectra variable "compound_id" is mandatory. If needed, the function adds additional columns to the msms_spectrum database table. The function returns the updated CompDb object.

  • deleteSpectra(): deletes specified spectra from the database. The IDs of the spectra to be deleted need to be provided with parameter ids.

Filtering the database

Data access methods such as compounds() and Spectra allow to filter the results using specific filter classes and expressions. Filtering uses the concepts from Bioconductor's AnnotationFilter package. All information for a certain compound with the ID "HMDB0000001" can for example be retrieved by passing the filter expression filter = ~ compound_id == "HMDB0000001" to the compounds function.

Use the supportedFilters() function on the CompDb object to get a list of all supported filters. See also examples below or the usage vignette for details.

Author(s)

Johannes Rainer

See Also

createCompDb() for the function to create a SQLite compound database.

CompoundIdFilter() for filters that can be used on the CompDb database.

Examples


## We load a small compound test database based on MassBank which is
## distributed with this package.
cdb <- CompDb(system.file("sql/CompDb.MassBank.sql", package = "CompoundDb"))
cdb

## Get general metadata information from the database, such as originating
## source and version:
metadata(cdb)

## List all available compound annotations/fields
compoundVariables(cdb)

## Extract a data.frame with these annotations for all compounds
compounds(cdb)

## Note that the `compounds` function will by default always return a
## data frame of **unique** entries for the specified columns. Including
## also the `"compound_id"` to the requested columns will ensure that all
## data is returned from the tables.
compounds(cdb, columns = c("compound_id", compoundVariables(cdb)))

## Add also the synonyms (aliases) for the compounds. This will cause the
## tables compound and synonym to be joined. The elements of the compound_id
## and name are now no longer unique
res <- compounds(cdb, columns = c("name", "synonym"))
head(res)

## List all database tables and their columns
tables(cdb)

## Any of these columns can be used in the `compounds` call to retrieve
## the specific annotations. The corresponding database tables will then be
## joined together
compounds(cdb, columns = c("formula", "publication"))

## Calculating m/z values for the exact masses of unique chemical formulas
## in the database:
mass2mz(cdb, adduct = c("[M+H]+", "[M+Na]+"))

## By using `name = "compound_id"` the calculation will be performed for
## each unique compound ID instead (resulting in potentially redundant
## results)
mass2mz(cdb, adduct = c("[M+H]+", "[M+Na]+"), name = "compound_id")

## Create a Spectra object with all MS/MS spectra from the database.
library(Spectra)
sps <- Spectra(cdb)
sps

## Extract spectra for a specific compound.
sps <- Spectra(cdb, filter = ~ name == "Mellein")
sps

## List all available annotations for MS/MS spectra
spectraVariables(sps)

## Get access to the m/z values of these
mz(sps)

library(Spectra)
## Plot the first spectrum
plotSpectra(sps[1])


#########
## Filtering the database
##
## Get all compounds with an exact mass between 310 and 320
res <- compounds(cdb, filter = ~ exactmass > 310 & exactmass < 320)
res

## Get all compounds that have an H14 in their formula.
res <- compounds(cdb, filter = FormulaFilter("H14", "contains"))
res

#########
## Using CompDb with the *tidyverse*
##
## Using return.type = "tibble" the result will be returned as a "tibble"
compounds(cdb, return.type = "tibble")

## Use the CompDb in a dplyr setup
library(dplyr)
src_cmp <- src_compdb(cdb)
src_cmp

## Get a tbl for the ms_compound table
cmp_tbl <- tbl(src_cmp, "ms_compound")

## Extract the id, name and inchi
cmp_tbl %>% select(compound_id, name, inchi) %>% collect()

########
## Creating an empty CompDb and sequentially adding content
##
## Create an empty CompDb and store the database in a temporary file
cdb <- emptyCompDb(tempfile())
cdb

## Define a data.frame with some compounds to add
cmp <- data.frame(
    compound_id = c(1, 2),
    name = c("Caffeine", "Glucose"),
    formula = c("C8H10N4O2", "C6H12O6"),
    exactmass = c(194.080375584, 180.063388116))

## We can also add multiple synonyms for each compound
cmp$synonyms <- list(c("Cafeina", "Koffein"), "D Glucose")
cmp

## These compounds can be added to the empty database with insertCompound
cdb <- insertCompound(cdb, compounds = cmp)
compounds(cdb)

## insertCompound would also allow to add additional columns/annotations to
## the database. Below we define a new compound adding an additional column
## hmdb_id
cmp <- data.frame(
    compound_id = 3,
    name = "Alpha-Lactose",
    formula = "C12H22O11",
    exactmass = 342.116211546,
    hmdb_id = "HMDB0000186")

## To add additional columns we need to set addColumns = TRUE
cdb <- insertCompound(cdb, compounds = cmp, addColumns = TRUE)
cdb
compounds(cdb)

######
## Deleting selected compounds from a database
##
## Compounds can be deleted with the deleteCompound function providing the
## IDs of the compounds that should be deleted. IDs of compounds in the
## database can be retrieved by adding "compound_id" to the columns parameter
## of the compounds function:
compounds(cdb, columns = c("compound_id", "name"))

## Compounds can be deleted with the deleteCompound function. Below we delete
## the compounds with the IDs "1" and "3" from the database
cdb <- deleteCompound(cdb, ids = c("1", "3"))
compounds(cdb)

## If also MS2 spectra associated with any of these two compounds an error
## would be thrown. Setting the parameter `recursive = TRUE` in the
## `deleteCompound` call would delete the compounds along with their MS2
## spectra.

rformassspectrometry/CompoundDb documentation built on Sept. 28, 2024, 8:48 a.m.