SimpleCompoundDb-class: A simple metabolite compound database

Description

The SimpleCompoundDb class represents a simple database that stores compound information and supports a simple identification of compounds based on their mass.

This package also provides a SimpleCompoundDb object bound to the variable name scDb.

Usage

## S4 method for signature 'SimpleCompoundDb'
as.data.frame(x, row.names=NULL, optional=FALSE, ...)

## S4 method for signature 'SimpleCompoundDb'
columns(x)

## S4 method for signature 'SimpleCompoundDb'
compounds(x, columns, filter=list(), order.by="accession", ...)

## S4 method for signature 'SimpleCompoundDb'
dbconn(x)

## S4 method for signature 'SimpleCompoundDb'
listTables(x, ...)

## S4 method for signature 'numeric,SimpleCompoundDb'
mzmatch(x, mz, mzdev=0, ppm=10,
        column="monoisotopic_molecular_weight", ionAdduct=NULL)

## S4 method for signature 'matrix,SimpleCompoundDb'
mzmatch(x, mz, mzdev=0, ppm=10,
        column="monoisotopic_molecular_weight", ionAdduct=NULL)

SimpleCompoundDb(x)

Arguments

(in alphabetic order)

column

For mzmatch: the name of the database column against which the provided M/Z values should be matched. Allowed values are "avg_molecular_weight" and "monoisotopic_molecular_weight".

columns

For compounds: a character vector with names of columns that should be returned from the database. By default all columns from the compound_basic table are returned.

filter

For compounds: a single filter object or a list of filter objects. The filter objects should extend the BasicFilter class defined in the ensembldb package. See the help for CompoundidFilter for supported filters.

ionAdduct

The name(s) of the suspected ion adducts of the measured M/Z values. By default (ionAdduct=NULL) it is assumed that the searched compound is already an ion. See supportedIonAdducts for a complete list of available ion adducts.

mz

For mzmatch: the SimpleCompoundDb instance against which the M/Z values in x are matched.

mzdev

Numeric (length 1) specifying the maximal allowed difference of the M/Z values. See mzmatch for more details.

optional

For as.data.frame: not used.

order.by

For compounds: the column by which the result should be ordered.

ppm

Numeric (length 1) specifying the parts per million deviation/difference of the M/Z value. See mzmatch for more details.

row.names

For as.data.frame: not used.

x

For SimpleCompoundDb: a character string specifying the SQLite database file name providing the annotations.

For mzmatch: a numeric vector with the M/Z values that should be matched against the database or a matrix with two columns (or columns named "mzmin" and "mzmax") specifying the minimum and maximum M/Z of the peak.

For all other methods: a SimpleCompoundDb object.

...

For compounds: additional arguments passed to the internal getWhat method.

Ignored for all other functions.

Value

Refer to the method and function descriptions below for detailed information on the returned result object.

Objects of the class

SimpleCompoundDb objects should only be created using the SimpleCompoundDb function, which requires the file name of the SQLite database providing the data.

Slots

con

The connection to the database.

tables

A list of the database tables. Names of the list represent the names of the database tables, the values their attributes (column names).

.properties

An internal list of optional properties.

Constructors and alike

SimpleCompoundDb

Constructor function to create a new SimpleCompoundDb instance providing simplified access to the annotation data stored in the SQLite database, whose file name has to be provided with the single argument x.

Basic data retrieval and usage

as.data.frame

Retrieve the full data stored in the database as a data.frame.

compounds

Retrieve data from the database, starting at the compound_basic table. By default this method returns data from that table only but, depending on the columns argument, it can also return data from other tables that are joined to compound_basic. Tables are joined using a left join starting from compound_basic, so the result contains all data from that table. The results can be filtered by specifying one or more filters (objects extending the BasicFilter class from the ensembldb package, e.g. the CompoundidFilter).

The method returns a data.frame with the results of the query.
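The left-join behaviour described for compounds can be illustrated with plain data frames. This is only a sketch in base R, not the package's internal query code: the second table (compound_extra), its columns, and all values are hypothetical stand-ins, and only the table name compound_basic comes from this help page.

```r
## Hypothetical stand-ins for two database tables. Only the table name
## compound_basic is taken from the documentation; the rest is made up.
compound_basic <- data.frame(
    accession = c("C1", "C2", "C3"),
    name = c("compound one", "compound two", "compound three"),
    stringsAsFactors = FALSE
)
compound_extra <- data.frame(
    accession = c("C1", "C3"),
    note = c("note for C1", "note for C3"),
    stringsAsFactors = FALSE
)

## Left join: all.x = TRUE keeps every row of compound_basic and fills the
## columns of the second table with NA where no matching row exists.
joined <- merge(compound_basic, compound_extra, by = "accession",
                all.x = TRUE)

## Restrict to selected accessions (in the spirit of a CompoundidFilter)
## and order the result by accession.
res <- joined[joined$accession %in% c("C2", "C3"), ]
res <- res[order(res$accession), ]
res
```

Because the join starts from compound_basic, the row for "C2" is kept even though the second table has no entry for it.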

dbconn

Returns the SQLiteConnection object providing the connection to the actual database.

columns

Get the column names from the database tables. Returns a character vector with the names of the columns.

listTables

Get the tables and their columns from the database. The method returns a list, the list names being the database table names and the values their attribute (database table column) names.

Compound identification

mzmatch

Matches the given M/Z values against the masses of compounds in the database and returns the matches for which the "deltaMz" is smaller than the specified threshold. By default the method assumes that the M/Z corresponds to the mass of the compound, i.e. that the measured feature is already an ion. With the argument ionAdduct it is possible to specify the assumed ion adduct of the real compound that is measured in the MS. It is possible to specify single adduct names, or all possible (i.e. the most commonly found) ion adducts according to the Fiehn lab's ESI MS adduct calculator [Huang 1999]. In that case, the mass corresponding to each candidate adduct is calculated for each input M/Z and this mass is looked up in the database. The ppm and mzdev arguments are used for the mass search.
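The following base R sketch illustrates the idea of a tolerance-based mass search with and without an adduct. It is not the package's implementation: the helper match_mass, the compound table, and the way mzdev and ppm are combined into one tolerance (mzdev plus the ppm fraction of the query mass) are assumptions made for illustration. The adduct shown is [M+H]+, for which the neutral mass is the measured M/Z minus the proton mass.

```r
## Hypothetical compound masses, not taken from the database.
db <- data.frame(
    id = c("C1", "C2", "C3"),
    mass = c(180.0634, 299.1825, 300.1898),
    stringsAsFactors = FALSE
)

## Match one mass against the table.
## ASSUMPTION: the allowed deviation is mzdev plus the ppm fraction of the
## query mass; the package may combine the two arguments differently.
match_mass <- function(mass, db, mzdev = 0, ppm = 10) {
    delta <- abs(db$mass - mass)
    hit <- delta <= mzdev + mass * ppm / 1e6
    if (!any(hit))
        return(data.frame(idx = NA_character_, deltaMz = NA_real_,
                          stringsAsFactors = FALSE))
    data.frame(idx = db$id[hit], deltaMz = delta[hit],
               stringsAsFactors = FALSE)
}

## Default case: the M/Z is taken to be the mass itself.
match_mass(300.1898, db)

## Adduct case, using [M+H]+ as an example: subtract the proton mass from
## the measured M/Z to obtain the neutral mass, then search for that mass.
proton <- 1.007276
mz <- 300.1898
match_mass(mz - proton, db)
```

With these made-up values the same M/Z matches a different compound depending on whether it is interpreted as a mass or as an [M+H]+ ion.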

If the input argument x is a matrix specifying the minimal and maximal M/Z value of a peak, matching is performed against this range: the minimum and maximum are first converted to the corresponding masses, and the database is searched for compounds with a mass within the resulting range, extended by the specified ppm. In that case, the "deltaMz" reported in the result table is the distance of the compound mass to the mean mass of the peak (i.e. to the mean of the masses for mzmin and mzmax).
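Range-based matching can be sketched in the same way. Again this is a hedged illustration in base R rather than the package's code: the helper match_range and the compound table are hypothetical, the range is assumed to be widened by the ppm fraction of each boundary, and no adduct conversion is applied (the M/Z is taken as the mass).

```r
## Hypothetical compound masses, not taken from the database.
db <- data.frame(
    id = c("C1", "C2", "C3"),
    mass = c(180.0634, 299.1825, 300.1898),
    stringsAsFactors = FALSE
)

## Match a peak given as the range [mzmin, mzmax].
## ASSUMPTION: each boundary is extended by its own ppm fraction.
match_range <- function(mzmin, mzmax, db, ppm = 10) {
    lower <- mzmin - mzmin * ppm / 1e6
    upper <- mzmax + mzmax * ppm / 1e6
    hit <- db$mass >= lower & db$mass <= upper
    ## deltaMz: distance of the compound mass to the mean of the range.
    delta <- abs(db$mass - mean(c(mzmin, mzmax)))
    data.frame(idx = db$id[hit], deltaMz = delta[hit],
               stringsAsFactors = FALSE)
}

## Two peaks, one per row, with the column names used in this help page.
peaks <- cbind(mzmin = c(300.185, 180.060), mzmax = c(300.192, 180.061))
res <- lapply(seq_len(nrow(peaks)), function(i) {
    match_range(peaks[i, "mzmin"], peaks[i, "mzmax"], db)
})
res
```

In this sketch a peak without any matching compound yields a data frame with zero rows; the package itself reports NA in that case, as noted in the Examples section.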

Like the generic mzmatch method, this method returns a list in which each element represents the result for one of the specified masses, provided as a two-column matrix with the columns "idx" and "deltaMz": "idx" contains the compound IDs and "deltaMz" the difference between the specified mass and the compound's M/Z. The ion adduct name is provided as a third column ("adduct"). For more details see the mzmatch method help.

Author(s)

Johannes Rainer.

References

Huang N, Siegel MM, Kruppa GH & Laukien FH (1999) Automation of a Fourier transform ion cyclotron resonance mass spectrometer for acquisition, analysis, and e-mailing of high-resolution exact-mass electrospray ionization mass spectral data. Journal of the American Society for Mass Spectrometry, pp. 1166-1173.

See Also

xcmsSet, xcmsRaw, MSdata, MSsliceList, CompoundidFilter, mzmatch, supportedIonAdducts

Examples

## Load the package (assumed here to be xcmsExtensions). It provides a
## SimpleCompoundDb object bound to the variable name scDb.
library(xcmsExtensions)
scDb

## List all tables from the database along with the respective columns
listTables(scDb)

## List all column names
columns(scDb)

## Retrieve all data as a data.frame
tmp <- as.data.frame(scDb)
nrow(tmp)
head(tmp)

## Retrieve compound data for specific compounds only.
## Define the CompoundidFilter
cf <- CompoundidFilter(c("HMDB00010", "HMDB00002", "HMDB00011"))
res <- compounds(scDb, filter=cf, columns=c("name", "inchikey"))
res


## Perform a compound identification.
## Define masses which we would like to identify, i.e. match against the
## compounds in the database. In the default case it is expected that the
## measured features are ions and that the M/Z corresponds already to the
## mass.
comps <- c(300.1898, 298.1508, 491.2000, 169.13481, 169.1348)
Res <- mzmatch(comps, scDb)

## We get a list of results, each list element representing the result of one
## of the specified masses. If the provided mass does not match any compound
## in the database NA is returned.
Res

## Get the compound name for the 2nd mass.
cf <- CompoundidFilter(Res[[2]][, 1])
res <- compounds(scDb, filter=cf,
                 columns=c("name", "inchikey",
                           "monoisotopic_molecular_weight"))
res

## Next we match the masses of all possible positive ion adducts that
## would result in the input M/Z.
Res <- mzmatch(comps, scDb, ionAdduct=supportedIonAdducts(charge="pos"))
## Remove adducts that did not match; drop=FALSE keeps the tabular
## structure even if only a single row remains.
lapply(Res, function(x) x[!is.na(x[, 1]), , drop=FALSE])

## Perform some basic SQL queries directly on the database connection.
library(RSQLite)
tmp <- dbGetQuery(dbconn(scDb),
                  "select * from compound_basic where avg_molecular_weight < 100;")
nrow(tmp)
head(tmp)

jotsetung/xcmsExtensions documentation built on May 19, 2019, 9:42 p.m.