BiocStyle::markdown()
Package: r Biocpkg("Chromatograms")
Authors: r packageDescription("Chromatograms")[["Author"]]
Compiled: r date()
library(Chromatograms)
library(BiocStyle)
Similar to the Spectra package, the Chromatograms package
also separates the user-facing functionality to process and analyze
chromatographic mass spectrometry (MS) data from the code for storage and
representation of the data. The latter functionality is provided by
implementations of the ChromBackend
class, further on called backends. This
vignette describes the ChromBackend
class and illustrates on a simple example
how a backend extending this class could be implemented.
Contributions to this vignette (content or correction of typos) or requests for additional details and information are highly welcome, ideally via pull requests or issues on the package's GitHub repository.
ChromBackend
The purpose of a backend class extending the virtual ChromBackend is to
provide the chromatographic MS data to the Chromatograms object, which is used
by the user to interact with and analyze the data. The ChromBackend defines
the API that new backends need to provide so that they can be used with
Chromatograms. This API defines a set of methods to access the data. For many
functions default implementations exist, and a dedicated implementation for a new
backend is only needed if the default is not suitable (e.g. if the data is stored
in a way that a different access to it would be better). In addition, a core set
of variables (data fields), the so-called core chromatogram variables, is defined
to describe the chromatographic data. Each backend needs to provide these, but can
also define additional data fields. Before implementing a new backend it is
highly recommended to carefully read the following Conventions and definitions
section.
General conventions for chromatographic MS data of a Chromatograms object are:

- A Chromatograms object is designed to contain data from multiple
  chromatograms (not data from a single chromatogram).
- Missing values (NA) for retention time values are not supported.
- The core chromatogram variables can be listed with the coreChromVariables()
  function.
- dataStorage and dataOrigin are two special variables that define
  for each chromatogram where the data is (currently) stored and from where the
  data derived, respectively. Both are expected to be of type character.
  Missing values for dataStorage are not allowed.
- ChromBackend implementations can also represent purely read-only data
  resources. In this case only data accessor methods need to be implemented but
  not data replacement methods (i.e. <- methods that would allow to add or set
  variables). Read-only backends should implement the isReadOnly() method, which
  should then return TRUE. Note that backends for purely read-only resources
  could also implement a caching mechanism to (temporarily) store changes to
  the data locally within the object (and hence in memory). See information on
  the MsBackendCached in the Spectra package for more details.

For parallel processing, Chromatograms
splits the backend based on a defined
factor
and processes each in parallel (or in serial if a SerialParam
is
used). The splitting factor
can be defined for Chromatograms
by setting the
parameter processingChunkSize
. Alternatively, through the
backendParallelFactor()
method the backend can also suggest a factor
that
should/could be used for splitting and parallel processing. The default
implementation for backendParallelFactor()
is to return an empty factor
(factor()
) hence not suggesting any preferred splitting.
Besides parallel processing, for on-disk backends (i.e., backends that don't keep all of the data in memory), this chunk-wise processing can also reduce the memory demand for operations, because only the peak data of the current chunk needs to be realized in memory.
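The idea of splitting by a factor and processing each chunk separately can be illustrated with plain R lists. This is only a sketch under stated assumptions: chunk_factor() and process_chunk() are hypothetical helpers (not functions from Chromatograms), and the chunk size of 2 is arbitrary.

```r
## Hypothetical helper: build a splitting factor for n elements with at
## most chunk_size elements per chunk.
chunk_factor <- function(n, chunk_size) {
    factor(ceiling(seq_len(n) / chunk_size))
}

## Toy data: intensity values of 5 chromatograms.
ints <- list(c(1, 5, 3), c(2, 2), c(10, 4, 6, 1), c(7), c(3, 9))

f <- chunk_factor(length(ints), chunk_size = 2)

## Hypothetical per-chunk operation: maximum intensity per chromatogram.
process_chunk <- function(chunk) vapply(chunk, max, numeric(1))

## Process each chunk separately (lapply could be swapped for a parallel
## apply function) and combine the results in the original order.
res <- unsplit(lapply(split(ints, f), process_chunk), f)
res
```

Only the elements of the current chunk are touched inside process_chunk(), which is what keeps the memory demand low for on-disk data.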
The ChromBackend
class defines core methods that have to be implemented by an
MS backend as well as optional methods for which a default implementation is
already available. These functions are described in sections Required methods
and Optional methods, respectively.
To create a new backend a class extending the virtual ChromBackend
needs to be
implemented. In the example below we thus create a simple class with a
data.frame
for general properties (chromatogram variables) and two slots for
the retention time and intensity values, representing the actual chromatographic
MS data. We store these values as list
, each list element representing values
for one chromatogram, since the number of values (peaks) can be different
between chromatograms. We also define a simple constructor function that returns
an empty instance of our new class.
library(Chromatograms)

#' Definition of the backend class extending ChromBackend
setClass("ChromBackendTest",
    contains = "ChromBackend",
    slots = c(
        chromVars = "data.frame",
        rtime = "list",
        intensity = "list"
    ),
    prototype = prototype(
        chromVars = data.frame(),
        rtime = list(),
        intensity = list()
    ))

#' Simple constructor function
ChromBackendTest <- function() {
    new("ChromBackendTest")
}
The 3 slots @chromVars
, @rtime
and @intensity
will be used to store our MS
data: each row in chromVars
will contain data for one chromatogram with the
columns being the different chromatogram variables (i.e. additional properties
of a chromatogram such as its m/z value or MS level) and each element in
@rtime
and @intensity
a numeric
vector with the retention times and
intensity values representing thus the peaks data of the respective
chromatogram. This is only one of the possibly many ways chromatographic data
might be represented.
We should ideally also add some basic validity function that ensures the data to
be correct (valid). The function below simply checks that the number of rows of
the @chromVars
slot matches the length of the @rtime
and @intensity
slots.
#' Basic validation function
setValidity("ChromBackendTest", function(object) {
    if (length(object@rtime) != length(object@intensity) ||
        length(object@rtime) != nrow(object@chromVars))
        return(paste0("length of 'rtime' and 'intensity' has to match ",
                      "the number of rows of 'chromVars'"))
    NULL
})
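To see how such a validity function behaves, the sketch below uses a stand-alone toy class with the same three slots. ToyBackend is a hypothetical class that does not extend ChromBackend, so the example runs with the methods package alone; it is meant as an illustration, not as part of the backend implementation.

```r
library(methods)

## Hypothetical stand-alone class (not extending ChromBackend) with the
## same three slots as the example backend.
setClass("ToyBackend",
         slots = c(chromVars = "data.frame", rtime = "list",
                   intensity = "list"))

## The validity function returns a single character string on failure.
setValidity("ToyBackend", function(object) {
    if (length(object@rtime) != length(object@intensity) ||
        length(object@rtime) != nrow(object@chromVars))
        return("lengths of 'rtime' and 'intensity' have to match nrow of 'chromVars'")
    TRUE
})

## A consistent object passes validation.
ok <- new("ToyBackend", chromVars = data.frame(mz = c(100, 200)),
          rtime = list(1:3, 1:2), intensity = list(c(5, 6, 7), c(1, 2)))

## An inconsistent object is rejected: 2 rows but only 1 rtime element.
msg <- tryCatch(
    new("ToyBackend", chromVars = data.frame(mz = c(100, 200)),
        rtime = list(1:3), intensity = list(c(5, 6, 7))),
    error = function(e) conditionMessage(e))
msg
```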
We can now create an instance of our new class with the ChromBackendTest()
function.
#' Create an empty instance of ChromBackendTest
be <- ChromBackendTest()
be
A show()
method would provide a more convenient display of general information
about our object. Below we add an implementation of the show()
method.
#' implementation of show for ChromBackendTest
setMethod("show", "ChromBackendTest", function(object) {
    cd <- object@chromVars
    cat(class(object), "with", nrow(cd), "chromatograms\n")
})
be
Methods listed in this section must be implemented for a new class extending
ChromBackend
. Methods should ideally also be implemented in the order they are
listed here. Also, it is strongly advised to write dedicated unit tests for each
newly implemented method or function already during the development.
dataStorage()
The dataStorage
chromatogram variable provides information how or where the
data is stored. The dataStorage()
method should therefore return a character
vector of length equal to the number of chromatograms that are represented by
the object. The values for dataStorage
can be any character value, except
NA
. For our example backend we define a simple dataStorage()
method that
simply returns the column "dataStorage"
from the @chromVars
(as a
character
).
#' dataStorage method to provide information *where* data is stored
setMethod("dataStorage", "ChromBackendTest", function(object) {
    as.character(object@chromVars$dataStorage)
})
Calling dataStorage()
on our example backend will thus return an empty
character
(since the object created above does not contain any data).
dataStorage(be)
length()
length()
is expected to return an integer
of length 1 with the total number
of chromatograms that are represented by the backend. For our example backend we
simply return the number of rows of the data.frame
stored in the @chromVars
slot.
#' length to provide information on the number of chromatograms
setMethod("length", "ChromBackendTest", function(x) {
    nrow(x@chromVars)
})
length(be)
backendInitialize()
The backendInitialize()
method is expected to be called after creating an
instance of the backend class and should prepare (initialize) the backend with
data. This method can take any parameters needed by the backend to get
loaded/initialized with data (which can be file names from which to load the
data, a database connection or object(s) containing the data). During
backendInitialize() it is also suggested to set the special chromatogram
variables dataStorage and dataOrigin.
Below we define a backendInitialize() method that takes as arguments a
data.frame with chromatogram variables and two lists with the retention time
and intensity values for each chromatogram.
#' backendInitialize method to fill the backend with data.
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity) {
        if (!is.data.frame(chromVars))
            stop("'chromVars' needs to be a 'data.frame' with the general ",
                 "chromatogram variables")
        ## Defining dataStorage and dataOrigin, if not available
        if (is.null(chromVars$dataStorage))
            chromVars$dataStorage <- "<memory>"
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- "<user provided>"
        object@chromVars <- chromVars
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })
In addition to adding the data to object, the function also defines the
dataStorage and dataOrigin chromatogram variables. The purpose of these two
variables is to provide some information on where the data is currently stored
(in memory as in our example) and from where the data is originating.
We can now create an instance of our backend class and fill it with data. We
thus first define our MS data and pass this to the backendInitialize()
method.
#' A data.frame with chromatogram variables.
cvars <- data.frame(msLevel = c(1L, 1L, 1L), mz = c(112.2, 123.3, 134.4))

#' retention time values for each chromatogram.
rts <- list(c(12.4, 12.8, 13.2, 14.6), c(45.1, 46.2), c(64.4, 64.8, 65.2))

#' intensity values for each chromatogram.
ints <- list(c(123.3, 153.6, 2354.3, 243.4), c(100, 80.1),
             c(12.3, 135.2, 100))

#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(), chromVars = cvars,
                        rtime = rts, intensity = ints)
be
While this method works and is compliant with the ChromBackend API (because there
is no requirement on the input parameters for the backendInitialize() method),
it would be good practice for backends to support an additional parameter data
that would allow passing the complete MS data (including retention time and
intensity values) to the function as a DataFrame. This would simplify the
implementation of some replacement methods and would in addition also allow
changing the backend of a Chromatograms object to our new backend using the
setBackend() function. Also, it is highly recommended to check the validity of
the input data within the initialize method. The advantage of performing these
validity checks in backendInitialize() over adding them with setValidity() is
that potentially computationally expensive checks would only be performed
once instead of each time values within the object are changed (e.g. by
subsetting or similar), which would be the case with validation functionality
registered with setValidity().
We thus re-implement the backendInitialize()
method supporting also the data
parameter mentioned above and add additional validity checks. These validity
checks verify that only numeric values are provided with rtime
and
intensity
, and that the number of retention time and intensity values matches for
each chromatogram. We also use the validChromData()
function that checks that
provided core chromatogram variables have the correct data type.
#' Reimplementation of backendInitialize with a `data` parameter and
#' additional input validation
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity, data) {
        ## Extract relevant information from a parameter `data` if provided
        if (!missing(data)) {
            chromVars <- as.data.frame(
                data[, !colnames(data) %in% c("rtime", "intensity"),
                     drop = FALSE])
            if (any(colnames(data) == "rtime"))
                rtime <- data$rtime
            if (any(colnames(data) == "intensity"))
                intensity <- data$intensity
        }
        ## Check that provided variables have the correct data type
        validChromData(chromVars)
        n <- nrow(chromVars)
        ## Validate rtime and intensity
        if (missing(rtime))
            rtime <- vector("list", n)
        if (missing(intensity))
            intensity <- vector("list", n)
        if (length(rtime) != length(intensity) || length(rtime) != n)
            stop("lengths of 'rtime' and 'intensity' need to match the ",
                 "number of chromatograms (i.e., nrow of 'chromVars')")
        if (any(lengths(rtime) != lengths(intensity)))
            stop("the number of data values in 'rtime' and 'intensity' have ",
                 "to match")
        if (!all(vapply(rtime, is.numeric, logical(1))))
            stop("'rtime' has to be a list of numeric values")
        if (!all(vapply(intensity, is.numeric, logical(1))))
            stop("'intensity' has to be a list of numeric values")
        ## If rtime or intensity is of type NumericList convert to list
        if (inherits(rtime, "NumericList"))
            rtime <- as.list(rtime)
        if (inherits(intensity, "NumericList"))
            intensity <- as.list(intensity)
        ## Setting dataStorage and dataOrigin
        chromVars$dataStorage <- rep("<memory>", n)
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- rep("<user provided>", n)
        ## Fill object with data
        object@chromVars <- as.data.frame(chromVars)
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })
This extended backendInitialize()
implementation now also ensures data
validity and integrity. Below we use this function again to create our backend
instance.
#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(), chromVars = cvars,
                        rtime = rts, intensity = ints)
be
The backendInitialize()
method that we implemented for our backend class
expects the user to provide the full MS data. It would alternatively also be
possible to implement a method that takes data file names as input from which
the function can then import the data. The purpose of the backendInitialize()
method is to initialize and prepare the data in a way that it can be accessed
by a Chromatograms
object. Whether the data is actually loaded into memory or
simply referenced and loaded upon request does not matter as long as the backend
is able to provide the data through its accessor methods when requested by the
Chromatograms
object.
chromVariables()
The chromVariables()
method should return a character
vector with the names
of all available chromatogram variables of the backend. While a backend class
should support defining and providing their own variables, each ChromBackend
class must provide also the core chromatogram variables (in the correct
data type). These can be listed by the coreChromVariables()
function:
#' List core chromatogram variables along with data types.
coreChromVariables()
A typical chromVariables()
method for a ChromBackend
class will thus be
implemented similarly to the one for our ChromBackendTest
test backend: it
will return the union of the core chromatogram variables and the names for all
available chromatogram variables within the backend object.
#' Accessor for available chromatogram variables
setMethod("chromVariables", "ChromBackendTest", function(object) {
    union(names(coreChromVariables()), colnames(object@chromVars))
})
chromVariables(be)
chromData()
The chromData
method should return the full chromatogram data within a
backend as a DataFrame
object (defined in the r Biocpkg("S4Vectors")
package). A parameter columns
should allow to define the names of the
variables that should be returned. Each row in this data frame should represent
one chromatogram, each column a chromatogram variable. Columns "rtime"
and
"intensity"
(if requested) have to contain each a NumericList
with the
retention time and intensity values of the chromatograms. The DataFrame
must provide values (even if they are NA) for all requested chromatogram
variables of the backend (including the core chromatogram variables). The
fillCoreChromVariables() function from the Chromatograms package allows
completing (filling) a provided data.frame with possibly missing core
chromatogram variables (columns):
#' Get the data.frame with the available chrom variables
be@chromVars

#' Complete this data.frame with missing core variables
fillCoreChromVariables(be@chromVars)
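The filling step itself can be pictured in base R. The sketch below is not the implementation of fillCoreChromVariables(): fill_missing() is a hypothetical helper, and core_vars is an assumed, deliberately small subset of variables with assumed storage types (the authoritative list is the one returned by coreChromVariables()).

```r
## Hypothetical subset of core variables with assumed storage types.
core_vars <- c(msLevel = "integer", mz = "double",
               dataOrigin = "character")

## Hypothetical helper: add each missing column, filled with NA of the
## expected type.
fill_missing <- function(x, vars = core_vars) {
    for (v in setdiff(names(vars), colnames(x))) {
        na <- rep(NA, nrow(x))
        storage.mode(na) <- vars[[v]]
        x[[v]] <- na
    }
    x
}

cv <- data.frame(msLevel = c(1L, 1L), mz = c(112.2, 123.3))
filled <- fill_missing(cv)
filled
```

Using typed NA values keeps the column types stable, so a later type check on the core variables still passes for columns without data.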
We can thus use this function to add possibly missing core chromatogram
variables in the chromData
implementation for our backend:
#' function to extract the full chrom data; we would need to import the
#' `DataFrame()` function from the S4Vectors package and the `NumericList`
#' from the IRanges package.
setMethod(
    "chromData", "ChromBackendTest",
    function(object, columns = chromVariables(object)) {
        if (!all(columns %in% chromVariables(object)))
            stop("Some of the requested variables are not available")
        res <- S4Vectors::DataFrame(object@chromVars)
        ## Add rtime and intensity values to the result
        res$rtime <- IRanges::NumericList(object@rtime, compress = FALSE)
        res$intensity <- IRanges::NumericList(
            object@intensity, compress = FALSE)
        ## Fill with possibly missing core variables
        res <- fillCoreChromVariables(res)
        res[, columns, drop = FALSE]
    })
We can now use chromData()
to either extract the full chromatogram data from
the backend, or only the data for selected variables.
#' Extract the full data
chromData(be)

#' Selected variables
chromData(be, c("rtime", "mz", "msLevel"))

#' Only missing core chromatogram variables
chromData(be, c("collisionEnergy", "mzMin"))
peaksData()
The peaksData()
method extracts the chromatographic data (peaks), i.e., the
chromatograms' retention time and intensity values. This data is returned as a
list
of arrays, with one array per chromatogram with columns being the peaks
variables (retention time and intensity values) and rows the individual data
pairs. Each backend must provide retention times and intensity values with this
method, but additional peaks variables (columns) are also supported.
Below we implement the peaksData() method for our backend. Due to the way we
stored the retention time and intensity values within our object we need to loop
over the respective lists (in @rtime and @intensity) and combine the values
of each chromatogram to an array (matrix). Since our backend does not allow
any additional peaks variables we allow columns to be only
c("rtime", "intensity"), and also only in that specific order.
#' method to extract the full chromatographic data as list of arrays
setMethod(
    "peaksData", "ChromBackendTest",
    function(object, columns = c("rtime", "intensity")) {
        ## Only exactly c("rtime", "intensity"), in that order, is supported
        if (!identical(columns, c("rtime", "intensity")))
            stop("'columns' supports only \"rtime\" and \"intensity\"")
        mapply(rtime = object@rtime, intensity = object@intensity,
               FUN = cbind, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    })
And with this method we can now extract the peaks data from our backend.
#' Extract the *peaks* data (i.e. intensity and retention times)
peaksData(be)
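The element-wise combination performed in peaksData() can be tried out on plain lists, independent of any backend class. The sketch below uses made-up values; the alternative layout at the end (storing the matrices and deriving rtime from them) is only an assumption about how a different backend might be organized.

```r
## Toy retention time and intensity values for two chromatograms.
rts <- list(c(12.4, 12.8, 13.2), c(45.1, 46.2))
ints <- list(c(123.3, 153.6, 2354.3), c(100, 80.1))

## Combine element-wise: the argument names become the column names.
pks <- mapply(rtime = rts, intensity = ints, FUN = cbind,
              SIMPLIFY = FALSE, USE.NAMES = FALSE)
pks

## Alternative layout (an assumption, not what the example backend does):
## keep `pks` as the stored data and derive rtime from it on request.
rt_from_pks <- lapply(pks, function(z) z[, "rtime"])
```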
Since the peaksData()
method is the main function used by a Chromatograms
to
retrieve data from the backend (and further process the values), this method
should be implemented in an efficient way. Due to the way we store the data
within our example backend we need to loop over the @rtime
and @intensity
slots. A different implementation that stores the peaks data already as a list
of arrays would be more efficient for this operation (but possibly slower for
some other operations, such as extracting peaks variables separately with the
rtime() or intensity() functions).
[
The [
method allows to subset ChromBackend
objects. This operation is
expected to reduce a ChromBackend
object to the selected chromatograms without
changing values for the subset chromatograms. The method should support
subsetting by indices or logical vectors and should also support duplicating
elements (i.e., when duplicated indices are used) as well as subsetting in
arbitrary order. An error should be thrown if indices are out of bounds, but the
method should also support returning an empty backend with [integer()]
. The
MsCoreUtils::i2index
function can be used to check and convert the provided
parameter i
(defining the subset) to an integer vector.
Below we implement a possible [
for our test backend class. We ignore the
parameter j
from the definition of the [
generic, since we treat our data
to be one-dimensional (with each chromatogram being one element).
#' Main subset method.
setMethod("[", "ChromBackendTest", function(x, i, j, ..., drop = FALSE) {
    i <- MsCoreUtils::i2index(i, length = length(x))
    ## drop = FALSE keeps a data.frame even if only one column is present
    x@chromVars <- x@chromVars[i, , drop = FALSE]
    x@rtime <- x@rtime[i]
    x@intensity <- x@intensity[i]
    x
})
We can now subset our backend to the last two chromatograms.
a <- be[2:3]
chromData(a)
Or extracting the second chromatogram multiple times.
a <- be[c(2, 2, 2)]
chromData(a)
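The index handling expected from the subset method (logical or integer input, duplicates, arbitrary order, an empty subset, and an error for out-of-bounds values) can be sketched in base R. to_index() below is a hypothetical helper written for illustration; it is not the implementation of MsCoreUtils::i2index and, as a simplification, it rejects negative indices.

```r
## Hypothetical helper mimicking the checks described above.
to_index <- function(i, n) {
    if (is.logical(i)) {
        if (length(i) != n)
            stop("logical 'i' has to be of length ", n)
        return(which(i))
    }
    i <- as.integer(i)
    if (anyNA(i) || any(i < 1L | i > n))
        stop("index out of bounds")
    i
}

to_index(c(TRUE, FALSE, TRUE), 3)   # logical vector
to_index(c(2, 2, 3, 1), 3)          # duplicates and arbitrary order
to_index(integer(), 3)              # empty subset
```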
$
The $
method is expected to extract a single chromatogram variable from a
backend. Parameter name
should allow to name the chromatogram variable to
return. Each ChromBackend
must support extracting the core chromatogram
variables with this method (even if no data might be available for that
variable). In our example implementation below we make use of the chromData()
method, but more efficient implementations might be possible as well (that would
not require to first subset/create a DataFrame
with the full data and to then
subset that again to an individual column). Also, the $
method should check if
the requested chromatogram variable is available and should throw an error otherwise.
#' Access a single chromatogram variable
setMethod("$", "ChromBackendTest", function(x, name) {
    chromData(x, columns = name)[, 1L]
})
With this we can now extract the MS levels
be$msLevel
or a core chromatogram variable without values in our example backend.
be$precursorMz
or also the intensity values
be$intensity
backendMerge()
The backendMerge()
method merges (combines) ChromBackend
objects (of the
same type!) into a single instance. For our test backend we thus need to combine
the values in the @chromVars
, @rtime
and @intensity
slots. To support also
merging of data.frame
s with different sets of columns we use the
MsCoreUtils::rbindFill
function instead of a simple rbind
(this function joins data frames making a union of all available columns,
filling possibly missing columns with NA).
#' Method allowing to join (concatenate) backends
setMethod("backendMerge", "ChromBackendTest", function(object, ...) {
    res <- object
    object <- unname(c(list(object), list(...)))
    res@rtime <- do.call(c, lapply(object, function(z) z@rtime))
    res@intensity <- do.call(c, lapply(object, function(z) z@intensity))
    res@chromVars <- do.call(MsCoreUtils::rbindFill,
                             lapply(object, function(z) z@chromVars))
    validObject(res)
    res
})
Testing the function by merging the example backend instance with itself.
a <- backendMerge(be, be[2], be)
a
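What joining data frames on the union of their columns means can be shown with a few lines of base R. rbind_fill() below is a hypothetical helper for illustration only; it is not the implementation of MsCoreUtils::rbindFill and ignores details such as factor columns.

```r
## Hypothetical helper: row-bind data frames on the union of their
## columns, filling columns missing in one input with NA.
rbind_fill <- function(...) {
    dfs <- list(...)
    cols <- unique(unlist(lapply(dfs, colnames)))
    dfs <- lapply(dfs, function(d) {
        for (cn in setdiff(cols, colnames(d)))
            d[[cn]] <- rep(NA, nrow(d))
        d[, cols, drop = FALSE]
    })
    do.call(rbind, dfs)
}

a <- data.frame(msLevel = c(1L, 1L), mz = c(112.2, 123.3))
b <- data.frame(msLevel = 2L, name = "B")
m <- rbind_fill(a, b)
m
```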
As stated in the general description, ChromBackend
implementations can also be
purely read-only resources allowing to just access, but not to replace
data. For these backends isReadOnly()
should return TRUE
. Data replacement
methods listed in this section would not need to be implemented. Our example
backend stores the full data in memory, within the object, and hence we can
easily change and replace values.
Since we support replacing values we also implement the isReadOnly()
method
for our example implementation to return FALSE
(instead of the default
TRUE
).
#' Default for backends:
isReadOnly(be)

#' Implementation of isReadOnly for ChromBackendTest
setMethod("isReadOnly", "ChromBackendTest", function(object) FALSE)
isReadOnly(be)
All data replacement functions are expected to return an instance of the same backend class that was used as input.
chromData<-
The main replacement method is chromData<-
which should allow to replace the
content of a backend with new data. This data is expected to be provided as a
DataFrame
(similar to the one returned by chromData()
). Also the method is
expected to replace the full data within the backend, i.e., all chromatogram
and peaks variables. While values can be replaced, the number of chromatograms
before and after a call to chromData<-
has to be the same. For our example
implementation of chromData<-
we can re-use the backendInitialize()
method
defined before, with the data
parameter.
#' Replacement method for the full chromatogram data
setReplaceMethod("chromData", "ChromBackendTest", function(object, value) {
    if (!inherits(value, "DataFrame"))
        stop("'value' is expected to be a 'DataFrame'")
    if (length(object) && length(object) != nrow(value))
        stop("'value' has to be a 'DataFrame' with ", length(object), " rows")
    object <- backendInitialize(ChromBackendTest(), data = value)
    object
})
To test this new method we extract the full chromatogram data from our example
data set, add an additional column (chromatogram variable) and use chromData<-
to replace the data of the backend.
d <- chromData(be)
d$new_col <- c("a", "b", "c")
chromData(be) <- d
Check that we have now also the new column available.
be$new_col
$<-
The $<-
method should allow to replace values for an existing chromatogram
variable or to add an additional variable to the backend. As with all
replacement methods, the length
of value
has to match the number of
chromatograms represented by the backend. For replacement of retention time or
intensity values we need also to ensure that the data would be correct after the
operation, i.e., that the number of retention time and intensity values per
chromatogram are identical and that all retention time and intensity values
are numeric. Finally, we use the validChromData()
function to ensure that,
after replacement, all core chromatogram variables have the correct data type.
#' Replace or add a single chromatogram variable.
setReplaceMethod("$", "ChromBackendTest", function(x, name, value) {
    if (length(value) != length(x))
        stop("length of 'value' needs to match the number of chromatograms ",
             "in object.")
    if (name %in% c("rtime", "intensity")) {
        ## In case retention time or intensity values are provided as
        ## NumericList convert to a list.
        if (is(value, "NumericList"))
            value <- as.list(value)
        ## Ensure number of retention time and intensity values match:
        ## compare against the peaks variable that is *not* replaced
        other <- if (name == "rtime") x@intensity else x@rtime
        if (!all(lengths(value) == lengths(other)))
            stop("Number of retention time values needs to match number of ",
                 "intensity values.")
        ## Ensure all values are numeric
        if (!all(vapply(value, is.numeric, logical(1))))
            stop("For replacement of retention time or intensity values, ",
                 "'value' is expected to be a list of numeric vectors.")
        if (name == "rtime")
            x@rtime <- value
        if (name == "intensity")
            x@intensity <- value
    } else x@chromVars[[name]] <- value
    ## Check that data types are correct after replacement
    validChromData(x@chromVars)
    x
})
We can thus replace an existing chromatogram variable, such as msLevel
:
#' Values before replacement
be$msLevel

#' Replace MS levels
be$msLevel <- c(3L, 2L, 1L)

#' Values after replacement
be$msLevel
We can also add a new chromatogram variable:
#' Add a new chromatogram variable
be$name <- c("A", "B", "C")
be$name
Or also replace intensity values. Below we replace the intensity values by adding a value of +3 to each.
#' Replace intensity values
be$intensity <- be$intensity + 3
be$intensity
selectChromVariables()
The selectChromVariables()
function should subset the content of a backend to
the selected chromatogram variables, that can be specified with parameter
chromVariables
. As a result the input backend should be returned, but reduced
to the selected chromatogram variables. This function thus adds a subset
operation that reduces the data in a backend by columns, dropping all
chromatogram variables other than the ones specified with the chromVariables
parameter. In the implementation we need to give special care to variables
"rtime"
and "intensity"
. If both are about to be removed we need to
initialize the @rtime
and @intensity
slots with empty lists matching the
number of chromatograms in our backend. If only "intensity"
values are to be
removed we replace them with NA_real_
while removing only "rtime"
is not
supported (also because retention time values of NA
are not allowed).
#' Method to *subset* a backend by chromatogram variables (columns)
setMethod(
    "selectChromVariables", "ChromBackendTest",
    function(object, chromVariables = chromVariables(object)) {
        keep <- colnames(object@chromVars) %in% chromVariables
        object@chromVars <- object@chromVars[, keep, drop = FALSE]
        ## If neither "rtime" nor "intensity" is in chromVariables:
        ## initialize with empty vectors.
        if (!any(c("rtime", "intensity") %in% chromVariables)) {
            object@rtime <- vector("list", length(object))
            object@intensity <- vector("list", length(object))
        } else {
            ## intensity not in chromVariables: replace intensity values
            ## with NA
            if (!"intensity" %in% chromVariables)
                object@intensity <- lapply(object@intensity,
                                           function(z) rep(NA_real_, length(z)))
            ## removal of only rtime is not supported
            if (!"rtime" %in% chromVariables)
                stop("Exclusive removal of retention times is not supported. ",
                     "Retention times can only be removed if also intensity ",
                     "values are removed.")
        }
        validObject(object)
        object
    })
We can now restrict the data set to only selected chrom variables:
#' keep only dataStorage and msLevel
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel"))
chromData(be_2)
Replacing/removing intensity values would be possible:
#' Keep dataStorage, msLevel, mz and rtime
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel", "mz", "rtime"))
chromData(be_2)
All intensity values are thus NA. Removing only retention time values would (should) throw an error.
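The two replacement strategies used for the peaks variables can be seen on plain lists. The values below are made up for illustration; the sketch only shows what the slots look like after each kind of removal.

```r
## Toy retention time and intensity values for two chromatograms.
rts <- list(c(12.4, 12.8, 13.2), c(45.1, 46.2))
ints <- list(c(123.3, 153.6, 2354.3), c(100, 80.1))

## Dropping "intensity" but keeping "rtime": keep the shape, blank the
## values, so each retention time still has a (missing) partner.
ints_na <- lapply(ints, function(z) rep(NA_real_, length(z)))

## Dropping both: one empty (NULL) element per chromatogram.
empty <- vector("list", length(rts))
```

Keeping the number of values per chromatogram identical in both lists is what allows a validity check on matching lengths to pass after the removal.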
peaksData<-
The peaksData<-
method should allow to replace the full peaks data (retention
time and intensity value pairs) of all chromatograms in a backend. As value
a
list
of arrays (e.g. two column numeric
matrices) should be provided with
columns names "rtime"
and "intensity"
. Because the full peaks data is
provided at once, this method can (and should) support changing also the number
of peaks per chromatogram (which methods like rtime<- or $rtime would
not allow). In our implementation we need to ensure that a) the provided list
is of length equal to the number of chromatograms and b) each element is a
numeric
matrix with "rtime"
and "intensity"
columns from which we can
extract the values.
#' replacement method for peaks data
setReplaceMethod("peaksData", "ChromBackendTest", function(object, value) {
    if (!(is.list(value) || inherits(value, "SimpleList")))
        stop("'value' has to be a list-like object")
    if (!length(value) == length(object))
        stop("The length of the provided list has to match the number of ",
             "chromatograms in 'object'")
    ## First loop to check also for validity of the matrices, i.e. each
    ## element has to be a `numeric` `matrix` with columns named "rtime"
    ## and "intensity"
    object@rtime <- lapply(value, function(z) {
        if (!is.matrix(z) || !is.numeric(z))
            stop("'value' is expected to be a 'list' of numeric matrices")
        if (!all(c("rtime", "intensity") %in% colnames(z)))
            stop("All matrices in 'value' need to have columns named ",
                 "\"rtime\" and \"intensity\"")
        z[, "rtime"]
    })
    object@intensity <- lapply(value, "[", , "intensity")
    validObject(object)
    object
})
With this method we can now replace the peaks data of a backend:
#' Create a list with peaks matrices; our backend has 3 chromatograms
#' thus our `list` has to be of length 3
tmp <- list(
    cbind(rtime = c(12.3, 14.4, 15.4, 16.4),
          intensity = c(200, 312, 354.1, 232)),
    cbind(rtime = c(14.4),
          intensity = c(13.4)),
    cbind(rtime = c(223.2, 223.8, 234.1, 234.5, 234.9),
          intensity = c(12.3, 45.3, 65.3, 51.1, 29.3))
)

#' Assign this peaks data to one of our test backends
peaksData(be_2) <- tmp

#' Evaluate that we properly added the peaks data
peaksData(be_2)
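The checks a) and b) described above can also be separated from the assignment. The sketch below validates a list of peak matrices first and only then splits it into retention time and intensity lists; split_peaks() is a hypothetical helper, not part of the Chromatograms API.

```r
## Hypothetical helper: validate a list of peak matrices and split it
## into separate rtime and intensity lists.
split_peaks <- function(value, n) {
    if (!is.list(value) || length(value) != n)
        stop("'value' has to be a list of length ", n)
    ok <- vapply(value, function(z) {
        is.matrix(z) && is.numeric(z) &&
            all(c("rtime", "intensity") %in% colnames(z))
    }, logical(1))
    if (!all(ok))
        stop("each element needs to be a numeric matrix with columns ",
             "\"rtime\" and \"intensity\"")
    list(rtime = lapply(value, function(z) unname(z[, "rtime"])),
         intensity = lapply(value, function(z) unname(z[, "intensity"])))
}

pk <- list(cbind(rtime = c(12.3, 14.4), intensity = c(200, 312)),
           cbind(rtime = 14.4, intensity = 13.4))
res <- split_peaks(pk, n = 2)
res
```

Validating everything before assigning avoids leaving an object half-updated when a later element turns out to be invalid.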
Default implementations for the ChromBackend
class are available for a large
number of methods. Thus, any backend extending this class will automatically
inherit these default implementations. Alternative, class-specific versions
can be developed, but are not required. The default versions are defined in the
R/ChromBackend.R file, and also listed in this section. If alternative
versions are implemented it should be ensured that the expected data type is
always used for core chromatogram variables. Use coreChromVariables()
to list
these mandatory data types.
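As a rough illustration of such a data type check, the sketch below compares the columns of a small data frame against a set of expected types. The `expected` vector is a hypothetical subset written out by hand for this example; the authoritative names and types are those reported by coreChromVariables().

```r
## Hypothetical subset of expected data types; the authoritative list is
## the one returned by coreChromVariables()
expected <- c(msLevel = "integer", mz = "numeric", dataOrigin = "character")

## Example chromatogram data with one column of the wrong type
cd <- data.frame(msLevel = c(1, 2),          # double, should be integer
                 mz = c(314.3, 312.5),
                 dataOrigin = c("a", "b"),
                 stringsAsFactors = FALSE)

## Check every column against its expected type with is()
ok <- vapply(names(expected), function(v) {
    methods::is(cd[[v]], expected[[v]])
}, logical(1))
ok
```

Here only `msLevel` is flagged, because `c(1, 2)` is stored as double; writing `c(1L, 2L)` would satisfy the check.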
backendParallelFactor()
The backendParallelFactor()
function allows a backend to suggest a preferred
way it could be split for parallel processing. The default implementation
returns factor()
(i.e. a factor
of length 0) hence not suggesting any
specific splitting setup.
#' Is there a specific way how the object could be best split for
#' parallel processing?
setMethod("backendParallelFactor", "ChromBackend", function(object, ...) {
    factor()
})
backendParallelFactor(be)
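To give an idea of what a non-empty result could look like, the sketch below builds a factor from made-up data storage values, as a file-based backend might do to suggest a per-file split. The file names and the per-file strategy are assumptions for illustration only.

```r
## Hypothetical data storage values of five chromatograms from two files
data_storage <- c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "b.mzML")

## A factor a file-based backend *could* return to suggest a per-file split
f <- factor(data_storage)

## Indices of the chromatograms that would end up in each chunk
chunks <- split(seq_along(f), f)
chunks

## The default, factor(), has length 0 and suggests no particular split
length(factor())
```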
chromVariables()
The chromVariables()
function is expected to return the names of all available
chromatogram variables (which should include the core chromatogram
variables). The default implementation is:
#' get the available chromatogram variables.
setMethod("chromVariables", "ChromBackend", function(object) {
    colnames(chromData(object))
})
The result from calling the default implementation on our test backend:
chromVariables(be)
chromIndex()
The chromIndex()
function should return the value for the "chromIndex"
chromatogram variable. As a result, an integer
of length equal to the number
of chromatograms in object
needs to be returned. The default implementation
is:
#' get the values for the chromIndex chromatogram variable
setMethod("chromIndex", "ChromBackend",
          function(object, columns = chromVariables(object)) {
              chromData(object, columns = "chromIndex")[, 1L]
          })
The result of calling this method on our test backend:
chromIndex(be)
collisionEnergy()
The collisionEnergy()
function should return the value for the
"collisionEnergy"
chromatogram variable. As a result, a numeric
of length
equal to the number of chromatograms has to be returned. The default
implementation is:
#' get the values for the collisionEnergy chromatogram variable
setMethod("collisionEnergy", "ChromBackend", function(object) {
    chromData(object, columns = "collisionEnergy")[, 1L]
})
The result of calling this method on our test backend:
collisionEnergy(be)
The default replacement method for the collisionEnergy
chromatogram variable
is:
#' Default replacement method for collisionEnergy
setReplaceMethod(
    "collisionEnergy", "ChromBackend",
    function(object, value) {
        object$collisionEnergy <- value
        object
    })
This method thus makes use of the $<-
replacement method we implemented
above. To test this function we replace the collision energy below.
#' Replace the collision energy
collisionEnergy(be) <- c(20, 30, 20)
collisionEnergy(be)
dataOrigin()
, dataOrigin<-
The dataOrigin()
and dataOrigin<-
methods return or set the value(s) for the
"dataOrigin"
chromatogram variable. The values for this chromatogram variable
need to be of type character
(the length equal to the number of
chromatograms). The default implementation for dataOrigin()
is:
#' Default implementation to access dataOrigin
setMethod("dataOrigin", "ChromBackend", function(object) {
    chromData(object, columns = "dataOrigin")[, 1L]
})
Below we use this method to access the values of the dataOrigin
chromatogram
variable.
#' Access the dataOrigin values
dataOrigin(be)
The default implementation for dataOrigin<-
uses, like all defaults for
replacement methods, the $<-
method:
#' Default implementation of the `dataOrigin<-` replacement method
setReplaceMethod("dataOrigin", "ChromBackend", function(object, value) {
    object$dataOrigin <- value
    object
})
For our backend we can change the values of the dataOrigin
variable:
#' Replace the backend's dataOrigin values
dataOrigin(be) <- rep("from somewhere", 3)
dataOrigin(be)
dataStorage()
, dataStorage<-
Similarly, the dataStorage()
and dataStorage<-
methods should allow to get
or set the data storage chromatogram variable. Values of the dataStorage
chromatogram variable are expected to be of type character
and for each
chromatogram in a backend one value needs to be defined (which can not be
NA_character_
). The default implementation for dataStorage()
uses, like most
access methods, the chromData()
function:
#' Default implementation to access dataStorage
setMethod("dataStorage", "ChromBackend", function(object) {
    chromData(object, columns = "dataStorage")[, 1L]
})
Below we use this method to access the values of the dataStorage
chromatogram
variable.
#' Access the dataStorage values
dataStorage(be)
Note that this variable is supposed to provide information on the location where
the data is stored and hence for some type of backends it might not be possible
or advised to let the user change its values. For such backends a
dataStorage<-
replacement method should be implemented specifically that
throws an error if values would be replaced with potentially invalid values. The
default implementation for this method uses, like all defaults for replacement
methods, the $<-
method:
#' Default implementation of the `dataStorage<-` replacement method
setReplaceMethod("dataStorage", "ChromBackend", function(object, value) {
    object$dataStorage <- value
    object
})
For our backend we can change the values of the dataStorage
variable:
#' Replace the backend's dataStorage values
dataStorage(be) <- c("here", "here", "here")
dataStorage(be)
intensity()
, intensity<-
The intensity()
and intensity<-
methods allow to extract or set the
intensity values of the individual chromatograms represented by the backend. The
default for the intensity()
function, which is expected to return a list
of
numeric
values with the intensity values of each chromatogram, uses also the
chromData()
method:
#' Default method to extract intensity values
setMethod("intensity", "ChromBackend", function(object) {
    chromData(object, columns = "intensity")[, 1L]
})
Based on the way our example backend implementation stores the data, accessing
the intensity values in this way would not be very efficient. It would be much
faster to directly return the content of the @intensity
slot, converting that
into the expected NumericList
. Thus we implement below a more efficient
version of the method specifically for our backend:
#' Alternative implementation for our backend
setMethod("intensity", "ChromBackendTest", function(object) {
    IRanges::NumericList(object@intensity, compress = FALSE)
})
intensity(be)
The default replacement method for intensity values uses the $<-
method:
#' Default implementation of the replacement method for intensity values
setReplaceMethod("intensity", "ChromBackend", function(object, value) {
    object$intensity <- value
    object
})
Also here we could implement an alternative version that replaces directly the
content of the @intensity
slot. We implement such a replacement method further
below for the rtime<-
method. Here we simply use the default implementation to
replace the intensity values with original values divided by 10.
#' Replace intensity values
intensity(be) <- intensity(be) / 10
intensity(be)
isEmpty()
The isEmpty()
method is a simple helper function to evaluate whether chromatograms
are empty, i.e. have no peaks (retention time and intensity values). It should
return a logical vector of length equal to the number of chromatograms in the
backend with TRUE
if a chromatogram is empty and FALSE
otherwise. The
default implementation uses the lengths()
method (defined further below) that
returns for each chromatogram the number of available data points (peaks).
#' Default implementation for `isEmpty()`
setMethod("isEmpty", "ChromBackend", function(x) {
    lengths(x) == 0L
})
isEmpty(be)
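The relationship between the number of data points per chromatogram and the "empty" flag can be illustrated on a plain list in base R. The intensity values below are made up; the list stands in for the per-chromatogram values a backend would provide.

```r
## Intensity values of three hypothetical chromatograms; the second has no peaks
ints <- list(c(200, 312, 354.1), numeric(), c(12.3, 45.3))

n_peaks <- lengths(ints)   # number of data points per chromatogram
n_peaks                    # 3 0 2
n_peaks == 0L              # FALSE TRUE FALSE
```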
isReadOnly()
As discussed above, backends can also be read-only, allowing users to
access, but not to change any values (e.g. if the data is stored in a database
and the connection to this database does not support updating or replacing
data). In such cases, the default isReadOnly()
method can be used, which
always returns TRUE
:
#' Default implementation of `isReadOnly()`
setMethod("isReadOnly", "ChromBackend", function(object) {
    TRUE
})
Backends that support changing data values should implement their own version
(like we did above) to return FALSE
instead:
isReadOnly(be)
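One way such a flag might be used is as a guard in code that modifies data. The sketch below shows the pattern with toy S4 classes; all class, generic and function names (`ToyBackend`, `ToyWritableBackend`, `toyIsReadOnly()`, `toy_set_values()`) are hypothetical and unrelated to the Chromatograms package.

```r
library(methods)

## Toy classes and generic, all hypothetical and unrelated to Chromatograms
setClass("ToyBackend", representation(values = "numeric"))
setClass("ToyWritableBackend", contains = "ToyBackend")

setGeneric("toyIsReadOnly", function(object) standardGeneric("toyIsReadOnly"))
## Default: read-only, mirroring the default described above
setMethod("toyIsReadOnly", "ToyBackend", function(object) TRUE)
## Backends that support changing values override the default
setMethod("toyIsReadOnly", "ToyWritableBackend", function(object) FALSE)

## A setter that refuses to modify read-only backends
toy_set_values <- function(object, value) {
    if (toyIsReadOnly(object))
        stop("backend is read-only")
    object@values <- value
    object
}

ro <- new("ToyBackend", values = c(1, 2))
rw <- new("ToyWritableBackend", values = c(1, 2))
toy_set_values(rw, c(3, 4))@values
tryCatch(toy_set_values(ro, c(3, 4)),
         error = function(e) conditionMessage(e))
```

The writable toy backend accepts the new values, while the call on the read-only one is turned into an error message by `tryCatch()`.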
length()
The length()
method should return a single integer
with the total number of
chromatograms available through the backend. The default implementation for this
function is:
#' Default implementation for `length()`
setMethod("length", "ChromBackend", function(x) {
    nrow(chromData(x, columns = "dataStorage"))
})
length(be)
lengths()
The lengths()
function should return the number of data pairs (peaks;
retention time and intensity values) per chromatogram. The result should be an
integer
vector (of length equal to the number of chromatograms in the backend)
with these counts. The default implementation uses the intensity()
function.
#' Default implementation for `lengths()`
setMethod("lengths", "ChromBackend", function(x) {
    lengths(intensity(x))
})
The number of peaks for our test backend:
lengths(be)
msLevel()
, msLevel<-
The msLevel()
and msLevel<-
methods should allow extracting and setting the
MS level for the individual chromatograms. MS levels are encoded as integer
,
thus, msLevel()
must return an integer
vector of length equal to the number
of chromatograms of the backend and msLevel<-
should take/accept such a vector
as input. The default implementations for both methods are shown below.
#' Default methods to get or set MS levels
setMethod("msLevel", "ChromBackend", function(object) {
    chromData(object, columns = "msLevel")[, 1L]
})
setReplaceMethod("msLevel", "ChromBackend", function(object, value) {
    object$msLevel <- value
    object
})
To test these methods, we replace the MS levels of our test data set below and extract the values again.
msLevel(be) <- c(1L, 2L, 4L)
msLevel(be)
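Because MS levels are expected to be of type integer, it matters how the values are written in R. The short base R illustration below shows the difference between double and integer literals and one way to convert.

```r
typeof(c(1, 2, 4))          # "double": plain numbers are stored as double
typeof(c(1L, 2L, 4L))       # "integer": the L suffix creates integers

## Converting existing values before assigning them as MS levels
lvl <- as.integer(c(1, 2, 4))
is.integer(lvl)             # TRUE
```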
mz()
, mz<-
The mz()
and mz<-
methods should allow to extract or set the m/z value for
each chromatogram. The m/z value of a chromatogram is encoded as numeric
,
thus, the methods are expected to return or accept a numeric
vector of length
equal to the number of chromatograms. The default implementations are shown
below.
#' Default implementations to get or set m/z value(s)
setMethod("mz", "ChromBackend", function(object) {
    chromData(object, columns = "mz")[, 1L]
})
setReplaceMethod("mz", "ChromBackend", function(object, value) {
    object$mz <- value
    object
})
Below we set and extract these target m/z values.
mz(be) <- c(314.3, 312.5, 542.1)
mz(be)
mzMax()
, mzMax<-
The mzMax()
and mzMax<-
methods should allow to extract or set the upper m/z
boundary for each chromatogram. m/z values are encoded as numeric
, thus, the
methods are expected to return or accept a numeric
vector of length equal to
the number of chromatograms. The default implementations are shown below.
#' Default implementations to get or set upper m/z limits
setMethod("mzMax", "ChromBackend", function(object) {
    chromData(object, columns = "mzMax")[, 1L]
})
setReplaceMethod("mzMax", "ChromBackend", function(object, value) {
    object$mzMax <- value
    object
})
Testing these functions by replacing the upper m/z boundary with new values.
mzMax(be) <- mz(be) + 0.01
mzMax(be)
mzMin()
, mzMin<-
The mzMin()
and mzMin<-
methods should allow to extract or set the lower m/z
boundary for each chromatogram. m/z values are encoded as numeric
, thus, the
methods are expected to return or accept a numeric
vector of length equal to
the number of chromatograms. The default implementations are shown below.
#' Default methods to get or set the lower m/z boundary
setMethod("mzMin", "ChromBackend", function(object) {
    chromData(object, columns = "mzMin")[, 1L]
})
setReplaceMethod("mzMin", "ChromBackend", function(object, value) {
    object$mzMin <- value
    object
})
Testing these functions by replacing the lower m/z boundary with new values.
mzMin(be) <- mz(be) - 0.01
mzMin(be)
peaksVariables()
The peaksVariables()
function is supposed to provide the names of the
available peaks variables. Backends must provide retention time and
intensity values, thus, the default implementation simply returns c("rtime",
"intensity")
. If additional peaks variables would be available, these could
also be listed by the peaksVariables()
method.
#' Default implementation for peaksVariables()
setMethod(
    "peaksVariables", "ChromBackend",
    function(object) {
        c("rtime", "intensity")
    })
peaksVariables(be)
precursorMz()
, precursorMz<-
The precursorMz()
and precursorMz<-
methods are expected to get or set the
values for the precursor m/z of each chromatogram (if available). These are
encoded as numeric
(one value per chromatogram) - and if a value is not
available NA_real_
should be returned. The default implementations are:
#' Default implementations to get or set the precursorMz chrom variable
setMethod("precursorMz", "ChromBackend", function(object) {
    chromData(object, columns = "precursorMz")[, 1L]
})
setReplaceMethod("precursorMz", "ChromBackend", function(object, value) {
    object$precursorMz <- value
    object
})
Below we set and get the precursorMz
chromatogram variable for our backend.
precursorMz(be) <- c(NA_real_, 123.3, 314.2)
precursorMz(be)
precursorMzMax()
, precursorMzMax<-
These methods are expected to get and set the precursorMzMax
chromatogram variable. The default implementations are:
#' Default implementations for `precursorMzMax`
setMethod("precursorMzMax", "ChromBackend", function(object) {
    chromData(object, columns = "precursorMzMax")[, 1L]
})
setReplaceMethod("precursorMzMax", "ChromBackend", function(object, value) {
    object$precursorMzMax <- value
    object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
precursorMzMax(be) <- precursorMz(be) + 0.1
precursorMzMax(be)
precursorMzMin()
, precursorMzMin<-
These methods are expected to get and set the precursorMzMin
chromatogram variable. The default implementations are:
#' Default implementations for `precursorMzMin`
setMethod("precursorMzMin", "ChromBackend", function(object) {
    chromData(object, columns = "precursorMzMin")[, 1L]
})
setReplaceMethod("precursorMzMin", "ChromBackend", function(object, value) {
    object$precursorMzMin <- value
    object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
precursorMzMin(be) <- precursorMz(be) - 0.1
precursorMzMin(be)
productMz()
, productMz<-
These methods are expected to get and set the productMz
chromatogram
variable. The default implementations are:
#' Default implementations for `productMz`
setMethod("productMz", "ChromBackend", function(object) {
    chromData(object, columns = "productMz")[, 1L]
})
setReplaceMethod("productMz", "ChromBackend", function(object, value) {
    object$productMz <- value
    object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
productMz(be) <- c(123.2, NA_real_, NA_real_)
productMz(be)
productMzMax()
, productMzMax<-
These methods are expected to get and set the productMzMax
chromatogram variable. The default implementations are:
#' Default implementations for `productMzMax`
setMethod("productMzMax", "ChromBackend", function(object) {
    chromData(object, columns = "productMzMax")[, 1L]
})
setReplaceMethod("productMzMax", "ChromBackend", function(object, value) {
    object$productMzMax <- value
    object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
productMzMax(be) <- productMz(be) + 0.02
productMzMax(be)
productMzMin()
, productMzMin<-
These methods are expected to get and set the productMzMin
chromatogram variable. The default implementations are:
#' Default implementations for `productMzMin`
setMethod("productMzMin", "ChromBackend", function(object) {
    chromData(object, columns = "productMzMin")[, 1L]
})
setReplaceMethod("productMzMin", "ChromBackend", function(object, value) {
    object$productMzMin <- value
    object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
productMzMin(be) <- productMz(be) - 0.2
productMzMin(be)
rtime()
, rtime<-
The rtime()
and rtime<-
methods allow to get and set the retention times of
the individual chromatograms of the backend. Similar to the method for the
intensity values described above they should return or accept a NumericList
,
each element being a numeric
vector with the retention time values of one
chromatogram. The default implementations of these methods are shown below.
#' Default methods for `rtime()` and `rtime<-`
setMethod("rtime", "ChromBackend", function(object) {
    chromData(object, columns = "rtime")[, 1L]
})
setReplaceMethod("rtime", "ChromBackend", function(object, value) {
    object$rtime <- value
    object
})
Also these methods use the chromData()
function to extract the retention time values
and the $<-
to replace them. Due to the way the data is stored in our example
backend implementation this is not the best/most efficient way to get or set
these values. Instead, we could implement the rtime()
function similar to
intensity()
above. For rtime<-
we implement below a version that takes a
list
or NumericList
as input and directly replaces the values of the
@rtime
slot. In this method we need also to ensure that the provided data is
in the correct format, that the number of values per chromatogram matches the
expected values and that no missing values are provided (NA_real_
values are
not supported for retention time).
#' Implementation of `rtime<-` for our backend
setReplaceMethod("rtime", "ChromBackendTest", function(object, value) {
    ## Convert to a standard list
    if (inherits(value, "NumericList"))
        value <- as.list(value)
    ## Check that length is correct
    if (!length(value) == length(object))
        stop("Length of 'value' needs to match the number of ",
             "chromatograms in 'object'.")
    ## Check that lengths are correct
    if (!all(lengths(value) == lengths(object@intensity)))
        stop("The number of retention time values per chromatogram needs to ",
             "match the number of intensities for that chromatogram.")
    ## Check that all values are numeric and we don't have missing values
    not_ok <- vapply(value, function(z) anyNA(z) || !is.numeric(z), logical(1))
    if (any(not_ok))
        stop("'value' needs to be a list of numeric values without ",
             "missing values")
    object@rtime <- value
    object
})
Below we test this implementation by shifting all retention times of our example backend by 2 seconds.
rtime(be) <- rtime(be) + 2
rtime(be)
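The per-element validation used in the replacement method can be tried on plain lists, independently of any backend. This is a minimal base R sketch; `flag_invalid()` is a hypothetical helper name and the retention times are made up.

```r
## Candidate retention time lists for a backend with two chromatograms
good <- list(c(12.3, 14.4), 14.4)
bad <- list(c(12.3, NA), "14.4")

## Per-element check mirroring the replacement method above:
## TRUE flags an element with missing or non-numeric values
flag_invalid <- function(value)
    vapply(value, function(z) anyNA(z) || !is.numeric(z), logical(1))

flag_invalid(good)   # FALSE FALSE
flag_invalid(bad)    # TRUE TRUE

## Shifting every chromatogram by 2 seconds keeps the list valid
shifted <- lapply(good, function(z) z + 2)
flag_invalid(shifted)
```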
split()
The split()
method should split the backend into a list
of backends
containing subsets of the original backend. The default implementation uses the
default implementation of split()
from R and should work in most cases. This
function uses the [
method to subset/split the object.
#' Default method to split a backend
setMethod("split", "ChromBackend", function(x, f, drop = FALSE, ...) {
    split.default(x, f, drop = drop, ...)
})
Below we test this by splitting the backend into two subsets.
split(be, f = c(1, 2, 1))
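How the grouping factor translates into subsets can be followed in base R without a backend. In the sketch below, a plain character vector stands in for a backend with three chromatograms; the element names are made up.

```r
## Grouping factor used in the split() call above
f <- c(1, 2, 1)

## Splitting first groups the element indices by `f` ...
idx <- split(seq_along(f), f)
idx

## ... and each group of indices is then extracted with `[`; shown here on a
## character vector standing in for a backend with three chromatograms
x <- c("chrom 1", "chrom 2", "chrom 3")
parts <- lapply(idx, function(i) x[i])
parts
```

The first subset contains chromatograms 1 and 3, the second only chromatogram 2, which is why a working `[` method is all a backend needs for the default `split()` to function.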
A set of filter methods is defined, all of which subset the backend to a smaller set of chromatograms, i.e. these filter methods reduce the number of chromatograms of the backend. Defaults are available for all methods, but here too alternative versions might be implemented depending on the backend class.
filterDataOrigin()
The filterDataOrigin()
method allows to filter/subset the backend keeping only
chromatograms for which the dataOrigin
chromatogram variable matches (exactly)
the value(s) provided with parameter dataOrigin
.
#' Default for `filterDataOrigin()`
setMethod("filterDataOrigin", "ChromBackend",
          function(object, dataOrigin = character(), ...) {
              if (length(dataOrigin)) {
                  object <- object[dataOrigin(object) %in% dataOrigin]
                  if (is.unsorted(dataOrigin))
                      object[order(match(dataOrigin(object), dataOrigin))]
                  else object
              } else object
          })
Like all filter functions, this function is expected to always return an instance of the backend class, even if no element matches the provided values:
filterDataOrigin(be, "disk")
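The two steps of the default implementation (subsetting with `%in%`, then reordering when the requested values are unsorted) can be traced on a plain character vector. The origin values below are made up for illustration.

```r
## Hypothetical dataOrigin values of four chromatograms
origins <- c("b", "a", "c", "a")
## Requested origins, deliberately not sorted
keep <- c("c", "a")

## Step 1: keep only matching elements (original order is preserved)
sub <- origins[origins %in% keep]
sub                                   # "a" "c" "a"

## Step 2: because `keep` is unsorted, reorder to follow the order in `keep`
is.unsorted(keep)                     # TRUE
reordered <- sub[order(match(sub, keep))]
reordered                             # "c" "a" "a"
```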
filterDataStorage()
The filterDataStorage()
method allows to subset a backend keeping only
chromatograms for which values of their dataStorage
chromatogram variable
match the value(s) provided with parameter dataStorage
. The default
implementation is shown below.
#' Default implementation for `filterDataStorage()`
setMethod("filterDataStorage", "ChromBackend",
          function(object, dataStorage = character()) {
              if (length(dataStorage)) {
                  object <- object[dataStorage(object) %in% dataStorage]
                  if (is.unsorted(dataStorage))
                      object[order(match(dataStorage(object), dataStorage))]
                  else object
              } else object
          })
filterMsLevel()
The filterMsLevel()
method allows to subset a backend to chromatograms with
their MS level matching the provided MS levels. The default implementation is
shown below.
#' The default implementation for `filterMsLevel()`
setMethod("filterMsLevel", "ChromBackend",
          function(object, msLevel = integer()) {
              if (length(msLevel)) {
                  object[msLevel(object) %in% msLevel]
              } else object
          })
filterMzRange()
The filterMzRange()
method allows to subset a backend to chromatograms with
their value of the mz
chromatogram variable being within the provided m/z value
range. Parameter mz
is expected to be a numeric
of length 2 defining the
lower and upper boundary of the m/z range. The default implementation is shown
below:
#' The default implementation for `filterMzRange()`
setMethod("filterMzRange", "ChromBackend",
          function(object, mz = numeric(), ...) {
              if (length(mz)) {
                  mz <- range(mz)
                  keep <- which(between(mz(object), mz))
                  object[keep]
              } else object
          })
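The range filter can be reproduced in base R on a plain numeric vector. The sketch below assumes inclusive boundaries and replaces the `between()` call with an explicit comparison; the m/z values are made up.

```r
## Hypothetical m/z values of four chromatograms; NA marks a missing value
mzs <- c(314.3, 312.5, 542.1, NA)

## The range is derived from the input, so the order of the two values
## does not matter
rng <- range(c(400, 300))             # 300 400

## Indices of chromatograms with an m/z within the range (boundaries assumed
## to be inclusive); which() drops the NA
keep <- which(mzs >= rng[1L] & mzs <= rng[2L])
keep                                  # 1 2
```

Using `which()` rather than a logical index means chromatograms without an m/z value are silently excluded instead of producing `NA` subsets.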
filterMzValues()
The filterMzValues()
method allows to subset a backend to chromatograms with
their value of the mz
chromatogram variable being equal to one of the
provided m/z values, given an acceptable difference defined by parameters ppm
and tolerance
.
#' Default for `filterMzValues()`
setMethod("filterMzValues", "ChromBackend",
          function(object, mz = numeric(), ppm = 20, tolerance = 0, ...) {
              if (length(mz)) {
                  object[.values_match_mz(precursorMz(object), mz = mz,
                                          ppm = ppm, tolerance = tolerance)]
              } else object
          })
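The internal helper used for the matching is not shown in this vignette. As a rough illustration of how an absolute tolerance and a relative ppm term can be combined into one acceptable difference, the sketch below uses a hypothetical function `matches_any()`. The formula `tolerance + ppm * target / 1e6` is an assumption and may differ from what the package does internally.

```r
## Hypothetical helper: TRUE for each element of `x` that lies within
## `tolerance + ppm * target / 1e6` of at least one value in `target`
matches_any <- function(x, target, ppm = 20, tolerance = 0) {
    vapply(x, function(z) {
        any(abs(z - target) <= tolerance + ppm * target / 1e6)
    }, logical(1))
}

mzs <- c(314.3, 312.5, 542.1)
matches_any(mzs, target = 314.301)                 # TRUE FALSE FALSE
matches_any(mzs, target = 314.4)                   # FALSE FALSE FALSE
matches_any(mzs, target = 314.4, tolerance = 0.1)  # TRUE FALSE FALSE
```

With 20 ppm the accepted difference around m/z 314 is only about 0.006, so a target of 314.4 matches the first value only once an absolute tolerance is added.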
sessionInfo()