suppressPackageStartupMessages(library("MSnbase"))
suppressPackageStartupMessages(library("BiocStyle"))
cat(readLines("./Foreword.md"), sep = "\n")
cat(readLines("./Bugs.md"), sep = "\n")
NB: This document will be updated to reflect current major
development plans in MSnbase.
This document is not a replacement for the individual manual pages,
which document the slots of the r Biocpkg("MSnbase") classes. It is a
centralised high-level description of the package design.
r Biocpkg("MSnbase") aims at being compatible with the
r Biocpkg("Biobase") infrastructure [@Gentleman2004]. Many meta data
structures that are used in eSet and associated classes are also
used here. As such, knowledge of the Biobase development and the new
eSet vignette would be beneficial; the vignette can directly be
accessed with vignette("BiobaseDevelopment", package="Biobase").
The initial goal is to use the r Biocpkg("MSnbase") infrastructure
for MS2 labelled (iTRAQ [@Ross2004] and TMT [@Thompson2003]) and
label-free (spectral counting, index and abundance) quantitation;
see the documentation for the quantify function for details. The
infrastructure is currently being extended to support a wider range of
technologies, including metabolomics.
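MSnbase follows the Bioconductor style guide. In particular:

- Do not use . when naming symbols.
- A leading . can be used for hidden/local functions or variables.
- Use ## to start full-line comments.
- For roxygen headers, #' is preferred, although ##' is tolerated.
- Use spaces around = in function arguments or class definition:
  f(a = 1, b = 2).
- Always use a space after a comma: a, b, c.
- Always use spaces around binary operators: a + b.
- Keep lines shorter than 80 characters. For example, the following code is
  not accepted

# no wrap at 80
someVeryLongVariableName <- someVeryLongFunctionName(withSomeEvenLongerFunctionArgumentA = 1, withSomeEvenLongerFunctionArgumentB = 2)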
and should be wrapped as shown below:
# alternative 1
someVeryLongVariableName <-
    someVeryLongFunctionName(withSomeEvenLongerFunctionArgumentA = 1,
                             withSomeEvenLongerFunctionArgumentB = 2)
# alternative 2
someVeryLongVariableName <- someVeryLongFunctionName(
    withSomeEvenLongerFunctionArgumentA = 1,
    withSomeEvenLongerFunctionArgumentB = 2)
r Biocpkg("MSnbase") classes

All classes have a .__classVersion__ slot of class Versions, inherited
from the Versioned class of
the r Biocpkg("Biobase") package. This slot documents the class
version for any instance to be used for debugging and object update
purposes. Any change in a class implementation should trigger a
version change.
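
As a minimal sketch of how the class version can be inspected, the snippet below applies the classVersion and isVersioned accessors from Biobase to the msnset example data shipped with MSnbase. The choice of object is purely illustrative; any versioned instance should behave similarly.

```r
library("MSnbase")   # MSnbase depends on Biobase, which provides the accessors

## Example MSnSet shipped with MSnbase
data(msnset)

## Versions recorded for this instance (an object of class Versions)
cv <- classVersion(msnset)
cv

## TRUE for objects that carry version information
isVersioned(msnset)
```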
pSet: a virtual class for raw mass spectrometry data and meta data

This virtual class is the main container for mass spectrometry data,
i.e. spectra, and meta data. It is based on the eSet implementation
for genomic data. The main difference with eSet is that the
assayData slot is an environment containing any number of
Spectrum instances (see the Spectrum section).
One new slot is introduced, namely processingData, which contains one
MSnProcess instance (see the MSnProcess section),
and the experimentData slot is now expected to contain MIAPE data.
The annotation slot has not been implemented, as no prior feature
annotation is known in shotgun proteomics.
getClass("pSet")
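
The layout described above can be explored on a concrete object. The following sketch assumes the itraqdata example MSnExp shipped with MSnbase (a pSet subclass) and simply looks inside its assayData environment; the variable names are illustrative only.

```r
library("MSnbase")

## Example MSnExp (extends pSet) shipped with MSnbase
data(itraqdata)

## assayData is an environment holding one Spectrum object per spectrum
ad <- assayData(itraqdata)
is.environment(ad)
spectrum_names <- ls(ad)
head(spectrum_names)

## Each element is a Spectrum instance (here MS2 spectra)
first_spectrum <- get(spectrum_names[1], envir = ad)
class(first_spectrum)

## The additional slots described above
class(processingData(itraqdata))
class(experimentData(itraqdata))
```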
MSnExp: a class for MS experiments

MSnExp extends pSet to store MS experiments. It does not add any
new slots to pSet. Accessors and setters are all inherited from
pSet and new ones should be implemented for pSet. Methods that
manipulate actual data in experiments are implemented for
MSnExp objects.
getClass("MSnExp")
OnDiskMSnExp: an on-disk implementation of the MSnExp class

The OnDiskMSnExp class extends MSnExp and inherits all of its
functionality, but aims to use as little memory as possible while
balancing memory demand and performance. Most of the
spectrum-specific data, such as retention time, polarity and total ion
current, are stored within the object's featureData slot. The actual
M/Z and intensity values from the individual spectra are, in contrast
to MSnExp objects, not kept in memory (in the assayData slot), but
are fetched from the original files on demand. Because mzML files are
indexed, using the r Biocpkg("mzR") package to read the relevant
spectrum data is fast and only moderately slower than for in-memory
MSnExp^[The benchmarking vignette compares data size and operation speed of the two implementations.].
To keep track of data manipulation steps that are applied to spectrum
data (such as those performed by the removePeaks or clean methods), a lazy
execution framework was implemented. Methods that manipulate or
subset a spectrum's M/Z or intensity values cannot be applied
directly to an OnDiskMSnExp object, since the relevant data is not
kept in memory. Thus, any call to a processing method that changes or
subsets M/Z or intensity values is added as a ProcessingStep item to
the object's spectraProcessingQueue. When the spectrum data is then
queried from an OnDiskMSnExp, the spectra are read in from the file
and all these processing steps are applied on-the-fly to the spectrum
data before being returned to the user.
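
The lazy execution described above can be illustrated with a small sketch. It assumes the small mzXML example file shipped in the extdata directory of MSnbase and an arbitrary intensity threshold of 1000; the slot is accessed directly with @ only to make the queue visible, which is not something user code would normally do.

```r
library("MSnbase")

## Assumption: small example file shipped with MSnbase
f <- dir(system.file("extdata", package = "MSnbase"),
         pattern = "mzXML$", full.names = TRUE)[1]
od <- readMSData(f, mode = "onDisk")

## No processing steps are queued on a freshly read object
n_before <- length(od@spectraProcessingQueue)

## removePeaks does not touch any peak data yet, it is only queued
od2 <- removePeaks(od, t = 1000)
n_after <- length(od2@spectraProcessingQueue)

## Peak data is read from file and queued steps are applied on access
sp <- od2[[1]]
class(sp)
```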
The operations involving extracting or manipulating spectrum data are
applied on a per-file basis, which enables parallel processing. Thus,
all corresponding method implementations for OnDiskMSnExp objects
have a BPPARAM argument, and users can set a PARALLEL_THRESH
option flag^[see ?MSnbaseOptions for details.] that defines
how and when parallel processing should be performed (using the
r Biocpkg("BiocParallel") package).
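
As a hedged illustration of these settings, the sketch below reads the current PARALLEL_THRESH value and passes an explicit serial back-end through BPPARAM when extracting spectra. The example file is the one shipped with MSnbase, and SerialParam is chosen here only to keep the example deterministic.

```r
library("MSnbase")
library("BiocParallel")

## Assumption: small example file shipped with MSnbase
f <- dir(system.file("extdata", package = "MSnbase"),
         pattern = "mzXML$", full.names = TRUE)[1]
od <- readMSData(f, mode = "onDisk")

## Threshold consulted when deciding whether to process in parallel
## (see ?MSnbaseOptions)
MSnbaseOptions()$PARALLEL_THRESH

## Explicitly request serial processing for this call
sps <- spectra(od, BPPARAM = SerialParam())
length(sps)
```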
Note that all data manipulations that are not applied to M/Z or
intensity values of a spectrum (e.g. sub-setting by retention time
etc.) are very fast as they operate directly on the object's
featureData slot.
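
A short sketch of such a metadata-only operation is given below. It assumes the example file shipped with MSnbase and an arbitrary retention time window covering the first half of the run; no peak data needs to be read for either step.

```r
library("MSnbase")

## Assumption: small example file shipped with MSnbase
f <- dir(system.file("extdata", package = "MSnbase"),
         pattern = "mzXML$", full.names = TRUE)[1]
od <- readMSData(f, mode = "onDisk")

## Spectrum-level metadata lives in the featureData slot
meta_cols <- c("retentionTime", "polarity", "totIonCurrent")
head(fData(od)[, meta_cols])

## Sub-setting by retention time only needs that metadata
rt_range <- range(rtime(od))
od_sub <- filterRt(od, rt = c(rt_range[1], mean(rt_range)))
length(od_sub)
```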
getClass("OnDiskMSnExp")
The distinction between MSnExp and OnDiskMSnExp is often not
explicitly stated as it should not matter, from a user's perspective,
which data structure they are working with, as both behave in
equivalent ways. Often, they are referred to as in-memory and
on-disk MSnExp implementations.
MSnSet: a class for quantitative proteomics data

This class stores quantitation data and meta data after running
quantify on an MSnExp object or by creating an MSnSet instance
from an external file, as described in the MSnbase-io vignette and
in ?readMSnSet, readMzTabData, etc. The quantitative data is in
the form of an n by p matrix, where n is the number of
features/spectra originally in the MSnExp used as parameter in
quantify and p is the number of reporter ions. If read from an
external file, n corresponds to the number of features (protein
groups, proteins, peptides, spectra) in the file and p is the number
of columns with quantitative data (samples) in the file.
This prompted us to keep an implementation similar to the ExpressionSet
class, while adding the proteomics-specific annotation slot introduced
in the pSet class, namely processingData for objects of class
MSnProcess.
getClass("MSnSet")
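
To make the n by p layout concrete, the sketch below quantifies the first five spectra of the itraqdata example with the four iTRAQ reporter ions. The subset size and the "trap" quantification method are arbitrary choices for illustration.

```r
library("MSnbase")

## Example MSnExp shipped with MSnbase
data(itraqdata)

## Quantify 5 MS2 spectra using the 4 iTRAQ reporter ions
qnt <- quantify(itraqdata[1:5], method = "trap",
                reporters = iTRAQ4, verbose = FALSE)
class(qnt)

## Rows are spectra (n), columns are reporter ions (p)
quant_dim <- dim(exprs(qnt))
quant_dim
```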
The MSnSet class extends the virtual eSet class to provide
compatibility for ExpressionSet-like behaviour. The experiment
meta-data in experimentData is also of class MIAPE. The
annotation slot, inherited from eSet, is not used. As a result, it
is easy to convert ExpressionSet data from/to MSnSet objects with
the coercion method as.
data(msnset)
class(msnset)
class(as(msnset, "ExpressionSet"))
data(sample.ExpressionSet)
class(sample.ExpressionSet)
class(as(sample.ExpressionSet, "MSnSet"))
MSnProcess: a class for logging processing meta data {#MSnProcess}

This class aims at recording specific manipulations applied to
MSnExp or MSnSet instances. The processing
slot is a character vector that describes major
processing. Most other slots are of class logical and
indicate whether the data has been centroided, smoothed, etc.,
although much of the functionality is not implemented yet. Any new
processing that is implemented should be documented and logged here.
It also documents the raw data file from which the data originates
(files slot) and the r Biocpkg("MSnbase") version that was in
use when the MSnProcess instance, and hence the
MSnExp/MSnSet objects, were originally created.
getClass("MSnProcess")
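
The logging behaviour can be observed directly. In the sketch below, the processing slot is read with @ purely for illustration, and the removePeaks threshold of 1000 is an arbitrary value; the point is only that each manipulation appends an entry to the log.

```r
library("MSnbase")

## Example MSnExp shipped with MSnbase
data(itraqdata)

## Processing log of the original object
log_before <- processingData(itraqdata)@processing
log_before

## Sub-setting and peak removal are both recorded
x <- removePeaks(itraqdata[1:3], t = 1000, verbose = FALSE)
log_after <- processingData(x)@processing
log_after

## Raw data file(s) the object originates from
fileNames(itraqdata)
```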
MIAPE: Minimum Information About a Proteomics Experiment

The Minimum Information About a Proteomics Experiment
[@Taylor2007; @Taylor2008] MIAPE class describes the experiment,
including contact details, information about the mass spectrometer and
control and analysis software.
getClass("MIAPE")
Spectrum et al.: classes for MS spectra {#Spectum}

Spectrum is a virtual class that defines attributes common to all
types of spectra. MS1 and MS2 specific attributes are defined in the
Spectrum1 and Spectrum2 classes, which directly extend Spectrum.
getClass("Spectrum")
getClass("Spectrum1")
getClass("Spectrum2")
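
As a small sketch of this hierarchy, the code below builds toy Spectrum1 and Spectrum2 objects from made-up M/Z and intensity values (all numbers are invented for illustration) and checks that both extend the virtual Spectrum class.

```r
library("MSnbase")

## Toy peak data, invented for illustration
mzs  <- c(100, 200, 300)
ints <- c(10, 0, 50)

## MS1 spectrum
sp1 <- new("Spectrum1", mz = mzs, intensity = ints)

## MS2 spectrum with an MS2-specific attribute (precursor M/Z)
sp2 <- new("Spectrum2", mz = mzs, intensity = ints, precursorMz = 500)

## Attributes common to all spectra are available on both
msLevel(sp1)
msLevel(sp2)
mz(sp2)
intensity(sp2)

## MS2-specific accessor
precursorMz(sp2)
```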
ReporterIons: a class for isobaric tags

The iTRAQ and TMT reporter ions (or any other peaks of interest) are
implemented as ReporterIons instances, which essentially define an
expected M/Z position for each peak and a width around this value, as
well as names for the reporters.
getClass("ReporterIons")
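
The predefined iTRAQ4 object shipped with MSnbase gives a quick view of what a ReporterIons instance holds. The sketch below only reads its contents through the accessors.

```r
library("MSnbase")

## Predefined 4-plex iTRAQ reporter ions
iTRAQ4

## Expected M/Z position of each reporter peak
reporter_mz <- mz(iTRAQ4)
reporter_mz

## Width of the window around each expected M/Z
width(iTRAQ4)

## Names of the reporters and their number
reporterNames(iTRAQ4)
length(iTRAQ4)
```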
Chromatogram and MChromatograms: classes to handle chromatographic data

The Chromatogram class represents chromatographic MS data, i.e. retention time
and intensity duplets for one file/sample. The MChromatograms class (Matrix of
Chromatograms) allows arranging multiple Chromatogram instances in a
two-dimensional grid, with columns representing different samples and
rows representing two-dimensional areas in the plane spanned by the m/z and
retention time dimensions from which the intensities are extracted (e.g. an
extracted ion chromatogram for a specific ion). The MChromatograms class
extends the base matrix class. MChromatograms objects can be extracted from an
MSnExp or OnDiskMSnExp object using the chromatogram method.
getClass("Chromatogram")
getClass("MChromatograms")
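
The grid layout can be sketched with toy data. The retention times and intensities below are invented, and the 2 by 2 arrangement (two ions in two samples) is only one possible layout.

```r
library("MSnbase")

## Toy chromatograms: retention time / intensity pairs (invented values)
rt <- 1:5
chr1 <- Chromatogram(rtime = rt, intensity = c(5, 20, 60, 25, 4))
chr2 <- Chromatogram(rtime = rt, intensity = c(2, 15, 40, 30, 8))
chr3 <- Chromatogram(rtime = rt, intensity = c(9, 33, 80, 21, 3))
chr4 <- Chromatogram(rtime = rt, intensity = c(1, 12, 35, 18, 6))

## Arrange them in a grid; the list is filled column-wise as for matrix(),
## so chr1 and chr2 form the first column (first sample)
chrs <- MChromatograms(list(chr1, chr2, chr3, chr4), nrow = 2)
chrs

## Being a matrix, the usual dimension accessors apply
nrow(chrs)
ncol(chrs)

## Data of a single Chromatogram
rtime(chr1)
intensity(chr1)
```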
MSnSet instances {-}

When several MSnSet instances are related to each other and should
be stored together as different objects, they can be grouped as a list
into an MSnSetList object. In addition to the actual list slot,
this class also has basic logging functionality and enables iteration
over the MSnSet instances using a dedicated lapply method.
getClass("MSnSetList")
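
A minimal sketch of grouping and iterating is shown below. It assumes the msnset example data and uses a subset of it as a second, related object; the element names full and subset are arbitrary.

```r
library("MSnbase")

## Example MSnSet shipped with MSnbase
data(msnset)

## Group two related MSnSet objects
msl <- MSnSetList(x = list(full = msnset, subset = msnset[1:10, ]))
msl
length(msl)

## Iterate over the MSnSet instances with the dedicated lapply method
n_features <- unlist(lapply(msl, nrow))
n_features
```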
r Biocpkg("MSnbase") implements unit tests with the
r CRANpkg("testthat") package.
Methods that process raw data, i.e. spectra, should be implemented for
Spectrum objects first and then eapply'ed (or similar) to the
assayData slot of an MSnExp instance in the specific method.
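
The pattern can be sketched as follows. The helper maxIntensity is hypothetical and stands in for any spectrum-level operation; it is written for a single Spectrum and then applied to every spectrum in the assayData environment of the itraqdata example.

```r
library("MSnbase")

## Example in-memory MSnExp shipped with MSnbase
data(itraqdata)

## Hypothetical spectrum-level helper: highest intensity in one spectrum
maxIntensity <- function(spectrum) max(intensity(spectrum))

## Apply it to each Spectrum stored in the assayData environment
res <- eapply(assayData(itraqdata), maxIntensity)

## One result per spectrum, named after the spectra
length(res)
head(unlist(res))
```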
sessionInfo()