The `r BiocStyle::Biocpkg("mzR")` package aims at providing a common, low-level
interface to several mass spectrometry data formats, including
mzML [@Martens2010] for raw data and
mzIdentML [@Jones2012] for identification data, somewhat similar to the
Bioconductor package affyio for Affymetrix raw data. No processing is done in
`r BiocStyle::Biocpkg("mzR")`; this is left to packages such as
`r BiocStyle::Biocpkg("xcms")` [@Smith:2006; @Tautenhahn:2008] or
`r BiocStyle::Biocpkg("MSnbase")` [@Gatto:2012]. These packages also provide more
convenient, high-level interfaces to raw and identification data.
Most importantly, access to the data should be fast and memory efficient. This is made possible by allowing on-disk random file access, i.e. retrieving specific data of interest without having to sequentially browse the full content or load the entire data set into memory.
The actual work of reading and parsing the data files is handled by
the included C/C++ libraries, or backends. The
mzRramp RAMP parser,
written at the Institute for Systems Biology (ISB), is a fast and
lightweight parser in pure C. Later, it gained support for the
mzData format. The C++ reference implementation for
mzML is the proteowizard library [@Kessner08] (pwiz for short), which in turn
makes use of the boost C++ library (http://www.boost.org/). RAMP is
able to access
mzML files by calling pwiz methods. More recently,
proteowizard (http://proteowizard.sourceforge.net/)
[@Chambers2012] has been fully integrated through a dedicated pwiz
backend for raw data, and is now the default option. A further backend
provides support for
CDF-based formats. Finally, a
backend is available to access identification data (mzIdentML).
The `r BiocStyle::Biocpkg("mzR")` package is in essence a collection of wrappers
to the C++ code, and benefits from the C++ interface provided through
the Rcpp package [@Rcpp11].
IMPORTANT: New developers who need to access and manipulate raw
mass spectrometry data are advised against using this infrastructure
directly. They are invited to use
MSnExp objects (in on-disk
mode) from the
`r BiocStyle::Biocpkg("MSnbase")` package instead. The
latter supports reading multiple files at once and offers access to
the spectra data (m/z and intensity) as well as all the spectra
metadata using a coherent interface. The MSnbase infrastructure itself
uses the low-level classes in mzR, thus offering fast and efficient
access to the data.
All the mass spectrometry file formats are organized similarly, where a set of metadata nodes about the run is followed by a list of spectra with the actual masses and intensities. In addition, each of these spectra has its own set of metadata, such as the retention time and acquisition parameters.
Access to the spectral data is done via the
peaks function. The
return value is a list of two-column mass-to-charge and intensity
matrices or a single matrix if one spectrum is queried.
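
The two return shapes described above can be checked with a minimal sketch. It assumes the mzR and msdata packages are installed and reuses the example file shipped with msdata; the variable names `f`, `ms`, `one` and `several` are only illustrative.

```r
library(mzR)
library(msdata)

# Example mzXML file from the msdata package
f <- system.file("threonine/threonine_i2_e35_pH_tree.mzXML", package = "msdata")
ms <- openMSfile(f)

one <- peaks(ms, 1)        # one scan requested: a single two-column matrix
several <- peaks(ms, 1:3)  # several scans requested: a list of matrices

is.matrix(one)
ncol(one)
is.list(several)
length(several)

close(ms)
```

Code that may receive either one or many scans typically needs to handle both the matrix and the list case.
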
Access to the chromatogram(s) is done using the
chromatogram (or chromatograms) function, which returns one (or a list of)
data frames. See ?chromatogram for details. This functionality is
only available with the pwiz backend.
The main access to identification results is done via
psms, score and modifications. psms and score return detailed
information on each PSM and its scores.
modifications returns the
details of each modification found in a peptide.
Run metadata is available via several functions such as
instrumentInfo() or runInfo(). The individual fields can also be
accessed through dedicated accessor functions.
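
As a small sketch of how run metadata can be inspected, the following reads the list returned by runInfo() and extracts one field by name. It assumes mzR and msdata are installed; `ms` and `ri` are illustrative names, and which fields are populated depends on the file.

```r
library(mzR)
library(msdata)

f <- system.file("threonine/threonine_i2_e35_pH_tree.mzXML", package = "msdata")
ms <- openMSfile(f)

ri <- runInfo(ms)   # a named list of run-level metadata
names(ri)
ri$scanCount        # number of scans in the run

instrumentInfo(ms)  # instrument-level metadata, as far as the file provides it

close(ms)
```
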
Spectrum metadata is available via
header(), which will return a
list (for single scans) or a data frame with information such as the
peaksCount or, for higher-order MS,
precursor information.
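
A common pattern is to use the spectrum metadata to decide which scans to read. The sketch below, which assumes mzR and msdata are installed, tabulates the MS levels and then reads only the scan with the most peaks; `hd`, `i` and `pk` are illustrative names.

```r
library(mzR)
library(msdata)

f <- system.file("threonine/threonine_i2_e35_pH_tree.mzXML", package = "msdata")
ms <- openMSfile(f)

hd <- header(ms)             # metadata for all scans, one row per scan
table(hd$msLevel)            # number of scans per MS level

i <- which.max(hd$peaksCount)  # index of the scan with the most peaks
hd[i, c("msLevel", "retentionTime", "peaksCount")]

pk <- peaks(ms, i)           # read only that scan from disk
head(pk)

close(ms)
```

Only the selected scan is read from the file, which is the on-disk access pattern the package is designed for.
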
Identification metadata is available via
mzidInfo(), which will
return a list with information such as the
SpectraSource and other
information for this identification result.
The availability of this metadata cannot always be guaranteed; it depends on the MS software that converted the data.
The following short example reads data from a mass spectrometer. First, open the file.

    library(mzR)
    library(msdata)
    mzxml <- system.file("threonine/threonine_i2_e35_pH_tree.mzXML",
                         package = "msdata")
    aa <- openMSfile(mzxml)

We can obtain different kinds of header information.

    runInfo(aa)
    instrumentInfo(aa)
    header(aa, 1)

Read a single spectrum from the file.

    pl <- peaks(aa, 10)
    peaksCount(aa, 10)
    head(pl)
    plot(pl[, 1], pl[, 2], type = "h", lwd = 1)

One should always close the file when it is no longer needed. This will release the memory of cached content.
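
One way to make sure the file is always closed is to wrap the access in a function and register the close() call with on.exit(). This is a sketch, assuming mzR and msdata are installed; `read_scan` is a hypothetical helper, not part of mzR.

```r
library(mzR)
library(msdata)

# Hypothetical helper: open a file, read one scan, and always close the file,
# even if reading the scan fails.
read_scan <- function(path, scan) {
  ms <- openMSfile(path)
  on.exit(close(ms))
  peaks(ms, scan)
}

f <- system.file("threonine/threonine_i2_e35_pH_tree.mzXML", package = "msdata")
pk <- read_scan(f, 10)
head(pk)
```

For interactive work, calling close() on the object directly once it is no longer needed has the same effect.
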
You can use
openIDfile to read an
mzIdentML file (version 1.1),
which uses the pwiz backend.

    library(mzR)
    library(msdata)
    file <- system.file("mzid", "Tandem.mzid.gz", package = "msdata")
    x <- openIDfile(file)

The mzidInfo function returns general information about this
identification result. psms returns detailed information on each
peptide-spectrum match (PSM), such as modNum and others.

    p <- psms(x)
    colnames(p)

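
As a brief sketch of working with the PSM table, the modNum column mentioned above can be used to count and select matches that carry modifications. The example assumes mzR and msdata are installed; `modified` is an illustrative name.

```r
library(mzR)
library(msdata)

file <- system.file("mzid", "Tandem.mzid.gz", package = "msdata")
x <- openIDfile(file)

p <- psms(x)

table(p$modNum)                  # PSMs by number of modifications
modified <- p[p$modNum > 0, ]    # keep only PSMs with at least one modification
nrow(modified)
```
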
The modification information can be accessed using modifications,
which returns the details of each modification found in a peptide.

    m <- modifications(x)
    head(m)

Since different software uses different scoring functions, we provide
score to extract the scores for each PSM. It returns a
data.frame whose columns depend on the software that generated
the file.

    scr <- score(x)
    colnames(scr)

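
Because the score columns vary between search engines, downstream code may prefer not to hard-code column names. The sketch below, assuming mzR and msdata are installed, summarises whichever numeric columns are present; `num` is an illustrative name.

```r
library(mzR)
library(msdata)

file <- system.file("mzid", "Tandem.mzid.gz", package = "msdata")
x <- openIDfile(file)

scr <- score(x)

# Detect the numeric columns instead of assuming specific score names
num <- vapply(scr, is.numeric, logical(1))
summary(scr[, num, drop = FALSE])
```
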
Support for other file formats provided by HUPO, such as those for
quantitative data [@Walzer:2013], may be added in the future.