suppressPackageStartupMessages(library("BiocStyle")) suppressPackageStartupMessages(library("MSnbase")) suppressPackageStartupMessages(library("pRolocdata"))
cat(readLines("./Foreword.md"), sep = "\n")
cat(readLines("./Bugs.md"), sep = "\n")
r Biocpkg("MSnbase")'s aims are to facilitate the reproducible
analysis of mass spectrometry data within the R environment, from raw
data import and processing, feature quantification, quantification and
statistical analysis of the results [@Gatto2012]. Data import
functions for several formats are provided and intermediate or final
results can also be saved or exported. These capabilities are
Data stored in one of the published
XML-based formats. i.e.
mzData [@Orchard2007] or
mzML [@Martens2010], can
be imported with the
readMSData method, which makes use of the
r Biocpkg("mzR") package to create
MSnExp objects. The files can be
in profile or centroided mode. See
?readMSData for details.
mzML files containing chromatographic data (e.g. generated in
SRM/MRM experiments) can be imported with the
readSRMData function that
returns the chromatographic data as a
MChromatograms object. See
for more details.
Peak lists in the
can be imported using the
readMgfData. In this case, the peak data
has generally been pre-processed by other software. See
?readMgfData for details.
Third party software can be used to generate quantitative data and
exported as a spreadsheet (generally comma or tab separated format).
This data as well as any additional meta-data can be imported with the
readMSnSet function. See
?readMSnSet for details.
r Biocpkg("MSnbase") also supports the
format^[https://github.com/HUPO-PSI/mzTab], a light-weight,
tab-delimited file format for proteomics data developed within the
Proteomics Standards Initiative (PSI).
mzTab files can be read into
readMzTabData to create and
R objects can most easily be stored on disk with the
It creates compressed binary images of the data representation that
can later be read back from the file with the
OnDiskMSnExp files can be written to MS data files in
mzXML files with the
writeMSData method. See
?writeMSData for details.
MSnExp instances as well as individual spectra can be written as
mgf files with the
writeMgfData method. Note that the meta-data in
the original R object can not be included in the file. See
?writeMgfData for details.
Quantitation data can be exported to spreadsheet files with the
write.exprs method. Feature meta-data can be appended to the feature
intensity values. See
?writeMgfData for details.
MSnSet instances can also be exported to
files using the
MSnSetfrom text spread sheets
This section describes the generation of
MSnSet objects using data
available in a text-based spreadsheet. This entry point into R and
r Biocpkg("MSnbase") allows to import data processed by any of the
third party mass-spectrometry processing software available and
proceed with data exploration, normalisation and statistical analysis
using functions available in \R and the numerous Bioconductor
The following section describes a work flow that uses three input
files to create the
MSnSet. These files respectively describe the
quantitative expression data, the sample meta-data and the feature
meta-data. It is taken from the
r Biocpkg("pRoloc") tutorial and
uses example files from the
r Biocpkg("pRolocdat") package.
We start by describing the
csv to be used as input using the
## The original data for replicate 1, available ## from the pRolocdata package f0 <- dir(system.file("extdata", package = "pRolocdata"), full.names = TRUE, pattern = "pr800866n_si_004-rep1.csv") csv <- read.csv(f0)
The three first lines of the original spreadsheet, containing the data
for replicate one, are illustrated below (using the function
head). It contains
r nrow(csv) rows (proteins) and
columns, including protein identifiers, database accession numbers,
gene symbols, reporter ion quantitation values, information related to
protein identification, ...
Below read in turn the spread sheets that contain the quantitation
exprsFile.csv), feature meta-data (
fdataFile.csv) and sample
## The quantitation data, from the original data f1 <- dir(system.file("extdata", package = "pRolocdata"), full.names = TRUE, pattern = "exprsFile.csv") exprsCsv <- read.csv(f1) ## Feature meta-data, from the original data f2 <- dir(system.file("extdata", package = "pRolocdata"), full.names = TRUE, pattern = "fdataFile.csv") fdataCsv <- read.csv(f2) ## Sample meta-data, a new file f3 <- dir(system.file("extdata", package = "pRolocdata"), full.names = TRUE, pattern = "pdataFile.csv") pdataCsv <- read.csv(f3)
exprsFile.csv contains the quantitation (expression) data for the
r nrow(exprsCsv) proteins and 4 reporter tags.
head(exprsCsv, n = 3)
fdataFile.csv contains meta-data for the
features (here proteins).
head(fdataCsv, n = 3)
pdataFile.csv contains samples (here fractions) meta-data. This
simple file has been created manually.
MSnSet can now easily be generated using the
readMSnSet constructor, providing the respective
csv file names
shown above and specifying that the data is comma-separated (with
= ","). Below, we call that object
res and display its content.
library("MSnbase") res <- readMSnSet(exprsFile = f1, featureDataFile = f2, phenoDataFile = f3, sep = ",") res
Although there are additional specific sub-containers for additional
meta-data (for instance to make the object MIAPE compliant), the
feature (the sub-container, or slot
featureData) and sample (the
phenoData slot) are the most important ones. They need to meet the
following validity requirements (see figure below):
the number of row in the expression/quantitation data and feature data must be equal and the row names must match exactly, and
the number of columns in the expression/quantitation data and number of row in the sample meta-data must be equal and the column/row names must match exactly.
A detailed description of the
MSnSet class is available by typing
?MSnSet in the R console.
The individual parts of this data object can be accessed with their respective accessor methods:
readMSnSet2 function provides a simplified import workforce. It
takes a single spreadsheet as input (default is
csv) and extract the
columns identified by
ecol to create the expression data, while the
others are used as feature meta-data.
ecol can be a
the respective column labels or a numeric with their indices. In the
former case, it is important to make sure that the names match
exactly. Special characters like
'(' will be transformed by
'.' when the
csv file is read in. Optionally, one can also
specify a column to be used as feature names. Note that these must be
unique to guarantee the final object validity.
ecol <- paste("area", 114:117, sep = ".") fname <- "Protein.ID" eset <- readMSnSet2(f0, ecol, fname) eset
ecol columns can also be queried interactively from R using the
grepEcols function. The former return a character
with all column names, given a splitting character, i.e. the
separation value of the spreadsheet (typically
tsv, ...). The latter can be used to grep a pattern of interest
to obtain the relevant column indices.
getEcols(f0, ",") grepEcols(f0, "area", ",") e <- grepEcols(f0, "area", ",") readMSnSet2(f0, e)
phenoData slot can now be updated accordingly using the
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.