The pRolocdata package

pRolocdata is a Bioconductor experiment package (release and devel pages) that collects published (and a few unpublished) mass spectrometry-based spatial/organelle and protein-complex datasets. The data are distributed as MSnSet instances (see the MSnbase package for details) and are used throughout the pRoloc and pRolocGUI software for spatial proteomics data analysis and visualisation.
library("knitr")
library("pRolocdata")
x <- data.frame(pRolocdata()$results[, -(1:2)])
colnames(x) <- c("Data", "Description")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("pRolocdata")
Once installed, the package needs to be loaded:
library("pRolocdata")
Currently, there are `r nrow(x)` datasets available in pRolocdata. Use the pRolocdata() function to obtain a list of dataset names and their descriptions.
pRolocdata()
kable(x, format = "markdown")
Data is loaded into the R session using the data function; for instance, to get the data from Dunkley et al (2006), one would type
data(dunkley2006)
To get more information about a given dataset, see its manual page:
?dunkley2006 ## or help("dunkley2006")
Each data object in pRolocdata is available as an MSnSet instance. The instances contain the actual quantitative data, as well as sample and feature annotations (see pData and fData respectively). Additional MIAPE [1, 2] experimental data are available in the experimentData slot, as described in the Required metadata section below.
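As an illustration of the structure described above, the following minimal sketch inspects the parts of one dataset. It assumes that pRolocdata and MSnbase are installed and uses the dunkley2006 object from this package; exprs, pData, fData and experimentData are the standard accessors for MSnSet objects.

```r
library("MSnbase")      # provides the MSnSet class and its accessors
library("pRolocdata")

data(dunkley2006)

## Quantitative data: a numeric matrix with features as rows and samples as columns
dim(exprs(dunkley2006))

## Sample annotations (one row per column of the quantitative matrix)
head(pData(dunkley2006))

## Feature annotations (one row per row of the quantitative matrix)
head(fData(dunkley2006))

## MIAPE experimental data
experimentData(dunkley2006)
```

The number of rows in pData matches the number of columns in exprs, and the number of rows in fData matches the number of rows in exprs.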
The source of these data is generally one or several text-based spreadsheets (csv, tsv) produced by a third-party application. These original files are often distributed as supplementary material to the research paper and are used to generate the R objects. The source spreadsheets are available in the package's inst/extdata directory. The R scripts that read the spreadsheets and create the R data are distributed in the inst/scripts directory.
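For an installed copy of the package, the contents of inst/ are placed at the top level of the package directory, so the spreadsheets and scripts can be located with base R's system.file. The sketch below only lists file names; it returns empty results if pRolocdata is not installed.

```r
## Locate the installed copies of inst/extdata and inst/scripts
extdata_dir <- system.file("extdata", package = "pRolocdata")
scripts_dir <- system.file("scripts", package = "pRolocdata")

## Show the first few source spreadsheets and the scripts that process them
head(list.files(extdata_dir))
head(list.files(scripts_dir))
```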
Additional metadata is available with the pRolocmetadata() function, as detailed below.
Required metadata

- Species: documented in experimentData(.)@samples$species.
- Tissue: documented in experimentData(.)@samples$tissue. If tissue is Cell, then details about the cell line are available in experimentData(.)@samples$cellLine.
- PubMed IDs: documented in pubMedIds(.).
- Experiment annotations: documented in experimentData(.)@other:
  - MS ($MS): type of mass spectrometry experiment: iTRAQ8, iTRAQ4, TMT6, LF, SC, ...
  - Experiment ($spatexp): type of spatial proteomics experiment: LOPIT, LOPIMS, subtractive, PCP, other, PCP-SILAC, ...
  - MarkerCol ($markers.fcol): name of the markers feature data. Default is markers.
  - PredictionCol ($prediction.fcol): name of the localisation prediction feature data.
experimentData(dunkley2006)@samples
pubMedIds(dunkley2006)
otherInfo(experimentData(dunkley2006))
## all at once
pRolocmetadata(dunkley2006)
The procedure to add data to pRolocdata is as follows. Here, we assume that 3 new data files are available from the manuscript of Smith et al. 2017, and that these files will be added to pRolocdata as three MSnSet objects.
1. The original data (often from supplementary material) are added to inst/extdata, say Smith_expA.csv, Smith_expB.csv and Smith_expC.csv (the names should ideally be the same as those of the original files), and the files and their provenance are documented in inst/extdata/README. If the data files are really big, then they should be compressed. If they are too big (for example, they don't fit on GitHub or would substantially increase the size of the package), then we might decide not to add them, but they should still be documented in the README file and the script (see point 2) should still assume they are there.
2. A script, typically called Smith2017.R, is added to inst/scripts/. That script reads the files above and saves the corresponding (compressed) MSnSet objects directly in data, typically called Smith2017a.rda, Smith2017b.rda, ..., and the objects themselves would be named Smith2017a, Smith2017b, ...
3. Write a man/Smith2017.Rd documentation file documenting all relevant data objects, providing some information about the experiment and data provenance, and a reference to the original paper.
4. Build and check the package and, if successful, send a GitHub pull request.
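The script step of the procedure above could look roughly like the following sketch. It is not taken from the package's own scripts: the file content, column names (Accession, F1 to F3, markers) and metadata values are hypothetical placeholders, and a toy csv file is written to a temporary directory so that the example runs on its own. It assumes MSnbase is installed and uses its readMSnSet2 function and MIAPE class.

```r
library("MSnbase")

## Toy stand-in for inst/extdata/Smith_expA.csv (hypothetical content)
csvfile <- file.path(tempdir(), "Smith_expA.csv")
write.csv(data.frame(Accession = c("P1", "P2", "P3"),
                     F1 = c(0.10, 0.50, 0.30),
                     F2 = c(0.60, 0.30, 0.30),
                     F3 = c(0.30, 0.20, 0.40),
                     markers = c("ER", "unknown", "Golgi")),
          csvfile, row.names = FALSE)

## Columns F1..F3 hold the quantitative data; Accession gives the feature
## names; the remaining columns become feature data
Smith2017a <- readMSnSet2(csvfile,
                          ecol = c("F1", "F2", "F3"),
                          fnames = "Accession")

## Required metadata (placeholder values; a real script would also set pubMedIds)
experiment <- new("MIAPE",
                  samples = list(species = "Homo sapiens",
                                 tissue = "Cell",
                                 cellLine = "placeholder cell line"),
                  other = list(MS = "iTRAQ4",
                               spatexp = "LOPIT",
                               markers.fcol = "markers",
                               prediction.fcol = NA))
Smith2017a@experimentData <- experiment
stopifnot(validObject(Smith2017a))

## Save a compressed .rda file; the object name matches the file name
rdafile <- file.path(tempdir(), "Smith2017a.rda")
save(Smith2017a, file = rdafile, compress = "xz", compression_level = 9)
```

In the package itself, the script would read from inst/extdata and save into the data directory rather than a temporary directory, and the same steps would be repeated for the other files.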
If you do not have the R expertise to prepare the data, please open an issue in the pRolocdata GitHub repo or send me an email at laurent.gatto<AT>uclouvain<dot>be with the source csv files and appropriate metadata, and I will add them for you.