View source: R/readQFeatures.R
readQFeatures | R Documentation |
These functions convert tabular data into dedicated data
objets. The readSummarizedExperiment()
function takes a file
name or data.frame
and converts it into a
SummarizedExperiment()
object. The readQFeatures()
function
takes a data.frame
and converts it into a QFeatures
object
(see QFeatures()
for details). For the latter, two use-cases
exist:
The single-set case will generate a QFeatures
object with a
single SummarizedExperiment
containing all features of the
input table.
The multi-set case will generate a QFeatures
object containing
multiple SummarizedExperiment
s, resulting from splitting the
input table. This multi-set case is generally used when the
input table contains data from multiple runs/batches.
readSummarizedExperiment(
assayData,
quantCols = NULL,
fnames = NULL,
ecol = NULL,
...
)
readQFeatures(
assayData,
colData = NULL,
quantCols = NULL,
runCol = NULL,
name = "quants",
removeEmptyCols = FALSE,
verbose = TRUE,
ecol = NULL,
...
)
assayData |
A |
quantCols |
A |
fnames |
For the single- and multi-set cases, an optional
|
ecol |
Same as |
... |
Further arguments that can be passed on to |
colData |
A |
runCol |
For the multi-set case, a |
name |
For the single-set case, an optional |
removeEmptyCols |
A |
verbose |
A |
The single- and multi-set cases are defined by the quantCols
and
runCol
parameters, whether passed by the quantCols
and
runCol
vectors and/or the colData
data.frame
(see below).
The quantitative data variables are defined by the quantCols
.
The single-set case can be represented schematically as shown
below.
|------+----------------+-----------| | cols | quantCols 1..N | more cols | | . | ... | ... | | . | ... | ... | | . | ... | ... | |------+----------------+-----------|
Note that every quantCols
column contains data for a single
sample. The single-set case is defined by the absence of any
runCol
input (see next section). We here provide a
(non-exhaustive) list of typical data sets that fall under the
single-set case:
Peptide- or protein-level label-free data (bulk or single-cell).
Peptide- or protein-level multiplexed (e.g. TMT) data (bulk or single-cell).
PSM-level multiplexed data acquired in a single MS run (bulk or single-cell).
PSM-level data from fractionation experiments, where each fraction of the same sample was acquired with the same multiplexing label.
A run/batch variable, runCol
, is required to import multi-set
data. The multi-set case can be represented schematically as shown
below.
|--------+------+----------------+-----------| | runCol | cols | quantCols 1..N | more cols | | 1 | . | ... | ... | | 1 | . | ... | ... | |--------+------+----------------+-----------| | 2 | . | ... | ... | |--------+------+----------------+-----------| | . | . | ... | ... | |--------+------+----------------+-----------|
Every quantCols
column contains data for multiple samples
acquired in different runs. The multi-set case applies when
runCol
is provided, which will determine how the table is split
into multiple sets.
We here provide a (non-exhaustive) list of typical data sets that fall under the multi-set case:
PSM- or precursor-level multiplexed data acquired in multiple runs (bulk or single-cell)
PSM- or precursor-level label-free data acquired in multiple runs (bulk or single-cell)
DIA-NN data (see also readQFeaturesFromDIANN()
).
colData
We recommend providing sample annotations when creating a
QFeatures
object. The colData
is a table in which each row
corresponds to a sample and each column provides information about
the samples. There is no restriction on the number of columns and
on the type of data they should contain. However, we impose one or
two columns (depending on the use case) that allow to link the
annotations of each sample to its quantitative data:
Single-set case: the colData
must contain a column named
quantCols
that provides the names of the columns in
assayData
containing quantitative values for each sample (see
single-set cases in the examples).
Multi-set case: the colData
must contain a column named
quantCols
that provides the names of the columns in
assayData
with the quantitative values for each sample, and a
column named runCol
that provides the MS runs/batches in which
each sample has been acquired. The entries in
colData[["runCol"]]
are matched against the entries provided
by assayData[[runCol]]
.
When the quantCols
argument is not provided to
readQFeatures()
, the function will automatically determine the
quantCols
from colData[["quantCols"]]
. Therefore, quantCols
and colData
cannot be both missing.
Samples that are present in assayData
but absent
colData
will lead to a warning, and the missing entries will be
automatically added to the colData
and filled with NA
s.
When using the quantCols
and runCol
arguments only
(without colData
), the colData
contains zero
columns/variables.
An instance of class QFeatures
or
SummarizedExperiment::SummarizedExperiment()
. For the
former, the quantitative sets of each run are stored in
SummarizedExperiment::SummarizedExperiment()
object.
Laurent Gatto, Christophe Vanderaa
The QFeatures
(see QFeatures()
) class to read about how to
manipulate the resulting QFeatures
object.
The readQFeaturesFromDIANN()
function to import DIA-NN
quantitative data.
######################################
## Single-set case.
## Load a data.frame with PSM-level data
data(hlpsms)
hlpsms[1:10, c(1, 2, 10:11, 14, 17)]
## Create a QFeatures object with a single psms set
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")
qf1
colData(qf1)
######################################
## Single-set case with colData.
(coldat <- data.frame(var = rnorm(10),
quantCols = names(hlpsms)[1:10]))
qf2 <- readQFeatures(hlpsms, colData = coldat)
qf2
colData(qf2)
######################################
## Multi-set case.
## Let's simulate 3 different files/batches for that same input
## data.frame, and define a colData data.frame.
hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE))
hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)]
qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file")
qf3
colData(qf3)
######################################
## Multi-set case with colData.
(coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10),
var = rnorm(10),
quantCols = names(hlpsms)[1:10]))
qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file")
qf4
colData(qf4)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.