readQFeatures: QFeatures from tabular data
In lgatto/Features: Quantitative features for mass spectrometry data

readQFeatures

R Documentation

QFeatures from tabular data

Description

These functions convert tabular data into dedicated data objets. The readSummarizedExperiment() function takes a file name or data.frame and converts it into a SummarizedExperiment() object. The readQFeatures() function takes a data.frame and converts it into a QFeatures object (see QFeatures() for details). For the latter, two use-cases exist:

The single-set case will generate a QFeatures object with a single SummarizedExperiment containing all features of the input table.
The multi-set case will generate a QFeatures object containing multiple SummarizedExperiments, resulting from splitting the input table. This multi-set case is generally used when the input table contains data from multiple runs/batches.

Usage

readSummarizedExperiment(
  assayData,
  quantCols = NULL,
  fnames = NULL,
  ecol = NULL,
  ...
)

readQFeatures(
  assayData,
  colData = NULL,
  quantCols = NULL,
  runCol = NULL,
  name = "quants",
  removeEmptyCols = FALSE,
  verbose = TRUE,
  ecol = NULL,
  fnames = NULL,
  ...
)

Arguments

`assayData`	A `data.frame`, or any object that can be coerced into a `data.frame`, holding the quantitative assay. For `readSummarizedExperiment()`, this can also be a `character(1)` pointing to a filename. This `data.frame` is typically generated by an identification and quantification software, such as Sage, Proteome Discoverer, MaxQuant, ...
`quantCols`	A `numeric()`, `logical()` or `character()` defining the columns of the `assayData` that contain the quantitative data. This information can also be defined in `colData` (see details).
`fnames`	For the single- and multi-set cases, an optional `character(1)` or `numeric(1)` indicating the column to be used as feature names. Note that rownames must be unique within `QFeatures` sets. Default is `NULL`.
`ecol`	Same as `quantCols`. Available for backwards compatibility. Default is `NULL`. If both `ecol` and `colData` are set, an error is thrown.
`...`	Further arguments that can be passed on to `read.csv()` except `stringsAsFactors`, which is always `FALSE`. Only applicable to `readSummarizedExperiment()`.
`colData`	A `data.frame` (or any object that can be coerced to a `data.frame`) containing sample/column annotations, including `quantCols` and `runCol` (see details).
`runCol`	For the multi-set case, a `numeric(1)` or `character(1)` pointing to the column of `assayData` (and `colData`, is set) that contains the runs/batches. Make sure that the column name in both tables are identical and syntactically valid (if you supply a `character`) or have the same index (if you supply a `numeric`). Note that characters are converted to syntactically valid names using `make.names`
`name`	For the single-set case, an optional `character(1)` to name the set in the `QFeatures` object. Default is `quants`.
`removeEmptyCols`	A `logical(1)`. If `TRUE`, quantitative columns that contain only missing values are removed.
`verbose`	A `logical(1)` indicating whether the progress of the data reading and formatting should be printed to the console. Default is `TRUE`.

Details

The single- and multi-set cases are defined by the quantCols and runCol parameters, whether passed by the quantCols and runCol vectors and/or the colData data.frame (see below).

Single-set case

The quantitative data variables are defined by the quantCols. The single-set case can be represented schematically as shown below.

|------+----------------+-----------|
| cols | quantCols 1..N | more cols |
| .    | ...            | ...       |
| .    | ...            | ...       |
| .    | ...            | ...       |
|------+----------------+-----------|

Note that every quantCols column contains data for a single sample. The single-set case is defined by the absence of any runCol input (see next section). We here provide a (non-exhaustive) list of typical data sets that fall under the single-set case:

Peptide- or protein-level label-free data (bulk or single-cell).
Peptide- or protein-level multiplexed (e.g. TMT) data (bulk or single-cell).
PSM-level multiplexed data acquired in a single MS run (bulk or single-cell).
PSM-level data from fractionation experiments, where each fraction of the same sample was acquired with the same multiplexing label.

Multi-set case

A run/batch variable, runCol, is required to import multi-set data. The multi-set case can be represented schematically as shown below.

|--------+------+----------------+-----------|
| runCol | cols | quantCols 1..N | more cols |
|   1    | .    | ...            | ...       |
|   1    | .    | ...            | ...       |
|--------+------+----------------+-----------|
|   2    | .    | ...            | ...       |
|--------+------+----------------+-----------|
|   .    | .    | ...            | ...       |
|--------+------+----------------+-----------|

Every quantCols column contains data for multiple samples acquired in different runs. The multi-set case applies when runCol is provided, which will determine how the table is split into multiple sets.

We here provide a (non-exhaustive) list of typical data sets that fall under the multi-set case:

PSM- or precursor-level multiplexed data acquired in multiple runs (bulk or single-cell)
PSM- or precursor-level label-free data acquired in multiple runs (bulk or single-cell)
DIA-NN data (see also readQFeaturesFromDIANN()).

Adding sample annotations with `colData`

We recommend providing sample annotations when creating a QFeatures object. The colData is a table in which each row corresponds to a sample and each column provides information about the samples. There is no restriction on the number of columns and on the type of data they should contain. However, we impose one or two columns (depending on the use case) that allow to link the annotations of each sample to its quantitative data:

Single-set case: the colData must contain a column named quantCols that provides the names of the columns in assayData containing quantitative values for each sample (see single-set cases in the examples).
Multi-set case: the colData must contain a column named quantCols that provides the names of the columns in assayData with the quantitative values for each sample, and a column named runCol that provides the MS runs/batches in which each sample has been acquired. The entries in colData[["runCol"]] are matched against the entries provided by assayData[[runCol]].

When the quantCols argument is not provided to readQFeatures(), the function will automatically determine the quantCols from colData[["quantCols"]]. Therefore, quantCols and colData cannot be both missing.

Samples that are present in assayData but absent colData will lead to a warning, and the missing entries will be automatically added to the colData and filled with NAs.

When using the quantCols and runCol arguments only (without colData), the colData contains zero columns/variables.

Value

An instance of class QFeatures or SummarizedExperiment::SummarizedExperiment(). For the former, the quantitative sets of each run are stored in SummarizedExperiment::SummarizedExperiment() object.

Author(s)

Laurent Gatto, Christophe Vanderaa

Examples


######################################
## Single-set case.

## Load a data.frame with PSM-level data
data(hlpsms)
hlpsms[1:10, c(1, 2, 10:11, 14, 17)]

## Create a QFeatures object with a single psms set
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")
qf1
colData(qf1)

######################################
## Single-set case with colData.

(coldat <- data.frame(var = rnorm(10),
                      quantCols = names(hlpsms)[1:10]))
qf2 <- readQFeatures(hlpsms, colData = coldat)
qf2
colData(qf2)

######################################
## Multi-set case.

## Let's simulate 3 different files/batches for that same input
## data.frame, and define a colData data.frame.

hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE))
hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)]

qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file")
qf3
colData(qf3)


######################################
## Multi-set case with colData.

(coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10),
                      var = rnorm(10),
                      quantCols = names(hlpsms)[1:10]))
qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file")
qf4
colData(qf4)

lgatto/Features documentation built on June 9, 2025, 9:35 a.m.

lgatto/Features index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lgatto/Features
Quantitative features for mass spectrometry data

readQFeatures: QFeatures from tabular data
In lgatto/Features: Quantitative features for mass spectrometry data

QFeatures from tabular data

Description

Usage

Arguments

Details

Single-set case

Multi-set case

Adding sample annotations with `colData`

Value

Author(s)

See Also

Examples

Related to readQFeatures in lgatto/Features...

R Package Documentation

Browse R Packages

We want your feedback!

lgatto/Features Quantitative features for mass spectrometry data

readQFeatures: QFeatures from tabular data In lgatto/Features: Quantitative features for mass spectrometry data

QFeatures from tabular data

Description

Usage

Arguments

Details

Single-set case

Multi-set case

Adding sample annotations with colData

Value

Author(s)

See Also

Examples

Related to readQFeatures in lgatto/Features...

R Package Documentation

Browse R Packages

We want your feedback!

lgatto/Features
Quantitative features for mass spectrometry data

readQFeatures: QFeatures from tabular data
In lgatto/Features: Quantitative features for mass spectrometry data

Adding sample annotations with `colData`