QFeatures-aggregate: Aggregate an assay's quantitative features

Description Usage Arguments Details Value Missing quantitative values Missing values in the row data See Also Examples

Description

This function aggregates the quantitative features of an assay, applying a summarisation function (fun) to sets of features as defined by the fcol feature variable. The new assay's features will be named based on the unique fcol values.

In addition to the results of the aggregation, the newly aggregated SummarizedExperiment assay also contains a new aggcounts assay containing the aggregation counts matrix, i.e. the number of features that were aggregated, which can be accessed with the aggcounts() accessor.

The rowData of the aggregated SummarizedExperiment assay contains a .n variable that provides the number of features that were aggregated. This .n value is always >= that the sample-level aggcounts.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
## S4 method for signature 'QFeatures'
aggregateFeatures(
  object,
  i,
  fcol,
  name = "newAssay",
  fun = MsCoreUtils::robustSummary,
  ...
)

## S4 method for signature 'SummarizedExperiment'
aggregateFeatures(object, fcol, fun = MsCoreUtils::robustSummary, ...)

## S4 method for signature 'SummarizedExperiment'
aggcounts(object, ...)

Arguments

object

An instance of class QFeatures or SummarizedExperiment.

i

The index or name of the assay which features will be aggregated the create the new assay.

fcol

The feature variable of assay i defining how to summarise the features.

name

A character(1) naming the new assay. Default is newAssay. Note that the function will fail if there's already an assay with name.

fun

A function used for quantitative feature aggregation. See Details for examples.

...

Additional parameters passed the fun.

Details

Aggregation is performed by a function that takes a matrix as input and returns a vector of length equal to ncol(x). Examples thereof are

Value

A QFeatures object with an additional assay or a SummarizedExperiment object (or subclass thereof).

Missing quantitative values

Missing quantitative values have different effect based on the aggregation method employed:

More generally, missing values often need dedicated handling such as filtering (see filterNA()) or imputation (see impute()).

Missing values in the row data

Missing values in the row data of an assay will also impact the resulting (aggregated) assay row data, as illustrated in the example below. Any feature variables (a column in the row data) containing NA values will be dropped from the aggregated row data. The reasons underlying this drop are detailed in the reduceDataFrame() manual page: only invariant aggregated rows, i.e. rows resulting from the aggregation from identical variables, are preserved during aggregations.

The situation illustrated below should however only happen in rare cases and should often be imputable using the value of the other aggregation rows before aggregation to preserve the invariant nature of that column. In cases where an NA is present in an otherwise variant column, the column would be dropped anyway.

See Also

The QFeatures vignette provides an extended example and the Processing vignette, for a complete quantitative proteomics data processing pipeline.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
## ---------------------------------------
## An example QFeatures with PSM-level data
## ---------------------------------------
data(feat1)
feat1

## Aggregate PSMs into peptides
feat1 <- aggregateFeatures(feat1, "psms", "Sequence", name = "peptides")
feat1

## Aggregate peptides into proteins
feat1 <- aggregateFeatures(feat1, "peptides", "Protein", name = "proteins")
feat1

assay(feat1[[1]])
assay(feat1[[2]])
aggcounts(feat1[[2]])
assay(feat1[[3]])
aggcounts(feat1[[3]])

## --------------------------------------------
## Aggregation with missing quantitative values
## --------------------------------------------
data(ft_na)
ft_na

assay(ft_na[[1]])
rowData(ft_na[[1]])

## By default, missing values are propagated
ft2 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums)
assay(ft2[[2]])
aggcounts(ft2[[2]])

## The rowData .n variable tallies number of initial rows that
## were aggregated (irrespective of NAs) for all the samples.
rowData(ft2[[2]])

## Ignored when setting na.rm = TRUE
ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums, na.rm = TRUE)
assay(ft3[[2]])
aggcounts(ft3[[2]])

## -----------------------------------------------
## Aggregation with missing values in the row data
## -----------------------------------------------
## Row data results without any NAs, which includes the
## Y variables
rowData(ft2[[2]])

## Missing value in the Y feature variable
rowData(ft_na[[1]])[1, "Y"] <- NA
rowData(ft_na[[1]])

ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums)
## The Y feature variable has been dropped!
assay(ft3[[2]])
rowData(ft3[[2]])

QFeatures documentation built on Nov. 8, 2020, 6:51 p.m.