Description Usage Arguments Details Value Missing quantitative values Missing values in the row data See Also Examples
This function aggregates the quantitative features of an assay,
applying a summarisation function (fun
) to sets of features as
defined by the fcol
feature variable. The new assay's features
will be named based on the unique fcol
values.
In addition to the results of the aggregation, the newly
aggregated SummarizedExperiment
assay also contains a new
aggcounts
assay containing the aggregation counts matrix,
i.e. the number of features that were aggregated, which can be
accessed with the aggcounts()
accessor.
The rowData of the aggregated SummarizedExperiment
assay
contains a .n
variable that provides the number of features that
were aggregated. This .n
value is always >= that the
sample-level aggcounts
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | ## S4 method for signature 'QFeatures'
aggregateFeatures(
object,
i,
fcol,
name = "newAssay",
fun = MsCoreUtils::robustSummary,
...
)
## S4 method for signature 'SummarizedExperiment'
aggregateFeatures(object, fcol, fun = MsCoreUtils::robustSummary, ...)
## S4 method for signature 'SummarizedExperiment'
aggcounts(object, ...)
|
object |
An instance of class QFeatures or SummarizedExperiment. |
i |
The index or name of the assay which features will be aggregated the create the new assay. |
fcol |
The feature variable of assay |
name |
A |
fun |
A function used for quantitative feature aggregation. See Details for examples. |
... |
Additional parameters passed the |
Aggregation is performed by a function that takes a matrix as
input and returns a vector of length equal to ncol(x)
. Examples
thereof are
MsCoreUtils::medianPolish()
to fits an additive model (two way
decomposition) using Tukey's median polish_ procedure using
stats::medpolish()
;
MsCoreUtils::robustSummary()
to calculate a robust aggregation
using MASS::rlm()
(default);
base::colMeans()
to use the mean of each column;
matrixStats::colMedians()
to use the median of each column.
base::colSums()
to use the sum of each column;
A QFeatures
object with an additional assay or a
SummarizedExperiment
object (or subclass thereof).
Missing quantitative values have different effect based on the aggregation method employed:
The aggregation functions should be able to deal with missing
values by either ignoring them, and propagating them. This is
often done with an na.rm
argument, that can be passed with
...
. For example, rowSums
, rowMeans
, rowMedians
,
... will ignore NA
values with na.rm = TRUE
, as illustrated
below.
Missing values will result in an error when using medpolish
,
unless na.rm = TRUE
is used. Note that this option relies on
implicit assumptions and/or performes an implicit imputation:
when summing, the values are implicitly imputed by 0, assuming
that the NA
represent a trully absent features; when
averaging, the assumption is that the NA
represented a
genuinely missing value.
When using robust summarisation, individual missing values are
excluded prior to fitting the linear model by robust
regression. To remove all values in the feature containing the
missing values, use filterNA()
.
More generally, missing values often need dedicated handling such
as filtering (see filterNA()
) or imputation (see impute()
).
Missing values in the row data of an assay will also impact the
resulting (aggregated) assay row data, as illustrated in the
example below. Any feature variables (a column in the row data)
containing NA
values will be dropped from the aggregated row
data. The reasons underlying this drop are detailed in the
reduceDataFrame()
manual page: only invariant aggregated rows,
i.e. rows resulting from the aggregation from identical variables,
are preserved during aggregations.
The situation illustrated below should however only happen in rare
cases and should often be imputable using the value of the other
aggregation rows before aggregation to preserve the invariant
nature of that column. In cases where an NA
is present in an
otherwise variant column, the column would be dropped anyway.
The QFeatures vignette provides an extended example and the Processing vignette, for a complete quantitative proteomics data processing pipeline.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | ## ---------------------------------------
## An example QFeatures with PSM-level data
## ---------------------------------------
data(feat1)
feat1
## Aggregate PSMs into peptides
feat1 <- aggregateFeatures(feat1, "psms", "Sequence", name = "peptides")
feat1
## Aggregate peptides into proteins
feat1 <- aggregateFeatures(feat1, "peptides", "Protein", name = "proteins")
feat1
assay(feat1[[1]])
assay(feat1[[2]])
aggcounts(feat1[[2]])
assay(feat1[[3]])
aggcounts(feat1[[3]])
## --------------------------------------------
## Aggregation with missing quantitative values
## --------------------------------------------
data(ft_na)
ft_na
assay(ft_na[[1]])
rowData(ft_na[[1]])
## By default, missing values are propagated
ft2 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums)
assay(ft2[[2]])
aggcounts(ft2[[2]])
## The rowData .n variable tallies number of initial rows that
## were aggregated (irrespective of NAs) for all the samples.
rowData(ft2[[2]])
## Ignored when setting na.rm = TRUE
ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums, na.rm = TRUE)
assay(ft3[[2]])
aggcounts(ft3[[2]])
## -----------------------------------------------
## Aggregation with missing values in the row data
## -----------------------------------------------
## Row data results without any NAs, which includes the
## Y variables
rowData(ft2[[2]])
## Missing value in the Y feature variable
rowData(ft_na[[1]])[1, "Y"] <- NA
rowData(ft_na[[1]])
ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums)
## The Y feature variable has been dropped!
assay(ft3[[2]])
rowData(ft3[[2]])
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.