aggregate: Aggregate quantitative features
In rformassspectrometry/MsCoreUtils: Core Utils for Mass Spectrometry Data

Description

These functions take a matrix of quantitative features x and aggregate the features (rows) according to either a vector (or factor) INDEX or an adjacency matrix MAT. The aggregation method is defined by the function FUN.

Adjacency matrices are an elegant way to explicitly encode shared peptides, i.e. peptides that belong to more than one protein, during aggregation (see the example below).
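
To see why a grouping vector is a special case of an adjacency matrix, the following base-R sketch (not MsCoreUtils code; the toy values and protein names are made up) encodes a vector as a binary adjacency matrix and checks that summing through the matrix with crossprod() matches summing per group with rowsum(). A second non-zero entry in a row then encodes a shared peptide, which a single grouping vector cannot express.

```r
## Toy peptide x sample intensities (made-up values)
x <- matrix(c(1, 2, 3, 4,
              5, 6, 7, 8),
            nrow = 4,
            dimnames = list(paste0("Pep", 1:4), c("S1", "S2")))

## Vector encoding: each peptide maps to exactly one protein
INDEX <- c("ProtA", "ProtA", "ProtB", "ProtB")
proteins <- sort(unique(INDEX))

## Equivalent binary adjacency matrix (peptides x proteins)
MAT <- 1 * outer(INDEX, proteins, "==")
dimnames(MAT) <- list(rownames(x), proteins)

## Per-protein sums through MAT, i.e. t(MAT) %*% x ...
by_matrix <- crossprod(MAT, x)
## ... and through the grouping vector
by_vector <- rowsum(x, INDEX)
all.equal(by_matrix, by_vector)  # TRUE

## Sharing Pep3 between ProtA and ProtB only needs one extra
## non-zero entry; Pep3 then contributes to both protein sums
MAT["Pep3", "ProtA"] <- 1
crossprod(MAT, x)
```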

Usage

colMeansMat(x, MAT, na.rm = FALSE)

colSumsMat(x, MAT, na.rm = FALSE)

aggregate_by_matrix(x, MAT, FUN, ...)

aggregate_by_vector(x, INDEX, FUN, ...)

Arguments

`x`	A `matrix` of mode `numeric` or an `HDF5Matrix` object of type `numeric`.
`MAT`	An adjacency matrix that defines peptide-protein relations, with `nrow(MAT) == nrow(x)`: a non-missing, non-zero value at position (i, j) indicates that peptide i belongs to protein j. This matrix is typically binary but can also contain weighted relations.
`na.rm`	A `logical(1)` indicating whether the missing values (including NaN) should be omitted from the calculations or not. Defaults to `FALSE`.
`FUN`	A `function` to be applied to the subsets of `x`.
`...`	Additional arguments passed to `FUN`.
`INDEX`	A `vector` or `factor` of length `nrow(x)`.
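
As a minimal sketch of how such an adjacency matrix might be assembled from a peptide-to-protein mapping table, the base-R code below fills a binary matrix and derives one possible weighted variant. The mapping table, the names map and W, and the equal-split weighting are illustrative assumptions, not something MsCoreUtils provides or prescribes.

```r
## Hypothetical peptide-to-protein mapping, one row per relation;
## Pep2 is shared between ProtA and ProtB
map <- data.frame(peptide = c("Pep1", "Pep2", "Pep2", "Pep3"),
                  protein = c("ProtA", "ProtA", "ProtB", "ProtB"),
                  stringsAsFactors = FALSE)

## Row order must match the rows of x, since nrow(MAT) == nrow(x)
peptides <- c("Pep1", "Pep2", "Pep3")
proteins <- sort(unique(map$protein))

## Binary adjacency matrix: MAT[i, j] = 1 if peptide i belongs to protein j
MAT <- matrix(0, nrow = length(peptides), ncol = length(proteins),
              dimnames = list(peptides, proteins))
## A two-column character matrix indexes cells by their dimnames
MAT[cbind(map$peptide, map$protein)] <- 1

## One possible weighting (an assumption, not a package default): split
## each peptide equally between its proteins, so every row sums to 1.
## Assumes every peptide maps to at least one protein.
W <- MAT / rowSums(MAT)
MAT
W
```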

Value

aggregate_by_matrix() returns a matrix (or Matrix) of dimensions ncol(MAT) by ncol(x), with row names equal to colnames(MAT) and column names equal to colnames(x).

aggregate_by_vector() returns a new matrix (if x is a matrix) or HDF5Matrix (if x is an HDF5Matrix) of dimensions length(unique(INDEX)) by ncol(x), with one row per distinct value of INDEX (used as row names) and column names equal to colnames(x).

Vector-based aggregation functions

When aggregating with a vector/factor, user-defined functions must return a vector of length equal to ncol(x) for each level in INDEX. Examples thereof are:

medianPolish() to fit an additive model (two-way decomposition) with Tukey's median polish procedure, using stats::medpolish();
robustSummary() to calculate a robust aggregation using MASS::rlm();
base::colMeans() to use the mean of each column;
base::colSums() to use the sum of each column;
matrixStats::colMedians() to use the median of each column.
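
The contract above (one vector of length ncol(x) per level of INDEX) can be illustrated with a split-apply-combine sketch in base R. aggregate_by_vector_sketch() and medpolish_summary() are hypothetical helpers written for illustration, not the package's implementation; medpolish_summary() returns the overall effect plus the column effects of stats::medpolish(), in the spirit of medianPolish(), but whether it matches that function exactly is an assumption.

```r
## Hypothetical stand-in for aggregate_by_vector(), for illustration only
aggregate_by_vector_sketch <- function(x, INDEX, FUN, ...) {
    stopifnot(length(INDEX) == nrow(x))
    ## Split row indices by group and apply FUN to each sub-matrix
    res <- lapply(split(seq_len(nrow(x)), INDEX), function(i) {
        out <- FUN(x[i, , drop = FALSE], ...)
        ## FUN must return one value per column of x
        stopifnot(length(out) == ncol(x))
        out
    })
    ## One row per group (list names become row names)
    res <- do.call(rbind, res)
    colnames(res) <- colnames(x)
    res
}

## User-defined summary: overall effect + column effects from
## Tukey's median polish
medpolish_summary <- function(x, ...) {
    fit <- stats::medpolish(x, trace.iter = FALSE, ...)
    fit$overall + fit$col
}

## Data from the Examples section
x <- matrix(c(10.39, 17.16, 14.10, 12.85, 10.63, 7.52, 3.91,
              11.13, 16.53, 14.17, 11.94, 11.51, 7.69, 3.97,
              11.93, 15.37, 14.24, 11.21, 12.29, 9.00, 3.83,
              12.90, 14.37, 14.16, 10.12, 13.33, 9.75, 3.81),
            nrow = 7,
            dimnames = list(paste0("Pep", 1:7), paste0("Sample", 1:4)))
k <- paste0("Prot", c("B", "E", "X", "E", "B", "B", "E"))

aggregate_by_vector_sketch(x, k, colMeans)
aggregate_by_vector_sketch(x, k, medpolish_summary)
```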

Matrix-based aggregation functions

When aggregating with an adjacency matrix, user-defined functions must take x and MAT as arguments and return a new matrix. Examples thereof are:

colSumsMat(x, MAT) aggregates by summing the peptide intensities for each protein. Shared peptides are re-used, i.e. they contribute to every protein they belong to.
colMeansMat(x, MAT) aggregates by calculating the mean of the peptide intensities for each protein. Shared peptides are re-used in the same way.
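
For a binary adjacency matrix, summing through MAT amounts to the cross-product t(MAT) %*% x, and the mean divides each protein's sum by its number of peptides. The base-R sketch below uses hypothetical helpers col_sums_mat() and col_means_mat() on made-up toy data to show this arithmetic and the re-use of a shared peptide; it is not the MsCoreUtils source and assumes no missing values.

```r
## Made-up toy data: 3 peptides x 2 samples, no missing values
x <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
            dimnames = list(paste0("Pep", 1:3), c("S1", "S2")))
## Binary adjacency matrix: Pep2 is shared by ProtA and ProtB
MAT <- matrix(c(1, 1, 0,
                0, 1, 1), nrow = 3,
              dimnames = list(paste0("Pep", 1:3), c("ProtA", "ProtB")))

## Hypothetical helpers, not the MsCoreUtils source
col_sums_mat <- function(x, MAT) {
    ## result[j, s] = sum over peptides i of MAT[i, j] * x[i, s]
    crossprod(MAT, x)
}
col_means_mat <- function(x, MAT) {
    ## element [j, s] is divided by colSums(MAT)[j] via recycling,
    ## i.e. by the number of peptides of protein j (binary MAT)
    col_sums_mat(x, MAT) / colSums(MAT)
}

sums <- col_sums_mat(x, MAT)
sums                   # ProtA: 3, 30; ProtB: 5, 50
col_means_mat(x, MAT)  # ProtA: 1.5, 15; ProtB: 2.5, 25

## Pep2 is counted for both proteins, so the protein totals exceed
## the peptide total in each sample
colSums(sums)          # S1: 8, S2: 80
colSums(x)             # S1: 6, S2: 60
```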

Handling missing values

By default, missing values in the quantitative data will propagate to the aggregated data. You can provide na.rm = TRUE to most functions listed above to ignore missing values, except for robustSummary() where you should supply na.action = na.omit (see ?MASS::rlm).
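
One way to picture the effect of na.rm on matrix-based aggregation is to apply a column summary to each protein's member peptides. The sketch below reuses the missing-value data from the Examples section; summarise_by_matrix() is a hypothetical base-R helper that assumes a binary MAT and is not how MsCoreUtils implements colSumsMat() or colMeansMat().

```r
## Missing-value data from the Examples section
x <- matrix(c(NA, 2:6), ncol = 2,
            dimnames = list(paste0("Pep", 1:3), c("S1", "S2")))
adj <- matrix(c(1, 1, 0, 0, 0, 1), ncol = 2,
              dimnames = list(paste0("Pep", 1:3), c("A", "B")))

## Hypothetical helper: apply a column summary FUN to the member
## peptides of each protein (rows with a non-zero entry in MAT)
summarise_by_matrix <- function(x, MAT, FUN, ...) {
    t(sapply(colnames(MAT), function(p)
        FUN(x[MAT[, p] != 0, , drop = FALSE], ...)))
}

summarise_by_matrix(x, adj, colSums)                 # NA propagates to A/S1
summarise_by_matrix(x, adj, colSums, na.rm = TRUE)   # NA ignored
summarise_by_matrix(x, adj, colMeans, na.rm = TRUE)  # mean of observed values
```

Subsetting member rows, rather than multiplying by MAT, keeps a missing value in a peptide from leaking into proteins it does not belong to: in R, 0 * NA is NA, so a dense matrix product would propagate it.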

Author(s)

Laurent Gatto and Samuel Wieczorek (aggregation from an adjacency matrix).

Examples


x <- matrix(c(10.39, 17.16, 14.10, 12.85, 10.63, 7.52, 3.91,
              11.13, 16.53, 14.17, 11.94, 11.51, 7.69, 3.97,
              11.93, 15.37, 14.24, 11.21, 12.29, 9.00, 3.83,
              12.90, 14.37, 14.16, 10.12, 13.33, 9.75, 3.81),
            nrow = 7,
            dimnames = list(paste0("Pep", 1:7), paste0("Sample", 1:4)))
x

## -------------------------
## Aggregation by vector
## -------------------------

(k <- paste0("Prot", c("B", "E", "X", "E", "B", "B", "E")))

aggregate_by_vector(x, k, colMeans)
aggregate_by_vector(x, k, robustSummary)
aggregate_by_vector(x, k, medianPolish)

## -------------------------
## Aggregation by matrix
## -------------------------

adj <- matrix(c(1, 0, 0, 1, 1, 1, 0, 0,
                1, 0, 1, 0, 0, 1, 0, 0,
                1, 0, 0, 0, 1),
              nrow = 7,
              dimnames = list(paste0("Pep", 1:7),
                              paste0("Prot", c("B", "E", "X"))))
adj

## Peptides 4 and 7 are each shared by 2 proteins (rowSums of 2):
## peptide 4 by proteins B and E, peptide 7 by proteins E and X
rowSums(adj)

aggregate_by_matrix(x, adj, colSumsMat)
aggregate_by_matrix(x, adj, colMeansMat)

## ---------------
## Missing values
## ---------------

x <- matrix(c(NA, 2:6), ncol = 2,
            dimnames = list(paste0("Pep", 1:3),
                            c("S1", "S2")))
x

## simply use na.rm = TRUE to ignore missing values
## during the aggregation

(k <- LETTERS[c(1, 1, 2)])
aggregate_by_vector(x, k, colSums)
aggregate_by_vector(x, k, colSums, na.rm = TRUE)

(adj <- matrix(c(1, 1, 0, 0, 0, 1), ncol = 2,
               dimnames = list(paste0("Pep", 1:3),
                               c("A", "B"))))
aggregate_by_matrix(x, adj, colSumsMat, na.rm = FALSE)
aggregate_by_matrix(x, adj, colSumsMat, na.rm = TRUE)

rformassspectrometry/MsCoreUtils documentation built on April 12, 2025, 6:54 p.m.