aggregate: Aggreagate quantitative features

aggregateR Documentation

Aggreagate quantitative features

Description

These functions take a matrix of quantitative features x and aggregate the features (rows) according to either a vector (or factor) INDEX or an adjacency matrix MAT. The aggregation method is defined by function FUN.

Adjacency matrices are an elegant way to explicitly encode for shared peptides (see example below) during aggregation.

Usage

colMeansMat(x, MAT, na.rm = FALSE)

colSumsMat(x, MAT, na.rm = FALSE)

aggregate_by_matrix(x, MAT, FUN, ...)

aggregate_by_vector(x, INDEX, FUN, ...)

Arguments

x

A matrix of mode numeric or an HDF5Matrix object of type numeric.

MAT

An adjacency matrix that defines peptide-protein relations with nrow(MAT) == nrow(x): a non-missing/non-null value at position (i,j) indicates that peptide i belong to protein j. This matrix is tyically binary but can also contain weighted relations.

na.rm

A logical(1) indicating whether the missing values (including NaN) should be omitted from the calculations or not. Defaults to FALSE.

FUN

A function to be applied to the subsets of x.

...

Additional arguments passed to FUN.

INDEX

A vector or factor of length nrow(x).

Value

aggregate_by_matrix() returns a matrix (or Matrix) of dimensions ncol(MAT) and ⁠ncol(x), with ⁠dimnames⁠equal to⁠colnames(x)andrownames(MAT)'.

aggregate_by_vector() returns a new matrix (if x is a matrix) or HDF5Matrix (if x is an HDF5Matrix) of dimensions length(INDEX) and ⁠ncol(x), with ⁠dimnames⁠ equal to⁠colnames(x)andINDEX'.

Vector-based aggregation functions

When aggregating with a vector/factor, user-defined functions must return a vector of length equal to ncol(x) for each level in INDEX. Examples thereof are:

  • medianPolish() to fits an additive model (two way decomposition) using Tukey's median polish procedure using stats::medpolish();

  • robustSummary() to calculate a robust aggregation using MASS::rlm();

  • base::colMeans() to use the mean of each column;

  • base::colSums() to use the sum of each column;

  • matrixStats::colMedians() to use the median of each column.

Matrix-based aggregation functions

When aggregating with an adjacency matrix, user-defined functions must return a new matrix. Examples thereof are:

  • colSumsMat(x, MAT) aggregates by the summing the peptide intensities for each protein. Shared peptides are re-used multiple times.

  • colMeansMat(x, MAT) aggregation by the calculating the mean of peptide intensities. Shared peptides are re-used multiple times.

Handling missing values

By default, missing values in the quantitative data will propagate to the aggregated data. You can provide na.rm = TRUE to most functions listed above to ignore missing values, except for robustSummary() where you should supply na.action = na.omit (see ?MASS::rlm).

Author(s)

Laurent Gatto and Samuel Wieczorek (aggregation from an adjacency matrix).

See Also

Other Quantitative feature aggregation: colCounts(), medianPolish(), robustSummary()

Examples


x <- matrix(c(10.39, 17.16, 14.10, 12.85, 10.63, 7.52, 3.91,
              11.13, 16.53, 14.17, 11.94, 11.51, 7.69, 3.97,
              11.93, 15.37, 14.24, 11.21, 12.29, 9.00, 3.83,
              12.90, 14.37, 14.16, 10.12, 13.33, 9.75, 3.81),
            nrow = 7,
            dimnames = list(paste0("Pep", 1:7), paste0("Sample", 1:4)))
x

## -------------------------
## Aggregation by vector
## -------------------------

(k <- paste0("Prot", c("B", "E", "X", "E", "B", "B", "E")))

aggregate_by_vector(x, k, colMeans)
aggregate_by_vector(x, k, robustSummary)
aggregate_by_vector(x, k, medianPolish)

## -------------------------
## Aggregation by matrix
## -------------------------

adj <- matrix(c(1, 0, 0, 1, 1, 1, 0, 0,
                1, 0, 1, 0, 0, 1, 0, 0,
                1, 0, 0, 0, 1),
              nrow = 7,
              dimnames = list(paste0("Pep", 1:7),
                              paste0("Prot", c("B", "E", "X"))))
adj

## Peptide 4 is shared by 2 proteins (has a rowSums of 2),
## namely proteins B and E
rowSums(adj)

aggregate_by_matrix(x, adj, colSumsMat)
aggregate_by_matrix(x, adj, colMeansMat)

## ---------------
## Missing values
## ---------------

x <- matrix(c(NA, 2:6), ncol = 2,
            dimnames = list(paste0("Pep", 1:3),
                            c("S1", "S2")))
x

## simply use na.rm = TRUE to ignore missing values
## during the aggregation

(k <- LETTERS[c(1, 1, 2)])
aggregate_by_vector(x, k, colSums)
aggregate_by_vector(x, k, colSums, na.rm = TRUE)

(adj <- matrix(c(1, 1, 0, 0, 0, 1), ncol = 2,
               dimnames = list(paste0("Pep", 1:3),
                           c("A", "B"))))
aggregate_by_matrix(x, adj, colSumsMat, na.rm = FALSE)
aggregate_by_matrix(x, adj, colSumsMat, na.rm = TRUE)


rformassspectrometry/MsCoreUtils documentation built on Oct. 24, 2024, 1:52 p.m.