aggregate | R Documentation |
These functions take a matrix of quantitative features x and aggregate the features (rows) according to either a vector (or factor) INDEX or an adjacency matrix MAT. The aggregation method is defined by function FUN.
Adjacency matrices are an elegant way to explicitly encode shared peptides (see example below) during aggregation.
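To make the encoding concrete, here is a minimal base-R sketch of how such an adjacency matrix might be built from a peptide-to-protein mapping. The mapping `prots`, its ";"-separated format and the protein names are hypothetical and only serve as illustration; they are not part of the documented functions.

```r
## Hypothetical mapping: each peptide lists its protein(s), ";"-separated
prots <- c(Pep1 = "ProtA", Pep2 = "ProtA;ProtB", Pep3 = "ProtB")

## One character vector of protein identifiers per peptide
members <- strsplit(prots, ";")
ids <- sort(unique(unlist(members)))

## Peptides in rows, proteins in columns; 1 marks membership
adj <- t(vapply(members,
                function(p) as.numeric(ids %in% p),
                numeric(length(ids))))
colnames(adj) <- ids
adj

## A row sum greater than 1 identifies a shared peptide (here Pep2)
rowSums(adj)
```

A matrix built this way has one row per peptide and one column per protein, which is the layout the examples on this page use for MAT.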
colMeansMat(x, MAT, na.rm = FALSE)
colSumsMat(x, MAT, na.rm = FALSE)
aggregate_by_matrix(x, MAT, FUN, ...)
aggregate_by_vector(x, INDEX, FUN, ...)
x |
A |
MAT |
An adjacency matrix that defines peptide-protein
relations with |
na.rm |
A |
FUN |
A |
... |
Additional arguments passed to |
INDEX |
A |
aggregate_by_matrix() returns a matrix (or Matrix) of dimensions ncol(MAT) and ncol(x), with dimnames equal to colnames(MAT) and colnames(x).
aggregate_by_vector() returns a new matrix (if x is a matrix) or HDF5Matrix (if x is an HDF5Matrix) with one row per unique value (level) of INDEX and ncol(x) columns, with dimnames equal to the unique values of INDEX and colnames(x).
When aggregating with a vector/factor, user-defined functions must return a vector of length equal to ncol(x) for each level in INDEX. Examples thereof are:
medianPolish() to fit an additive model (two-way decomposition) using Tukey's median polish procedure via stats::medpolish();
robustSummary() to calculate a robust aggregation using MASS::rlm();
base::colMeans() to use the mean of each column;
base::colSums() to use the sum of each column;
matrixStats::colMedians() to use the median of each column.
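The contract for a user-defined function can be illustrated with a small base-R sketch. The function `colMaxs_sketch()` is hypothetical, and the split/apply/bind code below only emulates group-wise aggregation for illustration; it is not the package's implementation.

```r
## Hypothetical user-defined summary: column-wise maximum.
## It receives the sub-matrix of one group and returns one value
## per column, i.e. a vector of length ncol(x).
colMaxs_sketch <- function(x, na.rm = FALSE)
    apply(x, 2, max, na.rm = na.rm)

x <- matrix(c(1, 5, 3, 2, 4, 6), nrow = 3,
            dimnames = list(paste0("Pep", 1:3), c("S1", "S2")))
k <- c("A", "A", "B")

## Base-R emulation of aggregation by vector: split the row indices
## by group, summarise each sub-matrix, bind the results row-wise.
res <- do.call(rbind,
               lapply(split(seq_len(nrow(x)), k),
                      function(i) colMaxs_sketch(x[i, , drop = FALSE])))
res
```

A function with this shape (matrix in, vector of length ncol(x) out) should be usable as FUN in aggregate_by_vector(), for instance aggregate_by_vector(x, k, colMaxs_sketch).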
When aggregating with an adjacency matrix, user-defined functions must return a new matrix. Examples thereof are:
colSumsMat(x, MAT) aggregates by summing the peptide intensities for each protein. Shared peptides are re-used multiple times.
colMeansMat(x, MAT) aggregates by calculating the mean of the peptide intensities for each protein. Shared peptides are re-used multiple times.
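What "re-used multiple times" means can be seen by writing the sum and the mean as matrix products. The following base-R sketch assumes complete data (no missing values) and is meant as an illustration of the arithmetic, not as the code behind colSumsMat() or colMeansMat().

```r
x <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
            dimnames = list(paste0("Pep", 1:3), c("S1", "S2")))

## Pep2 is shared by proteins A and B
adj <- matrix(c(1, 1, 0,
                0, 1, 1), nrow = 3,
              dimnames = list(paste0("Pep", 1:3), c("A", "B")))

## Sum per protein: t(adj) %*% x. Pep2 contributes to both rows.
sums <- crossprod(adj, x)
sums

## Mean per protein: divide each row by that protein's peptide count.
## colSums(adj) has one entry per row of 'sums', so recycling down
## the columns divides row i by the i-th count.
means <- sums / colSums(adj)
means
```

The intensity of Pep2 enters both the A and the B totals, so the aggregated values summed over proteins exceed the summed peptide intensities whenever peptides are shared.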
By default, missing values in the quantitative data will propagate to the aggregated data. You can provide na.rm = TRUE to most functions listed above to ignore missing values, except for robustSummary(), where you should supply na.action = na.omit (see ?MASS::rlm).
Laurent Gatto and Samuel Wieczorek (aggregation from an adjacency matrix).
Other Quantitative feature aggregation: colCounts(), medianPolish(), robustSummary()
x <- matrix(c(10.39, 17.16, 14.10, 12.85, 10.63, 7.52, 3.91,
              11.13, 16.53, 14.17, 11.94, 11.51, 7.69, 3.97,
              11.93, 15.37, 14.24, 11.21, 12.29, 9.00, 3.83,
              12.90, 14.37, 14.16, 10.12, 13.33, 9.75, 3.81),
            nrow = 7,
            dimnames = list(paste0("Pep", 1:7), paste0("Sample", 1:4)))
x
## -------------------------
## Aggregation by vector
## -------------------------
(k <- paste0("Prot", c("B", "E", "X", "E", "B", "B", "E")))
aggregate_by_vector(x, k, colMeans)
aggregate_by_vector(x, k, robustSummary)
aggregate_by_vector(x, k, medianPolish)
## -------------------------
## Aggregation by matrix
## -------------------------
adj <- matrix(c(1, 0, 0, 1, 1, 1, 0, 0,
                1, 0, 1, 0, 0, 1, 0, 0,
                1, 0, 0, 0, 1),
              nrow = 7,
              dimnames = list(paste0("Pep", 1:7),
                              paste0("Prot", c("B", "E", "X"))))
adj
## Peptide 4 is shared by 2 proteins (has a rowSums of 2),
## namely proteins B and E; peptide 7 is likewise shared,
## by proteins E and X
rowSums(adj)
aggregate_by_matrix(x, adj, colSumsMat)
aggregate_by_matrix(x, adj, colMeansMat)
## ---------------
## Missing values
## ---------------
x <- matrix(c(NA, 2:6), ncol = 2,
            dimnames = list(paste0("Pep", 1:3),
                            c("S1", "S2")))
x
## simply use na.rm = TRUE to ignore missing values
## during the aggregation
(k <- LETTERS[c(1, 1, 2)])
aggregate_by_vector(x, k, colSums)
aggregate_by_vector(x, k, colSums, na.rm = TRUE)
(adj <- matrix(c(1, 1, 0, 0, 0, 1), ncol = 2,
               dimnames = list(paste0("Pep", 1:3),
                               c("A", "B"))))
aggregate_by_matrix(x, adj, colSumsMat, na.rm = FALSE)
aggregate_by_matrix(x, adj, colSumsMat, na.rm = TRUE)