combineFeatures | R Documentation |
MSnSet object
This function combines the features in an "MSnSet" instance by applying a summarisation function (see the method argument; fun is deprecated) to sets of features as defined by a factor (see fcol argument). Note that the feature names are automatically updated based on the groupBy parameter.
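The idea of summarising sets of features defined by a factor can be sketched in base R. This is only an illustration of the concept on a toy matrix, not MSnbase's implementation; the helper combine_rows and all data values are hypothetical.

```r
## Toy expression matrix: 5 features (rows) x 2 samples (columns).
## All names and values are made up for illustration.
x <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50),
            ncol = 2,
            dimnames = list(paste0("pep", 1:5), c("s1", "s2")))

## Grouping factor: one level per higher-level feature (e.g. a protein)
grp <- factor(c("A", "A", "B", "B", "B"))

## Hypothetical helper: apply a summarisation function to each sample
## (column) within each group of rows, and stack the results by group.
combine_rows <- function(x, grp, fun = sum, ...) {
    idx <- split(seq_len(nrow(x)), grp)
    do.call(rbind,
            lapply(idx, function(i)
                apply(x[i, , drop = FALSE], 2, fun, ...)))
}

combined <- combine_rows(x, grp, fun = sum)
## One row per level of grp; row names are the group labels
combined
```

The row names of the result are the levels of the grouping factor, which mirrors the note above that feature names are updated from groupBy.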
The coefficients of variation are automatically computed and collated to the featureData slot. See the cv and cv.norm arguments for details.
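As a rough illustration of what a per-group coefficient of variation looks like, the base-R sketch below assumes the usual definition (standard deviation divided by mean) computed for each sample within each group. Any normalisation controlled by cv.norm is not covered, and the "CV." column prefix is an illustrative choice, not a documented naming scheme.

```r
## Toy data: 5 features x 2 samples, values made up for illustration
x <- matrix(c(2, 4, 6, 5, 5,
              1, 3, 10, 10, 10),
            ncol = 2,
            dimnames = list(paste0("pep", 1:5), c("s1", "s2")))
grp <- factor(c("A", "A", "B", "B", "B"))

## Coefficient of variation, assuming the usual sd / mean definition
cv <- function(v) sd(v) / mean(v)

## One CV per group (row) and per sample (column)
group_cv <- do.call(rbind,
                    lapply(split(seq_len(nrow(x)), grp), function(i)
                        apply(x[i, , drop = FALSE], 2, cv)))
## Hypothetical column labels for the CV columns
colnames(group_cv) <- paste0("CV.", colnames(x))
group_cv
```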
If NA values are present, a message will be shown. Details on how missing values affect the data aggregation are provided below.
object |
An instance of class |
groupBy |
A |
fun |
Deprecated; use |
method |
The summarising function. Currently, mean, median,
weighted mean, sum, median polish, robust summarisation (using
|
fcol |
Feature meta-data label (fData column name) defining how
to summarise the features. It must be present in
|
redundancy.handler |
If |
cv |
A |
cv.norm |
A |
verbose |
A |
... |
Additional arguments for the |
Missing values have different effects depending on the aggregation method employed, as detailed below. See also the examples below.
When using either "sum", "mean", "weighted.mean" or "median", any missing value will be propagated at the higher level. If na.rm = TRUE is used, then the missing value will be ignored.
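The propagation behaviour described above is the same as that of the underlying base-R summary functions. The sketch below uses a hypothetical helper, sum_by_group, on made-up data to show both cases; it is not the MSnbase code path.

```r
## Toy data with one missing value in each sample
x <- matrix(c(1, NA, 3, 4, 5,
              10, 20, 30, NA, 50),
            ncol = 2,
            dimnames = list(paste0("pep", 1:5), c("s1", "s2")))
grp <- factor(c(1, 1, 2, 2, 2))

## Hypothetical helper: sum each sample within each group of rows
sum_by_group <- function(x, grp, na.rm = FALSE) {
    do.call(rbind,
            lapply(split(seq_len(nrow(x)), grp), function(i)
                colSums(x[i, , drop = FALSE], na.rm = na.rm)))
}

## Default: a single NA makes the whole group/sample sum NA
propagated <- sum_by_group(x, grp)
## na.rm = TRUE: the NA is dropped and the remaining values are summed
ignored <- sum_by_group(x, grp, na.rm = TRUE)

propagated
ignored
```

Note that with na.rm = TRUE the sum is taken over fewer values, so groups with missing data are no longer directly comparable to complete ones.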
Missing values will result in an error when using "medpolish", unless na.rm = TRUE is used.
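This matches how median polish behaves in base R. The sketch below calls stats::medpolish directly on a made-up matrix. How MSnbase turns the fit into a summarised value is not described on this page, so the final line (overall effect plus column effects) is an assumption made for illustration only.

```r
## Toy 3 x 3 matrix (features x samples) with one missing value
x <- matrix(c(1, 2, 3,
              2, NA, 4,
              3, 4, 5),
            nrow = 3, byrow = TRUE)

## Without na.rm = TRUE the fit is expected to stop with an error,
## which is captured here instead of aborting the script
res_fail <- tryCatch(medpolish(x, trace.iter = FALSE),
                     error = function(e) e)
inherits(res_fail, "error")

## With na.rm = TRUE the missing cell is skipped during the fit
res_ok <- medpolish(x, na.rm = TRUE, trace.iter = FALSE)

## Assumed per-sample summary: overall effect plus column effects
res_ok$overall + res_ok$col
```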
When using robust summarisation ("robust"), individual missing values are excluded prior to fitting the linear model by robust regression. To remove all values in the feature containing the missing values, use filterNA.
The "iPQF" method will fail with an error if missing values are present, which will have to be handled explicitly. See below.
More generally, missing values often need dedicated handling such as filtering (see filterNA) or imputation (see impute).
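The difference between the two strategies can be sketched in base R on a toy matrix. The filtering step drops every feature with at least one missing value; the imputation step is a deliberately naive stand-in (column minimum) chosen for illustration and is not claimed to be one of the methods offered by impute.

```r
## Toy data: 3 features x 2 samples, one missing value
x <- matrix(c(1, NA, 3,
              4, 5, 6),
            ncol = 2,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))

## Filtering: keep only features (rows) without any missing value
filtered <- x[complete.cases(x), , drop = FALSE]

## Naive imputation (illustrative only): replace each NA by the
## smallest observed value in the same sample (column)
imputed <- x
for (j in seq_len(ncol(imputed))) {
    miss <- is.na(imputed[, j])
    imputed[miss, j] <- min(imputed[, j], na.rm = TRUE)
}

filtered
imputed
```

Filtering reduces the number of features available for aggregation, while imputation keeps them all at the cost of introducing estimated values.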
A new "MSnSet" instance is returned in which ncol (i.e. the number of samples) is unchanged, but nrow (i.e. the number of features) is now equal to the number of levels in groupBy. The feature metadata (featureData slot) is updated accordingly and only the first occurrence of a feature in the original feature meta-data is kept.
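Keeping only the first occurrence of each group in the feature metadata can be pictured with a plain data frame. This is a base-R sketch on made-up data, not the code MSnbase runs; the column names are hypothetical.

```r
## Toy feature metadata: 5 peptides mapping to 2 proteins
fd <- data.frame(accession = c("P1", "P1", "P2", "P2", "P2"),
                 peptide = c("pepA", "pepB", "pepC", "pepD", "pepE"),
                 stringsAsFactors = FALSE)
grp <- factor(fd$accession)

## Keep the first row seen for each group
first <- !duplicated(grp)
fd_combined <- fd[first, ]
## Rows are now labelled by group rather than by original feature
rownames(fd_combined) <- as.character(grp[first])
## Order rows to match the levels of the grouping factor
fd_combined <- fd_combined[levels(grp), ]

fd_combined
```

Feature-level columns such as peptide therefore describe only the first member of each group after combination and should be interpreted with that in mind.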
Laurent Gatto with contributions from Martina Fischer for iPQF and Ludger Goeminne, Adriaan Sticker and Lieven Clement for robust.
iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov 20. PubMed PMID:26589272.
featureCV to calculate coefficient of variation, nFeatures to document the number of features per group in the feature data, and the aggvar to explore variability within protein groups.
iPQF for iPQF summarisation.
NTR for normalisation to reference summarisation.
data(msnset)
msnset <- msnset[11:15, ]
exprs(msnset)
## arbitrary grouping into two groups
grp <- as.factor(c(1, 1, 2, 2, 2))
msnset.comb <- combineFeatures(msnset, groupBy = grp, method = "sum")
dim(msnset.comb)
exprs(msnset.comb)
fvarLabels(msnset.comb)
## grouping with a list
grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B"))
## optional naming
names(grpl) <- featureNames(msnset)
exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "unique"))
exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "multiple"))
## missing data
exprs(msnset)[4, 4] <-
exprs(msnset)[2, 2] <- NA
exprs(msnset)
## NAs propagate in the 115 and 117 channels
exprs(combineFeatures(msnset, grp, "sum"))
## NAs are removed before summing
exprs(combineFeatures(msnset, grp, "sum", na.rm = TRUE))
## using iPQF
data(msnset2)
anyNA(msnset2)
res <- combineFeatures(msnset2,
                       groupBy = fData(msnset2)$accession,
                       redundancy.handler = "unique",
                       method = "iPQF",
                       low.support.filter = FALSE,
                       ratio.calc = "sum",
                       method.combine = FALSE)
head(exprs(res))
## using robust summarisation
data(msnset) ## reset data
msnset <- log(msnset, 2) ## log2 transform
## Feature X46, in the ENO protein, has one missing value
which(is.na(msnset), arr.ind = TRUE)
exprs(msnset["X46", ])
## Only the missing value in X46 and iTRAQ4.116 will be ignored
res <- combineFeatures(msnset,
                       fcol = "ProteinAccession",
                       method = "robust")
tail(exprs(res))
msnset2 <- filterNA(msnset) ## remove features with missing value(s)
res2 <- combineFeatures(msnset2,
                        fcol = "ProteinAccession",
                        method = "robust")
## Here, the values for ENO are different because the whole feature
## X46 that contained the missing value was removed prior to fitting.
tail(exprs(res2))