combineFeatures: Combines features in an 'MSnSet' object
In MSnbase: Base Functions and Classes for Mass Spectrometry and Proteomics


This function combines the features in an "MSnSet" instance by applying a summarisation function (see the method argument) to sets of features as defined by a factor (see the groupBy and fcol arguments). Note that the feature names are automatically updated based on the groupBy parameter.

The coefficients of variation are automatically computed and collated to the featureData slot. See the cv and cv.norm arguments for details.

If NA values are present, a message will be shown. Details on how missing values affect the data aggregation are provided below.
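The coefficient of variation mentioned above can be pictured with a small base-R sketch. It assumes the CV is the standard deviation divided by the mean, computed per sample within each group, and it deliberately skips the normalisation step controlled by `cv.norm`. The matrix `x`, the factor `grp` and the helper `group_cv` are hypothetical names used for illustration only; they are not part of MSnbase, and `featureCV` remains the reference for what is actually computed.

```r
## Toy intensity matrix: 5 features (rows) x 2 samples (columns)
x <- matrix(c(10, 12, 20, 22, 24,
              100, 100, 50, 60, 70),
            nrow = 5,
            dimnames = list(paste0("pep", 1:5), c("s1", "s2")))
grp <- factor(c("P1", "P1", "P2", "P2", "P2"))

## Hypothetical helper: per-group, per-sample CV (sd / mean).
## Assumption: no normalisation is applied beforehand.
group_cv <- function(x, grp) {
    res <- lapply(levels(grp), function(g) {
        sub <- x[grp == g, , drop = FALSE]
        apply(sub, 2, function(v) sd(v) / mean(v))
    })
    res <- do.call(rbind, res)
    rownames(res) <- levels(grp)
    res
}

cvs <- group_cv(x, grp)
cvs
```

The result has one row per group and one column per sample, which is the shape that would be appended to the feature metadata.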

`object`	An instance of class `"MSnSet"` whose features will be summarised.
`groupBy`	A `factor`, `character`, `numeric` or a `list` of the above defining how to summarise the features. The list must be of length `nrow(object)`. Each element of the list is a vector describing the feature mapping. If the list is named, its names must match `featureNames(object)`. See `redundancy.handler` for details about the latter.
`fun`	Deprecated; use `method` instead.
`method`	The summarising function. Currently, mean, median, weighted mean, sum, median polish, robust summarisation (using `MASS::rlm`), iPQF (see `iPQF` for details) and NTR (see `NTR` for details) are implemented, but user-defined functions can also be supplied. Note that the robust method assumes that the data are already log-transformed.
`fcol`	Feature meta-data label (fData column name) defining how to summarise the features. It must be present in `fvarLabels(object)` and, if present, will be used to define `groupBy` as `fData(object)[, fcol]`. Note that `fcol` is ignored if `groupBy` is present.
`redundancy.handler`	If `groupBy` is a `list`, one of `"unique"` (default) or `"multiple"` (ignored otherwise) defining how to handle peptides that can be associated to multiple higher-level features (proteins) upon combination. Using `"unique"` will only consider uniquely matching features (features matching multiple proteins will be discarded). `"multiple"` will allow matching to multiple proteins and each feature will be repeatedly tallied for each possible matching protein.
`cv`	A `logical` defining if feature coefficients of variation should be computed and stored as feature meta-data. Default is `TRUE`.
`cv.norm`	A `character` defining how to normalise the feature intensities prior to CV calculation. Default is `sum`. Use `none` to keep intensities as is. See `featureCV` for more details.
`verbose`	A `logical` indicating whether verbose output is to be printed out.
`...`	Additional arguments for the `method` function.
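The difference between the two `redundancy.handler` settings can be sketched in base R. This is only an illustration of the semantics described for a `list`-valued `groupBy`, not MSnbase's internal code; the peptide names `pep1` to `pep5` and the variables `unique_map`, `multi_map` and `by_protein` are hypothetical.

```r
## Each list element gives the protein(s) that a peptide maps to
grpl <- list(pep1 = c("A", "B"), pep2 = "A", pep3 = "A",
             pep4 = "C", pep5 = c("C", "B"))
n_prot <- lengths(grpl)

## "unique": keep only peptides that map to exactly one protein
unique_map <- unlist(grpl[n_prot == 1])
unique_map

## "multiple": every peptide is counted once per matching protein
multi_map <- data.frame(feature = rep(names(grpl), n_prot),
                        protein = unlist(grpl, use.names = FALSE),
                        stringsAsFactors = FALSE)
by_protein <- split(multi_map$feature, multi_map$protein)
by_protein
```

With `"unique"`, protein B receives no peptides at all in this toy mapping, because both peptides that match it also match another protein; with `"multiple"`, `pep1` and `pep5` each contribute to two proteins.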

Missing values have different effects depending on the aggregation method employed, as detailed below. See also the examples.

When using either "sum", "mean", "weighted.mean" or "median", any missing value will be propagated to the higher level. If na.rm = TRUE is used, the missing values will be ignored.
Missing values will result in an error when using "medpolish", unless na.rm = TRUE is used.
When using robust summarisation ("robust"), individual missing values are excluded prior to fitting the linear model by robust regression. To remove all values in the feature containing the missing values, use filterNA.
The "iPQF" method will fail with an error if missing values are present; these have to be handled explicitly beforehand. See below.
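The propagation behaviour described for "sum", "mean" and "median" follows directly from how these base R functions treat `NA`. The sketch below reproduces it on a plain matrix, without MSnbase; the helper `aggregate_rows` and the toy data are hypothetical and only mimic the grouping step.

```r
## Toy intensities with one missing value in sample s2
x <- matrix(c(1, 2, 3, 4, 5,
              10, NA, 30, 40, 50),
            nrow = 5,
            dimnames = list(paste0("pep", 1:5), c("s1", "s2")))
grp <- factor(c(1, 1, 2, 2, 2))

## Hypothetical helper: apply `fun` to each sample within each group
aggregate_rows <- function(x, grp, fun, ...) {
    res <- lapply(levels(grp), function(g)
        apply(x[grp == g, , drop = FALSE], 2, fun, ...))
    res <- do.call(rbind, res)
    rownames(res) <- levels(grp)
    res
}

## The NA propagates to group 1 in sample s2
res_na <- aggregate_rows(x, grp, sum)
res_na

## With na.rm = TRUE the NA is dropped before summing
res_narm <- aggregate_rows(x, grp, sum, na.rm = TRUE)
res_narm
```

Only the group and sample that contain the missing value are affected; all other cells are identical in both results.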

More generally, missing values often need dedicated handling such as filtering (see filterNA) or imputation (see impute).

A new "MSnSet" instance is returned in which ncol (i.e. the number of samples) is unchanged, but nrow (i.e. the number of features) is now equal to the number of levels in groupBy. The feature metadata (featureData slot) is updated accordingly and only the first occurrence of a feature in the original feature meta-data is kept.
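The "first occurrence is kept" rule for the feature metadata can be illustrated on an ordinary data frame. This is a base-R sketch under the assumption that rows are retained in their original order; the data frame `fd` and its columns are hypothetical stand-ins for `fData(object)`.

```r
## Hypothetical feature metadata: 5 peptides mapping to 2 proteins
fd <- data.frame(peptide = paste0("pep", 1:5),
                 accession = c("P1", "P1", "P2", "P2", "P2"),
                 score = c(0.9, 0.5, 0.7, 0.8, 0.6),
                 stringsAsFactors = FALSE)

## Keep only the first row encountered for each group
fd_comb <- fd[!duplicated(fd$accession), ]
rownames(fd_comb) <- fd_comb$accession
fd_comb

## One row per group level remains
nrow(fd_comb) == nlevels(factor(fd$accession))
```

Peptide-level columns such as `peptide` or `score` in the result therefore describe only the first peptide of each group, not the group as a whole.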

Laurent Gatto <lg390@cam.ac.uk> with contributions from Martina Fischer for iPQF and Ludger Goeminne, Adriaan Sticker and Lieven Clement for robust.

iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov 20. PubMed PMID:26589272.

featureCV to calculate coefficients of variation, nFeatures to document the number of features per group in the feature data, and aggvar to explore variability within protein groups.

iPQF for iPQF summarisation.

NTR for normalisation to reference summarisation.

data(msnset)
msnset <- msnset[11:15, ]
exprs(msnset)

## arbitrary grouping into two groups
grp <- as.factor(c(1, 1, 2, 2, 2))
msnset.comb <- combineFeatures(msnset, groupBy = grp, method = "sum")
dim(msnset.comb)
exprs(msnset.comb)
fvarLabels(msnset.comb)

## grouping with a list
grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B"))
## optional naming
names(grpl) <- featureNames(msnset)
exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "unique"))
exprs(combineFeatures(msnset, groupBy = grpl, method = "sum", redundancy.handler = "multiple"))

## missing data
exprs(msnset)[4, 4] <-
    exprs(msnset)[2, 2] <- NA
exprs(msnset)
## NAs propagate in the 115 and 117 channels
exprs(combineFeatures(msnset, grp, "sum"))
## NAs are removed before summing
exprs(combineFeatures(msnset, grp, "sum", na.rm = TRUE))

## using iPQF
data(msnset2)
anyNA(msnset2)
res <- combineFeatures(msnset2,
		       groupBy = fData(msnset2)$accession,
		       redundancy.handler = "unique",
		       method = "iPQF",
		       low.support.filter = FALSE,
		       ratio.calc = "sum",
		       method.combine = FALSE)
head(exprs(res))

## using robust summarisation
data(msnset) ## reset data
msnset <- log(msnset, 2) ## log2 transform

## Feature X46, in the ENO protein, has one missing value
which(is.na(msnset), arr.ind = TRUE)
exprs(msnset["X46", ])
## Only the missing value in X46 and iTRAQ4.116 will be ignored
res <- combineFeatures(msnset,
		       fcol = "ProteinAccession",
		       method = "robust")
tail(exprs(res))

msnset2 <- filterNA(msnset) ## remove features with missing value(s)
res2 <- combineFeatures(msnset2,
			fcol = "ProteinAccession",
			method = "robust")
## Here, the values for ENO are different because the whole feature
## X46 that contained the missing value was removed prior to fitting.
tail(exprs(res2))