se_collapse_by_row | R Documentation |
Collapse SummarizedExperiment data by row
se_collapse_by_row(
se,
rows = rownames(se),
row_groups,
assay_names = NULL,
group_func_name = c("sum", "mean", "weighted.mean", "geomean", "none"),
rowStatsFunc = NULL,
rowDataColnames = NULL,
keepNULLlevels = FALSE,
delim = "[ ]*[;,]+[ ]*",
data_transform = c("none", "log2p+sqrt", "log2+sqrt", "log2p", "log2"),
verbose = TRUE,
...
)
se |
|
rows |
|
row_groups |
|
assay_names |
|
group_func_name |
|
rowStatsFunc |
|
rowDataColnames |
|
keepNULLlevels |
|
delim |
|
data_transform |
|
verbose |
|
... |
additional arguments are passed to |
The purpose is to collapse rows of a SummarizedExperiment object where measurements for a given entity, usually a gene, are split across multiple rows in the source data. The output of this function should be measurements appropriately summarized to the gene level.
The key arguments are group_func_name and data_transform.
Note that data is inverse-transformed based upon data_transform prior to calculating the group summary values defined by group_func_name. The reason is to enable using group_func_name="sum" on normal space abundance values when the input data has already been transformed, for example with log2(1 + x). In this case it is most appropriate to take the sum of normal space abundance values, then re-apply the transformation afterwards.
However, when using group_func_name="mean" it is usually recommended to use data_transform="none" so that data is maintained in its appropriately transformed state.
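The difference between the two approaches can be sketched in base R. This is a minimal illustration under stated assumptions, not the jamses implementation: the matrix `raw`, the grouping vector `groups`, and the use of `rowsum()` are all made up for the example.

```r
# Hypothetical normal-space peptide abundances: 4 rows, 2 samples.
raw <- matrix(c(3, 1, 7, 15,
                1, 3, 31, 15),
              nrow = 4,
              dimnames = list(paste0("pep", 1:4), c("s1", "s2")))

# Assume the input arrives already transformed with log2(1 + x).
m <- log2(1 + raw)
groups <- c("geneA", "geneA", "geneB", "geneB")

# Sum: inverse-transform, sum per group, then re-apply log2(1 + x).
normal <- 2^m - 1
collapsed_sum <- log2(1 + rowsum(normal, group = groups))

# Mean: stay in log2 space, analogous to data_transform = "none".
# rowsum() and table() both order groups alphabetically, so they align.
collapsed_mean <- rowsum(m, group = groups) / as.vector(table(groups))
```

Here `collapsed_sum` holds the log2(1 + x) of the summed normal-space values for each gene, while `collapsed_mean` is the per-gene average of the values as supplied.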
The driving use case is proteomics mass spectrometry data, where measurements are described in terms of peptide sequences, with or without optional post-translational modification (PTM), and the peptide sequences are annotated to a source protein or gene. This function can be used to:
collapse peptide-PTM data to the peptide level
collapse peptide data to the protein level
In future it may be used to collapse multiple microarray probe measurements to the gene level, although that process is more likely to be useful and recommended after performing probe-level statistical analysis.
For proteomics mass spectrometry data, proteins are inconsistently fragmented into smaller peptides of varying sizes. The peptides are usually separated on a chromatography column, from which aliquot fractions are taken and measured by mass spectrometry. The total signal derived from the original protein is therefore some combination of the measured peptide parts.
In some upstream data processing tools, such as Proteome Discoverer and PEAKS, the peptide data may be annotated with observed modification events (PTM). In this scenario, peptide measurements are split across multiple rows of data, where each row represents an observed combination of peptide and PTMs.
It is fairly straightforward to observe that peptide-PTM measurement data is correlated with overall protein quantification, and that the specific combination of peptide fragments may be inconsistent across samples. That is, one may observe five peptides of protein A in one sample and seven peptides of protein A in another sample. The quantities of each peptide may be inconsistent, due to variability in protein fragmentation across samples. However, the overall sum of peptide measurements is typically fairly stable across samples, especially for proteins of moderate to high abundance which are known to have stable abundance per cell.
Choice of method to collapse measurements is not trivial, and is therefore configurable. In general, proteomics abundances are analyzed after log2( 1 + x ) transformation. However, measurements cannot be summed in log2 form, because that would be equivalent to multiplying measurements in normal form. Measurements can be summed, but only after exponentiating the data; for example, the inverse transformation ( 2 ^ x ) - 1 is sufficient.
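The arithmetic behind this point can be checked numerically. The sketch below uses made-up abundance values and only base R functions; the variable names are assumptions for illustration.

```r
x <- c(100, 300, 600)  # hypothetical normal-space abundances

# Summing log2 values gives the log2 of the product, not of the sum.
sum_of_logs <- sum(log2(x))   # equal to log2(prod(x))
log_of_sum  <- log2(sum(x))   # the quantity usually intended: log2(1000)

# The inverse of log2(1 + x) is (2 ^ y) - 1, so the round trip recovers x.
y <- log2(1 + x)
x_back <- 2^y - 1

# Sum in normal space, then re-apply the transformation.
collapsed <- log2(1 + sum(x_back))
```

Comparing `sum_of_logs` with `log_of_sum` shows how far apart the two results are for even a small group of rows.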
SummarizedExperiment object with these changes:
rows will be collapsed by row_groups, for each assays(se) numeric matrix defined by assay_names. The collapse may optionally apply a data transformation defined in data_transform in order to apply an appropriate numeric summary calculation.
rowData(se) will also be collapsed by shrinkDataFrame() to combine unique values from each row annotation.
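As a rough picture of what combining unique annotation values per group looks like, here is a base R sketch. It is not the shrinkDataFrame() implementation: the data frame `rd`, the helper `collapse_unique()`, and the comma separator are assumptions made for the example.

```r
# Hypothetical row annotations for four peptide rows.
rd <- data.frame(
  gene    = c("geneA", "geneA", "geneB", "geneB"),
  protein = c("P1", "P1", "P2", "P3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the unique values of a column with a separator.
collapse_unique <- function(x, sep = ",") {
  paste(unique(x), collapse = sep)
}

# One output row per gene, with unique protein values combined.
rd_collapsed <- aggregate(protein ~ gene, data = rd, FUN = collapse_unique)
```

Repeated values within a group appear once in the result, and distinct values are kept side by side in a single delimited string.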
Other jamses SE utilities: make_se_test(), se_collapse_by_column(), se_detected_rows(), se_normalize(), se_rbind(), se_to_rowcoldata()