agglomerate-methods | R Documentation |
Agglomeration functions can be used to sum up data based on specific criteria such as taxonomic ranks, variables or prevalence.
agglomerateByRanks takes a SummarizedExperiment, splits it along the taxonomic ranks, aggregates the data per rank, converts the input to a SingleCellExperiment object and stores the aggregated data as alternative experiments. unsplitByRanks takes these alternative experiments and flattens them again into a single SummarizedExperiment.
agglomerateByRank(x, ...)
agglomerateByVariable(x, ...)
## S4 method for signature 'SummarizedExperiment'
agglomerateByRank(
x,
rank = taxonomyRanks(x)[1],
na.rm = TRUE,
empty.fields = c(NA, "", " ", "\t", "-", "_"),
...
)
## S4 method for signature 'SummarizedExperiment'
agglomerateByVariable(x, by, group = f, f, ...)
## S4 method for signature 'TreeSummarizedExperiment'
agglomerateByVariable(
x,
by,
group = f,
f,
update.tree = mergeTree,
mergeTree = FALSE,
...
)
## S4 method for signature 'SingleCellExperiment'
agglomerateByRank(
x,
...,
altexp = NULL,
altexp.rm = strip_altexp,
strip_altexp = TRUE
)
## S4 method for signature 'TreeSummarizedExperiment'
agglomerateByRank(
x,
...,
update.tree = agglomerateTree,
agglomerate.tree = agglomerateTree,
agglomerateTree = FALSE
)
agglomerateByRanks(x, ...)
## S4 method for signature 'SummarizedExperiment'
agglomerateByRanks(
x,
ranks = taxonomyRanks(x),
na.rm = TRUE,
as.list = FALSE,
...
)
## S4 method for signature 'SingleCellExperiment'
agglomerateByRanks(
x,
ranks = taxonomyRanks(x),
na.rm = TRUE,
as.list = FALSE,
...
)
## S4 method for signature 'TreeSummarizedExperiment'
agglomerateByRanks(
x,
ranks = taxonomyRanks(x),
na.rm = TRUE,
as.list = FALSE,
...
)
splitByRanks(x, ...)
unsplitByRanks(x, ...)
## S4 method for signature 'SingleCellExperiment'
unsplitByRanks(
x,
ranks = taxonomyRanks(x),
keep.dimred = keep_reducedDims,
keep_reducedDims = FALSE,
...
)
## S4 method for signature 'TreeSummarizedExperiment'
unsplitByRanks(
x,
ranks = taxonomyRanks(x),
keep.dimred = keep_reducedDims,
keep_reducedDims = FALSE,
...
)
x |
|
... |
arguments passed to |
rank |
|
na.rm |
|
empty.fields |
|
by |
|
group |
|
f |
Deprecated. Use |
update.tree |
|
mergeTree |
Deprecated. Use |
altexp |
|
altexp.rm |
|
strip_altexp |
Deprecated. Use |
agglomerate.tree |
Deprecated. Use |
agglomerateTree |
Deprecated. Use |
ranks |
|
as.list |
|
keep.dimred |
|
keep_reducedDims |
Deprecated. Use |
agglomerateByRank can be used to sum up data based on associations with certain taxonomic ranks, as defined in rowData. Only available taxonomyRanks can be used.
agglomerateByVariable merges data on rows or columns of a SummarizedExperiment as defined by a factor alongside the chosen dimension. This function allows agglomeration of data based on variables other than taxonomy ranks. Metadata from the rowData or colData are retained as defined by archetype. Assays are agglomerated, i.e. summed up. If the assay contains values other than counts or absolute values, this can lead to meaningless values being produced.
Agglomeration sums up the values of assays at the specified taxonomic level. With certain assays, e.g. those that include binary or negative values, this summing can produce meaningless values. In those cases, consider performing agglomeration first, and then applying the transformation afterwards.
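The order of transformation and agglomeration described above can be illustrated with a small base R sketch. It does not use mia; the matrix, the family labels and the variable names are made up for illustration, and rowsum() merely stands in for the summing step.

```r
# Toy count matrix: 4 taxa (rows) x 3 samples (columns); all values are made up.
counts <- matrix(
    c(5, 0, 2,
      3, 1, 0,
      0, 0, 7,
      0, 4, 1),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)
# Hypothetical family assignment for each taxon.
family <- c("FamA", "FamA", "FamB", "FamB")

# Presence/absence first, then summing: values above 1 can appear,
# so the result is no longer a presence/absence table.
pa_then_sum <- rowsum((counts > 0) * 1, group = family)

# Summing counts first, then presence/absence: values stay in {0, 1}.
sum_then_pa <- (rowsum(counts, group = family) > 0) * 1

pa_then_sum
sum_then_pa
```

In the first result a family can score 2 in a sample, which has no meaning as presence/absence; the second result keeps the intended 0/1 coding.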
agglomerateByVariable works similarly to sumCountsAcrossFeatures. However, additional support for TreeSummarizedExperiment was added and science field agnostic names were used. In addition, the archetype argument lets the user select how to preserve row or column data. To merge assay data, functions from scuttle are used.
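As a rough illustration of merging along either dimension by a factor, the following base R sketch sums rows or columns that share a factor level. It is only an analogy for what the text describes, not the scuttle or mia implementation; the matrix and the grouping factors are hypothetical.

```r
# Toy assay: 4 features (rows) x 3 samples (columns); values are made up.
m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("feat", 1:4), paste0("s", 1:3)))

# Hypothetical grouping factors, one per dimension.
row_group <- factor(c("g1", "g2", "g1", "g2"))
col_group <- factor(c("A", "B", "A"))

# Merge rows: features sharing a level are summed into one row.
merged_rows <- rowsum(m, group = row_group)

# Merge columns: transpose, sum by group, transpose back.
merged_cols <- t(rowsum(t(m), group = col_group))

merged_rows
merged_cols
```

The total of the matrix is preserved in both cases; only the number of rows or columns changes.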
agglomerateByRanks will use by default all available taxonomic ranks, but this can be controlled by setting ranks manually. NA values are removed by default, since they would not make sense if the result is later used with unsplitByRanks. The input data remains unchanged in the returned SingleCellExperiment objects.
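The per-rank aggregation with NA removal can be sketched in base R as follows. The helper sum_per_rank() is hypothetical and written only for this illustration; it is not part of mia, and the toy taxonomy table is made up.

```r
# Toy counts: 3 taxa x 2 samples; values are made up.
counts <- matrix(c(4, 1,
                   0, 2,
                   3, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("taxon", 1:3), c("s1", "s2")))

# Hypothetical taxonomy table; taxon2 has no Family assignment.
taxonomy <- data.frame(
    Phylum = c("P1", "P1", "P2"),
    Family = c("F1", NA, "F2"),
    row.names = rownames(counts)
)

# Hypothetical helper: for each rank, drop rows whose label is NA
# and sum the remaining rows by label.
sum_per_rank <- function(counts, taxonomy, ranks = colnames(taxonomy)) {
    out <- lapply(ranks, function(rank) {
        labels <- taxonomy[[rank]]
        keep <- !is.na(labels)
        rowsum(counts[keep, , drop = FALSE], group = labels[keep])
    })
    names(out) <- ranks
    out
}

per_rank <- sum_per_rank(counts, taxonomy)
per_rank$Phylum
per_rank$Family
```

Each list element plays the role of one alternative experiment: the Family table has no NA row, so it can later be stacked with the other ranks without ambiguous entries.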
unsplitByRanks will remove any NA value on each taxonomic rank so that no ambiguous data is created. In addition, a column taxonomicLevel is created or overwritten in the rowData to specify which alternative experiment each row originates from. This can also be used for splitAltExps to split the result along the same factor again. The input data from the base objects is not returned, only the data from the altExp(). Be aware that changes to the rowData of the base object are not returned; only the colData of the base object is kept.
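The flattening step and the role of the taxonomicLevel column can be sketched in base R. This is an analogy on plain matrices rather than the actual SingleCellExperiment machinery; the per-rank tables and the row_info data frame are made up for illustration.

```r
# Hypothetical per-rank tables, standing in for alternative experiments.
per_rank <- list(
    Phylum = matrix(c(4, 3,
                      3, 5),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("P1", "P2"), c("s1", "s2"))),
    Family = matrix(c(4, 1,
                      3, 5),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("F1", "F2"), c("s1", "s2")))
)

# Flatten: stack all ranks into one table.
stacked <- do.call(rbind, unname(per_rank))

# Record which rank each row came from, analogous to taxonomicLevel.
row_info <- data.frame(
    taxonomicLevel = factor(
        rep(names(per_rank), vapply(per_rank, nrow, integer(1))),
        levels = names(per_rank)
    ),
    row.names = rownames(stacked)
)

# The same factor can be used to split the stacked table again.
split_again <- lapply(
    split(seq_len(nrow(stacked)), row_info$taxonomicLevel),
    function(i) stacked[i, , drop = FALSE]
)

stacked
row_info
```

Splitting on taxonomicLevel recovers the original per-rank tables, which mirrors the round trip between unsplitByRanks and splitAltExps described above.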
agglomerateByRank returns a taxonomically-agglomerated, optionally-pruned object of the same class as x.
agglomerateByVariable returns an object of the same class as x with the specified entries merged into one entry in all relevant components.
For agglomerateByRanks:
If as.list = TRUE: SummarizedExperiment objects in a SimpleList
If as.list = FALSE: The SummarizedExperiment passed as a parameter and now containing the SummarizedExperiment objects in its altExps
For unsplitByRanks: x, with rowData and assay data replaced by the unsplit data. The colData of x is kept, and any existing rowTree is dropped, since existing rowLinks are no longer valid.
splitOn, unsplitOn, agglomerateByVariable, sumCountsAcrossFeatures, agglomerateByRank, altExps, splitAltExps
### Agglomerate data based on taxonomic information
# The example datasets and functions below come from the mia package
library(mia)
data(GlobalPatterns)
# print the available taxonomic ranks
colnames(rowData(GlobalPatterns))
taxonomyRanks(GlobalPatterns)
# agglomerate at the Family taxonomic rank
x1 <- agglomerateByRank(GlobalPatterns, rank="Family")
## How many taxa before/after agglomeration?
nrow(GlobalPatterns)
nrow(x1)
# agglomerate the tree as well
x2 <- agglomerateByRank(GlobalPatterns, rank="Family",
update.tree = TRUE)
nrow(x2) # same number of rows, but
rowTree(x1) # ... different
rowTree(x2) # ... tree
# If assay contains binary or negative values, summing might lead to
# meaningless values, and you will get a warning. In these cases, you might
# want to apply the transformation again after agglomerating at the chosen
# taxonomic level.
tse <- transformAssay(GlobalPatterns, method = "pa")
tse <- agglomerateByRank(tse, rank = "Genus")
tse <- transformAssay(tse, method = "pa")
# removing empty labels by setting na.rm = TRUE
sum(is.na(rowData(GlobalPatterns)$Family))
x3 <- agglomerateByRank(GlobalPatterns, rank="Family", na.rm = TRUE)
nrow(x3) # same as x2, since na.rm = TRUE is the default in the usage above
# Because all the rownames are from the same rank, rownames do not include
# prefixes, in this case "Family:".
print(rownames(x3[1:3,]))
# To add them, use getTaxonomyLabels function.
rownames(x3) <- getTaxonomyLabels(x3, with.rank = TRUE)
print(rownames(x3[1:3,]))
# use 'empty.ranks.rm' to remove columns that include only NAs
x4 <- agglomerateByRank(GlobalPatterns, rank="Phylum",
empty.ranks.rm = TRUE)
head(rowData(x4))
# If the assay contains NAs, you might want to consider replacing them,
# since summing up NAs leads to NA
x5 <- GlobalPatterns
# Replace first value with NA
assay(x5)[1,1] <- NA
x6 <- agglomerateByRank(x5, "Kingdom")
head( assay(x6) )
# Replace NAs with 0. This is justified when we are summing-up counts.
assay(x5)[ is.na(assay(x5)) ] <- 0
x6 <- agglomerateByRank(x5, "Kingdom")
head( assay(x6) )
## Look at enterotype dataset...
data(enterotype)
## Print the available taxonomic ranks. Shows only 1 available rank,
## not useful for agglomerateByRank
taxonomyRanks(enterotype)
### Merge TreeSummarizedExperiments on rows and columns
data(esophagus)
esophagus
plot(rowTree(esophagus))
# get a factor for merging
f <- factor(regmatches(rownames(esophagus),
regexpr("^[0-9]*_[0-9]*",rownames(esophagus))))
merged <- agglomerateByVariable(esophagus, by = "rows", f,
update.tree = TRUE)
plot(rowTree(merged))
#
data(GlobalPatterns)
GlobalPatterns
merged <- agglomerateByVariable(GlobalPatterns, by = "cols",
colData(GlobalPatterns)$SampleType)
merged
data(GlobalPatterns)
# print the available taxonomic ranks
taxonomyRanks(GlobalPatterns)
# agglomerateByRanks
#
tse <- agglomerateByRanks(GlobalPatterns)
altExps(tse)
altExp(tse,"Kingdom")
altExp(tse,"Species")
# unsplitByRanks
tse <- unsplitByRanks(tse)
tse