getPrevalence

R Documentation

Calculate prevalence information for features across samples

Description

These functions calculate the population prevalence for taxonomic ranks in a SummarizedExperiment object.

Usage

getPrevalence(x, ...)

getPrevalent(x, ...)

getRare(x, ...)

subsetByPrevalent(x, ...)

subsetByRare(x, ...)

getPrevalentAbundance(
  x,
  assay.type = assay_name,
  assay_name = "relabundance",
  ...
)

addPrevalentAbundance(x, ...)

addPrevalence(x, ...)

## S4 method for signature 'SummarizedExperiment'
addPrevalence(x, name = "prevalence", ...)

## S4 method for signature 'ANY'
getPrevalence(
  x,
  detection = 0,
  include.lowest = include_lowest,
  include_lowest = FALSE,
  sort = FALSE,
  na.rm = TRUE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
getPrevalence(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  rank = NULL,
  ...
)

## S4 method for signature 'ANY'
getPrevalent(
  x,
  prevalence = 50/100,
  include.lowest = include_lowest,
  include_lowest = FALSE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
getPrevalent(
  x,
  rank = NULL,
  prevalence = 50/100,
  include.lowest = include_lowest,
  include_lowest = FALSE,
  ...
)

## S4 method for signature 'ANY'
getRare(
  x,
  prevalence = 50/100,
  include.lowest = include_lowest,
  include_lowest = FALSE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
getRare(
  x,
  rank = NULL,
  prevalence = 50/100,
  include.lowest = include_lowest,
  include_lowest = FALSE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
subsetByPrevalent(x, rank = NULL, ...)

## S4 method for signature 'TreeSummarizedExperiment'
subsetByPrevalent(x, update.tree = TRUE, ...)

## S4 method for signature 'SummarizedExperiment'
subsetByRare(x, rank = NULL, ...)

## S4 method for signature 'TreeSummarizedExperiment'
subsetByRare(x, update.tree = TRUE, ...)

## S4 method for signature 'SummarizedExperiment'
addPrevalentAbundance(x, name = "prevalent_abundance", ...)

## S4 method for signature 'ANY'
getPrevalentAbundance(
  x,
  assay.type = assay_name,
  assay_name = "relabundance",
  ...
)

## S4 method for signature 'SummarizedExperiment'
getPrevalentAbundance(x, assay.type = assay_name, assay_name = "counts", ...)

Arguments

`x`	`TreeSummarizedExperiment`.
`...`	Additional arguments. If `!is.null(rank)`, arguments are passed on to `agglomerateByRank` (see `?agglomerateByRank` for more details). For `getPrevalent`, `getRare`, `subsetByPrevalent` and `subsetByRare`, additional parameters are passed to `getPrevalence`. For `getPrevalentAbundance`, additional parameters are passed to `getPrevalent`.
`assay.type`	`Character scalar`. Specifies the name of assay used in calculation. (Default: `"counts"`)
`assay_name`	Deprecated. Use `assay.type` instead.
`name`	`Character scalar`. Specifies name of column in `rowData` where the results will be stored. (Default: `"prevalence"`)
`detection`	`Numeric scalar`. Detection threshold for absence/presence. If `as_relative = FALSE`, it sets the counts threshold for a taxon to be considered present. If `as_relative = TRUE`, it sets the relative abundance threshold for a taxon to be considered present. (Default: `0`)
`include.lowest`	`Logical scalar`. Should the lower boundary of the detection and prevalence cutoffs be included? (Default: `FALSE`)
`include_lowest`	Deprecated. Use `include.lowest` instead.
`sort`	`Logical scalar`. Should the result be sorted by prevalence? (Default: `FALSE`)
`na.rm`	`Logical scalar`. Should NA values be omitted? (Default: `TRUE`)
`rank`	`Character scalar`. Defines a taxonomic rank. Must be a value of `taxonomyRanks()` function.
`prevalence`	Prevalence threshold (between 0 and 1). By default, a feature's prevalence must be strictly greater than the threshold. To include the limit, set `include.lowest` to `TRUE`. (Default: `50/100`)
`update.tree`	`Logical scalar`. Should `rowTree()` also be agglomerated? (Default: `TRUE`)

Details

getPrevalence calculates the fraction of samples in which each feature exceeds the detection threshold. For SummarizedExperiment objects, the prevalence is calculated for the selected taxonomic rank if rank is given, otherwise for the rows. The absolute population prevalence, that is the number of samples in which a feature is detected, can be obtained by multiplying the prevalence by the number of samples (ncol(x)).
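The calculation described above can be sketched in base R on a plain matrix. This is an illustration only, not the package's internal implementation; the toy matrix and the variable names `counts`, `prevalence` and `prevalence_count` are made up for the example.

```r
# Toy count matrix: 4 features (rows) x 5 samples (columns); values are made up.
counts <- matrix(
    c(0, 3, 0, 12, 7,
      5, 0, 0,  0, 1,
      9, 8, 4, 15, 2,
      0, 0, 0,  0, 0),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:5))
)

detection <- 0

# Fraction of samples in which each feature strictly exceeds the
# detection threshold (the lower boundary is excluded).
prevalence <- rowMeans(counts > detection)
prevalence
# taxon1 taxon2 taxon3 taxon4
#    0.6    0.4    1.0    0.0

# Absolute prevalence: number of samples in which the feature is detected.
prevalence_count <- prevalence * ncol(counts)
prevalence_count
```

Here taxon1 is detected in 3 of 5 samples, so its prevalence is 0.6 and its absolute prevalence is 3.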

The core abundance index from getPrevalentAbundance gives the relative proportion of the core species in each sample (between 0 and 1). The core taxa are defined as those that exceed the given population prevalence threshold at the given detection level, as set for getPrevalent.
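As a rough sketch of the core abundance index, assuming it is the per-sample share of total abundance that belongs to the core taxa, the computation can be written in base R as follows. The toy data and the names `relabund`, `core` and `core_abundance` are hypothetical and do not come from the package.

```r
# Toy count matrix: 4 features (rows) x 5 samples (columns); values are made up.
counts <- matrix(
    c(0, 3, 0, 12, 7,
      5, 0, 0,  0, 1,
      9, 8, 4, 15, 2,
      0, 0, 0,  0, 0),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:5))
)

# Convert counts to relative abundances so that each sample sums to 1.
relabund <- sweep(counts, 2, colSums(counts), "/")

# Core taxa: detected (abundance > 0) in strictly more than 50% of samples.
core <- rowMeans(relabund > 0) > 0.5
names(core)[core]
# "taxon1" "taxon3"

# Core abundance index: joint relative abundance of the core taxa per sample.
core_abundance <- colSums(relabund[core, , drop = FALSE])
core_abundance
```

In this toy example the index equals 1 for samples where only core taxa are present and drops below 1 where a non-core taxon (taxon2) contributes counts.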

subsetByPrevalent and subsetByRare return a subset of x. The subset includes the prevalent or rare taxa as determined by getPrevalent or getRare, respectively.

getPrevalent returns the taxa whose prevalence exceeds the prevalence threshold at the given detection threshold, for the selected taxonomic rank.

getRare returns the complement of getPrevalent.
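The relationship between the two functions, and the effect of `include.lowest` on the prevalence cutoff, can be illustrated with a small base R sketch. The helper `prevalent_names()` and the prevalence values are hypothetical and only mimic the documented behaviour for the prevalence threshold; they are not part of the package.

```r
# Made-up prevalence values for four taxa.
prevalence <- c(taxon1 = 0.6, taxon2 = 0.4, taxon3 = 1.0, taxon4 = 0.5)

# Hypothetical helper: names of taxa above the prevalence threshold.
# With include.lowest = TRUE the threshold itself counts as prevalent.
prevalent_names <- function(prev, threshold, include.lowest = FALSE) {
    keep <- if (include.lowest) prev >= threshold else prev > threshold
    names(prev)[keep]
}

prevalent <- prevalent_names(prevalence, threshold = 0.5)
prevalent
# "taxon1" "taxon3"

# The rare taxa are the complement of the prevalent ones.
rare <- setdiff(names(prevalence), prevalent)
rare
# "taxon2" "taxon4"

# Including the lower boundary moves taxon4 (exactly 0.5) to the prevalent set.
prevalent_incl <- prevalent_names(prevalence, threshold = 0.5,
                                  include.lowest = TRUE)
prevalent_incl
# "taxon1" "taxon3" "taxon4"
```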

Value

subsetByPrevalent and subsetByRare return a subset of x.

All other functions return named vectors:

getPrevalence returns a numeric vector with the names being set to either the row names of x or the names after agglomeration. addPrevalence adds these results to rowData(x).
getPrevalentAbundance returns a numeric vector whose names correspond to the column names of x and whose values give the joint abundance of prevalent taxa.
getPrevalent and getRare return a character vector with only the names exceeding the threshold set by prevalence, if the row names of x are set. Otherwise an integer vector is returned matching the rows in x.

References

A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16–20, 2012.

To cite the R package, see citation('mia').

Examples

data(GlobalPatterns)
tse <- GlobalPatterns
# Get prevalence estimates for individual ASV/OTU
prevalence.frequency <- getPrevalence(tse,
                                      detection = 0,
                                      sort = TRUE)
head(prevalence.frequency)

# Get prevalence estimates for phyla
# - the getPrevalence function itself always returns population frequencies
prevalence.frequency <- getPrevalence(tse,
                                      rank = "Phylum",
                                      detection = 0,
                                      sort = TRUE)
head(prevalence.frequency)

# - to obtain population counts, multiply frequencies with the sample size,
# which answers the question "In how many samples is this phylum detectable"
prevalence.count <- prevalence.frequency * ncol(tse)
head(prevalence.count)

# Detection threshold 10 (strictly greater by default);
# Note that the data (GlobalPatterns) is here in absolute counts
# (and not compositional, relative abundances)
# Prevalence threshold 50 percent (strictly greater by default)
prevalent <- getPrevalent(
    tse,
    rank = "Phylum",
    detection = 10,
    prevalence = 50/100)
head(prevalent)

# Add relative abundance data
tse <- transformAssay(tse, assay.type = "counts", method = "relabundance")

# Gets a subset of object that includes prevalent taxa
altExp(tse, "prevalent") <- subsetByPrevalent(tse,
                                             rank = "Family",
                                             assay.type = "relabundance",
                                             detection = 0.001,
                                             prevalence = 0.55)
altExp(tse, "prevalent")

# getRare returns the inverse
rare <- getRare(tse,
    rank = "Phylum",
    assay.type = "relabundance",
    detection = 1/100,
    prevalence = 50/100)
head(rare)

# Gets a subset of object that includes rare taxa
altExp(tse, "rare") <- subsetByRare(
    tse,
    rank = "Class",
    assay.type = "relabundance",
    detection = 0.001,
    prevalence = 0.001)
altExp(tse, "rare")

# Names of both experiments, prevalent and rare, can be found from slot
# altExpNames
tse

data(esophagus)
getPrevalentAbundance(esophagus, assay.type = "counts")

FelixErnst/mia documentation built on July 16, 2025, 8:08 p.m.