top_taxa: Get the most abundant taxa from a phyloseq object
In gmteunisse/Fantaxtic: Fantaxtic - nested bar plots for phyloseq data

Description

This function identifies the top n taxa in a phyloseq object. Users specify the summary statistic used to rank the taxa, e.g. sum, mean or median. One or more grouping factors from the sample_data can also be supplied to obtain group-specific top n taxa.
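Group-specific ranking can be pictured with a small base R sketch. It does not call fantaxtic or phyloseq; the matrix `props`, the vector `sample_type` and the result `top_by_group` are hypothetical names used only for illustration, and the package's internals may differ.

```r
# Hypothetical relative abundances: taxa in rows, samples in columns.
props <- matrix(
  c(0.6, 0.5, 0.1, 0.2,
    0.3, 0.4, 0.2, 0.1,
    0.1, 0.1, 0.7, 0.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"),
                  c("s1", "s2", "s3", "s4"))
)

# Hypothetical grouping factor with one value per sample.
sample_type <- c(s1 = "soil", s2 = "soil", s3 = "ocean", s4 = "ocean")

# For each group, summarise every taxon over that group's samples
# (here with the mean) and keep the single highest-ranking taxon.
top_by_group <- lapply(split(colnames(props), sample_type), function(samples) {
  stat <- apply(props[, samples, drop = FALSE], 1, mean)
  head(sort(stat, decreasing = TRUE), 1)
})

top_by_group
```

In this toy data the top taxon differs between the two groups, which is the situation the grouping argument is meant to expose.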

Usage

top_taxa(
  ps_obj,
  tax_level = NULL,
  n_taxa = 1,
  grouping = NULL,
  by_proportion = TRUE,
  include_na_taxa = F,
  merged_label = "Other",
  FUN = mean,
  ...
)

Arguments

`ps_obj`	A phyloseq object with an `otu_table` and a `tax_table`.
`tax_level`	Optional taxonomic level at which to get the top taxa.
`n_taxa`	The number of top taxa to identify.
`grouping`	A character vector with the names of one or more grouping factors found in the `sample_data`. To group by sample, specify `sample_id`.
`by_proportion`	Converts absolute abundances to proportions before calculating the summary statistic (default = `TRUE`).
`include_na_taxa`	Whether to include taxa annotated as NA at `tax_level` when `tax_level` is specified (default = `FALSE`). See Details.
`merged_label`	The label to assign to merged taxa (default = `"Other"`).
`FUN`	Function that returns a single summary statistic from an input vector, e.g. `sum`, `mean` (default) or `median`.
`...`	Additional arguments to be passed to `FUN`.

Details

When tax_level = NULL, the analysis is done at the ASV level. If a tax_level is specified, the object is first glommed at that level using tax_glom(ps_obj, taxrank = tax_level, NArm = FALSE). This can produce taxa with NA annotations at the specified tax_level. By default, these taxa are excluded from the analysis, but they can be included by setting include_na_taxa = TRUE.
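The effect of include_na_taxa can be sketched in base R on a table that stands in for the glommed data. This is an illustration under stated assumptions, not the package's code: the data frame `glommed` and the helper `top_glommed()` are hypothetical, and the abundances are made up.

```r
# Hypothetical genus-level table after glomming; the NA row stands for
# taxa that lack a genus annotation.
glommed <- data.frame(
  taxid = c("g1", "g2", "g3", "g4"),
  Genus = c("Bacteroides", NA, "Prevotella", "Prochlorococcus"),
  abundance = c(0.30, 0.45, 0.15, 0.10),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of fantaxtic): optionally drop NA taxa,
# then return the n_taxa rows with the highest abundance.
top_glommed <- function(df, n_taxa, include_na_taxa = FALSE) {
  if (!include_na_taxa) {
    df <- df[!is.na(df$Genus), , drop = FALSE]
  }
  df <- df[order(df$abundance, decreasing = TRUE), , drop = FALSE]
  head(df, n_taxa)
}

without_na <- top_glommed(glommed, n_taxa = 2)
with_na <- top_glommed(glommed, n_taxa = 2, include_na_taxa = TRUE)
```

With the NA row excluded, a lower-ranked annotated genus moves into the top 2; with it included, the unannotated taxon takes the first position.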

This function, together with collapse_taxa, replaces get_top_taxa. To obtain output identical to that of get_top_taxa, set FUN = sum.

The top taxa can be identified based on the absolute abundances or proportions. When using absolute abundances, please make sure to normalize or rarefy the data before using this function. If by_proportion = TRUE, abundances will be converted to relative abundance before applying FUN.
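The difference between ranking on absolute abundances and on proportions can be seen in a toy example. The base R sketch below assumes a count matrix with taxa in rows and samples in columns; `counts` and `rank_taxa()` are hypothetical names and do not reproduce the package's implementation.

```r
# Hypothetical counts: the "deep" sample has 15 times more reads
# than the "shallow" sample.
counts <- matrix(
  c(900, 10,
    100, 40,
    500, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"),
                  c("deep", "shallow"))
)

# Hypothetical helper (not part of fantaxtic): rank taxa by a summary
# statistic, optionally after converting counts to per-sample proportions.
rank_taxa <- function(mat, n_taxa = 1, by_proportion = TRUE, FUN = mean, ...) {
  if (by_proportion) {
    # Divide each column by its total so that every sample sums to 1.
    mat <- sweep(mat, 2, colSums(mat), "/")
  }
  stat <- apply(mat, 1, FUN, ...)
  head(sort(stat, decreasing = TRUE), n_taxa)
}

top_abs <- rank_taxa(counts, n_taxa = 2, by_proportion = FALSE)
top_prop <- rank_taxa(counts, n_taxa = 2, by_proportion = TRUE)
```

On raw counts the deeply sequenced sample dominates the mean, so the two rankings disagree. This is why unnormalized absolute abundances can give misleading top taxa when sequencing depth varies between samples.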

Value

A tibble with the rank, taxon id, grouping factors, abundance summary statistic and taxonomy.

Examples

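# Load the packages and the example dataset shipped with phyloseq
library(phyloseq)
library(fantaxtic)
data(GlobalPatterns)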

# Top 10 most abundant ASVs over all samples
top_taxa(GlobalPatterns, n_taxa = 10)

# Top 10 most abundant ASVs over all samples by median abundance
top_taxa(GlobalPatterns, n_taxa = 10, FUN = median, na.rm = TRUE)

# Top 10 most abundant ASVs over all samples using absolute abundances
top_taxa(GlobalPatterns, n_taxa = 10, by_proportion = FALSE)

# Top 2 most abundant ASVs per sample
top_taxa(GlobalPatterns, n_taxa = 2, grouping = "sample_id")

# Top 2 most abundant ASVs per sample type
top_taxa(GlobalPatterns, n_taxa = 2, grouping = "SampleType")

# Top 2 most abundant ASVs per sample type and group
set.seed(1)
# Add a random two-level factor to the sample data to serve as a second grouping factor
sample_data(GlobalPatterns)$group <- as.factor(rbinom(nsamples(GlobalPatterns), 1, 0.5))
top_taxa(GlobalPatterns, n_taxa = 2, grouping = c("SampleType", "group"))

# Top 20 most abundant genera
top_taxa(GlobalPatterns, n_taxa = 20, tax_level = "Genus")

# Top 20 most abundant genera including NAs
top_taxa(GlobalPatterns, n_taxa = 20, tax_level = "Genus", include_na_taxa = TRUE)

gmteunisse/Fantaxtic documentation built on July 13, 2024, 7:12 a.m.