top_taxa: Get the most abundant taxa from a phyloseq object

Description

This function identifies the top n taxa in a phyloseq object. Users specify the summary statistic used to rank the taxa, e.g. sum, mean or median. In addition, one or more grouping factors from the sample_data can be supplied to obtain group-specific top n taxa.

Usage

top_taxa(
  ps_obj,
  tax_level = NULL,
  n_taxa = 1,
  grouping = NULL,
  by_proportion = TRUE,
  include_na_taxa = F,
  merged_label = "Other",
  FUN = mean,
  ...
)

Arguments

ps_obj

A phyloseq object with an otu_table and a tax_table.

tax_level

Optional taxonomic level at which to get the top taxa, e.g. "Genus". When NULL (default), the analysis is done at the ASV level.

n_taxa

The number of top taxa to identify.

grouping

A character vector with the names of one or more grouping factors found in the sample_data. To group by sample, specify sample_id.

by_proportion

Converts absolute abundances to proportions before calculating the summary statistic (default = TRUE).

include_na_taxa

When tax_level is specified, whether to include taxa with an NA annotation at that level (default = FALSE). See Details.

merged_label

The label to assign to merged taxa (default = "Other").

FUN

Function that returns a single summary statistic from an input vector, e.g. sum, mean (default) or median.

...

Additional arguments to be passed to FUN.
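
As a rough illustration of what the grouping argument means, the base-R sketch below picks the top taxon separately within each level of a grouping factor. It is not the package's implementation: the toy counts matrix, the sample_type vector and the top_per_group object are hypothetical names made up for this example, and the statistic is fixed to the mean of proportions.

```r
# Toy count matrix: 3 taxa (rows) x 4 samples (columns); all values are made up.
counts <- matrix(
  c(8, 6, 1, 2,
    1, 3, 7, 5,
    1, 1, 2, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("taxon", c("A", "B", "C")), paste0("S", 1:4))
)

# Hypothetical grouping factor, one value per sample (stands in for sample_data).
sample_type <- c("soil", "soil", "water", "water")

# Convert counts to per-sample proportions (each column sums to 1).
props <- sweep(counts, 2, colSums(counts), "/")

# For each group, average the proportions over its samples and keep the top taxon.
top_per_group <- lapply(split(colnames(props), sample_type), function(samples) {
  stat <- rowMeans(props[, samples, drop = FALSE])
  head(sort(stat, decreasing = TRUE), 1)
})

top_per_group
```

With these toy values the top taxon differs between the two groups, which is the situation grouping is meant to expose.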

Details

When tax_level = NULL, the analysis is done at the ASV level. If a tax_level is specified, the object is first agglomerated at that level using tax_glom(ps_obj, taxrank = tax_level, NArm = FALSE). This can produce taxa with NA annotations at the specified tax_level. By default, these taxa are excluded from the analysis, but they can be included by setting include_na_taxa = TRUE.
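
The effect of include_na_taxa can be pictured with a small base-R sketch. This is only an analogy, not how the package works internally (the source states that agglomeration is done with tax_glom on the phyloseq object). The toy counts matrix, the genus vector and the helper glom_counts are hypothetical names invented for illustration.

```r
# Toy count matrix: 4 ASVs (rows) x 3 samples (columns); all values are made up.
counts <- matrix(
  c(10, 0, 5,
     2, 8, 1,
     4, 4, 4,
     7, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV", 1:4), paste0("S", 1:3))
)

# Hypothetical genus annotation per ASV; ASV3 has no genus-level annotation.
genus <- c("Bacteroides", "Prevotella", NA, "Bacteroides")

# Hypothetical helper: sum ASV counts per genus. Unannotated ASVs are pooled
# under the label "NA" and dropped unless include_na = TRUE.
glom_counts <- function(counts, taxon, include_na = FALSE) {
  key <- ifelse(is.na(taxon), "NA", taxon)
  glommed <- rowsum(counts, group = key)
  if (!include_na) {
    glommed <- glommed[rownames(glommed) != "NA", , drop = FALSE]
  }
  glommed
}

glom_counts(counts, genus)                     # unannotated ASVs excluded
glom_counts(counts, genus, include_na = TRUE)  # unannotated ASVs kept as a row
```

In the sketch, unannotated ASVs can only enter a ranking when include_na = TRUE, mirroring the role the flag plays after agglomeration.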

This function, together with collapse_taxa, replaces get_top_taxa. Output identical to that of get_top_taxa can be obtained by setting FUN = sum.

The top taxa can be identified based on either absolute abundances or proportions. If by_proportion = TRUE, abundances are converted to relative abundances before FUN is applied. When using absolute abundances (by_proportion = FALSE), make sure to normalize or rarefy the data before calling this function.
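
Why the choice between proportions and absolute abundances matters can be seen in a minimal base-R sketch. It assumes a plain taxa-by-samples count matrix instead of a phyloseq object; rank_taxa is a hypothetical helper written for this illustration, not a function from the package, and the counts are made up so that one sample is sequenced ten times deeper than the others.

```r
# Toy count matrix: 4 taxa (rows) x 3 samples (columns); all values are made up.
# Sample S1 has a total of 1000 reads, S2 and S3 have 100 each.
counts <- matrix(
  c(900, 10,  5,
    100, 10, 20,
      0, 60, 25,
      0, 20, 50),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("S", 1:3))
)

# Hypothetical helper: rank taxa by a summary statistic of their abundances.
rank_taxa <- function(counts, n_taxa = 1, by_proportion = TRUE, FUN = mean, ...) {
  if (by_proportion) {
    # Divide each column (sample) by its total so every sample sums to 1.
    counts <- sweep(counts, 2, colSums(counts), "/")
  }
  # One summary value per taxon (row), then keep the n largest.
  stat <- apply(counts, 1, FUN, ...)
  head(sort(stat, decreasing = TRUE), n_taxa)
}

rank_taxa(counts, n_taxa = 2)                         # ranked by mean proportion
rank_taxa(counts, n_taxa = 2, by_proportion = FALSE)  # ranked by mean raw count
```

On proportions the top two are taxon1 and taxon3, while on raw counts the deeply sequenced sample pushes taxon2 into second place. This is the reason unnormalized absolute abundances can give misleading rankings.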

Value

A tibble with the rank, taxon id, grouping factors, abundance summary statistic and taxonomy.

Examples

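# Load phyloseq (provides GlobalPatterns, sample_data, nsamples) and this package
# (assumed here to be installed under the name fantaxtic)
library(phyloseq)
library(fantaxtic)
data(GlobalPatterns)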

# Top 10 most abundant ASVs over all samples
top_taxa(GlobalPatterns, n_taxa = 10)

# Top 10 most abundant ASVs over all samples by median abundance
top_taxa(GlobalPatterns, n_taxa = 10, FUN = median, na.rm = TRUE)

# Top 10 most abundant ASVs over all samples using absolute abundances
top_taxa(GlobalPatterns, n_taxa = 10, by_proportion = FALSE)

# Top 2 most abundant ASVs per sample
top_taxa(GlobalPatterns, n_taxa = 2, grouping = "sample_id")

# Top 2 most abundant ASVs per sample type
top_taxa(GlobalPatterns, n_taxa = 2, grouping = "SampleType")

# Top 2 most abundant ASVs per sample type and group
set.seed(1)
sample_data(GlobalPatterns)$group <- as.factor(rbinom(nsamples(GlobalPatterns), 1, .5))
top_taxa(GlobalPatterns, n_taxa = 2, grouping = c("SampleType", "group"))

# Top 20 most abundant genera
top_taxa(GlobalPatterns, n_taxa = 20, tax_level = "Genus")

# Top 20 most abundant genera including NAs
top_taxa(GlobalPatterns, n_taxa = 20, tax_level = "Genus", include_na_taxa = TRUE)


gmteunisse/Fantaxtic documentation built on June 7, 2024, 8:47 a.m.