nested_top_taxa: Get the most abundant taxa over two taxonomic levels

nested_top_taxa    R Documentation

Description

This function identifies the top n named taxa at one taxonomic level and, within each of those, the top m named taxa at a nested level in a phyloseq object. Users specify the summary statistic that is used to rank the taxa, e.g. sum, mean or median. It is also possible to add one or more grouping factors (sample variables such as SampleType in the examples) to get group-specific top (n, m) taxa.

Usage

nested_top_taxa(
  ps_obj,
  top_tax_level,
  nested_tax_level,
  n_top_taxa = 1,
  n_nested_taxa = 1,
  top_merged_label = "Other",
  nested_merged_label = "Other <tax>",
  by_proportion = TRUE,
  ...
)

Arguments

ps_obj

A phyloseq object with an otu_table and a tax_table.

top_tax_level

The name of the top taxonomic rank in the phyloseq object.

nested_tax_level

The name of the nested taxonomic rank in the phyloseq object. For ASVs, specify "ASV".

n_top_taxa

The number of top taxa to identify at the top level.

n_nested_taxa

The number of top taxa to identify at the nested level.

top_merged_label

Label to assign to the merged top_tax_level taxa

nested_merged_label

Label to assign to the merged nested_tax_level taxa

by_proportion

If TRUE, converts absolute abundances to proportions before calculating the summary statistic (default = TRUE).

...

Additional arguments to be passed to top_taxa (e.g. grouping = <string>, FUN = mean, na.rm = TRUE).

Details

This function first finds the top n named taxa at the top level, after which it merges all other taxa at top_tax_level into a single taxon annotated with top_merged_label. Next, it loops through each remaining top-level taxon and identifies the top m named taxa at the nested level. If m or fewer taxa are available, it returns only those taxa. If more are available, it merges all non-top taxa into a single taxon annotated with nested_merged_label, together with its top-level annotation. If no named taxa are available at the nested level, all taxa are merged into a single taxon annotated with nested_merged_label, together with its top-level annotation. Thus, the merged taxon, overall and in each group, represents the combination of taxa without an annotation and annotated taxa that were not among the top (n, m) most abundant taxa.
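The selection and merging steps described above can be illustrated on a toy table in base R. This is only a sketch of the idea, not the package's implementation: the data frame, its column names (order, family, abund) and the choice to rank top-level taxa by summed abundance are all assumptions made for illustration.

```r
# Toy table: one row per taxon with an already-summarised abundance.
# All names and values are hypothetical.
taxa <- data.frame(
  order  = c("A", "A", "A", "B", "B", "C", "C"),
  family = c("a1", "a2", "a3", "b1", NA, "c1", "c2"),
  abund  = c(0.30, 0.15, 0.05, 0.20, 0.10, 0.12, 0.08),
  stringsAsFactors = FALSE
)

n <- 2  # number of top-level taxa to keep
m <- 2  # number of nested taxa to keep within each top-level taxon

# Step 1: rank top-level taxa (assumed here: by summed abundance), keep the
# top n and label every other top-level taxon "Other".
order_totals <- sort(tapply(taxa$abund, taxa$order, sum), decreasing = TRUE)
top_orders <- names(order_totals)[seq_len(min(n, length(order_totals)))]
taxa$top_label <- ifelse(taxa$order %in% top_orders, taxa$order, "Other")

# Step 2: within each retained top-level taxon, keep the top m named nested
# taxa; unnamed and lower-ranked taxa get the label "Other <top-level taxon>".
taxa$nested_label <- NA_character_
for (o in top_orders) {
  idx   <- which(taxa$order == o)
  named <- idx[!is.na(taxa$family[idx])]
  ranked <- named[order(taxa$abund[named], decreasing = TRUE)]
  keep  <- ranked[seq_len(min(m, length(ranked)))]
  taxa$nested_label[idx]  <- paste("Other", o)
  taxa$nested_label[keep] <- taxa$family[keep]
}
# Taxa outside the top n are collapsed at both levels.
taxa$nested_label[taxa$top_label == "Other"] <- "Other"

# Step 3: collapse rows that share the same pair of labels.
collapsed <- aggregate(abund ~ top_label + nested_label, data = taxa, FUN = sum)
print(collapsed)
```

In this toy table, order B has only one named family, so that family is kept and the unnamed row is merged into "Other B", mirroring the case where fewer than m named taxa are available.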

If nested_tax_level = "ASV", row.names(tax_table(ps_obj)) will be added as an ASV column to the tax_table, unless this column already exists.

The top taxa can be identified based on the absolute abundances or proportions. When using absolute abundances, please make sure to normalize or rarefy the data before using this function. If by_proportion = TRUE, abundances will be converted to relative abundance before applying FUN.
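To see why this matters, the following base R sketch compares rankings on a made-up count matrix in which the two samples differ tenfold in sequencing depth. The matrix, the per-sample conversion to proportions and the use of mean as the summary statistic are assumptions for illustration; they are not taken from the package source.

```r
# Hypothetical count matrix: taxa in rows, samples in columns.
# Sample s2 is sequenced ten times deeper than sample s1.
counts <- matrix(
  c(80, 100,
    10, 500,
    10, 400),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("s1", "s2"))
)

# Convert each sample (column) to relative abundances that sum to 1.
props <- sweep(counts, 2, colSums(counts), "/")

# Apply the summary statistic per taxon, across samples.
mean_abs  <- apply(counts, 1, mean)
mean_prop <- apply(props, 1, mean)

# The top-ranked taxon differs between the two approaches.
top_abs  <- names(which.max(mean_abs))
top_prop <- names(which.max(mean_prop))
print(c(absolute = top_abs, proportion = top_prop))
```

With raw counts the deeply sequenced sample dominates the mean, whereas with proportions each sample contributes equally, which is why unnormalized absolute abundances can give misleading top taxa.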

Value

A list with two elements: top_taxa, a tibble with the rank, taxon id, grouping factors, abundance summary statistic and taxonomy of the top taxa; and ps_obj, the phyloseq object after collapsing all non-top taxa.

Examples

library(phyloseq)
library(fantaxtic)
data(GlobalPatterns)

# Top 3 most abundant orders, top 3 most abundant families over all samples,
# using the mean as the aggregation function
nested_top_taxa(GlobalPatterns,
                top_tax_level = "Order", nested_tax_level = "Family",
                n_top_taxa = 3, n_nested_taxa = 3,
                FUN = mean, na.rm = TRUE)

# Top 1 most abundant genus, top 2 most abundant species per SampleType,
# using the median as the aggregation function
nested_top_taxa(GlobalPatterns,
                top_tax_level = "Genus", nested_tax_level = "Species",
                n_top_taxa = 1, n_nested_taxa = 2, grouping = "SampleType",
                FUN = median)

gmteunisse/Fantaxtic documentation built on June 7, 2024, 8:47 a.m.