nested_top_taxa: Get the most abundant taxa over two taxonomic levels
In gmteunisse/Fantaxtic: Fantaxtic - nested bar plots for phyloseq data

nested_top_taxa

R Documentation

Description

This function identifies the top n named taxa at one taxonomic level and, within each of those, the top m named taxa at a nested level in a phyloseq object. Users specify the summary statistic that is used to rank the taxa, e.g. sum, mean or median. It is also possible to supply one or more grouping factors (sample variables such as SampleType in the examples) to get group-specific top n,m taxa.

Usage

nested_top_taxa(
  ps_obj,
  top_tax_level,
  nested_tax_level,
  n_top_taxa = 1,
  n_nested_taxa = 1,
  top_merged_label = "Other",
  nested_merged_label = "Other <tax>",
  by_proportion = T,
  ...
)

Arguments

`ps_obj`	A phyloseq object with an `otu_table` and a `tax_table`.
`top_tax_level`	The name of the top taxonomic rank in the phyloseq object.
`nested_tax_level`	The name of the nested taxonomic rank in the phyloseq object. For ASVs, specify "ASV".
`n_top_taxa`	The number of top taxa to identify at the top level.
`n_nested_taxa`	The number of top taxa to identify at the nested level.
`top_merged_label`	Label to assign to the merged `top_tax_level` taxa (default = `"Other"`).
`nested_merged_label`	Label to assign to the merged `nested_tax_level` taxa (default = `"Other <tax>"`).
`by_proportion`	Converts absolute abundances to proportions before calculating the summary statistic (default = `TRUE`).
`...`	Additional arguments to be passed to `top_taxa` (e.g. `grouping = <string>, FUN = mean, na.rm = TRUE`).

Details

This function first finds the top n named taxa at the top level, after which it merges all other taxa at the top level into a single taxon annotated with top_merged_label. Next, it loops through each remaining top-level taxon and identifies the top m named taxa at the nested level. If m or fewer named taxa are available, it returns only those taxa. If more are available, it merges all non-top taxa into a single taxon annotated with nested_merged_label, together with its top-level annotation. If no named taxa are available at the nested level, all taxa are merged into a single taxon annotated with nested_merged_label, together with its top-level annotation.

Thus, the merged taxon, overall and in each group, represents the combination of taxa without an annotation and taxa with an annotation that were not among the top (n, m) most abundant taxa.
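
The two-step selection described above can be illustrated on a toy table with base R. This is only a sketch of the idea, not the package's internal code: the data frame, the column names (`order`, `family`, `abund`) and the label used for nested taxa inside the merged top-level group are assumptions made for illustration.

```r
# Toy summary table (hypothetical values): one row per taxon, with an
# abundance statistic already computed. NA marks a taxon without a name.
taxa <- data.frame(
  order  = c("A",  "A",  "A",  "A", "B",  "B", "C",  NA),
  family = c("a1", "a2", "a3", NA,  "b1", NA,  "c1", "x1"),
  abund  = c(30,   20,   5,    4,   15,   10,  9,    7),
  stringsAsFactors = FALSE
)
n_top    <- 2  # plays the role of n_top_taxa
n_nested <- 2  # plays the role of n_nested_taxa

# Step 1: rank the named top-level taxa (tapply drops NA groups), keep the top n.
top_sums <- sort(tapply(taxa$abund, taxa$order, sum), decreasing = TRUE)
keep_top <- names(top_sums)[seq_len(min(n_top, length(top_sums)))]

# All other top-level taxa, named or not, are merged under one label.
taxa$order_lab <- ifelse(taxa$order %in% keep_top, taxa$order, "Other")

# Start with every nested taxon merged; the label inside the merged
# top-level group ("Other") is an assumption of this sketch.
taxa$family_lab <- ifelse(taxa$order_lab == "Other", "Other",
                          paste("Other", taxa$order_lab))

# Step 2: within each kept top-level taxon, restore the top m named nested taxa.
for (tl in keep_top) {
  named    <- taxa$order_lab == tl & !is.na(taxa$family)
  fam_sums <- sort(tapply(taxa$abund[named], taxa$family[named], sum),
                   decreasing = TRUE)
  keep_nested <- names(fam_sums)[seq_len(min(n_nested, length(fam_sums)))]
  hit <- named & taxa$family %in% keep_nested
  taxa$family_lab[hit] <- taxa$family[hit]
}

# Collapse the merged taxa by summing their abundances.
collapsed <- aggregate(abund ~ order_lab + family_lab, data = taxa, FUN = sum)
print(collapsed)
```

In this toy case the merged taxon within order A combines a named family that missed the top 2 (`a3`) with an unnamed one, which mirrors the mix of annotated and unannotated taxa described above.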

If nested_tax_level = "ASV", row.names(tax_table(ps_obj)) will be added as an ASV column to the tax_table, unless this column already exists.

The top taxa can be identified based on the absolute abundances or proportions. When using absolute abundances, please make sure to normalize or rarefy the data before using this function. If by_proportion = TRUE, abundances will be converted to relative abundance before applying FUN.
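
To see why the choice between absolute abundances and proportions matters, the base R sketch below ranks three taxa by their mean across two samples with very different sequencing depths. The count matrix and sample names are hypothetical, and the code only illustrates the idea behind by_proportion rather than reproducing the package's implementation.

```r
# Hypothetical counts: rows are taxa, columns are samples of unequal depth.
counts <- matrix(
  c(900, 10,
    100, 40,
    500, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"),
                  c("deep_sample", "shallow_sample"))
)

# Convert each sample (column) to relative abundances that sum to 1.
props <- sweep(counts, 2, colSums(counts), "/")

# Rank the taxa by mean abundance, with and without the conversion.
abs_rank  <- names(sort(rowMeans(counts), decreasing = TRUE))
prop_rank <- names(sort(rowMeans(props),  decreasing = TRUE))

print(abs_rank)   # ranking dominated by the deeply sequenced sample
print(prop_rank)  # ranking in which each sample carries equal weight
```

With raw counts the deep sample dominates the mean, so `taxon1` ranks first. After conversion to proportions both samples contribute equally and `taxon3` ranks first.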

Value

A list with two elements: top_taxa, a tibble with the rank, taxon id, grouping factors, abundance summary statistic and taxonomy of the top taxa; and ps_obj, the phyloseq object after collapsing all non-top taxa.
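
A minimal sketch of working with the returned list, assuming the phyloseq and fantaxtic packages are installed (the code is skipped otherwise). The variable name `res` is arbitrary, and the exact columns of the tibble are not assumed here.

```r
# Runs only if both packages are available; otherwise nothing happens.
if (requireNamespace("phyloseq", quietly = TRUE) &&
    requireNamespace("fantaxtic", quietly = TRUE)) {
  data("GlobalPatterns", package = "phyloseq")

  res <- fantaxtic::nested_top_taxa(
    GlobalPatterns,
    top_tax_level = "Order", nested_tax_level = "Family",
    n_top_taxa = 3, n_nested_taxa = 3,
    FUN = mean, na.rm = TRUE
  )

  # The two documented elements of the result.
  print(res$top_taxa)  # tibble describing the selected top taxa
  print(res$ps_obj)    # phyloseq object with non-top taxa collapsed
}
```

The collapsed `res$ps_obj` is the element to pass on to downstream steps, while `res$top_taxa` is useful for checking which taxa were selected.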

Examples

library(phyloseq)
library(fantaxtic)
data(GlobalPatterns)

# Top 3 most abundant orders, top 3 most abundant families over all samples,
# using the mean as the aggregation function
nested_top_taxa(GlobalPatterns,
                top_tax_level = "Order", nested_tax_level = "Family",
                n_top_taxa = 3, n_nested_taxa = 3,
                FUN = mean, na.rm = TRUE)

# Top 1 most abundant genus, top 2 most abundant species per SampleType,
# using the median as the aggregation function
nested_top_taxa(GlobalPatterns,
                top_tax_level = "Genus", nested_tax_level = "Species",
                n_top_taxa = 1, n_nested_taxa = 2,
                grouping = "SampleType", FUN = median)

gmteunisse/Fantaxtic documentation built on July 13, 2024, 7:12 a.m.