tax_agg: Aggregate taxa and track aggregation in psExtra
In david-barnett/microViz: Microbiome Data Analysis and Visualization

tax_agg

R Documentation

Aggregate taxa and track aggregation in psExtra

Description

tax_agg sums the abundances of the phyloseq taxa at the given rank. It records the tax_agg rank argument in the info of the psExtra object output. This psExtra object tracks aggregation, and any further transformations and scaling, to help you keep track of what you have done with your phyloseq object and automatically caption ordination plots.

Instead of tax_agg, consider using tax_transform() with a rank argument instead, to both aggregate and transform the taxa. This is also useful when you want to aggregate but not transform the taxa, and yet still log the "identity" transformation in psExtra for captioning your ordination plots. e.g. tax_transform(rank = "Genus", trans = "identity")

tax_agg allows you to pass NA or "unique" to the rank argument which will NOT aggregate the taxa. If you use rank = "unique" or add_unique = TRUE, it will add a new rank called unique, identical to the taxa_names (after any aggregation)

Be aware: you should not use the top_N argument yourself without good reason. top_N provides a feature inspired by the deprecated microbiome function aggregate_top_taxa which is primarily useful for decluttering compositional barplots. microViz comp_barplot (and ord_plot_iris) already run tax_agg with a top_N argument for you, so you should not. The tax_table produced when using top_N is otherwise INVALID FOR MOST OTHER ANALYSES.

Usage

tax_agg(
  ps,
  rank = NA,
  sort_by = NA,
  top_N = NA,
  force = FALSE,
  add_unique = FALSE
)

Arguments

`ps`	phyloseq object
`rank`	NA (for tax_names level) or name of valid taxonomic rank (try phyloseq::rank_names(ps)) or "unique"
`sort_by`	if not NA, how should the taxa be sorted, uses tax_sort(), takes same options as `by` arg
`top_N`	NA does nothing, but if top_N is a number, it creates an extra tax_table column called top, which is the same as the unique column for the first top_N number of taxa, and "other" otherwise.
`force`	If TRUE, this forces aggregation at chosen rank to occur regardless of if the output will be sensible! This avoids the "Taxa not unique at rank: ..." error, but may allow very inappropriate aggregation to occur. Do not use force = TRUE unless you know why you are doing this, and what the result will be. If you are getting an error with force = FALSE, it is almost certainly better to examine the tax_table and fix the problem. force = TRUE is similar to microbiome::aggregate_taxa, which also does not check that the taxa are uniquely defined by only the aggregation level.
`add_unique`	if TRUE, adds a rank named unique, identical to the rownames after aggregation

Details

This function is inspired by microbiome::aggregate_taxa. However if microbiome::aggregate_taxa is used, microViz cannot track this aggregation.

Comparing aggregate_taxa and tax_agg:

Except for the ordering of taxa, and the addition of a "unique" rank being optional (in tax_agg), the resulting phyloseq objects are identical for aggregating a phyloseq with no ambiguous taxa. Taxa are ambiguous when the tax_table converges at a lower rank after branching, such as if two different genera share the same species (e.g. "s__"). microbiome::aggregate_taxa handles ambiguous taxa by creating a "unique" rank with all of the taxonomic rank info pasted together into one, often very long, name. tax_agg throws an error, and directs the user to tax_fix() to fix the ambiguous taxa before aggregation, which should then result in (much) shorter unique names at the aggregation rank.

Value

psExtra object including phyloseq and tax_agg rank info

Examples

library(phyloseq)
library(dplyr)
data("dietswap", package = "microbiome")

tax_agg(ps = dietswap, "Phylum") %>%
  ps_get() %>%
  tax_table()
tax_agg(ps = dietswap, "Family") %>%
  ps_get() %>%
  tax_table()

# create some missing values
tax_table(dietswap)[3:7, "Genus"] <- "g__"

# this will produce an error, instructing the user to use tax_fix
# tax_agg(ps = dietswap, "Genus")

# this will then work:
dietswap %>%
  tax_fix() %>%
  tax_agg("Genus")

# you can replace unknown values with `tax_fix()`
# which will fix most problems, like the common "g__" and "s__"
# but default tax_fix settings won't catch this long unknown
tax_table(dietswap)[13:17, "Family"] <- "some_unknown_family"
dietswap %>%
  tax_fix(unknowns = "some_unknown_family") %>%
  tax_agg("Family")

# try tax_fix_interactive() to help you find and fix all the uninformative
# and converging values in your taxonomy table.

# the code below won't aggregate taxa,
# but just adds a new rank called unique, equal to taxa_names
tax_agg(ps = dietswap, rank = NA, add_unique = TRUE)
identical(tax_agg(dietswap, NA, add_unique = TRUE), tax_agg(dietswap, "unique")) # TRUE

david-barnett/microViz documentation built on April 17, 2025, 4:25 a.m.