Write, Analyze, and Visualize 'BIOM' Data

taxa_boxplot

R Documentation

Visualize BIOM data with boxplots.

Description

Visualize BIOM data with boxplots.

Usage

taxa_boxplot(
  biom,
  x = NULL,
  rank = -1,
  layers = "x",
  taxa = 6,
  unc = "singly",
  other = FALSE,
  p.top = Inf,
  stat.by = x,
  facet.by = NULL,
  colors = TRUE,
  shapes = TRUE,
  patterns = FALSE,
  flip = FALSE,
  stripe = NULL,
  ci = "ci",
  level = 0.95,
  p.adj = "fdr",
  outliers = NULL,
  xlab.angle = "auto",
  p.label = 0.05,
  transform = "none",
  y.transform = "sqrt",
  caption = TRUE,
  ...
)

Arguments

`biom`	An rbiom object, such as from `as_rbiom()`. Any value accepted by `as_rbiom()` can also be given here.
`x`	A categorical metadata column name to use for the x-axis. Or `NULL`, which puts taxa along the x-axis. Default: `NULL`
`rank`	What rank(s) of taxa to display. E.g. `"Phylum"`, `"Genus"`, `".otu"`, etc. An integer vector can also be given, where `1` is the highest rank, `2` is the second highest, `-1` is the lowest rank, `-2` is the second lowest, and `0` is the OTU "rank". Run `biom$ranks` to see all options for a given rbiom object. Default: `-1`.
`layers`	One or more of `c("bar", "box" ("x"), "violin", "dot", "strip", "crossbar", "errorbar", "linerange", "pointrange")`. Single letter abbreviations are also accepted. For instance, `c("box", "dot")` is equivalent to `c("x", "d")` and `"xd"`. Default: `"x"`
`taxa`	Which taxa to display. An integer value will show the top n most abundant taxa. A value 0 <= n < 1 will show any taxa with that mean abundance or greater (e.g. `0.1` implies >= 10%). A character vector of taxa names will show only those named taxa. Default: `6`.
`unc`	How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are: `"singly"` - Replaces them with the OTU name. `"grouped"` - Replaces them with a higher rank's name. `"drop"` - Excludes them from the result. `"asis"` - To not check/modify any taxa names. Abbreviations are allowed. Default: `"singly"`
`other`	Sum all non-itemized taxa into an "Other" taxa. When `FALSE`, only returns taxa matched by the `taxa` argument. Specifying `TRUE` adds "Other" to the returned set. A string can also be given to imply `TRUE`, but with that value as the name to use instead of "Other". Default: `FALSE`
`p.top`	Only display taxa with the most significant differences in abundance. If `p.top` is >= 1, then the `p.top` most significant taxa are displayed. If `p.top` is less than one, all taxa with an adjusted p-value <= `p.top` are displayed. Recommended to be used in combination with the `taxa` parameter to set a lower bound on the mean abundance of considered taxa. Default: `Inf`
`stat.by`	Dataset field with the statistical groups. Must be categorical. Default: `NULL`
`facet.by`	Dataset field(s) to use for faceting. Must be categorical. Default: `NULL`
`colors`	How to color the groups. Options are: `TRUE` - Automatically select colorblind-friendly colors. `FALSE` or `NULL` - Don't use colors. a palette name - Auto-select colors from this set. E.g. `"okabe"` character vector - Custom colors to use. E.g. `c("red", "#00FF00")` named character vector - Explicit mapping. E.g. `c(Male = "blue", Female = "red")` See "Aesthetics" section below for additional information. Default: `TRUE`
`shapes`	Shapes for each group. Options are similar to `colors`'s: `TRUE`, `FALSE`, `NULL`, shape names (typically integers 0 - 17), or a named vector mapping groups to specific shape names. See "Aesthetics" section below for additional information. Default: `TRUE`
`patterns`	Patterns for each group. Options are similar to `colors`'s: `TRUE`, `FALSE`, `NULL`, pattern names (`"brick"`, `"chevron"`, `"fish"`, `"grid"`, etc), or a named vector mapping groups to specific pattern names. See "Aesthetics" section below for additional information. Default: `FALSE`
`flip`	Transpose the axes, so that taxa are present as rows instead of columns. Default: `FALSE`
`stripe`	Shade every other x position. Default: same as flip
`ci`	How to calculate min/max of the crossbar, errorbar, linerange, and pointrange layers. Options are: `"ci"` (confidence interval), `"range"`, `"sd"` (standard deviation), `"se"` (standard error), and `"mad"` (median absolute deviation). The center mark of crossbar and pointrange represents the mean, except for `"mad"` in which case it represents the median. Default: `"ci"`
`level`	The confidence level for calculating a confidence interval. Default: `0.95`
`p.adj`	Method to use for multiple comparisons adjustment of p-values. Run `p.adjust.methods` for a list of available options. Default: `"fdr"`
`outliers`	Show boxplot outliers? `TRUE` to always show. `FALSE` to always hide. `NULL` to only hide them when overlaying a dot or strip chart. Default: `NULL`
`xlab.angle`	Angle of the labels at the bottom of the plot. Options are `"auto"`, `'0'`, `'30'`, and `'90'`. Default: `"auto"`.
`p.label`	Minimum adjusted p-value to display on the plot with a bracket. `p.label = 0.05` - Show p-values that are <= 0.05. `p.label = 0` - Don't show any p-values on the plot. `p.label = 1` - Show all p-values on the plot. If a numeric vector with more than one value is provided, they will be used as breaks for asterisk notation. Default: `0.05`
`transform`	Transformation to apply. Options are: `c("none", "rank", "log", "log1p", "sqrt", "percent")`. `"rank"` is useful for correcting for non-normally distributions before applying regression statistics. Default: `"none"`
`y.transform`	The transformation to apply to the y-axis. Visualizing differences of both high- and low-abundance taxa is best done with a non-linear axis. Options are: `"sqrt"` - square-root transformation `"log1p"` - log(y + 1) transformation `"none"` - no transformation These methods allow visualization of both high- and low-abundance taxa simultaneously, without complaint about 'zero' count observations. Default: `"sqrt"` Use `xaxis.transform` or `yaxis.transform` to pass custom values directly to ggplot2's `⁠scale_*⁠` functions.
`caption`	Add methodology caption beneath the plot. Default: `TRUE`
`...`	Additional parameters to pass along to ggplot2 functions. Prefix a parameter name with a layer name to pass it to only that layer. For instance, `d.size = 2` ensures only the points on the dot layer have their size set to `2`.

Value

A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as ⁠$data⁠, ⁠$code⁠, ⁠$stats⁠, and ⁠$stats$code⁠, respectively.

Aesthetics

All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".

Patterns are added using the fillpattern R package. Options are "brick", "chevron", "fish", "grid", "herringbone", "hexagon", "octagon", "rain", "saw", "shingle", "rshingle", "stripe", and "wave", optionally abbreviated and/or suffixed with modifiers. For example, "hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x. See fillpattern::fill_pattern() for complete documentation of options.

Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.

Examples

    library(rbiom)
    
    biom <- rarefy(hmp50)
    
    taxa_boxplot(biom, stat.by = "Body Site", stripe = TRUE)
    taxa_boxplot(biom, layers = "bed", rank = c("Phylum", "Genus"), flip = TRUE)
    taxa_boxplot(
      biom    = subset(biom, `Body Site` %in% c('Saliva', 'Stool')), 
      taxa    = 3, 
      layers  = "ps", 
      stat.by = "Body Site",
      colors  = c('Saliva' = "blue", 'Stool' = "red") )

cmmr/rbiom documentation built on June 10, 2025, 9:24 p.m.