phyloseq_average: Average relative OTU abundances.

View source: R/phyloseq_average.R

phyloseq_average R Documentation

Average relative OTU abundances.

Description

This function implements OTU abundance averaging following the CoDa (Compositional Data Analysis) workflow.

Usage

phyloseq_average(
  physeq,
  avg_type = "aldex",
  acomp_zero_impute = NULL,
  aldex_samples = 128,
  aldex_denom = "all",
  group = NULL,
  drop_group_zero = FALSE,
  verbose = TRUE,
  progress = NULL,
  ...
)

Arguments

physeq

A phyloseq-class object

avg_type

Averaging type ("aldex" for ALDEx2-based averaging; "acomp" for the Aitchison CoDa approach; "arithmetic" for a simple arithmetic mean)

acomp_zero_impute

Character ("CZM", "GBM", "SQ", "BL") or NULL, indicating whether to replace zero abundance values with an estimate of the probability that the zero is not 0 (implemented only for avg_type = "acomp"; see cmultRepl)

aldex_samples

The number of Monte-Carlo Dirichlet instances to generate (see aldex.clr)

aldex_denom

Character ("all", "iqlr", "lvha"), indicating which features to use as the denominator for the geometric mean calculation (see aldex.clr)

group

Variable name (in sample_data) that defines sample groups for averaging (default is NULL)

drop_group_zero

Logical; indicating whether OTUs with zero abundance within a group of samples should be removed

verbose

Logical; if TRUE (default), informational messages will be shown on screen

progress

Name of the progress bar to use ("none" or "text"; see create_progress_bar)

...

Additional arguments may be passed to cmultRepl

Details

OTU abundance tables in metagenomic analysis usually have a different sampling effort for each sample, which is an artifact of the sequencing procedure. The total number of reads is therefore meaningless, and distances between OTU compositions are on a relative scale (e.g., OTUs with 1 and 2 reads in one sample are as far apart as OTUs with 10 and 20 reads in another sample). Such OTU tables represent closed compositions and require special treatment within the Aitchison geometry framework.
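The relative-scale argument above can be illustrated with a minimal base R sketch. The helper names (closure, clr) and the toy samples are hypothetical and are not part of phyloseq_average; the compositional centre shown here is the standard closed geometric mean of Aitchison geometry, and it is only an assumption that avg_type = "acomp" corresponds to it.

```r
# Closure: rescale a vector of counts so that it sums to 1
closure <- function(x) x / sum(x)

# Centred log-ratio (clr) transform of a strictly positive composition
clr <- function(x) log(x) - mean(log(x))

# Two hypothetical samples with the same OTU ratios but different depth
s1 <- c(otu_a = 1,  otu_b = 2,  otu_c = 7)
s2 <- c(otu_a = 10, otu_b = 20, otu_c = 70)

# Both comparisons ignore the total number of reads
same_closed <- isTRUE(all.equal(closure(s1), closure(s2)))
same_clr    <- isTRUE(all.equal(clr(s1), clr(s2)))

# Compositional centre (closed geometric mean) vs. arithmetic mean
s3 <- c(otu_a = 6, otu_b = 3, otu_c = 1)
m  <- rbind(closure(s1), closure(s3))
centre_coda  <- closure(exp(colMeans(log(m))))
centre_arith <- colMeans(m)
```

Both averages sum to 1, but they generally differ, which is why the choice of avg_type matters.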

With the ALDEx2-based approach (avg_type = "aldex") it is possible to take into account per-OTU technical variation within each sample using Monte-Carlo instances drawn from the Dirichlet distribution (see Fernandes et al., 2013). As a result, the expected average of the OTU proportions is estimated.
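The idea of Monte-Carlo Dirichlet instances can be sketched in base R as follows. This is not the code used by ALDEx2 or by phyloseq_average: the helper rdirichlet_sketch, the toy counts, and the prior of 0.5 added to every count are assumptions made for illustration only.

```r
set.seed(1)

# Draw n instances from Dirichlet(alpha) via normalised gamma variates;
# each row of the result is one Monte-Carlo instance of proportions
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha),
              nrow = n, byrow = TRUE)
  g / rowSums(g)
}

# Hypothetical read counts for a single sample, including a zero
counts <- c(otu_a = 0, otu_b = 5, otu_c = 95)
prior  <- 0.5   # assumed prior, keeps all Dirichlet parameters positive

inst <- rdirichlet_sketch(128, counts + prior)
colnames(inst) <- names(counts)

# Expected proportions, averaged over the Monte-Carlo instances
expected_prop <- colMeans(inst)
```

Here 128 matches the default of aldex_samples; more instances give a more stable estimate at the cost of run time.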

A zero OTU abundance could be due to an insufficient number of reads. In that case it is possible to replace the zero counts with an expected value. Bayesian-multiplicative (BM) replacement of count zeros is implemented in the cmultRepl function of the zCompositions package. Several methods are supported: geometric Bayesian multiplicative (acomp_zero_impute = "GBM"), count zero multiplicative (acomp_zero_impute = "CZM"), Bayes-Laplace BM (acomp_zero_impute = "BL"), or square root BM (acomp_zero_impute = "SQ"). If the OTU abundance table contains structural zeros (e.g., an OTU is absent from a whole group of samples because of some biological pattern rather than a detection limit), the drop_group_zero argument may be set to TRUE to avoid zero replacement.
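To show what "multiplicative" means in zero replacement, here is a minimal base R sketch of the simple (non-Bayesian) multiplicative replacement described by Martin-Fernandez et al. (2003). It is deliberately simpler than the Bayesian-multiplicative methods of cmultRepl; the helper mult_replace and the value of delta are hypothetical.

```r
# Simple multiplicative zero replacement for one sample of proportions:
# zeros are set to a small value delta, and non-zero parts are shrunk
# by a common factor so that the sample still sums to 1
mult_replace <- function(p, delta) {
  zero <- p == 0
  p[zero]  <- delta
  p[!zero] <- p[!zero] * (1 - delta * sum(zero))
  p
}

# Hypothetical sample of OTU proportions with one zero
p <- c(otu_a = 0, otu_b = 0.25, otu_c = 0.75)
r <- mult_replace(p, delta = 0.01)

# Ratios between the non-zero OTUs are left unchanged
ratio_before <- p[["otu_b"]] / p[["otu_c"]]
ratio_after  <- r[["otu_b"]] / r[["otu_c"]]
```

Because all non-zero parts are rescaled by the same factor, the replacement does not distort the ratios that carry the compositional information.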

Value

phyloseq object with OTU relative abundance averaged over samples (all together or within a group).
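A possible call is sketched below. It assumes that the phyloseq and metagMisc packages are installed and uses the GlobalPatterns example data and its SampleType variable from phyloseq; the restriction to the 50 most abundant OTUs is an arbitrary choice to keep the example small. The code is guarded so that it does nothing when the packages are unavailable.

```r
avg <- NULL

if (requireNamespace("phyloseq", quietly = TRUE) &&
    requireNamespace("metagMisc", quietly = TRUE)) {

  # Example data shipped with phyloseq
  data("GlobalPatterns", package = "phyloseq")

  # Keep only the 50 most abundant OTUs (arbitrary, for speed)
  top <- names(sort(phyloseq::taxa_sums(GlobalPatterns),
                    decreasing = TRUE))[1:50]
  gp <- phyloseq::prune_taxa(top, GlobalPatterns)

  # Average OTU abundances within each sample type
  avg <- metagMisc::phyloseq_average(
    gp,
    avg_type = "arithmetic",
    group    = "SampleType",
    verbose  = FALSE
  )
}
```

Replacing avg_type = "arithmetic" with "aldex" or "acomp" would switch to the compositional approaches described in Details; those additionally rely on the packages listed under See Also.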

References

Gloor GB, Macklaim JM, Pawlowsky-Glahn V and Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Front. Microbiol. 8:2224. doi: 10.3389/fmicb.2017.02224

Martin-Fernandez JA, Barcelo-Vidal C, Pawlowsky-Glahn V (2003) Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation. Mathematical Geology 35:3. doi: 10.1023/A:1023866030544

Fernandes AD, Macklaim JM, Linn TG, Reid G, Gloor GB (2013) ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq. PLOS ONE 8(7): e67019. doi: 10.1371/journal.pone.0067019

See Also

aldex.clr, acomp, cmultRepl


vmikk/metagMisc documentation built on June 20, 2024, 7:20 a.m.