pathway_daa: Differential Abundance Analysis for Predicted Functional...

View source: R/pathway_daa.R

pathway_daaR Documentation

Differential Abundance Analysis for Predicted Functional Pathways

Description

Performs differential abundance analysis on predicted functional pathway data using various statistical methods. This function supports multiple methods for analyzing differences in pathway abundance between groups, including popular approaches like ALDEx2, DESeq2, edgeR, and others.

Usage

pathway_daa(
  abundance,
  metadata,
  group,
  daa_method = "ALDEx2",
  select = NULL,
  p_adjust_method = "BH",
  reference = NULL,
  include_abundance_stats = FALSE,
  include_effect_size = TRUE,
  p.adjust = NULL,
  .pre_aligned = FALSE,
  .sample_col = NULL,
  ...
)

Arguments

abundance

A data frame or matrix containing predicted functional pathway abundance, with pathways/features as rows and samples as columns. The column names should match the sample names in metadata. Values should be counts or abundance measurements.

metadata

A data frame or tibble containing sample information. Must include a 'sample' column with sample identifiers matching the column names in abundance data.

group

Character string specifying the column name in metadata that contains group information for differential abundance analysis.

daa_method

Character string specifying the method for differential abundance analysis. Available choices are:

  • "ALDEx2": ANOVA-Like Differential Expression tool

  • "DESeq2": Differential expression analysis based on negative binomial distribution

  • "edgeR": Exact test for differences between groups using negative binomial model

  • "limma voom": Limma-voom framework for RNA-seq analysis

  • "metagenomeSeq": Zero-inflated Gaussian mixture model

  • "LinDA": Linear models for differential abundance analysis

  • "Maaslin2": Multivariate Association with Linear Models

  • "Lefser": Linear discriminant analysis effect size

Default is "ALDEx2".

select

Vector of sample names to include in the analysis. If NULL (default), all samples are included.

p_adjust_method

Character string specifying the method for p-value adjustment. Choices are:

  • "BH": Benjamini-Hochberg procedure (default)

  • "holm": Holm's step-down method

  • "bonferroni": Bonferroni correction

  • "hochberg": Hochberg's step-up method

  • "fdr": False Discovery Rate

  • "none": No adjustment

reference

Character string specifying the reference level for the group comparison. If NULL (default), the first level is used as reference.

include_abundance_stats

Logical value indicating whether to include abundance statistics (mean relative abundance and standard deviation per group) in the output. Default is FALSE. When the selected daa_method already provides a log2_fold_change column (ALDEx2 with effect size, DESeq2, edgeR, limma voom, LinDA, Maaslin2, metagenomeSeq), the method-native log2 fold change is preserved and the relative-abundance ratio is not recomputed.

include_effect_size

Logical value indicating whether to compute ALDEx2 effect size information via ALDEx2::aldex.effect(). When TRUE, adds effect_size, diff_btw, log2_fold_change, rab_all, and overlap columns, aligning ALDEx2 output with the other DAA methods that return log2 fold changes by default. Only applicable for two-group comparisons with the ALDEx2 method; ignored otherwise. Default is TRUE; set to FALSE to skip the extra aldex.effect() computation.

p.adjust

Deprecated. Use p_adjust_method instead.

.pre_aligned

Internal logical. Set to TRUE only when the caller has already aligned abundance columns and metadata rows in identical sample order.

.sample_col

Internal character. Sample identifier column used when .pre_aligned = TRUE.

...

Additional arguments passed to the specific DAA method

Value

A data frame containing the differential abundance analysis results. The structure of the results depends on the chosen DAA method. For methods that support multi-group comparisons (like LinDA), when there are more than two groups, the results will contain separate rows for each feature in each pairwise comparison between the reference group and each non-reference group. The data frame includes the following columns:

  • feature: Feature/pathway identifier

  • method: The DAA method used

  • group1: Reference group

  • group2: Comparison group

  • p_values: P-values for the comparison

  • p_adjust: Adjusted p-values

  • adj_method: Method used for p-value adjustment

Methods that fit a model on the abundance data (DESeq2, edgeR, limma voom, LinDA, Maaslin2, metagenomeSeq) return a log2_fold_change column computed in the method's own model space. ALDEx2 returns log2_fold_change (plus effect_size, diff_btw, rab_all, overlap) when include_effect_size = TRUE (the default), derived from ALDEx2::aldex.effect() in CLR space. Lefser returns an lda_score column instead, which is its native effect-size metric.

When include_abundance_stats = TRUE, the following additional columns are included:

  • mean_rel_abundance_group1: Mean relative abundance for group1

  • sd_rel_abundance_group1: Standard deviation of relative abundance for group1

  • mean_rel_abundance_group2: Mean relative abundance for group2

  • sd_rel_abundance_group2: Standard deviation of relative abundance for group2

A log2_fold_change column from relative abundance is only added when the DAA method does not already provide one, to avoid conflating model-based and ratio-based effect sizes.

References

  • ALDEx2: Fernandes et al. (2014) Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome.

  • DESeq2: Love et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology.

  • edgeR: Robinson et al. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics.

  • limma-voom: Law et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology.

  • metagenomeSeq: Paulson et al. (2013) Differential abundance analysis for microbial marker-gene surveys. Nature Methods.

  • Maaslin2: Mallick et al. (2021) Multivariable Association Discovery in Population-scale Meta-omics Studies.

Examples


# Load example data
data(ko_abundance)
data(metadata)

# Prepare abundance data
abundance_data <- as.data.frame(ko_abundance)
rownames(abundance_data) <- abundance_data[, "#NAME"]
abundance_data <- abundance_data[, -1]

# Run differential abundance analysis using ALDEx2
results <- pathway_daa(
  abundance = abundance_data,
  metadata = metadata,
  group = "Environment"
)

# Using a different method (DESeq2)
deseq_results <- pathway_daa(
  abundance = abundance_data,
  metadata = metadata,
  group = "Environment",
  daa_method = "DESeq2"
)

# Create example data with more samples
abundance <- data.frame(
  sample1 = c(10, 20, 30),
  sample2 = c(20, 30, 40),
  sample3 = c(30, 40, 50),
  sample4 = c(40, 50, 60),
  sample5 = c(50, 60, 70),
  row.names = c("pathway1", "pathway2", "pathway3")
)

metadata <- data.frame(
  sample = c("sample1", "sample2", "sample3", "sample4", "sample5"),
  group = c("control", "control", "treatment", "treatment", "treatment")
)

# Run differential abundance analysis using ALDEx2
results <- pathway_daa(abundance, metadata, "group")

# Using a different method (limma voom instead of DESeq2 for this small example)
limma_results <- pathway_daa(abundance, metadata, "group",
                            daa_method = "limma voom")

# Analyze specific samples only
subset_results <- pathway_daa(abundance, metadata, "group",
                             select = c("sample1", "sample2", "sample3", "sample4"))

# ALDEx2 returns effect size columns by default
# (effect_size, diff_btw, log2_fold_change, rab_all, overlap).
# Ranking by |log2_fold_change| is generally more biologically informative
# than ranking by p-value, especially for large datasets where small effects
# can reach statistical significance without being biologically meaningful.
aldex2_res <- pathway_daa(abundance, metadata, "group", daa_method = "ALDEx2")
head(aldex2_res)

# Opt out of the extra aldex.effect() computation if only p-values are needed
aldex2_pvals_only <- pathway_daa(abundance, metadata, "group",
                                daa_method = "ALDEx2",
                                include_effect_size = FALSE)



ggpicrust2 documentation built on April 30, 2026, 1:06 a.m.