ggpicrust2: This function integrates pathway name/description...

View source: R/ggpicrust2.R

ggpicrust2    R Documentation

Description

This function integrates pathway name/description annotations, ten of the most advanced differential abundance (DA) methods, and visualization of DA results.

Usage

ggpicrust2(
  file = NULL,
  data = NULL,
  metadata,
  group,
  pathway,
  daa_method = "ALDEx2",
  ko_to_kegg = FALSE,
  filter_for_prokaryotes = TRUE,
  p.adjust = "BH",
  order = "group",
  p_values_bar = TRUE,
  x_lab = NULL,
  select = NULL,
  reference = NULL,
  colors = NULL,
  p_values_threshold = 0.05
)

Arguments

file

A character string representing the file path of the input file containing KO abundance data in picrust2 export format. The input file should have KO identifiers in the first column and sample identifiers in the first row. The remaining cells should contain the abundance values for each KO-sample pair.
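The layout described above can be sketched with a toy table in base R. The column name `ko_id`, the sample names, and all values are hypothetical and only illustrate the expected shape (identifiers in the first column, one column per sample); they are not taken from the package.

```r
# Hypothetical toy table in the layout described above:
# first column = KO identifiers, remaining columns = one per sample.
ko_table <- data.frame(
  ko_id   = c("K00001", "K00002", "K00003"),  # column name is an assumption
  sample1 = c(10, 0, 5),
  sample2 = c(3, 7, 0),
  sample3 = c(8, 2, 1),
  stringsAsFactors = FALSE
)

# Write the table as tab-separated text, then read it back.
tsv_path <- tempfile(fileext = ".tsv")
write.table(ko_table, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)
ko_read <- read.delim(tsv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Basic layout checks: identifiers are text, every sample column is numeric.
stopifnot(is.character(ko_read[[1]]))
stopifnot(all(vapply(ko_read[-1], is.numeric, logical(1))))
```

A table that passes these checks has the general shape the `file` and `data` arguments describe; whether ggpicrust2 expects a particular name for the first column is not stated here.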

data

An optional data.frame containing KO abundance data in the same format as the input file. If provided, the function will use this data instead of reading from the file. By default, this parameter is set to NULL.

metadata

A tibble or data frame containing the sample information.

group

A character string giving the name of the metadata column that defines the groups.

pathway

A character string specifying the pathway type; one of "EC", "KO", or "MetaCyc".

daa_method

A character string specifying the method for differential abundance analysis. Default is "ALDEx2". Choices are:

  • "ALDEx2": ANOVA-Like Differential Expression tool for high-throughput sequencing data

  • "DESeq2": Differential expression analysis based on the negative binomial distribution using DESeq2

  • "edgeR": Exact test for differences between two groups of negative-binomially distributed counts using edgeR

  • "limma voom": Limma-voom framework for the analysis of RNA-seq data

  • "metagenomeSeq": Fit logistic regression models to test for differential abundance between groups using metagenomeSeq

  • "LinDA": Linear models for differential abundance analysis of microbiome compositional data

  • "Maaslin2": Multivariate Association with Linear Models (MaAsLin2) for differential abundance analysis

ko_to_kegg

Logical. Controls whether KO abundance is converted to KEGG pathway abundance. Default is FALSE.

filter_for_prokaryotes

Logical. If TRUE (default), filters out KEGG pathways that are specific to eukaryotes (e.g., human diseases, organismal systems) when ko_to_kegg = TRUE. Set to FALSE to include all KEGG pathways.

p.adjust

A character string specifying the method for p-value adjustment. Default is "BH". Choices are:

  • "BH": Benjamini-Hochberg correction

  • "holm": Holm's correction

  • "bonferroni": Bonferroni correction

  • "hochberg": Hochberg's correction

  • "fdr": False discovery rate correction

  • "none": No p-value adjustment
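The method names listed above match those accepted by base R's `stats::p.adjust()`. As a minimal sketch, assuming the corrections behave like that base function (the page does not say how ggpicrust2 applies them internally), the following compares the choices on a handful of made-up p-values.

```r
# Hypothetical raw p-values for five pathways (made up for illustration).
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.200)

# Apply each adjustment with stats::p.adjust() and collect the results.
adjusted <- data.frame(
  raw        = p_raw,
  BH         = p.adjust(p_raw, method = "BH"),
  holm       = p.adjust(p_raw, method = "holm"),
  bonferroni = p.adjust(p_raw, method = "bonferroni"),
  hochberg   = p.adjust(p_raw, method = "hochberg"),
  none       = p.adjust(p_raw, method = "none")
)
print(adjusted)

# In stats::p.adjust(), "fdr" is an alias for "BH", so both give the same values.
stopifnot(identical(p.adjust(p_raw, method = "fdr"),
                    p.adjust(p_raw, method = "BH")))
```

Bonferroni is the most conservative of these, while "none" returns the raw values unchanged, which is useful mainly as a baseline.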

order

A character string controlling the order of the rows in the main plot. Default is "group"; the examples on this page also use "pathway_class".

p_values_bar

Logical. Controls whether the main plot includes the p-values bar. Default is TRUE.

x_lab

A character string controlling the x-axis label name; one of "feature", "pathway_name", or "description".

select

A vector of pathway names to select. Default is NULL.

reference

A character string giving the reference group level, used by several DA methods. Default is NULL.

colors

A vector of colors to use in the plot. Default is NULL.

p_values_threshold

A numeric value specifying the threshold for statistical significance of differential abundance. Pathways with adjusted p-values below this threshold will be displayed in the plot. Default is 0.05.
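The effect of the threshold can be sketched in base R on a made-up results table. The column names `feature` and `p_adjust` mirror those used in the Examples section; the values are invented, and treating "below" as a strict comparison is an assumption rather than documented behavior.

```r
# Hypothetical DA results; column names follow the Examples section,
# the feature IDs and adjusted p-values are made up.
daa_results <- data.frame(
  feature  = c("ko00010", "ko00020", "ko00030", "ko00040"),
  p_adjust = c(0.003, 0.210, 0.049, 0.050),
  stringsAsFactors = FALSE
)

p_values_threshold <- 0.05

# Keep rows strictly below the threshold (assumed strict comparison),
# so a value exactly equal to the threshold is excluded.
significant <- daa_results[daa_results$p_adjust < p_values_threshold, ]
print(significant$feature)
```

Raising the threshold shows more pathways in the plot; lowering it shows fewer.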

Value

A list containing:

  • Numbered elements (1, 2, ...): Sub-lists for each DA method, each containing:

    • plot: A ggplot2 error bar plot visualizing the differential abundance results

    • results: A data frame of differential abundance results for that method

  • abundance: The processed abundance data (KEGG pathway or original) for downstream analysis

  • metadata: The metadata data frame

  • group: The group variable name used in the analysis

  • daa_results_df: The complete annotated DAA results data frame

  • ko_to_kegg: Logical indicating whether KO to KEGG conversion was performed

These additional fields allow seamless integration with pathway_pca and pathway_heatmap for further visualization without re-preparing data.
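To make the shape of the return value concrete, the sketch below builds a mock list that imitates the structure described above and shows how its parts might be accessed. Every value is a placeholder rather than real ggpicrust2 output, and leaving the per-method elements unnamed is an assumption about how the "numbered elements" are stored.

```r
# Mock object imitating the documented return structure.
# All contents are placeholders, not real ggpicrust2 output.
mock_daa <- data.frame(
  feature  = c("PWY-1", "PWY-2"),
  p_adjust = c(0.01, 0.40),
  stringsAsFactors = FALSE
)

mock_results <- list(
  list(
    plot    = NULL,      # would be a ggplot2 object in real output
    results = mock_daa   # per-method results data frame
  ),
  abundance      = matrix(1:4, nrow = 2,
                          dimnames = list(c("PWY-1", "PWY-2"), c("s1", "s2"))),
  metadata       = data.frame(sample = c("s1", "s2"),
                              Environment = c("A", "B")),
  group          = "Environment",
  daa_results_df = mock_daa,
  ko_to_kegg     = FALSE
)

# Per-method output is reached by position ...
first_method_results <- mock_results[[1]]$results

# ... while the shared fields are reached by name.
group_name   <- mock_results$group
sample_names <- colnames(mock_results$abundance)
```

Positional access (`[[1]]`) and named access (`$abundance`, `$metadata`, `$group`) are the same patterns used in the Examples section.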

Examples

## Not run: 
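# Load the packages used in these examples
library(ggpicrust2)
library(readr)   # read_delim()
library(dplyr)   # %>%, filter(), pull()

# Set the abundance file path and load the metadata
abundance_file <- "path/to/your/abundance_file.tsv"
metadata <- read.csv("path/to/your/metadata.csv")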

# Run ggpicrust2 with input file path
results_file_input <- ggpicrust2(file = abundance_file,
                                 metadata = metadata,
                                 group = "your_group_column",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

# Import the abundance table as a data frame (read_delim() comes from readr)
abundance_data <- read_delim(abundance_file, delim="\t", col_names=TRUE, trim_ws=TRUE)

# Run ggpicrust2 with the imported data
results_data_input <- ggpicrust2(data = abundance_data,
                                 metadata = metadata,
                                 group = "your_group_column",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

# Access the plot and results dataframe for the first DA method
example_plot <- results_file_input[[1]]$plot
example_results <- results_file_input[[1]]$results

# Use the example data shipped with the ggpicrust2 package
data(ko_abundance)
data(metadata)
results_file_input <- ggpicrust2(data = ko_abundance,
                                 metadata = metadata,
                                 group = "Environment",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

# Analyze EC or MetaCyc pathways (MetaCyc shown here)
data(metacyc_abundance)
results_file_input <- ggpicrust2(data = metacyc_abundance,
                                 metadata = metadata,
                                 group = "Environment",
                                 pathway = "MetaCyc",
                                 daa_method = "LinDA",
                                 ko_to_kegg = FALSE,
                                 order = "group",
                                 p_values_bar = TRUE,
                                 x_lab = "description")

# Use the returned data for PCA analysis (no need to re-prepare data)
pca_plot <- pathway_pca(
  abundance = results_file_input$abundance,
  metadata = results_file_input$metadata,
  group = results_file_input$group
)

# Use the returned data for heatmap (filter significant pathways first)
sig_features <- results_file_input$daa_results_df %>%
  dplyr::filter(p_adjust < 0.05) %>%
  dplyr::pull(feature)
if (length(sig_features) > 0) {
  heatmap_plot <- pathway_heatmap(
    abundance = results_file_input$abundance[sig_features, , drop = FALSE],
    metadata = results_file_input$metadata,
    group = results_file_input$group
  )
}

## End(Not run)

ggpicrust2 documentation built on April 10, 2026, 5:06 p.m.