In ggpicrust2: Make 'PICRUSt2' Output Analysis and Visualization Easier

ggpicrust2

R Documentation

This function integrates pathway name/description annotations, ten of the most advanced differential abundance (DA) methods, and visualization of DA results.

Description

This function integrates pathway name/description annotations, ten of the most advanced differential abundance (DA) methods, and visualization of DA results.

Usage

ggpicrust2(
  file = NULL,
  data = NULL,
  metadata,
  group,
  pathway,
  daa_method = "ALDEx2",
  ko_to_kegg = FALSE,
  filter_for_prokaryotes = TRUE,
  p_adjust_method = "BH",
  order = "group",
  p_values_bar = TRUE,
  x_lab = NULL,
  select = NULL,
  reference = NULL,
  colors = NULL,
  p_values_threshold = 0.05,
  p.adjust = NULL
)

Arguments

`file`	A character string giving the path of the input file containing KO abundance data in PICRUSt2 export format. The input file should have KO identifiers in the first column and sample identifiers in the first row. The remaining cells should contain the abundance values for each KO-sample pair.
`data`	An optional data.frame containing KO abundance data in the same format as the input file. If provided, the function will use this data instead of reading from the file. By default, this parameter is set to NULL.
`metadata`	A tibble containing sample information.
`group`	A character string naming the metadata column that defines the groups.
`pathway`	A character string, one of "EC", "KO", or "MetaCyc".
`daa_method`	A character string specifying the method for differential abundance analysis. Default is "ALDEx2". Choices are:
- "ALDEx2": ANOVA-Like Differential Expression tool for high-throughput sequencing data
- "DESeq2": Differential expression analysis based on the negative binomial distribution using DESeq2
- "edgeR": Exact test for differences between two groups of negative-binomially distributed counts using edgeR
- "limma voom": Limma-voom framework for the analysis of RNA-seq data
- "metagenomeSeq": Fit logistic regression models to test for differential abundance between groups using metagenomeSeq
- "LinDA": Linear models for differential abundance analysis of microbiome compositional data
- "Maaslin2": Multivariate Association with Linear Models (MaAsLin2) for differential abundance analysis
`ko_to_kegg`	Logical or logical-like string controlling conversion of KO abundance to KEGG pathway abundance.
`filter_for_prokaryotes`	Logical. If TRUE (default), filters out KEGG pathways that are specific to eukaryotes (e.g., human diseases, organismal systems) when ko_to_kegg = TRUE. Set to FALSE to include all KEGG pathways.
`p_adjust_method`	A character specifying the method for p-value adjustment, default is "BH".
`order`	A character string controlling the order of the main plot rows, for example "group" (default) or "pathway_class".
`p_values_bar`	A logical controlling whether the main plot includes the p-values bar. Default is TRUE.
`x_lab`	A character string setting the x-axis label; choose from "feature", "pathway_name", and "description".
`select`	A vector consisting of pathway names to be selected
`reference`	A character string giving the reference group level used by several DA methods.
`colors`	A vector of colors used in the plot.
`p_values_threshold`	A numeric value specifying the threshold for statistical significance of differential abundance. Pathways with adjusted p-values below this threshold will be displayed in the plot. Default is 0.05.
`p.adjust`	A character string specifying the method for p-value adjustment. Default is "BH". Choices are:
- "BH": Benjamini-Hochberg correction
- "holm": Holm's correction
- "bonferroni": Bonferroni correction
- "hochberg": Hochberg's correction
- "fdr": False discovery rate correction
- "none": No p-value adjustment
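
The adjustment method names listed above are also accepted by base R's `stats::p.adjust()`. The following is a minimal sketch of how the choice of method interacts with `p_values_threshold`. The raw p-values are invented for illustration, and the filtering step is an assumption about the general idea, not necessarily how ggpicrust2 applies the threshold internally.

```r
# Hypothetical raw p-values for five pathways (illustrative values only)
raw_p <- c(ko00010 = 0.001, ko00020 = 0.012, ko00030 = 0.020,
           ko00040 = 0.045, ko00051 = 0.200)

# Adjust the same p-values with each method named in the argument list
methods <- c("BH", "holm", "bonferroni", "hochberg", "fdr", "none")
adjusted <- sapply(methods, function(m) p.adjust(raw_p, method = m))

# Keep the pathways whose BH-adjusted p-value falls below the threshold
p_values_threshold <- 0.05
significant_bh <- names(raw_p)[adjusted[, "BH"] < p_values_threshold]

print(round(adjusted, 4))
print(significant_bh)
```

In `stats::p.adjust()`, "fdr" is an alias for "BH", so those two columns are identical, and "none" returns the raw p-values unchanged.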

Value

A list containing:

- Numbered elements (1, 2, ...): Sub-lists for each DA method, each containing:
  - plot: A ggplot2 error bar plot visualizing the differential abundance results
  - results: A data frame of differential abundance results for that method
- abundance: The processed abundance data (KEGG pathway or original) for downstream analysis
- metadata: The metadata data frame
- group: The group variable name used in the analysis
- daa_results_df: The complete annotated DAA results data frame
- ko_to_kegg: Logical indicating whether KO to KEGG conversion was performed

These additional fields allow seamless integration with pathway_pca and pathway_heatmap for further visualization without re-preparing data.
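
To make the shape of the return value concrete, here is a small sketch that builds a mock list with the documented fields and shows how each part is accessed. Every value in it is invented: a real call returns ggplot objects and full result tables, and the columns of the mock `results` data frame are an assumption made only for illustration.

```r
# Mock object mimicking the documented return structure (all values made up)
mock_results <- list(
  list(
    plot = "<ggplot object placeholder>",
    results = data.frame(feature = c("ko00010", "ko00020"),   # hypothetical columns
                         p_adjust = c(0.01, 0.20))
  ),
  abundance = matrix(c(10, 3, 8, 2), nrow = 2,
                     dimnames = list(c("ko00010", "ko00020"), c("S1", "S2"))),
  metadata = data.frame(sample = c("S1", "S2"), Environment = c("A", "B")),
  group = "Environment",
  daa_results_df = data.frame(feature = c("ko00010", "ko00020"),
                              p_adjust = c(0.01, 0.20)),
  ko_to_kegg = TRUE
)

# Per-method sub-lists are accessed by position
first_method <- mock_results[[1]]
print(names(first_method))

# Shared fields are accessed by name
print(mock_results$group)
print(dim(mock_results$abundance))
```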

Examples

## Not run: 
# Load the packages used below
library(ggpicrust2)
library(dplyr)  # provides %>%

# Load necessary data: abundance data and metadata
abundance_file <- "path/to/your/abundance_file.tsv"
metadata <- read.csv("path/to/your/metadata.csv")

# Run ggpicrust2 with input file path
results_file_input <- ggpicrust2(file = abundance_file,
                                 metadata = metadata,
                                 group = "your_group_column",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

# Import the abundance table as a data frame (read_delim() comes from readr)
abundance_data <- readr::read_delim(abundance_file, delim = "\t", col_names = TRUE, trim_ws = TRUE)

# Run ggpicrust2 with input data
results_data_input <- ggpicrust2(data = abundance_data,
                                 metadata = metadata,
                                 group = "your_group_column",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

# Access the plot and results dataframe for the first DA method
example_plot <- results_file_input[[1]]$plot
example_results <- results_file_input[[1]]$results

# Use the example data in ggpicrust2 package
data(ko_abundance)
data(metadata)
results_file_input <- ggpicrust2(data = ko_abundance,
                                 metadata = metadata,
                                 group = "Environment",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")
# Analyze the EC or MetaCyc pathway
data(metacyc_abundance)
results_file_input <- ggpicrust2(data = metacyc_abundance,
                                 metadata = metadata,
                                 group = "Environment",
                                 pathway = "MetaCyc",
                                 daa_method = "LinDA",
                                 ko_to_kegg = FALSE,
                                 order = "group",
                                 p_values_bar = TRUE,
                                 x_lab = "description")

# Use the returned data for PCA analysis (no need to re-prepare data)
pca_plot <- pathway_pca(
  abundance = results_file_input$abundance,
  metadata = results_file_input$metadata,
  group = results_file_input$group
)

# Use the returned data for heatmap (filter significant pathways first)
sig_features <- results_file_input$daa_results_df %>%
  dplyr::filter(p_adjust < 0.05) %>%
  dplyr::pull(feature)
if (length(sig_features) > 0) {
  heatmap_plot <- pathway_heatmap(
    abundance = results_file_input$abundance[sig_features, , drop = FALSE],
    metadata = results_file_input$metadata,
    group = results_file_input$group
  )
}

## End(Not run)
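
The examples above read a tab-separated abundance file. As a rough sketch of the layout described for `file` and `data` (identifiers in the first column, one column per sample), the following base R snippet writes a toy table to a temporary file and reads it back. The identifiers, the counts, and the first column's name `ko_id` are all invented for illustration; the header of a real export may differ.

```r
# Toy KO abundance table: KO identifiers in the first column, one column per
# sample (identifiers, counts, and the column name ko_id are hypothetical)
toy_abundance <- data.frame(
  ko_id   = c("K00001", "K00002", "K00003"),
  sample1 = c(10, 0, 5),
  sample2 = c(3, 7, 0),
  sample3 = c(8, 2, 1)
)

# Write it as a tab-separated file and read it back
toy_path <- tempfile(fileext = ".tsv")
write.table(toy_abundance, toy_path, sep = "\t", quote = FALSE, row.names = FALSE)
reread <- read.delim(toy_path, check.names = FALSE, stringsAsFactors = FALSE)

# Quick layout checks before passing the table on
first_col_is_id <- is.character(reread[[1]])
rest_is_numeric <- all(vapply(reread[-1], is.numeric, logical(1)))
print(reread)
```

Checks like these catch a mis-delimited file or a stray text column before the table is handed to the `data` argument.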

ggpicrust2 documentation built on May 20, 2026, 5:07 p.m.