In ggpicrust2: Make 'PICRUSt2' Output Analysis and Visualization Easier

ko2kegg_abundance

R Documentation

Convert KO abundance in picrust2 export files to KEGG pathway abundance

Description

This function converts KO (KEGG Orthology) abundance data in PICRUSt2 export format into KEGG pathway abundance data. The input can be read from a file (.tsv, .txt, or .csv) or supplied directly as a data frame.

Usage

ko2kegg_abundance(
  file = NULL,
  data = NULL,
  method = c("abundance", "sum"),
  filter_for_prokaryotes = TRUE,
  progress = interactive()
)

Arguments

`file`	Path to the input file containing KO abundance data in PICRUSt2 export format. The file should have KO identifiers in the first column and sample identifiers in the first (header) row; every other cell holds the abundance of one KO in one sample.
`data`	An optional data.frame containing KO abundance data in the same layout as the input file. If provided, it is used instead of reading from `file`. Defaults to NULL.
`method`	Method for calculating pathway abundance. One of: `"abundance"` (default): PICRUSt2-style calculation that takes the mean of the upper half of the sorted KO abundances. Because it averages rather than sums, it does not inflate abundances for pathways that contain more KOs. `"sum"`: simple summation of all KO abundances. This is the legacy method and may double-count KOs that belong to multiple pathways.
`filter_for_prokaryotes`	Logical. If TRUE (default), filters out KEGG pathways that are not relevant to prokaryotic (bacterial/archaeal) analysis. Non-pathway KEGG entries (BRITE hierarchies and "Not Included in Pathway or Brite" groups) are always removed before this filter is applied, whatever its value. The prokaryote filter removes pathways in categories such as human diseases (cancer, neurodegenerative diseases, addiction, etc.) and organismal systems (immune system, nervous system, endocrine system, etc.), while bacterial infection and antimicrobial resistance pathways are retained. Set to FALSE to include all KEGG pathways (for eukaryotic analysis or custom filtering).
`progress`	Logical. Whether to show a progress bar while aggregating pathways. Defaults to `interactive()` so non-interactive scripts and tests stay quiet.
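For illustration, the input layout described for `file` and `data` can be produced in base R. The sketch below builds a toy KO table with made-up values, writes it as a tab-separated file, and reads it back. The first-column header `function` is an assumed name; only the position of the KO column is described above.

```r
# Toy KO abundance table in PICRUSt2-style layout (values are made up):
# KO IDs in the first column, one column per sample.
ko_table <- data.frame(
  `function` = c("K00001", "K00002", "K00003"),  # assumed header name
  sample_A = c(12.0, 0.0, 3.5),
  sample_B = c(8.0, 1.0, 0.0),
  check.names = FALSE                            # keep "function" as-is
)

# Write a tab-separated file; the header row holds the sample IDs
path <- tempfile(fileext = ".tsv")
write.table(ko_table, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back; check.names = FALSE keeps the header exactly as written
round_trip <- read.delim(path, check.names = FALSE)
str(round_trip)
```

A data frame in this shape is the kind of object `data` expects, and a file written this way is the kind of file `file` expects.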

Details

The default "abundance" method follows PICRUSt2's approach for calculating pathway abundance:

For each pathway, collect abundances of all associated KOs present in the data
Sort the abundances in ascending order
Take the upper half of the sorted values
Calculate the mean as the pathway abundance
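The four steps above can be sketched for a single pathway in a single sample. `upper_half_mean()` is a hypothetical helper, not a ggpicrust2 function, and two details are assumptions: when the number of KOs is odd, the middle value is counted in the upper half, and a pathway with no KOs gets 0.

```r
# Hypothetical helper (not part of ggpicrust2): mean of the upper half
# of the sorted KO abundances for one pathway in one sample.
upper_half_mean <- function(ko_values) {
  ko_values <- sort(ko_values)     # step 2: ascending order
  n <- length(ko_values)
  if (n == 0) return(0)            # assumption: no KOs -> abundance 0
  start <- floor(n / 2) + 1        # assumption: odd n keeps the middle value
  mean(ko_values[start:n])         # steps 3-4: mean of the upper half
}

# KO abundances of one pathway in one sample (step 1)
upper_half_mean(c(0, 2, 10, 4))    # sorted 0 2 4 10 -> mean(4, 10), returns 7
upper_half_mean(c(1, 3, 5, 7, 9))  # mean(5, 7, 9), returns 7
```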

This approach has several advantages over simple summation:

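Does not inflate abundances for pathways containing more KOs, because the result is a mean rather than a total
More robust to missing or low-abundance KOs, because the lower half of the sorted values is discarded
Provides a more accurate representation of pathway activity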

The "sum" method is provided for backward compatibility and simply sums all KO abundances for each pathway.
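A toy comparison shows why summation inflates larger pathways. The two pathways below are hypothetical and every KO has the same abundance, so the only difference between them is the number of KOs. `upper_half()` is an illustrative helper applying the upper-half rule from the steps above (odd-count handling assumed).

```r
# Two hypothetical pathways in one sample; every KO has abundance 10
ko_small <- rep(10, 2)   # pathway with 2 KOs
ko_large <- rep(10, 8)   # pathway with 8 KOs

# "sum": grows with the number of KOs in the pathway
sum(ko_small)            # 20
sum(ko_large)            # 80

# "abundance": mean of the upper half of the sorted values
upper_half <- function(x) {
  x <- sort(x)
  mean(x[(floor(length(x) / 2) + 1):length(x)])
}
upper_half(ko_small)     # 10
upper_half(ko_large)     # 10
```

The summed value quadruples with four times as many KOs, while the upper-half mean stays at the per-KO level.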

Value

A data frame with KEGG pathway abundance values. Rows represent KEGG pathways, identified by their KEGG pathway IDs. Columns represent samples, identified by their sample IDs from the input file.
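To make the output shape concrete, the sketch below aggregates a toy KO table into a pathway-by-sample data frame using the upper-half mean. The pathway names and the KO-to-pathway `mapping` are made up for illustration (ggpicrust2 uses its own KEGG reference data, which is not reproduced here), and no pathway filtering is applied.

```r
# Toy KO abundances (hypothetical): one row per KO, one column per sample
ko <- data.frame(
  ko = c("K00001", "K00002", "K00003", "K00004"),
  S1 = c(4, 0, 10, 2),
  S2 = c(1, 5, 3, 3)
)
# Hypothetical KO-to-pathway mapping; K00003 belongs to both pathways
mapping <- list(
  pathway_A = c("K00001", "K00002", "K00003"),
  pathway_B = c("K00003", "K00004")
)

upper_half_mean <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n == 0) return(0)
  mean(x[(floor(n / 2) + 1):n])
}

samples <- setdiff(names(ko), "ko")
result <- t(sapply(mapping, function(kos) {
  rows <- ko[ko$ko %in% kos, samples, drop = FALSE]  # KOs present in the data
  vapply(rows, upper_half_mean, numeric(1))          # one value per sample
}))
pathway_abundance <- as.data.frame(result)           # rows: pathways, cols: samples
pathway_abundance
```

With these toy values, `pathway_A` gets 7 in S1 and 4 in S2, and `pathway_B` gets 10 in S1 and 3 in S2.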

Pathway Filtering

Before abundance calculation, KEGG BRITE hierarchies and "Not Included in Pathway or Brite" pseudo-pathways (for example, ko99980) are removed, because they are not KEGG pathway maps and cannot be consistently annotated as pathways.
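This removal amounts to dropping entries by their KEGG top-level (Level 1) category. The sketch assumes a hypothetical annotation table with a `level1` column; ggpicrust2's internal reference data may be organized differently, and the package may not implement the step this way.

```r
# Hypothetical annotation table; level1 uses KEGG top-level category names
pathway_info <- data.frame(
  id     = c("ko00010", "ko01000", "ko99980", "ko02010"),
  name   = c("Glycolysis / Gluconeogenesis", "Enzymes",
             "Enzymes with EC numbers", "ABC transporters"),
  level1 = c("Metabolism", "Brite Hierarchies",
             "Not Included in Pathway or Brite",
             "Environmental Information Processing")
)

# Keep only real pathway maps
non_pathway <- c("Brite Hierarchies", "Not Included in Pathway or Brite")
pathways_only <- pathway_info[!pathway_info$level1 %in% non_pathway, ]
pathways_only$id   # "ko00010" "ko02010"
```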

When filter_for_prokaryotes = TRUE, the function excludes KEGG pathways that are biologically irrelevant to prokaryotic organisms. KEGG reference pathways cover all domains of life, so many human- or animal-specific pathways would otherwise appear in a bacterial analysis simply because some of their KOs are shared across organisms.

The following KEGG categories (mostly Level 2) are excluded:

Cancer pathways (overview and specific types)
Neurodegenerative diseases (Alzheimer's, Parkinson's, etc.)
Substance dependence (addiction pathways)
Cardiovascular diseases
Endocrine and metabolic diseases
Immune diseases
Organismal systems (immune, nervous, endocrine, digestive, etc.)

The following are RETAINED even with filtering:

Infectious disease: bacterial (Salmonella, E. coli, Tuberculosis, etc.)
Drug resistance: antimicrobial (antibiotic resistance)
All Metabolism pathways
Genetic/Environmental Information Processing
Cellular Processes
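The prokaryote filter can be sketched the same way, assuming a hypothetical annotation table with KEGG Level 1 and Level 2 category names. The example pathways and their categories follow KEGG's hierarchy, and the exclusion set transcribes the lists above. Categories that appear in neither list (for example, viral infectious diseases) are kept by this sketch, which may not match the package's exact rule.

```r
# Hypothetical annotation table with KEGG Level 1 / Level 2 categories
pathway_info <- data.frame(
  id = c("ko00010", "ko05200", "ko05010", "ko04610", "ko05132", "ko01501"),
  level1 = c("Metabolism", "Human Diseases", "Human Diseases",
             "Organismal Systems", "Human Diseases", "Human Diseases"),
  level2 = c("Carbohydrate metabolism", "Cancer: overview",
             "Neurodegenerative disease", "Immune system",
             "Infectious disease: bacterial", "Drug resistance: antimicrobial")
)

# Level 2 disease categories listed as excluded above
excluded_level2 <- c("Cancer: overview", "Cancer: specific types",
                     "Neurodegenerative disease", "Substance dependence",
                     "Cardiovascular disease", "Endocrine and metabolic disease",
                     "Immune disease")

# Drop all Organismal Systems pathways and the listed disease categories
drop <- pathway_info$level1 == "Organismal Systems" |
  pathway_info$level2 %in% excluded_level2
prokaryote_relevant <- pathway_info[!drop, ]
prokaryote_relevant$id   # "ko00010" "ko05132" "ko01501"
```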

Examples

## Not run: 
library(ggpicrust2)
library(readr)

# Example 1: Default - filtered for prokaryotic analysis
data(ko_abundance)
kegg_abundance <- ko2kegg_abundance(data = ko_abundance)

# Example 2: Include all pathways (for eukaryotic analysis)
kegg_abundance_all <- ko2kegg_abundance(data = ko_abundance, filter_for_prokaryotes = FALSE)

# Example 3: Using legacy sum method with filtering
kegg_abundance_sum <- ko2kegg_abundance(data = ko_abundance, method = "sum")

# Example 4: From file
input_file <- "path/to/your/picrust2/results/pred_metagenome_unstrat.tsv"
kegg_abundance <- ko2kegg_abundance(file = input_file)

## End(Not run)

ggpicrust2 documentation built on May 20, 2026, 5:07 p.m.