| ko2kegg_abundance | R Documentation |
This function takes KO (KEGG Orthology) abundance data in PICRUSt2 export format, supplied either as a file or as a data frame, and converts it to KEGG pathway abundance data. An input file should be in .tsv, .txt, or .csv format.
ko2kegg_abundance(
file = NULL,
data = NULL,
method = c("abundance", "sum"),
filter_for_prokaryotes = TRUE
)
file |
A character string giving the path of the input file containing KO abundance data in PICRUSt2 export format. The input file should have KO identifiers in the first column and sample identifiers in the first row. The remaining cells should contain the abundance value for each KO-sample pair. |
data |
An optional data.frame containing KO abundance data in the same format as the input file. If provided, the function will use this data instead of reading from the file. By default, this parameter is set to NULL. |
method |
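Method for calculating pathway abundance. One of:
"abundance" (default): PICRUSt2-style calculation that takes the mean of the upper half of the sorted KO abundances for each pathway.
"sum": legacy method that sums all KO abundances for each pathway. |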
filter_for_prokaryotes |
Logical. If TRUE (default), filters out KEGG pathways that are not relevant to prokaryotic (bacterial/archaeal) analysis. This removes pathways in categories such as cancer, neurodegenerative diseases, substance dependence, cardiovascular diseases, endocrine and metabolic diseases, immune diseases, and organismal systems. Bacterial infection pathways and antimicrobial resistance pathways are retained. Set to FALSE to include all KEGG pathways (for eukaryotic analysis or custom filtering). |
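The layout described for the file argument can be illustrated with a small toy table. This is only a sketch in base R: the KO rows, sample names, and abundance values are made up, and the first-column header ("function") is an assumption about the export rather than something this page specifies.

```r
# Toy KO abundance table: KO identifiers in the first column,
# one column per sample. All values are invented for illustration.
ko_table <- data.frame(
  `function` = c("K00001", "K00002", "K00003"),  # assumed header name
  sample_A = c(12, 0, 5.5),
  sample_B = c(7, 3, 0),
  check.names = FALSE
)

# Write it as a tab-separated file and read it back to confirm the layout.
tmp <- tempfile(fileext = ".tsv")
write.table(ko_table, tmp, sep = "\t", quote = FALSE, row.names = FALSE)
read_back <- read.delim(tmp, check.names = FALSE)
print(read_back)
```

A file in this shape would be passed through the file argument, and the equivalent data frame through the data argument.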
The default "abundance" method follows PICRUSt2's approach for calculating pathway abundance:
For each pathway, collect abundances of all associated KOs present in the data
Sort the abundances in ascending order
Take the upper half of the sorted values
Calculate the mean as the pathway abundance
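The four steps above can be sketched for a single pathway in a single sample as follows. This is a minimal illustration, not the package's actual code: upper_half_mean is a hypothetical helper, and the handling of an odd number of KOs (the middle value is counted in the upper half) is an assumption.

```r
# Hypothetical helper, not part of ggpicrust2: mean of the upper half of
# the sorted KO abundances for one pathway in one sample.
upper_half_mean <- function(ko_values) {
  sorted <- sort(ko_values)          # ascending order
  n <- length(sorted)
  if (n == 0) return(0)              # no KOs present for this pathway
  start <- floor(n / 2) + 1          # assumption: middle value goes to the upper half
  mean(sorted[start:n])
}

# Four KOs: sorted 2, 4, 8, 10; upper half is 8 and 10.
even_case <- upper_half_mean(c(10, 2, 8, 4))

# Five KOs: sorted 1, 2, 3, 4, 5; upper half is 3, 4 and 5.
odd_case <- upper_half_mean(c(1, 2, 3, 4, 5))
```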
This approach has several advantages over simple summation:
Does not inflate abundances for pathways containing more KOs
More robust to missing or low-abundance KOs
Provides a more accurate representation of pathway activity
The "sum" method is provided for backward compatibility and simply sums all KO abundances for each pathway.
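To make the difference between the two methods concrete, the sketch below applies both to a toy KO-by-sample matrix. The KO names, the KO-to-pathway mapping, and the helper functions are all hypothetical, and the rule for splitting the sorted values is an assumption. The two pathways have the same typical KO abundance but different numbers of KOs.

```r
# Toy KO-by-sample matrix; identifiers and values are made up.
ko_abund <- matrix(
  c(10, 12, 10, 12, 10, 12, 10, 12,   # sample_A
     4,  6,  4,  6,  4,  6,  4,  6),  # sample_B
  ncol = 2,
  dimnames = list(paste0("KO_", 1:8), c("sample_A", "sample_B"))
)

# Hypothetical mapping: one pathway with 2 KOs, one with 6 KOs.
ko_to_pathway <- list(
  pathway_small = c("KO_1", "KO_2"),
  pathway_large = paste0("KO_", 3:8)
)

# Mean of the upper half of the sorted values (assumed split rule).
upper_half_mean <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n == 0) return(0)
  mean(sorted[(floor(n / 2) + 1):n])
}

# Apply a summary function to the KOs of each pathway, sample by sample.
summarise_pathways <- function(abund, mapping, fun) {
  t(sapply(mapping, function(kos) {
    present <- intersect(kos, rownames(abund))
    apply(abund[present, , drop = FALSE], 2, fun)
  }))
}

by_sum <- summarise_pathways(ko_abund, ko_to_pathway, sum)
by_upper_half <- summarise_pathways(ko_abund, ko_to_pathway, upper_half_mean)
print(by_sum)
print(by_upper_half)
```

In this toy case the summed value of the larger pathway is three times that of the smaller one purely because it contains three times as many KOs, while the upper-half mean gives both pathways the same value.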
The function returns a data frame of KEGG pathway abundance values. Rows represent KEGG pathways, identified by their KEGG pathway IDs. Columns represent samples, identified by their sample IDs from the input.
When filter_for_prokaryotes = TRUE, the function excludes KEGG pathways that are biologically irrelevant to prokaryotic organisms. KEGG reference pathways cover all domains of life, so many human- or animal-specific pathways would otherwise appear in a bacterial analysis simply because some of their KOs are shared across organisms.
The following KEGG Level 2 categories are excluded:
Cancer pathways (overview and specific types)
Neurodegenerative diseases (Alzheimer's, Parkinson's, etc.)
Substance dependence (addiction pathways)
Cardiovascular diseases
Endocrine and metabolic diseases
Immune diseases
Organismal systems (immune, nervous, endocrine, digestive, etc.)
The following are RETAINED even with filtering:
Infectious disease: bacterial (Salmonella, E. coli, Tuberculosis, etc.)
Drug resistance: antimicrobial (antibiotic resistance)
All Metabolism pathways
Genetic/Environmental Information Processing
Cellular Processes
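The effect of this kind of category-based filtering can be sketched on a small annotation table. Everything here is illustrative: the pathway_info table, its column names, and the exact category labels are assumptions, not the lookup table the function uses internally, and organismal systems are left out of the toy exclusion list for brevity.

```r
# Hypothetical pathway annotation table (column names are assumptions).
pathway_info <- data.frame(
  pathway = c("ko00010", "ko05200", "ko05010", "ko05132", "ko01501"),
  level2 = c(
    "Carbohydrate metabolism",
    "Cancer: overview",
    "Neurodegenerative disease",
    "Infectious disease: bacterial",
    "Drug resistance: antimicrobial"
  )
)

# Assumed labels for some of the excluded categories listed above.
excluded_level2 <- c(
  "Cancer: overview", "Cancer: specific types",
  "Neurodegenerative disease", "Substance dependence",
  "Cardiovascular disease", "Endocrine and metabolic disease",
  "Immune disease"
)

# Keep only pathways whose category is not in the exclusion list.
kept <- pathway_info[!pathway_info$level2 %in% excluded_level2, ]
print(kept$pathway)
```

The metabolism, bacterial infection, and antimicrobial resistance entries survive the filter, while the cancer and neurodegenerative disease entries are dropped.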
## Not run:
library(ggpicrust2)
library(readr)
# Example 1: Default - filtered for prokaryotic analysis
data(ko_abundance)
kegg_abundance <- ko2kegg_abundance(data = ko_abundance)
# Example 2: Include all pathways (for eukaryotic analysis)
kegg_abundance_all <- ko2kegg_abundance(data = ko_abundance, filter_for_prokaryotes = FALSE)
# Example 3: Using legacy sum method with filtering
kegg_abundance_sum <- ko2kegg_abundance(data = ko_abundance, method = "sum")
# Example 4: From file
input_file <- "path/to/your/picrust2/results/pred_metagenome_unstrat.tsv"
kegg_abundance <- ko2kegg_abundance(file = input_file)
## End(Not run)