Visit https://rmarkdown.rstudio.com/articles_intro.html for rmarkdown syntax

knitr::opts_chunk$set(echo = TRUE)

MetENP

MetENP is a R package that enables detection of significant metabolites from metabolite information (names or names and concentration along with metadata information) and provides

Enrichment score of metabolite class, Maps to pathway of the species of choice, Calculate enrichment score of pathways, Plots the pathways and shows the metabolite increase or decrease Gets gene info, reaction info, enzyme info

USER_HOME=Sys.getenv("HOME");
USER_PWD=Sys.getenv("PWD");
#.libPaths( c( .libPaths(), paste0(USER_HOME, "/.local/R") ))
.libPaths( c( .libPaths(), paste0(USER_PWD, "/../../R") )); # suitable path to where MetENP R package is installed
#.libPaths( c( .libPaths(), paste0("../../R") )); # suitable path to where MetENP R package is installed

library(MetENP)
library(jsonlite)

Data can be fetched via study id from Metabolomics Workbench through getmwstudies. For metabolomics data, 'data' is used, for metadata, 'factors' is used. If users want to upload their own data, please do at the bottom of the page ST000915 is the study of biomarkers of nonalcoholic fatty liver disease (NAFLD) progression.

metdata = getmwstudies('ST000915', 'data')
knitr::kable(head(metdata))

Associate metabolomics data to the refmet class

refmet_class= convert_refmet(metdata)
knitr::kable(head(refmet_class))
metadata = getmwstudies('ST000915', 'factors')
knitr::kable(head(metadata))

Find the factors you would want to compare. Multiple factors/experimental groups (independent variables) are formatted in multiple columns but you can get information on all the experimental groups by "factors" column. For t-test use the independent variables in the same column. For comparing multipe independent variables use anova by anova_ana function.

For ex: in this study, we want to compare Diagnosis experimental methods, so we will compare Cirrhosis and Normal samples

unique(metadata$factors)

Find different type of analysis mode. This is important, because some studies may have different analysis types, and different analysis types detect different metabolites. In this study, there is only one analysis type.

### Find the analysis mode
unique(metdata$analysis_summary)

Find significant metabolites, run significance of all the analysis summary together. The analysis summary/modes you got in the previous section

Significant metabolites can be found by using significant_met function. The parameters are:

metabolomics_data: metabolomics data associated to refmet class

met_col: column with metabolite names

analysis_type: type of analysis ex-GCMS, HILIAC positive ion mode.

metadata: Metadata

factor1: first independent variable

factor2: second independent variable

factor_col: column name of the independent variables

sample_col: the column name having samples

p_adjust: Method for p value adjustment, i.e. "fdr"

normalization: method for normalization a) "half_of_min": where the NAs are replaced by half of min values in the data b) "remove_NAs": where Cols with NAs values are removed and c) "50percent": where cols with more than 50% NAs values are removed, and then the remaining NA values are replaced with half_of_min

half_of_min is ideal when you wish to see which metabolites were present in either group. Very high fold change would mean it was present in either group.

In this example, we show significant metabolite filtered according to pvalue threshold of 0.05 and log2 fold change of 0.5, p adjust method of 'fdr' and we use the normalization method "50percent"

stats_metabolites = significant_met(metabolomics_data=refmet_class, met_col="metabolite_name",analysis_type=c("Core G Fatty acids/Eicosanoids","Core J Sterols","Core K Prenols/Cardiolipins","Core E Neutral Lipids","Core I Sphingolipids","Core H Phospholipids"), metadata=metadata, factor1='Cirrhosis', factor2=c('Normal'), factor_col='Diagnosis',sample_col='local_sample_id', p_adjust='fdr',normalization="50percent")

sig_metabolites = stats_metabolites[which(stats_metabolites[,"pval"] <= 0.05&abs(stats_metabolites[,"log2Fold_change"])>0.5),]
knitr::kable(head(sig_metabolites))

Significant Metabolites can be visualized via Volcano Plot

We will use the same thresholds as used for determining significant metabolites

plot_volcano(stats_metabolites, thres_pval= 0.05,thres_log2foldchange = 0.5, TRUE)

Map metabolite class of the significant metabolites utilzing refmet classification in Metabolomics Workbench

This function maps metabolite to metabolite class

In this example, we will go forward with significant metabolite obtained by t-test

sig_metabolites_kegg_id= map_keggid(sig_metabolites)
knitr::kable(head(sig_metabolites_kegg_id))

Check all your significant metabolites have not been assigned metabolite class

setdiff(sig_metabolites$refmet_name, sig_metabolites_kegg_id$refmet_name)

Count metabolites in each of the metabolite class and plotting

You may choose from sub_class, main_class and super_class

count_changes = metcountplot(df_metclass=sig_metabolites_kegg_id, metclass='sub_class', plotting=TRUE, thres_logfC = 0.0,  updown_fillcolor=c("red", "green"))

To see the Plot

count_changes$plotimg

Enrichment class score

Calculate the enrichment score of each metabolite class. Enrichment score is calculated through hypergeometric method. One can specify the no. of significant metabolites in a class while calculating the enrichment score. We advice to use the number of mtabolites in each class as 3 or more. But if someone just wants to know the enrichment score and rest of the information of all the metabolites, then they can choose the number as 1.

metenrichment = metclassenrichment(df_metclass=sig_metabolites_kegg_id,refmet_class, metclass="sub_class",enrich_stats="HG",no=1, debug = 0)
knitr::kable(head(metenrichment))

Plot the enrichment score via function plot_met_enrichment

plot_met_enrichment(metenrichment, "sub_class","HG", no=3)

Check the pathways with reactions of all the significant metabolites in Species specific manner.

First parameter is the metenrichment dataframe, while second parameter is the species code from KEGG

Here the subject species is Homo sapiens, and the KEGG species annotation for KEGG is 'hsa'

met_path = met_pathways(df_metenrichment = metenrichment, 'hsa')
knitr::kable(head(met_path))

Find metabolites for which no pathways were registered in Kegg and/or no kegg id was found

setdiff(metenrichment$Metabolite,unique(met_path$Metabolite))

Get pathway enrichment sore.

Once we have the pathway information, we can calculate enrichment score of pathways. Again, here I have used hypergeometric (HG) score. For hypergeometric score, we need to know the total number of kegg compounds that are linked to kegg pathways.

N = total no. of compounds/metabolites linked to all kegg pathways or the total number of metabolites detected in a study.

In the former case, this step might take long time, so I advice to run the script comp_linkedto_pathways() just the first time or after 6 months or so if desired to run the pipeline again. Save the result from comp_linkedto_pathways() and load it. Loading from saved file would save time for another analysis with another study. However the package takes care of loading. Save the data in the data folder.

L = No. of significant metabolites detected in a study

M = No. of significant metabolites detected in a metabolite class or pathway

k = Total no. of metabolites detected in a metabolite class/pathway

p-value = phyper(M-1, L, N-L, k, lower.tail=FALSE) # as in the actual R code # originally, it was wriiten as: phyper(M, L, N-L, K)

This function also utilizes korg dataset from pathview package.

In this example, the background set N is the total number of metabolites in a study.

load('../data/ls_path.RData')
load('../data/korg.RData')
kegg_es = path_enrichmentscore(met_path,sig_metabolite_kegg_id=sig_metabolite_kegg_id,ls_path=ls_path,refmet_class=refmet_class,sps='hsa',padj='fdr', kegg_comp_path=FALSE)
knitr::kable(head(kegg_es))

Plot pathway network

The pathway network is such that it shows metabolites that are connected to different pathways and same metabolite in different pathway. Color of nodes of metabolites are according to the fold change of metabolites (low expression in green and high in red) and size of pathway nodes (square nodes) are according to the number of branches (meaning no of metabolites). All metabolite are written in blue

plot_pathway_networks (met_path,kegg_es, TRUE)

Heatmap

plot_heatmap(met_path, shorten_name=TRUE,refmet_name=FALSE, xaxis=8, yaxis=6)

Dotplot

dotplot_met_class_path (met_path, kegg_es,"sub_class",xaxis=8,yaxis=6)

Get the gene and enzyme info

Here we get the information of genes involved in enriched pathways for specified organism

met_gene_info = enzyme_gene_info (metenrichment, "hsa","sub_class")
knitr::kable(head(met_gene_info))

Get the information if metabolite is a reactant or substrate

rclass_info = react_substrate(met_gene_info)
knitr::kable(head(rclass_info))

Get gene info in short form

met_gene_info2=data.table::data.table(rclass_info)[,lapply(.SD, function(x) toString(unique(x))), by = 'Metabolite']
knitr::kable(head(met_gene_info2))

When loading your own dataset

Load your own dataset. If your dataset has sample name in 1st column, enter the third parameter as true. If the dataset has 1st column as metabolite names, enter the third parameter as false. Please check the example files in the data folder of the package.

data_own <- separate_data('./human_cachexia.csv',"data",TRUE)

knitr::kable(head(data_own))
metadata_own <- separate_data('./human_cachexia.csv',"metadata",TRUE)
knitr::kable(head(metadata_own))
refmet_names=convert_refmet(data_own)
knitr::kable(head(refmet_names))

Get significant metabolite via same function significant_met. You may need to change the parameters accordingly

stats_metabolites = significant_met_own(metabolomics_data=refmet_names,met_col="metabolite_name",metadata=metadata_own, factor1='cachexic', factor2='control', factor_col='Muscle loss',sample_col='local_sample_id', p_adjust='fdr',normalization="50percent")
sig_metabolites = stats_metabolites[which(stats_metabolites[,"pval"] <= 0.05&abs(stats_metabolites[,"log2Fold_change"])>0.5),]
plot_volcano(stats_metabolites, thres_pval= 0.05,thres_log2foldchange = 0.5, TRUE)
sig_metabolites_kegg_id= map_keggid(sig_metabolites)
count_changes = metcountplot(df_metclass=sig_metabolites_kegg_id, metclass='sub_class', plotting=TRUE, thres_logfC = 0.5)
count_changes$plotimg
metenrichment = metclassenrichment(df_metclass=sig_metabolites_kegg_id, refmet_names, metclass= 'sub_class',enrich_stats="HG",no=1)
plot_met_enrichment(metenrichment, metclass='sub_class',"HG", no=1)

The rest of the pipeline can be followed similarly.



metabolomicsworkbench/MetENP documentation built on April 12, 2025, 7:55 p.m.