knitr::opts_chunk$set(echo = TRUE)
MetENP is a R package that enables detection of significant metabolites from metabolite information (names or names and concentration along with metadata information) and provides
Enrichment score of metabolite class, Maps to pathway of the species of choice, Calculate enrichment score of pathways, Plots the pathways and shows the metabolite increase or decrease Gets gene info, reaction info, enzyme info
USER_HOME=Sys.getenv("HOME"); USER_PWD=Sys.getenv("PWD"); #.libPaths( c( .libPaths(), paste0(USER_HOME, "/.local/R") )) .libPaths( c( .libPaths(), paste0(USER_PWD, "/../../R") )); # suitable path to where MetENP R package is installed #.libPaths( c( .libPaths(), paste0("../../R") )); # suitable path to where MetENP R package is installed library(MetENP) library(jsonlite)
Data can be fetched via study id from Metabolomics Workbench through getmwstudies. For metabolomics data, 'data' is used, for metadata, 'factors' is used. If users want to upload their own data, please do at the bottom of the page ST000915 is the study of biomarkers of nonalcoholic fatty liver disease (NAFLD) progression.
metdata = getmwstudies('ST000915', 'data') knitr::kable(head(metdata))
Associate metabolomics data to the refmet class
refmet_class= convert_refmet(metdata) knitr::kable(head(refmet_class))
metadata = getmwstudies('ST000915', 'factors') knitr::kable(head(metadata))
Find the factors you would want to compare. Multiple factors/experimental groups (independent variables) are formatted in multiple columns but you can get information on all the experimental groups by "factors" column. For t-test use the independent variables in the same column. For comparing multipe independent variables use anova by anova_ana function.
For ex: in this study, we want to compare Diagnosis experimental methods, so we will compare Cirrhosis and Normal samples
unique(metadata$factors)
Find different type of analysis mode. This is important, because some studies may have different analysis types, and different analysis types detect different metabolites. In this study, there is only one analysis type.
### Find the analysis mode unique(metdata$analysis_summary)
Significant metabolites can be found by using significant_met function. The parameters are:
metabolomics_data: metabolomics data associated to refmet class
met_col: column with metabolite names
analysis_type: type of analysis ex-GCMS, HILIAC positive ion mode.
metadata: Metadata
factor1: first independent variable
factor2: second independent variable
factor_col: column name of the independent variables
sample_col: the column name having samples
p_adjust: Method for p value adjustment, i.e. "fdr"
normalization: method for normalization a) "half_of_min": where the NAs are replaced by half of min values in the data b) "remove_NAs": where Cols with NAs values are removed and c) "50percent": where cols with more than 50% NAs values are removed, and then the remaining NA values are replaced with half_of_min
half_of_min is ideal when you wish to see which metabolites were present in either group. Very high fold change would mean it was present in either group.
In this example, we show significant metabolite filtered according to pvalue threshold of 0.05 and log2 fold change of 0.5, p adjust method of 'fdr' and we use the normalization method "50percent"
stats_metabolites = significant_met(metabolomics_data=refmet_class, met_col="metabolite_name",analysis_type=c("Core G Fatty acids/Eicosanoids","Core J Sterols","Core K Prenols/Cardiolipins","Core E Neutral Lipids","Core I Sphingolipids","Core H Phospholipids"), metadata=metadata, factor1='Cirrhosis', factor2=c('Normal'), factor_col='Diagnosis',sample_col='local_sample_id', p_adjust='fdr',normalization="50percent") sig_metabolites = stats_metabolites[which(stats_metabolites[,"pval"] <= 0.05&abs(stats_metabolites[,"log2Fold_change"])>0.5),] knitr::kable(head(sig_metabolites))
We will use the same thresholds as used for determining significant metabolites
plot_volcano(stats_metabolites, thres_pval= 0.05,thres_log2foldchange = 0.5, TRUE)
This function maps metabolite to metabolite class
In this example, we will go forward with significant metabolite obtained by t-test
sig_metabolites_kegg_id= map_keggid(sig_metabolites) knitr::kable(head(sig_metabolites_kegg_id))
setdiff(sig_metabolites$refmet_name, sig_metabolites_kegg_id$refmet_name)
You may choose from sub_class, main_class and super_class
count_changes = metcountplot(df_metclass=sig_metabolites_kegg_id, metclass='sub_class', plotting=TRUE, thres_logfC = 0.0, updown_fillcolor=c("red", "green"))
To see the Plot
count_changes$plotimg
Calculate the enrichment score of each metabolite class. Enrichment score is calculated through hypergeometric method. One can specify the no. of significant metabolites in a class while calculating the enrichment score. We advice to use the number of mtabolites in each class as 3 or more. But if someone just wants to know the enrichment score and rest of the information of all the metabolites, then they can choose the number as 1.
metenrichment = metclassenrichment(df_metclass=sig_metabolites_kegg_id,refmet_class, metclass="sub_class",enrich_stats="HG",no=1, debug = 0) knitr::kable(head(metenrichment))
plot_met_enrichment(metenrichment, "sub_class","HG", no=3)
First parameter is the metenrichment dataframe, while second parameter is the species code from KEGG
Here the subject species is Homo sapiens, and the KEGG species annotation for KEGG is 'hsa'
met_path = met_pathways(df_metenrichment = metenrichment, 'hsa') knitr::kable(head(met_path))
Find metabolites for which no pathways were registered in Kegg and/or no kegg id was found
setdiff(metenrichment$Metabolite,unique(met_path$Metabolite))
Once we have the pathway information, we can calculate enrichment score of pathways. Again, here I have used hypergeometric (HG) score. For hypergeometric score, we need to know the total number of kegg compounds that are linked to kegg pathways.
N = total no. of compounds/metabolites linked to all kegg pathways or the total number of metabolites detected in a study.
In the former case, this step might take long time, so I advice to run the script comp_linkedto_pathways() just the first time or after 6 months or so if desired to run the pipeline again. Save the result from comp_linkedto_pathways() and load it. Loading from saved file would save time for another analysis with another study. However the package takes care of loading. Save the data in the data folder.
L = No. of significant metabolites detected in a study
M = No. of significant metabolites detected in a metabolite class or pathway
k = Total no. of metabolites detected in a metabolite class/pathway
p-value = phyper(M-1, L, N-L, k, lower.tail=FALSE) # as in the actual R code # originally, it was wriiten as: phyper(M, L, N-L, K)
This function also utilizes korg dataset from pathview package.
In this example, the background set N is the total number of metabolites in a study.
load('../data/ls_path.RData') load('../data/korg.RData') kegg_es = path_enrichmentscore(met_path,sig_metabolite_kegg_id=sig_metabolite_kegg_id,ls_path=ls_path,refmet_class=refmet_class,sps='hsa',padj='fdr', kegg_comp_path=FALSE) knitr::kable(head(kegg_es))
The pathway network is such that it shows metabolites that are connected to different pathways and same metabolite in different pathway. Color of nodes of metabolites are according to the fold change of metabolites (low expression in green and high in red) and size of pathway nodes (square nodes) are according to the number of branches (meaning no of metabolites). All metabolite are written in blue
plot_pathway_networks (met_path,kegg_es, TRUE)
plot_heatmap(met_path, shorten_name=TRUE,refmet_name=FALSE, xaxis=8, yaxis=6)
dotplot_met_class_path (met_path, kegg_es,"sub_class",xaxis=8,yaxis=6)
Here we get the information of genes involved in enriched pathways for specified organism
met_gene_info = enzyme_gene_info (metenrichment, "hsa","sub_class") knitr::kable(head(met_gene_info))
rclass_info = react_substrate(met_gene_info) knitr::kable(head(rclass_info))
met_gene_info2=data.table::data.table(rclass_info)[,lapply(.SD, function(x) toString(unique(x))), by = 'Metabolite'] knitr::kable(head(met_gene_info2))
Load your own dataset. If your dataset has sample name in 1st column, enter the third parameter as true. If the dataset has 1st column as metabolite names, enter the third parameter as false. Please check the example files in the data folder of the package.
data_own <- separate_data('./human_cachexia.csv',"data",TRUE) knitr::kable(head(data_own))
metadata_own <- separate_data('./human_cachexia.csv',"metadata",TRUE) knitr::kable(head(metadata_own))
refmet_names=convert_refmet(data_own) knitr::kable(head(refmet_names))
stats_metabolites = significant_met_own(metabolomics_data=refmet_names,met_col="metabolite_name",metadata=metadata_own, factor1='cachexic', factor2='control', factor_col='Muscle loss',sample_col='local_sample_id', p_adjust='fdr',normalization="50percent")
sig_metabolites = stats_metabolites[which(stats_metabolites[,"pval"] <= 0.05&abs(stats_metabolites[,"log2Fold_change"])>0.5),]
plot_volcano(stats_metabolites, thres_pval= 0.05,thres_log2foldchange = 0.5, TRUE)
sig_metabolites_kegg_id= map_keggid(sig_metabolites)
count_changes = metcountplot(df_metclass=sig_metabolites_kegg_id, metclass='sub_class', plotting=TRUE, thres_logfC = 0.5) count_changes$plotimg
metenrichment = metclassenrichment(df_metclass=sig_metabolites_kegg_id, refmet_names, metclass= 'sub_class',enrich_stats="HG",no=1)
plot_met_enrichment(metenrichment, metclass='sub_class',"HG", no=1)
The rest of the pipeline can be followed similarly.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.