Independent component analysis (ICA) is an important alternative to classical statistical approaches for non-targeted metabolomics data. It extends the concept of regular correlation (e.g. in PCA, ASCA and PLS-DA) to statistical dependence by capturing higher-order dependencies. However, its algorithmic instability (output variations between different algorithm runs) and the biological validity of its components have often been overlooked when it is applied to complex metabolomics data. MetICA addresses these problems by gathering ICs estimated from multiple algorithm runs and from bootstrapped datasets, then clustering them to find the most representative components. While evaluating algorithmic stability, MetICA also suggests multiple criteria for selecting the number of components and for ranking the extracted components.
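To make the difference between correlation and statistical dependence concrete, here is a minimal base-R sketch. It is a toy illustration only, not part of MetICA: the variables x and y are made up for the example. It shows two variables that are almost uncorrelated yet fully dependent, which is the kind of structure a purely correlation-based method cannot see.

```r
# Toy illustration (not MetICA code): near-zero correlation does not imply independence.
set.seed(1)
x <- runif(10000, min = -1, max = 1)  # symmetric around zero
y <- x^2                              # fully determined by x

r_linear <- cor(x, y)     # second-order statistic: close to 0
r_square <- cor(x^2, y)   # the relationship only shows up at higher order

print(round(c(linear = r_linear, squared = r_square), 3))
```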
Make sure you have a working development environment:
* Windows: Install Rtools.
* Mac: Install Xcode from the Mac App Store.
* Linux: Install a compiler and various development libraries (details vary across different flavors of Linux).
install.packages("devtools")
library(devtools)
install_github("daniellyz/MetICA2")
library(MetICA)
help(MetICA)
help(validationPlot)
help(MetICA_extract_model)
data(yeast_metabolome)
# Check what is inside the example data:
yeast_metabolome$features[1:10,] # Display metabolic features (m/z values and ids)
yeast_metabolome$X[1:10,1:10] # Display the head of samples x metabolic features data matrix
X = yeast_metabolome$X
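# Alternatively, read the example data from the CSV file (note: this overwrites the yeast_metabolome object loaded above):
yeast_metabolome = read.csv("https://raw.githubusercontent.com/daniellyz/MetICA2/master/inst/Yeast-metabolome.csv")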
features = yeast_metabolome[,c("ID","Mass")]
X = yeast_metabolome[,3:ncol(yeast_metabolome)] # Only keep intensity data for MetICA
rownames(X) = features$ID
X = t(X) # Transpose the data since MetICA accepts samples x variables matrices
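The reshaping steps above can be checked on a small table before running them on the real file. The sketch below uses a hypothetical data frame (toy, with made-up IDs, masses and intensities) that only mimics the assumed CSV layout: an ID column, a Mass column, then one column per sample.

```r
# Hypothetical toy table mimicking the CSV layout: ID, Mass, then one column per sample.
toy <- data.frame(
  ID   = c("f1", "f2", "f3", "f4"),
  Mass = c(101.02, 150.05, 203.11, 255.23),
  s1   = c(10, 20, 30, 40),
  s2   = c(11, 21, 31, 41),
  s3   = c(12, 22, 32, 42)
)

toy_features <- toy[, c("ID", "Mass")]
toy_X <- toy[, 3:ncol(toy)]         # keep intensity columns only
rownames(toy_X) <- toy_features$ID  # features are still in rows here
toy_X <- t(toy_X)                   # now samples x features, as a numeric matrix

print(dim(toy_X))
print(toy_X)
```

After the transpose, row names are sample names and column names are feature IDs, so toy_X["s2", "f3"] returns the intensity of feature f3 in sample s2.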
# Begin a MetICA simulation with 2000 estimated components in total. The samples are not time-dependent, so trends = FALSE. The number of clusters is evaluated between 2 and 15:
M1=MetICA(X,pcs = 10,max_iter = 400,boot.prop = 0.3,max.cluster = 15,trends = FALSE)
The function displays the percentage of variance explained by the chosen number of principal components (pcs). The user can adjust this value.
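To get a feel for how the explained variance depends on the number of principal components before choosing pcs, it can be estimated directly with prcomp from base R. This is a rough sketch on simulated data (toy_X is hypothetical), and it assumes mean-centering without scaling, which may differ from the preprocessing MetICA applies internally.

```r
set.seed(42)
# Hypothetical data: 20 samples x 50 features
toy_X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)

# Assumption: centered, unscaled PCA (MetICA's own preprocessing may differ)
pca <- prcomp(toy_X, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

n_pcs <- 10
cat(sprintf("First %d PCs explain %.1f%% of the variance\n",
            n_pcs, 100 * cum_explained[n_pcs]))
```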
results=validationPlot(M1)
# Based on the validation plots, we choose ics = 8 as the optimal number of components.
M2=MetICA_extract_model(M1,ics = 8)
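The idea of pooling components from many runs and clustering them into a small number of representative components can be sketched in base R. This is an illustration of the general principle only, under stated assumptions: the simulated estimates, the absolute-Pearson-correlation dissimilarity, average linkage, and the "most similar member" rule for picking a representative are choices made for this example and are not claimed to match MetICA's internals.

```r
set.seed(7)
n_samples <- 30
n_true <- 3    # hypothetical number of underlying components
n_runs <- 20   # hypothetical number of algorithm runs

# Simulate estimates: each run returns noisy, randomly sign-flipped copies
templates <- matrix(rnorm(n_samples * n_true), nrow = n_samples)
estimates <- do.call(cbind, lapply(seq_len(n_runs), function(run) {
  noise <- matrix(rnorm(n_samples * n_true, sd = 0.2), nrow = n_samples)
  signs <- sample(c(-1, 1), n_true, replace = TRUE)
  sweep(templates + noise, 2, signs, "*")
}))

# Dissimilarity from absolute correlation (the sign of an ICA component is arbitrary)
similarity <- abs(cor(estimates))
tree <- hclust(as.dist(1 - similarity), method = "average")
cluster_id <- cutree(tree, k = n_true)

# Representative of each cluster: the member most similar on average to the others
centrotypes <- sapply(seq_len(n_true), function(k) {
  members <- which(cluster_id == k)
  members[which.max(rowMeans(similarity[members, members, drop = FALSE]))]
})

print(table(cluster_id))
print(centrotypes)
```

In this toy setting the number of clusters is known in advance. With real data it is not, which is why a range of cluster numbers is evaluated before one is chosen.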
MetICA_extract_model orders the extracted components using different criteria:
library(ade4)
# Plotting the 1st and 8th MetICA components:
s.class(M2$S[,c(1,8)], yeast_metabolome$strains,cellipse=0,cpoint=0,clabel=1.5,add.plot=FALSE,grid=FALSE)
As with the visualization of PCA scores, MetICA scores allow metabolic profiles to be compared. The resulting plot compares the metabolic profiles of 15 yeast strains (biological replicates of the same strain are connected).
If the separation on the first component matches prior knowledge about the yeast strains (e.g. phenotype separation), the variables (mass features) that have high loadings on this component might be potential biomarkers. To extract the top 100:
top100 = order(M2$A1[,1], decreasing = TRUE)[1:100]
top100_loading = M2$A1[top100,1]
cbind(yeast_metabolome$features[top100,], Loadings = top100_loading)
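Note that ordering by the signed loading returns only the largest positive loadings. Because the sign of an ICA component is arbitrary, features with strongly negative loadings may be just as relevant. The sketch below contrasts the two rankings on a small hypothetical loading vector (the names and values are made up). Ranking by absolute value is offered as an option to consider, not as what the package itself recommends.

```r
# Hypothetical loadings of 6 features on one component
loadings <- c(f1 = 0.10, f2 = -0.90, f3 = 0.75, f4 = 0.02, f5 = -0.40, f6 = 0.55)

top_n <- 3
top_signed <- names(loadings)[order(loadings, decreasing = TRUE)[1:top_n]]
top_abs    <- names(loadings)[order(abs(loadings), decreasing = TRUE)[1:top_n]]

print(top_signed)  # largest positive loadings only
print(top_abs)     # largest magnitudes, regardless of sign
```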