README.md

MetICA: Independent component analysis for high-resolution mass-spectrometry based metabolomics

Context

ICA is an important alternative to classical statistical approaches for non-targeted metabolomics data. It extends the concept of regular correlation (e.g. in PCA, ASCA and PLS-DA) to statistical dependence by capturing higher-order dependencies. However, its algorithmic instability (output variations between different algorithm runs) and the biological validity of its components have been overlooked when it is applied to complex metabolomics data. MetICA addresses these problems by gathering ICs estimated from multiple algorithm runs and from bootstrapped datasets, then clustering them to find the most representative components. While evaluating algorithmic stability, MetICA also suggests multiple criteria to select the correct number of components and to rank the extracted components.
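
The gathering-and-clustering idea can be illustrated with a minimal base-R sketch. It does not call MetICA or any ICA algorithm: the repeated runs are simulated as noisy, sign-flipped, reordered copies of three made-up sources. The distance 1 - |correlation| and the average-linkage clustering are assumptions chosen for illustration, not necessarily what MetICA uses internally.

```r
set.seed(1)
n_samples <- 60
n_sources <- 3
n_runs    <- 20

# Three non-Gaussian "true" sources (samples x sources); purely synthetic.
S_true <- cbind(runif(n_samples) - 0.5,
                rexp(n_samples) - 1,
                sign(rnorm(n_samples)) * rexp(n_samples))

# Simulate repeated runs: each run returns the sources in a random order,
# with random signs and a little estimation noise (a stand-in for ICA output).
runs <- lapply(seq_len(n_runs), function(i) {
  perm  <- sample(n_sources)
  signs <- sample(c(-1, 1), n_sources, replace = TRUE)
  est   <- sweep(S_true[, perm], 2, signs, `*`)
  est + matrix(rnorm(n_samples * n_sources, sd = 0.1), nrow = n_samples)
})
all_ics <- do.call(cbind, runs)  # samples x (runs * sources)

# Distance between estimated components: 1 - |correlation| ignores sign flips.
d <- as.dist(1 - abs(cor(all_ics)))

# Hierarchical clustering, cut into as many clusters as there are sources.
clusters <- cutree(hclust(d, method = "average"), k = n_sources)
table(clusters)
```

In this toy setting each cluster should collect one estimate per run. A compact cluster corresponds to a component that is reproduced across runs, which is the sense in which clustering helps find representative components.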

Install devtools (only if it has not been installed)

Make sure you have a working development environment. Windows: install Rtools. Mac: install Xcode from the Mac App Store. Linux: install a compiler and various development libraries (details vary across different flavors of Linux).

install.packages("devtools")

Installation from GitHub using R (with devtools)

library(devtools)
install_github("daniellyz/MetICA2")
library(MetICA)

Check the function manuals before starting

help(MetICA)
help(validationPlot)
help(MetICA_extract_model)

An example of data analysis using MetICA

Load yeast metabolomics data:

data(yeast_metabolome) 
# Check what is inside the example data:
yeast_metabolome$features[1:10,]  # Display metabolic features (m/z values and ids)
yeast_metabolome$X[1:10,1:10] # Display the head of samples x metabolic features data matrix
X = yeast_metabolome$X

The example data can also be loaded from a .csv file:

yeast_metabolome = read.csv("https://raw.githubusercontent.com/daniellyz/MetICA2/master/inst/Yeast-metabolome.csv")
features = yeast_metabolome[,c("ID","Mass")]
X = yeast_metabolome[,3:ncol(yeast_metabolome)] # Only keep intensity data for MetICA
rownames(X) = features$ID
X = t(X) # Transpose the data since MetICA accepts samples x variables matrices
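
The reshaping step above can be checked on a tiny table. The following sketch uses a hypothetical three-feature, two-sample data frame (the names toy, s1 and s2 are made up) with the same column layout as the .csv file, assuming the first two columns are ID and Mass and the remaining columns are samples.

```r
# Hypothetical features x samples table: ID, Mass, then one column per sample.
toy <- data.frame(ID   = c("f1", "f2", "f3"),
                  Mass = c(101.02, 202.05, 303.10),
                  s1   = c(1, 2, 3),
                  s2   = c(4, 5, 6))

X_toy <- toy[, 3:ncol(toy)]   # keep intensity columns only
rownames(X_toy) <- toy$ID     # label rows with feature ids
X_toy <- t(X_toy)             # transpose: now samples x features (a matrix)

dim(X_toy)                    # rows = samples, columns = features
colnames(X_toy)               # feature ids
```

After the transpose, each row is a sample and each column a metabolic feature, which is the orientation the text says MetICA expects.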

Run MetICA simulations:

# Begin a MetICA simulation with 2000 estimated components in total.
# The samples are not time-dependent, so trends = FALSE.
# Numbers of clusters are evaluated between 2 and 15:
M1 = MetICA(X, pcs = 10, max_iter = 400, boot.prop = 0.3, max.cluster = 15, trends = FALSE)

The function displays the percentage of variance explained for the chosen number of pcs. The user can modify this value.
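
To get a feel for how the number of pcs relates to explained variance before running MetICA, a plain PCA can be inspected. This is a sketch on random data, assuming the reported percentage is the cumulative variance of a centered, unscaled PCA; MetICA's own preprocessing may differ, and X_demo and the 80% threshold are placeholders.

```r
set.seed(42)
X_demo <- matrix(rnorm(30 * 50), nrow = 30)  # 30 samples x 50 features (synthetic)

# Centered PCA; scaling is left off here (an assumption).
pca <- prcomp(X_demo, center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained by the first k components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
round(100 * var_explained[1:10], 1)

# Smallest number of components reaching an (arbitrary) 80% threshold.
n_pcs <- which(var_explained >= 0.8)[1]
n_pcs
```

With real data, replacing X_demo by the samples x features matrix X gives a rough idea of which value of pcs retains most of the variance.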

Some plots to decide the number of MetICA components:

results=validationPlot(M1)

Extract the MetICA model

# According to the validation plots, we chose ics = 8 as the optimal number of components.
M2=MetICA_extract_model(M1,ics = 8)

The function orders the extracted components using different criteria.

Biological interpretation

library(ade4)
# Plotting the 1st and 8th MetICA components:
s.class(M2$S[,c(1,8)], yeast_metabolome$strains, cellipse=0, cpoint=0, clabel=1.5, add.plot=FALSE, grid=FALSE)

Similar to the visualization of PCA scores, MetICA also allows the comparison of metabolic profiles. The resulting plot compares the metabolic profiles of 15 yeast strains (biological replicates of the same strain are connected).

If the separation on the first component matches prior knowledge about the yeast strains (e.g. phenotype separation), the variables (mass features) that have high loadings on this component might be potential biomarkers. To extract the 100 features with the highest loadings:

top100 = order(M2$A1[,1], decreasing = TRUE)[1:100]
top100_loading = M2$A1[top100,1]
cbind(yeast_metabolome$features[top100,], Loadings = top100_loading)
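
Note that ordering by signed value selects only the largest positive loadings. If strongly negative loadings are also of interest, ranking by absolute value is one option. The toy vector below (hypothetical feature names and values, not taken from the yeast data) shows the difference between the two rankings.

```r
# Hypothetical loadings of five features on one component.
loadings <- c(f1 = 0.2, f2 = -1.5, f3 = 0.9, f4 = -0.1, f5 = 1.1)

# Signed ranking: largest positive loadings first.
top_signed <- names(loadings)[order(loadings, decreasing = TRUE)[1:2]]

# Absolute ranking: largest magnitude first, regardless of sign.
top_abs <- names(loadings)[order(abs(loadings), decreasing = TRUE)[1:2]]

top_signed  # "f5" "f3"
top_abs     # "f2" "f5"
```

Applied to the model above, the absolute ranking would amount to replacing M2$A1[,1] with abs(M2$A1[,1]) inside order(); whether that is appropriate depends on how the sign of the component is interpreted.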

daniellyz/MetICA2 documentation built on May 16, 2019, 11:11 p.m.