MEREDITH is an approach to integrate multiple omics data sets (e.g. RNAseq, DNA methylation, miRNA and Copy Number variations) and progect data points into 2D-map by performing tSNE1.
Please refer Taskesen et al. 2016 for detail methods.
The package contains two functions mederith
and dbscan_SH
with three dependencies. Please install them first if you don't have those libraries. The package also include an example data sets from TCGA 2. They will be used in the example section.
install.packages("fpc")
install.packages("cluster")
install.packages("Rtsne")
library(fpc)
library(cluster)
library(Rtsne)
Please make sure that the latest version of Rtsne is installed.
install.packages("devtools")
library(devtools)
install_github("Kyoko-wtnb/meRedith")
library(meRedith)
The package contains 5 data sets; 4 data frames for omic data and one for features of samples. The data sets are based on tcga stage 3 used in Taskesen et al. 2016, but only first 500 features are extracted. The number of sample is the same (4434 samples).
To check available data,
data(package="meRedith")
dataGE
: normalized RPKM of RNAseqdataME
: normalized DNA methylationdataMIR
: normalized expression level or microRNAdataCN
: normalized copy number variationssamples
: cancer labels of samplesWhen you perform meredith with your own data, please prepare either data.frame or matrix. All data sets have to have the same number of features. In the example, all data sets have the same number of sample size but the number of features can varies. Most importantly, input data should be normalized propery.
Since example data sets have samples in column and features in row, data need to be transferred as Rtsne
uses row as data points. Note that input data
has to be list object. When you perform MEREDITH for one data set, please provide as list like data=list(your.data)
. Here we perform Rtsne
one time as an example, however, we highly recommend to perform at least 100 times to optimize cost function (it might take some time).
mer.out <- meredith(data=list(dataGE, dataME, dataCN, dataMIR), transfer = T, nTSNE = 1)
Perform DBSCAN for 2D-map with optimization of silhouette score.
clst.out <- dbscan_SH(mer.out$Y, showplot=TRUE)
This visualization is not part of the function, however, here are some example to make density map with ggplot2.
#Preparation of labels
mer.plot <- data.frame(ID=samples$ID[match(colnames(dataGE), samples$ID)], x=mer.out$Y[,1], y=mer.out$Y[,2],
cancerlabel=samples$cancerlabel[match(colnames(dataGE), samples$ID)], cluster=clst.out)
clst.label.plot <- data.frame(cluster=paste("Cluster", unique(mer.plot$cluster[mer.plot$cluster!=0]), sep=""),
x=as.numeric(by(mer.plot$x[mer.plot$cluster!=0], mer.plot$cluster[mer.plot$cluster!=0], median)),
y=as.numeric(by(mer.plot$y[mer.plot$cluster!=0], mer.plot$cluster[mer.plot$cluster!=0], max)))
cancer.label.plot <- data.frame(cancerlabel=sort(unique(as.character(mer.plot$cancerlabel))),
x=as.numeric(by(mer.plot$x, mer.plot$cancerlabel, median)),
y=as.numeric(by(mer.plot$y, mer.plot$cancerlabel, median)))
#density plot
library(ggplot2)
# with cluster label
ggplot(mer.plot, aes(x=x,y=y))+stat_density2d(data=mer.plot[mer.plot$cluster!=0,],aes(fill=factor(cluster)), alpha=0.15, geom="polygon", linetype=0, n=100, show.legend = F)+geom_point(aes(color=cancerlabel), alpha=0.8)+geom_text(data=clst.label.plot, aes(x=x,y=y, label=cluster), alpha=0.8, size=5, show.legend=F)+scale_color_manual(values=rainbow(length(unique(mer.plot$cancerlabel))))+theme_bw()+theme(legend.position="none")
# with cancer label
ggplot(mer.plot, aes(x=x,y=y))+stat_density2d(data=mer.plot[mer.plot$cluster!=0,], aes(fill=factor(cluster)), alpha=0.15, geom="polygon", linetype=0, n=100, show.legend = F)+geom_point(aes(color=cancerlabel), alpha=0.8)+geom_text(data=cancer.label.plot, aes(x=x,y=y, label=cancerlabel), alpha=0.8, size=5, show.legend=F)+scale_color_manual(values=rainbow(length(unique(mer.plot$cancerlabel))))+theme_bw()+theme(legend.position="none")
# with both labels
ggplot(mer.plot, aes(x=x,y=y))+stat_density2d(data=mer.plot[mer.plot$cluster!=0,], aes(fill=factor(cluster)), alpha=0.15, geom="polygon", linetype=0, n=100, show.legend = F)+geom_point(aes(color=cancerlabel), alpha=0.5)+geom_text(data=clst.label.plot, aes(x=x,y=y, label=cluster), alpha=0.8, size=5, show.legend=F)+geom_text(data=cancer.label.plot, aes(x=x,y=y, label=cancerlabel), size=4, fontface="bold", show.legend=F)+scale_color_manual(values=rainbow(length(unique(mer.plot$cancerlabel))))+theme_bw()+theme(legend.position="none")
Results of full data sets of TGCA can be browsed at http://pancancer-map.ewi.tudelft.nl/.
Please cite the following airtive when you use MEREDITH.
Taskesen, E., Huisman, S.M.H., Mahfouz, A., Krijthe, J.K., de Ridder, J., van de Stolpe, A., van den Akker, E., Verheagh, W., Reinders, M.J.Y. 2016. Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics. Scientific Reports. 6:24949. doi:10.1038/srep24949.
Kyoko Watanabe: k.watanabe@vu.nl , Erdogan Taskesen: e.taskesen@vu.nl
Dept. Complex Trait Genetics (CTGlab) , VU University Amsterdam
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.