If you want to explore and re-create the published analysis without re-running it from scratch, the minimal necessary data is included in the ClusterSignificanceTesting package. This included data is used when knitting this document.
If you want to re-run all of the code from scratch, be aware that it can take some time. Depending on your system, the hematologicalCancers function may take hours to days to run. For this reason, the chunk containing that function is set to eval=FALSE. We recommend running the hematologicalCancers function as demonstrated in this document, saving the output to your local system, and executing the downstream commands after that output has been loaded into R. Please note that, because t-SNE is stochastic, the clustering results may not look the same if t-SNE is re-run.
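The run-once, save, and reload recommendation can be sketched with base R's saveRDS and readRDS. This is only an illustration under stated assumptions: slowAnalysis is a hypothetical stand-in for the long-running hematologicalCancers call, and cacheFile is a hypothetical path, so substitute your own.

```r
# Hypothetical stand-in for the slow call. In the real workflow, replace the
# body with a call to hematologicalCancers().
slowAnalysis <- function() {
  list(example = 1:3)
}

# Hypothetical cache location; choose any path on your local system.
cacheFile <- file.path(tempdir(), "hematologicalCancers.rds")

if (file.exists(cacheFile)) {
  # Reuse the saved result instead of recomputing it.
  data <- readRDS(cacheFile)
} else {
  # Run the analysis once and save the result for later sessions.
  data <- slowAnalysis()
  saveRDS(data, cacheFile)
}
```

With the result cached this way, the downstream commands only need the readRDS branch in later sessions.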
lncRNAs have been reported to play an important role in cellular processes such as gene regulation and to be highly cell type specific. lncRNAs have previously been shown to be differentially expressed in pediatric acute lymphoblastic leukemia with MLL t(11q23) translocations, and specific expression of these lncRNAs was demonstrated to be important in regulating the disease phenotype. For these reasons, we hypothesized that lncRNA expression may also be capable of distinguishing known hematopoietic malignancies and may therefore be important for regulating the specific gene expression driving these diseases.
To test this hypothesis, we used the ClusterSignificance package to test for separations between specific hematological malignancy groups after running the t-SNE algorithm with only lncRNA expression profiles as input. Specifically, we used the GSE13159 dataset, which comprises microarray gene expression data from 6 well characterised hematological malignancies. We extracted 5165 probes detecting lncRNAs from the expression data, representing 4283 individual genes. Dimensionality reduction was then performed using the t-SNE algorithm, with only the lncRNA expression values as input. The ClusterSignificance Pcp method was then used to determine significant separations between the known hematological malignancies.
The results indicate that, of the 21 group comparisons made, 20 exhibited a significant separation with 10000 permutation iterations. The 'normal vs MDS' comparison does not show a significant separation, most likely because t-SNE is unable to separate these two groups clearly due to their relative similarity. These results indicate that lncRNA expression profiles are able to differentiate many common hematological malignancies and, thus, may be important for disease progression and identity.
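As a quick check on the numbers above, the count of pairwise comparisons and the resolution of a permutation test can be worked out in base R. This sketch assumes seven groups (the six malignancies plus the normal samples) and assumes that the smallest p-value distinguishable with a given number of iterations is 1 / iterations; neither assumption is stated explicitly in the text.

```r
# Assumption: seven groups, i.e. the six malignancies plus the normal samples.
nGroups <- 7

# Number of unordered pairs of groups, i.e. pairwise comparisons.
nComparisons <- choose(nGroups, 2)

# Assumption: with this many permutations, p-values cannot be resolved
# below 1 / iterations.
iterations <- 10000
minPValue <- 1 / iterations

nComparisons
minPValue
```

Under these assumptions, choose(7, 2) gives the 21 comparisons reported, and p-values at the lower bound are best read as "less than 1 / iterations" rather than as exact values.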
BiocStyle::markdown()
library(knitr)
library(ClusterSignificanceExtras)

##the function below allows dynamic insertion of the function source code
insert_fun = function(name) {
    read_chunk(
        lines = capture.output(dump(name, '')),
        labels = paste(name, 'source', sep = '-')
    )
}
library(scatterplot3d)
library(printr)
library(grid)
library(gridBase)
library(gridExtra)
data <- hematologicalCancers()
hemCancData <- data[[1]]
group.color <- data[[2]]
prj <- data[[3]]
cl <- data[[4]]
pe <- data[[5]]
pValues <- data[[6]]
mat <- data[[7]]
groups <- data[[8]]
nc <- data[[9]]
lncGenes <- data[[10]]
#number of unique genes
length(unique(lncGenes))
#number of samples in each subtype
table(hemCancData$characteristics_ch1.1)
Below we use the tsnePlots function to plot the t-SNE results from several different perspectives.
tsnePlots(hemCancData, "X")
tsnePlots(hemCancData, "Y")
tsnePlots(hemCancData, "Z")
mat <- as.matrix(hemCancData[ ,c("X1", "X2", "X3")])
groups <- hemCancData$characteristics_ch1.1
prj <- pcp(mat, groups)
plot(prj)
plot(prj, steps=1)
plot(prj, steps=2)
plot(prj, steps=3)
plot(prj, steps=4)
plot(prj, steps=5)
plot(prj, steps=6)
cl <- classify(prj)
plot(cl, comparison=names(getData(cl, "scores"))[1])
plot(cl, comparison=names(getData(cl, "scores"))[2])
plot(cl, comparison=names(getData(cl, "scores"))[3])
plot(cl, comparison=names(getData(cl, "scores"))[4])
plot(cl, comparison=names(getData(cl, "scores"))[5])
plot(cl, comparison=names(getData(cl, "scores"))[6])
plot(cl, comparison=names(getData(cl, "scores"))[7])
plot(cl, comparison=names(getData(cl, "scores"))[8])
plot(cl, comparison=names(getData(cl, "scores"))[9])
plot(cl, comparison=names(getData(cl, "scores"))[10])
plot(cl, comparison=names(getData(cl, "scores"))[11])
plot(cl, comparison=names(getData(cl, "scores"))[12])
plot(cl, comparison=names(getData(cl, "scores"))[13])
plot(cl, comparison=names(getData(cl, "scores"))[14])
plot(cl, comparison=names(getData(cl, "scores"))[15])
plot(cl, comparison=names(getData(cl, "scores"))[16])
plot(cl, comparison=names(getData(cl, "scores"))[17])
plot(cl, comparison=names(getData(cl, "scores"))[18])
plot(cl, comparison=names(getData(cl, "scores"))[19])
plot(cl, comparison=names(getData(cl, "scores"))[20])
plot(cl, comparison=names(getData(cl, "scores"))[21])
pValues <- as.data.frame(pvalue(pe))
colnames(pValues) <- "pValue"
pValues
plot(pe, comparison=names(getData(pe, "scores.vec"))[1])
plot(pe, comparison=names(getData(pe, "scores.vec"))[2])
plot(pe, comparison=names(getData(pe, "scores.vec"))[3])
plot(pe, comparison=names(getData(pe, "scores.vec"))[4])
plot(pe, comparison=names(getData(pe, "scores.vec"))[5])
plot(pe, comparison=names(getData(pe, "scores.vec"))[6])
plot(pe, comparison=names(getData(pe, "scores.vec"))[7])
plot(pe, comparison=names(getData(pe, "scores.vec"))[8])
plot(pe, comparison=names(getData(pe, "scores.vec"))[9])
plot(pe, comparison=names(getData(pe, "scores.vec"))[10])
plot(pe, comparison=names(getData(pe, "scores.vec"))[11])
plot(pe, comparison=names(getData(pe, "scores.vec"))[12])
plot(pe, comparison=names(getData(pe, "scores.vec"))[13])
plot(pe, comparison=names(getData(pe, "scores.vec"))[14])
plot(pe, comparison=names(getData(pe, "scores.vec"))[15])
plot(pe, comparison=names(getData(pe, "scores.vec"))[16])
plot(pe, comparison=names(getData(pe, "scores.vec"))[17])
plot(pe, comparison=names(getData(pe, "scores.vec"))[18])
plot(pe, comparison=names(getData(pe, "scores.vec"))[19])
plot(pe, comparison=names(getData(pe, "scores.vec"))[20])
plot(pe, comparison=names(getData(pe, "scores.vec"))[21])
Here we subset the normal and MDS patient samples from the t-SNE results and plot only those samples using the normalMDS function, which makes them easier to inspect. The results indicate that, after t-SNE, the normal and MDS samples form a reasonably homogeneous cluster; it is therefore to be expected that ClusterSignificance would not detect a significant separation of these groups.
normalMDS(hemCancData, 1)
normalMDS(hemCancData, 2)
normalMDS(hemCancData, 3)
#format pvalues for easier plotting
pValues <- as.data.frame(round(pvalue(pe), digits=6))
colnames(pValues) <- "pValues"
pValues$pValues <- ifelse(pValues$pValues == 0.0001, paste("<", 0.0001, sep=""), pValues$pValues)

#adjust layout for 2 plots
layout(matrix(c(1,2), nrow=1), widths=c(7,3))

#plot
plot(prj, steps=2, alpha=0.75, cex.lab = 1.5, cex.axis = 1)

#table
frame()
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot)
grob <- tableGrob(pValues)
grid.draw(grob)
popViewport(3)