If you would like to explore and re-create the published analysis without re-running everything from scratch, the minimal necessary data is included in the ClusterSignificanceTesting package. This included data is used when knitting this document.
If you do want to re-run the whole analysis from scratch, be warned that it can take some time. Depending on your system, the zeroOrHundred, specificityTest and sensitivityTest functions may take hours to days to run. For this reason, the chunks containing those functions are set to eval=FALSE. We recommend running these functions, as demonstrated in this document, in a separate R session, saving the output to your local system, and executing the downstream commands once that output has been loaded into R.
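One way to follow this workflow is to serialize the results with base R's saveRDS and reload them later with readRDS. The sketch below is only an illustration: the `results` object is a small placeholder rather than real test output, and the file name is hypothetical.

```r
# Placeholder standing in for the output of a long-running function such as
# sensitivityTest(); the structure of the real object may differ.
results <- list(pValues = c(0.001, 0.002), note = "placeholder")

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
path <- file.path(tempdir(), "sensitivityTestResults.rds")

# In the long-running session: write the object to disk.
saveRDS(results, file = path)

# In a later session: read the object back and continue downstream.
reloaded <- readRDS(path)
identical(results, reloaded)
```

saveRDS stores a single object without its name, so the reloaded object can be assigned to whatever variable the downstream code expects.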
The specificity and sensitivity of the ClusterSignificance package were tested using two datasets representing two different known outcomes. The first dataset, called zero, was designed to test the sensitivity of ClusterSignificance and therefore included only groups with a true separation. It was produced with the zeroOrHundred function with the overlap argument set to 0. In summary, values for each group and each of the two dimensions were drawn at random from the uniform distribution. The ranges for the drawn data points were set to have 0% overlap in both dimensions, i.e. none of the data points from group A fell within the range of group B.
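The idea of drawing two groups with non-overlapping ranges can be sketched in base R as follows. This is not the zeroOrHundred implementation; the bounds, the number of points, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)  # for reproducibility of this sketch only

n <- 10  # points per group (assumed)

# Group A is drawn from (0, 1) and group B from (2, 3) in both dimensions,
# so their ranges cannot overlap. These bounds are illustrative assumptions.
groupA <- cbind(dim1 = runif(n, 0, 1), dim2 = runif(n, 0, 1))
groupB <- cbind(dim1 = runif(n, 2, 3), dim2 = runif(n, 2, 3))

# Per dimension, confirm that the maximum of A lies below the minimum of B.
noOverlap <- apply(groupA, 2, max) < apply(groupB, 2, min)
all(noOverlap)
```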
To test the specificity of the package, a similar design was used, except that the group ranges were 100% overlapping, i.e. the overlap argument to zeroOrHundred was set to 100. In addition, the range of one group was required to be completely contained within the other group's range. This dataset is referred to as hundred throughout the document and code.
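A minimal sketch of one way to obtain a group whose range is contained within another group's range is given below. It guarantees containment by construction, which may differ from how zeroOrHundred achieves it; the bounds and variable names are assumptions.

```r
set.seed(2)  # for reproducibility of this sketch only

n <- 10  # points per group (assumed)

# The wide group is drawn from (0, 10) in both dimensions (assumed bounds).
wide <- cbind(dim1 = runif(n, 0, 10), dim2 = runif(n, 0, 10))

# The narrow group is drawn, per dimension, from inside the observed range
# of the wide group, so containment holds by construction.
narrow <- apply(wide, 2, function(x) runif(n, min(x), max(x)))

# Check containment in every dimension.
contained <- all(apply(narrow, 2, min) >= apply(wide, 2, min)) &&
    all(apply(narrow, 2, max) <= apply(wide, 2, max))
contained
```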
Finally, the sensitivity and specificity testing datasets were generated with the number of points per group spanning from 5 to 100, with 10 repetitions per number of points. The permute step from ClusterSignificance was run using 10^4 iterations for each input matrix within the datasets, and the p-value was recorded.
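To illustrate how a permutation p-value is obtained in general, the sketch below runs a simple label-permutation test on the difference in group means. It is not the ClusterSignificance permute implementation, which uses its own permutation scheme and separation scores; the data, the test statistic, and the reduced iteration count are all assumptions made to keep the example short and fast.

```r
set.seed(3)  # for reproducibility of this sketch only

# Two clearly separated groups in one dimension (illustrative data).
values <- c(runif(10, 0, 1), runif(10, 2, 3))
labels <- rep(c("grp1", "grp2"), each = 10)

# Test statistic: absolute difference between the group means.
meanDiff <- function(v, l) abs(mean(v[l == "grp1"]) - mean(v[l == "grp2"]))

observed <- meanDiff(values, labels)

# Null distribution: recompute the statistic after shuffling the labels.
iterations <- 1000  # reduced for speed in this sketch
permuted <- replicate(iterations, meanDiff(values, sample(labels)))

# Add-one estimate, so the p-value is never exactly zero.
pValue <- (sum(permuted >= observed) + 1) / (iterations + 1)
pValue
```

The smallest p-value this estimate can return is 1 / (iterations + 1), which is why the iteration count limits the resolution of the test.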
Multiple constraints were placed on these datasets as they were generated. The constraints were specified in the zeroOrHundred function, depending on which overlap was desired. zeroOrHundred calls the matrixTests function, which calls additional functions to run each specified check. If any specified check failed, matrixTests re-generated the matrix until it conformed to all checks. The checks run for each dataset are listed below:
zero/sensitivity dataset:
1. Overlap: The group overlap, as calculated using the calculateOverlap function, should correspond to the desired overlap. In principle, because each matrix was drawn at random from the uniform distribution, this check confirms that the actual overlap matches the desired overlap.
2. Identical dimensions: Generated matrices in which any single dimension of any group is identical to a dimension of another group or of the same group are excluded, because such data would be highly unlikely to arise in reality. This is checked using the checkIdenticalDims function.
3. Unique replicates: Since we run 10 replicates per number of points, each replicate should be unique. Replicates are therefore temporarily saved as they are generated, and each additional replicate is compared to the existing matrices for that number of points. If an identical matrix has already been generated, the new matrix fails the checkIdenticalMatrix test.
4. Non-repeating points: The checkIdenticalPoints function checks that the data points within a matrix are not identical, i.e. a matrix should not consist only of 1's or only of 2's, etc. The identicalPointsThreshold argument for this function is set to 1, i.e. all points must be unique, for the zero and hundred datasets.
hundred/specificity dataset:
1. All checks listed above that are run for the zero dataset.
2. Interspersion: The points should be reasonably evenly distributed within each group's range. See the Functions.Rmd file for an example. Interspersion is checked using the interspersion function, and the interspersion cut-off was set to 50.
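The regenerate-until-all-checks-pass pattern described above can be sketched in base R as follows. The helper functions here are hypothetical stand-ins and not the package's matrixTests, checkIdenticalDims, checkIdenticalMatrix or checkIdenticalPoints functions; the matrix layout (rows as points, columns as dimensions) and the coarse integer values are also assumptions, chosen so that failed checks actually occur.

```r
set.seed(4)  # for reproducibility of this sketch only

# Hypothetical generator: n points in 2 dimensions with coarse integer values.
drawMatrix <- function(n) matrix(sample(1:50, n * 2, replace = TRUE), ncol = 2)

# Hypothetical check: no two dimensions (columns) are identical.
dimsDiffer <- function(m) !any(duplicated(t(m)))

# Hypothetical check: every value in the matrix is unique.
valuesUnique <- function(m) !any(duplicated(as.vector(m)))

# Hypothetical check: the matrix differs from all previously kept replicates.
isNewReplicate <- function(m, existing) {
    !any(vapply(existing, identical, logical(1), m))
}

# Generate three replicates, re-drawing each until all checks pass.
replicates <- list()
attempts <- 0
while (length(replicates) < 3) {
    attempts <- attempts + 1
    candidate <- drawMatrix(5)
    if (dimsDiffer(candidate) && valuesUnique(candidate) &&
        isNewReplicate(candidate, replicates)) {
        replicates[[length(replicates) + 1]] <- candidate
    }
}
attempts
```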
    BiocStyle::markdown()
    library(printr)
    library(knitr)

    ## the function below allows dynamic insertion of the function source code
    insert_fun = function(name) {
        read_chunk(
            lines = capture.output(dump(name, '')),
            labels = paste(name, 'source', sep = '-')
        )
    }

    library(ClusterSignificanceExtras)

    # generate the zero dataset, i.e. range overlap = 0
    zero <- zeroOrHundred(overlap=0, save=FALSE, verbose=FALSE)

    # generate the hundred dataset, i.e. range overlap = 100
    hundred <- zeroOrHundred(overlap=100, save=FALSE, verbose=FALSE)

    # set cores equal to the number of cores to use.
    # Note this will not work across nodes.
    specificityTestResults <- specificityTest(cores=1, save=FALSE)
    sensitivityTestResults <- sensitivityTest(cores=1, save=FALSE)
The plots below show example matrices from the zero and hundred datasets with 10 and 100 points per group. The plots for the zero dataset indicate, as expected, that group 1 and group 2 have zero spatial overlap in either dimension. The plots for the hundred dataset indicate that the data points for group 1 are entirely contained within the range that group 2 occupies in both dimensions.
    # zero dataset, 10 points per group
    mat <- zero[['10.points']][,,'1.repitition']
    groups <- makeGroups(list(mat), c("grp1", "grp2"))
    visualizeOverlaps(list(mat), groups, plotType="points", cex=3)

    # zero dataset, 100 points per group
    mat <- zero[['100.points']][,,'1.repitition']
    groups <- makeGroups(list(mat), c("grp1", "grp2"))
    visualizeOverlaps(list(mat), groups, plotType="points", cex=3)

    # hundred dataset, 10 points per group
    mat <- hundred[['10.points']][,,'1.repitition']
    groups <- makeGroups(list(mat), c("grp1", "grp2"))
    visualizeOverlaps(list(mat), groups, plotType="points", cex=3)

    # hundred dataset, 100 points per group
    mat <- hundred[['100.points']][,,'1.repitition']
    groups <- makeGroups(list(mat), c("grp1", "grp2"))
    visualizeOverlaps(list(mat), groups, plotType="points", cex=3)
We ran ClusterSignificance using points per group from 5 to 100 at an interval of 10, with 10 repetitions per number of points per group. This was performed with the sensitivityTest and specificityTest functions with their default arguments. The results were plotted with the sensSpecPlot function and indicate that ClusterSignificance's Mlp and Pcp methods both have 100% sensitivity and specificity with this dataset.
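For readers unfamiliar with how such percentages are derived from recorded p-values, the sketch below shows one common way to compute them. The significance threshold is an assumption (the text does not state the one used), the p-values are made-up placeholders rather than results from the package, and sensSpecPlot may compute these quantities differently.

```r
alpha <- 0.05  # assumed significance threshold; not stated in the text

# Made-up placeholder p-values, NOT results produced by the package.
pZero <- c(0.0001, 0.0002, 0.0001, 0.0003)  # groups truly separated
pHundred <- c(0.41, 0.87, 0.12, 0.66)       # groups truly overlapping

# Sensitivity: percentage of truly separated cases called significant.
sensitivity <- mean(pZero < alpha) * 100

# Specificity: percentage of truly overlapping cases called non-significant.
specificity <- mean(pHundred >= alpha) * 100

c(sensitivity = sensitivity, specificity = specificity)
```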
    sensSpecPlot(sens = sensitivityTestResults, spec = specificityTestResults)

    multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
        library(grid)

        # Make a list from the ... arguments and plotlist
        plots <- c(list(...), plotlist)
        numPlots = length(plots)

        # If layout is NULL, then use 'cols' to determine layout
        if (is.null(layout)) {
            # Make the panel
            # ncol: Number of columns of plots
            # nrow: Number of rows needed, calculated from # of cols
            layout <- matrix(
                seq(1, cols * ceiling(numPlots/cols)),
                ncol = cols, nrow = ceiling(numPlots/cols)
            )
        }

        if (numPlots==1) {
            print(plots[[1]])
        } else {
            # Set up the page
            grid.newpage()
            pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

            # Make each plot, in the correct location
            for (i in 1:numPlots) {
                # Get the i,j matrix positions of the regions that contain this subplot
                matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
                print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                                layout.pos.col = matchidx$col))
            }
        }
    }

    mat <- zero[['20.points']][,,'1.repitition']
    groups <- makeGroups(list(mat), c("grp1", "grp2"))
    x <- visualizeOverlaps(list(mat), groups, plotType="points", cex=2,
                           title = "Sensitivity test example data",
                           subtitle="0% overlapping", alpha=0.75)

    mat <- hundred[['20.points']][,,'1.repitition']
    groups <- makeGroups(list(mat), c("grp1", "grp2"))
    y <- visualizeOverlaps(list(mat), groups, plotType="points", cex=2,
                           title = "Specificity test example data",
                           subtitle="100% overlapping", alpha=0.75)

    p <- sensSpecPlot()

    plots <- list(x, y, p)
    layout <- matrix(c(1,2,3,3), ncol=2, byrow=TRUE)
    multiplot(plotlist = plots, layout = layout)