knitr::opts_chunk$set(collapse = TRUE, warning = FALSE, message = FALSE, comment = "#>")
A method to evaluate the number of clusters found in single-cell RNA-seq data. The idea is to select the number of clusters for which every cluster has an acceptable ROC curve.
A logistic regression model is trained on half of the cells, using the provided clustering as class labels, and an ROC curve is calculated for each cluster on the remaining (held-out) test cells.
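The held-out evaluation idea can be sketched for a single cluster in base R. This is not casc's implementation: it assumes toy simulated data, a random 50/50 split of the cells, and an unpenalised glm() in place of the alpha-controlled penalised regression casc uses. The helper auc_rank() is hypothetical and computes the AUC with the rank (Mann-Whitney) formula.

```r
# Sketch only (not casc's code): held-out AUC for one cluster vs the rest.
set.seed(1)

# Hypothetical toy data: 200 cells x 5 genes; cluster "B" is shifted in gene1.
n_cells <- 200
labels <- rep(c("A", "B"), each = n_cells / 2)
expr <- matrix(rnorm(n_cells * 5), nrow = n_cells, ncol = 5,
               dimnames = list(NULL, paste0("gene", 1:5)))
expr[labels == "B", "gene1"] <- expr[labels == "B", "gene1"] + 2

# Random split: half the cells for training, the other half for testing.
train <- sample(n_cells, n_cells / 2)
test <- setdiff(seq_len(n_cells), train)

# One-vs-all response: 1 if the cell belongs to cluster "B", else 0.
y <- as.integer(labels == "B")
df <- data.frame(y = y, expr)

# Unpenalised logistic regression (assumption; casc uses a penalised fit).
fit <- glm(y ~ ., data = df[train, ], family = binomial)

# Predicted probabilities of cluster "B" for the held-out cells.
prob <- predict(fit, newdata = df[test, ], type = "response")

# AUC as the probability that a random positive cell outscores a random
# negative cell (ties count one half, via average ranks).
auc_rank <- function(score, truth) {
  r <- rank(score)
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  (sum(r[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(prob, y[test] == 1)
auc
```

Repeating this for every cluster gives one ROC curve and AUC per cluster, which is the quantity the method inspects.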
Install with the devtools package:
devtools::install_github("jamez-eh/casc")
First, load the required packages.
library(SingleCellExperiment)
library(casc)
library(foreach)
library(scater)
library(scran)
To illustrate the use case of casc, we use a simulated dataset included with the package, with 1000 cells and 500 genes. There are 10 simulated clusters, each with 20 marker genes.
data(sce_sim)
data(test_sce)
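The data must first be clustered. Here we use the quickCluster function from the scran package to find clusters.
# First clustering with quickCluster's default minimum cluster size
sce_sim$clusters_1 <- scran::quickCluster(sce_sim, method = "igraph", min.mean = 0.1, assay.type = "logcounts")
# Second clustering: require at least 200 cells per cluster
sce_sim$clusters_2 <- scran::quickCluster(sce_sim, method = "igraph", min.mean = 0.1, assay.type = "logcounts", min.size = 200)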
If SC3 is used for clustering instead, it stores the clusters for each selected value of k in the colData columns sc3_k_clusters (for example, sc3_3_clusters for k = 3).
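Assuming SC3 has already been run with ks = 2:4, its clusterings can be gathered into a list for the clusters argument of casc. In this sketch a small data.frame named cell_data (hypothetical, with made-up labels) stands in for colData(sce); with a real object, use colData(sce) in its place.

```r
# Hypothetical stand-in for colData(sce) after running SC3 with ks = 2:4;
# real columns would be written by SC3, not by this code.
cell_data <- data.frame(
  sc3_2_clusters = factor(c(1, 1, 2, 2, 2, 1)),
  sc3_3_clusters = factor(c(1, 1, 2, 3, 3, 1)),
  sc3_4_clusters = factor(c(1, 4, 2, 3, 3, 1))
)

ks <- 2:4

# One clustering per k, collected into an unnamed list like the one
# passed to casc() later in this vignette.
sc3_clusters <- lapply(ks, function(k) cell_data[[paste0("sc3_", k, "_clusters")]])
```

The resulting list could then be evaluated with casc(sce = sce, clusters = sc3_clusters, alpha = 0.5), where sce is the clustered SingleCellExperiment.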
The function casc takes a SingleCellExperiment object and a list of clusterings, and returns a list of casc objects.
casc has the following parameters:
sce: a SingleCellExperiment with normalized logcounts as an assay, or a logcounts matrix with cells as columns and genes as rows.
clusters: a list of clusterings to evaluate.
alpha: the elastic-net mixing parameter of the logistic regression; alpha = 1 gives the lasso penalty and alpha = 0 the ridge penalty.
casc objects have 4 slots:
predicted_classes: classes predicted for the test data.
auc: mean of the one-vs-all AUCs of the per-class ROC curves.
response: predicted probabilities of each class for the test data.
truths: the true (provided) classes of the test data.
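For reference, the alpha description above matches the glmnet convention (an assumption: the vignette does not say which package casc fits its model with). In that convention, a binary logistic regression with labels $y_i \in \{0, 1\}$, predictors $x_i$ and penalty strength $\lambda \ge 0$ is fit by minimising

$$
-\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\,(\beta_0 + x_i^\top\beta) - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big] + \lambda\Big[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\Big]
$$

With alpha = 1 only the L1 (lasso) term remains, with alpha = 0 only the L2 (ridge) term remains, and intermediate values such as alpha = 0.5 mix the two.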
# Run foreach loops sequentially; register a parallel backend instead to use several cores
registerDoSEQ()
casc_list <- casc(sce = sce_sim, clusters = list(sce_sim$clusters_1, sce_sim$clusters_2), alpha = 0.5)
The function aucPlot creates a scatter plot of the AUC values for a list of casc objects. This AUC is the mean of the one-vs-all binary AUCs. Calling multROCPlot on a single casc object plots multiple ROC curves on the same graph. aucPlot is intended for comparing different values of k from the same clustering method, whereas multROCPlot can be used to choose between any clusterings.
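The averaging step can be sketched in base R. Here response and truths are small hypothetical values shaped like the casc slots of the same names (a cells-by-classes probability matrix and a vector of true labels); casc's internal computation may differ. Each class is scored one-vs-all with the rank (Mann-Whitney) AUC, and the per-class AUCs are averaged.

```r
# Rank-based AUC; truth is logical (TRUE for cells of the positive class).
auc_rank <- function(score, truth) {
  r <- rank(score)
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  (sum(r[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One-vs-all AUC for each class column, then the mean over classes.
mean_one_vs_all_auc <- function(response, truths) {
  classes <- colnames(response)
  per_class <- vapply(classes, function(cl) {
    auc_rank(response[, cl], truths == cl)
  }, numeric(1))
  list(per_class = per_class, mean = mean(per_class))
}

# Hypothetical test-set probabilities (rows = cells, columns = classes).
response <- matrix(
  c(0.7, 0.2, 0.1,
    0.6, 0.3, 0.1,
    0.2, 0.5, 0.3,
    0.1, 0.8, 0.1,
    0.3, 0.4, 0.3,
    0.1, 0.2, 0.7),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
truths <- c("a", "a", "b", "b", "c", "c")

res <- mean_one_vs_all_auc(response, truths)
res$per_class
res$mean
```

Classes a and b are perfectly separated (AUC 1), while class c scores 0.9375 because its fifth cell ties with a negative cell, so the mean AUC is about 0.979.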
aucPlot(casc_list)
multROCPlot(casc_list$casc_1)
multROCPlot(casc_list$casc_2)
Plots for every casc object in the list can be created with:
lapply(seq_along(casc_list), function(x){multROCPlot(casc_list[[x]])})
sessionInfo()