knitr::opts_chunk$set(collapse = TRUE, warning = FALSE, message = FALSE, comment = "#>")
casc is a method for evaluating the number of clusters found in single-cell RNA-seq data. The idea is to select the number of clusters for which every cluster has an acceptable ROC curve.
A logistic regression model is trained on half of the cells, labeled by the provided clustering, and an ROC curve is then calculated for each cluster on the remaining held-out test cells.
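The train/test idea above can be sketched in base R. This is not casc's implementation. The sketch makes three substitutions: it uses a toy two-cluster matrix, it fits an unpenalized glm() instead of the penalized regression controlled by alpha, and it relies on a hypothetical helper auc_rank() that computes the AUC with the Mann-Whitney rank formula.

```r
# Sketch only: toy data, unpenalized glm(), and a hypothetical auc_rank() helper.
set.seed(1)

# Toy "expression" matrix: 200 cells x 5 genes, with two clusters that
# differ only in the mean of gene1.
n_cells <- 200
clusters <- rep(c("A", "B"), each = n_cells / 2)
expr <- matrix(rnorm(n_cells * 5), nrow = n_cells,
               dimnames = list(NULL, paste0("gene", 1:5)))
expr[clusters == "B", "gene1"] <- expr[clusters == "B", "gene1"] + 2

# Train on a random half of the cells and hold out the other half as test data.
train_idx <- sample(n_cells, n_cells / 2)
train <- data.frame(expr[train_idx, ], y = as.integer(clusters[train_idx] == "B"))
test <- data.frame(expr[-train_idx, ])
test_truth <- as.integer(clusters[-train_idx] == "B")

# One-vs-all logistic regression for cluster "B".
fit <- glm(y ~ ., data = train, family = binomial())
prob_B <- predict(fit, newdata = test, type = "response")

# AUC via the Mann-Whitney rank statistic: the probability that a randomly
# chosen positive cell scores higher than a randomly chosen negative cell.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_B <- auc_rank(prob_B, test_truth)
print(auc_B)
```

With more than two clusters, one such binary model and AUC would be computed per cluster.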
Install the package from GitHub with devtools:
devtools::install_github("jamez-eh/casc")
First, load the required packages.
library(SingleCellExperiment)
library(casc)
library(foreach)
library(scater)
library(scran)
To illustrate the use case of casc, we use a simulated dataset, included with the package, with 1000 cells and 500 genes. There are 10 simulated clusters, each with 20 marker genes.
data(sce_sim)
data(test_sce)
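The data must first be clustered. Here we use the quickCluster function from the scran package to produce two clusterings. The second clustering sets min.size = 200, which requires every cluster to contain at least 200 cells and therefore yields a coarser clustering.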
sce_sim$clusters_1 <- scran::quickCluster(sce_sim, method = "igraph", min.mean = 0.1,
                                          assay.type = "logcounts")
sce_sim$clusters_2 <- scran::quickCluster(sce_sim, method = "igraph", min.mean = 0.1,
                                          assay.type = "logcounts", min.size = 200)
If SC3 is used for clustering instead, it stores the clusters for each selected value of k in colData columns named sc3_k_clusters (for example, sc3_10_clusters).
The function casc takes a SingleCellExperiment object and a list of clusterings, and returns a list of casc objects. It has the following parameters:
- sce: A SingleCellExperiment with normalized logcounts as an assay, or a logcounts matrix with cells as columns and genes as rows.
- clusters: A list of clusterings to evaluate.
- alpha: The elastic-net mixing parameter for the logistic regression. alpha = 1 gives the lasso penalty and alpha = 0 gives the ridge penalty.

casc objects have four slots:
- predicted_classes: classes predicted for the test data
- auc: the mean of the per-class ROC AUCs
- response: probabilities of each predicted class for the test data
- truths: the true (provided) classes of the test data
# Register the sequential foreach backend
registerDoSEQ()
casc_list <- casc(sce = sce_sim,
                  clusters = list(sce_sim$clusters_1, sce_sim$clusters_2),
                  alpha = 0.5)
A scatter plot of the AUC values for a list of casc objects can be created with the function aucPlot. This AUC is the mean of the one-vs-all binary AUCs. Calling multROCPlot on a single casc object draws multiple ROC curves on the same graph. aucPlot is intended for comparing different values of k for the same clustering method. multROCPlot can be used to choose between any clusterings.
aucPlot(casc_list)
multROCPlot(casc_list$casc_1)
multROCPlot(casc_list$casc_2)
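The mean one-vs-all AUC summarized by aucPlot can be illustrated with a small base R sketch. This is a sketch under stated assumptions. It assumes that response is a cells x classes probability matrix whose column names are the class labels, and that truths holds the true label of each test cell. The helpers mean_ova_auc() and auc_rank() are hypothetical and are not part of casc.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC for logical labels.
# Assumes at least one positive and one negative cell.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels)
  n_neg <- length(labels) - n_pos
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: per-class one-vs-all AUCs and their mean.
mean_ova_auc <- function(response, truths) {
  per_class <- vapply(colnames(response), function(cl) {
    # Cells of class `cl` are positives; all other cells are negatives.
    auc_rank(response[, cl], truths == cl)
  }, numeric(1))
  list(per_class = per_class, mean = mean(per_class))
}

# Toy example: three classes, six test cells (each row sums to 1).
response <- matrix(c(0.8, 0.7, 0.1, 0.05, 0.1, 0.3,
                     0.1, 0.2, 0.8, 0.3,  0.2, 0.1,
                     0.1, 0.1, 0.1, 0.65, 0.7, 0.6),
                   ncol = 3, dimnames = list(NULL, c("1", "2", "3")))
truths <- c("1", "1", "2", "2", "3", "3")

res <- mean_ova_auc(response, truths)
print(res$per_class)  # 1, 1, 0.875
print(res$mean)       # 0.9583333
```

Class 3 scores below 1 because one class-2 cell (0.65) outranks one class-3 cell (0.6). That single overlap lowers the mean AUC.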
Plots for every casc object in the list can be created with:
lapply(casc_list, multROCPlot)
sessionInfo()