knitr::opts_chunk$set(collapse = TRUE, warning = FALSE, message = FALSE, comment = "#>")

A method to evaluate the number of clusters found in single-cell RNA-seq data. The idea is to select a number of clusters for which every cluster has an acceptable ROC curve.

A logistic regression model is trained on the cells in one half of the provided clustering, and an ROC curve is calculated on the remaining half, which serves as test data.
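The train/test idea can be illustrated with a minimal base-R sketch. This is not the `casc` implementation: the toy data, the variable names, and the use of an unpenalized `glm` fit are all assumptions made for illustration. It fits a one-vs-rest logistic regression on a random half of the cells and computes the ROC curve and its area on the other half.

```r
# Minimal base-R sketch (not the casc implementation): logistic regression
# trained on half of the cells, ROC curve computed on the other half.
set.seed(1)

n_cells <- 200
in_cluster <- rep(c(0, 1), each = n_cells / 2)  # 1 = cell belongs to the cluster

# Two toy "genes": one shifted inside the cluster (a marker), one pure noise.
expr <- data.frame(
  marker = rnorm(n_cells, mean = 2 * in_cluster),
  noise  = rnorm(n_cells)
)
expr$in_cluster <- in_cluster

# Random 50/50 split into training and test cells.
train_idx <- sample(n_cells, n_cells / 2)
train <- expr[train_idx, ]
test  <- expr[-train_idx, ]

fit  <- glm(in_cluster ~ marker + noise, data = train, family = binomial())
prob <- predict(fit, newdata = test, type = "response")

# ROC curve: sweep the decision threshold over the predicted probabilities.
thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(prob[test$in_cluster == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(prob[test$in_cluster == 0] >= t))

# Area under the ROC curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

A cluster whose cells are hard to tell apart from the rest would give an AUC close to 0.5 in this sketch, while a well-separated cluster gives a value close to 1.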

Install `casc` with the `devtools` package via:

devtools::install_github("jamez-eh/casc")

First load required packages.

library(SingleCellExperiment)
library(casc)
library(foreach)
library(scater)
library(scran)

To illustrate the use case of `casc`, we use a simulated dataset included with the package, with 1000 cells and 500 genes. There are 10 simulated clusters, each with 20 markers.

data(sce_sim)
data(test_sce)

The data must first be clustered. Here we use the `quickCluster` function from `scran` to find clusters. The second call sets `min.size = 200` to obtain a coarser clustering for comparison.

sce_sim$clusters_1 <- scran::quickCluster(sce_sim, method = "igraph", min.mean = 0.1, assay.type = "logcounts")
sce_sim$clusters_2 <- scran::quickCluster(sce_sim, method = "igraph", min.mean = 0.1, assay.type = "logcounts", min.size = 200)

SC3 stores the clusterings for each choice of k in slots named `sc3_k_clusters`.

The function `cascer` takes a `SingleCellExperiment` object and a list of clusterings and returns a list of `casc` objects.

`casc` has the following parameters:

- `sce`: A `SingleCellExperiment` with normalized logcounts as an assay, or a logcounts matrix with cells as columns and genes as rows.
- `clusters`: A list of clusterings to evaluate.
- `alpha`: A penalty parameter for the logistic regression. `alpha = 1` represents the lasso penalty and `alpha = 0` represents the ridge penalty.
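For reference, if `alpha` follows the `glmnet` convention (an assumption; only the two endpoints are stated here), the penalty added to the logistic regression objective would be the elastic net, where $\lambda$ is a regularization strength and $\beta$ the vector of gene coefficients, neither of which is described in this document:

$$
\lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
$$

Under that assumption, `alpha = 1` leaves only the lasso term $\lambda \lVert \beta \rVert_1$, `alpha = 0` leaves only the ridge term $\tfrac{\lambda}{2} \lVert \beta \rVert_2^2$, and an intermediate value such as `alpha = 0.5` mixes the two.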

`casc` objects have 4 slots:

- `predicted_classes`: classes predicted for the test data
- `auc`: mean of the AUCs of each class's ROC curve
- `response`: probabilities of each predicted class for the test data
- `truths`: true, provided classes for the test data

# Register the sequential foreach backend (no parallel workers).
registerDoSEQ()
casc_list <- casc(sce = sce_sim, clusters = list(sce_sim$clusters_1, sce_sim$clusters_2), alpha = 0.5)

A scatter plot of AUC values for the list of `casc` objects can be created with the function `aucPlot`. This AUC is the mean of the one-vs-all binary AUCs. Multiple ROC curves can be visualized on the same graph by calling `multROCPlot` on a single `casc` object. `aucPlot` is for comparing different values of k for the same clustering method, while `multROCPlot` may be used to choose between any clusterings.
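The "mean of the one-vs-all binary AUCs" can be sketched in base R as follows. The helper names `binary_auc` and `mean_ova_auc` and the toy probabilities are hypothetical and are not part of `casc`; the inputs only mimic the shape of the `response` and `truths` slots (a matrix of class probabilities and a vector of true classes).

```r
# Rank-based (Mann-Whitney) AUC for a single one-vs-all comparison.
binary_auc <- function(score, is_positive) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  r <- rank(score)  # average ranks, so ties count as half a win
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Mean of the one-vs-all AUCs; `prob` has one named column per class.
mean_ova_auc <- function(prob, truth) {
  per_class <- sapply(colnames(prob), function(cl) {
    binary_auc(prob[, cl], truth == cl)
  })
  mean(per_class)
}

# Toy example: 6 test cells, 3 clusters, made-up class probabilities.
prob <- matrix(
  c(0.8, 0.1, 0.1,
    0.6, 0.3, 0.1,
    0.2, 0.7, 0.1,
    0.5, 0.2, 0.3,
    0.1, 0.2, 0.7,
    0.4, 0.1, 0.5),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("1", "2", "3"))
)
truth <- c("1", "1", "2", "2", "3", "3")

mean_ova_auc(prob, truth)
```

In this toy input, clusters 1 and 3 are ranked perfectly while one cell of cluster 2 is scored poorly, so the mean is pulled below 1 by a single weak cluster. That is the kind of signal the per-cluster ROC curves make visible.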

aucPlot(casc_list)
multROCPlot(casc_list$casc_1)
multROCPlot(casc_list$casc_2)

Plots for every `casc` object in the list can be created via:

lapply(casc_list, multROCPlot)

```
sessionInfo()
```
