knitr::opts_chunk$set(
  collapse = TRUE,
  warnings = FALSE,
  messages = FALSE,
  comment = "#>"
)

Overview

A method to evaluate the number of clusters found in single cell RNAseq data. The idea is to select a the number of clusters with an acceptable ROC curve for each cluster.

A logistic regression model is trained with cells in half the provided clustering and an ROC curve is calculated with the remaining test data.

Installation

Installation with devtools package via:

devtools::install_github("jamez-eh/casc")

Basics

First load required packages.

library(SingleCellExperiment)
library(casc)
library(foreach)
library(scater)
library(scran)

Load Data

To illustrate the usecase of casc we use a simulated dataset, included with the package, with 1000 cells and 500 genes. There are 10 simulated clusters, each with 20 markers.

data(sce_sim)
data(test_sce)

Clustering Data

The data must first be clustered. Here we are using the quickCluster package to find clusters.

sce_sim$clusters_1 <- scran::quickCluster(sce_sim, method="igraph", min.mean=0.1, assay.type="logcounts")
sce_sim$clusters_2 <- scran::quickCluster(sce_sim, method="igraph", min.mean=0.1, assay.type="logcounts", min.size=200)

Casc Useage

SC3 stores clusters for various selection of k in slots sc3_k_clusters. The function cascer takes a singleCellExperiment object and a list of clusterings and returns a list of casc objects.

casc has the following parameters:

casc objects have 4 slots predicted_classes: classes predicted for test data auc: mean auc for each classes ROC response: probabilities of each predicted class for test data truths: true, provided classes for test data

registerDoSEQ()
casc_list <- casc(sce = sce_sim, 
                  clusters = list(sce_sim$clusters_1, sce_sim$clusters_2), 
                  alpha = 0.5)

Plot creation

A scatter plot of auc values for the list of casc objects can be created with the function aucPlot. This AUC is the mean of the 1 vs all binary AUCs. Multiple ROC curves can be visualized on the same graph by calling multROCPlot on a single casc object. aucPlot is for visualizing different values of k for the same clustering method. multROCPlot may be used to select between any clusterings.

aucPlot(casc_list)
multROCPlot(casc_list$casc_1)
multROCPlot(casc_list$casc_2)

Can create plots for every casc object in the list via:

lapply(seq_along(casc_list), function(x){multROCPlot(casc_list[[x]])})

Technical

sessionInfo()


jamez-eh/casc documentation built on June 12, 2019, 1:43 a.m.