Sometimes we don't have time to run Signac and need a faster solution. Signac scales well to large data sets (>300,000 cells) and typically takes less than an hour even then, but we developed SignacFast to classify single cell data more quickly. Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, which speeds up classification roughly 5-10 fold. These models were generated from the HPCA training data like so:
# load the HPCA training data
ref = GetTrainingData_HPCA()

# generate models
Models_HPCA = ModelGenerator(R = ref, N = 100, num.cores = 4)
The pre-trained "Models_HPCA" can be loaded from within the R package:
# load pre-trained neural network ensemble model
Models = GetModels_HPCA()
This vignette demonstrates SignacFast and shows that its results are broadly consistent with Signac, only faster. We use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat, starting from raw counts from this publication.
all_times <- list()  # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      now <<- Sys.time()
    } else {
      res <- difftime(Sys.time(), now, units = "secs")
      all_times[[options$label]] <<- res
    }
  }
}))
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE,
  time_it = TRUE
)
# celltypes_fast = readRDS("./fls/celltypes_fast_citeseq.rds")
# celltypes = readRDS("./fls/celltypes_citeseq.rds")
# pbmc = readRDS("fls/pbmcs_signac_citeseq.rds")
celltypes = readRDS(file = "fls/celltypes_amp_synovium.rds")
celltypes_fast = readRDS(file = "fls/celltypes_fast_synovium_celltypes.rds")
Read the CEL-seq2 data.
ReadCelseq <- function (counts.file, meta.file) {
  # read the counts table; the first column holds the gene names
  E = suppressWarnings(readr::read_tsv(counts.file))
  gns <- E$gene
  E = E[,-1]
  # convert to a sparse matrix with genes as row names
  E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
  rownames(E) <- gns
  E
}

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))

# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[,kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik,]) / Matrix::colSums(E) * 100
E = E[,MitoFrac < 20]
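The filtering logic above can be tried on a small toy matrix before running it on the full data set. The sketch below is only an illustration: FilterCells is a hypothetical helper (not part of SignacX), the toy counts are made up, and the gene threshold is lowered in the call because the toy matrix has only four genes.

```r
library(Matrix)

# Hypothetical helper (not part of SignacX) wrapping the same filters as above
FilterCells <- function(E, min.genes = 200, min.counts = 500, max.mito = 20) {
  n.genes <- Matrix::colSums(E != 0)   # genes detected per cell
  n.counts <- Matrix::colSums(E)       # total counts per cell
  E <- E[, n.genes > min.genes & n.counts > min.counts, drop = FALSE]
  # percentage of counts coming from mitochondrial genes
  is.mito <- grepl("^MT-", rownames(E))
  mito.pct <- Matrix::colSums(E[is.mito, , drop = FALSE]) / Matrix::colSums(E) * 100
  E[, mito.pct < max.mito, drop = FALSE]
}

# Toy counts: 4 genes x 3 cells (made-up values)
toy <- Matrix(
  c(300, 0, 250,
    250, 0, 300,
     10, 5, 400,
      0, 1,  50),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(c("GeneA", "GeneB", "MT-CO1", "GeneC"),
                  c("cell1", "cell2", "cell3"))
)

# cell2 fails the depth filter; cell3 fails the mitochondrial filter
filtered <- FilterCells(toy, min.genes = 2)
colnames(filtered)
```

Keeping drop = FALSE means the result stays a matrix even when only one cell survives the filters.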
Start with the standard pre-processing steps for a Seurat object.
library(Seurat)
Create a Seurat object, and then perform SCTransform normalization.
# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium)
Perform dimensionality reduction by PCA and UMAP embedding.
# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)
library(SignacX)
Generate Signac labels for the Seurat object.
labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)
Training the neural networks can take a long time: the classification above took 27 minutes. As a faster alternative, we implemented SignacFast, which uses pre-trained models.
# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)
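To check the speed-up on your own data, one option is to time both calls with a small wrapper. The sketch below is an assumption-laden illustration: TimeIt is a hypothetical helper (not part of SignacX), and Sys.sleep() stands in for the Signac() and SignacFast() calls so the example runs on its own.

```r
# Hypothetical helper (not part of SignacX): elapsed time of one expression, in seconds
TimeIt <- function(expr) {
  start <- Sys.time()
  force(expr)  # the lazily passed expression is evaluated here
  difftime(Sys.time(), start, units = "secs")
}

# Stand-in workloads; replace with the Signac() and SignacFast() calls
t_slow <- TimeIt(Sys.sleep(0.2))
t_fast <- TimeIt(Sys.sleep(0.05))

# Observed speed-up as the ratio of elapsed times
speedup <- as.numeric(t_slow) / as.numeric(t_fast)
speedup
```

This uses the same Sys.time() and difftime() approach as the chunk timing hook at the top of the vignette.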
Compare results:
Celltypes:
knitr::kable(table(Signac = celltypes$CellTypes, SignacFast = celltypes_fast$CellTypes), format = "html")
Cellstates:
knitr::kable(table(Signac = celltypes$CellStates, SignacFast = celltypes_fast$CellStates), format = "html")
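Beyond the contingency tables, a single agreement score can summarize how consistent the two methods are. The following is a minimal sketch using made-up label vectors in place of celltypes$CellTypes and celltypes_fast$CellTypes; the label names are hypothetical and only serve the illustration.

```r
# Made-up labels standing in for the Signac and SignacFast annotations
signac_labels <- c("T", "T", "B", "Fibroblasts", "MPh", "B")
fast_labels   <- c("T", "T", "B", "Fibroblasts", "MPh", "T")

# Use shared factor levels so the table is square even if one method
# never assigns a given label
lv <- union(signac_labels, fast_labels)
confusion <- table(
  Signac = factor(signac_labels, levels = lv),
  SignacFast = factor(fast_labels, levels = lv)
)

# Fraction of cells that receive the same label from both methods
agreement <- sum(diag(confusion)) / sum(confusion)
agreement
```

Off-diagonal entries of the table show which labels the two methods disagree on.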
Save results
saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")
write.csv(x = t(as.data.frame(all_times)), file = "fls/tutorial_times_SignacFast_AMP.csv")
Session Info
sessionInfo()