knitr::opts_chunk$set(echo = TRUE)
This tutorial provides a guided alignment for two groups of cells from cellbench RNA mixture experiments. In this tutorial we demonstrate the unsupervised alignment strategy of scAlign
described in Johansen et al, 2018 along with typical analysis utilizing the aligned dataset, and show how scAlign
can identify and match cell types across platforms without using the labels as input.
The following is a walkthrough of a typical alignment problem for scAlign
and has been designed to provide an overview of data preprocessing, alignment and finally analysis in our joint embedding space. Here, our primary goals include:
## Install scAlign install.packages('devtools') devtools::install_github(repo = 'quon-titative-biology/scAlign') library(scAlign) ## Install Tensorflow library(tensorflow) install_tensorflow(version = "gpu") ## Removing version will install CPU version of Tensorflow
Guide to installing python and tensorflow on different operating systems. On Windows: Download Python 3.6.8. Note, newer versions of Python (e.g. 3.7) cannot use TensorFlow at this time. Make sure pip is included in the installation. On Ubuntu: sudo apt update sudo apt install python3-dev python3-pip On MacOS (homebrew): /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" export PATH="/usr/local/bin:/usr/local/sbin:$PATH" brew update brew install python # Python 3 Further details at: https://www.tensorflow.org/install
The gene count matrices used for this tutorial are hosted on the cellbench github: here.
First, we load in the normalized cellbench data. The data was normalized following the procedures defined on the cellbench github.
library(scAlign) library(SingleCellExperiment) library(ggplot2) ## Load in cellbench data data("cellbench", package = "scAlign", envir = environment()) ## Extract RNA mixture cell types mix.types = unlist(lapply(strsplit(colnames(cellbench), "-"), "[[", 2)) ## Extract Platform batch = c(rep("CEL", length(which(!grepl("sortseq", colnames(cellbench)) == TRUE))), rep("SORT", length(which(grepl("sortseq", colnames(cellbench)) == TRUE))))
The general design of scAlign's makes it agnostic to the input RNA-seq data representation. Thus, the input data can either be
gene-level counts, transformations of those gene level counts or a preliminary step of dimensionality reduction such
as canonical correlates or principal component scores. Here we create the scAlign
object from the previously normalized cellbench
data and perform CCA on the unaligned data.
## Create SCE objects to pass into scAlignCreateObject youngMouseSCE <- SingleCellExperiment( assays = list(scale.data = cellbench[,batch=='CEL']) ) oldMouseSCE <- SingleCellExperiment( assays = list(scale.data = cellbench[,batch=='SORT']) ) ## Build the scAlign class object and compute PCs scAlignCB = scAlignCreateObject(sce.objects = list("CEL"=youngMouseSCE, "SORT"=oldMouseSCE), labels = list(mix.types[batch=='CEL'], mix.types[batch=='SORT']), data.use="scale.data", pca.reduce = FALSE, cca.reduce = TRUE, ccs.compute = 5, project.name = "scAlign_cellbench")
Now we align the cell populations from both protocols. scAlign
returns a low-dimensional joint embedding space where
the effect of platform is removed allowing us to use the complete dataset for downstream analyses such as clustering or
differential expression.
## Run scAlign with all_genes scAlignCB = scAlign(scAlignCB, options=scAlignOptions(steps=1000, log.every=1000, norm=TRUE, early.stop=TRUE), encoder.data="scale.data", supervised='none', run.encoder=TRUE, run.decoder=FALSE, log.dir=file.path('~/models_temp','gene_input'), device="CPU") # ## Additional run of scAlign with CCA # scAlignCB = scAlign(scAlignCB, # options=scAlignOptions(steps=1000, # log.every=1000, # norm=TRUE, # early.stop=TRUE), # encoder.data="CCA", # supervised='none', # run.encoder=TRUE, # run.decoder=FALSE, # log.dir=file.path('~/models','cca_input'), # device="CPU") ## Plot aligned data in tSNE space, when the data was processed in three different ways: ## 1) either using the original gene inputs, ## 2) after CCA dimensionality reduction for preprocessing. ## Cells here are colored by input labels set.seed(5678) gene_plot = PlotTSNE(scAlignCB, "ALIGNED-GENE", title="scAlign-Gene", perplexity=30) ## Show plot gene_plot # cca_plot = PlotTSNE(scAlignCB, # "ALIGNED-CCA", # title="scAlign-CCA", # perplexity=30) # # multi_plot_labels = grid.arrange(gene_plot, cca_plot, nrow = 1)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.