BiocStyle::markdown() packages <- c("CIMseq", "printr", "future", "dplyr") purrr::walk(packages, library, character.only = TRUE) rm(packages)
In this vignette, we will analyze mRNA-seq data from a droplet-based platform (10x genomics 3'tag). The analysis follows the same basic steps as the plate-based version, and requires two droplet libraries analyzed separately containig singlets and multiplets:
The main difference is that droplet data does not contain spike-in mRNA, which means that we are lacking a way to reliably estimate absolute mRNA content of each cell. Instead, we have to estimate mRNA content using unique molecular identifier (UMI) counts.
Create the CIMseqSinglets counts object. The CIMseqSinglets command takes 4 arguments:
We will use a dataset from cells taken from murine small intestine (SI). First, we initialize the CIMseqSinglets and CIMseqMultiplets objects using UMI count matrices.
SmallIntest.singlets.counts[1:2, 1:2] #example dim(SmallIntest.singlets.counts) #genes and sample numbers dim(SmallIntest.multiplets.counts) #genes and sample numbers class(SmallIntest.singlets.counts) #should be a matrix
We next want to perform feature selection and classification for the singlets. For the testData we have already pre-calculated this information. Once we have performed these calculations, we can build the CIMseqSinglets and CIMseqMultiplets objects that will be input into the deconvolution method. When using droplet data, we do not use spike-in control mRNA, that slot of the objects is simply not filled.
It is important to set the normalization factor (norm.to parameter) correctly. This parameter should be approximately similar to the average tag count per cell, and has to be identical for singlet and multiplet datasets. For 10x genomics 3' RNA data, we have found that using a normalization factor of 10000 to work well.
#build CIMseq objects cObjSng.si <- CIMseqSinglets(counts=SmallIntest.singlets.counts, classification=SmallIntest.singlets.classes, norm.to=10000) cObjMul.si <- CIMseqMultiplets(counts=SmallIntest.multiplets.counts, features=SmallIntest.marker.genes, norm.to=10000)
The deconvolution of the multuplets take place in this stage. This is a large dataset and will take several hours to run on a modern desktop computer.
The CIMseqSinglets and CIMseqMultiplets objects must be provided as arguments.
In addition, there are multiple parameters that may be tuned depending on the
complexity of the dataset and the available CPU time. For a production run, we suggest increasing swarmsize to at least 200 and nSyntheticMultiplets to 2000.
See help(CIMseqSwarm)
. Parallelization is provided via the 'future' R package.
# For single-threaded, use "plan(sequential)" instead plan(multicore) options(future.globals.maxSize= 1000000000) sObj.si <- CIMseqSwarm(cObjSng.si, cObjMul.si, maxiter=100, swarmsize=110, nSyntheticMultiplets=200, seed=123)
We can exmaine the resulting connections with the calculateEdgeStats
command:
si.edges <- calculateEdgeStats(sObj.si, cObjSng.si, cObjMul.si, multiplet.factor=2, maxCellsPerMultiplet=2) si.edges %>% filter(pval < 1e-4 & weight > 10) %>% arrange(desc(score)) %>% head(n=10)
This reports all of the detected edges. The edge.cutoff argument specifies the fraction above which an edge is considered to be valid. We can specify the number of edges a connection must have to be reported with the min.num.edges argument and also filter results for a specific significance level with the min.pval argument.
The resulting "connectome" can be plotted via the command below.
classOrder <- c("Paneth", "Stem", "Transit amplifying", "Progenitor early", "Progenitor late-1", "Progenitor late-2", "Enterocyte", "Goblet", "Enteroendocrine", "Tuft")
plotSwarmCircos(sObj.si, cObjSng.si, cObjMul.si, weightCut=10, maxCellsPerMultiplet=2, alpha=1E-4, h.ratio=0.95, depleted=F, multiplet.factor=2, classOrder=classOrder)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.