BiocStyle::markdown() packages <- c("CIMseq", "printr", "future") purrr::walk(packages, library, character.only = TRUE) rm(packages)
CIMseq works in 3 stages:
The CIMseqMultiplets object also holds information concerning which features to select during deconvolution.
Multuplet deconvolution via swarm optimization.
Here we utilize swarm optimization to deconvolute the multuplets and, thus, provide an overview of the tissue "connectome".
Detecting specific interactions by calculating fold enrichment significance
Make CIMseqSinglets counts object. The CIMseqSinglets command takes 4 arguments:
We use a simple test dataset, included in the package, to initialize the CIMseqSinglets and CIMseqMultiplets objects. It contains a matrix of single-cells (counts.sng) that comprise 3 cell types, 79 in total, and a matrix of 83 multiplets (counts.mul). These multiplets were FACS sorted before sequencing such that their composition is known and we expect to detect exactly one connection between each of the cell types in the dataset.
counts.sng[1:2, 1:2] #example dim(counts.sng) #genes and sample numbers dim(counts.mul) #genes and sample numbers class(counts.sng) #should be a matrix
We first extract the ERCC reads. In the test data, ERCC reads can be identified with the regular expression "^ERCC\-[0-9]*$".
ercc.s <- grepl("^ERCC\\-[0-9]*$", rownames(counts.sng)) singlets <- counts.sng[!ercc.s, ] singletERCC <- counts.sng[ercc.s, ] ercc.m <- grepl("^ERCC\\-[0-9]*$", rownames(counts.mul)) multiplets <- counts.mul[!ercc.m, ] multipletERCC <- counts.mul[ercc.m, ]
We next want to perform feature selection, dimensionality reduction, and classification for the singlets. For the testData we have already pre-calculated this information. Once we have performed these calculations, we can build the CIMseqSinglets and CIMseqMultiplets objects that will be input into the deconvolution method.
#extract pre-calculated data dim.red <- getData(CIMseqSinglets_test, "dim.red") classes <- getData(CIMseqSinglets_test, "classification") selected <- getData(CIMseqMultiplets_test, "features") #build CIMseq objects cObjSng <- CIMseqSinglets(counts=singlets, counts.ercc=singletERCC, classification=classes, dim.red=dim.red) cObjMul <- CIMseqMultiplets(counts=multiplets, counts.ercc=multipletERCC, features=selected)
The CIMseqSinglets object contains:
1. The raw counts data input by the user.
2. A function to calculate log normalized counts per million on the fly.
3. A function to calculate counts per million on the fly.
4. The counts.ercc input by the user (if these are not available they can be
substituted using matrix())
5. An optional dimensionality reduced representation of the data for visualization.
6. The classification for each single cell.
The CIMseqMultiplets object contains:
1. The raw counts data input by the user.
2. A function to calculate log normalized counts per million on the fly.
3. A function to calculate counts per million on the fly.
4. The counts.ercc input by the user (if these are not available they can be
substituted using matrix())
5. The index of selected genes to use in the deconvolution.
Individual slots within the CIMseq objects can be accessed with the "getData" function. Note that all other CIMseq object slots can be accessed in the same way. See below:
counts <- getData(cObjSng, "counts")
It is also possible to initiate a empty object and set slots or exchance data in the slots in the following manner:
empty <- CIMseqSinglets() getData(empty, "counts") <- getData(cObjSng, "counts")
The fractions of ERCC spike-ins and the number of approximated cells can be viewed in a plot using the plotCountsERCC function.
plotCountsERCC(cObjSng, cObjMul)
Sometimes it may be desireable to view the expression of 2 marker genes in the counts data. This can be accomplished with the plotCountsMarkers function.
plotCountsMarkers(cObjSng, cObjMul, markers = c("CD74", "ANXA3"))
The deconvolution of the multuplets take place in this stage. The CIMseqSinglets
and CIMseqMultiplets objects must be provided as arguments. In addition, there
are multiple parameters that may be tuned. To keep processing time low, we use
very low values for 'swarmsize', 'nSyntheticMultiplets' and 'maxiter' in this
simple example. For a real dataset, we recommend using not going below the default
parameters (swarmsize=200, nSyntheticMultiplets=2000 and maxiter=100). For complex
datasets, we suggest increasing swarmsize further. See help(CIMseqSwarm)
for
further information. Parallelization is provided via the 'future' R package.
# For single-threaded, use "plan(sequential)" instead plan(multicore) sObj <- CIMseqSwarm( cObjSng, cObjMul, swarmsize = 50, nSyntheticMultiplets = 200, maxiter = 10, seed=123 )
sObj <- CIMseqSwarm_test
We can exmaine the resulting connections with the calculateEdgeStats
command:
calculateEdgeStats(sObj, cObjSng, cObjMul)
This reports all of the detected edges. The edge.cutoff argument specifies the fraction above which an edge is considered to be valid. We can specify the number of edges a connection must have to be reported with the min.num.edges argument and also filter results for a specific significance level with the min.pval argument.
The resulting "connectome" can be plotted via the command below. In this test data set there are no preferential interactions and no enrichment.
plotSwarmCircos(sObj, cObjSng, cObjMul, weightCut=10, maxCellsPerMultiplet=4, alpha=Inf, h.ratio=0.9, depleted=F)
For this test data set, we know the true composition of each multiplet which means that we can measure the performance of the deconvolution. The function "adjustFractions" is used to obtain deconvoluted multiplet compositions, which we can then compare to the known multiplet compositions.
mult.pred <- adjustFractions(singlets=cObjSng, multiplets=cObjMul, swarm=sObj, binary=T, maxCellsPerMultiplet=4) mult.truth <- as.matrix(annot[rownames(mult.pred), c('A375', 'HCT116', 'HOS')]) pred.table <- mult.pred == as.logical(mult.truth) mean(pred.table) barplot(100 * (1-sapply(split(pred.table, rowSums(mult.truth)), mean)), ylab='% Error', xlab='Multiplet complexity', main='Deconvolution error rates')
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.