scfind is built on top of Bioconductor’s SingleCellExperiment class. scfind operates on objects of class SingleCellExperiment and writes all of its results back to the object.
If you already have a SingleCellExperiment object, then proceed to the next chapter.
If you have a matrix or a data frame containing expression data, then you first need to create a SingleCellExperiment object containing your data. For illustrative purposes we will use an example expression matrix provided with scfind. The dataset (yan) represents FPKM gene expression of 90 cells derived from human embryos. The authors (Yan et al.) defined the developmental stages of all cells in the original publication (ann data frame). We will use these stages in projection later.
library(SingleCellExperiment)
library(scfind)
head(ann)
yan[1:3, 1:3]
Note that the cell type information has to be stored in the cell_type1 column of the colData slot of the SingleCellExperiment object.
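A quick sanity check before building the object can catch annotation problems early. The sketch below uses only base R and a small made-up dataset: `toy_expr` and `toy_ann` are hypothetical stand-ins for `yan` and `ann`, and the requirement that annotation rows follow the column order of the matrix is treated here as a sensible convention rather than a documented scfind rule.

```r
# Hypothetical toy data standing in for yan and ann
toy_expr <- matrix(
  c(0, 5, 2,
    3, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2", "cell3"))
)
toy_ann <- data.frame(
  cell_type1 = c("zygote", "zygote", "2cell"),
  row.names = c("cell1", "cell2", "cell3"),
  stringsAsFactors = FALSE
)

# The annotation needs a cell_type1 column and one row per cell,
# in the same order as the columns of the expression matrix
ann_ok <- "cell_type1" %in% colnames(toy_ann) &&
  identical(rownames(toy_ann), colnames(toy_expr))
ann_ok
```

If `ann_ok` is `FALSE`, reordering or renaming the annotation rows before constructing the object avoids mislabelled cells later on.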
Now let's create a SingleCellExperiment object from the yan dataset:
sce <- SingleCellExperiment(assays = list(normcounts = as.matrix(yan)), colData = ann)
# this is needed to calculate dropout rate for feature selection
# important: normcounts have the same zeros as raw counts (fpkm)
counts(sce) <- normcounts(sce)
logcounts(sce) <- log2(normcounts(sce) + 1)
# use gene names as feature symbols
rowData(sce)$feature_symbol <- rownames(sce)
isSpike(sce, "ERCC") <- grepl("^ERCC-", rownames(sce))
# remove features with duplicated names
sce <- sce[!duplicated(rownames(sce)), ]
sce
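The comments in the code above rely on two facts: the log2(x + 1) transform leaves zeros in place, and the dropout rate of a gene is derived from its zeros. A minimal base R sketch on a made-up matrix (`toy_fpkm` is hypothetical, and defining dropout as the fraction of zero-valued cells per gene is an assumption made for illustration) shows both:

```r
# Hypothetical toy FPKM matrix (genes in rows, cells in columns)
toy_fpkm <- matrix(
  c(0, 4, 0, 8,
    1, 0, 0, 0,
    2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4))
)

# log2(x + 1) maps 0 to 0, so the zero pattern is unchanged
toy_log <- log2(toy_fpkm + 1)
same_zeros <- identical(toy_fpkm == 0, toy_log == 0)

# Per-gene dropout rate: fraction of cells with zero expression
dropout_rate <- rowMeans(toy_fpkm == 0)
dropout_rate
```

Because the zero pattern is identical, a dropout rate computed from the counts slot matches one computed from the FPKM values themselves.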
If you have a list of genes that you would like to check against your dataset, i.e. find the cell types that most likely represent your genes (highest expression), then scfind allows you to do that by first creating a gene index and then very quickly searching the index:
geneIndex <- buildCellTypeIndex(sce)
p_values <- -log10(findCellType(geneIndex, c("SOX6", "SNAI3")))
barplot(p_values, ylab = "-log10(pval)", las = 2)
The calculation above shows that the queried gene list (SOX6 and SNAI3) is specific for the zygote cell type.
If you are more interested in finding out in which cells all the genes from your gene list are expressed, then you can build a cell index instead of a cell type index. The buildCellIndex function should be used for building the index and findCell for searching it:
geneIndex <- buildCellIndex(sce)
res <- findCell(geneIndex, c("SOX6", "SNAI3"))
res$common_exprs_cells
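To make the idea of a cell index concrete, here is a toy base R sketch. It is not scfind's implementation: `expressed`, `toy_index` and `common_cells` are hypothetical names, and the index is simply a list mapping each gene to the cells that express it. Cells expressing every gene in a query are then the intersection of the per-gene lists.

```r
# Hypothetical binary expression calls (TRUE = expressed): 2 genes x 6 cells
expressed <- rbind(
  GeneA = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  GeneB = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)
colnames(expressed) <- paste0("cell", 1:6)

# Toy index: for each gene, the names of the cells that express it
toy_index <- lapply(
  setNames(rownames(expressed), rownames(expressed)),
  function(gene) colnames(expressed)[expressed[gene, ]]
)

# Cells expressing every gene in the query are the intersection of the lists
query <- c("GeneA", "GeneB")
common_cells <- Reduce(intersect, toy_index[query])
common_cells
```

Building the per-gene lists once and intersecting them per query is what makes repeated searches cheap compared with rescanning the full expression matrix.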
Cell search reports the p-values corresponding to cell types as well:
barplot(-log10(res$p_values), ylab = "-log10(pval)", las = 2)
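One way to attach cell type p-values to a set of matching cells is an enrichment test. The base R sketch below uses a hypergeometric test on made-up data. The choice of test is an assumption for illustration and may differ from what scfind computes internally, and `cell_types`, `hit_cells` and `toy_p_values` are hypothetical names.

```r
# Hypothetical annotation for 6 cells and a made-up set of matching cells
cell_types <- c(cell1 = "zygote", cell2 = "zygote", cell3 = "zygote",
                cell4 = "2cell", cell5 = "2cell", cell6 = "4cell")
hit_cells <- c("cell1", "cell2")

# For each cell type, the probability of seeing at least the observed number
# of hits in that type if the hits were drawn at random from all cells
toy_p_values <- sapply(unique(cell_types), function(type) {
  n_type <- sum(cell_types == type)
  n_hits_in_type <- sum(cell_types[hit_cells] == type)
  phyper(n_hits_in_type - 1, n_type, length(cell_types) - n_type,
         length(hit_cells), lower.tail = FALSE)
})
toy_p_values
```

The cell type with the smallest p-value (here the toy zygote group) is the one in which the matching cells are most over-represented; plotting `-log10(toy_p_values)` gives the same kind of bar chart as above.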