knitr::opts_chunk$set( collapse = FALSE, comment = "##", highlight = TRUE, prompt = FALSE, results = "markup" )
This vignette provides a basic example of using Bisque to decompose bulk expression. Bisque offers two modes of operation: Reference-based and Marker-based decomposition. We will provide brief examples of both.
library(Biobase) library(BisqueRNA)
Bisque requires expression data in the ExpressionSet format from the Biobase package.
Bulk RNA-seq data can be converted from a matrix (columns are samples, rows are genes) to an ExpressionSet as follows:
bulk.eset <- Biobase::ExpressionSet(assayData = bulk.matrix)
Single-cell data requires additional information in the ExpressionSet, specificially cell type labels and individual labels. Individual labels indicate which individual each cell originated from. To add this information, Biobase requires it to be stored in a data frame format. Assuming we have character vectors of cell type labels (cell.type.labels
) and individual labels (individual.labels
), we can convert scRNA-seq data (with counts also in matrix format) as follows:
sample.ids <- colnames(sc.counts.matrix) # individual.ids and cell.types should be in the same order as in sample.ids sc.pheno <- data.frame(check.names=F, check.rows=F, stringsAsFactors=F, row.names=sample.ids, SubjectName=individual.labels, cellType=cell.type.labels) sc.meta <- data.frame(labelDescription=c("SubjectName", "cellType"), row.names=c("SubjectName", "cellType")) sc.pdata <- new("AnnotatedDataFrame", data=sc.pheno, varMetadata=sc.meta) sc.eset <- Biobase::ExpressionSet(assayData=sc.counts.matrix, phenoData=sc.pdata)
If your single-cell data (from 10x platform) is in a Seurat object with cell type assignments, Bisque includes a function that will automatically convert this object to an ExpressionSet:
sc.eset <- BisqueRNA::SeuratToExpressionSet(seurat.obj, delimiter="-", position=2, version="v3")
The delimiter and position arguments describe the barcode format of 10x single-cell data. For example, barcodes of "ATCGATCG-1" and "ATGCAAGT-2" have the individual ID in position 2 after splitting by the delimiter '-'.
set.seed(42) cell.types <- c("Neurons", "Astrocytes", "Oligodendrocytes", "Microglia", "Endothelial Cells") avg.props <- c(.5, .2, .2, .07, .03) expr.data <- BisqueRNA::SimulateData(n.ind=2, n.genes=10, n.cells=10, cell.types=cell.types, avg.props=avg.props) sc.eset <- expr.data$sc.eset bulk.eset <- expr.data$bulk.eset
Here is an example of input single-cell and bulk data for 2 individuals with 10 cells sequenced each:
sampleNames(sc.eset) sc.eset$SubjectName sc.eset$cellType sampleNames(bulk.eset)
Note that if you have samples with both single-cell and bulk RNA-seq data, their IDs should be found in both sc.eset$SubjectName
and sampleNames(bulk.eset)
.
We will use data simulated under a simple model (code for SimulateData() can be found in R/simulation.R). We simulate single-cell and bulk RNA-seq counts for 10 individuals. We remove 5 individuals from the single-cell data. We will estimate the cell composition for these 5 individuals.
cell.types <- c("Neurons", "Astrocytes", "Oligodendrocytes", "Microglia", "Endothelial Cells") avg.props <- c(.5, .2, .2, .07, .03) sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=500, cell.types=cell.types, avg.props=avg.props) sc.eset <- sim.data$sc.eset[,sim.data$sc.eset$SubjectName %in% as.character(6:10)] bulk.eset <- sim.data$bulk.eset true.props <- sim.data$props markers <- sim.data$markers
rm(sim.data)
By default, Bisque uses all genes for decomposition. However, you may supply a list of genes (such as marker genes) to be used with the markers
parameter. Also, since we have samples with both bulk and single-cell RNA-seq data, we set the use.overlap
parameter to TRUE
. If there are no overlapping samples, you can set this parameter to FALSE
(we expect performance to be better if overlapping samples are available).
Here's how to call the reference-based decomposition method:
res <- BisqueRNA::ReferenceBasedDecomposition(bulk.eset, sc.eset, markers=NULL, use.overlap=TRUE)
A list is returned with decomposition estimates in slot bulk.props
.
ref.based.estimates <- res$bulk.props knitr::kable(ref.based.estimates, digits=2)
Just to make sure this worked, we can correlate all the estimates with the true proportions.
r <- cor(as.vector(ref.based.estimates), as.vector(true.props[row.names(ref.based.estimates),colnames(ref.based.estimates)])) knitr::knit_print(sprintf("R: %f", r))
BisqueMarker can provide estimates of relative cell type abundances using only known marker genes when a reference profile is not available. Marker genes are stored in a data frame with columns that specify gene, cluster that the gene is a marker for, and an optional column for weights (typically fold-change). Here's what this data frame might look like:
marker.data.frame <- data.frame(gene=paste("Gene", 1:6), cluster=c("Neurons", "Neurons", "Astrocytes", "Oligodendrocytes", "Microglia", "Endothelial Cells"), avg_logFC=c(0.82, 0.59, 0.68, 0.66, 0.71, 0.62)) knitr::kable(marker.data.frame)
Here's how to call the marker-based decomposition method:
res <- BisqueRNA::MarkerBasedDecomposition(bulk.eset, markers, weighted=F)
A list is returned with decomposition estimates in slot bulk.props
.
marker.based.estimates <- res$bulk.props knitr::kable(marker.based.estimates, digits = 2)
Note that these estimates are relative within each cell type, so you cannot immediately compare abundance estimates between cell types.
Just to make sure this worked, we can correlate these estimates with the scaled true proportions.
scaled.true.props <- t(scale(t(true.props)))[rownames(marker.based.estimates),] r <- cor(as.vector(marker.based.estimates), as.vector(scaled.true.props)) knitr::knit_print(sprintf("R: %f", r))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.