knitr::opts_chunk$set( echo = TRUE, warning = FALSE, message = FALSE, error = FALSE, tidy = FALSE, dev = c("png"), cache = TRUE )
Since read counts are summed across cells in a pseudobulk approach, modeling continuous cell-level covariates also requires a collapsing step. Here we summarize the values of a variable from a set of cells using the mean, and store the value for each cell type. Including these variables in a regression formula uses the summarized values from the corresponding cell type.
We demonstrate this feature on a lightly modified analysis of PBMCs from 8 individuals stimulated with interferon-β (Kang, et al, 2018, Nature Biotech).
Here is the code from the main vignette:
library(dreamlet) library(muscat) library(ExperimentHub) library(scater) # Download data, specifying EH2259 for the Kang, et al study eh <- ExperimentHub() sce <- eh[["EH2259"]] # only keep singlet cells with sufficient reads sce <- sce[rowSums(counts(sce) > 0) > 0, ] sce <- sce[, colData(sce)$multiplets == "singlet"] # compute QC metrics qc <- perCellQCMetrics(sce) # remove cells with few or many detected genes ol <- isOutlier(metric = qc$detected, nmads = 2, log = TRUE) sce <- sce[, !ol] # set variable indicating stimulated (stim) or control (ctrl) sce$StimStatus <- sce$stim
In many datasets, continuous cell-level variables could be mapped reads, gene count, mitochondrial rate, etc. There are no continuous cell-level variables in this dataset, so we can simulate two from a normal distribution:
sce$value1 <- rnorm(ncol(sce)) sce$value2 <- rnorm(ncol(sce))
Now compute the pseudobulk using standard code:
sce$id <- paste0(sce$StimStatus, sce$ind) # Create pseudobulk pb <- aggregateToPseudoBulk(sce, assay = "counts", cluster_id = "cell", sample_id = "id", verbose = FALSE )
The means per variable, cell type, and sample are stored in the pseudobulk SingleCellExperiment
object:
metadata(pb)$aggr_means
Including these variables in a regression formula uses the summarized values from the corresponding cell type. This happens behind the scenes, so the user doesn't need to distinguish bewteen sample-level variables stored in colData(pb)
and cell-level variables stored in metadata(pb)$aggr_means
.
Variance partition and hypothesis testing proceeds as ususal:
form <- ~ StimStatus + value1 + value2 # Normalize and apply voom/voomWithDreamWeights res.proc <- processAssays(pb, form, min.count = 5) # run variance partitioning analysis vp.lst <- fitVarPart(res.proc, form) # Summarize variance fractions genome-wide for each cell type plotVarPart(vp.lst, label.angle = 60) # Differential expression analysis within each assay res.dl <- dreamlet(res.proc, form) # dreamlet results include coefficients for value1 and value2 res.dl
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.