In jasonratcliff/aggregateBioVar: Differential Gene Expression Analysis for Multi-subject scRNA-seq

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "75%"
)

# Bioconductor vignette links
link_sce <-
    paste0(
        "https://bioconductor.org/packages/release/bioc/",
        "vignettes/SingleCellExperiment/inst/doc/intro.html"
    )
link_se <-
    paste0(
        "https://bioconductor.org/packages/release/bioc/",
        "vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html"
    )
link_DESeq2 <-
    paste0(
        "https://bioconductor.org/packages/devel/bioc/",
        "vignettes/DESeq2/inst/doc/DESeq2.html"
    )

aggregateBioVar

Single cell RNA sequencing (scRNA-seq) studies allow gene expression quantification at the level of individual cells. These studies introduce multiple layers of biological complexity, including variations in gene expression between cell states within a sample (e.g., T cells versus macrophages), between samples within a population (e.g., biological or technical replicates), and between populations (e.g., healthy versus diseased individuals). Many early scRNA-seq studies involved analysis of gene expression within cells from a single sample. For single cell RNA-seq data collected from more than one subject, aggregateBioVar provides tools to summarize summarize single cell gene expression profiles at the level of samples (i.e., subjects) or populations. Given an input SingleCellExperiment object [@SingleCellExperiment2020] with pre-defined cell states, aggregateBioVar() stratifies data as a list of SummarizedExperiment objects [@R-SummarizedExperiment]. For each cell type, gene counts are aggregated by subject into a gene-by-subject count matrix, and column metadata are summarized to retain inter-subject variation for downstream analysis with bulk RNA-seq tools.

Installation

Install the development version of aggregateBioVar from GitHub with:

# install.packages("devtools")
devtools::install_github("jasonratcliff/aggregateBioVar", build_vignettes=TRUE)

Multi-subject scRNA-seq

library(aggregateBioVar)

# Bioconductor Packages
library(SummarizedExperiment, quietly = TRUE)
library(SingleCellExperiment, quietly = TRUE)
library(DESeq2, quietly = TRUE)

# Data analysis and visualization
library(dplyr, quietly = TRUE)
library(magrittr, quietly = TRUE)
library(ggplot2, quietly = TRUE)
library(ggtext, quietly = TRUE)

sce_samples <- sort(unique(small_airway$orig.ident))
sce_genotypes <- list(
    "Wildtype" = grep("WT", sce_samples, value = TRUE),
    "CFTRKO" = grep("CF", sce_samples, value = TRUE)
)

To illustrate the utility of biological replication for scRNA-seq sequencing experiments, consider a SingleCellExperiment object with scRNA-seq data from 7 subjects (r sce_samples) in the context of a cystic fibrosis phenotype. Samples were collected from small airway epithelium of newborn Sus scrofa with genotypes from wild type (CFTR+/+, n=r length(sce_genotypes$Wildtype)) and CFTR-knockout (CFTR-/-, n=r length(sce_genotypes$CFTRKO)) individuals. Note the dimensions of this object, with r nrow(small_airway) genes from r ncol(small_airway) cells:

small_airway

The primary function aggregateBioVar() takes a SingleCellExperiment object with column metadata variables indicating subject identity (e.g., biological sample; subjectVar) and assigned cell type (cellVar). The column metadata of a SingleCellExperiment object can be obtained by SummarizedExperiment::colData(). Here, the metadata variable orig.ident indicates the biological sample identifier and celltype the inferred cell type.

# Perform aggregation of counts and metadata by subject and cell type.
aggregate_counts <-
    aggregateBioVar(
        scExp = small_airway,
        subjectVar = "orig.ident", cellVar = "celltype"
    )

Each element of the returned list contains a SummarizedExperiment object with aggregated counts from cells in the assigned cell type (indicated by cellVar).

aggregate_counts

For each cell type subset, within-subject gene counts are aggregated and column metadata are summarized to exclude variables with intercellular variation. This effectively retains subject metadata and can be used for downstream analysis with bulk RNA-seq tools. After aggregation, the number of columns in the SingleCellExperiment object matches the number of unique values in the subject metadata variable indicated by subjectVar.

assay(aggregate_counts$`Immune cell`, "counts")

colData(aggregate_counts$`Immune cell`)

Differential Gene Expression

The aggregate gene-by-subject matrix and subject metadata can be used as inputs for bulk RNA-seq tools to investigate gene expression. Here, an example is provided using DESeq2 [@DESeq2]. A DESeqDataSet can be constructed from the aggregate gene-by-subject count matrix and summarized column metadata.

subj_dds_dataset <-
    DESeqDataSetFromMatrix(
        countData = assay(aggregate_counts$`Secretory cell`, "counts"),
        colData = colData(aggregate_counts$`Secretory cell`),
        design = ~ Genotype
    )

subj_dds <- DESeq(subj_dds_dataset)

subj_dds_results <-
    results(subj_dds, contrast = c("Genotype", "WT", "CFTRKO"))

Add negative log~10~ adjusted P-values, then plot against log~2~ fold change. Genes with adjusted P-values < 0.05 and fold-change absolute values > 1.0 are highlighted in red and labeled by feature.

subj_dds_transf <- as.data.frame(subj_dds_results) %>%
    bind_cols(feature = rownames(subj_dds_results)) %>%
    mutate(log_padj = - log(.data$padj, base = 10))

ggplot(data = subj_dds_transf) +
    geom_point(aes(x = log2FoldChange, y = log_padj), na.rm = TRUE) +
    geom_point(
        data = filter(
            .data = subj_dds_transf,
            abs(.data$log2FoldChange) > 1, .data$padj < 0.05
        ),
        aes(x = log2FoldChange, y = log_padj), color = "red"
    ) +
    geom_label(
        data = filter(
            .data = subj_dds_transf,
            abs(.data$log2FoldChange) > 1, .data$padj < 0.05
        ),
        aes(x = log2FoldChange, y = log_padj + 0.4, label = feature)
    ) +
    theme_classic() +
    labs(
        x = "log<sub>2</sub> (fold change)",
        y = "-log<sub>10</sub> (p<sub>adj</sub>)"
    ) +
    theme(
        axis.title.x = element_markdown(),
        axis.title.y = element_markdown())

Vignettes

For a detailed workflow and description of package components, see the package vignette:

vignette("multi-subject-scRNA-seq", package = "aggregateBioVar")

References

jasonratcliff/aggregateBioVar documentation built on Sept. 29, 2020, 11:53 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jasonratcliff/aggregateBioVar
Differential Gene Expression Analysis for Multi-subject scRNA-seq

In jasonratcliff/aggregateBioVar: Differential Gene Expression Analysis for Multi-subject scRNA-seq

aggregateBioVar

Installation

Multi-subject scRNA-seq

Differential Gene Expression

Vignettes

References

R Package Documentation

Browse R Packages

We want your feedback!

jasonratcliff/aggregateBioVar Differential Gene Expression Analysis for Multi-subject scRNA-seq

In jasonratcliff/aggregateBioVar: Differential Gene Expression Analysis for Multi-subject scRNA-seq

aggregateBioVar

Installation

Multi-subject scRNA-seq

Differential Gene Expression

Vignettes

References

R Package Documentation

Browse R Packages

We want your feedback!

jasonratcliff/aggregateBioVar
Differential Gene Expression Analysis for Multi-subject scRNA-seq