knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitr = citation("knitr")[1],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    DeconvoBuddies = citation("DeconvoBuddies")[1]
)

Introduction

What is Deconvolution?

Inferring the composition of different cell types in a bulk RNA-seq data

Deconvolution is a analysis that aims to calculate the proportion of different cell types that make up a sample of bulk RNA-seq, based off of cell type gene expression profiles in a single cell/nuclei RNA-seq dataset.

![Use single cell data to infer the composition of bulk RNA-seq samples](http://research.libd.org/DeconvoBuddies/reference/figures/Deconvolution.png)

Deconvolution Methods

There are 20+ published reference based deconvolution methods. Below are a selection of 6 methods we tested in our deconvolution benchmark pre-print.

| Approach | Method | Citation | Availability | |-------------------|-----------------|-----------------|---------------------| | weighted least squares | DWLS | Tsoucas et al, Nature Comm, 2019 | R Package Cran | | Bias correction: Assay | Bisque | Jew et al, Nature Comm, 2020 | R Package github | | Bias correction: Source | MuSiC | Wang et al, Nature Communications, 2019 | R Package github | | Machine Learning | CIBERSORTx | Newman et al., Nature BioTech, 2019 | Webtool | | Bayesian | BayesPrism | Chu et al., Nature Cancer, 2022 | Webtool/R Package | | linear | Hspe | Hunt et al., Ann. Appl. Stat, 2021 | R package github |

Goals of this Vignette

We will be demonstrating how to use DeconvoBuddies tools when applying deconvolution with the Bisque package.

  1. Install and load required packages
  2. Download DLPFC RNA-seq data, and reference snRNA-seq data
  3. Find marker genes with DeconvoBuddies tools
  4. Run deconvolution with BisqueRNA
  5. Explore deconvolution output and create composition plots with DeconvoBuddies tools
  6. Check proportion against RNAScope estimated proportions

Video Tutorial

Linked is a video from a presentation of an earlier version of this tutorial from our LIBD Rstats club.

Basics

1. Install and load required packages

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. r Biocpkg("DeconvoBuddies") is a R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN after which you can install r Biocpkg("DeconvoBuddies") by using the following commands in your R session:

Install DeconvoBuddies

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("DeconvoBuddies")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

Load Other Packages

library("spatialLIBD")
library("DeconvoBuddies")
library("SummarizedExperiment")
library("SingleCellExperiment")
library("BisqueRNA")
library("Biobase")
library("dplyr")
library("tidyr")
library("tibble")
library("ggplot2")

2. Download DLPFC RNA-seq data, and reference snRNA-seq data.

Bulk RNA-seq data

Access the 110 sample Human DLPFC bulk RNA-seq dataset for LIBD. These samples\ are from 19 tissue blocks, and 10 neurotypical adult donors. Samples were sequenced with two different library_types (polyA and RiboZeroGold), and three different RNA_extraction (Cyto, Total, Nuc), post quality control n=110 samples.

## use fetch deconvo data to load rse_gene
rse_gene <- fetch_deconvo_data("rse_gene")
rse_gene
# lobstr::obj_size(rse_gene)
# 41.16 MB

## Use Symbol as rownames

rownames(rse_gene) <- rowData(rse_gene)$Symbol

## bulk RNA seq samples were sequenced with different library types, and RNA extractions
table(rse_gene$library_type, rse_gene$library_prep)

Reference snRNA-seq data

This data is paired with a single nucleus RNA-seq data set from spatialLIBD. This dataset can be accessed with spatialLIBD::fetch_data().

## Use spatialLIBD to fetch the snRNA-seq dataset
sce_path_zip <- fetch_deconvo_data("sce")

## unzip and load the data
sce_path <- unzip(sce_path_zip, exdir = tempdir())
sce <- HDF5Array::loadHDF5SummarizedExperiment(
    file.path(tempdir(), "sce_DLPFC_annotated")
)

# lobstr::obj_size(sce)
# 172.28 MB

## exclude Ambiguous cell type
sce <- sce[, sce$cellType_broad_hc != "Ambiguous"]
sce$cellType_broad_hc <- droplevels(sce$cellType_broad_hc)

dim(sce)
## Check the broad cell type distribution
table(sce$cellType_broad_hc)

## We're going to subset to the first 5k genes to save memory
sce <- sce[seq_len(5000), ]

Orthogonal Cell Type Proportion from RNAScope/IF

An alternative method for calculating cell type proportions is through imaging a slice of tissue with cell type probes (RNAScope/IF). Then analyse the image to count cells and annotate them by cell type (we used HALO from Indica Labs)

The cell type proportions from the RNAScope/IF experiment will be used to evaluate the accuracy of cell type proportion estimates.

![RNAScope/IF measures the cell type proportions through imaging](http://research.libd.org/DeconvoBuddies/reference/figures/Deconvolution_compare_proportions.png) The RNAScope/IF proportion data is stored as a `data.frame` object in `DeconvoBuddies::RNAScope_prop`. Key columns in `RNAScope_prop`: - `SAMPLE_ID`: DLPFC Tissue block + RNAScope combination. - `Sample` : DLFPC Tissue block (Donor BrNum + DLPFC position). - `cell_type` : The cell type measured. - `n_cell` : the number of cells counted for the Sample and cell type. - `prop` : the calculated cell type proportion from n_cell wzxhzdk:6 # 3. Select Marker Genes Marker genes are genes with high expression in one cell type and low expression in other cell types, or "cell-type specific" expression. These genes can be used to learn more about the identity and function of cell types, but here we are interested in using a sets of cell type specific marker genes to reduce noise in deconvolution and increase accuracy. We have developed a method for finding marker genes called the "Mean Ratio". We calculate the `MeanRatio` for a target cell type for each gene by dividing the mean expression of the target cell by the mean expression of the next highest non-target cell type. Genes with the highest `MeanRatio` values are selected as marker genes. For a tutorial on marker gene selection check out [Vignette: Finding Marker Genes with Deconvo Buddies](https://research.libd.org/DeconvoBuddies/articles/Marker_Finding.html).

![Mean Ratio calculation process compared to 1vALL Marker Gene selection](http://research.libd.org/DeconvoBuddies/reference/figures/get_mean_ratio.png)

## Use `get_mean_ratio` to find marker genes. The function `DeconvoBuddies::get_mean_ratio` calculates the `Mean Ratio` and the rank of genes for a specified cell type annotation in an `SingleCellExperiment` object. wzxhzdk:7 ## Plot the top marker genes Use `DeconvoBuddies` plotting tools to quickly plot the gene expression of the top 4 Excitatory neuron marker genes across the `cellType_broad_hc` cell type annotations. wzxhzdk:8 Looks nice and cell type specific! ### Create a List of Marker Genes With the `Mean Ratio` calculated, we will select the top 25 highest Mean Ratio genes for each cell type, that also exists in the bulk data `rse_gene`. wzxhzdk:9 # 4. Prep Data and Run Bisque [*Bisque*](https://github.com/cozygene/bisque) is an R package for cell type deconvolution. In our deconvolution benchmark, we found it was a top preforming method. Below we will briefly show how to run Bisque's "reference based decomposition (deconvolution)". ## prep data To run `Bisque` the snRNA-seq and bulk data must first be converted to `ExpressionSet` format. We will subset our data to our selected MeanRatio marker genes. The snRNA-seq data must also be filtered for cells with no counts across marker genes. wzxhzdk:10 ## Run Bisque `Bisque` needs the bulk and single cell `ExpressionSet` we prepared above, plus columns in the single cell data that specify the cell type annotation to use `cellType_broad_hc` and donor id (`BrNum` in this data). wzxhzdk:11 ## Explore Output Bisque predicts the proportion of the cell types in `cellType_broad_hc` for each sample in the bulk data. wzxhzdk:12 # 5. Explore deconvolution output and create composition plots with `DeconvoBuddies` tools To visualize the cell type proportion predictions, we can plot cell type composition bar plots with \`DeconvoBuddies::plot_composition_bar\`, either the prediction for each sample, or the average proportion over a group of samples. wzxhzdk:13 # 6. Check proportion against RNAScope estimated proportions Note to compare the deconvolution results to the RNAScope proportions, Oligo and OPC need to be added together. wzxhzdk:14 Calculate the correlation plot a scatter plot of the proportions wzxhzdk:15 # 7. How to run deconvolution with `hspe` *hspe* (formerly *dtangle*) is another R package for deconvolution. It was also a top performing method in our deconvolution benchmark. `hspe` is [downloadable](https://gjhunt.github.io/hspe/) as a tar.gz but can't be shown on this vignettes Below we show some example code to prep input and run `hspe` (not run here): wzxhzdk:16 # Conclusion In this vignette we have demonstrated some of the functions and data in the `DeconvoBuddies` package, and how to use them in a deconvolution workflow to predict cell type proportions. # Reproducibility The `r Biocpkg("DeconvoBuddies")` package `r Citep(bib[["DeconvoBuddies"]])` was made possible thanks to: - R `r Citep(bib[["R"]])` - `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])` - `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` - `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])` - `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` - `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])` - `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])` This package was developed using `r BiocStyle::Biocpkg("biocthis")`. Code for creating the vignette wzxhzdk:17 Date the vignette was generated. wzxhzdk:18 Wallclock time spent generating the vignette. wzxhzdk:19 `R` session information. wzxhzdk:20 # Bibliography This vignette was generated using `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the scenes. Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`. wzxhzdk:21

LieberInstitute/DeconvoBuddies documentation built on Aug. 16, 2024, 3:37 a.m.