all_times <- list() # store the time for each chunk knitr::knit_hooks$set(time_it = local({ now <- NULL function(before, options) { if (before) { now <<- Sys.time() } else { res <- difftime(Sys.time(), now, units = "secs") all_times[[options$label]] <<- res } } })) knitr::opts_chunk$set( tidy = TRUE, tidy.opts = list(width.cutoff = 95), message = FALSE, warning = FALSE, time_it = TRUE )
CellBender is software for elimination of technical artifacts in scRNA-seq/snRNA-seq that uses deep generative model for unsupervised removal of ambient RNA and chimeric PCR artifacts. You can find out more info about CellBender from the bioRxiv Preprint, GitHub Repo, and Documentation.
Following completion of CellBender scCustomize contains a couple of functions that may be helpful when creating and visualizing data in Seurat.
library(ggplot2) library(dplyr) library(magrittr) library(patchwork) library(viridis) library(Seurat) library(scCustomize) library(qs)
# Read in example Cell Bender Dual Assay Object. astrocytes_cortex <- qread("assets/astro_nuc_seq.qs")
The output from CellBender is an H5 file that is styled and can be read like 10X Genomics H5 file. In CellBender pre-v3 this file could be read by Seurat::Read10X_h5()
(although that function assumes that the file contains no name prefix). However, v3+ contains extra information that causes Read10X_h5
to fail.
scCustomize contains function Read_CellBender_h5_Mat()
can be used when reading in CellBender h5 files regardless of the version of CellBender used.
cell_bender_mat <- Read_CellBender_h5_Mat(file_name = "PATH/SampleA_out_filtered.h5")
If the input that CellBender uses is based on STARsolo or pre-V3 Cell Ranger data then some of the slot name which stores feature/gene ids in the H5 file is different. These files can be read by specifying the optional feature_slot_name
parameter.
cell_bender_starsolo_mat <- Read_CellBender_h5_Mat(file_name = "PATH/SampleA_out_filtered.h5", feature_slot_name = "genes")
If CellBender H5 file contains non-standard H5 group names then Read_CellBender_h5_Mat
will error. To circumvent this simply supply the name of the H5 group that contains the count data.
cell_bender_name_mat <- Read_CellBender_h5_Mat(file_name = "PATH/SampleA_out_filtered.h5", h5_group_name = "background_removed")
scCustomize also contains two wrapper functions to easily read multiple CellBender files stored either in single directory or in multiple sub-directories Read_CellBender_h5_Multi_File
and Read_CellBender_h5_Multi_Directory
.
Read_CellBender_h5_Multi_Directory()
.The following is typical output directory structure for Cell Bender files with sub-directory labeled with sample name and each file also prefixed with the sample name.
Parent_Directory ├── sample_01 │ └── sample_01_out_cell_barcodes.csv │ └── sample_01_out_filtered.h5 │ └── sample_01_out.h5 │ └── sample_01_out.log │ └── sample_01_out.pdf └── sample_02 │ └── sample_02_out_cell_barcodes.csv │ └── sample_02_out_filtered.h5 │ └── sample_02_out.h5 │ └── sample_02_out.log │ └── sample_02_out.pdf
All we have to do is adjust the parameters to account for cell bender file names and directory structure.
secondary_path
In this case we can leave as NULL
because samples are located in immediate subdirectory (secondary path must be the same for all samples).custom_name = "_out.h5"
This specifies what the common file suffix of all files are. Optional Parameters
parallel
and num_cores
to use multiple core processing.
sample_list
By default Read_CellBender_h5_Multi_Directory
will read in all sub-directories present in parent directory. However a subset can be specified by passing a vector of sample directory names.
sample_names
As with other functions by default Read_CellBender_h5_Multi_Directory
will use the sub-directory names within parent directory to name the output list entries. Alternate names for the list entries can be provided here if desired. These names will also be used to add cell prefixes if merge = TRUE
(see below).
merge
logical (default FALSE). Whether to combine all samples into single sparse matrix and using sample_names
to provide sample prefixes.
cell_bender_merged <- Read_CellBender_h5_Multi_Directory(base_path = "assets/Cell_Bender_Example/", custom_name = "_out.h5", sample_names = c("WT1", "WT2"), merge = TRUE)
Read_CellBender_h5_Multi_File()
.If all output files are in single directory you can use Read_CellBender_h5_Multi_File
to read in all of the files with single function.
Parent_Directory ├── CellBender_Outputs │ └── sample_01_out.h5 │ └── sample_02_out.h5
cell_bender_merged <- Read_CellBender_h5_Multi_File(data_dir = "assets/Cell_Bender_Example/", custom_name = "_out.h5", sample_names = c("WT1", "WT2"))
Creating a Seurat object from merged CellBender matrices then is identical to creating any other Seurat object.
cell_bender_seurat <- CreateSeuratObject(counts = cell_bender_merged, names.field = 1, names.delim = "_")
Sometimes it can be helpful to create object that contains both the cell ranger values and cell bender values (we'll come to why below). scCustomize contains a helper function Create_CellBender_Merged_Seurat()
to handle object creation in one quick step.
For this function we assume that we will use the cell calling algorithm of Cell Ranger with the modified counts for Cell Bender.
cell_bender_merged <- Read_CellBender_h5_Multi_Directory(base_path = "assets/Cell_Bender_Cell_Ranger_Data/Cell_Bender_Example/", custom_name = "_out.h5", sample_names = c("WT1", "WT2"), merge = TRUE) cell_ranger_merged <- Read10X_h5_Multi_Directory(base_path = "assets/Cell_Bender_Cell_Ranger_Data/Cell_Ranger_Example/", default_10X_path = FALSE, h5_filename = "filtered_feature_bc_matrix.h5", merge = TRUE, sample_names = c("WT1", "WT2"), parallel = TRUE, num_cores = 2)
To run the function the user simply needs to provide the names of the two matrices and a name for assay containing the Cell Ranger counts (by default this is named "RAW").
dual_seurat <- Create_CellBender_Merged_Seurat(raw_cell_bender_matrix = cell_bender_merged, raw_counts_matrix = cell_ranger_merged, raw_assay_name = "RAW")
Users can specify any additional parameters normally passed to Seurat::CreateSeuratObject()
when using this function.
dual_seurat <- Create_CellBender_Merged_Seurat(raw_cell_bender_matrix = cell_bender_merged, raw_counts_matrix = cell_ranger_merged, raw_assay_name = "RAW", min_cells = 5, min_features = 200)
It can be very important with tools like Cell Bender to analyze how much the process has effected data on a per cell basis.
scCustomize includes function Add_CellBender_Diff()
to help with this process. This function will take the nCount and nFeature statistics from both assays in the object and calculate the difference and return 2 new columns ("nCount_Diff" and "nFeature_Diff") to the object meta.data.
dual_seurat <- Add_CellBender_Diff(seurat_object = dual_seurat, raw_assay_name = "RAW", cell_bender_assay_name = "RNA") head(dual_seurat@meta.data, 5)
dual_seurat <- Add_CellBender_Diff(seurat_object = dual_seurat, raw_assay_name = "RAW", cell_bender_assay_name = "RNA") head(dual_seurat@meta.data, 5) %>% kableExtra::kbl() %>% kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))
We can then use Median_Stats()
to calculate per sample averages across all cells by supplying the new variables to the median_var
parameter.
median_stats <- Median_Stats(seurat_object = dual_seurat, group_by_var = "orig.ident", median_var = c("nCount_Diff", "nFeature_Diff"))
median_stats <- Median_Stats(seurat_object = dual_seurat, group_by_var = "orig.ident", median_var = c("nCount_Diff", "nFeature_Diff")) median_stats %>% kableExtra::kbl() %>% kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))
It can also be helpful to understand what features may have changed the most. scCustomize provides the function CellBender_Feature_Diff
to determine changes in features. This will return a data.frame with rowSums for each feature, the difference in each feature, and the percent change of each feature.
feature_diff <- CellBender_Feature_Diff(seurat_object = dual_seurat, raw_assay = "RAW", cell_bender_assay = "RNA")
feature_diff <- CellBender_Feature_Diff(seurat_object = dual_seurat, raw_assay = "RAW", cell_bender_assay = "RNA") head(feature_diff, 5) %>% kableExtra::kbl() %>% kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))
In addition to returning the data.frame it can be useful to visually examine the changes/trends after running CellBender. The function CellBender_Diff_Plot
takes the data.frame from CellBender_Feature_Diff
as input and plots the results.
CellBender_Diff_Plot(feature_diff_df = feature_diff)
Optional Parameters
pct_diff_threshold
plot genes that exhibit a change equal to or greater than this threshold. num_features
instead of plotting genes above a threshold simply plot the top X changed genes. num_labels
change how many genes are labeled. label
logical, whether or not to label features. custom_labels
specify vector of specific features to label. p1 <- CellBender_Diff_Plot(feature_diff_df = feature_diff) p2 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, pct_diff_threshold = 50) p3 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, num_features = 500, pct_diff_threshold = NULL) p4 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, num_labels = 10) p5 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, label = F) p6 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, custom_labels = "Gm48099") wrap_plots(p1, p2, p3, p4, p5, p6, ncol = 2)
p1 <- CellBender_Diff_Plot(feature_diff_df = feature_diff) p2 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, pct_diff_threshold = 50) p3 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, num_features = 500, pct_diff_threshold = NULL) p4 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, num_labels = 10) p5 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, label = F) p6 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, custom_labels = "Gm48099") wrap_plots(p1, p2, p3, p4, p5, p6, ncol = 2)
For Cell Bender especially, but also potentially for other assays as well, it can be helpful during analysis to plot the corrected and uncorrected counts for given feature. scCustomize contains function FeaturePlot_DualAssay()
to make easy.
Users just need to supply the names of the two assays to plot and the features. NOTE: Make sure both assays have been normalized before plotting. The function will attempt to check and make sure both assays have been normalized but has not been tested in all scenarios.
For this example I'm using unpublished single nucleus RNA dataset from mouse cortex and have subsetted the astrocytes.
First let's plot gene that represents ambient RNA as it's restricted in expression to neurons (synaptic gene). If Cell Bender has worked well we expect that expression of this gene will be very different between the two assays.
FeaturePlot_DualAssay(seurat_object = astrocytes_cortex, features = "Syt1", assay1 = "RNA", assay2 = "RAW")
Now let's plot normally astrocyte restricted gene. If Cell Bender has worked well we expect that expression of this gene shouldn't be very different between the two assays.
FeaturePlot_DualAssay(seurat_object = astrocytes_cortex, features = "Gja1", assay1 = "RNA", assay2 = "RAW")
FeaturePlot_DualAssay
also contains optional parameter (colors_use_assay2
) that allows for returning plots with different color palettes for each assay.
assay2_pal <- turbo(n = 20) FeaturePlot_DualAssay(seurat_object = astrocytes_cortex, features = "Gja1", assay1 = "RNA", assay2 = "RAW", colors_use_assay2 = assay2_pal)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.