In samuel-marsh/scCustomize: Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing

all_times <- list()  # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      now <<- Sys.time()
    } else {
      res <- difftime(Sys.time(), now, units = "secs")
      all_times[[options$label]] <<- res
    }
  }
}))
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE,
  time_it = TRUE
)

Plotting Sequencing/Alignment Metrics

In addition to other QC metrics previously discussed it can also be helpful to plot the metrics generated during alignment of sequencing data to see if there are differences that might arise as result of sequencing differences as opposed to biological differences.

For this tutorial, I will be utilizing de-identified example data from my own experience but I will walk through process for using your own data.

library(ggplot2)
library(dplyr)
library(magrittr)
library(Seurat)
library(scCustomize)
library(qs)

raw_metrics <- qread("assets/seq_metrics_dataframe_RAW.qs")
meta_metrics <- qread("assets/seq_metrics_dataframe_meta.qs")

Loading Data

Before we can plot these metrics we first need to import raw data.

Accepted Data Types

These functions currently only work with data from 10X Genomics processed via Cell Ranger. If there is interest in adapting to other platforms I'd be happy to entertain PRs.

Data Location

Inside the output folder from Cell Ranger count is a file called "metrics_summary" which contains the metrics normally displayed in the html summary. See abbreviated example below.

Parent_Directory
├── sample_01
│   └── outs
│       └── filtered_feature_bc_matrix/
│       └── raw_feature_bc_matrix/
│       └── web_summary.html
│       └── metrics_summary.csv
└── sample_02
│   └── outs
│       └── filtered_feature_bc_matrix/
│       └── raw_feature_bc_matrix/
│       └── web_summary.html
│       └── metrics_summary.csv
└── sample_03

NOTE: If the path to these files on your system is different that the default 10X path be sure to specify the secondary_path param and set default_10X = FALSE when importing data.

Support for Cell Ranger `multi` pipeline

Read_Metrics_10X includes support for data processed using Cell Ranger multi pipeline by setting the parameter cellranger_multi = TRUE. The function will automatically alter it's default secondary path value and will return both gene expression and TCR metrics as separate data.frames.

Read Data

We can read all metrics files within a given parent directory using the Read_Metrics_10X function.

raw_metrics <- Read_Metrics_10X(base_path = "path/to/parent/directory/", default_10X = TRUE)

head(raw_metrics)

The result of this function will be a data.frame with one row per library.

head(raw_metrics) %>%
  kableExtra::kbl() %>%
  kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))

Optional parameters

Similarly to the scCustomize functions for reading raw count matrices (See Read & Write Data Functions Vignette) there are a few optional parameters for Read_Metrics_10X.

secondary_path/default_10X tells the function how to navigate from the parent directory (base_path) to the "metrics_summary.csv" files.
- If the directory specified is directly from Cell Ranger output or has an identical structure (see above; parent/library_name/outs/files) then user can specify default_10X = TRUE and function will find required files.
- If structure is different then user must supply secondary_path.
- If all files are in one directory simply supply secondary_path = "".
lib_list if only a subset of libraries are desired then user can supply the names of those libraries (sub-directories) to the lib_list parameter to limit which files are read in. Default is NULL and will read all samples from parent directory.
lib_names How to name files in output data.frame. Default will use sub-directory names to label the samples. Alternatively can provide a vector a sample/library names to use instead.

Add Meta Data

In order to make the resulting plots more meaningful we need to added corresponding sample level meta data.
NOTE: Make sure that one column in your meta data data.frame matches the "sample_id" column to metrics data.frame so that they are properly joined.

head(meta_metrics)

head(meta_metrics) %>%
  kableExtra::kbl() %>%
  kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))

Now we join the two data.frames before plotting.

metrics_final <- right_join(raw_metrics, meta_metrics)

Plotting

In total there are 15 Sequencing QC plotting functions. See Reference Manual for full details.

However, to simplify plotting there are also two wrapper functions which will return patchwork layouts of several of these functions in a single plot.

Basic Metrics Summary

A set of basic metrics can be plotted using Seq_QC_Plot_Basic_Combined

Seq_QC_Plot_Basic_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch")

Alignment Metrics Summary

A set of alignment-based metrics can be plotted using Seq_QC_Plot_Alignment_Combined

Seq_QC_Plot_Alignment_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch")

Basic Statistics

All of the Seq_QC_Plot_... functions also contain parameter to calculate significance between all groups using ggpubr::stat_compare_means().
NOTE: See function help pages and ggpubr for more info to make sure the right test is being used for your data.

Seq_QC_Plot_Basic_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch", significance = T)

Barcode Rank Plots

scCustomize can also be used to create customized barcode rank plots from raw feature barcode data using outputs from DropletUtils package. scCustomize provides the function Iterate_Barcode_Rank_Plot which reads data, processes using DropletUtils, and returns plots.

The plots are adaptation of the plots from DropletUtils package moved to ggplot2 framework with some additional customization and the ability to return rastered plots.

Iterate_Barcode_Rank_Plot(dir_path_h5 = "assets/Barcode_Rank_Example/", plateau = 20000, file_path = "plots/", file_name = "Barcode_Rank_Plots.pdf")

Iterate_Barcode_Rank_Plot(dir_path_h5 = "assets/Barcode_Rank_Example/", plateau = 20000, file_path = "~/Downloads/", file_name = "Barcode_Rank_Plots.pdf", )