all_times <- list() # store the time for each chunk knitr::knit_hooks$set(time_it = local({ now <- NULL function(before, options) { if (before) { now <<- Sys.time() } else { res <- difftime(Sys.time(), now, units = "secs") all_times[[options$label]] <<- res } } })) knitr::opts_chunk$set( tidy = TRUE, tidy.opts = list(width.cutoff = 95), message = FALSE, warning = FALSE, time_it = TRUE )
In addition to other QC metrics previously discussed it can also be helpful to plot the metrics generated during alignment of sequencing data to see if there are differences that might arise as result of sequencing differences as opposed to biological differences.
For this tutorial, I will be utilizing de-identified example data from my own experience but I will walk through process for using your own data.
library(ggplot2) library(dplyr) library(magrittr) library(Seurat) library(scCustomize) library(qs)
raw_metrics <- qread("assets/seq_metrics_dataframe_RAW.qs") meta_metrics <- qread("assets/seq_metrics_dataframe_meta.qs")
Before we can plot these metrics we first need to import raw data.
These functions currently only work with data from 10X Genomics processed via Cell Ranger. If there is interest in adapting to other platforms I'd be happy to entertain PRs.
Inside the output folder from Cell Ranger count
is a file called "metrics_summary" which contains the metrics normally displayed in the html summary. See abbreviated example below.
Parent_Directory ├── sample_01 │ └── outs │ └── filtered_feature_bc_matrix/ │ └── raw_feature_bc_matrix/ │ └── web_summary.html │ └── metrics_summary.csv └── sample_02 │ └── outs │ └── filtered_feature_bc_matrix/ │ └── raw_feature_bc_matrix/ │ └── web_summary.html │ └── metrics_summary.csv └── sample_03
NOTE: If the path to these files on your system is different that the default 10X path be sure to specify the secondary_path
param and set default_10X = FALSE
when importing data.
multi
pipelineRead_Metrics_10X
includes support for data processed using Cell Ranger multi
pipeline by setting the parameter cellranger_multi = TRUE
. The function will automatically alter it's default secondary path value and will return both gene expression and TCR metrics as separate data.frames.
We can read all metrics files within a given parent directory using the Read_Metrics_10X
function.
raw_metrics <- Read_Metrics_10X(base_path = "path/to/parent/directory/", default_10X = TRUE) head(raw_metrics)
The result of this function will be a data.frame with one row per library.
head(raw_metrics) %>% kableExtra::kbl() %>% kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))
Similarly to the scCustomize functions for reading raw count matrices (See Read & Write Data Functions Vignette) there are a few optional parameters for Read_Metrics_10X
.
secondary_path
/default_10X
tells the function how to navigate from the parent directory (base_path
) to the "metrics_summary.csv" files. default_10X = TRUE
and function will find required files.secondary_path
. secondary_path = ""
. lib_list
if only a subset of libraries are desired then user can supply the names of those libraries (sub-directories) to the lib_list
parameter to limit which files are read in. Default is NULL and will read all samples from parent directory.lib_names
How to name files in output data.frame. Default will use sub-directory names to label the samples. Alternatively can provide a vector a sample/library names to use instead.In order to make the resulting plots more meaningful we need to added corresponding sample level meta data.
NOTE: Make sure that one column in your meta data data.frame matches the "sample_id" column to metrics data.frame so that they are properly joined.
head(meta_metrics)
head(meta_metrics) %>% kableExtra::kbl() %>% kableExtra::kable_styling(bootstrap_options = c("bordered", "condensed", "responsive", "striped"))
Now we join the two data.frames before plotting.
metrics_final <- right_join(raw_metrics, meta_metrics)
In total there are 15 Sequencing QC plotting functions. See Reference Manual for full details.
However, to simplify plotting there are also two wrapper functions which will return patchwork layouts of several of these functions in a single plot.
A set of basic metrics can be plotted using Seq_QC_Plot_Basic_Combined
Seq_QC_Plot_Basic_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch")
A set of alignment-based metrics can be plotted using Seq_QC_Plot_Alignment_Combined
Seq_QC_Plot_Alignment_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch")
All of the Seq_QC_Plot_...
functions also contain parameter to calculate significance between all groups using ggpubr::stat_compare_means()
.
NOTE: See function help pages and ggpubr for more info to make sure the right test is being used for your data.
Seq_QC_Plot_Basic_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch", significance = T)
scCustomize can also be used to create customized barcode rank plots from raw feature barcode data using outputs from DropletUtils package. scCustomize provides the function Iterate_Barcode_Rank_Plot
which reads data, processes using DropletUtils, and returns plots.
The plots are adaptation of the plots from DropletUtils package moved to ggplot2 framework with some additional customization and the ability to return rastered plots.
Iterate_Barcode_Rank_Plot(dir_path_h5 = "assets/Barcode_Rank_Example/", plateau = 20000, file_path = "plots/", file_name = "Barcode_Rank_Plots.pdf")
Iterate_Barcode_Rank_Plot(dir_path_h5 = "assets/Barcode_Rank_Example/", plateau = 20000, file_path = "~/Downloads/", file_name = "Barcode_Rank_Plots.pdf", )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.