ARCHIVED: A walkthrough of APackOfTheClones v0.1.x"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(echo = FALSE)
options(repos = c(CRAN = "http://cran.rstudio.com"))

# function to install and load packages
load_package <- function(pkg, CRAN = TRUE, dev_dir = "Qile0317/") {
  if (!require(pkg, quietly = TRUE, character.only = TRUE)) {
    if (CRAN) {
      install.packages(pkg, quiet = TRUE, verbose = FALSE, character.only = TRUE)
    } else {
      devtools::install_github(paste(dev_dir, pkg, sep = ""))
    }
  }
  require(pkg, character.only = TRUE)
}

load_cran_packages <- function(...) {
  for (pkg in list(...)) load_package(pkg)
}

# load pkgs
load_cran_packages("ggplot2", "cowplot", "Seurat", "devtools")

# load the legacy APackOfTheClones version from github
devtools::install_github("Qile0317/APackOfTheClones@v0")
library(APackOfTheClones)

# load data
load_package("SCIPData", CRAN = FALSE)
data("pbmc", package = "SCIPData")
data("tcr_dataframe", package = "SCIPData")

# redefine "head" to output a nicer dataframe
view_head <- function(df) {
  knitr::kable(head(df))
}

Introduction

Single-cell RNA sequencing (scRNA-seq) and T cell receptor (TCR) sequencing are popular techniques for studying immune cell function and disease. The combined use of such data can provide insights into clonal expansion. APackOfTheClones provides a simple, publication-ready method to intuitively visualize clonal expansion between different cell clusters with ggplot2, and can be easily slotted into any analysis pipeline.

In this user vignette, we will assume basic familiarity with Seurat and R, and a Seurat object has already been processed through the basic pipeline until at least the UMAP reduction has been obtained. Most importantly, clonal expansion can only be analyzed if the biosample has been run through the 10X genomics Chromium Single Cell Immune Profiling service, and the resulting data processed by Cell Ranger.

We will be showing basic usage of the main function of the package, Namely, in the integration of the 10X genomics T-cell receptor library generated by Cell Ranger into a Seurat object, followed by the usage of a fully customizable clonal expansion plotting method.

Setup the Seurat object and T cell receptor library

Before anything else can be done, we need a Seurat object of the single cell experiment with a dimensional reduction. The most modern and common reduction used in publications is UMAP (Uniform Manifold Approximation and Projection) which is what this package defaults to, and what this vignette will use. There is also a way to base it off of the t-SNE (t-distributed stochastic neighbor embedding) or PCA (Principle Component Analysis) reduction though choosing the latter is probably not the best idea.

library(Seurat)
suppressPackageStartupMessages(library(APackOfTheClones))

# Here, a Seurat object has already been loaded named `pbmc`
print(pbmc)

Here, we can also see its UMAP reduction plot. It is very important that it is present within the Seurat object so that the later clonal expansion visualization can be based on the coordinates for intuitiveness.

umap_plot <- UMAPPlot(pbmc)
umap_plot

To analyze clonal expansion, the T-cell library (in the form of the all_contig_annotations.csv file) generated by Cell Ranger has to be "integrated" into the seurat object. This simply means that the cell-level receptor information is incorporated into the @meta.data attribute. However, before doing so, let's quickly view the file structure. An r dataframe of the read-in file has been loaded named tcr_dataframe:

head(tcr_dataframe)
view_head(tcr_dataframe)

This is the first 5 rows of what the file should look like directly from cell ranger. If you for some reason have some custom version of this dataframe that is different, it is highly important that there AT LEAST exists the column with the column name barcode (column 1) and raw_clonotype_id (column 17), otherwise everything else will not work as intended.

how do I load in the raw file?

library(readr)
tcr_dataframe <- readr::read_csv("your_file_location/all_contig_annotations.csv")

Integration of clonotype information

To finally incorporate the TCR(T-cell receptor library) data into the seurat object, we'll use the simple integrate_tcr function.

pbmc <- integrate_tcr(pbmc, tcr_dataframe, verbose = FALSE)

Note that the last argument verbose was intentionally set to FALSE due to the presence of a progress bar which corrupts the formatting of the vignette (it defaults to the recommended TRUE). Otherwise, the output to the R Console would have been the following:

#> integrating TCR library into seurat object
#>   |====================================| 100%
#> Percent of unique barcodes: 31 %

By calling integrate_tcr, one can now find all the information from all_contig_annotations.csv within the @meta.data attribute. Let's take a quick look on what it looks like:

head(pbmc@meta.data)
view_head(pbmc@meta.data)

From this snippet of the meta data, we can see that for this particular seurat object, the first 5 columns orig.ident, nCount_RNA, nFeature_RNA, RNA_snn_res.0.6, seurat_clusters were originally present. What we care about is all the new columns after them. A few key things to note: There are a lot of NA values, and that is due to 1) some cells aren't T cells, 2) some T cell receptors do not end up being sequenced for many possible reasons, 3) Of the sequenced T cells in the TCR library, there are usually a lot of duplicate barcodes (repeated TCR sequencing of the same cell) as each individual cell may have several TCR types and/or need several contigs to assemble the final sequence. unique data about TCRs of the same cell (same barcode) are collapsed together with __ in the order of the contigs

Now, the seurat object can be analyzed downstream with the new integrated information however we wish. It also is now ready to have its clonotypes counted and the clonal expansion visualized.

do I HAVE to integrate the the TCR library into my seurat object?

well... no. But there is no reason not to do so and it is heavily recommended to do so. Otherwise, we are about to see in the upcoming clonal expansion plotting function how you can avoid the integration.

How do I get the clonal expansion per cluster information myself?

There is the count_clone_sizes(seurat_obj) function for doing so. Check the documentation by running ?count_clone_sizes for more details.

Ball packing visualization

All one has to do to produce the visualization is by simply running the following code:

pbmc_expansion_plot <- clonal_expansion_plot(pbmc, verbose = FALSE)
# once again, `verbose` is set to false but it defaults to TRUE
pbmc_expansion_plot

How do I base the clonal expansion plot off of the t-SNE or PCA reduction? clonal_expansion_plot has an optional argument reduction which defaults to 'umap' that handles which reduction the plot is based off of. The argument can be set to 'tsne' or 'pca' like so:

# t-SNE based plot
pbmc_expansion_plot <- clonal_expansion_plot(pbmc, reduction = 'tsne')
# PCA based plot
pbmc_expansion_plot <- clonal_expansion_plot(pbmc, reduction = 'pca')

A few things to note about the plot

As APackOfTheClones is in a rapid-development phase, the resulting clonal expansion plot will usually not be visually satisfactory on the first run without customizations. However, there are many optional arguments and ggplot tricks that we are about to explore that will help us make it publication quality.

How do I create the plot without integrating the Seurat object and TCR library?

clonal_expansion_plot will also work if the user provides the original seurat object and the raw TCR library dataframe, like so:

# let pbmc be the raw seurat object before the integration
pbmc_expansion_plot <- clonal_expansion_plot(pbmc, tcr_dataframe, verbose = FALSE)

Adjusting intra-cluster spacing

The considerable overlap between clusters (due to the algorithm's attempt to fit clusters to the original UMAP coordinates) may sometimes obstruct eachother excessively.

To account for this, there are four optional arguments in clonal_expansion plot:

repulse = FALSE,
repulsion_threshold = 1,
repulsion_strength = 1,
max_repulsion_iter = 10

For more details on them, read the "Arguments" section in the function documentation (?clonal_expansion_plot). But to summarize, first, to make the circle clusters move away from eachother, repulse should be set to TRUE, and the function should be ran AGAIN. (If you feel this excessive re-running takes too long or is inefficient for your workflow in its current form, please make an issue on the github page.)

Setting repulse to TRUE with those default parameters should probably yield something like the following:

pbmc_expansion_plot <- clonal_expansion_plot(pbmc, repulse = TRUE, verbose = FALSE)
pbmc_expansion_plot

As we can see, the red and green clusters have "moved away" from each other, and there is very little overlap remaining. All other clusters which were not touching remain in their original positions.

Do I HAVE to re-run clonal_expansion_plot each time I want to modify a parameter?

Unfortunately, yes. With the current version (v1) of APackOfTheClones, to change the appearance of the plot, clonal_expansion_plot has to be re-ran with modified parameters each time, which unfortunately may be slow/tedious if a lot of modifications are required. Please create a github issue or contact the author if a new version should be made to prevent this.

If you are still unhappy with the spacing of the plot, see the following ways to adjust the spacing:

Adjusting and customizing size legend

The plot we've generated here has a size legend on the top left, but we see that it is partially outside of the plot. Unfortunately the algorithm for placing the legend is far from perfect in version 0 so the user will have to do a little extra work.

There are the following six parameters to adjust the legend:

add_size_legend = TRUE,
legend_sizes = c(1, 5, 50),
legend_position = "top_left",
legend_buffer = 1.5,
legend_color = "#808080",
legend_spacing = 0.4

Here, we can adjust the argument legend_buffer, which indicates the amount of ggplot2 units to move the legend "inwards" by. We can also slightly increase legend_spacing, an argument specifying the vertical distance between circles in the legend.

# the previous arguments have to be maintained!
pbmc_expansion_plot <- clonal_expansion_plot(
  pbmc,
  legend_buffer = 3,
  legend_spacing = 0.6,
  repulse = TRUE,
  verbose = FALSE
)
pbmc_expansion_plot

We can even change the circle (clone) sizes displayed in the legend by changing the vector legend_sizes:

pbmc_expansion_plot <- clonal_expansion_plot(
  pbmc,
  legend_sizes = c(1, 5, 25, 50),
  legend_buffer = 3,
  legend_spacing = 0.6,
  repulse = TRUE,
  verbose = FALSE
)
pbmc_expansion_plot

More details about these arguments can be read in the function documentation.

Adjusting and customizing circle size and scaling

So far, the sizes of the individual circles (clones) have seemed to look quite pleasant, fitting roughly into their UMAP cluster centroids snugly. However, this may not always be the case. A likely very common issue upon the initialization of the plot with default arguments is that the circles may be too small or too large.

In these cases, the argument clone_scale_factor should be changed. The argument defaults to 0.1, which is the multiplicative scale factor for the square root radius of a clone. So if a clone size is four, and the factor is 0.1, then the actual radius of the circle on the plot would be sqrt(4) * 0.1 which equates to 0.2.

Additionally, rad_scale_factor can optionally be changed to adjust the intra-circular spacing. The factor defaults to 0.95 and indicates the multiplicative amount to adjust the radius of the circles after they have been placed. decreasing this value (slightly) will basically make each circle smaller in place, increasing the spacing between then,

For demonstration sake, we'll change the clone_scale_factor to 0.07 and rad_scale_factor to 0.9:

pbmc_expansion_plot <- clonal_expansion_plot(
  pbmc,
  clone_scale_factor = 0.07,
  rad_scale_factor = 0.9,
  legend_sizes = c(1, 5, 25, 50),
  legend_buffer = 3,
  legend_spacing = 0.6,
  repulse = TRUE,
  verbose = FALSE
)
pbmc_expansion_plot

Other possible modifications

There are a few other arguments this vignette did not cover which can be read in the function documentation. However, to end off, one more important argument to consider is use_default_theme = TRUE. It makes the plot have the same axis labels and general theme as the original UMAP plot. If this isn't to your liking or you want to customize the plot your own way (since clonal_expansion_plot generates a ggplot object!), simply set it to FALSE and you'll have a very minimally themed plot:

clonal_expansion_plot(
  pbmc,
  use_default_theme = FALSE,
  clone_scale_factor = 0.07,
  rad_scale_factor = 0.9,
  legend_sizes = c(1, 5, 25, 50),
  legend_buffer = 3,
  legend_spacing = 0.6,
  repulse = TRUE,
  verbose = FALSE
)

Additionally, if you really want the plot to reflect the original UMAP plot, you could try using ggplot2's xlim and ylim functions.

Final product

That's about it for the most basic functionalities of the clonal expansion visualization function. Remember to save the plot first as an .svg file for maximal resolution, and it should be publication ready.

For fun, here's a side-by-side comparison of the UMAP plot and the clonal expansion plot:

library(ggplot2)
library(cowplot)

cowplot::plot_grid(
  umap_plot + ggtitle("scRNA-seq UMAP"),
  pbmc_expansion_plot + ggtitle("APackOfTheClones clonal expansion plot"),
  labels = "AUTO"
)

How do I reproduce the plots in this vignette?

If you really want to reproduce the plots and code in this vignette, you will need to install the data used in this vignette which has been compiled in an R package SCIPData (Single Cell Immune Profiling Data). You can do so with the following code

# install the data package
library(devtools)
devtools::install_github("Qile0317/SCIPData")

# load the package data directly into R objects in memory
data("pbmc", "tcr_dataframe")

And the rest of the code can be copied and used accordingly.

Alternatively, there are two smaller, minimal, artificial versions of the seurat object and clonotype data that can be loaded in APackOfTheClones, run ?mini_seurat_obj and ?mini_clonotype_data for more details.



Try the APackOfTheClones package in your browser

Any scripts or data that you put into this service are public.

APackOfTheClones documentation built on Oct. 11, 2024, 1:08 a.m.