In rnabioco/djvdj: A collection of single-cell V(D)J tools

# Chunk opts
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  warning   = FALSE,
  message   = FALSE
)

This vignette provides detailed examples for manipulating V(D)J data imported into a single-cell object using djvdj. For the examples shown below, we use data for splenocytes from BL6 and MD4 mice collected using the 10X Genomics scRNA-seq platform. MD4 B cells are monoclonal and specifically bind hen egg lysozyme.

library(djvdj)
library(Seurat)
library(ggplot2)
library(dplyr)

# Load GEX data
data_dir <- system.file("extdata/splen", package = "djvdj")

gex_dirs <- c(
  BL6 = file.path(data_dir, "BL6_GEX/filtered_feature_bc_matrix"),
  MD4 = file.path(data_dir, "MD4_GEX/filtered_feature_bc_matrix")
)

so <- gex_dirs |>
  Read10X() |>
  CreateSeuratObject() |>
  AddMetaData(splen_meta)

# Add V(D)J data to object
vdj_dirs <- c(
  BL6 = system.file("extdata/splen/BL6_BCR", package = "djvdj"),
  MD4 = system.file("extdata/splen/MD4_BCR", package = "djvdj")
)

so <- so |>
  import_vdj(
    vdj_dirs,
    define_clonotypes = "cdr3_gene",
    include_mutations = TRUE
  )

Filtering V(D)J data

Per-chain data can be filtered with the filter_vdj() function. The per-chain values for each cell are split on the ; separator and converted to a vector, which allows vector operations to be used in the filtering expression. This function does not remove cells from the object; instead, it removes the V(D)J data for cells that do not match the provided filtering expression.

In this example we keep V(D)J data only for cells that have all three chains: IGH, IGK, and IGL.

res <- so |>
  filter_vdj(
    all(c("IGH", "IGK", "IGL") %in% chains)
  )

res |>
  slot("meta.data") |>
  filter(!is.na(clonotype_id)) |>
  select(chains, cdr3) |>
  head()
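
To make the parsing step concrete, here is a minimal base R sketch of how a cell-level condition such as the all() expression above could be evaluated. It does not call djvdj; the cell names and values are made up, and the steps only mirror the behavior described in this section rather than the package's actual implementation.

```r
# Hypothetical per-cell values: one string per cell, with per-chain
# entries separated by ";" (cell names and values are made up)
chains <- c(
  cell_1 = "IGH;IGK;IGL",
  cell_2 = "IGH;IGK",
  cell_3 = NA
)

# Split each string into a character vector of per-chain values
chain_list <- strsplit(chains, ";", fixed = TRUE)

# Evaluate the filtering expression once per cell
keep <- vapply(
  chain_list,
  function(chains) all(c("IGH", "IGK", "IGL") %in% chains),
  logical(1)
)

# Cells that fail the filter are kept, but their V(D)J values become NA
filtered <- ifelse(keep, chains, NA_character_)

filtered
```

Only cell_1 retains its value here; the other two cells remain in the vector with NA in place of their V(D)J data.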

In this example we are removing V(D)J data for all chains except IGH.

res <- so |>
  filter_vdj(chains == "IGH")

res |>
  slot("meta.data") |>
  filter(!is.na(clonotype_id)) |>
  select(chains, cdr3) |>
  head(3)
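
A condition like chains == "IGH" returns one TRUE/FALSE value per chain rather than one per cell. The base R sketch below illustrates what filtering at the chain level could look like for a single cell. It assumes the same per-chain result is applied to every per-chain column, which is an assumption based on the output shown above rather than a statement about djvdj internals. The CDR3 sequences and the helper names split_vals() and filter_col() are hypothetical.

```r
# Hypothetical values for one cell with three chains
chains <- "IGH;IGK;IGL"
cdr3   <- "CARSYW;CQQSNTF;CALWYSF"  # made-up sequences

# Hypothetical helper: split one ";"-separated string into a vector
split_vals <- function(x) strsplit(x, ";", fixed = TRUE)[[1]]

# A per-chain condition yields one logical value per chain
keep <- split_vals(chains) == "IGH"

# Hypothetical helper: keep the matching positions and re-join with ";"
filter_col <- function(x, keep) {
  paste0(split_vals(x)[keep], collapse = ";")
}

filter_col(chains, keep)
filter_col(cdr3, keep)
```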

Summarizing per-chain data

The summarize_vdj() function can be used to summarize the per-chain data for each cell and add the results to the meta.data. In this example we are calculating the median number of insertions and the median number of deletions for each cell. The col_names argument can be used to name the new columns; use '{.col}' to refer to the original column name.

res <- so |>
  summarize_vdj(
    data_cols = c("all_ins", "all_del"),
    fn        = stats::median,
    col_names = "median_{.col}"
  )

res |>
  slot("meta.data") |>
  filter(n_chains > 1) |>
  select(matches("all_(del|ins)")) |>
  head(2)
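
The summary above can be approximated in base R to show what happens to each cell's values and how the '{.col}' placeholder expands. This is a sketch with made-up indel counts, not the package's implementation; summarize_col() is a hypothetical helper, and returning NA for cells without V(D)J data is an assumption.

```r
# Hypothetical per-chain indel counts for three cells
meta <- data.frame(
  all_ins = c("0;2;4", "1;3", NA),
  all_del = c("1;1;7", "0;0", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on ";", convert to numeric, and apply the
# summary function once per cell (cells with no data return NA)
summarize_col <- function(x, fn) {
  vapply(
    strsplit(x, ";", fixed = TRUE),
    function(vals) if (all(is.na(vals))) NA_real_ else fn(as.numeric(vals)),
    numeric(1)
  )
}

# Expand the naming template by replacing "{.col}" with each column name
data_cols <- c("all_ins", "all_del")
new_names <- vapply(
  data_cols,
  function(col) sub("{.col}", col, "median_{.col}", fixed = TRUE),
  character(1),
  USE.NAMES = FALSE
)

# Add one summary column per input column
for (i in seq_along(data_cols)) {
  meta[[new_names[i]]] <- summarize_col(meta[[data_cols[i]]], stats::median)
}

meta
```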

This function can also be used for character strings such as the chains column. In this example we are creating a new column in the meta.data containing the unique chains for each cell.

res <- so |>
  summarize_vdj(
    data_cols = "chains",
    fn        = ~ paste0(unique(.x), collapse = "_"),
    col_names = "unique_chains"
  )

res |>
  slot("meta.data") |>
  filter(n_chains > 2) |>
  select(chains, unique_chains) |>
  head(2)
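
The fn argument above is written as a formula, where .x stands for the vector of per-chain values. As a rough illustration, the same operation can be written with an ordinary function in base R. The chain strings below are made up, and this sketch only mimics the per-cell evaluation described in this section.

```r
# Hypothetical chain strings for three cells
chains <- c("IGH;IGK;IGK", "IGH;IGL", "IGK")

# Equivalent of ~ paste0(unique(.x), collapse = "_") written as a
# regular function and applied to each cell's per-chain vector
unique_chains <- vapply(
  strsplit(chains, ";", fixed = TRUE),
  function(.x) paste0(unique(.x), collapse = "_"),
  character(1)
)

unique_chains
```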

Mutating per-chain data

Another way to modify V(D)J data present in the object is with the mutate_vdj() function. The function behaves similarly to dplyr::mutate(), but parses the per-chain values for each cell and converts them to a vector. This allows vector operations to be performed when modifying the meta.data.

In this example we are calculating the sum of all insertions and deletions for each cell and storing this information in a new column called 'total_indels'.

res <- so |>
  mutate_vdj(
    total_indels = sum(all_ins, all_del)
  )

res |>
  slot("meta.data") |>
  select(all_ins, all_del, total_indels) |>
  head()
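
For comparison, the per-cell calculation can be sketched in base R. The counts are made up and to_num() is a hypothetical helper; the point is only that sum(all_ins, all_del) is evaluated once per cell on the parsed per-chain vectors, as described above.

```r
# Hypothetical per-chain indel counts for two cells
meta <- data.frame(
  all_ins = c("0;2;4", "1;3"),
  all_del = c("1;1;7", "0;0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn each ";"-separated string into a numeric vector
to_num <- function(x) lapply(strsplit(x, ";", fixed = TRUE), as.numeric)

# Evaluate sum(all_ins, all_del) once per cell
meta$total_indels <- mapply(
  function(all_ins, all_del) sum(all_ins, all_del),
  to_num(meta$all_ins),
  to_num(meta$all_del)
)

meta
```

Each cell receives a single total, because sum() collapses both per-chain vectors into one number.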