vdjdb_hits: Compare CDR3s from sequencing to known CDR3s in the vdjdb

View source: R/vdjdb_hits.R

vdjdb_hitsR Documentation

Compare CDR3s from sequencing to known CDR3s in the vdjdb

Description

Matching CDR3s from sequencing with CDR3s of known antigen specificity may help to narrow potential antigen specificities of one's sequenced TCRs. Export a tsv of all or selected entries here https://vdjdb.cdr3.net/search. Provide that tsv as data frame (vdjdb). From vdjdb and tcrs distinct rows of tr_cols and cdr3_col are compared by levensthein distance and if selected (PAM30_similarity) by amino acid similarity. Returned data frame may be used to join original columns from vdjdb and tcrs followed by plotting with ggplot2 (see example). Original column names (vdj_tr_col, tcr_tr_col and vdj_cdr3_col, tcr_cdr3_col) are kept in the returned data frame to allow easy subsequent joining of vdjdb and tcrs. It requires though that these columns have different names in vdjdb and tcrs.

Usage

vdjdb_hits(
  vdjdb,
  tcrs,
  vdj_tr_col = "Gene",
  tcr_tr_col = "chain",
  vdj_cdr3_col = "CDR3",
  tcr_cdr3_col = "CDR3_aa_cr",
  TRAxTRB = F,
  max_lvdist = 3,
  PAM30_similarity = F,
  mapply_fun = mapply,
  lapply_fun = lapply,
  ...
)

Arguments

vdjdb

data frame of TCR data from vdjdb; vdj_cdr3_col and vdj_tr_col must be present; if not provided system.file("extdata", "vdjdb.tsv.rds", package = "igsc") is used (table downloaded 06/12/2021)

tcrs

data frame of TCR data from sequencing, tcr_cdr3_col and tcr_tr_col must be present; e.g. cl_long from clonotype prep

vdj_tr_col

column which codes the TCR chain in the vdjdb reference data frame (should contain TRA and/or TRB only)

tcr_tr_col

column which codes the TCR chain in the tcrs data frame (should contain TRA and/or TRB only)

vdj_cdr3_col

column which codes the CDR3s in the vdjdb reference data frame

tcr_cdr3_col

column which codes the CDR3s in the tcrs data frame

TRAxTRB

logical, apart from comparing TRA vs TRA and TRB vs TRB between tcrs and vdjdb also check for TRA vs TRB?

max_lvdist

maximum levensthein distance between CDR3s for return data frame (also filtering before PAM30 similarity calculation)

PAM30_similarity

logical whether to calculate the similarity (pairwise alignment) between CDR3s based on PAM30 substitution matrix

mapply_fun

mapply function for PAM30_similarity, suggested are mapply, pbapply::pbmapply, parallel::mcmapply

lapply_fun

lapply function for levensthein distance calculation, suggested are lapply, pblapply::pblapply, parallel::mclapply

...

arguments passed to mapply_fun and lapply_fun, most relevant: mc.cores (e.g parallel::detectCores())

Value

data frame of matched CDR3s from tcrs and vdjdb, column lv indicates the levensthein between CDR3 from vdjdb and tcrs

Examples

## Not run: 
# tcrs data frame with sequencing data from TCRs has to be created (cl_long is appropriate)
matches <- vdjdb_hits(vdjdb, tcrs, mapply_fun = parallel::mcmapply, mc.cores = parallel::detectCores(), nthread = parallel::detectCores())
matches_append <- matches %>% dplyr::left_join(tcrs) %>% dplyr::left_join(vdjdb)

df <-
df %>%
dplyr::distinct(CDR3, CDR3_aa_cr, lv, Gene, chain, cl_name, patient, Epitope, Epitope.gene, Epitope.species, Score) %>%
tidyr::drop_na() %>%
dplyr::filter(cl_name %in% c("Madalyn", "Mikaylla", "Morgan")) %>%
dplyr::filter(lv < 3) %>%
dplyr::filter(Score > 0)

# use jitter for many hits (when beeswarms become overlapping)
ggplot(df, aes(x = Score, y = lv, color = Epitope.species)) +
 ggbeeswarm::geom_beeswarm(groupOnX = F, cex = 4, size = 2.5) +
 #geom_jitter(width = 0.15, height = 0.15, size = 3) +
 theme_bw() +
 theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), strip.background = element_rect(fill = "white"), text = element_text(family = "Courier")) +
 scale_x_continuous(breaks = fcexpr::int_breaks(n = 3), limits = c(0.8,3.2)) +
 scale_y_reverse(breaks = fcexpr::int_breaks(n = 3)) +
 guides(fill = guide_legend(override.aes = list(size = 5)), color = guide_legend(override.aes = list(size = 5))) +
 facet_grid(rows = vars(chain), cols = vars(patient, cl_name))

## End(Not run)

Close-your-eyes/igsc documentation built on Jan. 28, 2024, 10:28 p.m.