vdjdb_hits | R Documentation |
Matching CDR3s from sequencing with CDR3s of known antigen specificity may help to narrow potential antigen specificities of one's sequenced TCRs. Export a tsv of all or selected entries here https://vdjdb.cdr3.net/search. Provide that tsv as data frame (vdjdb). From vdjdb and tcrs distinct rows of tr_cols and cdr3_col are compared by levensthein distance and if selected (PAM30_similarity) by amino acid similarity. Returned data frame may be used to join original columns from vdjdb and tcrs followed by plotting with ggplot2 (see example). Original column names (vdj_tr_col, tcr_tr_col and vdj_cdr3_col, tcr_cdr3_col) are kept in the returned data frame to allow easy subsequent joining of vdjdb and tcrs. It requires though that these columns have different names in vdjdb and tcrs.
vdjdb_hits(
vdjdb,
tcrs,
vdj_tr_col = "Gene",
tcr_tr_col = "chain",
vdj_cdr3_col = "CDR3",
tcr_cdr3_col = "CDR3_aa_cr",
TRAxTRB = F,
max_lvdist = 3,
PAM30_similarity = F,
mapply_fun = mapply,
lapply_fun = lapply,
...
)
vdjdb |
data frame of TCR data from vdjdb; vdj_cdr3_col and vdj_tr_col must be present; if not provided system.file("extdata", "vdjdb.tsv.rds", package = "igsc") is used (table downloaded 06/12/2021) |
tcrs |
data frame of TCR data from sequencing, tcr_cdr3_col and tcr_tr_col must be present; e.g. cl_long from clonotype prep |
vdj_tr_col |
column which codes the TCR chain in the vdjdb reference data frame (should contain TRA and/or TRB only) |
tcr_tr_col |
column which codes the TCR chain in the tcrs data frame (should contain TRA and/or TRB only) |
vdj_cdr3_col |
column which codes the CDR3s in the vdjdb reference data frame |
tcr_cdr3_col |
column which codes the CDR3s in the tcrs data frame |
TRAxTRB |
logical, apart from comparing TRA vs TRA and TRB vs TRB between tcrs and vdjdb also check for TRA vs TRB? |
max_lvdist |
maximum levensthein distance between CDR3s for return data frame (also filtering before PAM30 similarity calculation) |
PAM30_similarity |
logical whether to calculate the similarity (pairwise alignment) between CDR3s based on PAM30 substitution matrix |
mapply_fun |
mapply function for PAM30_similarity, suggested are mapply, pbapply::pbmapply, parallel::mcmapply |
lapply_fun |
lapply function for levensthein distance calculation, suggested are lapply, pblapply::pblapply, parallel::mclapply |
... |
arguments passed to mapply_fun and lapply_fun, most relevant: mc.cores (e.g parallel::detectCores()) |
data frame of matched CDR3s from tcrs and vdjdb, column lv indicates the levensthein between CDR3 from vdjdb and tcrs
## Not run:
# tcrs data frame with sequencing data from TCRs has to be created (cl_long is appropriate)
matches <- vdjdb_hits(vdjdb, tcrs, mapply_fun = parallel::mcmapply, mc.cores = parallel::detectCores(), nthread = parallel::detectCores())
matches_append <- matches %>% dplyr::left_join(tcrs) %>% dplyr::left_join(vdjdb)
df <-
df %>%
dplyr::distinct(CDR3, CDR3_aa_cr, lv, Gene, chain, cl_name, patient, Epitope, Epitope.gene, Epitope.species, Score) %>%
tidyr::drop_na() %>%
dplyr::filter(cl_name %in% c("Madalyn", "Mikaylla", "Morgan")) %>%
dplyr::filter(lv < 3) %>%
dplyr::filter(Score > 0)
# use jitter for many hits (when beeswarms become overlapping)
ggplot(df, aes(x = Score, y = lv, color = Epitope.species)) +
ggbeeswarm::geom_beeswarm(groupOnX = F, cex = 4, size = 2.5) +
#geom_jitter(width = 0.15, height = 0.15, size = 3) +
theme_bw() +
theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), strip.background = element_rect(fill = "white"), text = element_text(family = "Courier")) +
scale_x_continuous(breaks = fcexpr::int_breaks(n = 3), limits = c(0.8,3.2)) +
scale_y_reverse(breaks = fcexpr::int_breaks(n = 3)) +
guides(fill = guide_legend(override.aes = list(size = 5)), color = guide_legend(override.aes = list(size = 5))) +
facet_grid(rows = vars(chain), cols = vars(patient, cl_name))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.