SPANTCR

Alex Xu

alex.m.xu@gmail.com

The goal of SPANTCR is to analyze TCR datasets.

Installation

You can install the development version of SPANTCR from GitHub with:

# install.packages("devtools")
devtools::install_github("alexandermxu/SPANTCR")

Example

SPAN-TCR analyzes TCR input data generated by software such as MiXCR (https://mixcr.readthedocs.io/en/master/index.html).
Sample data in the object VDJDBCMVPairedInput is obtained from https://vdjdb.cdr3.net/.

A CDR3 sequence can be broken down into k-mers.
SPAN-TCR extracts these k-mers and tracks the normalized position of each one within its CDR3 sequence.
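
As a rough illustration of the idea, the sketch below splits a single CDR3 into overlapping k-mers and assigns each one a position between 0 and 1. The helper kmer_positions is hypothetical and is not part of SPANTCR; the package's own binning and normalization may differ.

```r
# Hypothetical helper (not a SPANTCR function): split one CDR3 into
# overlapping k-mers and report the normalized center of each k-mer.
kmer_positions <- function(cdr3, k = 2) {
  n <- nchar(cdr3)
  if (n < k) {
    # Sequence too short to contain any k-mer
    return(data.frame(kmer = character(0), position = numeric(0)))
  }
  starts <- seq_len(n - k + 1)
  kmers <- substring(cdr3, starts, starts + k - 1)
  # Residue i spans (i - 1, i), so a k-mer starting at s is centered at
  # s - 1 + k / 2; dividing by n rescales the sequence to the range 0 to 1
  centers <- (starts - 1 + k / 2) / n
  data.frame(kmer = kmers, position = centers, stringsAsFactors = FALSE)
}

res <- kmer_positions("CASSLG", k = 2)
print(res)
```

Normalizing by sequence length lets CDR3s of different lengths be compared on a common axis.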

Basic workflow:

  1. Prepare data.

  2. Create an object AminoAcidFilter which contains relevant amino acid information

  3. Organize TCR data with columns CDR3|gene|Vgene|Jgene|score|id
    • CDR3 - amino acid sequence (from MiXCR)
    • gene - TRA or TRB
    • Vgene/Jgene - TRAV1-2, TRBJ2-2, etc.
    • score - a score describing the quality of the TCR
    • id - an integer used to track paired CDR3s
  4. Define a scoring function
    • The score column is often taken from an external data source
    • Define a scoring function to scale the score appropriately
    • This is up to your discretion
  5. Define a weighting function

    • A k-mer likely contributes most to binding at the center of its position
    • The weighting function determines how the influence of a k-mer changes with distance from its center
  6. Generate CDR3Breakdown objects using SPANTCR
    • The majority of k-mers are rare and likely have minimal impact on binding
    • Set a significance cutoff to determine which k-mers to visualize
    • This example uses 5^x as the score function, a linear weighting as the weight function, and has a cutoff of 0.03
    • k-mers of length 2 are analyzed and 100 bins are used
    • k-mers are ordered by their hydrophobicity
  7. Analyze distance between CDR3s using ComparisonFunction

  8. Analyze entropy using EntropyScan
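
The data preparation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the table values are made up, and score_exp5 and weight_linear (including its half_width parameter) are hypothetical stand-ins, not the ScoreFunctionExponential5 and WeightFunctionLinear objects shipped with the package, whose signatures are not documented here.

```r
# Toy input with the six columns listed above. Values are illustrative only;
# the two rows sharing id 1 stand for one paired alpha/beta clone.
toy_input <- data.frame(
  CDR3  = c("CAVRDSNYQLIW", "CASSLAPGATNEKLFF"),
  gene  = c("TRA", "TRB"),
  Vgene = c("TRAV1-2", "TRBV6-5"),
  Jgene = c("TRAJ33", "TRBJ1-4"),
  score = c(2, 2),
  id    = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical score function: exponential scaling 5^x
score_exp5 <- function(x) 5^x

# Hypothetical linear weight function: weight 1 at the k-mer center, falling
# linearly to 0 at `half_width` (in normalized position units) and beyond
weight_linear <- function(distance, half_width = 0.1) {
  pmax(0, 1 - abs(distance) / half_width)
}

scaled_scores <- score_exp5(toy_input$score)
weights <- weight_linear(c(0, 0.05, 0.1, 0.2))
print(scaled_scores)
print(weights)
```

An exponential score function exaggerates differences between high and low scoring clones, while the weight function controls how far each k-mer's influence spreads into neighboring bins.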

Each CDR3Breakdown object contains metadata, the ordering of elements, and the matrix used to rank k-mers.
The clone data is stored in the slot "CDR3Breakdown@CloneData".

library(SPANTCR)

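# Hydrophobicity ordering, 100 bins, k-mers of length 2, cutoff of 0.03,
# 5^x score function, linear weight function
SPANTCR_VDJDBCMVPaired <- SPANTCR(VDJDBCMVPairedInput, "VDJDBCMV", "Paired", "Hydrophobicity", 100, 2, 0.03, ScoreFunctionExponential5, WeightFunctionLinear)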

head(SPANTCR_VDJDBCMVPaired@CloneData[,c(1:6,11,12,15,18)])

The TCR data is processed into the slot "CDR3Breakdown@Output".
Each row is a single clone in a single bin.

head(SPANTCR_VDJDBCMVPaired@Output)

SPANTCR plots can be generated using ggplot2.
Ordering the output by the same AAOperation will stack the bar plots appropriately.
The levels of the factor Window match the AAOperation order.

library(ggplot2)

ggplot(SPANTCR_VDJDBCMVPaired@Output[order(Window)])+
  geom_bar(aes(x=Tick, y=WeightProbability, fill=SignificantColor), stat="identity", width=1/100)+
  theme(legend.position="none")+
  scale_fill_gradientn(colors=c(alpha("#4D4D4D",0.01),"red","white","blue"), values=c(0,0.01,0.50,1))+
  coord_fixed()
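
The role of the factor levels can be seen in a small base R sketch. The table and the ordering custom_order below are made up for illustration and merely stand in for an AAOperation ordering; the point is only that order() on a factor sorts rows by level order rather than alphabetically.

```r
# Toy stand-in for a few rows of the Output table (values are illustrative)
toy <- data.frame(
  Window = c("SL", "AS", "LG", "AS"),
  Tick = c(0.1, 0.1, 0.2, 0.2),
  WeightProbability = c(0.2, 0.3, 0.4, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical custom ordering of k-mers (assumption, not taken from SPANTCR)
custom_order <- c("LG", "SL", "AS")
toy$Window <- factor(toy$Window, levels = custom_order)

# order() uses the integer codes of the factor, so rows follow custom_order
toy_sorted <- toy[order(toy$Window), ]
sorted_windows <- as.character(toy_sorted$Window)
print(sorted_windows)
```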

The distance between sets of CDR3s can be measured using ComparisonFunction.
Here we plot the comparison score against the Levenshtein distance between each VDJDBCMV paired CDR3 and its nearest neighbor in the set.

CMVSelfDistance <- ComparisonFunction(SPANTCR_VDJDBCMVPaired, SPANTCR_VDJDBCMVPaired, 100, FineBinTicksPaired, BLOSUMCappedDiff, FALSE)

CMVSelfDistance[, SecondChain := sapply(Top5Chain, function(x) x[2])]
CMVSelfDistance[, SecondChainScore := sapply(Top5CompScore, function(x) x[2])]
CMVSelfDistance[, SecondChainLev := diag(adist(SecondChain, BaseCDR3))]

ggplot(CMVSelfDistance)+
  geom_point(aes(x=SecondChainLev, y=SecondChainScore))+
  coord_fixed()
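
The diag(adist(...)) idiom used above can be checked on a small example. adist from the utils package returns the full matrix of Levenshtein distances between two character vectors, so its diagonal holds the distance for each matched pair. The sequences below are made up for illustration.

```r
# Two equal-length vectors of toy sequences; element i of x is paired with
# element i of y
x <- c("CASSLG", "CASSPG", "CASSG")
y <- c("CASSLG", "CASSLG", "CASSLG")

# Full 3 x 3 distance matrix, then the diagonal for the matched pairs
d <- adist(x, y)
pairwise <- as.vector(diag(d))

# Equivalent pairwise computation that avoids building the full matrix,
# which may matter for large sets
pairwise_direct <- unname(mapply(adist, x, y))

print(pairwise)
print(pairwise_direct)
```

The first pair is identical, the second differs by one substitution, and the third by one insertion.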

Entropy analysis is performed across ranges of CDR3 using EntropyScan, or in specific windows using SearchIterator.
EntropyScan is a wrapper for SearchIterator.
Entropy across the entire range of k-mers and positions is summarized in the first list element of the output.

SearchBoxesPaired20 <- mapply(c, seq(0,1.9,length.out=20), seq(0.1,2,length.out=20), SIMPLIFY=FALSE)
VDJDBCMV_Entropy <- EntropyScan(SPANTCR_VDJDBCMVPaired, SearchBoxesPaired20, 0.03, FineBinTicksPaired)

VDJDBCMV_Entropy[[1]]$Source <- factor(VDJDBCMV_Entropy[[1]]$Source, levels=c("Base", sort(unique(VDJDBCMV_Entropy[[1]]$Source)[-1])))

ggplot(VDJDBCMV_Entropy[[1]][Count>20])+
  geom_tile(aes(x=Range, y=Source, fill=DeltaAverage))+
  scale_x_discrete(breaks=c("0-0.1","1-1.1","1.9-2"), labels=c(0,1,2))+
  scale_fill_gradient2(low="blue", mid="white", high="red", midpoint = 1)+
  theme_dark()

Individual CDR3 entropy lines for specific sets of TCRs can be plotted using data from the second list element of the output.

ggplot(VDJDBCMV_Entropy[[2]][N>10 & Range=="0.5-0.6"])+
  geom_line(aes(x=Location, y=Entropy, color=Source))+
  theme_bw()
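
For readers unfamiliar with entropy as a diversity measure, the sketch below computes Shannon entropy over a set of k-mer labels. This README does not state which entropy definition EntropyScan uses, so Shannon entropy in bits is an assumption here, and shannon_entropy is a hypothetical helper rather than a SPANTCR function.

```r
# Hypothetical helper: Shannon entropy (in bits) of a vector of labels.
# Assumes `labels` is a non-empty character vector.
shannon_entropy <- function(labels) {
  # Observed frequency of each distinct label
  p <- as.numeric(table(labels)) / length(labels)
  -sum(p * log2(p))
}

# A window dominated by one k-mer has lower entropy than a uniform mix
mixed <- shannon_entropy(c("AS", "AS", "SL", "LG"))
uniform <- shannon_entropy(c("AS", "SL", "LG", "CA"))
single <- shannon_entropy(c("AS", "AS", "AS"))
print(c(mixed = mixed, uniform = uniform, single = single))
```

Low entropy in a window indicates that a few k-mers dominate at that position, while high entropy indicates a more even mix.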


alexandermxu/SPANTCR documentation built on Dec. 19, 2021, 12:30 a.m.