The goal of SPANTCR is to analyze TCR datasets.
You can install the development version of SPANTCR from GitHub with:
# install.packages("devtools")
devtools::install_github("alexandermxu/SPANTCR")
SPAN-TCR analyzes TCR input data generated by software such as MiXCR (https://mixcr.readthedocs.io/en/master/index.html).
Sample data in the object VDJDBCMVPairedInput is obtained from https://vdjdb.cdr3.net/.
A CDR3 sequence can be broken down into k-mers.
SPAN-TCR extracts the k-mers contributing to the normalized position of amino acids within CDR3 sequences
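As a rough illustration of the k-mer breakdown described above, the sketch below splits a single CDR3 into overlapping k-mers and attaches a normalized position to each. The helper `kmer_breakdown`, its column names, and the choice to scale the k-mer start index to the range 0 to 1 are assumptions made for this example; they are not taken from the SPANTCR source, which may define position differently.

```r
# Hypothetical helper, not part of SPANTCR: break one CDR3 into overlapping k-mers.
kmer_breakdown <- function(cdr3, k = 2) {
  n <- nchar(cdr3)
  if (n < k) {
    # Sequence too short to contain any k-mer: return an empty table.
    return(data.frame(Kmer = character(0), Start = integer(0),
                      Position = numeric(0), stringsAsFactors = FALSE))
  }
  starts <- seq_len(n - k + 1)
  kmers <- substring(cdr3, starts, starts + k - 1)
  # Assumption: normalized position is the start index rescaled so that the
  # first k-mer sits at 0 and the last k-mer sits at 1.
  position <- if (length(starts) > 1) (starts - 1) / (length(starts) - 1) else 0
  data.frame(Kmer = kmers, Start = starts, Position = position,
             stringsAsFactors = FALSE)
}

# Toy sequence chosen for illustration only.
toy_breakdown <- kmer_breakdown("CASSLG", k = 2)
print(toy_breakdown)
```

Normalizing the position lets k-mers from CDR3s of different lengths share one axis, which is what makes binning across a repertoire possible.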
Prepare data.
Create an object AminoAcidFilter which contains relevant amino acid information
Define a weighting function
Generate CDR3Breakdown objects using SPANTCR
k-mers are ordered by their hydrophobicity
Analyze distance between CDR3s using ComparisonFunction
Analyze entropy using EntropyScan
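The hydrophobicity ordering mentioned in the steps above can be pictured with a small base R sketch. It assumes the Kyte-Doolittle hydropathy scale and scores each k-mer as the sum of its residue values; the scale SPANTCR actually uses and the way it combines residues are not stated here, so `kd_scale` and `kmer_hydrophobicity` are hypothetical names for illustration only.

```r
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
kd_scale <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
              Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
              L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
              S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper: score each k-mer as the sum of its residue hydropathies.
kmer_hydrophobicity <- function(kmers) {
  vapply(strsplit(kmers, "", fixed = TRUE),
         function(residues) sum(kd_scale[residues]),
         numeric(1))
}

# Toy k-mers, ordered from least to most hydrophobic.
toy_kmers <- c("SL", "CA", "GR", "SS")
toy_scores <- kmer_hydrophobicity(toy_kmers)
ordered_kmers <- toy_kmers[order(toy_scores)]
print(ordered_kmers)
```

A fixed ordering like this gives every k-mer a stable place along one plot axis, so repertoires can be compared visually.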
Each CDR3Breakdown object contains metadata, the ordering of elements, and the matrix used to rank k-mers
The clone data is stored in the slot "CDR3Breakdown@CloneData"
library(SPANTCR)
SPANTCR_VDJDBCMVPaired <- SPANTCR(VDJDBCMVPairedInput, "VDJDBCMV", "Paired", "Hydrophobicity", 100, 2, 0.03, ScoreFunctionExponential5, WeightFunctionLinear)
head(SPANTCR_VDJDBCMVPaired@CloneData[, c(1:6, 11, 12, 15, 18)])
The processed TCR data is stored in the slot "CDR3Breakdown@Output"
Each row is a single clone in a single bin
head(SPANTCR_VDJDBCMVPaired@Output)
SPANTCR plots can be generated using ggplot2
Ordering output by the same AAOperation will stack bar plots appropriately
The levels of factor (Window) will match the AAOperation order
library(ggplot2)
ggplot(SPANTCR_VDJDBCMVPaired@Output[order(Window)]) +
  geom_bar(aes(x = Tick, y = WeightProbability, fill = SignificantColor), stat = "identity", width = 1/100) +
  theme(legend.position = "none") +
  scale_fill_gradientn(colors = c(alpha("#4D4D4D", 0.01), "red", "white", "blue"), values = c(0, 0.01, 0.50, 1)) +
  coord_fixed()
The distance between sets of CDR3s can be measured using ComparisonFunction
Here we plot the comparison score against the Levenshtein distance between each VDJDBCMV Paired CDR3 and its nearest neighbor in the set
CMVSelfDistance <- ComparisonFunction(SPANTCR_VDJDBCMVPaired, SPANTCR_VDJDBCMVPaired, 100, FineBinTicksPaired, BLOSUMCappedDiff, FALSE)
CMVSelfDistance[, SecondChain := sapply(Top5Chain, function(x) x[2])]
CMVSelfDistance[, SecondChainScore := sapply(Top5CompScore, function(x) x[2])]
CMVSelfDistance[, SecondChainLev := diag(adist(SecondChain, BaseCDR3))]
ggplot(CMVSelfDistance) +
  geom_point(aes(x = SecondChainLev, y = SecondChainScore)) +
  coord_fixed()
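The Levenshtein side of this comparison relies only on `adist` from base R's utils package. The sketch below shows, on a made-up set of four CDR3 strings, how a nearest neighbor by Levenshtein distance can be found without SPANTCR. The sequences and the variable names are illustrative assumptions, not data or objects from the package.

```r
# Toy CDR3 strings, invented for this example.
cdr3_set <- c("CASSLGF", "CASSLGY", "CASSPGF", "CATTLGF")

# Pairwise generalized Levenshtein distances (utils::adist with default costs).
lev <- utils::adist(cdr3_set)

# Exclude each sequence's distance to itself before searching for a neighbor.
diag(lev) <- NA

# Index of the closest other sequence in each row (ties go to the first match).
nearest_index <- apply(lev, 1, which.min)

nearest <- data.frame(
  CDR3 = cdr3_set,
  Nearest = cdr3_set[nearest_index],
  Lev = lev[cbind(seq_along(cdr3_set), nearest_index)],
  stringsAsFactors = FALSE
)
print(nearest)
```

Computing the full pairwise matrix is quadratic in the number of sequences, so for large repertoires it may be worth comparing only matched pairs instead.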
Entropy analysis is performed across ranges of CDR3 using EntropyScan, or in specific windows using SearchIterator
EntropyScan is a wrapper for SearchIterator
Entropy across the entire range of k-mers and positions is summarized in the first list element of the output
SearchBoxesPaired20 <- mapply(c, seq(0, 1.9, length.out = 20), seq(0.1, 2, length.out = 20), SIMPLIFY = FALSE)
VDJDBCMV_Entropy <- EntropyScan(SPANTCR_VDJDBCMVPaired, SearchBoxesPaired20, 0.03, FineBinTicksPaired)
VDJDBCMV_Entropy[[1]]$Source <- factor(VDJDBCMV_Entropy[[1]]$Source, levels = c("Base", sort(unique(VDJDBCMV_Entropy[[1]]$Source)[-1])))
ggplot(VDJDBCMV_Entropy[[1]][Count > 20]) +
  geom_tile(aes(x = Range, y = Source, fill = DeltaAverage)) +
  scale_x_discrete(breaks = c("0-0.1", "1-1.1", "1.9-2"), labels = c(0, 1, 2)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 1) +
  theme_dark()
Individual CDR3 entropy lines for specific sets of TCRs can be plotted using data from the second list element of the output
ggplot(VDJDBCMV_Entropy[[2]][N > 10 & Range == "0.5-0.6"]) +
  geom_line(aes(x = Location, y = Entropy, color = Source)) +
  theme_bw()
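To give a sense of what an entropy value within a position window can mean, the sketch below computes the Shannon entropy (in bits) of the k-mers that fall inside one window of a toy table. This assumes plain Shannon entropy over k-mer frequencies; how EntropyScan and SearchIterator actually define and weight entropy is not documented here, and `shannon_entropy`, the toy table, and the window bounds are hypothetical.

```r
# Hypothetical helper: Shannon entropy in bits of a set of (optionally weighted) items.
shannon_entropy <- function(items, weights = rep(1, length(items))) {
  totals <- tapply(weights, items, sum)  # total weight per distinct item
  p <- totals / sum(totals)              # convert to probabilities
  p <- p[p > 0]                          # guard against log2(0)
  -sum(p * log2(p))
}

# Toy k-mers with normalized positions, invented for this example.
toy <- data.frame(
  Kmer = c("CA", "CA", "AS", "SS", "SL", "SL", "SL", "LG"),
  Position = c(0.05, 0.08, 0.30, 0.55, 0.52, 0.58, 0.60, 0.95),
  stringsAsFactors = FALSE
)

# Keep only k-mers whose position lies in the window [0.5, 0.6].
in_window <- toy$Position >= 0.5 & toy$Position <= 0.6
window_entropy <- shannon_entropy(toy$Kmer[in_window])
print(window_entropy)
```

Low entropy in a window indicates that a few k-mers dominate that region of the CDR3, while high entropy indicates a more even mix.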