This readme covers TCR analysis with the tcrSeqR package at a high level.
The example.R script contains a more complete example.
tcrSeqR is an R package for analyzing TCR sequencing data from Adaptive Biotechnologies' immunoSEQ platform. It is primarily a set of tools for importing the data into R, but it also includes a number of scripts that can serve as examples for analyzing complex experiments.
The data is imported into a single tcr object, containing all of the samples and associated metadata.
The tcr class extends SummarizedExperiment and has a number of custom
methods. The data is represented as a table with rows representing
unique T cell clones and columns representing samples. When clones are not shared between samples,
the table is completed by adding zeros. This results in a larger than necessary table, but saves
time by not having to join samples on the fly.
Metadata for the samples, as well as metrics summarizing them, are stored in the
colData slot of the tcr object.
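The zero-filled layout described above can be illustrated in base R. This is only a sketch of the idea, not the package's import code; the sample names and CDR3 sequences are made up for illustration.

```r
# Two hypothetical samples: named vectors of clone counts.
s1 <- c(CASSLGTDTQYF = 10, CASSPGQGYEQYF = 3)
s2 <- c(CASSPGQGYEQYF = 7, CASRDRGNTIYF = 1)
samples <- list(sample1 = s1, sample2 = s2)

# The union of all clones across samples defines the rows.
clones <- sort(unique(unlist(lapply(samples, names), use.names = FALSE)))

# Build a clones-by-samples matrix; cells stay 0 where a clone is absent.
counts <- matrix(0, nrow = length(clones), ncol = length(samples),
                 dimnames = list(clones, names(samples)))
for (s in names(samples)) {
  counts[names(samples[[s]]), s] <- samples[[s]]
}
counts
```

Every sample shares the same row index, so comparisons between samples need no join at analysis time.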
scripts/example.R contains a full example using some small sample files.
Data is imported using iseqr_merge(), which takes a path to the directory containing Adaptive
tsv files as its argument.
path <- '/data/ex_tsv/' # change to location of example files
setwd(path)
all_files <- list.files(pattern = "\\.tsv$") # pattern is a regex; match only names ending in .tsv
# construct the dataset with iseqr_merge
ds <- iseqr_merge(all_files)
tcrSeqR can help make plots of various metrics, but it needs to be aware of the metadata
associated with each sample. This is accomplished with a dictionary, which is simply a
data.frame in which each row corresponds to a column in the dataset. A simple example might
look like:
fn | patient | type | response
---|---------|------|----------
samplept1post | 1 | Post | R
samplept1pre | 1 | Pre | R
samplept1tumor | 1 | Tumor | R
samplept2post | 2 | Post | NR
samplept2pre | 2 | Pre | NR
samplept2tumor | 2 | Tumor | NR
In this example, fn refers to the original filename of the tsv (which is brought in as the
column name in the dataset), patient is a patient/subject number, type is a sample type (in
this case pre- and post-treatment as well as tumor), and response indicates whether a patient was a
responder (R) or non-responder (NR).
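As a sketch, a dictionary like the one described above could be built by hand as a plain data.frame. The fn values here are typed in for illustration; in practice they would be copied from the column names of the merged dataset.

```r
# Column names as they might appear in the merged dataset (assumed values).
fn <- c("samplept1post", "samplept1pre", "samplept1tumor",
        "samplept2post", "samplept2pre", "samplept2tumor")

# One row per sample column, with the metadata fields used in this readme.
dict <- data.frame(
  fn       = fn,
  patient  = rep(1:2, each = 3),
  type     = rep(c("Post", "Pre", "Tumor"), times = 2),
  response = rep(c("R", "NR"), each = 3),
  stringsAsFactors = FALSE
)
dict
```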
tcr object
Next, the data is combined with any available metadata to create the tcr
object. To match the samples to their metadata, the metadata must contain a
column called fn that matches the filenames of the sample tsv files
(specifically, it must match the colnames of the ds object at this stage).
R does not handle column names that begin with digits well, so the word 'sample' is
prepended to the file name and underscores and dashes are removed. The best way
to make the fn column in the dictionary is to create the dataset first using
iseqr_merge (see above) and then copy colnames(ds) from that object.
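The renaming rule can be sketched with a small helper. clean_sample_name is a hypothetical function written for this readme, not part of tcrSeqR, and it assumes the .tsv extension is dropped; the package's own behavior may differ, which is why copying colnames(ds) remains the safer route.

```r
# Hypothetical helper mirroring the rule described above; the package's own
# implementation may differ (for example in how it treats the extension).
clean_sample_name <- function(files) {
  base <- sub("\\.tsv$", "", basename(files))  # assumption: extension is dropped
  paste0("sample", gsub("[_-]", "", base))     # prepend 'sample', strip _ and -
}

clean_sample_name(c("pt1_post.tsv", "pt-2_pre.tsv"))
```

A quick consistency check before building the tcr object is setdiff(dict$fn, colnames(ds)), which should return an empty character vector when every dictionary row has a matching column.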
With this metadata loaded as a data.frame, the tcr object can be constructed
using
# load an example dictionary
dict <- readRDS('dict.Rds')
# make the tcr object
ds <- iseqr_make_tcr(ds, dict)
Many nucleic acid sequences can encode the same CDR3, so many analyses may require aggregated data (data in which synonymous nucleotide sequences are combined into a single amino acid level representation). Metrics such as Clonality and Richness should typically be computed from aggregated data. This step also removes any sequences with stop codons, or sequences with no translation.
An imported dataset can be aggregated using iseqr_aggregate():
# aggregate the data
# this collapses synonymous nucleotide sequences
ds_agg <- iseqr_aggregate(ds, inc_nt = FALSE)
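To make the aggregation step concrete, here is a base R sketch on toy data. It is not the code behind iseqr_aggregate(); the column names are invented, and it assumes a stop codon is marked with "*" and a missing translation is an empty string.

```r
# Toy nucleotide-level data: one row per nucleotide sequence,
# with its amino acid translation and counts in two samples.
nt <- data.frame(
  aa      = c("CASSLGF", "CASSLGF", "CASS*GF", "", "CASRDRF"),
  sample1 = c(5, 2, 4, 1, 0),
  sample2 = c(0, 3, 1, 0, 6),
  stringsAsFactors = FALSE
)

# Drop rows with a stop codon ("*", assumed marker) or no translation.
keep <- nt$aa != "" & !grepl("*", nt$aa, fixed = TRUE)
nt <- nt[keep, ]

# Sum counts of synonymous nucleotide sequences per amino acid sequence.
agg <- rowsum(as.matrix(nt[, c("sample1", "sample2")]), group = nt$aa)
agg
```

The two rows that translate to the same CDR3 collapse into one, so downstream metrics count each amino acid clone once.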
A variety of metrics are available in tcrSeqR, including clonality, richness, and morisita.
These functions can, if given an input vector, calculate the statistic of
interest. If given a tcr object, however, they will calculate the metric for
all samples and (optionally) merge the results back into the metadata.
library(SummarizedExperiment) # provides assay()
# Clonality for one sample
clonality(assay(ds)[, 1])
# Clonality for all samples
clonality(ds, merge = FALSE)
# Clonality for all samples, merged back into the metadata
clonality(ds, merge = TRUE)
Clonality, Richness and Total Sequences all work in this manner, using
clonality, richness and total, respectively.
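For readers unfamiliar with these metrics, the sketch below uses definitions that are common in the TCR literature. They are assumptions, not confirmed to be what tcrSeqR computes, and the *_sketch names are invented so they do not clash with the package's functions.

```r
# Assumed definitions (common in the field, not confirmed for tcrSeqR):
#   richness  = number of clones with a non-zero count
#   total     = sum of all counts
#   clonality = 1 - Shannon entropy / log(richness)
richness_sketch <- function(x) sum(x > 0)
total_sketch    <- function(x) sum(x)
clonality_sketch <- function(x) {
  p <- x[x > 0] / sum(x)
  if (length(p) < 2) return(NA_real_)  # undefined for fewer than two clones
  1 + sum(p * log(p)) / log(length(p))
}

# Columns are samples, rows are clones, as in the tcr object's assay.
counts <- cbind(even = c(5, 5, 5, 5), skewed = c(97, 1, 1, 1))
apply(counts, 2, clonality_sketch)
```

Under this definition a perfectly even repertoire scores 0 and a repertoire dominated by one clone approaches 1.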
Once these metrics are calculated, it is often of interest to calculate
their change following treatment. To accomplish this, the delta_stats function
can be used in a similar manner to the metric functions described above. Because
it compares samples of different type, delta_stats requires a list of
comparisons to be made. Changes are calculated relative to the first item in each
comparison, and multiple comparisons are possible. However, restrictions on how the
data is stored prevent duplicates in the second slot (i.e. 'Pre' vs 'Post1' and
'Pre' vs 'Post2' can coexist, but 'Pre' vs 'Post2' and 'Post1' vs 'Post2'
cannot, because 'Post2' is duplicated in the second position).
# make a list of comparisons
comps <- list(c('PRE', 'POST1'), c('PRE', 'POST3'))
# return the change in Clonality without merging
delta_stats(ds, comps, 'Clonality', merge = FALSE)
# calculate the change in Richness and merge it into the tcr object
ds <- delta_stats(ds, comps, 'Richness')
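The idea of a per-patient change relative to the first item of a comparison can be sketched in base R. delta_sketch is a hypothetical helper, and it assumes the change is a simple difference (second minus first); delta_stats may use a different measure, such as a ratio or fold change.

```r
# Toy metadata with a metric already computed per sample.
meta <- data.frame(
  patient   = c(1, 1, 2, 2),
  type      = c("Pre", "Post", "Pre", "Post"),
  Clonality = c(0.10, 0.25, 0.30, 0.20),
  stringsAsFactors = FALSE
)

# Change relative to the first item of one comparison, matched by patient.
# Assumption: "change" is a simple difference (second minus first).
delta_sketch <- function(meta, comp, metric) {
  first  <- meta[meta$type == comp[1], c("patient", metric)]
  second <- meta[meta$type == comp[2], c("patient", metric)]
  m <- merge(first, second, by = "patient", suffixes = c(".first", ".second"))
  data.frame(patient = m$patient,
             delta   = m[[paste0(metric, ".second")]] -
                       m[[paste0(metric, ".first")]])
}

delta_sketch(meta, c("Pre", "Post"), "Clonality")
```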
Morisita Index and Expanded Clones operate on a similar principle; see
example.R for a more detailed look at these.
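As background on overlap indices, the sketch below computes the Morisita-Horn variant for two samples that share a row index. This is one common formulation chosen for illustration; the package's morisita function may implement the original Morisita index or another variant.

```r
# Morisita-Horn overlap between two count vectors over the same clones.
# 1 means identical clone frequency profiles, 0 means no shared clones.
morisita_horn <- function(x, y) {
  X <- sum(x)
  Y <- sum(y)
  2 * sum(x * y) / ((sum(x^2) / X^2 + sum(y^2) / Y^2) * X * Y)
}

a <- c(10, 5, 0, 1)
b <- c(0, 0, 8, 0)
morisita_horn(a, a)  # a sample compared with itself
morisita_horn(a, b)  # two samples with no clones in common
```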
Once you have created a valid tcr object, you can use the example shiny web
app in /shiny to view comparisons in a web browser.