Quick_Start
In rTCRBCRr: Repertoire Analysis of the Detected Clonotype

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The goal of rTCRBCRr is to process the output of clonotyping tools such as TRUST, MiXCR, and immunoSEQ and to compute clonotype repertoire metrics from it.

Installation

The package has been accepted by CRAN. You can install the released version of rTCRBCRr from CRAN with:

install.packages("rTCRBCRr")

You can also install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("sciencepeak/rTCRBCRr")

Example code

Attach packages

library("rTCRBCRr")
library("magrittr")
library("readr")

Read raw data files (TRUST output in this example) into a list of data frames

present_tool <- c("trust", "mixcr")[1]
example_data_directory <- system.file(paste("extdata", present_tool, sep = "/"), package = "rTCRBCRr")
input_paths <- dir(example_data_directory, full.names = TRUE)
input_files <- dir(example_data_directory, full.names = FALSE)
input_files

# Escape the dot so that only a literal ".tsv" suffix (and anything after it) is stripped.
sample_names <- sub("\\.tsv.*", "", input_files)
sample_names

raw_clonotype_dataframe_list <- lapply(input_paths, readr::read_tsv) %>%
    magrittr::set_names(., value = sample_names)
raw_clonotype_dataframe_list
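
The later steps rely on the list being named by sample, so it can help to see the list-read-name pattern in isolation. The following is a minimal base-R sketch, not part of rTCRBCRr: the toy files, their column names (`cdr3aa`, `count`), and every `toy_*` object are made up for illustration, and `utils::read.delim` stands in for `readr::read_tsv` only to keep the example dependency-free.

```r
# Write two tiny tab-separated files to a temporary directory.
# The column names and values are hypothetical, for illustration only.
toy_directory <- file.path(tempdir(), "toy_clonotypes")
dir.create(toy_directory, showWarnings = FALSE)
toy_table <- data.frame(cdr3aa = c("CASSLGF", "CARDYW"), count = c(3L, 1L))
for (toy_name in c("sample_01", "sample_02")) {
    write.table(toy_table, file.path(toy_directory, paste0(toy_name, ".tsv")),
                sep = "\t", quote = FALSE, row.names = FALSE)
}

# Same pattern as above: list the files, derive sample names, read, then name.
toy_paths <- dir(toy_directory, pattern = "\\.tsv$", full.names = TRUE)
toy_sample_names <- sub("\\.tsv$", "", basename(toy_paths))
toy_dataframe_list <- stats::setNames(
    lapply(toy_paths, utils::read.delim, stringsAsFactors = FALSE),
    toy_sample_names
)

# Each element can now be retrieved by its sample name.
names(toy_dataframe_list)
toy_dataframe_list[["sample_01"]]
```

Deriving the names from the same file listing that is read keeps names and data frames in the same order.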

Tidy up the clonotype dataframes

The tidy-up consists of four steps, each implemented by one function:

format_clonotype_to_immunarch_style
remove_nonproductive_CDR3aa
annotate_chain_name_and_isotype_name
merge_convergent_clonotype

# To test a single sample, process it on its own as follows.
the_divergent_clonotype_dataframe <- raw_clonotype_dataframe_list[["sample_01"]] %>%
    format_clonotype_to_immunarch_style(., clonotyping_tool = present_tool) %>%
    remove_nonproductive_CDR3aa %>%
    annotate_chain_name_and_isotype_name %>%
    merge_convergent_clonotype

# Then put that single sample into a list whose element is named after the sample,
# because the later steps need a named list of data frames as input.
divergent_clonotype_dataframe_list <- list(sample_01 = the_divergent_clonotype_dataframe)

# Normally you will have multiple samples;
# in that case, processing the whole list in a functional style is preferred.
divergent_clonotype_dataframe_list <- raw_clonotype_dataframe_list %>%
    lapply(., format_clonotype_to_immunarch_style, clonotyping_tool = present_tool) %>%
    lapply(., remove_nonproductive_CDR3aa) %>%
    lapply(., annotate_chain_name_and_isotype_name) %>%
    lapply(., merge_convergent_clonotype)
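
To give a feel for the last tidy-up step, here is a minimal base-R sketch of one common reading of "convergent" clonotypes: distinct nucleotide sequences that encode the same CDR3 amino acid sequence on the same chain, which are collapsed by summing their counts. That reading is an assumption made here, not a description of how `merge_convergent_clonotype` works internally, and the column names (`chain`, `cdr3nt`, `cdr3aa`, `count`) are hypothetical rather than the package's actual columns.

```r
# Toy clonotype table; all names and values are hypothetical.
# The three TRB rows differ at the nucleotide level but all translate to "CAS".
toy_clonotypes <- data.frame(
    chain  = c("TRB", "TRB", "TRB", "IGH"),
    cdr3nt = c("TGTGCCAGC", "TGCGCCAGC", "TGTGCTAGT", "TGTGCGAGA"),
    cdr3aa = c("CAS", "CAS", "CAS", "CAR"),
    count  = c(5, 3, 2, 4)
)

# Collapse rows that share chain and amino acid sequence by summing their counts.
merged_clonotypes <- aggregate(count ~ chain + cdr3aa, data = toy_clonotypes, FUN = sum)

# Recompute the proportions after merging so that they sum to 1 again.
merged_clonotypes$proportion <- merged_clonotypes$count / sum(merged_clonotypes$count)
merged_clonotypes
```

In this toy table the three TRB rows collapse into a single clonotype with a count of 10, which leaves two rows in total.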

Calculate and merge repertoire metrics by chains for each sample in the list

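This step uses three functions: compute_repertoire_metrics_by_chain_name, combine_all_sample_repertoire_metrics, and get_item_name_x_sample_name_for_each_metric.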

# Compute repertoire metrics for all the chains.
all_sample_all_chain_all_metrics_wide_format_dataframe_list <- divergent_clonotype_dataframe_list %>%
    lapply(., compute_repertoire_metrics_by_chain_name)

all_sample_all_chain_all_metrics_wide_format_dataframe_list

all_sample_all_chain_all_metrics_wide_format_dataframe <- all_sample_all_chain_all_metrics_wide_format_dataframe_list %>%
    combine_all_sample_repertoire_metrics

all_sample_all_chain_all_metrics_wide_format_dataframe

all_sample_all_chain_individual_metrics_dataframe_list <- all_sample_all_chain_all_metrics_wide_format_dataframe %>%
    get_item_name_x_sample_name_for_each_metric

all_sample_all_chain_individual_metrics_dataframe_list

Calculate and merge repertoire metrics by IGH isotypes for each sample in the list

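This step uses three functions: calculate_IGH_isotype_proportion, combine_all_sample_repertoire_metrics, and get_item_name_x_sample_name_for_each_metric.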

# Compute repertoire metrics for all the isotypes of the IGH chain.
all_sample_IGH_chain_all_metrics_wide_format_dataframe_list <- divergent_clonotype_dataframe_list %>%
    lapply(., calculate_IGH_isotype_proportion)

all_sample_IGH_chain_all_metrics_wide_format_dataframe_list

all_sample_IGH_chain_all_metrics_wide_format_dataframe <- all_sample_IGH_chain_all_metrics_wide_format_dataframe_list %>%
    combine_all_sample_repertoire_metrics

all_sample_IGH_chain_all_metrics_wide_format_dataframe

all_sample_IGH_chain_individual_metrics_dataframe_list <- all_sample_IGH_chain_all_metrics_wide_format_dataframe %>%
    get_item_name_x_sample_name_for_each_metric

all_sample_IGH_chain_individual_metrics_dataframe_list

Clonotype repertoire metrics formulas

The repertoire metrics, namely richness, diversity (Shannon entropy), evenness (Pielou's evenness), clonality, and median (frequency median), are defined as follows, where $p_i$ is the frequency of ${\rm clonotype}_i$ in a sample with $N$ unique clonotypes (Khunger, Rytlewski et al. 2019, Looney, Topacio-Hall et al. 2020), and $P$ is the frequency vector of the unique clonotypes in that sample.

$$ richness\ =\ N $$

$$ Shannon\ entropy=-\sum_{i=1}^{N}{p_i\log_2{\left(p_i\right)}} $$

$$ Pielou\prime s\ evenness\ =\ \frac{Shannon\ entropy}{\log_2{N}} $$

$$ clonality\ =\ 1\ -\ Pielou\prime s\ evenness $$

$$ frequency\ median\ =\ median(P) $$

The function calculate_repertoire_metrics implements the repertoire metrics formulas above. Printing it shows its source:

calculate_repertoire_metrics
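
For readers who want to check the formulas by hand, the following is a minimal base-R sketch that evaluates them on a small vector of clone counts. It is independent of the package's `calculate_repertoire_metrics`: the helper name `sketch_repertoire_metrics` and its input format (one positive count per unique clonotype) are assumptions made for illustration.

```r
# Hypothetical helper, not the package's implementation.
# `clone_counts` holds one positive count per unique clonotype.
sketch_repertoire_metrics <- function(clone_counts) {
    stopifnot(is.numeric(clone_counts), length(clone_counts) > 0, all(clone_counts > 0))

    p <- clone_counts / sum(clone_counts)      # frequency vector P
    richness <- length(p)                      # N, the number of unique clonotypes
    shannon_entropy <- -sum(p * log2(p))       # diversity in bits

    # Pielou's evenness is undefined for a single clonotype (log2(1) = 0),
    # so this sketch returns NA in that case.
    evenness <- if (richness > 1) shannon_entropy / log2(richness) else NA_real_
    clonality <- 1 - evenness

    c(richness = richness,
      shannon_entropy = shannon_entropy,
      evenness = evenness,
      clonality = clonality,
      frequency_median = median(p))
}

# Four clonotypes with frequencies 0.5, 0.25, 0.125 and 0.125.
metrics <- sketch_repertoire_metrics(c(4, 2, 1, 1))
metrics
```

For this input the formulas give a richness of 4, a Shannon entropy of 1.75 bits, an evenness of 1.75 / 2 = 0.875, a clonality of 0.125, and a frequency median of 0.1875.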

Acknowledgements

The hexagon logo of the package was created with the help of the package hexSticker. The math formulas were written with the help of the recognition tool MyScript. The LaTeX formulas in markdown were inspired by rmd4sci. The code in this study was inspired by the UCSB R tutorial note, the LymphoSeq script, and the vegan package.