clusterize_TCR: TCR sequence clustering

View source: R/clusterize_TCR.R

clusterize_TCR R Documentation

TCR sequence clustering

Description

Find clusters of similar TCRs that are likely to recognize the same epitope.

Usage

clusterize_TCR(
  sequence_df,
  chains,
  tmp_folder,
  id_col,
  scores_filename = NA,
  threshold = NA,
  ncores = 1
)

Arguments

sequence_df

A data.frame containing TCR sequence data. Each row must describe a unique TCR sequence. The following fields are required:

  • junction_beta - amino acid sequence of CDR3 plus the two flanking conserved residues

  • v_beta, j_beta - V and J gene with or without allele; allele information is not used for score calculation.

If chains="AB", junction_alpha, v_alpha and j_alpha must be provided as well.
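
The column requirements above can be checked before clustering. The following is a minimal sketch, not part of the package: the helper name missing_tcr_columns is hypothetical, and the ids, sequences and gene names are illustrative values only.

```r
# Illustrative input: two paired alpha/beta TCRs (values chosen for demonstration only).
tcr_df <- data.frame(
  id             = c("tcr1", "tcr2"),
  junction_beta  = c("CASSLGQAYEQYF", "CASSIRSSYEQYF"),
  v_beta         = c("TRBV7-9*01", "TRBV19"),
  j_beta         = c("TRBJ2-7*01", "TRBJ2-7"),
  junction_alpha = c("CAVRDSNYQLIW", "CAGAGSQGNLIF"),
  v_alpha        = c("TRAV3", "TRAV27"),
  j_alpha        = c("TRAJ33", "TRAJ42"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (assumption, not a package function):
# return the names of required columns that are absent from df.
missing_tcr_columns <- function(df, chains = c("B", "AB"), id_col = "id") {
  chains <- match.arg(chains)
  required <- c(id_col, "junction_beta", "v_beta", "j_beta")
  if (chains == "AB") {
    required <- c(required, "junction_alpha", "v_alpha", "j_alpha")
  }
  setdiff(required, names(df))
}

missing_tcr_columns(tcr_df, chains = "AB")  # empty when all columns are present
anyDuplicated(tcr_df$id) == 0               # TRUE when the ids are unique
```

An empty result means the table has every column the chosen chains setting needs; the uniqueness check covers the id column only, not the sequences themselves.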

chains

Which chains to cluster. "B" for beta chain only, "AB" for paired alpha and beta chains.

tmp_folder

Path to a directory where temporary files can be stored. The files are deleted when clustering is finished.
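
One way to prepare such a directory is sketched below using base R only. The directory name tcr_clustering_tmp is an arbitrary assumption; any writable path should work.

```r
# Create a dedicated scratch directory inside the R session's temp directory.
tmp_dir <- file.path(tempdir(), "tcr_clustering_tmp")
dir.create(tmp_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the directory exists and is writable before passing it as tmp_folder.
# file.access() returns 0 on success; mode = 2 tests write permission.
dir_ok <- dir.exists(tmp_dir) && file.access(tmp_dir, mode = 2) == 0
dir_ok

# Remove the scratch directory once it is no longer needed.
unlink(tmp_dir, recursive = TRUE)
```

Using a dedicated directory rather than "." keeps temporary files away from project files if a run is interrupted before cleanup.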

id_col

Name of a column with unique ids for each TCR.

scores_filename

If a file name is provided as a character string, the BL-scores of each TCR pair are exported to this file. Supported formats: .Rds, .csv.
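
An exported scores file can be read back with base R according to its extension. This is a sketch under assumptions: read_scores is a hypothetical helper, and the layout of the exported table is not described here, so the code makes no assumption about its columns.

```r
# Hypothetical helper (assumption, not a package function):
# read a scores file saved as .Rds or .csv, chosen by file extension.
read_scores <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    rds = readRDS(path),
    csv = read.csv(path, stringsAsFactors = FALSE),
    stop("Unsupported scores file extension: ", ext)
  )
}
```

The .Rds format preserves R column types exactly, while .csv is easier to open in other tools.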

threshold

Clustering threshold (optional). The default is NA; see Details for the default thresholds.

ncores

The number of cores to use for parallel computation (default = 1). Using more than one core is not supported on Windows.
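
A portable script can pick the core count at run time. The sketch below assumes a hypothetical helper, choose_ncores, that falls back to one core on Windows and never requests more cores than the machine reports.

```r
# Hypothetical helper (assumption, not a package function):
# choose a core count that is safe to pass as ncores.
choose_ncores <- function(requested, os_type = .Platform$OS.type) {
  # More than one core is not supported on Windows, so always use 1 there.
  if (os_type == "windows") {
    return(1L)
  }
  # detectCores() may return NA when the count cannot be determined.
  available <- parallel::detectCores()
  if (is.na(available)) {
    available <- 1L
  }
  as.integer(max(1L, min(requested, available)))
}

choose_ncores(2, os_type = "windows")  # 1
```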

Details

The default clustering thresholds were defined to optimally detect clusters of TCRs recognizing the same epitope. If only the CDR3 sequence without flanking residues is provided instead of the full junction, the scores will be overestimated, which may lead to wrong cluster assignments.
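
A quick heuristic can flag inputs that may be CDR3 sequences without the flanking residues. The sketch below assumes that a full junction starts with the conserved cysteine (C) and ends with phenylalanine (F) or tryptophan (W); the helper name looks_like_junction is hypothetical. Some functional TCRs carry non-canonical residues at these positions, so a FALSE result is a prompt to inspect the sequence, not proof that it was trimmed.

```r
# Hypothetical helper (assumption, not a package function):
# TRUE when a sequence starts with C and ends with F or W.
looks_like_junction <- function(seqs) {
  grepl("^C[A-Z]*[FW]$", seqs)
}

# The second sequence is the first one with both flanking residues removed.
looks_like_junction(c("CASSLGQAYEQYF", "ASSLGQAYEQY"))  # TRUE FALSE
```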

Value

A data.frame containing the same information as sequence_df plus the cluster ids. If scores_filename is provided, a file with pairwise BL-scores is also created.

Examples

clusters <- clusterize_TCR(example_TCR_df, chains="AB", id_col="id", tmp_folder=".", ncores=2)


obrzts/BLscore documentation built on Nov. 21, 2024, 4:28 a.m.