yassai_identifier: yassai_identifier
In clonotypeR: High throughput analysis of T cell antigen receptor sequences


TCR clonotype identifier (Yassai et al.)

yassai_identifier(data, V_after_C, J_before_FGxG, long = FALSE)

## S4 method for signature 'character,data.frame,data.frame,ANY'
yassai_identifier(data,
  V_after_C, J_before_FGxG, long = FALSE)

## S4 method for signature 'ANY,missing,missing,ANY'
yassai_identifier(data, long)

## S4 method for signature 'data.frame,data.frame,data.frame,logical'
yassai_identifier(data,
  V_after_C, J_before_FGxG, long = FALSE)

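`data`	A data frame or a character vector containing one or more clonotypes, with appropriately named columns (data frame) or elements (character vector; see the examples for the names V, J, dna and pep).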
`V_after_C`	(optional) A data frame indicating the amino acids following the conserved cysteine for each V segment.
`J_before_FGxG`	(optional) A data frame indicating the amino acids preceding the conserved FGxG motif for each J segment.
`long`	(optional) Logical. If TRUE, avoids identifier collisions by displaying all codons and, in ambiguous cases, indicating the position of the V–J junction.

Computes TCR clonotype identifiers following the nomenclature defined by Yassai et al. (http://dx.doi.org/10.1007/s00251-009-0383-x).

By default, yassai_identifier() assumes mouse sequences and loads the V_after_C and J_before_FGxG tables distributed with this package. Alternative tables can be provided either by passing them directly as arguments, or by installing them as “./inst/extdata/V_after_C.txt.gz” and “./inst/extdata/J_before_FGxG.txt.gz”.

Some clonotypes have a different DNA sequence but the same identifier following the original nomenclature (see below for examples). The ‘long’ mode was created to avoid these collisions. First, it displays all codons, instead of only the non-templated ones and their immediate neighbors. Second, for the clonotypes where all codons are identical to the V or J germline sequence, it indicates the position of the V–J junction in place of the codon IDs.
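The collision can be checked directly on the two TRAV14-1/TRAJ43 clonotypes used in the examples, whose short identifiers are both aAn.1A14-1A43L9. The sketch below uses base R only and is independent of clonotypeR. The helper split_codons() is hypothetical, and it assumes each DNA sequence is in frame with a length that is a multiple of three. It splits each sequence into codons and reports the positions at which they differ.

```r
# Hypothetical helper (not part of clonotypeR): split an in-frame
# nucleotide sequence into codons. Assumes nchar(dna) is a multiple of 3.
split_codons <- function(dna) {
  starts <- seq(1, nchar(dna), by = 3)
  substring(dna, starts, starts + 2)
}

# The two TRAV14-1/TRAJ43 sequences from the examples.
a <- split_codons("GCAGCTAATAACAACAATGCCCCACGA")
b <- split_codons("GCAGCAGCTAACAACAATGCCCCACGA")

length(a)        # 9 codons, consistent with the L9 length identifier
which(a != b)    # codon positions where the sequences differ
# [1] 2 3
```

In long mode the same two clonotypes are printed as aAnnnnapr and aaAnnnapr, so the upper-case letter sits at position 2 in one and position 3 in the other, which separates them.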

The name (for instance sIRSSy.1456B19S1B27L11) consists of five segments:

CDR3 amino acid identifier (e.g. sIRSSy), followed by a dot;
CDR3 nucleotide sequence identifier (e.g. 1456);
variable (V) segment identifier (e.g. BV19S1, written B19S1 in the name);
joining (J) segment identifier (e.g. BJ2S7, written B27 in the name);
CDR3 length identifier (e.g. L11).
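To make the segment layout concrete, here is a minimal sketch of a hypothetical helper, parse_yassai(), which is not part of clonotypeR. It splits an identifier back into its five segments with a regular expression. It assumes that the CDR3 amino acid part contains no dot, that the nucleotide identifier is all digits, and that the J identifier is a single capital letter followed by digits, as in B27 and A43. J names that do not follow that pattern would need a different split.

```r
# Hypothetical helper, not part of clonotypeR: split a Yassai-style
# identifier into its five segments.
# Assumptions: no dot inside the CDR3 amino acid part, an all-digit
# nucleotide identifier, and a J identifier made of one capital letter
# followed by digits (e.g. "B27", "A43").
parse_yassai <- function(id) {
  pattern <- "^([^.]+)\\.([0-9]+)(.+)([A-Z][0-9]+)L([0-9]+)$"
  if (!grepl(pattern, id)) stop("unrecognised identifier: ", id)
  parts <- regmatches(id, regexec(pattern, id))[[1]]
  list(
    cdr3_aa = parts[2],             # CDR3 amino acids, e.g. "sIRSSy"
    nt_id   = parts[3],             # nucleotide identifier, e.g. "1456"
    V       = parts[4],             # V segment, e.g. "B19S1"
    J       = parts[5],             # J segment, e.g. "B27"
    length  = as.integer(parts[6])  # CDR3 length, e.g. 11
  )
}

str(parse_yassai("sIRSSy.1456B19S1B27L11"))
str(parse_yassai("aAn.1A14-1A43L9"))
```

The V/J boundary is the least certain part of this split. The greedy match assigns everything before the last "capital letter plus digits" run to V, which works for the identifiers on this page but is not guaranteed for every gene naming scheme.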

data = character, V_after_C = data.frame, J_before_FGxG = data.frame, long = ANY: TCR clonotype identifier (Yassai et al.)
data = ANY, V_after_C = missing, J_before_FGxG = missing, long = ANY: TCR clonotype identifier (Yassai et al.)
data = data.frame, V_after_C = data.frame, J_before_FGxG = data.frame, long = logical: TCR clonotype identifier (Yassai et al.)

See also: codon_ids, J_before_FGxG, V_after_C.

clonotypes <- read_clonotypes(system.file('extdata', 'clonotypes.txt.gz', package = "clonotypeR"))
head(yassai_identifier(clonotypes))

# The following two clonotypes have the same identifier, and are
# disambiguated by using the long mode.

yassai_identifier(c(V="TRAV14-1", J="TRAJ43", dna="GCAGCTAATAACAACAATGCCCCACGA", pep="AANNNNAPR"))
# [1] "aAn.1A14-1A43L9"

yassai_identifier(c(V="TRAV14-1", J="TRAJ43", dna="GCAGCAGCTAACAACAATGCCCCACGA", pep="AAANNNAPR"))
# [1] "aAn.1A14-1A43L9"

yassai_identifier(c(V="TRAV14-1", J="TRAJ43", dna="GCAGCTAATAACAACAATGCCCCACGA", pep="AANNNNAPR"), long=TRUE)
# [1] "aAnnnnapr.1A14-1A43L9"

yassai_identifier(c(V="TRAV14-1", J="TRAJ43", dna="GCAGCAGCTAACAACAATGCCCCACGA", pep="AAANNNAPR"), long=TRUE)
# [1] "aaAnnnapr.1A14-1A43L9"

# The following two clonotypes would have the same identifier in long mode
# if the position of the V-J junction were not indicated in place of the
# codon IDs.

yassai_identifier(c(V="TRAV14N-1", J="TRAJ56", dna="GCAGCTACTGGAGGCAATAATAAGCTGACT", pep="AATGGNNKLT"), long=TRUE)
# [1] "aatggnnklt.1A14N1A56L10"

yassai_identifier(c(V="TRAV14N-1", J="TRAJ56", dna="GCAGCAACTGGAGGCAATAATAAGCTGACT", pep="AATGGNNKLT"), long=TRUE)
# [1] "aatggnnklt.2A14N1A56L10"