clones: Grouping sequences into clones

Description

This function uses IMGT/HighV-QUEST output files to define B cell clones. Clones are defined by amino acid CDR3 sequences, V genes and, optionally, J genes. A threshold for CDR3 identity/similarity can be given. Parallel processing is possible.

Usage

clones(aaseqtab = NULL, summarytab = NULL, ntseqtab = NULL, identity = 0.85,
     useJ = TRUE, dispD = FALSE, dispSeqID = FALSE, dispCDR3aa = FALSE,
     dispCDR3nt = FALSE, dispJunctionFr.ratio = FALSE,
     dispJunctionFr.list = FALSE, dispFunctionality.ratio = FALSE,
     dispFunctionality.list = FALSE, dispTotalSeq = FALSE, nrCores = 1)

Arguments

aaseqtab

IMGT/HighV-QUEST output, file 5_AA-sequences(...).txt

summarytab

IMGT/HighV-QUEST output, file 1_Summary(...).txt

ntseqtab

IMGT/HighV-QUEST output, file 3_Nt-sequences(...).txt (optional)

identity

Threshold of CDR3 identity. A value between 0 and 1 (default: 0.85).

useJ

Shall J genes be included in the analysis? default: TRUE

dispD

Shall D genes and alleles be returned? default: FALSE

dispSeqID

Shall sequence IDs be returned? default: FALSE

dispCDR3aa

Shall amino acid CDR3 sequences be returned? default: FALSE

dispCDR3nt

Shall nucleotide CDR3 sequences be returned? default: FALSE

dispJunctionFr.ratio

Shall ratios of in-frame, out-of-frame and unknown junctions be returned? default: FALSE

dispJunctionFr.list

Shall a list of all junction frames be returned? default: FALSE

dispFunctionality.ratio

Shall ratios of productive, unproductive and unknown functionality sequences be returned? default: FALSE

dispFunctionality.list

Shall a list of all functionalities be returned? default: FALSE

dispTotalSeq

Shall all total nucleotide sequences be returned? default: FALSE

nrCores

Number of cores used for parallel processing (default: 1)

Details

This function uses IMGT/HighV-QUEST output to define clones based on amino acid CDR3 sequences, V genes and, optionally, J genes. Sequences are grouped into one clone if they have 1) the same CDR3 length, 2) a CDR3 identity at or above the given threshold, 3) the same V gene and 4) the same J gene (optional). The threshold for CDR3 identity has to be between 0 and 1; a cutoff of 0.85 means a CDR3 identity of 85%. For example, for a CDR3 length of 15 amino acids, 85% identity means that at least 13 of 15 positions have to be identical (0.85*15 = 12.75; values are rounded).

useJ=TRUE adds the criterion of the same J gene to the clone definition.

Important to know: if useJ=TRUE, sequences without J gene information are ignored.
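
The four criteria can be illustrated with a short base R sketch that compares two sequences. This is not the package's implementation: same_clone() and its argument names are hypothetical, the gene names and CDR3 strings are toy values, and rounding the required number of identical positions to the nearest integer is an assumption based on the description above.

```r
# Illustrative sketch only; same_clone() is a hypothetical helper, not part of bcRep.
same_clone <- function(cdr3_a, cdr3_b, v_a, v_b, j_a = NA, j_b = NA,
                       identity = 0.85, useJ = TRUE) {
  # Criterion 1: same CDR3 length (in amino acids)
  if (nchar(cdr3_a) != nchar(cdr3_b)) return(FALSE)

  # Criterion 3: same V gene
  if (!identical(v_a, v_b)) return(FALSE)

  # Criterion 4 (optional): same J gene; missing J information never matches
  if (useJ) {
    if (is.na(j_a) || is.na(j_b) || j_a != j_b) return(FALSE)
  }

  # Criterion 2: position-wise CDR3 identity at or above the threshold
  a <- strsplit(cdr3_a, "")[[1]]
  b <- strsplit(cdr3_b, "")[[1]]
  required <- round(identity * length(a))  # assumption: round to nearest integer
  sum(a == b) >= required
}

# Two 15-residue CDR3s differing at one position: 14 identical positions,
# compared against round(0.85 * 15) = 13 required positions.
same_clone("CARDYYGSSYWYFDV", "CARDYYGSSYWYFDY",
           v_a = "IGHV1-2", v_b = "IGHV1-2",
           j_a = "IGHJ4", j_b = "IGHJ4")
```

With useJ = FALSE the J gene arguments are not consulted, so sequences lacking J information can still be compared.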

Value

Output of clones() is a data frame containing

unique_CDR3_sequences_[AA]

unique CDR3 sequences belonging to this clone

CDR3_length_AA

CDR3 length in amino acids

number_of_unique_sequences

number of unique CDR3 sequences belonging to this clone

total_number_of_sequences

number of all sequences belonging to this clone (one sequence can appear several times)

sequence_count_per_CDR3

sequence count for each of the unique CDR3 sequences

V_gene

V gene belonging to this clone

V_gene_and_allele

original IMGT V gene nomenclature

J_gene

J gene(s) belonging to this clone (if useJ=FALSE, there can be several J genes)

J_gene_and_allele

original IMGT J gene nomenclature

optional columns (depending on the disp* arguments)

D_gene; all_CDR3_sequences_AA; all_CDR3_sequences_nt; Funct_all_sequences; Funct_productive/unproductive/unknown sequences; Junction_frame_all_sequences; JF_in-frame/out-of-frame/unknown sequences; Sequence_IDs; Total_sequences_nt
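
Because the result is an ordinary data frame, it can be sorted and summarised with base R. The sketch below uses a small hand-made stand-in with a few of the columns listed above. All values are invented for illustration, and the `fraction` column is a hypothetical addition, not something clones() returns.

```r
# Toy stand-in for the data frame returned by clones(); all values are invented.
clones.tab <- data.frame(
  CDR3_length_AA             = c(15, 12, 15),
  number_of_unique_sequences = c(3, 1, 2),
  total_number_of_sequences  = c(10, 4, 25),
  V_gene                     = c("IGHV1-2", "IGHV3-23", "IGHV1-2"),
  stringsAsFactors = FALSE
)

# Order clones from largest to smallest by total sequence count
ordered <- clones.tab[order(clones.tab$total_number_of_sequences,
                            decreasing = TRUE), ]

# Hypothetical extra column: share of all sequences that fall into each clone
ordered$fraction <- ordered$total_number_of_sequences /
  sum(ordered$total_number_of_sequences)

ordered
```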

Note

For large datasets computational time can be extensive.

Author(s)

Julia Bischof

See Also

clones.CDR3Length, plotClonesCDR3Length, plotClonesCopyNumber, geneUsage, plotGeneUsage, clones.shared

Examples

## Not run: 
data(summarytab)
data(aaseqtab)

clones.tab <- clones(aaseqtab = aaseqtab, summarytab = summarytab, identity = 0.85,
     useJ = TRUE, dispCDR3aa = TRUE, dispFunctionality.ratio = TRUE,
     dispFunctionality.list = TRUE)

## End(Not run)

bcRep documentation built on May 2, 2019, 5:14 a.m.