clonality: Clonality
In LymphoSeq: Analyze high-throughput sequencing of T and B cell receptors


Creates a data frame giving the total number of sequences, number of unique productive sequences, number of genomes, clonality, Gini coefficient, and the frequency (%) of the top productive sequence for each sample in a list of sample data frames.

clonality(file.list)

file.list

A list of data frames consisting of antigen receptor sequencing imported by the LymphoSeq function readImmunoSeq. "aminoAcid", "count", and "frequencyCount" are required columns. "estimatedNumberGenomes" is optional. Note that clonality is usually calculated from productive nucleotide sequences. Therefore, it is not recommended to run this function using a productive sequence list aggregated by amino acids.

Clonality is derived from the Shannon entropy of the productive sequence frequencies. The entropy is normalized by dividing it by the logarithm of the total number of unique productive sequences, and this normalized entropy is then inverted (1 - normalized entropy) to give the clonality metric.
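The calculation described here can be sketched in a few lines of R. This is a minimal illustration and not the LymphoSeq source: the helper name clonality_sketch is hypothetical, the base-2 logarithm is an assumption (the ratio is the same for any base), and returning 1 for a repertoire with a single sequence is an assumed convention to avoid dividing by log(1) = 0.

```r
# Hypothetical helper, not part of LymphoSeq: clonality from a vector of
# productive sequence counts.
clonality_sketch <- function(counts) {
  counts <- counts[counts > 0]       # ignore sequences that were not observed
  p <- counts / sum(counts)          # frequency of each unique sequence
  n <- length(p)                     # number of unique productive sequences
  if (n < 2) return(1)               # assumed convention for a single sequence
  entropy <- -sum(p * log2(p))       # Shannon entropy
  1 - entropy / log2(n)              # invert the normalized entropy
}

clonality_sketch(c(10, 10, 10, 10))  # evenly distributed repertoire
clonality_sketch(c(97, 1, 1, 1))     # repertoire dominated by one clone
```

An even repertoire gives a value of 0, and the value moves toward 1 as a single sequence takes up more of the total count.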

The Gini coefficient is an alternative metric used to calculate repertoire diversity and is derived from the Lorenz curve. The Lorenz curve is drawn such that the x-axis represents the cumulative percentage of unique sequences and the y-axis represents the cumulative percentage of reads. A line passing through the origin with a slope of 1 reflects equal frequencies of all clones. The Gini coefficient is the ratio of the area between the line of equality and the observed Lorenz curve to the total area under the line of equality. Both the Gini coefficient and clonality are reported on a scale from 0 to 1, where 0 indicates all sequences have the same frequency and 1 indicates the repertoire is dominated by a single sequence.
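The area ratio described above can be illustrated with a short R sketch. It is an assumption-laden example rather than the package's own code: the helper name gini_sketch is hypothetical, and the area under the Lorenz curve is approximated with the trapezoid rule over the sorted counts.

```r
# Hypothetical helper, not part of LymphoSeq: Gini coefficient from a vector
# of sequence counts, via the area under the Lorenz curve.
gini_sketch <- function(counts) {
  x <- sort(counts)                        # order clones from rarest to most common
  n <- length(x)
  lorenz_x <- (0:n) / n                    # cumulative share of unique sequences
  lorenz_y <- c(0, cumsum(x) / sum(x))     # cumulative share of reads
  # Trapezoid rule: width of each step times the mean height of its two ends
  area_under <- sum(diff(lorenz_x) * (lorenz_y[-1] + lorenz_y[-(n + 1)]) / 2)
  # Area between the line of equality and the curve, over the area under the line
  (0.5 - area_under) / 0.5
}

gini_sketch(c(5, 5, 5, 5))    # equal frequencies
gini_sketch(c(0, 0, 0, 10))   # all reads in one clone
```

With this discrete construction, a sample of n sequences in which one clone holds every read gives (n - 1) / n, so the value approaches 1 as the number of unique sequences grows.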

Returns a data frame giving the total number of sequences, number of unique productive sequences, number of genomes, clonality, Gini coefficient, and the frequency (%) of the top productive sequence in each sample.

lorenzCurve

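library(LymphoSeq)

# Locate the example TCRB sequencing files bundled with the package
file.path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")

file.list <- readImmunoSeq(path = file.path)

# Summarize clonality and related metrics for each sample
clonality(file.list = file.list)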

             samples totalSequences uniqueProductiveSequences totalCount
1        TRB_CD4_949           1000                       845      25769
2   TRB_Unsorted_369           1000                       830     339413
3    TRB_Unsorted_83           1000                       823     236732
4        TRB_CD8_949           1000                       794      26239
5    TRB_CD8_CMV_369            414                       281       1794
6  TRB_Unsorted_1320           1000                       838     178190
7  TRB_Unsorted_1496           1000                       832      33669
8   TRB_Unsorted_949           1000                       831       6549
9     TRB_Unsorted_0           1000                       838      18161
10   TRB_Unsorted_32            920                       767      31078
   clonality giniCoefficient topProductiveSequence totalGenomes
1   0.442719       0.8665242             30.091732        25769
2   0.425965       0.8447387             29.720171           NA
3   0.338114       0.7766277             23.645843           NA
4   0.430615       0.9026124             19.346779        26239
5   0.331570       0.7606261             16.487936         1794
6   0.421630       0.9016617             14.579022       178190
7   0.389318       0.8812733             14.248338        33669
8   0.305784       0.7654438             13.837321         6549
9   0.280923       0.8184686              5.773283        18161
10  0.134242       0.6007820              4.865016           NA