hlaCompareAllele: Evaluate prediction accuracies
In HIBAG: HLA Genotype Imputation with Attribute Bagging

Description Usage Arguments Value Author(s) See Also Examples

To evaluate the overall accuracy, sensitivity, specificity, positive predictive value, negative predictive value.

1
2
3

hlaCompareAllele(TrueHLA, PredHLA, allele.limit=NULL, call.threshold=NaN,
    match.threshold=NaN, max.resolution="", output.individual=FALSE,
    verbose=TRUE)

`TrueHLA`	an object of `hlaAlleleClass`, the true HLA types
`PredHLA`	an object of `hlaAlleleClass`, the predicted HLA types
`allele.limit`	a list of HLA alleles, the validation samples are limited to those having HLA alleles in `allele.limit`, or `NULL` for no limit. `allele.limit` could be character-type, `hlaAttrBagClass` or `hlaAttrBagObj`
`call.threshold`	the call threshold for posterior probability, i.e., call or no call is determined by whether `prob >= call.threshold` or not
`match.threshold`	the matching threshold for SNP haplotype similiarity, e.g., use 1% quantile of matching statistics of a training model
`max.resolution`	"2-digit", "4-digit", "6-digit", "8-digit", "allele", "protein", "2", "4", "6", "8", "full" or "": "allele" = "2-digit", "protein" = "4-digit", "full" and "" indicating no limit on resolution
`output.individual`	if TRUE, output accuracy for each individual
`verbose`	if TRUE, show information

Return a list(overall, confusion, detail), or list(overall, confusion, detail, individual) if output.individual=TRUE.

overall (data.frame):

`total.num.ind`	the total number of individuals
`crt.num.ind`	the number of individuals with correct HLA types
`crt.num.haplo`	the number of chromosomes with correct HLA alleles
`acc.ind`	the proportion of individuals with correctly predicted HLA types (i.e., both of alleles are correct, the accuracy of an individual is 0 or 1.)
`acc.haplo`	the proportion of chromosomes with correctly predicted HLA alleles (i.e., the accuracy of an individual is 0, 0.5 or 1, since an individual has two alleles.)
`call.threshold`	call threshold, if it is `NaN`, no call threshold is executed
`n.call`	the number of individuals with call
`call.rate`	overall call rate

confusion (matrix): a confusion matrix.

detail (data.frame):

`allele`	HLA alleles
`train.num`	the number of training haplotypes
`train.freq`	the training haplotype frequencies
`valid.num`	the number of validation haplotypes
`valid.freq`	the validation haplotype frequencies
`call.rate`	the call rates for HLA alleles
`accuracy`	allele accuracy
`sensitivity`	sensitivity
`specificity`	specificity
`ppv`	positive predictive value
`npv`	negative predictive value
`miscall`	the most likely miss-called alleles
`miscall.prop`	the proportions of the most likely miss-called allele in all miss-called alleles

individual (data.frame):

`sample.id`	sample id
`true.hla`	the true HLA type
`pred.hla`	the prediction of HLA type
`accuracy`	accuracy, 0, 0.5, or 1

Xiuwen Zheng

hlaAttrBagging, predict.hlaAttrBagClass, hlaReport

# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training"   "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
    HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
    HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

# validation
pred <- hlaPredict(model, test.geno)
# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))

HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.5 (64-bit, AVX2)
[1] "training"   "validation"
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 34
# of unique HLA alleles: 14
# of unique HLA genotypes: 23
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 26
# of unique HLA alleles: 12
# of unique HLA genotypes: 14
[1] 275
Exclude 11 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
    # of SNPs randomly sampled as candidates for each selection: 17
    # of SNPs: 264
    # of samples: 34
    # of unique HLA alleles: 14
CPU flags: 64-bit, AVX2
# of threads: 1
[-] 2021-01-21 13:25:28
=== building individual classifier 1, out-of-bag (11/32.4%) ===
     1, SNP: 211, Loss: 196.4, OOB Acc: 54.55%, # of Haplo: 13
     2, SNP: 66, Loss: 173.548, OOB Acc: 63.64%, # of Haplo: 13
     3, SNP: 177, Loss: 136.352, OOB Acc: 68.18%, # of Haplo: 13
     4, SNP: 108, Loss: 95.8359, OOB Acc: 72.73%, # of Haplo: 13
     5, SNP: 127, Loss: 67.3216, OOB Acc: 77.27%, # of Haplo: 13
     6, SNP: 95, Loss: 47.5888, OOB Acc: 77.27%, # of Haplo: 13
     7, SNP: 33, Loss: 37.2631, OOB Acc: 77.27%, # of Haplo: 16
     8, SNP: 6, Loss: 29.7419, OOB Acc: 77.27%, # of Haplo: 18
     9, SNP: 208, Loss: 25.6913, OOB Acc: 77.27%, # of Haplo: 19
    10, SNP: 225, Loss: 25.3087, OOB Acc: 77.27%, # of Haplo: 21
    11, SNP: 11, Loss: 24.8356, OOB Acc: 77.27%, # of Haplo: 23
    12, SNP: 151, Loss: 19.4134, OOB Acc: 77.27%, # of Haplo: 23
    13, SNP: 199, Loss: 17.011, OOB Acc: 77.27%, # of Haplo: 23
[1] 2021-01-21 13:25:28, OOB Acc: 77.27%, # of SNPs: 13, # of Haplo: 23
=== building individual classifier 2, out-of-bag (13/38.2%) ===
     1, SNP: 160, Loss: 221.236, OOB Acc: 76.92%, # of Haplo: 17
     2, SNP: 145, Loss: 173.538, OOB Acc: 80.77%, # of Haplo: 23
     3, SNP: 177, Loss: 128.58, OOB Acc: 84.62%, # of Haplo: 31
     4, SNP: 111, Loss: 79.6877, OOB Acc: 84.62%, # of Haplo: 31
     5, SNP: 207, Loss: 52.5557, OOB Acc: 88.46%, # of Haplo: 32
     6, SNP: 245, Loss: 41.8731, OOB Acc: 88.46%, # of Haplo: 34
     7, SNP: 230, Loss: 31.7937, OOB Acc: 88.46%, # of Haplo: 38
     8, SNP: 151, Loss: 20.4566, OOB Acc: 88.46%, # of Haplo: 36
     9, SNP: 14, Loss: 19.5805, OOB Acc: 88.46%, # of Haplo: 42
    10, SNP: 132, Loss: 19.5101, OOB Acc: 88.46%, # of Haplo: 42
    11, SNP: 221, Loss: 19.485, OOB Acc: 88.46%, # of Haplo: 44
    12, SNP: 251, Loss: 18.5695, OOB Acc: 88.46%, # of Haplo: 48
[2] 2021-01-21 13:25:28, OOB Acc: 88.46%, # of SNPs: 12, # of Haplo: 48
=== building individual classifier 3, out-of-bag (14/41.2%) ===
     1, SNP: 191, Loss: 193.067, OOB Acc: 57.14%, # of Haplo: 11
     2, SNP: 264, Loss: 150.427, OOB Acc: 64.29%, # of Haplo: 12
     3, SNP: 132, Loss: 93.4067, OOB Acc: 67.86%, # of Haplo: 12
     4, SNP: 128, Loss: 39.8353, OOB Acc: 71.43%, # of Haplo: 12
     5, SNP: 160, Loss: 28.2998, OOB Acc: 75.00%, # of Haplo: 12
     6, SNP: 144, Loss: 13.635, OOB Acc: 75.00%, # of Haplo: 12
     7, SNP: 111, Loss: 6.04609, OOB Acc: 75.00%, # of Haplo: 12
     8, SNP: 40, Loss: 6.04583, OOB Acc: 82.14%, # of Haplo: 14
     9, SNP: 141, Loss: 6.04583, OOB Acc: 85.71%, # of Haplo: 14
    10, SNP: 73, Loss: 2.9038, OOB Acc: 85.71%, # of Haplo: 14
    11, SNP: 199, Loss: 2.20025, OOB Acc: 85.71%, # of Haplo: 14
[3] 2021-01-21 13:25:28, OOB Acc: 85.71%, # of SNPs: 11, # of Haplo: 14
=== building individual classifier 4, out-of-bag (10/29.4%) ===
     1, SNP: 147, Loss: 158.631, OOB Acc: 50.00%, # of Haplo: 12
     2, SNP: 152, Loss: 140.375, OOB Acc: 55.00%, # of Haplo: 13
     3, SNP: 78, Loss: 115.887, OOB Acc: 60.00%, # of Haplo: 16
     4, SNP: 115, Loss: 77.8082, OOB Acc: 60.00%, # of Haplo: 18
     5, SNP: 148, Loss: 62.6831, OOB Acc: 65.00%, # of Haplo: 18
     6, SNP: 13, Loss: 46.5657, OOB Acc: 75.00%, # of Haplo: 20
     7, SNP: 109, Loss: 31.0312, OOB Acc: 75.00%, # of Haplo: 20
     8, SNP: 176, Loss: 22.5073, OOB Acc: 75.00%, # of Haplo: 21
     9, SNP: 145, Loss: 20.9122, OOB Acc: 75.00%, # of Haplo: 21
    10, SNP: 128, Loss: 20.6728, OOB Acc: 75.00%, # of Haplo: 21
    11, SNP: 73, Loss: 14.6217, OOB Acc: 75.00%, # of Haplo: 22
    12, SNP: 151, Loss: 10.2879, OOB Acc: 75.00%, # of Haplo: 23
    13, SNP: 199, Loss: 8.74645, OOB Acc: 75.00%, # of Haplo: 23
[4] 2021-01-21 13:25:28, OOB Acc: 75.00%, # of SNPs: 13, # of Haplo: 23
Calculating matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470 
        Max.         Mean           SD 
0.5261716555 0.0476729895 0.1180875414 
Accuracy with training data: 97.06%
Out-of-bag accuracy: 81.61%
Gene: A
Training dataset: 34 samples X 264 SNPs
    # of HLA alleles: 14
    # of individual classifiers: 4
    total # of SNPs used: 38
    avg. # of SNPs in an individual classifier: 12.25
        (sd: 0.96, min: 11, max: 13, median: 12.50)
    avg. # of haplotypes in an individual classifier: 27.00
        (sd: 14.63, min: 14, max: 48, median: 23.00)
    avg. out-of-bag accuracy: 81.61%
        (sd: 6.49%, min: 75.00%, max: 88.46%, median: 81.49%)
Matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470 
        Max.         Mean           SD 
0.5261716555 0.0476729895 0.1180875414 
Genome assembly: hg19
HIBAG model:
    4 individual classifiers
    264 SNPs
    14 unique HLA alleles
Prediction:
    based on the averaged posterior probabilities
Model assembly: hg19, SNP assembly: hg19
No allelic strand orders are switched.
# of samples: 26
CPU flags: 64-bit, AVX2
# of threads: 1
Predicting (2021-01-21 13:25:28)	0%
Predicting (2021-01-21 13:25:28)	100%
$overall
  total.num.ind crt.num.ind crt.num.haplo   acc.ind acc.haplo call.threshold
1            26          23            49 0.8846154 0.9423077              0
  n.call call.rate
1     26         1

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01    12     1     0     0     0     0     0     0     0     0     0     0
  02:01     0    21     0     0     0     0     0     0     0     0     0     0
  02:06     0     0     0     0     0     0     0     0     0     0     0     0
  03:01     0     0     0     5     0     0     0     0     0     0     0     0
  11:01     0     0     0     0     2     0     0     0     0     0     0     0
  23:01     0     0     0     0     0     1     0     0     0     0     0     0
  24:02     0     0     0     0     0     1     3     0     0     0     0     0
  24:03     0     0     0     0     0     0     0     0     0     0     0     0
  25:01     0     0     0     0     0     0     0     0     1     0     0     0
  26:01     0     0     0     0     0     0     0     0     0     0     0     0
  29:02     0     0     0     0     0     0     0     0     0     0     1     0
  31:01     0     0     0     0     0     0     0     0     0     1     0     1
  32:01     0     0     0     0     0     0     0     0     0     0     0     0
  68:01     0     0     0     0     0     0     0     0     0     0     0     0
  ...       0     0     0     0     0     0     0     0     0     0     0     0
       True
Predict 32:01 68:01
  01:01     0     0
  02:01     0     0
  02:06     0     0
  03:01     0     0
  11:01     0     0
  23:01     0     0
  24:02     0     0
  24:03     0     0
  25:01     0     0
  26:01     0     0
  29:02     0     0
  31:01     0     0
  32:01     1     0
  68:01     0     1
  ...       0     0

$detail
   allele train.num train.freq valid.num valid.freq call.rate  accuracy
1   01:01        13 0.19117647        12 0.23076923         1 0.9807692
2   02:01        21 0.30882353        22 0.42307692         1 0.9807692
3   02:06         1 0.01470588         0 0.00000000         0       NaN
4   03:01         4 0.05882353         5 0.09615385         1 1.0000000
5   11:01         3 0.04411765         2 0.03846154         1 1.0000000
6   23:01         1 0.01470588         2 0.03846154         1 0.9807692
7   24:02         8 0.11764706         3 0.05769231         1 0.9807692
8   24:03         1 0.01470588         0 0.00000000         0       NaN
9   25:01         4 0.05882353         1 0.01923077         1 1.0000000
10  26:01         2 0.02941176         1 0.01923077         1 0.9807692
11  29:02         3 0.04411765         1 0.01923077         1 1.0000000
12  31:01         2 0.02941176         1 0.01923077         1 0.9807692
13  32:01         3 0.04411765         1 0.01923077         1 1.0000000
14  68:01         2 0.02941176         1 0.01923077         1 1.0000000
   sensitivity specificity       ppv       npv miscall miscall.prop
1    1.0000000   0.9750000 0.9230769 1.0000000    <NA>          NaN
2    0.9545455   1.0000000 1.0000000 0.9677419   01:01            1
3          NaN         NaN       NaN       NaN    <NA>          NaN
4    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
5    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
6    0.5000000   1.0000000 1.0000000 0.9803922   24:02            1
7    1.0000000   0.9795918 0.7500000 1.0000000    <NA>          NaN
8          NaN         NaN       NaN       NaN    <NA>          NaN
9    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
10   0.0000000   1.0000000       NaN 0.9807692   31:01            1
11   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
12   1.0000000   0.9803922 0.5000000 1.0000000    <NA>          NaN
13   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
14   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN

$overall
  total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1            26          21            42       1         1            0.5
  n.call call.rate
1     21 0.8076923

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01    12     0     0     0     0     0     0     0     0     0     0     0
  02:01     0    18     0     0     0     0     0     0     0     0     0     0
  02:06     0     0     0     0     0     0     0     0     0     0     0     0
  03:01     0     0     0     4     0     0     0     0     0     0     0     0
  11:01     0     0     0     0     2     0     0     0     0     0     0     0
  23:01     0     0     0     0     0     0     0     0     0     0     0     0
  24:02     0     0     0     0     0     0     2     0     0     0     0     0
  24:03     0     0     0     0     0     0     0     0     0     0     0     0
  25:01     0     0     0     0     0     0     0     0     0     0     0     0
  26:01     0     0     0     0     0     0     0     0     0     0     0     0
  29:02     0     0     0     0     0     0     0     0     0     0     1     0
  31:01     0     0     0     0     0     0     0     0     0     0     0     1
  32:01     0     0     0     0     0     0     0     0     0     0     0     0
  68:01     0     0     0     0     0     0     0     0     0     0     0     0
  ...       0     0     0     0     0     0     0     0     0     0     0     0
       True
Predict 32:01 68:01
  01:01     0     0
  02:01     0     0
  02:06     0     0
  03:01     0     0
  11:01     0     0
  23:01     0     0
  24:02     0     0
  24:03     0     0
  25:01     0     0
  26:01     0     0
  29:02     0     0
  31:01     0     0
  32:01     1     0
  68:01     0     1
  ...       0     0

$detail
   allele train.num train.freq valid.num valid.freq call.rate accuracy
1   01:01        13 0.19117647        12 0.23076923 1.0000000        1
2   02:01        21 0.30882353        22 0.42307692 0.8181818        1
3   02:06         1 0.01470588         0 0.00000000 0.0000000      NaN
4   03:01         4 0.05882353         5 0.09615385 0.8000000        1
5   11:01         3 0.04411765         2 0.03846154 1.0000000        1
6   23:01         1 0.01470588         2 0.03846154 0.0000000      NaN
7   24:02         8 0.11764706         3 0.05769231 0.6666667        1
8   24:03         1 0.01470588         0 0.00000000 0.0000000      NaN
9   25:01         4 0.05882353         1 0.01923077 0.0000000      NaN
10  26:01         2 0.02941176         1 0.01923077 0.0000000      NaN
11  29:02         3 0.04411765         1 0.01923077 1.0000000        1
12  31:01         2 0.02941176         1 0.01923077 1.0000000        1
13  32:01         3 0.04411765         1 0.01923077 1.0000000        1
14  68:01         2 0.02941176         1 0.01923077 1.0000000        1
   sensitivity specificity ppv npv miscall miscall.prop
1            1           1   1   1    <NA>          NaN
2            1           1   1   1    <NA>          NaN
3          NaN         NaN NaN NaN    <NA>          NaN
4            1           1   1   1    <NA>          NaN
5            1           1   1   1    <NA>          NaN
6          NaN         NaN NaN NaN    <NA>          NaN
7            1           1   1   1    <NA>          NaN
8          NaN         NaN NaN NaN    <NA>          NaN
9          NaN         NaN NaN NaN    <NA>          NaN
10         NaN         NaN NaN NaN    <NA>          NaN
11           1           1   1   1    <NA>          NaN
12           1           1   1   1    <NA>          NaN
13           1           1   1   1    <NA>          NaN
14           1           1   1   1    <NA>          NaN