hlaCompareAllele: Evaluate prediction accuracies


View source: R/DataUtilities.R

Description

Evaluates the overall accuracy, sensitivity, specificity, positive predictive value and negative predictive value of predicted HLA types against the true HLA types.

Usage

hlaCompareAllele(TrueHLA, PredHLA, allele.limit=NULL, call.threshold=NaN,
    match.threshold=NaN, max.resolution="", output.individual=FALSE,
    verbose=TRUE)

Arguments

TrueHLA

an object of hlaAlleleClass, the true HLA types

PredHLA

an object of hlaAlleleClass, the predicted HLA types

allele.limit

a list of HLA alleles; the validation samples are limited to those having HLA alleles in allele.limit, or NULL for no limit. allele.limit can be a character vector or an object of hlaAttrBagClass or hlaAttrBagObj

call.threshold

the call threshold for posterior probability, i.e., call or no call is determined by whether prob >= call.threshold or not

match.threshold

the matching threshold for SNP haplotype similarity, e.g., use the 1% quantile of the matching statistics of a training model

max.resolution

"2-digit", "4-digit", "6-digit", "8-digit", "allele", "protein", "2", "4", "6", "8", "full" or "": "allele" = "2-digit", "protein" = "4-digit", "full" and "" indicating no limit on resolution

output.individual

if TRUE, output accuracy for each individual

verbose

if TRUE, show information
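
The effect of call.threshold can be illustrated with a small base-R sketch. It is not the package's implementation: the probabilities are made up and the helper apply_call_threshold is a hypothetical name. It only mirrors the rule stated above, that a sample is called when prob >= call.threshold, and assumes NaN means no threshold is applied.

```r
# Hypothetical posterior probabilities for five predicted samples
# (illustrative values only, not taken from the HIBAG example data).
prob <- c(0.95, 0.40, 0.72, 0.10, 0.55)

# A sample is "called" when prob >= call.threshold.
# A NaN threshold is treated here as "no threshold": every sample is called.
apply_call_threshold <- function(prob, call.threshold = NaN) {
    if (is.nan(call.threshold)) {
        called <- rep(TRUE, length(prob))
    } else {
        called <- prob >= call.threshold
    }
    list(called = called, n.call = sum(called), call.rate = mean(called))
}

res <- apply_call_threshold(prob, call.threshold = 0.5)
res$n.call     # 3
res$call.rate  # 0.6
```

Raising the threshold trades call rate for accuracy among the called samples, which is the pattern visible in the example output further down (call.threshold=0 versus call.threshold=0.5).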

Value

Return a list(overall, confusion, detail), or list(overall, confusion, detail, individual) if output.individual=TRUE.

overall (data.frame):

total.num.ind

the total number of individuals

crt.num.ind

the number of individuals with correct HLA types

crt.num.haplo

the number of chromosomes with correct HLA alleles

acc.ind

the proportion of individuals with correctly predicted HLA types (i.e., both alleles must be correct, so the accuracy of an individual is 0 or 1)

acc.haplo

the proportion of chromosomes with correctly predicted HLA alleles (i.e., the accuracy of an individual is 0, 0.5 or 1, since an individual has two alleles)

call.threshold

the call threshold; if it is NaN, no call threshold is applied

n.call

the number of individuals with call

call.rate

overall call rate
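
The difference between acc.ind and acc.haplo can be seen in a minimal base-R sketch. The allele pairs below are invented, and the helper n_correct is a hypothetical name, not a HIBAG function. It assumes that allele order within an individual does not matter when counting matches.

```r
# Hypothetical true and predicted allele pairs for four individuals
true.hla <- data.frame(H1 = c("01:01", "02:01", "03:01", "24:02"),
                       H2 = c("02:01", "02:01", "11:01", "23:01"),
                       stringsAsFactors = FALSE)
pred.hla <- data.frame(H1 = c("02:01", "02:01", "03:01", "24:02"),
                       H2 = c("01:01", "01:01", "11:01", "24:02"),
                       stringsAsFactors = FALSE)

# Number of correctly predicted alleles per individual (0, 1 or 2),
# taking the better of the two possible allele pairings.
n_correct <- function(t1, t2, p1, p2) {
    direct  <- (t1 == p1) + (t2 == p2)
    swapped <- (t1 == p2) + (t2 == p1)
    pmax(direct, swapped)
}
crt <- n_correct(true.hla$H1, true.hla$H2, pred.hla$H1, pred.hla$H2)

crt.num.ind   <- sum(crt == 2)                   # individuals fully correct
crt.num.haplo <- sum(crt)                        # chromosomes correct
acc.ind       <- crt.num.ind / nrow(true.hla)    # 0.5
acc.haplo     <- crt.num.haplo / (2 * nrow(true.hla))  # 0.75
```

The same arithmetic reproduces the first $overall table in the example output: 23/26 = 0.8846154 for acc.ind and 49/52 = 0.9423077 for acc.haplo.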

confusion (matrix): a confusion matrix.

detail (data.frame):

allele

HLA alleles

train.num

the number of training haplotypes

train.freq

the training haplotype frequencies

valid.num

the number of validation haplotypes

valid.freq

the validation haplotype frequencies

call.rate

the call rates for HLA alleles

accuracy

allele accuracy

sensitivity

sensitivity: the proportion of validation haplotypes carrying the allele that are predicted as that allele

specificity

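specificity: the proportion of validation haplotypes not carrying the allele that are not predicted as that allele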

ppv

positive predictive value

npv

negative predictive value

miscall

the most likely miscalled allele

miscall.prop

the proportion of the most likely miscalled allele among all miscalled alleles
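
As a worked check of the per-allele columns, the following base-R sketch recomputes the statistics for allele 01:01 from the first confusion matrix in the example output (call.threshold=0). It uses the standard one-versus-rest definitions. That HIBAG computes them exactly this way is an assumption, although the numbers agree with the first row of the corresponding $detail table.

```r
# Counts for allele 01:01 read off the confusion matrix in the example output
# (call.threshold = 0); there are 52 validation haplotypes in total.
n  <- 52
tp <- 12                 # predicted 01:01, truly 01:01
fp <- 1                  # predicted 01:01, truly 02:01
fn <- 0                  # truly 01:01, predicted as another allele
tn <- n - tp - fp - fn   # 39 haplotypes neither true nor predicted 01:01

stats <- c(
    accuracy    = (tp + tn) / n,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
round(stats, 7)
```

When a denominator is zero, for instance for an allele that never occurs in the validation set, the same formulas give NaN, consistent with the NaN entries in the $detail tables.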

individual (data.frame):

sample.id

sample id

true.hla

the true HLA type

pred.hla

the prediction of HLA type

accuracy

the accuracy of the individual: 0, 0.5 or 1

Author(s)

Xiuwen Zheng

See Also

hlaAttrBagging, predict.hlaAttrBagClass, hlaReport

Examples

# load the HIBAG package, which provides HLA_Type_Table and HapMap_CEU_Geno
library(HIBAG)

# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training"   "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
    HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
    HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

# validation
pred <- hlaPredict(model, test.geno)
# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))

Example output

HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.5 (64-bit, AVX2)
[1] "training"   "validation"
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 34
# of unique HLA alleles: 14
# of unique HLA genotypes: 23
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 26
# of unique HLA alleles: 12
# of unique HLA genotypes: 14
[1] 275
Exclude 11 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
    # of SNPs randomly sampled as candidates for each selection: 17
    # of SNPs: 264
    # of samples: 34
    # of unique HLA alleles: 14
CPU flags: 64-bit, AVX2
# of threads: 1
[-] 2021-01-21 13:25:28
=== building individual classifier 1, out-of-bag (11/32.4%) ===
     1, SNP: 211, Loss: 196.4, OOB Acc: 54.55%, # of Haplo: 13
     2, SNP: 66, Loss: 173.548, OOB Acc: 63.64%, # of Haplo: 13
     3, SNP: 177, Loss: 136.352, OOB Acc: 68.18%, # of Haplo: 13
     4, SNP: 108, Loss: 95.8359, OOB Acc: 72.73%, # of Haplo: 13
     5, SNP: 127, Loss: 67.3216, OOB Acc: 77.27%, # of Haplo: 13
     6, SNP: 95, Loss: 47.5888, OOB Acc: 77.27%, # of Haplo: 13
     7, SNP: 33, Loss: 37.2631, OOB Acc: 77.27%, # of Haplo: 16
     8, SNP: 6, Loss: 29.7419, OOB Acc: 77.27%, # of Haplo: 18
     9, SNP: 208, Loss: 25.6913, OOB Acc: 77.27%, # of Haplo: 19
    10, SNP: 225, Loss: 25.3087, OOB Acc: 77.27%, # of Haplo: 21
    11, SNP: 11, Loss: 24.8356, OOB Acc: 77.27%, # of Haplo: 23
    12, SNP: 151, Loss: 19.4134, OOB Acc: 77.27%, # of Haplo: 23
    13, SNP: 199, Loss: 17.011, OOB Acc: 77.27%, # of Haplo: 23
[1] 2021-01-21 13:25:28, OOB Acc: 77.27%, # of SNPs: 13, # of Haplo: 23
=== building individual classifier 2, out-of-bag (13/38.2%) ===
     1, SNP: 160, Loss: 221.236, OOB Acc: 76.92%, # of Haplo: 17
     2, SNP: 145, Loss: 173.538, OOB Acc: 80.77%, # of Haplo: 23
     3, SNP: 177, Loss: 128.58, OOB Acc: 84.62%, # of Haplo: 31
     4, SNP: 111, Loss: 79.6877, OOB Acc: 84.62%, # of Haplo: 31
     5, SNP: 207, Loss: 52.5557, OOB Acc: 88.46%, # of Haplo: 32
     6, SNP: 245, Loss: 41.8731, OOB Acc: 88.46%, # of Haplo: 34
     7, SNP: 230, Loss: 31.7937, OOB Acc: 88.46%, # of Haplo: 38
     8, SNP: 151, Loss: 20.4566, OOB Acc: 88.46%, # of Haplo: 36
     9, SNP: 14, Loss: 19.5805, OOB Acc: 88.46%, # of Haplo: 42
    10, SNP: 132, Loss: 19.5101, OOB Acc: 88.46%, # of Haplo: 42
    11, SNP: 221, Loss: 19.485, OOB Acc: 88.46%, # of Haplo: 44
    12, SNP: 251, Loss: 18.5695, OOB Acc: 88.46%, # of Haplo: 48
[2] 2021-01-21 13:25:28, OOB Acc: 88.46%, # of SNPs: 12, # of Haplo: 48
=== building individual classifier 3, out-of-bag (14/41.2%) ===
     1, SNP: 191, Loss: 193.067, OOB Acc: 57.14%, # of Haplo: 11
     2, SNP: 264, Loss: 150.427, OOB Acc: 64.29%, # of Haplo: 12
     3, SNP: 132, Loss: 93.4067, OOB Acc: 67.86%, # of Haplo: 12
     4, SNP: 128, Loss: 39.8353, OOB Acc: 71.43%, # of Haplo: 12
     5, SNP: 160, Loss: 28.2998, OOB Acc: 75.00%, # of Haplo: 12
     6, SNP: 144, Loss: 13.635, OOB Acc: 75.00%, # of Haplo: 12
     7, SNP: 111, Loss: 6.04609, OOB Acc: 75.00%, # of Haplo: 12
     8, SNP: 40, Loss: 6.04583, OOB Acc: 82.14%, # of Haplo: 14
     9, SNP: 141, Loss: 6.04583, OOB Acc: 85.71%, # of Haplo: 14
    10, SNP: 73, Loss: 2.9038, OOB Acc: 85.71%, # of Haplo: 14
    11, SNP: 199, Loss: 2.20025, OOB Acc: 85.71%, # of Haplo: 14
[3] 2021-01-21 13:25:28, OOB Acc: 85.71%, # of SNPs: 11, # of Haplo: 14
=== building individual classifier 4, out-of-bag (10/29.4%) ===
     1, SNP: 147, Loss: 158.631, OOB Acc: 50.00%, # of Haplo: 12
     2, SNP: 152, Loss: 140.375, OOB Acc: 55.00%, # of Haplo: 13
     3, SNP: 78, Loss: 115.887, OOB Acc: 60.00%, # of Haplo: 16
     4, SNP: 115, Loss: 77.8082, OOB Acc: 60.00%, # of Haplo: 18
     5, SNP: 148, Loss: 62.6831, OOB Acc: 65.00%, # of Haplo: 18
     6, SNP: 13, Loss: 46.5657, OOB Acc: 75.00%, # of Haplo: 20
     7, SNP: 109, Loss: 31.0312, OOB Acc: 75.00%, # of Haplo: 20
     8, SNP: 176, Loss: 22.5073, OOB Acc: 75.00%, # of Haplo: 21
     9, SNP: 145, Loss: 20.9122, OOB Acc: 75.00%, # of Haplo: 21
    10, SNP: 128, Loss: 20.6728, OOB Acc: 75.00%, # of Haplo: 21
    11, SNP: 73, Loss: 14.6217, OOB Acc: 75.00%, # of Haplo: 22
    12, SNP: 151, Loss: 10.2879, OOB Acc: 75.00%, # of Haplo: 23
    13, SNP: 199, Loss: 8.74645, OOB Acc: 75.00%, # of Haplo: 23
[4] 2021-01-21 13:25:28, OOB Acc: 75.00%, # of SNPs: 13, # of Haplo: 23
Calculating matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470 
        Max.         Mean           SD 
0.5261716555 0.0476729895 0.1180875414 
Accuracy with training data: 97.06%
Out-of-bag accuracy: 81.61%
Gene: A
Training dataset: 34 samples X 264 SNPs
    # of HLA alleles: 14
    # of individual classifiers: 4
    total # of SNPs used: 38
    avg. # of SNPs in an individual classifier: 12.25
        (sd: 0.96, min: 11, max: 13, median: 12.50)
    avg. # of haplotypes in an individual classifier: 27.00
        (sd: 14.63, min: 14, max: 48, median: 23.00)
    avg. out-of-bag accuracy: 81.61%
        (sd: 6.49%, min: 75.00%, max: 88.46%, median: 81.49%)
Matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470 
        Max.         Mean           SD 
0.5261716555 0.0476729895 0.1180875414 
Genome assembly: hg19
HIBAG model:
    4 individual classifiers
    264 SNPs
    14 unique HLA alleles
Prediction:
    based on the averaged posterior probabilities
Model assembly: hg19, SNP assembly: hg19
No allelic strand orders are switched.
# of samples: 26
CPU flags: 64-bit, AVX2
# of threads: 1
Predicting (2021-01-21 13:25:28)	0%
Predicting (2021-01-21 13:25:28)	100%
$overall
  total.num.ind crt.num.ind crt.num.haplo   acc.ind acc.haplo call.threshold
1            26          23            49 0.8846154 0.9423077              0
  n.call call.rate
1     26         1

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01    12     1     0     0     0     0     0     0     0     0     0     0
  02:01     0    21     0     0     0     0     0     0     0     0     0     0
  02:06     0     0     0     0     0     0     0     0     0     0     0     0
  03:01     0     0     0     5     0     0     0     0     0     0     0     0
  11:01     0     0     0     0     2     0     0     0     0     0     0     0
  23:01     0     0     0     0     0     1     0     0     0     0     0     0
  24:02     0     0     0     0     0     1     3     0     0     0     0     0
  24:03     0     0     0     0     0     0     0     0     0     0     0     0
  25:01     0     0     0     0     0     0     0     0     1     0     0     0
  26:01     0     0     0     0     0     0     0     0     0     0     0     0
  29:02     0     0     0     0     0     0     0     0     0     0     1     0
  31:01     0     0     0     0     0     0     0     0     0     1     0     1
  32:01     0     0     0     0     0     0     0     0     0     0     0     0
  68:01     0     0     0     0     0     0     0     0     0     0     0     0
  ...       0     0     0     0     0     0     0     0     0     0     0     0
       True
Predict 32:01 68:01
  01:01     0     0
  02:01     0     0
  02:06     0     0
  03:01     0     0
  11:01     0     0
  23:01     0     0
  24:02     0     0
  24:03     0     0
  25:01     0     0
  26:01     0     0
  29:02     0     0
  31:01     0     0
  32:01     1     0
  68:01     0     1
  ...       0     0

$detail
   allele train.num train.freq valid.num valid.freq call.rate  accuracy
1   01:01        13 0.19117647        12 0.23076923         1 0.9807692
2   02:01        21 0.30882353        22 0.42307692         1 0.9807692
3   02:06         1 0.01470588         0 0.00000000         0       NaN
4   03:01         4 0.05882353         5 0.09615385         1 1.0000000
5   11:01         3 0.04411765         2 0.03846154         1 1.0000000
6   23:01         1 0.01470588         2 0.03846154         1 0.9807692
7   24:02         8 0.11764706         3 0.05769231         1 0.9807692
8   24:03         1 0.01470588         0 0.00000000         0       NaN
9   25:01         4 0.05882353         1 0.01923077         1 1.0000000
10  26:01         2 0.02941176         1 0.01923077         1 0.9807692
11  29:02         3 0.04411765         1 0.01923077         1 1.0000000
12  31:01         2 0.02941176         1 0.01923077         1 0.9807692
13  32:01         3 0.04411765         1 0.01923077         1 1.0000000
14  68:01         2 0.02941176         1 0.01923077         1 1.0000000
   sensitivity specificity       ppv       npv miscall miscall.prop
1    1.0000000   0.9750000 0.9230769 1.0000000    <NA>          NaN
2    0.9545455   1.0000000 1.0000000 0.9677419   01:01            1
3          NaN         NaN       NaN       NaN    <NA>          NaN
4    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
5    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
6    0.5000000   1.0000000 1.0000000 0.9803922   24:02            1
7    1.0000000   0.9795918 0.7500000 1.0000000    <NA>          NaN
8          NaN         NaN       NaN       NaN    <NA>          NaN
9    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
10   0.0000000   1.0000000       NaN 0.9807692   31:01            1
11   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
12   1.0000000   0.9803922 0.5000000 1.0000000    <NA>          NaN
13   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
14   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN

$overall
  total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1            26          21            42       1         1            0.5
  n.call call.rate
1     21 0.8076923

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01    12     0     0     0     0     0     0     0     0     0     0     0
  02:01     0    18     0     0     0     0     0     0     0     0     0     0
  02:06     0     0     0     0     0     0     0     0     0     0     0     0
  03:01     0     0     0     4     0     0     0     0     0     0     0     0
  11:01     0     0     0     0     2     0     0     0     0     0     0     0
  23:01     0     0     0     0     0     0     0     0     0     0     0     0
  24:02     0     0     0     0     0     0     2     0     0     0     0     0
  24:03     0     0     0     0     0     0     0     0     0     0     0     0
  25:01     0     0     0     0     0     0     0     0     0     0     0     0
  26:01     0     0     0     0     0     0     0     0     0     0     0     0
  29:02     0     0     0     0     0     0     0     0     0     0     1     0
  31:01     0     0     0     0     0     0     0     0     0     0     0     1
  32:01     0     0     0     0     0     0     0     0     0     0     0     0
  68:01     0     0     0     0     0     0     0     0     0     0     0     0
  ...       0     0     0     0     0     0     0     0     0     0     0     0
       True
Predict 32:01 68:01
  01:01     0     0
  02:01     0     0
  02:06     0     0
  03:01     0     0
  11:01     0     0
  23:01     0     0
  24:02     0     0
  24:03     0     0
  25:01     0     0
  26:01     0     0
  29:02     0     0
  31:01     0     0
  32:01     1     0
  68:01     0     1
  ...       0     0

$detail
   allele train.num train.freq valid.num valid.freq call.rate accuracy
1   01:01        13 0.19117647        12 0.23076923 1.0000000        1
2   02:01        21 0.30882353        22 0.42307692 0.8181818        1
3   02:06         1 0.01470588         0 0.00000000 0.0000000      NaN
4   03:01         4 0.05882353         5 0.09615385 0.8000000        1
5   11:01         3 0.04411765         2 0.03846154 1.0000000        1
6   23:01         1 0.01470588         2 0.03846154 0.0000000      NaN
7   24:02         8 0.11764706         3 0.05769231 0.6666667        1
8   24:03         1 0.01470588         0 0.00000000 0.0000000      NaN
9   25:01         4 0.05882353         1 0.01923077 0.0000000      NaN
10  26:01         2 0.02941176         1 0.01923077 0.0000000      NaN
11  29:02         3 0.04411765         1 0.01923077 1.0000000        1
12  31:01         2 0.02941176         1 0.01923077 1.0000000        1
13  32:01         3 0.04411765         1 0.01923077 1.0000000        1
14  68:01         2 0.02941176         1 0.01923077 1.0000000        1
   sensitivity specificity ppv npv miscall miscall.prop
1            1           1   1   1    <NA>          NaN
2            1           1   1   1    <NA>          NaN
3          NaN         NaN NaN NaN    <NA>          NaN
4            1           1   1   1    <NA>          NaN
5            1           1   1   1    <NA>          NaN
6          NaN         NaN NaN NaN    <NA>          NaN
7            1           1   1   1    <NA>          NaN
8          NaN         NaN NaN NaN    <NA>          NaN
9          NaN         NaN NaN NaN    <NA>          NaN
10         NaN         NaN NaN NaN    <NA>          NaN
11           1           1   1   1    <NA>          NaN
12           1           1   1   1    <NA>          NaN
13           1           1   1   1    <NA>          NaN
14           1           1   1   1    <NA>          NaN

HIBAG documentation built on March 24, 2021, 6 p.m.