View source: R/DataUtilities.R
Evaluates the overall accuracy, sensitivity, specificity, positive predictive value and negative predictive value of predicted HLA types against the true HLA types.
TrueHLA |
an object of |
PredHLA |
an object of |
allele.limit |
a list of HLA alleles, the validation samples are
limited to those having HLA alleles in |
call.threshold |
the call threshold for posterior probability, i.e.,
call or no call is determined by whether |
match.threshold |
the matching threshold for SNP haplotype similarity, e.g., use the 1% quantile of the matching statistics of a training model |
max.resolution |
"2-digit", "4-digit", "6-digit", "8-digit", "allele", "protein", "2", "4", "6", "8", "full" or "": "allele" = "2-digit", "protein" = "4-digit", "full" and "" indicating no limit on resolution |
output.individual |
if TRUE, output accuracy for each individual |
verbose |
if TRUE, show information |
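To make the max.resolution argument concrete: in standard HLA nomenclature, 2-digit resolution keeps the first colon-separated field of an allele name and 4-digit resolution keeps the first two. The sketch below illustrates that trimming in base R. The helper trim_allele is hypothetical and is not part of HIBAG; how the package performs the trimming internally is not shown in this page.

```r
# Hypothetical helper (not a HIBAG function): keep the first `n.field`
# colon-separated fields of each allele name.
trim_allele <- function(allele, n.field) {
  vapply(strsplit(allele, ":", fixed = TRUE), function(f) {
    paste(f[seq_len(min(n.field, length(f)))], collapse = ":")
  }, character(1))
}

alleles <- c("01:01:01", "02:01", "24:02:01:01")
res4 <- trim_allele(alleles, 2)  # "4-digit" / "protein" resolution
res2 <- trim_allele(alleles, 1)  # "2-digit" / "allele" resolution
print(res4)  # "01:01" "02:01" "24:02"
print(res2)  # "01" "02" "24"
```

Comparing at a lower resolution treats alleles that differ only in later fields as identical, so accuracy can only stay the same or increase as the resolution is reduced.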
Returns a list(overall, confusion, detail), or list(overall, confusion, detail, individual) if output.individual=TRUE.
overall (data.frame):
total.num.ind |
the total number of individuals |
crt.num.ind |
the number of individuals with correct HLA types |
crt.num.haplo |
the number of chromosomes with correct HLA alleles |
acc.ind |
the proportion of individuals with correctly predicted HLA types (i.e., both alleles must be correct, so the accuracy of an individual is 0 or 1) |
acc.haplo |
the proportion of chromosomes with correctly predicted HLA alleles (i.e., the accuracy of an individual is 0, 0.5 or 1, since an individual has two alleles) |
call.threshold |
call threshold, if it is |
n.call |
the number of individuals with a call |
call.rate |
overall call rate |
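The fields of overall can be illustrated with a small base-R sketch on made-up data. Two points are assumptions here, since this page does not state them explicitly: an individual counts as called when its posterior probability is at least call.threshold, and the two alleles of a genotype are compared without regard to order. The accuracies are computed over called individuals only, which is consistent with the example output for call.threshold=0.5 further down.

```r
# Toy data, made up for illustration: one entry per individual.
true1 <- c("01:01", "02:01", "03:01", "24:02")
true2 <- c("02:01", "02:01", "11:01", "23:01")
pred1 <- c("01:01", "02:01", "03:01", "24:02")
pred2 <- c("02:01", "01:01", "11:01", "24:02")
prob  <- c(0.95, 0.40, 0.80, 0.60)  # posterior probability of each prediction
call.threshold <- 0.5

# Number of correctly predicted alleles per individual (0, 1 or 2),
# ignoring the order of the two alleles within a genotype.
n_correct <- function(t1, t2, p1, p2) {
  max((t1 == p1) + (t2 == p2), (t1 == p2) + (t2 == p1))
}
correct <- mapply(n_correct, true1, true2, pred1, pred2, USE.NAMES = FALSE)

# Assumption: "call" means prob >= call.threshold.
called <- prob >= call.threshold
k <- correct[called]

overall <- data.frame(
  total.num.ind  = length(correct),
  crt.num.ind    = sum(k == 2),
  crt.num.haplo  = sum(k),
  acc.ind        = mean(k == 2),
  acc.haplo      = sum(k) / (2 * length(k)),
  call.threshold = call.threshold,
  n.call         = sum(called),
  call.rate      = mean(called)
)
print(overall)
```

With these toy values three of the four individuals are called, two of them have both alleles correct, and five of their six alleles are correct, so acc.ind is 2/3 and acc.haplo is 5/6.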
confusion (matrix): a confusion matrix.
detail (data.frame):
allele |
HLA alleles |
train.num |
the number of training haplotypes |
train.freq |
the training haplotype frequencies |
valid.num |
the number of validation haplotypes |
valid.freq |
the validation haplotype frequencies |
call.rate |
the call rates for HLA alleles |
accuracy |
allele accuracy |
sensitivity |
sensitivity |
specificity |
specificity |
ppv |
positive predictive value |
npv |
negative predictive value |
miscall |
the most likely miscalled alleles |
miscall.prop |
the proportion of the most likely miscalled allele among all miscalled alleles |
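The per-allele statistics in detail can be sketched from a confusion matrix using the usual one-versus-rest definitions at the haplotype level. This is a minimal illustration on a made-up 3 x 3 matrix (rows are predictions, columns are true alleles), not the package's own code; the variable names are chosen for the example. The same formulas reproduce the values printed for alleles 01:01 and 02:01 in the first example output below.

```r
# Toy confusion matrix (made up): rows = predicted, columns = true,
# entries are haplotype counts.
alleles <- c("01:01", "02:01", "03:01")
confusion <- matrix(c(12,  1, 0,
                       0, 21, 0,
                       0,  0, 5),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(Predict = alleles, True = alleles))

tp <- unname(diag(confusion))            # predicted a, truly a
fp <- unname(rowSums(confusion)) - tp    # predicted a, truly something else
fn <- unname(colSums(confusion)) - tp    # truly a, predicted something else
tn <- sum(confusion) - tp - fp - fn

# For each true allele, the wrong prediction that occurs most often.
miscall <- vapply(seq_along(alleles), function(j) {
  wrong <- confusion[, j]
  wrong[j] <- 0
  if (sum(wrong) > 0) alleles[which.max(wrong)] else NA_character_
}, character(1))

detail <- data.frame(
  allele      = alleles,
  accuracy    = (tp + tn) / sum(confusion),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  ppv         = tp / (tp + fp),
  npv         = tn / (tn + fn),
  miscall     = miscall
)
print(detail)
```

In R, 0/0 evaluates to NaN, which is why alleles with no validation haplotypes or no predictions show NaN for some of these statistics.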
individual (data.frame):
sample.id |
sample id |
true.hla |
the true HLA type |
pred.hla |
the prediction of HLA type |
accuracy |
accuracy, 0, 0.5, or 1 |
Xiuwen Zheng
hlaAttrBagging, predict.hlaAttrBagClass, hlaReport
# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training" "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500 # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid) # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
        HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
        HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

# validation
pred <- hlaPredict(model, test.geno)

# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))
HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.5 (64-bit, AVX2)
[1] "training" "validation"
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 34
# of unique HLA alleles: 14
# of unique HLA genotypes: 23
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 26
# of unique HLA alleles: 12
# of unique HLA genotypes: 14
[1] 275
Exclude 11 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
# of SNPs randomly sampled as candidates for each selection: 17
# of SNPs: 264
# of samples: 34
# of unique HLA alleles: 14
CPU flags: 64-bit, AVX2
# of threads: 1
[-] 2021-01-21 13:25:28
=== building individual classifier 1, out-of-bag (11/32.4%) ===
1, SNP: 211, Loss: 196.4, OOB Acc: 54.55%, # of Haplo: 13
2, SNP: 66, Loss: 173.548, OOB Acc: 63.64%, # of Haplo: 13
3, SNP: 177, Loss: 136.352, OOB Acc: 68.18%, # of Haplo: 13
4, SNP: 108, Loss: 95.8359, OOB Acc: 72.73%, # of Haplo: 13
5, SNP: 127, Loss: 67.3216, OOB Acc: 77.27%, # of Haplo: 13
6, SNP: 95, Loss: 47.5888, OOB Acc: 77.27%, # of Haplo: 13
7, SNP: 33, Loss: 37.2631, OOB Acc: 77.27%, # of Haplo: 16
8, SNP: 6, Loss: 29.7419, OOB Acc: 77.27%, # of Haplo: 18
9, SNP: 208, Loss: 25.6913, OOB Acc: 77.27%, # of Haplo: 19
10, SNP: 225, Loss: 25.3087, OOB Acc: 77.27%, # of Haplo: 21
11, SNP: 11, Loss: 24.8356, OOB Acc: 77.27%, # of Haplo: 23
12, SNP: 151, Loss: 19.4134, OOB Acc: 77.27%, # of Haplo: 23
13, SNP: 199, Loss: 17.011, OOB Acc: 77.27%, # of Haplo: 23
[1] 2021-01-21 13:25:28, OOB Acc: 77.27%, # of SNPs: 13, # of Haplo: 23
=== building individual classifier 2, out-of-bag (13/38.2%) ===
1, SNP: 160, Loss: 221.236, OOB Acc: 76.92%, # of Haplo: 17
2, SNP: 145, Loss: 173.538, OOB Acc: 80.77%, # of Haplo: 23
3, SNP: 177, Loss: 128.58, OOB Acc: 84.62%, # of Haplo: 31
4, SNP: 111, Loss: 79.6877, OOB Acc: 84.62%, # of Haplo: 31
5, SNP: 207, Loss: 52.5557, OOB Acc: 88.46%, # of Haplo: 32
6, SNP: 245, Loss: 41.8731, OOB Acc: 88.46%, # of Haplo: 34
7, SNP: 230, Loss: 31.7937, OOB Acc: 88.46%, # of Haplo: 38
8, SNP: 151, Loss: 20.4566, OOB Acc: 88.46%, # of Haplo: 36
9, SNP: 14, Loss: 19.5805, OOB Acc: 88.46%, # of Haplo: 42
10, SNP: 132, Loss: 19.5101, OOB Acc: 88.46%, # of Haplo: 42
11, SNP: 221, Loss: 19.485, OOB Acc: 88.46%, # of Haplo: 44
12, SNP: 251, Loss: 18.5695, OOB Acc: 88.46%, # of Haplo: 48
[2] 2021-01-21 13:25:28, OOB Acc: 88.46%, # of SNPs: 12, # of Haplo: 48
=== building individual classifier 3, out-of-bag (14/41.2%) ===
1, SNP: 191, Loss: 193.067, OOB Acc: 57.14%, # of Haplo: 11
2, SNP: 264, Loss: 150.427, OOB Acc: 64.29%, # of Haplo: 12
3, SNP: 132, Loss: 93.4067, OOB Acc: 67.86%, # of Haplo: 12
4, SNP: 128, Loss: 39.8353, OOB Acc: 71.43%, # of Haplo: 12
5, SNP: 160, Loss: 28.2998, OOB Acc: 75.00%, # of Haplo: 12
6, SNP: 144, Loss: 13.635, OOB Acc: 75.00%, # of Haplo: 12
7, SNP: 111, Loss: 6.04609, OOB Acc: 75.00%, # of Haplo: 12
8, SNP: 40, Loss: 6.04583, OOB Acc: 82.14%, # of Haplo: 14
9, SNP: 141, Loss: 6.04583, OOB Acc: 85.71%, # of Haplo: 14
10, SNP: 73, Loss: 2.9038, OOB Acc: 85.71%, # of Haplo: 14
11, SNP: 199, Loss: 2.20025, OOB Acc: 85.71%, # of Haplo: 14
[3] 2021-01-21 13:25:28, OOB Acc: 85.71%, # of SNPs: 11, # of Haplo: 14
=== building individual classifier 4, out-of-bag (10/29.4%) ===
1, SNP: 147, Loss: 158.631, OOB Acc: 50.00%, # of Haplo: 12
2, SNP: 152, Loss: 140.375, OOB Acc: 55.00%, # of Haplo: 13
3, SNP: 78, Loss: 115.887, OOB Acc: 60.00%, # of Haplo: 16
4, SNP: 115, Loss: 77.8082, OOB Acc: 60.00%, # of Haplo: 18
5, SNP: 148, Loss: 62.6831, OOB Acc: 65.00%, # of Haplo: 18
6, SNP: 13, Loss: 46.5657, OOB Acc: 75.00%, # of Haplo: 20
7, SNP: 109, Loss: 31.0312, OOB Acc: 75.00%, # of Haplo: 20
8, SNP: 176, Loss: 22.5073, OOB Acc: 75.00%, # of Haplo: 21
9, SNP: 145, Loss: 20.9122, OOB Acc: 75.00%, # of Haplo: 21
10, SNP: 128, Loss: 20.6728, OOB Acc: 75.00%, # of Haplo: 21
11, SNP: 73, Loss: 14.6217, OOB Acc: 75.00%, # of Haplo: 22
12, SNP: 151, Loss: 10.2879, OOB Acc: 75.00%, # of Haplo: 23
13, SNP: 199, Loss: 8.74645, OOB Acc: 75.00%, # of Haplo: 23
[4] 2021-01-21 13:25:28, OOB Acc: 75.00%, # of SNPs: 13, # of Haplo: 23
Calculating matching proportion:
Min. 0.1% Qu. 1% Qu. 1st Qu. Median 3rd Qu.
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470
Max. Mean SD
0.5261716555 0.0476729895 0.1180875414
Accuracy with training data: 97.06%
Out-of-bag accuracy: 81.61%
Gene: A
Training dataset: 34 samples X 264 SNPs
# of HLA alleles: 14
# of individual classifiers: 4
total # of SNPs used: 38
avg. # of SNPs in an individual classifier: 12.25
(sd: 0.96, min: 11, max: 13, median: 12.50)
avg. # of haplotypes in an individual classifier: 27.00
(sd: 14.63, min: 14, max: 48, median: 23.00)
avg. out-of-bag accuracy: 81.61%
(sd: 6.49%, min: 75.00%, max: 88.46%, median: 81.49%)
Matching proportion:
Min. 0.1% Qu. 1% Qu. 1st Qu. Median 3rd Qu.
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470
Max. Mean SD
0.5261716555 0.0476729895 0.1180875414
Genome assembly: hg19
HIBAG model:
4 individual classifiers
264 SNPs
14 unique HLA alleles
Prediction:
based on the averaged posterior probabilities
Model assembly: hg19, SNP assembly: hg19
No allelic strand orders are switched.
# of samples: 26
CPU flags: 64-bit, AVX2
# of threads: 1
Predicting (2021-01-21 13:25:28) 0%
Predicting (2021-01-21 13:25:28) 100%
$overall
total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1 26 23 49 0.8846154 0.9423077 0
n.call call.rate
1 26 1
$confusion
True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
01:01 12 1 0 0 0 0 0 0 0 0 0 0
02:01 0 21 0 0 0 0 0 0 0 0 0 0
02:06 0 0 0 0 0 0 0 0 0 0 0 0
03:01 0 0 0 5 0 0 0 0 0 0 0 0
11:01 0 0 0 0 2 0 0 0 0 0 0 0
23:01 0 0 0 0 0 1 0 0 0 0 0 0
24:02 0 0 0 0 0 1 3 0 0 0 0 0
24:03 0 0 0 0 0 0 0 0 0 0 0 0
25:01 0 0 0 0 0 0 0 0 1 0 0 0
26:01 0 0 0 0 0 0 0 0 0 0 0 0
29:02 0 0 0 0 0 0 0 0 0 0 1 0
31:01 0 0 0 0 0 0 0 0 0 1 0 1
32:01 0 0 0 0 0 0 0 0 0 0 0 0
68:01 0 0 0 0 0 0 0 0 0 0 0 0
... 0 0 0 0 0 0 0 0 0 0 0 0
True
Predict 32:01 68:01
01:01 0 0
02:01 0 0
02:06 0 0
03:01 0 0
11:01 0 0
23:01 0 0
24:02 0 0
24:03 0 0
25:01 0 0
26:01 0 0
29:02 0 0
31:01 0 0
32:01 1 0
68:01 0 1
... 0 0
$detail
allele train.num train.freq valid.num valid.freq call.rate accuracy
1 01:01 13 0.19117647 12 0.23076923 1 0.9807692
2 02:01 21 0.30882353 22 0.42307692 1 0.9807692
3 02:06 1 0.01470588 0 0.00000000 0 NaN
4 03:01 4 0.05882353 5 0.09615385 1 1.0000000
5 11:01 3 0.04411765 2 0.03846154 1 1.0000000
6 23:01 1 0.01470588 2 0.03846154 1 0.9807692
7 24:02 8 0.11764706 3 0.05769231 1 0.9807692
8 24:03 1 0.01470588 0 0.00000000 0 NaN
9 25:01 4 0.05882353 1 0.01923077 1 1.0000000
10 26:01 2 0.02941176 1 0.01923077 1 0.9807692
11 29:02 3 0.04411765 1 0.01923077 1 1.0000000
12 31:01 2 0.02941176 1 0.01923077 1 0.9807692
13 32:01 3 0.04411765 1 0.01923077 1 1.0000000
14 68:01 2 0.02941176 1 0.01923077 1 1.0000000
sensitivity specificity ppv npv miscall miscall.prop
1 1.0000000 0.9750000 0.9230769 1.0000000 <NA> NaN
2 0.9545455 1.0000000 1.0000000 0.9677419 01:01 1
3 NaN NaN NaN NaN <NA> NaN
4 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
5 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
6 0.5000000 1.0000000 1.0000000 0.9803922 24:02 1
7 1.0000000 0.9795918 0.7500000 1.0000000 <NA> NaN
8 NaN NaN NaN NaN <NA> NaN
9 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
10 0.0000000 1.0000000 NaN 0.9807692 31:01 1
11 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
12 1.0000000 0.9803922 0.5000000 1.0000000 <NA> NaN
13 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
14 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
$overall
total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1 26 21 42 1 1 0.5
n.call call.rate
1 21 0.8076923
$confusion
True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
01:01 12 0 0 0 0 0 0 0 0 0 0 0
02:01 0 18 0 0 0 0 0 0 0 0 0 0
02:06 0 0 0 0 0 0 0 0 0 0 0 0
03:01 0 0 0 4 0 0 0 0 0 0 0 0
11:01 0 0 0 0 2 0 0 0 0 0 0 0
23:01 0 0 0 0 0 0 0 0 0 0 0 0
24:02 0 0 0 0 0 0 2 0 0 0 0 0
24:03 0 0 0 0 0 0 0 0 0 0 0 0
25:01 0 0 0 0 0 0 0 0 0 0 0 0
26:01 0 0 0 0 0 0 0 0 0 0 0 0
29:02 0 0 0 0 0 0 0 0 0 0 1 0
31:01 0 0 0 0 0 0 0 0 0 0 0 1
32:01 0 0 0 0 0 0 0 0 0 0 0 0
68:01 0 0 0 0 0 0 0 0 0 0 0 0
... 0 0 0 0 0 0 0 0 0 0 0 0
True
Predict 32:01 68:01
01:01 0 0
02:01 0 0
02:06 0 0
03:01 0 0
11:01 0 0
23:01 0 0
24:02 0 0
24:03 0 0
25:01 0 0
26:01 0 0
29:02 0 0
31:01 0 0
32:01 1 0
68:01 0 1
... 0 0
$detail
allele train.num train.freq valid.num valid.freq call.rate accuracy
1 01:01 13 0.19117647 12 0.23076923 1.0000000 1
2 02:01 21 0.30882353 22 0.42307692 0.8181818 1
3 02:06 1 0.01470588 0 0.00000000 0.0000000 NaN
4 03:01 4 0.05882353 5 0.09615385 0.8000000 1
5 11:01 3 0.04411765 2 0.03846154 1.0000000 1
6 23:01 1 0.01470588 2 0.03846154 0.0000000 NaN
7 24:02 8 0.11764706 3 0.05769231 0.6666667 1
8 24:03 1 0.01470588 0 0.00000000 0.0000000 NaN
9 25:01 4 0.05882353 1 0.01923077 0.0000000 NaN
10 26:01 2 0.02941176 1 0.01923077 0.0000000 NaN
11 29:02 3 0.04411765 1 0.01923077 1.0000000 1
12 31:01 2 0.02941176 1 0.01923077 1.0000000 1
13 32:01 3 0.04411765 1 0.01923077 1.0000000 1
14 68:01 2 0.02941176 1 0.01923077 1.0000000 1
sensitivity specificity ppv npv miscall miscall.prop
1 1 1 1 1 <NA> NaN
2 1 1 1 1 <NA> NaN
3 NaN NaN NaN NaN <NA> NaN
4 1 1 1 1 <NA> NaN
5 1 1 1 1 <NA> NaN
6 NaN NaN NaN NaN <NA> NaN
7 1 1 1 1 <NA> NaN
8 NaN NaN NaN NaN <NA> NaN
9 NaN NaN NaN NaN <NA> NaN
10 NaN NaN NaN NaN <NA> NaN
11 1 1 1 1 <NA> NaN
12 1 1 1 1 <NA> NaN
13 1 1 1 1 <NA> NaN
14 1 1 1 1 <NA> NaN