Description Usage Arguments Details Value Author(s) See Also Examples
To predict HLA type based on a HIBAG model (in parallel).
1 2 3 4 5 6 7 8 9 | hlaPredict(object, snp, cl=FALSE,
type=c("response", "prob", "response+prob"), vote=c("prob", "majority"),
allele.check=TRUE, match.type=c("Position", "RefSNP+Position", "RefSNP"),
same.strand=FALSE, verbose=TRUE)
## S3 method for class 'hlaAttrBagClass'
predict(object, snp, cl=FALSE,
type=c("response", "prob", "response+prob"), vote=c("prob", "majority"),
allele.check=TRUE, match.type=c("Position", "RefSNP+Position", "RefSNP"),
same.strand=FALSE, verbose=TRUE, ...)
|
object |
a model of |
snp |
a genotypic object of |
cl |
|
type |
"response": return the best-guess type plus its posterior probability; "prob": return all posterior probabilities; "response+prob": return the best-guess and all posterior probabilities |
vote |
|
allele.check |
if |
match.type |
|
same.strand |
|
verbose |
if TRUE, show information |
... |
unused |
If more than 50% of SNP predictors are missing, a warning will be given.
When match.type="RefSNP+Position"
, the matching of SNPs requires
both SNP IDs and positions. A lower missing fraction maybe gained by
matching SNP IDs or positions only. Call
hlaPredict(..., match.type="RefSNP")
or
hlaPredict(..., match.type="Position")
for this purpose.
It could be safe to assume that the SNPs with the same positions on the same
genome reference (e.g., hg19) are the same variant albeit the different SNP
IDs. Any concern about SNP mismatching should be emailed to the genotyping
platform provider.
Return a hlaAlleleClass
object with posterior probabilities
of predicted HLA types, or a matrix of pairwise possible HLA types with all
posterior probabilities. If type = "response+prob"
, return a
hlaAlleleClass
object with a matrix of postprob
for the
probabilities of all pairs of alleles.
If a probability matrix is returned, colnames
is sample.id
and rownames
is an unordered pair of HLA alleles.
Xiuwen Zheng
hlaAttrBagging
, hlaAllele
,
hlaCompareAllele
, hlaParallelAttrBagging
,
hlaSetKernelTarget
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | # make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
locus=hla.id, assembly="hg19")
# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training" "validation"
summary(hlatab$training)
summary(hlatab$validation)
# SNP predictors within the flanking region on each side
region <- 500 # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
hla.id, region*1000, assembly="hg19")
length(snpid) # 275
# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
samp.sel=match(hlatab$training$value$sample.id,
HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
samp.sel=match(hlatab$validation$value$sample.id,
HapMap_CEU_Geno$sample.id))
# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
verbose.detail=TRUE)
summary(model)
# validation
pred <- hlaPredict(model, test.geno)
# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
call.threshold=0.5))
|
HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.5 (64-bit, AVX2)
[1] "training" "validation"
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 34
# of unique HLA alleles: 14
# of unique HLA genotypes: 23
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 26
# of unique HLA alleles: 12
# of unique HLA genotypes: 14
[1] 275
Exclude 11 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
# of SNPs randomly sampled as candidates for each selection: 17
# of SNPs: 264
# of samples: 34
# of unique HLA alleles: 14
CPU flags: 64-bit, AVX2
# of threads: 1
[-] 2021-01-21 13:25:28
=== building individual classifier 1, out-of-bag (11/32.4%) ===
1, SNP: 211, Loss: 196.4, OOB Acc: 54.55%, # of Haplo: 13
2, SNP: 66, Loss: 173.548, OOB Acc: 63.64%, # of Haplo: 13
3, SNP: 177, Loss: 136.352, OOB Acc: 68.18%, # of Haplo: 13
4, SNP: 108, Loss: 95.8359, OOB Acc: 72.73%, # of Haplo: 13
5, SNP: 127, Loss: 67.3216, OOB Acc: 77.27%, # of Haplo: 13
6, SNP: 95, Loss: 47.5888, OOB Acc: 77.27%, # of Haplo: 13
7, SNP: 33, Loss: 37.2631, OOB Acc: 77.27%, # of Haplo: 16
8, SNP: 6, Loss: 29.7419, OOB Acc: 77.27%, # of Haplo: 18
9, SNP: 208, Loss: 25.6913, OOB Acc: 77.27%, # of Haplo: 19
10, SNP: 225, Loss: 25.3087, OOB Acc: 77.27%, # of Haplo: 21
11, SNP: 11, Loss: 24.8356, OOB Acc: 77.27%, # of Haplo: 23
12, SNP: 151, Loss: 19.4134, OOB Acc: 77.27%, # of Haplo: 23
13, SNP: 199, Loss: 17.011, OOB Acc: 77.27%, # of Haplo: 23
[1] 2021-01-21 13:25:28, OOB Acc: 77.27%, # of SNPs: 13, # of Haplo: 23
=== building individual classifier 2, out-of-bag (13/38.2%) ===
1, SNP: 160, Loss: 221.236, OOB Acc: 76.92%, # of Haplo: 17
2, SNP: 145, Loss: 173.538, OOB Acc: 80.77%, # of Haplo: 23
3, SNP: 177, Loss: 128.58, OOB Acc: 84.62%, # of Haplo: 31
4, SNP: 111, Loss: 79.6877, OOB Acc: 84.62%, # of Haplo: 31
5, SNP: 207, Loss: 52.5557, OOB Acc: 88.46%, # of Haplo: 32
6, SNP: 245, Loss: 41.8731, OOB Acc: 88.46%, # of Haplo: 34
7, SNP: 230, Loss: 31.7937, OOB Acc: 88.46%, # of Haplo: 38
8, SNP: 151, Loss: 20.4566, OOB Acc: 88.46%, # of Haplo: 36
9, SNP: 14, Loss: 19.5805, OOB Acc: 88.46%, # of Haplo: 42
10, SNP: 132, Loss: 19.5101, OOB Acc: 88.46%, # of Haplo: 42
11, SNP: 221, Loss: 19.485, OOB Acc: 88.46%, # of Haplo: 44
12, SNP: 251, Loss: 18.5695, OOB Acc: 88.46%, # of Haplo: 48
[2] 2021-01-21 13:25:28, OOB Acc: 88.46%, # of SNPs: 12, # of Haplo: 48
=== building individual classifier 3, out-of-bag (14/41.2%) ===
1, SNP: 191, Loss: 193.067, OOB Acc: 57.14%, # of Haplo: 11
2, SNP: 264, Loss: 150.427, OOB Acc: 64.29%, # of Haplo: 12
3, SNP: 132, Loss: 93.4067, OOB Acc: 67.86%, # of Haplo: 12
4, SNP: 128, Loss: 39.8353, OOB Acc: 71.43%, # of Haplo: 12
5, SNP: 160, Loss: 28.2998, OOB Acc: 75.00%, # of Haplo: 12
6, SNP: 144, Loss: 13.635, OOB Acc: 75.00%, # of Haplo: 12
7, SNP: 111, Loss: 6.04609, OOB Acc: 75.00%, # of Haplo: 12
8, SNP: 40, Loss: 6.04583, OOB Acc: 82.14%, # of Haplo: 14
9, SNP: 141, Loss: 6.04583, OOB Acc: 85.71%, # of Haplo: 14
10, SNP: 73, Loss: 2.9038, OOB Acc: 85.71%, # of Haplo: 14
11, SNP: 199, Loss: 2.20025, OOB Acc: 85.71%, # of Haplo: 14
[3] 2021-01-21 13:25:28, OOB Acc: 85.71%, # of SNPs: 11, # of Haplo: 14
=== building individual classifier 4, out-of-bag (10/29.4%) ===
1, SNP: 147, Loss: 158.631, OOB Acc: 50.00%, # of Haplo: 12
2, SNP: 152, Loss: 140.375, OOB Acc: 55.00%, # of Haplo: 13
3, SNP: 78, Loss: 115.887, OOB Acc: 60.00%, # of Haplo: 16
4, SNP: 115, Loss: 77.8082, OOB Acc: 60.00%, # of Haplo: 18
5, SNP: 148, Loss: 62.6831, OOB Acc: 65.00%, # of Haplo: 18
6, SNP: 13, Loss: 46.5657, OOB Acc: 75.00%, # of Haplo: 20
7, SNP: 109, Loss: 31.0312, OOB Acc: 75.00%, # of Haplo: 20
8, SNP: 176, Loss: 22.5073, OOB Acc: 75.00%, # of Haplo: 21
9, SNP: 145, Loss: 20.9122, OOB Acc: 75.00%, # of Haplo: 21
10, SNP: 128, Loss: 20.6728, OOB Acc: 75.00%, # of Haplo: 21
11, SNP: 73, Loss: 14.6217, OOB Acc: 75.00%, # of Haplo: 22
12, SNP: 151, Loss: 10.2879, OOB Acc: 75.00%, # of Haplo: 23
13, SNP: 199, Loss: 8.74645, OOB Acc: 75.00%, # of Haplo: 23
[4] 2021-01-21 13:25:28, OOB Acc: 75.00%, # of SNPs: 13, # of Haplo: 23
Calculating matching proportion:
Min. 0.1% Qu. 1% Qu. 1st Qu. Median 3rd Qu.
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470
Max. Mean SD
0.5261716555 0.0476729895 0.1180875414
Accuracy with training data: 97.06%
Out-of-bag accuracy: 81.61%
Gene: A
Training dataset: 34 samples X 264 SNPs
# of HLA alleles: 14
# of individual classifiers: 4
total # of SNPs used: 38
avg. # of SNPs in an individual classifier: 12.25
(sd: 0.96, min: 11, max: 13, median: 12.50)
avg. # of haplotypes in an individual classifier: 27.00
(sd: 14.63, min: 14, max: 48, median: 23.00)
avg. out-of-bag accuracy: 81.61%
(sd: 6.49%, min: 75.00%, max: 88.46%, median: 81.49%)
Matching proportion:
Min. 0.1% Qu. 1% Qu. 1st Qu. Median 3rd Qu.
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470
Max. Mean SD
0.5261716555 0.0476729895 0.1180875414
Genome assembly: hg19
HIBAG model:
4 individual classifiers
264 SNPs
14 unique HLA alleles
Prediction:
based on the averaged posterior probabilities
Model assembly: hg19, SNP assembly: hg19
No allelic strand orders are switched.
# of samples: 26
CPU flags: 64-bit, AVX2
# of threads: 1
Predicting (2021-01-21 13:25:28) 0%
Predicting (2021-01-21 13:25:28) 100%
$overall
total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1 26 23 49 0.8846154 0.9423077 0
n.call call.rate
1 26 1
$confusion
True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
01:01 12 1 0 0 0 0 0 0 0 0 0 0
02:01 0 21 0 0 0 0 0 0 0 0 0 0
02:06 0 0 0 0 0 0 0 0 0 0 0 0
03:01 0 0 0 5 0 0 0 0 0 0 0 0
11:01 0 0 0 0 2 0 0 0 0 0 0 0
23:01 0 0 0 0 0 1 0 0 0 0 0 0
24:02 0 0 0 0 0 1 3 0 0 0 0 0
24:03 0 0 0 0 0 0 0 0 0 0 0 0
25:01 0 0 0 0 0 0 0 0 1 0 0 0
26:01 0 0 0 0 0 0 0 0 0 0 0 0
29:02 0 0 0 0 0 0 0 0 0 0 1 0
31:01 0 0 0 0 0 0 0 0 0 1 0 1
32:01 0 0 0 0 0 0 0 0 0 0 0 0
68:01 0 0 0 0 0 0 0 0 0 0 0 0
... 0 0 0 0 0 0 0 0 0 0 0 0
True
Predict 32:01 68:01
01:01 0 0
02:01 0 0
02:06 0 0
03:01 0 0
11:01 0 0
23:01 0 0
24:02 0 0
24:03 0 0
25:01 0 0
26:01 0 0
29:02 0 0
31:01 0 0
32:01 1 0
68:01 0 1
... 0 0
$detail
allele train.num train.freq valid.num valid.freq call.rate accuracy
1 01:01 13 0.19117647 12 0.23076923 1 0.9807692
2 02:01 21 0.30882353 22 0.42307692 1 0.9807692
3 02:06 1 0.01470588 0 0.00000000 0 NaN
4 03:01 4 0.05882353 5 0.09615385 1 1.0000000
5 11:01 3 0.04411765 2 0.03846154 1 1.0000000
6 23:01 1 0.01470588 2 0.03846154 1 0.9807692
7 24:02 8 0.11764706 3 0.05769231 1 0.9807692
8 24:03 1 0.01470588 0 0.00000000 0 NaN
9 25:01 4 0.05882353 1 0.01923077 1 1.0000000
10 26:01 2 0.02941176 1 0.01923077 1 0.9807692
11 29:02 3 0.04411765 1 0.01923077 1 1.0000000
12 31:01 2 0.02941176 1 0.01923077 1 0.9807692
13 32:01 3 0.04411765 1 0.01923077 1 1.0000000
14 68:01 2 0.02941176 1 0.01923077 1 1.0000000
sensitivity specificity ppv npv miscall miscall.prop
1 1.0000000 0.9750000 0.9230769 1.0000000 <NA> NaN
2 0.9545455 1.0000000 1.0000000 0.9677419 01:01 1
3 NaN NaN NaN NaN <NA> NaN
4 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
5 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
6 0.5000000 1.0000000 1.0000000 0.9803922 24:02 1
7 1.0000000 0.9795918 0.7500000 1.0000000 <NA> NaN
8 NaN NaN NaN NaN <NA> NaN
9 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
10 0.0000000 1.0000000 NaN 0.9807692 31:01 1
11 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
12 1.0000000 0.9803922 0.5000000 1.0000000 <NA> NaN
13 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
14 1.0000000 1.0000000 1.0000000 1.0000000 <NA> NaN
$overall
total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1 26 21 42 1 1 0.5
n.call call.rate
1 21 0.8076923
$confusion
True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
01:01 12 0 0 0 0 0 0 0 0 0 0 0
02:01 0 18 0 0 0 0 0 0 0 0 0 0
02:06 0 0 0 0 0 0 0 0 0 0 0 0
03:01 0 0 0 4 0 0 0 0 0 0 0 0
11:01 0 0 0 0 2 0 0 0 0 0 0 0
23:01 0 0 0 0 0 0 0 0 0 0 0 0
24:02 0 0 0 0 0 0 2 0 0 0 0 0
24:03 0 0 0 0 0 0 0 0 0 0 0 0
25:01 0 0 0 0 0 0 0 0 0 0 0 0
26:01 0 0 0 0 0 0 0 0 0 0 0 0
29:02 0 0 0 0 0 0 0 0 0 0 1 0
31:01 0 0 0 0 0 0 0 0 0 0 0 1
32:01 0 0 0 0 0 0 0 0 0 0 0 0
68:01 0 0 0 0 0 0 0 0 0 0 0 0
... 0 0 0 0 0 0 0 0 0 0 0 0
True
Predict 32:01 68:01
01:01 0 0
02:01 0 0
02:06 0 0
03:01 0 0
11:01 0 0
23:01 0 0
24:02 0 0
24:03 0 0
25:01 0 0
26:01 0 0
29:02 0 0
31:01 0 0
32:01 1 0
68:01 0 1
... 0 0
$detail
allele train.num train.freq valid.num valid.freq call.rate accuracy
1 01:01 13 0.19117647 12 0.23076923 1.0000000 1
2 02:01 21 0.30882353 22 0.42307692 0.8181818 1
3 02:06 1 0.01470588 0 0.00000000 0.0000000 NaN
4 03:01 4 0.05882353 5 0.09615385 0.8000000 1
5 11:01 3 0.04411765 2 0.03846154 1.0000000 1
6 23:01 1 0.01470588 2 0.03846154 0.0000000 NaN
7 24:02 8 0.11764706 3 0.05769231 0.6666667 1
8 24:03 1 0.01470588 0 0.00000000 0.0000000 NaN
9 25:01 4 0.05882353 1 0.01923077 0.0000000 NaN
10 26:01 2 0.02941176 1 0.01923077 0.0000000 NaN
11 29:02 3 0.04411765 1 0.01923077 1.0000000 1
12 31:01 2 0.02941176 1 0.01923077 1.0000000 1
13 32:01 3 0.04411765 1 0.01923077 1.0000000 1
14 68:01 2 0.02941176 1 0.01923077 1.0000000 1
sensitivity specificity ppv npv miscall miscall.prop
1 1 1 1 1 <NA> NaN
2 1 1 1 1 <NA> NaN
3 NaN NaN NaN NaN <NA> NaN
4 1 1 1 1 <NA> NaN
5 1 1 1 1 <NA> NaN
6 NaN NaN NaN NaN <NA> NaN
7 1 1 1 1 <NA> NaN
8 NaN NaN NaN NaN <NA> NaN
9 NaN NaN NaN NaN <NA> NaN
10 NaN NaN NaN NaN <NA> NaN
11 1 1 1 1 <NA> NaN
12 1 1 1 1 <NA> NaN
13 1 1 1 1 <NA> NaN
14 1 1 1 1 <NA> NaN
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.