hlaPredict: HIBAG model prediction (in parallel)

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/HIBAG.R

Description

To predict HLA type based on a HIBAG model (in parallel).

Usage

1
2
3
4
5
6
7
8
9
hlaPredict(object, snp, cl=FALSE,
    type=c("response", "prob", "response+prob"), vote=c("prob", "majority"),
    allele.check=TRUE, match.type=c("Position", "RefSNP+Position", "RefSNP"),
    same.strand=FALSE, verbose=TRUE)
## S3 method for class 'hlaAttrBagClass'
predict(object, snp, cl=FALSE,
    type=c("response", "prob", "response+prob"), vote=c("prob", "majority"),
    allele.check=TRUE, match.type=c("Position", "RefSNP+Position", "RefSNP"),
    same.strand=FALSE, verbose=TRUE, ...)

Arguments

object

a model of hlaAttrBagClass

snp

a genotypic object of hlaSNPGenoClass

cl

FALSE, TRUE, an integer, or a cluster object created by the parallel-package; if FALSE, use the serial implementation; if TRUE, use the number of threads returned from RcppParallel::defaultNumThreads() (by default using all threads); if an integer, specify the number of threads

type

"response": return the best-guess type plus its posterior probability; "prob": return all posterior probabilities; "response+prob": return the best-guess and all posterior probabilities

vote

"prob" (default behavior) – make a prediction based on the averaged posterior probabilities from all individual classifiers; "majority" – majority voting from all individual classifiers, where each classifier votes for an HLA type

allele.check

if TRUE, check and then switch allele pairs if needed

match.type

"Position" – use positions only (by default); "RefSNP+Position" – use both of SNP IDs and positions; "RefSNP" – using SNP IDs only

same.strand

TRUE assuming alleles are on the same strand (e.g., forward strand); otherwise, FALSE not assuming whether on the same strand or not

verbose

if TRUE, show information

...

unused

Details

If more than 50% of SNP predictors are missing, a warning will be given.

When match.type="RefSNP+Position", the matching of SNPs requires both SNP IDs and positions. A lower missing fraction maybe gained by matching SNP IDs or positions only. Call hlaPredict(..., match.type="RefSNP") or hlaPredict(..., match.type="Position") for this purpose. It could be safe to assume that the SNPs with the same positions on the same genome reference (e.g., hg19) are the same variant albeit the different SNP IDs. Any concern about SNP mismatching should be emailed to the genotyping platform provider.

Value

Return a hlaAlleleClass object with posterior probabilities of predicted HLA types, or a matrix of pairwise possible HLA types with all posterior probabilities. If type = "response+prob", return a hlaAlleleClass object with a matrix of postprob for the probabilities of all pairs of alleles. If a probability matrix is returned, colnames is sample.id and rownames is an unordered pair of HLA alleles.

Author(s)

Xiuwen Zheng

See Also

hlaAttrBagging, hlaAllele, hlaCompareAllele, hlaParallelAttrBagging, hlaSetKernelTarget

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training"   "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
    HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
    HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

# validation
pred <- hlaPredict(model, test.geno)
# compare
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))

Example output

HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.5 (64-bit, AVX2)
[1] "training"   "validation"
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 34
# of unique HLA alleles: 14
# of unique HLA genotypes: 23
Gene: A
Range: [29910247bp, 29913661bp] on hg19
# of samples: 26
# of unique HLA alleles: 12
# of unique HLA genotypes: 14
[1] 275
Exclude 11 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
    # of SNPs randomly sampled as candidates for each selection: 17
    # of SNPs: 264
    # of samples: 34
    # of unique HLA alleles: 14
CPU flags: 64-bit, AVX2
# of threads: 1
[-] 2021-01-21 13:25:28
=== building individual classifier 1, out-of-bag (11/32.4%) ===
     1, SNP: 211, Loss: 196.4, OOB Acc: 54.55%, # of Haplo: 13
     2, SNP: 66, Loss: 173.548, OOB Acc: 63.64%, # of Haplo: 13
     3, SNP: 177, Loss: 136.352, OOB Acc: 68.18%, # of Haplo: 13
     4, SNP: 108, Loss: 95.8359, OOB Acc: 72.73%, # of Haplo: 13
     5, SNP: 127, Loss: 67.3216, OOB Acc: 77.27%, # of Haplo: 13
     6, SNP: 95, Loss: 47.5888, OOB Acc: 77.27%, # of Haplo: 13
     7, SNP: 33, Loss: 37.2631, OOB Acc: 77.27%, # of Haplo: 16
     8, SNP: 6, Loss: 29.7419, OOB Acc: 77.27%, # of Haplo: 18
     9, SNP: 208, Loss: 25.6913, OOB Acc: 77.27%, # of Haplo: 19
    10, SNP: 225, Loss: 25.3087, OOB Acc: 77.27%, # of Haplo: 21
    11, SNP: 11, Loss: 24.8356, OOB Acc: 77.27%, # of Haplo: 23
    12, SNP: 151, Loss: 19.4134, OOB Acc: 77.27%, # of Haplo: 23
    13, SNP: 199, Loss: 17.011, OOB Acc: 77.27%, # of Haplo: 23
[1] 2021-01-21 13:25:28, OOB Acc: 77.27%, # of SNPs: 13, # of Haplo: 23
=== building individual classifier 2, out-of-bag (13/38.2%) ===
     1, SNP: 160, Loss: 221.236, OOB Acc: 76.92%, # of Haplo: 17
     2, SNP: 145, Loss: 173.538, OOB Acc: 80.77%, # of Haplo: 23
     3, SNP: 177, Loss: 128.58, OOB Acc: 84.62%, # of Haplo: 31
     4, SNP: 111, Loss: 79.6877, OOB Acc: 84.62%, # of Haplo: 31
     5, SNP: 207, Loss: 52.5557, OOB Acc: 88.46%, # of Haplo: 32
     6, SNP: 245, Loss: 41.8731, OOB Acc: 88.46%, # of Haplo: 34
     7, SNP: 230, Loss: 31.7937, OOB Acc: 88.46%, # of Haplo: 38
     8, SNP: 151, Loss: 20.4566, OOB Acc: 88.46%, # of Haplo: 36
     9, SNP: 14, Loss: 19.5805, OOB Acc: 88.46%, # of Haplo: 42
    10, SNP: 132, Loss: 19.5101, OOB Acc: 88.46%, # of Haplo: 42
    11, SNP: 221, Loss: 19.485, OOB Acc: 88.46%, # of Haplo: 44
    12, SNP: 251, Loss: 18.5695, OOB Acc: 88.46%, # of Haplo: 48
[2] 2021-01-21 13:25:28, OOB Acc: 88.46%, # of SNPs: 12, # of Haplo: 48
=== building individual classifier 3, out-of-bag (14/41.2%) ===
     1, SNP: 191, Loss: 193.067, OOB Acc: 57.14%, # of Haplo: 11
     2, SNP: 264, Loss: 150.427, OOB Acc: 64.29%, # of Haplo: 12
     3, SNP: 132, Loss: 93.4067, OOB Acc: 67.86%, # of Haplo: 12
     4, SNP: 128, Loss: 39.8353, OOB Acc: 71.43%, # of Haplo: 12
     5, SNP: 160, Loss: 28.2998, OOB Acc: 75.00%, # of Haplo: 12
     6, SNP: 144, Loss: 13.635, OOB Acc: 75.00%, # of Haplo: 12
     7, SNP: 111, Loss: 6.04609, OOB Acc: 75.00%, # of Haplo: 12
     8, SNP: 40, Loss: 6.04583, OOB Acc: 82.14%, # of Haplo: 14
     9, SNP: 141, Loss: 6.04583, OOB Acc: 85.71%, # of Haplo: 14
    10, SNP: 73, Loss: 2.9038, OOB Acc: 85.71%, # of Haplo: 14
    11, SNP: 199, Loss: 2.20025, OOB Acc: 85.71%, # of Haplo: 14
[3] 2021-01-21 13:25:28, OOB Acc: 85.71%, # of SNPs: 11, # of Haplo: 14
=== building individual classifier 4, out-of-bag (10/29.4%) ===
     1, SNP: 147, Loss: 158.631, OOB Acc: 50.00%, # of Haplo: 12
     2, SNP: 152, Loss: 140.375, OOB Acc: 55.00%, # of Haplo: 13
     3, SNP: 78, Loss: 115.887, OOB Acc: 60.00%, # of Haplo: 16
     4, SNP: 115, Loss: 77.8082, OOB Acc: 60.00%, # of Haplo: 18
     5, SNP: 148, Loss: 62.6831, OOB Acc: 65.00%, # of Haplo: 18
     6, SNP: 13, Loss: 46.5657, OOB Acc: 75.00%, # of Haplo: 20
     7, SNP: 109, Loss: 31.0312, OOB Acc: 75.00%, # of Haplo: 20
     8, SNP: 176, Loss: 22.5073, OOB Acc: 75.00%, # of Haplo: 21
     9, SNP: 145, Loss: 20.9122, OOB Acc: 75.00%, # of Haplo: 21
    10, SNP: 128, Loss: 20.6728, OOB Acc: 75.00%, # of Haplo: 21
    11, SNP: 73, Loss: 14.6217, OOB Acc: 75.00%, # of Haplo: 22
    12, SNP: 151, Loss: 10.2879, OOB Acc: 75.00%, # of Haplo: 23
    13, SNP: 199, Loss: 8.74645, OOB Acc: 75.00%, # of Haplo: 23
[4] 2021-01-21 13:25:28, OOB Acc: 75.00%, # of SNPs: 13, # of Haplo: 23
Calculating matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470 
        Max.         Mean           SD 
0.5261716555 0.0476729895 0.1180875414 
Accuracy with training data: 97.06%
Out-of-bag accuracy: 81.61%
Gene: A
Training dataset: 34 samples X 264 SNPs
    # of HLA alleles: 14
    # of individual classifiers: 4
    total # of SNPs used: 38
    avg. # of SNPs in an individual classifier: 12.25
        (sd: 0.96, min: 11, max: 13, median: 12.50)
    avg. # of haplotypes in an individual classifier: 27.00
        (sd: 14.63, min: 14, max: 48, median: 23.00)
    avg. out-of-bag accuracy: 81.61%
        (sd: 6.49%, min: 75.00%, max: 88.46%, median: 81.49%)
Matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0002162725 0.0002198443 0.0002519909 0.0043752063 0.0092453043 0.0267068470 
        Max.         Mean           SD 
0.5261716555 0.0476729895 0.1180875414 
Genome assembly: hg19
HIBAG model:
    4 individual classifiers
    264 SNPs
    14 unique HLA alleles
Prediction:
    based on the averaged posterior probabilities
Model assembly: hg19, SNP assembly: hg19
No allelic strand orders are switched.
# of samples: 26
CPU flags: 64-bit, AVX2
# of threads: 1
Predicting (2021-01-21 13:25:28)	0%
Predicting (2021-01-21 13:25:28)	100%
$overall
  total.num.ind crt.num.ind crt.num.haplo   acc.ind acc.haplo call.threshold
1            26          23            49 0.8846154 0.9423077              0
  n.call call.rate
1     26         1

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01    12     1     0     0     0     0     0     0     0     0     0     0
  02:01     0    21     0     0     0     0     0     0     0     0     0     0
  02:06     0     0     0     0     0     0     0     0     0     0     0     0
  03:01     0     0     0     5     0     0     0     0     0     0     0     0
  11:01     0     0     0     0     2     0     0     0     0     0     0     0
  23:01     0     0     0     0     0     1     0     0     0     0     0     0
  24:02     0     0     0     0     0     1     3     0     0     0     0     0
  24:03     0     0     0     0     0     0     0     0     0     0     0     0
  25:01     0     0     0     0     0     0     0     0     1     0     0     0
  26:01     0     0     0     0     0     0     0     0     0     0     0     0
  29:02     0     0     0     0     0     0     0     0     0     0     1     0
  31:01     0     0     0     0     0     0     0     0     0     1     0     1
  32:01     0     0     0     0     0     0     0     0     0     0     0     0
  68:01     0     0     0     0     0     0     0     0     0     0     0     0
  ...       0     0     0     0     0     0     0     0     0     0     0     0
       True
Predict 32:01 68:01
  01:01     0     0
  02:01     0     0
  02:06     0     0
  03:01     0     0
  11:01     0     0
  23:01     0     0
  24:02     0     0
  24:03     0     0
  25:01     0     0
  26:01     0     0
  29:02     0     0
  31:01     0     0
  32:01     1     0
  68:01     0     1
  ...       0     0

$detail
   allele train.num train.freq valid.num valid.freq call.rate  accuracy
1   01:01        13 0.19117647        12 0.23076923         1 0.9807692
2   02:01        21 0.30882353        22 0.42307692         1 0.9807692
3   02:06         1 0.01470588         0 0.00000000         0       NaN
4   03:01         4 0.05882353         5 0.09615385         1 1.0000000
5   11:01         3 0.04411765         2 0.03846154         1 1.0000000
6   23:01         1 0.01470588         2 0.03846154         1 0.9807692
7   24:02         8 0.11764706         3 0.05769231         1 0.9807692
8   24:03         1 0.01470588         0 0.00000000         0       NaN
9   25:01         4 0.05882353         1 0.01923077         1 1.0000000
10  26:01         2 0.02941176         1 0.01923077         1 0.9807692
11  29:02         3 0.04411765         1 0.01923077         1 1.0000000
12  31:01         2 0.02941176         1 0.01923077         1 0.9807692
13  32:01         3 0.04411765         1 0.01923077         1 1.0000000
14  68:01         2 0.02941176         1 0.01923077         1 1.0000000
   sensitivity specificity       ppv       npv miscall miscall.prop
1    1.0000000   0.9750000 0.9230769 1.0000000    <NA>          NaN
2    0.9545455   1.0000000 1.0000000 0.9677419   01:01            1
3          NaN         NaN       NaN       NaN    <NA>          NaN
4    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
5    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
6    0.5000000   1.0000000 1.0000000 0.9803922   24:02            1
7    1.0000000   0.9795918 0.7500000 1.0000000    <NA>          NaN
8          NaN         NaN       NaN       NaN    <NA>          NaN
9    1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
10   0.0000000   1.0000000       NaN 0.9807692   31:01            1
11   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
12   1.0000000   0.9803922 0.5000000 1.0000000    <NA>          NaN
13   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN
14   1.0000000   1.0000000 1.0000000 1.0000000    <NA>          NaN

$overall
  total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1            26          21            42       1         1            0.5
  n.call call.rate
1     21 0.8076923

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01    12     0     0     0     0     0     0     0     0     0     0     0
  02:01     0    18     0     0     0     0     0     0     0     0     0     0
  02:06     0     0     0     0     0     0     0     0     0     0     0     0
  03:01     0     0     0     4     0     0     0     0     0     0     0     0
  11:01     0     0     0     0     2     0     0     0     0     0     0     0
  23:01     0     0     0     0     0     0     0     0     0     0     0     0
  24:02     0     0     0     0     0     0     2     0     0     0     0     0
  24:03     0     0     0     0     0     0     0     0     0     0     0     0
  25:01     0     0     0     0     0     0     0     0     0     0     0     0
  26:01     0     0     0     0     0     0     0     0     0     0     0     0
  29:02     0     0     0     0     0     0     0     0     0     0     1     0
  31:01     0     0     0     0     0     0     0     0     0     0     0     1
  32:01     0     0     0     0     0     0     0     0     0     0     0     0
  68:01     0     0     0     0     0     0     0     0     0     0     0     0
  ...       0     0     0     0     0     0     0     0     0     0     0     0
       True
Predict 32:01 68:01
  01:01     0     0
  02:01     0     0
  02:06     0     0
  03:01     0     0
  11:01     0     0
  23:01     0     0
  24:02     0     0
  24:03     0     0
  25:01     0     0
  26:01     0     0
  29:02     0     0
  31:01     0     0
  32:01     1     0
  68:01     0     1
  ...       0     0

$detail
   allele train.num train.freq valid.num valid.freq call.rate accuracy
1   01:01        13 0.19117647        12 0.23076923 1.0000000        1
2   02:01        21 0.30882353        22 0.42307692 0.8181818        1
3   02:06         1 0.01470588         0 0.00000000 0.0000000      NaN
4   03:01         4 0.05882353         5 0.09615385 0.8000000        1
5   11:01         3 0.04411765         2 0.03846154 1.0000000        1
6   23:01         1 0.01470588         2 0.03846154 0.0000000      NaN
7   24:02         8 0.11764706         3 0.05769231 0.6666667        1
8   24:03         1 0.01470588         0 0.00000000 0.0000000      NaN
9   25:01         4 0.05882353         1 0.01923077 0.0000000      NaN
10  26:01         2 0.02941176         1 0.01923077 0.0000000      NaN
11  29:02         3 0.04411765         1 0.01923077 1.0000000        1
12  31:01         2 0.02941176         1 0.01923077 1.0000000        1
13  32:01         3 0.04411765         1 0.01923077 1.0000000        1
14  68:01         2 0.02941176         1 0.01923077 1.0000000        1
   sensitivity specificity ppv npv miscall miscall.prop
1            1           1   1   1    <NA>          NaN
2            1           1   1   1    <NA>          NaN
3          NaN         NaN NaN NaN    <NA>          NaN
4            1           1   1   1    <NA>          NaN
5            1           1   1   1    <NA>          NaN
6          NaN         NaN NaN NaN    <NA>          NaN
7            1           1   1   1    <NA>          NaN
8          NaN         NaN NaN NaN    <NA>          NaN
9          NaN         NaN NaN NaN    <NA>          NaN
10         NaN         NaN NaN NaN    <NA>          NaN
11           1           1   1   1    <NA>          NaN
12           1           1   1   1    <NA>          NaN
13           1           1   1   1    <NA>          NaN
14           1           1   1   1    <NA>          NaN

HIBAG documentation built on March 24, 2021, 6 p.m.