hlaOutOfBag: Out-of-bag estimation of overall accuracy, per-allele...

Description

Out-of-bag estimation of overall accuracy, per-allele sensitivity, specificity, positive predictive value, negative predictive value and call rate.
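As a rough illustration of what "out-of-bag" means, the base R sketch below draws one bootstrap sample and lists the samples left out of it. It is not HIBAG's internal code; the sample size and the variable names (n, in_bag, oob) are assumptions made for the example.

```r
# Illustrative sketch only: one bootstrap draw and its out-of-bag (OOB) samples.
set.seed(1)
n <- 60                                       # hypothetical number of training samples
in_bag <- sample.int(n, n, replace = TRUE)    # bootstrap sample, drawn with replacement
oob <- setdiff(seq_len(n), in_bag)            # samples never drawn are "out of bag"

# A classifier built on `in_bag` has not seen the OOB samples,
# so its predictions on them give an internal accuracy estimate.
length(oob) / n    # typically near 1/e (about 0.37) when n is large
```

Because each classifier has its own out-of-bag set, statistics of this kind are naturally computed per classifier and then combined.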

Usage

hlaOutOfBag(model, hla, snp, call.threshold=NaN, verbose=TRUE)

Arguments

model

an object of hlaAttrBagClass or hlaAttrBagObj

hla

the training HLA types, an object of hlaAlleleClass

snp

the training SNP genotypes, an object of hlaSNPGenoClass

call.threshold

the specified call threshold; if NaN, no threshold is used
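The effect of a call threshold on the call rate can be sketched in base R as follows. The probabilities and the helper call_rate() are hypothetical, and the use of ">=" for the comparison is an assumption rather than a statement about the package's implementation.

```r
# Hypothetical posterior probabilities of the best-guess genotype for six samples
prob <- c(0.95, 0.40, 0.72, 0.10, 0.88, 0.55)

# Hypothetical helper: fraction of predictions kept at a given threshold
call_rate <- function(prob, threshold) {
    # NaN means "no threshold": every prediction counts as called
    if (is.nan(threshold)) return(1)
    mean(prob >= threshold)
}

call_rate(prob, NaN)   # all predictions are kept
call_rate(prob, 0.5)   # 4 of the 6 predictions are kept
```

Raising the threshold trades call rate for confidence: fewer samples receive a call, but the retained calls have higher posterior probability.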

verbose

if TRUE, show information

Value

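Returns a list with the components overall, confusion and detail, as shown in the example output; the result can be passed to hlaReport.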
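The per-allele statistics follow the usual definitions based on a confusion matrix with predicted alleles in rows and true alleles in columns, the layout of $confusion in the example output. The sketch below uses a toy matrix with made-up counts and a hypothetical helper per_allele(); it is not expected to reproduce the package's numbers, since the fractional counts in the example output suggest averaging over the individual classifiers.

```r
# Toy confusion matrix (hypothetical counts): rows = predicted, columns = true
conf <- matrix(c(8, 1, 0,
                 0, 5, 1,
                 0, 0, 3),
               nrow = 3, byrow = TRUE,
               dimnames = list(Predict = c("A1", "A2", "A3"),
                               True    = c("A1", "A2", "A3")))

# Hypothetical helper: one-vs-rest statistics for a single allele
per_allele <- function(conf, allele) {
    tp <- conf[allele, allele]
    fp <- sum(conf[allele, ]) - tp   # predicted as this allele, truly another
    fn <- sum(conf[, allele]) - tp   # truly this allele, predicted as another
    tn <- sum(conf) - tp - fp - fn
    c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp),
      ppv = tp / (tp + fp), npv = tn / (tn + fn))
}

# One row of statistics per allele
t(sapply(colnames(conf), per_allele, conf = conf))
```

An allele that is never predicted has tp + fp equal to 0, so its PPV is NaN, which matches the NaN entries in the ppv column of the example output.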

Author(s)

Xiuwen Zheng

See Also

hlaCompareAllele, hlaReport

Examples

# load the package (provides HLA_Type_Table and HapMap_CEU_Geno)
library(HIBAG)

# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training genotypes: flanking SNPs for the samples with HLA types
geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel = match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel = match(hla$value$sample.id, HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
# please use "nclassifier=100" when you use HIBAG for real data
model <- hlaAttrBagging(hla, geno, nclassifier=4)
summary(model)

# out-of-bag estimation
(comp <- hlaOutOfBag(model, hla, geno, call.threshold=NaN, verbose=TRUE))

# report
hlaReport(comp, type="txt")

hlaReport(comp, type="tex")

hlaReport(comp, type="html")

Example output

HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.3
Supported by Streaming SIMD Extensions (SSE2) [64-bit]
[1] 275
Remove 9 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
# of SNPs randomly sampled as candidates for each selection: 17
# of SNPs: 266, # of samples: 60
# of unique HLA alleles: 14
Wed Mar 11 17:37:37 2020,   1 individual classifier, out-of-bag acc: 86.96%, # of SNPs: 12, # of haplo: 32
Wed Mar 11 17:37:37 2020,   2 individual classifier, out-of-bag acc: 87.50%, # of SNPs: 15, # of haplo: 40
Wed Mar 11 17:37:37 2020,   3 individual classifier, out-of-bag acc: 97.92%, # of SNPs: 14, # of haplo: 21
Wed Mar 11 17:37:37 2020,   4 individual classifier, out-of-bag acc: 95.45%, # of SNPs: 14, # of haplo: 25
Gene: A
Training dataset: 60 samples X 266 SNPs
	# of HLA alleles: 14
	# of individual classifiers: 4
	total # of SNPs used: 42
	average # of SNPs in an individual classifier: 13.75, sd: 1.26, min: 12, max: 15
	average # of haplotypes in an individual classifier: 29.50, sd: 8.35, min: 21, max: 40
	average out-of-bag accuracy: 91.96%, sd: 5.56%, min: 86.96%, max: 97.92%
Genome assembly: hg19
Gene: A
Training dataset: 60 samples X 266 SNPs
	# of HLA alleles: 14
	# of individual classifiers: 4
	total # of SNPs used: 42
	average # of SNPs in an individual classifier: 13.75, sd: 1.26, min: 12, max: 15
	average # of haplotypes in an individual classifier: 29.50, sd: 8.35, min: 21, max: 40
	average out-of-bag accuracy: 91.96%, sd: 5.56%, min: 86.96%, max: 97.92%
Genome assembly: hg19
Wed Mar 11 17:37:37 2020, passing the 1/4 classifiers.
Wed Mar 11 17:37:37 2020, passing the 2/4 classifiers.
Wed Mar 11 17:37:37 2020, passing the 3/4 classifiers.
Wed Mar 11 17:37:37 2020, passing the 4/4 classifiers.
$overall
  total.num.ind crt.num.ind crt.num.haplo   acc.ind acc.haplo call.threshold
1         23.25          20         42.75 0.8604249 0.9195693              0
  n.call call.rate
1  23.25         1

$confusion
       True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
  01:01   7.5   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.00  0.00
  02:01   0.0  15.5  0.25  0.00   0.0  0.00  0.00  0.25     0 0.125  0.75  0.00
  02:06   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.25  0.00
  03:01   0.0   0.0  0.00  2.75   0.0  0.00  0.00  0.00     0 0.000  0.00  0.00
  11:01   0.0   0.0  0.00  0.00   2.5  0.00  0.00  0.00     0 0.000  0.00  0.00
  23:01   0.0   0.0  0.00  0.00   0.0  1.25  0.00  0.00     0 0.000  0.00  0.00
  24:02   0.0   0.0  0.00  0.00   0.0  0.75  3.75  0.75     0 0.000  0.00  0.00
  24:03   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.00  0.00
  25:01   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     3 0.625  0.00  0.00
  26:01   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.750  0.00  0.00
  29:02   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  1.25  0.00
  31:01   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.00  0.75
  32:01   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.00  0.00
  68:01   0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.00  0.00
  ...     0.0   0.0  0.00  0.00   0.0  0.00  0.00  0.00     0 0.000  0.00  0.00
       True
Predict 32:01 68:01
  01:01  0.00   0.0
  02:01  0.00   0.0
  02:06  0.00   0.0
  03:01  0.00   0.0
  11:01  0.00   0.0
  23:01  0.00   0.0
  24:02  0.00   0.0
  24:03  0.00   0.0
  25:01  0.00   0.0
  26:01  0.00   0.0
  29:02  0.00   0.0
  31:01  0.00   0.0
  32:01  2.25   0.0
  68:01  0.00   1.5
  ...    0.00   0.0

$detail
      allele valid.num  valid.freq call.rate  accuracy sensitivity specificity
01:01  01:01        25 0.208333333      1.00 1.0000000       1.000   1.0000000
02:01  02:01        43 0.358333333      1.00 0.9673707       1.000   0.9514262
02:06  02:06         1 0.008333333      0.25 0.9772727       0.000   1.0000000
03:01  03:01         9 0.075000000      1.00 1.0000000       1.000   1.0000000
11:01  11:01         5 0.041666667      1.00 1.0000000       1.000   1.0000000
23:01  23:01         3 0.025000000      1.00 0.9843750       0.750   1.0000000
24:02  24:02        11 0.091666667      1.00 0.9734848       1.000   0.9711752
24:03  24:03         1 0.008333333      1.00 0.9784667       0.000   1.0000000
25:01  25:01         5 0.041666667      1.00 0.9841486       1.000   0.9831781
26:01  26:01         3 0.025000000      1.00 0.9841486       0.625   1.0000000
29:02  29:02         4 0.033333333      1.00 0.9782609       0.750   1.0000000
31:01  31:01         3 0.025000000      0.75 1.0000000       1.000   1.0000000
32:01  32:01         4 0.033333333      1.00 1.0000000       1.000   1.0000000
68:01  68:01         3 0.025000000      1.00 1.0000000       1.000   1.0000000
            ppv       npv miscall miscall.prop
01:01 1.0000000 1.0000000    <NA>          NaN
02:01 0.9253003 1.0000000    <NA>          NaN
02:06       NaN 0.9772727   02:01    1.0000000
03:01 1.0000000 1.0000000    <NA>          NaN
11:01 1.0000000 1.0000000    <NA>          NaN
23:01 1.0000000 0.9843750   24:02    1.0000000
24:02 0.7625000 1.0000000    <NA>          NaN
24:03       NaN 0.9784667   24:02    0.7500000
25:01 0.8472222 1.0000000    <NA>          NaN
26:01 1.0000000 0.9840278   25:01    0.8333333
29:02 1.0000000 0.9782609   02:01    0.7500000
31:01 1.0000000 1.0000000    <NA>          NaN
32:01 1.0000000 1.0000000    <NA>          NaN
68:01 1.0000000 1.0000000    <NA>          NaN

Allele	Num.	Freq.	CR	ACC	SEN	SPE	PPV	NPV	Miscall
	Valid.	Valid.	(%)	(%)	(%)	(%)	(%)	(%)	(%)
----
Overall accuracy: 92.0%, Call rate: 100.0%
01:01 25 0.2083 100.0 100.0 100.0 100.0 100.0 100.0 --
02:01 43 0.3583 100.0 96.7 100.0 95.1 92.5 100.0 --
02:06 1 0.0083 25.0 97.7 0.0 100.0 -- 97.7 02:01 (100)
03:01 9 0.0750 100.0 100.0 100.0 100.0 100.0 100.0 --
11:01 5 0.0417 100.0 100.0 100.0 100.0 100.0 100.0 --
23:01 3 0.0250 100.0 98.4 75.0 100.0 100.0 98.4 24:02 (100)
24:02 11 0.0917 100.0 97.3 100.0 97.1 76.2 100.0 --
24:03 1 0.0083 100.0 97.8 0.0 100.0 -- 97.8 24:02 (75)
25:01 5 0.0417 100.0 98.4 100.0 98.3 84.7 100.0 --
26:01 3 0.0250 100.0 98.4 62.5 100.0 100.0 98.4 25:01 (83)
29:02 4 0.0333 100.0 97.8 75.0 100.0 100.0 97.8 02:01 (75)
31:01 3 0.0250 75.0 100.0 100.0 100.0 100.0 100.0 --
32:01 4 0.0333 100.0 100.0 100.0 100.0 100.0 100.0 --
68:01 3 0.0250 100.0 100.0 100.0 100.0 100.0 100.0 --
\title{Imputation Evaluation}

\documentclass[12pt]{article}

\usepackage{fullpage}
\usepackage{longtable}

\begin{document}

\maketitle

\setlength{\LTcapwidth}{6.5in}

% -------- BEGIN TABLE --------
\begin{longtable}{rrr | rrrrrrl}
\caption{The sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV) and call rate (CR).}
\label{tab:accuracy} \\
Allele & Num. & Freq. & CR & ACC & SEN & SPE & PPV & NPV & Miscall \\
 & Valid. & Valid. & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) \\
\hline\hline
\endfirsthead
\multicolumn{10}{c}{{\normalsize \tablename\ \thetable{} -- Continued from previous page}} \\
Allele & Num. & Freq. & CR & ACC & SEN & SPE & PPV & NPV & Miscall \\
 & Valid. & Valid. & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) \\
\hline\hline
\endhead
\hline
\multicolumn{10}{r}{Continued on next page ...} \\
\hline
\endfoot
\hline\hline
\endlastfoot
\multicolumn{10}{l}{\it Overall accuracy: 92.0\%, Call rate: 100.0\%} \\
01:01 & 25 & 0.2083 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
02:01 & 43 & 0.3583 & 100.0 & 96.7 & 100.0 & 95.1 & 92.5 & 100.0 & -- \\
02:06 & 1 & 0.0083 & 25.0 & 97.7 & 0.0 & 100.0 & -- & 97.7 & 02:01 (100) \\
03:01 & 9 & 0.0750 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
11:01 & 5 & 0.0417 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
23:01 & 3 & 0.0250 & 100.0 & 98.4 & 75.0 & 100.0 & 100.0 & 98.4 & 24:02 (100) \\
24:02 & 11 & 0.0917 & 100.0 & 97.3 & 100.0 & 97.1 & 76.2 & 100.0 & -- \\
24:03 & 1 & 0.0083 & 100.0 & 97.8 & 0.0 & 100.0 & -- & 97.8 & 24:02 (75) \\
25:01 & 5 & 0.0417 & 100.0 & 98.4 & 100.0 & 98.3 & 84.7 & 100.0 & -- \\
26:01 & 3 & 0.0250 & 100.0 & 98.4 & 62.5 & 100.0 & 100.0 & 98.4 & 25:01 (83) \\
29:02 & 4 & 0.0333 & 100.0 & 97.8 & 75.0 & 100.0 & 100.0 & 97.8 & 02:01 (75) \\
31:01 & 3 & 0.0250 & 75.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
32:01 & 4 & 0.0333 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
68:01 & 3 & 0.0250 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
\end{longtable}
% -------- END TABLE --------

\end{document}
<!DOCTYPE html>
<html>
<head>
  <title>Imputation Evaluation</title>
</head>
<body>
<h1>Imputation Evaluation</h1>
<p></p>
<h3><b>Table 1L:</b> The sensitivity (SEN), specificity (SPE),
positive predictive value (PPV), negative predictive value (NPV)
and call rate (CR).</h3>
<table id="TB-Acc" class="tabular" border="1"  CELLSPACING="1">
<tr>
<th>Allele </th> <th>Num. Valid.</th> <th>Freq. Valid.</th> <th>CR (%)</th> <th>ACC (%)</th> <th>SEN (%)</th> <th>SPE (%)</th> <th>PPV (%)</th> <th>NPV (%)</th> <th>Miscall (%)</th>
</tr>
<tr>
<td colspan="10">
<i> Overall accuracy: 92.0%, Call rate: 100.0% </i>
</td>
</tr>
<tr>
<td>01:01</td> <td>25</td> <td>0.2083</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>02:01</td> <td>43</td> <td>0.3583</td> <td>100.0</td> <td>96.7</td> <td>100.0</td> <td>95.1</td> <td>92.5</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>02:06</td> <td>1</td> <td>0.0083</td> <td>25.0</td> <td>97.7</td> <td>0.0</td> <td>100.0</td> <td>--</td> <td>97.7</td> <td>02:01 (100)</td>
</tr>
<tr>
<td>03:01</td> <td>9</td> <td>0.0750</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>11:01</td> <td>5</td> <td>0.0417</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>23:01</td> <td>3</td> <td>0.0250</td> <td>100.0</td> <td>98.4</td> <td>75.0</td> <td>100.0</td> <td>100.0</td> <td>98.4</td> <td>24:02 (100)</td>
</tr>
<tr>
<td>24:02</td> <td>11</td> <td>0.0917</td> <td>100.0</td> <td>97.3</td> <td>100.0</td> <td>97.1</td> <td>76.2</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>24:03</td> <td>1</td> <td>0.0083</td> <td>100.0</td> <td>97.8</td> <td>0.0</td> <td>100.0</td> <td>--</td> <td>97.8</td> <td>24:02 (75)</td>
</tr>
<tr>
<td>25:01</td> <td>5</td> <td>0.0417</td> <td>100.0</td> <td>98.4</td> <td>100.0</td> <td>98.3</td> <td>84.7</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>26:01</td> <td>3</td> <td>0.0250</td> <td>100.0</td> <td>98.4</td> <td>62.5</td> <td>100.0</td> <td>100.0</td> <td>98.4</td> <td>25:01 (83)</td>
</tr>
<tr>
<td>29:02</td> <td>4</td> <td>0.0333</td> <td>100.0</td> <td>97.8</td> <td>75.0</td> <td>100.0</td> <td>100.0</td> <td>97.8</td> <td>02:01 (75)</td>
</tr>
<tr>
<td>31:01</td> <td>3</td> <td>0.0250</td> <td>75.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>32:01</td> <td>4</td> <td>0.0333</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>68:01</td> <td>3</td> <td>0.0250</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
</table>

</body>
</html>

HIBAG documentation built on March 24, 2021, 6 p.m.