Out-of-bag estimation of overall accuracy, per-allele sensitivity, specificity, positive predictive value, negative predictive value and call rate.
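As a rough illustration of the per-allele metrics named above, the sketch below computes them from one-vs-rest counts for a single allele using the standard definitions. The helper name `allele_metrics` and the counts are hypothetical, and this is not necessarily how HIBAG aggregates counts across individual classifiers.

```r
# Standard one-vs-rest metrics for a single allele.
# tp/fp/fn/tn are hypothetical counts, not HIBAG output.
allele_metrics <- function(tp, fp, fn, tn) {
  c(accuracy    = (tp + tn) / (tp + fp + fn + tn),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

m <- allele_metrics(tp = 9, fp = 2, fn = 1, tn = 88)
print(round(m, 4))

# An allele that is never predicted has tp + fp = 0, so its PPV is 0/0 = NaN.
rare <- allele_metrics(tp = 0, fp = 0, fn = 1, tn = 99)
print(rare[["ppv"]])
```

A PPV of NaN together with a sensitivity of 0 is what the example output on this page reports for the alleles 02:06 and 24:03.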
hlaOutOfBag(model, hla, snp, call.threshold=NaN, verbose=TRUE)
model             an object of
hla               the training HLA types, an object of
snp               the training SNP genotypes, an object of
call.threshold    the specified call threshold; if
verbose           if TRUE, show information
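The effect of a call threshold can be sketched as follows. This assumes (it is not stated in the text above) that a prediction counts as called only when its posterior probability reaches the threshold, and that a missing threshold such as NaN keeps every prediction. The helper `call_rate` and the probabilities are hypothetical.

```r
# Fraction of predictions that survive a call threshold.
# Assumption: a prediction is "called" when prob >= threshold,
# and a missing threshold (NA or NaN) keeps all predictions.
call_rate <- function(prob, threshold) {
  if (is.na(threshold)) threshold <- 0
  mean(prob >= threshold)
}

prob <- c(0.95, 0.40, 0.80, 0.10)  # hypothetical posterior probabilities
print(call_rate(prob, 0.5))
print(call_rate(prob, NaN))
```

In the example output on this page, the run with call.threshold=NaN reports a call threshold of 0 and an overall call rate of 1.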
Return hlaAlleleClass.
Xiuwen Zheng
# load the HIBAG package
library(HIBAG)

# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)   # 275

# training and validation genotypes
geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel = match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel = match(hla$value$sample.id, HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
# please use "nclassifier=100" when you use HIBAG for real data
model <- hlaAttrBagging(hla, geno, nclassifier=4)
summary(model)

# out-of-bag estimation
(comp <- hlaOutOfBag(model, hla, geno, call.threshold=NaN, verbose=TRUE))

# report
hlaReport(comp, type="txt")
hlaReport(comp, type="tex")
hlaReport(comp, type="html")
HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.3
Supported by Streaming SIMD Extensions (SSE2) [64-bit]
[1] 275
Remove 9 monomorphic SNPs
Build a HIBAG model with 4 individual classifiers:
# of SNPs randomly sampled as candidates for each selection: 17
# of SNPs: 266, # of samples: 60
# of unique HLA alleles: 14
Wed Mar 11 17:37:37 2020, 1 individual classifier, out-of-bag acc: 86.96%, # of SNPs: 12, # of haplo: 32
Wed Mar 11 17:37:37 2020, 2 individual classifier, out-of-bag acc: 87.50%, # of SNPs: 15, # of haplo: 40
Wed Mar 11 17:37:37 2020, 3 individual classifier, out-of-bag acc: 97.92%, # of SNPs: 14, # of haplo: 21
Wed Mar 11 17:37:37 2020, 4 individual classifier, out-of-bag acc: 95.45%, # of SNPs: 14, # of haplo: 25
Gene: A
Training dataset: 60 samples X 266 SNPs
# of HLA alleles: 14
# of individual classifiers: 4
total # of SNPs used: 42
average # of SNPs in an individual classifier: 13.75, sd: 1.26, min: 12, max: 15
average # of haplotypes in an individual classifier: 29.50, sd: 8.35, min: 21, max: 40
average out-of-bag accuracy: 91.96%, sd: 5.56%, min: 86.96%, max: 97.92%
Genome assembly: hg19
Gene: A
Training dataset: 60 samples X 266 SNPs
# of HLA alleles: 14
# of individual classifiers: 4
total # of SNPs used: 42
average # of SNPs in an individual classifier: 13.75, sd: 1.26, min: 12, max: 15
average # of haplotypes in an individual classifier: 29.50, sd: 8.35, min: 21, max: 40
average out-of-bag accuracy: 91.96%, sd: 5.56%, min: 86.96%, max: 97.92%
Genome assembly: hg19
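The summary line for out-of-bag accuracy appears to be the plain mean and sample standard deviation of the four per-classifier accuracies logged during training. A quick check, with the values copied from the log above (they are already rounded to two decimals, so agreement is only approximate):

```r
# Out-of-bag accuracy (percent) of each individual classifier,
# copied from the training log above.
oob_acc <- c(86.96, 87.50, 97.92, 95.45)

oob_summary <- c(mean = mean(oob_acc),
                 sd   = sd(oob_acc),   # sample standard deviation (n - 1)
                 min  = min(oob_acc),
                 max  = max(oob_acc))
print(oob_summary)
```

The result agrees with the reported 91.96%, 5.56%, 86.96% and 97.92% to within rounding.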
Wed Mar 11 17:37:37 2020, passing the 1/4 classifiers.
Wed Mar 11 17:37:37 2020, passing the 2/4 classifiers.
Wed Mar 11 17:37:37 2020, passing the 3/4 classifiers.
Wed Mar 11 17:37:37 2020, passing the 4/4 classifiers.
$overall
total.num.ind crt.num.ind crt.num.haplo acc.ind acc.haplo call.threshold
1 23.25 20 42.75 0.8604249 0.9195693 0
n.call call.rate
1 23.25 1
$confusion
True
Predict 01:01 02:01 02:06 03:01 11:01 23:01 24:02 24:03 25:01 26:01 29:02 31:01
01:01 7.5 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.00 0.00
02:01 0.0 15.5 0.25 0.00 0.0 0.00 0.00 0.25 0 0.125 0.75 0.00
02:06 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.25 0.00
03:01 0.0 0.0 0.00 2.75 0.0 0.00 0.00 0.00 0 0.000 0.00 0.00
11:01 0.0 0.0 0.00 0.00 2.5 0.00 0.00 0.00 0 0.000 0.00 0.00
23:01 0.0 0.0 0.00 0.00 0.0 1.25 0.00 0.00 0 0.000 0.00 0.00
24:02 0.0 0.0 0.00 0.00 0.0 0.75 3.75 0.75 0 0.000 0.00 0.00
24:03 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.00 0.00
25:01 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 3 0.625 0.00 0.00
26:01 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.750 0.00 0.00
29:02 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 1.25 0.00
31:01 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.00 0.75
32:01 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.00 0.00
68:01 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.00 0.00
... 0.0 0.0 0.00 0.00 0.0 0.00 0.00 0.00 0 0.000 0.00 0.00
True
Predict 32:01 68:01
01:01 0.00 0.0
02:01 0.00 0.0
02:06 0.00 0.0
03:01 0.00 0.0
11:01 0.00 0.0
23:01 0.00 0.0
24:02 0.00 0.0
24:03 0.00 0.0
25:01 0.00 0.0
26:01 0.00 0.0
29:02 0.00 0.0
31:01 0.00 0.0
32:01 2.25 0.0
68:01 0.00 1.5
... 0.00 0.0
$detail
allele valid.num valid.freq call.rate accuracy sensitivity specificity
01:01 01:01 25 0.208333333 1.00 1.0000000 1.000 1.0000000
02:01 02:01 43 0.358333333 1.00 0.9673707 1.000 0.9514262
02:06 02:06 1 0.008333333 0.25 0.9772727 0.000 1.0000000
03:01 03:01 9 0.075000000 1.00 1.0000000 1.000 1.0000000
11:01 11:01 5 0.041666667 1.00 1.0000000 1.000 1.0000000
23:01 23:01 3 0.025000000 1.00 0.9843750 0.750 1.0000000
24:02 24:02 11 0.091666667 1.00 0.9734848 1.000 0.9711752
24:03 24:03 1 0.008333333 1.00 0.9784667 0.000 1.0000000
25:01 25:01 5 0.041666667 1.00 0.9841486 1.000 0.9831781
26:01 26:01 3 0.025000000 1.00 0.9841486 0.625 1.0000000
29:02 29:02 4 0.033333333 1.00 0.9782609 0.750 1.0000000
31:01 31:01 3 0.025000000 0.75 1.0000000 1.000 1.0000000
32:01 32:01 4 0.033333333 1.00 1.0000000 1.000 1.0000000
68:01 68:01 3 0.025000000 1.00 1.0000000 1.000 1.0000000
ppv npv miscall miscall.prop
01:01 1.0000000 1.0000000 <NA> NaN
02:01 0.9253003 1.0000000 <NA> NaN
02:06 NaN 0.9772727 02:01 1.0000000
03:01 1.0000000 1.0000000 <NA> NaN
11:01 1.0000000 1.0000000 <NA> NaN
23:01 1.0000000 0.9843750 24:02 1.0000000
24:02 0.7625000 1.0000000 <NA> NaN
24:03 NaN 0.9784667 24:02 0.7500000
25:01 0.8472222 1.0000000 <NA> NaN
26:01 1.0000000 0.9840278 25:01 0.8333333
29:02 1.0000000 0.9782609 02:01 0.7500000
31:01 1.0000000 1.0000000 <NA> NaN
32:01 1.0000000 1.0000000 <NA> NaN
68:01 1.0000000 1.0000000 <NA> NaN
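The miscall and miscall.prop columns can be reproduced from the confusion matrix printed earlier: for each true allele, take the most frequent wrong prediction and its share of all wrong predictions. The sketch below does this for a five-allele excerpt of that matrix. The function name `most_common_miscall` is hypothetical, and the logic is an inference from the printed numbers rather than the package's documented method.

```r
# Excerpt of $confusion (rows = predicted allele, columns = true allele),
# values copied from the output above.
alleles <- c("02:01", "02:06", "25:01", "26:01", "29:02")
conf <- matrix(c(15.5, 0.25, 0, 0.125, 0.75,
                 0,    0,    0, 0,     0.25,
                 0,    0,    3, 0.625, 0,
                 0,    0,    0, 0.750, 0,
                 0,    0,    0, 0,     1.25),
               nrow = 5, byrow = TRUE,
               dimnames = list(Predict = alleles, True = alleles))

most_common_miscall <- function(conf) {
  truth <- colnames(conf)
  out <- data.frame(allele = truth, miscall = NA_character_,
                    miscall.prop = NaN, stringsAsFactors = FALSE)
  for (j in seq_along(truth)) {
    col <- conf[, j]
    wrong <- col[names(col) != truth[j]]   # off-diagonal entries only
    if (sum(wrong) > 0) {
      out$miscall[j] <- names(wrong)[which.max(wrong)]
      out$miscall.prop[j] <- max(wrong) / sum(wrong)
    }
  }
  out
}

res <- most_common_miscall(conf)
print(res)
```

For these five alleles the result matches the table above: 02:06 is miscalled as 02:01 (proportion 1), 26:01 as 25:01 (0.8333) and 29:02 as 02:01 (0.75), while 02:01 and 25:01 have no miscalls.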
Allele Num. Freq. CR ACC SEN SPE PPV NPV Miscall
Valid. Valid. (%) (%) (%) (%) (%) (%) (%)
----
Overall accuracy: 92.0%, Call rate: 100.0%
01:01 25 0.2083 100.0 100.0 100.0 100.0 100.0 100.0 --
02:01 43 0.3583 100.0 96.7 100.0 95.1 92.5 100.0 --
02:06 1 0.0083 25.0 97.7 0.0 100.0 -- 97.7 02:01 (100)
03:01 9 0.0750 100.0 100.0 100.0 100.0 100.0 100.0 --
11:01 5 0.0417 100.0 100.0 100.0 100.0 100.0 100.0 --
23:01 3 0.0250 100.0 98.4 75.0 100.0 100.0 98.4 24:02 (100)
24:02 11 0.0917 100.0 97.3 100.0 97.1 76.2 100.0 --
24:03 1 0.0083 100.0 97.8 0.0 100.0 -- 97.8 24:02 (75)
25:01 5 0.0417 100.0 98.4 100.0 98.3 84.7 100.0 --
26:01 3 0.0250 100.0 98.4 62.5 100.0 100.0 98.4 25:01 (83)
29:02 4 0.0333 100.0 97.8 75.0 100.0 100.0 97.8 02:01 (75)
31:01 3 0.0250 75.0 100.0 100.0 100.0 100.0 100.0 --
32:01 4 0.0333 100.0 100.0 100.0 100.0 100.0 100.0 --
68:01 3 0.0250 100.0 100.0 100.0 100.0 100.0 100.0 --
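The Freq. Valid. column is consistent with each allele's count divided by the total number of allele copies, 120, which is two copies for each of the 60 training samples. A short check using the Num. Valid. values from the table above (the variable names are illustrative only):

```r
# Num. Valid. per allele, copied from the report above.
valid_num <- c("01:01" = 25, "02:01" = 43, "02:06" = 1, "03:01" = 9,
               "11:01" = 5,  "23:01" = 3,  "24:02" = 11, "24:03" = 1,
               "25:01" = 5,  "26:01" = 3,  "29:02" = 4,  "31:01" = 3,
               "32:01" = 4,  "68:01" = 3)

total_copies <- sum(valid_num)           # 2 allele copies x 60 samples
valid_freq <- valid_num / total_copies   # allele frequency among valid calls
print(total_copies)
print(round(valid_freq, 4))
```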
\title{Imputation Evaluation}
\documentclass[12pt]{article}
\usepackage{fullpage}
\usepackage{longtable}
\begin{document}
\maketitle
\setlength{\LTcapwidth}{6.5in}
% -------- BEGIN TABLE --------
\begin{longtable}{rrr | rrrrrrl}
\caption{The sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV) and call rate (CR).}
\label{tab:accuracy} \\
Allele & Num. & Freq. & CR & ACC & SEN & SPE & PPV & NPV & Miscall \\
& Valid. & Valid. & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) \\
\hline\hline
\endfirsthead
\multicolumn{10}{c}{{\normalsize \tablename\ \thetable{} -- Continued from previous page}} \\
Allele & Num. & Freq. & CR & ACC & SEN & SPE & PPV & NPV & Miscall \\
& Valid. & Valid. & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) & (\%) \\
\hline\hline
\endhead
\hline
\multicolumn{10}{r}{Continued on next page ...} \\
\hline
\endfoot
\hline\hline
\endlastfoot
\multicolumn{10}{l}{\it Overall accuracy: 92.0\%, Call rate: 100.0\%} \\
01:01 & 25 & 0.2083 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
02:01 & 43 & 0.3583 & 100.0 & 96.7 & 100.0 & 95.1 & 92.5 & 100.0 & -- \\
02:06 & 1 & 0.0083 & 25.0 & 97.7 & 0.0 & 100.0 & -- & 97.7 & 02:01 (100) \\
03:01 & 9 & 0.0750 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
11:01 & 5 & 0.0417 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
23:01 & 3 & 0.0250 & 100.0 & 98.4 & 75.0 & 100.0 & 100.0 & 98.4 & 24:02 (100) \\
24:02 & 11 & 0.0917 & 100.0 & 97.3 & 100.0 & 97.1 & 76.2 & 100.0 & -- \\
24:03 & 1 & 0.0083 & 100.0 & 97.8 & 0.0 & 100.0 & -- & 97.8 & 24:02 (75) \\
25:01 & 5 & 0.0417 & 100.0 & 98.4 & 100.0 & 98.3 & 84.7 & 100.0 & -- \\
26:01 & 3 & 0.0250 & 100.0 & 98.4 & 62.5 & 100.0 & 100.0 & 98.4 & 25:01 (83) \\
29:02 & 4 & 0.0333 & 100.0 & 97.8 & 75.0 & 100.0 & 100.0 & 97.8 & 02:01 (75) \\
31:01 & 3 & 0.0250 & 75.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
32:01 & 4 & 0.0333 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
68:01 & 3 & 0.0250 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & 100.0 & -- \\
\end{longtable}
% -------- END TABLE --------
\end{document}
<!DOCTYPE html>
<html>
<head>
<title>Imputation Evaluation</title>
</head>
<body>
<h1>Imputation Evaluation</h1>
<p></p>
<h3><b>Table 1L:</b> The sensitivity (SEN), specificity (SPE),
positive predictive value (PPV), negative predictive value (NPV)
and call rate (CR).</h3>
<table id="TB-Acc" class="tabular" border="1" CELLSPACING="1">
<tr>
<th>Allele </th> <th>Num. Valid.</th> <th>Freq. Valid.</th> <th>CR (%)</th> <th>ACC (%)</th> <th>SEN (%)</th> <th>SPE (%)</th> <th>PPV (%)</th> <th>NPV (%)</th> <th>Miscall (%)</th>
</tr>
<tr>
<td colspan="10">
<i> Overall accuracy: 92.0%, Call rate: 100.0% </i>
</td>
</tr>
<tr>
<td>01:01</td> <td>25</td> <td>0.2083</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>02:01</td> <td>43</td> <td>0.3583</td> <td>100.0</td> <td>96.7</td> <td>100.0</td> <td>95.1</td> <td>92.5</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>02:06</td> <td>1</td> <td>0.0083</td> <td>25.0</td> <td>97.7</td> <td>0.0</td> <td>100.0</td> <td>--</td> <td>97.7</td> <td>02:01 (100)</td>
</tr>
<tr>
<td>03:01</td> <td>9</td> <td>0.0750</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>11:01</td> <td>5</td> <td>0.0417</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>23:01</td> <td>3</td> <td>0.0250</td> <td>100.0</td> <td>98.4</td> <td>75.0</td> <td>100.0</td> <td>100.0</td> <td>98.4</td> <td>24:02 (100)</td>
</tr>
<tr>
<td>24:02</td> <td>11</td> <td>0.0917</td> <td>100.0</td> <td>97.3</td> <td>100.0</td> <td>97.1</td> <td>76.2</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>24:03</td> <td>1</td> <td>0.0083</td> <td>100.0</td> <td>97.8</td> <td>0.0</td> <td>100.0</td> <td>--</td> <td>97.8</td> <td>24:02 (75)</td>
</tr>
<tr>
<td>25:01</td> <td>5</td> <td>0.0417</td> <td>100.0</td> <td>98.4</td> <td>100.0</td> <td>98.3</td> <td>84.7</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>26:01</td> <td>3</td> <td>0.0250</td> <td>100.0</td> <td>98.4</td> <td>62.5</td> <td>100.0</td> <td>100.0</td> <td>98.4</td> <td>25:01 (83)</td>
</tr>
<tr>
<td>29:02</td> <td>4</td> <td>0.0333</td> <td>100.0</td> <td>97.8</td> <td>75.0</td> <td>100.0</td> <td>100.0</td> <td>97.8</td> <td>02:01 (75)</td>
</tr>
<tr>
<td>31:01</td> <td>3</td> <td>0.0250</td> <td>75.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>32:01</td> <td>4</td> <td>0.0333</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
<tr>
<td>68:01</td> <td>3</td> <td>0.0250</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>100.0</td> <td>--</td>
</tr>
</table>
</body>
</html>