hlaDistance: Distance matrix of HLA alleles


View source: R/HIBAG.R

Description

Calculates the distance matrix of HLA alleles from a HIBAG model.

Usage

hlaDistance(model)

Arguments

model

a model of class hlaAttrBagClass or hlaAttrBagObj

Value

Returns a distance matrix whose row and column names are the HLA alleles.
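
Because the rows and columns are named by allele, individual distances can be looked up by name. The following is a minimal sketch on a toy matrix, not on real HIBAG output: the allele names and distance values are invented for illustration, and the matrix is assumed to be symmetric with a zero diagonal.

```r
# Toy stand-in for the value of hlaDistance(); names and values are invented.
alleles <- c("01:01", "02:01", "03:01", "24:02")
d <- matrix(c(0.0, 0.8, 0.3, 0.9,
              0.8, 0.0, 0.7, 0.4,
              0.3, 0.7, 0.0, 0.6,
              0.9, 0.4, 0.6, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(alleles, alleles))

# Look up the distance between two alleles by name.
d["01:01", "03:01"]

# Find the closest pair of distinct alleles: mask the diagonal,
# then locate the smallest remaining entry.
off <- d
diag(off) <- NA
idx <- which(off == min(off, na.rm = TRUE), arr.ind = TRUE)[1, ]
closest <- c(rownames(d)[idx["row"]], colnames(d)[idx["col"]])
closest
```

With a real model, `d <- hlaDistance(model)` would take the place of the toy matrix.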

Author(s)

Xiuwen Zheng

Examples

# load the package, which provides HLA_Type_Table and HapMap_CEU_Geno
library(HIBAG)

# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# flanking genotypes
train.geno <- hlaGenoSubsetFlank(HapMap_CEU_Geno, hla.id, 500000)
summary(train.geno)

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hla, train.geno, nclassifier=10)
summary(model)

# distance matrix
d <- hlaDistance(model)

# draw
p <- hclust(as.dist(d))
plot(p, xlab="HLA alleles")
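
The dendrogram drawn above can also be cut into discrete groups of alleles. The sketch below uses base R's `hclust()` and `cutree()` on a toy distance matrix; the allele names, distance values, and the choice of `k = 2` are assumptions made for illustration and do not come from a HIBAG model.

```r
# Toy distance matrix standing in for hlaDistance(model); values are invented.
alleles <- c("01:01", "02:01", "03:01", "24:02")
d <- matrix(c(0.0, 0.8, 0.3, 0.9,
              0.8, 0.0, 0.7, 0.4,
              0.3, 0.7, 0.0, 0.6,
              0.9, 0.4, 0.6, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(alleles, alleles))

# Hierarchical clustering (hclust defaults to complete linkage).
hc <- hclust(as.dist(d))

# Cut the tree into two groups; k = 2 is an arbitrary illustrative choice.
groups <- cutree(hc, k = 2)

# List the alleles that fall into each group.
split(names(groups), groups)
```

In practice the number of groups, or a cut height passed as `h`, would be chosen by inspecting the plotted dendrogram.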

Example output

HIBAG (HLA Genotype Imputation with Attribute Bagging)
Kernel Version: v1.5 (64-bit, AVX2)
SNP genotypes: 
    60 samples X 275 SNPs
    SNPs range from 29417816bp to 30410205bp on hg19
Missing rate per SNP:
    min: 0, max: 0.0666667, mean: 0.0652727, median: 0.0666667, sd: 0.00939558
Missing rate per sample:
    min: 0, max: 0.974545, mean: 0.0652727, median: 0, sd: 0.245066
Minor allele frequency:
    min: 0, max: 0.491071, mean: 0.215181, median: 0.1875, sd: 0.139271
Allelic information:
C/T A/G G/T A/C 
125  97  32  21 
Exclude 9 monomorphic SNPs
Build a HIBAG model with 10 individual classifiers:
    # of SNPs randomly sampled as candidates for each selection: 17
    # of SNPs: 266
    # of samples: 60
    # of unique HLA alleles: 14
CPU flags: 64-bit, AVX2
# of threads: 1
[-] 2021-02-02 01:07:27
=== building individual classifier 1, out-of-bag (23/38.3%) ===
[1] 2021-02-02 01:07:27, OOB Acc: 86.96%, # of SNPs: 12, # of Haplo: 32
=== building individual classifier 2, out-of-bag (24/40.0%) ===
[2] 2021-02-02 01:07:27, OOB Acc: 87.50%, # of SNPs: 15, # of Haplo: 40
=== building individual classifier 3, out-of-bag (24/40.0%) ===
[3] 2021-02-02 01:07:27, OOB Acc: 97.92%, # of SNPs: 14, # of Haplo: 21
=== building individual classifier 4, out-of-bag (22/36.7%) ===
[4] 2021-02-02 01:07:27, OOB Acc: 95.45%, # of SNPs: 14, # of Haplo: 25
=== building individual classifier 5, out-of-bag (19/31.7%) ===
[5] 2021-02-02 01:07:27, OOB Acc: 78.95%, # of SNPs: 14, # of Haplo: 21
=== building individual classifier 6, out-of-bag (24/40.0%) ===
[6] 2021-02-02 01:07:27, OOB Acc: 93.75%, # of SNPs: 16, # of Haplo: 22
=== building individual classifier 7, out-of-bag (24/40.0%) ===
[7] 2021-02-02 01:07:28, OOB Acc: 93.75%, # of SNPs: 24, # of Haplo: 81
=== building individual classifier 8, out-of-bag (21/35.0%) ===
[8] 2021-02-02 01:07:28, OOB Acc: 92.86%, # of SNPs: 20, # of Haplo: 45
=== building individual classifier 9, out-of-bag (19/31.7%) ===
[9] 2021-02-02 01:07:28, OOB Acc: 94.74%, # of SNPs: 16, # of Haplo: 45
=== building individual classifier 10, out-of-bag (19/31.7%) ===
[10] 2021-02-02 01:07:28, OOB Acc: 97.37%, # of SNPs: 15, # of Haplo: 40
Calculating matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0001493079 0.0001617044 0.0002732730 0.0039571951 0.0150902995 0.0320853535 
        Max.         Mean           SD 
0.3770420992 0.0416903408 0.0825371577 
Accuracy with training data: 98.33%
Out-of-bag accuracy: 91.92%
Gene: A
Training dataset: 60 samples X 266 SNPs
    # of HLA alleles: 14
    # of individual classifiers: 10
    total # of SNPs used: 95
    avg. # of SNPs in an individual classifier: 16.00
        (sd: 3.50, min: 12, max: 24, median: 15.00)
    avg. # of haplotypes in an individual classifier: 37.20
        (sd: 18.22, min: 21, max: 81, median: 36.00)
    avg. out-of-bag accuracy: 91.92%
        (sd: 5.83%, min: 78.95%, max: 97.92%, median: 93.75%)
Matching proportion:
        Min.     0.1% Qu.       1% Qu.      1st Qu.       Median      3rd Qu. 
0.0001493079 0.0001617044 0.0002732730 0.0039571951 0.0150902995 0.0320853535 
        Max.         Mean           SD 
0.3770420992 0.0416903408 0.0825371577 
Genome assembly: hg19

HIBAG documentation built on March 24, 2021, 6 p.m.