hlaCompareAllele: Evaluate prediction accuracies

View source: R/DataUtilities.R

hlaCompareAllele    R Documentation

Evaluate prediction accuracies

Description

Evaluates the overall accuracy, sensitivity, specificity, positive predictive value and negative predictive value of predicted HLA types against the true HLA types.

Usage

hlaCompareAllele(TrueHLA, PredHLA, allele.limit=NULL, call.threshold=NaN,
    match.threshold=NaN, max.resolution="", output.individual=FALSE,
    verbose=TRUE)

Arguments

TrueHLA

an object of hlaAlleleClass, the true HLA types

PredHLA

an object of hlaAlleleClass, the predicted HLA types

allele.limit

a list of HLA alleles; the validation samples are limited to those having HLA alleles in allele.limit, or NULL for no limit. allele.limit can be a character vector or an object of hlaAttrBagClass or hlaAttrBagObj

call.threshold

the call threshold for the posterior probability: a call is made if prob >= call.threshold, and no call is made otherwise; if it is NaN, no call threshold is applied

match.threshold

the matching threshold for SNP haplotype similarity, e.g., use the 1% quantile of the matching statistics of a training model

max.resolution

"2-digit", "4-digit", "6-digit", "8-digit", "allele", "protein", "2", "4", "6", "8", "full" or "": "allele" is equivalent to "2-digit", "protein" is equivalent to "4-digit", and "full" and "" indicate no limit on resolution

output.individual

if TRUE, output accuracy for each individual

verbose

if TRUE, show information
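
The call.threshold and max.resolution arguments can be illustrated with a small base-R sketch. It does not use HIBAG and is not the package's implementation: is_called() and truncate_allele() are hypothetical helpers, the probabilities are made up, and the mapping of "2-digit" to one colon-separated field and "4-digit" to two fields is an assumption based on the usual HLA nomenclature.

```r
# Hypothetical posterior probabilities for five predicted individuals
prob <- c(0.95, 0.40, 0.72, 0.10, 0.55)

# Hypothetical helper: TRUE = call, FALSE = no call.
# A non-finite threshold (such as the default NaN) calls every individual.
is_called <- function(prob, call.threshold = NaN) {
    if (!is.finite(call.threshold)) return(rep(TRUE, length(prob)))
    prob >= call.threshold
}

called <- is_called(prob, call.threshold = 0.5)
n.call <- sum(called)                # number of individuals with a call
call.rate <- n.call / length(prob)   # overall call rate

# Hypothetical helper: keep the first `fields` colon-separated fields
# of an allele name (assumed: 1 field = "2-digit", 2 fields = "4-digit").
truncate_allele <- function(allele, fields) {
    vapply(strsplit(allele, ":", fixed = TRUE),
        function(s) paste(head(s, fields), collapse = ":"),
        character(1))
}

truncate_allele(c("01:01:01", "24:02"), 2)
```

With a threshold of 0.5, three of the five hypothetical individuals are called, so the call rate is 0.6.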

Value

Return a list(overall, confusion, detail), or list(overall, confusion, detail, individual) if output.individual=TRUE.

overall (data.frame):

total.num.ind

the total number of individuals

crt.num.ind

the number of individuals with correct HLA types

crt.num.haplo

the number of chromosomes with correct HLA alleles

acc.ind

the proportion of individuals with correctly predicted HLA types (i.e., both alleles are correct; the accuracy of an individual is 0 or 1)

acc.haplo

the proportion of chromosomes with correctly predicted HLA alleles (i.e., the accuracy of an individual is 0, 0.5 or 1, since an individual has two alleles)

call.threshold

the call threshold; if it is NaN, no call threshold is applied

n.call

the number of individuals with a call

call.rate

overall call rate
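
The difference between acc.ind and acc.haplo can be seen in a small base-R sketch. The genotypes below are made up, every individual is assumed to be called (no call threshold), and the order-independent matching of the two alleles is an assumption about how "correct" is counted, not a copy of the package code.

```r
# Hypothetical true and predicted genotypes (two alleles per individual)
true.H1 <- c("01:01", "02:01", "03:01", "24:02")
true.H2 <- c("02:01", "02:01", "11:01", "01:01")
pred.H1 <- c("02:01", "02:01", "03:01", "01:01")
pred.H2 <- c("01:01", "03:01", "03:01", "02:01")

# Number of correctly predicted alleles per individual (0, 1 or 2),
# ignoring allele order: try both pairings and keep the better one.
straight <- (true.H1 == pred.H1) + (true.H2 == pred.H2)
crossed  <- (true.H1 == pred.H2) + (true.H2 == pred.H1)
hnum <- pmax(straight, crossed)

n <- length(hnum)
overall <- data.frame(
    total.num.ind = n,
    crt.num.ind   = sum(hnum == 2),     # both alleles correct
    crt.num.haplo = sum(hnum),          # correct chromosomes
    acc.ind       = sum(hnum == 2) / n,
    acc.haplo     = sum(hnum) / (2 * n))
overall
```

Here only the first individual has both alleles correct, while five of the eight chromosomes are correct, so acc.ind is 0.25 and acc.haplo is 0.625.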

confusion (matrix): a confusion matrix.

detail (data.frame):

allele

HLA alleles

train.num

the number of training haplotypes

train.freq

the training haplotype frequencies

valid.num

the number of validation haplotypes

valid.freq

the validation haplotype frequencies

call.rate

the call rates for HLA alleles

accuracy

allele accuracy

sensitivity

sensitivity

specificity

specificity

ppv

positive predictive value

npv

negative predictive value

miscall

the most likely miscalled alleles

miscall.prop

the proportion of the most likely miscalled allele among all miscalled alleles
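
As a rough illustration of the per-allele columns, the base-R sketch below derives sensitivity, specificity, ppv, npv, miscall and miscall.prop from a made-up confusion matrix using the textbook one-versus-rest definitions. The matrix orientation (rows = true alleles, columns = predicted alleles) and the counts are assumptions, and the package's own accounting (for example, of no-calls or of alleles absent from the training set) may differ.

```r
alleles <- c("01:01", "02:01", "03:01")

# Hypothetical confusion matrix of chromosome counts
# (assumed orientation: rows = true allele, columns = predicted allele)
conf <- matrix(c(8,  2, 0,
                 0, 10, 0,
                 2,  1, 5),
    nrow = 3, byrow = TRUE,
    dimnames = list(true = alleles, pred = alleles))

n  <- sum(conf)
tp <- diag(conf)             # true allele predicted correctly
fn <- rowSums(conf) - tp     # true allele predicted as something else
fp <- colSums(conf) - tp     # other alleles predicted as this allele
tn <- n - tp - fn - fp

# Off-diagonal counts are the miscalls for each true allele
off <- conf
diag(off) <- 0
n.miss <- rowSums(off)

detail <- data.frame(
    allele       = alleles,
    sensitivity  = tp / (tp + fn),
    specificity  = tn / (tn + fp),
    ppv          = tp / (tp + fp),
    npv          = tn / (tn + fn),
    # most frequent wrong prediction; NA when the allele is never miscalled
    miscall      = ifelse(n.miss > 0,
        alleles[max.col(off, ties.method = "first")], NA),
    miscall.prop = apply(off, 1, max) / n.miss)
detail
```

In this toy matrix, 03:01 is miscalled three times, twice as 01:01, so its miscall is "01:01" with miscall.prop 2/3; 02:01 is never miscalled, so its miscall is NA and miscall.prop is NaN.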

individual (data.frame):

sample.id

sample id

true.hla

the true HLA type

pred.hla

the predicted HLA type

accuracy

accuracy: 0, 0.5, or 1

Author(s)

Xiuwen Zheng

See Also

hlaAttrBagging, predict.hlaAttrBagClass, hlaReport

Examples

library(HIBAG)

# make a "hlaAlleleClass" object
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep="")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep="")],
    locus=hla.id, assembly="hg19")

# divide HLA types randomly
set.seed(100)
hlatab <- hlaSplitAllele(hla, train.prop=0.5)
names(hlatab)
# "training"   "validation"
summary(hlatab$training)
summary(hlatab$validation)

# SNP predictors within the flanking region on each side
region <- 500   # kb
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, region*1000, assembly="hg19")
length(snpid)  # 275

# training and validation genotypes
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel=match(snpid, HapMap_CEU_Geno$snp.id),
    samp.sel=match(hlatab$training$value$sample.id,
    HapMap_CEU_Geno$sample.id))
test.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    samp.sel=match(hlatab$validation$value$sample.id,
    HapMap_CEU_Geno$sample.id))

# train a HIBAG model
set.seed(100)
model <- hlaAttrBagging(hlatab$training, train.geno, nclassifier=4,
    verbose.detail=TRUE)
summary(model)

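# validation: predict HLA types for the validation samples
pred <- hlaPredict(model, test.geno)
# compare the predictions with the true HLA types,
# first calling every sample (call.threshold=0), then with a call threshold of 0.5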
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0))
(comp <- hlaCompareAllele(hlatab$validation, pred, allele.limit=model,
    call.threshold=0.5))

zhengxwen/HIBAG documentation built on Nov. 19, 2024, 1:01 p.m.