alternateAlleleDetection: alternateAlleleDetection

alternateAlleleDetection    R Documentation

alternateAlleleDetection

Description

Calculate rates of detecting alternate alleles given a “gold standard” dataset.

Usage

## S4 method for signature 'SeqVarData,SeqVarData'
alternateAlleleDetection(gdsobj, gdsobj2,
    match.samples.on=c("subject.id", "subject.id"), verbose=TRUE)

Arguments

gdsobj

A SeqVarData object with VCF data.

gdsobj2

A SeqVarData object with VCF data to be used as the “gold standard”.

match.samples.on

A length-2 character vector naming the column in each dataset's sampleData annotation to use for matching samples.

verbose

A logical indicating whether to print progress messages.

Details

Calculates the accuracy of detecting alternate alleles in one dataset (gdsobj) given a “gold standard” dataset (gdsobj2).

Samples are matched using the match.samples.on argument. The first element of match.samples.on names the subject identifier column for the first dataset, and the second element names the corresponding column for the second dataset.

Only bi-allelic SNVs are used, and variants are matched on position and alleles. If the reference allele in one dataset is the alternate allele in the other, genotype dosages are recoded so that both count the same allele. If a variant in one dataset matches multiple variants in the second dataset, only the first match is used. If a variant is missing in either dataset for a given sample pair, that sample pair is ignored for that variant.

To exclude certain variants or samples from the calculation, use seqSetFilter to set appropriate filters on each gds object.
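The matching rules above can be sketched in plain R on small data frames. This is a minimal illustration under stated assumptions, not the package's implementation: `pairSamples()`, `matchVariants()`, and `recodeDosage()` are hypothetical helpers, and each variant table is assumed to hold already-filtered bi-allelic SNVs with `variant.id`, `chr`, `pos`, `ref`, and `alt` columns.

```r
# Hypothetical helpers sketching the matching rules; not SeqVarTools code.

# Pair samples whose subject IDs appear in both sample tables.
# `on` names the subject ID column in samp1 and samp2, respectively.
pairSamples <- function(samp1, samp2, on = c("subject.id", "subject.id")) {
  ids <- intersect(samp1[[on[1]]], samp2[[on[2]]])
  data.frame(
    subject  = ids,
    sample.1 = samp1$sample.id[match(ids, samp1[[on[1]]])],
    sample.2 = samp2$sample.id[match(ids, samp2[[on[2]]])],
    stringsAsFactors = FALSE
  )
}

# Match each variant in var1 to the first variant in var2 at the same
# position with the same alleles, in either REF/ALT orientation.
matchVariants <- function(var1, var2) {
  key1 <- paste(var1$chr, var1$pos)
  key2 <- paste(var2$chr, var2$pos)
  rows <- lapply(seq_len(nrow(var1)), function(i) {
    cand <- which(key2 == key1[i])
    same <- cand[var2$ref[cand] == var1$ref[i] & var2$alt[cand] == var1$alt[i]]
    swap <- cand[var2$ref[cand] == var1$alt[i] & var2$alt[cand] == var1$ref[i]]
    hits <- sort(c(same, swap))
    if (length(hits) == 0) return(NULL)
    j <- hits[1]  # keep only the first match
    data.frame(variant.id.1 = var1$variant.id[i],
               variant.id.2 = var2$variant.id[j],
               flip = j %in% swap)  # TRUE if REF and ALT are swapped
  })
  do.call(rbind, rows)
}

# Recode a dataset-2 dosage (0, 1, 2 or NA) so it counts the same allele
# as dataset 1 when the alleles are swapped.
recodeDosage <- function(d2, flip) ifelse(flip, 2 - d2, d2)
```

With the sample tables from the Examples section, `pairSamples()` pairs subjects "b" and "c", the only subjects present in both datasets.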

The test is positive if an alternate allele has been detected. Results are returned at the allele level, with TP, TN, FP, and FN calculated as follows (rows give the genotype in gdsobj, columns the genotype in gdsobj2):

                          gdsobj2 (gold standard)
                   aa            Ra            RR
gdsobj     aa      2TP           1TP + 1FP     2FP
           Ra      1TP + 1FN     1TN + 1TP     1TN + 1FP
           RR      2FN           1FN + 1TN     2TN

where “R” indicates a reference allele and “a” indicates an alternate allele.
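Each cell of the table follows from comparing alternate-allele dosages (0, 1, or 2) for one sample pair: alternate alleles present in both calls are true positives, reference alleles present in both are true negatives, and any excess on either side is a false positive or a false negative. The hypothetical helper `alleleCounts()` is a sketch of that arithmetic, assuming both dosage vectors already count the same allele and have one entry per matched sample pair.

```r
# Hypothetical helper: allele-level TP/TN/FP/FN for one variant.
# d1: alternate-allele dosages in the tested dataset (0, 1, 2 or NA)
# d2: alternate-allele dosages in the gold standard, same sample-pair order
alleleCounts <- function(d1, d2) {
  keep <- !is.na(d1) & !is.na(d2)  # drop pairs missing in either dataset
  d1 <- d1[keep]
  d2 <- d2[keep]
  c(n.samples = length(d1),
    true.pos  = sum(pmin(d1, d2)),          # shared alternate alleles
    true.neg  = sum(pmin(2 - d1, 2 - d2)),  # shared reference alleles
    false.pos = sum(pmax(d1 - d2, 0)),      # extra alternate calls in d1
    false.neg = sum(pmax(d2 - d1, 0)))      # alternate alleles missed by d1
}
```

For example, `alleleCounts(1, 0)` (heterozygous in gdsobj, homozygous reference in gdsobj2) gives one true negative and one false positive, matching the Ra/RR cell.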

Value

A data frame with the following columns:

variant.id.1

variant id from the first dataset

variant.id.2

matched variant id from the second dataset

n.samples

the number of samples with non-missing data for this variant

true.pos

the number of alleles that are true positives for this variant

true.neg

the number of alleles that are true negatives for this variant

false.pos

the number of alleles that are false positives for this variant

false.neg

the number of alleles that are false negatives for this variant
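The per-variant counts can be pooled into overall rates. The sketch below assumes `res` is a data frame with the columns listed above; `detectionRates()` is a hypothetical helper, sensitivity and specificity are one common way to summarize such counts rather than something this function returns, and the toy values are illustrative, not real results.

```r
# Hypothetical helper: pool allele counts across variants into rates.
detectionRates <- function(res) {
  tp <- sum(res$true.pos)
  tn <- sum(res$true.neg)
  fp <- sum(res$false.pos)
  fn <- sum(res$false.neg)
  c(sensitivity = tp / (tp + fn),  # alternate alleles in gold standard that were detected
    specificity = tn / (tn + fp))  # reference alleles in gold standard called as reference
}

# Illustrative result for two variants and three sample pairs (6 alleles each)
res <- data.frame(variant.id.1 = 1:2, variant.id.2 = c(10, 11),
                  n.samples = c(3, 3),
                  true.pos = c(2, 1), true.neg = c(4, 4),
                  false.pos = c(0, 1), false.neg = c(0, 0))
detectionRates(res)
```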

Author(s)

Adrienne Stilp

See Also

SeqVarGDSClass

Examples

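## Not run: 
library(SeqArray)     # seqOpen(), seqClose()
library(SeqVarTools)  # SeqVarData(), alternateAlleleDetection()
library(Biobase)      # AnnotatedDataFrame()

gds1 <- seqOpen(gdsfile.1) # dataset to test, e.g. sequencing
sample1 <- data.frame(subject.id=c("a", "b", "c"), sample.id=c("A", "B", "C"), stringsAsFactors=FALSE)
seqData1 <- SeqVarData(gds1, sampleData=AnnotatedDataFrame(sample1))

gds2 <- seqOpen(gdsfile.2) # gold standard dataset, e.g. array genotyping
sample2 <- data.frame(subject.id=c("b", "c", "d"), sample.id=c("B", "C", "D"), stringsAsFactors=FALSE)
seqData2 <- SeqVarData(gds2, sampleData=AnnotatedDataFrame(sample2))

# subjects "b" and "c" are present in both datasets and are compared
res <- alternateAlleleDetection(seqData1, seqData2)

seqClose(seqData1)
seqClose(seqData2)

## End(Not run)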

smgogarten/SeqVarTools documentation built on July 4, 2023, 2:34 a.m.