alternateAlleleDetection: alternateAlleleDetection
In smgogarten/SeqVarTools: Tools for variant data

alternateAlleleDetection

Description

Calculate rates of detecting alternate alleles given a “gold standard” dataset.

Usage

## S4 method for signature 'SeqVarData,SeqVarData'
alternateAlleleDetection(gdsobj, gdsobj2,
    match.samples.on=c("subject.id", "subject.id"), verbose=TRUE)

Arguments

`gdsobj`	A `SeqVarData` object with VCF data.
`gdsobj2`	A `SeqVarData` object with VCF data to be used as the “gold standard”.
`match.samples.on`	A length-2 character vector giving the column to be used for matching in each dataset's `sampleData` annotation.
`verbose`	A logical indicating whether to print progress messages.

Details

Calculates the accuracy of detecting alternate alleles in one dataset (gdsobj) given a “gold standard” dataset (gdsobj2). Samples are matched using the match.samples.on argument. The first element of match.samples.on indicates the column to be used as the subject identifier for the first dataset, and the second element is the column to be used for the second dataset. Variants are matched on position and alleles using bi-allelic SNVs only. Genotype dosages are recoded to count the same allele if the reference allele in one dataset is the alternate allele in the other dataset. If a variant in one dataset matches multiple variants in the second dataset, only the first match is used. If a variant is missing in either dataset for a given sample pair, that sample pair is ignored for that variant. To exclude certain variants or samples from the calculation, use seqSetFilter to set appropriate filters on each GDS object.
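
As a rough illustration of the matching and recoding steps described above, the base R sketch below pairs samples on the columns named in match.samples.on, flips the dosage when the reference and alternate alleles are swapped, and drops sample pairs with a missing genotype. It is not the package's internal code: the sample identifiers, alleles, and dosages are invented, and the variable names are hypothetical.

```r
# Hypothetical sample annotation for each dataset (values are made up).
sample1 <- data.frame(subject.id = c("a", "b", "c"),
                      sample.id  = c("A1", "B1", "C1"),
                      stringsAsFactors = FALSE)
sample2 <- data.frame(subject.id = c("b", "c", "d"),
                      sample.id  = c("B2", "C2", "D2"),
                      stringsAsFactors = FALSE)
match.samples.on <- c("subject.id", "subject.id")

# Pair samples whose identifiers agree in the two chosen columns.
pairs <- merge(sample1, sample2,
               by.x = match.samples.on[1], by.y = match.samples.on[2],
               suffixes = c(".1", ".2"))

# Alternate-allele dosages for one matched variant in the paired samples.
# Assumed alleles: the two datasets list the same SNV with ref/alt swapped.
alleles1 <- c(ref = "A", alt = "G")
alleles2 <- c(ref = "G", alt = "A")
dosage1 <- c(0, 1)    # counts of "G" in dataset 1
dosage2 <- c(2, NA)   # counts of "A" in dataset 2; second genotype missing

# Recode dataset 2 so that both dosages count the same allele ("G").
swapped <- alleles1[["ref"]] == alleles2[["alt"]] &&
  alleles1[["alt"]] == alleles2[["ref"]]
if (swapped) dosage2 <- 2 - dosage2

# Ignore sample pairs with a missing genotype in either dataset.
keep <- !is.na(dosage1) & !is.na(dosage2)
```

Here only subjects "b" and "c" are present in both datasets, and only the first pair contributes to the counts for this variant because the second has a missing genotype.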

The test is positive if an alternate allele has been detected. Results are returned at the allele level, with TP, TN, FP, and FN calculated as follows:

                            genoData2
                  aa           Ra           RR
genoData1   aa    2TP          1TP + 1FP    2FP
            Ra    1TP + 1FN    1TN + 1TP    1TN + 1FP
            RR    2FN          1FN + 1TN    2TN

where “R” indicates a reference allele and “a” indicates an alternate allele.
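
The table can be reproduced from alternate-allele dosages with a few vectorised operations. The sketch below is one way to express it in base R, assuming that genoData1 corresponds to the dataset being tested and genoData2 to the gold standard. The helper name allele_counts is hypothetical and is not part of SeqVarTools.

```r
# d1: alternate-allele dosage (0, 1, or 2) in the dataset being tested
# d2: alternate-allele dosage (0, 1, or 2) in the gold standard
# allele_counts is a hypothetical helper, not a SeqVarTools function.
allele_counts <- function(d1, d2) {
  data.frame(
    true.pos  = pmin(d1, d2),        # alternate alleles seen in both
    false.pos = pmax(d1 - d2, 0),    # extra alternate alleles in the test data
    false.neg = pmax(d2 - d1, 0),    # alternate alleles the test data missed
    true.neg  = 2 - pmax(d1, d2)     # reference alleles seen in both
  )
}

# Enumerate all nine genotype combinations from the table.
grid <- expand.grid(d1 = 0:2, d2 = 0:2)
counts <- cbind(grid, allele_counts(grid$d1, grid$d2))
print(counts)
```

Each genotype pair contributes exactly two alleles, so the four counts in every row sum to 2.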

Value

A data frame with the following columns:

`variant.id.1`	variant id from the first dataset
`variant.id.2`	matched variant id from the second dataset
`n.samples`	the number of samples with non-missing data for this variant
`true.pos`	the number of alleles that are true positives for this variant
`true.neg`	the number of alleles that are true negatives for this variant
`false.pos`	the number of alleles that are false positives for this variant
`false.neg`	the number of alleles that are false negatives for this variant
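
The returned columns are counts, so rates such as sensitivity and specificity have to be derived from them. The following sketch shows one possible way to do that on a data frame with the columns listed above. The data frame res is hypothetical and its numbers are made up for illustration.

```r
# Hypothetical result with the documented columns (values are invented).
res <- data.frame(
  variant.id.1 = 1:3,
  variant.id.2 = c(10L, 11L, 12L),
  n.samples    = c(10, 10, 8),
  true.pos     = c(4, 0, 6),
  true.neg     = c(15, 20, 9),
  false.pos    = c(0, 0, 1),
  false.neg    = c(1, 0, 0)
)

# Per-variant rates; a variant with no alternate alleles in the gold
# standard has an undefined sensitivity (0/0 gives NaN).
res$sensitivity <- res$true.pos / (res$true.pos + res$false.neg)
res$specificity <- res$true.neg / (res$true.neg + res$false.pos)

# Rates pooled over all variants.
overall.sensitivity <- sum(res$true.pos) / sum(res$true.pos + res$false.neg)
overall.specificity <- sum(res$true.neg) / sum(res$true.neg + res$false.pos)
```

Pooling the counts before dividing avoids the undefined per-variant values and weights each variant by the number of alleles it contributes.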

Author(s)

Adrienne Stilp

Examples

## Not run: 
gds1 <- seqOpen(gdsfile.1) # dataset to test, e.g. sequencing
sample1 <- data.frame(subject.id=c("a", "b", "c"), sample.id=c("A", "B", "C"), stringsAsFactors=FALSE)
# SeqVarData stores sampleData as an AnnotatedDataFrame
seqData1 <- SeqVarData(gds1, sampleData=Biobase::AnnotatedDataFrame(sample1))

gds2 <- seqOpen(gdsfile.2) # gold standard dataset, e.g. array genotyping
sample2 <- data.frame(subject.id=c("b", "c", "d"), sample.id=c("B", "C", "D"), stringsAsFactors=FALSE)
seqData2 <- SeqVarData(gds2, sampleData=Biobase::AnnotatedDataFrame(sample2))

# samples are matched on subject.id in both datasets (the default)
res <- alternateAlleleDetection(seqData1, seqData2)

## End(Not run)

smgogarten/SeqVarTools documentation built on Sept. 15, 2024, 1:08 p.m.