alternateAlleleDetection | R Documentation |
Calculate rates of detecting minor alleles given a “gold standard” dataset
## S4 method for signature 'SeqVarData,SeqVarData'
alternateAlleleDetection(gdsobj, gdsobj2,
match.samples.on=c("subject.id", "subject.id"), verbose=TRUE)
gdsobj |
A SeqVarData object (the dataset to test) |
gdsobj2 |
A SeqVarData object (the “gold standard” dataset) |
match.samples.on |
A length-2 character vector indicating the column to be used for matching in each dataset's |
verbose |
A logical indicating whether to print progress messages. |
Calculates the accuracy of detecting alternate alleles in one dataset (gdsobj) given a “gold standard” dataset (gdsobj2).
Samples are matched using the match.samples.on argument. The first element of match.samples.on indicates the column to be used as the subject identifier for the first dataset, and the second element is the column to be used for the second dataset.
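The matching rule above can be pictured with base R alone. This is only a sketch of the idea, not the package's internal code: it reuses the sample annotation from the example on this page and keeps the subjects found in both datasets. The object name `pairs` is an assumption introduced for illustration.

```r
# Sample annotation for each dataset (same values as the example on this page)
sample1 <- data.frame(subject.id=c("a", "b", "c"), sample.id=c("A", "B", "C"),
                      stringsAsFactors=FALSE)
sample2 <- data.frame(subject.id=c("b", "c", "d"), sample.id=c("B", "C", "D"),
                      stringsAsFactors=FALSE)

# First element: column in dataset 1; second element: column in dataset 2
match.samples.on <- c("subject.id", "subject.id")

# Keep only subjects present in both datasets (hypothetical illustration)
pairs <- merge(sample1, sample2,
               by.x=match.samples.on[1], by.y=match.samples.on[2],
               suffixes=c(".1", ".2"))
print(pairs)
```

Subjects "a" and "d" appear in only one dataset, so they drop out and do not contribute to any counts.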
Variants are matched on position and alleles using bi-allelic SNVs only.
Genotype dosages are recoded to count the same allele if the reference allele in one dataset is the alternate allele in the other dataset.
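As a small illustration of the recoding step, assume dosages are stored as counts of each dataset's own alternate allele (0, 1, or 2). If the reference and alternate alleles are swapped between datasets, subtracting from 2 makes both count the same allele. The variable names and allele labels here are hypothetical.

```r
# Dataset 1: ref = A, alt = G, so the dosage counts G alleles
dosage1 <- c(0, 1, 2)

# Dataset 2: ref = G, alt = A, so the dosage counts A alleles
dosage2 <- c(2, 1, 0)

# Flip dataset 2 so that it also counts G alleles
dosage2.recoded <- 2 - dosage2
print(dosage2.recoded)
```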
If a variant in one dataset matches to multiple variants in the second dataset, then only the first match will be used.
If a variant is missing in either dataset for a given sample pair, that sample pair is ignored for that variant.
To exclude certain variants or samples from the calculation, use seqSetFilter to set appropriate filters on each gds object.
The test is positive if an alternate allele has been detected. Results are returned on an allele level, such that TP, TN, FP, and FN are calculated as follows:
genoData1 (rows) / genoData2 (columns) | aa | Ra | RR |
aa | 2TP | 1TP + 1FP | 2FP |
Ra | 1TP + 1FN | 1TN + 1TP | 1TN + 1FP |
RR | 2FN | 1FN + 1TN | 2TN |
where “R” indicates a reference allele and “a” indicates an alternate allele.
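The table can be reproduced from alternate allele dosages with a few vectorized operations. The helper below is a hypothetical sketch, not the function's actual implementation: `countAlleleCalls` is an invented name, `dosage1` plays the role of genoData1, and `dosage2` plays the role of genoData2. Sample pairs with a missing genotype in either dataset are dropped, as described above.

```r
# Hypothetical helper: allele-level counts for one variant.
# dosage1, dosage2: alternate allele counts (0, 1, 2, or NA) per matched sample
countAlleleCalls <- function(dosage1, dosage2) {
  keep <- !is.na(dosage1) & !is.na(dosage2)   # ignore pairs with missing data
  d1 <- dosage1[keep]
  d2 <- dosage2[keep]
  data.frame(
    n.samples = sum(keep),
    true.pos  = sum(pmin(d1, d2)),        # alternate alleles seen in both
    true.neg  = sum(2 - pmax(d1, d2)),    # reference alleles seen in both
    false.pos = sum(pmax(d1 - d2, 0)),    # extra alternate alleles in dataset 1
    false.neg = sum(pmax(d2 - d1, 0))     # alternate alleles missed by dataset 1
  )
}

# Genotype pairs aa/Ra, Ra/Ra, RR/aa, and one pair with missing data
res <- countAlleleCalls(c(2, 1, 0, NA), c(1, 1, 2, 0))
print(res)
```

Each retained sample pair contributes exactly two alleles, so the four counts always sum to twice n.samples.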
A data frame with the following columns:
variant.id.1 |
variant id from the first dataset |
variant.id.2 |
matched variant id from the second dataset |
n.samples |
the number of samples with non-missing data for this variant |
true.pos |
the number of alleles that are true positives for this variant |
true.neg |
the number of alleles that are true negatives for this variant |
false.pos |
the number of alleles that are false positives for this variant |
false.neg |
the number of alleles that are false negatives for this variant |
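One possible use of the returned columns is to derive per-variant sensitivity and specificity. The data frame below is made-up illustrative data with the column names listed above, not real output.

```r
# Made-up results with the documented column names
res <- data.frame(variant.id.1=c(1, 2), variant.id.2=c(10, 11),
                  n.samples=c(3, 4),
                  true.pos=c(2, 3), true.neg=c(1, 4),
                  false.pos=c(1, 0), false.neg=c(2, 1))

# Per-variant rates
res$sensitivity <- res$true.pos / (res$true.pos + res$false.neg)
res$specificity <- res$true.neg / (res$true.neg + res$false.pos)

# Overall sensitivity, pooling alleles across variants
overall.sens <- sum(res$true.pos) / sum(res$true.pos + res$false.neg)
print(res)
```

Pooling counts before dividing weights each variant by its number of alleles, which may differ from averaging the per-variant rates.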
Adrienne Stilp
SeqVarGDSClass
## Not run:
library(SeqArray)     # provides seqOpen
library(SeqVarTools)  # provides SeqVarData and alternateAlleleDetection

# gdsfile.1 and gdsfile.2 are paths to GDS files, not defined here
gds1 <- seqOpen(gdsfile.1) # dataset to test, e.g. sequencing
sample1 <- data.frame(subject.id=c("a", "b", "c"), sample.id=c("A", "B", "C"), stringsAsFactors=FALSE)
seqData1 <- SeqVarData(gds1, sampleData=sample1)
gds2 <- seqOpen(gdsfile.2) # gold standard dataset, e.g. array genotyping
sample2 <- data.frame(subject.id=c("b", "c", "d"), sample.id=c("B", "C", "D"), stringsAsFactors=FALSE)
seqData2 <- SeqVarData(gds2, sampleData=sample2)
# samples are matched on subject.id in both datasets (the default)
res <- alternateAlleleDetection(seqData1, seqData2)
## End(Not run)